Speechmatics: STT API with On-Prem Deploy | hokai.io

Speechmatics speech-to-text API covers 55+ languages with on-prem deployment and a medical model hitting 93% accuracy. Pricing starts at $2.75/hr, with 4 free hrs/month.

Speechmatics is a Cambridge, UK-based speech AI company founded in 2006 with $90.6M in funding. Its API covers 55+ languages, returns real-time partial transcripts in under 250ms, and includes a medical model that achieves 93% accuracy with 50% fewer keyword errors than its nearest competitors. Pricing starts at $2.75/hr (Standard) and $3.75/hr (Enhanced). A free tier offers 4 hours per month. Cloud, on-premises, and local edge deployment are all supported.

Pricing

Free: 4 hrs/month (2h Enhanced + 2h Standard). On-Demand Standard: $2.75/hr. On-Demand Enhanced: $3.75/hr. Enterprise: custom pricing with SLAs, HIPAA mode, and dedicated support.

Frequently Asked Questions

What is Speechmatics and what does it do?

Speechmatics is a speech AI API platform built by Speechmatics Ltd., founded in Cambridge, UK in 2006 by Dr. Tony Robinson. It provides speech-to-text APIs for real-time and batch audio transcription across 55+ languages, with specialised models for medical transcription. The company has raised $90.6 million in total funding and counts enterprise clients in finance, healthcare, broadcasting, and government. Real-time transcription returns partial results in under 250ms.

How much does Speechmatics cost?

Speechmatics offers a free tier of 4 hours per month (2 hours Enhanced plus 2 hours Standard). On-demand Standard transcription costs $2.75 per hour; on-demand Enhanced costs $3.75 per hour. Enterprise plans include custom pricing with dedicated SLAs, a HIPAA-compliant processing mode, and dedicated support. Self-hosted and on-prem deployments also incur infrastructure costs that are not reflected in the per-hour API rate.

What are the main features of Speechmatics?

Speechmatics' main features include real-time streaming STT (under 250ms partial transcript latency), batch processing, speaker diarization, and support for 55+ languages including regional dialects. A medical model launched in September 2025 achieves 93% real-world clinical accuracy. The platform uniquely offers four deployment modes: cloud API, self-hosted Docker container, virtual appliance, and local edge runtime on laptop-sized hardware.

Is Speechmatics free to use?

Yes, Speechmatics provides a free tier of 4 hours per month: 2 hours on the Enhanced model and 2 hours on the Standard model. The allowance resets each month, and unused hours do not carry over. Once the free hours are exhausted, usage switches to on-demand pay-as-you-go billing at $2.75/hr (Standard) or $3.75/hr (Enhanced). No credit card is required to access the free tier.
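As a rough illustration of how the free allowance and on-demand rates combine, the sketch below estimates a monthly bill. It assumes that billing is linear per hour and that each model's 2 free hours offset only usage of that same model; neither assumption is confirmed on this page, and the function name is hypothetical.

```python
# Rates and free allowances quoted on this page (USD per hour, hours per month).
STANDARD_RATE = 2.75
ENHANCED_RATE = 3.75
FREE_STANDARD_HOURS = 2.0
FREE_ENHANCED_HOURS = 2.0


def estimate_monthly_cost(standard_hours: float, enhanced_hours: float) -> float:
    """Estimate the on-demand bill for one month.

    Assumption: free hours apply per model and unused hours are simply lost.
    """
    billable_standard = max(0.0, standard_hours - FREE_STANDARD_HOURS)
    billable_enhanced = max(0.0, enhanced_hours - FREE_ENHANCED_HOURS)
    return billable_standard * STANDARD_RATE + billable_enhanced * ENHANCED_RATE


if __name__ == "__main__":
    # 10 h Standard and 5 h Enhanced: 8 billable h at $2.75 plus 3 billable h at $3.75.
    print(f"${estimate_monthly_cost(10, 5):.2f}")
```

Under these assumptions, usage that stays within 2 hours per model costs nothing, and enterprise or self-hosted pricing is out of scope for the estimate.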

What are the best alternatives to Speechmatics?

The main alternatives are Deepgram, AssemblyAI, Rev.ai, and Google Cloud Speech-to-Text. Deepgram is a better choice when you need faster real-time STT at lower cost ($0.0077/min versus roughly $0.046/min for Speechmatics Standard) and do not need on-prem deployment. AssemblyAI is stronger for audio intelligence features like sentiment analysis across many languages. Rev.ai is cheaper ($0.003/min) for English batch transcription. None of these alternatives supports local edge runtime deployment.
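The per-minute comparison follows from dividing the hourly rate by 60. The short sketch below reproduces that arithmetic using only the prices quoted on this page; the helper names are illustrative, and the competitor figures are taken as given rather than verified.

```python
def per_minute(rate_per_hour: float) -> float:
    """Convert an hourly rate (USD/hr) to a per-minute rate (USD/min)."""
    return rate_per_hour / 60


def price_ratio(speechmatics_per_hour: float, competitor_per_minute: float) -> float:
    """How many times more the Speechmatics rate costs per minute."""
    return per_minute(speechmatics_per_hour) / competitor_per_minute


if __name__ == "__main__":
    print(round(per_minute(2.75), 3))           # Standard, USD/min
    print(per_minute(3.75))                     # Enhanced, USD/min
    print(round(price_ratio(2.75, 0.0077), 1))  # Standard vs the quoted Deepgram rate
    print(round(price_ratio(2.75, 0.003), 1))   # Standard vs the quoted Rev.ai rate
```

On the quoted figures, Speechmatics Standard works out to about six times the Deepgram rate and about fifteen times the Rev.ai rate per minute.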

Who is Speechmatics best for?

Speechmatics is best for regulated enterprises needing data sovereignty (government, healthcare, financial services) and for broadcast or media teams working with multi-dialect European languages. The medical model makes it the top choice for clinical documentation platforms. It is not suitable for startups or cost-sensitive teams that need basic English transcription at under $0.01/min, where Deepgram or Rev.ai are significantly more affordable.

Does Speechmatics have an API?

Yes, Speechmatics is API-first. It provides REST endpoints for batch (asynchronous) transcription and WebSocket endpoints for real-time streaming. Official clients are available for JavaScript and Python. Documentation is at docs.speechmatics.com. The API covers all models including Standard, Enhanced, and Medical, plus domain adaptation, speaker diarization, and language identification features.
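To give a sense of what a batch request body might look like, the sketch below builds a JSON job configuration. The field names (`type`, `transcription_config`, `language`, `operating_point`, `diarization`) are assumptions about the request schema rather than details stated on this page, and the helper function is hypothetical; check docs.speechmatics.com before relying on any of them. No network call is made.

```python
import json


def build_batch_job_config(
    language: str = "en",
    operating_point: str = "enhanced",
    speaker_diarization: bool = False,
) -> dict:
    """Build a job config dict for a batch transcription request.

    Assumption: the field names below mirror the batch API schema;
    verify them against the official documentation.
    """
    transcription_config = {
        "language": language,
        "operating_point": operating_point,  # assumed values: "standard" or "enhanced"
    }
    if speaker_diarization:
        transcription_config["diarization"] = "speaker"  # assumed flag value
    return {"type": "transcription", "transcription_config": transcription_config}


if __name__ == "__main__":
    # Serialise the config as it might be sent alongside an audio file.
    print(json.dumps(build_batch_job_config(speaker_diarization=True), indent=2))
```

In practice the official JavaScript or Python client would likely handle this serialisation, so a hand-built config is mainly useful for understanding what options a job carries.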