Last updated: 2026-05-29
The Speechmatics speech-to-text API covers 55+ languages with sub-250ms real-time latency and supports cloud, on-prem, and on-device deployment. Medical model: 93% accuracy. Free tier 4 hrs/month; $2.75/hr Standard. SOC 2, HIPAA, and GDPR compliant.
Speechmatics is a Cambridge, UK-based speech AI company (founded 2006, $90.6M in funding) offering speech-to-text APIs across 55+ languages. Real-time transcription delivers partial results in under 250ms. The medical model achieves 93% accuracy, with a medical Keyword Error Rate 50% lower than the nearest competing system. Pricing: $2.75/hr Standard, $3.75/hr Enhanced. Free tier: 4 hrs/month. Cloud, on-premises, and local edge deployment are supported. SOC 2 Type II, HIPAA, GDPR, and ISO 27001 compliant.
Speechmatics is a speech AI platform built by Speechmatics Ltd., founded in 2006 in Cambridge, UK by Dr. Tony Robinson, a pioneer in applying neural networks to speech recognition. With $90.6 million in total funding, including a $62 million Series B in 2022, the company provides APIs for speech-to-text, real-time transcription, and voice AI agents, used by enterprise customers in finance, healthcare, broadcasting, and government.
The platform differentiates on three axes: the widest language and dialect coverage in the market (55+ languages, including regional accents that generic models struggle with), a specialised medical transcription model, and the broadest deployment flexibility of any major STT vendor. Teams can run Speechmatics as a cloud API, a self-hosted Docker container, a virtual appliance on their own infrastructure, or a local edge runtime on laptop-sized hardware for air-gapped or battery-sensitive environments. Real-time partial transcripts return in under 250ms, and full voice agent pipelines achieve a total response time of 1 to 1.5 seconds.
In September 2025, Speechmatics launched a next-generation Medical Speech-to-Text model that achieved 93% real-world accuracy in clinical settings, with a Keyword Error Rate 50% lower than that of the nearest competing system. The platform also offers an Enhanced model tier for maximum accuracy on noisy, accented, or domain-specific audio, and a Standard tier for speed-optimised batch work. Pricing starts at $2.75/hour for the Standard model and $3.75/hour for Enhanced, with a free tier of 4 hours per month (2h Enhanced plus 2h Standard). Enterprise plans carry custom pricing and include dedicated support, SLAs, and optional HIPAA-compliant processing.
Speechmatics is best suited to regulated-industry teams and enterprises for which data sovereignty, accent coverage, or medical-grade accuracy are non-negotiable requirements. It is not cost-competitive for high-volume, English-only workloads, where Deepgram Nova-3 at $0.46/hr or Rev.ai Reverb at $0.18/hr are viable alternatives.
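The profile does not say how the 93% accuracy figure or the Keyword Error Rate is measured. A common convention, assumed here, reads accuracy as 1 - WER (word-level edit distance divided by reference length) and computes a keyword error rate over a list of domain terms. The sketch below uses hypothetical helper names and a simplified keyword definition; Speechmatics' own methodology may differ.

```python
from collections import Counter


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] = distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; 'accuracy' is assumed here to mean 1 - WER."""
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.lower().split()) / len(ref)


def keyword_error_rate(reference: str, hypothesis: str, keywords: set[str]) -> float:
    """Simplified KER (assumption): share of keyword occurrences in the
    reference that are missing from the hypothesis. Keywords must be lowercase."""
    ref_keywords = [w for w in reference.lower().split() if w in keywords]
    if not ref_keywords:
        return 0.0
    available = Counter(hypothesis.lower().split())
    missed = 0
    for w in ref_keywords:
        if available[w] > 0:
            available[w] -= 1  # consume one matching occurrence
        else:
            missed += 1
    return missed / len(ref_keywords)


def relative_reduction(baseline: float, candidate: float) -> float:
    """E.g. a KER of 0.10 against a baseline of 0.20 is a 0.5 (50%) reduction."""
    return (baseline - candidate) / baseline
```

For the reference "patient was prescribed metformin 500 mg twice daily" and the hypothesis "patient was prescribed met forming 500 mg twice daily", the WER is 2/8 = 0.25 (75% accuracy under this convention), and the keyword error rate over {"metformin", "mg"} is 0.5. A "50% lower" KER, as quoted above, corresponds to a relative reduction of 0.5.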
Free: 4 hours/month (2 hrs Enhanced + 2 hrs Standard). Standard: $2.75/hr on-demand. Enhanced: $3.75/hr on-demand. Volume: 20% off usage >500 hrs/mo; additional discounts at 24k+ hrs/year. Enterprise: custom pricing with SLAs and HIPAA mode. Self-hosted/on-prem incur separate infrastructure costs.
Speechmatics is a speech AI platform founded in 2006 by Dr. Tony Robinson in Cambridge, UK, with $90.6 million in total funding. It provides speech-to-text APIs for real-time and batch audio transcription across 55+ languages, with specialized models including a medical transcription model launched in September 2025 achieving 93% clinical accuracy. The platform uniquely supports four deployment modes: managed cloud, self-hosted Docker containers, virtual appliances, and local edge runtime on laptop-grade hardware. Enterprise clients span finance, healthcare, broadcasting, and government sectors.
Speechmatics offers a free tier of 4 hours per month (2 hours on the Enhanced model, 2 hours on Standard), resetting monthly with no carryover. On-demand pricing: Standard model at $2.75 per hour, Enhanced model at $3.75 per hour. Volume discounts apply: 20% off usage above 500 hours per month; additional discounts available for 24,000+ hours annually. Enterprise plans include custom pricing with dedicated SLAs, HIPAA-compliant processing, and 24/7 support. Self-hosted and on-premises deployments incur additional infrastructure costs beyond the per-hour API rate.
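The volume discount can be turned into a rough monthly estimate. This is a minimal sketch that assumes the 20% discount applies only to the hours beyond 500 in a given month (a marginal discount), not to the whole bill; free-tier hours, annual 24,000+ hour discounts, and self-hosting infrastructure costs are deliberately left out because their terms are not specified here. The constant and function names are illustrative.

```python
STANDARD_RATE = 2.75          # USD per hour, on-demand
ENHANCED_RATE = 3.75          # USD per hour, on-demand
DISCOUNT_THRESHOLD_HOURS = 500
VOLUME_DISCOUNT = 0.20


def monthly_cost(hours: float, rate: float) -> float:
    """Estimate one month's on-demand spend.

    Assumption: the 20% discount covers only the hours above 500 that month.
    Free-tier hours and annual-volume discounts are ignored.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    base_hours = min(hours, DISCOUNT_THRESHOLD_HOURS)
    discounted_hours = max(hours - DISCOUNT_THRESHOLD_HOURS, 0)
    cost = base_hours * rate + discounted_hours * rate * (1 - VOLUME_DISCOUNT)
    return round(cost, 2)
```

Under this assumption, 800 hours on the Standard model costs 500 × $2.75 + 300 × $2.75 × 0.8 = $2,035.00, a blended rate of about $2.54/hr. If the discount instead applied to all usage once the threshold is crossed, only the `cost` line would change.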
Speechmatics' core features include real-time streaming speech-to-text with sub-250ms partial transcript latency, batch processing for pre-recorded audio, and speaker diarization included by default (not as a paid add-on). It supports 55+ languages with regional dialect variants. The medical model, optimized for clinical documentation, achieves 93% real-world accuracy with a Keyword Error Rate 50% lower than the nearest competing system, and handles multi-speaker medical conversations. Additional capabilities include custom dictionaries, language identification, domain adaptation, and multi-channel audio processing. The four deployment modes cover edge processing on laptops, on-premises data residency, and scalable cloud SaaS.
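To show how partial and final results differ in a streaming session, here is a sketch of the client-side message handling. The message and field names (`StartRecognition`, `audio_format`, `transcription_config`, `enable_partials`, `AddPartialTranscript`, `AddTranscript`, `EndOfTranscript`, `metadata.transcript`) reflect our reading of the Speechmatics real-time WebSocket API and should be verified against docs.speechmatics.com; the 16 kHz mono PCM capture format and the class and function names are assumptions. No network code is included.

```python
import json

SAMPLE_RATE = 16_000   # Hz, mono (assumed capture format)
BYTES_PER_SAMPLE = 2   # pcm_s16le = 16-bit little-endian samples


def start_recognition_message(language: str = "en") -> str:
    """JSON sent first on the WebSocket (field names to be checked in the docs)."""
    return json.dumps({
        "message": "StartRecognition",
        "audio_format": {"type": "raw", "encoding": "pcm_s16le",
                         "sample_rate": SAMPLE_RATE},
        "transcription_config": {
            "language": language,
            "enable_partials": True,   # ask for low-latency partial results
            "diarization": "speaker",  # speaker labels
        },
    })


def chunk_size_bytes(chunk_ms: int) -> int:
    """Bytes of raw PCM audio covering chunk_ms milliseconds."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000


class TranscriptCollector:
    """Keeps finalized text separate from the latest (revisable) partial."""

    def __init__(self) -> None:
        self.final_parts: list[str] = []
        self.partial = ""
        self.done = False

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("message")
        if kind == "AddPartialTranscript":
            # Partials can be revised, so overwrite instead of appending.
            self.partial = msg["metadata"]["transcript"]
        elif kind == "AddTranscript":
            self.final_parts.append(msg["metadata"]["transcript"])
            self.partial = ""
        elif kind == "EndOfTranscript":
            self.done = True

    @property
    def text(self) -> str:
        return " ".join(p.strip() for p in self.final_parts if p.strip())
```

In a live client, the start message would be sent first and audio would follow as binary frames; at 16 kHz, 16-bit mono, a 100 ms chunk is 3,200 bytes, which keeps the client's own buffering well below the 250 ms partial-latency budget. Overwriting partials rather than appending them is what lets a UI show fast, provisional text that is later replaced by the final transcript.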
Yes, Speechmatics provides a free tier with 4 hours per month: 2 hours using the Enhanced model and 2 hours using the Standard model. The free allocation resets each month and does not roll over. After free hours are exhausted, usage switches to pay-as-you-go billing at $2.75/hour (Standard) or $3.75/hour (Enhanced). No credit card is required to access the free tier. The free plan is suitable for testing and development but limited for production workloads.
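The split free allowance can be expressed as a per-model offset. A minimal sketch, assuming each model's free hours only offset usage of that same model and that the function is applied to one month at a time (so unused hours never carry over); the names are illustrative.

```python
FREE_HOURS = {"enhanced": 2.0, "standard": 2.0}   # per month, no carryover
RATES = {"enhanced": 3.75, "standard": 2.75}      # USD/hour after free hours


def bill_after_free_tier(usage_hours: dict[str, float]) -> float:
    """Pay-as-you-go charge for a single month.

    Assumption: free hours for a model only offset usage of that model,
    and any unused free hours are discarded at the end of the month.
    """
    total = 0.0
    for model, hours in usage_hours.items():
        billable = max(hours - FREE_HOURS.get(model, 0.0), 0.0)
        total += billable * RATES[model]
    return round(total, 2)
```

For example, 5 hours of Enhanced and 1 hour of Standard in one month bills 3 Enhanced hours (3 × $3.75 = $11.25); the unused hour of free Standard time does not offset Enhanced usage under this assumption and does not roll into the next month.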
The main alternatives are Deepgram, AssemblyAI, Rev.ai, and Google Cloud Speech-to-Text. Deepgram is the better choice when you need faster real-time STT at lower cost ($0.0077/min vs $0.046/min for Speechmatics Standard) and do not require on-premises or edge deployment options. AssemblyAI leads in audio intelligence features such as sentiment analysis, summarization, and entity extraction across many languages. Rev.ai is significantly cheaper for English batch transcription ($0.003/min) but offers narrower multilingual coverage and no edge deployment. Google Cloud Speech-to-Text suits teams already committed to GCP infrastructure. Of these vendors, only Speechmatics offers full on-device model deployment.
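The per-minute figures above are the per-hour list prices quoted in this profile divided by 60. This short sketch reproduces that conversion and compares list-price spend for a given monthly volume; it ignores free tiers, volume discounts, and any vendor pricing not stated here.

```python
# Per-hour list prices quoted in this profile (USD).
PRICES_PER_HOUR = {
    "Speechmatics Standard": 2.75,
    "Speechmatics Enhanced": 3.75,
    "Deepgram Nova-3": 0.46,
    "Rev.ai Reverb": 0.18,
}


def per_minute(price_per_hour: float) -> float:
    """Convert a per-hour price to a per-minute price."""
    return price_per_hour / 60


def monthly_spend(hours: float) -> dict[str, float]:
    """List-price spend for `hours` of audio, ignoring free tiers and discounts."""
    return {name: round(rate * hours, 2) for name, rate in PRICES_PER_HOUR.items()}
```

$2.75/hr works out to about $0.046/min, $0.46/hr to about $0.0077/min, and $0.18/hr to $0.003/min. At 1,000 hours a month, list prices come to $2,750 (Speechmatics Standard), $460 (Deepgram Nova-3), and $180 (Rev.ai Reverb), which is the gap behind the cost-competitiveness caveat for English-only, high-volume work.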
Speechmatics is ideal for regulated enterprises in healthcare, financial services, and government that need data sovereignty and on-premises or edge deployment options. The medical model makes it a top choice for clinical documentation, ambient scribe systems, and healthcare AI applications. European media companies and broadcasters benefit from its regional dialect support across Scandinavian, Germanic, and Romance languages. It is a poor fit for startups or cost-sensitive teams that need basic English transcription at under $0.01/min, where Deepgram or Rev.ai are significantly more affordable.
Yes, Speechmatics is API-first. It provides REST endpoints for batch (asynchronous) transcription and WebSocket endpoints for real-time streaming. Official clients are available in JavaScript and Python. SDKs and integration guides support React Native mobile development and LiveKit voice agents. Comprehensive documentation is at docs.speechmatics.com with API reference, integration guides, and example applications. The API covers all models (Standard, Enhanced, Medical), plus domain adaptation, speaker diarization, language identification, and custom dictionary configuration.
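As an illustration of how the batch options map onto a request, here is a sketch that builds a job configuration as JSON. The field names (`type`, `transcription_config`, `language`, `operating_point`, `diarization`, `additional_vocab`) reflect our reading of the Speechmatics batch API reference and should be checked there; the function name and defaults are hypothetical. In our understanding this JSON is sent as the `config` form field alongside the audio file in a multipart request, but confirm the endpoint, authentication header, and Medical-model selection in the docs before relying on it.

```python
import json
from typing import Optional

# Operating points named after the Standard and Enhanced tiers in this profile.
VALID_OPERATING_POINTS = {"standard", "enhanced"}


def batch_job_config(language: str,
                     operating_point: str = "enhanced",
                     diarization: bool = True,
                     custom_words: Optional[list[str]] = None) -> str:
    """Build a batch transcription job config (field names to be verified)."""
    if operating_point not in VALID_OPERATING_POINTS:
        raise ValueError(f"unknown operating point: {operating_point}")
    tc: dict = {"language": language, "operating_point": operating_point}
    if diarization:
        tc["diarization"] = "speaker"  # label who spoke each word
    if custom_words:
        # Custom dictionary: domain terms the model should recognise.
        tc["additional_vocab"] = [{"content": w} for w in custom_words]
    return json.dumps({"type": "transcription", "transcription_config": tc})
```

Keeping config construction separate from the HTTP call makes it easy to validate options (such as rejecting an unknown operating point) before any audio is uploaded and billed.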
Yes, Speechmatics is the only major commercial STT vendor offering a full-featured on-device speech model. In April 2026, Adobe and Speechmatics delivered cloud-grade speech recognition on-device for Adobe Premiere Pro on Windows and Mac, with accuracy within 5% of the cloud model across nearly 10 million words of diverse real-world audio. The model runs on a wide range of hardware, including Apple M5 Macs, NVIDIA RTX and AMD GPUs, and older Intel Macs, without requiring cloud connectivity. This enables privacy-preserving transcription for sensitive audio workflows, since no data leaves the user's device.