Last updated: 2026-05-29
Deepgram's voice AI APIs deliver sub-300ms streaming latency and serve 200K+ developers. Nova-3 starts at $0.0077/min, with $200 in free credit. Rated 4.6/5 on G2 (325 reviews). SOC 2 compliant.
Deepgram is a voice AI platform offering speech-to-text, text-to-speech, and Voice Agent APIs for developers. Nova-3 achieves 5.26% WER on English with sub-300ms streaming latency. Pay-As-You-Go starts with $200 free credit then $0.0077/minute. Voice Agent API costs $0.08/minute. Rated 4.6/5 on G2 (325 reviews). SOC 2 Type II, HIPAA, and GDPR compliant. Founded 2015, $246M raised, $1.3B valuation.
Deepgram is a voice AI platform built by Deepgram Inc. (San Francisco, founded 2015), providing APIs for speech-to-text, text-to-speech, and conversational voice agents. With $246 million in total funding and a $1.3 billion valuation as of January 2026, it is used by over 200,000 developers to add speech capabilities to applications across customer support, healthcare, media, and conversational AI.
At the core of the platform are neural speech models trained end-to-end for speed and accuracy. The flagship Nova-3 model delivers transcripts in under 300ms via WebSocket streaming, faster than most competitors. The Flux model, launched in 2024, is designed specifically for conversational AI: it handles turn detection and natural interruptions without additional tooling. All models support 45+ languages and bill per second rather than per minute, so teams pay only for audio actually processed.
Deepgram targets backend engineers and product teams building real-time voice features. Common use cases include call center analytics (detecting sentiment and topics in live calls), voice agent pipelines that chain STT with an LLM and TTS, live captioning, and accessibility tooling. The add-on intelligence features (speaker diarization, smart formatting, entity redaction) make it viable for healthcare and legal transcription workflows.
The Pay-As-You-Go plan includes $200 in free credits; after that, Nova-3 monolingual transcription costs $0.0077 per minute. The Growth plan requires a $4,000+ annual prepayment for up to 20% discounts. Enterprise plans start at $15,000 per year with dedicated support and optional self-hosted deployment for full data sovereignty.
The platform is accessible via cloud API and web dashboard; no native desktop or mobile app exists. In January 2026, Deepgram closed a $130 million Series C round and simultaneously acquired a Y Combinator AI startup.
SDKs are available in Python, JavaScript, Go, and .NET, with REST and WebSocket endpoints covering all products. GitHub: github.com/deepgram.
Pay-As-You-Go: $200 free credit to start, then $0.0077/min Nova-3 mono, $0.0092/min multilingual, $0.08/min Voice Agent. Growth: $4,000+ annual prepayment (~20% discount, ~$0.0065/min). Enterprise: $15,000+/year with custom pricing and self-hosted option. Billed per second.
Deepgram is a voice AI API platform developed by Deepgram Inc., founded in 2015 in San Francisco. It provides three core API products: speech-to-text transcription (Nova-3), text-to-speech synthesis (Aura), and a conversational Voice Agent API. Over 200,000 developers use the platform across customer support, media, healthcare, and conversational AI. The company raised $130 million in Series C funding in January 2026, reaching a $1.3 billion valuation with total funding of $246M.
Deepgram offers usage-based pricing across three tiers. Pay-As-You-Go starts with $200 in free API credits; after that, Nova-3 monolingual transcription costs $0.0077 per minute and multilingual costs $0.0092 per minute. The Voice Agent API is $0.08 per minute. The Growth plan requires a $4,000+ annual prepayment for up to 20% discounts. Enterprise plans start at $15,000 per year with custom pricing and optional self-hosted deployment. All usage is billed per second, not per minute.
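To make the per-second billing concrete, here is a minimal Python sketch that prorates the per-minute list prices quoted above and compares the result with what whole-minute rounding would cost. The product keys and function names are hypothetical, and the sketch assumes billing is a straight proration of the per-minute rate with no additional rounding, which the listing does not spell out.

```python
from decimal import Decimal

# Per-minute Pay-As-You-Go prices (USD) as quoted in this listing.
# The dictionary keys are hypothetical labels, not official product IDs.
RATES_PER_MINUTE = {
    "nova-3-mono": Decimal("0.0077"),
    "nova-3-multi": Decimal("0.0092"),
    "voice-agent": Decimal("0.08"),
}


def per_second_cost(seconds: int, product: str) -> Decimal:
    """Prorate the per-minute rate over the exact number of seconds.

    Assumption: no extra rounding is applied by the billing system.
    """
    return RATES_PER_MINUTE[product] * Decimal(seconds) / Decimal(60)


def per_minute_cost(seconds: int, product: str) -> Decimal:
    """For comparison only: cost if usage were rounded up to whole minutes."""
    whole_minutes = -(-seconds // 60)  # ceiling division
    return RATES_PER_MINUTE[product] * whole_minutes


if __name__ == "__main__":
    # A 90-second clip: per-second proration vs. rounding up to 2 minutes.
    print(per_second_cost(90, "nova-3-mono"))
    print(per_minute_cost(90, "nova-3-mono"))
```

The gap matters most for workloads made of many short clips, where rounding each clip up to a full minute would inflate the bill.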
Deepgram's core product is its speech-to-text API, supporting real-time WebSocket streaming under 300ms and batch REST processing. The Nova-3 model covers 45+ languages; the Flux model adds built-in turn detection for conversational AI. Intelligence add-ons include speaker diarization, smart formatting, sentiment analysis, topic detection, and entity redaction. A text-to-speech API and full Voice Agent API combining STT, LLM routing, and TTS round out the platform. An official MCP server shipped in April 2026.
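The STT, LLM, and TTS chain that a voice agent combines can be sketched in Python as three pluggable stages. This is an illustration of the general pattern only, not Deepgram's Voice Agent implementation: the class name, method name, and callables are all hypothetical, and a real agent would stream audio and handle interruptions rather than process one complete turn at a time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurnPipeline:
    """Hypothetical single-turn voice pipeline: audio in, audio out."""

    transcribe: Callable[[bytes], str]   # speech-to-text stage
    respond: Callable[[str], str]        # LLM stage
    synthesize: Callable[[str], bytes]   # text-to-speech stage

    def handle_turn(self, audio: bytes) -> bytes:
        """Run one user turn through STT, then the LLM, then TTS."""
        text = self.transcribe(audio)
        if not text.strip():
            # Nothing intelligible was said, so skip the LLM and TTS calls.
            return b""
        reply = self.respond(text)
        return self.synthesize(reply)


if __name__ == "__main__":
    # Stub stages stand in for real API calls.
    pipeline = VoiceTurnPipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        respond=lambda text: f"You said: {text}",
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    print(pipeline.handle_turn(b"hello"))
```

Keeping each stage behind a plain callable makes it easy to swap a provider or insert latency measurement around any one step.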
Deepgram does not have a permanent free tier, but all new Pay-As-You-Go accounts receive $200 in free API credits. These credits cover approximately 430 hours of Nova-3 monolingual transcription and do not expire. After the credits are used, usage is billed at standard per-second rates. No subscription or minimum commitment is required for the Pay-As-You-Go plan.
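The "approximately 430 hours" figure follows directly from the quoted rate, as this small Python sketch shows. The function name is hypothetical, and the calculation assumes the whole credit is spent on a single product at its list price.

```python
def credit_hours(credit_usd: float, rate_per_minute: float) -> float:
    """Hours of audio a credit buys at a flat per-minute rate."""
    minutes = credit_usd / rate_per_minute
    return minutes / 60


if __name__ == "__main__":
    # Rates per minute as quoted in this listing.
    for label, rate in [
        ("Nova-3 monolingual", 0.0077),
        ("Nova-3 multilingual", 0.0092),
        ("Voice Agent", 0.08),
    ]:
        print(f"{label}: {credit_hours(200, rate):.1f} hours")
```

At $0.0077 per minute the $200 credit works out to roughly 433 hours, consistent with the figure above; the same credit covers far fewer hours of the more expensive Voice Agent API.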
The main alternatives are AssemblyAI, Google Cloud Speech-to-Text, Amazon Transcribe, OpenAI Whisper, and Speechmatics. AssemblyAI is better when you need richer audio intelligence features and slightly higher accuracy on pre-recorded audio. Google Cloud Speech-to-Text suits teams already inside Google Cloud. OpenAI Whisper is better for offline or self-hosted multilingual transcription at no API cost, though it lacks real-time streaming. Speechmatics leads on multilingual WER across non-English languages.
Deepgram is best for backend engineers and product teams building real-time voice features: voice agents, call center analytics, live captioning, and conversational AI pipelines where latency under 300ms is critical. It is particularly strong for teams prioritizing speed over a broad out-of-the-box feature set. Deepgram is not suitable for non-technical users who need a ready-made transcription app, or for teams needing deep CRM integrations without custom development work.
Yes. Deepgram's entire product is API-first, with REST endpoints for batch transcription and WebSocket endpoints for real-time streaming. Official SDKs are available in Python, JavaScript, Go, and .NET. Documentation is at developers.deepgram.com. The API covers all products: speech-to-text, text-to-speech, voice agents, and intelligence add-ons. API keys are available immediately after signing up. The GitHub organization is at github.com/deepgram.
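As a rough illustration of the batch REST flow, the following Python sketch builds (but does not send) a transcription request using only the standard library. The endpoint path `/v1/listen`, the `Token` authorization scheme, and the `model` and `smart_format` query parameters are assumptions based on a reading of Deepgram's public API reference and are not stated in this listing; confirm them at developers.deepgram.com before use. The helper name is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed batch transcription endpoint; verify against the official docs.
API_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(
    api_key: str,
    audio: bytes,
    *,
    model: str = "nova-3",
    content_type: str = "audio/wav",
    **options: str,
) -> urllib.request.Request:
    """Build a POST request carrying raw audio in the body.

    Extra keyword arguments become query parameters, e.g. smart_format="true".
    Parameter names are assumptions, not confirmed by this listing.
    """
    query = urllib.parse.urlencode({"model": model, **options})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": content_type,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcription_request("YOUR_API_KEY", b"", smart_format="true")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req) and parse the JSON body.
```

Separating request construction from sending keeps the sketch testable offline; in practice the official SDKs wrap these details.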
Yes. Deepgram launched an official Model Context Protocol (MCP) server in April 2026 as part of its agentic engineering toolkit. The MCP server connects AI coding tools — Claude Code, Cursor, Windsurf, and others — directly to Deepgram's API, enabling agents to transcribe audio, generate speech, list available models, and manage projects without leaving the development environment. It is included in the Deepgram CLI and available as a standalone package at github.com/deepgram/mcp.