Deepgram: Real-Time Voice AI API | hokai.io

Deepgram gives 200,000+ developers sub-300ms speech-to-text via API. Nova-3 starts at $0.0077/min. Best for real-time voice apps and voice agent pipelines.

Deepgram is a voice AI platform from Deepgram Inc., a San Francisco-based company founded in 2015. It offers APIs for speech-to-text, text-to-speech, and voice agents. Its Nova-3 model delivers streaming transcription with latency under 300ms, priced from $0.0077 per minute. Over 200,000 developers use Deepgram. Key competitors include AssemblyAI, Google Speech-to-Text, and AWS Transcribe. Pay-As-You-Go starts with $200 in free credit; Growth requires a $4,000+ annual prepayment; Enterprise starts at $15,000/year.

Pricing

Pay-As-You-Go: $200 free credit to start, then $0.0077/min for Nova-3 monolingual, $0.0092/min for Nova-3 multilingual, and $0.08/min for the Voice Agent API. Growth: $4,000+ annual prepayment, with discounts of up to 20%. Enterprise: from $15,000/year, with custom pricing and a self-hosted option. All usage is billed per second.

Frequently Asked Questions

What is Deepgram and what does it do?

Deepgram is a voice AI API platform developed by Deepgram Inc., founded in 2015 in San Francisco. It provides three core API products: speech-to-text transcription, text-to-speech synthesis, and conversational voice agents. Over 200,000 developers use the platform across customer support, media, healthcare, and conversational AI. The company raised $130 million in Series C funding in January 2026, reaching a $1.3 billion valuation.

How much does Deepgram cost?

Deepgram offers usage-based pricing across three tiers. Pay-As-You-Go starts with $200 in free API credits; after that, Nova-3 monolingual transcription costs $0.0077 per minute and Nova-3 multilingual transcription costs $0.0092 per minute. The Voice Agent API costs $0.08 per minute. The Growth plan requires an annual prepayment of $4,000 or more in exchange for discounts of up to 20%. Enterprise plans start at $15,000 per year, with custom pricing and optional self-hosted deployment. All usage is billed per second rather than rounded up to the minute.
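To make per-second billing concrete, here is a minimal Python sketch that prorates the per-minute rates quoted above. The names estimate_cost and RATES_PER_MINUTE are hypothetical, and the sketch assumes simple linear proration with no rounding and a flat discount fraction; Deepgram's actual invoicing rules may differ.

```python
from decimal import Decimal

# Per-minute list prices (USD) as quoted in the pricing answer above.
RATES_PER_MINUTE = {
    "nova-3-monolingual": Decimal("0.0077"),
    "nova-3-multilingual": Decimal("0.0092"),
    "voice-agent": Decimal("0.08"),
}


def estimate_cost(seconds: int, product: str, discount: Decimal = Decimal("0")) -> Decimal:
    """Estimate the USD cost of `seconds` of audio for one product.

    Assumption: the per-minute rate is prorated linearly by the second,
    and `discount` is a fraction between 0 and 1 (0.2 means 20% off).
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if not Decimal("0") <= discount <= Decimal("1"):
        raise ValueError("discount must be between 0 and 1")
    list_price = RATES_PER_MINUTE[product] * seconds / 60
    return list_price * (Decimal("1") - discount)


# A 90-second call is charged for 1.5 minutes, not rounded up to 2.
print(estimate_cost(90, "nova-3-monolingual"))
```

Decimal is used instead of float so that sub-cent amounts add up without binary rounding drift when many short calls are summed.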

What are the main features of Deepgram?

Deepgram's core product is its speech-to-text API, supporting real-time WebSocket streaming (under 300ms) and batch REST processing. The Nova-3 model covers 45+ languages; the Flux model adds built-in turn detection for conversational AI. Intelligence add-ons include speaker diarization, smart formatting, sentiment analysis, topic detection, and entity redaction. A text-to-speech API and full voice agent API (combining STT, LLM routing, and TTS) round out the platform.
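Real-time streaming over a WebSocket generally means sending audio in small pieces rather than as one file. The sketch below shows one way to slice raw PCM audio into fixed-duration chunks before sending. It assumes 16 kHz, 16-bit, mono audio and a 100ms chunk length; these values and the helper name pcm_chunks are illustrative choices, not Deepgram requirements.

```python
from typing import Iterator


def pcm_chunks(
    audio: bytes,
    sample_rate: int = 16_000,  # samples per second (assumed)
    sample_width: int = 2,      # bytes per sample, i.e. 16-bit PCM (assumed)
    channels: int = 1,          # mono (assumed)
    chunk_ms: int = 100,        # duration of each chunk (illustrative)
) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio, each about `chunk_ms` long."""
    frame_size = sample_width * channels
    chunk_size = sample_rate * frame_size * chunk_ms // 1000
    # Keep chunks aligned to whole frames so no sample is split in two.
    chunk_size -= chunk_size % frame_size
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this audio format")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# One second of silence at the assumed format splits into ten 100ms chunks.
one_second = bytes(16_000 * 2)
print(len(list(pcm_chunks(one_second))))
```

Each yielded chunk would then be written to the open WebSocket connection as a binary message; smaller chunks lower latency at the cost of more messages.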

Is Deepgram free to use?

Deepgram does not have a permanent free tier, but all new Pay-As-You-Go accounts receive $200 in free API credits. These credits cover approximately 430 hours of Nova-3 monolingual transcription and do not expire. After the credits are used, usage is billed at standard per-second rates. No subscription or minimum commitment is required for the Pay-As-You-Go plan.
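The "approximately 430 hours" figure can be checked with simple arithmetic. This short Python sketch divides the credit by the per-minute rate and converts minutes to hours; the function name hours_covered is hypothetical, and it assumes the whole credit is spent on a single product at list price.

```python
def hours_covered(credit_usd: float, rate_per_minute: float) -> float:
    """Return how many hours of audio a credit buys at a flat per-minute rate."""
    minutes = credit_usd / rate_per_minute
    return minutes / 60


# $200 at the Nova-3 monolingual rate of $0.0077 per minute.
print(f"{hours_covered(200, 0.0077):.1f} hours")
# The same credit at the multilingual rate of $0.0092 per minute.
print(f"{hours_covered(200, 0.0092):.1f} hours")
```

The monolingual result comes to just under 433 hours, which matches the rounded figure above; at the multilingual rate the same credit covers roughly 362 hours.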

What are the best alternatives to Deepgram?

The main alternatives are AssemblyAI, Google Cloud Speech-to-Text, AWS Transcribe, and OpenAI Whisper. AssemblyAI is a better choice when you need richer audio intelligence features (sentiment, chapter detection) and slightly higher accuracy on pre-recorded audio. Google Speech-to-Text suits teams already inside Google Cloud with committed GCP spend. OpenAI Whisper is better for offline or self-hosted multilingual transcription with no API cost, though it lacks real-time streaming.

Who is Deepgram best for?

Deepgram is best for backend engineers and product teams building real-time voice features: voice agents, call center analytics, live captioning, and conversational AI pipelines where latency under 300ms is critical. It is particularly strong for teams prioritizing speed over a broad out-of-the-box feature set. Deepgram is not suitable for non-technical users who need a ready-made transcription app, or for teams needing deep CRM integrations without custom development work.

Does Deepgram have an API?

Yes. Deepgram is an API-first product. It provides REST endpoints for batch transcription and WebSocket endpoints for real-time streaming. Official SDKs are available for Python, JavaScript, Go, and .NET. Documentation is at developers.deepgram.com. The API covers all products: speech-to-text, text-to-speech, voice agents, and intelligence add-ons. API keys are available immediately after signing up.
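As a rough illustration of the batch REST flow, the Python sketch below builds (but does not send) a transcription request for a hosted audio file using only the standard library. The endpoint path, the Token authorization scheme, and the query parameter names (model, smart_format, diarize) are assumptions based on Deepgram's public documentation and should be verified at developers.deepgram.com; the helper name build_transcription_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: batch transcription endpoint; confirm against the official docs.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def _to_query_value(value: object) -> str:
    """Render booleans as lowercase true/false; everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_transcription_request(
    api_key: str, audio_url: str, model: str = "nova-3", **features: object
) -> urllib.request.Request:
    """Build a POST request asking the API to transcribe audio at `audio_url`.

    Extra keyword arguments become query parameters, e.g. smart_format=True.
    """
    query = {"model": model}
    query.update({name: _to_query_value(value) for name, value in features.items()})
    url = f"{DEEPGRAM_LISTEN_URL}?{urllib.parse.urlencode(query)}"
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


request = build_transcription_request(
    "YOUR_API_KEY", "https://example.com/call.wav", smart_format=True, diarize=True
)
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here because it needs a valid API key and network access. In practice the official SDKs wrap this request building for you.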