Name: AssemblyAI: Speech-to-Text API for Voice AI Apps
Brand: AssemblyAI
Rating: 4.6 (500 reviews)

Question 1

What's the difference between Universal-3 Pro and Universal models?

Accepted Answer

Universal-3 Pro is a promptable speech language model: natural language prompts give fine-grained control over transcription (speaker labels, disfluencies, audio tagging) without retraining. It achieves 5.6% WER on English and costs $0.21/hr (pre-recorded) or $0.45/hr (streaming). Universal is the older model: it costs $0.15/hr, supports 99 languages, and has no prompting, which makes it the better fit for general-purpose use.

Question 2

How does AssemblyAI pricing actually work with add-on features?

Accepted Answer

Base transcription: $0.15/hr. Speaker diarization costs +$0.02/hr, sentiment analysis +$0.02/hr, entity detection +$0.03/hr. These stack on top of the base rate, so all three together bring the rate to $0.22/hr, roughly 1.5x base. Enabling further add-ons can push a fully featured transcription to 2-5x the advertised $0.15/hr base cost. Streaming costs more again: Universal-3 Pro streaming is $0.45/hr, about 2.1x its $0.21/hr pre-recorded rate and 3x the $0.15/hr base.
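
To make the stacking concrete, here is a minimal sketch of a cost estimator. It uses only the per-hour rates quoted in this answer; the function and key names (`hourly_rate`, `estimate_cost`, `speaker_diarization`, and so on) are made up for illustration and are not part of any AssemblyAI SDK. Check the current pricing page before relying on the numbers.

```python
from decimal import Decimal

# Rates in USD per audio hour, copied from the answer above (assumed current).
BASE_RATE = Decimal("0.15")
ADD_ON_RATES = {
    "speaker_diarization": Decimal("0.02"),
    "sentiment_analysis": Decimal("0.02"),
    "entity_detection": Decimal("0.03"),
}


def hourly_rate(add_ons=()):
    """Return the stacked per-hour rate: base rate plus each selected add-on."""
    selected = set(add_ons)  # an add-on listed twice is only counted once
    unknown = selected - set(ADD_ON_RATES)
    if unknown:
        raise ValueError(f"unknown add-ons: {sorted(unknown)}")
    return BASE_RATE + sum((ADD_ON_RATES[name] for name in selected), Decimal("0"))


def estimate_cost(hours, add_ons=()):
    """Estimate the total cost in USD for a number of audio hours."""
    return hourly_rate(add_ons) * Decimal(str(hours))
```

With all three add-ons enabled, `hourly_rate` stacks to $0.22/hr, so `estimate_cost(100, [...])` comes to $22.00 for 100 hours of audio. `Decimal` is used instead of floats so that cent-level sums stay exact.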

Question 3

Is AssemblyAI HIPAA compliant for healthcare applications?

Accepted Answer

Yes. AssemblyAI lists SOC 2 Type 2, HIPAA, GDPR, ISO 27001, PCI, FedRAMP, and CSA Star Level 1 among its certifications and compliance programs. It offers Business Associate Agreements (BAAs) for healthcare use cases including medical transcription, clinical documentation, and telemedicine call recording.

Question 4

What's the free tier limit and when should I upgrade?

Accepted Answer

The free tier provides $50 in credits, enough for up to 185 hours of pre-recorded transcription or 333 hours of streaming. Once the credits are exhausted, pricing switches to pay-as-you-go. For commercial use or more than 10 hours/month of transcription, a paid account is recommended. Enterprise discounts are available at 50,000+ hours/month.

Question 5

How does Universal-3 Pro streaming work with voice agents?

Accepted Answer

Universal-3 Pro Streaming delivers immutable, low-latency transcripts with intelligent endpointing for real-time turn detection in voice agents. Turn detection is punctuation-based, and the model supports up to 1,000 domain-specific keyterms. Native integrations with LiveKit, Twilio, Daily, and PipeCat allow deployment in under 15 minutes. Billing is $0.45/hr of session duration.
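
As a rough illustration of what punctuation-based turn detection means, the toy sketch below treats a turn as finished when the transcript text ends in terminal punctuation, and checks a keyterm list against the 1,000-term limit quoted above. This is not AssemblyAI's endpointing logic, which runs inside the service; the helper names `ends_turn` and `validate_keyterms` are hypothetical, and the trailing-ellipsis rule is an assumption added for the example.

```python
TERMINAL_PUNCTUATION = (".", "?", "!")
MAX_KEYTERMS = 1000  # limit quoted in the answer above


def ends_turn(transcript: str) -> bool:
    """Toy heuristic: the turn is over if the text ends in terminal punctuation."""
    # Ignore trailing whitespace, then closing quotes or brackets after the punctuation.
    text = transcript.rstrip().rstrip("\"')]")
    # Assumption: a trailing ellipsis means the speaker is trailing off, not done.
    if text.endswith("..."):
        return False
    return text.endswith(TERMINAL_PUNCTUATION)


def validate_keyterms(terms):
    """Strip, drop empties, de-duplicate (order preserved), and enforce the limit."""
    cleaned = (term.strip() for term in terms)
    unique = list(dict.fromkeys(term for term in cleaned if term))
    if len(unique) > MAX_KEYTERMS:
        raise ValueError(f"{len(unique)} keyterms exceeds the limit of {MAX_KEYTERMS}")
    return unique
```

A voice agent could use a check like `ends_turn` to decide whether to start responding or keep listening, although with a native integration that decision is normally left to the service's own endpointing.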

Question 6

Can I use AssemblyAI offline or self-host?

Accepted Answer

AssemblyAI is a cloud-only API platform with no offline mode or self-hosting option. All processing happens on AssemblyAI's infrastructure. For on-premise transcription, consider the open-source Whisper model or Deepgram's enterprise self-hosting option.

Question 7

What integrations does AssemblyAI support?

Accepted Answer

Native integrations: LiveKit, PipeCat, Twilio, Daily. The LLM Gateway connects to OpenAI GPT, Anthropic Claude, and Google Gemini. SDKs are available for Python, JavaScript/Node.js, Ruby, and Go. Zapier and Slack require custom webhook setup. There are no native Salesforce or Microsoft Teams connectors.
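
For the custom webhook setup mentioned for Slack, one possible shape is a small function that turns a transcription webhook body into a Slack message. The sketch below assumes the webhook body is JSON carrying `transcript_id` and `status` fields; verify those field names against the current AssemblyAI webhook documentation. The function name `build_slack_message` is hypothetical. Slack incoming webhooks accept a JSON object with a `text` field, which is all this builds; receiving the webhook and posting to Slack are left out.

```python
import json


def build_slack_message(webhook_body: bytes) -> dict:
    """Convert a transcription webhook body into a Slack incoming-webhook payload."""
    payload = json.loads(webhook_body)
    # Assumed field names; fall back to "unknown" if the body has a different shape.
    transcript_id = payload.get("transcript_id", "unknown")
    status = payload.get("status", "unknown")
    return {"text": f"Transcript {transcript_id} finished with status: {status}"}
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of a POST request to the Slack incoming webhook URL for the target channel.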
