Gemini 3.1 Flash Audio | hokai.io

Gemini 3.1 Flash Audio: real-time voice AI with 320ms latency, native audio in/out, and video awareness. $3/$12 per 1M audio tokens. GA March 2026.



Provider: Google · Family: Gemini 3.1

Context window: 100,000 tokens

Input modalities: audio, video, tool-calls · Output: audio

About Gemini 3.1 Flash Audio

Gemini 3.1 Flash Audio is Google's first production real-time voice model, released March 26, 2026. It uses a proprietary audio-specialized architecture rather than a wrapper around a text model, collapsing the usual transcribe-reason-synthesize chain into a single native audio-to-audio pipeline.

Input is 16-bit PCM audio at 16kHz, streamed over a WebSocket. Output is native PCM audio. Session context is maintained across turns. Latency is 320ms p50 TTFAC and 1.2s p99. The model supports barge-in: users can interrupt mid-response, and the model adapts naturally. It also accepts audio and video input simultaneously (for example, screen sharing) without a separate vision-plus-audio fusion stage.

Pricing: $3/$12 per 1M audio tokens. All output is watermarked with SynthID so that AI-generated audio can be detected, which helps counter deepfakes. Training cutoff: January 2025. Safety profile: balanced.

Availability: public through the Gemini Live API, generally available in Gemini Enterprise for CX, in beta in Google Search Live, with Firebase support planned for Q2 2026.

Use it for customer support voice agents, accessibility, real estate tours, language tutoring, and meeting transcription. Avoid it if you need ultra-low latency (100-200ms), on-device deployment (the model is API-only), or primarily text output (the design is audio-first).
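The input format described above (16-bit PCM at 16kHz) can be produced from floating-point samples as in the minimal sketch below. It assumes mono audio, little-endian byte order and 20ms chunks. The chunk size, byte order and function names are illustrative assumptions rather than documented requirements, and the WebSocket transport itself is left out.

```python
import struct
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 16_000   # 16kHz input, as stated on this page
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # hypothetical chunk length; not specified by the source
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 320 samples per 20ms


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit PCM (little-endian is an assumption)."""
    ints: List[int] = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))          # guard against out-of-range input
        ints.append(int(round(clipped * 32767)))  # scale into the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def frames(pcm: bytes, samples_per_frame: int = SAMPLES_PER_FRAME) -> Iterator[bytes]:
    """Yield fixed-size byte chunks for streaming; the final chunk may be shorter."""
    step = samples_per_frame * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]
```

Under these assumptions, one second of audio becomes 32,000 bytes, which splits into 50 chunks of 640 bytes each.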
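The 320ms p50 and 1.2s p99 TTFAC figures are latency percentiles. To compare them with your own measurements, you could compute nearest-rank percentiles over measured latencies, as in this sketch. Nearest-rank is one of several common percentile definitions. The function names are hypothetical, and the thresholds are simply the two figures quoted on this page.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p% of samples are at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_published_latency(samples_ms: Sequence[float]) -> bool:
    """Check measured latencies against the stated 320ms p50 and 1.2s p99."""
    return (percentile_nearest_rank(samples_ms, 50) <= 320
            and percentile_nearest_rank(samples_ms, 99) <= 1200)
```

With only a handful of samples, p99 is effectively the maximum, so a meaningful comparison needs a large number of measured turns.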
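On the client side, barge-in typically means that any model audio received but not yet played is discarded as soon as the user starts speaking. The sketch below models only that local bookkeeping. The class and method names are hypothetical, and it does not cover how speech is detected or how an interruption is signaled over the WebSocket.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackBuffer:
    """Hypothetical client-side queue of model audio chunks awaiting playback."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()
        self.interruptions = 0  # number of barge-ins that actually dropped audio

    def enqueue(self, chunk: bytes) -> None:
        """Store a chunk of model output audio received from the server."""
        self._queue.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is pending."""
        return self._queue.popleft() if self._queue else None

    def on_user_speech(self) -> int:
        """Barge-in: drop all unplayed model audio and return how many chunks were discarded."""
        dropped = len(self._queue)
        if dropped:
            self.interruptions += 1
        self._queue.clear()
        return dropped
```

Flushing locally keeps stale audio from playing over the user while the next model response is produced.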
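The listed price of $3/$12 per 1M audio tokens turns into a per-session estimate with simple arithmetic. This sketch assumes the usual convention for such price pairs: $3 for input tokens and $12 for output tokens. This page does not say how many audio tokens a second of speech consumes, so the token counts are taken as given.

```python
INPUT_USD_PER_MILLION = 3.0    # assumed: first figure of "$3/$12" is the input price
OUTPUT_USD_PER_MILLION = 12.0  # assumed: second figure is the output price


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge for one session from its input and output audio token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000
```

For example, 200,000 input tokens and 100,000 output tokens would cost $0.60 + $1.20 = $1.80 under this assumption.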


Frequently Asked Questions

What is Flash Audio?

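Flash Audio is Google's real-time voice model, released March 26, 2026. It works natively audio-to-audio instead of chaining transcription, reasoning, and synthesis. It offers 320ms median (p50) latency, supports barge-in (users can interrupt mid-response), and accepts video input simultaneously with audio. Pricing is $3/$12 per 1M audio tokens, and all output audio is watermarked with SynthID.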
