Gemini 3.1 Flash-Lite | hokai.io

Gemini 3.1 Flash-Lite: $0.25/$1.50 per million tokens, 1M context, 260 tokens/sec. Ultra-efficient AI model from Google.

Provider: Google · Family: Gemini 3.1

Context window: 1,000,000 tokens · Max output: 64,000

Input modalities: text, image, audio, video, pdf, tool-calls · Output: text, tool-calls

About Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is Google's smallest and most cost-efficient model in the Gemini 3.1 family, released March 3, 2026. It is a compact Transformer (estimated 8-15B parameters) with a 1,000,000-token context window and 64,000-token max output. Benchmarks: GPQA 83%, MMLU-Pro 71%, SWE-bench est. 68%. Speed: 260 tokens/sec at 320ms p50 latency, the fastest in the family. Pricing: $0.25 input / $1.50 output per 1M tokens, 8x cheaper than Pro. It accepts all standard multimodal inputs (text, image, audio, video, PDF). Function calling, structured output, Search, and code execution are GA; computer use is available in beta. Training cutoff is January 2025. Safety tuning is balanced. Purpose-built for high-volume moderation, classification, content generation, email triage, and multi-turn conversation at massive scale. Avoid it when deep reasoning is needed (GPQA 83% may be insufficient), when coding quality is critical (SWE-bench 68% is weak), or for on-device deployment (the model is API-only).
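To make the cost and latency figures concrete, here is a minimal back-of-the-envelope estimator using the rates quoted above ($0.25/$1.50 per 1M tokens, 260 tokens/sec, 320ms p50). The helper functions and the email-triage workload are illustrative assumptions, not part of any official SDK:

```python
# Rough cost/latency estimator for Gemini 3.1 Flash-Lite, using the
# per-million-token rates and speed figures quoted on this page.

INPUT_RATE_PER_M = 0.25    # USD per 1M input tokens
OUTPUT_RATE_PER_M = 1.50   # USD per 1M output tokens
TOKENS_PER_SEC = 260       # quoted generation speed
P50_LATENCY_SEC = 0.32     # quoted time-to-first-token (p50)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

def estimate_latency(output_tokens: int) -> float:
    """Rough end-to-end seconds: first-token latency plus generation time."""
    return P50_LATENCY_SEC + output_tokens / TOKENS_PER_SEC

# Example: triaging 100,000 emails at ~500 input / ~20 output tokens each.
per_email = estimate_cost(500, 20)
print(f"per email:   ${per_email:.6f}")            # $0.000155
print(f"100k emails: ${per_email * 100_000:.2f}")  # $15.50
print(f"1k-token reply: ~{estimate_latency(1000):.1f}s")
```

At these rates a bulk classification pass over 100k short emails lands in the $15 range, which is where the "cost-sensitive bulk work" positioning comes from.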

Benchmarks

Frequently Asked Questions

What is Flash-Lite?

The smallest Gemini 3.1 model, released March 3, 2026: a compact Transformer with a 1M-token context. GPQA 83%, MMLU-Pro 71%, SWE-bench 68%. It runs at 260 tokens/sec and costs $0.25/$1.50 per 1M tokens, 8x cheaper than Pro for high-volume tasks.
