Gemini 3.5 Flash Review: 78% SWE-bench, $1.50/M (2026)
Gemini 3.5 Flash, released May 19, 2026 by Google DeepMind, scores 78% SWE-bench Verified with 1M context. Priced $1.50 input / $9 output per 1M tokens.
Gemini 3.5 Flash is Google DeepMind's Flash-tier multimodal model, released May 19, 2026 at Google I/O with a 1,048,576-token context window and a 78% SWE-bench Verified score, up 42% from Gemini 3 Flash. Priced at $1.50 input / $9.00 output per 1M tokens with $0.15 cached input, it runs at 289 tokens/second and tops MCP Atlas tool-use at 83.6%.
Gemini 3.5 Flash, released by Google DeepMind on May 19, 2026, scores 78% on SWE-bench Verified (a 42% gain over Gemini 3 Flash) with a 1,048,576-token context window. It costs $1.50 per 1M input tokens and $9.00 per 1M output tokens, runs at 289 tokens/second (about 4x faster than GPT-5.5 and Claude Opus 4.7), and leads tool orchestration with 83.6% on MCP Atlas.
Provider: Google · Family: Gemini 3.5
Context window: 1,048,576 tokens · Max output: 65,536
Input modalities: text, image, audio, video, pdf · Output: text, tool-calls
About Gemini 3.5 Flash
Gemini 3.5 Flash is Google DeepMind's Flash-tier multimodal model, announced and released to general availability on May 19, 2026 at Google I/O. It is the successor to Gemini 3 Flash (released December 17, 2025) and is built on the Gemini 3 Flash reasoning foundation, adding configurable thinking levels to balance quality, cost, and latency. Architecture details and parameter counts have not been disclosed by Google, consistent with prior Gemini releases; it is a natively multimodal Transformer-based model. Within the Gemini 3.5 family, Flash is the speed and cost-optimized tier, sitting below Gemini 3.5 Pro, which was announced the same day and remains in limited Vertex AI enterprise preview as of June 2026. Gemini 3.5 Flash scores 78% on SWE-bench Verified, a 42% improvement over Gemini 3 Flash, and beats Gemini 3.1 Pro on several real-world benchmarks despite running at Flash-tier pricing. It posts 90.4% on GPQA Diamond and 84.2% on MMMU-Pro, the highest multimodal reasoning score recorded by Artificial Analysis at launch. On ARC-AGI-2 it scores 72.1%, trailing Gemini 3.1 Pro's 77.1%, showing the hardest abstract-reasoning tasks still favor the Pro tier. Its Artificial Analysis Intelligence Index of 55 sits below GPT-5.5's 60, while Claude Opus 4.7 leads the hardest coding benchmark, SWE-bench Pro, at 64.3%. Gemini 3.5 Flash's edge is in agentic tool use: it scores 83.6% on MCP Atlas, the highest tool-orchestration result of any model as of June 2026. The model has a 1,048,576-token (1M) context window with a maximum output of 65,536 tokens. This is half the 2M-token window Gemini 3.5 Pro targets, but doubles Gemini 3 Flash's prior limits. Long-context recall above 100K tokens has not been independently published for the 3.5 generation, though Gemini 3.1 Pro demonstrated reliable recall at large depths as a reference point for the lineage. Gemini 3.5 Flash accepts text, image, audio, video, and PDF inputs in a single native API call, but outputs only text and tool-calls; it does not generate images, audio, or video. Function calling and structured output are fully supported, including combined tool use that mixes built-in tools (such as Google Search grounding and code execution) with custom function schemas in the same request. Function call results can themselves return multimodal objects like images and PDFs. Thinking levels replace the older integer thinking_budget parameter with a minimal/low/medium/high enum; the default dropped from high (Gemini 3 Flash Preview) to medium in 3.5 Flash, prioritizing speed and cost for typical workloads. Pricing is $1.50 per 1M input tokens and $9.00 per 1M output tokens, roughly 3x Gemini 3 Flash's $0.50/$3.00 rate but still a third of Claude Opus 4.7's cost on a blended basis. Cached input tokens cost $0.15 per 1M, a 90% discount. A coding agent loop processing 1M input tokens and 200K output tokens costs about $3.30. A multimodal analysis pass over 500K input tokens (video, PDF, audio) with 10K output tokens costs about $0.84. A high-throughput support workload of 1,000 turns at 2K input and 500 output tokens each costs about $7.50. Gemini 3.5 Flash is available through the Gemini API (ai.google.dev) and Google Vertex AI, including a GDPR-compliant EU multi-region endpoint covered by the Vertex AI Data Processing Addendum. Single-region europe-west3 and europe-west4 Vertex endpoints remain limited to older Gemini 2.5 and 2.0 models. There is no open-weights release; Google's open-weight line is the separate Gemma family. OpenRouter also lists Gemini 3.5 Flash for third-party API access. On safety, Google reports that Gemini 3.5 Flash outperforms Gemini 3 Flash on both safety and tone metrics while keeping unjustified refusals low. Based on Gemini 3.1 Pro's evaluation results under Google's Frontier Safety Framework, Google assesses that Gemini 3.5 Flash is unlikely to reach any Critical Capability Levels. Its training data knowledge cutoff is January 2025. Google does not use API inputs to train production models by default, and Vertex AI enterprise customers can select data residency regions across the US, EU, and Asia. Google Cloud holds SOC 2 Type II and ISO 27001 certifications, with HIPAA-eligible configurations available on Vertex AI. Gemini 3.5 Flash is the right default for agentic coding pipelines, MCP-based tool orchestration, and multimodal applications that need to ingest large volumes of video, audio, or documents cheaply and quickly. Teams needing the absolute strongest abstract reasoning (ARC-AGI-2, Artificial Analysis Intelligence Index) should consider Gemini 3.1 Pro, GPT-5.5, or wait for Gemini 3.5 Pro's GA. Teams needing image, audio, or video generation should pair Gemini 3.5 Flash with Gemini 3.1 Flash Image or another generation model, since Flash's own output is text and tool-calls only.
Pricing
$1.50 per 1M input tokens, $9.00 per 1M output tokens. Cached input is $0.15 per 1M tokens, a 90% discount. Roughly 3x Gemini 3 Flash's $0.50/$3.00 pricing but about a third of Claude Opus 4.7's blended cost.
Key Features
- 1M-Token Multimodal Context: Accepts 1,048,576 tokens of text, image, audio, video, and PDF in a single request, with up to 65,536 tokens of output.
- MCP Atlas-Leading Tool Use: Scores 83.6% on MCP Atlas, the highest tool-orchestration result of any model as of June 2026, with combined built-in and custom tool calls in one request.
- Configurable Thinking Levels: Replaces the integer thinking_budget with a minimal/low/medium/high enum; default is medium, balancing quality against the roughly 17.75-second TTFT seen at high.
- 289 Tokens/Second Output: Roughly 4x faster output throughput than competing frontier models like GPT-5.5 and Claude Opus 4.7.
- 90% Cached-Input Discount: Cached input tokens cost $0.15 per 1M versus $1.50 standard, cutting repeat-context costs by 90%.
Pros
- 78% SWE-bench Verified, a 42% jump over Gemini 3 Flash, at Flash-tier pricing.
- Native ingestion of text, image, audio, video, and PDF with an 84.2% MMMU-Pro score, the highest recorded by Artificial Analysis.
- 289 output tokens/second and $1.50/$9 per 1M pricing make it roughly a third of Claude Opus 4.7's cost.
Cons
- Output is text and tool-calls only, no native image, audio, or video generation.
- 72.1% on ARC-AGI-2 trails Gemini 3.1 Pro's 77.1% for the hardest abstract-reasoning tasks.
- Artificial Analysis Intelligence Index of 55 sits below GPT-5.5's 60 on pure reasoning.
Benchmarks
- mmmu pro: 84.2
- arc agi 2: 72.1
- mcp atlas: 83.6
- gpqa diamond: 90.4
- swe bench verified: 78
- artificial analysis intelligence index: 55
- artificial analysis speed tokens per sec: 289
Frequently Asked Questions
What is Gemini 3.5 Flash and who built it?
Gemini 3.5 Flash is a natively multimodal large language model built by Google DeepMind and released to general availability on May 19, 2026 at Google I/O. It is the Flash tier of the Gemini 3.5 family, succeeding Gemini 3 Flash, which launched December 17, 2025. The model is built on the Gemini 3 Flash reasoning foundation with added configurable thinking levels (minimal, low, medium, high). It scores 78% on SWE-bench Verified, a 42% improvement over Gemini 3 Flash, and 83.6% on MCP Atlas, the highest tool-orchestration score of any model as of June 2026. It supports a 1,048,576-token context window with up to 65,536 tokens of output. Architecture details and exact parameter counts have not been disclosed. It sits below Gemini 3.5 Pro, which targets a 2M-token context window and remains in limited Vertex AI preview as of June 2026.
How much does Gemini 3.5 Flash cost per 1M tokens?
Gemini 3.5 Flash costs $1.50 per 1M input tokens and $9.00 per 1M output tokens, confirmed at general availability on May 19, 2026. Cached input tokens cost $0.15 per 1M, a 90% discount versus standard input pricing. This is roughly 3x Gemini 3 Flash's prior $0.50/$3.00 pricing, but about a third of Claude Opus 4.7's blended cost. A coding agent loop processing 1M input tokens and 200K output tokens costs about $3.30. A multimodal analysis pass over 500K input tokens with 10K output tokens costs about $0.84. A high-throughput support workload of 1,000 turns at 2K input and 500 output tokens each costs about $7.50. A cached-context batch summarization job using 1M cached input tokens and 50K output tokens costs about $0.60. Verify current rates at cloud.google.com/vertex-ai/generative-ai/pricing, as Google updates Gemini pricing periodically.
What is Gemini 3.5 Flash's context window and max output?
Gemini 3.5 Flash has a context window of 1,048,576 tokens, commonly described as 1M tokens, with a maximum output of 65,536 tokens per response. This doubles Gemini 3 Flash's prior context limit and is half of the 2M-token window Gemini 3.5 Pro targets in its limited Vertex AI preview. Independent long-context recall benchmarks above 100K tokens have not been published specifically for the 3.5 generation, though Gemini 3.1 Pro demonstrated reliable recall at large depths in prior evaluations as a reference point for the lineage. The large context window combined with multimodal input means a single request can include lengthy codebases, multi-document research corpora, or extended audio and video transcripts alongside text. Thinking levels (minimal, low, medium, high) operate within this context budget, with higher levels consuming more of the output token allowance for internal reasoning before the final response.
How does Gemini 3.5 Flash compare on benchmarks vs GPT-5.5 and Claude Opus 4.7?
Gemini 3.5 Flash scores 78% on SWE-bench Verified, 90.4% on GPQA Diamond, 72.1% on ARC-AGI-2, and 84.2% on MMMU-Pro, with an Artificial Analysis Intelligence Index of 55. GPT-5.5 scores higher on the Intelligence Index at 60 and posts 88.7% on SWE-bench Verified and 92.4% on MMLU, making it the stronger choice for pure reasoning and long-document retrieval. Claude Opus 4.7 leads the hardest software engineering benchmark, SWE-bench Pro, at 64.3%, ahead of both. On ARC-AGI-2, Gemini 3.5 Flash's 72.1% trails Gemini 3.1 Pro's 77.1%, showing the Pro tier still leads on abstract reasoning within Google's own lineup. Where Gemini 3.5 Flash wins outright is agentic tool use and multimodal ingestion: its 83.6% MCP Atlas score is the highest of any model as of June 2026, and its 84.2% MMMU-Pro is the highest multimodal score Artificial Analysis has recorded. In practice, a 10-point SWE-bench gap to GPT-5.5 translates to noticeably more failed or incomplete agentic coding runs on harder repository tasks, but Gemini 3.5 Flash's 4x speed and lower cost often offset this for high-volume use.
Is Gemini 3.5 Flash open source or proprietary?
Gemini 3.5 Flash is proprietary and closed-weights, accessible only through Google's APIs. There is no Hugging Face release, no downloadable weights, and no self-hosting or fine-tuning path for the base model. It is available via the Gemini API at ai.google.dev using an API key, through Google Vertex AI using Google Cloud IAM (including a GDPR-compliant EU multi-region endpoint covered by the Vertex AI Data Processing Addendum), and through Google AI Studio for prototyping with a free-tier rate limit. Third-party aggregators such as OpenRouter also list Gemini 3.5 Flash for API access. For teams that require open weights, Google's separate Gemma family (Apache 2.0 licensed) is the open-weight alternative, distinct from the proprietary Gemini line. Commercial use of Gemini 3.5 Flash is governed by the Gemini API Additional Terms of Service.
What modalities does Gemini 3.5 Flash support?
Gemini 3.5 Flash accepts text, image, audio, video, and PDF as inputs in a single native API request. Output modalities are limited to text and tool-calls; the model does not generate images, audio, or video. Function calling and structured output (JSON mode) are fully supported, and the model can combine built-in tools, such as Google Search grounding and code execution, with custom function-call schemas in the same request. Function call results can themselves include multimodal objects like images and PDFs, even though the model's own generated output remains text-based. Its 83.6% MCP Atlas score reflects strong performance on Model Context Protocol tool orchestration specifically. For applications that need image, audio, or video generation, Google's Gemini 3.1 Flash Image model or another dedicated generation model must be paired alongside Gemini 3.5 Flash.
Does Gemini 3.5 Flash train on user data?
Google does not use API inputs to train production Gemini models by default, including Gemini 3.5 Flash, consistent with Google Cloud's standard data policy. Enterprise customers on Vertex AI can select data residency regions across the US, EU, and Asia, and the EU multi-region endpoint is covered by the Vertex AI Data Processing Addendum for GDPR-compliant routing within the EU. Google Cloud holds SOC 2 Type II and ISO 27001 certifications, and HIPAA-eligible configurations are available on Vertex AI for healthcare workloads. Under the EU AI Act, Gemini 3.5 Flash is expected to carry general-purpose AI obligations with systemic risk reporting requirements, consistent with prior Gemini models. The model's training knowledge cutoff is January 2025, and Google's standard data pipeline includes deduplication, quality filtering, CSAM filtering, and safety filtering before training. Specific retention windows for API logs are detailed in the Gemini API Additional Terms of Service.
Who is Gemini 3.5 Flash best for and who should avoid it?
Gemini 3.5 Flash is best for agentic coding teams running MCP-based tool loops, where its 83.6% MCP Atlas score and 78% SWE-bench Verified lead the field as of June 2026. It is also strong for product teams building multimodal applications that ingest video, audio, or PDFs at scale, given its 84.2% MMMU-Pro score and 1M-token context window. Cost-sensitive, high-throughput applications benefit from its 289 tokens/second output and $1.50/$9 per 1M pricing, roughly a third of Claude Opus 4.7's cost. Teams that need the strongest abstract reasoning, measured by ARC-AGI-2 or the Artificial Analysis Intelligence Index, should consider Gemini 3.1 Pro or GPT-5.5 instead, since Flash's 72.1% and 55 respectively trail both. Teams that need native image, audio, or video generation should pair Gemini 3.5 Flash with Gemini 3.1 Flash Image, since Flash's own output is text and tool-calls only. Latency-sensitive interactive applications should avoid thinking_level='high', which pushes time-to-first-token to roughly 17.75 seconds.