Name: Gemini 3.5 Flash Review: 78% SWE-bench, $1.50/M (2026)
Brand: Google
Price: 1.50 USD
Availability: InStock

Question 1

What is Gemini 3.5 Flash and who built it?

Accepted Answer

Gemini 3.5 Flash is a natively multimodal large language model built by Google DeepMind and released to general availability on May 19, 2026 at Google I/O. It is the Flash tier of the Gemini 3.5 family, succeeding Gemini 3 Flash, which launched December 17, 2025. The model is built on the Gemini 3 Flash reasoning foundation with added configurable thinking levels (minimal, low, medium, high). It scores 78% on SWE-bench Verified, a 42% improvement over Gemini 3 Flash, and 83.6% on MCP Atlas, the highest tool-orchestration score of any model as of June 2026. It supports a 1,048,576-token context window with up to 65,536 tokens of output. Architecture details and exact parameter counts have not been disclosed. It sits below Gemini 3.5 Pro, which targets a 2M-token context window and remains in limited Vertex AI preview as of June 2026.

Question 2

How much does Gemini 3.5 Flash cost per 1M tokens?

Accepted Answer

Gemini 3.5 Flash costs $1.50 per 1M input tokens and $9.00 per 1M output tokens, confirmed at general availability on May 19, 2026. Cached input tokens cost $0.15 per 1M, a 90% discount versus standard input pricing. This is roughly 3x Gemini 3 Flash's prior $0.50/$3.00 pricing, but about a third of Claude Opus 4.7's blended cost. A coding agent loop processing 1M input tokens and 200K output tokens costs about $3.30. A multimodal analysis pass over 500K input tokens with 10K output tokens costs about $0.84. A high-throughput support workload of 1,000 turns at 2K input and 500 output tokens each costs about $7.50. A cached-context batch summarization job using 1M cached input tokens and 50K output tokens costs about $0.60. Verify current rates at cloud.google.com/vertex-ai/generative-ai/pricing, as Google updates Gemini pricing periodically.

Question 3

What is Gemini 3.5 Flash's context window and max output?

Accepted Answer

Gemini 3.5 Flash has a context window of 1,048,576 tokens, commonly described as 1M tokens, with a maximum output of 65,536 tokens per response. This doubles Gemini 3 Flash's prior context limit and is half of the 2M-token window Gemini 3.5 Pro targets in its limited Vertex AI preview. Independent long-context recall benchmarks above 100K tokens have not been published specifically for the 3.5 generation, though Gemini 3.1 Pro demonstrated reliable recall at large depths in prior evaluations as a reference point for the lineage. The large context window combined with multimodal input means a single request can include lengthy codebases, multi-document research corpora, or extended audio and video transcripts alongside text. Thinking levels (minimal, low, medium, high) operate within this context budget, with higher levels consuming more of the output token allowance for internal reasoning before the final response.

Question 4

How does Gemini 3.5 Flash compare on benchmarks vs GPT-5.5 and Claude Opus 4.7?

Accepted Answer

Gemini 3.5 Flash scores 78% on SWE-bench Verified, 90.4% on GPQA Diamond, 72.1% on ARC-AGI-2, and 84.2% on MMMU-Pro, with an Artificial Analysis Intelligence Index of 55. GPT-5.5 scores higher on the Intelligence Index at 60 and posts 88.7% on SWE-bench Verified and 92.4% on MMLU, making it the stronger choice for pure reasoning and long-document retrieval. Claude Opus 4.7 leads the hardest software engineering benchmark, SWE-bench Pro, at 64.3%, ahead of both. On ARC-AGI-2, Gemini 3.5 Flash's 72.1% trails Gemini 3.1 Pro's 77.1%, showing the Pro tier still leads on abstract reasoning within Google's own lineup. Where Gemini 3.5 Flash wins outright is agentic tool use and multimodal ingestion: its 83.6% MCP Atlas score is the highest of any model as of June 2026, and its 84.2% MMMU-Pro is the highest multimodal score Artificial Analysis has recorded. In practice, a 10-point SWE-bench gap to GPT-5.5 translates to noticeably more failed or incomplete agentic coding runs on harder repository tasks, but Gemini 3.5 Flash's 4x speed and lower cost often offset this for high-volume use.

Question 5

Is Gemini 3.5 Flash open source or proprietary?

Accepted Answer

Gemini 3.5 Flash is proprietary and closed-weights, accessible only through Google's APIs. There is no Hugging Face release, no downloadable weights, and no self-hosting or fine-tuning path for the base model. It is available via the Gemini API at ai.google.dev using an API key, through Google Vertex AI using Google Cloud IAM (including a GDPR-compliant EU multi-region endpoint covered by the Vertex AI Data Processing Addendum), and through Google AI Studio for prototyping with a free-tier rate limit. Third-party aggregators such as OpenRouter also list Gemini 3.5 Flash for API access. For teams that require open weights, Google's separate Gemma family (Apache 2.0 licensed) is the open-weight alternative, distinct from the proprietary Gemini line. Commercial use of Gemini 3.5 Flash is governed by the Gemini API Additional Terms of Service.

Question 6

What modalities does Gemini 3.5 Flash support?

Accepted Answer

Gemini 3.5 Flash accepts text, image, audio, video, and PDF as inputs in a single native API request. Output modalities are limited to text and tool-calls; the model does not generate images, audio, or video. Function calling and structured output (JSON mode) are fully supported, and the model can combine built-in tools, such as Google Search grounding and code execution, with custom function-call schemas in the same request. Function call results can themselves include multimodal objects like images and PDFs, even though the model's own generated output remains text-based. Its 83.6% MCP Atlas score reflects strong performance on Model Context Protocol tool orchestration specifically. For applications that need image, audio, or video generation, Google's Gemini 3.1 Flash Image model or another dedicated generation model must be paired alongside Gemini 3.5 Flash.

Question 7

Does Gemini 3.5 Flash train on user data?

Accepted Answer

Google does not use API inputs to train production Gemini models by default, including Gemini 3.5 Flash, consistent with Google Cloud's standard data policy. Enterprise customers on Vertex AI can select data residency regions across the US, EU, and Asia, and the EU multi-region endpoint is covered by the Vertex AI Data Processing Addendum for GDPR-compliant routing within the EU. Google Cloud holds SOC 2 Type II and ISO 27001 certifications, and HIPAA-eligible configurations are available on Vertex AI for healthcare workloads. Under the EU AI Act, Gemini 3.5 Flash is expected to carry general-purpose AI obligations with systemic risk reporting requirements, consistent with prior Gemini models. The model's training knowledge cutoff is January 2025, and Google's standard data pipeline includes deduplication, quality filtering, CSAM filtering, and safety filtering before training. Specific retention windows for API logs are detailed in the Gemini API Additional Terms of Service.

Question 8

Who is Gemini 3.5 Flash best for and who should avoid it?

Accepted Answer

Gemini 3.5 Flash is best for agentic coding teams running MCP-based tool loops, where its 83.6% MCP Atlas score and 78% SWE-bench Verified lead the field as of June 2026. It is also strong for product teams building multimodal applications that ingest video, audio, or PDFs at scale, given its 84.2% MMMU-Pro score and 1M-token context window. Cost-sensitive, high-throughput applications benefit from its 289 tokens/second output and $1.50/$9 per 1M pricing, roughly a third of Claude Opus 4.7's cost. Teams that need the strongest abstract reasoning, measured by ARC-AGI-2 or the Artificial Analysis Intelligence Index, should consider Gemini 3.1 Pro or GPT-5.5 instead, since Flash's 72.1% and 55 respectively trail both. Teams that need native image, audio, or video generation should pair Gemini 3.5 Flash with Gemini 3.1 Flash Image, since Flash's own output is text and tool-calls only. Latency-sensitive interactive applications should avoid thinking_level='high', which pushes time-to-first-token to roughly 17.75 seconds.

Gemini 3.5 Flash

Gemini 3.5 Flash Review: 78% SWE-bench, $1.50/M (2026)

About Gemini 3.5 Flash

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions