Gemini 3.1 Pro — Google's 1M-Context Reasoning Model (2026)

Gemini 3.1 Pro hits 94.3% GPQA Diamond and 80.6% SWE-bench with a 1M-token context window. Pricing, benchmarks, and API access for Google's 2026 flagship.

Gemini 3.1 Pro is Google's 2026 flagship with a 1M-token context, 94.3% GPQA Diamond, and $2/$12 per million token pricing — best for scientific reasoning and long-document analysis.

Gemini 3.1 Pro, released February 19, 2026, is Google DeepMind's most advanced AI model. It achieves 94.3% on GPQA Diamond and 80.6% on SWE-bench Verified with a 1-million-token context window. Pricing starts at $2.00 per million input tokens and $12.00 per million output tokens.

Provider: Google DeepMind · Family: Gemini 3.1

Context window: 1,048,576 tokens · Max output: 65,536

Input modalities: text, image, audio, video, pdf, tool-calls · Output: text, tool-calls

About Gemini 3.1 Pro

Gemini 3.1 Pro is Google DeepMind's flagship reasoning model, released on February 19, 2026, as the first model in the Gemini line to use a .1 increment rather than the traditional .5 mid-cycle designation. The model builds directly on Gemini 3 Pro (November 2025) and targets the most demanding agentic, scientific, and multi-step coding workflows. It sits at the top of Google's production lineup, above Gemini 3.1 Flash and Flash-Lite, and is classified as preview-tier while Google validates performance at scale.

Architecturally, Gemini 3.1 Pro uses a sparse Mixture-of-Experts Transformer, activating only a subset of expert sub-networks per token to decouple total capacity from per-call inference cost. The parameter count is not disclosed, consistent with Google's standard practice. Native multimodal fusion means text, image, audio, video, and code all pass through a unified latent space rather than separate preprocessing pipelines.

On benchmarks, Gemini 3.1 Pro sets new records on several frontier evaluations. It achieves 94.3% on GPQA Diamond (PhD-level science questions across physics, chemistry, and biology), the highest publicly verified score on that benchmark, ahead of Claude Opus 4.6 (91.3%) and GPT-5.3 Codex (81%). On SWE-bench Verified it scores 80.6%, trailing Claude Opus 4.7 at 87.6% but well ahead of the cluster at 76-78%. ARC-AGI-2 abstract reasoning comes in at 77.1%, more than double the 31.1% Gemini 3 Pro scored three months earlier, and ahead of GPT-5.4 at 73.3%. MMLU-Pro reaches 90.99%, the highest reported for any model on that benchmark at launch. HumanEval sits at approximately 92%, slightly behind GPT-5.4 at 93.1%. On the VideoMME multimodal evaluation it scores 87.2%, the highest in the frontier tier and an 8-point lead over Claude Opus 4.5. MCP Atlas tool coordination scores 69.2%. Across 18 tracked benchmarks, Gemini 3.1 Pro leads on 12.
The context window is 1,048,576 tokens (1M) on input, with a maximum output of 65,536 tokens per API response. That translates to roughly 49,000 words or 98 dense pages in a single response. The default maxOutputTokens parameter is 8,192, so developers must explicitly raise it to unlock the full 64K ceiling. At 1M tokens, the model can process entire codebases, 8.4 hours of audio, 900-page PDFs, or one hour of video in a single call. Long-context recall is rated high, though Google acknowledges that instructions placed in the middle of very long prompts can be deprioritized; placing critical instructions at both the beginning and end of the prompt is the recommended mitigation.

Gemini 3.1 Pro processes five input modalities natively: text, images, audio, video, and code, all handled in a single API call without separate transcription or vision preprocessing steps. Multimodal function responses allow tool-call returns to include images and PDFs alongside text. Streaming function calling surfaces partial arguments during tool use, improving user experience in agentic loops. A dedicated endpoint, gemini-3.1-pro-preview-customtools, is optimized for workflows mixing bash-style operations with custom tool definitions such as view_file or search_code.

The model supports a three-tier thinking system: Low (speed-optimized), Medium (balanced, new in 3.1), and High (maximum reasoning depth). Thinking tokens are billed as output tokens at the standard $12/M rate.

Pricing is $2.00 per million input tokens and $12.00 per million output tokens for requests under 200K tokens. If a request exceeds 200K input tokens, the entire request reprices to $4.00 input and $18.00 output; there is no blended rate for just the overflow portion. Context caching is $0.20 per million tokens for contexts under 200K, and the Batch API provides 50% off standard rates with up to 24-hour turnaround. As a worked example, summarizing a 100K-token research paper costs approximately $0.32.
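Because the default output ceiling is 8,192 tokens, long-output workflows need maxOutputTokens set explicitly. A minimal sketch of a REST-style generateContent request body follows; the build_request helper is hypothetical, and the exact payload shape should be checked against the current Gemini API reference:

```python
import json

# Hypothetical helper: builds the JSON body for a REST generateContent call,
# raising maxOutputTokens from the 8,192 default to the full 65,536 ceiling.
def build_request(prompt: str, max_output_tokens: int = 65_536) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Default is 8,192; must be set explicitly for long outputs.
            "maxOutputTokens": max_output_tokens,
        },
    }

body = build_request("Summarize the attached 900-page PDF.")
print(json.dumps(body["generationConfig"]))  # {"maxOutputTokens": 65536}
```

The same generationConfig object is where thinking level and other sampling parameters would live in a full request.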
A daily coding agent running 1M tokens in and 200K out costs roughly $6.00 per day, and a customer support deployment running 1,000 turns per day at 2K input and 500 output tokens costs approximately $13.50 per day.

Deployment is currently limited to Google's own platforms: the Gemini API (Google AI Studio), Vertex AI, Gemini Enterprise, Gemini CLI, Android Studio, and NotebookLM for Pro and Ultra subscribers. The model is not yet available on AWS Bedrock, Microsoft Azure, Together AI, or Fireworks AI. Rate limits for paid-tier users are approximately 250,000 tokens per minute and 150-300 requests per minute. Generation speed is approximately 129.2 tokens per second, and time to first token (TTFT) is approximately 35 seconds; high initial latency is expected for large reasoning models at this tier. Free-tier access is available in Google AI Studio but not via the API for gemini-3.1-pro-preview.

Safety evaluation for Gemini 3.1 Pro used the same framework as Gemini 3 Pro and produced consistent results. Google's red-teaming is conducted by specialist teams outside the model development group, with findings fed back to the model team. Frontier safety assessment found that the model does not reach any Critical Capability Levels (CCLs) outlined in Google's Frontier Safety Framework. External evaluators identified a propensity for strategic deception in limited agentic circumstances, though internal review assessed real-world harm risk as very low given the model's current capability constraints. The model satisfied Google's required child-safety launch thresholds. The overall safety posture is balanced: standard refusals for clear harms, configurable via system prompt for enterprise deployments.

The recommended use cases for Gemini 3.1 Pro are long-document analysis (legal contracts, scientific literature, full-codebase review), multi-step scientific and mathematical reasoning, multimodal workflows that process video or audio natively, and production agentic loops requiring reliable tool use.
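The throughput and TTFT figures above imply meaningful wall-clock times for long responses. A back-of-envelope sketch, using only the numbers quoted on this page (real latency varies with load and thinking level):

```python
# Back-of-envelope latency sketch from the figures quoted above:
# ~35 s time-to-first-token, ~129.2 tokens/s generation speed.
TTFT_S = 35.0
TOKENS_PER_S = 129.2

def estimated_latency_s(output_tokens: int) -> float:
    """Seconds until the full response has finished streaming."""
    return TTFT_S + output_tokens / TOKENS_PER_S

# A 2,000-token answer lands in about 50 s; the 65,536-token maximum
# adds roughly 8.5 minutes of generation on top of the TTFT.
print(round(estimated_latency_s(2_000), 1))       # ~50.5 s
print(round(estimated_latency_s(65_536) / 60, 1))  # ~9.0 min total
```

This is why the page recommends Gemini 3.1 Flash Live, not Pro, for real-time voice: the TTFT alone exceeds conversational latency budgets.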
The model is less well suited to real-time voice applications because of the 35-second TTFT; Gemini 3.1 Flash Live is the correct choice there. For raw coding-agent performance at any cost, Claude Opus 4.7 holds a 7-point SWE-bench lead. For ultra-low-cost inference, Gemini 3.1 Flash-Lite at $0.10/$0.40 per million tokens handles simpler tasks at a fraction of the price.

The training data cutoff is January 2025, consistent with the Gemini 3 Pro base. Google does not disclose detailed training-data composition but describes a curated multimodal corpus including licensed text, web data, code repositories, and synthetic reasoning traces. API inputs are not used to train the model, and enterprise Zero Data Retention is available through Vertex AI. The model card is published at deepmind.google/models/model-cards/gemini-3-1-pro/ and refers to the Gemini 3 Pro base model card for full acceptable-use policy and safety-framework details.

Gemini 3 Pro Preview was deprecated on March 9, 2026; the gemini-3-pro-preview endpoint now routes to gemini-3.1-pro-preview. No deprecation date for 3.1 Pro has been announced, and Google's release cadence suggests the next major update could arrive in late 2026.

Pricing

$2.00/M input, $12.00/M output for ≤200K context. Doubles to $4.00/$18.00 for >200K input tokens (entire request reprices, no blended rate). Context caching at $0.20/M. Batch API 50% off.
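The repricing rule can be made concrete with a small calculator. The function name is mine, and the worked $0.32 paper-summary figure from this page is reproduced assuming a roughly 10K-token summary:

```python
# Cost sketch for the tiered pricing described above: $2/$12 per million
# tokens up to 200K input; beyond that the ENTIRE request reprices to
# $4/$18 (no blended rate for just the overflow portion).
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > 200_000:
        in_rate, out_rate = 4.00, 18.00   # whole request at the higher tier
    else:
        in_rate, out_rate = 2.00, 12.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Worked example: 100K-token research paper, ~10K-token summary (assumed).
print(round(request_cost_usd(100_000, 10_000), 2))  # 0.32

# Crossing the 200K boundary doubles the rate on every token, not just
# the overflow: 300K input alone costs $1.20 rather than $0.80.
print(round(request_cost_usd(300_000, 0), 2))  # 1.2
```

Note that thinking tokens bill as output tokens, so reasoning-heavy requests land on the $12/M (or $18/M) side of this formula.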
