GPT-5.2: 400K Context, 92.4% GPQA Diamond (2026)
GPT-5.2 by OpenAI (Dec 2025): 400K-token context window, 80% SWE-bench Verified, 92.4% GPQA Diamond. Costs $1.75/$14 per 1M tokens. Deprecated June 2026.
GPT-5.2, released December 11, 2025 by OpenAI, scored 80% SWE-bench Verified and 100% AIME 2025 using its extended-thinking tier, with a 400,000-token context window. Priced at $1.75 per 1M input and $14 per 1M output tokens, it was deprecated in June 2026 and succeeded by GPT-5.5.
GPT-5.2, released December 11, 2025 by OpenAI, scored 80% on SWE-bench Verified and 92.4% on GPQA Diamond at launch. It features a 400,000-token context window and costs $1.75 per 1M input tokens and $14.00 per 1M output tokens. Deprecated in June 2026 in favor of GPT-5.5, which succeeded it as OpenAI's flagship reasoning model.
Provider: OpenAI · Family: GPT-5
Context window: 400,000 tokens · Max output: 128,000
Input modalities: text, image, audio, video, pdf, tool-calls · Output: text, tool-calls
About GPT-5.2
GPT-5.2 is OpenAI's fourth release in the GPT-5 product family, launched on December 11, 2025. It positions above GPT-5.1 in the lineup and was eventually succeeded by GPT-5.3 and GPT-5.5. The model uses a Mixture-of-Experts (MoE) transformer architecture, routing each token to a specialized subset of expert networks rather than activating all parameters per inference. This reduces compute cost per query while maintaining large total model capacity. OpenAI has not published the parameter count, but inference cost patterns and MoE activation ratios place it in the 2-to-5 trillion total parameter range. The model ships in three service tiers: GPT-5.2 Instant (low latency, no extended reasoning), GPT-5.2 Thinking (standard and extended chain-of-thought before responding), and GPT-5.2 Pro (maximum quality with an unlimited thinking budget). A coding-specialized companion, GPT-5.2-Codex, launched the same day. On release, GPT-5.2 posted the strongest benchmark results in the GPT-5 family to that point. It achieved 80.0% on SWE-bench Verified (agentic coding), 92.4% on GPQA Diamond (graduate-level science reasoning, a 4.3-point gain over GPT-5.1's 88.1%), and a perfect 100% on AIME 2025 (competition mathematics) using the Thinking tier without external tools. On FrontierMath, a contamination-resistant advanced math benchmark, GPT-5.2 scored 40.3%, ahead of Claude Opus 4.5 at 37.6% and Gemini 3 Pro at 31.1%. Visual reasoning on chart interpretation roughly halved error rates vs GPT-5.1. On the LMArena human-preference leaderboard, the model reached an Elo rating of 1402 as of May 2026, ranking third behind Claude Opus 4.6 at 1418 and Gemini 3.1 Pro at 1406. Confidence intervals across the top tier overlapped, meaning cost and latency should drive selection more than raw leaderboard position. The 400,000-token context window is the headline architectural change over GPT-5.1, which topped out at 128,000 tokens. The max output cap is 128,000 tokens. OpenAI reported over 99% recall accuracy on internal needle-in-haystack tests at full 400K depth for the Thinking tier. Independent evaluations found some degradation above 350,000 tokens, with recall for instructions placed at the very start of long prompts dropping from 99% to around 88% near the 400K ceiling. For comparison, Claude Opus 4.5 offered 200K context and Gemini 3 Pro offered a 1M-token window at reduced pricing; GPT-5.2 sat between those two on size but demonstrated stronger recall quality in the 100K-to-300K range in independent tests. For full-codebase review, multi-document contract analysis, and long research-paper synthesis without chunking, the 400K window covered the majority of real enterprise workloads. GPT-5.2 handles text, image, audio, and video inputs within a single API request. Video understanding scored 90.5% on Video-MMMU vs Gemini 3 Pro's 87.6%. Chart comprehension on CharXiv with Python reached 88.7%. Tool calling scored 98.7% on Tau2-bench Telecom, a multi-turn agentic benchmark with complex long-horizon tasks; OpenAI cited this as evidence that the model executes cleanly off a simple one-line system prompt where previous models required elaborate scaffolding. The model generates structured JSON natively and supports parallel tool calls in a single completion. Computer use capabilities and web search are accessible through the ChatGPT product interface, but are not exposed as raw API endpoints in GPT-5.2. The standard API pricing during the active period was $1.75 per 1M input tokens and $14.00 per 1M output tokens. Cached input tokens cost $0.18 per 1M after a minimum 1,024-token cache match, a 90% reduction. Artificial Analysis's 7:2:1 weighted blended rate works out to $1.87 per 1M tokens. The output price of $14 per 1M was frequently cited as expensive for generation-heavy workflows. As rough cost examples: summarizing a 100,000-token research paper costs around $0.20 in total tokens; a daily coding agent with 1M input and 200K output tokens costs about $4.55; a customer support deployment running 1,000 turns per day at 2K input and 500 output tokens per turn costs about $10.50 per day. By comparison, GPT-5 was priced below $1 per 1M input tokens at release, so the 5.2 pricing reflected a significant quality-tier premium. Access during the active period was through the OpenAI direct API at platform.openai.com and through Microsoft Azure OpenAI Service under the commercial arrangement in place through 2025. GPT-5.2 did not receive a standalone AWS Bedrock listing before its deprecation; Bedrock support for the GPT family launched later with GPT-5.4 and GPT-5.5. No confirmed availability on Google Vertex AI, Together AI, or Fireworks AI is documented. The model supported OpenAI's Python, TypeScript, and other official SDKs using the same chat completions endpoint structure as earlier GPT-4 models, meaning migrations from GPT-4 and GPT-5.1 required minimal code changes. OpenAI trained GPT-5.2 using reinforcement learning with extended chain-of-thought, integrating the reasoning techniques from the o1 and o3 families into the main GPT-5 product line. The knowledge and training data cutoff is August 31, 2025. The system card, published December 11, 2025 as an addendum to the GPT-5 system card, documents safety training covering agentic task refusal, resistance to prompt injection in multi-tool environments, and deployment at a GPT-4-class uplift tier under OpenAI's Preparedness Framework. The companion GPT-5.2-Codex system card added agent sandboxing and configurable network access as product-level mitigations for coding agent deployments. OpenAI does not train on API inputs by default; enterprise accounts can enable a zero-data retention option via agreement. GPT-5.2 was deprecated from the OpenAI API on May 8, 2026, with developer notifications issued simultaneously. It was retired from ChatGPT on June 12, 2026, with active conversations automatically migrating to the corresponding GPT-5.5 tier. GitHub Copilot deprecated GPT-5.2 and GPT-5.2-Codex on June 5, 2026, retaining GPT-5.2 only in Copilot code review. The successor, GPT-5.5, improved SWE-bench Verified and reduced output token pricing, addressing the two main developer complaints about GPT-5.2. Teams still on GPT-5.2 integrations should migrate to GPT-5.5 or GPT-5.4 to avoid service disruptions.
Pricing
$1.75 per 1M input tokens, $14.00 per 1M output tokens. Prompt caching at $0.18 per 1M cached input (90% discount, minimum 1,024-token match). Model is deprecated; migrate to GPT-5.4 or GPT-5.5 for continued access.
Key Features
- 400,000-Token Context Window: The first GPT-5 variant to reach 400K context, tripling the 128K limit of GPT-5 and GPT-5.1, enabling full-codebase reviews and multi-document legal analysis in one API call.
- Three-Tier Reasoning Design: Ships as Instant (low latency, no chain-of-thought), Thinking (extended reasoning before response), and Pro (unlimited thinking budget), so cost and latency can be matched to task complexity per request.
- Full Multimodal Input: Accepts text, images, audio, and video in a single request; scored 90.5% on Video-MMMU and roughly halved error rates vs GPT-5.1 on chart and software UI comprehension.
- 98.7% Tool Calling Accuracy: Scores 98.7% on Tau2-bench Telecom, a multi-turn agentic benchmark, meaning complex agent loops with many sequential tool calls work with a simple one-line system prompt.
- GPT-5.2-Codex Companion: A coding-specialized variant released the same day, with agent sandboxing, configurable network access, and tighter code-execution safety controls for autonomous software development pipelines.
Pros
- 100% AIME 2025 (Thinking tier) and 40.3% FrontierMath led both Claude Opus 4.5 (37.6%) and Gemini 3 Pro (31.1%) on advanced math at release.
- 400K context window covered full codebases and multi-document legal reviews in a single pass, with over 99% recall at 200K depth in internal tests.
- 98.7% Tau2-bench tool calling accuracy reduced the system prompt engineering needed to keep multi-step agent loops reliable.
Cons
- Deprecated in June 2026 after 6 months of availability; no new integrations should target GPT-5.2 given the active API cutoff.
- Output token price of $14 per 1M is 8x the $1.75 input price, making generation-heavy workloads such as bulk test scaffolding or long-form code generation expensive.
- No native audio output from the base API and no direct computer-use endpoint; voice pipelines require a separate TTS model call.
Benchmarks
- aime 2025: 100
- humaneval: 97.4
- lmarena elo: 1402
- gpqa diamond: 92.4
- lmarena rank: 3
- swe bench verified: 80
- artificial analysis intelligence index: 51
- artificial analysis price blended per m: 1.87
- artificial analysis speed tokens per sec: 71.1
Frequently Asked Questions
What is GPT-5.2 and who built it?
GPT-5.2 is a multimodal large language model built by OpenAI, released on December 11, 2025 as the fourth variant in the GPT-5 product family. It uses a Mixture-of-Experts (MoE) transformer architecture, where tokens are routed to a subset of expert networks per inference call, reducing compute cost while maintaining high total model capacity. OpenAI has not published the parameter count, but inference cost patterns and MoE routing behavior suggest between 2 and 5 trillion total parameters. The model launched with three service tiers: Instant (low latency, no chain-of-thought), Thinking (extended reasoning before responding), and Pro (maximum quality). A coding-specialized companion, GPT-5.2-Codex, launched the same day with additional agent sandboxing and configurable network access. The model improved over GPT-5.1 across agentic coding (SWE-bench Verified), graduate-level science reasoning (GPQA Diamond), and advanced math (AIME 2025, perfect score in Thinking tier). GPT-5.2 was deprecated from the API on May 8, 2026 and fully retired from ChatGPT on June 12, 2026, with GPT-5.5 as the designated successor.
How much does GPT-5.2 cost per 1M tokens?
GPT-5.2 was priced at $1.75 per 1M input tokens and $14.00 per 1M output tokens through the OpenAI API. Cached input tokens, available after a minimum 1,024-token cache match, cost $0.18 per 1M, a 90% reduction from the standard input rate. Artificial Analysis computed a blended rate of $1.87 per 1M tokens using a 7:2:1 cache-hit weighting. Summarizing a 100,000-token research paper costs approximately $0.20 in total tokens at those rates. A daily coding agent consuming 1M input and 200K output tokens costs about $4.55 per day. The $14/M output price was a frequent developer complaint for generation-heavy workloads like bulk test scaffolding or long-form code generation. GPT-5.5, the successor, reduced output token pricing as part of its release rationale, making it the cost-preferred option for output-heavy tasks.
What is GPT-5.2's context window and max output?
GPT-5.2 features a 400,000-token context window and a 128,000-token maximum output limit. The 400K context is triple the 128,000-token window in GPT-5 and GPT-5.1, making it the largest context window in the GPT-5 family at the time of release. OpenAI reported over 99% recall accuracy on internal needle-in-haystack tests at full 400K depth. Independent evaluations found modest degradation above 350,000 tokens, with recall for early-context instructions dropping from 99% to around 88% near the ceiling. For comparison, Claude Opus 4.5 offered 200K context and Gemini 3 Pro offered a 1M-token window; GPT-5.2 sat between those two on raw size but demonstrated stronger recall at the 100K-to-300K range in independent tests. The model handles PDFs, multi-image inputs, audio transcripts, and long video content within the same context limit. For full-codebase review, multi-document legal analysis, or long transcript synthesis without chunking, the 400K window covered the majority of real enterprise workloads.
How does GPT-5.2 compare on benchmarks vs Claude and Gemini?
GPT-5.2 led on advanced math and science reasoning at launch. On AIME 2025, GPT-5.2 Thinking achieved a perfect 100%, outperforming Gemini 3 Pro which lagged by 5 percentage points. On FrontierMath, GPT-5.2 scored 40.3%, ahead of Claude Opus 4.5 at 37.6% and Gemini 3 Pro at 31.1%. On GPQA Diamond (graduate-level science), GPT-5.2 hit 92.4%, a 4.3-point improvement over GPT-5.1 and ahead of the Claude and Gemini releases at the time. For coding on SWE-bench Verified, GPT-5.2 reached 80%, which was competitive with the top frontier models at launch. On video understanding with Video-MMMU, GPT-5.2 scored 90.5% versus Gemini 3 Pro's 87.6%. On the LMArena human-preference leaderboard (crowd-sourced votes), GPT-5.2 reached Elo 1402 in May 2026, placing third behind Claude Opus 4.6 at 1418 and Gemini 3.1 Pro at 1406, with overlapping confidence intervals across all three meaning the gap is not statistically decisive. In practice, task type and cost efficiency should drive selection over raw leaderboard rank.
Is GPT-5.2 open source or proprietary?
GPT-5.2 is fully proprietary: the model weights are closed, it is API-only, and there is no self-hosting path. During its active period, access was through the OpenAI direct API at platform.openai.com and through Microsoft Azure OpenAI Service. GPT-5.2 did not receive its own AWS Bedrock listing before deprecation; Bedrock support for the GPT family launched later with GPT-5.4 and GPT-5.5 in mid-2026. No availability on Google Vertex AI, Together AI, or Fireworks AI is documented for GPT-5.2 specifically. OpenAI's separate open-weight releases, gpt-oss-20b and gpt-oss-120b, are entirely distinct from the GPT-5.2 family and do not share its weights or architecture. All commercial use of GPT-5.2 was governed by OpenAI's standard API commercial terms. There is no VRAM requirement, quantization path, or container image for self-hosted GPT-5.2 deployment.
What modalities does GPT-5.2 support?
GPT-5.2 accepts text, images, audio, video, and PDFs as inputs within a single API request, and generates text and tool-call outputs. Multimodal inputs can be mixed in one request rather than requiring separate API calls for each modality type. Video understanding scored 90.5% on Video-MMMU, and chart comprehension on CharXiv with Python reached 88.7%. Audio input is supported for transcription and reasoning over spoken content. Audio output is not available from the base API; text-to-speech requires a separate TTS model call. Function calling and structured JSON output are fully supported, including parallel tool calls within a single completion response. Computer use (screen reading and UI interaction) is available in the ChatGPT product interface but is not exposed as a direct API capability in GPT-5.2. Code execution is supported via tool-call integrations rather than a native sandboxed environment in the base API.
Does GPT-5.2 train on user data?
OpenAI does not train on API inputs by default for GPT-5.2 or any other GPT-5 family model. API inputs and outputs are retained for up to 30 days for abuse monitoring, then deleted unless flagged for review. Enterprise customers can enable a zero-data retention option via a direct agreement with OpenAI, in which case no input or output data is stored beyond the API call. OpenAI's API terms prohibit using API traffic for model training unless the user explicitly opts in, which is not the default for business or enterprise accounts. GPT-5.2 API access is covered by OpenAI's SOC 2 Type II certification. HIPAA-eligible configurations are available through a Business Associate Agreement with OpenAI. GDPR compliance applies for EU users under OpenAI's Data Processing Agreement. When accessed through Azure OpenAI Service, data handling follows Microsoft's Azure retention and residency policies, which may differ from OpenAI's direct API terms and can include EU data residency options.
Who is GPT-5.2 best for and who should avoid it?
During its active period, GPT-5.2 was best for teams running large-context document analysis using the 400K window, agentic coding pipelines where 80% SWE-bench and 98.7% Tau2-bench tool accuracy mattered, and advanced math or science reasoning tasks where 100% AIME and 40.3% FrontierMath performance led the field. Enterprise teams on Azure with existing OpenAI SDK integrations found low migration overhead from GPT-5.1. However, GPT-5.2 is deprecated as of June 2026, making it a poor choice for any new project: the API cutoff will interrupt service, and GPT-5.5 or GPT-5.4 should be the migration target. The $14/M output price made it expensive for bulk generation workloads; teams running content pipelines or large-scale test generation were better served by GPT-5.4 once it released. Voice-first applications should avoid GPT-5.2 since it produces no audio output natively. On-device or air-gapped deployments are not possible given the closed-weights, API-only architecture.