GPT-5 by OpenAI: 94.6% AIME & 74.9% SWE-bench (2025)
GPT-5 by OpenAI (Aug 7, 2025): 272K context, 74.9% SWE-bench Verified, 94.6% AIME 2025. At $0.625/$5.00 per 1M tokens, built for coding, math, and reasoning.
GPT-5 is OpenAI's fifth-generation flagship model released August 7, 2025, with a 272,000-token context window, 74.9% SWE-bench Verified score, and 94.6% AIME 2025 accuracy using a sparse Mixture-of-Experts architecture. Priced at $0.625 per 1M input tokens and $5.00 per 1M output tokens, it set a new OpenAI performance ceiling at launch before being succeeded by GPT-5.5 in April 2026.
GPT-5, released August 7, 2025 by OpenAI, is a Mixture-of-Experts language model with a 272,000-token context window and 74.9% SWE-bench Verified score. Priced at $0.625 per 1M input tokens and $5.00 per 1M output tokens, it scored 94.6% on AIME 2025, the highest of any OpenAI model at launch. GPT-5 has been succeeded by GPT-5.5 as OpenAI's current flagship.
Provider: OpenAI · Family: GPT-5
Context window: 272,000 tokens · Max output: 128,000
Input modalities: text, image, audio, video, pdf, tool-calls · Output: text, audio, tool-calls
About GPT-5
GPT-5 is OpenAI's fifth-generation foundation model, released publicly on August 7, 2025, as the successor to GPT-4o. It is the first flagship model from OpenAI built on a sparse Mixture-of-Experts (MoE) architecture, departing from the dense transformer design used across the GPT-4 lineage. The total parameter count has not been officially disclosed; industry estimates place it at 2 to 5 trillion total parameters, with only a fraction active during each forward pass due to MoE routing. GPT-5 was designed to push frontier performance across coding, mathematics, multimodal perception, and long-context reasoning within a single unified architecture at a notably lower cost per capability than prior generations. On benchmark evaluations published at launch, GPT-5 scored 74.9% on SWE-bench Verified, the primary real-world agentic software engineering benchmark. It reached 88.4% on GPQA Diamond for graduate-level scientific reasoning with extended thinking enabled, and 94.6% on AIME 2025 math competition problems, the highest score of any OpenAI model to that date. MMLU came in at 91.4% and HumanEval code generation at 97.4%, a 7-point improvement over GPT-4o's 90.2%. Across the Aider Polyglot benchmark for multi-language code editing, GPT-5 scored 88%, confirming consistent strength in Python, TypeScript, JavaScript, and Go. These results placed GPT-5 ahead of Claude 3.5 Sonnet on SWE-bench and competitive with Gemini 1.5 Pro on MMLU at the time of release. The API version of GPT-5 supports up to 272,000 input tokens, doubling GPT-4o's 128K limit. Maximum output per request is 128,000 tokens. ChatGPT access to context is tiered by subscription: free users get 8K, Plus subscribers 32K, and Pro subscribers 128K. Long-context recall holds up reliably in internal OpenAI evaluations through the full 272K range, though third-party needle-in-haystack testing shows moderate variability above 200K tokens. Inputs exceeding the context limit return a hard error rather than silently truncating. GPT-5 is fully multimodal at launch, accepting text, images, PDFs, audio, and video frames as inputs within the same API call. A dedicated vision encoder is integrated into the base architecture rather than run as a separate pipeline. Video input processes temporal embeddings from frame sequences, enabling tasks like extracting UI states from screen recordings or producing timestamped summaries from meeting clips. Tool use and function calling support parallel invocations, structured JSON output, and multi-step tool chaining within a single response turn. Code execution is available in the ChatGPT interface via the built-in interpreter; the raw API does not provide a sandboxed runtime by default. Standard API pricing is $0.625 per million input tokens and $5.00 per million output tokens. The Batch API applies a 50% discount, bringing costs to $0.3125 per million input and $2.50 per million output with asynchronous delivery within 24 hours. Summarizing a 100,000-token research paper costs approximately $0.07. Running a daily coding agent at 1 million input and 200,000 output tokens costs roughly $1.63 per day. A customer support deployment handling 1,000 conversations at 2,000 input and 500 output tokens each costs about $3.75 per 1,000 turns. These rates made GPT-5 one of the most cost-efficient frontier models at launch. At launch, GPT-5 was accessible via the OpenAI API directly and through Microsoft Azure AI Services. Authentication for the OpenAI API uses standard bearer tokens via API key. Azure AI Services access uses Azure Active Directory or API key authentication, with data residency options in US and EU regions. The model is not available for self-hosting; OpenAI's gpt-oss open-weight models (20B and 120B parameters), released on August 8, 2025, are a completely separate product and are not distilled or derived variants of GPT-5. OpenAI published the GPT-5 System Card on August 13, 2025. The document describes RLHF-based alignment combined with additional post-training to address sycophancy, with GPT-5 performing approximately 3x better than GPT-4o on internal sycophancy metrics. Extended thinking mode showed 8x reductions in health error rates and over 50x fewer errors in urgent safety-relevant situations compared to GPT-4o. OpenAI evaluated GPT-5 against biological, chemical, nuclear, and radiological uplift scenarios under their Preparedness Framework before deployment approval. GPT-5 fits teams that need frontier STEM reasoning at low per-token cost: $0.625 per million input was among the lowest prices at the frontier tier on release day. Engineers building production coding agents benefit from the 74.9% SWE-bench Verified score and reliable multi-step tool chaining. Researchers analyzing documents up to 272K tokens gain native multimodal input without separate preprocessing. Teams needing the best available OpenAI performance as of mid-2026 should use GPT-5.5, which scored 88.7% on SWE-bench Verified versus GPT-5's 74.9% and added a 1M-token API context window. Real-time voice application builders should use the OpenAI Realtime API rather than GPT-5, which does not support sub-500ms audio output through the standard endpoint. OpenAI has not published the full training dataset composition for GPT-5 but confirmed the use of curated internet text, licensed third-party datasets, and synthetic reasoning traces. The estimated knowledge cutoff is April 2025. API inputs are retained for 30 days for safety monitoring by default, with zero-retention enterprise agreements available. OpenAI's API platform holds SOC 2 Type II certification. HIPAA-eligible configurations are available via enterprise contracts. GDPR-compliant data processing agreements cover EU customers. The GPT-5 System Card classifies the model under EU AI Act obligations for general-purpose AI with systemic risk. GPT-5 was followed by a rapid succession of updates: GPT-5.1 in October 2025, GPT-5.2 in December 2025 with a 400K-token context window and 100% AIME 2025 score, and GPT-5.4 in early 2026 with native computer use. GPT-5.5, released April 23, 2026, became the current OpenAI flagship with 88.7% SWE-bench Verified and a 1M-token API context window. The original GPT-5 snapshot (gpt-5-2025-08-07) was deprecated June 11, 2026, with API removal scheduled for December 11, 2026. Users still on the original snapshot should migrate to GPT-5.5 before the December cutoff.
Pricing
$0.625 per 1M input tokens, $5.00 per 1M output. Batch API at 50% off: $0.3125 input and $2.50 output. Flex processing also at 50% off with variable wait times.
Key Features
- Sparse MoE Architecture: Routes each input to specialized expert subnetworks rather than activating all parameters, delivering frontier performance at $0.625 per million input tokens.
- 272K-Token Context Window: Accepts 272,000 input tokens in the API, doubling GPT-4o's limit and enabling full-codebase or book-length analysis in a single call.
- Extended Thinking Mode: Optional chain-of-thought reasoning raises GPQA Diamond from roughly 80% to 88.4% and cuts health-domain error rates 8x versus GPT-4o.
- Native Multimodal Input: Processes text, images, audio, video frames, and PDFs within the same API call through a shared unified architecture, not separate pipelines.
- Parallel Tool Calls: Executes multiple function calls in a single turn with structured JSON output, reducing round-trips in multi-step agentic workflows.
Pros
- Scored 94.6% on AIME 2025, the highest math competition result of any OpenAI model at its August 2025 launch.
- Priced at $0.625 per million input tokens, delivering frontier SWE-bench accuracy at one of the lowest cost-per-capability ratios at release.
- 272K input context with reliable long-document recall, double the capacity of its predecessor GPT-4o, enabling whole-repository or full-contract analysis.
Cons
- Superseded as of April 2026 by GPT-5.5, which scores 88.7% SWE-bench Verified versus GPT-5's 74.9% and adds a 1M-token context window.
- No self-hosting option; proprietary closed weights require API dependency, ruling out air-gapped and on-device deployments.
- Original dated snapshot deprecated June 11, 2026, with API removal December 11, 2026, requiring active migration planning.
Benchmarks
- mmlu: 91.4
- aime 2025: 94.6
- humaneval: 97.4
- gpqa diamond: 88.4
- aider polyglot: 88
- swe bench verified: 74.9
- artificial analysis intelligence index: 45
- artificial analysis price blended per m: 2.81
- artificial analysis speed tokens per sec: 75
Frequently Asked Questions
What is GPT-5 and who built it?
GPT-5 is the fifth-generation foundation model from OpenAI, launched publicly on August 7, 2025. It is built on a sparse Mixture-of-Experts (MoE) architecture, a significant departure from the dense transformer design used in GPT-4 and GPT-4o. Industry estimates place the total parameter count at 2 to 5 trillion, with only a small fraction activated per forward pass through expert routing. The model covers text, image, audio, video, and PDF inputs in a unified architecture, with native tool use and function calling. At launch, GPT-5 scored 74.9% on SWE-bench Verified, 94.6% on AIME 2025, and 91.4% on MMLU, placing it ahead of Claude 3.5 Sonnet and Gemini 1.5 Pro on most coding and reasoning benchmarks at the time. OpenAI positioned GPT-5 as the successor to GPT-4o for both ChatGPT users and API developers, with access through the OpenAI API and Microsoft Azure AI Services. The original snapshot (gpt-5-2025-08-07) was deprecated June 11, 2026; GPT-5.5 is the current OpenAI flagship.
How much does GPT-5 cost per 1M tokens?
GPT-5 is priced at $0.625 per million input tokens and $5.00 per million output tokens in the standard pay-as-you-go tier. The Batch API reduces both rates by 50%, giving $0.3125 per million input and $2.50 per million output, with results returned asynchronously within 24 hours. Flex processing also offers 50% off at variable processing speed. For practical workloads: summarizing a 100,000-token research paper costs approximately $0.07; a daily coding agent at 1 million input and 200,000 output tokens costs about $1.63 per day; a customer support deployment handling 1,000 conversations at 2,000 input and 500 output tokens each costs roughly $3.75 per 1,000 turns. OpenAI did not publish prompt caching pricing for the original GPT-5 at launch, unlike later models in the family. For teams needing self-hosting to avoid API costs entirely, GPT-5's weights are proprietary; the separate gpt-oss-120b model (Apache 2.0, released August 8, 2025) is the open-weight alternative.
What is GPT-5's context window and max output?
The API version of GPT-5 accepts up to 272,000 input tokens, doubling GPT-4o's 128,000-token limit. Maximum output per request is 128,000 tokens. ChatGPT interface access to context is tiered: free users get 8,000 tokens, Plus subscribers get 32,000 tokens, and Pro subscribers access 128,000 tokens. Long-context recall holds up reliably through the full 272K range according to OpenAI's internal evaluations, though third-party needle-in-haystack testing shows moderate variability above 200,000 tokens. Inputs that exceed the context limit return a hard API error; the model does not silently truncate. GPT-5.2, released December 2025, extended the context window to 400,000 tokens, and GPT-5.5 (April 2026) brought the context window to 1,000,000 tokens in the API with 922,000 usable input tokens.
How does GPT-5 compare on benchmarks vs Claude and Gemini?
At launch in August 2025, GPT-5 scored 74.9% on SWE-bench Verified, placing it ahead of Claude 3.5 Sonnet, which scored around 49%, and Gemini 1.5 Pro, which scored around 46% on the same benchmark. On GPQA Diamond for graduate-level reasoning, GPT-5 reached 88.4% with extended thinking enabled, competitive with Gemini 1.5 Pro's 75.2% and ahead of Claude 3.5 Sonnet's 65.0% at that time. AIME 2025 at 94.6% was the standout metric, placing GPT-5 significantly ahead of all contemporaneous frontier models on math competition problems. HumanEval at 97.4% compared favorably to Claude 3.5 Sonnet at 92% and Gemini 1.5 Pro at 84.1%. The benchmark advantage does not hold uniformly across all dimensions: Claude 3.5 Sonnet showed stronger instruction-following for long multi-turn conversations, and Gemini 1.5 Pro offered more flexible output modalities including native image generation. By mid-2026 the competitive landscape shifted substantially; Claude Opus 4.8 and GPT-5.5 both surpassed the original GPT-5 on every major benchmark.
Is GPT-5 open source or proprietary?
GPT-5 is a proprietary, closed-weights model. The model weights are not publicly available and cannot be downloaded, run locally, or fine-tuned. Access is exclusively through the OpenAI API at platform.openai.com and through Microsoft Azure AI Services. OpenAI released a separate open-weight model family called gpt-oss (gpt-oss-20b and gpt-oss-120b) on August 8, 2025, one day after GPT-5, under an Apache 2.0 license with weights on Hugging Face. The gpt-oss models are not distilled from GPT-5 and are architecturally distinct; they are designed for self-hosting use cases where GPT-5's API pricing or cloud dependency is a constraint. For air-gapped deployments, VRAM requirements for gpt-oss-120b at FP16 are approximately 240 GB, with Q4 quantized variants available requiring around 65 GB. Teams that need on-device or offline inference should use gpt-oss-120b rather than GPT-5.
What modalities does GPT-5 support?
GPT-5 accepts text, images, audio, video frames, PDFs, and tool-calls as inputs in a unified architecture. On the output side, it produces text, audio, and tool-call results. A dedicated vision encoder is integrated into the base model rather than run as a separate pipeline, enabling tasks like reading charts, interpreting screenshots, and analyzing PDFs without format conversion. Video input processes temporal embeddings from frame sequences, confirmed at launch with 90.5% accuracy on Video-MMMU. Audio input and output are supported through the model architecture, enabling tasks like transcription, audio summarization, and voice response generation. Function calling supports parallel tool invocations and structured JSON output in a single response turn, making it suitable for multi-step agentic workflows. Code execution is available only in the ChatGPT interface via the built-in interpreter; the raw API does not provide a sandboxed execution environment, requiring developers to supply their own.
Does GPT-5 train on user data?
GPT-5 does not train on API inputs by default. API inputs and outputs are retained for 30 days for safety monitoring, then deleted unless flagged for a policy violation. Enterprise customers can request a zero-retention configuration, which deletes inputs and outputs immediately after the response is returned, with no 30-day window. OpenAI's API platform holds SOC 2 Type II certification. HIPAA-eligible configurations are available through enterprise contracts for healthcare-related deployments. GDPR-compliant data processing agreements are available for EU customers using the direct OpenAI API or Azure AI Services. For Azure-hosted deployments, data stays within the Azure region selected and is covered by Microsoft's data processing addendum. The GPT-5 System Card classifies the model under EU AI Act obligations for general-purpose AI with systemic risk. ChatGPT users on free and Plus tiers may have conversations used to improve the model by default; this can be opted out in account settings.
Who is GPT-5 best for and who should avoid it?
GPT-5 is well-suited for engineering teams building production coding agents that need SWE-bench-verified accuracy at a low per-call cost: $0.625 per million input tokens was among the best price-per-benchmark-point ratios at launch. STEM researchers running long quantitative analyses benefit from the 272K context window and 94.6% AIME 2025 score for hard math tasks. Enterprise teams on Microsoft Azure get SOC 2 Type II compliance with native multimodal input without additional pipeline configuration. Teams building document-intensive workflows (legal contracts, research papers, full codebases) benefit from processing everything in one 272K-token call. Teams that should avoid GPT-5 today include those who need the current best-in-class OpenAI performance: GPT-5.5 (April 2026) scores 88.7% SWE-bench Verified versus GPT-5's 74.9% and offers a 1M-token context window at $5.00 per million input tokens. Real-time voice application teams should use the OpenAI Realtime API, not GPT-5's standard endpoint, which does not support sub-500ms audio generation. Any team requiring self-hosting, fine-tuning, or offline operation cannot use GPT-5 and should evaluate gpt-oss-120b instead.