Name: GPT-5 by OpenAI: 94.6% AIME & 74.9% SWE-bench (2025)
Brand: OpenAI
Price: 0.63 USD
Availability: InStock

Question 1

What is GPT-5 and who built it?

Accepted Answer

GPT-5 is the fifth-generation foundation model from OpenAI, launched publicly on August 7, 2025. It is built on a sparse Mixture-of-Experts (MoE) architecture, a significant departure from the dense transformer design used in GPT-4 and GPT-4o. Industry estimates place the total parameter count at 2 to 5 trillion, with only a small fraction activated per forward pass through expert routing. The model covers text, image, audio, video, and PDF inputs in a unified architecture, with native tool use and function calling. At launch, GPT-5 scored 74.9% on SWE-bench Verified, 94.6% on AIME 2025, and 91.4% on MMLU, placing it ahead of Claude 3.5 Sonnet and Gemini 1.5 Pro on most coding and reasoning benchmarks at the time. OpenAI positioned GPT-5 as the successor to GPT-4o for both ChatGPT users and API developers, with access through the OpenAI API and Microsoft Azure AI Services. The original snapshot (gpt-5-2025-08-07) was deprecated June 11, 2026; GPT-5.5 is the current OpenAI flagship.

Question 2

How much does GPT-5 cost per 1M tokens?

Accepted Answer

GPT-5 is priced at $0.625 per million input tokens and $5.00 per million output tokens in the standard pay-as-you-go tier. The Batch API reduces both rates by 50%, giving $0.3125 per million input and $2.50 per million output, with results returned asynchronously within 24 hours. Flex processing also offers 50% off in exchange for variable processing speed. For practical workloads: summarizing a 100,000-token research paper costs approximately $0.07 ($0.0625 for the input plus a short summary as output); a daily coding agent at 1 million input and 200,000 output tokens costs about $1.63 per day; a customer support deployment handling 1,000 conversations at 2,000 input and 500 output tokens each costs roughly $3.75 per 1,000 conversations. OpenAI did not publish prompt caching pricing for the original GPT-5 at launch, unlike later models in the family. For teams that want to self-host to avoid API costs entirely, GPT-5's weights are proprietary; the separate gpt-oss-120b model (Apache 2.0, released August 8, 2025) is the open-weight alternative.
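The workload figures above follow directly from the two per-million-token rates. A minimal sketch of that arithmetic, assuming the rates quoted in this answer; the helper name `estimate_cost` and its parameters are hypothetical, not part of any OpenAI SDK:

```python
from decimal import Decimal

# Rates quoted in the answer above, in USD per 1M tokens.
INPUT_RATE = Decimal("0.625")
OUTPUT_RATE = Decimal("5.00")
BATCH_DISCOUNT = Decimal("0.5")  # Batch API: 50% off both rates
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> Decimal:
    """Estimate the USD cost of a workload (hypothetical helper, not an SDK call)."""
    cost = (Decimal(input_tokens) * INPUT_RATE
            + Decimal(output_tokens) * OUTPUT_RATE) / MILLION
    if batch:
        cost *= BATCH_DISCOUNT
    return cost


if __name__ == "__main__":
    # Daily coding agent: 1M input and 200K output tokens.
    print("coding agent:", estimate_cost(1_000_000, 200_000))
    # 1,000 support conversations at 2,000 input / 500 output tokens each.
    print("support:", estimate_cost(1_000 * 2_000, 1_000 * 500))
    # Same support workload through the Batch API.
    print("support (batch):", estimate_cost(1_000 * 2_000, 1_000 * 500, batch=True))
```

Decimal arithmetic is used here so that fractions of a cent are not lost to floating-point rounding before the final total is displayed.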

Question 3

What is GPT-5's context window and max output?

Accepted Answer

The API version of GPT-5 accepts up to 272,000 input tokens, more than double GPT-4o's 128,000-token limit. Maximum output per request is 128,000 tokens. In the ChatGPT interface, the context window depends on the plan: free users get 8,000 tokens, Plus subscribers get 32,000 tokens, and Pro subscribers get 128,000 tokens. According to OpenAI's internal evaluations, long-context recall holds up reliably through the full 272K range, though third-party needle-in-a-haystack testing shows moderate variability above 200,000 tokens. Inputs that exceed the context limit return a hard API error; the model does not silently truncate. GPT-5.2, released December 2025, extended the context window to 400,000 tokens, and GPT-5.5 (April 2026) brought the context window to 1,000,000 tokens in the API with 922,000 usable input tokens.
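Because oversized inputs fail with a hard error rather than being truncated, it can help to check a request against the limits before sending it. The sketch below assumes the 272,000 and 128,000 figures quoted in this answer and takes token counts as plain integers; how the tokens are counted is left to the caller. `ContextLimitError` and `check_request` are hypothetical names for illustration only:

```python
MAX_INPUT_TOKENS = 272_000   # API input limit quoted in the answer above
MAX_OUTPUT_TOKENS = 128_000  # maximum output per request quoted above


class ContextLimitError(ValueError):
    """Raised locally when a request would exceed the quoted limits (hypothetical)."""


def check_request(input_tokens: int, max_output_tokens: int) -> None:
    """Fail fast on the client side instead of waiting for an API error."""
    if input_tokens > MAX_INPUT_TOKENS:
        over = input_tokens - MAX_INPUT_TOKENS
        raise ContextLimitError(f"input exceeds limit by {over} tokens")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        over = max_output_tokens - MAX_OUTPUT_TOKENS
        raise ContextLimitError(f"requested output exceeds limit by {over} tokens")


if __name__ == "__main__":
    check_request(200_000, 8_000)  # within both limits, returns None
    try:
        check_request(300_000, 8_000)
    except ContextLimitError as err:
        print("rejected:", err)
```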

Question 4

How does GPT-5 compare on benchmarks vs Claude and Gemini?

Accepted Answer

At launch in August 2025, GPT-5 scored 74.9% on SWE-bench Verified, placing it ahead of Claude 3.5 Sonnet, which scored around 49%, and Gemini 1.5 Pro, which scored around 46% on the same benchmark. On GPQA Diamond for graduate-level reasoning, GPT-5 reached 88.4% with extended thinking enabled, ahead of both Gemini 1.5 Pro's 75.2% and Claude 3.5 Sonnet's 65.0% at that time. AIME 2025 at 94.6% was the standout metric, placing GPT-5 significantly ahead of all contemporaneous frontier models on math competition problems. HumanEval at 97.4% compared favorably to Claude 3.5 Sonnet at 92% and Gemini 1.5 Pro at 84.1%. The benchmark advantage does not hold uniformly across all dimensions: Claude 3.5 Sonnet showed stronger instruction-following in long multi-turn conversations, and Gemini 1.5 Pro offered more flexible output modalities, including native image generation. By mid-2026 the competitive landscape had shifted substantially; Claude Opus 4.8 and GPT-5.5 both surpassed the original GPT-5 on every major benchmark.

Question 5

Is GPT-5 open source or proprietary?

Accepted Answer

GPT-5 is a proprietary, closed-weights model. The model weights are not publicly available and cannot be downloaded, run locally, or fine-tuned. Access is exclusively through the OpenAI API at platform.openai.com and through Microsoft Azure AI Services. OpenAI released a separate open-weight model family called gpt-oss (gpt-oss-20b and gpt-oss-120b) on August 8, 2025, one day after GPT-5, under an Apache 2.0 license with weights on Hugging Face. The gpt-oss models are not distilled from GPT-5 and are architecturally distinct; they are designed for self-hosting use cases where GPT-5's API pricing or cloud dependency is a constraint. For air-gapped deployments, VRAM requirements for gpt-oss-120b at FP16 are approximately 240 GB, with Q4 quantized variants available requiring around 65 GB. Teams that need on-device or offline inference should use gpt-oss-120b rather than GPT-5.
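The VRAM figures above can be sanity-checked with a back-of-the-envelope calculation: parameter count times bytes per parameter. This sketch assumes a nominal 120 billion parameters (taken from the model name, not from a published spec) and counts weights only, so activations and the KV cache are excluded:

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: ignores activations, KV cache, and runtime overhead.
    """
    return params_billions * bits_per_param / 8


if __name__ == "__main__":
    # Assumption: 120 billion parameters, inferred from the name gpt-oss-120b.
    print("FP16:", weight_memory_gb(120, 16), "GB")
    print("Q4:  ", weight_memory_gb(120, 4), "GB")
```

The Q4 estimate comes out slightly below the roughly 65 GB quoted above; the gap is plausibly quantization metadata and layers kept at higher precision, though that explanation is an assumption rather than something the source states.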

Question 6

What modalities does GPT-5 support?

Accepted Answer

GPT-5 accepts text, images, audio, video frames, PDFs, and tool-call results as inputs in a unified architecture. On the output side, it produces text, audio, and tool calls. A dedicated vision encoder is integrated into the base model rather than run as a separate pipeline, enabling tasks like reading charts, interpreting screenshots, and analyzing PDFs without format conversion. Video input is processed as temporal embeddings over frame sequences; at launch the model scored 90.5% on Video-MMMU. Audio input and output are supported natively by the model architecture, enabling tasks like transcription, audio summarization, and voice response generation. Function calling supports parallel tool invocations and structured JSON output in a single response turn, making it suitable for multi-step agentic workflows. Code execution is available only in the ChatGPT interface via the built-in interpreter; the raw API does not provide a sandboxed execution environment, so developers must supply their own.
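On the developer side, parallel tool invocations mean that a single model turn can request several function calls, which the application then executes and returns together. The sketch below shows only that local dispatch step. The call shape (`id`, `name`, and a JSON-encoded `arguments` string) and the two stub tools are assumptions for illustration, not the exact OpenAI response schema, and no API request is made:

```python
import json
from concurrent.futures import ThreadPoolExecutor


def get_weather(city: str) -> dict:
    """Stub tool returning placeholder data (hypothetical)."""
    return {"city": city, "forecast": "sunny"}


def get_time(timezone: str) -> dict:
    """Stub tool returning placeholder data (hypothetical)."""
    return {"timezone": timezone, "time": "12:00"}


# Registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather, "get_time": get_time}


def run_tool_call(call: dict) -> dict:
    """Execute one requested call and pair its JSON result with the call id."""
    func = TOOLS[call["name"]]
    kwargs = json.loads(call["arguments"])
    return {"id": call["id"], "output": json.dumps(func(**kwargs))}


def dispatch(calls: list[dict]) -> list[dict]:
    """Run all calls from one turn concurrently, preserving request order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_tool_call, calls))


if __name__ == "__main__":
    # Assumed shape of two tool calls emitted in a single response turn.
    calls = [
        {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Paris"}'},
        {"id": "call_2", "name": "get_time", "arguments": '{"timezone": "UTC"}'},
    ]
    for result in dispatch(calls):
        print(result["id"], result["output"])
```

Threads suit this step because tool calls are typically I/O-bound (HTTP requests, database lookups); each result keeps its call id so the outputs can be matched back to the requests that produced them.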

Question 7

Does GPT-5 train on user data?

Accepted Answer

GPT-5 does not train on API inputs by default. API inputs and outputs are retained for 30 days for safety monitoring, then deleted unless flagged for a policy violation. Enterprise customers can request a zero-retention configuration, which deletes inputs and outputs immediately after the response is returned, with no 30-day window. OpenAI's API platform holds SOC 2 Type II certification. HIPAA-eligible configurations are available through enterprise contracts for healthcare-related deployments. GDPR-compliant data processing agreements are available for EU customers using the direct OpenAI API or Azure AI Services. For Azure-hosted deployments, data stays within the selected Azure region and is covered by Microsoft's data processing addendum. The GPT-5 System Card classifies the model under EU AI Act obligations for general-purpose AI with systemic risk. ChatGPT users on free and Plus tiers may have their conversations used to improve the model by default; they can opt out in account settings.

Question 8

Who is GPT-5 best for and who should avoid it?

Accepted Answer

GPT-5 is well-suited for engineering teams building production coding agents that need strong SWE-bench Verified accuracy at a low per-call cost: at $0.625 per million input tokens, it had among the best price-per-benchmark-point ratios at launch. STEM researchers running long quantitative analyses benefit from the 272K context window and the 94.6% AIME 2025 score on hard math tasks. Enterprise teams on Microsoft Azure get SOC 2 Type II compliance and native multimodal input without additional pipeline configuration. Teams building document-intensive workflows (legal contracts, research papers, full codebases) benefit from processing everything in one 272K-token call. Teams that should avoid GPT-5 today include those who need the current best-in-class OpenAI performance: GPT-5.5 (April 2026) scores 88.7% on SWE-bench Verified versus GPT-5's 74.9% and offers a 1M-token context window at $5.00 per million input tokens. Real-time voice application teams should use the OpenAI Realtime API rather than GPT-5's standard endpoint, which does not support sub-500ms audio generation. Any team requiring self-hosting, fine-tuning, or offline operation cannot use GPT-5 and should evaluate gpt-oss-120b instead.
