GPT-5.4: 1M Context & 80% SWE-bench Score (2026)
GPT-5.4, released by OpenAI on March 5, 2026, scores about 80% on SWE-bench Verified and 74.8% on GPQA Diamond with a 1M-token context window. It costs $2.50 per 1M input tokens and $15 per 1M output tokens, with $1.25 cached input. Its Computer Use API scores 75% on OSWorld-Verified, ahead of the 72.4% human baseline.
Provider: OpenAI · Family: GPT-5
Context window: 1,000,000 tokens · Max output: 128,000
Input modalities: text, image, tool-calls, code · Output: text, tool-calls, code
About GPT-5.4
GPT-5.4 is OpenAI's flagship model for professional work, announced March 5, 2026 alongside GPT-5.4 Thinking (a reasoning-focused variant) and GPT-5.4 Pro (a higher-cost, deep-reasoning tier). GPT-5.4 mini and nano followed on March 17, 2026, bringing GPT-5.4-class capability to high-volume workloads at roughly four times the per-token price of their GPT-5 equivalents. GPT-5.4 is the first mainline GPT-5 model to fold in GPT-5.3-Codex's coding capabilities directly, replacing the previous split between a general chat model and a separate coding-specialist model with one unified architecture and five reasoning-effort levels (none, low, medium, high, xhigh). It sits below GPT-5.5, which OpenAI released seven weeks later on April 23, 2026.
On benchmarks, GPT-5.4 scores approximately 80% on SWE-bench Verified, 74.8% on GPQA Diamond, 95.1% on HumanEval, 97.2% on MATH-500, and around 62% on ARC-AGI-2. The standout number is OSWorld-Verified, where GPT-5.4 hits 75%, ahead of the 72.4% average human baseline and a large jump from GPT-5.2's 47.3% on the same benchmark, reflecting OpenAI's push into native computer use. OpenAI also reports a 33% reduction in factual errors compared to GPT-5.2. On Scale's live SWE-bench Pro leaderboard, the xhigh reasoning configuration of GPT-5.4 led the public rankings as of May 20, 2026, at 59.10 (+/- 3.56).
The headline context window is 1M tokens (922K input plus up to 128K output, per OpenRouter's listing), the largest OpenAI has shipped in the API. In practice the pricing and rate-limit ceiling sits at 272K input tokens: sessions under that limit use standard pricing, while anything above it doubles the input price and applies a 1.5x output multiplier across the whole session.
Codex users can opt into the full 1M window experimentally via the model_context_window and model_auto_compact_token_limit settings, but OpenAI and third-party reviewers both note that retrieval quality degrades past roughly 800K tokens, with details buried in the middle of very long contexts getting missed.
Modalities are text and image input with text and tool-call output. There is no native audio or video input or output despite the multimodal framing of GPT-5.4's document-understanding work, which covers dense scans, handwritten forms, engineering diagrams, and chart-heavy reports in a single pass. The model ships a Computer Use API that takes screenshots and issues cursor, click, and keyboard actions against a desktop environment, plus deep research and a tool-search feature for selecting from large tool libraries. Function calling and structured outputs carry over from GPT-5 with lower schema-violation rates in multi-tool sequences.
Standard API pricing is $2.50 per 1M input tokens and $15 per 1M output tokens for sessions under 272K tokens, with cached input billed at $1.25 per 1M tokens (a 50% discount on repeat context). Above 272K tokens, input pricing doubles to $5.00 per 1M and output rises to $22.50 per 1M for the entire session. GPT-5.4 Pro, aimed at higher-stakes enterprise reasoning, is priced separately at $30 per 1M input and $180 per 1M output. A 200K-token document summary costs roughly $0.53 at standard rates; a 500K-token agentic coding session that crosses the long-context threshold costs around $4.75; a support bot handling 1,000 turns of 2K-in/500-out tokens costs about $12.50.
GPT-5.4 is available through the direct OpenAI API and ChatGPT (Plus at $20/month, Pro at $200/month, with no tier in between). It reached general availability on AWS Bedrock on June 1, 2026, in US East (Ohio) and US West (Oregon), extending to AWS GovCloud (US-West) on June 3, 2026, as part of an expanded AWS-OpenAI partnership that ended Azure's exclusivity.
Azure AI access continues under the post-exclusivity arrangement, while a Google Vertex AI listing has not been announced as of mid-2026.
The GPT-5.4 Thinking system card was published on the OpenAI Deployment Safety Hub and last updated April 24, 2026, documenting red-teaming and safety evaluations for the reasoning variant. OpenAI describes a balanced safety posture with standard content filters, a moderation endpoint, and reduced factual-error rates relative to GPT-5.2. As with prior GPT-5 releases, OpenAI has not disclosed parameter counts or a specific training data cutoff date for GPT-5.4.
GPT-5.4 is best suited to agentic coding workflows, desktop and browser automation through the Computer Use API, and long-document or multi-file analysis up to roughly 272K tokens. Teams building voice assistants or audio pipelines need a separate model, since GPT-5.4 has no native audio I/O. Budget-conscious teams may find two things a real constraint: the consumer pricing gap (Plus at $20 jumping straight to Pro at $200) and GPT-5.4's rate-limit consumption, which is reported to burn through limits faster than GPT-5.3-Codex for identical workloads. Such teams may prefer GPT-5.4 mini or a competing mid-tier model for high-volume use.
GPT-5.4 mini and nano arrived March 17, 2026, more than doubling speed over GPT-5 mini on coding and reasoning tasks. GPT-5.2 and GPT-5.2-Codex were deprecated across most GitHub Copilot experiences by June 5, 2026, with GPT-5.5 named as the suggested replacement. GPT-5.4 itself reached Bedrock GA on June 1, 2026, more than five weeks after GPT-5.5 had already shipped as OpenAI's newer flagship, meaning GPT-5.4 now occupies a mid-lineup position between GPT-5.4 mini/nano and GPT-5.5/GPT-5.4 Pro.
Pricing
$2.50 per 1M input tokens and $15 per 1M output tokens for sessions under 272K context tokens. Cached input is $1.25 per 1M tokens (50% off). Beyond 272K input tokens, input pricing doubles to $5.00 per 1M and output rises to $22.50 per 1M for the full session. GPT-5.4 Pro is priced separately at $30/$180 per 1M.
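The tiered rates above can be turned into a small cost estimator. The following is a minimal sketch rather than an official billing formula. It assumes that the 272K threshold is compared against total input tokens, that a session at exactly 272K stays on standard rates, and that the long-context input multiplier also applies to cached input; none of these details is stated in the pricing text. The function name `session_cost` and the output sizes used in the examples (2K and 100K tokens) are illustrative assumptions chosen to match the quoted totals.

```python
# Rates in USD per 1M tokens, taken from the pricing text above.
RATE_INPUT = 2.50
RATE_CACHED_INPUT = 1.25
RATE_OUTPUT = 15.00

# Long-context tier: input doubles, output is multiplied by 1.5 ($22.50).
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens
LONG_INPUT_MULTIPLIER = 2.0
LONG_OUTPUT_MULTIPLIER = 1.5


def session_cost(input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one session (hypothetical helper).

    cached_input_tokens is the part of input_tokens billed at the cached rate.
    Assumption: the long-context multiplier applies to the whole session,
    including cached input, once input exceeds the threshold.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")

    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult = LONG_INPUT_MULTIPLIER if long_context else 1.0
    out_mult = LONG_OUTPUT_MULTIPLIER if long_context else 1.0

    fresh_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_tokens * RATE_INPUT * in_mult
        + cached_input_tokens * RATE_CACHED_INPUT * in_mult
        + output_tokens * RATE_OUTPUT * out_mult
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 200K-token document with an assumed 2K-token summary.
    print(f"{session_cost(200_000, 2_000):.2f}")          # 0.53
    # 500K-token session with an assumed 100K tokens of output.
    print(f"{session_cost(500_000, 100_000):.2f}")        # 4.75
    # 1,000 independent support-bot turns of 2K in / 500 out.
    print(f"{1_000 * session_cost(2_000, 500):.2f}")      # 12.50
```

Because the long-context rates apply to the full session, a prompt just over the threshold costs roughly twice as much as one just under it. Trimming or compacting context to stay below 272K input tokens can matter more than the per-token rate.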
Key Features
- Native Computer Use API: Takes screenshots and issues cursor, click, and keyboard actions against a desktop environment, scoring 75% on OSWorld-Verified versus a 72.4% human baseline.
- 1M-Token Context Window: 922K input plus up to 128K output tokens, the largest context OpenAI has shipped in the API, with experimental Codex support for the full window via context-compaction settings.
- Five Reasoning-Effort Levels: Configurable reasoning effort from none to xhigh lets developers trade cost and latency against accuracy on a per-request basis.
- Unified Coding Capability: First mainline GPT-5 model to fold in GPT-5.3-Codex's coding capabilities, removing the need for a separate coding-specialist model for most SWE-bench-style tasks.
- Prompt Caching: Cached input tokens cost $1.25 per 1M versus $2.50 for fresh input, a 50% discount on repeated context.
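One way to use the five reasoning-effort levels is to start cheap and escalate only when a request fails a check. The sketch below builds a request payload and picks the next level up. The payload shape follows the `reasoning.effort` field of OpenAI's Responses API, but the model identifier `"gpt-5.4"` and the helper names `build_request` and `next_effort` are assumptions for illustration; confirm the exact model ID and accepted effort values against the current API reference.

```python
from typing import Optional

# Effort levels as listed in the feature description, lowest to highest.
REASONING_EFFORTS = ("none", "low", "medium", "high", "xhigh")


def build_request(prompt: str, effort: str = "medium",
                  model: str = "gpt-5.4") -> dict:
    """Build keyword arguments for a Responses-style request.

    The model ID default is an assumption, not a confirmed identifier.
    """
    if effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


def next_effort(effort: str) -> Optional[str]:
    """Return the next higher effort level, or None if already at the top."""
    idx = REASONING_EFFORTS.index(effort)  # raises ValueError if unknown
    if idx + 1 < len(REASONING_EFFORTS):
        return REASONING_EFFORTS[idx + 1]
    return None


if __name__ == "__main__":
    request = build_request("Summarize this diff.", effort="low")
    print(request["reasoning"])
    print(next_effort("low"))
```

With the official `openai` Python package, a payload like this would typically be passed as `client.responses.create(**request)`. Retrying with `next_effort(...)` only on failure keeps most traffic at the lower-cost, lower-latency settings.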
Pros
- 75% on OSWorld-Verified computer-use, ahead of the 72.4% human baseline and a large jump from GPT-5.2's 47.3%.
- ~80% SWE-bench Verified and 74.8% GPQA Diamond with five configurable reasoning-effort levels.
- 1M-token context window with prompt caching at $1.25 per 1M tokens, a 50% discount on repeat context.
Cons
- No native audio or video input/output despite the multimodal document-understanding framing.
- Sessions over 272K input tokens double the input price and apply a 1.5x output multiplier for the whole session.
- Reported to consume rate-limit budget faster than GPT-5.3-Codex for identical workloads, and consumer pricing jumps from $20 (Plus) directly to $200 (Pro).
Benchmarks
- MATH-500: 97.2%
- ARC-AGI-2: 62.1%
- HumanEval: 95.1%
- GPQA Diamond: 74.8%
- OSWorld-Verified: 75%
- SWE-bench Verified: ~80%
Frequently Asked Questions
What is GPT-5.4 and who built it?
GPT-5.4 is OpenAI's flagship model for professional work, released March 5, 2026 alongside GPT-5.4 Thinking, a reasoning-focused variant, and GPT-5.4 Pro, a higher-cost deep-reasoning tier. It is the first mainline GPT-5 model to fold in GPT-5.3-Codex's coding capabilities, replacing the previous split between a general chat model and a separate coding specialist. It scores approximately 80% on SWE-bench Verified, 74.8% on GPQA Diamond, and 75% on OSWorld-Verified, ahead of the 72.4% average human baseline. OpenAI has not disclosed a parameter count. GPT-5.4 mini and nano followed on March 17, 2026 for high-volume workloads. In the current lineup, GPT-5.4 sits below GPT-5.5, OpenAI's newer flagship released April 23, 2026, and above GPT-5.4 mini/nano. GPT-5.4 was designed primarily to push agentic coding and native computer-use automation forward.
How much does GPT-5.4 cost per 1M tokens?
Standard API pricing is $2.50 per 1M input tokens and $15 per 1M output tokens for sessions under 272K tokens. Cached input costs $1.25 per 1M tokens, a 50% discount on repeated context. Once a session's input exceeds 272K tokens, input pricing doubles to $5.00 per 1M and output rises to $22.50 per 1M for the entire session, not just the overflow. GPT-5.4 Pro is priced separately at $30 per 1M input and $180 per 1M output for higher-stakes enterprise reasoning. A 200K-token report summary costs roughly $0.53. A 500K-token agentic coding session that crosses the long-context threshold costs around $4.75. A support bot handling 1,000 turns of 2K-in/500-out tokens costs about $12.50, comparable to Claude and Gemini flagships at similar context sizes. GPT-5.4 has no self-hosting option, so there is no infrastructure cost alternative.
What is GPT-5.4's context window and max output?
GPT-5.4 supports a 1M-token context window, made up of 922K input tokens plus up to 128K output tokens according to OpenRouter's listing. In practice, the pricing and rate-limit ceiling sits at 272K input tokens: sessions under that limit use standard rates, while anything above doubles input pricing, applies a 1.5x output multiplier for the whole session, and counts 2x against rate limits. Codex users can opt into the full 1M window experimentally via the model_context_window and model_auto_compact_token_limit settings. Retrieval quality degrades past roughly 800K tokens, with details buried in the middle of very long contexts getting missed. The 1M window is larger than GPT-5.2's context window and is currently the largest context OpenAI offers in its API. Document handling covers dense scans, handwritten forms, and chart-heavy reports in a single pass up to the practical limit.
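The three limits in this answer (272K for standard pricing, roughly 800K before retrieval quality drops, 922K maximum input) can be checked before a request is sent. The following is a minimal sketch under the assumption that each limit is an inclusive upper bound. The function name `classify_input` and the regime labels are hypothetical, and the 800K figure is an approximate reported threshold rather than a hard limit.

```python
# Thresholds in input tokens, taken from the answer above.
STANDARD_PRICING_LIMIT = 272_000   # above this, long-context pricing applies
RETRIEVAL_SOFT_LIMIT = 800_000     # approximate point where recall degrades
MAX_INPUT_TOKENS = 922_000         # input share of the 1M window


def classify_input(input_tokens: int) -> str:
    """Label which regime a prompt of the given size falls into.

    Boundaries are treated as inclusive upper bounds (an assumption).
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS:
        return "exceeds_input_limit"
    if input_tokens > RETRIEVAL_SOFT_LIMIT:
        return "long_context_degraded_retrieval"
    if input_tokens > STANDARD_PRICING_LIMIT:
        return "long_context_pricing"
    return "standard"


if __name__ == "__main__":
    for size in (150_000, 400_000, 850_000, 950_000):
        print(size, classify_input(size))
```

A caller might use the label to decide whether to send the prompt as is, compact or chunk it to get back under 272K, or reject it outright.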
How does GPT-5.4 compare on benchmarks vs GPT-5.5 and GPT-5.2?
GPT-5.4 scores approximately 80% on SWE-bench Verified, 74.8% on GPQA Diamond, 95.1% on HumanEval, and 97.2% on MATH-500, with a 33% reduction in factual errors compared to GPT-5.2. On OSWorld-Verified, GPT-5.4 hits 75%, a major jump from GPT-5.2's 47.3% and ahead of the 72.4% human baseline. On Scale's live SWE-bench Pro leaderboard, GPT-5.4's xhigh reasoning configuration led the public rankings as of May 20, 2026, at 59.10 (+/- 3.56). GPT-5.5, released seven weeks after GPT-5.4 on April 23, 2026, supersedes it as OpenAI's flagship, though specific head-to-head benchmark deltas between GPT-5.4 and GPT-5.5 were not independently verified at the time of writing. The practical takeaway is that GPT-5.4 closed most of the gap between general chat models and coding specialists, while GPT-5.5 pushes further still. A 3-5 point SWE-bench gap typically means the difference between a model that needs one or two retries on a moderately complex pull request and one that succeeds on the first attempt.
Is GPT-5.4 open source or proprietary?
GPT-5.4 is fully proprietary with closed weights, available only through OpenAI's hosted API, ChatGPT, and partner cloud platforms. There is no Hugging Face listing, no downloadable weights, and no self-hosting option for any GPT-5.4 variant, including mini and nano. It reached general availability on AWS Bedrock on June 1, 2026, in US East (Ohio) and US West (Oregon), extending to AWS GovCloud (US-West) on June 3, 2026, as part of an expanded AWS-OpenAI partnership. It remains accessible through Azure following the end of Microsoft's exclusivity arrangement with OpenAI earlier in 2026. A Google Vertex AI listing for GPT-5.4 has not been announced as of mid-2026. Commercial use is governed entirely by OpenAI's standard API usage policies; there is no separate open license to consider.
What modalities does GPT-5.4 support?
GPT-5.4 accepts text and image input and produces text and tool-call output; it has no native audio or video input or output. Its document-understanding capability handles dense scans, handwritten forms, engineering diagrams, and chart-heavy reports in a single pass. Function calling and structured outputs carry over from GPT-5 with improved schema-adherence in multi-tool sequences, and the model includes a tool-search feature for selecting from large tool libraries. The standout modality feature is the Computer Use API, which takes screenshots and issues cursor, click, and keyboard actions against a real desktop environment, scoring 75% on OSWorld-Verified. Five reasoning-effort levels (none, low, medium, high, xhigh) let developers trade latency and cost against accuracy per request. Compared to Gemini 2.5 Pro or GPT-5.5, which add broader audio/video pipelines, GPT-5.4 remains text-and-image-first with computer use as its differentiator.
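The screenshot-then-action cycle described for the Computer Use API can be pictured as a simple loop. The sketch below is purely illustrative: the `Action` schema, the `FakeDesktop` class, and the scripted policy standing in for the model are all hypothetical and do not reflect OpenAI's actual request or response format. It only shows the control flow of observing, deciding, and acting until the task is reported done.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action schema: 'click', 'type', or 'done'."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in for a real desktop; records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real environment would return image bytes of the screen.
        return f"fake-screenshot-{len(self.log)}".encode()

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind!r}")


Policy = Callable[[bytes, int], Action]


def scripted_policy(script: List[Action]) -> Policy:
    """Replay a fixed action list; a real policy would call the model."""
    def policy(screenshot: bytes, step: int) -> Action:
        return script[step] if step < len(script) else Action("done")
    return policy


def run_agent(desktop: FakeDesktop, policy: Policy, max_steps: int = 10) -> int:
    """Observe, decide, act until 'done' or max_steps; return actions executed."""
    for step in range(max_steps):
        action = policy(desktop.screenshot(), step)
        if action.kind == "done":
            return step
        desktop.execute(action)
    return max_steps


if __name__ == "__main__":
    desktop = FakeDesktop()
    script = [Action("click", x=10, y=20), Action("type", text="hello")]
    print(run_agent(desktop, scripted_policy(script)), desktop.log)
```

The `max_steps` cap is a common safeguard in loops like this, since an agent that never reports completion would otherwise keep consuming tokens and rate limit.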
Does GPT-5.4 train on user data?
GPT-5.4 follows OpenAI's standard API data usage policy: inputs and outputs sent through the API are not used to train OpenAI's models by default. Zero-data-retention options are available for eligible enterprise customers who need stricter guarantees. OpenAI's platform maintains SOC 2 Type II and ISO 27001 certifications and supports HIPAA-eligible and GDPR-compliant configurations for enterprise customers; details are available through OpenAI's trust documentation. Under the EU AI Act, GPT-5.4 falls under general-purpose AI model obligations given its capability and reach. Data handling on AWS Bedrock follows AWS's standard data isolation model, where customer data is not shared with OpenAI beyond the inference request itself. The GPT-5.4 Thinking system card, last updated April 24, 2026, documents the safety evaluation and red-teaming process behind the release.
Who is GPT-5.4 best for and who should avoid it?
GPT-5.4 is best for agentic coding teams running SWE-bench-style autonomous loops, given its roughly 80% SWE-bench Verified score and GPT-5.3-Codex capability folded into the mainline model. It is also strong for teams building desktop or browser automation through the Computer Use API, where its 75% OSWorld-Verified score beats the human baseline, and for analysts processing long documents up to the practical 272K-token pricing ceiling. Teams building voice assistants or audio-first products should avoid it, since GPT-5.4 has no native audio input or output and needs a separate ASR/TTS model. Cost-sensitive, high-volume teams should be cautious: GPT-5.4 has been reported to burn through rate limits faster than GPT-5.3-Codex for identical workloads, and ChatGPT's consumer pricing jumps from $20 (Plus) directly to $200 (Pro) with no middle tier, unlike Anthropic's $100/month option. Teams needing self-hosted or air-gapped deployment should look at open-weight models like Llama 4 or Qwen3 instead, since GPT-5.4 is API-only.