Name: GPT-5.4: 1M Context & 80% SWE-bench Score (2026)
Brand: OpenAI
Price: 2.50 USD
Availability: InStock

Question 1

What is GPT-5.4 and who built it?

Accepted Answer

GPT-5.4 is OpenAI's flagship model for professional work, released March 5, 2026, alongside GPT-5.4 Thinking, a reasoning-focused variant, and GPT-5.4 Pro, a higher-cost deep-reasoning tier. It is the first mainline GPT-5 model to fold in GPT-5.3-Codex's coding capabilities, replacing the previous split between a general chat model and a separate coding specialist. It scores approximately 80% on SWE-bench Verified, 74.8% on GPQA Diamond, and 75% on OSWorld-Verified, ahead of the 72.4% average human baseline. OpenAI has not disclosed a parameter count. GPT-5.4 mini and nano followed on March 17, 2026, for high-volume workloads. In the current lineup, GPT-5.4 sits below GPT-5.5, OpenAI's newer flagship released April 23, 2026, and above GPT-5.4 mini and nano. GPT-5.4 was designed primarily to push agentic coding and native computer-use automation forward.

Question 2

How much does GPT-5.4 cost per 1M tokens?

Accepted Answer

Standard API pricing is $2.50 per 1M input tokens and $15 per 1M output tokens for sessions under 272K tokens. Cached input costs $1.25 per 1M tokens, a 50% discount on repeated context. Once a session's input exceeds 272K tokens, input pricing doubles to $5.00 per 1M and output rises to $22.50 per 1M for the entire session, not just the overflow. GPT-5.4 Pro is priced separately at $30 per 1M input and $180 per 1M output for higher-stakes enterprise reasoning. A 200K-token report summary costs roughly $0.53. A 500K-token agentic coding session that crosses the long-context threshold costs around $4.75. A support bot handling 1,000 turns of 2K-in/500-out tokens costs about $12.50, comparable to Claude and Gemini flagships at similar context sizes. GPT-5.4 has no self-hosting option, so there is no infrastructure cost alternative.
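The tiered pricing above can be sketched as a small estimator. This is an illustrative calculation only, not an official billing formula: the function name `estimate_cost` is hypothetical, cached-input pricing is left out because the answer above does not state a cached rate for the long-context tier, and the output-token counts in the usage lines (2K and 100K) are assumptions chosen so the totals line up with the quoted figures.

```python
# Rates quoted in the answer above, in USD per 1M tokens.
STANDARD_INPUT, STANDARD_OUTPUT = 2.50, 15.00
LONG_INPUT, LONG_OUTPUT = 5.00, 22.50

# Sessions whose input exceeds this are billed at long-context rates throughout.
LONG_CONTEXT_THRESHOLD = 272_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one session (hypothetical helper)."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        # The higher rates apply to the entire session, not just the overflow.
        in_rate, out_rate = LONG_INPUT, LONG_OUTPUT
    else:
        in_rate, out_rate = STANDARD_INPUT, STANDARD_OUTPUT
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate


# Assumed output sizes: 2K tokens for the summary, 100K for the coding session.
report = estimate_cost(200_000, 2_000)        # about 0.53
coding = estimate_cost(500_000, 100_000)      # about 4.75
# Each support turn is treated as its own 2K-in / 500-out session.
support = 1_000 * estimate_cost(2_000, 500)   # about 12.50
```

Treating each support-bot turn as a separate session matters here: summing all 1,000 turns into one 2M-token session would wrongly trigger the long-context rates.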

Question 3

What is GPT-5.4's context window and max output?

Accepted Answer

GPT-5.4 supports a nominal 1M-token context window, made up of 922K input tokens plus up to 128K output tokens (about 1.05M in total) according to OpenRouter's listing. In practice, the threshold that matters for pricing and rate limits is 272K input tokens. Sessions under that threshold use standard rates. Anything above it doubles input pricing, applies a 1.5x output multiplier to the whole session, and counts 2x against rate limits. Codex users can opt into the full 1M window experimentally via the model_context_window and model_auto_compact_token_limit settings. Retrieval quality degrades past roughly 800K tokens, with details buried in the middle of very long contexts getting missed. This is larger than GPT-5.2's context window and is currently the largest context OpenAI offers in its API. Document handling covers dense scans, handwritten forms, and chart-heavy reports in a single pass up to the practical limit.
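As a rough pre-flight check, the limits described above can be encoded in a helper that flags when a planned session crosses each boundary. This is a minimal sketch under stated assumptions: `plan_session` and its return keys are hypothetical names, the constants are copied from the figures in this answer rather than from any API, and the 800K retrieval limit is an approximate figure, not a hard cutoff.

```python
# Figures taken from the answer above; none of these come from an API call.
MAX_INPUT_TOKENS = 922_000         # input share of the window (per OpenRouter's listing)
MAX_OUTPUT_TOKENS = 128_000        # maximum output tokens
LONG_CONTEXT_THRESHOLD = 272_000   # above this, long-context pricing and rate limits apply
RETRIEVAL_SOFT_LIMIT = 800_000     # approximate point where retrieval quality degrades


def plan_session(input_tokens: int, max_output_tokens: int) -> dict:
    """Classify a planned session against the documented limits (hypothetical helper)."""
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the 922K-token input limit")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("requested output exceeds the 128K-token output limit")

    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    return {
        "long_context": long_context,
        "input_price_multiplier": 2.0 if long_context else 1.0,
        "output_price_multiplier": 1.5 if long_context else 1.0,
        "rate_limit_multiplier": 2.0 if long_context else 1.0,
        "retrieval_risk": input_tokens > RETRIEVAL_SOFT_LIMIT,
    }


plan = plan_session(input_tokens=850_000, max_output_tokens=16_000)
if plan["retrieval_risk"]:
    print("Consider splitting or compacting the context before sending it.")
```

A check like this is most useful in agentic loops, where context grows turn by turn and can cross the 272K threshold without anyone noticing.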

Question 4

How does GPT-5.4 compare on benchmarks vs GPT-5.5 and GPT-5.2?

Accepted Answer

GPT-5.4 scores approximately 80% on SWE-bench Verified, 74.8% on GPQA Diamond, 95.1% on HumanEval, and 97.2% on MATH-500, with a 33% reduction in factual errors compared to GPT-5.2. On OSWorld-Verified, GPT-5.4 hits 75%, a major jump from GPT-5.2's 47.3% and ahead of the 72.4% human baseline. On Scale's live SWE-bench Pro leaderboard, GPT-5.4's xHigh reasoning configuration led the public rankings as of May 20, 2026, at 59.10 (+/- 3.56). GPT-5.5, released seven weeks later on April 23, 2026, supersedes GPT-5.4 as OpenAI's flagship, though specific head-to-head benchmark deltas between GPT-5.4 and GPT-5.5 were not independently verified at the time of writing. The practical takeaway is that GPT-5.4 closed most of the gap between general chat models and coding specialists, while GPT-5.5 pushes further still. A 3-5 point SWE-bench gap typically means the difference between a model that needs one or two retries on a moderately complex pull request and one that succeeds on the first attempt.

Question 5

Is GPT-5.4 open source or proprietary?

Accepted Answer

GPT-5.4 is fully proprietary with closed weights, available only through OpenAI's hosted API, ChatGPT, and partner cloud platforms. There is no Hugging Face listing, no downloadable weights, and no self-hosting option for any GPT-5.4 variant, including mini and nano. It reached general availability on AWS Bedrock on June 1, 2026, in US East (Ohio) and US West (Oregon), extending to AWS GovCloud (US-West) on June 3, 2026, as part of an expanded AWS-OpenAI partnership. It remains accessible through Azure following the end of Microsoft's exclusivity arrangement with OpenAI earlier in 2026. A Google Vertex AI listing for GPT-5.4 has not been announced as of mid-2026. Commercial use is governed entirely by OpenAI's standard API usage policies; there is no separate open license to consider.

Question 6

What modalities does GPT-5.4 support?

Accepted Answer

GPT-5.4 accepts text and image input and produces text and tool-call output; it has no native audio or video input or output. Its document-understanding capability handles dense scans, handwritten forms, engineering diagrams, and chart-heavy reports in a single pass. Function calling and structured outputs carry over from GPT-5 with improved schema-adherence in multi-tool sequences, and the model includes a tool-search feature for selecting from large tool libraries. The standout modality feature is the Computer Use API, which takes screenshots and issues cursor, click, and keyboard actions against a real desktop environment, scoring 75% on OSWorld-Verified. Five reasoning-effort levels (none, low, medium, high, xhigh) let developers trade latency and cost against accuracy per request. Compared to Gemini 2.5 Pro or GPT-5.5, which add broader audio/video pipelines, GPT-5.4 remains text-and-image-first with computer use as its differentiator.

Question 7

Does GPT-5.4 train on user data?

Accepted Answer

GPT-5.4 follows OpenAI's standard API data usage policy: inputs and outputs sent through the API are not used to train OpenAI's models by default. Zero-data-retention options are available for eligible enterprise customers who need stricter guarantees. OpenAI's platform maintains SOC 2 Type II and ISO 27001 certifications and supports HIPAA-eligible and GDPR-compliant configurations for enterprise customers; details are available through OpenAI's trust documentation. Under the EU AI Act, GPT-5.4 falls under general-purpose AI model obligations given its capability and reach. Data handling on AWS Bedrock follows AWS's standard data isolation model, where customer data is not shared with OpenAI beyond the inference request itself. The GPT-5.4 Thinking system card, last updated April 24, 2026, documents the safety evaluation and red-teaming process behind the release.

Question 8

Who is GPT-5.4 best for and who should avoid it?

Accepted Answer

GPT-5.4 is best for agentic coding teams running SWE-bench-style autonomous loops, given its roughly 80% SWE-bench Verified score and the GPT-5.3-Codex capability folded into the mainline model. It is also strong for teams building desktop or browser automation through the Computer Use API, where its 75% OSWorld-Verified score beats the human baseline, and for analysts processing long documents that stay under the 272K-token standard-pricing threshold. Teams building voice assistants or audio-first products should avoid it, since GPT-5.4 has no native audio input or output and needs a separate ASR/TTS model. Cost-sensitive, high-volume teams should be cautious: GPT-5.4 has been reported to burn through rate limits faster than GPT-5.3-Codex for identical workloads, and ChatGPT's consumer pricing jumps from $20 (Plus) directly to $200 (Pro) with no middle tier, unlike Anthropic's $100/month option. Teams needing self-hosted or air-gapped deployment should look at open-weight models like Llama 4 or Qwen3 instead, since GPT-5.4 is API-only.
