Name: GPT-5.5 Review: 88.7% SWE-bench, $5/M Tokens (2026)
Brand: OpenAI
Price: 5.00 USD
Availability: InStock

Question 1

What is GPT-5.5 and who built it?

Accepted Answer

GPT-5.5, codename Spud, is a large multimodal language model developed by OpenAI and released on April 23, 2026, with API access opening April 24. It is built on a Mixture-of-Experts Transformer architecture and is the first OpenAI model to process text, images, audio, and video in a single native architecture rather than stitching together separate models. GPT-5.5 sits above GPT-5.4 in OpenAI's lineup as the flagship system for complex professional work. It scores 88.7% on SWE-bench Verified and 92.4% on MMLU, placing it fourth among 119 models on BenchLM as of June 2026. The model supports a 1M-token context window in the API and includes built-in computer use via Codex. It replaced GPT-5.4 as OpenAI's primary frontier model, and GPT-5.5 Instant replaced GPT-5.3 Instant as ChatGPT's default on May 5, 2026. Standard API pricing is $5 input / $30 output per 1M tokens.

Question 2

How much does GPT-5.5 cost per 1M tokens?

Accepted Answer

Standard API pricing for GPT-5.5 is $5.00 per 1M input tokens and $30.00 per 1M output tokens, with cached input at $0.50 per 1M (a 90% reduction from the full input rate). The Batch API tier provides 50% off at $2.50 input / $15.00 output per 1M, with results delivered asynchronously within 24 hours. The GPT-5.5 Pro tier costs $30.00 input / $180.00 output per 1M and is available only on OpenAI's Pro, Business, and Enterprise ChatGPT plans. A context surcharge activates when prompts exceed 272,000 tokens: the full session is billed at 2x input and 1.5x output. Worked cost examples: a 100K-token coding task with 10K output costs about $0.80 ($0.50 input plus $0.30 output); a daily coding agent running 1M tokens in and 200K out costs about $11 at standard rates ($5 plus $6); a 1M-token research session crossing the surcharge threshold costs roughly $10 in input alone (1M tokens at 2x the $5 rate). For comparison, Gemini 2.5 Flash costs $0.15/$0.60 per 1M for text-only tasks, making GPT-5.5 roughly 33x more expensive on input and 50x on output for pure-text workloads. GPT-5.5 is not available for self-hosting, so infrastructure costs do not apply.
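The per-request arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustrative estimator built from the rates quoted above, not an official billing formula: the function name `estimate_request_cost` is hypothetical, and how the surcharge combines with cached input and with the Batch discount is an assumption, since the answer does not specify it.

```python
# Rates in USD per 1M tokens, copied from the answer above.
INPUT_RATE = 5.00
OUTPUT_RATE = 30.00
CACHED_INPUT_RATE = 0.50

# Surcharge rule as described above: prompts over 272,000 tokens
# are billed at 2x input and 1.5x output.
SURCHARGE_THRESHOLD = 272_000
SURCHARGE_INPUT_MULT = 2.0
SURCHARGE_OUTPUT_MULT = 1.5

BATCH_DISCOUNT = 0.5  # Batch API tier: 50% off


def estimate_request_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumptions not confirmed by the source: the surcharge multiplies
    both fresh and cached input, and the batch discount applies last.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    input_cost = (fresh_tokens * INPUT_RATE + cached_tokens * CACHED_INPUT_RATE) / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE / 1_000_000

    if input_tokens > SURCHARGE_THRESHOLD:
        input_cost *= SURCHARGE_INPUT_MULT
        output_cost *= SURCHARGE_OUTPUT_MULT

    total = input_cost + output_cost
    if batch:
        total *= BATCH_DISCOUNT
    return round(total, 4)


if __name__ == "__main__":
    # 100K-token coding task with 10K output
    print(estimate_request_cost(100_000, 10_000))  # prints 0.8
```

Because the surcharge is keyed to prompt size, a daily total should be computed by summing per-request estimates rather than by passing the whole day's token count in one call.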

Question 3

What is GPT-5.5's context window and max output?

Accepted Answer

GPT-5.5 supports a 1,000,000-token (1M) context window in the API, the largest in the GPT-5 family. Maximum output per completion is 128,000 tokens. In the Codex product, the context window is capped at 400,000 tokens; teams requiring full 1M-token sessions must use the API directly via the Responses API endpoint. A context surcharge activates when prompts exceed 272,000 tokens per session: the entire session is then billed at 2x input and 1.5x output pricing. No publicly verified needle-in-haystack evaluation at 1M tokens has been published for GPT-5.5 as of June 2026. For comparison, Claude Opus 4.8 offers 1M context and Gemini 2.5 Pro offers 2M context. For most coding and document tasks under 200K tokens, the practical difference between GPT-5.5 and its competitors at the context-window level is minimal; the surcharge matters most for RAG workloads that pass full document corpora.
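As a rough planning aid, the three thresholds above (272K surcharge, 400K Codex cap, 1M API window) can be checked before a session starts. The sketch below is illustrative only: `plan_session` is a hypothetical helper, the token count is assumed to come from whatever tokenizer you already use, and it ignores output tokens because the answer does not say whether they count against the window.

```python
# Limits quoted in the answer above, in tokens.
API_CONTEXT_LIMIT = 1_000_000
CODEX_CONTEXT_LIMIT = 400_000
SURCHARGE_THRESHOLD = 272_000


def plan_session(prompt_tokens: int) -> dict:
    """Report which surfaces can hold a prompt and whether the surcharge applies.

    Hypothetical helper: compares the prompt size against the published
    limits only, and does not reserve room for output tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return {
        "fits_codex": prompt_tokens <= CODEX_CONTEXT_LIMIT,
        "fits_api": prompt_tokens <= API_CONTEXT_LIMIT,
        "surcharge": prompt_tokens > SURCHARGE_THRESHOLD,
    }


if __name__ == "__main__":
    for size in (150_000, 300_000, 500_000):
        print(size, plan_session(size))
```

A 300K-token prompt, for example, still fits in Codex but already triggers the surcharge, which is the range where trimming or chunking the input pays off most.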

Question 4

How does GPT-5.5 compare on benchmarks vs Claude Opus 4.8?

Accepted Answer

GPT-5.5 scores 88.7% on SWE-bench Verified; Claude Opus 4.8 scores 88.6%, making the two models statistically tied on agentic coding. On MMLU, GPT-5.5 posts 92.4%, while the available figure for Claude Opus 4.8 is an MMLU-Pro score of around 81.2%; MMLU-Pro is a separate, harder benchmark, so the two numbers are not directly comparable. For visual reasoning, GPT-5.5 scores 92.1% on ChartQA and 96.2% on AI2D, benchmarks for which no head-to-head Claude scores have been published. Claude Opus 4.8 leads the SWE-bench Pro variant (which measures harder, longer agentic sessions) at 69.2%; GPT-5.5's score on that variant has not been published. On AIME 2025 advanced math, Claude Opus 4.8 posts high scores in the extended-thinking regime; the full GPT-5.5 model's AIME score has not been published (the Instant variant scored 81.2%). BenchLM places GPT-5.5 at rank 4 and Claude Opus 4.8 at or near the top position. In practice, the choice between them depends more on cost structure, deployment platform, and multimodal requirements than on benchmark deltas of under 1 percentage point.

Question 5

Is GPT-5.5 open source or proprietary?

Accepted Answer

GPT-5.5 is fully proprietary: the model weights are closed and it is API-only. There is no Hugging Face release, no VRAM requirement to consider, and no self-hosting path. Access is via the OpenAI API (api.openai.com), AWS Bedrock (available since June 1, 2026), and Azure OpenAI Service (non-exclusive, active through 2032). OpenAI did release open-weight models in 2026 (the GPT-OSS family: gpt-oss-120b and gpt-oss-20b, Apache 2.0), but these are distinct from GPT-5.5 and do not share its architecture or capability level. GPT-5.5 also offers no fine-tuning of the base weights and no air-gapped deployment path. Enterprise customers can access it with zero-retention data handling via API agreements. For teams requiring on-premises deployment or open weights, Llama 4 or Mistral Medium 3 are alternatives.

Question 6

What modalities does GPT-5.5 support?

Accepted Answer

GPT-5.5 accepts text, images, audio, and video as inputs in a single native model session, making it the first OpenAI model to handle all four modalities without routing to separate specialist models. Output modalities are text and tool calls only; audio output is not available from GPT-5.5 directly. Function calling, structured outputs, MCP server integration, OpenAI Skills, and web search are all live in the API. Computer use is handled via Codex's hosted shell and apply-patch tooling. PDF inputs are supported via the image pipeline. Video processing enables structured summaries with timestamps and action items from meeting recordings and webinars. Audio input supports multi-language transcription and translation. For applications requiring audio output (voice assistants, spoken responses), GPT-5.5 must be paired with a TTS model or the OpenAI Realtime API, which handles bidirectional audio at lower latency.

Question 7

Does GPT-5.5 train on user data?

Accepted Answer

OpenAI does not train on API inputs by default. API inputs are retained for a limited period for abuse monitoring under OpenAI's standard data policy, unless an enterprise zero-retention agreement is active, in which case organizations opt out of data retention entirely. For AWS Bedrock deployments, data governance follows AWS terms separately from OpenAI's direct API policy, with data processed within the selected AWS region. OpenAI holds SOC 2 Type II certification; HIPAA-eligible configurations are available for enterprise customers. GDPR applies for EU users. The EU AI Act classifies GPT-5.5 as a general-purpose AI model with systemic risk, which requires OpenAI to maintain transparency reporting and adversarial testing documentation. Developers should review the OpenAI data usage policies at openai.com/policies and the specific Bedrock or Azure data terms for their deployment region.

Question 8

Who is GPT-5.5 best for and who should avoid it?

Accepted Answer

GPT-5.5 is best for teams building multimodal agentic workflows: native audio, video, and image processing in one model removes the complexity of integrating separate ASR and vision systems. It is strong for complex professional research, long-document analysis (up to 1M tokens), and agentic coding tasks at 88.7% SWE-bench Verified. Organizations already on OpenAI's platform and moving up from GPT-5.4 gain a clear upgrade path without SDK changes. Teams running text-only, cost-sensitive workloads should avoid GPT-5.5: it costs $5/$30 per 1M tokens, while Gemini 2.5 Flash at $0.15/$0.60 is roughly 33-50x cheaper for pure-text tasks. Voice-first applications needing sub-200ms latency should use the OpenAI Realtime API, not GPT-5.5's standard endpoint, where TTFT runs 400-700ms. Air-gapped or on-premises deployments are impossible given the proprietary, API-only model. Teams with strict per-session cost caps should model the 272K-token surcharge before committing, since it can double input costs on document-heavy sessions.
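How much cheaper Gemini 2.5 Flash is for a given job depends on the input/output mix, because the gap differs between the two rates. The following sketch computes the blended ratio from the list prices quoted above; `price_ratio` is a hypothetical helper, and it deliberately ignores caching, batch discounts, and the long-context surcharge.

```python
# List prices in USD per 1M tokens as (input, output), from the text above.
GPT_55 = (5.00, 30.00)
GEMINI_25_FLASH = (0.15, 0.60)


def price_ratio(input_tokens: int, output_tokens: int) -> float:
    """Return how many times more GPT-5.5 costs than Gemini 2.5 Flash.

    Hypothetical helper using list prices only; the per-1M scaling
    cancels out, so raw token counts can be used directly.
    """
    gpt_cost = input_tokens * GPT_55[0] + output_tokens * GPT_55[1]
    flash_cost = input_tokens * GEMINI_25_FLASH[0] + output_tokens * GEMINI_25_FLASH[1]
    if flash_cost <= 0:
        raise ValueError("at least one token count must be positive")
    return gpt_cost / flash_cost


if __name__ == "__main__":
    # Input-heavy job: 100K tokens in, 10K tokens out
    print(f"{price_ratio(100_000, 10_000):.1f}x")
```

The ratio is lowest for input-only workloads and highest for output-only ones, so output-heavy generation tasks see the largest relative saving from switching.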
