GPT-4o Review 2026: 128K Context, $2.50/M Pricing, Deprecated

GPT-4o: OpenAI's 2024 multimodal flagship with 128K context, 90.2% HumanEval, and real-time voice. Now deprecated at $2.50/$10 per 1M tokens API pricing.

GPT-4o is OpenAI's natively multimodal flagship, released May 13, 2024, with a 128K context window, 16,384-token max output, and benchmark scores of 88.7% MMLU and 90.2% HumanEval. Priced at $2.50 input and $10.00 output per 1M tokens (cached input $1.25), it was retired from ChatGPT on February 13, 2026, with OpenAI recommending migration to GPT-4.1 or GPT-5.1/5.2.

GPT-4o, released by OpenAI on May 13, 2024, is a natively multimodal model scoring 88.7% on MMLU and 90.2% on HumanEval with a 128K token context window. It costs $2.50 per 1M input tokens and $10.00 per 1M output tokens via the API. OpenAI retired GPT-4o from ChatGPT on February 13, 2026, recommending GPT-4.1 or GPT-5.1 for new projects.

Provider: OpenAI · Family: GPT-4o

Context window: 128,000 tokens · Max output: 16,384

Input modalities: text, image, audio, video, tool-calls · Output: text, audio, image, tool-calls

About GPT-4o

GPT-4o (the 'o' stands for omni) is OpenAI's natively multimodal flagship model, announced and released on May 13, 2024 during the company's Spring Updates livestream. Unlike earlier GPT-4 variants that bolted vision or voice onto a text-first model, GPT-4o was trained end-to-end across text, image, and audio in a single network, with the explicit goal of bringing GPT-4 Turbo-level reasoning to real-time, low-latency multimodal interaction at a fraction of the previous cost. OpenAI has never officially disclosed parameter counts or whether the architecture is dense or mixture-of-experts; independent estimates place it in the hundreds of billions of parameters, broadly in line with the rest of the GPT-4 family. It sits as the flagship of the GPT-4o line, with a smaller GPT-4o mini variant released two months later for cost-sensitive workloads. On released benchmarks, GPT-4o scored 88.7% on MMLU and 90.2% on HumanEval, both improvements over GPT-4 Turbo's roughly 86.5% MMLU at the time. OpenAI did not publish a SWE-bench Verified score for GPT-4o directly, but when introducing GPT-4.1 in 2025, OpenAI stated that GPT-4.1 improved on SWE-bench Verified by 21.4 points over GPT-4o, implying a GPT-4o baseline near 33%, a wide gap to Claude Opus 4's 72.5% on the same benchmark a year later. On the LMArena Chatbot Arena leaderboard, the August 2024 GPT-4o snapshot reached an Elo of roughly 1314, competitive at the time but since overtaken: by mid-2026 Claude Opus 4.6 led at 1418, with Gemini 3.1 Pro at 1406 and GPT-5.2 at 1402, and GPT-4o no longer appears on current leaderboards. GPT-4o ships with a 128,000 token context window. At launch in May 2024, maximum output was capped at 4,096 tokens; the gpt-4o-2024-11-20 update raised this ceiling to 16,384 tokens, a four-fold increase that matters for long-form generation, code output, and structured JSON responses. There is no separate extended-context tier, unlike Gemini 2.5 Pro's 2-million-token window or Claude's 1-million-token option, so 128K is mid-pack by 2026 standards. As a natively multimodal model, GPT-4o accepts any combination of text, image, audio, and video as input and can generate text, audio, and image outputs. Its real-time voice mode responds to audio input in as little as 232 milliseconds (average 320ms), close to human conversational response time, and runs through OpenAI's Realtime API rather than the standard Chat Completions endpoint. Function calling with vision support was added in the November 2024 update, letting the model reason over images while deciding which tools to call. Structured outputs with strict JSON schema adherence and fine-tuning both reached general availability on the gpt-4o-2024-08-06 snapshot. GPT-4o launched at $5.00 per 1M input tokens and $15.00 per 1M output tokens in May 2024. OpenAI cut prices by 50% in October 2024, to the $2.50 input / $10.00 output per 1M tokens that remains current in 2026. Cached input tokens cost $1.25 per 1M, half the standard input rate, and the Batch API offers a further 50% discount on both input and output for asynchronous jobs returned within 24 hours. A 100K-token document summarization costs roughly $0.27; a coding agent processing 1M input and 200K output tokens per day costs about $4.50; a support bot handling 1,000 turns of 2K-token context and 500-token replies costs roughly $10.00 per day. GPT-4o is available through the OpenAI API and Azure OpenAI Service, with official SDKs in Python, Node.js/TypeScript, Java, .NET, and Go. Fine-tuning is generally available on the gpt-4o-2024-08-06 snapshot via the OpenAI API. As a closed, proprietary model, there are no downloadable weights, no self-hosting option, and no quantized community builds; access is strictly API-based. OpenAI published a dedicated GPT-4o System Card in August 2024 covering pretraining data filtering, which removed CSAM, hateful content, and CBRN-related material via the Moderation API and safety classifiers, plus post-training red-teaming with particular attention to risks introduced by real-time speech-to-speech capability, including voice cloning and audio content moderation. GPT-4o follows OpenAI's standard balanced refusal policy: it declines clearly harmful requests such as weapons or malware but is generally permissive on creative, business, and technical tasks compared to earlier, stricter GPT-4 variants. Teams already running gpt-4o-2024-08-06 or gpt-4o-2024-11-20 in production for multimodal chat, document analysis, or voice assistants can continue to do so at a known, stable price. It remains a reasonable choice for prototypes needing native vision and audio without the cost of GPT-5-class models. It is a poor choice for greenfield projects: GPT-4.1 beats it by 21.4 points on SWE-bench Verified at a lower cost, and GPT-5.1/5.2 lead on essentially every 2026 benchmark. Coding-focused teams should look at GPT-4.1 or Claude Opus 4.6 instead; teams needing the largest context windows should consider Gemini 2.5 Pro or Claude's 1M-token tier. GPT-4o's training data cutoff was originally October 2023, but a March 2025 update silently extended it to June 2024 without a model version change, so two requests to the same model string can reflect different effective knowledge cutoffs depending on when they were served. OpenAI's standard API data retention applies: inputs and outputs may be retained for up to 30 days for abuse monitoring unless an organization qualifies for zero-data-retention. OpenAI maintains SOC 2 Type II certification for the API platform and offers HIPAA business associate agreements and EU data residency options for eligible enterprise customers. GPT-4o was removed from the default ChatGPT model picker in August 2025 when GPT-5 launched, then reinstated for paid subscribers after user complaints about losing its conversational style. OpenAI fully retired GPT-4o from ChatGPT on February 13, 2026, with Business, Enterprise, and Edu customers retaining access inside Custom GPTs only until April 3, 2026. As of mid-2026, GPT-4o remains accessible via the API but is officially deprecated, with OpenAI directing developers to GPT-4.1 for cost/latency-sensitive workloads or GPT-5.1/5.2 for frontier capability.

Pricing

$2.50 per 1M input tokens and $10.00 per 1M output tokens (since October 2024, a 50% cut from the May 2024 launch price of $5.00/$15.00). Cached input is $1.25 per 1M tokens. Batch API gives a further 50% discount on both input and output for jobs returned within 24h.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is GPT-4o and who built it?

GPT-4o is a natively multimodal large language model built by OpenAI, announced and released on May 13, 2024 during the company's Spring Updates livestream. The 'o' stands for omni, reflecting that it was trained end-to-end to handle text, image, and audio in one network rather than bolting vision or speech onto a text-first model. OpenAI has not disclosed its parameter count or whether it uses a dense or mixture-of-experts architecture, but independent estimates place it in the hundreds of billions of parameters, in line with the broader GPT-4 family. On release, it scored 88.7% on MMLU and 90.2% on HumanEval, both improvements over GPT-4 Turbo. It was designed to bring GPT-4 Turbo-level reasoning to real-time, low-latency multimodal interaction at roughly half the cost. It sits at the top of the GPT-4o line, with a cheaper GPT-4o mini variant released two months later. As of 2026, it competed directly with Gemini 1.5 Pro and Claude 3.5 Sonnet at launch, and was specifically designed to beat them on combined vision, audio, and multilingual benchmarks. Its headline price was $2.50 input and $10.00 output per 1M tokens, with a 128K context window.

How much does GPT-4o cost per 1M tokens?

GPT-4o costs $2.50 per 1M input tokens and $10.00 per 1M output tokens, a price that has held since OpenAI's 50% price cut in October 2024 from the original May 2024 launch price of $5.00/$15.00. Cached input tokens, used for repeated prompt prefixes, cost $1.25 per 1M, half the standard input rate. The Batch API gives a further 50% discount on both input and output, bringing batch pricing to $1.25/$5.00 per 1M tokens for asynchronous jobs returned within 24 hours. As worked examples: summarizing a 100K-token document costs about $0.27, running a coding agent that processes 1M input and 200K output tokens per day costs about $4.50, and a support bot handling 1,000 daily chat turns at 2K input and 500 output tokens each costs roughly $10.00 per day. By comparison, GPT-4.1 offers similar or better performance at a lower per-token cost, and GPT-4o cannot be self-hosted since OpenAI has not released its weights.

What is GPT-4o's context window and max output?

GPT-4o has a 128,000 token context window, unchanged since its May 2024 release. The maximum output token limit started at 4,096 tokens at launch but was raised to 16,384 tokens in the gpt-4o-2024-11-20 update, a four-fold increase that benefits long-form generation, large code outputs, and structured JSON responses. There is no separate extended-context tier or sliding-window mode documented for GPT-4o. By comparison, Gemini 2.5 Pro offers up to 2 million tokens of context and Claude's largest tier reaches 1 million tokens, making GPT-4o's 128K mid-pack by 2026 standards. For multi-document or large-codebase tasks, GPT-4o's effective working context is smaller than these newer models, so very large inputs need to be chunked or summarized before being passed to GPT-4o. Document handling for PDFs and images counts toward the same 128K token budget as text.

How does GPT-4o compare on benchmarks vs GPT-4.1 and GPT-5?

GPT-4o scored 88.7% on MMLU and 90.2% on HumanEval at release in May 2024. OpenAI did not publish a direct SWE-bench Verified score for GPT-4o, but when introducing GPT-4.1 in 2025, OpenAI stated GPT-4.1 improved on SWE-bench Verified by 21.4 points over GPT-4o, implying GPT-4o scored roughly 33% on that benchmark, well behind GPT-4.1 and Claude Opus 4's 72.5%. On the LMArena Chatbot Arena, the August 2024 GPT-4o snapshot reached an Elo near 1314, but by mid-2026 the leaderboard is led by Claude Opus 4.6 (1418), Gemini 3.1 Pro (1406), and GPT-5.2 (1402), with GPT-4o no longer ranked among current models. In practice, the SWE-bench gap means GPT-4o is meaningfully less reliable at multi-step coding and agentic tasks than GPT-4.1 or GPT-5.x. GPT-4o's strongest published numbers (MMLU and HumanEval) are both legacy, saturated benchmarks by 2026 standards, and OpenAI has not released GPQA Diamond or AIME 2025 scores for it.

Is GPT-4o open source or proprietary?

GPT-4o is fully proprietary and API-only. OpenAI has not released its weights, and there is no open-weights or open-source variant of GPT-4o itself. Access is exclusively through the OpenAI API (api.openai.com) and Azure OpenAI Service, both of which require an API key or Azure AD credential. OpenAI's separate gpt-oss-20b and gpt-oss-120b models, released under the Apache 2.0 license, are open-weight models but are architecturally distinct from GPT-4o and were released later as a separate initiative. There are no Hugging Face weights, VRAM requirements, or quantization options for GPT-4o because self-hosting is not possible. Commercial use of GPT-4o is governed entirely by OpenAI's usage policies and API terms of service, with no separate community license to consider.

What modalities does GPT-4o support?

GPT-4o accepts text, image, audio, and video as input, and can generate text, audio, and image as output, all from a single natively trained model. Its real-time voice capability, accessed through OpenAI's Realtime API, responds to spoken input in as little as 232 milliseconds (average 320ms), close to human conversational latency; this is distinct from sending audio through the standard Chat Completions endpoint, which is not optimized for real-time use. Function calling is fully supported, and as of the November 2024 update, function calling works alongside vision input, letting the model decide which tools to call based on what it sees in an image. Structured outputs with strict JSON schema enforcement are generally available on the gpt-4o-2024-08-06 snapshot and later. GPT-4o does not support computer-use style screen control or web browsing natively. Compared to Gemini's native video understanding or Claude's computer-use tooling, GPT-4o's video input support is less emphasized in OpenAI's documentation.

Does GPT-4o train on user data?

By default, OpenAI does not use API inputs and outputs to train its models, including GPT-4o. Data sent through the API may be retained for up to 30 days for abuse and misuse monitoring, after which it is deleted, unless an organization has been approved for zero data retention, in which case inputs and outputs are not stored at all. OpenAI's API platform holds SOC 2 Type II certification, and OpenAI offers HIPAA business associate agreements and EU data residency options for eligible enterprise customers. Usage through Azure OpenAI Service follows Microsoft's separate Azure data handling and compliance commitments, which can differ from the direct OpenAI API in terms of regional data storage. ChatGPT consumer usage (as opposed to the API) has separate data controls, including a setting to opt out of having conversations used to improve OpenAI's models. There is no GPT-4o-specific data policy beyond OpenAI's standard API-wide terms.

Who is GPT-4o best for and who should avoid it in 2026?

GPT-4o is best for teams already running production integrations on pinned snapshots like gpt-4o-2024-08-06 or gpt-4o-2024-11-20 who want stable, known pricing at $2.50/$10.00 per 1M tokens without an immediate migration. It also suits real-time voice assistant prototypes that benefit from its 232-320ms Realtime API latency, and cost-sensitive multimodal prototypes that need native vision, audio, and text without paying GPT-5-class prices. Teams should avoid GPT-4o for new agentic coding projects, since GPT-4.1 scores 21.4 points higher on SWE-bench Verified at a lower cost, and GPT-5.1/5.2 lead further still. It's also a poor fit for anything needing context windows beyond 128K, where Gemini 2.5 Pro (2M) or Claude (1M) are better suited. Finally, since GPT-4o was retired from ChatGPT on February 13, 2026 and is officially deprecated, any new product built today should default to GPT-4.1 or GPT-5.1/5.2 rather than building fresh dependencies on a model OpenAI is actively phasing out.

Visit GPT-4o Official Page