GPT-4o mini: Pricing, Specs & 2026 Status Explained
GPT-4o mini (OpenAI, July 2024): 128K context, 82% MMLU, $0.15/$0.60 per 1M tokens. Retired from ChatGPT in Feb 2026 but still live via API and Azure.
GPT-4o mini is OpenAI's small multimodal model, released July 18, 2024, with a 128K context window, 16,384 max output tokens, and 82.0% MMLU / 87.2% HumanEval. Priced at $0.15 input / $0.60 output per 1M tokens (Batch: $0.075/$0.30), it was retired from ChatGPT on February 13, 2026 but remains live via API and Azure, with GPT-5.1 mini as OpenAI's recommended successor.
GPT-4o mini, released by OpenAI on July 18, 2024, is a small multimodal model with a 128K context window and 82.0% MMLU. It costs $0.15 per 1M input tokens and $0.60 per 1M output tokens. OpenAI retired it from ChatGPT on February 13, 2026, but it remains available via the API and Azure OpenAI, with GPT-5.1 mini as the recommended successor.
Provider: OpenAI · Family: GPT-4o
Context window: 128,000 tokens · Max output: 16,384
Input modalities: text, image · Output: text, tool-calls
About GPT-4o mini
GPT-4o mini is OpenAI's small, cost-efficient multimodal model, announced on July 18, 2024, as the successor to GPT-3.5 Turbo in the API and ChatGPT free tier. It is a dense transformer in the GPT-4o family, distilled and optimized for low latency and low cost rather than peak capability. OpenAI has not disclosed an exact parameter count, but the model is widely understood to sit far below GPT-4o in size, trading raw reasoning power for a price point roughly 60% cheaper than GPT-3.5 Turbo at launch. Its role in the lineup is the default "small" tier: the model developers reach for when a task does not need frontier reasoning but still benefits from instruction-following, tool use, and vision. On academic benchmarks, GPT-4o mini scores 82.0% on MMLU and 87.2% on HumanEval, which was competitive with mid-tier models at its 2024 launch but has since been surpassed by newer small models such as GPT-4.1 mini and Gemini 2.0 Flash. On agentic coding evaluations it scores far lower than frontier models: SWE-bench Verified sits around 7.8%, compared to 70%+ for 2026 frontier models like Claude Opus 4.7 or GPT-5.1. Its Chatbot Arena (LMArena) Elo is approximately 1274, well below the 1450-1560 range occupied by 2026 frontier systems. GPQA Diamond and AIME 2025 scores have not been independently published for this model, reflecting that it was never positioned as a reasoning model. The honest read is that GPT-4o mini's benchmark profile reflects a 2024-era small model: solid at general knowledge and basic coding, weak at multi-step agentic and graduate-level reasoning tasks that define 2026 comparisons. The context window is 128,000 tokens with a maximum output of 16,384 tokens per request, matching the broader GPT-4o family's input ceiling. OpenAI has not published a dedicated long-context recall evaluation for the mini variant; independent needle-in-haystack tests of the GPT-4o family generally show solid retrieval up to the full 128K window but with some degradation in the middle of very long contexts, a pattern common to most 2024-generation models. There is no extended-context tier above 128K for this model. GPT-4o mini supports text and image inputs with text output in the API, plus function calling and structured outputs (JSON mode and JSON schema). Audio and video inputs were promised at launch as "coming in the future" but in practice ship as separate specialized variants (gpt-4o-mini-transcribe, gpt-4o-mini-tts, gpt-4o-mini-audio-preview) rather than as native modalities of the base chat model. Vision support covers document understanding, screenshot interpretation, and basic visual classification, and pairs with function calling so the model can reason over an image and then call a tool. There is no native computer-use or browsing capability in this model; those live in separate OpenAI products. Pricing has been stable since the July 2024 launch at $0.15 per 1M input tokens and $0.60 per 1M output tokens, with the Batch API offering a 50% discount ($0.075/$0.30). Fine-tuning costs $0.30 per 1M training tokens, after which inference on the fine-tuned model doubles to $0.30 input / $1.20 output per 1M tokens. As worked examples: summarizing a 100K-token document costs about $0.015 in input tokens alone; a daily coding-assistant workload of 1M input / 200K output tokens costs roughly $0.27; a customer-support bot handling 1,000 turns averaging 2K input / 500 output tokens per turn costs about $0.60 per day. These numbers make it one of the cheapest models with vision and tool use still available from a frontier lab. The model is available through the direct OpenAI API and through Azure OpenAI Service (as gpt-4o-mini), where it sits alongside transcription and audio-preview variants. OpenAI has not published a dedicated AWS Bedrock or Google Vertex AI listing for GPT-4o mini, since those platforms primarily host first-party Amazon, Anthropic, and Google models alongside select third-party open-weight models rather than OpenAI's proprietary lineup. SDKs are available for Python, Node.js/TypeScript, and via community libraries for Java, Go, and Ruby through the standard OpenAI-compatible API surface. Safety follows the GPT-4o system card: OpenAI applies its Moderation API and safety classifiers during training data curation to filter CSAM, hateful content, violence, and CBRN-related material, alongside human preference alignment (RLHF) and red-teaming before release. Users can opt images out of training, with fingerprinting used to remove opted-out images from future training runs. The model defaults to standard refusal behavior for clear policy violations, consistent with OpenAI's usage policies, and supports the same moderation endpoint and system-prompt-based steering as other GPT-4o family models. Training data has a knowledge cutoff of October 2023, the same as the rest of the original GPT-4o family. OpenAI's default API data retention is 30 days for abuse monitoring, with zero-data-retention available to approved enterprise customers. OpenAI maintains SOC 2 Type II compliance for the API platform and offers a Business Associate Agreement for HIPAA-eligible workloads on enterprise plans, alongside GDPR-aligned data processing terms. As of 2026, GPT-4o mini has been retired from the ChatGPT consumer product (alongside GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini, retired February 13, 2026) but remains available via the API and Azure OpenAI for existing integrations. OpenAI is steering new users toward GPT-5.1 mini for general small-model tasks and o3-mini or GPT-5.1 mini for reasoning-heavy workloads. Teams should treat GPT-4o mini as a legacy-but-supported option: fine for stable production pipelines already built on it, but new projects should default to GPT-5.1 mini, which offers materially better reasoning and agentic benchmarks at a similar price point. Anyone needing native audio or video input should look at GPT-4o (full) realtime variants or GPT-5.1, not the mini text/vision tier.
Pricing
$0.15 per 1M input tokens, $0.60 per 1M output tokens, unchanged since July 2024 launch. Batch API: $0.075/$0.30 (50% off). Fine-tuned inference: $0.30 input / $1.20 output per 1M tokens; fine-tuning training itself costs $0.30 per 1M training tokens.
Key Features
- 128K context window: Matches the broader GPT-4o family's 128,000 token input capacity, with up to 16,384 tokens of output per request.
- Native vision input: Accepts images alongside text for document understanding, screenshot interpretation, and visual classification.
- Function calling and structured outputs: Supports JSON mode and JSON schema structured outputs, making it usable as a low-cost tool-calling layer.
- Batch API discount: Async Batch API processes requests at 50% off ($0.075 input / $0.30 output per 1M tokens) with results within 24 hours.
- Fine-tuning support: Can be fine-tuned for $0.30 per 1M training tokens, with fine-tuned inference at $0.30/$1.20 per 1M tokens.
Pros
- Among the cheapest vision-and-tool-use models from a frontier lab at $0.15/$0.60 per 1M tokens.
- 128K context with 16,384 max output tokens is generous for a small-tier model.
- Still accessible via API and Azure OpenAI after the February 2026 ChatGPT retirement, so existing integrations keep working.
Cons
- SWE-bench Verified around 7.8%, unsuitable for agentic coding compared to 2026 frontier models.
- Training data cutoff of October 2023 is dated relative to 2025-2026 models.
- Retired from ChatGPT consumer product (Feb 13, 2026); OpenAI is steering new projects to GPT-5.1 mini.
Benchmarks
- mmlu: 82
- humaneval: 87.2
- lmarena elo: 1274
- swe bench verified: 7.8
- artificial analysis price blended per m: 0.26
- artificial analysis speed tokens per sec: 61.5
Frequently Asked Questions
What is GPT-4o mini and who built it?
GPT-4o mini is a small, cost-efficient multimodal model built by OpenAI and announced on July 18, 2024. It is part of the GPT-4o family of dense transformer models, positioned as the small-tier successor to GPT-3.5 Turbo in both the API and ChatGPT. OpenAI has not disclosed an exact parameter count, but the model trades raw capability for low latency and low cost. On benchmarks it scores 82.0% on MMLU and 87.2% on HumanEval, figures that were competitive at launch but have since been overtaken by newer small models such as GPT-4.1 mini and Gemini 2.0 Flash. It was designed to make vision, function calling, and a 128K context window affordable for high-volume applications. As of 2026, OpenAI directs new projects toward GPT-5.1 mini, with GPT-4o mini remaining as a legacy option. The headline price is $0.15 input / $0.60 output per 1M tokens with a 128K context window.
How much does GPT-4o mini cost per 1M tokens?
GPT-4o mini costs $0.15 per 1M input tokens and $0.60 per 1M output tokens, a price that has not changed since its July 2024 launch. Cached input tokens cost $0.075 per 1M (50% off). The Batch API offers a further 50% discount at $0.075 input / $0.30 output per 1M tokens, with results returned within 24 hours. Fine-tuning costs $0.30 per 1M training tokens, after which inference on the fine-tuned model rises to $0.30 input / $1.20 output per 1M tokens. As worked examples, summarizing a 100K-token document costs about $0.015, a daily coding-assistant workload of 1M input / 200K output tokens costs roughly $0.27, and a support bot handling 1,000 turns of 2K in / 500 out per day costs about $0.60. By comparison, GPT-5.1 mini costs more per token but delivers materially higher reasoning scores, so GPT-4o mini remains attractive mainly for very high-volume, low-complexity workloads. The model cannot be self-hosted, so there is no infrastructure cost alternative.
What is GPT-4o mini's context window and max output?
GPT-4o mini has a 128,000 token context window and a maximum output of 16,384 tokens per request, matching the input ceiling of the broader GPT-4o family. OpenAI has not published a dedicated long-context recall evaluation for the mini variant specifically, but independent needle-in-haystack tests of the GPT-4o family generally show reliable retrieval across the 128K window with some degradation for information placed in the middle of very long prompts, a pattern common to 2024-era models. There is no separate extended-context tier above 128K for this model. For document-heavy workloads, the 128K window comfortably fits documents in the tens of thousands of words, but teams working with multi-hundred-page documents should chunk inputs or use retrieval rather than relying on a single 128K call. Compared to 2026 frontier models offering 200K-1M token windows, GPT-4o mini's context is now mid-pack rather than leading.
How does GPT-4o mini compare on benchmarks vs GPT-5.1 mini and Gemini 2.0 Flash?
GPT-4o mini scores 82.0% on MMLU, 87.2% on HumanEval, roughly 7.8% on SWE-bench Verified, and around 1274 Elo on LMArena. GPT-5.1 mini and Gemini 2.0 Flash, both released after GPT-4o mini, post materially higher scores on agentic coding and reasoning benchmarks, with 2026 frontier models clearing 70%+ on SWE-bench Verified compared to GPT-4o mini's single-digit score. On general knowledge (MMLU) the gap is smaller, since MMLU has become a near-saturated benchmark for models at this scale. In practice, a roughly 60-point SWE-bench gap means GPT-4o mini will frequently produce broken or incomplete multi-file code edits where newer small models succeed. GPT-4o mini does not publish GPQA Diamond or AIME 2025 scores at all, while newer small models increasingly report both, signalling that reasoning was simply not a design priority for this model. For pure cost-per-token on simple classification tasks, GPT-4o mini remains competitive, but for anything agentic, GPT-5.1 mini is the clear winner.
Is GPT-4o mini open source or proprietary?
GPT-4o mini is fully proprietary and API-only; OpenAI has not released its weights and has no open-weights or open-source variant of this model. Access is available through the direct OpenAI API and through Azure OpenAI Service under the same model name, gpt-4o-mini. There is no AWS Bedrock or Google Vertex AI listing for this model, since those platforms primarily host first-party and select third-party open-weight models rather than OpenAI's proprietary lineup. Commercial use is permitted under OpenAI's standard API terms and usage policies, with no separate license fee beyond per-token API charges. Fine-tuned versions of the model remain OpenAI's proprietary weights; customers cannot export or self-host a fine-tuned GPT-4o mini. Anyone needing an open-weights alternative with similar capability should look at models like Llama 3.1 8B or Qwen2.5 7B instead.
What modalities does GPT-4o mini support?
GPT-4o mini accepts text and image inputs and produces text output, plus structured tool-call outputs via function calling. Vision input covers document understanding, screenshot interpretation, and basic visual classification, and can be combined with function calling so the model reasons over an image before invoking a tool. Audio and video inputs were promised at the original 2024 launch as 'coming in the future,' but in practice OpenAI shipped those as separate specialized models, gpt-4o-mini-transcribe, gpt-4o-mini-tts, and gpt-4o-mini-audio-preview, rather than as native capabilities of the base chat model. There is no computer-use or web-browsing capability built into GPT-4o mini; those exist only in separate OpenAI agent products. Structured outputs (JSON mode and JSON schema) are fully supported, making the model reliable for extraction and classification pipelines that need machine-readable responses.
Does GPT-4o mini train on user data?
By default, OpenAI does not train its models on data submitted through the API, including GPT-4o mini, and retains API inputs and outputs for up to 30 days for abuse monitoring before deletion. Enterprise customers can apply for zero-data-retention agreements that remove even this 30-day window. Image inputs are subject to an opt-out fingerprinting system that can exclude specific images from any future training data across the GPT-4o model series. OpenAI's API platform holds SOC 2 Type II certification, offers a Business Associate Agreement for HIPAA-eligible enterprise workloads, and provides GDPR-aligned data processing terms with US and EU data residency options. On Azure OpenAI Service, data handling follows Microsoft's enterprise data processing agreements, which similarly exclude customer data from model training by default. Consumer ChatGPT usage (where still applicable to GPT-4o family models) is governed by separate, more permissive default settings that users can disable in their account data controls.
Who is GPT-4o mini best for and who should avoid it?
GPT-4o mini is best for teams running high-volume, low-complexity classification, extraction, or routing pipelines where its $0.15/$0.60 per 1M token pricing makes per-call cost negligible. It also suits document and screenshot processing workloads that need cheap vision input, and existing production integrations already built on GPT-4o mini that don't want to absorb a migration. Teams should avoid it for agentic coding, where its roughly 7.8% SWE-bench Verified score means frequent broken multi-file edits, GPT-5.1 mini or a frontier model is the better choice. It is also a poor fit for any task requiring knowledge after October 2023 without retrieval augmentation, and for graduate-level reasoning or competition math, where it has no published GPQA Diamond or AIME 2025 scores at all. New projects in 2026 should generally start with GPT-5.1 mini, which costs more per token but clears agentic and reasoning benchmarks by a wide margin, reserving GPT-4o mini for legacy continuity or extremely cost-sensitive simple tasks.