GPT Image 2: 99% Text Accuracy and O-Series Reasoning (2026)

GPT Image 2 is OpenAI's image generation flagship, released April 21, 2026, with 99% text rendering accuracy, 2K native resolution, and top rank (#1) on the LM Arena image leaderboard. Priced at $8 per 1M image input tokens and $30 per 1M output tokens (about $0.053 per 1024x1024 medium-quality image), it is the first image model to integrate O-series planning and reasoning before generation.

GPT Image 2, released April 21, 2026 by OpenAI, is the top-ranked image generation model on LM Arena with 99% text rendering accuracy across Latin, CJK, and Arabic scripts. It costs $8 per 1M image input tokens and $30 per 1M image output tokens, with per-image estimates of $0.006 (low), $0.053 (medium), and $0.211 (high) at 1024x1024. It is the first image model with integrated O-series reasoning.

Provider: OpenAI · Family: GPT Image

Input modalities: text, image · Output: image

About GPT Image 2

GPT Image 2 (API model ID: gpt-image-2, branded ChatGPT Images 2.0 in the consumer product) is OpenAI's third-generation image generation flagship, released on April 21, 2026. It follows gpt-image-1 (April 2025) and gpt-image-1.5 (December 2025), both of which were deprecated on June 2, 2026 and scheduled for API removal on December 1, 2026. The model's defining architectural shift is the integration of OpenAI's O-series reasoning pipeline into image generation: rather than running a single-pass denoising loop, gpt-image-2 executes a four-stage "Understand, Plan, Generate, Review" sequence before committing to any pixel output. OpenAI has not confirmed whether the generation step uses diffusion, autoregression, or a Transfusion-style hybrid; it has said only that this is a "completely new architecture" compared to prior gpt-image models. The model is fully proprietary with no public weights.

GPT Image 2 holds the top rank on the LM Arena image generation leaderboard as of June 2026 and is rated 9.6 out of 10 overall by independent reviewers across prompt adherence, text accuracy, image editing, and commercial suitability. Text rendering accuracy increased from roughly 60 to 70 percent in gpt-image-1 to approximately 99 percent in gpt-image-2, with reliable output across Latin scripts, CJK characters (Chinese, Japanese, Korean), and Arabic. Flux 2 Pro v1.1 and GPT Image 1.5 had nearly identical LM Arena Elo scores (1,265 and 1,264 respectively) before gpt-image-2 overtook both. Midjourney v7 retains a quality edge on artistic and painterly aesthetics. Ideogram 2.0 and Google Imagen 3 remain competitive specifically on text rendering but trail on multi-reference composition and prompt fidelity for complex scenes.

GPT Image 2 does not have a context window in the LLM sense. Each generation request accepts a text prompt, up to 16 reference images, and an optional mask image. The built-in web search capability, introduced with this model, supplements the December 2025 training data cutoff by querying live information before generating, which helps with time-sensitive or real-world-reference requests. Multi-turn editing is supported: you can iteratively refine a generated image with follow-up natural language instructions, and the model tracks edit history within a session. Maximum output resolution is 4096 by 4096 pixels, with native 2K (2048px) generation. Supported aspect ratios are 1:1, 3:2, 2:3, 16:9, and 9:16.

The model takes text and images as input and outputs raster images. It does not generate text, audio, or video. It supports inpainting via natural language instructions without manual masking for most tasks, precise mask-based inpainting for pixel-level control, outpainting (extending images beyond original borders), background removal, style transfer, product shot generation, and character consistency across multiple reference images. The model has no function calling, structured JSON output, or agentic tool-use capabilities in the standard API sense; it is a generation and editing model, not a conversational model.

GPT Image 2 uses token-based billing rather than flat per-image pricing. The official rates from OpenAI's pricing page are $8.00 per 1 million image input tokens, $30.00 per 1 million image output tokens, $2.00 per 1 million cached image input tokens, and $5.00 per 1 million text input tokens. At 1024 by 1024 resolution using OpenAI's image generation calculator, per-image cost estimates are $0.006 at low quality, $0.053 at medium quality, and $0.211 at high quality. The Batch API cuts all token rates by 50 percent in exchange for asynchronous processing with results delivered within 24 hours. A team generating 1,000 product images per day at medium quality (1024x1024) pays approximately $53 per day; at high quality the same volume is approximately $211 per day. High quality takes 30 to 50 times longer than low quality due to the reasoning and review stages. No free tier exists for gpt-image-2.

The primary API access path is the OpenAI API directly (api.openai.com). GPT Image 2 is also available through Microsoft Azure AI Foundry (Azure OpenAI Service) as of April 2026, which adds Azure AI Content Safety on top of OpenAI's own safety layer. The model is not available on AWS Bedrock or Google Vertex AI as of June 2026. Authentication uses standard OpenAI API keys or Azure credentials. The API exposes two endpoints: a generation endpoint (text and optional reference images to image output) and an editing endpoint (image plus optional mask for inpainting and outpainting). The standard OpenAI Python and Node.js SDKs (version 1.x and above) support gpt-image-2 without additional libraries.

OpenAI published a dedicated system card for ChatGPT Images 2.0 on April 21, 2026, covering a three-layer safety architecture. Pre-generation text classifiers screen prompt text for policy violations before image generation begins. Input image classifiers screen uploaded reference images for violating content. A post-generation classifier reviews the final output before delivery to the user. The heightened photorealism of gpt-image-2 compared to prior models drove additional investment in deepfake detection: the model refuses to generate realistic depictions of real people in political, sexual, or otherwise misleading contexts, as well as CSAM and other content that violates OpenAI's usage policy. No external red-team partner names are disclosed in the image model system card.

GPT Image 2 is best for product teams building content creation tools, marketing automation pipelines, e-commerce photography, and multilingual typographic design where prompt adherence and text rendering accuracy in multiple scripts are required. Its built-in web search makes it viable for news-adjacent and real-time visual content without a knowledge cutoff workaround. Teams should not pick gpt-image-2 if they need generation latency below 1 second (the model takes 8 to 25 seconds at medium quality), artistic or painterly stylization on par with Midjourney v7 (which still leads on aesthetic output and mood), or on-premise or air-gapped deployment (weights are closed). For cost-sensitive bulk generation of non-typographic images, Google Imagen 4 Fast at $0.02 per image undercuts gpt-image-2's medium-quality estimate of $0.053, although gpt-image-2's low-quality estimate of $0.006 is lower still.

Generated images carry full commercial rights; the user retains ownership of outputs per OpenAI's standard terms. API inputs are retained for up to 30 days for abuse monitoring by default; enterprise customers can request zero-retention agreements. The model is SOC 2 Type 2 compliant through OpenAI's shared enterprise tier, with HIPAA-eligible deployment available for qualifying customers.

On June 2, 2026, OpenAI deprecated gpt-image-1-mini and gpt-image-1.5 and set their API removal date to December 1, 2026, with gpt-image-2 as the official replacement. DALL-E 2 and DALL-E 3 were removed from the API on May 12, 2026, completing the transition to the gpt-image model family. The primary improvements over gpt-image-1.5 are the O-series reasoning pipeline (four-stage versus single-pass), text rendering accuracy rising from approximately 85 to 99 percent, support for up to 16 reference images (up from 4), native 2K resolution output, and built-in web search before generation. Output token pricing dropped slightly, from $32 per 1 million tokens for gpt-image-1.5 to $30 per 1 million tokens for gpt-image-2. OpenAI has not announced a public roadmap for the next gpt-image release. Updates have been sporadic: going by the release dates above (April 2025, December 2025, April 2026), the three gpt-image releases arrived 8 and then 4 months apart.
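
As a rough sketch of the generation endpoint described above, the snippet below assumes gpt-image-2 is called through the same `images.generate` method that the OpenAI Python SDK exposes for earlier gpt-image models, and that the response carries base64-encoded image data. Both points are assumptions rather than confirmed behavior, and `build_generation_request` and `generate_to_file` are hypothetical helper names.

```python
import base64
from pathlib import Path

# Quality tiers named in this article.
QUALITY_TIERS = ("low", "medium", "high")


def build_generation_request(prompt, size="1024x1024", quality="medium"):
    """Assemble keyword arguments for an image generation call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if quality not in QUALITY_TIERS:
        raise ValueError(f"quality must be one of {QUALITY_TIERS}")
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": quality,
    }


def generate_to_file(prompt, out_path, **options):
    """Call the API and write the decoded image to out_path.

    Assumes the `openai` package is installed and OPENAI_API_KEY is set.
    """
    # Imported lazily so the request builder above has no third-party dependency.
    from openai import OpenAI

    client = OpenAI()
    response = client.images.generate(**build_generation_request(prompt, **options))
    # Assumption: image data comes back base64-encoded, as with earlier gpt-image models.
    image_bytes = base64.b64decode(response.data[0].b64_json)
    path = Path(out_path)
    path.write_bytes(image_bytes)
    return path
```

Keeping the request builder separate from the network call makes it possible to validate parameters in tests without spending tokens.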

Pricing

$8.00 per 1M image input tokens, $30.00 per 1M image output tokens, $2.00 per 1M cached image input tokens, $5.00 per 1M text input tokens. Per-image estimates at 1024x1024: $0.006 (low), $0.053 (medium), $0.211 (high). Batch API cuts all rates by 50%.
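
To make the token-based billing concrete, here is a minimal sketch that prices a single request from the four rates listed above. The token counts in `example` are invented for illustration; real counts depend on the prompt, quality tier, and reference images, and `request_cost` is a hypothetical helper, not part of any SDK.

```python
# USD per 1M tokens, as listed in this section.
RATES_PER_MILLION = {
    "image_input": 8.00,
    "image_output": 30.00,
    "cached_image_input": 2.00,
    "text_input": 5.00,
}


def request_cost(tokens, batch=False):
    """Return the USD cost of one request given token counts per category."""
    total = 0.0
    for category, count in tokens.items():
        total += count * RATES_PER_MILLION[category] / 1_000_000
    # The Batch API halves every token rate, so the total halves as well.
    return total * 0.5 if batch else total


# Hypothetical token counts, for illustration only.
example = {"text_input": 200, "image_input": 1_000, "image_output": 1_500}
print(f"standard: ${request_cost(example):.4f}")
print(f"batch:    ${request_cost(example, batch=True):.4f}")
```

Output tokens dominate the total in this example, which matches the rate table: the output rate is several times the input rates.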

Frequently Asked Questions

What is GPT Image 2 and who built it?

GPT Image 2 (API model ID: gpt-image-2) is OpenAI's third-generation image generation flagship, released on April 21, 2026, and branded as ChatGPT Images 2.0 in the consumer product. It follows gpt-image-1 (April 2025) and gpt-image-1.5 (December 2025), both deprecated on June 2, 2026 and scheduled for API removal on December 1, 2026. The model's primary architectural innovation is the integration of OpenAI's O-series reasoning pipeline into image generation: it runs a four-stage Understand/Plan/Generate/Review process before producing any pixel output, making it the first commercial image model with embedded planning and self-review. OpenAI describes the architecture as completely new relative to prior gpt-image models but has not disclosed the details; it may be a Transformer-Diffusion hybrid (Transfusion-style) rather than a pure diffusion or pure autoregressive model, but OpenAI has not confirmed this. The model is proprietary with no open weights. It surpassed DALL-E 3 (removed from the API on May 12, 2026) and its own predecessor gpt-image-1.5 on every major quality metric, reaching top rank on the LM Arena image generation leaderboard with a 9.6 out of 10 overall rating. It is available via the OpenAI API and Microsoft Azure AI Foundry.

How much does GPT Image 2 cost per image in 2026?

GPT Image 2 uses token-based billing rather than flat per-image pricing. The official OpenAI rates are $8.00 per 1 million image input tokens, $30.00 per 1 million image output tokens, $2.00 per 1 million cached image input tokens, and $5.00 per 1 million text input tokens. Using OpenAI's image generation calculator, per-image estimates at 1024x1024 resolution are $0.006 at low quality, $0.053 at medium quality, and $0.211 at high quality. The Batch API cuts all token rates by 50 percent in exchange for asynchronous processing with results delivered within 24 hours, making large-volume production jobs significantly cheaper. A team generating 1,000 medium-quality product images per day pays approximately $53 per day; the same volume at high quality costs approximately $211 per day. For comparison, Google Imagen 4 Fast charges a flat $0.02 per image regardless of quality tier, and Midjourney's API (when available) uses a subscription model. GPT Image 2 has no free tier; all access requires a paid OpenAI API account. Use OpenAI's official image cost calculator with your specific prompt and resolution settings for accurate per-run estimates, as token consumption varies with prompt complexity and reference image count.
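
The daily figures above follow directly from the per-image estimates. The sketch below reproduces that arithmetic and adds the Batch API case, on the assumption that halving the token rates halves the per-image estimate; `daily_spend` is a hypothetical helper name.

```python
# Per-image estimates at 1024x1024, in USD, from the answer above.
PER_IMAGE_USD = {"low": 0.006, "medium": 0.053, "high": 0.211}
BATCH_DISCOUNT = 0.5  # the Batch API cuts token rates by 50 percent


def daily_spend(images_per_day, quality, batch=False):
    """Estimate daily spend in USD for a given volume and quality tier."""
    cost = images_per_day * PER_IMAGE_USD[quality]
    # Assumption: a 50 percent cut in token rates halves the per-image cost.
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


for tier in PER_IMAGE_USD:
    standard = daily_spend(1_000, tier)
    batched = daily_spend(1_000, tier, batch=True)
    print(f"{tier:>6}: ${standard:8.2f}/day standard, ${batched:8.2f}/day batch")
```

These are estimates at one resolution; actual spend varies with prompt complexity and reference image count, as noted above.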

What is GPT Image 2's maximum resolution and what quality tiers are available?

GPT Image 2 generates images natively at 2K resolution (2048px) and supports output up to 4096x4096 pixels, making it suitable for commercial printing and large-format display work. Three quality tiers are available through the API: low, medium, and high. At 1024x1024, the estimated cost scales from $0.006 (low) to $0.053 (medium) to $0.211 (high). High quality triggers the full four-stage O-series reasoning pipeline, which takes 30 to 50 times longer than low quality per image; a low-quality request that takes 1 to 2 seconds can take 40 to 50 seconds at high quality for a complex prompt. Supported aspect ratios are 1:1 (square), 3:2 (landscape), 2:3 (portrait), 16:9 (widescreen), and 9:16 (vertical/mobile). For comparison, Midjourney v7 and Flux 2 Pro v1.1 also support comparable resolutions and aspect ratios but use flat-rate pricing models with no quality tier selector. The editing endpoint (inpainting and outpainting) supports the same resolution range as the generation endpoint. Actual output resolution should be selected based on the intended display or print context to avoid unnecessary token spend.
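
A pipeline can check a requested size against these limits before sending a request. The sketch below assumes any pixel size that reduces exactly to one of the five listed ratios and fits within 4096 pixels per edge is acceptable; the real API may accept only a fixed list of sizes, so treat `check_size` as a hypothetical client-side guard.

```python
from fractions import Fraction

MAX_EDGE = 4096  # maximum output edge length stated above

# Aspect ratios listed above, keyed by exact width/height fraction.
SUPPORTED_RATIOS = {
    Fraction(1, 1): "1:1",
    Fraction(3, 2): "3:2",
    Fraction(2, 3): "2:3",
    Fraction(16, 9): "16:9",
    Fraction(9, 16): "9:16",
}


def check_size(width, height):
    """Return the aspect-ratio label for a size, or raise ValueError."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if max(width, height) > MAX_EDGE:
        raise ValueError(f"edges must not exceed {MAX_EDGE} pixels")
    ratio = Fraction(width, height)
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"{width}x{height} is not a supported aspect ratio")
    return SUPPORTED_RATIOS[ratio]


print(check_size(2048, 1152))
print(check_size(1024, 1536))
```

Using `Fraction` avoids floating-point comparisons, so a size matches a ratio only when it reduces to it exactly.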

How does GPT Image 2 compare on quality vs Midjourney v7 and Flux 2 Pro?

GPT Image 2 holds the top rank on the LM Arena image generation leaderboard as of June 2026 with an overall rating of 9.6 out of 10, ahead of Midjourney v7 and Flux 2 Pro v1.1. GPT Image 1.5 and Flux 2 Pro v1.1 had near-identical LM Arena Elo scores (1,264 and 1,265 respectively) before gpt-image-2 overtook both on leaderboard ranking. Text rendering accuracy is where gpt-image-2 has the clearest advantage: 99 percent versus the 60 to 70 percent typical of prior models; Midjourney v7 and Flux 2 Pro do not reliably render text in non-Latin scripts. Midjourney v7 retains the lead on artistic and painterly aesthetics, mood, and stylistic interpretation; for cinematic or fine-art output, Midjourney remains the preferred choice among creative professionals. Flux 2 Pro v1.1 is competitive on photorealism and speed but does not match gpt-image-2 on multilingual text or multi-reference composition. Ideogram 2.0 specifically targets text-in-image use cases but trails gpt-image-2 on overall prompt fidelity for complex scenes. GPT Image 2 leads on commercial and product photography, UI mockup generation, marketing asset production, and any workflow requiring accurate text in the image.

Is GPT Image 2 open source or proprietary?

GPT Image 2 is fully proprietary with no public weights released; it is API-only. OpenAI has not published any paper disclosing the architecture parameters or training recipe. The model is accessible through the OpenAI API (api.openai.com) and Microsoft Azure AI Foundry (Azure OpenAI Service), with Azure adding its own AI Content Safety layer. As of June 2026, gpt-image-2 is not available on AWS Bedrock or Google Vertex AI. Generated images carry full commercial rights and the user retains ownership of outputs per OpenAI's standard API terms. This contrasts with open image generation models: Flux 2 Pro v1.1 has open weights on Hugging Face under an Apache 2.0 license and can be self-hosted with sufficient GPU VRAM. Stable Diffusion 3.5 and SDXL are open-source under permissive licenses. Teams requiring on-premise deployment, air-gapped environments, or model fine-tuning should use an open-weights alternative. gpt-image-2 has a Hugging Face repository page documenting the hosted product experience, but it ships no model weights and no inference provider is listed for self-hosted deployment.

What editing and inpainting features does GPT Image 2 support?

GPT Image 2 supports natural language inpainting without manual masking: you describe what to change and the model applies the edit while preserving the rest of the image. For pixel-precise control, the editing API endpoint also accepts a mask image that explicitly defines which regions to modify. Outpainting (extending an image beyond its original borders) is supported through the same editing endpoint. Background removal is available as a dedicated capability, producing a transparent-background PNG. Up to 16 reference images can be submitted with each request, enabling consistent character appearances, product surfaces, and brand styles across multiple outputs. Multi-turn iterative editing is supported within a session: you can refine the same image across several follow-up instructions without re-uploading the base image. The model does not support video editing, audio, or 3D output. For comparison, Adobe Firefly and Photoshop Generative Fill also offer natural language inpainting but are designed for single-image desktop workflows; gpt-image-2 is designed for API-driven, high-volume pipeline integration with programmatic control over reference images and masks.
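
The request shape described above (a prompt, up to 16 reference images, and an optional mask) can be validated client-side before upload. This is a minimal sketch under that description; `build_edit_request` is a hypothetical helper, and the dictionary keys mirror the `image` and `mask` parameters of the OpenAI Python SDK's `images.edit` method on the assumption that gpt-image-2 uses the same call.

```python
from pathlib import Path

MAX_REFERENCE_IMAGES = 16  # per-request limit stated above


def build_edit_request(prompt, reference_images, mask=None):
    """Validate inputs and assemble arguments for an edit request.

    Paths are not opened here; the caller opens them when sending.
    """
    refs = [Path(p) for p in reference_images]
    if not refs:
        raise ValueError("at least one image is required for an edit")
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    request = {"model": "gpt-image-2", "prompt": prompt, "image": refs}
    # Omit the mask entirely for natural-language inpainting.
    if mask is not None:
        request["mask"] = Path(mask)
    return request


# Natural-language edit: no mask, two reference images (hypothetical file names).
req = build_edit_request(
    "Replace the background with a plain white studio backdrop",
    ["shoe_front.png", "shoe_side.png"],
)
print(sorted(req))
```

To send the request, the caller would open each path in binary mode and pass the file objects to the editing call; that step is left out here because it needs real files and an API key.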

Does OpenAI train on images submitted to the GPT Image 2 API?

OpenAI does not train on API inputs by default. API inputs and outputs are retained for up to 30 days for abuse monitoring and then deleted unless flagged for policy review. Enterprise customers can request zero-retention agreements that eliminate this 30-day window entirely. GPT Image 2 is SOC 2 Type 2 compliant through OpenAI's enterprise tier and is HIPAA-eligible for qualifying customers. GDPR compliance is supported, with EU data residency options available. When using gpt-image-2 through Microsoft Azure AI Foundry, Microsoft's data handling terms and Azure AI Content Safety policies apply in addition to OpenAI's, and Azure's enterprise data protection commitments (including EU data boundary) govern API inputs on that deployment path. The model's underlying training data had a cutoff of December 2025; OpenAI has not disclosed the specific dataset composition. At inference time, the built-in web search feature may query live third-party sources before generating; the content of those queries is not retained beyond the session. Review OpenAI's privacy policy and enterprise DPA for the current data handling terms, as these may update after the June 2026 review date.

Who is GPT Image 2 best for and who should avoid it?

GPT Image 2 is best for product design and e-commerce teams that need consistent, photorealistic product shots from multiple reference angles at scale; the 16-reference-image support and natural language inpainting are the differentiating factors here. It is well suited to multilingual typographic design where text must appear accurately in Chinese, Japanese, Korean, or Arabic in the output image. Marketing agencies running high-volume localized content pipelines benefit from the Batch API's 50 percent discount combined with the prompt fidelity for complex brand scenes. Frontend engineers integrating image generation into web or mobile apps should account for the 8 to 25 second latency at medium quality and design async UX accordingly. Teams that should avoid gpt-image-2: those needing sub-2-second image generation for real-time or gaming applications (Flux or Imagen 4 Fast are better fits on speed); those building artistic or editorial imagery, where Midjourney v7's aesthetic quality and mood control remain superior; teams requiring on-premise or air-gapped deployment, where open-weights models like Flux or SDXL are necessary; and budget-constrained teams doing bulk generation without typographic requirements, where Imagen 4 Fast at $0.02 per image is cheaper than gpt-image-2's $0.053 medium-quality estimate (gpt-image-2's $0.006 low-quality estimate is lower than both).
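
One way to design around multi-second latency is to run the blocking generation call off the event loop and give it a deadline. The sketch below shows that pattern with Python's `asyncio`; `fake_generate` is a stand-in for a real API call, and the function names and timeout values are illustrative assumptions.

```python
import asyncio
import time


async def generate_with_timeout(generate, prompt, timeout_s=60.0):
    """Run a blocking generate(prompt) call in a worker thread with a deadline.

    Returns the result, or None if the deadline passes first. The worker
    thread is not interrupted on timeout; only the wait is abandoned.
    """
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(generate, prompt), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        return None


def fake_generate(prompt):
    """Stand-in for a slow image generation call."""
    time.sleep(0.1)
    return f"image for: {prompt}"


result = asyncio.run(generate_with_timeout(fake_generate, "product hero shot"))
print(result)
```

In an application, a `None` result would typically trigger a "still working" state or a retry rather than an error, since the medium-quality latency quoted above is well beyond typical interactive response times.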
