Name: MAI-Image-2.5: Microsoft's #2 Arena Image Editor (2026)
Brand: Microsoft AI
Price: 5.00 USD
Availability: InStock

Question 1

What is MAI-Image-2.5 and who built it?

Accepted Answer

MAI-Image-2.5 is a diffusion-based text-to-image and image-editing model built by Microsoft AI, the in-house model group Microsoft formed in November 2025 under CEO Mustafa Suleyman to reduce its dependence on OpenAI. Microsoft announced it on June 2, 2026 at Build 2026, alongside seven other new MAI models covering voice, transcription, coding and reasoning. It is the fourth release in Microsoft's image-model lineage, following MAI-Image-1 (October 2025), MAI-Image-2 (March 2026) and MAI-Image-2-Efficient. The model ships in two SKUs: the full MAI-Image-2.5 and the faster, cheaper MAI-Image-2.5-Flash. On Arena, the crowd-voted image model leaderboard, it ranks #3 for text-to-image (Elo ~1,269) and #2 for image editing (Elo ~1,401), a 75-point composite gain over MAI-Image-2. It was designed specifically to compete with GPT-Image-2 and Google's Nano Banana line while giving Microsoft an owned alternative to licensing OpenAI's image models. No architecture paper has been published, and Microsoft has not disclosed an official parameter count.

Question 2

How much does MAI-Image-2.5 cost per 1M tokens?

Accepted Answer

On Microsoft Foundry, the standard MAI-Image-2.5 SKU costs $5.00 per 1M text-input tokens, $8.00 per 1M image-input tokens, and $47.00 per 1M image-output tokens. The Flash SKU is cheaper across the board: $1.75 per 1M text-input tokens, $1.75 per 1M image-input tokens, and $19.50 per 1M image-output tokens, roughly 41% below MAI-Image-2's pricing. Generating 100 marketing images (about 1M output tokens, or roughly 10,000 per image) costs roughly $47 on the standard SKU or $19.50 on Flash. A 500-image batch edit job on Flash (about 2M output tokens, or roughly 4,000 per edit) runs about $39. These per-image token counts are estimates, and actual usage depends on output size and the type of job. By comparison, GPT-Image-2 access typically costs more per image at equivalent quality tiers, which is part of Microsoft's cost pitch for the MAI line. Enterprise customers can reserve provisioned throughput (PTU) capacity for predictable workloads, though Microsoft has not published PTU rates. There is no flat per-image consumer price; all billing runs through the token-based Foundry API.
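The token-based billing above reduces to simple arithmetic. The following is a minimal sketch, not an official calculator: the rates are the ones quoted in this answer, while the function name `estimate_cost` and the SKU keys `standard` and `flash` are hypothetical labels chosen for illustration.

```python
# Per-1M-token rates in USD, copied from the answer above.
# The dictionary keys are illustrative labels, not official SKU identifiers.
RATES_PER_MILLION = {
    "standard": {"text_in": 5.00, "image_in": 8.00, "image_out": 47.00},
    "flash": {"text_in": 1.75, "image_in": 1.75, "image_out": 19.50},
}


def estimate_cost(sku, text_in=0, image_in=0, image_out=0):
    """Return the estimated USD cost for the given token counts on one SKU."""
    rates = RATES_PER_MILLION[sku]
    total = (
        text_in * rates["text_in"]
        + image_in * rates["image_in"]
        + image_out * rates["image_out"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # About 1M output tokens on the standard SKU.
    print(f"{estimate_cost('standard', image_out=1_000_000):.2f}")  # 47.00
    # About 2M output tokens on Flash.
    print(f"{estimate_cost('flash', image_out=2_000_000):.2f}")  # 39.00
```

Because the real token count per image is not fixed, it is safer to feed this kind of estimate with token usage measured on a small pilot batch than with a guessed per-image figure.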

Question 3

What is MAI-Image-2.5's context window and output resolution?

Accepted Answer

MAI-Image-2.5 accepts up to 32,000 tokens of text context per request, enough for detailed prompts, style references and multi-step edit instructions in a single call. It natively outputs images up to 1024x1024 pixels across seven supported aspect ratios: 1:1, 4:3, 3:4, 16:9, 9:16, 3:2 and 2:3. Microsoft has not published a separate high-resolution or upscaling tier for MAI-Image-2.5, so 1024x1024 should be treated as the model's native output ceiling rather than a minimum. There is no long-context recall metric to report since this is an image-output model, not a text-generation model; the 32K context figure governs the prompt and edit instructions the model can process, not conversational memory. For document or multi-file inputs, Microsoft has not documented a dedicated PDF or multi-image batch mode beyond single-image editing calls.
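To make the 1024x1024 ceiling concrete, here is a rough sketch of what each listed aspect ratio could look like in pixels. It assumes the longer side is capped at 1024 and the shorter side is scaled to match; that rule is an assumption for illustration, since the exact pixel dimensions per ratio are not documented here, and `fit_dimensions` is a hypothetical helper.

```python
MAX_SIDE = 1024  # native output ceiling stated in the answer above
ASPECT_RATIOS = ["1:1", "4:3", "3:4", "16:9", "9:16", "3:2", "2:3"]


def fit_dimensions(ratio, max_side=MAX_SIDE):
    """Return (width, height) with the longer side equal to max_side.

    Assumption: the model caps the longer side and scales the shorter one.
    The real per-ratio dimensions may differ.
    """
    w, h = (int(part) for part in ratio.split(":"))
    if w >= h:
        return max_side, round(max_side * h / w)
    return round(max_side * w / h), max_side


if __name__ == "__main__":
    for ratio in ASPECT_RATIOS:
        width, height = fit_dimensions(ratio)
        print(f"{ratio:>5} -> {width}x{height}")
```

Under this assumption a 16:9 request yields 1024x576, well short of common print or 4K display sizes, which is why the absence of an upscaling tier matters for some workloads.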

Question 4

How does MAI-Image-2.5 compare on benchmarks vs GPT-Image-2?

Accepted Answer

On Arena's image-editing leaderboard, GPT-Image-2 holds the #1 spot with MAI-Image-2.5 close behind at #2 (Elo ~1,401), ahead of Google's Nano Banana 2.1, DALL-E 3 and Ideogram 2.0. On the text-to-image leaderboard, MAI-Image-2.5 ranks #3 at Elo ~1,269, level with Google's Nano Banana 2 but behind GPT-Image-2 and the leaderboard's top entry. In practice, the #2-versus-#1 gap on editing means MAI-Image-2.5 is competitive but not the clear leader for pure edit-quality workloads, while its pricing (as low as $19.50 per 1M output tokens on Flash) undercuts typical GPT-Image-2 access costs. Neither Microsoft nor independent researchers have published FID scores, CLIP scores, or other standardized metrics comparing the two models directly, so the Arena Elo ranking is the only verifiable head-to-head benchmark available as of this writing. Teams should run their own task-specific evaluation before assuming Arena rank translates directly to their use case.
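An Elo gap is easier to interpret as an expected head-to-head preference rate. The sketch below uses the conventional logistic Elo formula with a 400-point scale. Whether Arena's ratings follow exactly this scale is an assumption, so the result is a rough guide and not a reported statistic. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(elo_gap):
    """Expected share of pairwise votes won by the higher-rated model.

    Uses the conventional Elo formula E = 1 / (1 + 10 ** (-gap / 400)).
    Assumption: the leaderboard's ratings behave like standard Elo.
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))


if __name__ == "__main__":
    # The 75-point composite gain cited for MAI-Image-2.5 over MAI-Image-2.
    print(round(expected_win_rate(75), 2))  # 0.61
    # Equal ratings give an even split.
    print(expected_win_rate(0))  # 0.5
```

Read this way, a 75-point gain means the newer model would be preferred in roughly six of ten blind comparisons against its predecessor, which is a real but not overwhelming margin.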

Question 5

Is MAI-Image-2.5 open source or proprietary?

Accepted Answer

MAI-Image-2.5 is fully proprietary. Microsoft has not released model weights, an architecture paper, or a GitHub repository for any model in the MAI-Image line. Access runs through Microsoft's own surfaces (the Microsoft Foundry (Azure AI Foundry) API, the MAI Playground, and PowerPoint's built-in image tools) and through the third-party routers OpenRouter, Fireworks AI and Baseten, which resell API access under Microsoft's terms. There is no open license of any kind to cite, no Hugging Face listing, and no self-hosting option; VRAM and quantization questions do not apply since the weights are never distributed. Commercial use is governed entirely by the Microsoft Foundry Model Terms tied to an Azure subscription, not a separate open-source license.

Question 6

What modalities does MAI-Image-2.5 support?

Accepted Answer

MAI-Image-2.5 accepts text prompts and image inputs, and produces image outputs only; it does not generate or accept audio or video. Its defining new capability versus earlier MAI-Image releases is image-to-image editing: given an existing image plus a text instruction, it can replace a single object, update in-image text, or remove motion blur while leaving the rest of the frame untouched, a feature Microsoft calls surgical editing. Text-to-image generation from a prompt alone remains fully supported, as it was in MAI-Image-1 and MAI-Image-2. The model does not support function calling, structured JSON output, or tool use in the way LLMs do, since its sole output modality is a rendered image. There is no confirmed computer-use or agentic-loop capability for this model; it functions as a single-turn image generation and editing endpoint.

Question 7

Does MAI-Image-2.5 train on user data?

Accepted Answer

Microsoft has not published a MAI-Image-2.5-specific data retention or training-on-inputs policy beyond its general Microsoft Foundry data handling terms, so this should be treated as unconfirmed rather than assumed. The model's published model card focuses on safety evaluation (a two-phase pre-mitigation and post-mitigation process) rather than data retention specifics. Microsoft's model card lists Microsoft Ireland Operations Limited as the EU regulatory contact, suggesting standard Microsoft compliance infrastructure applies, but no SOC 2, ISO 27001, HIPAA, or GDPR compliance statement specific to this model has been located in public sources. Microsoft began adding C2PA content-provenance metadata to Microsoft 365 content in February 2026, but whether MAI-Image-2.5's API output specifically carries C2PA manifests is not explicitly confirmed in the public model card. Enterprises with strict data-handling requirements should confirm retention and training terms directly with their Microsoft Foundry account team before relying on assumptions.

Question 8

Who is MAI-Image-2.5 best for and who should avoid it?

Accepted Answer

MAI-Image-2.5 is best for Microsoft 365 teams doing in-document image generation and editing directly inside PowerPoint, with OneDrive integration rolling out; Azure developers building commercial and packaging imagery where in-image text quality matters, since text rendering improved 107 Arena points over MAI-Image-2; and cost-sensitive teams running high-volume batch edits on the Flash SKU at $19.50 per 1M output tokens. It is a poor fit for independent creators or hobbyists who want a one-click consumer app, since the primary access path runs through Azure/Microsoft Foundry developer tooling rather than a standalone app comparable to Midjourney or DALL-E 3 in ChatGPT, and Bing Image Creator/Copilot rollout for the 2.5 generation was not confirmed live at launch. Teams that specifically need a higher-ranked Arena text-to-image model than a #3 placement should evaluate GPT-Image-2 and the leaderboard's top entry first. Researchers or procurement teams that require a disclosed architecture paper, parameter count, or training data cutoff will also find MAI-Image-2.5's documentation thinner than some competitors.
