MAI-Thinking-1: Microsoft's 256K Reasoning Model (2026)

MAI-Thinking-1 is Microsoft AI's first in-house reasoning model (June 2026), 35B-active MoE, 256K context, 97.0% AIME 2025, 73.5% SWE-bench Verified.

MAI-Thinking-1 is Microsoft AI's first in-house reasoning model, released June 2, 2026 with a 35B-active/1T-total MoE architecture, 256K context window, and 97.0% AIME 2025 score. It is in Microsoft Foundry private preview with undisclosed pricing, and Microsoft says it matches Claude Opus 4.6 on 52.8% SWE-bench Pro despite a fraction of the active parameters.

MAI-Thinking-1, released June 2, 2026 by Microsoft AI, is a 35B-active/1T-total MoE reasoning model with a 256K context window. It scores 97.0% on AIME 2025 and 73.5% on SWE-bench Verified. Pricing is undisclosed during its Microsoft Foundry private preview, and it trades blows with Claude Opus 4.6 on SWE-bench Pro.

Provider: Microsoft · Family: MAI

Context window: 256,000 tokens

Input modalities: text, tool-calls · Output: text, tool-calls

About MAI-Thinking-1

MAI-Thinking-1 is Microsoft AI's first flagship in-house reasoning model, announced at Microsoft Build on June 2, 2026. It is a sparse Mixture of Experts model with 35 billion active parameters out of roughly 1 trillion total, built on the internal MAI-Base-1 foundation model and trained through what Microsoft calls its Hill-Climbing Machine pipeline. Unlike earlier Microsoft Copilot deployments that leaned on OpenAI models, MAI-Thinking-1 was trained from scratch on commercially licensed data with no distillation from any third-party model, positioning it as Microsoft's first serious bid at first-party frontier intelligence rather than a wrapper around a partner's weights. On benchmarks, MAI-Thinking-1 lands solidly in the upper-mid frontier tier for a 35B-active model. It scores 97.0% on AIME 2025 and 94.5% on AIME 2026, 84.9% on HMMT February 2026, and 84.2% on GPQA Diamond, putting its math and graduate-science reasoning close to much larger dense and MoE competitors. On coding, it reaches 87.7% on LiveCodeBench v6, 73.5% on SWE-bench Verified, and 52.8% on SWE-bench Pro, which Microsoft describes as toe-to-toe with Claude Opus 4.6 on SWE-bench Pro despite the smaller active-parameter footprint. MMLU-Pro sits at 85.0%. Terminal-Bench 2.0 (46.0%) and MultiChallenge (53.0%) scores are lower, suggesting the model is stronger at bounded reasoning tasks than at long, messy multi-turn agentic terminal work compared to the very top frontier models. The model ships with a 256,000-token context window, enough to hold roughly 600 pages of text in a single request. Microsoft has not published a max output token limit publicly as of this writing. In blind human side-by-side evaluations across 1,276 tasks, raters preferred MAI-Thinking-1's responses over Anthropic's Sonnet 4.6, a notable claim for a model roughly a third the active-parameter size of many rivals it is compared against. On modalities, MAI-Thinking-1 is a text-in, text-out reasoning model. It supports function calling, developer system instructions, and the Chat Completions API format, and is built for agentic and tool-use workloads rather than vision or audio input. Microsoft has not announced vision, audio, or video support for this model; that positioning is reserved for the sibling Image 2.5 and Voice 2 models released in the same MAI family wave. Pricing has not been publicly disclosed as of the June 2026 private preview launch. The model is available first through Microsoft Foundry in private preview with a public preview planned via the MAI Playground, and Microsoft has committed to distributing MAI-Thinking-1 through third-party inference providers including Fireworks AI, Baseten, and OpenRouter, plus AI gateway integrations (LiteLLM, Portkey, Azure AI Foundry Model Router, Helicone, Kong AI Gateway). No self-hosting or open-weights path exists; MAI-Thinking-1 is proprietary and API/Foundry-access only. Safety is built into the same reinforcement learning loop used for capability training rather than layered on afterward. Microsoft's team treated both unsafe compliance and unnecessary refusal as defects scored by the same hill-climbing reward signal, which the company argues produces a model that refuses genuinely harmful requests without the over-refusal pattern common in earlier safety-tuned models. Microsoft frames this as a step toward its stated 'Humanist Superintelligence' goal: powerful, domain-bounded AI designed to work in service of people rather than as an unbounded autonomous entity. Training data draws from commercially licensed sources with staggered cutoffs across content types: web HTML to around September 2025, web PDFs to December 2025, GitHub code to June 2025, and books/journals to March 2026. MAI-Thinking-1 is best suited for enterprise teams already inside the Microsoft ecosystem who want a first-party reasoning model for agentic coding, math-heavy analysis, and tool-calling workflows inside Foundry, Copilot, or Azure governance boundaries, without depending on OpenAI's models. It is a weaker fit for teams needing vision or audio input, teams needing published pricing today (rates are undisclosed during private preview), or teams needing an open-weights model they can self-host. Teams doing very long, messy multi-turn agentic terminal automation may find its Terminal-Bench 2.0 score (46.0%) and MultiChallenge score (53.0%) noticeably behind top frontier agentic models, and should evaluate against GPT-5-class or Claude Opus-class alternatives for those specific workloads.

Pricing

Pricing not publicly disclosed as of the June 2026 private preview launch on Microsoft Foundry. Watch the Azure AI Foundry Models pricing page for rate disclosure ahead of public preview.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is MAI-Thinking-1 and who built it?

MAI-Thinking-1 is Microsoft AI's first in-house flagship reasoning model, announced at Microsoft Build on June 2, 2026. It is a sparse Mixture of Experts model with 35 billion active parameters out of roughly 1 trillion total, built on Microsoft's internal MAI-Base-1 foundation model through what the company calls its Hill-Climbing Machine training pipeline. Unlike earlier Microsoft Copilot deployments built on OpenAI's models, MAI-Thinking-1 was trained entirely from scratch on commercially licensed data with zero distillation from any third-party model. It scores 97.0% on AIME 2025, 84.2% on GPQA Diamond, and 73.5% on SWE-bench Verified. Microsoft designed it to reduce Copilot and Foundry's dependency on OpenAI while competing with Claude Opus 4.6 on agentic coding benchmarks. It launched in private preview on Microsoft Foundry with a 256K context window.

How much does MAI-Thinking-1 cost per 1M tokens?

Microsoft has not publicly disclosed per-token pricing for MAI-Thinking-1 as of its June 2, 2026 private preview launch on Microsoft Foundry. No input, output, or cached-input rates, batch discount, or provisioned throughput pricing have been published. For comparison, competing reasoning models in this benchmark tier such as OpenAI's o3 price around $10 per 1M input and $40 per 1M output tokens, and Google's Gemini 2.5 Pro prices around $1.25 per 1M input and $10 per 1M output tokens with a 200K thinking budget, but these are not MAI-Thinking-1's actual prices. Microsoft says pricing will be announced via the Azure AI Foundry Models pricing page ahead of the model's public preview. There is no free tier and no option to self-host, since the model is proprietary and API-only. Anyone budgeting for production use should treat any third-party cost estimate for MAI-Thinking-1 today as speculative until Microsoft publishes official rates.

What is MAI-Thinking-1's context window and max output?

MAI-Thinking-1 ships with a 256,000-token context window, which Microsoft describes as enough to hold roughly 600 pages of text in a single request. Microsoft has not published a specific max output token limit for the model, which is unusual for a headline Build 2026 release; most 2026 frontier models disclose this figure alongside context window size. There is no publicly verified needle-in-haystack or long-context recall evaluation for MAI-Thinking-1 at launch. At 256K tokens, its context window sits below the 1M-token windows offered by Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, but above many mid-weight open-weights competitors. Document handling for PDFs or multi-file inputs has not been detailed in Microsoft's public materials. Developers should test actual generation length limits empirically during the Foundry private preview period.

How does MAI-Thinking-1 compare on benchmarks vs Claude Opus 4.6?

Microsoft states that MAI-Thinking-1 is toe-to-toe with Claude Opus 4.6 on SWE-bench Pro, where MAI-Thinking-1 scores 52.8% despite having far fewer active parameters (35B vs Opus 4.6's larger footprint). MAI-Thinking-1 scores 73.5% on SWE-bench Verified, 97.0% on AIME 2025, 94.5% on AIME 2026, and 84.2% on GPQA Diamond. In blind human side-by-side evaluations across 1,276 tasks, raters preferred MAI-Thinking-1's responses over Anthropic's Sonnet 4.6, though this comparison is against Sonnet rather than Opus. These benchmark figures are Microsoft-reported at launch and have not yet been independently cross-verified by third-party leaderboards like LMArena or Artificial Analysis. A roughly 3-5 point gap on SWE-bench Pro or Verified translates to meaningfully fewer one-shot correct pull requests on real agentic coding tasks. Microsoft has not published ARC-AGI 2 or LMArena Elo scores for MAI-Thinking-1, an absence that makes independent cross-model ranking harder until those numbers appear.

Is MAI-Thinking-1 open source or proprietary?

MAI-Thinking-1 is fully proprietary and closed-weight. There is no Hugging Face release, no downloadable weights, and no open or research-only license variant. Access is exclusively through Microsoft Foundry, currently in private preview as of June 2026, with a public preview planned via the MAI Playground. Microsoft has also committed to distributing the model through third-party inference providers including Fireworks AI, Baseten, and OpenRouter, plus AI gateway integrations like LiteLLM, Portkey, Azure AI Foundry Model Router, Helicone, and Kong AI Gateway, but none of these routes involve open weights; they are all hosted API access to the same closed model. There is no self-hosting path, no quantized GGUF release, and no VRAM requirement to publish since the model cannot be run locally. Commercial use is gated behind Microsoft Foundry's preview terms rather than a public license file.

What modalities does MAI-Thinking-1 support?

MAI-Thinking-1 is a text-in, text-out reasoning model. Confirmed input modalities are text and tool-call payloads; confirmed output modalities are text and tool-calls. Microsoft has not announced vision, audio, or video input or output support for MAI-Thinking-1, unlike some competing frontier models that ship multimodal by default. The model supports native function calling, structured output, and a distinct developer-instruction role compatible with the Chat Completions API format. Parallel tool calls and computer-use style agent loops have not been explicitly confirmed in Microsoft's public materials. For multimodal workloads, Microsoft's separate Image 2.5 and Voice 2 models, released in the same MAI family wave, handle image and speech generation instead of MAI-Thinking-1 itself. Teams needing a single model that handles both reasoning and vision should look at Claude Opus 4.8, GPT-5.5, or Gemini 3.1 Pro instead.

Does MAI-Thinking-1 train on user data?

Microsoft has not publicly detailed MAI-Thinking-1's default API data retention or training-on-inputs policy as of its June 2026 private preview launch. The model is served through Microsoft Foundry, which Microsoft markets as offering enterprise governance and Azure data residency controls, but specific SOC 2 Type II, ISO 27001, HIPAA-eligibility, and GDPR compliance statements have not been published for this specific model at launch, unlike some longer-established Azure OpenAI Service offerings. There is no confirmed zero-retention enterprise tier disclosed yet for MAI-Thinking-1 specifically. Given Microsoft's broader Azure compliance posture, enterprise customers evaluating this model for regulated workloads should request explicit data handling and compliance documentation directly from their Microsoft account team rather than relying on general Azure AI Foundry claims, since MAI-Thinking-1's own compliance certifications had not been separately published as of this writing.

Who is MAI-Thinking-1 best for and who should avoid it?

MAI-Thinking-1 is best for enterprise teams already standardized on Microsoft Azure and Foundry who want a first-party reasoning model for agentic coding, math-heavy analysis, and tool-calling workflows without depending on OpenAI's models inside Copilot. It is also a strong fit for teams whose reasoning workloads lean toward math and graduate-science problems, given its 97.0% AIME 2025 and 84.2% GPQA Diamond scores. Teams should avoid MAI-Thinking-1 if they need vision or audio input today, since it is text-and-tool-calls only. Cost-sensitive teams needing firm budget numbers should also wait, since pricing remains undisclosed during private preview as of June 2026. Teams running long, messy multi-turn terminal agent automation should consider alternatives too, since MAI-Thinking-1's Terminal-Bench 2.0 score (46.0%) and MultiChallenge score (53.0%) trail top frontier agentic models like Claude Opus 4.6 and GPT-5.5 on those specific axes.

Visit MAI-Thinking-1 Official Page