MiniMax M3: 1M Context & 59% SWE-Bench Pro (2026)

MiniMax M3 (June 2026) scores 59% on SWE-Bench Pro with a 1M-token context window and open weights, priced at $0.30 input / $1.20 output per 1M tokens.

MiniMax M3 is MiniMax's open-weight flagship, released June 1, 2026 with a 1,048,576-token context window, native text/image/video input, and a 59.0% SWE-Bench Pro score that beats GPT-5.5 and Gemini 3.1 Pro. It costs $0.30 per 1M input tokens and $1.20 per 1M output tokens, roughly 1/20th of comparable frontier pricing, and uses a new MiniMax Sparse Attention architecture for 15.6x faster decoding at 1M tokens.

MiniMax M3, released June 1, 2026 by MiniMax, is an open-weight multimodal model with a 1,048,576-token context window and a 59.0% SWE-Bench Pro score, beating GPT-5.5 and Gemini 3.1 Pro. It is priced at $0.30 per 1M input tokens and $1.20 per 1M output tokens, with MiniMax Sparse Attention giving 15.6x faster decoding at full context.

Provider: MiniMax · Family: MiniMax M3

Context window: 1,048,576 tokens · Max output: 512,000

Input modalities: text, image, video, tool-calls · Output: text, tool-calls

About MiniMax M3

MiniMax M3 is the latest large language model from MiniMax, a Shanghai-based AI lab listed on the Hong Kong Stock Exchange (0100.HK) since January 2026. MiniMax released M3 on June 1, 2026, as a Mixture-of-Experts model with roughly 229.9 billion total parameters and 9.8 billion active parameters per token across 256 fine-grained experts. The headline architectural change is MiniMax Sparse Attention (MSA), a new sparse attention scheme that replaces the full-attention backbone used in the prior M2 generation. M3 sits at the top of MiniMax's M-series lineup, succeeding MiniMax-M2.7 (released March 18, 2026, open-weighted in April 2026), and was designed to solve three problems at once: make 1M-token context windows computationally practical, add native multimodal input from the start of pretraining rather than as an afterthought, and push agentic coding performance past frontier US and Chinese rivals at a fraction of the price. On MiniMax's own June 2026 release benchmarks, M3 scores 59.0% on SWE-Bench Pro, which the company says surpasses both GPT-5.5 and Gemini 3.1 Pro on the same test. It also scores 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas (a tool-use benchmark), 70.06% on OSWorld-Verified (computer-use), 34.8% on SWE-fficiency, and 28.8% on KernelBench Hard. On multimodal evaluations, M3 reaches 84.8% on Video-MME, 81.4% on VideoMMMU, 75.1% on MMMU Pro, and 80.8% on OmniDocBench. On math olympiad benchmarks it solved 25 of 42 problems on IMO 2025 and 17 of 42 on USAMO 2026. A widely cited BrowseComp score of 83.5 is reported to beat Claude Opus 4.7's 79.3 on the same autonomous-browsing benchmark. Some third-party aggregators list a GPQA Diamond score near 92.9%, but that figure does not appear in MiniMax's own benchmark table and should be treated as unverified until MiniMax publishes its technical report. M3's headline feature is a 1,048,576-token (1M) context window with a 512,000-token maximum output, both roughly 5x the 204,800-token context of M2.7. The MSA architecture is designed specifically to make that window usable: MiniMax reports a 9.7x speedup in prefill and a 15.6x speedup in decode at 1M tokens compared to M2's full-attention design, cutting per-token compute at full context to roughly 1/20th of the previous generation. MSA keeps a Grouped-Query Attention backbone but adds block-level selection over real, uncompressed key-value pairs, which MiniMax contrasts with DeepSeek's Multi-head Latent Attention approach of compressing keys and values into a low-dimensional latent space. Independent time-to-first-token figures from Artificial Analysis put M3 at around 2.59 seconds on the MiniMax-hosted endpoint, with output speed around 54.8 tokens per second, which is workable for agentic loops but not the fastest option for latency-sensitive chat. M3 is natively multimodal for input, accepting text, images, and video, with text as the only confirmed output modality. It supports function calling, structured outputs, and an optional 'thinking' reasoning mode that can be toggled per request at no extra price premium. MiniMax also demonstrated computer-use: M3 can operate a desktop environment directly, scoring 70.06% on OSWorld-Verified. Despite MiniMax's broader audio product line (Speech 2.8, Music 2.5+), M3 itself does not support audio input or output, so voice workflows require pairing it with a separate MiniMax Audio endpoint or third-party ASR/TTS. MiniMax trained these modalities in together from the start of pretraining using interleaved text, image, and video data, which the company says scales better than bolting on synthetic multimodal data after the fact. M3's standard API pricing is $0.30 per 1M input tokens and $1.20 per 1M output tokens for prompts up to 512K input tokens, confirmed live on OpenRouter and MiniMax's own platform; this is a launch promotion, and MiniMax has indicated the list price doubles to $0.60/$2.40 once the promotion ends. A separate, higher long-context rate applies above 512K input tokens, though the exact multiplier was not published at launch. MiniMax also sells monthly Token Plans that pool usage across text, image, speech, and music: Plus at $20/month (~1.7B M3 tokens), Max at $50/month (~5.1B tokens), and Ultra at $120/month (~9.8B tokens). At the promotional per-token rate, summarizing a 100K-token document costs about $0.03, a daily coding-agent run of 1M input and 200K output tokens costs about $0.54, and a full 1M-token context analysis with a 50K-token response costs roughly $0.38, all well under comparable runs on GPT-5.5 or Gemini 3.1 Pro. M3 is available directly through MiniMax's own API platform (platform.minimax.io) and MiniMax Code, plus third-party inference providers including OpenRouter, Fireworks AI (with Day-0 support), and Together AI. As an open-weight model, M3's weights and technical report were published on Hugging Face (MiniMaxAI/MiniMax-M3) and GitHub (MiniMax-AI/MiniMax-M3) within about ten days of the June 1, 2026 API launch, under a MiniMax Community License similar in spirit to the modified-MIT terms used for M2.7. At 229.9B total parameters, self-hosting the full BF16 weights requires roughly 480GB or more of GPU memory, realistically a multi-node cluster of 8x H100/H200-class GPUs, though FP8 quantization can reduce that footprint. No AWS Bedrock, Google Vertex, or Azure listing was found for M3 as of mid-June 2026; MiniMax's infrastructure spans North America, Europe, and Asia-Pacific data regions. MiniMax has not published a system card, responsible scaling policy, or named third-party red-teaming partners for M3, consistent with the company's pattern for the M-series generally. Safety behavior is governed by MiniMax's general API terms of service rather than a dedicated model safety report, and the company states that customer API inputs are not stored or used for training by default unless a customer opts in. No SOC 2 Type II, ISO 27001, or HIPAA-eligible certification has been published for MiniMax's platform, which limits M3's appeal for regulated US and EU enterprise workloads compared with Anthropic, OpenAI, or Google models that publish detailed model and system cards. M3 is a strong fit for agentic coding teams that need a 1M-token context window at a fraction of frontier pricing, for teams building autonomous browsing or computer-use agents (BrowseComp 83.5, OSWorld-Verified 70.06%), and for teams that want to self-host or fine-tune an open-weight frontier-class model rather than depend on a closed API. It is a weaker choice for voice-first products, since it has no native audio I/O, and for regulated enterprises that require SOC 2 or HIPAA-eligible vendors, where Anthropic's Claude or OpenAI's GPT-5.5 are safer procurement choices. Teams that need the absolute fastest short-prompt latency may also prefer GPT-5.5 mini-class models, since M3's ~54.8 tokens/sec and ~2.6 second time-to-first-token on the MiniMax endpoint trail smaller, latency-optimized models. MiniMax has not disclosed a specific training data cutoff date or detailed dataset composition for M3, continuing the pattern from M2.7 where training data composition was largely undisclosed. The model was trained with native multimodal interleaved data rather than text-first pretraining followed by multimodal fine-tuning. As with MiniMax's other API products, the company states it does not train on customer inputs by default, and enterprise customers can choose data processing regions across North America, Europe, and Asia-Pacific. As a Shanghai-headquartered, Hong Kong-listed company, MiniMax also operates under China's Generative AI Measures from the Cyberspace Administration of China alongside any GDPR obligations for EU-facing products. M3 is a generational leap over M2.7: it swaps full attention for MSA sparse attention, expands the context window from 204,800 to 1,048,576 tokens, adds native multimodal (image and video) input that M2.7 lacked, and lifts SWE-Bench Pro from 56.2% to 59.0%. MiniMax has not announced an M3.1 or successor as of mid-June 2026, and M2.7 remains available for users who do not need the larger context window or multimodal input. Given MiniMax's cadence of roughly one major M-series release per quarter (M2.5 in February, M2.7 in March, M3 in June 2026), a further update is plausible later in 2026.

Pricing

$0.30 per 1M input tokens / $1.20 per 1M output tokens for prompts up to 512K input, confirmed on OpenRouter and MiniMax's own platform as of June 2026. This is a launch promotion; MiniMax has indicated the list price doubles to $0.60/$2.40 once the promotion ends. A higher long-context rate applies above 512K input tokens (exact multiplier not published). Monthly Token Plans (Plus $20, Max $50, Ultra $120) pool usage across text, image, speech, and music.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is MiniMax M3 and who built it?

MiniMax M3 is a large language model released on June 1, 2026 by MiniMax, a Shanghai-based AI lab listed on the Hong Kong Stock Exchange (0100.HK). It is a Mixture-of-Experts model with roughly 229.9 billion total parameters and 9.8 billion active parameters per token across 256 fine-grained experts. Its headline architectural feature is MiniMax Sparse Attention (MSA), a new sparse attention scheme built on a Grouped-Query Attention backbone that makes a 1,048,576-token context window computationally practical. M3 scores 59.0% on SWE-Bench Pro, which MiniMax says beats GPT-5.5 and Gemini 3.1 Pro on the same benchmark, plus 66.0% on Terminal-Bench 2.1 and 83.5 on BrowseComp. It succeeds MiniMax-M2.7 (March 2026) and was designed to combine frontier agentic coding, a 1M-token context window, and native multimodal input in a single model. M3 sits at the top of MiniMax's M-series lineup and is positioned to compete directly with GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7 on agentic and coding benchmarks.

How much does MiniMax M3 cost per 1M tokens?

MiniMax M3's standard API pricing is $0.30 per 1M input tokens and $1.20 per 1M output tokens for prompts up to 512K input tokens, confirmed live on both OpenRouter and MiniMax's own platform as of June 2026. This is a launch promotional rate; MiniMax has indicated the list price will double to $0.60 input / $2.40 output once the promotion ends. A separate, higher long-context rate applies above 512K input tokens, though MiniMax has not published the exact multiplier. At the promotional rate, summarizing a 100K-token document costs about $0.03, a daily coding-agent run of 1M input and 200K output tokens costs about $0.54, and a full 1M-token context analysis with a 50K-token response costs roughly $0.38. MiniMax also offers monthly Token Plans (Plus $20, Max $50, Ultra $120) that pool usage across text, image, speech, and music models. This pricing is roughly 1/20th of comparable frontier model pricing from OpenAI or Google. Because M3 is open-weight, self-hosting is also possible for teams with the GPU capacity, eliminating per-token costs entirely.

What is MiniMax M3's context window and max output?

MiniMax M3 has a context window of 1,048,576 tokens (1M) and a maximum output of 512,000 tokens, both roughly 5x the 204,800-token context window of its predecessor, MiniMax-M2.7. This is made usable by MiniMax Sparse Attention (MSA), which MiniMax reports delivers a 9.7x speedup in prefill and a 15.6x speedup in decode at 1M tokens compared to M2's full-attention design, cutting per-token compute at full context to roughly 1/20th of the previous generation. MSA uses a Grouped-Query Attention backbone with block-level selection over real, uncompressed key-value pairs, differing from DeepSeek's Multi-head Latent Attention approach. Independent measurements from Artificial Analysis put M3's time-to-first-token around 2.59 seconds and output speed around 54.8 tokens per second on the MiniMax-hosted endpoint. A separate, higher-priced long-context tier applies for inputs above 512K tokens. M3's 1M-token window is among the largest available in any open-weight model as of mid-2026, exceeding GPT-5.5 and matching or exceeding Gemini 3.1 Pro's long-context tier.

How does MiniMax M3 compare on benchmarks vs GPT-5.5 and Gemini 3.1 Pro?

MiniMax reports that M3 scores 59.0% on SWE-Bench Pro, which it says surpasses both GPT-5.5 and Gemini 3.1 Pro on the same benchmark, though MiniMax did not publish the exact competitor scores in its release blog. On BrowseComp, an autonomous-browsing benchmark, M3 scores 83.5, ahead of Claude Opus 4.7's reported 79.3. M3 also scores 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas (tool use), and 70.06% on OSWorld-Verified (computer-use). These are vendor-reported figures from MiniMax's own June 2026 release blog and have not yet been independently reproduced by third-party leaderboards. Some aggregator sites cite a GPQA Diamond score near 92.9% for M3, but that figure is absent from MiniMax's own benchmark table and should be treated as unverified. A 3-point SWE-Bench Pro gap in agentic coding typically translates to meaningfully more successful end-to-end task completions on real repositories, but until independent benchmarks confirm M3's numbers, buyers should treat the GPT-5.5 and Gemini 3.1 Pro comparisons as MiniMax's own framing rather than neutral third-party results.

Is MiniMax M3 open source or proprietary?

MiniMax M3 is open-weight: MiniMax released the model's weights and technical report on Hugging Face (MiniMaxAI/MiniMax-M3) and GitHub (MiniMax-AI/MiniMax-M3) within about ten days of the June 1, 2026 API launch, under a MiniMax Community License similar in spirit to the modified-MIT terms used for the predecessor M2.7. This license is closer to 'open weights' than to a fully permissive open-source license like Apache 2.0 or MIT, and may restrict certain commercial uses; developers should review the license file on Hugging Face before commercial deployment. Self-hosting the full 229.9B-parameter model at BF16 requires roughly 480GB or more of GPU memory, realistically a multi-node cluster of 8x H100/H200-class GPUs, with FP8 quantization available to reduce that footprint. Alongside the open weights, M3 is also available as a hosted API directly from MiniMax and through third-party providers including OpenRouter, Fireworks AI, and Together AI for teams that prefer not to self-host.

What modalities does MiniMax M3 support?

MiniMax M3 accepts text, image, and video as input, with text as the confirmed output modality, plus support for function calling, structured outputs, and tool-calls. It supports an optional 'thinking' reasoning mode that can be toggled per request at the same per-token price, useful for multi-step coding and agentic tasks. M3 can also operate a desktop computer-use environment directly, scoring 70.06% on OSWorld-Verified. Despite MiniMax marketing M3 as 'natively multimodal' and the company's broader audio product line (Speech 2.8, Music 2.5+), M3 itself does not support audio input or output; voice workflows require pairing it with a separate MiniMax Audio endpoint or third-party ASR/TTS. On multimodal benchmarks, M3 scores 84.8% on Video-MME, 81.4% on VideoMMMU, 75.1% on MMMU Pro, and 80.8% on OmniDocBench, indicating strong document and video understanding relative to text-only open-weight models.

Does MiniMax M3 train on user data?

MiniMax states that customer API inputs are not stored or used for training by default, and customers can opt in if they want their data used for model improvement. However, MiniMax has not published a specific data retention period for M3 separately from its general API terms, nor a dedicated trust center page. No SOC 2 Type II report, ISO 27001 certificate, or HIPAA-eligible tier has been published for MiniMax's platform as of June 2026. MiniMax does offer data residency choices across North America, Europe, and Asia-Pacific processing regions for enterprise customers. M3's formal EU AI Act classification has not been published, though as a general-purpose open-weight model it would likely fall under general-purpose AI provider obligations if offered to EU users. As a Shanghai-headquartered, Hong Kong-listed company, MiniMax also operates under China's Generative AI Measures from the Cyberspace Administration of China. Because M3 is open-weight, self-hosted deployments avoid sending any data to MiniMax at all.

Who is MiniMax M3 best for and who should avoid it?

MiniMax M3 is best for agentic coding teams that need a 1M-token context window at a fraction of frontier pricing ($0.30/$1.20 per 1M tokens versus typical frontier rates), for teams building autonomous browsing or computer-use agents given its 83.5 BrowseComp and 70.06% OSWorld-Verified scores, and for teams that want to self-host or fine-tune an open-weight frontier-class model rather than depend on a closed API. Teams should avoid M3 for voice-first products, since it has no native audio input or output and would need a separate ASR/TTS pipeline such as MiniMax's Speech 2.8. Regulated enterprises requiring SOC 2 Type II, ISO 27001, or HIPAA-eligible vendors should consider Anthropic's Claude or OpenAI's GPT-5.5 instead, since MiniMax has not published equivalent certifications. Teams needing the absolute fastest short-prompt latency may also prefer smaller, latency-optimized models, since M3's roughly 2.59-second time-to-first-token and 54.8 tokens/sec output speed on the MiniMax endpoint trail dedicated low-latency models.

Visit MiniMax M3 Official Page