Name: MiniMax M3: 1M Context & 59% SWE-Bench Pro (2026)
Brand: MiniMax
Price: 0.30 USD
Availability: InStock

Question 1

What is MiniMax M3 and who built it?

Accepted Answer

MiniMax M3 is a large language model released on June 1, 2026 by MiniMax, a Shanghai-based AI lab listed on the Hong Kong Stock Exchange (0100.HK). It is a Mixture-of-Experts model with roughly 229.9 billion total parameters and 9.8 billion active parameters per token across 256 fine-grained experts. Its headline architectural feature is MiniMax Sparse Attention (MSA), a new sparse attention scheme built on a Grouped-Query Attention backbone that makes a 1,048,576-token context window computationally practical. M3 scores 59.0% on SWE-Bench Pro, which MiniMax says beats GPT-5.5 and Gemini 3.1 Pro on the same benchmark, plus 66.0% on Terminal-Bench 2.1 and 83.5 on BrowseComp. It succeeds MiniMax-M2.7 (March 2026) and was designed to combine frontier agentic coding, a 1M-token context window, and native multimodal input in a single model. M3 sits at the top of MiniMax's M-series lineup and is positioned to compete directly with GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7 on agentic and coding benchmarks.
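To make the Mixture-of-Experts figures above concrete, the short sketch below computes what share of the model's parameters is active for any one token. It uses only the two parameter counts quoted in the answer; the function name is illustrative and not part of any MiniMax tooling.

```python
# Parameter counts quoted in the answer above, in billions.
TOTAL_PARAMS_B = 229.9
ACTIVE_PARAMS_B = 9.8


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used per token in a sparse MoE model."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {frac:.1%}")
```

Per-token compute scales with the active parameters rather than the total, which is why a model of this size can be served at prices closer to those of a much smaller dense model.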

Question 2

How much does MiniMax M3 cost per 1M tokens?

Accepted Answer

MiniMax M3's standard API pricing is $0.30 per 1M input tokens and $1.20 per 1M output tokens for prompts up to 512K input tokens, confirmed live on both OpenRouter and MiniMax's own platform as of June 2026. This is a launch promotional rate; MiniMax has indicated the list price will double to $0.60 input / $2.40 output once the promotion ends. A separate, higher long-context rate applies above 512K input tokens, though MiniMax has not published the exact multiplier. At the promotional rate, summarizing a 100K-token document costs about $0.03 in input tokens, and a daily coding-agent run totaling 1M input and 200K output tokens costs about $0.54, assuming each individual request stays under 512K input tokens. A single full 1M-token context analysis with a 50K-token response works out to $0.36 at the standard rate, but because that prompt exceeds 512K input tokens the higher long-context rate applies, so the real cost will be higher. MiniMax also offers monthly Token Plans (Plus $20, Max $50, Ultra $120) that pool usage across text, image, speech, and music models. This pricing is roughly 1/20th of comparable frontier model pricing from OpenAI or Google. Because M3 is open-weight, self-hosting is also possible for teams with the GPU capacity, which replaces per-token API fees with hardware and operating costs.
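The per-request arithmetic behind these figures can be checked with a small estimator. This is a minimal sketch that covers the standard tier only: the rates are the promotional ones quoted above, the 512,000-token threshold is an assumed reading of "512K", and the long-context surcharge is not modeled because MiniMax has not published it. The function names are illustrative.

```python
from decimal import Decimal

# Promotional standard-tier rates quoted above, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.30")
OUTPUT_USD_PER_M = Decimal("1.20")

# Assumption: "512K" is read as 512,000 tokens; the exact cutoff is not stated.
STANDARD_TIER_MAX_INPUT = 512_000


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: Decimal = INPUT_USD_PER_M,
                  output_rate: Decimal = OUTPUT_USD_PER_M) -> Decimal:
    """Estimate the USD cost of a workload at standard-tier rates only."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * input_rate
            + Decimal(output_tokens) * output_rate) / million


def exceeds_standard_tier(input_tokens: int) -> bool:
    """True when a single prompt would fall under the long-context rate."""
    return input_tokens > STANDARD_TIER_MAX_INPUT


print(estimate_cost(100_000, 0))           # 100K-token document, input only
print(estimate_cost(1_000_000, 200_000))   # daily agent run, summed over requests
print(estimate_cost(1_000_000, 50_000))    # 1M-token prompt, standard rate only
print(exceeds_standard_tier(1_000_000))    # surcharge would apply to this prompt
```

Passing `Decimal("0.60")` and `Decimal("2.40")` as the rates gives the same workloads at the post-promotion list price. `Decimal` is used instead of floats so that small currency amounts do not pick up rounding noise.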

Question 3

What is MiniMax M3's context window and max output?

Accepted Answer

MiniMax M3 has a context window of 1,048,576 tokens (1M) and a maximum output of 512,000 tokens. The context window is roughly 5x the 204,800-token window of its predecessor, MiniMax-M2.7, and the output limit alone is about 2.5x that earlier window. This is made usable by MiniMax Sparse Attention (MSA), which MiniMax reports delivers a 9.7x speedup in prefill and a 15.6x speedup in decode at 1M tokens compared to M2's full-attention design, cutting per-token compute at full context to roughly 1/20th of the previous generation. MSA uses a Grouped-Query Attention backbone with block-level selection over real, uncompressed key-value pairs, differing from DeepSeek's Multi-head Latent Attention approach. Independent measurements from Artificial Analysis put M3's time-to-first-token around 2.59 seconds and output speed around 54.8 tokens per second on the MiniMax-hosted endpoint. A separate, higher-priced long-context tier applies for inputs above 512K tokens. M3's 1M-token window is among the largest of any open-weight model as of mid-2026; it also exceeds GPT-5.5's window and matches or exceeds Gemini 3.1 Pro's long-context tier.
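The latency figures above can be turned into a rough wall-clock estimate for a response of a given length. The sketch below assumes a simple model (time-to-first-token plus output tokens divided by output speed) and treats both measured values as constants; in practice time-to-first-token grows with prompt length, so this is a lower-bound style estimate rather than a prediction. The function name is illustrative.

```python
# Figures quoted above for the MiniMax-hosted endpoint.
TTFT_SECONDS = 2.59
OUTPUT_TOKENS_PER_SECOND = 54.8


def estimate_response_seconds(output_tokens: int,
                              ttft: float = TTFT_SECONDS,
                              tokens_per_second: float = OUTPUT_TOKENS_PER_SECOND) -> float:
    """Approximate end-to-end time: fixed first-token delay plus streaming time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft + output_tokens / tokens_per_second


for n in (1_000, 50_000, 512_000):
    seconds = estimate_response_seconds(n)
    print(f"{n:>7} output tokens: about {seconds / 60:.1f} minutes")
```

At this output speed, a response near the 512,000-token output limit would take hours rather than minutes to stream, which is worth factoring into any agent design that relies on very long generations.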

Question 4

How does MiniMax M3 compare on benchmarks vs GPT-5.5 and Gemini 3.1 Pro?

Accepted Answer

MiniMax reports that M3 scores 59.0% on SWE-Bench Pro, which it says surpasses both GPT-5.5 and Gemini 3.1 Pro on the same benchmark, though MiniMax did not publish the exact competitor scores in its release blog. On BrowseComp, an autonomous-browsing benchmark, M3 scores 83.5, ahead of Claude Opus 4.7's reported 79.3. M3 also scores 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas (tool use), and 70.06% on OSWorld-Verified (computer use). These are vendor-reported figures from MiniMax's own June 2026 release blog and have not yet been independently reproduced by third-party leaderboards. Some aggregator sites cite a GPQA Diamond score near 92.9% for M3, but that figure is absent from MiniMax's own benchmark table and should be treated as unverified. Even a gap of a few points on SWE-Bench Pro can translate to meaningfully more successful end-to-end task completions on real repositories, but until independent benchmarks confirm M3's numbers, buyers should treat the GPT-5.5 and Gemini 3.1 Pro comparisons as MiniMax's own framing rather than neutral third-party results.

Question 5

Is MiniMax M3 open source or proprietary?

Accepted Answer

MiniMax M3 is open-weight: MiniMax released the model's weights and technical report on Hugging Face (MiniMaxAI/MiniMax-M3) and GitHub (MiniMax-AI/MiniMax-M3) within about ten days of the June 1, 2026 API launch, under a MiniMax Community License similar in spirit to the modified-MIT terms used for the predecessor M2.7. This license is closer to 'open weights' than to a fully permissive open-source license like Apache 2.0 or MIT, and may restrict certain commercial uses; developers should review the license file on Hugging Face before commercial deployment. Self-hosting the full 229.9B-parameter model at BF16 requires roughly 480GB or more of GPU memory (about 460GB for the weights alone at 2 bytes per parameter), realistically at least one 8-GPU node of H100/H200-class accelerators, with FP8 quantization available to roughly halve the weight footprint. Alongside the open weights, M3 is also available as a hosted API directly from MiniMax and through third-party providers including OpenRouter, Fireworks AI, and Together AI for teams that prefer not to self-host.
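The self-hosting memory figure can be sanity-checked from the parameter count. The sketch below counts weight storage only, using 2 bytes per parameter for BF16 and 1 byte for FP8; it deliberately ignores the KV cache, activations, and framework overhead, which is why the practical figure quoted above is higher. The per-GPU capacities (80GB for H100, 141GB for H200) are the standard memory sizes for those parts, and the helper names are illustrative.

```python
import math

TOTAL_PARAMS_B = 229.9  # total parameters in billions, as quoted above

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}
GPU_MEMORY_GB = {"H100": 80, "H200": 141}


def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params_billions * bytes_per_param


def min_gpus_for_weights(memory_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(memory_gb / gpu_memory_gb)


for precision, nbytes in BYTES_PER_PARAM.items():
    needed = weight_memory_gb(TOTAL_PARAMS_B, nbytes)
    for gpu, capacity in GPU_MEMORY_GB.items():
        count = min_gpus_for_weights(needed, capacity)
        print(f"{precision}: {needed:.1f} GB of weights, at least {count} x {gpu}")
```

These counts are a floor, not a sizing recommendation: serving long contexts needs substantial additional memory for the KV cache, so real deployments will want headroom well beyond the weights-only minimum.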

Question 6

What modalities does MiniMax M3 support?

Accepted Answer

MiniMax M3 accepts text, image, and video as input, with text as the confirmed output modality, and supports function (tool) calling and structured outputs. It offers an optional 'thinking' reasoning mode that can be toggled per request at the same per-token price, useful for multi-step coding and agentic tasks. M3 can also operate a desktop computer-use environment directly, scoring 70.06% on OSWorld-Verified. Although MiniMax markets M3 as 'natively multimodal' and sells a broader audio product line (Speech 2.8, Music 2.5+), M3 itself does not support audio input or output; voice workflows require pairing it with a separate MiniMax Audio endpoint or third-party ASR/TTS. On multimodal benchmarks, M3 scores 84.8% on Video-MME, 81.4% on VideoMMMU, 75.1% on MMMU Pro, and 80.8% on OmniDocBench, indicating strong document and video understanding relative to text-only open-weight models.

Question 7

Does MiniMax M3 train on user data?

Accepted Answer

MiniMax states that customer API inputs are not stored or used for training by default, and customers can opt in if they want their data used for model improvement. However, MiniMax has not published a specific data retention period for M3 separately from its general API terms, nor a dedicated trust center page. No SOC 2 Type II report, ISO 27001 certificate, or HIPAA-eligible tier has been published for MiniMax's platform as of June 2026. MiniMax does offer data residency choices across North America, Europe, and Asia-Pacific processing regions for enterprise customers. M3's formal EU AI Act classification has not been published, though as a general-purpose open-weight model it would likely fall under general-purpose AI provider obligations if offered to EU users. As a Shanghai-headquartered, Hong Kong-listed company, MiniMax also operates under China's Generative AI Measures from the Cyberspace Administration of China. Because M3 is open-weight, self-hosted deployments avoid sending any data to MiniMax at all.

Question 8

Who is MiniMax M3 best for and who should avoid it?

Accepted Answer

MiniMax M3 is best for agentic coding teams that need a 1M-token context window at a fraction of frontier pricing ($0.30 input / $1.20 output per 1M tokens at the launch promotional rate), for teams building autonomous browsing or computer-use agents given its 83.5 BrowseComp and 70.06% OSWorld-Verified scores, and for teams that want to self-host or fine-tune an open-weight frontier-class model rather than depend on a closed API. Teams should avoid M3 for voice-first products, since it has no native audio input or output and would need a separate ASR/TTS pipeline such as MiniMax's Speech 2.8. Regulated enterprises requiring SOC 2 Type II, ISO 27001, or HIPAA-eligible vendors should consider Anthropic's Claude or OpenAI's GPT-5.5 instead, since MiniMax has not published equivalent certifications. Teams needing the absolute fastest short-prompt latency may also prefer smaller, latency-optimized models, since M3's roughly 2.59-second time-to-first-token and 54.8 tokens/sec output speed on the MiniMax endpoint trail dedicated low-latency models.
