MiniMax M2.7: Open 230B MoE Agentic Coder at $0.30/M
MiniMax M2.7: 230B MoE (10B active), Mar 2026, 204,800-token context, 56.2% SWE-Bench Pro. Open weights, $0.30 in / $1.20 out per 1M tokens. For coding agents.
MiniMax M2.7 is a 230B-parameter Mixture-of-Experts model (10B active, 256 experts) released March 18, 2026, with a 204,800-token context window and 131,072-token max output. It scores 56.2% on SWE-Bench Pro and 50 on the Artificial Analysis Intelligence Index. It costs $0.30 per 1M input tokens and $1.20 per 1M output tokens and ships as open weights under a Modified-MIT license (from a roughly 60GB 1-bit GGUF build up to the 457GB BF16 release), but it generates output at just 35-46 tokens per second.
MiniMax M2.7, released March 18, 2026 by MiniMax, is a 230-billion-parameter Mixture-of-Experts model with only 10 billion active parameters and a 204,800-token context window. It scores 56.2% on SWE-Bench Pro, a 23.6-point jump over MiniMax M2.1, and 86.2% on PinchBench, within 1.2 points of Claude Opus 4.6. Priced at $0.30 per 1M input tokens and $1.20 per 1M output tokens, it was published as open weights in April 2026 and is now distributed under a Modified-MIT license.
Provider: MiniMax · Family: MiniMax M2
Context window: 204,800 tokens · Max output: 131,072
Input modalities: text, code, tool-calls · Output: text, code, tool-calls
About MiniMax M2.7
MiniMax-M2.7 is a large language model built by MiniMax, a Shanghai-based AI company, and released on March 18, 2026. It is a sparse Mixture-of-Experts model with 230 billion total parameters, of which only 10 billion are active per token, spread across 256 experts with 8 activated per token, using 62 layers and a hidden size of 3072. The architecture combines multi-head causal self-attention with Rotary Position Embeddings (RoPE) and Query-Key RMSNorm, plus top-k expert routing. M2.7 is part of MiniMax's M-series, following M2, M2.1, and M2.5 and preceding the multimodal M3 released June 1, 2026. MiniMax dedicated all of M2.7's training compute to 'Code + Agent' domains, positioning it as a coding and agentic-workflow specialist rather than a general multimodal flagship.
On benchmarks, M2.7 scores 56.2% on SWE-Bench Pro, a 23.6-point jump over M2.1's 32.6%, and 57.0% on Terminal-Bench 2, up from M2.1's 47.9%. It reaches 86.2% on PinchBench, within 1.2 points of Claude Opus 4.6, and posts an Artificial Analysis Intelligence Index of 50, eight points above M2.5; that puts it ahead of MiMo-V2-Pro (49) and Kimi K2.5 (47) and level with GLM-5 (50). MiniMax also reports a GDPval-AA Elo of 1495. Independently reported scores on graduate-level reasoning benchmarks such as GPQA Diamond are notably lower than its coding scores, which is consistent with training compute weighted toward code and agents.
The model has a 204,800-token context window and a 131,072-token maximum output. It uses full attention across that window, so latency rises as the context fills, and reviewers note that pushing near the 204,800-token limit feels slow in practice. M2.7 is text-only: it accepts plain text and code, with no native image, audio, or video input or output. Tool use and function calling are fully supported, as documented in MiniMax's tool_calling_guide.md, using the same syntax as MiniMax M2, and the model is built for agent harnesses including Cline, OpenCode, and Kilo. For image understanding, MiniMax documents a workaround: register an 'analyze_image' function-call tool that proxies to a vision model such as Claude, GPT, or Gemini.
On MiniMax's own API, pricing is $0.30 per 1M input tokens and $1.20 per 1M output tokens, with cached input reads at $0.06 per 1M tokens and cache writes at $0.375 per 1M tokens. A faster 'HighSpeed' variant costs roughly $0.60 input / $2.40 output per 1M tokens. Providers including Novita, Fireworks, and MiniMax itself list blended pricing of around $0.22 per 1M tokens. A daily coding-agent workload of 1M input and 200K output tokens costs roughly $0.54 on the standard tier ($0.30 for input plus $0.24 for output), around 10-20x cheaper than Claude Opus or GPT-5 for similar agentic coding tasks. M2.7's high verbosity (up to 4x the average output tokens of similarly sized models on benchmark evals) can offset some of that saving.
M2.7 was initially released as a proprietary API model, then published with open weights on Hugging Face and ModelScope in April 2026. The full BF16 release is roughly 457GB, while community GGUF quantizations go down to about 60GB for a 1-bit build. The model can be served with vLLM or SGLang (NVIDIA also ships an NVFP4 quantization), and the smallest builds bring it within reach of prosumer hardware. Shortly after the open-weight release, MiniMax changed the Hugging Face license from MIT to a 'Modified-MIT' license requiring the company's written authorization for commercial use, which drew criticism from developers who had begun building under the original terms.
MiniMax states that customer API inputs are not stored or used for training unless a customer opts in, and the company offers data residency choices across North America, Europe, and Asia-Pacific. No SOC 2 Type II report, ISO 27001 certificate, or HIPAA-eligible tier was found in public sources, and MiniMax has not published a system card or named external red-team partners for M2.7. As a Shanghai-based company, MiniMax also operates under China's Generative AI Measures from the Cyberspace Administration of China.
M2.7 is best for budget-conscious teams building agentic coding pipelines, developers who want to self-host an open-weight, frontier-adjacent model, and teams already using Cline, OpenCode, or Kilo. It is a poor fit for interactive low-latency chat (35-46 tokens/sec output, ~2.2s time-to-first-token), for any task requiring image, audio, or video understanding, and for workloads that need top-tier graduate-level reasoning or math, as measured by benchmarks like GPQA Diamond or AIME, where competitors with stronger published scores are a better choice.
MiniMax's M-series cadence has been rapid: M2.5 in February 2026, M2.7 in March 2026 (with open weights in April), and M3 on June 1, 2026, which brought a new MiniMax Sparse Attention (MSA) architecture, a 1-million-token context window, and native multimodality. M2.7 remains available and has not been deprecated, but M3 is now MiniMax's flagship for multimodal and longer-context workloads, while M2.7 continues to serve as the company's cost-efficient, text-first agentic coding option.
Pricing
MiniMax direct API: $0.30 per 1M input tokens, $1.20 per 1M output tokens. Cached input reads cost $0.06 per 1M tokens, cache writes $0.375 per 1M tokens. A faster 'HighSpeed' variant costs about $0.60 input / $2.40 output per 1M tokens. Open weights (Modified-MIT license) can be self-hosted for free with your own compute, but commercial use requires MiniMax's written authorization.
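The listed rates can be turned into a quick cost estimate. A minimal sketch, assuming cost is linear in token counts and ignoring prompt caching; `request_cost` and the tier keys are illustrative names, not part of any MiniMax SDK.

```python
# USD per 1M tokens on MiniMax's direct API, as listed above
# (HighSpeed figures are the approximate ones quoted in this section).
PRICES_PER_M = {
    "standard": {"input": 0.30, "output": 1.20},
    "highspeed": {"input": 0.60, "output": 2.40},
}


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimate the USD cost of a workload, ignoring prompt caching."""
    price = PRICES_PER_M[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # The coding-agent workload from the text: 1M input tokens, 200K output tokens.
    for tier in PRICES_PER_M:
        print(f"{tier}: ${request_cost(1_000_000, 200_000, tier):.2f}")
```

Running it prints $0.54 for the standard tier and $1.08 for HighSpeed, since the HighSpeed variant doubles both rates.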
Key Features
- Self-Evolving Training Loop: MiniMax describes M2.7 as its first model that actively participates in its own development, handling an estimated 30-50% of its own reinforcement learning research workflow.
- 204,800-Token Context Window: Handles large multi-file codebases and long agent transcripts with a 131,072-token max output, though full attention means latency rises near the ceiling.
- Sparse MoE Architecture: 230B total parameters with only 10B active per token across 256 experts (8 active), keeping inference costs low while preserving model capacity.
- Open Weights with Quantization Options: Released on Hugging Face and ModelScope under a Modified-MIT license as a roughly 457GB BF16 checkpoint, with community GGUF quantizations down to about 60GB (1-bit).
- Tool Calling for Agent Harnesses: Supports the documented MiniMax-M2 tool-call syntax for structured function calling, used by agent frameworks like Cline, OpenCode, and Kilo.
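The "8 of 256 experts" routing described above can be illustrated with a generic top-k router. This is a textbook softmax-over-top-k sketch using random stand-in logits; MiniMax's actual gating function, weight normalization, and load-balancing scheme are not documented here and may differ.

```python
import math
import random

NUM_EXPERTS = 256  # experts per MoE layer, per the architecture description
TOP_K = 8          # experts activated for each token


def route_token(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router's output
    chosen = route_token(logits)
    print("experts:", [i for i, _ in chosen])
    print("gate weights sum to", round(sum(w for _, w in chosen), 6))
    # Only TOP_K of NUM_EXPERTS expert FFNs run for this token; the layer output would be
    # the gate-weighted sum of those experts' outputs.
    print(f"fraction of experts active: {TOP_K / NUM_EXPERTS:.5f}")
```

Eight of 256 experts is about 3.1% of the expert pool, while 10B of 230B parameters is about 4.3%; the gap likely comes from attention, embedding, and other non-expert weights that run for every token.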
Pros
- At $0.30/$1.20 per 1M tokens, it's roughly 10-20x cheaper than Claude Opus or GPT-5 for agentic coding tasks.
- Scores 56.2% on SWE-Bench Pro (+23.6 over M2.1) and 86.2% on PinchBench, within 1.2 points of Claude Opus 4.6.
- Open weights let you self-host, from a roughly 60GB 1-bit GGUF build up to the 457GB BF16 release, avoiding API lock-in entirely.
- 204,800-token context window with 131,072-token max output covers large codebases and long agent sessions.
Cons
- Output speed of 35-46 tokens/sec with ~2.2s time-to-first-token is sluggish next to the 60-95 t/s median reported for similarly priced models.
- Text-only: no native image, audio, or video input, unlike the newer MiniMax M3.
- Highly verbose output (up to 4x the average tokens of similarly-sized models on benchmark evals) raises real-world cost and latency.
- April 2026 license change to 'Modified-MIT' requires written authorization from MiniMax for commercial self-hosted deployments.
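Speed and verbosity compound: wall-clock time for a response is roughly time-to-first-token plus output tokens divided by decode speed. A back-of-the-envelope sketch, assuming a constant decode rate and applying the same ~2.2s TTFT to every speed purely for comparison (real latency also grows with prompt length under full attention); the response sizes are illustrative.

```python
def response_latency_s(output_tokens: int, tokens_per_sec: float, ttft_s: float = 2.2) -> float:
    """Rough wall-clock time: time-to-first-token plus output tokens at a constant decode rate."""
    return ttft_s + output_tokens / tokens_per_sec


if __name__ == "__main__":
    # 1,000 tokens for a typical answer, 4,000 for one that is 4x as verbose (illustrative sizes).
    for tokens in (1_000, 4_000):
        for speed in (35, 46, 60, 95):
            print(f"{tokens:>5} tokens @ {speed:>2} t/s: {response_latency_s(tokens, speed):6.1f} s")
```

For example, `response_latency_s(1_000, 40)` is about 27.2 seconds, and a 4x-verbose 4,000-token answer at the same speed takes about 102.2 seconds.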
Benchmarks
- PinchBench: 86.2%
- GDPval-AA Elo: 1495
- SWE-Bench Pro: 56.2%
- Terminal-Bench 2: 57.0%
- Artificial Analysis Intelligence Index: 50
- Artificial Analysis output speed: 45.6 tokens/sec
Frequently Asked Questions
What is MiniMax M2.7 and who built it?
MiniMax-M2.7 is a large language model built by MiniMax, a Shanghai-based AI company, and released on March 18, 2026. It is a sparse Mixture-of-Experts model with 230 billion total parameters but only 10 billion active per token, spread across 256 experts with 8 activated per token, using 62 layers and a hidden size of 3072. The model is part of MiniMax's M-series, following M2, M2.1, and M2.5, and preceding the multimodal M3 released June 1, 2026. MiniMax designed M2.7 specifically for software engineering, agentic tool use, and office productivity workflows, dedicating its training compute to 'Code + Agent' domains rather than general multimodal ability. On benchmarks, it scores 56.2% on SWE-Bench Pro (a 23.6-point jump over M2.1's 32.6%), 57.0% on Terminal-Bench 2, and 50 on the Artificial Analysis Intelligence Index, eight points above M2.5. MiniMax describes M2.7 as its first model that actively participates in its own development cycle, handling an estimated 30-50% of its own reinforcement learning research workflow. MiniMax positions it to beat similarly priced open models from DeepSeek, Qwen, and Kimi on agentic coding while undercutting Claude Opus on price. Pricing starts at $0.30 per 1M input tokens, and the context window is 204,800 tokens.
How much does MiniMax M2.7 cost per 1M tokens?
On MiniMax's own API, MiniMax-M2.7 costs $0.30 per 1 million input tokens and $1.20 per 1 million output tokens for standard usage. Cached input reads cost $0.06 per 1 million tokens and cache writes cost $0.375 per 1 million tokens, which significantly cuts costs for system prompts repeated across the turns of an agent loop. A faster 'HighSpeed' variant is available at roughly $0.60 per 1 million input tokens and $2.40 per 1 million output tokens for lower-latency interactive use. Providers including Novita, Fireworks, and MiniMax itself list blended pricing of around $0.22 per 1 million tokens, making it one of the cheapest frontier-adjacent agentic models available. For example, a coding agent session processing 1 million input tokens and 200,000 output tokens would cost roughly $0.54 on the standard tier ($0.30 for input plus $0.24 for output). Because M2.7 ships as open weights under a Modified-MIT license, teams with their own GPUs can self-host it without per-token fees, though the license requires MiniMax's written authorization for commercial deployments. Compared to Claude Opus or GPT-5, which charge several dollars per million tokens, M2.7 is roughly 10-20x cheaper for similar agentic coding tasks, though it is also notably more verbose, which can offset some of the savings in real-world usage.
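The caching discount matters most when the same prefix (system prompt, tool schemas) is re-sent every turn. A minimal sketch of that arithmetic, assuming the cache is written once on the first turn and hit on every later turn with no expiry; the function name and the 20,000-token, 50-turn loop are hypothetical, and per-turn new input and output are not modeled.

```python
# Standard-tier USD prices per 1M tokens on MiniMax's direct API (from the text above).
INPUT_PRICE = 0.30
CACHE_WRITE_PRICE = 0.375
CACHE_READ_PRICE = 0.06


def repeated_prefix_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """USD cost of sending the same prompt prefix on every turn of an agent loop.

    Assumes the cache is written on the first turn and hit on every later turn
    (no expiry, identical prefix); per-turn new input and output are not modeled.
    """
    if turns <= 0:
        return 0.0
    if not cached:
        return turns * prefix_tokens * INPUT_PRICE / 1_000_000
    write_cost = prefix_tokens * CACHE_WRITE_PRICE
    read_cost = (turns - 1) * prefix_tokens * CACHE_READ_PRICE
    return (write_cost + read_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent loop: a 20,000-token prefix re-sent over 50 turns.
    print(f"uncached: ${repeated_prefix_cost(20_000, 50, cached=False):.4f}")
    print(f"cached:   ${repeated_prefix_cost(20_000, 50, cached=True):.4f}")
```

Under these assumptions the prefix costs about $0.30 uncached versus about $0.066 cached, roughly a 4.5x saving on that portion of the bill.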
What is MiniMax M2.7's context window and max output?
MiniMax-M2.7 has a context window of 204,800 tokens and a maximum output of 131,072 tokens per response. That puts it ahead of many similarly priced open models on raw context size, and it is large enough to hold sizeable multi-file codebases or long agent transcripts in a single request. However, M2.7 uses full attention across its context window rather than a sparse or linear-attention mechanism, so latency and cost increase as the context fills, and reviewers note that pushing close to the 204,800-token limit can feel slow in practice. There is no separate extended-context tier; 204,800 tokens is the model's fixed window. For document handling, M2.7 accepts plain text and code passed directly in the prompt; it has no native PDF or image ingestion, so structured documents must be converted to text first. Its successor, MiniMax M3, introduces a new MiniMax Sparse Attention (MSA) architecture and a 1-million-token context window, so M2.7's 204,800-token window is now the smaller of the two context sizes across MiniMax's current models. M2.7's window is comparable to that of its predecessor, MiniMax M2.1, and larger than many budget open models' windows, though smaller than the largest context tiers of Gemini 2.5 Pro or Claude.
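Before sending a long prompt, it can help to check how much output room is left. A minimal sketch, assuming prompt and generated tokens share the 204,800-token window (common for LLM APIs, but confirm against MiniMax's documentation); the 4-characters-per-token estimate is a rough heuristic, not M2.7's tokenizer, and both function names are hypothetical.

```python
import math

CONTEXT_WINDOW = 204_800  # total tokens the model can attend to
MAX_OUTPUT = 131_072      # per-response output cap


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Crude length estimate; use M2.7's actual tokenizer for anything precise."""
    return math.ceil(len(text) / chars_per_token)


def output_budget(prompt_tokens: int, requested_output: int = MAX_OUTPUT) -> int:
    """Largest output length that fits, assuming prompt and output share the window."""
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError(f"prompt ({prompt_tokens} tokens) fills the {CONTEXT_WINDOW}-token window")
    return min(requested_output, MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)


if __name__ == "__main__":
    for prompt in (50_000, 150_000, 200_000):
        print(f"{prompt:>7} prompt tokens -> up to {output_budget(prompt):,} output tokens")
```

A 50,000-token prompt still allows the full 131,072-token output, while a 150,000-token prompt leaves room for only 54,800.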
How does MiniMax M2.7 compare on benchmarks vs Claude Opus and DeepSeek?
MiniMax-M2.7 scores 86.2% on PinchBench, within 1.2 points of Claude Opus 4.6, at a small fraction of Opus's per-token price; this is the headline comparison MiniMax highlights. On SWE-Bench Pro, M2.7 scores 56.2%, a 23.6-point improvement over its predecessor M2.1 (32.6%), and on Terminal-Bench 2 it scores 57.0% versus M2.1's 47.9%. Against DeepSeek's open-weight coding models, M2.7 is positioned as a rapidly improving agentic coder that runs more slowly: it generates output at only 35-46 tokens per second, well below the 60-95 tokens-per-second median for similarly priced open models, including many DeepSeek releases. On the Artificial Analysis Intelligence Index, a composite of reasoning, knowledge, math, and coding, M2.7 scores 50, putting it ahead of MiMo-V2-Pro (49) and Kimi K2.5 (47) and level with GLM-5 (50). On graduate-level reasoning benchmarks like GPQA Diamond, independently reported scores for M2.7 are notably lower than its coding scores, suggesting the model's training compute was weighted heavily toward code and agent tasks rather than general reasoning. In practice, the jump from 32.6% to 56.2% on SWE-Bench Pro means M2.7 resolves roughly 1.7x as many of the benchmark's tasks as M2.1, which suggests, though does not guarantee, that it completes more real-world coding tasks end-to-end without human intervention. MiniMax has not published GPQA Diamond or AIME 2025 scores with the same prominence as its coding benchmarks, a notable omission for a model marketed on 'intelligence' gains.
Is MiniMax M2.7 open source or proprietary?
MiniMax-M2.7 is open weights, not fully open source. It was initially released as a proprietary API-only model on March 18, 2026, then published with downloadable weights on Hugging Face and ModelScope in April 2026 under what was first listed as an MIT license. Shortly after, MiniMax changed the Hugging Face license to a 'Modified-MIT' license that adds a clause requiring MiniMax's prior written authorization for commercial use, which sparked criticism from developers who had begun building on the original MIT terms. The full BF16 weights are roughly 457GB, while community quantizations bring that down dramatically, including a 1-bit GGUF build at around 60GB, making it accessible to teams with high-end consumer or prosumer GPU setups. Recommended inference runtimes are vLLM and SGLang, with standard Hugging Face Transformers tooling also supported, and NVIDIA has published an NVFP4 quantized version for its platforms. For commercial use beyond research and personal projects, teams should treat the Modified-MIT terms as requiring direct permission from MiniMax, closer to a source-available license than a permissive open-source one. There are no separate model variants with different openness levels; M2.7 itself is the open-weight release, while MiniMax's hosted API (and the HighSpeed variant) run on the company's own infrastructure. MiniMax M3, released on June 1, 2026, has also been signaled for open-source release, continuing the pattern.
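The download sizes follow from simple arithmetic: stored size is roughly total parameters times bits per parameter, divided by 8. A naive sketch, assuming 230B total parameters and uniform precision across all weights; it ignores KV cache, activations, and runtime overhead, so actual GPU memory needs are higher.

```python
TOTAL_PARAMS = 230e9  # all experts must be stored, even though only ~10B are active per token


def weight_gb(bits_per_param: float, params: float = TOTAL_PARAMS) -> float:
    """Naive weight size in decimal GB; excludes KV cache, activations, and runtime overhead."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in (("BF16", 16), ("8-bit", 8), ("4-bit", 4), ("1-bit", 1)):
        print(f"{label:>5}: ~{weight_gb(bits):.0f} GB")
```

The naive BF16 figure of about 460 GB lines up with the roughly 457GB release (the small gap is plausibly rounding in the headline parameter count). The naive 1-bit figure of about 29 GB is half the ~60GB GGUF build, likely because low-bit GGUF files keep some tensors at higher precision and store per-block scale factors.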
What modalities does MiniMax M2.7 support?
MiniMax-M2.7 is a text-only model: its only input modality is text (including code), and its only output modality is text, alongside structured tool-call output for function calling. MiniMax deliberately dedicated all of M2.7's training compute to code and agent domains, so unlike many 2026 'multimodal' flagships, it has no native image, audio, or video input or output. Function calling and tool use are fully supported and documented in MiniMax's tool_calling_guide.md, using the same tool-call syntax as the earlier MiniMax M2 model, which lets the model recognize when an external tool is needed and emit structured parameters for it. If an application needs image understanding, the documented workaround is to register an 'analyze_image' tool that internally calls a vision-capable model such as Claude, GPT, or Gemini and returns the result as JSON for M2.7 to use. Agent harnesses including Cline, OpenCode, and Kilo support M2.7's tool-calling format; tool-calling loops and premature halting, reported with earlier M2-series models, are described as improved but not fully resolved in M2.7. There is no computer-use or screen-control capability built into M2.7 itself. For teams that need native image, video, or desktop-operation support, MiniMax positions its newer M3 model (June 2026) as the multimodal option instead. In short, M2.7 trades multimodality for depth in text-based coding and agentic tool use.
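The 'analyze_image' workaround can be sketched as a tool definition plus a dispatcher that forwards the call to a vision model. This sketch assumes the widely used OpenAI-style "function" schema; the exact format M2.7's endpoint expects is specified in MiniMax's tool_calling_guide.md and may differ. `handle_tool_call`, the `image_url`/`question` parameters, and the stub vision client are hypothetical.

```python
import json
from typing import Callable

# Hypothetical tool definition in the OpenAI-style "function" schema; check
# MiniMax's tool_calling_guide.md for the exact format your endpoint expects.
ANALYZE_IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "analyze_image",
        "description": "Describe or answer a question about an image using a vision-capable model.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_url": {"type": "string", "description": "URL of the image to inspect."},
                "question": {"type": "string", "description": "What to look for in the image."},
            },
            "required": ["image_url", "question"],
        },
    },
}


def handle_tool_call(name: str, arguments_json: str, vision_model: Callable[[str, str], str]) -> str:
    """Run a tool call emitted by the model and return a JSON string to send back as the tool result."""
    if name != "analyze_image":
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    answer = vision_model(args["image_url"], args["question"])  # proxy to Claude, GPT, Gemini, ...
    return json.dumps({"image_url": args["image_url"], "answer": answer})


if __name__ == "__main__":
    # Stand-in for a real vision API client; replace with your provider's SDK call.
    def fake_vision_model(image_url: str, question: str) -> str:
        return f"(stub) answer to {question!r} for {image_url}"

    result = handle_tool_call(
        "analyze_image",
        json.dumps({"image_url": "https://example.com/diagram.png", "question": "What does the chart show?"}),
        fake_vision_model,
    )
    print(result)
```

Keeping the vision client as an injected callable means the same dispatcher works whichever vision provider sits behind it, and M2.7 itself only ever sees text in and JSON text back.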
Does MiniMax M2.7 train on user data?
MiniMax states that customer inputs sent to its API for inference are not stored or used to train MiniMax's models unless the customer explicitly opts in, according to the company's API privacy policy. Specific data retention windows for the M2.7 endpoint, such as a fixed number of days, are not publicly disclosed in detail, unlike some Western labs that publish exact retention periods. MiniMax offers data residency controls that let enterprise customers choose processing regions, with infrastructure described as spanning North America, Europe, and Asia-Pacific. No SOC 2 Type II report, ISO 27001 certificate, or HIPAA-eligible tier was found in public sources for MiniMax as of mid-2026, and the company has not published a dedicated trust center comparable to Anthropic's or OpenAI's. For GDPR, MiniMax's privacy policy addresses EU users, but independently verified Standard Contractual Clauses or a dedicated EU representative were not confirmed. Because M2.7 is also available as open weights, teams with strict data requirements can self-host the model entirely on their own infrastructure, removing any reliance on MiniMax's API data handling. On third-party providers such as Fireworks, Together.ai, or OpenRouter, data handling follows each provider's own policies rather than MiniMax's directly. As a Shanghai-based company, MiniMax also operates under China's Generative AI Measures from the Cyberspace Administration of China, a separate regulatory framework from GDPR or US privacy law.
Who is MiniMax M2.7 best for and who should avoid it?
MiniMax-M2.7 is best for budget-conscious teams building agentic coding pipelines, where its $0.30/$1.20 per 1M token pricing and 56.2% SWE-Bench Pro score make it one of the cheapest ways to automate real coding tasks. It also suits developers who want to self-host an open-weight, frontier-adjacent model, since the 230B-parameter MoE (10B active) ships as a roughly 457GB BF16 release with community quantizations down to about 60GB, and can be served with vLLM or SGLang. Teams already using agent harnesses like Cline, OpenCode, or Kilo can plug M2.7 in directly using its documented tool-calling format. Researchers studying self-evolving or self-improving training loops may also find M2.7 interesting, since MiniMax says the model handled 30-50% of its own development workflow. Teams should avoid M2.7 for interactive, low-latency chat applications: its 35-46 tokens-per-second output speed and roughly 2.2-second time-to-first-token, combined with high verbosity (up to 4x the average output tokens of similar models on benchmarks), make it feel sluggish compared with the 60-95 t/s median of similarly priced models. It is also the wrong choice for any task requiring image, audio, or video understanding, since it is strictly text-only; MiniMax M3 or a Western multimodal model like Gemini 2.5 Pro would be better fits there. Teams needing graduate-level reasoning or math performance should look at models with stronger published scores on those specific benchmarks, since M2.7's training compute was weighted toward code and agents. Finally, anyone planning commercial self-hosted deployment should confirm licensing terms with MiniMax given the April 2026 Modified-MIT license change.