Name: MiniMax M2.7: Open 230B MoE Agentic Coder at $0.30/M
Brand: MiniMax
Price: 0.30 USD
Availability: InStock

Question 1

What is MiniMax M2.7 and who built it?

Accepted Answer

MiniMax-M2.7 is a large language model built by MiniMax, a Shanghai-based AI company, and released on March 18, 2026. It is a sparse Mixture-of-Experts model with 230 billion total parameters, of which only 10 billion are active per token: each token is routed to 8 of 256 experts. The model has 62 layers and a hidden size of 3072. It is part of MiniMax's M-series, following M2, M2.1, and M2.5, and preceding the multimodal M3 released June 1, 2026. MiniMax designed M2.7 specifically for software engineering, agentic tool use, and office productivity workflows, dedicating its training compute to 'Code + Agent' domains rather than general multimodal ability. On benchmarks, it scores 56.2% on SWE-Bench Pro (a 23.6-point jump over M2.1's 32.6%), 57.0% on Terminal-Bench 2, and an Artificial Analysis Intelligence Index of 50, eight points above M2.5. MiniMax describes M2.7 as its first model that actively participates in its own development cycle, handling an estimated 30-50% of its own reinforcement learning research workflow. It was designed to beat similarly-priced open models from DeepSeek, Qwen, and Kimi on agentic coding while undercutting Claude Opus on price. The model is priced from $0.30 per 1M input tokens with a 204,800-token context window.
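To make the "8 of 256 experts" figure concrete, here is a minimal sketch of generic top-k expert routing. It assumes softmax gating over the selected experts, which is a common textbook choice and not something the source confirms for M2.7; `route_token` and the random logits are hypothetical illustration only.

```python
import math
import random

NUM_EXPERTS = 256  # experts per MoE layer, from the text
TOP_K = 8          # experts activated per token, from the text


def route_token(router_logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating; MiniMax's actual router may differ (assumption).
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
    weights = route_token(logits)
    print(len(weights), "of", NUM_EXPERTS, "experts active")
```

Only the selected experts run for a given token, which is why the active parameter count (10 billion) is so much smaller than the total (230 billion).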

Question 2

How much does MiniMax M2.7 cost per 1M tokens?

Accepted Answer

On MiniMax's own API, MiniMax-M2.7 costs $0.30 per 1 million input tokens and $1.20 per 1 million output tokens for standard usage. Cached input reads cost $0.06 per 1 million tokens, and cache writes cost $0.375 per 1 million tokens, which significantly cuts costs for repeated system prompts in agent loops. A faster 'HighSpeed' variant is available at roughly $0.60 per 1 million input tokens and $2.40 per 1 million output tokens for lower-latency interactive use. Third-party providers such as Novita and Fireworks, as well as MiniMax itself, offer blended pricing around $0.22 per 1 million tokens, making it one of the cheapest frontier-adjacent agentic models available. For example, a coding agent session processing 1 million input tokens and 200,000 output tokens would cost roughly $0.54 on the standard tier ($0.30 for input plus $0.24 for output). Because M2.7 ships as open weights under a Modified-MIT license, teams with their own GPUs can self-host it without per-token fees, though they still bear hardware and operating costs, and the license requires MiniMax's written authorization for commercial deployments. Compared to Claude Opus or GPT-5, which charge several dollars per million tokens, M2.7 is roughly 10-20x cheaper for similar agentic coding tasks, though it is also notably more verbose, which can offset some of the savings in real-world usage.
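The arithmetic behind the $0.54 example can be sketched as a small estimator. The prices are the standard-tier figures quoted above; `estimate_cost` is a hypothetical helper, and billing cached input at the cache-read rate instead of the normal input rate is an assumption about how the discount is applied. Cache writes are left out for simplicity.

```python
# Per-million-token prices quoted above for MiniMax's own API (standard tier).
PRICE_PER_M = {
    "input": 0.30,
    "output": 1.20,
    "cache_read": 0.06,
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated USD cost of one session.

    cached_tokens is the part of input_tokens served from cache; charging it at
    the cache-read rate instead of the input rate is an assumption.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cache_read"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000
    return round(cost, 4)


if __name__ == "__main__":
    # The worked example from the text: 1M input + 200K output tokens.
    print(estimate_cost(1_000_000, 200_000))
    # The same session if 800K of the input tokens were cache reads.
    print(estimate_cost(1_000_000, 200_000, cached_tokens=800_000))
```

Because output tokens cost four times as much as input tokens, the model's verbosity affects the bill more than prompt size does once caching is in play.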

Question 3

What is MiniMax M2.7's context window and max output?

Accepted Answer

MiniMax-M2.7 has a context window of 204,800 tokens and a maximum output of 131,072 tokens per response. This puts it ahead of many similarly-priced open models on raw context size, and it is large enough to hold sizeable multi-file codebases or long agent transcripts in a single request. However, M2.7 uses full attention across its context window rather than a sparse or linear-attention mechanism, so latency and cost increase as the context fills up, and reviewers note that pushing close to the 204,800-token limit can feel slow in practice. There is no separate extended-context tier; 204,800 tokens is the model's fixed window. For document handling, M2.7 accepts plain text and code files passed directly in the prompt; it has no native PDF or image ingestion, so structured documents must be converted to text first. Compared to its successor, MiniMax M3, which introduces a new MiniMax Sparse Attention (MSA) architecture and a 1-million-token context window, M2.7's 204,800-token window is now the smaller of MiniMax's two current flagship context sizes. M2.7's context window is comparable to that of its predecessor M2.1 and larger than those of many budget open models, though smaller than those of Gemini 2.5 Pro or Claude's largest context tiers.
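A quick way to reason about these two limits is a budget check like the one below. It is a sketch that assumes the prompt and the generated output share the same 204,800-token window, which the source does not state explicitly; `output_budget` is a hypothetical helper name.

```python
CONTEXT_WINDOW = 204_800  # tokens, from the text
MAX_OUTPUT = 131_072      # tokens per response, from the text


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return how many output tokens can actually be requested.

    Assumes prompt and output share one window (not confirmed by the source).
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone exceeds the context window")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return min(requested_output, MAX_OUTPUT, remaining)
```

Under that assumption, a 150,000-token prompt leaves room for at most 54,800 output tokens, well below the 131,072-token per-response cap.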

Question 4

How does MiniMax M2.7 compare on benchmarks vs Claude Opus and DeepSeek?

Accepted Answer

MiniMax-M2.7 scores 86.2% on PinchBench, placing it within 1.2 points of Claude Opus 4.6 despite costing a small fraction of Opus's per-token price, which is the headline comparison MiniMax highlights. On SWE-Bench Pro, M2.7 scores 56.2%, a 23.6-point improvement over its predecessor M2.1 (32.6%), and on Terminal-Bench 2 it scores 57.0% versus M2.1's 47.9%. Against DeepSeek's open-weight coding models, M2.7 is positioned as a faster-improving but slower-running alternative: M2.7 generates output at only 35-46 tokens per second, well below the 60-95 tokens-per-second median for similarly-priced open models, including many DeepSeek releases. On the Artificial Analysis Intelligence Index, a composite of reasoning, knowledge, math, and coding, M2.7 scores 50, putting it ahead of MiMo-V2-Pro (49) and Kimi K2.5 (47), and equal to GLM-5 (50). On graduate-level reasoning benchmarks like GPQA Diamond, independently reported scores for M2.7 are notably lower than its coding scores, suggesting the model's training compute was weighted heavily toward code and agent tasks rather than general reasoning. In practice, the 23.6-point SWE-Bench Pro gain over M2.1 should translate to M2.7 completing significantly more real-world coding tasks end-to-end without human intervention. MiniMax has not published GPQA Diamond or AIME 2025 scores with the same prominence as its coding benchmarks, a notable omission for a model marketed on 'intelligence' gains.

Question 5

Is MiniMax M2.7 open source or proprietary?

Accepted Answer

MiniMax-M2.7 is open weights, not fully open source. It was initially released as a proprietary API-only model on March 18, 2026, then published with downloadable weights on Hugging Face and ModelScope in April 2026 under what was first listed as an MIT license. Shortly after, MiniMax updated the Hugging Face license to a 'Modified-MIT' license that adds a clause requiring MiniMax's prior written authorization for commercial use, which sparked criticism from developers who had begun building on the original MIT terms. The full BF16 weights are roughly 457GB, while community quantizations bring that down dramatically, including a 1-bit GGUF build at around 60GB, making it accessible to teams with high-end consumer or prosumer GPU setups. Recommended inference runtimes are vLLM and SGLang, with standard Hugging Face Transformers tooling also supported, and NVIDIA has published an NVFP4 quantized version for its platforms. For commercial use beyond research and personal projects, teams should treat the Modified-MIT terms as requiring direct permission from MiniMax, similar to a source-available license rather than a permissive open-source one. There are no separate model variants with different openness levels; M2.7 itself is the open-weight release, while MiniMax's hosted API (and the HighSpeed variant) remain the company's own infrastructure. MiniMax M3, released on June 1, 2026, was likewise signaled for open-source release, continuing the same pattern.
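The weight sizes above follow from simple arithmetic on the parameter count. The sketch below uses the 230 billion total parameters quoted for M2.7 and ignores file metadata; `weights_gb` and `effective_bits` are hypothetical helper names, and the figures are decimal gigabytes.

```python
TOTAL_PARAMS = 230e9  # total parameters, from the text


def weights_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate weight size in decimal GB, ignoring metadata overhead."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits(file_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits per weight implied by a given file size in decimal GB."""
    return file_gb * 1e9 * 8 / params


if __name__ == "__main__":
    print(f"BF16 (16 bits/weight): {weights_gb(16):.0f} GB")
    print(f"Bits/weight implied by a 60 GB build: {effective_bits(60):.2f}")
```

At 16 bits per weight the estimate is 460GB, close to the roughly 457GB quoted. A 60GB file works out to about 2 bits per weight on average, so the "1-bit" build presumably keeps some tensors at higher precision, though the source does not say so.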

Question 6

What modalities does MiniMax M2.7 support?

Accepted Answer

MiniMax-M2.7 is a text-only model: its only input modality is text (including code), and its only output modality is text, alongside structured tool-call output for function calling. MiniMax deliberately dedicated all of M2.7's training compute to code and agent domains, so unlike many 2026 'multimodal' flagships, it has no native image, audio, or video input or output. Function calling and tool use are fully supported and documented in MiniMax's tool_calling_guide.md, using the same tool-call syntax as the earlier MiniMax M2 model, which lets it identify when an external tool is needed and emit structured parameters. If an application needs image understanding, the documented workaround is to register an 'analyze_image' tool that internally calls a vision-capable model such as Claude, GPT, or Gemini and returns the result as JSON for M2.7 to use. Agent harnesses including Cline, OpenCode, and Kilo support M2.7's tool-calling format, with some earlier tool-calling loop and premature-halting issues from M2-series models reported as improved but not fully resolved in M2.7. There is no computer-use or screen-control capability built into M2.7 itself. For teams that need native image, video, or desktop-operation support, MiniMax positions its newer M3 model (June 2026) as the multimodal option instead. In short, M2.7 trades multimodality for depth in text-based coding and agentic tool use.
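The 'analyze_image' workaround could look roughly like the sketch below. The tool definition uses the widely adopted JSON-schema "function" style as an assumption; the exact format M2.7 expects is specified in MiniMax's tool_calling_guide.md, which is not reproduced here. `handle_analyze_image` and `vision_backend` are hypothetical names standing in for your own vision model client.

```python
import json
from typing import Callable

# Hypothetical tool definition; check tool_calling_guide.md for the real format.
ANALYZE_IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "analyze_image",
        "description": "Describe an image by delegating to a vision-capable model.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_url": {"type": "string", "description": "URL of the image"},
                "question": {"type": "string", "description": "What to look for"},
            },
            "required": ["image_url"],
        },
    },
}


def handle_analyze_image(arguments_json: str, vision_backend: Callable[[str, str], str]) -> str:
    """Run the tool call and return a JSON string for the text-only model.

    vision_backend is a placeholder for whatever vision model client you use.
    """
    args = json.loads(arguments_json)
    question = args.get("question", "Describe this image.")
    description = vision_backend(args["image_url"], question)
    return json.dumps({"image_url": args["image_url"], "description": description})
```

Passing the vision client in as a callable keeps the handler independent of any particular provider, so the same tool can sit in front of Claude, GPT, or Gemini.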

Question 7

Does MiniMax M2.7 train on user data?

Accepted Answer

MiniMax states that customer inputs sent to its API for inference are not stored or used to train MiniMax's models unless the customer explicitly opts in, according to the company's API privacy policy. Specific data retention windows for the M2.7 endpoint, such as a fixed number of days, are not publicly disclosed in detail, unlike some Western labs that publish exact retention periods. MiniMax offers data residency controls that let enterprise customers choose processing regions, with infrastructure described as spanning North America, Europe, and Asia-Pacific. No SOC 2 Type II report, ISO 27001 certificate, or HIPAA-eligible tier was found in public sources for MiniMax as of mid-2026, and the company has not published a dedicated trust center comparable to Anthropic's or OpenAI's. For GDPR, MiniMax's privacy policy addresses EU users, but independently verified Standard Contractual Clauses or a dedicated EU representative were not confirmed. Because M2.7 is also available as open weights, teams with strict data requirements can self-host the model entirely on their own infrastructure, removing any reliance on MiniMax's API data handling. On third-party providers such as Fireworks, Together.ai, or OpenRouter, data handling follows each provider's own policies rather than MiniMax's directly. As a Shanghai-based company, MiniMax also operates under China's Generative AI Measures from the Cyberspace Administration of China, a separate regulatory framework from GDPR or US privacy law.

Question 8

Who is MiniMax M2.7 best for and who should avoid it?

Accepted Answer

MiniMax-M2.7 is best for budget-conscious teams building agentic coding pipelines, where its $0.30/$1.20 per 1M token pricing and 56.2% SWE-Bench Pro score make it one of the cheapest ways to automate real coding tasks. It also suits developers who want to self-host an open-weight, frontier-adjacent model, since the 230B-parameter MoE (10B active) is available in sizes ranging from a roughly 60GB 1-bit quantization to the 457GB full BF16 weights, runnable on vLLM or SGLang. Teams already using agent harnesses like Cline, OpenCode, or Kilo can plug M2.7 in directly using its documented tool-calling format. Researchers studying self-evolving or self-improving training loops may also find M2.7 interesting, since MiniMax says the model handled 30-50% of its own reinforcement learning research workflow. Teams should avoid M2.7 for interactive, low-latency chat applications: its 35-46 tokens-per-second output speed and roughly 2.2-second time-to-first-token, combined with high verbosity (up to 4x the average output tokens of similar models on benchmarks), make it feel sluggish compared to the 60-95 tokens-per-second median for similarly-priced open models. It is also the wrong choice for any task requiring image, audio, or video understanding, since it is strictly text-only; MiniMax M3 or a Western multimodal model like Gemini 2.5 Pro would be better fits there. Teams needing graduate-level reasoning or math performance should look at models with stronger published scores on those specific benchmarks, since M2.7's training compute was weighted toward code and agents. Finally, anyone planning commercial self-hosted deployment should confirm licensing terms with MiniMax given the April 2026 Modified-MIT license change.
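To judge whether the output speed matters for a given application, a rough response-time estimate helps. This sketch uses the speed and time-to-first-token figures quoted above and assumes generation proceeds at a constant rate; `response_seconds` is a hypothetical helper, and applying the same time-to-first-token to other models is an assumption made only for comparison.

```python
TTFT_SECONDS = 2.2       # approximate time-to-first-token, from the text
M27_TPS = (35.0, 46.0)   # M2.7 output tokens per second, from the text
PEER_TPS = (60.0, 95.0)  # range quoted for similarly-priced open models


def response_seconds(output_tokens: int, tokens_per_second: float, ttft: float = TTFT_SECONDS) -> float:
    """Estimate wall-clock time for one streamed response at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft + output_tokens / tokens_per_second


if __name__ == "__main__":
    for label, (slow, fast) in (("M2.7", M27_TPS), ("peers", PEER_TPS)):
        low = response_seconds(2_000, fast)
        high = response_seconds(2_000, slow)
        print(f"{label}: {low:.1f}s to {high:.1f}s for 2,000 output tokens")
```

On these numbers a 2,000-token answer takes roughly 46 to 59 seconds on M2.7, which is tolerable in a background agent loop but slow for interactive chat, and the model's verbosity lengthens it further.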
