Mistral Large 3: 675B MoE, Apache 2.0, $0.50/M (2026)
Mistral Large 3 (Dec 2025) is a 675B MoE model (41B active) with 256K context and 92% HumanEval. Apache 2.0 open weights, priced at $0.50/$1.50 per 1M tokens.
Mistral Large 3 (mistral-large-2512) is Mistral AI's December 2, 2025 flagship return to Mixture-of-Experts, with 675B total/41B active parameters, a 256K context window, ~92% HumanEval, and 43.9% GPQA Diamond. It costs $0.50 per 1M input tokens and $1.50 per 1M output tokens, ships under Apache 2.0 with weights on Hugging Face, and succeeds Mistral Large 2 (November 2024).
Mistral Large 3, released by Mistral AI on December 2, 2025, is a 675B Mixture-of-Experts model (41B active parameters) with a 256K-token context window, scoring 92% on HumanEval and 43.9% on GPQA Diamond. Priced at $0.50 per million input tokens and $1.50 per million output tokens, it ships under the Apache 2.0 license with downloadable weights.
Provider: Mistral AI · Family: Mistral Large
Context window: 256,000 tokens · Max output: 256,000
Input modalities: text, image · Output: text
About Mistral Large 3
Mistral Large 3 (API name mistral-large-2512) is Mistral AI's flagship general-purpose model, released on December 2, 2025. It marks Mistral's return to Mixture-of-Experts architecture after the Mixtral series, using a granular MoE design with 41 billion active parameters and 675 billion total parameters, trained from the ground up on 3,000 NVIDIA H200 GPUs. It succeeds Mistral Large 2 (November 2024) as the top of Mistral's lineup and was released alongside the broader 'Mistral 3' family, including the Ministral 3 small models, all under the Apache 2.0 license. On benchmarks, Mistral Large 3 scores roughly 85.5% on an 8-language MMLU evaluation and in the low-to-mid 80s on MMLU-Pro. On HumanEval it reaches approximately 92% pass@1, close to GPT-4o's roughly 95%. Its LMArena Elo of approximately 1418 ranks it #2 among open-weight non-reasoning models and #6 overall on the leaderboard at the time of evaluation. Its clearest weak spot is graduate-level science reasoning: GPQA Diamond comes in at 43.9%, well behind DeepSeek-V3.2 and Kimi K2-Thinking, which score in the 70-85% range on the same benchmark, nearly double Mistral Large 3's result. SimpleQA factual accuracy was measured at approximately 23.8%. Mistral has not published SWE-bench Verified, AIME 2025, or ARC-AGI 2 scores for this model. The model ships with a 256,000-token context window for both input and output, which Mistral describes as engineered for 'deep endurance tasks' such as processing long documents, codebases, or multi-turn agent sessions without truncation. This doubles the 128K window of Mistral Medium 3 and matches Mistral Medium 3.5's later context size. Mistral Large 3 is multimodal: it accepts text and up to 8 images simultaneously for cross-modal analysis, supports document OCR through chat completions, and natively handles 40+ languages. On the API side it supports function calling and Mistral's built-in tools framework for agentic workflows, with structured output support inherited from the broader Mistral 3 family. Pricing on Mistral's own La Plateforme API is $0.50 per million input tokens and $1.50 per million output tokens, confirmed in Mistral's official model documentation. This makes Large 3 substantially cheaper per token than most proprietary frontier models while offering open weights. A long-document analysis workload processing 1M input and 100K output tokens per day would cost roughly $0.65/day on Mistral's API; a multimodal agent loop running 3M input and 600K output tokens per day would cost around $2.40/day. As an Apache 2.0 open-weight model, Mistral Large 3 is available for download on Hugging Face in both base (Mistral-Large-3-675B-Base-2512) and instruct (Mistral-Large-3-675B-Instruct-2512) forms, including quantized NVFP4 variants. It is also available through Amazon Bedrock (first cloud provider to offer it), Microsoft Azure AI Foundry, IBM watsonx, Google Cloud Vertex AI, Fireworks, Together AI, and OpenRouter. Self-hosting is demanding: because the MoE router must read all expert weights to select which 41B activate per token, all 675B parameters must fit in VRAM, requiring roughly 710GB at FP16 (plus KV cache) or roughly 355GB at 4-bit quantization, typically spread across multiple GPU nodes with vLLM and expert parallelism. Mistral AI is based in Paris and operates under EU data protection law. Technical and governance documentation for Mistral Large 3 is published on Mistral's AI Governance Hub, though a detailed public system card with HarmBench-style refusal or jailbreak-resistance figures specific to this model was not found. The exact training data cutoff has not been formally published, though given the December 2, 2025 release date it is reasonably estimated to be late 2025. Mistral Large 3 is best suited for organizations that want a frontier-scale, Apache 2.0-licensed model they can self-host or fine-tune without restriction, for long-context multimodal analysis up to 256K tokens, and for cost-sensitive high-volume API usage where $0.50/$1.50 per million tokens beats most closed alternatives. It is a poor fit for graduate-level scientific reasoning given its 43.9% GPQA Diamond score, and for teams without access to multi-GPU infrastructure capable of holding 355GB+ of weights in VRAM for self-hosted deployment. A dedicated reasoning variant of Large 3 was announced alongside the December 2025 launch but had not shipped as of Mistral's April 2026 product wave; Mistral Small 4 and Mistral Medium 3.5 were the first models in the family to ship a configurable reasoning-effort parameter, suggesting Large 3's reasoning variant will follow the same approach when released.
Pricing
$0.50 per 1M input tokens and $1.50 per 1M output tokens on Mistral La Plateforme, confirmed in Mistral's official model docs. Open Apache 2.0 weights also allow free self-hosting (infrastructure cost only).
Key Features
- 675B MoE, 41B Active: Granular Mixture-of-Experts architecture, trained from scratch on 3,000 NVIDIA H200 GPUs, marking Mistral's return to MoE after Mixtral.
- Apache 2.0 Open Weights: Base and instruct checkpoints downloadable on Hugging Face with unrestricted commercial use, modification, and redistribution.
- 256K Context Window: Handles long documents, codebases, and multi-turn agent sessions for both input and output without truncation.
- Multimodal: Text + Up to 8 Images: Accepts up to 8 images per request for cross-modal analysis, plus document OCR via chat completions, across 40+ languages.
- Function Calling & Built-In Tools: Native function calling and Mistral's built-in tools framework support multi-step agentic workflows.
Pros
- Apache 2.0 open weights at frontier scale (675B total, 41B active), free to self-host and fine-tune.
- 256K-token context window for both input and output, double Mistral Medium 3's 128K.
- ~92% HumanEval pass@1 at $0.50/$1.50 per 1M tokens, far cheaper than comparable proprietary models.
Cons
- GPQA Diamond of 43.9% trails DeepSeek-V3.2 and Kimi K2-Thinking, which score 70-85% on the same benchmark.
- Self-hosting needs roughly 355GB+ VRAM at 4-bit quantization, even though only 41B of 675B parameters activate per token.
- No reasoning-effort mode at launch; the announced reasoning variant had not shipped as of April 2026.
Benchmarks
- mmlu: 85.5
- mmlu pro: 81
- humaneval: 92
- lmarena elo: 1418
- gpqa diamond: 43.9
- artificial analysis speed tokens per sec: 52
Frequently Asked Questions
What is Mistral Large 3 and who built it?
Mistral Large 3 (API name mistral-large-2512) is Mistral AI's flagship general-purpose model, released on December 2, 2025 by the Paris-based lab. It marks Mistral's return to Mixture-of-Experts architecture after the original Mixtral series, using a granular MoE design with 41 billion active parameters and 675 billion total parameters, trained from scratch on 3,000 NVIDIA H200 GPUs. It succeeds Mistral Large 2 (November 2024) at the top of Mistral's lineup and launched alongside the broader 'Mistral 3' family, including the Ministral 3 small models. On launch benchmarks it scored roughly 85.5% on an 8-language MMLU evaluation, about 92% on HumanEval, and achieved an LMArena Elo of approximately 1418, ranking #2 among open-weight non-reasoning models. Unlike Mistral Medium 3, Large 3 is released under the Apache 2.0 license with downloadable weights.
How much does Mistral Large 3 cost per 1M tokens?
Mistral Large 3 costs $0.50 per 1 million input tokens and $1.50 per 1 million output tokens on Mistral's La Plateforme API, a price confirmed in Mistral's official model documentation. The same pricing applies on Amazon Bedrock and Azure AI Foundry. A long-document analysis workload processing 1M input and 100K output tokens per day would cost roughly $0.65 per day. A multimodal agent loop running 3M input and 600K output tokens per day would cost around $2.40 per day. Because the model is released under Apache 2.0, organizations with sufficient GPU infrastructure can also self-host it for free, paying only for compute; this requires roughly 355GB of VRAM at 4-bit quantization or about 710GB at FP16, typically spread across a multi-node GPU cluster. No prompt-caching discount is currently published for this model.
What is Mistral Large 3's context window and max output?
Mistral Large 3 has a 256,000-token context window for both input and output, according to its model card and independent benchmarking sites. This is double Mistral Medium 3's 128K window and matches the later Mistral Medium 3.5. Mistral describes the 256K window as engineered for 'deep endurance tasks', meaning long documents, large codebases, or extended multi-turn agent sessions can be processed in a single request without chunking. There is no documented separate extended-context tier; 256K is the standard window for all deployments of this model, whether via the API, Amazon Bedrock, Azure AI Foundry, or self-hosted. For multimodal requests, the context budget is shared between text tokens and the up to 8 images the model can process per request.
How does Mistral Large 3 compare on benchmarks vs DeepSeek and Llama 4?
Mistral Large 3 scores about 92% on HumanEval, close to GPT-4o's roughly 95%, and an LMArena Elo of approximately 1418, ranking #2 among open-weight non-reasoning models and #6 overall at the time of evaluation, ahead of DeepSeek-3.1 and Kimi-K2 on general benchmarks. However, on GPQA Diamond, a graduate-level science reasoning benchmark, Large 3 scores only 43.9%, while DeepSeek-V3.2 and Kimi K2-Thinking score in the 70-85% range, nearly double Mistral's result. Mistral has not published SWE-bench Verified, AIME 2025, or ARC-AGI 2 scores for Large 3, making direct comparison on agentic coding and abstract reasoning incomplete. In practice, this means Large 3 is highly competitive for general knowledge, coding, and long-context tasks, but teams working on hard scientific reasoning problems should expect DeepSeek's reasoning-tuned models to outperform it significantly.
Is Mistral Large 3 open source or proprietary?
Mistral Large 3 is open-weight under the Apache 2.0 license, one of the most permissive open-source licenses, allowing unrestricted commercial use, modification, and redistribution with no copyleft requirements. Both base (Mistral-Large-3-675B-Base-2512) and instruction-tuned (Mistral-Large-3-675B-Instruct-2512) checkpoints are downloadable from Hugging Face, including quantized NVFP4 variants. Running the full model requires roughly 710GB of VRAM at FP16 or about 355GB at 4-bit quantization, because the MoE router must access all 675B parameters even though only 41B activate per token; this typically means a multi-node GPU cluster using vLLM with expert parallelism. This is a notable contrast to Mistral Medium 3, which remains proprietary and API-only under Mistral's Commercial License.
What modalities does Mistral Large 3 support?
Mistral Large 3 accepts text and image input, supporting up to 8 images per request for cross-modal analysis, and produces text output. It supports document OCR through its chat completions API and natively handles 40+ languages. On the agentic side, it supports native function calling and Mistral's built-in tools framework, enabling multi-step tool-use workflows, plus structured JSON output consistent with the rest of the Mistral 3 family. The model does not support audio input or output, or video input. There is no dedicated reasoning mode: unlike Mistral Medium 3.5 and Mistral Small 4, Large 3 shipped without a configurable reasoning-effort parameter, though Mistral announced a reasoning variant of Large 3 that had not shipped as of April 2026.
Does Mistral Large 3 train on user data?
Mistral's published policy is that API inputs and outputs sent to Mistral Large 3 via La Plateforme are not used to train future models unless a customer opts in. Because the model is also released as Apache 2.0 open weights, organizations that self-host it keep all data within their own infrastructure with no data leaving to Mistral at all. Mistral AI is based in Paris and operates under EU data protection law, giving the hosted API a GDPR-aligned baseline with EU data residency. Under the EU AI Act, Mistral Large 3 falls under the general-purpose AI model (GPAI) category, which carries documentation and transparency obligations. Mistral has not published SOC 2 Type II or ISO 27001 certification details specific to this model, and it is not marketed as HIPAA-eligible. On Amazon Bedrock or Azure AI Foundry, data handling follows each cloud provider's standard tenant-isolation policies layered on top of Mistral's base no-training commitment.
Who is Mistral Large 3 best for and who should avoid it?
Mistral Large 3 is best for enterprises that want a frontier-scale, Apache 2.0-licensed model they can self-host, fine-tune, or run air-gapped without commercial restrictions, for teams building long-context multimodal tools that need the full 256K window for documents or codebases, and for cost-sensitive high-volume API users who want roughly GPT-4o-level coding performance (92% HumanEval) at $0.50/$1.50 per million tokens. It should be avoided for graduate-level scientific reasoning tasks, where its 43.9% GPQA Diamond score is roughly half of DeepSeek-V3.2 and Kimi K2-Thinking's 70-85%; those teams should pick a reasoning-tuned model instead. It's also a poor fit for teams without multi-GPU infrastructure, since self-hosting requires roughly 355GB+ of VRAM even at 4-bit quantization, and for workloads that need visible chain-of-thought reasoning, since no reasoning-effort mode had shipped for Large 3 as of April 2026.