Name: Mistral Large 3: 675B MoE, Apache 2.0, $0.50/M (2026)
Brand: Mistral AI
Price: 0.50 USD
Availability: InStock

Question 1

What is Mistral Large 3 and who built it?

Accepted Answer

Mistral Large 3 (API name mistral-large-2512) is Mistral AI's flagship general-purpose model, released on December 2, 2025 by the Paris-based lab. It marks Mistral's return to Mixture-of-Experts architecture after the original Mixtral series, using a granular MoE design with 41 billion active parameters and 675 billion total parameters, trained from scratch on 3,000 NVIDIA H200 GPUs. It succeeds Mistral Large 2 (November 2024) at the top of Mistral's lineup and launched alongside the broader 'Mistral 3' family, including the Ministral 3 small models. On launch benchmarks it scored roughly 85.5% on an 8-language MMLU evaluation, about 92% on HumanEval, and achieved an LMArena Elo of approximately 1418, ranking #2 among open-weight non-reasoning models. Unlike Mistral Medium 3, Large 3 is released under the Apache 2.0 license with downloadable weights.

Question 2

How much does Mistral Large 3 cost per 1M tokens?

Accepted Answer

Mistral Large 3 costs $0.50 per 1 million input tokens and $1.50 per 1 million output tokens on Mistral's La Plateforme API, a price confirmed in Mistral's official model documentation. The same pricing applies on Amazon Bedrock and Azure AI Foundry. A long-document analysis workload processing 1M input and 100K output tokens per day would cost roughly $0.65 per day. A multimodal agent loop running 3M input and 600K output tokens per day would cost around $2.40 per day. Because the model is released under Apache 2.0, organizations with sufficient GPU infrastructure can also self-host it for free, paying only for compute; this requires roughly 355GB of VRAM at 4-bit quantization or about 710GB at FP16, typically spread across a multi-node GPU cluster. No prompt-caching discount is currently published for this model.

Question 3

What is Mistral Large 3's context window and max output?

Accepted Answer

Mistral Large 3 has a 256,000-token context window for both input and output, according to its model card and independent benchmarking sites. This is double Mistral Medium 3's 128K window and matches the later Mistral Medium 3.5. Mistral describes the 256K window as engineered for 'deep endurance tasks', meaning long documents, large codebases, or extended multi-turn agent sessions can be processed in a single request without chunking. There is no documented separate extended-context tier; 256K is the standard window for all deployments of this model, whether via the API, Amazon Bedrock, Azure AI Foundry, or self-hosted. For multimodal requests, the context budget is shared between text tokens and the up to 8 images the model can process per request.

Question 4

How does Mistral Large 3 compare on benchmarks vs DeepSeek and Llama 4?

Accepted Answer

Mistral Large 3 scores about 92% on HumanEval, close to GPT-4o's roughly 95%, and an LMArena Elo of approximately 1418, ranking #2 among open-weight non-reasoning models and #6 overall at the time of evaluation, ahead of DeepSeek-3.1 and Kimi-K2 on general benchmarks. However, on GPQA Diamond, a graduate-level science reasoning benchmark, Large 3 scores only 43.9%, while DeepSeek-V3.2 and Kimi K2-Thinking score in the 70-85% range, nearly double Mistral's result. Mistral has not published SWE-bench Verified, AIME 2025, or ARC-AGI 2 scores for Large 3, making direct comparison on agentic coding and abstract reasoning incomplete. In practice, this means Large 3 is highly competitive for general knowledge, coding, and long-context tasks, but teams working on hard scientific reasoning problems should expect DeepSeek's reasoning-tuned models to outperform it significantly.

Question 5

Is Mistral Large 3 open source or proprietary?

Accepted Answer

Mistral Large 3 is open-weight under the Apache 2.0 license, one of the most permissive open-source licenses, allowing unrestricted commercial use, modification, and redistribution with no copyleft requirements. Both base (Mistral-Large-3-675B-Base-2512) and instruction-tuned (Mistral-Large-3-675B-Instruct-2512) checkpoints are downloadable from Hugging Face, including quantized NVFP4 variants. Running the full model requires roughly 710GB of VRAM at FP16 or about 355GB at 4-bit quantization, because the MoE router must access all 675B parameters even though only 41B activate per token; this typically means a multi-node GPU cluster using vLLM with expert parallelism. This is a notable contrast to Mistral Medium 3, which remains proprietary and API-only under Mistral's Commercial License.

Question 6

What modalities does Mistral Large 3 support?

Accepted Answer

Mistral Large 3 accepts text and image input, supporting up to 8 images per request for cross-modal analysis, and produces text output. It supports document OCR through its chat completions API and natively handles 40+ languages. On the agentic side, it supports native function calling and Mistral's built-in tools framework, enabling multi-step tool-use workflows, plus structured JSON output consistent with the rest of the Mistral 3 family. The model does not support audio input or output, or video input. There is no dedicated reasoning mode: unlike Mistral Medium 3.5 and Mistral Small 4, Large 3 shipped without a configurable reasoning-effort parameter, though Mistral announced a reasoning variant of Large 3 that had not shipped as of April 2026.

Question 7

Does Mistral Large 3 train on user data?

Accepted Answer

Mistral's published policy is that API inputs and outputs sent to Mistral Large 3 via La Plateforme are not used to train future models unless a customer opts in. Because the model is also released as Apache 2.0 open weights, organizations that self-host it keep all data within their own infrastructure with no data leaving to Mistral at all. Mistral AI is based in Paris and operates under EU data protection law, giving the hosted API a GDPR-aligned baseline with EU data residency. Under the EU AI Act, Mistral Large 3 falls under the general-purpose AI model (GPAI) category, which carries documentation and transparency obligations. Mistral has not published SOC 2 Type II or ISO 27001 certification details specific to this model, and it is not marketed as HIPAA-eligible. On Amazon Bedrock or Azure AI Foundry, data handling follows each cloud provider's standard tenant-isolation policies layered on top of Mistral's base no-training commitment.

Question 8

Who is Mistral Large 3 best for and who should avoid it?

Accepted Answer

Mistral Large 3 is best for enterprises that want a frontier-scale, Apache 2.0-licensed model they can self-host, fine-tune, or run air-gapped without commercial restrictions, for teams building long-context multimodal tools that need the full 256K window for documents or codebases, and for cost-sensitive high-volume API users who want roughly GPT-4o-level coding performance (92% HumanEval) at $0.50/$1.50 per million tokens. It should be avoided for graduate-level scientific reasoning tasks, where its 43.9% GPQA Diamond score is roughly half of DeepSeek-V3.2 and Kimi K2-Thinking's 70-85%; those teams should pick a reasoning-tuned model instead. It's also a poor fit for teams without multi-GPU infrastructure, since self-hosting requires roughly 355GB+ of VRAM even at 4-bit quantization, and for workloads that need visible chain-of-thought reasoning, since no reasoning-effort mode had shipped for Large 3 as of April 2026.

Mistral Large 3

Mistral Large 3: 675B MoE, Apache 2.0, $0.50/M (2026)

About Mistral Large 3

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions