Ministral 3 14B: 85% AIME 2025, 74.2% MMLU-Pro, $0.20/1M (2025)
Ministral 3 14B by Mistral AI (Dec 2025): Apache 2.0, 14B params, 256K context, 85% AIME 2025 reasoning, 74.2% MMLU-Pro, vision input, $0.20/1M tokens.
Ministral 3 14B is Mistral AI's most capable small model (released December 4, 2025), with 13.5 billion language parameters, 256K context, and a reasoning variant scoring 85% on AIME 2025 and 74.2% MMLU-Pro on the base. At $0.20 per 1M tokens for both input and output, it runs in 24GB FP8 on a single RTX 4090 under Apache 2.0, and beats Qwen2.5 14B by 11 points on AIME 2025.
Ministral 3 14B, released December 4, 2025 by Mistral AI, is an Apache 2.0 open-weight multimodal model with 13.5B language parameters plus a 410M ViT encoder and a 256,000 token context window. The reasoning variant scores 85% on AIME 2025 and the base model scores 74.2% on MMLU-Pro, the highest in the Ministral 3 family. Priced at $0.20 per 1M tokens (input and output), it runs in 24GB FP8 VRAM on a single RTX 4090.
Provider: Mistral AI · Family: Ministral 3
Context window: 256,000 tokens
Input modalities: text, image, tool-calls · Output: text, tool-calls
About Ministral 3 14B
Ministral 3 14B is the largest and most capable model in Mistral AI's Ministral 3 family, released December 4, 2025. It uses a dense Transformer architecture with 13.5 billion language decoder parameters plus a 410 million parameter Vision Transformer (ViT) encoder, totalling approximately 14 billion parameters. The architecture employs Grouped Query Attention (GQA) with 40 transformer layers and a hidden dimension of 5,120, designed for strong task performance without the complexity of a Mixture-of-Experts design. Under Apache 2.0, it is freely usable for commercial self-hosting. In Mistral's lineup it sits at the top of the Ministral family, below the Mistral Large 3 MoE at 675B total parameters but offering stronger per-parameter reasoning than the 3B and 8B. The Ministral 3 14B base model scores 74.2% on MMLU-Pro according to LayerLens/Atlas independent evaluation, placing it above the 8B (70.6%) and ahead of many comparable 13-14B class models. The dedicated 14B Reasoning variant achieves 85% on AIME 2025, the highest published score in the Ministral 3 family, and is one of the strongest open-source results at the 14B parameter class as of December 2025. For comparison, Qwen2.5 14B Instruct (released September 2024) scores 73.7% on AIME 2025 in its best-of-1 configuration, making the Ministral 3 14B Reasoning 11 percentage points ahead on the same benchmark. The instruct variant scores lower on math-heavy tasks but delivers strong performance on coding, instruction-following, and multilingual generation. The model supports a 256,000 token context window. Some third-party sources report the architectural limit as 262,144 tokens (2^18); Mistral's official documentation states 256K as the effective figure. GQA with 32 query heads and 8 key-value heads controls KV cache memory usage at long contexts. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 14B. Maximum output tokens per call are not separately documented in Mistral's API reference. For reference, the Mistral Large 3 at 256K context shares the same window size, so the 14B is competitive on context length even against the flagship. Ministral 3 14B accepts text and image inputs via the embedded 410M ViT encoder. It processes standard image formats for visual QA, chart analysis, and image OCR. Output is text only. Function calling and structured JSON output are supported via Mistral's OpenAI-compatible tool schema, with parallel tool call support. Audio and video inputs are not supported. The model is multilingual across the same 11-language core as the wider Ministral 3 family, with first-class support for French, German, Spanish, Italian, Portuguese, Dutch, Russian, Japanese, Korean, and Chinese. Pricing via Mistral's la Plateforme API is $0.20 per 1M input tokens and $0.20 per 1M output tokens. There is no published cached-input discount. For comparison, GPT-4o Mini charges $0.15 input and $0.60 output per 1M, making the Ministral 3 14B notably cheaper on output while delivering stronger reasoning benchmark scores. A daily batch processing 2M input tokens and generating 500K output tokens costs $0.50 on Mistral's API. Self-hosted deployments under Apache 2.0 pay only compute costs. API access is available through la Plateforme at api.mistral.ai with model IDs ministral-3-14b-latest or ministral-3-14b-2512 (instruct) and ministral-3-14b-reasoning-2512 (reasoning). Weights are on Hugging Face at mistralai/Ministral-3-14B-Instruct-2512, mistralai/Ministral-3-14B-Base-2512, and mistralai/Ministral-3-14B-Reasoning-2512. In BF16 the model requires 32GB of VRAM; in FP8 it fits in 24GB, compatible with a single RTX 3090 or RTX 4090. Q4 quantization brings it under 10GB. The model is also available on NVIDIA Build at build.nvidia.com/mistralai/ministral-14b-instruct-2512. Alignment follows SFT plus RLHF on the instruct variant. Mistral has not published a standalone safety card or red team report for the 14B. The model follows Mistral's permissive philosophy: lighter default refusals than GPT-4o or Claude, with the expectation that deployers implement their own content policies for consumer-facing applications. API inputs via la Plateforme are not used for model training. Self-hosted deployments have no external data exposure. Ministral 3 14B is best for teams that need the strongest open-source reasoning quality at the sub-24GB GPU tier: math competition problems, scientific QA, advanced coding agents, and multilingual analysis pipelines. The 85% AIME 2025 reasoning variant makes it competitive with proprietary models in structured math. Teams that need maximum quality for very long agentic loops or advanced multi-tool orchestration should consider Mistral Large 3 (675B total parameters), which operates at a different tier. Real-time voice applications should add an ASR layer since audio input is not supported. Teams that need BF16 on a single 24GB GPU (RTX 3090/4090) will find the 14B a better fit than the 8B for demanding tasks. The Ministral 3 14B was trained on a multilingual dataset with an estimated knowledge cutoff in late 2025, consistent with the wider Ministral 3 family. Mistral AI is GDPR-compliant, headquartered in Paris, and API traffic routes through European infrastructure. The model is available on NVIDIA Build for deployment in NVIDIA cloud environments. Enterprise data agreements and zero-retention options are available via Mistral's sales channel.
Pricing
$0.20 per 1M input tokens and $0.20 per 1M output tokens via la Plateforme. No cached-input rate published. Self-hosted under Apache 2.0 incurs only compute cost.
Key Features
- 85% AIME 2025 (Reasoning Variant): The dedicated reasoning model ID achieves 85% on AIME 2025, 11 points above Qwen2.5 14B Instruct and among the highest published open-source scores at this parameter class.
- 74.2% MMLU-Pro Base Score: The highest MMLU-Pro result in the Ministral 3 family, above the 8B (70.6%), covering graduate-level multitask knowledge across 11 languages.
- 24GB FP8 Deployment: Runs in 24GB of VRAM (FP8) on a single RTX 4090 or RTX 3090, the strongest local reasoning option at the consumer flagship GPU tier.
- 256K Context Window: Supports 256,000 token context via GQA with 40 layers and 5,120 hidden dimensions, matching the context window of Mistral Large 3 at a fraction of the cost.
- Apache 2.0 with Three Variants: Base, instruct, and reasoning variants on Hugging Face under the most permissive open-source license, enabling task-matched routing across all use cases.
Pros
- Reasoning variant reaches 85% AIME 2025, the strongest open-source math score at the 14B class as of December 2025, beating Qwen2.5 14B by 11 points.
- 74.2% MMLU-Pro on the base model, the highest MMLU-Pro in the Ministral 3 family and competitive with proprietary models at a fraction of their cost.
- $0.20/1M flat rate is cheaper on output than GPT-4o Mini ($0.60/1M output) while delivering significantly stronger reasoning benchmark scores.
Cons
- BF16 requires 32GB VRAM, exceeding consumer flagship GPUs; FP8 or Q4 quantization is mandatory for single-GPU self-hosting.
- Reasoning benchmarks apply to the dedicated reasoning model ID only; the instruct variant scores materially lower on math without chain-of-thought activation.
- No native audio input; voice-first applications need a separate ASR pipeline adding latency and complexity.
Benchmarks
- mmlu pro: 74.2
- aime 2025: 85
- artificial analysis price blended per m: 0.2
Frequently Asked Questions
What is Ministral 3 14B and who built it?
Ministral 3 14B is a multimodal foundation model built by Mistral AI, a Paris-based AI lab founded in April 2023. Released December 4, 2025 as the largest model in the Ministral 3 family, it uses a dense Transformer architecture with 13.5 billion language decoder parameters and a 410 million parameter Vision Transformer (ViT) encoder, for approximately 14 billion parameters total. The design uses Grouped Query Attention (GQA) with 40 layers and a 5,120 hidden dimension, optimising for strong task performance without the inference complexity of a MoE design. Released under Apache 2.0, it is freely usable for commercial products. Three variants are on Hugging Face: base, instruct, and reasoning. The reasoning variant scores 85% on AIME 2025, 11 points above Qwen2.5 14B Instruct and among the highest open-source results at this parameter scale.
How much does Ministral 3 14B cost per 1M tokens?
Ministral 3 14B is priced at $0.20 per 1 million input tokens and $0.20 per 1 million output tokens via Mistral's la Plateforme API. There is no published cached-input discount; all tokens are billed at the flat $0.20 rate. For comparison, GPT-4o Mini charges $0.15 input and $0.60 output per 1M tokens; Ministral 3 14B is only 33% more expensive on input but 67% cheaper on output, making it better value for generation-heavy tasks. A daily pipeline generating 500,000 tokens of output on 2M input tokens costs $0.50. Self-hosted on a 24GB FP8 GPU under Apache 2.0, compute cost is roughly $0.01-0.03 per 1M tokens in electricity. NVIDIA Build pricing is available separately at build.nvidia.com; rates vary by tier.
What is Ministral 3 14B's context window and max output?
Ministral 3 14B supports a 256,000 token context window, the same effective context length as Mistral Large 3, making it competitive on context at a fraction of the cost. Some third-party benchmarks report an architectural limit of 262,144 tokens (2^18); Mistral's official documentation states 256K. The model uses GQA with 40 transformer layers and 5,120 hidden dimensions to manage KV cache efficiency at long contexts. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 14B specifically. Maximum output tokens per API call are not separately specified. For comparison, Qwen2.5 14B Instruct supports 128K context, making the Ministral 3 14B's window twice as large for the same parameter class.
How does Ministral 3 14B compare on benchmarks vs Qwen2.5 14B?
The Ministral 3 14B Reasoning variant scores 85% on AIME 2025 versus Qwen2.5 14B Instruct's 73.7% on the same benchmark, an 11-point advantage. On MMLU-Pro, the Ministral 3 14B base scores 74.2% versus Qwen2.5 14B Instruct at approximately 72% on MMLU-Pro in independent evaluations. The Ministral 3 14B also includes a native 410M ViT vision encoder, which Qwen2.5 14B Instruct lacks. On context window, Ministral 3 14B supports 256K tokens versus Qwen2.5 14B's 128K. On pricing, the models are in the same tier; Qwen2.5 14B is open-weight under a custom Qwen license allowing commercial use, while Ministral 3 14B is Apache 2.0 with no additional restrictions. For coding tasks, both models have strong results but no direct head-to-head LiveCodeBench comparison is published.
Is Ministral 3 14B open source or proprietary?
Ministral 3 14B is open-source under the Apache 2.0 license, the most permissive widely-used open-source license, allowing commercial use, modification, and redistribution without royalties. Weights are available on Hugging Face at mistralai/Ministral-3-14B-Instruct-2512 (instruct), mistralai/Ministral-3-14B-Base-2512 (base), and mistralai/Ministral-3-14B-Reasoning-2512 (reasoning). In FP8 format the model requires 24GB of VRAM, compatible with a single RTX 4090 or RTX 3090. BF16 requires 32GB. Q4 quantization (GGUF) brings memory under 10GB for sub-consumer-flagship GPU deployment. The recommended self-hosting framework is vLLM; llama.cpp supports GGUF variants. The model is also available via NVIDIA Build at build.nvidia.com for cloud-hosted inference without self-hosting setup.
What modalities does Ministral 3 14B support?
Ministral 3 14B accepts text and image inputs. Images are processed through an integrated 410 million parameter Vision Transformer (ViT) encoder, handling visual QA, chart reading, and image OCR natively in a single model call. Output is text only; image generation is not supported. Function calling and structured JSON output are available via Mistral's OpenAI-compatible tool schema, with support for parallel tool calls enabling multi-step agentic workflows. Audio and video inputs are not supported; voice applications require a separate ASR model before calling the 14B. The model is multilingual across 11 languages including French, German, Spanish, Italian, Japanese, Korean, and Chinese at native quality.
Does Ministral 3 14B train on user data?
API inputs via Mistral's la Plateforme are not used for model training according to Mistral's data handling policy. Abuse monitoring may retain flagged inputs for a limited period. Self-hosted deployments under Apache 2.0 process all data locally with no external exposure, enabling fully air-gapped deployments for sensitive environments. Mistral AI is a GDPR-compliant European company headquartered in Paris; API traffic routes through European infrastructure by default. Enterprise data processing agreements and zero-retention options are available via Mistral's sales channel. SOC 2 Type II and HIPAA certifications at the API tier are not separately documented in Mistral's public trust resources.
Who is Ministral 3 14B best for and who should avoid it?
Ministral 3 14B is best for AI researchers running math or science benchmarks, developers building advanced local reasoning agents on 24GB consumer GPUs, and teams that need the strongest open-source multilingual reasoning at under $0.25 per 1M tokens. The 85% AIME 2025 reasoning variant makes it a strong choice for math tutoring, scientific QA, structured logic, and competition-level problem solving. Teams needing verified maximum agentic reasoning should consider Mistral Large 3 (675B total parameters, 41B active), which outperforms the 14B on multi-step orchestration. Voice-first teams should avoid this model as audio input requires a separate ASR step. Teams that need BF16 precision on a single 24GB GPU cannot do so as BF16 requires 32GB; use FP8 or Q4 instead. For very high throughput production APIs without self-hosting, Mistral Large 3 via la Plateforme offers stronger quality at a higher price.