Name: Ministral 3 14B: 85% AIME 2025, 74.2% MMLU-Pro, $0.20/1M (2025)
Brand: Mistral AI
Price: 0.20 USD
Availability: InStock

Question 1

What is Ministral 3 14B and who built it?

Accepted Answer

Ministral 3 14B is a multimodal foundation model built by Mistral AI, a Paris-based AI lab founded in April 2023. Released December 4, 2025 as the largest model in the Ministral 3 family, it uses a dense Transformer architecture with 13.5 billion language decoder parameters and a 410 million parameter Vision Transformer (ViT) encoder, for approximately 14 billion parameters total. The design uses Grouped Query Attention (GQA) with 40 layers and a 5,120 hidden dimension, optimising for strong task performance without the inference complexity of a MoE design. Released under Apache 2.0, it is freely usable for commercial products. Three variants are on Hugging Face: base, instruct, and reasoning. The reasoning variant scores 85% on AIME 2025, 11 points above Qwen2.5 14B Instruct and among the highest open-source results at this parameter scale.

Question 2

How much does Ministral 3 14B cost per 1M tokens?

Accepted Answer

Ministral 3 14B is priced at $0.20 per 1 million input tokens and $0.20 per 1 million output tokens via Mistral's la Plateforme API. There is no published cached-input discount; all tokens are billed at the flat $0.20 rate. For comparison, GPT-4o Mini charges $0.15 input and $0.60 output per 1M tokens; Ministral 3 14B is only 33% more expensive on input but 67% cheaper on output, making it better value for generation-heavy tasks. A daily pipeline generating 500,000 tokens of output on 2M input tokens costs $0.50. Self-hosted on a 24GB FP8 GPU under Apache 2.0, compute cost is roughly $0.01-0.03 per 1M tokens in electricity. NVIDIA Build pricing is available separately at build.nvidia.com; rates vary by tier.

Question 3

What is Ministral 3 14B's context window and max output?

Accepted Answer

Ministral 3 14B supports a 256,000 token context window, the same effective context length as Mistral Large 3, making it competitive on context at a fraction of the cost. Some third-party benchmarks report an architectural limit of 262,144 tokens (2^18); Mistral's official documentation states 256K. The model uses GQA with 40 transformer layers and 5,120 hidden dimensions to manage KV cache efficiency at long contexts. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 14B specifically. Maximum output tokens per API call are not separately specified. For comparison, Qwen2.5 14B Instruct supports 128K context, making the Ministral 3 14B's window twice as large for the same parameter class.

Question 4

How does Ministral 3 14B compare on benchmarks vs Qwen2.5 14B?

Accepted Answer

The Ministral 3 14B Reasoning variant scores 85% on AIME 2025 versus Qwen2.5 14B Instruct's 73.7% on the same benchmark, an 11-point advantage. On MMLU-Pro, the Ministral 3 14B base scores 74.2% versus Qwen2.5 14B Instruct at approximately 72% on MMLU-Pro in independent evaluations. The Ministral 3 14B also includes a native 410M ViT vision encoder, which Qwen2.5 14B Instruct lacks. On context window, Ministral 3 14B supports 256K tokens versus Qwen2.5 14B's 128K. On pricing, the models are in the same tier; Qwen2.5 14B is open-weight under a custom Qwen license allowing commercial use, while Ministral 3 14B is Apache 2.0 with no additional restrictions. For coding tasks, both models have strong results but no direct head-to-head LiveCodeBench comparison is published.

Question 5

Is Ministral 3 14B open source or proprietary?

Accepted Answer

Ministral 3 14B is open-source under the Apache 2.0 license, the most permissive widely-used open-source license, allowing commercial use, modification, and redistribution without royalties. Weights are available on Hugging Face at mistralai/Ministral-3-14B-Instruct-2512 (instruct), mistralai/Ministral-3-14B-Base-2512 (base), and mistralai/Ministral-3-14B-Reasoning-2512 (reasoning). In FP8 format the model requires 24GB of VRAM, compatible with a single RTX 4090 or RTX 3090. BF16 requires 32GB. Q4 quantization (GGUF) brings memory under 10GB for sub-consumer-flagship GPU deployment. The recommended self-hosting framework is vLLM; llama.cpp supports GGUF variants. The model is also available via NVIDIA Build at build.nvidia.com for cloud-hosted inference without self-hosting setup.

Question 6

What modalities does Ministral 3 14B support?

Accepted Answer

Ministral 3 14B accepts text and image inputs. Images are processed through an integrated 410 million parameter Vision Transformer (ViT) encoder, handling visual QA, chart reading, and image OCR natively in a single model call. Output is text only; image generation is not supported. Function calling and structured JSON output are available via Mistral's OpenAI-compatible tool schema, with support for parallel tool calls enabling multi-step agentic workflows. Audio and video inputs are not supported; voice applications require a separate ASR model before calling the 14B. The model is multilingual across 11 languages including French, German, Spanish, Italian, Japanese, Korean, and Chinese at native quality.

Question 7

Does Ministral 3 14B train on user data?

Accepted Answer

API inputs via Mistral's la Plateforme are not used for model training according to Mistral's data handling policy. Abuse monitoring may retain flagged inputs for a limited period. Self-hosted deployments under Apache 2.0 process all data locally with no external exposure, enabling fully air-gapped deployments for sensitive environments. Mistral AI is a GDPR-compliant European company headquartered in Paris; API traffic routes through European infrastructure by default. Enterprise data processing agreements and zero-retention options are available via Mistral's sales channel. SOC 2 Type II and HIPAA certifications at the API tier are not separately documented in Mistral's public trust resources.

Question 8

Who is Ministral 3 14B best for and who should avoid it?

Accepted Answer

Ministral 3 14B is best for AI researchers running math or science benchmarks, developers building advanced local reasoning agents on 24GB consumer GPUs, and teams that need the strongest open-source multilingual reasoning at under $0.25 per 1M tokens. The 85% AIME 2025 reasoning variant makes it a strong choice for math tutoring, scientific QA, structured logic, and competition-level problem solving. Teams needing verified maximum agentic reasoning should consider Mistral Large 3 (675B total parameters, 41B active), which outperforms the 14B on multi-step orchestration. Voice-first teams should avoid this model as audio input requires a separate ASR step. Teams that need BF16 precision on a single 24GB GPU cannot do so as BF16 requires 32GB; use FP8 or Q4 instead. For very high throughput production APIs without self-hosting, Mistral Large 3 via la Plateforme offers stronger quality at a higher price.

Ministral 3 14B: 85% AIME 2025, 74.2% MMLU-Pro, $0.20/1M (2025)

About Ministral 3 14B

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions