Ministral 3 8B: 78.7% AIME 2025, 256K Context, $0.15/1M
Ministral 3 8B by Mistral AI (Dec 2025): Apache 2.0, 8B params, 256K context, 70.6% MMLU-Pro, 78.7% AIME 2025 reasoning, vision input, $0.15/1M tokens.
Ministral 3 8B is Mistral AI's mid-tier December 2025 model with 8 billion parameters, 256K context, and a reasoning variant scoring 78.7% on AIME 2025 and 66.8% on GPQA Diamond. At $0.15 per 1M tokens for both input and output, it runs in 12GB FP8 VRAM under Apache 2.0, making it the strongest open-source reasoning model at the 12GB GPU tier.
Ministral 3 8B, released December 4, 2025 by Mistral AI, is an Apache 2.0 open-weight multimodal model with approximately 8 billion parameters and a 256,000 token context window. The dedicated reasoning variant scores 78.7% on AIME 2025 and 66.8% on GPQA Diamond. Priced at $0.15 per 1M tokens (input and output), it runs in 12GB of VRAM (FP8) and supports vision input via an integrated ViT encoder.
Provider: Mistral AI · Family: Ministral 3
Context window: 256,000 tokens
Input modalities: text, image, tool-calls · Output: text, tool-calls
About Ministral 3 8B
Ministral 3 8B is the mid-tier model in Mistral AI's Ministral 3 family, released December 4, 2025. It uses a dense Transformer architecture with approximately 8 billion parameters and an interleaved sliding-window attention pattern for memory-efficient inference at longer contexts. Released under the Apache 2.0 license, it is freely downloadable and commercially deployable. In Mistral's product lineup the 8B sits between the lightweight 3B and the higher-quality 14B, and it is the first size in the family where reasoning-mode benchmark scores have been published in detail. The primary deployment target is local inference on 12GB GPUs and cost-sensitive cloud batches. On MMLU-Pro the Ministral 3 8B base model scores 70.6% according to LayerLens/Atlas independent evaluation. The dedicated 8B Reasoning variant achieves 86.0% on AIME 2024, 78.7% on AIME 2025, 66.8% on GPQA Diamond, and 61.6% on LiveCodeBench. These scores are for the reasoning-mode variant, not the instruct variant; the instruct model will score lower on math-heavy tasks but performs better for chat and instruction-following. Compared to Qwen2.5 14B Instruct (which was released in September 2024), the Ministral 3 8B Reasoning achieves comparable or higher scores on AIME while offering an integrated vision encoder Qwen2.5 14B lacks. The 8B supports a 256,000 token context window. Some third-party benchmarks report 262,144 tokens (2^18), which is the architectural maximum; the official Mistral documentation states 256K as the effective context length. The model uses an interleaved sliding-window attention mechanism to manage KV cache efficiency at long context. Mistral has not published a formal needle-in-haystack recall evaluation for the 8B variant. Maximum output tokens per API call are not separately specified. For comparison, Llama 3.1 8B supports 128K context, making the Ministral 3 8B's window roughly twice as large. The model accepts text and image inputs via a 410 million parameter Vision Transformer (ViT) encoder. It handles standard image formats for visual QA, chart reading, and OCR tasks. Output is text only. Function calling and structured JSON output are supported via Mistral's OpenAI-compatible tool schema with parallel tool call support. Audio and video inputs are not supported. The model covers the same 11-language multilingual core as the rest of the Ministral 3 family, with strong performance in French, German, Spanish, Italian, Portuguese, Japanese, Korean, and Chinese. Pricing via Mistral's la Plateforme API is $0.15 per 1M input tokens and $0.15 per 1M output tokens, the middle tier of the Ministral 3 family. There is no published cached-input discount. Self-hosted deployments under Apache 2.0 pay only compute costs. For comparison, Llama 3.1 8B Instruct hosted on Together AI costs $0.18/1M, making the Ministral 3 8B slightly cheaper and more capable. A daily batch that processes 2M input tokens and generates 500K output tokens costs $0.375 on the Mistral API. API access is available through la Plateforme at api.mistral.ai with model ID ministral-3-8b-latest or ministral-3-8b-2512. Weights are on Hugging Face at mistralai/Ministral-3-8B-Instruct-2512, mistralai/Ministral-3-8B-Base-2512, and mistralai/Ministral-3-8B-Reasoning-2512. In BF16 the model needs 24GB of VRAM; in FP8 it fits in 12GB, compatible with an RTX 3080 or RTX 4070. Q4 quantization brings it under 6GB, enabling deployment on mid-range consumer GPUs and Apple M2 Pro. The recommended self-hosting framework is vLLM; llama.cpp supports GGUF variants. Safety alignment follows the same SFT plus RLHF approach as the wider Ministral 3 family. Mistral has not published a standalone red team report for the 8B. The instruct variant has lighter default refusals than Claude or GPT-4o Mini, consistent with Mistral's permissive positioning for developer-facing open models. For consumer-facing products, deployers are expected to add their own content filtering layer. API inputs via la Plateforme are not used for model training. Ministral 3 8B is the best size for teams that outgrow the 3B's reasoning capability but cannot justify the 24GB VRAM or higher pricing of the 14B. It is well-suited for coding assistants, document analysis agents, and multilingual chatbots that need vision support. Teams building reasoning-heavy math or science pipelines should use the 8B Reasoning variant specifically. For maximum reasoning quality, the Ministral 3 14B Reasoning reaches 85% AIME 2025 at roughly twice the cost. Voice-first applications should add an ASR layer since native audio input is not supported. Mistral has not published a detailed technical report for the 8B model. The training dataset is multilingual with an estimated knowledge cutoff in late 2025, shared with the other Ministral 3 variants. Mistral AI is a GDPR-compliant European company; API traffic routes through European infrastructure. Enterprise data agreements and zero-retention options are available through Mistral's sales channel.
Pricing
$0.15 per 1M input tokens and $0.15 per 1M output tokens via la Plateforme. No cached-input rate published. Self-hosted under Apache 2.0 incurs only compute cost.
Key Features
- 78.7% AIME 2025 (Reasoning Variant): The dedicated reasoning model ID achieves 78.7% on AIME 2025 and 66.8% on GPQA Diamond, outperforming Qwen2.5 14B on math at half the parameter count.
- 12GB FP8 Self-Hosting: Runs in 12GB of VRAM (FP8) on a single RTX 3080 or RTX 4070, bringing mid-tier reasoning to consumer hardware.
- Integrated Vision Input: A 410M ViT encoder processes images natively for visual QA, chart reading, and image OCR without a separate vision model.
- 256K Context Window: Supports 256,000 token context, roughly twice the 128K of Llama 3.1 8B, for long document and multi-turn conversation tasks.
- Apache 2.0 with Three Variants: Base, instruct, and reasoning variants all available on Hugging Face under Apache 2.0, letting teams pick the right behaviour per task.
Pros
- Reasoning variant scores 78.7% AIME 2025, matching or beating models with twice the parameters for math-heavy workflows.
- Fits in 12GB FP8 (single RTX 3080/4070), making it the strongest local reasoning option at the 12GB GPU tier.
- $0.15/1M flat rate is cheaper than Together AI's hosted Llama 3.1 8B ($0.18/1M) with better benchmark scores.
Cons
- Reasoning benchmarks are for the dedicated reasoning variant; the instruct variant scores lower on math without chain-of-thought activation.
- BF16 requires 24GB VRAM; deployers needing native precision must use an A10G or better for self-hosting.
- No native audio input; voice pipelines require a separate ASR model adding latency.
Benchmarks
- mmlu pro: 70.6
- aime 2025: 78.7
- live bench: 61.6
- gpqa diamond: 66.8
- artificial analysis price blended per m: 0.15
Frequently Asked Questions
What is Ministral 3 8B and who built it?
Ministral 3 8B is a multimodal foundation model built by Mistral AI, a Paris-based AI lab founded in April 2023. Released December 4, 2025 as the mid-tier member of the Ministral 3 family, it uses a dense Transformer architecture with approximately 8 billion parameters and an interleaved sliding-window attention pattern optimised for memory efficiency at long contexts. The model is released under Apache 2.0, making it freely usable for commercial products without royalties. Three variants are available: base (pre-trained), instruct (chat-optimised), and reasoning (chain-of-thought optimised for math and logic). The 8B reasoning variant scores 78.7% on AIME 2025 and 66.8% on GPQA Diamond, outperforming models twice its size on math-heavy benchmarks. The 8B sits above the 3B in the Ministral lineup and below the 14B, which reaches 85% AIME 2025 at roughly double the VRAM requirement.
How much does Ministral 3 8B cost per 1M tokens?
Ministral 3 8B is priced at $0.15 per 1 million input tokens and $0.15 per 1 million output tokens via Mistral's la Plateforme API. There is no published cached-input discount; all tokens are billed at the flat $0.15 rate. For comparison, Together AI's hosted Llama 3.1 8B costs $0.18/1M, making the Ministral 3 8B cheaper while offering stronger benchmark scores. A daily pipeline that processes 2 million input tokens and generates 500,000 output tokens costs $0.375 total on the Mistral API. Self-hosted deployments under Apache 2.0 carry zero per-token fees; a 12GB FP8 deployment on an RTX 3080 runs at roughly $0.01-0.02 per 1M tokens in electricity and amortised hardware cost. No batch API tier is separately advertised.
What is Ministral 3 8B's context window and max output?
Ministral 3 8B supports a 256,000 token context window, double the 128K of Llama 3.1 8B at the same parameter scale. Some third-party benchmarks report 262,144 tokens (2^18) as the architectural maximum; Mistral's official documentation states 256K as the effective context length. The model uses an interleaved sliding-window attention mechanism to manage KV cache memory at long contexts. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 8B variant. Maximum output tokens per API call are not separately documented. For tasks requiring the full 256K context window on self-hosted FP8, the interleaved attention can reduce coherence on cross-context lookups near the window limit; chunking is recommended for global cross-reference tasks.
How does Ministral 3 8B compare on benchmarks vs Llama 3.1 8B?
The Ministral 3 8B reasoning variant scores 78.7% on AIME 2025 and 66.8% on GPQA Diamond; Llama 3.1 8B does not publish AIME 2025 or GPQA Diamond scores from the vendor. On MMLU-Pro, the Ministral 3 8B base scores 70.6% versus Llama 3.1 8B at approximately 63% on standard MMLU. LiveCodeBench: Ministral 3 8B Reasoning scores 61.6%; Llama 3.1 8B code scores have not been independently published at that benchmark. Ministral 3 8B also includes a native vision encoder, which Llama 3.1 8B lacks. On cost, Ministral 3 8B via la Plateforme ($0.15/1M) is cheaper than Llama 3.1 8B on Together AI ($0.18/1M). The main advantage Llama 3.1 8B holds is community ecosystem size and plug-in compatibility with more tooling.
Is Ministral 3 8B open source or proprietary?
Ministral 3 8B is open-source under the Apache 2.0 license, one of the most permissive open-source licenses available. The weights are downloadable from Hugging Face at mistralai/Ministral-3-8B-Instruct-2512 (instruct), mistralai/Ministral-3-8B-Base-2512 (base), and mistralai/Ministral-3-8B-Reasoning-2512 (reasoning). In FP8 format the model fits in 12GB of VRAM, compatible with RTX 3080 and RTX 4070 GPUs. BF16 requires 24GB; Q4 quantization (GGUF) reduces memory to under 6GB for deployment on Apple M2 Pro 16GB or mid-range consumer GPUs. The recommended self-hosting framework is vLLM; llama.cpp supports GGUF-converted variants. There are no commercial use restrictions beyond Apache 2.0 itself.
What modalities does Ministral 3 8B support?
Ministral 3 8B accepts text and image inputs. Images are processed through a 410 million parameter Vision Transformer (ViT) encoder integrated into the architecture, handling visual QA, chart reading, and image OCR natively. Output is text only; image generation is not supported. Function calling and structured JSON output are available via Mistral's OpenAI-compatible tool schema, enabling integration with LangChain, LlamaIndex, and custom agent loops. Parallel tool calls are supported. Audio and video inputs are not supported; voice applications must add a separate ASR model before calling the 8B. The model covers 11 languages natively: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Japanese, Korean, and Chinese.
Does Ministral 3 8B train on user data?
API inputs via Mistral's la Plateforme are not used for model training according to Mistral's published data handling policy. Abuse monitoring may retain flagged inputs for a limited period. For zero data-exposure, the Apache 2.0 license enables fully air-gapped local deployment with no network dependency. Mistral AI is a GDPR-compliant European company headquartered in Paris; API traffic routes through European infrastructure by default. Enterprise data processing agreements and zero-retention options are available through Mistral's sales channel. SOC 2 Type II and HIPAA certifications at the API tier are not separately documented in Mistral's public trust resources.
Who is Ministral 3 8B best for and who should avoid it?
Ministral 3 8B is best for developers building local reasoning assistants on 12GB GPUs, teams running multilingual coding agents that need vision support, and cost-sensitive pipelines that need stronger math reasoning than a 3B provides without stepping up to 14B pricing. The reasoning variant makes it particularly suited for math tutoring, science QA, and structured logic tasks. Teams needing maximum reasoning quality should use the Ministral 3 14B Reasoning variant, which reaches 85% AIME 2025 at roughly double the VRAM and cost. Voice assistant builders should avoid this model because there is no native audio input. Teams that need confirmed BF16 on a single 16GB GPU should use Q4 or FP8 instead, as BF16 requires 24GB. For very large-scale agentic coding, Mistral Large 3 at 675B parameters provides stronger multi-step reasoning.