Name: Ministral 3 8B: 78.7% AIME 2025, 256K Context, $0.15/1M
Brand: Mistral AI
Price: 0.15 USD
Availability: InStock

Question 1

What is Ministral 3 8B and who built it?

Accepted Answer

Ministral 3 8B is a multimodal foundation model built by Mistral AI, a Paris-based AI lab founded in April 2023. Released December 4, 2025 as the mid-tier member of the Ministral 3 family, it uses a dense Transformer architecture with approximately 8 billion parameters and an interleaved sliding-window attention pattern optimised for memory efficiency at long contexts. The model is released under Apache 2.0, making it freely usable for commercial products without royalties. Three variants are available: base (pre-trained), instruct (chat-optimised), and reasoning (chain-of-thought optimised for math and logic). The 8B reasoning variant scores 78.7% on AIME 2025 and 66.8% on GPQA Diamond, outperforming models twice its size on math-heavy benchmarks. The 8B sits above the 3B in the Ministral lineup and below the 14B, which reaches 85% AIME 2025 at roughly double the VRAM requirement.

Question 2

How much does Ministral 3 8B cost per 1M tokens?

Accepted Answer

Ministral 3 8B is priced at $0.15 per 1 million input tokens and $0.15 per 1 million output tokens via Mistral's la Plateforme API. There is no published cached-input discount; all tokens are billed at the flat $0.15 rate. For comparison, Together AI's hosted Llama 3.1 8B costs $0.18/1M, making the Ministral 3 8B cheaper while offering stronger benchmark scores. A daily pipeline that processes 2 million input tokens and generates 500,000 output tokens costs $0.375 total on the Mistral API. Self-hosted deployments under Apache 2.0 carry zero per-token fees; a 12GB FP8 deployment on an RTX 3080 runs at roughly $0.01-0.02 per 1M tokens in electricity and amortised hardware cost. No batch API tier is separately advertised.

Question 3

What is Ministral 3 8B's context window and max output?

Accepted Answer

Ministral 3 8B supports a 256,000 token context window, double the 128K of Llama 3.1 8B at the same parameter scale. Some third-party benchmarks report 262,144 tokens (2^18) as the architectural maximum; Mistral's official documentation states 256K as the effective context length. The model uses an interleaved sliding-window attention mechanism to manage KV cache memory at long contexts. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 8B variant. Maximum output tokens per API call are not separately documented. For tasks requiring the full 256K context window on self-hosted FP8, the interleaved attention can reduce coherence on cross-context lookups near the window limit; chunking is recommended for global cross-reference tasks.

Question 4

How does Ministral 3 8B compare on benchmarks vs Llama 3.1 8B?

Accepted Answer

The Ministral 3 8B reasoning variant scores 78.7% on AIME 2025 and 66.8% on GPQA Diamond; Llama 3.1 8B does not publish AIME 2025 or GPQA Diamond scores from the vendor. On MMLU-Pro, the Ministral 3 8B base scores 70.6% versus Llama 3.1 8B at approximately 63% on standard MMLU. LiveCodeBench: Ministral 3 8B Reasoning scores 61.6%; Llama 3.1 8B code scores have not been independently published at that benchmark. Ministral 3 8B also includes a native vision encoder, which Llama 3.1 8B lacks. On cost, Ministral 3 8B via la Plateforme ($0.15/1M) is cheaper than Llama 3.1 8B on Together AI ($0.18/1M). The main advantage Llama 3.1 8B holds is community ecosystem size and plug-in compatibility with more tooling.

Question 5

Is Ministral 3 8B open source or proprietary?

Accepted Answer

Ministral 3 8B is open-source under the Apache 2.0 license, one of the most permissive open-source licenses available. The weights are downloadable from Hugging Face at mistralai/Ministral-3-8B-Instruct-2512 (instruct), mistralai/Ministral-3-8B-Base-2512 (base), and mistralai/Ministral-3-8B-Reasoning-2512 (reasoning). In FP8 format the model fits in 12GB of VRAM, compatible with RTX 3080 and RTX 4070 GPUs. BF16 requires 24GB; Q4 quantization (GGUF) reduces memory to under 6GB for deployment on Apple M2 Pro 16GB or mid-range consumer GPUs. The recommended self-hosting framework is vLLM; llama.cpp supports GGUF-converted variants. There are no commercial use restrictions beyond Apache 2.0 itself.

Question 6

What modalities does Ministral 3 8B support?

Accepted Answer

Ministral 3 8B accepts text and image inputs. Images are processed through a 410 million parameter Vision Transformer (ViT) encoder integrated into the architecture, handling visual QA, chart reading, and image OCR natively. Output is text only; image generation is not supported. Function calling and structured JSON output are available via Mistral's OpenAI-compatible tool schema, enabling integration with LangChain, LlamaIndex, and custom agent loops. Parallel tool calls are supported. Audio and video inputs are not supported; voice applications must add a separate ASR model before calling the 8B. The model covers 11 languages natively: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Japanese, Korean, and Chinese.

Question 7

Does Ministral 3 8B train on user data?

Accepted Answer

API inputs via Mistral's la Plateforme are not used for model training according to Mistral's published data handling policy. Abuse monitoring may retain flagged inputs for a limited period. For zero data-exposure, the Apache 2.0 license enables fully air-gapped local deployment with no network dependency. Mistral AI is a GDPR-compliant European company headquartered in Paris; API traffic routes through European infrastructure by default. Enterprise data processing agreements and zero-retention options are available through Mistral's sales channel. SOC 2 Type II and HIPAA certifications at the API tier are not separately documented in Mistral's public trust resources.

Question 8

Who is Ministral 3 8B best for and who should avoid it?

Accepted Answer

Ministral 3 8B is best for developers building local reasoning assistants on 12GB GPUs, teams running multilingual coding agents that need vision support, and cost-sensitive pipelines that need stronger math reasoning than a 3B provides without stepping up to 14B pricing. The reasoning variant makes it particularly suited for math tutoring, science QA, and structured logic tasks. Teams needing maximum reasoning quality should use the Ministral 3 14B Reasoning variant, which reaches 85% AIME 2025 at roughly double the VRAM and cost. Voice assistant builders should avoid this model because there is no native audio input. Teams that need confirmed BF16 on a single 16GB GPU should use Q4 or FP8 instead, as BF16 requires 24GB. For very large-scale agentic coding, Mistral Large 3 at 675B parameters provides stronger multi-step reasoning.

Ministral 3 8B: 78.7% AIME 2025, 256K Context, $0.15/1M

About Ministral 3 8B

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions