Name: Ministral 3 3B: 256K Context, Open-Source, $0.10/1M (2025)
Brand: Mistral AI
Price: 0.10 USD
Availability: InStock

Question 1

What is Ministral 3 3B and who built it?

Accepted Answer

Ministral 3 3B is a small-scale multimodal foundation model built by Mistral AI, a Paris-based AI lab founded in April 2023. Released December 2, 2025 as the smallest member of the Ministral 3 family, it combines 3.4 billion language decoder parameters with a 410 million parameter Vision Transformer (ViT) encoder for image understanding. The model uses a dense Transformer architecture with Grouped Query Attention (GQA), distinguishing it from the sparse Mixture-of-Experts approach used by Mistral Large 3. It is released under the Apache 2.0 license, making it freely usable for commercial products without royalties. Within Mistral's lineup, the 3B sits below the Ministral 3 8B and 14B, which offer stronger reasoning and higher benchmark scores. The key design goal is fitting into constrained hardware budgets, starting from 4GB of VRAM with Q4 quantization.

Question 2

How much does Ministral 3 3B cost per 1M tokens?

Accepted Answer

Ministral 3 3B is priced at $0.10 per 1 million input tokens and $0.10 per 1 million output tokens via Mistral's la Plateforme API, the lowest published rate in Mistral's catalog. There is no separate cached-input discount; the flat $0.10 rate applies to all tokens. For comparison, GPT-4o Mini charges $0.15 input and $0.60 output per 1M, making Ministral 3 3B roughly 40% cheaper on input and 83% cheaper on output for high-output workloads. A pipeline generating 1 million output tokens per day costs approximately $0.10 on Ministral 3 3B versus $0.60 on GPT-4o Mini. Self-hosted deployments under Apache 2.0 carry zero per-token fees; only GPU compute costs apply, typically under $0.02 per 1M tokens on an 8GB consumer GPU. Batch API pricing is not separately advertised for this model tier.

Question 3

What is Ministral 3 3B's context window and max output?

Accepted Answer

Ministral 3 3B supports a 256,000 token context window, double the 128K of Llama 3.2 3B for the same parameter class. The model uses Grouped Query Attention (GQA) with 32 query heads and 8 key-value heads to reduce KV cache memory usage at long context lengths. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 3B model specifically. Maximum output tokens per API call are not separately documented in Mistral's public API reference for this model. In practice, chat completions and extraction tasks typically generate under 4,096 output tokens. For comparison, the Ministral 3 8B and 14B share the same 256K context window, so upgrading for context alone is not necessary.

Question 4

How does Ministral 3 3B compare on benchmarks vs Llama 3.2 3B?

Accepted Answer

Mistral has not published a standalone MMLU-Pro, GPQA Diamond, or AIME score for the Ministral 3 3B instruct variant. At the 8B scale in the same family, the base model scores 70.6% on MMLU-Pro and the reasoning variant reaches 78.7% on AIME 2025; at 14B, the base scores 74.2% MMLU-Pro and the reasoning variant hits 85% AIME 2025. Llama 3.2 3B scores approximately 58% on standard MMLU, suggesting the Ministral 3B should outperform it on general knowledge, though no direct verified comparison has been published. Ministral 3 3B also adds a vision encoder not present in Llama 3.2 3B, making it the more capable choice for image-understanding tasks. Teams that need verified benchmark evidence before deployment should benchmark both models on their specific task, or consider stepping up to the Ministral 3 8B which has more complete published evals.

Question 5

Is Ministral 3 3B open source or proprietary?

Accepted Answer

Ministral 3 3B is open-source under the Apache 2.0 license, one of the most permissive available: it allows free commercial use, modification, and redistribution without royalties or attribution restrictions beyond the license header. The weights are available on Hugging Face at mistralai/Ministral-3-3B-Instruct-2512 (instruct), mistralai/Ministral-3-3B-Base-2512 (base), and mistralai/Ministral-3-3B-Reasoning-2512 (reasoning chain-of-thought variant). In FP8 format the model requires 8GB of VRAM; Q4 quantization (GGUF format) reduces this to under 4GB, compatible with Apple M2/M3/M4 unified memory and NVIDIA RTX 30/40 series consumer GPUs. The recommended self-hosting framework is vLLM; llama.cpp and LM Studio also support the GGUF-converted model. For those who prefer not to self-host, Mistral's la Plateforme API provides access at $0.10/1M tokens with no minimum commitment.

Question 6

What modalities does Ministral 3 3B support?

Accepted Answer

Ministral 3 3B accepts text and image inputs. Images are processed through a 410 million parameter Vision Transformer (ViT) encoder that is integrated into the base architecture, not a separate pipeline step. The model can describe images, answer visual questions about photographs and charts, and extract printed text from images. Output is text only; the model cannot generate images or audio. Function calling and structured JSON output are supported via Mistral's OpenAI-compatible tool-use schema, enabling integration with tool-calling agent frameworks. Audio and video inputs are not supported; voice applications require a separate automatic speech recognition layer before passing text to the model. The model supports multilingual input across 11 languages including French, German, Spanish, Italian, Japanese, Korean, and Chinese.

Question 7

Does Ministral 3 3B train on user data?

Accepted Answer

API inputs processed through Mistral's la Plateforme are not used for model training according to Mistral's published data handling policy. Abuse monitoring retention may apply to flagged inputs. Self-hosted deployments under Apache 2.0 process all data locally; no data leaves the deployment environment, making fully air-gapped deployments trivial. Mistral AI is headquartered in Paris and operates as a GDPR-compliant European company; API traffic via la Plateforme routes through European infrastructure by default. Enterprise-level data processing agreements and zero-retention options are available through Mistral's commercial channel for teams with strict compliance requirements. SOC 2 Type II and HIPAA certifications at the model-API tier are not separately documented in Mistral's public trust resources.

Question 8

Who is Ministral 3 3B best for and who should avoid it?

Accepted Answer

Ministral 3 3B is the right pick for mobile app developers, IoT engineers, and cost-sensitive pipeline operators who need a self-hostable multimodal model under 4GB with Apache 2.0 freedom. High-volume document classification, OCR-assisted extraction, and translation workloads benefit most from the $0.10/1M flat rate, saving 80%+ on output costs versus GPT-4o Mini. Offline and privacy-first applications benefit from the fully air-gapped Apache 2.0 self-hosting path. Teams that need strong math reasoning or verified coding benchmarks should use the Ministral 3 14B Reasoning variant, which achieves 85% on AIME 2025. Real-time voice assistants should avoid this model because there is no native audio input and adding a separate ASR step increases total latency. For multi-step agentic coding loops requiring sustained reasoning across many tool calls, Mistral Large 3 or the Ministral 3 14B are better choices.

Ministral 3 3B: 256K Context, Open-Source, $0.10/1M (2025)

About Ministral 3 3B

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions