Ministral 3 3B: 256K Context, Open-Source, $0.10/1M (2025)

Ministral 3 3B by Mistral AI (Dec 2025): Apache 2.0 open-weight edge model, 3.4B params, 256K context, vision input, $0.10/1M tokens, fits in 8GB VRAM.

Ministral 3 3B is Mistral AI's smallest production model (released December 2, 2025), with 3.4 billion parameters, a 256K token context window, and integrated vision support under Apache 2.0. Priced at $0.10 per 1M tokens for both input and output, it fits in 8GB of VRAM (FP8) or under 4GB quantized, making it the most accessible open multimodal model at this price point.

Ministral 3 3B, released December 2, 2025 by Mistral AI, is an Apache 2.0 open-weight multimodal edge model with 3.4 billion parameters and a 256,000 token context window. It is priced at $0.10 per 1M tokens (input and output) and runs in 8GB of VRAM in FP8, or under 4GB with Q4 quantization. It targets on-device and edge inference, supporting vision input via an integrated 410M ViT encoder.

Provider: Mistral AI · Family: Ministral 3

Context window: 256,000 tokens

Input modalities: text, image, tool-calls · Output: text, tool-calls

About Ministral 3 3B

Ministral 3 3B is the smallest member of Mistral AI's Ministral 3 family, released December 2, 2025. It uses a dense Transformer architecture combining 3.4 billion language decoder parameters with a 410 million parameter Vision Transformer (ViT) encoder. Released under Apache 2.0, it is freely usable for commercial self-hosting. In Mistral's product lineup it sits below the 8B and 14B Ministral models and far below the Mistral Large 3 MoE at 675B total parameters. The design priority is edge and on-device deployment where 8GB of VRAM or less is available. Mistral has not published a standalone MMLU-Pro or GPQA score for the 3B variant. At the 8B scale in the same family, the base model scores 70.6% on MMLU-Pro; at 14B the base scores 74.2%. Llama 3.2 3B scores approximately 58% on standard MMLU, placing the Ministral 3 3B well above it on general knowledge tasks, though Mistral has not published a direct verified comparison. The Ministral 3 8B Reasoning variant achieves 78.7% on AIME 2025 and 66.8% on GPQA Diamond, giving a sense of the family's ceiling; the 3B reasoning results have not been separately released. The model supports a 256,000 token context window, twice the 128K window of Llama 3.2 3B for the same parameter class. It uses Grouped Query Attention (GQA) with 32 query heads and 8 key-value heads, which reduces KV cache memory at long contexts. Mistral has not published a formal needle-in-haystack long-context recall eval for the 3B. In practice, maximum output per call is not separately documented in the API; most use cases stay under 4,096 output tokens. Ministral 3 3B accepts text and image inputs through its integrated ViT encoder. It can describe images, answer visual questions about photographs and charts, and extract printed text from images. Output is text only; image generation is not supported. Function calling and structured JSON output are available via Mistral's OpenAI-compatible tool-use schema. Audio and video inputs are not supported. The model is multilingual across European languages (French, German, Spanish, Italian, Portuguese, Dutch) and East Asian languages (Japanese, Korean, Chinese) alongside English. At $0.10 per 1M input tokens and $0.10 per 1M output tokens, Ministral 3 3B is the cheapest model in Mistral's API catalog. GPT-4o Mini charges $0.15 input and $0.60 output per 1M, making the Ministral 3B around 40% cheaper on input and 83% cheaper on output. A workload generating 1M tokens of responses to 1M tokens of user messages costs $0.20 total. Self-hosted deployments incur only compute costs under Apache 2.0. No batch API discount is separately advertised. API access runs through Mistral's la Plateforme at api.mistral.ai with model ID ministral-3-3b-latest or ministral-3-3b-2512. Weights are available on Hugging Face at mistralai/Ministral-3-3B-Instruct-2512 (instruct), mistralai/Ministral-3-3B-Base-2512 (base), and mistralai/Ministral-3-3B-Reasoning-2512 (reasoning). In FP8 format the model fits in 8GB of VRAM; Q4 quantization via GGUF reduces memory below 4GB for deployment on Apple M-series chips and RTX 30/40 series consumer GPUs. The recommended self-hosting framework is vLLM; llama.cpp and LM Studio support the model via standard GGUF conversion. Safety alignment uses supervised fine-tuning plus RLHF on the instruct variant. Mistral's approach is lighter on refusals than comparable Anthropic or OpenAI models, consistent with the company's position that safety guardrails should be configurable by the deploying organization. No separate system card or red team report has been published for the 3B model. API inputs via la Plateforme are subject to Mistral's abuse monitoring retention policy; inputs are not used for model training. Fully air-gapped self-hosted deployments have no data exposure. Ministral 3 3B is best for mobile app developers, IoT engineers, and cost-sensitive pipeline operators who need a multimodal model under 8GB with no licensing fees. High-volume document classification, extraction, and translation workloads benefit most from its $0.10/1M flat rate. Teams that need strong math or code generation should step up to the Ministral 3 14B Reasoning variant, which scores 85% on AIME 2025. Real-time voice assistant builders should avoid this model because there is no native audio input. For very long agentic coding loops, Mistral Large 3 or the Ministral 3 14B are better suited. The Ministral 3 3B was trained on a multilingual dataset with a knowledge cutoff in or around late 2025, consistent with other Ministral 3 family models. Mistral AI is headquartered in Paris and operates as a GDPR-compliant European company; API traffic via la Plateforme routes through European infrastructure by default. Enterprise data handling agreements and zero-retention options are available through Mistral's commercial channel.

Pricing

$0.10 per 1M input tokens and $0.10 per 1M output tokens via la Plateforme. No cached-input rate published. Self-hosted under Apache 2.0 incurs only compute cost.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is Ministral 3 3B and who built it?

Ministral 3 3B is a small-scale multimodal foundation model built by Mistral AI, a Paris-based AI lab founded in April 2023. Released December 2, 2025 as the smallest member of the Ministral 3 family, it combines 3.4 billion language decoder parameters with a 410 million parameter Vision Transformer (ViT) encoder for image understanding. The model uses a dense Transformer architecture with Grouped Query Attention (GQA), distinguishing it from the sparse Mixture-of-Experts approach used by Mistral Large 3. It is released under the Apache 2.0 license, making it freely usable for commercial products without royalties. Within Mistral's lineup, the 3B sits below the Ministral 3 8B and 14B, which offer stronger reasoning and higher benchmark scores. The key design goal is fitting into constrained hardware budgets, starting from 4GB of VRAM with Q4 quantization.

How much does Ministral 3 3B cost per 1M tokens?

Ministral 3 3B is priced at $0.10 per 1 million input tokens and $0.10 per 1 million output tokens via Mistral's la Plateforme API, the lowest published rate in Mistral's catalog. There is no separate cached-input discount; the flat $0.10 rate applies to all tokens. For comparison, GPT-4o Mini charges $0.15 input and $0.60 output per 1M, making Ministral 3 3B roughly 40% cheaper on input and 83% cheaper on output for high-output workloads. A pipeline generating 1 million output tokens per day costs approximately $0.10 on Ministral 3 3B versus $0.60 on GPT-4o Mini. Self-hosted deployments under Apache 2.0 carry zero per-token fees; only GPU compute costs apply, typically under $0.02 per 1M tokens on an 8GB consumer GPU. Batch API pricing is not separately advertised for this model tier.

What is Ministral 3 3B's context window and max output?

Ministral 3 3B supports a 256,000 token context window, double the 128K of Llama 3.2 3B for the same parameter class. The model uses Grouped Query Attention (GQA) with 32 query heads and 8 key-value heads to reduce KV cache memory usage at long context lengths. Mistral has not published a formal needle-in-haystack long-context recall evaluation for the 3B model specifically. Maximum output tokens per API call are not separately documented in Mistral's public API reference for this model. In practice, chat completions and extraction tasks typically generate under 4,096 output tokens. For comparison, the Ministral 3 8B and 14B share the same 256K context window, so upgrading for context alone is not necessary.

How does Ministral 3 3B compare on benchmarks vs Llama 3.2 3B?

Mistral has not published a standalone MMLU-Pro, GPQA Diamond, or AIME score for the Ministral 3 3B instruct variant. At the 8B scale in the same family, the base model scores 70.6% on MMLU-Pro and the reasoning variant reaches 78.7% on AIME 2025; at 14B, the base scores 74.2% MMLU-Pro and the reasoning variant hits 85% AIME 2025. Llama 3.2 3B scores approximately 58% on standard MMLU, suggesting the Ministral 3B should outperform it on general knowledge, though no direct verified comparison has been published. Ministral 3 3B also adds a vision encoder not present in Llama 3.2 3B, making it the more capable choice for image-understanding tasks. Teams that need verified benchmark evidence before deployment should benchmark both models on their specific task, or consider stepping up to the Ministral 3 8B which has more complete published evals.

Is Ministral 3 3B open source or proprietary?

Ministral 3 3B is open-source under the Apache 2.0 license, one of the most permissive available: it allows free commercial use, modification, and redistribution without royalties or attribution restrictions beyond the license header. The weights are available on Hugging Face at mistralai/Ministral-3-3B-Instruct-2512 (instruct), mistralai/Ministral-3-3B-Base-2512 (base), and mistralai/Ministral-3-3B-Reasoning-2512 (reasoning chain-of-thought variant). In FP8 format the model requires 8GB of VRAM; Q4 quantization (GGUF format) reduces this to under 4GB, compatible with Apple M2/M3/M4 unified memory and NVIDIA RTX 30/40 series consumer GPUs. The recommended self-hosting framework is vLLM; llama.cpp and LM Studio also support the GGUF-converted model. For those who prefer not to self-host, Mistral's la Plateforme API provides access at $0.10/1M tokens with no minimum commitment.

What modalities does Ministral 3 3B support?

Ministral 3 3B accepts text and image inputs. Images are processed through a 410 million parameter Vision Transformer (ViT) encoder that is integrated into the base architecture, not a separate pipeline step. The model can describe images, answer visual questions about photographs and charts, and extract printed text from images. Output is text only; the model cannot generate images or audio. Function calling and structured JSON output are supported via Mistral's OpenAI-compatible tool-use schema, enabling integration with tool-calling agent frameworks. Audio and video inputs are not supported; voice applications require a separate automatic speech recognition layer before passing text to the model. The model supports multilingual input across 11 languages including French, German, Spanish, Italian, Japanese, Korean, and Chinese.

Does Ministral 3 3B train on user data?

API inputs processed through Mistral's la Plateforme are not used for model training according to Mistral's published data handling policy. Abuse monitoring retention may apply to flagged inputs. Self-hosted deployments under Apache 2.0 process all data locally; no data leaves the deployment environment, making fully air-gapped deployments trivial. Mistral AI is headquartered in Paris and operates as a GDPR-compliant European company; API traffic via la Plateforme routes through European infrastructure by default. Enterprise-level data processing agreements and zero-retention options are available through Mistral's commercial channel for teams with strict compliance requirements. SOC 2 Type II and HIPAA certifications at the model-API tier are not separately documented in Mistral's public trust resources.

Who is Ministral 3 3B best for and who should avoid it?

Ministral 3 3B is the right pick for mobile app developers, IoT engineers, and cost-sensitive pipeline operators who need a self-hostable multimodal model under 4GB with Apache 2.0 freedom. High-volume document classification, OCR-assisted extraction, and translation workloads benefit most from the $0.10/1M flat rate, saving 80%+ on output costs versus GPT-4o Mini. Offline and privacy-first applications benefit from the fully air-gapped Apache 2.0 self-hosting path. Teams that need strong math reasoning or verified coding benchmarks should use the Ministral 3 14B Reasoning variant, which achieves 85% on AIME 2025. Real-time voice assistants should avoid this model because there is no native audio input and adding a separate ASR step increases total latency. For multi-step agentic coding loops requiring sustained reasoning across many tool calls, Mistral Large 3 or the Ministral 3 14B are better choices.

Visit Ministral 3 3B Official Page