Name: Mistral Medium 3 Review: 92% HumanEval, $0.40/M (2026)
Brand: Mistral AI
Price: 0.40 USD
Availability: InStock

Question 1

What is Mistral Medium 3 and who built it?

Accepted Answer

Mistral Medium 3 (API name mistral-medium-2505) is a frontier-class multimodal model built by Mistral AI, a Paris-based lab, and released on May 7, 2025. It sits in the middle of Mistral's lineup, below the Large tier, but was marketed under the tagline 'medium is the new large' for delivering near-flagship performance at a fraction of the cost. Mistral has not disclosed the parameter count or whether the architecture is dense or mixture-of-experts. On launch benchmarks it scored 92.1% on HumanEval, 77.2% on MMLU-Pro, and 91.0% on Math500 Instruct, putting it ahead of GPT-4o and roughly on par with Claude Sonnet 3.7 on coding. It was designed to compete with mid-to-large proprietary models like GPT-4o, Claude Sonnet 3.7, and Llama 4 Maverick while costing a fraction as much per token. Mistral positioned it as the model that brings flagship coding and document-understanding ability into a deployable, self-hostable package.

Question 2

How much does Mistral Medium 3 cost per 1M tokens?

Accepted Answer

Mistral Medium 3 costs $0.40 per 1 million input tokens and $2.00 per 1 million output tokens on Mistral's La Plateforme API, Amazon Bedrock, and Azure AI Foundry. Mistral marketed this as an 8x cost reduction versus Claude Sonnet 3.7, which costs $3.00 input and $15.00 output per 1 million tokens. Artificial Analysis lists a blended price of roughly $0.56 per 1 million tokens using a typical 7:2:1 input:output:cache ratio. A document OCR and QA pipeline processing 500K input and 50K output tokens per day would cost about $0.30 per day. A coding agent loop running 2M input and 400K output tokens per day would cost roughly $1.60 per day. No cached-input discount is published for this model. For teams that can self-host, the model runs on four GPUs and above, trading the per-token fee for infrastructure cost.

Question 3

What is Mistral Medium 3's context window and max output?

Accepted Answer

Mistral Medium 3 has a 128,000-token context window, confirmed in Mistral's official model card. A separate maximum output token limit is not published; output tokens share the same 128K budget as input tokens. On RULER 128K, a long-context retrieval benchmark, the model scored 0.902, ahead of GPT-4o's reported 0.889 at the same context length, indicating reliable recall near the top of its context window. There is no documented sliding-window behavior or separate extended-context tier for this model. For multi-document workloads, Mistral recommends chunking very large files to leave headroom for output generation within the 128K budget. Compared to Medium 3.5's later 256K window, Medium 3 offers half the context but was the largest in its tier at launch in May 2025.

Question 4

How does Mistral Medium 3 compare on benchmarks vs Claude Sonnet 3.7?

Accepted Answer

On HumanEval, Mistral Medium 3 scored 92.1%, matching Claude Sonnet 3.7's reported score on the same benchmark. On Math500 Instruct, Medium 3 scored 91.0%, and on MMLU-Pro it scored 77.2%, both competitive with Sonnet 3.7-class models. However, on GPQA Diamond, a graduate-level science reasoning benchmark, Medium 3 scored only 57.1%, a notable gap versus frontier reasoning-tuned models released later in 2025 and 2026. Mistral did not publish SWE-bench Verified, AIME, or ARC-AGI scores for Medium 3, while Anthropic has published SWE-bench numbers for Claude models, making a direct agentic-coding comparison impossible from public data alone. In practice, the 35-point GPQA gap means Medium 3 is reliable for coding and document tasks but more likely to make mistakes on multi-step scientific or logical reasoning chains than reasoning-focused competitors. The headline result is that Medium 3 matches Sonnet 3.7 on raw coding pass rates at about 1/8th the price, but trails on hard reasoning.

Question 5

Is Mistral Medium 3 open source or proprietary?

Accepted Answer

Mistral Medium 3 is proprietary and API-only; its weights are not published. It is licensed under Mistral's Commercial License and accessed via Mistral's La Plateforme API, Amazon Bedrock, Amazon SageMaker, and Azure AI Foundry, with Google Cloud Vertex AI and IBM watsonx listed as additional deployment targets. For organizations needing on-premises or VPC control, Mistral states the model can be self-hosted on four GPUs and above, with support for continuous pretraining and fine-tuning, but this still requires a commercial agreement with Mistral rather than an open download. This differs from some other models in the Mistral 3 family (such as the Apache 2.0-licensed Mistral 3 small/dense models and the open-weight Mistral Medium 3.5, released under a modified MIT license restricting commercial use) which are downloadable from Hugging Face. There is no commercial-use-free path to run Mistral Medium 3 itself.

Question 6

What modalities does Mistral Medium 3 support?

Accepted Answer

Mistral Medium 3 accepts text and image input and produces text output. On DocVQA, a document visual question-answering benchmark, it scored 0.953, and on MMMU, a multimodal multitask understanding benchmark, it scored 0.661, both indicating solid image and document comprehension. The model supports function calling and tool use with structured JSON output, fill-in-the-middle code completions, document OCR, document Q&A, and Mistral's Agents and Conversations APIs for multi-step agentic workflows. It does not support audio input or output, or video input; those modalities are handled by separate Mistral models such as Voxtral. Compared to GPT-4o, which supports native audio in the same model, Medium 3 is text-and-image only, so voice applications require pairing it with a separate transcription or speech model.

Question 7

Does Mistral Medium 3 train on user data?

Accepted Answer

Mistral's published policy states that API inputs and outputs sent to Mistral Medium 3 are not used to train future models unless a customer explicitly opts in. Mistral AI is headquartered in Paris and operates under EU data protection law by default, giving the model a GDPR-aligned baseline; data_residency_options for the standard API are EU-based. Mistral has not published SOC 2 Type II or ISO 27001 certification details specific to this model's hosting, and the model is not marketed as HIPAA-eligible. Under the EU AI Act, Mistral Medium 3 falls under the general-purpose AI model (GPAI) category, which carries documentation and transparency obligations for the provider. On Amazon Bedrock or Azure AI Foundry, data handling follows each cloud provider's standard tenant-isolation and retention policies layered on top of Mistral's base no-training commitment. No dedicated system card with HarmBench or jailbreak-resistance scores has been published for this specific model.

Question 8

Who is Mistral Medium 3 best for and who should avoid it?

Accepted Answer

Mistral Medium 3 is best for engineering teams building cost-sensitive coding assistants who need near-Claude-Sonnet-3.7 HumanEval performance at roughly 1/8th the price, document-processing pipelines that need OCR plus long-context retrieval (DocVQA 0.953, RULER 128K 0.902), and enterprises that want a multimodal model they can self-host on four-plus GPUs for VPC or on-premises requirements. It should be avoided for graduate-level scientific or multi-step logical reasoning, where its 57.1% GPQA Diamond score trails reasoning-tuned competitors; teams should look at Medium 3.5 or a dedicated reasoning model instead. It's also a poor fit for latency-sensitive real-time chat, since Artificial Analysis measured 36.8 tokens/sec output speed against a roughly 94.5 tok/s median for comparable models. Finally, any new long-term integration should target Mistral Medium 3.5 rather than Medium 3, since Medium 3 has already been superseded twice (Medium 3.1 in August 2025, Medium 3.5 in April 2026) and carries a published deprecation timeline around May 2026.

Mistral Medium 3

Mistral Medium 3 Review: 92% HumanEval, $0.40/M (2026)

About Mistral Medium 3

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions