Mistral OCR 4: 85.20 OlmOCRBench, 170 Languages (2026)
Mistral OCR 4 extracts text, bounding boxes, and block types from PDFs across 170 languages. Tops OlmOCRBench at 85.20. Released June 2026, $4 per 1,000 pages.
Mistral OCR 4, released June 23, 2026 by Mistral AI, leads OlmOCRBench at 85.20 and achieves a 72% human-preference win rate across 600 real-world documents in 12 or more languages. Priced at $4 per 1,000 pages via API ($2 with Batch), it adds bounding boxes, typed block classification, and per-word confidence scores across 170 languages, and ships as a single container for on-premise enterprise deployments.
Provider: Mistral AI · Family: Mistral OCR
Input modalities: image, pdf · Output: text
About Mistral OCR 4
Mistral OCR 4 is a specialized document intelligence model developed by Mistral AI and released on June 23, 2026. Unlike Mistral's general-purpose language models, OCR 4 is purpose-built for extracting structured content from PDFs, scanned images, and multi-page documents. It is the fourth generation of Mistral's optical character recognition lineup, following OCR 3 (December 2025), OCR 2, and the original Mistral OCR released in early 2025.
The model adds three structural primitives that prior generations did not provide together: pixel-level bounding boxes that localize every text block, typed block classification that labels each region (title, table, equation, signature, and more), and per-word confidence scores that flag low-quality extractions before they reach downstream systems. It supports 170 languages across 10 language groups and deploys as a single container for on-premise use by regulated enterprises.
On public evaluation benchmarks, Mistral OCR 4 tops the OlmOCRBench leaderboard with a score of 85.20 and scores 93.07 on OmniDocBench. On Mistral's internal Crawl Multilingual evaluation it achieves 0.98 (98% accuracy). In human preference evaluations, independent annotators preferred OCR 4 over every competing OCR system tested, averaging a 72% win rate across more than 600 real-world documents covering 12 or more languages. Third-party benchmarks reinforce the lead: Rogo, an AI platform for financial services, reports achieving equivalent accuracy at roughly 8 times lower cost and 17 times lower latency compared to leading agentic document parsers on financial QA datasets. On raw throughput, the model processes up to 2,000 pages per minute on a single GPU node.
Mistral OCR 4 does not operate on a traditional token-based context window. Processing is document-page-based: each page is analyzed independently and returns a structured per-page object containing extracted text in reading order, bounding boxes at the block level, block type labels, and confidence scores. Multi-page documents are supported, and the pages parameter accepts both integer arrays and range strings (for example, "0,2-4" selects pages 0, 2, 3, and 4). For very large document sets, the Batch API processes pages asynchronously at half the cost. The Document AI layer, powered by mistral-small-2603, applies a user-supplied JSON schema to the full document output to produce schema-conformant extraction across the entire file in a single API call.
OCR 4 accepts PDF and image inputs including JPEG, PNG, and TIFF formats. Output is always structured: extracted text in reading order, per-block bounding boxes, typed block classification labels (title, table, equation, signature, list, image, caption, header, footer, code, references, aside text), and inline confidence scores per page and per word. The include_blocks API flag enables the full structural output; without it, the API returns extracted text and markdown only. The Document AI add-on, activated at the same endpoint with a JSON schema and optional prompt, routes OCR 4 output through mistral-small-2603 for schema-conformant JSON extraction. Downstream systems receive three primitives that plain OCR never supplied: location (bounding box), role (block type), and reliability (confidence score).
Mistral OCR 4 uses per-page pricing rather than per-token billing. The standard API costs $4 per 1,000 pages. The Batch API cuts that to $2 per 1,000 pages, a 50% discount for asynchronous workloads with results returned within hours. The Document AI layer, combining OCR 4 extraction with the mistral-small-2603 structuring step, costs $5 per 1,000 pages via Mistral Studio. For a team processing 100,000 pages per month, the standard API cost is $400 and the Batch API cost is $200. Google Document AI charges approximately $5 per 1,000 pages for comparable multilingual extraction, making OCR 4 cheaper on the standard tier and matching Google's price point on the Document AI tier. Self-hosted enterprise pricing is negotiated directly with Mistral sales.
OCR 4 is live on three managed platforms: Mistral's own la Plateforme API, Amazon SageMaker, and Microsoft Foundry (Azure's model catalog). All three use an API key or cloud IAM for authentication. For regulated industries where documents cannot leave the organization's infrastructure, OCR 4 ships as a single Docker container for on-premise or private cloud deployment. This is a key differentiator versus AWS Textract and Google Document AI, which are cloud-only services. Enterprise self-hosting requires a commercial agreement with Mistral; pricing is not published. The OCR API is documented at docs.mistral.ai under Studio and Document Processing (endpoint: /v1/ocr).
Mistral positions OCR 4 as a document extraction tool with explicitly stated out-of-scope uses. The model is not intended for medical diagnosis, legal judgment, or high-stakes financial decisions without human review. It is unsuited to real-time safety-critical systems or non-document inputs such as raw audio or video. Mistral has signed the EU GPAI Code of Practice. Mistral holds SOC 2 and ISO 27001 certifications. GDPR compliance is supported via EU-based hosting in Paris and a Data Processing Addendum that covers Regulation (EU) 2016/679. The EU AI Act's enforcement provisions take effect August 2, 2026; Mistral engaged proactively with the regulatory process ahead of this deadline.
OCR 4 is strongest for document-heavy enterprise workflows: financial statement extraction, legal contract parsing, medical record digitization, government form processing, and RAG pipeline document ingestion where source attribution requires bounding boxes for citation highlighting. Teams processing documents in low-resource or non-Latin scripts benefit from the 170-language coverage. Regulated industries that cannot route sensitive documents to cloud-only OCR services can use the self-hosted container. OCR 4 is not the right choice for real-time sub-second document processing, as the Batch API is asynchronous and the standard API is not rated for sub-100ms turnaround. It is also not a general-purpose vision model: it cannot describe scenes, caption images, or answer questions about charts without the Document AI layer. For teams needing single-step document QA without infrastructure setup, the Document AI tier at $5 per 1,000 pages provides both extraction and language model reasoning in one call.
Mistral has not published a system card with a specific training data cutoff date for OCR 4. The model appears trained on a multilingual scanned document corpus spanning 170 languages. API inputs are not used to train the model by default. Self-hosted deployments provide full data sovereignty: no document leaves the organization's infrastructure. Mistral holds SOC 2 Type II and ISO 27001 certifications and is GDPR-compliant with EU data residency in Paris. The Data Processing Addendum at legal.mistral.ai governs API usage. For enterprise compliance procurement, Mistral has structured the offering around standard SLA and DPA documentation.
OCR 4 is a significant step up from OCR 3, released December 17, 2025. OCR 3 achieved a 74% win rate over OCR 2 on forms, scanned documents, tables, and handwriting, priced at $2 per 1,000 pages. OCR 4 adds the three structural primitives (bounding boxes, block classification, confidence scores) that OCR 3 lacked entirely, with pricing that doubled to $4 per 1,000 pages to reflect the additional structured data per page. Bounding boxes were the most-requested missing feature before OCR 4. The trajectory points toward Mistral expanding the Document AI layer with figure extraction and table-to-spreadsheet output in future releases, positioning OCR 4 as an enterprise document ingestion layer rather than a standalone OCR tool.
Pricing
$4 per 1,000 pages via standard API. $2 per 1,000 pages via Batch API (50% discount, asynchronous). $5 per 1,000 pages via Document AI in Mistral Studio. Self-hosted enterprise pricing available on request from Mistral sales.
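The per-page arithmetic can be sketched in a few lines of Python. This is an illustrative estimator only: the tier names and the `estimate_cost` helper are hypothetical, and it assumes billing is strictly linear in page count with no minimums, rounding rules, or volume discounts, none of which are stated here.

```python
from decimal import Decimal

# Prices in USD per 1,000 pages, as quoted in this section.
# The tier keys are hypothetical labels chosen for this sketch.
PRICE_PER_1K_PAGES = {
    "standard": Decimal("4"),
    "batch": Decimal("2"),
    "document_ai": Decimal("5"),
}


def estimate_cost(pages: int, tier: str = "standard") -> Decimal:
    """Estimate the cost in USD of processing `pages` pages on a tier.

    Assumes purely linear per-page billing (an assumption of this sketch).
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if tier not in PRICE_PER_1K_PAGES:
        raise ValueError(f"unknown tier: {tier!r}")
    return PRICE_PER_1K_PAGES[tier] * Decimal(pages) / Decimal(1000)
```

Under these assumptions, `estimate_cost(100_000)` gives 400 and `estimate_cost(100_000, "batch")` gives 200, matching the monthly figures quoted for a 100,000-page workload.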
Key Features
- Bounding Boxes: Localizes every text block with pixel-level coordinates, enabling citation highlighting, source attribution, and reliable data pipelines that know exactly where each word sits on the page.
- Typed Block Classification: Labels each extracted region with one of fourteen structural roles, including title, table, equation, signature, list, image, caption, header, footer, code, references, and aside text, and returns the blocks in reading order.
- Per-Word Confidence Scores: Returns inline confidence scores per page and per word so downstream systems can flag low-quality extractions before they corrupt RAG indexes, databases, or compliance records.
- 170-Language Support: Covers 170 languages across 10 language groups, including low-resource scripts where competing services (Google Document AI, AWS Textract) provide incomplete or absent support.
- Self-Hosted Single-Container Deployment: Ships as a single Docker container for on-premise or private cloud deployment, giving regulated industries full data sovereignty without any document leaving their infrastructure.
- Document AI Add-On: Routes OCR 4 output through mistral-small-2603 to produce schema-conformant JSON from any document type, eliminating the need for custom post-processing prompt engineering.
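To make the features above concrete, here is a rough Python sketch of assembling a request body for the /v1/ocr endpoint. Only the endpoint path, the pages parameter, and the include_blocks flag come from this page. The `build_ocr_request` helper is hypothetical, the model identifier is a placeholder, and the shape of the `document` field is an assumption borrowed from earlier Mistral OCR releases, so check the official API reference before relying on any of it. The sketch builds a dictionary and sends nothing over the network.

```python
import json
from typing import Any, Dict, List, Optional, Union

OCR_ENDPOINT_PATH = "/v1/ocr"  # path named on this page; host and auth omitted


def build_ocr_request(
    document_url: str,
    pages: Optional[Union[str, List[int]]] = None,
    include_blocks: bool = True,
    model: str = "MODEL_ID_PLACEHOLDER",  # placeholder, not a real model id
) -> Dict[str, Any]:
    """Assemble a JSON-serializable request body (a sketch, not the official schema)."""
    body: Dict[str, Any] = {
        "model": model,
        # Assumed document shape, modeled on earlier Mistral OCR versions.
        "document": {"type": "document_url", "document_url": document_url},
        # Opt in to bounding boxes, block types, and confidence scores.
        "include_blocks": include_blocks,
    }
    if pages is not None:
        # Either a list of integers or a range string such as "0,2-4".
        body["pages"] = pages
    return body


example_body = build_ocr_request("https://example.com/contract.pdf", pages="0,2-4")
example_json = json.dumps(example_body, indent=2)
```

Because the structural output is opt-in, the sketch defaults `include_blocks` to `True`; a caller that only needs text and markdown could pass `False`.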
Pros
- Leads OlmOCRBench at 85.20, the top score on that public leaderboard as of June 2026, and achieves a 72% human-preference win rate across 600 real-world documents.
- Self-hosted single-container option serves regulated industries that cannot send documents to third-party cloud APIs, a capability Google Document AI and AWS Textract do not offer.
- 170-language coverage across 10 language groups, including low-resource scripts, with consistent bounding box and confidence score output regardless of script complexity.
Cons
- Pricing doubled from OCR 3 ($2/1K pages) to OCR 4 ($4/1K pages), a significant cost increase for teams with high-volume pipelines already running on OCR 3.
- Structural output (bounding boxes, block types, confidence scores) is opt-in via include_blocks: true and not returned by default, creating integration friction for teams expecting structured output out of the box.
- Self-hosted enterprise deployment requires a commercial sales agreement and is not publicly available from Docker Hub, which can lengthen procurement timelines for regulated industries.
Frequently Asked Questions
What is Mistral OCR 4 and who built it?
Mistral OCR 4 is a proprietary document intelligence model developed by Mistral AI and released on June 23, 2026. It is the fourth generation of Mistral's optical character recognition lineup, following OCR 3 (December 2025), OCR 2, and the original Mistral OCR (March 2025). Unlike Mistral's general-purpose language models, OCR 4 is purpose-built for extracting structured content from PDFs, scanned images, and multi-page documents, with three structural primitives: bounding boxes that localize every text block, typed block classification that labels each region by structural role (title, table, equation, signature, and ten more types), and per-word confidence scores. The model supports 170 languages across 10 language groups and ships as a single container for on-premise deployment. On the OlmOCRBench public leaderboard it scores 85.20, the highest reported result as of June 2026. It achieves a 72% human-preference win rate over competing OCR and document-AI systems across 600 real-world documents in 12 or more languages. The model is accessible via Mistral's la Plateforme API, Amazon SageMaker, and Microsoft Foundry.
How much does Mistral OCR 4 cost per 1,000 pages?
Mistral OCR 4 uses per-page billing rather than per-token pricing. The standard API costs $4 per 1,000 pages, providing real-time extraction with bounding boxes, block types, and confidence scores. The Batch API costs $2 per 1,000 pages, a 50% discount for asynchronous workloads where results are returned within hours rather than seconds. The Document AI add-on, which adds schema-conformant JSON extraction via mistral-small-2603, costs $5 per 1,000 pages through Mistral Studio. For a team processing 100,000 pages per month, the standard API cost is $400 and the Batch API cost is $200. A team processing 1 million pages monthly on the Batch API spends $2,000, saving $2,000 compared to the standard tier. Self-hosted enterprise pricing requires a commercial agreement with Mistral sales and is not publicly listed. Google Document AI charges approximately $5 per 1,000 pages for similar multilingual extraction, making OCR 4 cheaper on the standard tier and matching Google's price on the Document AI tier.
What is Mistral OCR 4's context window and how does it handle large documents?
Mistral OCR 4 does not use a traditional token context window. Processing is page-based: each page is analyzed independently and returns a structured object with extracted text, bounding boxes, block types, and confidence scores. Multi-page documents are fully supported, and the pages parameter accepts both integer arrays and range strings (for example, '0,2-4' selects pages 0, 2, 3, and 4). There is no published upper limit on document length, but processing cost scales linearly with page count at $4 per 1,000 pages. For very large document sets, the Batch API handles asynchronous processing at $2 per 1,000 pages. The Document AI add-on applies a user-supplied JSON schema across the full document output after all pages are extracted, producing a single schema-conformant JSON object regardless of document length. Unlike LLMs, OCR 4 does not suffer from context degradation at long document lengths because each page is processed independently. This makes it well suited for 200-page contracts or 500-page financial filings without the recall degradation seen in long-context LLMs.
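The range-string form of the pages parameter can be expanded client-side, for example to count pages and estimate cost before submitting a job. The following Python sketch is a hypothetical helper, not part of any Mistral SDK. How the service itself treats whitespace, duplicates, or descending ranges is not documented here, so the choices below (ignore whitespace, drop duplicates, reject descending ranges) are assumptions.

```python
from typing import List


def expand_pages(spec: str) -> List[int]:
    """Expand a page spec such as "0,2-4" into a sorted list of page indices."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "0,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if end < start:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)
```

With the example from the answer above, `expand_pages("0,2-4")` returns `[0, 2, 3, 4]`, so a job submitted with that spec would be billed for four pages under linear per-page pricing.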
How does Mistral OCR 4 compare on benchmarks versus Google Document AI and AWS Textract?
On the OlmOCRBench public leaderboard, Mistral OCR 4 scores 85.20, the top reported score as of June 2026. Google Document AI and AWS Textract do not publish OlmOCRBench or OmniDocBench scores, making direct automated benchmark comparison difficult. In human preference evaluations, OCR 4 wins 72% of pairwise comparisons against competing OCR and document-AI systems across 600 real-world documents in 12 or more languages; Mistral does not name every system tested, but the description covers cloud OCR leaders. On pricing, OCR 4 at $4 per 1,000 pages is cheaper than Google Document AI at approximately $5 per 1,000 pages and comparable to AWS Textract's per-page rate for multi-language extraction. The key differentiator versus both Google and AWS is deployment: OCR 4 ships as a single on-premise container, while Google Document AI and AWS Textract are cloud-only services. For regulated industries where documents cannot leave internal infrastructure, OCR 4 is currently the only API-first OCR model with a self-hosted option at this benchmark level. Rogo reports 17 times lower latency and 8 times lower cost versus leading agentic document parsers on financial QA datasets using OCR 4.
Is Mistral OCR 4 open source or proprietary?
Mistral OCR 4 is proprietary with closed model weights. It is not downloadable from Hugging Face and cannot be run locally without a commercial agreement with Mistral. This contrasts with several of Mistral's language models (Mistral Small 4, Devstral, Mistral Large 3) that are released under Apache 2.0 as open-source weights. OCR 4 is accessible via three managed platforms: Mistral's la Plateforme API (with an API key), Amazon SageMaker (with AWS IAM), and Microsoft Foundry in Azure (with an Azure API key). For organizations needing on-premise deployment, Mistral offers a self-hosted enterprise option as a single Docker container, but this requires a commercial agreement with Mistral sales and is not publicly available from Docker Hub. The self-hosted option provides data sovereignty (no documents leave your infrastructure) but does not grant access to the model weights themselves. Pricing for the self-hosted enterprise tier is not publicly listed and requires a sales engagement.
What structured outputs does Mistral OCR 4 return?
Mistral OCR 4 returns several levels of structured output depending on which API parameters are set. By default, without include_blocks: true, the API returns extracted text and markdown-formatted content. With include_blocks: true, each page returns a blocks array with paragraph-level bounding boxes (pixel coordinates marking where each region sits on the page) and a structural label drawn from 14 block types, including text, title, list, table, image, equation, caption, code, references, aside_text, header, footer, and signature. Every block also carries an inline confidence score indicating extraction reliability. The Document AI add-on, activated at the same endpoint with a JSON schema parameter, routes OCR 4 output through mistral-small-2603 to produce schema-conformant JSON, allowing users to extract named fields (like invoice total, contract parties, or form answers) without writing custom parsing logic. This three-level output (extracted text, bounding boxes plus block types, schema-conformant JSON) covers most enterprise document extraction use cases from raw digitization to structured data pipelines.
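As a sketch of how a downstream pipeline might use the block-level output, the Python below splits blocks into an accepted set and a human-review set by confidence. The response layout is hypothetical: the `pages`, `blocks`, `type`, `text`, `bbox`, and `confidence` keys, the `(x0, y0, x1, y1)` box order, and the 0.9 threshold are all assumptions made for illustration, so map them to the real field names from the API reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    page: int                                 # zero-based position in the response
    type: str                                 # block type label, e.g. "table"
    text: str
    bbox: Tuple[float, float, float, float]   # assumed (x0, y0, x1, y1) in pixels
    confidence: float                         # assumed to lie in [0, 1]


def split_by_confidence(
    response: Dict[str, Any], threshold: float = 0.9
) -> Tuple[List[Block], List[Block]]:
    """Return (accepted, needs_review) blocks from a hypothetical OCR response."""
    accepted: List[Block] = []
    needs_review: List[Block] = []
    for page_number, page in enumerate(response.get("pages", [])):
        for raw in page.get("blocks", []):
            block = Block(
                page=page_number,
                type=raw["type"],
                text=raw["text"],
                bbox=tuple(raw["bbox"]),
                confidence=float(raw["confidence"]),
            )
            target = accepted if block.confidence >= threshold else needs_review
            target.append(block)
    return accepted, needs_review
```

Keeping the bounding box on each accepted block is what allows a RAG pipeline to highlight the cited region later, while the review list gives a natural queue for human checking before low-confidence text reaches an index or database.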
Does Mistral OCR 4 train on user data?
Mistral does not use OCR 4 API inputs for model training by default. Mistral's Data Processing Addendum, available at legal.mistral.ai/terms/data-processing-addendum, governs how input data is handled and covers compliance with GDPR (Regulation (EU) 2016/679). Mistral holds SOC 2 Type II and ISO 27001 certifications. GDPR compliance is supported via EU-based hosting in Paris, and Mistral offers data residency options in both the EU and the US. For organizations with stricter requirements, the self-hosted enterprise container provides full data sovereignty: no document is transmitted outside the organization's infrastructure, eliminating any vendor-side data handling question entirely. HIPAA eligibility is not confirmed for OCR 4 as of June 2026. For enterprise procurement, Mistral provides standard DPA documentation suitable for regulated industry IT security review. The EU AI Act's enforcement provisions take effect August 2, 2026, and Mistral has signed the GPAI Code of Practice as a proactive compliance measure.
Who is Mistral OCR 4 best for and who should avoid it?
Mistral OCR 4 is best suited to enterprise data engineering teams building document ingestion pipelines for RAG, search, or compliance workflows that need source-grounded citation (bounding boxes are required for this). It is the strongest choice for regulated industries (healthcare, finance, legal, government) that cannot route sensitive documents to cloud-only OCR APIs and need an on-premise container option. Multilingual organizations processing documents in low-resource or non-Latin scripts benefit from the 170-language coverage, which outpaces Google Document AI and AWS Textract on script breadth. Teams should avoid OCR 4 for real-time sub-second document processing: the Batch API is asynchronous and the standard API is not optimized for sub-100ms turnaround. AWS Textract or Google Document AI are better choices if latency is the primary constraint. OCR 4 is also the wrong tool for general-purpose vision tasks: it cannot describe photographs, caption images, or answer visual questions about charts without the Document AI layer; use Pixtral or Mistral Small 4 for those tasks. For low-volume or ad hoc document conversion, open-source alternatives like Tesseract or Surya are free and sufficient.