DeepSeek V4 Pro: 80.6% SWE-bench, 1M Context, MIT 2026

DeepSeek V4 Pro: 1.6T-param open-source MoE (April 2026), 80.6% SWE-bench Verified with 1M token context under MIT license. $1.74/$3.48 per 1M tokens.

DeepSeek-V4-Pro, released April 24, 2026 by DeepSeek, is a 1.6-trillion-parameter open-source MoE under MIT license scoring 80.6% on SWE-bench Verified (Think Max mode) with a 1M-token context window. Standard pricing is $1.74/$3.48 per 1M input/output tokens, 4-7x cheaper than closed rivals. The leading open-weight model for agentic coding and long-document reasoning as of May 2026.

DeepSeek-V4-Pro is an open-source 1.6-trillion-parameter MoE model released April 24, 2026 by DeepSeek under MIT license. It scores 80.6% on SWE-bench Verified (Think Max mode) and 90.1% on GPQA Diamond. Standard API pricing is $1.74 per 1M input tokens and $3.48 per 1M output tokens. The model supports a 1M-token context window and leads all published open-weight models for agentic coding as of May 2026.

Provider: DeepSeek · Family: DeepSeek V4

Context window: 1,048,576 tokens · Max output: 384,000

Input modalities: text, tool-calls · Output: text, tool-calls

About DeepSeek-V4-Pro

DeepSeek-V4-Pro is the flagship model from DeepSeek, a Chinese AI research company founded in 2023 by High-Flyer hedge fund CEO Liang Wenfeng. Released April 24, 2026 as a preview under the MIT license, V4-Pro is a Mixture-of-Experts transformer with 1.6 trillion total parameters and 49 billion activated per token. It was pre-trained on 33 trillion tokens using a multilingual corpus with emphasis on code, math, scientific text, and agentic execution traces. The model was trained on Huawei Ascend 950PR chips rather than Nvidia hardware, making it the first frontier-scale model to complete training on Huawei silicon. V4-Pro is the larger of two V4-family releases: the companion DeepSeek-V4-Flash uses 284 billion total parameters with 13 billion active, optimized for faster throughput at lower cost. In Think Max mode (maximum reasoning effort), DeepSeek-V4-Pro scores 80.6% on SWE-bench Verified, placing it within 0.2 percentage points of Claude Opus 4.6 and ahead of Gemini 3.1 Pro at 75%. On GPQA Diamond (graduate-level scientific reasoning) it scores 90.1%, behind Gemini 3.1 Pro at 94.3% and trailing Claude on Humanity's Last Exam (37.7% vs Claude's 40.0%). For mathematics, it achieves 92% on MATH-500, 95.2% on HMMT 2026 February, and a perfect 120/120 on Putnam-2025. On LiveCodeBench it scores 93.5, and holds a Codeforces rating of 3,206, ahead of GPT-5.4 at 3,168 and Gemini 3.1 Pro at 3,052. The pattern: V4-Pro leads on competitive coding and math olympiad evaluations; closed models from Google and Anthropic maintain edges on the hardest scientific reasoning tasks. DeepSeek-V4-Pro provides a 1,048,576-token (1 million token) context window with a maximum output of 384,000 tokens per request. The Think Max reasoning mode requires at least 384K tokens of context budget to operate at full capacity. The hybrid attention architecture (Compressed Sparse Attention plus Heavily Compressed Attention) achieves 1M-token processing at just 27% of the single-token inference FLOPs required by DeepSeek-V3.2, and uses only 10% of V3.2's KV cache at that length. HCA applies 128x compression to the sequence before dense attention, eliminating the sparse selection step that becomes expensive at scale. No independent third-party needle-in-haystack evaluation at 1M token depth has been published as of May 2026; efficiency figures are from DeepSeek's own benchmarks in the April 27, 2026 model card. Among current frontier models, Gemini 3.1 Pro also supports 1M tokens; GPT-5.4 defaults to 128K with a 1M preview tier; Claude Opus 4.6 caps at 200K with 64K max output. At launch in April 2026, DeepSeek-V4-Pro is text-only: there is no native image, audio, or video input in the preview release. DeepSeek has signaled that multimodal support is in development, potentially as a V4 Vision extension or a DeepSeek OCRv3 integration. Text tool use and function calling are available through both the OpenAI ChatCompletions interface and an Anthropic-compatible API, with JSON mode confirmed across all eight API hosting providers. Parallel tool calls are supported, enabling multi-step agentic loops where the model issues multiple function calls in a single response. The three reasoning modes (Non-think for speed, Think High for logical analysis, Think Max for maximum depth) are toggled per request via API parameters. Computer use and web browsing are not available in the April 2026 preview. Standard API pricing is $1.74 per 1M cache-miss input tokens and $3.48 per 1M output tokens. A 75% promotional discount runs until May 31, 2026, reducing rates to $0.435/M input and $0.87/M output. Cache-hit input is $0.145/M at standard rates ($0.003625/M on promo), after a 1/10 price reduction applied April 26, 2026. At standard rates, processing a 100K-token research document costs roughly $0.17 input plus $0.35 for a 100K summary output, totaling $0.52. A daily coding agent loop handling 1M input and 200K output runs $1.74 plus $0.70, totaling $2.44/day. A customer-support deployment at 1,000 turns of 2K input and 500 output per turn costs $3.48 plus $1.74 = $5.22/day. At these rates, V4-Pro output costs 4x less than Claude Opus 4.7 ($15/M output) and 7x less than GPT-5.5 ($25/M output), making it the lowest-cost frontier reasoning model per token among public APIs. DeepSeek grants 5 million free tokens to every new API account. DeepSeek-V4-Pro is available through multiple API providers: DeepSeek's direct API, AWS Bedrock, Google Vertex AI, Azure AI Foundry, Fireworks, Together.ai, DeepInfra, Lightning AI, Nebius, SiliconFlow, and Novita. On DeepSeek's own API, generation speed is approximately 29.8 tokens per second; Fireworks leads at 169.9 t/s and Lightning AI at 162.4 t/s, a 5x speed differential across providers. Time to first token on DeepSeek's API averages 2.1 seconds; for Think Max mode the time to first answer token can exceed 2 minutes on complex tasks due to extended chain-of-thought reasoning. Open weights are on Hugging Face at deepseek-ai/DeepSeek-V4-Pro under MIT license, with both base and instruct variants available. Self-hosting requires approximately 865GB at the official FP4+FP8 mixed precision format, making a cluster of 4x H200 141GB GPUs the minimum practical configuration. Community GGUF quantizations (Q4_K_M, Q2_K) are available via Unsloth, but V4-Pro at Q2_K still exceeds 400GB, making self-hosted V4-Pro impractical below a serious GPU cluster. DeepSeek's published safety documentation for V4 is limited compared to Western frontier labs. After a two-stage post-training pipeline (SFT plus GRPO domain cultivation, then unified consolidation via on-policy distillation), V4 undergoes safety-focused alignment tuning including constitutional-style guidelines and multi-language safety alignment. The April 27, 2026 model card states that sensitive personal data, credit card numbers, and identification information are excluded from training data. No external red-team partners are disclosed. The MIT license enables any party to download and modify the weights, including removing safety training, a categorically different risk profile from API-only closed models. The direct DeepSeek API does not hold SOC 2 Type II, ISO 27001, or HIPAA certification; teams with data compliance requirements should deploy through AWS Bedrock or Azure AI Foundry. Teams running agentic coding workflows (V4-Pro leads SWE-bench at 80.6% Think Max and Codeforces at 3,206), long-document analysis needing 1M token context, and cost-sensitive production workloads at $1.74/$3.48 per 1M tokens have the clearest use case for V4-Pro. Open-source teams requiring fine-tuning, private inference, or air-gapped deployment benefit from the MIT license and downloadable weights. Teams building vision or multimodal applications should not use V4-Pro at launch (text-only) and should use GPT-5.4, Gemini 3.1 Pro, or Claude Opus 4.6 instead. Voice-first applications are ruled out by the absence of audio I/O. For the hardest scientific reasoning tasks, Gemini 3.1 Pro (GPQA Diamond 94.3%) outperforms V4-Pro (90.1%) by 4.2 points. DeepSeek-V4-Pro was trained on 33 trillion tokens, more than double V3's 14.8 trillion, with emphasis on long documents and agentic execution traces. Post-training uses two stages: independent domain-expert cultivation through SFT and GRPO, followed by unified model consolidation via on-policy distillation. The Muon optimizer (Momentum plus Orthogonalization) replaces standard AdamW; MoE expert weights use FP4 precision and most other parameters use FP8, reducing memory footprint. Training cutoff date has not been publicly disclosed. All training was executed on Huawei Ascend 950PR chips with no Nvidia hardware in the compute stack. DeepSeek-V4-Pro launched April 24, 2026 after three delays spanning four months since the December 2025 preview window target. Compared to V3.2, V4 raises SWE-bench Verified from 67.8% to 80.6% (Think Max, +12.8 points), LiveCodeBench from 74.1 to 93.5 (+19.4 points), and extends the context window from 128K to 1M tokens. The legacy API aliases deepseek-chat (routing to V4-Flash non-thinking) and deepseek-reasoner (routing to V4-Flash thinking) are deprecated with full access removal on July 24, 2026 at 15:59 UTC. The V4 series marks DeepSeek's first model line trained entirely on Huawei Ascend chips. No stable (non-preview) release date has been announced as of May 2026.

Pricing

Standard: $1.74/$3.48 per 1M cache-miss input/output. Promotional rate (75% off, until 2026-05-31): $0.435/$0.87 per 1M. Cache-hit input $0.145/M standard ($0.003625/M promo). New accounts receive 5M free tokens. No batch API; prompt caching is the primary cost optimization lever.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is DeepSeek-V4-Pro and who built it?

DeepSeek-V4-Pro is the flagship model from DeepSeek, a Chinese AI research company founded in 2023 by High-Flyer hedge fund CEO Liang Wenfeng. Released April 24, 2026 as a preview under the MIT license, it is a Mixture-of-Experts transformer with 1.6 trillion total parameters and 49 billion activated per token, pre-trained on 33 trillion tokens. The hybrid attention architecture combines Compressed Sparse Attention and Heavily Compressed Attention, requiring only 27% of the inference FLOPs of its predecessor DeepSeek-V3.2 at 1M-token context. In Think Max mode it scores 80.6% on SWE-bench Verified (within 0.2 points of Claude Opus 4.6), 90.1% on GPQA Diamond, and holds a Codeforces rating of 3,206, the highest of any publicly ranked model as of May 2026. The model was trained entirely on Huawei Ascend 950PR chips, the first frontier-scale model to complete training on Huawei silicon. V4-Pro sits at the top of DeepSeek's lineup above V4-Flash (284B total, 13B active), the faster companion for throughput-sensitive workloads. Standard API pricing is $1.74 per 1M input tokens and $3.48 per 1M output tokens with a 1M-token context window and 384K max output.

How much does DeepSeek-V4-Pro cost per 1M tokens?

Standard list pricing on the DeepSeek API is $1.74 per 1M cache-miss input tokens and $3.48 per 1M output tokens. A 75% promotional discount is active until May 31, 2026, reducing rates to $0.435 per 1M input and $0.87 per 1M output. Cache-hit input pricing is $0.145 per 1M at standard rates ($0.003625/M on promo), after a 1/10 reduction applied April 26, 2026. A daily agentic coding loop processing 1M input and 200K output costs $1.74 plus $0.70 = $2.44/day at standard rates. A customer-support deployment at 1,000 turns of 2K input and 500 output per turn costs $3.48 plus $1.74 = $5.22/day. Compared to Claude Opus 4.7 ($15/M output) and GPT-5.5 ($25/M output), V4-Pro output tokens are 4-7x cheaper at list price on comparable coding benchmarks. Self-hosted deployments on open weights are free beyond infrastructure; a 4-node H200 141GB GPU cluster is the minimum practical hardware. The DeepSeek API grants 5 million free tokens to every new account.

What is DeepSeek-V4-Pro's context window and max output?

DeepSeek-V4-Pro supports a context window of 1,048,576 tokens (one million tokens) with a maximum output of 384,000 tokens per request. The Think Max reasoning mode requires at least 384K tokens of context budget to operate at maximum reasoning depth. At 1M-token context, the hybrid CSA+HCA attention uses only 27% of single-token inference FLOPs versus DeepSeek-V3.2 and 10% of V3.2's KV cache, via HCA's 128x sequence compression before dense attention. No independent third-party needle-in-haystack evaluation at 1M depth has been published as of May 2026; efficiency claims are from DeepSeek's April 27, 2026 model card. Among current frontier models, Gemini 3.1 Pro also provides 1M-token context; GPT-5.4 supports 128K by default with a 1M-token preview tier; Claude Opus 4.6 maxes out at 200K with 64K max output. The 384K max output per request exceeds Claude's 64K limit, making V4-Pro well suited for generating large code files or full document drafts in a single API call. PDF and multi-file inputs are handled via standard text tokenization; there is no native document parser distinct from the text context window.

How does DeepSeek-V4-Pro compare on benchmarks vs Claude Opus 4.6?

On SWE-bench Verified (real-world GitHub issue resolution), DeepSeek-V4-Pro in Think Max mode scores 80.6% versus Claude Opus 4.6 at approximately 80.8%, a gap under 0.3 percentage points. On GPQA Diamond (graduate-level scientific reasoning), DeepSeek-V4-Pro scores 90.1%, while Gemini 3.1 Pro leads at 94.3%; Claude Opus 4.6 falls between these two. On Humanity's Last Exam, DeepSeek-V4-Pro reaches 37.7% versus Claude Opus 4.6's 40.0%, a 2.3-point gap on the hardest evaluation published as of May 2026. On MMLU-Pro, DeepSeek-V4-Pro scores 87.5% against Gemini 3.1 Pro's 91.0%, with Claude in the same range. For competitive coding, V4-Pro leads with a Codeforces rating of 3,206 versus GPT-5.4 at 3,168 and Gemini 3.1 Pro at 3,052, and achieves 93.5 on LiveCodeBench. V4-Pro leads on competitive coding and math olympiad benchmarks; Claude and Gemini lead on the hardest scientific reasoning tasks (GPQA Diamond, HLE); the gap on most benchmarks stays within 5 percentage points, with output cost being the clearest differentiator at $3.48/M versus $15/M. All V4 benchmark scores are vendor-reported at this stage; independent third-party reproductions are still being published.

Is DeepSeek-V4-Pro open source or proprietary?

DeepSeek-V4-Pro is fully open-source under the MIT license, with weights available at huggingface.co/deepseek-ai/DeepSeek-V4-Pro since April 24, 2026. The MIT license imposes zero restrictions on commercial use, fine-tuning, redistribution, or modification of the weights. Four model variants are available: DeepSeek-V4-Pro-Base (raw pretrained), DeepSeek-V4-Pro-Instruct (post-trained with SFT and GRPO), and equivalent Flash variants. Official weights use FP4 plus FP8 mixed precision with the full V4-Pro model occupying approximately 865GB. Community GGUF quantizations (Q4_K_M, Q2_K) are available via Unsloth; Q2_K is approximately 400GB in compressed format, so self-hosting V4-Pro remains a multi-GPU proposition. Self-hosting requires a multi-GPU cluster of 4x H200 141GB at minimum; for V4-Flash, a single H200 141GB is feasible. The open weights mean any party can remove or alter DeepSeek's safety training without restriction, a categorically different risk profile from API-only closed models. V4-Flash is the practical self-hosted alternative for teams without full cluster access.

What modalities does DeepSeek-V4-Pro support?

At launch in April 2026, DeepSeek-V4-Pro supports text input and text output only; there is no native image, audio, or video processing in the preview release. Confirmed output modalities are text and tool-calls via function calling. The model supports tool use and function calling through both an OpenAI ChatCompletions-compatible interface and an Anthropic-compatible API, with JSON mode confirmed across all eight current hosting providers. Parallel tool calls are supported, enabling multi-step agentic loops where the model issues multiple function calls in a single response. The three reasoning modes (Non-think, Think High, Think Max) are toggled per request through API parameters, with Think Max requiring a 384K-token context budget. DeepSeek has indicated that multimodal support (image input) is in development, potentially as a V4 Vision extension; no release date has been announced as of May 2026. Computer use and web browsing are not available in the April 2026 preview; agentic computer interaction requires pairing V4-Pro with a separate browser or shell execution layer.

Does DeepSeek-V4-Pro train on user data?

The direct DeepSeek API does not publish a data retention or training-on-inputs policy equivalent to what Anthropic or OpenAI disclose for their enterprise tiers. DeepSeek's April 27, 2026 model card states that sensitive personal information, credit card numbers, and identification data are excluded from training data sources, but does not specify API input retention periods or opt-out mechanisms. The direct DeepSeek API is not certified under SOC 2 Type II, ISO 27001, or HIPAA; organizations with compliance requirements should access V4-Pro through AWS Bedrock, Google Vertex AI, or Azure AI Foundry where platform-level certifications apply. On Bedrock and Vertex, API inputs are governed by AWS and Google's DPAs, not DeepSeek's, providing EU data residency and contractual data handling guarantees. Because V4-Pro weights are MIT-licensed, self-hosting provides maximum data privacy: inference stays entirely on the team's own infrastructure with zero external data transmission. The EU AI Act classification for V4-Pro has not been formally assessed; at 1.6T parameters it likely meets the GPAI systemic risk threshold. Teams handling HIPAA-regulated data must confirm their chosen cloud provider has a BAA covering DeepSeek V4 Pro before deploying.

Who is DeepSeek-V4-Pro best for and who should avoid it?

DeepSeek-V4-Pro is the strongest open-weight choice for agentic coding teams, with 80.6% SWE-bench Verified (Think Max) and Codeforces rating 3,206, the highest published for any model as of May 2026. Cost-sensitive teams needing frontier-grade reasoning benefit most from $1.74/$3.48 per 1M token pricing, roughly 4-7x cheaper on output than Claude Opus 4.7 or GPT-5.5. Open-source teams requiring fine-tuning, private inference, or air-gapped deployment should prioritize V4-Pro over any closed model given the MIT license and downloadable weights. Teams building vision-first or multimodal applications should not use V4-Pro at launch (text-only in the April 2026 preview); use GPT-5.4, Gemini 3.1 Pro, or Claude Opus 4.6 for image understanding. Voice-first products and real-time speech applications are ruled out by the absence of native audio I/O. Teams requiring HIPAA compliance or SOC 2 Type II on the API endpoint should use AWS Bedrock or Azure AI Foundry rather than the direct DeepSeek API. For the hardest scientific reasoning tasks (GPQA Diamond 94.3%), Gemini 3.1 Pro currently leads V4-Pro by 4.2 points; teams where that gap matters should prefer Gemini or Claude. Organizations with EU data sovereignty requirements should verify their chosen provider's data residency options before deploying.

Visit DeepSeek-V4-Pro Official Page