DeepSeek V4: 80.6% SWE-bench, MIT License, $0.43/M (2026)

DeepSeek V4 Pro, released April 2026 under MIT, scores 80.6% SWE-bench Verified and 90.1% GPQA Diamond. Open-source MoE, 1M context, $0.435/$0.87 per 1M tokens.

DeepSeek V4, released April 24, 2026 by DeepSeek under MIT license, is an open-source Mixture-of-Experts model in two sizes: V4-Pro (1.6T parameters, 49B active) scoring 80.6% SWE-bench Verified and 90.1% GPQA Diamond, and V4-Flash (284B parameters). API pricing for V4-Pro is $0.435 input / $0.87 output per 1M tokens; both models support 1M-token context and 384K max output, making V4-Pro the first open-weight model to rival closed frontier models on agentic coding.

DeepSeek V4 is a family of open-source MoE models released by DeepSeek on April 24, 2026 under the MIT license. V4-Pro (1.6T parameters, 49B active) scores 80.6% on SWE-bench Verified and 90.1% on GPQA Diamond. API pricing is $0.435 input / $0.87 output per 1M tokens. V4-Flash (284B parameters) offers a lower-cost alternative. Both support 1M-token context with 384K max output.

Provider: DeepSeek · Family: DeepSeek V4

Context window: 1,000,000 tokens · Max output: 384,000

Input modalities: text, image, audio, video, tool-calls · Output: text, tool-calls

About DeepSeek V4

DeepSeek V4, released April 24, 2026 by DeepSeek (a Chinese AI research lab), is a family of open-source Mixture-of-Experts models available in two sizes: V4-Pro (1.6 trillion total parameters, 49B active per token) and V4-Flash (284B total parameters, 13B active). Both ship under the MIT license with weights publicly available on HuggingFace, making them the largest open-weight models to reach production-quality benchmark scores as of June 2026. DeepSeek explicitly labels the April 24 launch as a preview; the models are production-stable but capabilities may expand. V4 is the first DeepSeek release with native multimodal architecture, processing text, images, audio, and video in a unified pipeline. DeepSeek V4-Pro scores 80.6% on SWE-bench Verified (agentic coding), 90.1% on GPQA Diamond (graduate-level reasoning), 90.1% on MMLU (multitask language understanding), 73.5% on MMLU-Pro, approximately 96.4% on HumanEval, 79.5% on ARC-AGI-2, and 85.0% on AIME 2025. Some of these figures carry the disclaimer 'internal claim only' as of June 2026; third-party verification is still ongoing. At 80.6% SWE-bench Verified, V4-Pro is within 0.2 points of Claude Opus 4.6 and rivals GPT-5.5 on agentic coding at roughly one-thirtieth of GPT-5.5's per-token cost. One source also cites 91.2% SWE-bench on a harder evaluation variant; treat both with caution until full independent verification is published. Both V4-Pro and V4-Flash support a 1M-token (1,000,000) context window with up to 384,000 tokens of output per request. The V4 architecture introduces a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), requiring only 27% of single-token inference FLOPs and 10% of the KV cache compared to DeepSeek-V3.2 at the 1M-token context setting. This efficiency gain makes very long-context inference feasible even on constrained hardware. DeepSeek V4 natively accepts text, images, audio, and video. Image understanding covers OCR and technical schematic interpretation, with performance comparable to GPT-5 on standard vision benchmarks. Video processing handles short clips under 10 minutes for temporal question answering. Audio supports transcription plus speaker identification. V4 is not a bolt-on vision module: multimodality is part of the base architecture. Function calling, structured output, and tool use are confirmed available. Audio and video output are not supported; the model produces text and tool-calls only. V4-Pro API pricing from the DeepSeek platform (post-May 31, 2026 permanent discount) is $0.435 per 1M input tokens (cache miss) and $0.87 per 1M output tokens. Cached input carries a deep additional discount. Third-party providers vary: Fireworks AI prices V4-Pro at comparable rates with 120.2 t/s output speed; Together AI offers SLA-backed reliability with a Startup Accelerator program offering up to $50K in credits. V4-Flash is significantly cheaper at $0.14 per 1M cache-miss input and $0.87 per 1M output. For self-hosted deployments under the MIT license, there are no per-token API costs. A daily coding agent running 1M input and 200K output at $0.435/$0.87 costs about $0.61, compared to $11 for GPT-5.5 at the same scale. V4-Pro is available via the DeepSeek API, Together AI, Fireworks AI (167.1 t/s peak, fastest provider on Artificial Analysis), Azure, and Vercel AI Gateway. Self-hosting is possible under the MIT license. V4-Pro requires approximately 864.7GB of storage for full FP16 weights, necessitating multi-node GPU infrastructure (multiple H100s or H200s). V4-Flash is more accessible: approximately 33GB VRAM heavily quantized (1x RTX 6000 Ada or 2x RTX 4090), 80GB FP8 on a single H100, or 170GB for full weights plus KV cache. Community GGUF quantizations (antirez/deepseek-v4-gguf on HuggingFace) are available for V4-Flash. Safety considerations for DeepSeek V4 are material for enterprise users. DeepSeek's own API Terms of Service allow the company to train on user inputs, which is a significant data privacy risk for proprietary data. US-based inference providers (Together AI, Fireworks AI) avoid this by running DeepSeek model weights on US servers under their own data terms. As an open-source Chinese model, V4 carries known sensitivity around Chinese political topics while being relatively permissive on other content. For air-gapped or fully controlled deployments, self-hosting under the MIT license is the recommended path. The model does not have a published system card or formal red-teaming disclosure as of June 2026. DeepSeek V4-Pro is best suited for teams doing agentic coding, research, and long-context analysis who need frontier-quality results at a fraction of closed-model pricing. It is the first open-weight model to sit within 0.2 points of Claude Opus 4.6 on SWE-bench Verified. For teams with data privacy requirements, routing through Together AI or Fireworks AI rather than DeepSeek's own API resolves the ToS training-data concern. Organizations with compliance needs (HIPAA, enterprise data governance) should self-host under the MIT license. Teams needing audio or video output, or requiring sub-second TTFT for real-time voice applications, should look elsewhere as V4 produces text output only. DeepSeek V4 was pre-trained on more than 32 trillion diverse and high-quality tokens. The training pipeline uses the Muon optimizer for faster convergence and Manifold-Constrained Hyper-Connections (mHC) to strengthen residual connections. Training data cutoff has not been explicitly published; the April 2026 release suggests data through at least late 2025. As an MIT-licensed open-source model, V4 weights can be fine-tuned, distilled, or modified without restriction, making it a base for derivative models. DeepSeek V4 was previewed to select users before April 24, 2026, with the public preview shipping that day. V4-Pro-Max is a maximum reasoning effort mode with higher latency but stronger benchmark performance. The predecessor DeepSeek V3.2 introduced a hybrid MoE architecture in August 2025. DeepSeek V3-0324 (March 2025) improved AIME scores by 19.8 points over V3. Both V4-Flash and V4-Pro have been available on Fireworks AI since shortly after launch. DeepSeek confirmed that deepseek-chat and deepseek-reasoner (prior API endpoints) will be fully retired after July 24, 2026.

Pricing

V4-Pro (DeepSeek API, post-May 2026 permanent pricing): $0.435/M cache-miss input, $0.87/M output. Cached input carries a deep additional discount. V4-Flash: $0.14/M cache-miss input, $0.0028/M cached input, $0.87/M output. Third-party providers (Fireworks, Together) may vary. Self-hosting is free under the MIT license; VRAM costs apply.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is DeepSeek V4 and who built it?

DeepSeek V4 is a family of open-source Mixture-of-Experts (MoE) language models developed by DeepSeek, a Chinese AI research lab based in Hangzhou. It was released as a public preview on April 24, 2026, under the MIT license. The family includes two models: V4-Pro (1.6 trillion total parameters, 49 billion active per token) and V4-Flash (284 billion total, 13 billion active). Both were pre-trained on more than 32 trillion tokens and support a 1M-token context window with 384K max output. V4 is the first DeepSeek model with native multimodal architecture, accepting text, image, audio, and video in a single API call. V4-Pro scores 80.6% on SWE-bench Verified and 90.1% on GPQA Diamond, placing it within 0.2 points of Claude Opus 4.6 on agentic coding at roughly one-thirtieth the API cost of GPT-5.5.

How much does DeepSeek V4 cost per 1M tokens?

DeepSeek V4-Pro on the DeepSeek API costs $0.435 per 1M cache-miss input tokens and $0.87 per 1M output tokens, as the permanent pricing after a 75% discount applied post-May 31, 2026 (the original launch price was $1.74/$3.48 per 1M). Cached input tokens carry a deep additional discount. DeepSeek V4-Flash is significantly cheaper: $0.14 per 1M cache-miss input with $0.0028 per 1M cached (a 98% discount off cache-miss), and $0.87 per 1M output. Third-party providers (Fireworks AI, Together AI) offer comparable or slightly different rates depending on their infrastructure tier. Self-hosting under the MIT license eliminates per-token costs entirely; infrastructure costs depend on VRAM (a single H100 80GB runs V4-Flash FP8 for approximately $20K in hardware). A daily coding agent at V4-Pro rates for 1M input / 200K output costs about $0.61, versus $11.00 at GPT-5.5's $5/$30 pricing.

What is DeepSeek V4's context window and max output?

DeepSeek V4-Pro and V4-Flash both support a 1,000,000-token (1M) context window. Maximum output is 384,000 tokens per completion, which is the highest max output of any frontier model as of June 2026. The CSA/HCA hybrid attention mechanism reduces KV cache to 10% of DeepSeek-V3.2's requirement at the 1M-context setting, enabling long-context inference at dramatically lower memory cost than prior-generation architectures. The 384K output limit means entire codebases, book-length documents, or comprehensive dataset transformations can be generated in a single API call without multi-turn stitching. No public needle-in-haystack independent evaluation at 1M tokens has been published for V4 as of June 2026. The 27% reduction in single-token inference FLOPs vs V3.2 at 1M context translates to lower serving cost on shared inference providers like Fireworks AI.

How does DeepSeek V4 compare on benchmarks vs GPT-5.5?

DeepSeek V4-Pro scores 80.6% on SWE-bench Verified; GPT-5.5 scores 88.7%, an 8.1-point gap in favor of GPT-5.5 on agentic coding. On GPQA Diamond, V4-Pro posts 90.1% versus GPT-5.5's unpublished score (GPT-5, its predecessor, was 88.4%). V4-Pro scores 90.1% MMLU vs GPT-5.5's 92.4%, a 2.3-point gap. On ARC-AGI-2, V4-Pro posts 79.5%; GPT-5.5's score has not been published. V4-Pro's HumanEval score is approximately 96.4%. The cost difference is the decisive factor for many teams: V4-Pro costs $0.435/$0.87 per 1M vs GPT-5.5's $5.00/$30.00 per 1M, making V4-Pro approximately 11-34x cheaper depending on the input/output mix. For most coding and research tasks, V4-Pro delivers comparable output quality at a fraction of the cost; GPT-5.5 leads on multimodal tasks and on benchmarks requiring more than 80K AIME-level math reasoning. Note that some V4 benchmark figures are still labeled internal claims pending independent verification.

Is DeepSeek V4 open source or proprietary?

DeepSeek V4 is fully open-source under the MIT license, making it one of the most permissively licensed frontier models available as of June 2026. Weights for both V4-Pro and V4-Flash are freely downloadable from HuggingFace at huggingface.co/deepseek-ai/DeepSeek-V4-Pro and huggingface.co/deepseek-ai/DeepSeek-V4-Flash. The MIT license permits commercial use, fine-tuning, distillation, modification, and redistribution without restriction. Community GGUF quantizations are available (antirez/deepseek-v4-gguf on HuggingFace). VRAM requirements: V4-Flash runs at approximately 33GB (heavily quantized) to 170GB (full weights); V4-Pro requires 865GB+ for full FP16 weights across multiple H100s or H200s. The open weights allow air-gapped self-hosted deployment for organizations with strict data governance requirements.

What modalities does DeepSeek V4 support?

DeepSeek V4 accepts text, images, audio, and video as inputs in a unified architecture; this is the first DeepSeek model with native multimodal support. Multimodality is not a bolt-on module but part of the base model architecture. Image understanding covers OCR, technical schematic interpretation, and diagram analysis at GPT-5-comparable quality on standard vision benchmarks. Video processing supports short clips under 10 minutes for temporal question answering. Audio supports transcription plus speaker identification. Output modalities are text and tool-calls only; there is no audio or video output. Function calling and structured output are confirmed in the API. Code execution and web browsing are not natively supported; external tools or agent frameworks are required for those capabilities.

Does DeepSeek V4 train on user data?

DeepSeek's own API Terms of Service explicitly allow the company to train on user-submitted inputs by default, with no published opt-out mechanism. This is a significant distinction from OpenAI and Anthropic, which do not train on API data. Any proprietary code, customer information, or business-sensitive data sent to api.deepseek.com is at risk of inclusion in future model training. US-based inference providers (Together AI, Fireworks AI) run DeepSeek V4 weights on US servers under their own data terms, which prohibit training on user data, and are recommended for enterprise workloads. For maximum control, self-hosting under the MIT license eliminates any data sharing with DeepSeek entirely. DeepSeek is a Chinese company; data governance considerations under Chinese law differ from GDPR and US data protection frameworks. HIPAA-eligible and SOC 2 configurations are not available through DeepSeek's own API; enterprise compliance requires routing through US providers or self-hosting.

Who is DeepSeek V4 best for and who should avoid it?

DeepSeek V4-Pro is best for engineering teams doing agentic coding who need frontier-quality results at a fraction of closed-model pricing: 80.6% SWE-bench Verified at $0.435/M makes it the most cost-efficient path to top-10 coding performance. Research teams who need open weights for fine-tuning, distillation, or self-hosted deployment on specific datasets benefit from the MIT license. Startups and cost-conscious teams where GPT-5.5's $5/$30 per 1M pricing is prohibitive should evaluate V4 first. Organizations that must avoid third-party API data exposure can self-host V4-Flash on a single H100 at no per-token cost. Teams who should avoid V4 via api.deepseek.com: any enterprise with data governance, HIPAA, SOC 2, or GDPR requirements, since DeepSeek's ToS allows training on inputs. Route through Together AI or Fireworks AI instead. Teams who need audio output, real-time voice, or native web browsing in the model loop should use GPT-5.5 or Claude Opus 4.8.

Visit DeepSeek V4 Official Page