Name: DeepSeek V4: 80.6% SWE-bench, MIT License, $0.43/M (2026)
Brand: DeepSeek
Price: 0.43 USD
Availability: InStock

Question 1

What is DeepSeek V4 and who built it?

Accepted Answer

DeepSeek V4 is a family of open-source Mixture-of-Experts (MoE) language models developed by DeepSeek, a Chinese AI research lab based in Hangzhou. It was released as a public preview on April 24, 2026, under the MIT license. The family includes two models: V4-Pro (1.6 trillion total parameters, 49 billion active per token) and V4-Flash (284 billion total, 13 billion active). Both were pre-trained on more than 32 trillion tokens and support a 1M-token context window with a 384K-token maximum output. V4 is the first DeepSeek model with a native multimodal architecture, accepting text, image, audio, and video in a single API call. V4-Pro scores 80.6% on SWE-bench Verified and 90.1% on GPQA Diamond, which places it within 0.2 points of Claude Opus 4.6 on agentic coding, while its output tokens cost roughly one-thirtieth of GPT-5.5's ($0.87 vs $30 per 1M).
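To make the "total vs active" distinction concrete, the short sketch below computes what share of each model's parameters is used per token, using only the parameter counts quoted above. The dictionary layout and function name are illustrative choices, not anything published by DeepSeek.

```python
# Parameter counts quoted in the answer above, in billions.
MODELS = {
    "V4-Pro": {"total_b": 1600, "active_b": 49},
    "V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of an MoE model's parameters that is used for each token."""
    return active_b / total_b


for name, p in MODELS.items():
    share = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

Per-token compute scales with the active parameters, while memory for the weights scales with the total, which is why an MoE model can be cheap to run per token yet still demanding to host.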

Question 2

How much does DeepSeek V4 cost per 1M tokens?

Accepted Answer

DeepSeek V4-Pro on the DeepSeek API costs $0.435 per 1M cache-miss input tokens and $0.87 per 1M output tokens. This is the permanent pricing in effect after May 31, 2026, a 75% cut from the launch price of $1.74/$3.48 per 1M. Cached input tokens carry a deep additional discount. DeepSeek V4-Flash is significantly cheaper: $0.14 per 1M cache-miss input tokens, $0.0028 per 1M cached input tokens (a 98% discount off cache-miss), and $0.87 per 1M output tokens. Third-party providers (Fireworks AI, Together AI) set their own rates, which vary with their infrastructure tier. Self-hosting under the MIT license eliminates per-token fees; the cost shifts to hardware and depends mainly on VRAM. A single H100 80GB (approximately $20K in hardware) can run V4-Flash only in heavily quantized form: at 8 bits per parameter, 284 billion parameters alone would occupy about 284GB. A coding agent that consumes 1M input and 200K output tokens per day costs about $0.61 per day at V4-Pro rates, versus $11.00 at GPT-5.5's $5/$30 pricing.
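The daily-agent comparison can be reproduced with a few lines of arithmetic. The sketch below uses only the cache-miss input and output prices quoted above and ignores cached-input discounts; the `Pricing` class and `daily_cost` function are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M cache-miss input tokens
    output_per_m: float  # USD per 1M output tokens


V4_PRO = Pricing(input_per_m=0.435, output_per_m=0.87)
GPT_5_5 = Pricing(input_per_m=5.00, output_per_m=30.00)


def daily_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one day's token usage, with no cache hits assumed."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


# The workload from the text: 1M input tokens and 200K output tokens per day.
print(f"V4-Pro:  ${daily_cost(V4_PRO, 1_000_000, 200_000):.2f}")
print(f"GPT-5.5: ${daily_cost(GPT_5_5, 1_000_000, 200_000):.2f}")
```

Because cache hits are ignored, this is an upper bound for workloads that resend the same long prompt prefix.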

Question 3

What is DeepSeek V4's context window and max output?

Accepted Answer

DeepSeek V4-Pro and V4-Flash both support a 1,000,000-token (1M) context window. Maximum output is 384,000 tokens per completion, the highest of any frontier model as of June 2026, so entire codebases, book-length documents, or comprehensive dataset transformations can be generated in a single API call without multi-turn stitching. At the 1M-context setting, the CSA/HCA hybrid attention mechanism reduces the KV cache to 10% of DeepSeek-V3.2's requirement, which makes long-context inference far less memory-hungry than on prior-generation architectures. The 27% reduction in single-token inference FLOPs vs V3.2 at 1M context translates to lower serving cost on shared inference providers like Fireworks AI. No independent needle-in-a-haystack evaluation at 1M tokens has been published for V4 as of June 2026.
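As a rough planning aid, the sketch below checks whether a job fits in one request and how many completions a given amount of output would need. It assumes the context window covers prompt and output combined, which is a common convention but is not stated above; the function names and the 64K comparison limit are hypothetical.

```python
import math

CONTEXT_WINDOW = 1_000_000  # tokens, as quoted above
MAX_OUTPUT = 384_000        # tokens per completion, as quoted above


def fits_in_one_call(prompt_tokens: int, output_tokens: int) -> bool:
    """True if the request respects both limits.

    Assumption: the context window is shared by prompt and output.
    """
    return (
        output_tokens <= MAX_OUTPUT
        and prompt_tokens + output_tokens <= CONTEXT_WINDOW
    )


def calls_needed(output_tokens: int, max_output: int = MAX_OUTPUT) -> int:
    """Number of completions needed to emit output_tokens in total."""
    return math.ceil(output_tokens / max_output)


print(fits_in_one_call(prompt_tokens=600_000, output_tokens=384_000))
print(calls_needed(300_000))                     # with the 384K limit
print(calls_needed(300_000, max_output=64_000))  # with a hypothetical 64K limit
```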

Question 4

How does DeepSeek V4 compare on benchmarks vs GPT-5.5?

Accepted Answer

DeepSeek V4-Pro scores 80.6% on SWE-bench Verified; GPT-5.5 scores 88.7%, an 8.1-point gap in favor of GPT-5.5 on agentic coding. On GPQA Diamond, V4-Pro posts 90.1%; GPT-5.5's score is unpublished (its predecessor, GPT-5, scored 88.4%). On MMLU, V4-Pro scores 90.1% vs GPT-5.5's 92.4%, a 2.3-point gap. On ARC-AGI-2, V4-Pro posts 79.5%; GPT-5.5's score has not been published. V4-Pro's HumanEval score is approximately 96.4%. The cost difference is the decisive factor for many teams: V4-Pro costs $0.435/$0.87 per 1M vs GPT-5.5's $5.00/$30.00 per 1M, which makes V4-Pro about 11x cheaper on input and about 34x cheaper on output, with the blended ratio falling between the two depending on the input/output mix. For most coding and research tasks, V4-Pro delivers comparable output quality at a fraction of the cost; GPT-5.5 leads on multimodal tasks and on benchmarks requiring more than 80K AIME-level math reasoning. Note that some V4 benchmark figures are still labeled internal claims pending independent verification.
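The 11-34x range follows directly from the two price pairs. The sketch below, using only the list prices quoted above and illustrative function names, shows how the ratio moves with the share of output tokens in a workload.

```python
def blended_price(input_per_m: float, output_per_m: float, output_share: float) -> float:
    """Average USD per 1M tokens when output_share of all tokens are output."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (1 - output_share) * input_per_m + output_share * output_per_m


def price_ratio(output_share: float) -> float:
    """How many times more GPT-5.5 costs than V4-Pro at a given token mix."""
    gpt = blended_price(5.00, 30.00, output_share)
    v4 = blended_price(0.435, 0.87, output_share)
    return gpt / v4


for share in (0.0, 0.25, 0.5, 1.0):
    print(f"{share:.0%} output tokens: {price_ratio(share):.1f}x")
```

An input-only workload sits at the low end of the range and an output-only workload at the high end; realistic agent workloads, which are usually input-heavy, land closer to the low end.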

Question 5

Is DeepSeek V4 open source or proprietary?

Accepted Answer

DeepSeek V4 is fully open-source under the MIT license, making it one of the most permissively licensed frontier models available as of June 2026. Weights for both V4-Pro and V4-Flash are freely downloadable from Hugging Face at huggingface.co/deepseek-ai/DeepSeek-V4-Pro and huggingface.co/deepseek-ai/DeepSeek-V4-Flash. The MIT license permits commercial use, fine-tuning, distillation, modification, and redistribution without restriction. Community GGUF quantizations are available (antirez/deepseek-v4-gguf on Hugging Face). VRAM requirements: V4-Flash needs approximately 33GB (heavily quantized) to 170GB (full weights); V4-Pro requires 865GB+ for the full weights, spread across multiple H100s or H200s. That figure is well below the roughly 3.2TB that 1.6 trillion parameters would occupy at 16 bits each, so the released weights are evidently stored at lower precision than FP16. The open weights allow air-gapped self-hosted deployment for organizations with strict data governance requirements.
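VRAM needs depend heavily on numeric precision. A rough weights-only estimate is parameter count times bits per parameter, divided by eight; the sketch below applies that rule to the parameter counts quoted for the two models. It deliberately ignores KV cache, activations, and runtime overhead, and it assumes a single uniform precision, which real checkpoints may not use.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory footprint in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits each occupy exactly 1 GB.
    """
    return params_billion * bits_per_param / 8


for name, params_b in (("V4-Flash", 284), ("V4-Pro", 1600)):
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weights_gb(params_b, bits):,.0f} GB")
```

Actual deployments need headroom beyond these numbers, especially at long context lengths where the KV cache grows with the number of tokens held in memory.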

Question 6

What modalities does DeepSeek V4 support?

Accepted Answer

DeepSeek V4 accepts text, images, audio, and video as inputs in a unified architecture; it is the first DeepSeek model with native multimodal support. Multimodality is part of the base model architecture rather than a bolt-on module. Image understanding covers OCR, technical schematic interpretation, and diagram analysis at GPT-5-comparable quality on standard vision benchmarks. Video input supports short clips under 10 minutes for temporal question answering. Audio input supports transcription and speaker identification. Output is limited to text and tool calls; there is no audio or video output. Function calling and structured output are confirmed in the API. Code execution and web browsing are not natively supported; external tools or agent frameworks are required for those capabilities.
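Because the model only emits tool calls and does not run them, the client has to execute each call and send the result back. The sketch below shows that dispatch step in isolation. It assumes tool calls arrive in the OpenAI-style shape (an `id` plus a `function` object with a `name` and JSON-encoded `arguments`); that shape is an assumption here, and `search_docs` is a hypothetical stand-in for a real tool.

```python
import json


def search_docs(query: str) -> str:
    """Hypothetical local tool; a real one might call a search backend."""
    return f"3 results for '{query}'"


# Registry of tools the client is willing to run on the model's behalf.
TOOLS = {"search_docs": search_docs}


def dispatch(tool_call: dict) -> dict:
    """Run one tool call locally and build the message to send back."""
    name = tool_call["function"]["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}


# Simulated tool call, in the assumed OpenAI-style format.
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "search_docs", "arguments": '{"query": "KV cache"}'},
}
print(dispatch(example_call))
```

Keeping an explicit registry means the model can only trigger functions the client has chosen to expose, which matters once tools include code execution or web access.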

Question 7

Does DeepSeek V4 train on user data?

Accepted Answer

DeepSeek's own API Terms of Service explicitly allow the company to train on user-submitted inputs by default, with no published opt-out mechanism. This is a significant distinction from OpenAI and Anthropic, which do not train on API data by default. Any proprietary code, customer information, or business-sensitive data sent to api.deepseek.com may therefore end up in future model training. US-based inference providers (Together AI, Fireworks AI) run DeepSeek V4 weights on US servers under their own data terms, which prohibit training on user data, and are recommended for enterprise workloads. For maximum control, self-hosting under the MIT license eliminates any data sharing with DeepSeek entirely. DeepSeek is a Chinese company; data governance considerations under Chinese law differ from GDPR and US data protection frameworks. HIPAA-eligible and SOC 2 configurations are not available through DeepSeek's own API; enterprise compliance requires routing through US providers or self-hosting.

Question 8

Who is DeepSeek V4 best for and who should avoid it?

Accepted Answer

DeepSeek V4-Pro is best for engineering teams doing agentic coding who need frontier-quality results at a fraction of closed-model pricing: 80.6% on SWE-bench Verified at $0.435 per 1M input tokens makes it the most cost-efficient path to top-10 coding performance. Research teams who need open weights for fine-tuning, distillation, or self-hosted deployment on specific datasets benefit from the MIT license. Startups and cost-conscious teams for whom GPT-5.5's $5/$30 per 1M pricing is prohibitive should evaluate V4 first. Organizations that must avoid third-party API data exposure can self-host a quantized V4-Flash build on a single H100 at no per-token cost. Teams that should avoid using V4 through api.deepseek.com include any enterprise with data governance, HIPAA, SOC 2, or GDPR requirements, since DeepSeek's ToS allows training on inputs; these teams should route through Together AI or Fireworks AI, or self-host, instead. Teams who need audio output, real-time voice, or native web browsing in the model loop should use GPT-5.5 or Claude Opus 4.8.
