Models · Jun 19, 2026

DeepSeek-V4 series introduces two MoE models with million-token context support and efficiency gains

DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated) cut long-context inference costs, and the Pro model's maximum-effort mode is reported as state of the art among open models on core tasks.

Trust score: 79 · Hype: Low

1 source · cross-referenced

TL;DR
  • Two new Mixture-of-Experts models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, support one-million-token contexts with improved efficiency.
  • DeepSeek-V4-Pro-Max achieves state-of-the-art performance among open models in core tasks, per the preprint.
  • The models incorporate a hybrid attention architecture, manifold-constrained hyper-connections, and the Muon optimizer.
  • In million-token settings, DeepSeek-V4-Pro needs 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2, cuts of 73% and 90%.

The DeepSeek-V4 series introduces two Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6 trillion parameters (49 billion activated) and DeepSeek-V4-Flash with 284 billion parameters (13 billion activated). Both models support a context length of one million tokens, according to the arXiv preprint.
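The gap between total and activated parameters is the point of an MoE design: a router sends each token to a small subset of experts, so only the activated weights take part in that token's computation. The sketch below simply restates the reported counts as ratios. The "2 FLOPs per active parameter" figure is a common rule of thumb for a forward pass (an assumption that ignores attention and routing overhead), not a number from the preprint, and the helper names are hypothetical.

```python
# Reported sizes from the DeepSeek-V4 preprint, in billions of parameters.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def activated_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters that an MoE forward pass touches for one token."""
    return active_b / total_b


def approx_forward_gflops_per_token(active_b: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter per token,
    ignoring attention, routing, and other overheads. Billions of params -> GFLOPs."""
    return 2 * active_b


for name, p in MODELS.items():
    frac = activated_fraction(p["total_b"], p["active_b"])
    gflops = approx_forward_gflops_per_token(p["active_b"])
    print(f"{name}: {frac:.1%} of parameters active, ~{gflops:.0f} GFLOPs/token (rough)")
```

With the reported figures, Pro activates about 3.1% of its weights per token and Flash about 4.6%, which is why per-token compute tracks the activated count rather than the headline size.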

Architectural upgrades include a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) to enhance residual connections, and the Muon optimizer for faster convergence and training stability.
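The article names Muon but gives no details of how DeepSeek configured it. As a rough illustration of the general technique, here is a NumPy sketch modeled on the public Muon reference code: momentum is accumulated as usual, then the update matrix is approximately orthogonalized with a few quintic Newton-Schulz iterations that push its singular values toward 1. The coefficients, learning rate, momentum, and shape scaling are taken from that public reference and are assumptions here, not DeepSeek-V4's settings; the function names are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map g = U S V^T to U V^T; singular values land roughly in [0.7, 1.2]."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon reference code
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalization bounds the top singular value by 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so the Gram matrix x @ x.T is the smaller one
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update for a 2-D weight matrix; returns (new_weight, new_momentum)."""
    momentum = beta * momentum + grad
    update = grad + beta * momentum  # Nesterov-style look-ahead, as in the reference code
    ortho = newton_schulz_orthogonalize(update)
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5  # shape-dependent step scaling
    return weight - lr * scale * ortho, momentum


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
g = rng.standard_normal((64, 32))
m = np.zeros_like(w)
w, m = muon_step(w, g, m)
sv = np.linalg.svd(newton_schulz_orthogonalize(g), compute_uv=False)
print("singular values after orthogonalization:", sv.min().round(2), "to", sv.max().round(2))
```

In the public recipe Muon is applied only to 2-D hidden-layer weights, with embeddings, output heads, and scalar parameters left to a conventional optimizer such as AdamW; whether DeepSeek-V4 splits parameters the same way is not stated here.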

The models were pre-trained on more than 32 trillion tokens of diverse, high-quality data and then went through a comprehensive post-training pipeline intended to further strengthen their capabilities. According to the preprint, DeepSeek-V4-Pro-Max, the maximum-reasoning-effort mode of DeepSeek-V4-Pro, sets a new state of the art among open models on core tasks.

In million-token context settings, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. The preprint argues that these savings make million-token contexts practical to support routinely, which in turn makes long-horizon tasks and test-time scaling more feasible.
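To see what the reported ratios mean for serving, the sketch below applies them to a baseline. Only the 27% and 10% figures come from the preprint; the 100 GiB-per-sequence baseline and the 640 GiB memory budget are made-up placeholders, not DeepSeek-V3.2 measurements, and the helper names are hypothetical. Exact fractions are used so the capacity count is not thrown off by floating-point rounding.

```python
from fractions import Fraction

# Ratios reported for DeepSeek-V4-Pro vs. DeepSeek-V3.2 at a one-million-token context.
FLOPS_RATIO = Fraction(27, 100)  # single-token inference FLOPs
KV_RATIO = Fraction(10, 100)     # KV cache size

GIB = 2**30


def v4_pro_estimate(baseline_flops_per_token, baseline_kv_bytes):
    """Scale hypothetical baseline figures by the reported V4-Pro ratios."""
    return baseline_flops_per_token * FLOPS_RATIO, baseline_kv_bytes * KV_RATIO


def sequences_that_fit(memory_budget_bytes, kv_bytes_per_sequence):
    """Full-length sequences whose KV caches fit in the budget (ignores weights and activations)."""
    return int(memory_budget_bytes // kv_bytes_per_sequence)


# Hypothetical placeholders, NOT measured values.
baseline_kv = 100 * GIB
budget = 640 * GIB

_, v4_kv = v4_pro_estimate(1, baseline_kv)
print("per-token compute saving:", f"{float(1 / FLOPS_RATIO):.1f}x")
print("baseline sequences:", sequences_that_fit(budget, baseline_kv))
print("V4-Pro sequences:", sequences_that_fit(budget, v4_kv))
```

Under these placeholder numbers, a 27% FLOPs ratio is roughly a 3.7x per-token saving, and the same memory budget holds 64 million-token KV caches instead of 6. Real capacity also depends on model weights, activations, and batching, which this sketch ignores.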

The model checkpoints are available via Hugging Face under the DeepSeek-V4 collection.

Sources
  1. arXiv cs.CL: DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.