Models · Jun 19, 2026

DeepSeek-V4 series introduces two MoE models with million-token context support and efficiency gains

DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated) are reported to cut long-context inference costs, with the Pro model's maximum-effort mode claiming state-of-the-art results among open models.

Trust: 79

Hype: Low hype

1 source · cross-referenced


TL;DR

Two new Mixture-of-Experts models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, support one-million-token contexts with improved efficiency.
DeepSeek-V4-Pro-Max achieves state-of-the-art performance among open models in core tasks, per the preprint.
The models incorporate a hybrid attention architecture, manifold-constrained hyper-connections, and the Muon optimizer.
In million-token settings, DeepSeek-V4-Pro is reported to cut single-token inference FLOPs by 73% and KV cache size by 90% relative to DeepSeek-V3.2.

The DeepSeek-V4 series introduces two Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6 trillion parameters (49 billion activated) and DeepSeek-V4-Flash with 284 billion parameters (13 billion activated). Both models support a context length of one million tokens, according to the arXiv preprint.
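To make the total-versus-activated distinction concrete, the short sketch below computes what share of each model's parameters is active per token, using only the figures quoted above. The dictionary and function names are illustrative, not taken from the preprint.

```python
# Parameter counts as quoted in the article: (total, activated per token).
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}


def activated_fraction(total_params: float, activated_params: float) -> float:
    """Share of a MoE model's parameters that is used for a single token."""
    return activated_params / total_params


for name, (total, active) in MODELS.items():
    share = activated_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
# DeepSeek-V4-Pro: 3.1% of parameters active per token
# DeepSeek-V4-Flash: 4.6% of parameters active per token
```

In both cases fewer than one parameter in twenty takes part in any single forward step, which is why activated rather than total parameter count is the more useful guide to per-token compute.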

Architectural upgrades include a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) to enhance residual connections, and the Muon optimizer for faster convergence and training stability.

The models were pre-trained on more than 32 trillion tokens, described as diverse and high-quality, and then went through a post-training pipeline. DeepSeek-V4-Pro-Max, the maximum-reasoning-effort mode of DeepSeek-V4-Pro, is reported to set a new state of the art among open models on core tasks.

In million-token context settings, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2, enabling routine support for million-token contexts and making long-horizon tasks and test-time scaling more feasible.
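The relative-cost figures and the percentage reductions quoted in the TL;DR are the same numbers seen from two sides. The sketch below converts one into the other, taking the 27% and 10% figures as reported; the helper names are illustrative.

```python
# Cost of DeepSeek-V4-Pro relative to DeepSeek-V3.2 at million-token context,
# as quoted in the article (1.0 would mean "same cost").
REPORTED_RELATIVE_COST = {
    "single-token inference FLOPs": 0.27,
    "KV cache": 0.10,
}


def reduction(relative_cost: float) -> float:
    """Fraction saved, e.g. a relative cost of 0.27 is a 73% reduction."""
    return 1.0 - relative_cost


def improvement_factor(relative_cost: float) -> float:
    """How many times smaller the new cost is than the baseline."""
    return 1.0 / relative_cost


for metric, rel in REPORTED_RELATIVE_COST.items():
    print(f"{metric}: {reduction(rel):.0%} lower, "
          f"about {improvement_factor(rel):.1f}x less than the baseline")
# single-token inference FLOPs: 73% lower, about 3.7x less than the baseline
# KV cache: 90% lower, about 10.0x less than the baseline
```

These ratios apply only to the million-token setting described in the preprint; the article does not give figures for shorter contexts.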

The model checkpoints are available via Hugging Face under the DeepSeek-V4 collection.

Sources

01 · arXiv cs.CL, "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence"
