GLM-5.2 released with 1M-token context and long-horizon coding improvements
The open-source model introduces IndexShare architecture for efficient 1M-token context and effort-level control for coding tasks.
1 source · cross-referenced
- GLM-5.2 is an open-source model optimized for long-horizon coding tasks, with what its developers call a "solid" 1M-token context.
- It introduces IndexShare architecture to reduce per-token FLOPs by 2.9× at 1M context length.
- The model includes effort-level control for balancing performance, latency, and computational cost.
- On long-horizon coding benchmarks, GLM-5.2 trails only Opus 4.8 while outperforming other closed-source and open-source models.
- It achieves 81.0 on Terminal-Bench 2.1 and 62.1 on SWE-bench Pro, surpassing GLM-5.1.
Z.AI’s GLM-5.2 is positioned as the company’s latest flagship model for long-horizon coding tasks, delivering what the team calls a "solid" 1M-token context that remains reliable under real engineering workloads. The model is released under an MIT open-source license with no regional restrictions.
Architectural improvements include IndexShare, which shares one lightweight indexer across each group of four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length. The team also enhanced the multi-token prediction (MTP) layer used for speculative decoding, increasing acceptance length by up to 20% through techniques such as rejection sampling and end-to-end TV loss training.
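To illustrate the sharing pattern described above, the following minimal sketch maps each sparse attention layer to the layer whose indexer output it would reuse. It assumes groups of four consecutive layers, with the first layer in each group running the indexer. The function names, the grouping rule, and the layer count are hypothetical and are not taken from the GLM-5.2 release.

```python
def indexer_owner(layer: int, group_size: int = 4) -> int:
    """Return the layer whose indexer output `layer` reuses.

    Assumption: layers are grouped consecutively and the first layer
    of each group is the one that actually runs the indexer.
    """
    return (layer // group_size) * group_size


def indexer_runs(num_layers: int, group_size: int = 4) -> int:
    """Count how many times the indexer runs in one forward pass."""
    return len({indexer_owner(i, group_size) for i in range(num_layers)})


if __name__ == "__main__":
    num_layers = 8  # hypothetical layer count, for illustration only
    for i in range(num_layers):
        print(f"layer {i} reuses the indexer output of layer {indexer_owner(i)}")
    print("indexer runs per forward pass:", indexer_runs(num_layers))
```

Under these assumptions the indexer runs once per group of four layers instead of once per layer. The reported 2.9× figure refers to overall per-token FLOPs and cannot be derived from this sketch.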
GLM-5.2 introduces effort-level control, allowing users to explicitly balance model capability against task execution speed and computational cost. The "Max" effort level allocates additional computation for challenging tasks, positioning the model’s performance between Claude Opus 4.7 and Opus 4.8 under similar token consumption.
On long-horizon coding benchmarks, GLM-5.2 trails only Opus 4.8 on FrontierSWE (by 1%), outperforms Opus 4.7 and GPT-5.5 on PostTrainBench, and ranks second only to Opus 4.8 on SWE-Marathon. Across all three benchmarks, it is the highest-ranked open-source model.
On standard coding benchmarks, GLM-5.2 improves substantially over GLM-5.1: 81.0 vs. 63.5 on Terminal-Bench 2.1 and 62.1 vs. 58.4 on SWE-bench Pro. It also narrows the gap to closed-source models, achieving 81.0 on Terminal-Bench 2.1 compared to Claude Opus 4.8’s 85.0.
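The gaps implied by these scores can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the dictionary layout and the helper name `delta` are illustrative choices.

```python
# Scores as quoted in the article; no other values are assumed.
scores = {
    "Terminal-Bench 2.1": {"GLM-5.2": 81.0, "GLM-5.1": 63.5, "Claude Opus 4.8": 85.0},
    "SWE-bench Pro": {"GLM-5.2": 62.1, "GLM-5.1": 58.4},
}


def delta(bench: str, a: str, b: str) -> float:
    """Score of model `a` minus score of model `b`, rounded to one decimal."""
    return round(scores[bench][a] - scores[bench][b], 1)


if __name__ == "__main__":
    print(delta("Terminal-Bench 2.1", "GLM-5.2", "GLM-5.1"))          # 17.5
    print(delta("SWE-bench Pro", "GLM-5.2", "GLM-5.1"))               # 3.7
    print(delta("Terminal-Bench 2.1", "Claude Opus 4.8", "GLM-5.2"))  # 4.0
```

That is a gain of 17.5 points on Terminal-Bench 2.1 and 3.7 points on SWE-bench Pro over GLM-5.1, with a remaining 4.0-point gap to Claude Opus 4.8 on Terminal-Bench 2.1.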
The model’s 1M-token context is designed to handle messy, multi-hour coding-agent trajectories, including large-scale implementation, automated research, performance optimization, and complex debugging. The team emphasizes that maintaining quality under real engineering pressure distinguishes a "solid" long context from one that is merely wide.
Efficient serving at 1M context length shifts inference bottlenecks from computation to KV-cache capacity, long-context kernel overhead, and CPU-side overhead. The team notes that while IndexShare reduces FLOPs, per-token KV-cache size does not decrease proportionally, so further inference-engine optimizations such as finer-grained memory management are required.
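To see why cache capacity becomes the constraint, consider a back-of-the-envelope estimate of KV-cache size for a single sequence. The sketch below assumes a conventional layout that stores one key and one value vector per KV head per layer. All dimensions are hypothetical placeholders rather than GLM-5.2's actual configuration, whose cache layout may differ.

```python
def kv_cache_bytes(
    context_tokens: int,
    num_layers: int,
    kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # e.g. 16-bit storage
) -> int:
    """Estimate KV-cache size for one sequence under a standard K/V layout."""
    # The factor of 2 covers keys and values.
    per_token = 2 * num_layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to make the scaling concrete.
    total = kv_cache_bytes(
        context_tokens=1_000_000, num_layers=64, kv_heads=8, head_dim=128
    )
    print(f"{total / 2**30:.1f} GiB")  # about 244.1 GiB for these placeholder values
```

The estimate grows linearly with context length and does not depend on how many FLOPs the attention computation costs, which is consistent with the point that reducing compute alone does not relieve cache pressure.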