Models · Jun 17, 2026

GLM-5.2 released with 1M-token context and long-horizon coding improvements

The open-source model introduces the IndexShare architecture for an efficient 1M-token context, plus effort-level control for coding tasks.

Trust: 79
Hype: Low

1 source · cross-referenced

TL;DR
  • GLM-5.2 is an open-source model optimized for long-horizon coding tasks, with what its developers call a "solid" 1M-token context.
  • It introduces the IndexShare architecture, which reduces per-token FLOPs by 2.9× at 1M context length.
  • The model includes effort-level control for balancing performance, latency, and computational cost.
  • On long-horizon coding benchmarks, GLM-5.2 trails only Opus 4.8 while outperforming other closed-source and open-source models.
  • It scores 81.0 on Terminal-Bench 2.1 and 62.1 on SWE-bench Pro, up from GLM-5.1's 63.5 and 58.4.

Z.AI’s GLM-5.2 is positioned as the company’s latest flagship model for long-horizon coding tasks, delivering what the team calls a "solid" 1M-token context that remains reliable under real engineering workloads. The model is released under an MIT open-source license with no regional restrictions.

Architectural improvements include IndexShare, which shares one lightweight indexer across each group of four sparse attention layers instead of running a separate indexer per layer. The team reports that this reduces per-token FLOPs by 2.9× at 1M context length. The team also enhanced the MTP layer used for speculative decoding, increasing acceptance length by up to 20% through techniques such as rejection sampling and end-to-end TV loss training.
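The sharing pattern can be illustrated with a toy sketch. This is not Z.AI's implementation: it assumes the indexer scores the context and picks a top-k set of positions for sparse attention, and that every layer in a group of four reuses that selection. The names `select_top_k`, `run_sparse_layers`, and `score_fn` are hypothetical.

```python
from typing import Callable, List, Tuple


def select_top_k(scores: List[float], k: int) -> List[int]:
    """Return the positions of the k highest scores, in ascending position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_sparse_layers(
    num_layers: int,
    score_fn: Callable[[int], List[float]],  # hypothetical stand-in for the indexer
    k: int,
    group_size: int = 4,  # the article says the indexer is shared across four layers
) -> Tuple[List[List[int]], int]:
    """Return the positions each layer attends to and how often the indexer ran."""
    indexer_calls = 0
    selected: List[int] = []
    per_layer: List[List[int]] = []
    for layer in range(num_layers):
        # Only the first layer of each group runs the indexer;
        # the remaining layers in the group reuse its selection.
        if layer % group_size == 0:
            selected = select_top_k(score_fn(layer), k)
            indexer_calls += 1
        per_layer.append(selected)
    return per_layer, indexer_calls


if __name__ == "__main__":
    toy_scores = lambda layer: [0.1, 0.9, 0.5, 0.7]
    _, shared = run_sparse_layers(8, toy_scores, k=2)
    _, unshared = run_sparse_layers(8, toy_scores, k=2, group_size=1)
    print(f"indexer runs: shared={shared}, per-layer={unshared}")
```

In this toy, eight layers need two indexer runs instead of eight. The reported 2.9× figure cannot be derived from this count, since the indexer is only one part of per-token compute.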

GLM-5.2 introduces effort-level control, allowing users to explicitly balance model capability against task execution speed and computational cost. The "Max" effort level allocates additional computation for challenging tasks, positioning the model’s performance between Claude Opus 4.7 and Opus 4.8 under similar token consumption.

On long-horizon coding benchmarks, GLM-5.2 places second on FrontierSWE, 1% behind Opus 4.8; outperforms Opus 4.7 and GPT-5.5 on PostTrainBench; and ranks second only to Opus 4.8 on SWE-Marathon. On all three benchmarks, it is the highest-ranked open-source model.

On standard coding benchmarks, GLM-5.2 improves over GLM-5.1: 81.0 vs. 63.5 on Terminal-Bench 2.1 (a 17.5-point gain) and 62.1 vs. 58.4 on SWE-bench Pro (a 3.7-point gain). It also narrows the gap to closed-source models: its 81.0 on Terminal-Bench 2.1 sits 4.0 points below Claude Opus 4.8's 85.0.
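For readers who want the gaps in relative terms, here is a small sketch that recomputes them from the scores quoted above. The helper names are illustrative, and no figures beyond those in the article are used.

```python
# Scores as quoted in the article.
SCORES = {
    "Terminal-Bench 2.1": {"GLM-5.2": 81.0, "GLM-5.1": 63.5, "Claude Opus 4.8": 85.0},
    "SWE-bench Pro": {"GLM-5.2": 62.1, "GLM-5.1": 58.4},
}


def point_gain(new: float, old: float) -> float:
    """Absolute difference in benchmark points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain_pct(new: float, old: float) -> float:
    """Difference as a percentage of the older score, rounded to one decimal."""
    return round(100 * (new - old) / old, 1)


if __name__ == "__main__":
    for bench, s in SCORES.items():
        gain = point_gain(s["GLM-5.2"], s["GLM-5.1"])
        rel = relative_gain_pct(s["GLM-5.2"], s["GLM-5.1"])
        print(f"{bench}: +{gain} points ({rel}% relative) over GLM-5.1")
```

The relative view shows that the Terminal-Bench 2.1 gain is much larger than the SWE-bench Pro gain, both in points and as a share of the GLM-5.1 baseline.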

The model’s 1M-token context is designed to handle messy, multi-hour coding-agent trajectories, including large-scale implementation, automated research, performance optimization, and complex debugging. The team emphasizes that maintaining quality under real engineering pressure distinguishes a "solid" long context from one that is merely wide.

Serving efficiently at 1M context length shifts inference bottlenecks from computation to KV-cache capacity, long-context kernel overhead, and CPU-side overhead. The team notes that although IndexShare cuts FLOPs, per-token KV-cache size does not shrink proportionally, so further inference-engine optimizations, such as finer-grained memory management, are still needed.
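A back-of-the-envelope estimate shows why cache capacity becomes the constraint. The sketch below uses the generic formula for a cache that stores one key and one value vector per KV head in every layer. The layer count, head count, and head dimension are placeholders, not GLM-5.2's configuration, which the source does not give. Models with compressed or latent caches will come out smaller.

```python
def kv_cache_bytes(
    tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # e.g. 16-bit storage; an assumption
) -> int:
    """Estimate KV-cache size for one sequence under a plain K+V layout."""
    # Factor of 2: one key vector and one value vector per head per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return tokens * per_token


if __name__ == "__main__":
    # Hypothetical dimensions chosen only for illustration.
    total = kv_cache_bytes(1_000_000, num_layers=32, num_kv_heads=8, head_dim=128)
    print(f"{total / 2**30:.2f} GiB for a single 1M-token sequence")
```

Under this layout the cache grows linearly with context length, and reducing attention FLOPs does not change the per-token term.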

Sources
  1. Hugging Face, "GLM-5.2: Built for Long-Horizon Tasks"

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.