Models · Jun 17, 2026

GLM-5.2 released with 1M-token context and long-horizon coding improvements

The open-source model introduces the IndexShare architecture for efficient 1M-token context and adds effort-level control for coding tasks.

Trust: 79

Hype: Low

1 source · cross-referenced


TL;DR

GLM-5.2 is an open-source model optimized for long-horizon coding tasks with a solid 1M-token context.
It introduces the IndexShare architecture, which reduces per-token FLOPs by 2.9× at 1M context length.
The model includes effort-level control for balancing performance, latency, and computational cost.
On long-horizon coding benchmarks, GLM-5.2 trails only Opus 4.8 while outperforming other closed-source and open-source models.
It achieves 81.0 on Terminal-Bench 2.1 and 62.1 on SWE-bench Pro, surpassing GLM-5.1.

Z.AI’s GLM-5.2 is positioned as the company’s latest flagship model for long-horizon coding tasks, delivering what the team calls a "solid" 1M-token context that remains reliable under real engineering workloads. The model is released under an MIT open-source license with no regional restrictions.

Architectural improvements include IndexShare, which shares one lightweight indexer across each group of four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length. The team also enhanced the MTP (multi-token prediction) layer used for speculative decoding, increasing acceptance length by up to 20% through techniques such as rejection sampling and end-to-end TV loss training.
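The grouping behind IndexShare can be sketched as simple bookkeeping. This is a minimal illustration, assuming that the first layer of each four-layer group runs the indexer and the other three reuse its output; the function names and the 8-layer example are hypothetical, and the article does not describe Z.AI's actual implementation. The sketch counts indexer invocations only and does not attempt to reproduce the 2.9× FLOPs figure, which depends on costs the article does not give.

```python
def indexer_plan(num_layers: int, group_size: int = 4) -> list[tuple[int, int]]:
    """Pair each sparse-attention layer with the layer whose indexer output it uses.

    Assumption (not from the source): the first layer of every group runs the
    indexer, and the remaining layers in that group reuse its result.
    """
    return [(layer, (layer // group_size) * group_size) for layer in range(num_layers)]


def indexer_runs(num_layers: int, group_size: int = 4) -> int:
    """Number of indexer invocations per token when sharing within groups."""
    return -(-num_layers // group_size)  # ceiling division


if __name__ == "__main__":
    layers = 8  # hypothetical layer count, for illustration only
    for layer, owner in indexer_plan(layers):
        role = "runs indexer" if layer == owner else f"reuses layer {owner}"
        print(f"layer {layer}: {role}")
    print(f"indexer runs: {indexer_runs(layers)} instead of {layers}")
```

With a group size of four, the indexer runs once per four layers rather than once per layer, which is where the saving comes from in this simplified picture.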

GLM-5.2 introduces effort-level control, allowing users to explicitly balance model capability against task execution speed and computational cost. The "Max" effort level allocates additional computation for challenging tasks, positioning the model’s performance between Claude Opus 4.7 and Opus 4.8 under similar token consumption.

On long-horizon coding benchmarks, GLM-5.2 places second on FrontierSWE, 1% behind Opus 4.8, outperforms Opus 4.7 and GPT-5.5 on PostTrainBench, and ranks second to Opus 4.8 on SWE-Marathon. On all three benchmarks, it is the highest-ranked open-source model.

On standard coding benchmarks, GLM-5.2 improves substantially over GLM-5.1: 81.0 vs. 63.5 on Terminal-Bench 2.1 and 62.1 vs. 58.4 on SWE-bench Pro. It also narrows the gap to closed-source models: on Terminal-Bench 2.1, its 81.0 sits 4.0 points below Claude Opus 4.8’s 85.0.
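The size of these improvements is easy to work out from the reported scores. The short script below uses only the figures quoted in this article; the `gain` helper is an illustrative name, and the relative percentages are derived here rather than reported by the source.

```python
# Scores as quoted in the article.
scores = {
    "Terminal-Bench 2.1": {"GLM-5.2": 81.0, "GLM-5.1": 63.5, "Claude Opus 4.8": 85.0},
    "SWE-bench Pro": {"GLM-5.2": 62.1, "GLM-5.1": 58.4},
}


def gain(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    return round(new - old, 1), round(100 * (new - old) / old, 1)


if __name__ == "__main__":
    for benchmark, s in scores.items():
        points, percent = gain(s["GLM-5.2"], s["GLM-5.1"])
        print(f"{benchmark}: +{points} points over GLM-5.1 ({percent}% relative)")

    tb = scores["Terminal-Bench 2.1"]
    gap = round(tb["Claude Opus 4.8"] - tb["GLM-5.2"], 1)
    print(f"Terminal-Bench 2.1 gap to Claude Opus 4.8: {gap} points")
```

The Terminal-Bench 2.1 jump (17.5 points) is much larger than the SWE-bench Pro one (3.7 points).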

The model’s 1M-token context is designed to handle messy, multi-hour coding-agent trajectories, including large-scale implementation, automated research, performance optimization, and complex debugging. The team emphasizes that maintaining quality under real engineering pressure distinguishes a "solid" long context from one that is merely wide.

Efficient serving at 1M context length shifts the inference bottleneck from computation to KV-cache capacity, long-context kernel overhead, and CPU-side overhead. The team notes that although IndexShare reduces FLOPs, per-token KV-cache size does not shrink proportionally, so inference engines need further optimizations such as finer-grained memory management.
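A rough estimate shows why cache capacity becomes the limiting factor. The sketch below uses the generic KV-cache formula for a standard multi-head or grouped-query attention layout; the layer count, KV-head count, and head dimension are hypothetical placeholders, not GLM-5.2's real configuration, and the article does not describe the model's actual cache layout.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Generic KV-cache size for one sequence.

    The leading factor of 2 covers keys plus values; bytes_per_value=2
    corresponds to 16-bit storage.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical dimensions chosen only to make the arithmetic concrete.
HYPOTHETICAL = dict(layers=60, kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for tokens in (128_000, 1_000_000):
        gib = kv_cache_bytes(tokens, **HYPOTHETICAL) / 2**30
        print(f"{tokens:>9,} tokens -> {gib:.1f} GiB per sequence")
```

Nothing in this formula depends on how attention scores are computed, so a technique that only cuts FLOPs leaves the cache footprint untouched. That is consistent with the team's point that memory management needs separate work.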

Sources

1. Hugging Face, "GLM-5.2: Built for Long-Horizon Tasks"
