GLM-4.6 Expands to 200K-Token Context, Improves Coding & Agents

GLM-4.6 expands the context window to 200K tokens and improves coding, reasoning, and agent integration. It uses about 15% fewer tokens than GLM-4.5 on real-world coding tasks, posts gains across benchmarks, and is available via the Z.ai API and public model hubs.

TL;DR

  • 200K-token context window: expanded from 128K to support longer multi-turn interactions and more complex agentic workflows
  • Improved coding, reasoning, and tool use, with higher code benchmark scores, better real-world coding behavior, and support for tool invocation during inference
  • Stronger agent integration and deployment in coding agents (Claude Code, Kilo Code, Roo Code, Cline); existing GLM Coding Plan subscribers are slated for automatic upgrade, and previously customized configs can be updated by switching the model name to "glm-4.6"
  • CC-Bench real-world evaluation: on extended multi-turn tasks, GLM-4.6 showed gains over GLM-4.5, reached near parity with Claude Sonnet 4 (48.6% win rate), and used ~15% fewer tokens than GLM-4.5; trajectories dataset: https://huggingface.co/datasets/zai-org/CC-Bench-trajectories

GLM-4.6 arrives with larger context and improved coding and agent skills

Z.ai has published GLM-4.6, the next iteration of its flagship model family. This release focuses on expanded context handling, better code generation in real-world settings, stronger reasoning and tool use, and improved integration within agent frameworks.

Key changes and capabilities

  • Context window expanded to 200K tokens, up from 128K in GLM-4.5, enabling longer multi-turn interactions and more complex agentic workflows.
  • Improved coding performance, with higher scores on code benchmarks and better behavior in practical coding tasks spanning front-end pages, tool-building, testing, and algorithms.
  • Advanced reasoning and tool use: GLM-4.6 shows measurable gains on reasoning benchmarks and supports tool invocation during inference (a hedged request sketch follows this list).
  • Stronger agent integration, exhibiting better performance for tool-using and search-based agents and smoother embedding into agent frameworks.
  • Refined writing and role play, with outputs that align more closely with human preferences in style and readability.
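
To make the tool-invocation point concrete, here is a minimal sketch of a chat-completions request that offers the model a callable function. It assumes the Z.ai endpoint is OpenAI-compatible; the base URL, the search_docs tool, and its schema are illustrative assumptions and should be checked against the Z.ai documentation (https://docs.z.ai/guides/llm/glm-4.6).

```python
# Sketch: tool invocation with GLM-4.6 over an OpenAI-compatible endpoint.
# Base URL, model id, and the tool definition below are assumptions, not
# confirmed details from the announcement.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",               # assumed: key from the Z.ai console
    base_url="https://api.z.ai/api/paas/v4",  # assumed OpenAI-compatible base URL
)

# One illustrative (hypothetical) tool the model may decide to call.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool name
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Find the deployment guide for vLLM."}],
    tools=tools,
)

# If the model chooses to invoke the tool, the call and its arguments arrive here.
print(resp.choices[0].message.tool_calls)
```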

Benchmarks and real-world evaluation

Evaluation comprised eight public benchmarks across agents, reasoning, and coding. GLM-4.6 demonstrated clear gains over GLM-4.5, and remained competitive with other models such as DeepSeek-V3.2-Exp and Claude Sonnet 4, though it still trails Claude Sonnet 4.5 on coding ability according to the reported comparisons.

Beyond synthetic leaderboards, the team extended CC-Bench from GLM-4.5 to include more challenging, multi-turn real-world tasks. Human evaluators worked with models inside isolated Docker containers across front-end development, tool building, data analysis, testing, and algorithms. On this extended CC-Bench, GLM-4.6 showed improvement over GLM-4.5 and achieved near parity with Claude Sonnet 4 (48.6% win rate). From a token-efficiency standpoint, GLM-4.6 completed tasks using about 15% fewer tokens than its predecessor. Evaluation details and trajectory data are available at the CC-Bench trajectories dataset: https://huggingface.co/datasets/zai-org/CC-Bench-trajectories
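
For readers who want to inspect those trajectories directly, the dataset can likely be pulled with the standard Hugging Face datasets library. Whether the default loader applies and what the columns are called is not stated in the announcement, so the sketch below only loads the dataset and prints its schema.

```python
# Sketch: pull the published CC-Bench trajectories for local inspection.
# Assumes the dataset works with the default `datasets` loader; split and
# column names are not documented here, so they are only printed.
from datasets import load_dataset

ds = load_dataset("zai-org/CC-Bench-trajectories")
print(ds)                        # available splits and row counts
first_split = next(iter(ds))     # name of the first split
print(ds[first_split].features)  # column schema of that split
```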

Access and deployment

  • API access and integration guidance are available in the Z.ai documentation: https://docs.z.ai/guides/llm/glm-4.6. The model can also be called via OpenRouter (see the sketch after this list).
  • GLM-4.6 is deployable in coding agents such as Claude Code, Kilo Code, Roo Code, and Cline. Existing GLM Coding Plan subscribers are slated for automatic upgrade; previously customized client configs (for example, a settings file used by Claude Code) can be updated by switching the model name to "glm-4.6". New subscription details are referenced at https://z.ai/subscribe.
  • Models are accessible for hosted chat at https://chat.z.ai.
  • Model weights are publicly available on HuggingFace: https://huggingface.co/zai-org/GLM-4.6 and on ModelScope: https://modelscope.cn/models/ZhipuAI/GLM-4.6. Local inference support includes frameworks such as vLLM and SGLang, with deployment instructions provided in the project’s repository.
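
As a concrete example of the OpenRouter route mentioned above, the following sketch uses the OpenAI-compatible client that OpenRouter exposes. The model slug "z-ai/glm-4.6" is an assumption and should be verified against OpenRouter's model listing.

```python
# Sketch: call GLM-4.6 through OpenRouter's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_API_KEY",
    base_url="https://openrouter.ai/api/v1",
)

resp = client.chat.completions.create(
    model="z-ai/glm-4.6",  # assumed OpenRouter slug for GLM-4.6
    messages=[{
        "role": "user",
        "content": "Write a Python function that checks whether a string is a palindrome.",
    }],
)
print(resp.choices[0].message.content)
```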

Further technical context is available in the GLM-4.5 tech report: https://arxiv.org/abs/2508.06471

Original source: https://z.ai/blog/glm-4.6
