GLM-4.6 Expands to 200K-Token Context, Improves Coding & Agents

GLM-4.6 expands the context window to 200K tokens and improves coding, reasoning, and agent integration. It uses about 15% fewer tokens than GLM-4.5 on real-world coding tasks, posts gains across benchmarks, and is available via the Z.ai API and public model hubs.

TL;DR

  • 200K-token context window: expanded from 128K to support longer multi-turn interactions and more complex agentic workflows
  • Improved coding, reasoning, and tool use, with higher code benchmark scores, better real-world coding behavior, and support for tool invocation during inference
  • Stronger agent integration and deployment in coding agents (Claude Code, Kilo Code, Roo Code, Cline); existing GLM Coding Plan subscribers are slated for automatic upgrade, and previously customized configs can be updated by switching the model name to "glm-4.6"
  • CC-Bench real-world evaluation: on extended multi-turn tasks, GLM-4.6 showed gains over GLM-4.5, reached near parity with Claude Sonnet 4 (48.6% win rate), and used ~15% fewer tokens than GLM-4.5; trajectories dataset: https://huggingface.co/datasets/zai-org/CC-Bench-trajectories

GLM-4.6 arrives with larger context and improved coding and agent skills

Z.ai has published GLM-4.6, the next iteration of its flagship model family. This release focuses on expanded context handling, better code generation in real-world settings, stronger reasoning and tool use, and improved integration within agent frameworks.

Key changes and capabilities

  • Context window expanded to 200K tokens, up from 128K in GLM-4.5, enabling longer multi-turn interactions and more complex agentic workflows.
  • Improved coding performance, with higher scores on code benchmarks and better behavior in practical coding tasks spanning front-end pages, tool-building, testing, and algorithms.
  • Advanced reasoning and tool use: GLM-4.6 shows measurable gains on reasoning benchmarks and supports tool invocation during inference (a hedged request sketch follows this list).
  • Stronger agent integration, exhibiting better performance for tool-using and search-based agents and smoother embedding into agent frameworks.
  • Refined writing and role play, with outputs that align more closely with human preferences in style and readability.
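
To make the tool-invocation point concrete, here is a minimal sketch of a chat-completions request that offers the model a callable function. It assumes the Z.ai endpoint is OpenAI-compatible; the base URL, the search_docs tool, and its schema are illustrative assumptions and should be checked against the Z.ai documentation (https://docs.z.ai/guides/llm/glm-4.6).

```python
# Sketch: tool invocation with GLM-4.6 over an OpenAI-compatible endpoint.
# Base URL, model id, and the tool definition below are assumptions, not
# confirmed details from the announcement.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",               # assumed: key from the Z.ai console
    base_url="https://api.z.ai/api/paas/v4",  # assumed OpenAI-compatible base URL
)

# One illustrative (hypothetical) tool the model may decide to call.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool name
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Find the deployment guide for vLLM."}],
    tools=tools,
)

# If the model chooses to invoke the tool, the call and its arguments arrive here.
print(resp.choices[0].message.tool_calls)
```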

Benchmarks and real-world evaluation

Evaluation comprised eight public benchmarks across agents, reasoning, and coding. GLM-4.6 demonstrated clear gains over GLM-4.5, and remained competitive with other models such as DeepSeek-V3.2-Exp and Claude Sonnet 4, though it still trails Claude Sonnet 4.5 on coding ability according to the reported comparisons.

Beyond synthetic leaderboards, the team extended CC-Bench from GLM-4.5 to include more challenging, multi-turn real-world tasks. Human evaluators worked with models inside isolated Docker containers across front-end development, tool building, data analysis, testing, and algorithms. On this extended CC-Bench, GLM-4.6 showed improvement over GLM-4.5 and achieved near parity with Claude Sonnet 4 (48.6% win rate). From a token-efficiency standpoint, GLM-4.6 completed tasks using about 15% fewer tokens than its predecessor. Evaluation details and trajectory data are available at the CC-Bench trajectories dataset: https://huggingface.co/datasets/zai-org/CC-Bench-trajectories
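
For readers who want to inspect those trajectories directly, the dataset can likely be pulled with the standard Hugging Face datasets library. Whether the default loader applies and what the columns are called is not stated in the announcement, so the sketch below only loads the dataset and prints its schema.

```python
# Sketch: pull the published CC-Bench trajectories for local inspection.
# Assumes the dataset works with the default `datasets` loader; split and
# column names are not documented here, so they are only printed.
from datasets import load_dataset

ds = load_dataset("zai-org/CC-Bench-trajectories")
print(ds)                        # available splits and row counts
first_split = next(iter(ds))     # name of the first split
print(ds[first_split].features)  # column schema of that split
```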

Access and deployment

  • API access and integration guidance are available in the Z.ai documentation: https://docs.z.ai/guides/llm/glm-4.6. The model can also be called via OpenRouter (see the sketch after this list).
  • GLM-4.6 is deployable in coding agents such as Claude Code, Kilo Code, Roo Code, and Cline. Existing GLM Coding Plan subscribers are slated for automatic upgrade; previously customized client configs (for example, a settings file used by Claude Code) can be updated by switching the model name to "glm-4.6". New subscription details are referenced at https://z.ai/subscribe.
  • Models are accessible for hosted chat at https://chat.z.ai.
  • Model weights are publicly available on HuggingFace: https://huggingface.co/zai-org/GLM-4.6 and on ModelScope: https://modelscope.cn/models/ZhipuAI/GLM-4.6. Local inference support includes frameworks such as vLLM and SGLang, with deployment instructions provided in the project’s repository.
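
As a concrete example of the OpenRouter route mentioned above, the following sketch uses the OpenAI-compatible client that OpenRouter exposes. The model slug "z-ai/glm-4.6" is an assumption and should be verified against OpenRouter's model listing.

```python
# Sketch: call GLM-4.6 through OpenRouter's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_API_KEY",
    base_url="https://openrouter.ai/api/v1",
)

resp = client.chat.completions.create(
    model="z-ai/glm-4.6",  # assumed OpenRouter slug for GLM-4.6
    messages=[{
        "role": "user",
        "content": "Write a Python function that checks whether a string is a palindrome.",
    }],
)
print(resp.choices[0].message.content)
```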

Further technical context is available in the GLM-4.5 tech report: https://arxiv.org/abs/2508.06471

Original source: https://z.ai/blog/glm-4.6
