Moonshot has announced Kimi K2 Thinking, a new version of its K2 model designed to act less like a chatbot and more like a full reasoning agent. Instead of generating a single response, K2 Thinking can plan, search the web, write and run code, and revise its own steps as it goes. Moonshot reports that it can handle 200 to 300 consecutive tool calls without human input.
The company says this is its most advanced open-weight model to date, built around “test-time scaling,” where performance improves by giving the model more time to think and more opportunities to act. In practical terms, that means it can work through multi-step problems — from academic questions to software development — using a think → search → verify → respond cycle.
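To make the think → search → verify → respond cycle concrete, here is a minimal Python sketch of a loop that keeps acting until a result passes verification or a tool-call budget runs out. It is not Moonshot's implementation or API: the names run_agent, make_stub_search, and stub_verify are hypothetical, the tools are stand-in stubs rather than real web search, and the default budget of 300 simply mirrors the reported upper figure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AgentResult:
    answer: Optional[str]
    tool_calls: int
    trace: List[Tuple[str, str]] = field(default_factory=list)


def run_agent(
    question: str,
    search: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_tool_calls: int = 300,  # assumed budget, echoing the reported 200-300 range
) -> AgentResult:
    """Repeat think -> search -> verify until verified or out of budget, then respond."""
    trace: List[Tuple[str, str]] = []
    tool_calls = 0
    answer: Optional[str] = None
    attempt = 0

    # Each round costs two tool calls (one search, one verify).
    while tool_calls + 2 <= max_tool_calls:
        attempt += 1

        # Think: plan the next query (a real model would reason here).
        query = f"{question} (attempt {attempt})"
        trace.append(("think", query))

        # Search: one tool call.
        evidence = search(query)
        tool_calls += 1
        trace.append(("search", evidence))

        # Verify: a second tool call that checks the evidence.
        ok = verify(question, evidence)
        tool_calls += 1
        trace.append(("verify", str(ok)))

        if ok:
            answer = evidence
            break

    # Respond: report the verified answer, or say none was found.
    final = answer if answer is not None else "no verified answer within budget"
    trace.append(("respond", final))
    return AgentResult(answer=answer, tool_calls=tool_calls, trace=trace)


def make_stub_search() -> Callable[[str], str]:
    """Hypothetical search tool that only returns a useful result on its third call."""
    state = {"calls": 0}

    def search(query: str) -> str:
        state["calls"] += 1
        return "Paris" if state["calls"] >= 3 else "unknown"

    return search


def stub_verify(question: str, evidence: str) -> bool:
    """Hypothetical verifier: accepts only the known-good answer."""
    return evidence == "Paris"


if __name__ == "__main__":
    result = run_agent("What is the capital of France?", make_stub_search(), stub_verify)
    print(result.answer, result.tool_calls)  # prints: Paris 6
```

In this sketch, giving the loop a larger budget is what lets it recover from early failed searches. With max_tool_calls=4 the same stubbed run stops after two rounds without a verified answer, which is a toy version of the test-time scaling idea described above.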
Performance
K2 Thinking delivers strong benchmark scores across reasoning, search, and coding:
- 44.9% on Humanity’s Last Exam, a difficult reasoning benchmark, scored here with tools enabled.
- 60.2% on BrowseComp, which measures real-time browsing and search abilities.
- 71.3% on SWE-Bench Verified, placing it near the top of coding-focused models.
That said, it does not take the overall lead in programming tasks: Claude Sonnet 4.5 still scores higher on SWE-Bench Verified.
Why it matters
K2 Thinking is notable not just for accuracy, but for persistence. It can chain together hundreds of reasoning and tool-use steps, making it particularly relevant for autonomous agents, coding assistants, and research workflows. It signals how quickly open-weight models are closing the gap with proprietary systems in complex reasoning — even if they haven’t fully overtaken them yet.
Availability
Kimi K2 Thinking is live now on kimi.com in chat mode. A full agent mode and API access are coming soon, according to Moonshot.