Moonshot launches Kimi K2 Thinking, a tool-using AI built for long-form reasoning

Moonshot has announced Kimi K2 Thinking, a new version of its K2 model designed to act less like a chatbot and more like a full reasoning agent. Instead of generating a single response, K2 Thinking can plan, search the web, write and run code, and adjust its own steps — reportedly handling 200 to 300 tool calls in a row without human input.

The company says this is its most advanced open-weight model to date, built around “test-time scaling,” where performance improves by giving the model more time to think and more opportunities to act. In practical terms, that means it can work through multi-step problems — from academic questions to software development — using a think → search → verify → respond cycle.
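As a rough illustration of that cycle, here is a minimal Python sketch of a think, search, verify, respond loop with a cap on tool calls. It is not Moonshot's implementation: `run_agent`, `make_stub_search`, `stub_verify`, and the `max_tool_calls` parameter are hypothetical names invented for this example, and the tools are canned stubs rather than real web search or code execution.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class AgentRun:
    answer: Optional[str] = None   # final response, if one was verified
    tool_calls: int = 0            # number of tool invocations used
    trace: List[str] = field(default_factory=list)  # phases in order


def run_agent(
    question: str,
    search: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_tool_calls: int = 300,     # assumed budget, echoing the "200 to 300" figure
) -> AgentRun:
    """Repeat think -> search -> verify until a candidate passes or the budget runs out."""
    run = AgentRun()
    attempt = 0
    # Each round costs two tool calls: one search and one verification.
    while run.tool_calls + 2 <= max_tool_calls:
        attempt += 1
        # Think: a real model would plan here; the stub just rewrites the query.
        query = f"{question} [attempt {attempt}]"
        run.trace.append("think")

        candidate = search(query)
        run.tool_calls += 1
        run.trace.append("search")

        accepted = verify(question, candidate)
        run.tool_calls += 1
        run.trace.append("verify")

        if accepted:
            run.answer = candidate
            run.trace.append("respond")
            break
    return run


def make_stub_search(results: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical search tool that returns canned results one at a time."""
    remaining = iter(results)

    def search(query: str) -> str:
        return next(remaining, "no result")

    return search


def stub_verify(question: str, candidate: str) -> bool:
    """Hypothetical verifier that accepts only one known-good answer."""
    return candidate == "Paris"


if __name__ == "__main__":
    result = run_agent(
        "What is the capital of France?",
        make_stub_search(["Lyon", "Marseille", "Paris"]),
        stub_verify,
    )
    print(result.answer, result.tool_calls)  # prints: Paris 6
```

In this sketch, raising `max_tool_calls` gives the loop more rounds to try before giving up, which is a simplified stand-in for giving the model "more opportunities to act."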

Performance

K2 Thinking delivers strong benchmark scores across reasoning, search, and coding:

44.9% on Humanity’s Last Exam, a difficult reasoning benchmark, scored with tools enabled.
60.2% on BrowseComp, which measures real-time browsing and search abilities.
71.3% on SWE-Bench Verified, placing it near the top of coding-focused models.

That said, it still doesn’t take the overall lead in programming tasks — Claude Sonnet 4.5 remains ahead on the SWE-Bench leaderboard.

Why it matters

K2 Thinking is notable not just for accuracy, but for persistence. It can chain together hundreds of reasoning and tool-use steps, making it particularly relevant for autonomous agents, coding assistants, and research workflows. It signals how quickly open-weight models are closing the gap with proprietary systems in complex reasoning — even if they haven’t fully overtaken them yet.

Availability

Kimi K2 Thinking is live now on kimi.com in chat mode. A full agent mode and API access are coming soon, according to Moonshot.

