Moonshot launches Kimi K2 Thinking, a tool-using AI built for long-form reasoning

Moonshot has unveiled a new version of its Kimi model built specifically for long-form reasoning, tool use, and multi-step execution.

TL;DR

  • Moonshot launches Kimi K2 Thinking, a tool-using AI agent model
  • Handles 200–300 tool calls in a single reasoning chain
  • Hits 44.9% on HLE, 60.2% on BrowseComp, 71.3% on SWE-Bench Verified
  • Strong performer, but still behind Claude Sonnet 4.5 at the top of coding benchmarks
  • Marks a big step forward for open-weight “thinking” models

Moonshot has announced Kimi K2 Thinking, a new version of its K2 model designed to act less like a chatbot and more like a full reasoning agent. Instead of generating a single response, K2 Thinking can plan, search the web, write and run code, and adjust its own steps — reportedly handling 200 to 300 tool calls in a row without human input.

The company says this is its most advanced open-weight model to date, built around “test-time scaling,” where performance improves by giving the model more time to think and more opportunities to act. In practical terms, that means it can work through multi-step problems — from academic questions to software development — using a think → search → verify → respond cycle.
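As a rough illustration of that cycle, here is a minimal Python sketch of a tool-calling loop with a call budget. It is not Moonshot's implementation or API: `run_agent`, `scripted_policy`, the `search` and `verify` tools, and the 300-call cap are all hypothetical, with the cap chosen only to mirror the upper end of the reported range. A real system would put a model where the scripted policy sits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical step format: (tool_name, argument) or ("respond", final_answer).
Step = Tuple[str, str]
Policy = Callable[[List[str]], Step]
Tool = Callable[[str], str]


@dataclass
class AgentRun:
    answer: str            # final answer, empty if the budget ran out
    finished: bool         # True only if the policy chose to respond
    tool_calls: int        # number of tool calls actually made
    transcript: List[str]  # everything the policy has seen so far


def run_agent(question: str, policy: Policy, tools: Dict[str, Tool],
              max_tool_calls: int = 300) -> AgentRun:
    """Loop think -> act -> observe until the policy responds or the budget is spent."""
    transcript = [f"question: {question}"]
    calls = 0
    while calls < max_tool_calls:
        # "Think": the policy reads the transcript and picks the next step.
        action, arg = policy(transcript)
        if action == "respond":
            return AgentRun(arg, True, calls, transcript)
        # "Act": run the chosen tool; unknown tools produce an error observation.
        tool = tools.get(action)
        result = tool(arg) if tool else f"error: unknown tool {action!r}"
        calls += 1
        # "Observe": feed the result back so the next step can use it.
        transcript.append(f"{action}({arg}) -> {result}")
    return AgentRun("", False, calls, transcript)


def scripted_policy(transcript: List[str]) -> Step:
    """Stand-in for a model: search first, verify second, then respond."""
    if len(transcript) == 1:
        return ("search", "capital of France")
    if len(transcript) == 2:
        return ("verify", "Paris")
    return ("respond", "Paris")


# Toy tools that return canned strings instead of touching the network.
tools: Dict[str, Tool] = {
    "search": lambda query: "Paris",
    "verify": lambda claim: "confirmed",
}

run = run_agent("What is the capital of France?", scripted_policy, tools)
print(run.answer, run.tool_calls)  # prints "Paris 2"
```

The budget is the part that maps onto the article's claim: each pass through the loop is one tool call, and the run ends either when the policy decides it has an answer or when the cap is reached.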

Performance

K2 Thinking delivers strong benchmark scores across reasoning, search, and coding:

  • 44.9% on Humanity’s Last Exam (HLE), a difficult reasoning benchmark, scored here with tools enabled.
  • 60.2% on BrowseComp, which measures real-time browsing and search abilities.
  • 71.3% on SWE-Bench Verified, placing it near the top of coding-focused models.

That said, it still doesn’t take the overall lead in programming tasks — Claude Sonnet 4.5 remains ahead on the SWE-Bench leaderboard.

Why it matters

K2 Thinking is notable not just for accuracy, but for persistence. It can chain together hundreds of reasoning and tool-use steps, making it particularly relevant for autonomous agents, coding assistants, and research workflows. It signals how quickly open-weight models are closing the gap with proprietary systems in complex reasoning — even if they haven’t fully overtaken them yet.

Availability

Kimi K2 Thinking is live now on kimi.com in chat mode. A full agent mode and API access are coming soon, according to Moonshot.
