Anthropic Unveils Claude Opus 4.5 — Faster, Smarter Coding and Agents | Oltre.dev

Anthropic launches Claude Opus 4.5 — focused gains for coding, agents, and long-running tasks

Anthropic has released Claude Opus 4.5, positioned as its most capable model to date for coding, agentic workflows, and interacting with computer interfaces like spreadsheets and slides. The model is available immediately across Anthropic’s apps, the Claude API, and on all three major cloud platforms. Developers can call the model as claude-opus-4-5-20251101 via the Claude API (see the model overview).

Where Opus 4.5 moves the needle

Anthropic reports that Opus 4.5 sets new records on real-world software engineering benchmarks, with notable strengths in multi-step reasoning and agentic workflows. On internal and public benchmarks the model demonstrates:

Stronger coding performance across multiple languages on SWE-bench Multilingual and a reported double-digit improvement on Aider Polyglot compared with Sonnet 4.5.
Improved agentic problem solving (e.g., BrowseComp-Plus and τ2-bench scenarios) where the model finds legitimate multi-step workarounds in constrained settings.
Long-horizon reliability, with higher pass rates on held-out tests while using substantially fewer tokens — Anthropic cites up to 65% token savings in some workloads.
Improved vision, reasoning, and mathematics relative to prior Claude releases, and a reported ~15% uplift on a fetch-enabled BrowseComp-Plus research evaluation when using advanced context and tool techniques.

Anthropic also benchmarked Opus 4.5 on a timed engineering take-home exam used for hiring. With parallel test-time compute, the model scored higher than any human candidate previously recorded for that exam.

Developer controls and platform features

Opus 4.5 is accompanied by platform features aimed at trading off speed, cost, and capability:

A new effort parameter lets callers select behavior from nimble to deeper thought; Anthropic reports that at medium effort Opus 4.5 matched Sonnet 4.5’s best SWE-bench Verified score while using 76% fewer output tokens, and at high effort it exceeded Sonnet by 4.3 percentage points while using 48% fewer tokens.
Context compaction and extended thinking budgets (evaluation runs referenced a 64K thinking budget and a 200K context window) to support longer reasoning and agent runs.
Improved tool use, context management, and memory capabilities to support coordinated multi-agent setups and long-running tasks. Anthropic points to a nearly 15-point boost on a deep research evaluation when combining these techniques.

Developers can read more on the Claude Developer Platform and the available tooling in the platform docs.

Product updates

Several product-level changes take advantage of Opus 4.5’s capabilities:

Claude Code gains an improved Plan Mode that produces editable plan.md files and executes with more precise planning and fewer dead-ends. Claude Code is also available in the desktop app for parallel local and remote sessions.
The Claude apps now keep long conversations from “hitting a wall” by summarizing earlier context as needed.
Claude for Chrome is available to all Max users, and Claude for Excel beta access has been expanded to Max, Team, and Enterprise tiers.
For users with Opus access, Anthropic removed Opus-specific caps and increased usage limits for Max and Team Premium customers to approximate prior Sonnet token quotas.

Pricing for Opus 4.5 is listed at $5/$25 per million tokens, making Opus-level capability more accessible according to the announcement.

Safety and alignment

Anthropic frames Opus 4.5 as the company’s most robustly aligned model so far and reports meaningful improvements in resistance to prompt-injection attacks. Independent-style prompt-injection adversarial evaluations (run by Gray Swan in the announcement) placed Opus 4.5 ahead of other frontier models on this axis. Full capability and safety evaluations are documented in the model’s system card.