GPT‑5.1‑Codex‑Max Delivers Compaction, Multicontext Operation, and Long‑Horizon Coding Workflows

GPT‑5.1‑Codex‑Max adds compaction and token‑efficient reasoning to Codex, enabling coherent multi‑hour project coding across CLI, IDE, and cloud. It runs sandboxed by default with monitoring and safety controls.

TL;DR

  • GPT‑5.1‑Codex‑Max: coding‑focused model for extended, agentic workflows; available in Codex for ChatGPT Plus, Pro, Business, Edu, and Enterprise; API access planned. See Codex models: https://developers.openai.com/codex/models
  • Native multicontext operation via compaction: session history pruned while preserving salient context, enabling coherent work across multiple context windows; compaction runs automatically as Codex sessions fill.
  • Token‑efficiency and reasoning: medium effort recommended; about 30% fewer thinking tokens than GPT‑5.1‑Codex at equal effort; xhigh option for longer‑latency tasks. Benchmarks: SWE‑bench Verified 73.7% → 77.9% (high→xhigh), SWE‑Lancer IC SWE 66.3% → 79.9%, Terminal‑Bench 2.0 52.8% → 58.1%.
  • Long‑running task capability: sustained independent work for >24 hours in internal evaluations, iterating implementations, resolving failing tests, and delivering finished results; aimed at project‑scale refactors, extended debugging, and multi‑hour agent loops.
  • Safety and deployment posture: sandboxed by default (file writes confined to workspace; network disabled unless explicitly enabled); terminal logs, tool‑call citations, test results, cybersecurity monitoring, and automated vulnerability scanning included.
  • Operational notes and resources: replaces GPT‑5.1‑Codex as the default in Codex surfaces; internal adoption reported at 95% weekly usage among OpenAI engineers. See the GPT‑5.1‑Codex‑Max system card.

Introduction

OpenAI has introduced GPT‑5.1‑Codex‑Max, a coding-focused model designed for extended, agentic workflows within Codex. Built on an updated foundational reasoning model trained on agentic tasks across software engineering, research, and mathematics, the model targets long-running development tasks by operating coherently across multiple context windows.

GPT‑5.1‑Codex‑Max is available in Codex today for the CLI, IDE extension, cloud, and code review surfaces, with API access slated to arrive soon.

Frontier coding capabilities

The model was trained on real-world software engineering tasks such as PR creation, code review, frontend coding, and Q&A. It includes explicit training to operate in Windows environments and to collaborate more effectively within the Codex CLI. These design choices aim to improve performance on practical engineering workflows beyond benchmark gains.

Key technical points:

  • Native multicontext operation via compaction, enabling work that spans multiple context windows.
  • Training emphasis on agentic tasks to improve collaboration inside the Codex CLI and related tooling.

Speed, token efficiency, and reasoning effort

GPT‑5.1‑Codex‑Max emphasizes token efficiency through more effective internal reasoning. On SWE‑bench Verified, the model at medium reasoning effort outperforms GPT‑5.1‑Codex with the same effort while using about 30% fewer thinking tokens. For tasks that tolerate longer latency, a new Extra High (xhigh) reasoning effort is available; medium remains recommended for typical daily use.

A few evaluation snapshots:

  • SWE‑bench Verified: 73.7% → 77.9% (GPT‑5.1‑Codex high → Codex‑Max xhigh)
  • SWE‑Lancer IC SWE: 66.3% → 79.9%
  • Terminal‑Bench 2.0: 52.8% → 58.1%

Improvements in token efficiency are positioned to reduce real-world usage costs for extended coding tasks, including producing comparable frontend apps at lower token expense.
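To make the cost implication concrete, here is a back-of-the-envelope calculation applying the reported ~30% reduction in thinking tokens. The baseline token count and price are made-up placeholder numbers, not figures from the announcement:

```python
# Illustrative only: the ~30% reduction is from the announcement; the
# baseline token volume and price below are hypothetical assumptions.
baseline_thinking_tokens = 1_000_000   # hypothetical tokens for a long task
reduction = 0.30                       # ~30% fewer thinking tokens (reported)
price_per_million = 10.0               # hypothetical $ per 1M tokens

max_thinking_tokens = baseline_thinking_tokens * (1 - reduction)
savings = (baseline_thinking_tokens - max_thinking_tokens) / 1_000_000 * price_per_million
print(f"thinking tokens: {max_thinking_tokens:,.0f}, saved: ${savings:.2f}")
```

With these placeholder numbers, the same-effort run uses 700,000 thinking tokens instead of 1,000,000; the percentage saving scales linearly with whatever the real per-token price is.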

Long-running tasks and compaction

The defining operational feature is compaction, a process that prunes session history while preserving salient context so the model can continue coherent work across very long horizons. In Codex applications, sessions are compacted automatically as the context window fills, allowing repeated refreshes until the task completes.
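OpenAI does not describe how compaction is implemented. As a mental model only, the behavior resembles an agent loop that, whenever its history exceeds the window, collapses older entries into a summary and keeps the most recent ones. Everything below (function names, the summarization step, the keep-half heuristic) is a hypothetical sketch, not Codex's actual mechanism:

```python
from typing import Callable

def run_with_compaction(
    steps: list[str],
    window_limit: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Toy agent history: when the context exceeds window_limit entries,
    collapse all but the most recent entries into a single summary."""
    history: list[str] = []
    for step in steps:
        history.append(step)
        if len(history) > window_limit:
            keep = window_limit // 2
            # Compact: older entries become one summary, recent ones survive.
            summary = summarize(history[:-keep])
            history = [summary] + history[-keep:]
    return history

# Usage with a trivial stand-in summarizer:
final = run_with_compaction(
    [f"step-{i}" for i in range(10)],
    window_limit=4,
    summarize=lambda h: f"<summary of {len(h)} steps>",
)
```

The point of the sketch is the invariant: the history never grows past the window, yet a condensed trace of everything pruned remains available, which is what lets work continue coherently across repeated refreshes.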

OpenAI reports internal evaluations where GPT‑5.1‑Codex‑Max sustained independent work for more than 24 hours, iterating on implementations, resolving failing tests, and delivering finished results. The capability is intended for project‑scale refactors, extended debugging, and multi‑hour agent loops that previously hit context limits.

Safety, cybersecurity, and deployment posture

GPT‑5.1‑Codex‑Max shows improved performance on long‑horizon reasoning challenges, including areas relevant to cybersecurity. Under OpenAI’s Preparedness Framework, the model does not reach High capability on Cybersecurity, though it is described as the most capable cybersecurity model deployed to date. To address dual‑use risks, OpenAI continues iterative deployments, monitoring, and targeted safeguards; examples include cybersecurity‑specific monitoring and interventions for suspicious activity.

Codex runs in a sandboxed environment by default: file writes are confined to the workspace and network access is disabled unless explicitly enabled, reducing exposure to untrusted content and prompt‑injection risks. Codex also surfaces terminal logs, tool call citations, and test results to aid human review. Defensive tools such as automated vulnerability scanning and remediation assistance remain part of the ecosystem.
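The announcement does not detail how the sandbox enforces the "file writes confined to the workspace" guarantee. One common pattern behind such guarantees is resolving a target path and checking containment before writing; the helper below is a simplified illustration of that idea (hypothetical code, not Codex's implementation):

```python
from pathlib import Path

class SandboxViolation(Exception):
    """Raised when a write would land outside the workspace."""

def safe_write(workspace: Path, relative_path: str, data: str) -> Path:
    """Write only inside the workspace; reject '../' escapes and absolute paths."""
    workspace = workspace.resolve()
    # Joining an absolute relative_path replaces the base, and '..' segments
    # can climb out, so resolve first and then check containment.
    target = (workspace / relative_path).resolve()
    if not target.is_relative_to(workspace):  # Python 3.9+
        raise SandboxViolation(f"write outside workspace refused: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(data)
    return target
```

For example, `safe_write(ws, "src/main.py", …)` succeeds, while `safe_write(ws, "../escape.txt", …)` or an absolute path like `/etc/passwd` raises `SandboxViolation` before any write occurs.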

Availability

GPT‑5.1‑Codex‑Max is available in Codex for ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. For details on plan usage limits, see the Codex models documentation: https://developers.openai.com/codex/models. The model will replace GPT‑5.1‑Codex as the default in Codex surfaces; API availability for Codex CLI users with API keys is planned for the near future.

Closing notes

The release highlights two practical shifts: sustained multicontext operation via compaction and measurable token‑efficiency gains during reasoning. Those shifts aim to make longer, project‑scale agent workflows feasible within Codex tooling while keeping deployments under tightened sandbox and monitoring controls. OpenAI also reports internal adoption metrics—95% weekly usage among OpenAI engineers and increased pull request throughput—framing Codex‑Max as an incremental toolchain enhancement within existing developer workflows.

Original source: https://openai.com/index/gpt-5-1-codex-max/
