Performance Benchmarks
GPT-5 demonstrates notable gains over previous models on real software engineering tasks. On the SWE-bench Verified suite, which is built from actual GitHub issues, GPT-5 reaches 74.9% accuracy with reasoning enabled, more than double GPT-4o’s 30.8%. Factuality tests show roughly an 80% reduction in errors compared to o3: the hallucination rate on LongFact-Concepts falls to 1.0% (versus 5.2%), and the error rate on FActScore falls to 2.8% (versus 23.5%). In tool-calling evaluations, GPT-5 achieves 96.7% on the τ-bench telecom benchmark, where no model had scored above 49% two months earlier.
Agent-Oriented Design
OpenAI’s training for GPT-5 emphasized four core traits for coding agents:
- Autonomy: Initiates long chains of reasoning and tool calls without stalling.
- Collaboration: Interacts like a teammate, not just a standalone utility.
- Communication: Offers explanations when needed and stays quiet during execution.
- Context Management & Testing: Maintains project context and runs builds and tests before completing tasks (a minimal sketch of this gate follows the list).
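As an illustration of that last trait, here is a minimal sketch of the kind of completion gate an agent harness could apply before reporting a task as done. This is not Cline’s actual implementation; the `runStep` helper and the `npm run build` / `npm test` commands are placeholders assumed for the example.

```typescript
import { execSync } from "node:child_process";

// Run a shell command and report whether it exited successfully.
function runStep(command: string): boolean {
  try {
    execSync(command, { stdio: "inherit" });
    return true;
  } catch {
    return false; // a non-zero exit code means the step failed
  }
}

// Hypothetical completion gate: only mark the task done if the project
// still builds and its test suite passes.
function readyToComplete(): boolean {
  return runStep("npm run build") && runStep("npm test");
}

console.log(readyToComplete() ? "task can be completed" : "keep working");
```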
Cline’s Plan Mode leverages these strengths by prompting GPT-5 to ask targeted questions, map out dependencies, and confirm implementation plans. Act Mode then executes those plans: it chains tool calls reliably, both sequentially and in parallel, produces clean, style-compliant diffs, handles multi-file refactors, and maintains progress through extended task lists.
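To make the tool chaining concrete, the sketch below uses the OpenAI Node SDK’s Chat Completions tool-calling pattern: the model can request several tools in a single response, and the caller can execute them in parallel before feeding the results back. The `read_file` tool, the `runTool` helper, and the `gpt-5` model string are illustrative assumptions, not Cline’s internal interfaces.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative tool definition; Cline's real tool set differs.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a file from the workspace",
      parameters: {
        type: "object",
        properties: { path: { type: "string" } },
        required: ["path"],
      },
    },
  },
];

// Placeholder executor for whatever tools the agent exposes.
async function runTool(name: string, args: string): Promise<string> {
  return `stub result of ${name}(${args})`;
}

// One agent step: ask the model, run any requested tools in parallel,
// and append the results so the next call can continue the chain.
async function step(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
): Promise<void> {
  const response = await client.chat.completions.create({
    model: "gpt-5", // assumed model identifier
    messages,
    tools,
  });

  const message = response.choices[0].message;
  messages.push(message);

  const toolResults: OpenAI.Chat.Completions.ChatCompletionToolMessageParam[] = [];
  const pending: Promise<void>[] = [];
  for (const call of message.tool_calls ?? []) {
    if (call.type !== "function") continue; // only handle function tool calls
    pending.push(
      runTool(call.function.name, call.function.arguments).then((result) => {
        toolResults.push({ role: "tool", tool_call_id: call.id, content: result });
      }),
    );
  }
  await Promise.all(pending); // execute the requested tools in parallel
  messages.push(...toolResults);
}
```

Looping on `step` until a response contains no tool calls is the generic shape of such an execution cycle.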
Long Context Support
GPT-5 handles up to 256,000 input tokens with sustained accuracy. On the OpenAI-MRCR benchmark at 256k tokens, it achieves 86.8% accuracy, while o3 cannot process contexts of that length. This capacity enables GPT-5 in Cline to retain information across complex, ongoing coding sessions.
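As a rough illustration of working within that window, the sketch below trims the oldest turns of a conversation to stay under an assumed 256,000-token input budget. The four-characters-per-token estimate is a deliberate simplification; a production client would use a real tokenizer.

```typescript
// Assumed input budget for GPT-5; adjust if the published limit differs.
const MAX_INPUT_TOKENS = 256_000;

interface Turn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude heuristic: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the system prompt plus as many of the most recent turns as fit.
function fitToBudget(system: Turn, history: Turn[]): Turn[] {
  let budget = MAX_INPUT_TOKENS - estimateTokens(system.content);
  const kept: Turn[] = [];
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(history[i]);
  }
  return [system, ...kept];
}
```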
Integrating GPT-5 in Cline
Cline incorporates OpenAI’s updated prompting guidelines, including verbosity controls and preamble messages. Its open-source framework provides full transparency: users can inspect the exact prompts sent to GPT-5. To activate the model, select “gpt-5” in Cline’s settings dropdown (available through the OpenAI, Cline, or OpenRouter providers); the model runs with a standard OpenAI API key.
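For reference, here is a minimal sketch of those controls, assuming the OpenAI Node SDK’s Responses API as described in OpenAI’s GPT-5 guidance. The instruction wording and the model string are illustrative, not the exact prompt Cline sends; if parameter placement has changed, defer to the current API reference.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const response = await client.responses.create({
    model: "gpt-5", // assumed model identifier
    // Ask for short preambles before tool calls, per the prompting guidance;
    // the wording here is illustrative, not Cline's actual system prompt.
    instructions:
      "Before each tool call, post a one-sentence preamble describing what you " +
      "are about to do. Keep other explanations brief.",
    input: "Rename the fetchUser helper to loadUser across the repository.",
    text: { verbosity: "low" }, // terse final answers
    reasoning: { effort: "medium" }, // trade latency for reasoning depth
  });

  console.log(response.output_text);
}

main();
```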
Pricing and Availability
GPT-5 in Cline is priced at:
- $1.25 per million input tokens, with a 90% discount on cached input tokens.
- $10 per million output tokens.
This pricing is roughly half the cost of Claude Sonnet 4 ($3 per million input / $15 per million output). GPT-5 is available immediately in Cline’s latest release.
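For a sense of how those rates translate into per-task cost, here is a small calculator using the figures above. The $0.125-per-million cached-input rate follows from applying the 90% discount and is an assumption of the example.

```typescript
// Prices in dollars per million tokens, taken from the figures above.
const INPUT_PER_M = 1.25;
const CACHED_INPUT_PER_M = 0.125; // assumed: 90% discount applied to cached input
const OUTPUT_PER_M = 10;

function estimateCost(
  inputTokens: number,
  cachedInputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens * INPUT_PER_M +
      cachedInputTokens * CACHED_INPUT_PER_M +
      outputTokens * OUTPUT_PER_M) /
    1_000_000
  );
}

// Example: 120k fresh input, 400k cached input, 30k output ≈ $0.50.
console.log(estimateCost(120_000, 400_000, 30_000).toFixed(2));
```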
TL;DR
- Performance: 74.9% on SWE-bench Verified, 96.7% on τ-bench telecom.
- Factuality: 1.0% hallucinations on LongFact-Concepts; 2.8% errors on FActScore.
- Agent Traits: Autonomy, Collaboration, Communication, Context & Testing.
- Modes: Plan Mode for planning questions and dependency mapping; Act Mode for reliable execution and clean diffs.
- Context: Supports up to 256,000 tokens with 86.8% accuracy on long-context benchmarks.
- Pricing: $1.25/million input tokens, $10/million output tokens—about half Sonnet 4’s rate.
- Availability: Select “gpt-5” in Cline settings via OpenAI, Cline, or OpenRouter providers.
- Community: Feedback channels on Reddit and Discord.