Hands-on: GPT-5.2’s gains for coding, vision, and long-form reasoning
After two weeks of hands-on testing that began November 25, the GPT-5.2 family shows clear improvements in several areas that matter for developers and researchers. The release brings better task completion behavior, stronger code generation, and improved vision and long-context handling — though latency and occasional reasoning stalls remain notable downsides.
Better instruction-following and ambition
One of the most useful advances is better instruction-following, specifically a stronger tendency to complete multi-step workflows rather than stop early. In practice the model more reliably carries out longer, explicitly described processes (for example, generating a full list of 50 options before selecting the best). It is also willing to attempt much larger tasks end-to-end, such as drafting a full 200-page book, rather than defaulting to outlines and section-by-section offers. The outputs are not production-ready in those extreme cases, but the willingness to execute entire workflows opens new iterative approaches for creative and research tasks.
Code generation: more capable and persistent
Code generation is a tangible step up from GPT-5.1. The model tends to produce longer, more autonomous coding sessions, remains engaged for more complex tasks, and gets more things right on the first pass. Tests with Three.js highlighted improved styling (textures and lighting) but also revealed that spatial placement and layout reasoning still need work. Overall, the model is more reliable across larger code tasks and shows stronger context-awareness.
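The Three.js tests were along these lines: build a scene with textured, lit objects placed at explicit positions. The sketch below shows the flavor of the task; the scene contents, object names, and texture path are illustrative assumptions, not the exact test prompt.

```ts
import * as THREE from 'three';

// Illustrative scene: textured, lit objects at explicit positions.
// Object choices and the texture path are assumptions for this sketch.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.set(0, 2, 6);
camera.lookAt(0, 0, 0);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Lighting: one key light plus a soft ambient fill -- the styling work 5.2 handled well.
const sun = new THREE.DirectionalLight(0xffffff, 1.2);
sun.position.set(5, 10, 7);
scene.add(sun, new THREE.AmbientLight(0xffffff, 0.3));

// A textured crate sitting on a ground plane.
const texture = new THREE.TextureLoader().load('crate.jpg'); // hypothetical asset path
const crate = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshStandardMaterial({ map: texture })
);
crate.position.set(-1.5, 0.5, 0); // explicit placement: the spatial detail the model still gets wrong

const ground = new THREE.Mesh(
  new THREE.PlaneGeometry(10, 10),
  new THREE.MeshStandardMaterial({ color: 0x888888 })
);
ground.rotation.x = -Math.PI / 2;

scene.add(crate, ground);
renderer.setAnimationLoop(() => renderer.render(scene, camera));
```

In tasks of this shape, the material and lighting choices tended to come out well, while object positions and overall layout were the parts that needed manual correction.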
Vision and long-context handling
Vision capabilities are noticeably improved, especially in spatial understanding and object positioning within images—though generation of exact spatial layouts can still be imperfect. Long-context performance is also stronger: working with huge codebases, extended analysis threads, and large agentic workflows feels more stable than before, which benefits agent-style coding and repo-aware tasks.
GPT-5.2 Pro: deeper reasoning at a cost
The Pro variant delivers the clearest leap. Pro is stronger at deep reasoning, understanding intent beyond literal instructions, and holding more context while synthesizing multiple angles. Examples include meal-planning prompts where Pro optimized not just for cooking time but also for shopping complexity and prep overhead, showing a grasp of user constraints beyond the literal brief.
Those gains come with trade-offs. Pro is notably slower, and it occasionally gets stuck between conflicting directives, spending a long time thinking and sometimes still failing. Pro is also available only inside ChatGPT, not in Codex CLI or the API, which limits where its reasoning strength can be applied directly.
Codex CLI and agentic coding
In Codex CLI, GPT-5.2 is the closest experience to Pro-quality coding in a CLI environment so far. It excels at context-gathering: asking clarifying questions, reading files, and exploring the repo before implementing changes. That behavior reduces blind assumptions and increases first-shot correctness. The trade-off is that the highest-reasoning modes can be very slow, sometimes taking significantly longer than Pro in ChatGPT.
Workflow comparisons and practical guidance
Across parallel usage with other frontier models, the practical roles have settled into distinct buckets:
- Quick lookups and syntax questions: competitors may be faster and more concise.
- Deep research and complex reasoning: GPT-5.2 Pro tends to produce stronger, more thoughtful results.
- Frontend aesthetics: other models can produce more polished-looking UIs, though GPT-5.2 is more reliable for engineering correctness.
Quirks and the speed problem
The biggest friction point is speed. Standard GPT-5.2 Thinking is often slow enough to deter frequent use for everyday queries, and the extra-deep reasoning modes (including Pro) increase latency further. Occasional reasoning loops, and stretches of prolonged deliberation that end in failure, are another practical annoyance.
Conclusion
GPT-5.2 advances instruction fidelity, code generation, vision, and long-context stability. For tasks that benefit from careful thought—research, complex debugging, and agentic coding—GPT-5.2 Pro is a notable step forward, albeit with real-world trade-offs in latency and availability. For quick interactive work, faster alternatives still occupy a useful place in the toolkit.
Further reading and the original review: https://shumer.dev/gpt52review
Related links:
- Deep Pro-mode dive: https://shumer.dev/gpt52prodeepdive
- Concise-response prompt: https://shumerprompt.com/prompts/gpt-52-concise-response-style-custom-instructions-prompt-bfa38620-cda5-4a47-b937-7a5793537907
- Early access sign-up: https://tally.so/r/w2M17p
- Author on X: https://x.com/mattshumer_
