AI-Written Go: 90% of a 40k-Line Service Explained

Armin Ronacher describes a Go project where AI wrote ~90% of 40k lines—handling raw SQL, OpenAPI REST, SDKs, and Pulumi infra. He praises speed and iteration but warns human review is essential to avoid brittle systems.

TL;DR

  • ~90% AI-generated across a 40k-line Go service: OpenAPI-compatible REST API, email send/receive, Python and TypeScript SDKs, Pulumi-managed infra
  • OpenAPI-first workflow with spec as canonical source and code generation for server shims and clients; initial system design done manually with AI as a planning “rubber duck”
  • Raw SQL over ORM: handwritten SQL and migrations, with AI used to generate tedious queries and migrations
  • Tooling split: Codex for post-PR code review; Claude for debugging and tool-driven investigations; work broken into PR-sized chunks and two iteration patterns (agent loop with finishing touches; lockstep edit-by-edit)
  • Agent failure modes: loss of global state leading to bad abstractions, duplicated implementations, poor goroutine/threading choices, outdated deps, and swallowed errors (example: rate limiter lacking jitter and using poor storage)
  • Faster research-to-code cycles, rapid OpenAPI prototyping, low-cost refactors, quicker Pulumi/AWS infrastructure work, and test recommendations such as testcontainers for Postgres
  • Humans remain responsible for architecture, review, and production rigor; without that oversight, results risk becoming brittle or insecure

Armin Ronacher on letting AI produce most of a Go service

Armin Ronacher describes a recent Go-based project of roughly 40,000 lines where AI produced about 90% of the code. The service implements an OpenAPI-compatible REST API, sends and receives email, and ships SDKs for Python and TypeScript, with infrastructure managed via Pulumi. The setup began with traditional system design and evolved into a development loop that leans heavily on Claude and Codex for generation, debugging, and review.

Placing the workflow in context

Some startups already operate with near-complete AI-generated codebases. Regardless of origin, every line is treated as a human responsibility and judged as if written personally. The result here avoids the typical prototype clutter: no stray files, duplicate implementations, or noisy emojis. Careful attention to architecture, code layout, and database interaction remained non-negotiable.

Foundation and architectural choices

The project started with manual system design, schema, and architecture, using AI as a rubber duck during planning rather than an author of foundational decisions. One early design choice was rolled back and reworked with the LLM’s help.

Two technical patterns stand out:

  • Raw SQL: A deliberate move away from an ORM toward handwritten SQL and migrations. The author values seeing the actual queries in code and logs, and relies on AI to generate the SQL that would otherwise be tedious to write (a minimal sketch of this style follows the list).
  • OpenAPI-first: The OpenAPI spec became the canonical source, with code generation producing server shims and client SDKs. This approach integrates well with an AI-driven workflow.
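
The source does not include code, but a minimal sketch of what the raw-SQL style can look like in Go may help: the query text lives directly in the code, so what runs against Postgres is exactly what shows up in reviews and logs. The table, columns, and driver choice here are illustrative assumptions, not taken from the project.

```go
// Hypothetical example of the raw-SQL approach: a hand-written query with
// parameters, scanned into a plain struct, with no ORM in between.
package store

import (
	"context"
	"database/sql"
	"time"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

type Message struct {
	ID        int64
	Recipient string
	CreatedAt time.Time
}

// RecentMessages runs a hand-written query; table and columns are made up
// for illustration.
func RecentMessages(ctx context.Context, db *sql.DB, recipient string, limit int) ([]Message, error) {
	const q = `
		SELECT id, recipient, created_at
		FROM messages
		WHERE recipient = $1
		ORDER BY created_at DESC
		LIMIT $2`

	rows, err := db.QueryContext(ctx, q, recipient, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []Message
	for rows.Next() {
		var m Message
		if err := rows.Scan(&m.ID, &m.Recipient, &m.CreatedAt); err != nil {
			return nil, err
		}
		out = append(out, m)
	}
	return out, rows.Err()
}
```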

Iteration patterns and tooling

Development relies on both Claude and Codex, each bringing different strengths: Codex is used heavily for code review after PRs, while Claude excels at debugging and tool-driven investigations. Work is split into PR-sized chunks to keep reviews manageable, and two main iteration strategies are used:

  1. Agent loop with finishing touches — prompt until results are close, then clean up.
  2. Lockstep loop — edit-by-edit collaboration with the agent.

Success requires providing the agent with precise context and pointing it to existing implementations to avoid reinvention.

Where agents fail

Agents can lose sight of global system state. Common failure modes include inappropriate abstractions, recreation of existing code, ill-suited threading/goroutine choices, reliance on outdated dependencies, and swallowed errors that should have surfaced. One cited example is a rate limiter that “worked” but lacked jitter and made poor storage choices. Without domain knowledge and careful review, such issues can produce brittle and opaque systems.
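
The cited rate limiter is not shown in the source, but the jitter point is worth illustrating. Below is a hypothetical Go sketch of the kind of fix a reviewer would ask for: randomizing the wait time so that many clients told to back off do not all return at the same instant. The base and cap values are assumptions for the example.

```go
// Hypothetical illustration of the missing-jitter problem: without randomness,
// clients that were throttled at the same moment retry in lockstep and hammer
// the service again simultaneously.
package limiter

import (
	"math/rand"
	"time"
)

// backoffWithJitter returns a wait duration for the given attempt using
// "full jitter": a random value between 0 and the capped exponential delay.
func backoffWithJitter(attempt int) time.Duration {
	const (
		base = 100 * time.Millisecond
		max  = 10 * time.Second
	)
	d := base << attempt // exponential growth: 100ms, 200ms, 400ms, ...
	if d > max || d <= 0 {
		d = max // cap the delay and guard against overflow
	}
	return time.Duration(rand.Int63n(int64(d)))
}
```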

Where agents excel

AI reduced research time and accelerated hands-on experimentation. Notable wins included:

  • Faster research-to-code cycles and immediate evaluation of implementations.
  • Rapid prototyping of multiple OpenAPI approaches in a single day.
  • Low-cost refactoring that led to more organized code.
  • Infrastructure work with Pulumi and AWS completed far faster than usual.
  • Test infrastructure recommendations, such as using testcontainers for Postgres-backed tests, implemented quickly (see the sketch after this list).
  • Strong SQL generation, including use of constructs that are easy to forget when writing by hand.
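
The source only mentions testcontainers as a recommendation; the sketch below shows what a Postgres-backed test can look like with the generic testcontainers-go API. The image, credentials, and DSN are illustrative assumptions, not details from the project.

```go
// Hypothetical sketch of a Postgres-backed test using testcontainers-go:
// spin up a throwaway Postgres container, connect to it, and run real queries.
package store_test

import (
	"context"
	"database/sql"
	"fmt"
	"testing"

	_ "github.com/lib/pq"
	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestWithPostgres(t *testing.T) {
	ctx := context.Background()

	pg, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "postgres:16-alpine",
			ExposedPorts: []string{"5432/tcp"},
			Env: map[string]string{
				"POSTGRES_USER":     "test",
				"POSTGRES_PASSWORD": "test",
				"POSTGRES_DB":       "test",
			},
			// Postgres logs this line twice during init; waiting for the
			// second occurrence avoids connecting during its restart.
			WaitingFor: wait.ForLog("database system is ready to accept connections").WithOccurrence(2),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { _ = pg.Terminate(ctx) })

	host, err := pg.Host(ctx)
	if err != nil {
		t.Fatal(err)
	}
	port, err := pg.MappedPort(ctx, "5432")
	if err != nil {
		t.Fatal(err)
	}

	dsn := fmt.Sprintf("postgres://test:test@%s:%s/test?sslmode=disable", host, port.Port())
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	// Run migrations and exercise the real queries against a real Postgres here.
	if err := db.PingContext(ctx); err != nil {
		t.Fatal(err)
	}
}
```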

Final perspective

For this project, AI already handles the majority of code generation while human engineers retain responsibility for architecture, review, and production operation. The approach enables different trade-offs and faster experimentation, but it does not remove the need for engineering rigor. Absent careful oversight, AI-assisted development risks brittle, insecure, or unscalable results.

Original source: https://lucumr.pocoo.org/2025/9/29/90-percent/
