AI-Written Go: 90% of a 40k-Line Service Explained

Armin Ronacher describes a Go project where AI wrote ~90% of 40k lines—handling raw SQL, OpenAPI REST, SDKs, and Pulumi infra. He praises speed and iteration but warns human review is essential to avoid brittle systems.

TL;DR

  • ~90% AI-generated across a 40k-line Go service: OpenAPI-compatible REST API, email send/receive, Python and TypeScript SDKs, Pulumi-managed infra
  • OpenAPI-first workflow with spec as canonical source and code generation for server shims and clients; initial system design done manually with AI as a planning “rubber duck”
  • Raw SQL over ORM: handwritten SQL and migrations, with AI used to generate tedious queries and migrations
  • Tooling split: Codex for post-PR code review; Claude for debugging and tool-driven investigations; work broken into PR-sized chunks and two iteration patterns (agent loop with finishing touches; lockstep edit-by-edit)
  • Agent failure modes: loss of global state leading to bad abstractions, duplicated implementations, poor goroutine/threading choices, outdated deps, and swallowed errors (example: rate limiter lacking jitter and using poor storage)
  • Faster research-to-code cycles, rapid OpenAPI prototyping, low-cost refactors, quicker Pulumi/AWS infrastructure work, and test recommendations such as testcontainers for Postgres
  • Humans remain responsible for architecture, review, and production rigor; without that oversight, results risk becoming brittle or insecure

Armin Ronacher on letting AI produce most of a Go service

Armin Ronacher describes a recent Go-based project of roughly 40,000 lines where AI produced about 90% of the code. The service implements an OpenAPI-compatible REST API, sends and receives email, and ships SDKs for Python and TypeScript, with infrastructure managed via Pulumi. The setup began with traditional system design and evolved into a development loop that leans heavily on Claude and Codex for generation, debugging, and review.

Placing the workflow in context

Some startups already operate with near-complete AI-generated codebases. Regardless of origin, every line is treated as a human responsibility and judged as if written personally. The result here avoids the typical prototype clutter: no stray files, duplicate implementations, or noisy emojis. Careful attention to architecture, code layout, and database interaction remained non-negotiable.

Foundation and architectural choices

The project started with manual system design, schema, and architecture, using AI as a rubber duck during planning rather than an author of foundational decisions. One early design choice was rolled back and reworked with the LLM’s help.

Two technical patterns stand out:

  • Raw SQL: A deliberate move away from an ORM toward handwritten SQL and migrations. The author values seeing the actual queries in code and logs, and relies on AI to generate the SQL that would otherwise be tedious to write (a minimal sketch of this style follows the list).
  • OpenAPI-first: The OpenAPI spec became the canonical source, with code generation producing server shims and client SDKs. This approach integrates well with an AI-driven workflow.
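
The source does not include code, but a minimal sketch of what the raw-SQL style can look like in Go may help: the query text lives directly in the code, so what runs against Postgres is exactly what shows up in reviews and logs. The table, columns, and driver choice here are illustrative assumptions, not taken from the project.

```go
// Hypothetical example of the raw-SQL approach: a hand-written query with
// parameters, scanned into a plain struct, with no ORM in between.
package store

import (
	"context"
	"database/sql"
	"time"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

type Message struct {
	ID        int64
	Recipient string
	CreatedAt time.Time
}

// RecentMessages runs a hand-written query; table and columns are made up
// for illustration.
func RecentMessages(ctx context.Context, db *sql.DB, recipient string, limit int) ([]Message, error) {
	const q = `
		SELECT id, recipient, created_at
		FROM messages
		WHERE recipient = $1
		ORDER BY created_at DESC
		LIMIT $2`

	rows, err := db.QueryContext(ctx, q, recipient, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []Message
	for rows.Next() {
		var m Message
		if err := rows.Scan(&m.ID, &m.Recipient, &m.CreatedAt); err != nil {
			return nil, err
		}
		out = append(out, m)
	}
	return out, rows.Err()
}
```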

Iteration patterns and tooling

Development relies on both Claude and Codex, each bringing different strengths: Codex is used heavily for code review after PRs, while Claude excels at debugging and tool-driven investigations. Work is split into PR-sized chunks to keep reviews manageable, and two main iteration strategies are used:

  1. Agent loop with finishing touches — prompt until results are close, then clean up.
  2. Lockstep loop — edit-by-edit collaboration with the agent.

Success requires providing the agent with precise context and pointing it to existing implementations to avoid reinvention.

Where agents fail

Agents can lose sight of global system state. Common failure modes include inappropriate abstractions, recreation of existing code, ill-suited threading/goroutine choices, reliance on outdated dependencies, and swallowed errors that should have surfaced. One cited example is a rate limiter that “worked” but lacked jitter and made poor storage choices. Without domain knowledge and careful review, such issues can produce brittle and opaque systems.
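
The cited rate limiter is not shown in the source, but the jitter point is worth illustrating. Below is a hypothetical Go sketch of the kind of fix a reviewer would ask for: randomizing the wait time so that many clients told to back off do not all return at the same instant. The base and cap values are assumptions for the example.

```go
// Hypothetical illustration of the missing-jitter problem: without randomness,
// clients that were throttled at the same moment retry in lockstep and hammer
// the service again simultaneously.
package limiter

import (
	"math/rand"
	"time"
)

// backoffWithJitter returns a wait duration for the given attempt using
// "full jitter": a random value between 0 and the capped exponential delay.
func backoffWithJitter(attempt int) time.Duration {
	const (
		base = 100 * time.Millisecond
		max  = 10 * time.Second
	)
	d := base << attempt // exponential growth: 100ms, 200ms, 400ms, ...
	if d > max || d <= 0 {
		d = max // cap the delay and guard against overflow
	}
	return time.Duration(rand.Int63n(int64(d)))
}
```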

Where agents excel

AI reduced research time and accelerated hands-on experimentation. Notable wins included:

  • Faster research-to-code cycles and immediate evaluation of implementations.
  • Rapid prototyping of multiple OpenAPI approaches in a single day.
  • Low-cost refactoring that led to more organized code.
  • Infrastructure work with Pulumi and AWS completed far faster than usual.
  • Test infrastructure recommendations, such as using testcontainers for Postgres-backed tests, implemented quickly (see the sketch after this list).
  • Strong SQL generation, including use of constructs that are easy to forget when writing by hand.
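
The source only mentions testcontainers as a recommendation; the sketch below shows what a Postgres-backed test can look like with the generic testcontainers-go API. The image, credentials, and DSN are illustrative assumptions, not details from the project.

```go
// Hypothetical sketch of a Postgres-backed test using testcontainers-go:
// spin up a throwaway Postgres container, connect to it, and run real queries.
package store_test

import (
	"context"
	"database/sql"
	"fmt"
	"testing"

	_ "github.com/lib/pq"
	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestWithPostgres(t *testing.T) {
	ctx := context.Background()

	pg, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "postgres:16-alpine",
			ExposedPorts: []string{"5432/tcp"},
			Env: map[string]string{
				"POSTGRES_USER":     "test",
				"POSTGRES_PASSWORD": "test",
				"POSTGRES_DB":       "test",
			},
			// Postgres logs this line twice during init; waiting for the
			// second occurrence avoids connecting during its restart.
			WaitingFor: wait.ForLog("database system is ready to accept connections").WithOccurrence(2),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { _ = pg.Terminate(ctx) })

	host, err := pg.Host(ctx)
	if err != nil {
		t.Fatal(err)
	}
	port, err := pg.MappedPort(ctx, "5432")
	if err != nil {
		t.Fatal(err)
	}

	dsn := fmt.Sprintf("postgres://test:test@%s:%s/test?sslmode=disable", host, port.Port())
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	// Run migrations and exercise the real queries against a real Postgres here.
	if err := db.PingContext(ctx); err != nil {
		t.Fatal(err)
	}
}
```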

Final perspective

For this project, AI already handles the majority of code generation while human engineers retain responsibility for architecture, review, and production operation. The approach enables different trade-offs and faster experimentation, but it does not remove the need for engineering rigor. Absent careful oversight, AI-assisted development risks brittle, insecure, or unscalable results.

Original source: https://lucumr.pocoo.org/2025/9/29/90-percent/
