Stop “vibe coding” unit tests — a concise critique
Andrew Gallagher argues that the common modern workflow of asking an LLM to scaffold code, tests, and docs in one go produces large volumes of brittle, implementation-focused unit tests rather than tests that validate intended behavior. This trend — sometimes called “vibe coding” — makes test suites noisy and expensive to maintain, and it can degrade both human and agent productivity.
What goes wrong
- Quantity over value: LLMs routinely generate many tests. Gallagher cites Claude producing roughly 30 tests (~200 LOC) for a small React button component in repeated attempts. The result is lots of surface-level checks.
- Testing implementation, not intent: Generated tests often assert how code does things (class names, rendering details) instead of what the software should accomplish. That couples the test suite to implementation details and increases churn whenever legitimate refactors occur.
- Agent and team friction: Bloated spec files consume context window space, pollute semantic search results used by agents, and lead to unwieldy PRs that teammates dislike. Tests that mirror implementation increase the frequency of test updates after routine changes.
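To make the implementation-vs-intent distinction concrete, here is a minimal sketch. The `renderButton` helper and its class names are hypothetical stand-ins (not from the post) for a React component's output; the contrast between the two assertions is the point.

```typescript
// Hypothetical button renderer — plain string output stands in for React rendering.
type ButtonProps = { label: string; disabled?: boolean };

function renderButton({ label, disabled = false }: ButtonProps): string {
  // Internal styling detail — likely to change in a legitimate refactor.
  const className = disabled ? "btn btn--disabled" : "btn btn--primary";
  return `<button class="${className}"${disabled ? " disabled" : ""}>${label}</button>`;
}

// Implementation-focused (brittle): breaks if the class naming scheme changes,
// even though nothing user-visible changed.
console.assert(renderButton({ label: "Save" }).includes('class="btn btn--primary"'));

// Intent-focused: asserts what the user actually experiences.
const html = renderButton({ label: "Save", disabled: true });
console.assert(html.includes(">Save<"));    // the label is visible
console.assert(html.includes(" disabled")); // the button cannot be clicked
```

A suite full of the first kind of assertion must be edited on every styling refactor; the second kind only fails when intended behavior actually regresses.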
Where LLMs help — and where they don’t
Gallagher acknowledges LLMs can be excellent at producing comprehensive tests for highly abstract or algorithmic problems. However, most day-to-day product work involves verifying integration and intent against dependencies, not re-checking trivial UI rendering or library behavior.
A practical counterproposal
Rather than mass-generating tests, Gallagher recommends writing tests one at a time: ask the agent a focused question about a single desired behavior, verify that the produced test matches the intended behavior, and keep each test brief and focused. The guiding principle: “Less is more.”
For the fuller discussion, examples, and the original commentary, read the full post: https://www.andy-gallagher.com/blog/stop-vibe-coding-your-unit-tests/
