Stop 'Vibe Coding' Unit Tests: Write Focused, Intent-Driven Tests

Andrew Gallagher warns that LLMs often generate bloated, brittle unit tests that verify implementation details rather than intent. He recommends writing one focused test at a time so suites stay meaningful and maintainable.

TL;DR

  • LLM-generated suites are often packed with brittle, implementation-focused unit tests that check rendering details or class names instead of intended behavior.
  • Claude example: roughly 30 tests (~200 LOC) for a small React button component.
  • Consequences: noisy, expensive-to-maintain test suites; larger context-window usage; polluted semantic search results; unwieldy PRs and increased churn.
  • LLMs help for highly abstract or algorithmic problems but less so for routine integration/UI intent checks.
  • Recommendation: write tests one at a time — ask a focused question, verify alignment, and keep each test brief and focused.
  • Full post and examples: https://www.andy-gallagher.com/blog/stop-vibe-coding-your-unit-tests/

Stop “vibe coding” unit tests — a concise critique

Andrew Gallagher argues that the common modern workflow of asking an LLM to scaffold code, tests, and docs in one go produces large volumes of brittle, implementation-focused unit tests rather than tests that validate intended behavior. This trend — sometimes called “vibe coding” — makes test suites noisy and expensive to maintain, and it can degrade both human and agent productivity.

What goes wrong

  • Quantity over value: LLMs routinely generate many tests. Gallagher cites Claude producing roughly 30 tests (~200 LOC) for a small React button component in repeated attempts. The result is lots of surface-level checks.
  • Testing implementation, not intent: Generated tests often assert how code does things (class names, rendering details) instead of what the software should accomplish; see the sketch after this list. That locks the current implementation into the tests and increases churn whenever legitimate refactors occur.
  • Agent and team friction: Bloated spec files consume context window space, pollute semantic search results used by agents, and lead to unwieldy PRs that teammates dislike. Tests that mirror implementation increase the frequency of test updates after routine changes.
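To make the distinction concrete, here is a minimal sketch of the brittle style described above, assuming a Jest + React Testing Library setup. The Button component, its props, and the class name are invented for illustration; they are not taken from Gallagher's post.

    // button.test.tsx (hypothetical): a brittle, implementation-focused test.
    import "@testing-library/jest-dom";
    import { render, screen } from "@testing-library/react";
    import { Button } from "./Button"; // assumed component under test

    test("renders with the primary class", () => {
      render(<Button variant="primary">Save</Button>);
      // Asserting a class name pins the test to styling internals: renaming
      // the CSS class fails the test even though behavior is unchanged.
      expect(screen.getByRole("button")).toHaveClass("btn--primary");
    });

A legitimate refactor that renames btn--primary breaks this test without changing anything a user can observe.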

Where LLMs help — and where they don’t

Gallagher acknowledges LLMs can be excellent at producing comprehensive tests for highly abstract or algorithmic problems. However, most day-to-day product work involves verifying integration and intent against dependencies, not re-checking trivial UI rendering or library behavior.
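By way of contrast, here is a hypothetical sketch of the algorithmic case where generated tests shine. The clamp function and its cases are invented for illustration; enumerating boundary values like this is exactly the kind of mechanical coverage LLMs produce well.

    // Hypothetical pure function; clamp and its cases are illustrative only.
    function clamp(value: number, min: number, max: number): number {
      return Math.min(Math.max(value, min), max);
    }

    // Table-driven boundary cases: the sort of enumeration LLMs excel at.
    test.each([
      [5, 0, 10, 5],   // in range
      [-1, 0, 10, 0],  // below min
      [11, 0, 10, 10], // above max
      [0, 0, 10, 0],   // at the min boundary
      [10, 0, 10, 10], // at the max boundary
    ])("clamp(%d, %d, %d) === %d", (value, min, max, expected) => {
      expect(clamp(value, min, max)).toBe(expected);
    });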

A practical counterproposal

Rather than mass-generating tests, Gallagher recommends writing tests one at a time: ask the agent a focused question about a single desired behavior, verify that the produced test aligns with intentions, and keep each test brief and focused. The guiding principle: “Less is more.”
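In that spirit, a single intent-driven test for the same hypothetical button might look like the sketch below. It asserts what the component should do (deliver a click to its handler) rather than how it renders; the onClick prop and the use of userEvent are assumptions of the example.

    import { render, screen } from "@testing-library/react";
    import userEvent from "@testing-library/user-event";
    import { Button } from "./Button"; // same hypothetical component

    test("calls onClick when the user activates the button", async () => {
      const onClick = jest.fn();
      render(<Button onClick={onClick}>Save</Button>);

      // Query by accessible role and name, then assert the intended behavior:
      // a user click reaches the handler. No class names, no DOM internals.
      await userEvent.click(screen.getByRole("button", { name: "Save" }));
      expect(onClick).toHaveBeenCalledTimes(1);
    });

One behavior, a few lines, one assertion: if a refactor keeps the button clickable, the test keeps passing.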

For the fuller discussion, examples, and the original commentary, read the full post: https://www.andy-gallagher.com/blog/stop-vibe-coding-your-unit-tests/
