Superpowers 4.0 Tightens Subagent Workflows, Condenses Skills, Adds E2E Tests

Superpowers 4.0 formalizes subagent review loops, condenses skill descriptions to sharpen triggering, adds a lightweight end-to-end test suite, and leans more heavily on GraphViz for process documentation.

TL;DR

  • Subagent workflow formalized: a spec review agent first checks that implementations match the plan, then a code review agent assesses quality; both run as formal loops so reviews can be rerun after fixes. Full automation remains constrained by OpenAI Codex's lack of subagent support (a fix exists but is not production-ready).
  • Skill descriptions trimmed to guidance about when to use a skill, reducing cases where Claude-based agents claim familiarity with a skill without actually consulting it.
  • Skill consolidation: test-driven-development now includes testing-anti-patterns; systematic-debugging absorbs root-cause-tracing, defense-in-depth, and condition-based-waiting; addresses Claude Code skill-description length limits (https://blog.fsck.com/2025/12/17/claude-code-skills-not-triggering/).
  • Basic E2E tests added covering the brainstorming → planning → implementing flow and verification of skill usage.
  • Process documentation increasingly uses GraphViz ‘dot’ notation for formalized agent workflows (https://blog.fsck.com/2025/09/29/using-graphviz-for-claudemd/).
  • Project remains free and open; repo: https://github.com/obra/superpowers — sponsorships: https://github.com/sponsors/obra

Superpowers 4.0 arrives with a set of focused changes aimed at making the AI-driven development flow more reliable and predictable. The release refines the subagent architecture, streamlines how skills are triggered, introduces a modest end-to-end test suite, and increases internal use of GraphViz for process documentation. The project repository is available at the Superpowers 4.0 GitHub page: https://github.com/obra/superpowers.

Subagent-driven development becomes more formal

A central change is a clearer separation of the review steps into two dedicated agents. A spec review agent now evaluates whether an implementation matches the plan, and only after that agent signs off does a code review agent assess code quality. Both review steps are implemented as formal loops rather than one-shot checks, so the coordinating agent can rerun a review after the implementer applies fixes.

This refinement moves the workflow closer to removing the legacy “two windows” interaction where a human bridged the implementing agent and the coordinator. Progress toward that change is limited by OpenAI Codex’s current lack of subagent support; a fix exists but is not yet production-ready.

Skill triggering and condensed descriptions

Skill triggering behavior has been adjusted to reduce instances where Claude-based agents claim familiarity with a skill but proceed without actually consulting it. To address this, skill description fields were revised to contain only guidance about when to use a skill, rather than describing what the skill does.

An illustrative change: the brainstorming skill’s description shifted from a mix of purpose and guidance to a focused directive specifying when it must be used. Relatedly, several narrowly scoped skills were consolidated into broader skills with progressively disclosed parts — for example, test-driven-development now contains testing-anti-patterns, and systematic-debugging absorbs root-cause-tracing, defense-in-depth, and condition-based-waiting. This consolidation also mitigates issues with hidden limits on the total number of characters allowed for skill descriptions in Claude Code (see the write-up on Claude Code skill limits: https://blog.fsck.com/2025/12/17/claude-code-skills-not-triggering/).
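In Claude Code, a skill's description lives in the frontmatter of its SKILL.md file. A sketch of what such a shift might look like (the wording below is invented for illustration, not the actual skill text):

```markdown
---
name: brainstorming
# Before: a mix of purpose and guidance
#   description: Explores a design space and surfaces options; useful
#   before planning work.
# After: trigger guidance only
description: Use when starting any new feature, design, or significant
  change, before writing a plan or any code.
---
```

Keeping the field to when-to-use guidance gives the agent one unambiguous trigger condition, and also helps stay under the description-length limits discussed in the linked write-up.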

Tests and process notation

A lightweight set of end-to-end tests has been added. These are not a formal evals suite but cover the full brainstorming → planning → implementing flow and verify skill usage, and have already helped improve skill triggering.
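One way such a check might be implemented is to scan a session transcript for skill invocations and assert that the expected skills fired in order. The transcript shape below (one JSON object per line, with a `"tool": "Skill"` field) is a simplified assumption for illustration, not the project's actual format:

```python
import json

def skills_used(transcript_lines):
    """Extract skill names from tool-use entries in a session transcript.

    Assumes a simplified format: one JSON object per line, with skill
    invocations recorded as {"tool": "Skill", "name": "<skill-name>"}.
    """
    used = []
    for line in transcript_lines:
        entry = json.loads(line)
        if entry.get("tool") == "Skill":
            used.append(entry["name"])
    return used

def assert_flow(transcript_lines, expected):
    """Check that the expected skills fired, in order (subsequence match)."""
    used = skills_used(transcript_lines)
    it = iter(used)
    # `s in it` consumes the iterator up to the match, enforcing order.
    missing = [s for s in expected if s not in it]
    assert not missing, f"skills not triggered in order: {missing}"

# Fabricated transcript for the brainstorming -> planning -> implementing flow:
transcript = [
    '{"tool": "Skill", "name": "brainstorming"}',
    '{"tool": "Write", "name": "plan.md"}',
    '{"tool": "Skill", "name": "writing-plans"}',
    '{"tool": "Skill", "name": "test-driven-development"}',
]
assert_flow(transcript, ["brainstorming", "writing-plans", "test-driven-development"])
```

A subsequence check (rather than exact equality) tolerates other tool calls and skills interleaved with the expected ones, which matters when agent runs are nondeterministic.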

Internally, Superpowers is making heavier use of GraphViz ‘dot’ notation for process documentation. Dot provides a slightly more formal process representation than prose, and has proven usable for describing agent workflows (see: https://blog.fsck.com/2025/09/29/using-graphviz-for-claudemd/).
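As a rough illustration (node and edge labels here are invented, not taken from the release), the two-stage review loop described earlier could be written in dot along these lines:

```dot
digraph review_loop {
  implement -> spec_review;
  spec_review -> implement [label="plan mismatch: fix"];
  spec_review -> code_review [label="matches plan"];
  code_review -> implement [label="quality issues: fix"];
  code_review -> done [label="approved"];
}
```

The cycles back to `implement` make the rerun semantics explicit in a way that is easy to miss in prose.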

Sponsorship and maintenance

The project remains free and open. The maintainer has enabled GitHub Sponsorships for those who wish to support ongoing development: https://github.com/sponsors/obra

Original post: https://blog.fsck.com/2025/12/18/superpowers-4/
