The Philosophy
Why Agent Flow
Specialisation and adversarial review are not optimisations on top of a working system — they are the conditions that make a working system possible.
Philosophy 01
Quality emerges from adversarial loops
A dedicated adversary that actively tries to break the code finds what a reviewer never sees. The critic's job is not to approve — it is to fail the work. The loop runs unattended until nothing breaks, compressing days of real-team code review into minutes.
Philosophy 02
Specialisation beats generalism
A single agent that must design, implement, test, and review the same work carries compounding bias. Separation of roles removes the tension between building and verifying. Twelve agents each own exactly one part of the problem — clean context, no role confusion.
Works with any technology stack
The pipeline adapts to your project, not the other way around. Builder agents read TECHSTACK.md — auto-detected from your codebase — instead of assuming React, Node, or Postgres.
Auto-detection on first run. The explorer agent scans CLAUDE.md, package files, config files, and code patterns to detect your stack and propose a TECHSTACK.md profile. You review before it's created.
Manual edits are authoritative. Add entries for technologies you intend to use (future intentions). The explorer will never overwrite your additions without asking. TECHSTACK.md is your desire sheet as much as an auto-generated profile.
Works by reference, not by value. Agents receive a pointer to TECHSTACK.md, not the full content in their brief. Each agent reads only the sections it needs. No token bloat.
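The by-reference pattern above can be sketched in a few lines. This is a hypothetical illustration, assuming TECHSTACK.md uses `##` section headings; the helper name and heading convention are assumptions, not the actual implementation:

```python
from pathlib import Path

def read_sections(path: str, wanted: set[str]) -> dict[str, str]:
    """Return only the requested '## Section' bodies from a markdown profile.

    The agent is handed the path, not the file contents, and pulls
    only the sections it needs -- no token bloat from unused sections.
    """
    sections: dict[str, str] = {}
    current, lines = None, []
    for line in Path(path).read_text().splitlines():
        if line.startswith("## "):
            if current in wanted:
                sections[current] = "\n".join(lines).strip()
            current, lines = line[3:].strip(), []
        else:
            lines.append(line)
    if current in wanted:
        sections[current] = "\n".join(lines).strip()
    return sections
```

A backend builder would call something like `read_sections("TECHSTACK.md", {"Backend", "Database"})` and never load the frontend sections into its context.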
How It Works
The Critic–Builder Feedback Cycle
The critic is not a reviewer — it actively constructs failure scenarios, adversarial inputs, and edge cases the builder didn't consider. When it finds something, it returns a specific ISSUES list. The builder fixes only those items. Then the critic runs again from scratch.
Step 1
Builder
Implements
Step 2
Critic
Tries to break it
Step 3
Builder
Fixes issues
Step 4
Critic
Reviews again
Result
PASS
Done
Repeats up to N iterations · No human intervention required
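The four steps above reduce to a small loop. A minimal sketch, assuming stand-in callables: `builder`, `critic`, and the ISSUES-list return shape are illustrative, not the real agent API:

```python
def adversarial_loop(builder, critic, task, max_iterations: int = 5):
    """Run builder/critic until the critic finds nothing to break.

    builder(task, issues) returns a candidate implementation;
    critic(candidate) returns a list of issues, where an empty
    list means PASS. Both names are hypothetical stand-ins.
    """
    issues: list[str] = []
    for _ in range(max_iterations):
        candidate = builder(task, issues)  # steps 1 and 3: implement / fix
        issues = critic(candidate)         # steps 2 and 4: try to break it
        if not issues:                     # PASS: nothing left to fail
            return candidate
    raise RuntimeError(f"still failing after {max_iterations} iterations: {issues}")
```

Note that the critic re-evaluates the whole candidate each round rather than only the fixed items, which is what catches errors introduced by a fix.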
Why an Adversary?
Every builder carries a constructive bias. It is not a flaw — it is what makes building possible. When you narrow your focus to a specific diff, you inevitably optimise toward success. You read code in light of your intent. You test the paths you thought to test. The gaps you leave are not the gaps you see.
A reviewer faces a subtler version of the same problem. They arrive after the fact, reconstructing intent from output. Good reviewers compensate with experience and pattern recognition, but they are still reading with, not against. The mental posture is evaluative, not adversarial.
An adversary arrives with a different mandate: not to assess quality, but to find failure. That shift produces a different kind of attention — looking for the edge case just outside the spec, the implicit assumption that holds until it doesn't, the interaction between two things that each look correct in isolation. Fresh eyes on a narrowed diff find what builders miss precisely because they bring no investment in the outcome.
Single Pass vs Adversarial Loop
The difference isn't speed — it's mandate. A reviewer assesses quality. An adversary hunts for failure. That shift in posture produces a different class of result.
Single Pass Review
Catches
- Obvious logic errors and style inconsistencies
- Missing null checks on the happy path
- Tests that clearly don't cover changed code
- Straightforward typos
Misses
- Race conditions the builder never modelled
- Edge cases hidden behind surface-level issues
- Contract violations between components that each look correct alone
- Side effects that break downstream consumers
- Security paths in uncommon-but-valid states
Adversarial Loop
Also catches
- Everything above — with higher confidence
- Race conditions constructed from adversarial inputs
- Edge cases outside the spec the builder didn't consider
- Contract violations where each side looks correct in isolation
- Implicit assumptions about ordering, timing, and state
- Errors introduced by the fix (critic re-runs from scratch each time)
Unlike human review
- No calendar dependency or reviewer fatigue
- No social pressure to approve a colleague's work
- Runs the moment the builder finishes
Unlike automated tests
- Tests verify what you thought to test
- The critic constructs what you didn't think of
What the Critic Catches
Real failure scenarios from build sessions — each one would have reached production without the adversarial loop.
Race condition
Two auth tokens were valid for overlapping windows; the critic constructed a concurrent session that could read stale permissions. Builder had tested single-session only.
Silent failure path
An external API failure was caught and logged, but the calling function returned a success response. The critic traced the return value through 3 call stack frames to find the silent success.
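The silent-success pattern is easier to see in code. A reconstructed illustration, not the actual session code; the function and API names are invented:

```python
import logging

def sync_profile_bad(api, user_id) -> bool:
    """The antipattern: the failure is logged, but the function
    still reports success to its caller."""
    try:
        api.push(user_id)
    except ConnectionError as exc:
        logging.error("push failed: %s", exc)
    return True  # BUG: reached even when push failed

def sync_profile_good(api, user_id) -> bool:
    """The fix: the failure is logged and propagated."""
    try:
        api.push(user_id)
        return True
    except ConnectionError as exc:
        logging.error("push failed: %s", exc)
        return False
```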
Fresh-eyes principle
A second critic pass on patched code found that the fix for a null-check introduced a new off-by-one in the boundary condition. The first pass had marked it clean.
Mobile layout broken
The desktop layout looked correct, but the critic loaded the page at 375px and found the hero grid overflowed horizontally. The builder had only tested at 1280px.
Insecure code
User input was interpolated directly into a shell command without sanitisation. The critic flagged the injection vector and returned FAIL with a recommended fix before the code ever ran.
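The vulnerable pattern and its fix, sketched in Python; the function names and the `grep` example are illustrative, not the code from the session:

```python
import subprocess

def grep_logs_unsafe(pattern: str, logfile: str) -> str:
    # Vulnerable: user input is interpolated into a shell string.
    # A pattern like "x; rm -rf ~" becomes part of the command itself.
    cmd = f"grep {pattern} {logfile}"
    return subprocess.run(cmd, shell=True,
                          capture_output=True, text=True).stdout

def grep_logs_safe(pattern: str, logfile: str) -> str:
    # Safe: an argument list and no shell, so the pattern stays data.
    return subprocess.run(["grep", pattern, logfile],
                          capture_output=True, text=True).stdout
```

With the argument-list form, an injected `; echo INJECTED` is searched for literally instead of being executed.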
Diverged from the plan
The builder implemented a caching layer that wasn't in the spec. The critic compared implementation against the plan file and flagged the scope creep before it was merged.
The Single-Agent Trap
What Goes Wrong When One Agent Does Everything
These are not hypothetical failure modes. They are the structural problems that drove the design of the agent team.
Context Pollution
An agent that has seen schema, frontend, tests, and business requirements all in one session reasons poorly about any of them. Each new piece of context crowds out earlier reasoning. The signal-to-noise ratio drops with every tool call.
Role Confusion
A general-purpose agent asked to both design and implement makes compromises — cutting corners on design to ship faster, or over-engineering implementation to prove capability. Specialisation removes this tension completely.
Quality Cliff
Single-session quality degrades predictably as tasks grow. The first 200 lines are good. By 500 lines, the agent is fighting its own earlier decisions. By 1000 lines, it contradicts the architecture it designed 20 minutes ago.
12 Agents, Each With One Job
Every agent has a single domain, a single model tier, and a single place in the sequence. No agent makes decisions outside its scope.
See It In Practice
Each pipeline has a dedicated page explaining the agents involved, the sequence, and the design decisions behind it.
Plan Pipeline
Brainstorm → architect → researcher. From idea to implementation brief.
Build Pipeline
Explorer → builders → adversarial critic loop → reviewer → author.
Review Pipeline
Critic against existing code. Structured findings: BLOCKER, WARNING, SUGGESTION.
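A plausible shape for those structured findings, sketched in Python. The class and field names are assumptions; only the three severity levels come from the pipeline description:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    BLOCKER = "BLOCKER"        # must be fixed before merge
    WARNING = "WARNING"        # should be fixed; not merge-blocking
    SUGGESTION = "SUGGESTION"  # optional improvement

@dataclass
class Finding:
    severity: Severity
    location: str  # e.g. "src/auth.py:42" -- hypothetical format
    summary: str

def passes_review(findings: list[Finding]) -> bool:
    """One plausible gate: review fails only on BLOCKER findings."""
    return not any(f.severity is Severity.BLOCKER for f in findings)
```

Structured severities let the pipeline act on findings mechanically: block the merge on a BLOCKER, surface WARNINGs to the builder, and leave SUGGESTIONs for a human.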