The Philosophy

Why Agent Flow

Specialisation and adversarial review are not optimisations on top of a working system — they are the conditions that make a working system possible.

Philosophy 01

Quality emerges from adversarial loops

A dedicated adversary that actively tries to break the code finds what a reviewer never sees. The critic's job is not to approve — it is to fail the work. The loop runs unattended until nothing breaks, compressing days of human code review into minutes.

critic → builder → critic · unattended · no human needed

Philosophy 02

Specialisation beats generalism

A single agent that must design, implement, test, and review the same work carries compounding bias. Separation of roles removes the tension between building and verifying. Twelve agents each own exactly one part of the problem — clean context, no role confusion.

12 specialists · clean context · role clarity
Key Differentiator

Works with any technology stack

The pipeline adapts to your project, not the other way around. Builder agents read TECHSTACK.md — auto-detected from your codebase — instead of assuming React, Node, or Postgres.

Auto-detection on first run. The explorer agent scans CLAUDE.md, package files, config files, and code patterns to detect your stack and propose a TECHSTACK.md profile. You review before it's created.

Manual edits are authoritative. Add entries for technologies you intend to adopt; the explorer will never overwrite your additions without asking. TECHSTACK.md is as much a statement of intent as an auto-generated profile.

Works by reference, not by value. Agents receive a pointer to TECHSTACK.md, not the full content in their brief. Each agent reads only the sections it needs. No token bloat.
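
As a concrete sketch of the by-reference pattern, the snippet below pulls only named sections out of the profile. The brief shape, the section names, and the "## heading" convention are assumptions for illustration; they are not the pipeline's documented format.

```ts
import { readFileSync } from "node:fs";

// What an agent brief might carry: a pointer to the profile plus the
// sections this particular agent needs (all names here are hypothetical).
const brief = {
  techstackPath: "TECHSTACK.md",       // pointer, not content
  sections: ["Backend", "Database"],   // only what this agent reads
};

// Keep only the requested "## Section" blocks of a markdown file.
function readSections(path: string, wanted: string[]): string {
  const out: string[] = [];
  let keep = false;
  for (const line of readFileSync(path, "utf8").split("\n")) {
    const heading = line.match(/^##\s+(.+)/);
    if (heading) keep = wanted.includes(heading[1].trim());
    if (keep) out.push(line);
  }
  return out.join("\n");
}

// The backend agent pulls two sections; the rest of the file never
// enters its context window.
const backendContext = readSections(brief.techstackPath, brief.sections);
```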

Philosophy 01 · The Adversarial Loop

How It Works

The Critic–Builder Feedback Cycle

The critic is not a reviewer — it actively constructs failure scenarios, adversarial inputs, and edge cases the builder didn't consider. When it finds something, it returns a specific ISSUES list. The builder fixes only those items. Then the critic runs again from scratch.

Repeats up to N iterations · No human intervention required
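
The control flow is simple enough to sketch. Everything below is illustrative: runCritic, runBuilder, and the Issue shape are hypothetical stand-ins for the actual agent invocations.

```ts
// Hypothetical agent calls; in the real pipeline these are LLM invocations.
interface Issue {
  location: string;  // file or function the critic points at
  scenario: string;  // the failure case it constructed
  expected: string;  // what correct behaviour looks like
}

declare function runCritic(code: string): Promise<Issue[]>;
declare function runBuilder(code: string, issues: Issue[]): Promise<string>;

async function adversarialLoop(diff: string, maxIterations: number): Promise<string> {
  let current = diff;
  for (let i = 0; i < maxIterations; i++) {
    // The critic starts from scratch each pass: it sees code, not history.
    const issues = await runCritic(current);
    if (issues.length === 0) return current;  // nothing breaks: done
    // The builder fixes only the flagged items, keeping scope narrow.
    current = await runBuilder(current, issues);
  }
  throw new Error(`still failing after ${maxIterations} iterations; escalate`);
}
```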

Philosophy 01 · Why It's Different

Why an Adversary?

Every builder carries a constructive bias. It is not a flaw — it is what makes building possible. When you narrow your focus to a specific diff, you inevitably optimise toward success. You read code in light of your intent. You test the paths you thought to test. The gaps you leave are not the gaps you see.

A reviewer faces a subtler version of the same problem. They arrive after the fact, reconstructing intent from output. Good reviewers compensate with experience and pattern recognition, but they are still reading with, not against. The mental posture is evaluative, not adversarial.

An adversary arrives with a different mandate: not to assess quality, but to find failure. That shift produces a different kind of attention — looking for the edge case just outside the spec, the implicit assumption that holds until it doesn't, the interaction between two things that each look correct in isolation. Fresh eyes on a narrowed diff find what builders miss precisely because they bring no investment in the outcome.

Philosophy 01 · How It Compares

Single Pass vs Adversarial Loop

The difference isn't speed — it's mandate. A reviewer assesses quality. An adversary hunts for failure. That shift in posture produces a different class of result.

Single Pass Review

Catches

  • Obvious logic errors and style inconsistencies
  • Missing null checks on the happy path
  • Tests that clearly don't cover changed code
  • Straightforward typos

Misses

  • Race conditions the builder never modelled
  • Edge cases hidden behind surface-level issues
  • Contract violations between components that each look correct alone
  • Side effects that break downstream consumers
  • Security paths in uncommon-but-valid states

Adversarial Loop

Also catches

  • Everything above — with higher confidence
  • Race conditions constructed from adversarial inputs
  • Edge cases outside the spec the builder didn't consider
  • Contract violations where each side looks correct in isolation
  • Implicit assumptions about ordering, timing, and state
  • Errors introduced by the fix (critic re-runs from scratch each time)

Unlike human review

  • No calendar dependency or reviewer fatigue
  • No social pressure to approve a colleague's work
  • Runs the moment the builder finishes

Unlike automated tests

  • Tests verify what you thought to test
  • The critic constructs what you didn't think of

Philosophy 01 · Evidence

What the Critic Catches

Real failure scenarios from build sessions — each one would have reached production without the adversarial loop.

Race condition

Two auth tokens were valid for overlapping windows; the critic constructed a concurrent session that could read stale permissions. The builder had tested single sessions only.

Silent failure path

An external API failure was caught and logged, but the calling function returned a success response. The critic traced the return value through 3 call stack frames to find the silent success.
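
The shape of that bug is common enough to sketch. This is a reconstruction of the pattern, not the session's code; every name below is hypothetical.

```ts
declare const externalApi: { updateProfile(id: string): Promise<void> };

async function syncProfile(userId: string): Promise<{ ok: boolean }> {
  try {
    await externalApi.updateProfile(userId);
  } catch (err) {
    console.error("profile sync failed", err);  // caught and logged...
  }
  return { ok: true };  // ...but every caller still sees success
}
```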

Fresh-eyes principle

A second critic pass on patched code found that the fix for a null-check introduced a new off-by-one in the boundary condition. The first pass had marked it clean.

Mobile layout broken

The desktop layout looked correct, but the critic loaded the page at 375px and found the hero grid overflowed horizontally. The builder had only tested at 1280px.

Insecure code

User input was interpolated directly into a shell command without sanitisation. The critic flagged the injection vector and returned FAIL with a recommended fix before the code ever ran.
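
The vector is the classic one. A generic illustration, not the flagged code; convert and userInput are stand-ins:

```ts
import { exec, execFile } from "node:child_process";
declare const userInput: string;  // e.g. a filename from a form field

// Vulnerable: userInput is spliced into the shell command itself, so an
// input like "x.png; rm -rf /" runs as a second command.
exec(`convert ${userInput} out.png`);

// Safer: arguments are passed as an array and no shell parses them.
execFile("convert", [userInput, "out.png"]);
```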

Diverged from the plan

The builder implemented a caching layer that wasn't in the spec. The critic compared implementation against the plan file and flagged the scope creep before it was merged.

Philosophy 02 · The Problem

The Single-Agent Trap

What Goes Wrong When One Agent Does Everything

These are not hypothetical failure modes — they are the structural problems that drove the design of the 12-agent team.

Context Pollution

An agent that has seen schema, frontend, tests, and business requirements all in one session reasons poorly about any of them. Each new piece of context crowds out earlier reasoning. The signal-to-noise ratio drops with every tool call.

Role Confusion

A general-purpose agent asked to both design and implement makes compromises — cutting corners on design to ship faster, or over-engineering implementation to prove capability. Specialisation removes this tension completely.

Quality Cliff

Single-session quality degrades predictably as tasks grow. The first 200 lines are good. By 500 lines, the agent is fighting its own earlier decisions. By 1000 lines, it contradicts the architecture it designed 20 minutes ago.

Philosophy 02 · The Solution

12 Agents, Each With One Job

Every agent has a single domain, a single model tier, and a single place in the sequence. No agent makes decisions outside its scope.

  • orchestrator · Routes and sequences, never writes code
  • architect · Design decisions, pre-implementation
  • ideator · Lateral thinking, output to human only
  • critic · Adversarial review, tries to break the code · 🌐 Playwright
  • reviewer · Final quality pass: security, correctness, perf
  • frontend · UI components, styling, client-side state · 🌐 Playwright
  • backend · API routes, business logic, data persistence, auth
  • researcher · Live docs lookup, library comparison
  • explorer · Read-only codebase navigator
  • tester · Tests only, never marks done if tests are red
  • author · Docs and CHANGELOG
  • storage · DB schema, RLS policies, cloud storage

Model tiers: Opus 4 · Sonnet 4.5 · Haiku 4.5
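
One way to picture the constraint is as a roster entry per agent: one scope, one tier, an explicit tool list. The shape below is hypothetical, and the tier assignments are placeholders, not the pipeline's actual mapping.

```ts
type ModelTier = "Opus 4" | "Sonnet 4.5" | "Haiku 4.5";

interface AgentSpec {
  name: string;     // the agent's single identity
  scope: string;    // the one job it owns
  tier: ModelTier;  // exactly one model tier per agent
  tools?: string[]; // e.g. ["playwright"] for critic and frontend
}

const roster: AgentSpec[] = [
  // Tier values below are placeholders; the real mapping is not given here.
  { name: "critic", scope: "adversarial review", tier: "Sonnet 4.5", tools: ["playwright"] },
  { name: "explorer", scope: "read-only codebase navigation", tier: "Haiku 4.5" },
  // ...ten more specialists, one entry each
];
```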

See It In Practice

Each pipeline has a dedicated page explaining the agents involved, the sequence, and the design decisions behind it.