The Philosophy
Why Agent Flow
Specialisation and adversarial review are not optimisations on top of a working system — they are the conditions that make a working system possible.
Philosophy 01
Quality emerges from adversarial loops
A dedicated adversary that actively tries to break the code finds what a reviewer never sees. The critic's job is not to approve — it is to fail the work. The loop runs unattended until nothing breaks, compressing days of real-team code review into minutes.
Philosophy 02
Specialisation beats generalism
A single agent that must design, implement, test, and review the same work carries compounding bias. Separation of roles removes the tension between building and verifying. Twelve agents each own exactly one part of the problem — clean context, no role confusion.
Works with any technology stack
The pipeline adapts to your project, not the other way around. Builder agents read TECHSTACK.md — auto-detected from your codebase — instead of assuming React, Node, or Postgres.
Auto-detection on first run. The explorer agent scans CLAUDE.md, package files, config files, and code patterns to detect your stack and propose a TECHSTACK.md profile. You review before it's created.
Manual edits are authoritative. Add entries for technologies you intend to use (future intentions). The explorer will never overwrite your additions without asking. TECHSTACK.md is your desire sheet as much as an auto-generated profile.
Works by reference, not by value. Agents receive a pointer to TECHSTACK.md, not the full content in their brief. Each agent reads only the sections it needs. No token bloat.
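The by-reference pattern above can be sketched in a few lines. This is a hypothetical illustration, assuming TECHSTACK.md uses `##` section headings; the helper name and heading convention are assumptions, not the actual implementation:

```python
from pathlib import Path

def read_sections(path: str, wanted: set[str]) -> dict[str, str]:
    """Return only the requested '## Section' bodies from a markdown profile.

    The agent is handed the path, not the file contents, and pulls
    only the sections it needs -- no token bloat from unused sections.
    """
    sections: dict[str, str] = {}
    current, lines = None, []
    for line in Path(path).read_text().splitlines():
        if line.startswith("## "):
            if current in wanted:
                sections[current] = "\n".join(lines).strip()
            current, lines = line[3:].strip(), []
        else:
            lines.append(line)
    if current in wanted:
        sections[current] = "\n".join(lines).strip()
    return sections
```

A backend builder would call something like `read_sections("TECHSTACK.md", {"Backend", "Database"})` and never load the frontend sections into its context.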
How It Works
The Critic–Builder Feedback Cycle
The critic is not a reviewer — it actively constructs failure scenarios, adversarial inputs, and edge cases the builder didn't consider. When it finds something, it returns a specific ISSUES list. The builder fixes only those items. Then the critic runs again from scratch.
Step 1
Builder
Implements
Step 2
Critic
Tries to break it
Step 3
Builder
Fixes issues
Step 4
Critic
Reviews again
Result
PASS
Done
Repeats up to N iterations · No human intervention required
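The four steps above reduce to a small loop. A minimal sketch, assuming stand-in callables: `builder`, `critic`, and the ISSUES-list return shape are illustrative, not the real agent API:

```python
def adversarial_loop(builder, critic, task, max_iterations: int = 5):
    """Run builder/critic until the critic finds nothing to break.

    builder(task, issues) returns a candidate implementation;
    critic(candidate) returns a list of issues, where an empty
    list means PASS. Both names are hypothetical stand-ins.
    """
    issues: list[str] = []
    for _ in range(max_iterations):
        candidate = builder(task, issues)  # steps 1 and 3: implement / fix
        issues = critic(candidate)         # steps 2 and 4: try to break it
        if not issues:                     # PASS: nothing left to fail
            return candidate
    raise RuntimeError(f"still failing after {max_iterations} iterations: {issues}")
```

Note that the critic re-evaluates the whole candidate each round rather than only the fixed items, which is what catches errors introduced by a fix.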
Why an Adversary?
Every builder carries a constructive bias. It is not a flaw — it is what makes building possible. When you narrow your focus to a specific diff, you inevitably optimise toward success. You read code in light of your intent. You test the paths you thought to test. The gaps you leave are not the gaps you see.
A reviewer faces a subtler version of the same problem. They arrive after the fact, reconstructing intent from output. Good reviewers compensate with experience and pattern recognition, but they are still reading with, not against. The mental posture is evaluative, not adversarial.
An adversary arrives with a different mandate: not to assess quality, but to find failure. That shift produces a different kind of attention — looking for the edge case just outside the spec, the implicit assumption that holds until it doesn't, the interaction between two things that each look correct in isolation. Fresh eyes on a narrowed diff find what builders miss precisely because they bring no investment in the outcome.
Single Pass vs Adversarial Loop
The difference isn't speed — it's mandate. A reviewer assesses quality. An adversary hunts for failure. That shift in posture produces a different class of result.
Single Pass Review
Catches
- Obvious logic errors and style inconsistencies
- Missing null checks on the happy path
- Tests that clearly don't cover changed code
- Straightforward typos
Misses
- Race conditions the builder never modelled
- Edge cases hidden behind surface-level issues
- Contract violations between components that each look correct alone
- Side effects that break downstream consumers
- Security paths in uncommon-but-valid states
Adversarial Loop
Also catches
- Everything above — with higher confidence
- Race conditions constructed from adversarial inputs
- Edge cases outside the spec the builder didn't consider
- Contract violations where each side looks correct in isolation
- Implicit assumptions about ordering, timing, and state
- Errors introduced by the fix (critic re-runs from scratch each time)
Unlike human review
- No calendar dependency or reviewer fatigue
- No social pressure to approve a colleague's work
- Runs the moment the builder finishes
Unlike automated tests
- Tests verify what you thought to test
- The critic constructs what you didn't think of
What the Critic Catches
Real failure scenarios from build sessions — each one would have reached production without the adversarial loop.
Race condition
Two auth tokens were valid for overlapping windows; the critic constructed a concurrent session that could read stale permissions. Builder had tested single-session only.
Silent failure path
An external API failure was caught and logged, but the calling function returned a success response. The critic traced the return value through 3 call stack frames to find the silent success.
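The silent-success pattern is easier to see in code. A reconstructed illustration, not the actual session code; the function and API names are invented:

```python
import logging

def sync_profile_bad(api, user_id) -> bool:
    """The antipattern: the failure is logged, but the function
    still reports success to its caller."""
    try:
        api.push(user_id)
    except ConnectionError as exc:
        logging.error("push failed: %s", exc)
    return True  # BUG: reached even when push failed

def sync_profile_good(api, user_id) -> bool:
    """The fix: the failure is logged and propagated."""
    try:
        api.push(user_id)
        return True
    except ConnectionError as exc:
        logging.error("push failed: %s", exc)
        return False
```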
Fresh-eyes principle
A second critic pass on patched code found that the fix for a null-check introduced a new off-by-one in the boundary condition. The first pass had marked it clean.
Mobile layout broken
The desktop layout looked correct, but the critic loaded the page at 375px and found the hero grid overflowed horizontally. The builder had only tested at 1280px.
Insecure code
User input was interpolated directly into a shell command without sanitisation. The critic flagged the injection vector and returned FAIL with a recommended fix before the code ever ran.
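The vulnerable pattern and its fix, sketched in Python; the function names and the `grep` example are illustrative, not the code from the session:

```python
import subprocess

def grep_logs_unsafe(pattern: str, logfile: str) -> str:
    # Vulnerable: user input is interpolated into a shell string.
    # A pattern like "x; rm -rf ~" becomes part of the command itself.
    cmd = f"grep {pattern} {logfile}"
    return subprocess.run(cmd, shell=True,
                          capture_output=True, text=True).stdout

def grep_logs_safe(pattern: str, logfile: str) -> str:
    # Safe: an argument list and no shell, so the pattern stays data.
    return subprocess.run(["grep", pattern, logfile],
                          capture_output=True, text=True).stdout
```

With the argument-list form, an injected `; echo INJECTED` is searched for literally instead of being executed.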
Diverged from the plan
The builder implemented a caching layer that wasn't in the spec. The critic compared implementation against the plan file and flagged the scope creep before it was merged.
The Single-Agent Trap
What Goes Wrong When One Agent Does Everything
These are not hypothetical failure modes. They are the structural problems that drove the design of the agent team.
Context Pollution
An agent that has seen schema, frontend, tests, and business requirements all in one session reasons poorly about any of them. Each new piece of context crowds out earlier reasoning. The signal-to-noise ratio drops with every tool call.
Role Confusion
A general-purpose agent asked to both design and implement makes compromises — cutting corners on design to ship faster, or over-engineering implementation to prove capability. Specialisation removes this tension completely.
Quality Cliff
Single-session quality degrades predictably as tasks grow. The first 200 lines are good. By 500 lines, the agent is fighting its own earlier decisions. By 1000 lines, it contradicts the architecture it designed 20 minutes ago.
12 Agents, Each With One Job
Every agent has a single domain, a single model tier, and a single place in the sequence. No agent makes decisions outside its scope.
See It In Practice
Each pipeline has a dedicated page explaining the agents involved, the sequence, and the design decisions behind it.
Plan Pipeline
Brainstorm → architect → researcher. From idea to implementation brief.
Build Pipeline
Explorer → builders → adversarial critic loop → reviewer → author.
Review Pipeline
Critic against existing code. Structured findings: BLOCKER, WARNING, SUGGESTION.
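A plausible shape for those structured findings, sketched in Python. The class and field names are assumptions; only the three severity levels come from the pipeline description:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    BLOCKER = "BLOCKER"        # must be fixed before merge
    WARNING = "WARNING"        # should be fixed; not merge-blocking
    SUGGESTION = "SUGGESTION"  # optional improvement

@dataclass
class Finding:
    severity: Severity
    location: str  # e.g. "src/auth.py:42" -- hypothetical format
    summary: str

def passes_review(findings: list[Finding]) -> bool:
    """One plausible gate: review fails only on BLOCKER findings."""
    return not any(f.severity is Severity.BLOCKER for f in findings)
```

Structured severities let the pipeline act on findings mechanically: block the merge on a BLOCKER, surface WARNINGs to the builder, and leave SUGGESTIONs for a human.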