Engineering · Strict execution agent

An engineering agent that ships MVP phases without scope creep.

Eight mandatory steps, TDD enforced per task, four verification gates before done, STRICT mode that halts instead of improvising. Scope-locked handoff, not a heap of code needing cleanup.

  • 8: Mandatory steps per phase, no skipping
  • TDD: RED-GREEN-REFACTOR on every task
  • 4: Verification gates before task-complete
The Challenge

MVP execution without gates becomes a two-month cleanup every time.

A plan says Phase 1 uses PostgreSQL. Midway through the build, the agent substitutes SQLite "for simplicity." Tests get deferred with a "we'll refactor later" comment. The handoff to marketing is a code dump with no demo URL, no test report, and three weeks of cleanup hiding inside.

The failures are always the same. Scope creep gets discovered at code review, not at the decision point. Test coverage becomes debt that compounds into Phase 2. The agent hits an architectural decision the plan didn't cover and improvises rather than halting. "Task complete" starts to mean "code is written" instead of "tests pass, build succeeds, runs locally, acceptance met."

The root cause isn't that AI coding tools can't code. It's that nothing forces a gate between "code written" and "task complete": no enforced step order, no mandatory tests, no STRICT mode that halts on scope deviation. Without those gates, every MVP slips in the same three ways: scope drift, deferred tests, and improvised architecture decisions.

This agent replaces model discretion with mandatory step sequencing and four verification checks per task. Scope creep gets blocked structurally. Deferred tests stop being possible. The MVP arrives merge-ready instead of arriving as a pile of cleanup.

How the agent handles it

Eight locked steps. TDD per task. Four checks before done.

Handoff from planner: apps/X/planning/

STEP 1 · context-load: CONTEXT.md
STEP 2 · architecture: build/arch.md
STEP 3 · brainstorm: 2-3 approaches
STEP 4 · plan: build/plan.md
STEP 5 · build: TDD RED-GREEN
STEP 6 · test: tests + build + run
STEP 7 · code-review: verify-build skill
STEP 8 · HANDOFF to marketing: MVP ready + demo

4 VERIFICATION CHECKS (Step 6)

  • Tests pass: All unit + integration tests must pass before task done.
  • Build succeeds: App compiles/bundles without errors.
  • Runs locally: Dev server starts; main features work.
  • Acceptance met: Task acceptance criteria fully satisfied.

1

Step order is enforced, not suggested.

Context-load → architecture → brainstorm → plan → build → test → code-review → handoff. A progress tracker blocks Step N+1 until Step N completes. You cannot write tasks before the architecture exists; you cannot build before the plan exists. Steps cannot be skipped, reordered, or run in parallel.
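To make the blocking behavior concrete, here is a minimal sketch of a step tracker. The step names come from the sequence above; the `ProgressTracker` class, its methods, and `StepOrderError` are hypothetical names chosen for illustration, not the agent's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Step names follow the order described in the text.
STEPS = (
    "context-load", "architecture", "brainstorm", "plan",
    "build", "test", "code-review", "handoff",
)


class StepOrderError(RuntimeError):
    """Raised when a step is attempted before its predecessor is complete."""


@dataclass
class ProgressTracker:
    """Hypothetical tracker: only the next uncompleted step may be completed."""

    completed: list[str] = field(default_factory=list)

    def next_step(self) -> str | None:
        # The only step allowed to run is the first one not yet completed.
        if len(self.completed) < len(STEPS):
            return STEPS[len(self.completed)]
        return None

    def complete(self, step: str) -> None:
        expected = self.next_step()
        if step != expected:
            raise StepOrderError(
                f"cannot complete {step!r}; next allowed step is {expected!r}"
            )
        self.completed.append(step)
```

With a tracker like this, calling `complete("build")` right after `complete("context-load")` raises `StepOrderError` rather than letting the build start without an architecture or a plan.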

2

TDD is mandatory per task, not a nice-to-have.

Every task runs RED → GREEN → REFACTOR. Failing test first. Minimal implementation. Refactor for clarity. Tests are the executable spec, and the "we'll add tests later" escape hatch does not exist in this workflow.
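One RED → GREEN → REFACTOR cycle might look like the sketch below. The `slugify` task, the `check_slugify` spec, and the `run` helper are invented for illustration; the point is only the order: the check exists and fails before any implementation does.

```python
import re


def check_slugify(slugify) -> None:
    # Written first: this check is the executable spec for the task.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  MVP  Phase 1 ") == "mvp-phase-1"


def run(check, impl) -> bool:
    """Return True if the check passes against the given implementation."""
    try:
        check(impl)
        return True
    except (AssertionError, NotImplementedError):
        return False


# RED: only a stub exists, so the check must fail.
def slugify_stub(text: str) -> str:
    raise NotImplementedError


# GREEN: the smallest implementation that satisfies the check.
def slugify_green(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


# REFACTOR: same behavior, with a precompiled pattern and named steps.
_NON_ALNUM = re.compile(r"[^a-z0-9]+")


def slugify(text: str) -> str:
    lowered = text.lower()
    return _NON_ALNUM.sub("-", lowered).strip("-")


red = run(check_slugify, slugify_stub)
green = run(check_slugify, slugify_green)
refactored = run(check_slugify, slugify)
```

Here `red` is `False` while `green` and `refactored` are `True`: the refactor is only accepted because the same check still passes.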

3

Four gates must pass before a task is marked done.

Tests pass. Build succeeds. Runs locally. Acceptance criteria met. All four, every task, no exceptions. "Done" means something specific — and none of the four can be faked because each has a deterministic check.
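A deterministic check can be as simple as a command whose exit code decides the gate. The sketch below assumes each gate maps to one command; the `Gate` type, `run_gates`, and `task_is_done` are hypothetical, and the commands are placeholders standing in for a real test runner, build, smoke test, and acceptance script.

```python
import subprocess
import sys
from dataclasses import dataclass

# Gate names taken from the text; all four are required.
REQUIRED_GATES = ("tests pass", "build succeeds", "runs locally", "acceptance met")


@dataclass(frozen=True)
class Gate:
    name: str
    command: list  # hypothetical command; exit code 0 means the gate passed


def run_gates(gates) -> dict:
    """Run every gate and record pass/fail from the process exit code."""
    results = {}
    for gate in gates:
        completed = subprocess.run(gate.command, capture_output=True)
        results[gate.name] = completed.returncode == 0
    return results


def task_is_done(results) -> bool:
    # A missing gate counts as a failure, so an empty result is never "done".
    return all(results.get(name, False) for name in REQUIRED_GATES)


# Placeholder commands that simply exit with a fixed status.
PASS = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

gates = [
    Gate("tests pass", PASS),
    Gate("build succeeds", PASS),
    Gate("runs locally", FAIL),
    Gate("acceptance met", PASS),
]
results = run_gates(gates)
done = task_is_done(results)
```

In this run three gates pass and one fails, so `done` is `False`. The task stays open until all four exit cleanly.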

4

STRICT mode halts on scope drift instead of improvising.

Plan says PostgreSQL? PostgreSQL it is. If the agent spots a simpler path than the one the plan specifies, it halts and asks instead of switching: "The plan specifies a Redis queue. I see a Postgres advisory-lock approach that's simpler. Switch or stick?" Decisions stay yours. Scope creep stays structurally impossible.
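The halt-instead-of-improvise rule can be sketched as a lookup against the locked plan. Everything here is an assumption made for illustration: the `PLAN` dictionary, the `choose` function, and the `ScopeHalt` exception are hypothetical, and a real agent would surface the question to a person rather than raise.

```python
class ScopeHalt(Exception):
    """Carries the question the agent asks instead of deciding on its own."""


# Hypothetical locked decisions from the phase plan.
PLAN = {"database": "PostgreSQL"}


def choose(component: str, proposed: str, plan: dict) -> str:
    """Return the planned choice, or halt if the proposal deviates from the plan."""
    planned = plan.get(component)
    if planned is None:
        # The plan does not cover this decision: halt rather than improvise.
        raise ScopeHalt(
            f"The plan does not cover {component!r}. "
            f"Proposed: {proposed}. How should I proceed?"
        )
    if proposed != planned:
        # The proposal differs from the plan: ask, never substitute silently.
        raise ScopeHalt(
            f"The plan specifies {planned} for {component}. "
            f"Proposed alternative: {proposed}. Switch or stick?"
        )
    return planned
```

Calling `choose("database", "SQLite", PLAN)` raises `ScopeHalt` with a "Switch or stick?" question, while `choose("database", "PostgreSQL", PLAN)` returns normally. A decision the plan never mentions halts the same way.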

What you get

Three things change once the step gates are on.

0 scope drift

Substitutions without a halt

STRICT mode blocks the "let me just swap Postgres for SQLite" failure mode that derails MVP timelines.

100%

Tasks with tests before merge

TDD is enforced at the task level. Deferred tests do not exist in this workflow.

4 gates

Deterministic checks before done

Tests, build, run, acceptance: all four required. The handoff to marketing includes a demo URL, a test report, and open items, not a code dump.

Numbers observed in Brilworks' internal reference deployment. Actual figures on your stack will depend on phase scope, language/framework, and how clean your planner handoff is.

Is this right for you?

Honest fit criteria. We'd rather say no than oversell.

Strong fit if

  • You ship MVP phases on 2–4 week timelines with a clear scoped plan per phase
  • Scope creep is the recurring killer of your build estimates
  • Test coverage on past phases was spotty, deferred, or added after the fact
  • Architecture drift typically gets discovered at code review, not at decision time

Not a fit if

  • You deploy continuously with one-day iterations and no phase boundaries
  • Your project scope evolves daily and a strict spec is impossible to write
  • Code generation speed — not execution discipline — is your actual bottleneck
  • You're not willing to write a real plan before the build starts

Book a 30-minute scoping call.

We'll walk through your current phase pattern, map it against the eight-step gauntlet, and tell you honestly whether deterministic execution is the right next step.