done means verified. When an agent completes file-mutating work, two independent checks must agree before the task can reach done: a deterministic gate (the task’s own build, test, or lint command, judged by its exit code) and, for risky or large changes, a read-only critic (an independent reviewer that cannot push changes). The principle is builder ≠ judge: the agent that did the work never certifies its own work.
This matters because a self-grading model is a known failure mode. A code generator that also grades its own output has every incentive — intentional or not — to assess itself favorably. Clawboo’s only signals into done are a machine truth (an exit code) and a structurally-independent reviewer.
The verification gate is built into the board state machine, not an optional step. Every transition to done is checked against the task’s stored verification result, whoever the caller and whatever the path. The only escape is an explicit, audited humanOverride.
What you will see
When an agent completes work and transitions a task to in_review, verification runs automatically:
- The build/test gate runs. The task’s configured verify command executes inside the isolated worktree. If it exits non-zero or times out, the task moves back to in_progress with a structured explanation of what failed.
- The critic runs (for risky changes). If the build passes and the change is large, delegated, or explicitly flagged sensitive, an independent reviewer examines the diff from a detached checkout it cannot push to. Blocking findings (security issues, crashes, data loss, wrong algorithms, or unmet acceptance criteria) send the task back for fixes.
- The task reaches done when both checks are satisfied.
If the retry budget runs out while the build is green, the task can instead finish as completed_with_debt; see below.
The two verification layers
The deterministic gate
The deterministic gate is the strongest signal because it reads a truth: an exit code. It runs the task’s configured verify command (set as VERIFY_CMD in the task’s worktree) and records:
- The command that ran
- The exit code
- Whether it timed out
- A scrubbed tail of stdout/stderr as evidence
- The duration
done requires real evidence; the absence of a check cannot certify completion. Until you configure VERIFY_CMD for a task, it will fail the gate.
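As a rough illustration of the gate described above, the following Python sketch runs a verify command and records the listed fields. It is not Clawboo's implementation: `GateResult`, `run_gate`, the 600-second default timeout, and the 2000-character tail are all hypothetical, and the scrubbing of secrets from the output is omitted.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateResult:
    """Hypothetical record of one gate run (field names are illustrative)."""
    command: str
    exit_code: Optional[int]  # None if the command never produced one
    timed_out: bool
    output_tail: str          # not scrubbed in this sketch
    duration_s: float

    @property
    def passed(self) -> bool:
        # Only a zero exit code within the time limit counts as a pass.
        return not self.timed_out and self.exit_code == 0


def run_gate(command: Optional[List[str]], cwd: str = ".",
             timeout_s: float = 600.0, tail_chars: int = 2000) -> GateResult:
    """Run the verify command and judge it by exit code and timeout only."""
    if not command:
        # No configured check means no evidence, so the gate fails.
        return GateResult("", None, False, "no verify command configured", 0.0)
    started = time.monotonic()
    try:
        proc = subprocess.run(command, cwd=cwd, capture_output=True,
                              text=True, timeout=timeout_s)
        exit_code, timed_out = proc.returncode, False
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        exit_code, timed_out, output = None, True, "verify command timed out"
    duration = time.monotonic() - started
    return GateResult(" ".join(command), exit_code, timed_out,
                      output[-tail_chars:], duration)


if __name__ == "__main__":
    result = run_gate([sys.executable, "-c", "print('tests ok')"])
    print(result.passed, result.exit_code)
```

The sketch treats a missing command the same way as a failing one, which matches the rule that the absence of a check cannot certify completion.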
The independent critic
When the build gate passes and the change meets at least one of these criteria, an independent critic reviews it:

| Trigger | Threshold |
|---|---|
| riskFlag | The task was explicitly marked sensitive |
| Delegated work | Any task with a parent (depth > 0) |
| Large file count | More than 5 files changed |
| Large diff | More than 300 lines changed |

With CLAWBOO_REVIEWER_MODEL set, the independence extends to the model level too.
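The trigger table can be read as a simple predicate. The Python sketch below is one possible encoding, assuming a hypothetical `Change` record; only the thresholds (5 files, 300 lines, depth > 0, riskFlag) come from the table, and the trigger labels other than `riskFlag` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

MAX_FILES_WITHOUT_REVIEW = 5    # "More than 5 files changed"
MAX_LINES_WITHOUT_REVIEW = 300  # "More than 300 lines changed"


@dataclass
class Change:
    """Hypothetical summary of a task's diff."""
    risk_flag: bool = False
    depth: int = 0            # 0 for a top-level task, > 0 for delegated work
    files_changed: int = 0
    lines_changed: int = 0


def critic_triggers(change: Change) -> List[str]:
    """Return the names of every trigger the change hits."""
    triggers = []
    if change.risk_flag:
        triggers.append("riskFlag")
    if change.depth > 0:
        triggers.append("delegated")
    if change.files_changed > MAX_FILES_WITHOUT_REVIEW:
        triggers.append("large_file_count")
    if change.lines_changed > MAX_LINES_WITHOUT_REVIEW:
        triggers.append("large_diff")
    return triggers


def needs_critic(gate_passed: bool, change: Change) -> bool:
    # The critic only runs once the build gate is green.
    return gate_passed and bool(critic_triggers(change))
```

Both size thresholds are strict: a change touching exactly 5 files or exactly 300 lines does not trigger a review on size alone.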
The critic emits structured findings, each with a severity. Only five severities block the task from completing:
| Severity | Blocks? |
|---|---|
| security | ✅ Yes |
| crash | ✅ Yes |
| data_loss | ✅ Yes |
| wrong_algorithm | ✅ Yes |
| missing_ac (unmet acceptance criteria) | ✅ Yes |
| style | ❌ No (recorded as debt) |
| perf | ❌ No (recorded as debt) |
| other | ❌ No (recorded as debt) |
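To make the severity table concrete, here is a small Python sketch that splits critic findings into blocking findings and debt. The `Finding` type and `split_findings` helper are hypothetical; only the severity names come from the table. Treating an unrecognized severity as debt is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# The five severities the table marks as blocking.
BLOCKING_SEVERITIES = frozenset(
    {"security", "crash", "data_loss", "wrong_algorithm", "missing_ac"}
)


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one structured critic finding."""
    severity: str
    message: str


def split_findings(
    findings: Iterable[Finding],
) -> Tuple[List[Finding], List[Finding]]:
    """Partition findings into (blocking, debt), preserving order."""
    blocking: List[Finding] = []
    debt: List[Finding] = []
    for finding in findings:
        if finding.severity in BLOCKING_SEVERITIES:
            blocking.append(finding)
        else:
            # style, perf, other (and, by assumption, anything unrecognized)
            debt.append(finding)
    return blocking, debt
```

A task with an empty blocking list passes the critic even if the debt list is non-empty.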
The bounded fix loop and completed_with_debt
The verification evaluator is permanent — it always runs. Only the retry budget is bounded. When verification fails, the task returns to in_progress with a structured explanation of what needs fixing. After a configurable number of attempts (maxCycles, default 3), the loop stops:
- If the build gate was passing when the budget ran out: the task reaches done as completed_with_debt. The build is green; there are critic findings the loop couldn’t resolve, and they are recorded as debtNotes. The task ships with a paper trail.
- If the build gate was still failing when the budget ran out: the task moves to blocked and a human receives a clear description of why. A broken build never silently ships.
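The outcomes of the bounded loop can be summarized as a small decision function. This Python sketch is an interpretation of the rules above, not the actual evaluator: `Outcome`, `resolve_cycle`, and the way `cycles_used` is counted (including the attempt just verified) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Outcome:
    """Hypothetical result of evaluating one verification cycle."""
    status: str                      # "done", "in_progress", or "blocked"
    completed_with_debt: bool = False
    debt_notes: List[str] = field(default_factory=list)


def resolve_cycle(cycles_used: int, gate_passed: bool,
                  blocking_findings: List[str],
                  max_cycles: int = 3) -> Outcome:
    """Decide where the task goes after a verification attempt."""
    if gate_passed and not blocking_findings:
        return Outcome("done")
    if cycles_used < max_cycles:
        # Budget remains: send the task back for another fix attempt.
        return Outcome("in_progress")
    if gate_passed:
        # Green build, unresolved critic findings: ship with a paper trail.
        return Outcome("done", True, list(blocking_findings))
    # Red build and no budget left: a human has to look at it.
    return Outcome("blocked")
```

Note that exhausting the budget changes only the destination of a failed attempt; a fully passing attempt reaches done on any cycle.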
The humanOverride
When a task has a non-promotable verification result and you need to ship it anyway, you can use humanOverride on the board task. This is the only way to move a task with a known-failing result to done. Every override is written to the audit log — including which result was overridden, the previous status, and who requested it. Overrides are never silent.
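A minimal Python sketch of the guarded transition with an audited override might look like this. Everything here is hypothetical: the `Task` fields, the `move_to_done` helper, the plain list standing in for the audit log, and the simplification of the verification result to the strings "passed" and "failed". The exception stands in for the API's verification_required rejection.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class VerificationRequired(Exception):
    """Raised when a task without a promotable result is moved to done."""


@dataclass
class Task:
    task_id: str
    status: str
    verification: str  # simplified: "passed" or "failed"


def move_to_done(task: Task, audit_log: List[Dict[str, str]],
                 override_by: Optional[str] = None) -> Task:
    """Move a task to done, allowing a failing result only via an override."""
    if task.verification != "passed":
        if override_by is None:
            raise VerificationRequired(task.task_id)
        # Overrides are never silent: record what was overridden and by whom.
        audit_log.append({
            "task": task.task_id,
            "overridden_result": task.verification,
            "previous_status": task.status,
            "requested_by": override_by,
        })
    task.status = "done"
    return task
```

The audit entry is written before the status changes, so the log captures the status the task held at the moment of the override.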
What verification does not cover
- Read-only tasks. Verification only applies to file-mutating work with an isolated worktree and a non-empty diff.
- Non-worktree runs. Tasks without a worktree carry no verification result and reach done un-gated.
- The quality of your verify command. Verification reads an exit code; it cannot validate that your VERIFY_CMD actually exercises the change. A weak test suite yields a weak gate.
The Board
The state machine the → done gate lives in.
Governance
Budgets, circuit breakers, and caps that bound runs.
Observability
Where verification events and verdicts appear in traces.
Board API
The REST surface, including humanOverride and 409 verification_required.