Delegation and orchestration

When a team’s leader decides a teammate should do something, that decision has to become real work: claimed, tracked, recovered if the run dies, and reported back when it’s done. Clawboo’s orchestrator is the machine that turns a delegation signal into a board task and shepherds it to completion. The orchestrator observes every team participant’s RuntimeEvent stream and reads structured delegation signals from typed lifecycle events, never from a regex over prose. A signal becomes a board task that is created, atomically claimed, and delivered; the result round-trips back to the board and is reflected to the leader for synthesis. This page explains the structured-first contract, how a delegation flows end-to-end, how plans and fan-out are handled, the depth cap and the single reduce point, and what happens when a delegate fails or goes silent.

What it is, and what it isn’t

Orchestration is board-driven. A delegation does not live in chat scrollback; it lives as a durable board task with a claim, an execution row, and a status. Chat is narration of that state; the orchestrator reflects board mutations into the team room, but a chat message is never a write path back to the board. This is the same authority rule the board enforces: the board is canonical, peer chat is narration.

The orchestrator is structured-first. It reads exactly two kinds of typed signal and nothing else:

a sessions_send tool-call event (the primary signal), and
a <delegate> or <plan> directive parsed once from a terminal done summary (the structured-output contract).

It does not scan prose for natural-language delegation intent. An earlier engine matched nine natural-language patterns (@Name, please …, route to @Name, and so on) plus an implicit fan-out heuristic over plural prose; that engine was retired. The structured-tag parsing the lifecycle path still needs was relocated into a pure parsing module (delegationTags.ts) that does tag extraction and stripping only; it does not interpret prose. Fan-out survives, but as a structured fact: two or more structured delegations in one turn become two or more parallel tasks.

Agents are instructed to emit the structured tags. The team’s AGENTS.md (generated by buildTeamAgentsMd) tells each agent the required <delegate to="@Name">task</delegate> form and explicitly warns that plain prose @-mentions “are only a fallback and may not route.” The tag parser tolerates common model drift (a dropped <, smart quotes, a mangled opener) by anchoring on the closing </delegate> plus the to="…"> attribute shape; prose effectively never contains that shape, so the tolerance recovers weak-model output without false positives.
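The drift tolerance described above can be illustrated with a small sketch. This is not the actual delegationTags.ts module; the function name parseDelegateTags and the ParsedDelegation shape are hypothetical, and the sketch covers extraction only (not stripping). It anchors on the to="…"> attribute shape plus the closing </delegate>, so a dropped < or curly quotes still parse, while plain prose yields nothing.

```typescript
// Hypothetical sketch: tolerant <delegate> extraction, assuming the tag form
// <delegate to="@Name">task</delegate> described in AGENTS.md.
interface ParsedDelegation {
  target: string; // roster name without the leading "@"
  task: string;
}

function parseDelegateTags(summary: string): ParsedDelegation[] {
  // Normalise curly double quotes so to=“@Name” parses like to="@Name".
  const text = summary.replace(/[\u201C\u201D]/g, '"');

  // Anchor on the attribute shape and the closing tag. The opener itself
  // ("<delegate") is not required, which is what tolerates a dropped "<"
  // or a mangled opener.
  const shape = /to\s*=\s*"@?([^"<>]+)"\s*>([\s\S]*?)<\/delegate>/gi;

  const found: ParsedDelegation[] = [];
  let match: RegExpExecArray | null;
  while ((match = shape.exec(text)) !== null) {
    const target = match[1].trim();
    const task = match[2].trim();
    // Empty targets or bodies are not delegations.
    if (target !== "" && task !== "") {
      found.push({ target, task });
    }
  }
  return found;
}
```

Under these assumptions, a prose sentence such as “@Ada, please write tests” produces an empty list, because it contains neither the attribute shape nor the closing tag.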

The model

A delegation flows from the delegator’s terminal event through the board to the delegate, and the delegate’s result flows back. The orchestrator is the per-team mediator in the middle.

The orchestrator core is a pure, framework-free function (createBoardOrchestrator) holding small in-memory maps that bridge a session to its task. A thin React hook (useBoardOrchestration) wires it to the live OpenClaw RuntimeAdapter: it subscribes to each participant’s event stream, feeds events to the core, routes deliveries through a nudge-queue, and feeds board mutations to the read-only projection store. The hook runs only while the team chat is mounted, the Gateway is connected, and history hydration has completed, so hydrated history is never replayed as new work.

How it works

Reading a signal

For each observed event, the orchestrator extracts signals from typed fields only:

A tool-call event whose name matches sessions_send becomes a single delegation, with the target resolved from the call’s sessionKey / agentId / label against the team roster.
A done event with a <plan> block becomes an ordered list of plan steps.
A done event with <delegate> blocks (and no plan) becomes a list of independent parallel delegations.

Self-delegations, unknown targets, and empty task bodies are filtered out. Anything that is not one of these typed shapes yields nothing; there is no prose fallback.
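As a rough illustration of the rules above, the sketch below reads signals from typed fields and applies the three filters. The event shapes are simplified assumptions, not the real RuntimeEvent type: the ObservedEvent union, the message argument, and the readSignals name are all hypothetical, and the done event is assumed to carry already-parsed delegations.

```typescript
// Hypothetical, simplified event shapes; the real RuntimeEvent is richer.
interface Delegation {
  target: string; // agent id resolved against the team roster
  task: string;
}

type ObservedEvent =
  | { type: "tool-call"; name: string; args: { agentId?: string; label?: string; message?: string } }
  | { type: "done"; delegations: Delegation[] } // tags already parsed from the summary
  | { type: "text"; text: string };

function readSignals(event: ObservedEvent, selfId: string, roster: Set<string>): Delegation[] {
  let candidates: Delegation[] = [];

  if (event.type === "tool-call" && event.name === "sessions_send") {
    // Primary signal: one delegation per sessions_send call.
    candidates = [{
      target: event.args.agentId ?? event.args.label ?? "",
      task: event.args.message ?? "",
    }];
  } else if (event.type === "done") {
    // Structured-output contract: delegations parsed once from the terminal summary.
    candidates = event.delegations;
  }
  // Any other event, including free text, yields nothing: no prose fallback.

  return candidates.filter(
    (d) => d.target !== selfId && roster.has(d.target) && d.task.trim() !== "",
  );
}
```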

Deriving a task

Each independent delegation runs through spawn, which is the DERIVE step of the board fusion:

Resolve the target’s session; if it has none, tell the delegator (so it never waits forever) and stop.
Refuse if the same (target, task) has already failed MAX_DELEGATION_FAILURES (3) times in a row. This is a code-level loop breaker that the model’s own judgement can’t override.
Refuse if the source task is already at the depth cap.
If the delegation is risky (a heuristic flags destructive/external verbs), surface it on the leader’s approval queue and proceed only on allow_once / allow_always.
Create the board task (recording parentTaskId and a sourceDelegationId that encodes the delegator and, when deferred, the target), then atomically claim it, open an execution row, and deliver the task message.

A claim that comes back { ok: false } is a 409: someone else owns the work. It is never retried, which is the same rule the board itself follows.
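The order of the guards in the DERIVE step can be sketched as a pure decision function. Only MAX_DELEGATION_FAILURES and the allow_once / allow_always values come from the text above; the SpawnRequest fields, the reason strings, and the decideSpawn name are hypothetical, and the sketch stops before the create, claim, and deliver calls.

```typescript
const MAX_DELEGATION_FAILURES = 3;

// Hypothetical input: everything the guards need, already looked up.
interface SpawnRequest {
  targetHasSession: boolean;
  consecutiveFailures: number; // for this (target, task) pair
  sourceAtDepthCap: boolean;
  risky: boolean;              // result of the destructive/external-verb heuristic
  approval?: string;           // leader's answer, if the delegation was queued
}

type SpawnDecision =
  | { proceed: true }
  | { proceed: false; reason: "no-session" | "loop-breaker" | "depth-cap" | "not-approved" };

function decideSpawn(req: SpawnRequest): SpawnDecision {
  // 1. No session: tell the delegator and stop, so it never waits forever.
  if (!req.targetHasSession) return { proceed: false, reason: "no-session" };

  // 2. Loop breaker: enforced in code, below the model.
  if (req.consecutiveFailures >= MAX_DELEGATION_FAILURES) {
    return { proceed: false, reason: "loop-breaker" };
  }

  // 3. Depth cap on the source task.
  if (req.sourceAtDepthCap) return { proceed: false, reason: "depth-cap" };

  // 4. Risky delegations proceed only on an explicit allow.
  if (req.risky && req.approval !== "allow_once" && req.approval !== "allow_always") {
    return { proceed: false, reason: "not-approved" };
  }

  // 5. Safe to create the task, claim it atomically, and deliver.
  return { proceed: true };
}
```

A caller would then create and claim the task only on { proceed: true }, and would treat a claim result of { ok: false } as final rather than retrying.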

Serial delivery and deferral

An agent has one team-scoped session, so it works delegations serially. If a target’s session is already running another delegated task, the orchestrator does not overwrite the session-to-task mapping (which would orphan the first task and misattribute its completion). Instead it creates the second task as a durable todo with an :agent: marker in its sourceDelegationId, and the ready-pump fires it once the session frees. Delivery itself is routed through a non-destructive nudge-queue: a message to a busy session is queued FIFO and sent at the next turn boundary, never interrupting an in-flight run.
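The non-destructive delivery rule can be sketched as a small per-session FIFO. This is an illustrative model only: createNudgeQueue, deliver, and onTurnEnd are hypothetical names, and the sketch assumes that sending a message starts a turn and that the caller reports each turn boundary.

```typescript
type Send = (sessionKey: string, message: string) => void;

// Hypothetical sketch of a nudge-queue: messages to a busy session wait
// in FIFO order and go out one at a time at turn boundaries.
function createNudgeQueue(send: Send) {
  const busy = new Set<string>();
  const pending = new Map<string, string[]>();

  function dispatch(sessionKey: string, message: string): void {
    busy.add(sessionKey); // assumption: a sent message starts a turn
    send(sessionKey, message);
  }

  return {
    deliver(sessionKey: string, message: string): void {
      if (busy.has(sessionKey)) {
        // Never interrupt an in-flight run; queue behind earlier messages.
        const queue = pending.get(sessionKey) ?? [];
        queue.push(message);
        pending.set(sessionKey, queue);
      } else {
        dispatch(sessionKey, message);
      }
    },
    onTurnEnd(sessionKey: string): void {
      busy.delete(sessionKey);
      const next = pending.get(sessionKey)?.shift();
      if (next !== undefined) dispatch(sessionKey, next);
    },
  };
}
```

Sending only one queued message per turn boundary keeps the session serial, which matches the one-task-per-session mapping the deferral logic protects.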

Plans become dependency chains

A <plan> of ordered <step> blocks becomes a durable dependency chain on the board: step i is linked to depend on step i−1, so only step 0 starts ready. Each step’s intended target is encoded in its task so the chain survives a refresh. When a step completes, the orchestrator runs pumpReady, which claims and delivers any step whose blocker just finished (auto-unblock). The board’s ready query (todo, not dropped, every dependency done) and the atomic claim are the final arbiters against double-firing.
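The chain and the ready query can be modelled in a few lines. This is a sketch under assumptions: the BoardTask shape, the id scheme, and the function names planToChain and readyTasks are hypothetical, and the atomic claim that follows the ready query is left out.

```typescript
type TaskStatus = "backlog" | "todo" | "blocked" | "done";

// Hypothetical minimal board row; the real task carries more fields.
interface BoardTask {
  id: string;
  title: string;
  status: TaskStatus;
  dropped: boolean;
  dependsOn: string[];
}

// Step i depends on step i-1, so only step 0 has no blocker.
function planToChain(planId: string, steps: string[]): BoardTask[] {
  return steps.map((title, i): BoardTask => ({
    id: `${planId}-step-${i}`,
    title,
    status: "todo",
    dropped: false,
    dependsOn: i === 0 ? [] : [`${planId}-step-${i - 1}`],
  }));
}

// Ready = todo, not dropped, and every dependency done.
function readyTasks(board: BoardTask[]): BoardTask[] {
  const done = new Set(board.filter((t) => t.status === "done").map((t) => t.id));
  return board.filter(
    (t) => t.status === "todo" && !t.dropped && t.dependsOn.every((id) => done.has(id)),
  );
}
```

Because readiness is recomputed from board state each time, a refresh between steps loses nothing: marking step 0 done is enough for step 1 to appear in the next ready query.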

Round-trip and reflect

When a delegate finishes successfully, the orchestrator runs the ROUND-TRIP and REFLECT steps: it moves the task to done, closes the execution as succeeded, and records the summary (not the transcript) as a report-up comment on the task. Completed tasks are batched within a short window into one [Task Update] message delivered to the reduce point: the immediate delegator (so a mid-chain delegator isn’t left standing), falling back to the leader. The leader synthesizes across the batch. A success also resets the failure loop breaker for that (agent, task).

The single reduce point

The depth cap plus report-up-by-default establish a single point where results converge. A sub-task reports to its parent; a top-level task reports to the leader. Children are told (via a child tool blocklist on delivery) not to use sessions_send, so they can’t fan work out further on their own; recursion is bounded by the board’s ancestor chain, not by a prompt.

The depth cap

A source task at ancestor-depth MAX_SPAWN_DEPTH (2) or deeper may not spawn children. Depth is read from the board’s ancestor chain (getTask(...).ancestors.length), so it is enforced by durable state rather than by trusting the model. A leader- or user-initiated turn has no source task; it is depth 0 and may always delegate. Hitting the cap is reported: a system comment lands on the source task and the delegator is told to handle the work directly or report it back.
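The depth check itself is small. In the sketch below, MAX_SPAWN_DEPTH and the ancestors.length reading come from the text above, while the SourceTask shape and the maySpawn name are hypothetical; a missing source task is modelled as null.

```typescript
const MAX_SPAWN_DEPTH = 2;

// Hypothetical minimal view of what getTask(...) returns.
interface SourceTask {
  id: string;
  ancestors: string[]; // parent, grandparent, ... read from the board
}

function maySpawn(source: SourceTask | null): boolean {
  // A leader- or user-initiated turn has no source task: depth 0.
  if (source === null) return true;
  // Depth comes from durable board state, not from anything the model says.
  return source.ancestors.length < MAX_SPAWN_DEPTH;
}
```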

Fan-out

Two or more structured delegations in one turn fire as parallel tasks. A per-turn fan-out cap (default 8) bounds how many one turn may spawn; overflow is counted and reported, not silently dropped. A system comment records the cap hit and the delegator is told to re-issue any dropped delegations in a follow-up.
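A minimal sketch of the cap, assuming delegations are accepted in the order they were emitted (the text does not say which ones are dropped). The names DEFAULT_FAN_OUT_CAP and applyFanOutCap are hypothetical; only the default of 8 comes from the text above.

```typescript
const DEFAULT_FAN_OUT_CAP = 8;

// Split one turn's delegations into those that spawn and those that overflow.
// The overflow is returned, not discarded, so it can be counted and reported.
function applyFanOutCap<T>(
  delegations: T[],
  cap: number = DEFAULT_FAN_OUT_CAP,
): { accepted: T[]; overflow: T[] } {
  return {
    accepted: delegations.slice(0, cap),
    overflow: delegations.slice(cap),
  };
}
```

With ten delegations in one turn and the default cap, eight would spawn and two would be reported back for re-issue in a follow-up.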

Failure handling

The orchestrator’s hardest job is making sure a delegating agent is never “left standing”, waiting forever for an answer that will never come. Every way a delegate can fail to deliver a result is handled and reflected:

| Failure | How it’s detected | What happens |
| --- | --- | --- |
| Errored / out of room | a `done` with `reason` `error` / `max_turns`, or a fatal `error` event | task → `blocked`, execution closed `failed`, failure reflected to the delegator |
| Went silent | the idle watchdog (8 min, refreshed on every observed event) | task → `blocked` (timed-out), failure reflected |
| Session dropped | the per-session observer ended outside teardown | task → `blocked`, failure reflected |
| Can’t be delivered | the nudge-queue rejects the send | task failed immediately, delegator told |

When a task is blocked by failure, its downstream plan steps can never become ready, so the orchestrator cancels the still-pending (todo / backlog) transitive dependents and rolls them into the reflection; the delegator learns the whole chain stalled, not just one step. A failure reflection is a [Task Update] entry marked “DID NOT COMPLETE” with the reason, telling the leader to retry, reassign, or report the failure rather than keep waiting.

A user pressing Stop is deliberately not a failure. Stop is detected as a generation change since the delegation was dispatched; the task is released cleanly back to todo (re-runnable), with no blocked status, no dependent cancellation, and no failure reflection. The orchestrator releases the work the user just halted rather than re-amplifying it.
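The cancellation of transitive dependents after a failure amounts to a graph walk over the dependency links. The sketch below is one way to express it, not the orchestrator’s actual code: the ChainTask shape and the pendingDependentsOf name are hypothetical, and it only collects ids rather than mutating the board.

```typescript
// Hypothetical minimal board row for this illustration.
interface ChainTask {
  id: string;
  status: "backlog" | "todo" | "blocked" | "done";
  dependsOn: string[];
}

// Collect every still-pending task that depends, directly or transitively,
// on the failed task. Breadth-first over reverse dependency edges.
function pendingDependentsOf(board: ChainTask[], failedId: string): string[] {
  const toCancel: string[] = [];
  const seen = new Set<string>([failedId]);
  const frontier: string[] = [failedId];

  while (frontier.length > 0) {
    const current = frontier.shift() as string;
    for (const task of board) {
      if (seen.has(task.id) || !task.dependsOn.includes(current)) continue;
      seen.add(task.id);
      frontier.push(task.id);
      // Only pending work is cancelled; other statuses are left alone.
      if (task.status === "todo" || task.status === "backlog") {
        toCancel.push(task.id);
      }
    }
  }
  return toCancel;
}
```

The returned ids are what a failure reflection would list alongside the failed task, so the delegator sees the whole stalled chain at once.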

Design rationale and trade-offs

Prose-scanning orchestration is fundamentally a heuristic: an LLM is a creative writer and will always invent a delegation phrasing the regex didn’t anticipate. The earlier engine’s nine natural-language patterns were a maintenance treadmill, and a missed pattern meant a silently dropped delegation. The structured-first contract flips the burden: the agent is constrained to a machine-parseable directive, so a routed delegation comes from explicit intent rather than a guess. The cost is that the agent must be told the contract (the AGENTS.md instructions) and the parser must tolerate weak-model drift, which it does by anchoring on the closing tag.

Making the board the substrate, rather than chat, buys durability and recoverability that narration can’t. A board task survives a refresh as authority, can be atomically claimed so two clients can’t both run it, and carries an execution ledger that crash recovery reads on restart. The trade-off is a second persistence layer beside each runtime’s own session state, and an orchestrator that must carefully bridge an ephemeral session to a durable task, which is most of the in-memory bookkeeping in the core.

The depth cap, the report-up-to-a-single-reduce-point discipline, the fan-out cap, and the failure loop breaker are all enforced in code, below the model. A prompt can ask an agent to behave; only durable state and a hard counter can guarantee a bounded, terminating fan-out tree. The “leader left standing” class of bug, a delegating agent waiting forever on a dead delegate, is closed structurally by the watchdog plus the failure reflection, not by hoping a run always emits a clean terminal.

Boundaries and non-goals

No prose interpretation. The orchestrator does not infer delegation from natural language. A teammate that wants to delegate must emit a <delegate> / <plan> tag or call sessions_send; an @-mention in prose does not route.
Single team, in the browser. The shipping engine drives the OpenClaw team-chat path in the browser, observing agents over the live Gateway connection. The server-side executor runner is a separate, complementary path for spawned coding-agent runtimes.
Bounded fan-out, not arbitrary recursion. Delegation depth is capped at 2 and per-turn fan-out at 8 by default. These are safety ceilings, not workflow limits; a deep or wide plan must be expressed as a <plan> chain or re-issued across turns.
The orchestrator coordinates; it does not execute. It creates, claims, delivers, and reflects. The actual work happens inside the delegate’s runtime, and the result is reported up as a condensed summary, never the raw transcript.

This documents the v0.2.0 working tree (commit 03b206a). The current npm latest is clawboo@0.1.9, so npx clawboo installs 0.1.9 until the v0.2.0 tag is published. Differences are noted in Known Issues.
