What it is, and what it isn’t
Orchestration is board-driven. A delegation does not live in chat scrollback; it lives as a durable board task with a claim, an execution row, and a status. Chat is narration of that state; the orchestrator reflects board mutations into the team room, but a chat message is never a write path back to the board. This is the same authority rule the board enforces: the board is canonical, peer chat is narration.
The orchestrator is structured-first. It reads exactly two kinds of typed signal and nothing else:
- a sessions_send tool-call event (the primary signal), and
- a <delegate> or <plan> directive parsed once from a terminal done summary (the structured-output contract).
An earlier engine scanned prose with nine natural-language patterns (@Name, please …, route to @Name, and so on) plus an implicit fan-out heuristic over plural prose; that engine was retired. The structured-tag parsing the lifecycle path still needs was relocated into a pure parsing module (delegationTags.ts) that does tag extraction and stripping only; it does not interpret prose. Fan-out survives, but as a structured fact: two or more structured delegations in one turn become two or more parallel tasks.
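A minimal sketch of what "tag extraction and stripping only" can look like. It is not the contents of delegationTags.ts, which this page does not show: the function names, the result shape, and the regular expression are all assumptions made for illustration. The pattern keys on the to="…"> attribute shape and the closing </delegate>, so a dropped < or smart quotes still parse while ordinary prose does not match.

```typescript
// Hypothetical sketch: the real delegationTags.ts exports are not shown here.
interface ParsedDelegation {
  to: string; // target name without the leading "@"
  task: string; // trimmed tag body
}

// Anchor on the to="…"> attribute shape and the closing </delegate>.
// The optional prefix swallows what is left of a mangled opener so that
// stripping removes it too. Straight and smart double quotes are accepted.
function delegateRegex(): RegExp {
  // A fresh instance per call, because a global regex keeps lastIndex state.
  return /(?:<?\s*delegate\s+)?to\s*=\s*["“”]@?([^"“”<>]+)["“”]\s*>([\s\S]*?)<\/delegate>/gi;
}

function extractDelegations(summary: string): ParsedDelegation[] {
  const re = delegateRegex();
  const found: ParsedDelegation[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(summary)) !== null) {
    const [, to = "", task = ""] = match;
    found.push({ to: to.trim(), task: task.trim() });
  }
  return found;
}

// Remove every recognised tag so the remaining text can be shown as narration.
function stripDelegations(summary: string): string {
  return summary.replace(delegateRegex(), "").trim();
}
```

Because the functions only match a fixed shape and never inspect the surrounding wording, a sentence such as "I will hand this to @Bob" yields no delegation.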
Agents are instructed to emit the structured tags. The team’s AGENTS.md (generated by buildTeamAgentsMd) tells each agent the required <delegate to="@Name">task</delegate> form and explicitly warns that plain prose @-mentions “are only a fallback and may not route.” The tag parser tolerates common model drift (a dropped <, smart quotes, a mangled opener) by anchoring on the closing </delegate> plus the to="…"> attribute shape; prose effectively never contains that shape, so the tolerance recovers weak-model output without false positives.
The model
A delegation flows from the delegator’s terminal event through the board to the delegate, and the delegate’s result flows back. The orchestrator is the per-team mediator in the middle. The orchestrator core is a pure, framework-free function (createBoardOrchestrator) holding small in-memory maps that bridge a session to its task. A thin React hook (useBoardOrchestration) wires it to the live OpenClaw RuntimeAdapter: it subscribes to each participant’s event stream, feeds events to the core, routes deliveries through a nudge-queue, and feeds board mutations to the read-only projection store. The hook runs only while the team chat is mounted, the Gateway is connected, and history hydration has completed, so hydrated history is never replayed as new work.
How it works
Reading a signal
For each observed event, the orchestrator extracts signals from typed fields only:
- A tool-call event whose name matches sessions_send becomes a single delegation, with the target resolved from the call’s sessionKey / agentId / label against the team roster.
- A done event with a <plan> block becomes an ordered list of plan steps.
- A done event with <delegate> blocks (and no plan) becomes a list of independent parallel delegations.
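The three rules above can be sketched as one pure function. Every type here is an illustrative assumption rather than the real OpenClaw event or board type: the event union, the message field on the tool call, the order in which sessionKey, agentId, and label are tried, and the injected parseTags function (standing in for the pure tag parser) are all hypothetical.

```typescript
// Illustrative shapes only; these are not the real OpenClaw or board types.
interface Member { name: string; agentId: string; sessionKey: string }
interface Delegation { to: string; task: string }
interface SendArgs { sessionKey?: string; agentId?: string; label?: string; message?: string }
type RuntimeEvent =
  | { type: "tool-call"; name: string; args: SendArgs }
  | { type: "done"; summary: string }
  | { type: "text"; text: string };
type Signal =
  | { kind: "delegation"; delegation: Delegation }
  | { kind: "plan"; steps: Delegation[] }
  | { kind: "parallel"; delegations: Delegation[] };
interface ParsedTags { plan: Delegation[] | null; delegations: Delegation[] }

// Match the call's sessionKey, agentId, or label against the roster.
function resolveTarget(args: SendArgs, roster: Member[]): Member | undefined {
  const label = args.label === undefined ? undefined : args.label.replace(/^@/, "").toLowerCase();
  return roster.find(
    (m) =>
      (args.sessionKey !== undefined && m.sessionKey === args.sessionKey) ||
      (args.agentId !== undefined && m.agentId === args.agentId) ||
      (label !== undefined && m.name.toLowerCase() === label),
  );
}

function readSignal(
  event: RuntimeEvent,
  roster: Member[],
  parseTags: (summary: string) => ParsedTags,
): Signal | null {
  if (event.type === "tool-call") {
    if (event.name !== "sessions_send") return null;
    const target = resolveTarget(event.args, roster);
    if (target === undefined) return null; // assumption: off-roster targets are ignored
    return { kind: "delegation", delegation: { to: target.name, task: event.args.message ?? "" } };
  }
  if (event.type === "done") {
    const tags = parseTags(event.summary);
    // A plan wins over loose <delegate> blocks in the same summary.
    if (tags.plan !== null && tags.plan.length > 0) return { kind: "plan", steps: tags.plan };
    if (tags.delegations.length > 0) return { kind: "parallel", delegations: tags.delegations };
  }
  return null; // prose and every other event type carry no signal
}
```

Keeping the function free of I/O means the same event always yields the same signal, which is what lets a pure core be tested without a live Gateway.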
Deriving a task
Each independent delegation runs through spawn, which is the DERIVE step of the board fusion:
- Resolve the target’s session; if it has none, tell the delegator (so it never waits forever) and stop.
- Refuse if the same (target, task) has already failed MAX_DELEGATION_FAILURES (3) times in a row, a code-level loop breaker the model’s own judgement can’t override.
- Refuse if the source task is already at the depth cap.
- If the delegation is risky (a heuristic flags destructive/external verbs), surface it on the leader’s approval queue and proceed only on allow_once / allow_always.
- Create the board task (recording parentTaskId and a sourceDelegationId that encodes the delegator and, when deferred, the target), then atomically claim it, open an execution row, and deliver the task message.
A claim that returns { ok: false } is a 409: someone else owns the work. It is never retried, the same rule the board itself follows.
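The order of the guards in spawn can be sketched as a pure decision function. The two constants and their values come from the text; the input shape, the decision names, and the afterClaim helper are hypothetical, and the real spawn also performs the board writes that this sketch only names.

```typescript
const MAX_DELEGATION_FAILURES = 3; // value given in the text
const MAX_SPAWN_DEPTH = 2; // value given in the text

// Hypothetical input: everything spawn would look up before deriving a task.
interface SpawnInput {
  targetHasSession: boolean;
  consecutiveFailures: number; // for this (target, task) pair
  sourceDepth: number | null; // null: leader/user-initiated turn, no source task
  risky: boolean; // result of the risk heuristic
  approval: "allow_once" | "allow_always" | null; // null: not approved (yet)
}

type SpawnDecision =
  | { action: "refuse"; reason: "no-session" | "loop-breaker" | "depth-cap" }
  | { action: "await-approval" }
  | { action: "create-and-claim" };

function decideSpawn(input: SpawnInput): SpawnDecision {
  if (!input.targetHasSession) return { action: "refuse", reason: "no-session" };
  if (input.consecutiveFailures >= MAX_DELEGATION_FAILURES) {
    return { action: "refuse", reason: "loop-breaker" };
  }
  if (input.sourceDepth !== null && input.sourceDepth >= MAX_SPAWN_DEPTH) {
    return { action: "refuse", reason: "depth-cap" };
  }
  if (input.risky && input.approval === null) return { action: "await-approval" };
  return { action: "create-and-claim" };
}

// A claim result of { ok: false } means another owner holds the task: stop, never retry.
function afterClaim(claim: { ok: boolean }): "open-execution-and-deliver" | "stop" {
  return claim.ok ? "open-execution-and-deliver" : "stop";
}
```

Checking the cheap refusals first means a refused delegation never reaches the approval queue or creates a board task.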
Serial delivery and deferral
An agent has one team-scoped session, so it works delegations serially. If a target’s session is already running another delegated task, the orchestrator does not overwrite the session-to-task mapping (which would orphan the first task and misattribute its completion). Instead it creates the second task as a durable todo with an :agent: marker in its sourceDelegationId, and the ready-pump fires it once the session frees. Delivery itself is routed through a non-destructive nudge-queue: a message to a busy session is queued FIFO and sent at the next turn boundary, never interrupting an in-flight run.
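A minimal sketch of the nudge-queue behaviour only; the durable todo deferral lives on the board and is not modelled here. The class name, its methods, and the injected send callback are hypothetical. Two further assumptions are marked in the code: that a sent message makes the session busy, and that one queued message is released per turn boundary.

```typescript
// Hypothetical sketch of a non-destructive per-session FIFO queue.
type Send = (sessionKey: string, message: string) => void;

class NudgeQueue {
  private readonly send: Send;
  private readonly pending = new Map<string, string[]>();
  private readonly busy = new Set<string>();

  constructor(send: Send) {
    this.send = send;
  }

  // Send now if the session is idle; otherwise queue behind the in-flight run.
  deliver(sessionKey: string, message: string): "sent" | "queued" {
    if (this.busy.has(sessionKey)) {
      const queue = this.pending.get(sessionKey) ?? [];
      queue.push(message);
      this.pending.set(sessionKey, queue);
      return "queued";
    }
    this.busy.add(sessionKey); // assumption: a sent message starts a run
    this.send(sessionKey, message);
    return "sent";
  }

  // Call when the session's current run ends (a turn boundary).
  // Assumption: one queued message is released per boundary, oldest first.
  onTurnBoundary(sessionKey: string): void {
    this.busy.delete(sessionKey);
    const next = this.pending.get(sessionKey)?.shift();
    if (next !== undefined) this.deliver(sessionKey, next);
  }
}
```

Nothing in the queue ever cancels or replaces the running turn, which is the sense in which delivery is non-destructive.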
Plans become dependency chains
A <plan> of ordered <step> blocks becomes a durable dependency chain on the board: step i is linked to depend on step i−1, so only step 0 starts ready. Each step’s intended target is encoded in its task so the chain survives a refresh. When a step completes, the orchestrator runs pumpReady, which claims and delivers any step whose blocker just finished (auto-unblock). The board’s ready query (todo, not dropped, every dependency done) and the atomic claim are the final arbiters against double-firing.
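The chain construction and the ready query can be sketched in a few lines. The task shape, the id scheme, and the in_progress status name are assumptions for illustration; the ready rule itself (todo, not dropped, every dependency done) is the one stated above.

```typescript
// Hypothetical board row; "in_progress" is an assumed name for a claimed task.
type Status = "backlog" | "todo" | "in_progress" | "blocked" | "done";
interface BoardTask {
  id: string;
  title: string;
  target: string; // intended assignee, stored on the task so it survives a refresh
  status: Status;
  dropped: boolean;
  dependsOn: string[];
}
interface PlanStep { to: string; task: string }

// Step i depends on step i-1, so only step 0 has no blocker.
function planToChain(planId: string, steps: PlanStep[]): BoardTask[] {
  return steps.map((step, i): BoardTask => ({
    id: `${planId}:${i}`,
    title: step.task,
    target: step.to,
    status: "todo",
    dropped: false,
    dependsOn: i === 0 ? [] : [`${planId}:${i - 1}`],
  }));
}

// Ready = todo, not dropped, and every dependency done.
function readyTasks(board: BoardTask[]): BoardTask[] {
  const doneIds = new Set(board.filter((t) => t.status === "done").map((t) => t.id));
  return board.filter(
    (t) => t.status === "todo" && !t.dropped && t.dependsOn.every((id) => doneIds.has(id)),
  );
}
```

A pump that re-runs readyTasks after each completion will see exactly one newly ready step in a linear chain; the atomic claim then decides which client actually runs it.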
Round-trip and reflect
When a delegate finishes successfully, the orchestrator runs the ROUND-TRIP and REFLECT steps: it moves the task to done, closes the execution as succeeded, and records the summary (not the transcript) as a report-up comment on the task. Completed tasks are batched within a short window into one [Task Update] message delivered to the reduce point: the immediate delegator (so a mid-chain delegator isn’t left standing), falling back to the leader. The leader synthesizes across the batch. A success also resets the failure loop breaker for that (agent, task).
The single reduce point
The depth cap plus report-up-by-default establish a single point where results converge. A sub-task reports to its parent; a top-level task reports to the leader. Children are told (via a child tool blocklist on delivery) not to use sessions_send, so they can’t fan work out further on their own; recursion is bounded by the board’s ancestor chain, not by a prompt.
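Routing a batch of results to its reduce point can be sketched as a grouping step. The record fields and the line layout of the message are assumptions; only the [Task Update] header, the "DID NOT COMPLETE" marker, and the delegator-then-leader fallback come from this page. The batching window itself (a timer) is left out.

```typescript
// Hypothetical result record; the field names are assumptions.
interface TaskResult {
  taskId: string;
  delegator: string | null; // immediate delegator, if one is known
  summary: string; // condensed summary, never the transcript
  completed: boolean;
}

// Group one batch by reduce point: the immediate delegator, else the leader.
function buildTaskUpdates(batch: TaskResult[], leader: string): Map<string, string> {
  const lines = new Map<string, string[]>();
  for (const result of batch) {
    const recipient = result.delegator ?? leader;
    const status = result.completed ? "done" : "DID NOT COMPLETE";
    const entry = `- ${result.taskId} (${status}): ${result.summary}`;
    lines.set(recipient, [...(lines.get(recipient) ?? []), entry]);
  }
  const messages = new Map<string, string>();
  lines.forEach((entries, recipient) => {
    messages.set(recipient, ["[Task Update]", ...entries].join("\n"));
  });
  return messages;
}
```

Each recipient gets one message per batch, so a delegator with several finished children synthesizes across them in a single turn.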
The depth cap
A source task at ancestor-depth MAX_SPAWN_DEPTH (2) or deeper may not spawn children. Depth is read from the board’s ancestor chain (getTask(...).ancestors.length), so it is enforced by durable state rather than by trusting the model. A leader/user-initiated turn has no source task; it is depth 0 and may always delegate. Hitting the cap is reported: a system comment lands on the source task and the delegator is told to handle the work directly or report it back.
Fan-out
Two or more structured delegations in one turn fire as parallel tasks. A per-turn fan-out cap (default 8) bounds how many one turn may spawn; overflow is counted and reported, not silently dropped. A system comment records the cap hit and the delegator is told to re-issue any dropped delegations in a follow-up.
Failure handling
The orchestrator’s hardest job is making sure a delegating agent is never “left standing”, waiting forever for an answer that will never come. Every way a delegate can fail to deliver a result is handled and reflected:

| Failure | How it’s detected | What happens |
|---|---|---|
| Errored / out of room | a done with reason error / max_turns, or a fatal error event | task → blocked, execution closed failed, failure reflected to the delegator |
| Went silent | the idle watchdog (8 min, refreshed on every observed event) | task → blocked (timed-out), failure reflected |
| Session dropped | the per-session observer ended outside teardown | task → blocked, failure reflected |
| Can’t be delivered | the nudge-queue rejects the send | task failed immediately, delegator told |

A failure also cancels the task’s not-yet-started (todo / backlog) transitive dependents and rolls them into the reflection; the delegator learns the whole chain stalled, not just one step. A failure reflection is a [Task Update] entry marked “DID NOT COMPLETE” with the reason, telling the leader to retry, reassign, or report the failure rather than keep waiting.
A user pressing Stop is deliberately not a failure. Stop is detected as a generation change since the delegation was dispatched; the task is released cleanly back to todo (re-runnable), with no blocked status, no dependent cancellation, and no failure reflection. The orchestrator releases the work the user just halted rather than re-amplifying it.
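The detection paths in the table and the Stop rule can be sketched as one classifier. The observation and outcome shapes are hypothetical, as is the choice to treat any terminal observation after a generation change as a Stop; the 8-minute idle window and the error / max_turns reasons come from the table. The "can't be delivered" path is not modelled, since it fails at send time rather than on an observed event.

```typescript
const IDLE_TIMEOUT_MS = 8 * 60 * 1000; // 8 minutes, per the table above

interface Tracking {
  dispatchedGeneration: number; // generation when the delegation was dispatched
  lastEventAt: number; // ms timestamp, refreshed on every observed event
}

// Hypothetical observation shapes for the detection paths in the table.
type Observation =
  | { type: "done"; reason: string }
  | { type: "fatal-error" }
  | { type: "observer-ended"; duringTeardown: boolean }
  | { type: "watchdog-tick" };

type Outcome =
  | { kind: "done" }
  | { kind: "blocked"; reason: string }
  | { kind: "released" }; // back to todo, re-runnable

function failureReason(obs: Observation, tracking: Tracking, now: number): string | null {
  switch (obs.type) {
    case "done":
      return obs.reason === "error" || obs.reason === "max_turns" ? obs.reason : null;
    case "fatal-error":
      return "error";
    case "observer-ended":
      return obs.duringTeardown ? null : "session-dropped";
    case "watchdog-tick":
      return now - tracking.lastEventAt >= IDLE_TIMEOUT_MS ? "timed-out" : null;
  }
}

// Returns null while the delegation is still in flight.
function classify(
  obs: Observation,
  tracking: Tracking,
  currentGeneration: number,
  now: number,
): Outcome | null {
  const reason = failureReason(obs, tracking, now);
  const terminal = reason !== null || obs.type === "done";
  if (!terminal) return null;
  // Assumption: a generation change since dispatch means the user pressed Stop.
  if (currentGeneration !== tracking.dispatchedGeneration) return { kind: "released" };
  return reason === null ? { kind: "done" } : { kind: "blocked", reason };
}
```

Passing the clock and the generation in as arguments keeps the classifier pure, so the watchdog boundary can be checked without waiting eight minutes.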
Design rationale and trade-offs
Prose-scanning orchestration is fundamentally a heuristic: an LLM is a creative writer and will always invent a delegation phrasing the regex didn’t anticipate. The earlier engine’s nine natural-language patterns were a maintenance treadmill, and a missed pattern meant a silently dropped delegation. The structured-first contract flips the burden: the agent is constrained to a machine-parseable directive, so a routed delegation comes from explicit intent rather than a guess. The cost is that the agent must be told the contract (the AGENTS.md instructions) and the parser must tolerate weak-model drift, which it does, by anchoring on the closing tag.
Making the board the substrate, rather than chat, buys durability and recoverability that narration can’t. A board task survives a refresh as authority, can be atomically claimed so two clients can’t both run it, and carries an execution ledger that crash recovery reads on restart. The trade-off is a second persistence layer beside each runtime’s own session state, and an orchestrator that must carefully bridge an ephemeral session to a durable task, which is most of the in-memory bookkeeping in the core.
The depth cap, the report-up-to-a-single-reduce-point discipline, the fan-out cap, and the failure loop breaker are all enforced in code, below the model. A prompt can ask an agent to behave; only durable state and a hard counter can guarantee a bounded, terminating fan-out tree. The “leader left standing” class of bug, a delegating agent waiting forever on a dead delegate, is closed structurally by the watchdog plus the failure reflection, not by hoping a run always emits a clean terminal.
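Two of these hard limits are small enough to sketch directly. The default cap of 8 and the limit of 3 consecutive failures come from this page; the function and class names, the choice to keep the first delegations and count the rest, and the key encoding are assumptions.

```typescript
const DEFAULT_FAN_OUT_CAP = 8; // default given in the text

// Keep the first `cap` delegations; count the rest so the overflow can be reported.
function capFanOut<T>(
  delegations: T[],
  cap: number = DEFAULT_FAN_OUT_CAP,
): { accepted: T[]; dropped: number } {
  const accepted = delegations.slice(0, cap);
  return { accepted, dropped: delegations.length - accepted.length };
}

// Consecutive-failure counter per (agent, task): a hard limit no prompt can override.
class FailureBreaker {
  private readonly limit: number;
  private readonly counts = new Map<string, number>();

  constructor(limit: number = 3) {
    this.limit = limit;
  }

  private key(agent: string, task: string): string {
    return JSON.stringify([agent, task]);
  }

  recordFailure(agent: string, task: string): void {
    const k = this.key(agent, task);
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }

  // A success resets the streak for that pair.
  recordSuccess(agent: string, task: string): void {
    this.counts.delete(this.key(agent, task));
  }

  isTripped(agent: string, task: string): boolean {
    return (this.counts.get(this.key(agent, task)) ?? 0) >= this.limit;
  }
}
```

Both checks are plain counters, so the bound holds regardless of what the model writes in its next turn.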
Boundaries and non-goals
- No prose interpretation. The orchestrator does not infer delegation from natural language. A teammate that wants to delegate must emit a <delegate> / <plan> tag or call sessions_send; an @-mention in prose does not route.
- Single team, in the browser. The shipping engine drives the OpenClaw team-chat path in the browser, observing agents over the live Gateway connection. The server-side executor runner is a separate, complementary path for spawned coding-agent runtimes.
- Bounded fan-out, not arbitrary recursion. Delegation depth is capped at 2 and per-turn fan-out at 8 by default. These are safety ceilings, not workflow limits; a deep or wide plan must be expressed as a <plan> chain or re-issued across turns.
- The orchestrator coordinates; it does not execute. It creates, claims, delivers, and reflects. The actual work happens inside the delegate’s runtime, and the result is reported up as a condensed summary, never the raw transcript.
This documents the v0.2.0 working tree (commit 03b206a). The current npm latest is clawboo@0.1.9, so npx clawboo installs 0.1.9 until the v0.2.0 tag is published. Differences are noted in Known Issues.
See also
- The board, the durable substrate every delegation becomes a task on
- Peer chat, the team room delegations narrate into
- Verification, the builder-≠-judge gate a task crosses to reach done
- Governance, the depth/fan-out/cost caps and the approval gate
- Observability, the event log the orchestrator’s mutations project into
- Board API, the REST surface the orchestrator drives
- Glossary, canonical term definitions