
adlee-was-taken 3b09adf3b2 docs(coordination): add RELAY.md — multi-agent kickoff + relay reference

TL;DR-first guide to the PM/Senior-Dev paradigm: how to invoke
/multi-agent-kickoff, how the launcher's three modes (manual/tmux/kitty)
work, the in-memory queue + per-role inbox semantics, the call.py /
call.ts fallback shims, message kinds, conventions, and troubleshooting.

Lives next to the kickoff prompts in docs/superpowers/coordination/ so
the workflow's docs and outputs share one home.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-05 20:02:48 -04:00

12 KiB


RELAY — Multi-Agent Kickoff & Coordination

How to spin up parallel Claude Code sessions that coordinate over a shared MCP relay. One PM, two or more Devs, each in their own terminal, each on their own branch / worktree, exchanging structured messages.

TL;DR — three commands

# 1. Generate kickoff prompts (interactive: the skill asks the design questions)
#    In any Claude Code session in this repo:
/multi-agent-kickoff

# 2. Start the relay + open the windows
bash tools/relay/start.sh --kitty   # or --tmux, or --manual

# 3. In each new Claude window, paste the prompt below the `---` line
#    from the file the launcher prints (e.g. coordination/<date>-pm-prompt.md)

That's the whole workflow. Everything below is the why and the troubleshooting.

What this is

The "PM/Senior-Dev paradigm" — one Claude session acts as project manager, two or more Claude sessions act as senior developers, each running its own subagents on a feature branch in its own worktree. They coordinate by sending each other typed messages (status / question / directive / free) through a tiny MCP server running locally.

When to use it:

You have 2+ implementation plans that share a release target and want to execute them in parallel under one coordinator.
You want each stream isolated (separate worktree, separate branch) so subagents can't accidentally commit to main or step on each other's files.
You want one human (the user) to be a relay-of-last-resort but not a router — the PM does the routing.

When NOT to use it: one-off tasks, single-stream plans, anything where the overhead of "spin up four windows" exceeds the work itself. For those, just work in the foreground.

The pieces

┌──────────────────┐  HTTP/SSE  ┌──────────────────┐
│  Relay (MCP)     │◀───────────│  PM session      │
│  tools/relay/    │            │  (Claude Code)   │
│  port 7331       │            └──────────────────┘
│                  │            ┌──────────────────┐
│  Per-role inbox  │◀───────────│  Dev-A session   │
│  in-memory       │            │  (Claude Code)   │
│  consume-once    │            └──────────────────┘
│                  │            ┌──────────────────┐
│                  │◀───────────│  Dev-B session   │
│                  │            │  (Claude Code)   │
│                  │            └──────────────────┘
│                  │            ┌─── (optional) ───┐
│                  │◀───────────│  Dev-C session   │
└──────────────────┘            └──────────────────┘

Relay MCP server: tools/relay/server.ts. Serves HTTP/SSE on localhost:7331 and exposes three MCP tools: post_message, read_messages, list_pending. Each connection gets its own MCP server instance, which prevents routing collisions across concurrent SSE clients.
In-memory queue: tools/relay/queue.ts. One inbox per role (pm, dev-a, dev-b, dev-c). Reads are consume-once: a read drains the inbox in FIFO order. There is no TTL, no persistence, and no size cap, because the relay is an ephemeral, dev-only tool; restart the server to wipe state.
Launcher: tools/relay/start.sh. Three modes (manual / tmux / kitty); each starts the relay and then either opens the role windows or prints the commands for you to open them by hand.
Fallback shim: tools/relay/call.py (Python) and tools/relay/call.ts (TS). Direct CLI access to the same MCP tools, for when the in-Claude MCP client isn't loading or you want to script a status check from a regular shell. Both are tracked in the repo and are load-bearing for the multi-agent flow; do not delete them.
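The consume-once inbox semantics can be modelled in a few lines. This is a minimal Python sketch for illustration only: the real implementation is tools/relay/queue.ts, and the class name RelayQueue, its method names, and the idea that a pending check returns a count are all assumptions made here.

```python
from collections import deque

# Role names as listed in this document.
ROLES = ("pm", "dev-a", "dev-b", "dev-c")


class RelayQueue:
    """Illustrative model of a per-role, in-memory, consume-once inbox."""

    def __init__(self):
        # One FIFO inbox per role; nothing is persisted anywhere.
        self._inboxes = {role: deque() for role in ROLES}

    def post(self, sender, recipient, kind, body):
        """Append a message to the recipient's inbox."""
        for role in (sender, recipient):
            if role not in self._inboxes:
                raise ValueError(f"unknown role: {role!r}")
        self._inboxes[recipient].append(
            {"from": sender, "to": recipient, "kind": kind, "body": body}
        )

    def pending_count(self, role):
        """Non-destructive peek (assumed behaviour of a pending check)."""
        return len(self._inboxes[role])

    def read(self, role):
        """Drain the inbox in FIFO order; a second read returns nothing."""
        inbox = self._inboxes[role]
        drained = list(inbox)
        inbox.clear()
        return drained
```

Under this model, posting two directives to dev-a and reading once returns both in order; reading again returns an empty list, which is why the relay cannot serve as a chat history.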

Invocation

Step 1 — Generate the kickoff prompts

In any Claude Code session inside this repo, run:

/multi-agent-kickoff

The skill walks you through a short Q&A (release target, branch names per dev, plan-file paths, coordination cadence) and writes four prompt files to docs/superpowers/coordination/:

<date>-pm-prompt.md
<date>-dev-a-prompt.md
<date>-dev-b-prompt.md
<date>-dev-c-prompt.md   (only if 3 devs)

Each prompt is self-contained: it tells the receiving session its role, its branch / worktree, the plan it owns, and the coordination protocol (block format, when to send status, who to escalate to). The launcher script discovers the latest *-pm-prompt.md / *-dev-a-prompt.md / etc. by mtime, so the most recently generated set wins automatically.

Step 2 — Start the relay

Pick a launcher mode that matches your terminal setup.

--kitty (recommended on kitty)

bash tools/relay/start.sh --kitty

Opens 4 (or 5 with dev-c) tabs in the current kitty window: one for the relay log, one per role. Each role tab launches claude in the repo root. Paste the corresponding prompt into each role tab to start the session.

--tmux (recommended on non-kitty)

bash tools/relay/start.sh --tmux

Creates a tmux session named relay-lift with windows relay, pm, dev-a, dev-b (and dev-c if a fourth prompt is found), then attaches automatically. Switch windows with Ctrl-b followed by the window number, or cycle with Ctrl-b n / Ctrl-b p. Detach with Ctrl-b d.

--manual (for any terminal)

bash tools/relay/start.sh --manual

Starts the relay in the current terminal and prints cat <path> commands for each role. Open new terminals yourself and paste the printed commands; this is the most flexible mode for unusual setups (split panes, remote sessions, terminal multiplexers other than tmux).

The launcher uses port 7331. If the port is already in use, the script aborts and prints the kill command; kill $(lsof -ti:7331) clears it.
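If you want to check the port before launching, a small probe is enough. The sketch below is not the launcher's own check (start.sh is not reproduced here); the function name port_in_use is hypothetical, and it assumes that a successful TCP connect to localhost means something is already listening.

```python
import socket

RELAY_PORT = 7331  # port used by the launcher, per this document


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(RELAY_PORT):
        print(f"port {RELAY_PORT} is busy; stop the old relay first")
    else:
        print(f"port {RELAY_PORT} is free")
```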

Step 3 — Drive the coordination

The PM session is the entry point. Talk to PM about goals; PM decides who's working on what and posts directives to the dev sessions via the relay. Each dev reads its inbox, executes, and posts back status / questions. The user (you) is mostly a watcher — PM should self-route.

Common rhythm:

PM at start: posts a directive to each dev describing the first slice.
Dev on completion: posts status with branch / commit / what shipped.
Dev when blocked: posts a question; PM unblocks (decision) or escalates to user.
PM end-of-cycle: asks each dev for a status, summarizes, decides next slice.

Message kinds (MessageKind in queue.ts):

| Kind | Use when |
| --- | --- |
| `status` | "I shipped X, branch is at Y, ready for next slice" |
| `question` | "Should I do A or B? Blocking until I hear back" |
| `directive` | PM-to-dev: "Next, do X. Constraints are Y. Acceptance is Z." |
| `free` | Anything that doesn't fit the above (FYI, side-channel chatter) |

Block format inside body is freeform markdown. The kickoff prompts include the project's preferred block templates.
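To make the message shape concrete, here is a hedged Python sketch of a message with the four kinds above. The field names from / to / kind / body are taken from the shim example in this document; the Message class and its validation rules are assumptions for illustration, not the types defined in queue.ts.

```python
from dataclasses import dataclass

ROLES = {"pm", "dev-a", "dev-b", "dev-c"}
KINDS = {"status", "question", "directive", "free"}


@dataclass(frozen=True)
class Message:
    """Hypothetical client-side model of one relay message."""

    sender: str
    recipient: str
    kind: str
    body: str  # freeform markdown

    def __post_init__(self):
        # Reject unknown roles and kinds before anything is sent.
        if self.sender not in ROLES:
            raise ValueError(f"unknown sender: {self.sender!r}")
        if self.recipient not in ROLES:
            raise ValueError(f"unknown recipient: {self.recipient!r}")
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")

    def to_payload(self):
        """Build the argument object in the shape the shim example uses."""
        return {
            "from": self.sender,
            "to": self.recipient,
            "kind": self.kind,
            "body": self.body,
        }
```

Validating on the client side is a design choice for this sketch; the server's tool schema may enforce the same constraints independently.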

Fallback — when the MCP client misbehaves

If a Claude session can't reach the relay's MCP tools (transient SSE hiccup, MCP server failed to register, sandboxed network), use the shim:

# From any shell, with the relay running on 7331:
python3 tools/relay/call.py read_messages '{"for":"pm"}'
python3 tools/relay/call.py post_message '{"from":"dev-a","to":"pm","kind":"status","body":"shipped X"}'
python3 tools/relay/call.py list_pending '{"for":"dev-b"}'

call.ts is the same surface in TypeScript (bun run tools/relay/call.ts ...) for when you want to script from a TS context. Both shims speak raw MCP over the SSE transport; output is the JSON-RPC response.

The kickoff prompts reference call.py by path, so if the in-Claude MCP client breaks mid-session, the dev can fall back to running python3 tools/relay/call.py ... through the Bash tool and keep coordinating without restarting.
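For scripted checks, the shim can be wrapped from Python. This sketch assumes only the calling convention shown above (tool name, then a JSON argument string); the helper names shim_command and call_shim are hypothetical, and the output is returned as raw text because the exact response layout is not specified here.

```python
import json
import subprocess

SHIM_PATH = "tools/relay/call.py"  # path as given in this document


def shim_command(tool, arguments, shim_path=SHIM_PATH):
    """Build the argv list for one shim invocation."""
    return ["python3", shim_path, tool, json.dumps(arguments)]


def call_shim(tool, arguments):
    """Run the shim and return its stdout (the JSON-RPC response text).

    Requires the relay to be running; raises CalledProcessError if the
    shim exits with a non-zero status.
    """
    result = subprocess.run(
        shim_command(tool, arguments),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Run from the repo root with the relay up on port 7331.
    print(call_shim("list_pending", {"for": "pm"}))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with the JSON payload.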

Where things live

docs/superpowers/coordination/
├── RELAY.md                                  ← you are here
├── <date>-pm-prompt.md                       generated by /multi-agent-kickoff
├── <date>-dev-a-prompt.md
├── <date>-dev-b-prompt.md
├── <date>-dev-c-prompt.md                    (optional, 4-role mode)
└── archive/                                  older kickoff sets

tools/relay/
├── start.sh                                  launcher (manual / tmux / kitty)
├── server.ts                                 MCP server (HTTP/SSE on :7331)
├── queue.ts                                  in-memory per-role FIFO
├── queue.test.ts                             node:test — run with `bun test`
├── call.py                                   Python MCP-client shim (fallback)
├── call.ts                                   TypeScript MCP-client shim (fallback)
├── package.json
└── tsconfig.json

The launcher's prompt-discovery is ls -t "$COORD_DIR"/*-<role>-prompt.md | head -1 — newest wins. To switch back to a previous kickoff set, either delete the newer files or move them under archive/.
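The same newest-wins rule can be expressed in Python, which may help when debugging which prompt the launcher will pick. This is an equivalent sketch rather than the launcher's code; the function name latest_prompt is hypothetical.

```python
from pathlib import Path


def latest_prompt(coord_dir, role):
    """Return the newest *-<role>-prompt.md in coord_dir, or None.

    The glob is not recursive, so files moved under archive/ are
    ignored, matching the shell pattern used by the launcher.
    """
    candidates = list(Path(coord_dir).glob(f"*-{role}-prompt.md"))
    if not candidates:
        return None
    # Newest modification time wins, like `ls -t ... | head -1`.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    for role in ("pm", "dev-a", "dev-b", "dev-c"):
        print(role, latest_prompt("docs/superpowers/coordination", role))
```

Because selection depends on modification time, touching an older prompt file also makes it the winner, which is worth knowing before editing archived sets in place.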

Conventions

Roles are fixed strings: pm, dev-a, dev-b, dev-c. Adding a new role means editing Role in queue.ts, KNOWN_ROLES, the enum in server.ts's tool schema, and the launcher.
Worktree per dev: each dev session works in its own git worktree on its own branch. Subagents must cd into the worktree first — the multi-agent-kickoff skill bakes this rule into the dev prompts (subagents have been known to commit to main if the worktree cwd is only set in a header).
Branch naming follows the release train: feature/<release>-<dev>-<scope>. PM owns the merge order; devs do not merge each other's branches.
No squashing: the project preserves git history as an audit log (per CLAUDE.md). Devs commit small and often; PM coordinates rebases at integration time, not before.
The user is not the router. PM should issue directives directly to devs via the relay. The user steps in only for cross-stream design decisions or when PM explicitly escalates.
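As an illustration of the branch-naming convention, a helper like the following could build names consistently. It is a sketch under assumptions: that <dev> is the role string (dev-a, dev-b, dev-c), that release and scope contain no whitespace or slashes, and the function name branch_name is hypothetical.

```python
DEV_ROLES = ("dev-a", "dev-b", "dev-c")


def branch_name(release, dev, scope):
    """Build feature/<release>-<dev>-<scope> for one dev stream."""
    if dev not in DEV_ROLES:
        raise ValueError(f"unknown dev role: {dev!r}")
    for label, part in (("release", release), ("scope", scope)):
        # Keep the name usable as a single git ref component.
        if not part or "/" in part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid {label}: {part!r}")
    return f"feature/{release}-{dev}-{scope}"
```

For example, branch_name("r1", "dev-a", "queue") yields feature/r1-dev-a-queue, where r1 and queue are placeholder values.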

Troubleshooting

"port 7331 is already in use": another relay is running. Run kill $(lsof -ti:7331), then re-run start.sh.
Launcher can't find a prompt (the printed paths show "(none found)"): /multi-agent-kickoff hasn't been run yet, or all generated prompts are under archive/. Re-run the skill.
Dev session committing to main instead of its worktree: its subagent prompts are missing the force-cd header. Regenerate the dev prompt via /multi-agent-kickoff (the skill bakes in the cd rule) or hand-edit the prompt to start with cd <worktree-path>.
MCP tools don't show up in the Claude session: restart the session. If the problem persists, fall back to call.py. If call.py also fails, check the relay log window for stack traces; the SSE transport sometimes wedges if a client disconnects ungracefully.
bun test failing in tools/relay/: relay tests use node:test via bun. Run them from tools/relay/, not the repo root: cd tools/relay && bun test. Extension tests use vitest and live elsewhere; don't conflate the two.
One dev session is silent: run python3 tools/relay/call.py list_pending '{"for":"<role>"}' from any shell. If that dev's inbox has unread messages, the session may have crashed or detached. Open the role's window and resume.

Caveats

In-memory queue is dev-only. Restart the relay = lose all queued messages. There is no persistence by design — coordination is meant to flow forward, not be replayable.
No auth. The relay binds to localhost:7331 with no token. Don't expose the port; don't run on a shared machine.
The relay is not a chat history. read_messages drains the inbox. If you need to refer back to what was said, copy-paste into a session note or the PR description; don't expect the relay to remember.
Context costs scale with session count. Four parallel Claude sessions burn four context windows. Use this paradigm when the parallel speedup justifies the cost — for sequential work, one session is cheaper.
