docs(coordination): add RELAY.md — multi-agent kickoff + relay reference

TL;DR-first guide to the PM/Senior-Dev paradigm: how to invoke /multi-agent-kickoff, how the launcher's three modes (manual/tmux/kitty) work, the in-memory queue + per-role inbox semantics, the call.py / call.ts fallback shims, message kinds, conventions, and troubleshooting. Lives next to the kickoff prompts in docs/superpowers/coordination/ so the workflow's docs and outputs share one home.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 20:02:48 -04:00
parent 4f7ab91f14
commit 3b09adf3b2
1 changed files with 199 additions and 0 deletions
--- a/docs/superpowers/coordination/RELAY.md
+++ b/docs/superpowers/coordination/RELAY.md
@@ -0,0 +1,199 @@
+# RELAY — Multi-Agent Kickoff & Coordination
+
+How to spin up parallel Claude Code sessions that coordinate over a shared MCP relay. One PM, two or more Devs, each in their own terminal, each on their own branch / worktree, exchanging structured messages.
+
+## TL;DR — three commands
+
+```bash
+# 1. Generate kickoff prompts (interactive: the skill asks a short set of design questions)
+#    In any Claude Code session in this repo:
+/multi-agent-kickoff
+
+# 2. Start the relay + open the windows
+bash tools/relay/start.sh --kitty   # or --tmux, or --manual
+
+# 3. In each new Claude window, paste the part of the prompt file below its `---` line.
+#    The launcher prints each file's path (e.g. docs/superpowers/coordination/<date>-pm-prompt.md)
+```
+
+That's the whole workflow. Everything below is the why and the troubleshooting.
+
+## What this is
+
+The "PM/Senior-Dev paradigm" — one Claude session acts as project manager, two or more Claude sessions act as senior developers, each running its own subagents on a feature branch in its own worktree. They coordinate by sending each other typed messages (status / question / directive / free) through a tiny MCP server running locally.
+
+When to use it:
+
+- You have **2+ implementation plans** that share a release target and want to execute them in parallel under one coordinator.
+- You want each stream isolated (separate worktree, separate branch) so subagents can't accidentally commit to main or step on each other's files.
+- You want one human (the user) to be a relay-of-last-resort but not a router — the PM does the routing.
+
+When NOT to use it: one-off tasks, single-stream plans, anything where the overhead of "spin up four windows" exceeds the work itself. For those, just work in the foreground.
+
+## The pieces
+
+```
+┌──────────────────┐  HTTP/SSE  ┌──────────────────┐
+│  Relay (MCP)     │◀───────────│  PM session      │
+│  tools/relay/    │            │  (Claude Code)   │
+│  port 7331       │            └──────────────────┘
+│                  │            ┌──────────────────┐
+│  Per-role inbox  │◀───────────│  Dev-A session   │
+│  in-memory       │            │  (Claude Code)   │
+│  consume-once    │            └──────────────────┘
+│                  │            ┌──────────────────┐
+│                  │◀───────────│  Dev-B session   │
+│                  │            │  (Claude Code)   │
+│                  │            └──────────────────┘
+│                  │            ┌─── (optional) ───┐
+│                  │◀───────────│  Dev-C session   │
+│                  │            │  (Claude Code)   │
+└──────────────────┘            └──────────────────┘
+```
+
+- **Relay MCP server** — `tools/relay/server.ts`. HTTP/SSE on `localhost:7331`. Exposes three MCP tools: `post_message`, `read_messages`, `list_pending`. Per-connection MCP server instance prevents routing collisions across concurrent SSE clients.
+- **In-memory queue** — `tools/relay/queue.ts`. Per-role inbox (`pm`, `dev-a`, `dev-b`, `dev-c`). `read` is consume-once (FIFO drain). No TTL, no persistence, no cap — relay is dev-only ephemeral; restart the server to wipe state.
+- **Launcher** — `tools/relay/start.sh`. Three modes (manual / tmux / kitty) that all start the relay and either open the role windows or print the commands for you to open them by hand.
+- **Fallback shim** — `tools/relay/call.py` (Python) and `tools/relay/call.ts` (TS). Direct CLI access to the same MCP tools, for when the in-Claude MCP client isn't loading or you want to script a status check from a regular shell. Both are tracked in the repo and load-bearing for the multi-agent flow — do not delete.
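
The inbox semantics listed above (one queue per role, FIFO order, consume-once reads) can be sketched in a few lines. This is a minimal Python illustration, not the code in `tools/relay/queue.ts`; the names `RelayQueue`, `Message`, `post`, `read`, and `list_pending`, and the choice to return a count from `list_pending`, are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

# Role names as listed in this document.
ROLES = ("pm", "dev-a", "dev-b", "dev-c")


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    kind: str
    body: str


class RelayQueue:
    """Hypothetical in-memory per-role inbox (illustration only)."""

    def __init__(self) -> None:
        # One FIFO per role; nothing is persisted, capped, or expired.
        self._inboxes = {role: deque() for role in ROLES}

    def post(self, msg: Message) -> None:
        if msg.recipient not in self._inboxes:
            raise ValueError(f"unknown role: {msg.recipient!r}")
        self._inboxes[msg.recipient].append(msg)

    def list_pending(self, role: str) -> int:
        # Peek only: report how many messages wait, without consuming them.
        return len(self._inboxes[role])

    def read(self, role: str) -> list[Message]:
        # Consume-once: hand back everything in arrival order, then clear.
        inbox = self._inboxes[role]
        drained = list(inbox)
        inbox.clear()
        return drained
```

With this sketch, a second `read` for the same role returns an empty list until something new is posted, which is why the relay cannot double as a message history.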
+
+## Invocation
+
+### Step 1 — Generate the kickoff prompts
+
+In any Claude Code session inside this repo, run:
+
+```
+/multi-agent-kickoff
+```
+
+The skill walks you through a short Q&A (release target, branch names per dev, plan-file paths, coordination cadence) and writes three or four prompt files to `docs/superpowers/coordination/`, depending on how many devs you configure:
+
+```
+<date>-pm-prompt.md
+<date>-dev-a-prompt.md
+<date>-dev-b-prompt.md
+<date>-dev-c-prompt.md   (only if 3 devs)
+```
+
+Each prompt is self-contained: it tells the receiving session its role, its branch / worktree, the plan it owns, and the coordination protocol (block format, when to send status, who to escalate to). The launcher script discovers the latest `*-pm-prompt.md` / `*-dev-a-prompt.md` / etc. by `mtime`, so the most recently generated set wins automatically.
+
+### Step 2 — Start the relay
+
+Pick a launcher mode that matches your terminal setup.
+
+**`--kitty` (recommended on kitty)**
+
+```bash
+bash tools/relay/start.sh --kitty
+```
+
+Opens 4 (or 5 with dev-c) tabs in the current kitty window: one for the relay log, one per role. Each role tab launches `claude` in the repo root. Paste the corresponding prompt into each role tab to start the session.
+
+**`--tmux` (recommended on non-kitty)**
+
+```bash
+bash tools/relay/start.sh --tmux
+```
+
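+Creates a tmux session `relay-lift` with windows `relay`, `pm`, `dev-a`, `dev-b` (and `dev-c` if a fourth prompt is found), then attaches automatically. With tmux's default key bindings, `Ctrl-b n` and `Ctrl-b p` cycle through windows and `Ctrl-b <number>` jumps to a specific one. Detach with `Ctrl-b d`.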
+
+**`--manual` (for any terminal)**
+
+```bash
+bash tools/relay/start.sh --manual
+```
+
+Starts the relay in the current terminal and prints `cat <path>` commands for each role. Open new terminals yourself and paste the printed commands; this is the most flexible mode for unusual setups (split panes, remote sessions, terminal multiplexers other than tmux).
+
+The launcher uses port **7331**. If the port is already in use, the script aborts and prints the command that frees it: `kill $(lsof -ti:7331)`.
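
A pre-flight check along these lines can tell you whether port 7331 is free before you run the launcher. It is a sketch that assumes the relay listens on the loopback interface; `port_in_use` and `RELAY_PORT` are names invented for this example and are not part of `start.sh`.

```python
import socket

RELAY_PORT = 7331  # port named in this document


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "busy" if port_in_use(RELAY_PORT) else "free"
    print(f"port {RELAY_PORT} is {state}")
```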
+
+### Step 3 — Drive the coordination
+
+The PM session is the entry point. Talk to PM about goals; PM decides who's working on what and posts directives to the dev sessions via the relay. Each dev reads its inbox, executes, and posts back status / questions. The user (you) is mostly a watcher — PM should self-route.
+
+Common rhythm:
+
+- **PM at start:** posts a directive to each dev describing the first slice.
+- **Dev on completion:** posts status with branch / commit / what shipped.
+- **Dev when blocked:** posts a question; PM unblocks (decision) or escalates to user.
+- **PM end-of-cycle:** asks each dev for a status, summarizes, decides next slice.
+
+Message kinds (`MessageKind` in `queue.ts`):
+
+| Kind | Use when |
+|------|----------|
+| `status` | "I shipped X, branch is at Y, ready for next slice" |
+| `question` | "Should I do A or B? Blocking until I hear back" |
+| `directive` | PM-to-dev: "Next, do X. Constraints are Y. Acceptance is Z." |
+| `free` | Anything that doesn't fit the above (FYI, side-channel chatter) |
+
+Block format inside `body` is freeform markdown. The kickoff prompts include the project's preferred block templates.
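
To make the message shape concrete, here is a small Python sketch that validates a message before it is posted. The field names `from`, `to`, `kind`, and `body` come from the `call.py` example in this document; the helper `make_message` and its checks are hypothetical, and the real schema in `server.ts` may enforce more or less than this.

```python
# Role and kind names as listed in this document.
ROLES = {"pm", "dev-a", "dev-b", "dev-c"}
KINDS = {"status", "question", "directive", "free"}


def make_message(sender: str, recipient: str, kind: str, body: str) -> dict:
    """Build the argument object for post_message, rejecting unknown values."""
    if sender not in ROLES:
        raise ValueError(f"unknown sender role: {sender!r}")
    if recipient not in ROLES:
        raise ValueError(f"unknown recipient role: {recipient!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown message kind: {kind!r}")
    # The body is passed through untouched: it is freeform markdown.
    return {"from": sender, "to": recipient, "kind": kind, "body": body}
```

For example, `make_message("dev-a", "pm", "status", "shipped X")` yields the same object the shim example posts, while a misspelled kind such as `"staus"` fails before anything reaches the relay.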
+
+## Fallback — when the MCP client misbehaves
+
+If a Claude session can't reach the relay's MCP tools (transient SSE hiccup, MCP server failed to register, sandboxed network), use the shim:
+
+```bash
+# From any shell, with the relay running on 7331:
+python3 tools/relay/call.py read_messages '{"for":"pm"}'
+python3 tools/relay/call.py post_message '{"from":"dev-a","to":"pm","kind":"status","body":"shipped X"}'
+python3 tools/relay/call.py list_pending '{"for":"dev-b"}'
+```
+
+`call.ts` is the same surface in TypeScript (`bun run tools/relay/call.ts ...`) for when you want to script from a TS context. Both shims speak raw MCP over the SSE transport; output is the JSON-RPC response.
+
+The kickoff prompts reference `call.py` by path, so if the in-Claude MCP client breaks mid-session, the dev can run `python3 tools/relay/call.py ...` through the Bash tool and keep coordinating without restarting.
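
If you script the shim rather than typing each call, a helper like the following can generate correctly quoted command lines. It is a sketch that only builds the commands shown above and does not run them; `shim_command` and `pending_checks` are hypothetical names, and the argument order (tool name, then a JSON string) is inferred from the examples in this section.

```python
import json
import shlex

SHIM_PATH = "tools/relay/call.py"  # path used in this document


def shim_command(tool: str, args: dict) -> list[str]:
    """Return the argv for one shim call: python3 call.py <tool> <json>."""
    return ["python3", SHIM_PATH, tool, json.dumps(args, separators=(",", ":"))]


def pending_checks(roles: list[str]) -> list[str]:
    """Return one shell-quoted list_pending command line per role."""
    return [shlex.join(shim_command("list_pending", {"for": role})) for role in roles]


if __name__ == "__main__":
    for line in pending_checks(["pm", "dev-a", "dev-b"]):
        print(line)
```

Passing the list returned by `shim_command` to `subprocess.run` avoids shell quoting entirely; the quoted strings from `pending_checks` are meant for pasting into a terminal.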
+
+## Where things live
+
+```
+docs/superpowers/coordination/
+├── RELAY.md                                  ← you are here
+├── <date>-pm-prompt.md                       generated by /multi-agent-kickoff
+├── <date>-dev-a-prompt.md
+├── <date>-dev-b-prompt.md
+├── <date>-dev-c-prompt.md                    (optional, 4-role mode)
+└── archive/                                  older kickoff sets
+
+tools/relay/
+├── start.sh                                  launcher (manual / tmux / kitty)
+├── server.ts                                 MCP server (HTTP/SSE on :7331)
+├── queue.ts                                  in-memory per-role FIFO
+├── queue.test.ts                             node:test — run with `bun test`
+├── call.py                                   Python MCP-client shim (fallback)
+├── call.ts                                   TypeScript MCP-client shim (fallback)
+├── package.json
+└── tsconfig.json
+```
+
+The launcher's prompt-discovery is `ls -t "$COORD_DIR"/*-<role>-prompt.md | head -1` — newest wins. To switch back to a previous kickoff set, either delete the newer files or move them under `archive/`.
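
The same newest-wins rule can be written in Python, which may help when debugging which prompt the launcher would pick. This is a sketch of the behavior described above, not the launcher's code; `latest_prompt` is a hypothetical helper. Like the shell glob, it does not descend into `archive/`.

```python
from pathlib import Path
from typing import Optional


def latest_prompt(coord_dir: Path, role: str) -> Optional[Path]:
    """Return the most recently modified *-<role>-prompt.md, or None."""
    # Non-recursive glob: files under archive/ are never considered.
    candidates = list(coord_dir.glob(f"*-{role}-prompt.md"))
    if not candidates:
        return None
    # Newest modification time wins, mirroring `ls -t ... | head -1`.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Because selection depends on modification time rather than the date in the filename, touching or editing an older prompt makes it the active one again.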
+
+## Conventions
+
+- **Roles are fixed strings:** `pm`, `dev-a`, `dev-b`, `dev-c`. Adding a new role means editing `Role` in `queue.ts`, `KNOWN_ROLES`, the `enum` in `server.ts`'s tool schema, and the launcher.
+- **Worktree per dev:** each dev session works in its own git worktree on its own branch. Subagents must `cd` into the worktree first; the multi-agent-kickoff skill bakes this rule into the dev prompts, because subagents have been known to commit to `main` when the worktree path is only named in a prompt header rather than enforced with an explicit `cd`.
+- **Branch naming follows the release train:** `feature/<release>-<dev>-<scope>`. PM owns the merge order; devs do not merge each other's branches.
+- **No squashing:** the project preserves git history as audit log (per `CLAUDE.md`). Devs commit small and often; PM coordinates rebases at integration time, not before.
+- **The user is not the router.** PM should issue directives directly to devs via the relay. The user steps in only for cross-stream design decisions or when PM explicitly escalates.
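
The branch-naming convention above can be checked mechanically. The sketch below assumes that the `<dev>` segment is the dev's role name (`dev-a`, `dev-b`, `dev-c`); that mapping is an assumption for illustration, as are the helper names `branch_name` and `is_valid_branch`.

```python
import re

DEV_ROLES = ("dev-a", "dev-b", "dev-c")  # dev role names from this document

# feature/<release>-<dev>-<scope>, with <dev> assumed to be a role name.
BRANCH_RE = re.compile(r"feature/[^/]+?-dev-[a-c]-[^/]+")


def branch_name(release: str, dev: str, scope: str) -> str:
    """Compose a branch name following feature/<release>-<dev>-<scope>."""
    if dev not in DEV_ROLES:
        raise ValueError(f"unknown dev role: {dev!r}")
    return f"feature/{release}-{dev}-{scope}"


def is_valid_branch(name: str) -> bool:
    """Return True if the whole name matches the assumed convention."""
    return BRANCH_RE.fullmatch(name) is not None
```

A check like `is_valid_branch` could run before a dev posts a `status` message, so that a session still sitting on `main` is noticed early.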
+
+## Troubleshooting
+
+- **"port 7331 is already in use"** — another relay is running. `kill $(lsof -ti:7331)`, then re-run `start.sh`.
+- **Launcher can't find a prompt** (`(none found)` in the printed paths) — `/multi-agent-kickoff` hasn't been run yet, or all generated prompts are under `archive/`. Re-run the skill.
+- **Dev session committing to `main` instead of its worktree** — its subagent prompts are missing the force-`cd` header. Regenerate the dev prompt via `/multi-agent-kickoff` (the skill bakes in the cd rule) or hand-edit the prompt to start with `cd <worktree-path>`.
+- **MCP tools don't show up in the Claude session** — restart the session. If it persists, fall back to `call.py`. If `call.py` also fails, check the relay log window for stack traces; the SSE transport sometimes wedges if a client disconnects ungracefully.
+- **`bun test` failing in `tools/relay/`** — relay tests use `node:test` via bun. Run from `tools/relay/`, not the repo root: `cd tools/relay && bun test`. Extension tests use vitest and live elsewhere; don't conflate.
+- **One dev session is silent** — check `python3 tools/relay/call.py list_pending '{"for":"<role>"}'` from any shell. If the dev's inbox has unread messages, they may have crashed or detached. Open the role's window and resume.
+
+## Caveats
+
+- **In-memory queue is dev-only.** Restarting the relay drops every queued message. There is no persistence by design: coordination is meant to flow forward, not be replayable.
+- **No auth.** The relay binds to `localhost:7331` with no token. Don't expose the port; don't run on a shared machine.
+- **The relay is not a chat history.** `read_messages` drains the inbox. If you need to refer back to what was said, copy-paste into a session note or the PR description; don't expect the relay to remember.
+- **Context costs scale with session count.** Four parallel Claude sessions burn four context windows. Use this paradigm when the parallel speedup justifies the cost — for sequential work, one session is cheaper.
+
+## See also
+
+- `tools/relay/server.ts` — MCP tool definitions (`post_message`, `read_messages`, `list_pending`) and their schemas.
+- `tools/relay/queue.ts` — `Role` / `MessageKind` types; the canonical per-role-inbox semantics.
+- `docs/superpowers/coordination/<date>-pm-prompt.md` — the latest PM kickoff (the actual operational instructions PM runs by).
+- The `multi-agent-kickoff` skill — generates the kickoff prompt set.