docs(coordination): add RELAY.md — multi-agent kickoff + relay reference
TL;DR-first guide to the PM/Senior-Dev paradigm: how to invoke /multi-agent-kickoff, how the launcher's three modes (manual/tmux/kitty) work, the in-memory queue + per-role inbox semantics, the call.py / call.ts fallback shims, message kinds, conventions, and troubleshooting. Lives next to the kickoff prompts in docs/superpowers/coordination/ so the workflow's docs and outputs share one home. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
199
docs/superpowers/coordination/RELAY.md
Normal file
199
docs/superpowers/coordination/RELAY.md
Normal file
@@ -0,0 +1,199 @@
|
||||
# RELAY — Multi-Agent Kickoff & Coordination
|
||||
|
||||
How to spin up parallel Claude Code sessions that coordinate over a shared MCP relay. One PM, two or more Devs, each in their own terminal, each on their own branch / worktree, exchanging structured messages.
|
||||
|
||||
## TL;DR — three commands
|
||||
|
||||
```bash
|
||||
# 1. Generate kickoff prompts (interactive — answers the design questions)
|
||||
# In any Claude Code session in this repo:
|
||||
/multi-agent-kickoff
|
||||
|
||||
# 2. Start the relay + open the windows
|
||||
bash tools/relay/start.sh --kitty # or --tmux, or --manual
|
||||
|
||||
# 3. In each new Claude window, paste the prompt below the `---` line
|
||||
# from the file the launcher prints (e.g. coordination/<date>-pm-prompt.md)
|
||||
```
|
||||
|
||||
That's the whole workflow. Everything below is the why and the troubleshooting.
|
||||
|
||||
## What this is
|
||||
|
||||
The "PM/Senior-Dev paradigm" — one Claude session acts as project manager, two or more Claude sessions act as senior developers, each running its own subagents on a feature branch in its own worktree. They coordinate by sending each other typed messages (status / question / directive / free) through a tiny MCP server running locally.
|
||||
|
||||
When to use it:
|
||||
|
||||
- You have **2+ implementation plans** that share a release target and want to execute them in parallel under one coordinator.
|
||||
- You want each stream isolated (separate worktree, separate branch) so subagents can't accidentally commit to main or step on each other's files.
|
||||
- You want one human (the user) to be a relay-of-last-resort but not a router — the PM does the routing.
|
||||
|
||||
When NOT to use it: one-off tasks, single-stream plans, anything where the overhead of "spin up four windows" exceeds the work itself. For those, just work in the foreground.
|
||||
|
||||
## The pieces
|
||||
|
||||
```
|
||||
┌──────────────────┐ HTTP/SSE ┌──────────────────┐
|
||||
│ Relay (MCP) │◀───────────│ PM session │
|
||||
│ tools/relay/ │ │ (Claude Code) │
|
||||
│ port 7331 │ └──────────────────┘
|
||||
│ │ ┌──────────────────┐
|
||||
│ Per-role inbox │◀───────────│ Dev-A session │
|
||||
│ in-memory │ │ (Claude Code) │
|
||||
│ consume-once │ └──────────────────┘
|
||||
└──────────────────┘ ┌──────────────────┐
|
||||
◀────────│ Dev-B session │
|
||||
│ (Claude Code) │
|
||||
└──────────────────┘
|
||||
┌──────── (optional) ───────┐
|
||||
◀────────│ Dev-C session │
|
||||
└───────────────────┘
|
||||
```
|
||||
|
||||
- **Relay MCP server** — `tools/relay/server.ts`. HTTP/SSE on `localhost:7331`. Exposes three MCP tools: `post_message`, `read_messages`, `list_pending`. Per-connection MCP server instance prevents routing collisions across concurrent SSE clients.
|
||||
- **In-memory queue** — `tools/relay/queue.ts`. Per-role inbox (`pm`, `dev-a`, `dev-b`, `dev-c`). `read` is consume-once (FIFO drain). No TTL, no persistence, no cap — relay is dev-only ephemeral; restart the server to wipe state.
|
||||
- **Launcher** — `tools/relay/start.sh`. Three modes (manual / tmux / kitty) that all start the relay and either open the role windows or print the commands for you to open them by hand.
|
||||
- **Fallback shim** — `tools/relay/call.py` (Python) and `tools/relay/call.ts` (TS). Direct CLI access to the same MCP tools, for when the in-Claude MCP client isn't loading or you want to script a status check from a regular shell. Both are tracked in the repo and load-bearing for the multi-agent flow — do not delete.
|
||||
|
||||
## Invocation
|
||||
|
||||
### Step 1 — Generate the kickoff prompts
|
||||
|
||||
In any Claude Code session inside this repo, run:
|
||||
|
||||
```
|
||||
/multi-agent-kickoff
|
||||
```
|
||||
|
||||
The skill walks you through a short Q&A (release target, branch names per dev, plan-file paths, coordination cadence) and writes four prompt files to `docs/superpowers/coordination/`:
|
||||
|
||||
```
|
||||
<date>-pm-prompt.md
|
||||
<date>-dev-a-prompt.md
|
||||
<date>-dev-b-prompt.md
|
||||
<date>-dev-c-prompt.md (only if 3 devs)
|
||||
```
|
||||
|
||||
Each prompt is self-contained: it tells the receiving session its role, its branch / worktree, the plan it owns, and the coordination protocol (block format, when to send status, who to escalate to). The launcher script discovers the latest `*-pm-prompt.md` / `*-dev-a-prompt.md` / etc. by `mtime`, so the most recently generated set wins automatically.
|
||||
|
||||
### Step 2 — Start the relay
|
||||
|
||||
Pick a launcher mode that matches your terminal setup.
|
||||
|
||||
**`--kitty` (recommended on kitty)**
|
||||
|
||||
```bash
|
||||
bash tools/relay/start.sh --kitty
|
||||
```
|
||||
|
||||
Opens 4 (or 5 with dev-c) tabs in the current kitty window: one for the relay log, one per role. Each role tab launches `claude` in the repo root. Paste the corresponding prompt into each role tab to start the session.
|
||||
|
||||
**`--tmux` (recommended on non-kitty)**
|
||||
|
||||
```bash
|
||||
bash tools/relay/start.sh --tmux
|
||||
```
|
||||
|
||||
Creates a tmux session `relay-lift` with windows `relay`, `pm`, `dev-a`, `dev-b` (and `dev-c` if a fourth prompt is found). Attaches automatically. `Ctrl-b N` to navigate windows. Detach with `Ctrl-b d`.
|
||||
|
||||
**`--manual` (for any terminal)**
|
||||
|
||||
```bash
|
||||
bash tools/relay/start.sh --manual
|
||||
```
|
||||
|
||||
Starts the relay in the current terminal and prints `cat <path>` commands for each role. Open new terminals yourself and paste the printed commands; this is the most flexible mode for unusual setups (split panes, remote sessions, terminal multiplexers other than tmux).
|
||||
|
||||
The launcher uses port **7331**. If it's already in use the script aborts with the kill command — `kill $(lsof -ti:7331)` clears it.
|
||||
|
||||
### Step 3 — Drive the coordination
|
||||
|
||||
The PM session is the entry point. Talk to PM about goals; PM decides who's working on what and posts directives to the dev sessions via the relay. Each dev reads its inbox, executes, and posts back status / questions. The user (you) is mostly a watcher — PM should self-route.
|
||||
|
||||
Common rhythm:
|
||||
|
||||
- **PM at start:** posts a directive to each dev describing the first slice.
|
||||
- **Dev on completion:** posts status with branch / commit / what shipped.
|
||||
- **Dev when blocked:** posts a question; PM unblocks (decision) or escalates to user.
|
||||
- **PM end-of-cycle:** asks each dev for a status, summarizes, decides next slice.
|
||||
|
||||
Message kinds (`MessageKind` in `queue.ts`):
|
||||
|
||||
| Kind | Use when |
|
||||
|------|----------|
|
||||
| `status` | "I shipped X, branch is at Y, ready for next slice" |
|
||||
| `question` | "Should I do A or B? Blocking until I hear back" |
|
||||
| `directive` | PM-to-dev: "Next, do X. Constraints are Y. Acceptance is Z." |
|
||||
| `free` | Anything that doesn't fit the above (FYI, side-channel chatter) |
|
||||
|
||||
Block format inside `body` is freeform markdown. The kickoff prompts include the project's preferred block templates.
|
||||
|
||||
## Fallback — when the MCP client misbehaves
|
||||
|
||||
If a Claude session can't reach the relay's MCP tools (transient SSE hiccup, MCP server failed to register, sandboxed network), use the shim:
|
||||
|
||||
```bash
|
||||
# From any shell, with the relay running on 7331:
|
||||
python3 tools/relay/call.py read_messages '{"for":"pm"}'
|
||||
python3 tools/relay/call.py post_message '{"from":"dev-a","to":"pm","kind":"status","body":"shipped X"}'
|
||||
python3 tools/relay/call.py list_pending '{"for":"dev-b"}'
|
||||
```
|
||||
|
||||
`call.ts` is the same surface in TypeScript (`bun run tools/relay/call.ts ...`) for when you want to script from a TS context. Both shims speak raw MCP over the SSE transport; output is the JSON-RPC response.
|
||||
|
||||
The kickoff prompts reference `call.py` by path — if the in-Claude MCP client breaks mid-session, the dev can fall back to `Bash python3 tools/relay/call.py ...` and keep coordinating without restarting.
|
||||
|
||||
## Where things live
|
||||
|
||||
```
|
||||
docs/superpowers/coordination/
|
||||
├── RELAY.md ← you are here
|
||||
├── <date>-pm-prompt.md generated by /multi-agent-kickoff
|
||||
├── <date>-dev-a-prompt.md
|
||||
├── <date>-dev-b-prompt.md
|
||||
├── <date>-dev-c-prompt.md (optional, 4-role mode)
|
||||
└── archive/ older kickoff sets
|
||||
|
||||
tools/relay/
|
||||
├── start.sh launcher (manual / tmux / kitty)
|
||||
├── server.ts MCP server (HTTP/SSE on :7331)
|
||||
├── queue.ts in-memory per-role FIFO
|
||||
├── queue.test.ts node:test — run with `bun test`
|
||||
├── call.py Python MCP-client shim (fallback)
|
||||
├── call.ts TypeScript MCP-client shim (fallback)
|
||||
├── package.json
|
||||
└── tsconfig.json
|
||||
```
|
||||
|
||||
The launcher's prompt-discovery is `ls -t "$COORD_DIR"/*-<role>-prompt.md | head -1` — newest wins. To switch back to a previous kickoff set, either delete the newer files or move them under `archive/`.
|
||||
|
||||
## Conventions
|
||||
|
||||
- **Roles are fixed strings:** `pm`, `dev-a`, `dev-b`, `dev-c`. Adding a new role means editing `Role` in `queue.ts`, `KNOWN_ROLES`, the `enum` in `server.ts`'s tool schema, and the launcher.
|
||||
- **Worktree per dev:** each dev session works in its own git worktree on its own branch. Subagents must `cd` into the worktree first — the multi-agent-kickoff skill bakes this rule into the dev prompts (subagents have been known to commit to `main` if the worktree cwd is only set in a header).
|
||||
- **Branch naming follows the release train:** `feature/<release>-<dev>-<scope>`. PM owns the merge order; devs do not merge each other's branches.
|
||||
- **No squashing:** the project preserves git history as audit log (per `CLAUDE.md`). Devs commit small and often; PM coordinates rebases at integration time, not before.
|
||||
- **The user is not the router.** PM should issue directives directly to devs via the relay. The user steps in only for cross-stream design decisions or when PM explicitly escalates.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
- **"port 7331 is already in use"** — another relay is running. `kill $(lsof -ti:7331)`, then re-run `start.sh`.
|
||||
- **Launcher can't find a prompt** (`(none found)` in the printed paths) — `/multi-agent-kickoff` hasn't been run yet, or all generated prompts are under `archive/`. Re-run the skill.
|
||||
- **Dev session committing to `main` instead of its worktree** — its subagent prompts are missing the force-`cd` header. Regenerate the dev prompt via `/multi-agent-kickoff` (the skill bakes in the cd rule) or hand-edit the prompt to start with `cd <worktree-path>`.
|
||||
- **MCP tools don't show up in the Claude session** — restart the session. If it persists, fall back to `call.py`. If `call.py` also fails, check the relay log window for stack traces; the SSE transport sometimes wedges if a client disconnects ungracefully.
|
||||
- **`bun test` failing in `tools/relay/`** — relay tests use `node:test` via bun. Run from `tools/relay/`, not the repo root: `cd tools/relay && bun test`. Extension tests use vitest and live elsewhere; don't conflate.
|
||||
- **One dev session is silent** — check `python3 tools/relay/call.py list_pending '{"for":"<role>"}'` from any shell. If the dev's inbox has unread messages, they may have crashed or detached. Open the role's window and resume.
|
||||
|
||||
## Caveats
|
||||
|
||||
- **In-memory queue is dev-only.** Restart the relay = lose all queued messages. There is no persistence by design — coordination is meant to flow forward, not be replayable.
|
||||
- **No auth.** The relay binds to `localhost:7331` with no token. Don't expose the port; don't run on a shared machine.
|
||||
- **The relay is not a chat history.** `read_messages` drains the inbox. If you need to refer back to what was said, copy-paste into a session note or the PR description; don't expect the relay to remember.
|
||||
- **Context costs scale with session count.** Four parallel Claude sessions burn four context windows. Use this paradigm when the parallel speedup justifies the cost — for sequential work, one session is cheaper.
|
||||
|
||||
## See also
|
||||
|
||||
- `tools/relay/server.ts` — MCP tool definitions (`post_message`, `read_messages`, `list_pending`) and their schemas.
|
||||
- `tools/relay/queue.ts` — `Role` / `MessageKind` types; the canonical per-role-inbox semantics.
|
||||
- `docs/superpowers/coordination/<date>-pm-prompt.md` — the latest PM kickoff (the actual operational instructions PM runs by).
|
||||
- The `multi-agent-kickoff` skill — generates the kickoff prompt set.
|
||||
Reference in New Issue
Block a user