docs(soak): full README + validation checklist

Replaces the Task 31 stub README with complete documentation:
quickstart, first-time setup (invite flagging, seeding, smoke),
usage examples for all three watch modes, CLI flag reference, env
var table, scenario descriptions, error handling summary, test
account filtering explanation, and architecture overview.

Adds CHECKLIST.md with post-deploy verification, bring-up,
scenario, watch mode, failure handling, and staging gate items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
adlee-was-taken
2026-04-11 23:05:22 -04:00
parent b9cc7d29cf
commit d5194f43ba
2 changed files with 349 additions and 10 deletions

@@ -1,21 +1,296 @@
# Golf Soak & UX Test Harness
Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.
Standalone Playwright-based runner that drives multiple authenticated
browser sessions playing real multiplayer games. Used for:
**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`
- **Scoreboard population** — fill staging leaderboards with realistic data
- **Stability stress testing** — hunt race conditions, WebSocket leaks, cleanup bugs
- **Live monitoring** — watch bot sessions play in real time via CDP screencast
## Quick start
## Prerequisites
- [Bun](https://bun.sh/) (or Node.js + npm)
- Chromium browser binary (installed via `bunx playwright install chromium`)
- A running Golf Card Game server (local dev or staging)
- An invite code flagged as `marks_as_test=TRUE` (see [Bring-up](#first-time-setup))
## First-time setup
### 1. Install dependencies
```bash
cd tests/soak
bun install
bun run seed # first run only
TEST_URL=http://localhost:8000 bun run smoke
bunx playwright install chromium
```
(The scripts also work with `npm run`, `pnpm run`, etc. — bun is what's installed
on this dev machine.)
### 2. Flag the invite code as test-seed
Full documentation arrives with Task 31.
Any account registered with a test-seed invite gets `is_test_account=TRUE`,
which keeps it out of real-user stats and leaderboards.
**Local dev:**
```bash
PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
SQL
```
**Staging:**
```bash
ssh root@129.212.150.189 \
  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SQL
```
### 3. Seed test accounts
```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed
# Staging
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed
```
This registers 16 accounts via the invite code and caches their credentials
in `.env.stresstest`. It only needs to run once: subsequent runs reuse the
cached credentials and log in again if the tokens have expired.
### 4. Verify with a smoke test
```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh
```
Expected: one game plays to completion in ~60 seconds, exits 0.
## Usage
### Populate scoreboards (recommended first run)
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --watch=dashboard
```
This runs 4 rooms x 10 games x 9 holes with varied CPU personalities.
The dashboard opens automatically at `http://localhost:7777`.
### Quick smoke against staging
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=1 --holes=1 \
  --watch=dashboard
```
### Stress test with chaos injection
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --games-per-room=5 \
  --watch=dashboard
```
Rapid 1-hole games with random chaos events (rapid clicks, tab blur,
brief network outage) injected during gameplay.
### Headless mode (CI / overnight)
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate --watch=none
```
Outputs structured JSONL to stdout. Pipe to `jq` for filtering:
```bash
bun run soak -- --scenario=populate --watch=none | jq 'select(.msg == "game_complete")'
```
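
For post-processing outside the shell, the same filter can be sketched in TypeScript. Only the `msg` field and the `game_complete` value come from the `jq` example above. The `room` and `game` fields in the sample lines are hypothetical, and `filterByMsg` is a name made up for illustration. Lines that are not valid JSON are skipped instead of being treated as errors.

```typescript
// Sketch only: parse JSONL text and keep records whose `msg` matches.
// `msg` is taken from the jq example; every other field is an assumption.
type LogRecord = { msg?: string; [key: string]: unknown };

function filterByMsg(jsonl: string, msg: string): LogRecord[] {
  const records: LogRecord[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines
    try {
      const parsed: unknown = JSON.parse(trimmed);
      const isObject =
        typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
      if (isObject && (parsed as LogRecord).msg === msg) {
        records.push(parsed as LogRecord);
      }
    } catch {
      // Not JSON (for example a banner printed by a wrapper script): skip it.
    }
  }
  return records;
}

// Hypothetical sample output; real log lines may carry different fields.
const sample = [
  '{"msg":"game_start","room":"room-0"}',
  "$ not a json line",
  '{"msg":"game_complete","room":"room-0","game":1}',
  '{"msg":"game_complete","room":"room-1","game":1}',
].join("\n");

const completed = filterByMsg(sample, "game_complete");
console.log(completed.length);
```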
### Tiled mode (native browser windows)
```bash
bun run soak -- --scenario=populate --rooms=2 --watch=tiled
```
Opens visible Chromium windows for each room's host session. Useful for
hands-on debugging with DevTools.
## CLI flags
```
--scenario=populate|stress     required; which scenario to run
--accounts=<n>                 total sessions (default: from scenario)
--rooms=<n>                    parallel rooms (default: from scenario)
--cpus-per-room=<n>            CPU opponents per room (default: from scenario)
--games-per-room=<n>           games per room (default: from scenario)
--holes=<n>                    holes per game (default: from scenario)
--watch=none|dashboard|tiled   visualization mode (default: dashboard)
--dashboard-port=<n>           dashboard server port (default: 7777)
--target=<url>                 override TEST_URL env var
--run-id=<string>              custom run identifier (default: timestamp)
--list                         print available scenarios and exit
--dry-run                      validate config without running
```
`--accounts` must be evenly divisible by `--rooms`, so that every room gets the same number of sessions.
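
As a minimal sketch of that constraint, the check below rejects an uneven split and returns the per-room session count otherwise. The function name `playersPerRoom` and the error messages are illustrative assumptions, not the runner's actual validation code.

```typescript
// Sketch only: the function name and error text are illustrative,
// not taken from the harness.
function playersPerRoom(accounts: number, rooms: number): number {
  if (!Number.isInteger(accounts) || !Number.isInteger(rooms) || accounts < 1 || rooms < 1) {
    throw new Error("accounts and rooms must be positive integers");
  }
  if (accounts % rooms !== 0) {
    throw new Error(`accounts (${accounts}) must be evenly divisible by rooms (${rooms})`);
  }
  return accounts / rooms;
}

// With the populate defaults (16 accounts, 4 rooms) each room gets 4 sessions.
console.log(playersPerRoom(16, 4));
```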
## Environment variables
| Variable | Description | Default |
|---|---|---|
| `TEST_URL` | Target server base URL | `http://localhost:8000` |
| `SOAK_INVITE_CODE` | Invite code for account seeding | `SOAKTEST` |
| `SOAK_HOLES` | Sets `--holes` when the flag is omitted | — |
| `SOAK_ROOMS` | Sets `--rooms` when the flag is omitted | — |
| `SOAK_ACCOUNTS` | Sets `--accounts` when the flag is omitted | — |
| `SOAK_CPUS_PER_ROOM` | Sets `--cpus-per-room` when the flag is omitted | — |
| `SOAK_GAMES_PER_ROOM` | Sets `--games-per-room` when the flag is omitted | — |
| `SOAK_WATCH` | Sets `--watch` when the flag is omitted | — |
| `SOAK_DASHBOARD_PORT` | Sets `--dashboard-port` when the flag is omitted | — |
Config precedence: CLI flags > env vars > scenario defaults.
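
The precedence rule can be sketched for a single numeric setting as follows. This is an illustration under stated assumptions, not the runner's config code: `resolveNumber` and its parameters are hypothetical names, and treating an empty string as "not set" is a choice made for this example.

```typescript
// Sketch only: resolves one numeric setting using
// CLI flag > env var > scenario default.
// Names are illustrative; the real runner may be structured differently.
function resolveNumber(
  cliValue: string | undefined,
  envValue: string | undefined,
  scenarioDefault: number,
): number {
  for (const raw of [cliValue, envValue]) {
    if (raw === undefined || raw.trim() === "") continue; // assumption: empty means "not set"
    const parsed = Number(raw);
    if (!Number.isInteger(parsed) || parsed < 0) {
      throw new Error(`expected a non-negative integer, got "${raw}"`);
    }
    return parsed; // first source that is set wins
  }
  return scenarioDefault;
}

// Example: SOAK_HOLES=3 and the populate default of 9 holes.
const env: Record<string, string | undefined> = { SOAK_HOLES: "3" };
console.log(resolveNumber(undefined, env.SOAK_HOLES, 9)); // no flag: env var beats the default
console.log(resolveNumber("1", env.SOAK_HOLES, 9)); // flag given: CLI beats the env var
```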
## Watch modes
### `dashboard` (default)
Opens `http://localhost:7777` with a live status grid:
- 2x2 room tiles showing phase, current player, move count, progress bar
- Activity log at the bottom
- **Click any player tile** to watch their live session via CDP screencast
- Press Esc or click Close to stop the video feed
- WS connection status indicator
The dashboard runs **locally on your machine** — the runner's headless
browsers connect to the target server remotely while the dashboard UI
is served from your workstation.
### `tiled`
Opens native Chromium windows for each room's host session, positioned
in a grid. Joiners stay headless. Useful for interactive debugging with
DevTools. The viewport is sized at 960x900 to show the full game table.
### `none`
Pure headless, structured JSONL to stdout. Use for CI, overnight runs,
or piping to `jq`.
## Scenarios
### `populate`
Long multi-round games to populate scoreboards with realistic data.
| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 1 |
| Games per room | 10 |
| Holes | 9 |
| Decks | 2 |
| Think time | 800-2200ms |
### `stress`
Rapid short games with chaos injection for stability testing.
| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 2 |
| Games per room | 50 |
| Holes | 1 |
| Decks | 1 |
| Think time | 50-150ms |
| Chaos chance | 5% per turn |
Chaos events: `rapid_clicks`, `tab_blur`, `brief_offline`
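
A per-turn chaos roll at the 5% default might look like the sketch below. The event names and the chance come from the table above. The function name `rollChaos`, the uniform choice between events, and the injectable random source are assumptions made for illustration.

```typescript
// Sketch only: decide per turn whether to inject a chaos event.
// Event names and the 5% chance come from the README; the rest is assumed.
type ChaosEvent = "rapid_clicks" | "tab_blur" | "brief_offline";
const CHAOS_EVENTS: readonly ChaosEvent[] = ["rapid_clicks", "tab_blur", "brief_offline"];

function rollChaos(chance: number, rng: () => number = Math.random): ChaosEvent | null {
  if (rng() >= chance) return null; // most turns: no chaos
  // Assumption: each event is equally likely once chaos triggers.
  const index = Math.floor(rng() * CHAOS_EVENTS.length);
  return CHAOS_EVENTS[Math.min(index, CHAOS_EVENTS.length - 1)];
}

// Count injected events over 1000 simulated turns at the stress default.
let injected = 0;
for (let turn = 0; turn < 1000; turn++) {
  if (rollChaos(0.05) !== null) injected++;
}
console.log(`injected ${injected} chaos events in 1000 turns`);
```

Passing the random source as a parameter keeps the roll deterministic in unit tests.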
### Adding new scenarios
Create `scenarios/<name>.ts` exporting a `Scenario` object, then register
it in `scenarios/index.ts`. See existing scenarios for the pattern.
## Error handling
- **Per-room isolation**: a failure in one room never unwinds other rooms
(`Promise.allSettled`)
- **Watchdog**: 60s per-room timeout — fires if no heartbeat arrives
- **Health probes**: `GET /health` every 30s, 3 consecutive failures = fatal abort
- **Graceful shutdown**: Ctrl-C finishes current turn, then cleans up (10s timeout).
Double Ctrl-C = immediate force exit
- **Artifacts**: on failure, screenshots + HTML + game state JSON saved to
`artifacts/<run-id>/`. Old artifacts auto-pruned after 7 days
- **Exit codes**: `0` = success, `1` = errors, `2` = interrupted
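
The per-room isolation and the `0`/`1` exit codes can be sketched with `Promise.allSettled` as below. This is a simplified illustration: `runRoom` stands in for the real per-room game loop, `runAllRooms` is a hypothetical name, and the interrupted case (exit code `2`), the watchdog, and health probes are left out.

```typescript
// Sketch only: run rooms in isolation and derive an exit code.
// `runRoom` is a stand-in for the real per-room game loop.
type RoomResult = { room: number; gamesPlayed: number };

async function runAllRooms(
  roomCount: number,
  runRoom: (room: number) => Promise<RoomResult>,
): Promise<{ exitCode: number; failures: unknown[] }> {
  const tasks = Array.from({ length: roomCount }, (_, room) => runRoom(room));
  // allSettled waits for every room, even when some of them reject.
  const settled = await Promise.allSettled(tasks);
  const failures = settled
    .filter((r): r is PromiseRejectedResult => r.status === "rejected")
    .map((r) => r.reason);
  return { exitCode: failures.length === 0 ? 0 : 1, failures };
}

// Room 2 fails; rooms 0, 1, and 3 still run to completion.
const demoRun = runAllRooms(4, async (room) => {
  if (room === 2) throw new Error("room 2: simulated failure");
  return { room, gamesPlayed: 1 };
});
demoRun.then(({ exitCode, failures }) => {
  console.log(`exit code ${exitCode}, ${failures.length} failed room(s)`);
});
```

With `Promise.all` the first rejection would settle the combined promise immediately, which is why `allSettled` suits the isolation guarantee better.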
## Test account filtering
Soak accounts are flagged `is_test_account=TRUE` in the database. They are:
- **Hidden by default** from public leaderboards and stats (`?include_test=false`)
- **Visible to admins** by default in the admin panel
- **Togglable** via the "Include test accounts" checkbox in the admin panel
- **Badged** with `[Test]` in the admin user list and `[Test-seed]` on the invite code
## Unit tests
```bash
bun run test
```
27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config.
Integration-level modules (SessionPool, scenarios, dashboard) are verified
by the smoke test and live runs.
## Architecture
```
runner.ts                   CLI entry: parses flags, wires everything, runs scenario
core/
  session-pool.ts           Owns browser contexts, seeds/logs in accounts
  room-coordinator          Deferred-based host→joiners room code handoff
  watchdog.ts               Per-room timeout detector
  screencaster.ts           CDP Page.startScreencast for live video
  logger.ts                 Structured JSONL logger with child contexts
  artifacts.ts              Screenshot/HTML/state capture on failure
  types.ts                  Scenario/Session/Logger contracts
scenarios/
  populate.ts               Long multi-round games
  stress.ts                 Rapid games with chaos injection
  shared/
    multiplayer-game.ts     Shared "play one game" loop
    chaos.ts                Chaos event injector
dashboard/
  server.ts                 HTTP + WS server
  index.html                Status grid UI
  dashboard.js              WS client + click-to-watch
scripts/
  seed-accounts.ts          Account seeding CLI
  smoke.sh                  End-to-end canary (~60s)
```
Reuses `tests/e2e/bot/golf-bot.ts` unchanged for all game interactions.
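
The "Deferred-based host→joiners room code handoff" listed for the room coordinator can be illustrated with the generic pattern below. It is a sketch of the technique, not the harness's code: the `Deferred` class shape, the `joiner` helper, the bot names, and the room code `ABCD` are all hypothetical.

```typescript
// Sketch only: a Deferred wraps a promise with externally callable
// resolve/reject. Names echo the README's wording but are not the
// harness's implementation.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason: unknown) => void;

  constructor() {
    this.promise = new Promise<T>((resolve, reject) => {
      this.resolve = resolve;
      this.reject = reject;
    });
  }
}

// The host publishes the room code once; every joiner awaits the same promise.
const roomCode = new Deferred<string>();

async function joiner(name: string): Promise<string> {
  const code = await roomCode.promise; // waits until the host has created the room
  return `${name} joined ${code}`;
}

const joins = Promise.all([joiner("bot-1"), joiner("bot-2")]);
roomCode.resolve("ABCD"); // hypothetical room code published by the host
joins.then((lines) => console.log(lines.join("\n")));
```

If the host fails before creating a room, calling `reject` on the same object lets every waiting joiner fail fast instead of hanging.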
## Related docs
- [Design spec](../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md)
- [Bring-up steps](../../docs/soak-harness-bringup.md)
- [Implementation plan](../../docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md)