docs(soak): full README + validation checklist
Replaces the Task 31 stub README with complete documentation: quickstart, first-time setup (invite flagging, seeding, smoke), usage examples for all three watch modes, CLI flag reference, env var table, scenario descriptions, error handling summary, test account filtering explanation, and architecture overview. Adds CHECKLIST.md with post-deploy verification, bring-up, scenario, watch mode, failure handling, and staging gate items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,21 +1,296 @@
# Golf Soak & UX Test Harness

Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.
Standalone Playwright-based runner that drives multiple authenticated
browser sessions playing real multiplayer games. Used for:

**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`
- **Scoreboard population**: fill staging leaderboards with realistic data
- **Stability stress testing**: hunt race conditions, WebSocket leaks, cleanup bugs
- **Live monitoring**: watch bot sessions play in real time via CDP screencast

## Quick start
## Prerequisites

- [Bun](https://bun.sh/) (or Node.js + npm)
- Chromium browser binary (installed via `bunx playwright install chromium`)
- A running Golf Card Game server (local dev or staging)
- An invite code flagged as `marks_as_test=TRUE` (see [Bring-up](#first-time-setup))

## First-time setup

### 1. Install dependencies

```bash
cd tests/soak
bun install
bun run seed # first run only
TEST_URL=http://localhost:8000 bun run smoke
bunx playwright install chromium
```

(The scripts also work with `npm run`, `pnpm run`, etc.; bun is what's installed
on this dev machine.)
### 2. Flag the invite code as test-seed

Full documentation arrives with Task 31.
Any account registered with a test-seed invite gets `is_test_account=TRUE`,
which keeps it out of real-user stats and leaderboards.

**Local dev:**

```bash
PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
SQL
```

**Staging:**

```bash
ssh root@129.212.150.189 \
  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SQL
```

### 3. Seed test accounts

```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed

# Staging
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed
```

This registers 16 accounts via the invite code and caches their credentials
in `.env.stresstest`. It only needs to run once: subsequent runs reuse the
cached credentials and log in again if the tokens have expired.

### 4. Verify with a smoke test

```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh
```

Expected: one game plays to completion in ~60 seconds and the script exits with code 0.

## Usage

### Populate scoreboards (recommended first run)

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --watch=dashboard
```

This runs 10 games of 9 holes in each of 4 rooms (40 games in total), with
varied CPU personalities.
The dashboard opens automatically at `http://localhost:7777`.

### Quick smoke against staging

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=1 --holes=1 \
  --watch=dashboard
```

### Stress test with chaos injection

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --games-per-room=5 \
  --watch=dashboard
```

This runs rapid 1-hole games with random chaos events (rapid clicks, tab blur,
brief network outage) injected during gameplay.

### Headless mode (CI / overnight)

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate --watch=none
```

This mode writes structured JSONL to stdout. Pipe it to `jq` to filter:

```bash
bun run soak -- --scenario=populate --watch=none 2>&1 | jq 'select(.msg == "game_complete")'
```
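
For post-processing a captured log without `jq`, the same filter can be sketched in TypeScript. This is a minimal sketch that assumes each output line is one JSON object with a `msg` field, as the `jq` example suggests; `filterByMsg` and `LogRecord` are hypothetical names, not part of the harness.

```typescript
// Hypothetical helper, not part of the harness: keeps JSONL records whose
// `msg` field equals the requested value and skips lines that are not JSON.
interface LogRecord {
  msg?: unknown;
  [key: string]: unknown;
}

function filterByMsg(jsonl: string, msg: string): LogRecord[] {
  const matches: LogRecord[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // tolerate stray non-JSON lines mixed into the stream
    }
    if (typeof parsed === "object" && parsed !== null && (parsed as LogRecord).msg === msg) {
      matches.push(parsed as LogRecord);
    }
  }
  return matches;
}
```

Calling `filterByMsg(logText, "game_complete")` returns only the completed-game records; any other fields on those records depend on what the logger actually emits.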

### Tiled mode (native browser windows)

```bash
bun run soak -- --scenario=populate --rooms=2 --watch=tiled
```

This opens a visible Chromium window for each room's host session, which is
useful for hands-on debugging with DevTools.

## CLI flags

```
--scenario=populate|stress     required; which scenario to run
--accounts=<n>                 total sessions (default: from scenario)
--rooms=<n>                    parallel rooms (default: from scenario)
--cpus-per-room=<n>            CPU opponents per room (default: from scenario)
--games-per-room=<n>           games per room (default: from scenario)
--holes=<n>                    holes per game (default: from scenario)
--watch=none|dashboard|tiled   visualization mode (default: dashboard)
--dashboard-port=<n>           dashboard server port (default: 7777)
--target=<url>                 override TEST_URL env var
--run-id=<string>              custom run identifier (default: timestamp)
--list                         print available scenarios and exit
--dry-run                      validate config without running
```

`--accounts` must be evenly divisible by `--rooms`, so that every room gets the same number of sessions.
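
The divisibility rule can be sketched as a small validation step. This is an illustration only; `sessionsPerRoom` is a hypothetical name and the harness's actual config validation may be structured differently.

```typescript
// Hypothetical helper, not taken from the harness source: returns how many
// sessions land in each room, or throws if the split would be uneven.
function sessionsPerRoom(accounts: number, rooms: number): number {
  if (!Number.isInteger(accounts) || !Number.isInteger(rooms) || accounts < 1 || rooms < 1) {
    throw new Error(`accounts and rooms must be positive integers (got ${accounts}, ${rooms})`);
  }
  if (accounts % rooms !== 0) {
    throw new Error(`--accounts=${accounts} is not evenly divisible by --rooms=${rooms}`);
  }
  return accounts / rooms;
}
```

With the `populate` defaults (16 accounts, 4 rooms) this gives 4 sessions per room, while a combination such as `--accounts=5 --rooms=2` would be rejected.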

## Environment variables

| Variable | Description | Default |
|---|---|---|
| `TEST_URL` | Target server base URL | `http://localhost:8000` |
| `SOAK_INVITE_CODE` | Invite code for account seeding | `SOAKTEST` |
| `SOAK_HOLES` | Same as `--holes` | scenario default |
| `SOAK_ROOMS` | Same as `--rooms` | scenario default |
| `SOAK_ACCOUNTS` | Same as `--accounts` | scenario default |
| `SOAK_CPUS_PER_ROOM` | Same as `--cpus-per-room` | scenario default |
| `SOAK_GAMES_PER_ROOM` | Same as `--games-per-room` | scenario default |
| `SOAK_WATCH` | Same as `--watch` | `dashboard` |
| `SOAK_DASHBOARD_PORT` | Same as `--dashboard-port` | `7777` |

Config precedence: CLI flags > env vars > scenario defaults.
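
The precedence rule can be illustrated with a small resolver for one numeric setting. This is a sketch under assumptions: `resolveNumber` and `Env` are hypothetical names, and the real config loader may parse and validate values differently.

```typescript
// Hypothetical resolver, not the harness's config loader: the CLI value wins,
// then the env var, then the scenario default.
type Env = Record<string, string | undefined>;

function resolveNumber(
  cliValue: string | undefined,
  envName: string,
  env: Env,
  scenarioDefault: number,
): number {
  for (const raw of [cliValue, env[envName]]) {
    if (raw === undefined || raw === "") continue; // unset: fall through to the next source
    const parsed = Number(raw);
    if (!Number.isInteger(parsed) || parsed < 0) {
      throw new Error(`invalid value for ${envName}: "${raw}"`);
    }
    return parsed;
  }
  return scenarioDefault;
}
```

For example, with `SOAK_HOLES=3` in the environment and `--holes=1` on the command line, the resolver picks 1; without the flag it picks 3; with neither it falls back to the scenario's own value.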

## Watch modes

### `dashboard` (default)

Opens `http://localhost:7777` with a live status grid:

- 2x2 room tiles showing phase, current player, move count, progress bar
- Activity log at the bottom
- **Click any player tile** to watch their live session via CDP screencast
- Press Esc or click Close to stop the video feed
- WS connection status indicator

The dashboard runs **locally on your machine**: the runner's headless
browsers connect to the target server remotely while the dashboard UI
is served from your workstation.

### `tiled`

Opens native Chromium windows for each room's host session, positioned
in a grid. Joiners stay headless. Useful for interactive debugging with
DevTools. The viewport is sized at 960x900 to show the full game table.

### `none`

Pure headless, structured JSONL to stdout. Use for CI, overnight runs,
or piping to `jq`.

## Scenarios

### `populate`

Long multi-round games to populate scoreboards with realistic data.

| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 1 |
| Games per room | 10 |
| Holes | 9 |
| Decks | 2 |
| Think time | 800-2200ms |

### `stress`

Rapid short games with chaos injection for stability testing.

| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 2 |
| Games per room | 50 |
| Holes | 1 |
| Decks | 1 |
| Think time | 50-150ms |
| Chaos chance | 5% per turn |

Chaos events: `rapid_clicks`, `tab_blur`, `brief_offline`
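
As a rough illustration of how a per-turn chaos chance and a think-time range might be applied, here is a minimal sketch. The function names and the uniform random selection are assumptions; only the event names, the 5% chance, and the 50-150ms range come from the table above.

```typescript
// Hypothetical sketch, not the code in chaos.ts: rolls once per turn and,
// on a hit, picks one of the three documented chaos events uniformly.
type ChaosEvent = "rapid_clicks" | "tab_blur" | "brief_offline";

const CHAOS_EVENTS: readonly ChaosEvent[] = ["rapid_clicks", "tab_blur", "brief_offline"];

function pickChaosEvent(chance: number, rng: () => number = Math.random): ChaosEvent | null {
  if (rng() >= chance) return null; // most turns: no chaos
  const index = Math.floor(rng() * CHAOS_EVENTS.length);
  return CHAOS_EVENTS[index] ?? null;
}

// Uniform think time in [minMs, maxMs], rounded to whole milliseconds.
function thinkTimeMs(minMs: number, maxMs: number, rng: () => number = Math.random): number {
  return Math.round(minMs + rng() * (maxMs - minMs));
}
```

Passing the random source as a parameter keeps the sketch deterministic under test; `pickChaosEvent(0.05)` and `thinkTimeMs(50, 150)` would correspond to the `stress` defaults.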

### Adding new scenarios

Create `scenarios/<name>.ts` exporting a `Scenario` object, then register
it in `scenarios/index.ts`. See existing scenarios for the pattern.
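
The general shape of that pattern might look like the sketch below. Everything here is hypothetical: the real `Scenario` contract lives in `core/types.ts` and almost certainly has different fields (for example a run function), and `marathon`, `registerScenario`, and `getScenario` are invented for illustration. Check the existing scenarios before copying anything.

```typescript
// Hypothetical contract for illustration only; see core/types.ts for the real one.
interface ScenarioDefaults {
  accounts: number;
  rooms: number;
  cpusPerRoom: number;
  gamesPerRoom: number;
  holes: number;
}

interface Scenario {
  name: string;
  description: string;
  defaults: ScenarioDefaults;
}

// What a hypothetical scenarios/marathon.ts might define.
const marathon: Scenario = {
  name: "marathon",
  description: "Fewer rooms, more games per room",
  defaults: { accounts: 8, rooms: 2, cpusPerRoom: 1, gamesPerRoom: 25, holes: 9 },
};

// What a registry in scenarios/index.ts might look like: a lookup by name.
const scenarios = new Map<string, Scenario>();

function registerScenario(scenario: Scenario): void {
  if (scenarios.has(scenario.name)) {
    throw new Error(`scenario "${scenario.name}" is already registered`);
  }
  scenarios.set(scenario.name, scenario);
}

function getScenario(name: string): Scenario {
  const scenario = scenarios.get(name);
  if (scenario === undefined) {
    throw new Error(`unknown scenario "${name}"`);
  }
  return scenario;
}

registerScenario(marathon);
```

A name-keyed registry of this kind is one way `--scenario=<name>` and `--list` could be served from the same table.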

## Error handling

- **Per-room isolation**: a failure in one room never unwinds other rooms
  (`Promise.allSettled`)
- **Watchdog**: 60s per-room timeout that fires if no heartbeat arrives in that window
- **Health probes**: `GET /health` every 30s; 3 consecutive failures trigger a fatal abort
- **Graceful shutdown**: Ctrl-C finishes the current turn, then cleans up (10s timeout).
  A second Ctrl-C forces an immediate exit
- **Artifacts**: on failure, screenshots, HTML, and game state JSON are saved to
  `artifacts/<run-id>/`. Artifacts older than 7 days are pruned automatically
- **Exit codes**: `0` = success, `1` = errors, `2` = interrupted
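
The per-room isolation and exit-code rules can be sketched together. This is a minimal illustration, not the runner's code: `RoomTask`, `runRooms`, and `exitCode` are hypothetical names, and giving "interrupted" priority over "errors" is an assumption.

```typescript
// Hypothetical sketch: each room runs as an independent promise, and the
// settled results are mapped to the exit codes listed above.
type RoomTask = () => Promise<void>;

function exitCode(results: PromiseSettledResult<void>[], interrupted: boolean): number {
  if (interrupted) return 2; // assumption: interruption takes priority over errors
  const failed = results.some((result) => result.status === "rejected");
  return failed ? 1 : 0;
}

async function runRooms(
  tasks: RoomTask[],
  isInterrupted: () => boolean = () => false,
): Promise<number> {
  // allSettled waits for every room, so one rejection never cancels the others.
  // The async arrow also turns a synchronous throw into a rejection.
  const results = await Promise.allSettled(tasks.map(async (task) => task()));
  return exitCode(results, isInterrupted());
}
```

The key design point is using `Promise.allSettled` rather than `Promise.all`: the latter rejects as soon as any room fails, which would abandon the results of rooms that are still running.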

## Test account filtering

Soak accounts are flagged `is_test_account=TRUE` in the database. They are:

- **Hidden by default** from public leaderboards and stats (`?include_test=false`)
- **Visible to admins** by default in the admin panel
- **Togglable** via the "Include test accounts" checkbox in the admin panel
- **Badged** with `[Test]` in the admin user list and `[Test-seed]` on the invite code

## Unit tests

```bash
bun run test
```

27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config.
Integration-level modules (SessionPool, scenarios, dashboard) are verified
by the smoke test and live runs.

## Architecture

```
runner.ts             CLI entry: parses flags, wires everything, runs scenario
core/
session-pool.ts       Owns browser contexts, seeds/logs in accounts
room-coordinator      Deferred-based host→joiners room code handoff
watchdog.ts           Per-room timeout detector
screencaster.ts       CDP Page.startScreencast for live video
logger.ts             Structured JSONL logger with child contexts
artifacts.ts          Screenshot/HTML/state capture on failure
types.ts              Scenario/Session/Logger contracts
scenarios/
populate.ts           Long multi-round games
stress.ts             Rapid games with chaos injection
shared/
multiplayer-game.ts   Shared "play one game" loop
chaos.ts              Chaos event injector
dashboard/
server.ts             HTTP + WS server
index.html            Status grid UI
dashboard.js          WS client + click-to-watch
scripts/
seed-accounts.ts      Account seeding CLI
smoke.sh              End-to-end canary (~60s)
```

Reuses `tests/e2e/bot/golf-bot.ts` unchanged for all game interactions.
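
To make the "Deferred-based host→joiners room code handoff" idea concrete, here is a minimal sketch of the general technique. It is not the code in `room-coordinator`: `RoomCodeHandoff`, `announce`, and `awaitCode` are hypothetical names, and the timeout behaviour is an assumption.

```typescript
// A Deferred exposes a promise together with its resolve/reject functions,
// so one party can wait on it while another settles it later.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason: Error) => void;

  constructor() {
    this.promise = new Promise<T>((resolve, reject) => {
      this.resolve = resolve;
      this.reject = reject;
    });
  }
}

// Hypothetical handoff: the host announces a room code once, and any number
// of joiners can await it, whether they ask before or after the announcement.
class RoomCodeHandoff {
  private readonly rooms = new Map<string, Deferred<string>>();

  private slot(roomId: string): Deferred<string> {
    let deferred = this.rooms.get(roomId);
    if (deferred === undefined) {
      deferred = new Deferred<string>();
      this.rooms.set(roomId, deferred);
    }
    return deferred;
  }

  announce(roomId: string, code: string): void {
    this.slot(roomId).resolve(code);
  }

  awaitCode(roomId: string, timeoutMs: number): Promise<string> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`room ${roomId}: no code after ${timeoutMs}ms`)),
        timeoutMs,
      );
    });
    // Whichever settles first wins; the timer is cleared either way.
    return Promise.race([this.slot(roomId).promise, timeout]).finally(() => clearTimeout(timer));
  }
}
```

Because the slot is created lazily by whichever side arrives first, the host and joiner sessions do not need to start in any particular order.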

## Related docs

- [Design spec](../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md)
- [Bring-up steps](../../docs/soak-harness-bringup.md)
- [Implementation plan](../../docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md)