docs(soak): full README + validation checklist

Replaces the Task 31 stub README with complete documentation: quickstart, first-time setup (invite flagging, seeding, smoke), usage examples for all three watch modes, CLI flag reference, env var table, scenario descriptions, error handling summary, test account filtering explanation, and architecture overview. Adds CHECKLIST.md with post-deploy verification, bring-up, scenario, watch mode, failure handling, and staging gate items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
64
tests/soak/CHECKLIST.md
Normal file
@@ -0,0 +1,64 @@
# Soak Harness Validation Checklist

Run after significant changes or before calling the harness implementation complete.

## Post-deploy schema verification

Run after the server-side changes deploy to each environment.

- [ ] Server restarted (`docker compose up -d` or CI/CD deploy)
- [ ] Server logs show `User store schema initialized` after restart
- [ ] `\d users_v2` shows `is_test_account` column with default `false`
- [ ] `\d invite_codes` shows `marks_as_test` column with default `false`
- [ ] `\d leaderboard_overall` shows `is_test_account` column
- [ ] `\di idx_users_test_account` shows the partial index
- [ ] Leaderboard query still works: `curl .../api/stats/leaderboard` returns entries
- [ ] `?include_test=true` parameter is accepted (no 422/500)

## Bring-up

- [ ] Invite code flagged with `marks_as_test=TRUE` on target environment
- [ ] `bun run seed` creates/updates accounts in `.env.stresstest`
- [ ] All seeded users show `is_test_account=TRUE` in the DB

## Smoke test

- [ ] `bash scripts/smoke.sh` exits 0 within 60s

## Scenarios

- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- [ ] `--scenario=populate --rooms=2 --games-per-room=2` runs multiple rooms and multiple games
- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events and completes

## Watch modes

- [ ] `--watch=none` produces JSONL on stdout, nothing else
- [ ] `--watch=dashboard` opens `http://localhost:7777`, grid renders, WS shows `healthy`
- [ ] Clicking a player tile opens the video modal with live JPEG frames
- [ ] Closing the modal (Esc or Close) stops the screencast (check logs for `screencast_stopped`)
- [ ] `--watch=tiled` opens native Chromium windows sized to show the full game table

## Failure handling

- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
- [ ] Double Ctrl-C → immediate hard exit (exit code 130)
- [ ] Health probes detect server down (3 consecutive failures → fatal abort)
- [ ] Artifacts directory contains screenshots + state JSON on failure
- [ ] Artifacts older than 7 days are pruned on next startup

## Server-side filtering

- [ ] `GET /api/stats/leaderboard` (default) hides soak accounts
- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak accounts
- [ ] Admin panel user list shows `[Test]` badge on soak accounts
- [ ] Admin panel invite codes tab shows `[Test-seed]` badge
- [ ] "Include test accounts" checkbox toggles visibility in admin

## Staging bring-up

- [ ] `5VC2MCCN` flagged with `marks_as_test=TRUE` on staging DB
- [ ] 16 accounts seeded via `SOAK_INVITE_CODE=5VC2MCCN bun run seed`
- [ ] Populate run against staging completes with `--watch=dashboard`
- [ ] Staging leaderboard default does NOT show soak accounts
- [ ] Staging leaderboard with `?include_test=true` does show them
@@ -1,21 +1,296 @@
# Golf Soak & UX Test Harness

Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.
Standalone Playwright-based runner that drives multiple authenticated
browser sessions playing real multiplayer games. Used for:

**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`
- **Scoreboard population**: fill staging leaderboards with realistic data
- **Stability stress testing**: hunt race conditions, WebSocket leaks, cleanup bugs
- **Live monitoring**: watch bot sessions play in real time via CDP screencast

## Quick start
## Prerequisites

- [Bun](https://bun.sh/) (or Node.js + npm)
- Chromium browser binary (installed via `bunx playwright install chromium`)
- A running Golf Card Game server (local dev or staging)
- An invite code flagged as `marks_as_test=TRUE` (see [Bring-up](#first-time-setup))

## First-time setup

### 1. Install dependencies

```bash
cd tests/soak
bun install
bun run seed # first run only
TEST_URL=http://localhost:8000 bun run smoke
bunx playwright install chromium
```

(The scripts also work with `npm run`, `pnpm run`, etc.; bun is what's installed
on this dev machine.)

### 2. Flag the invite code as test-seed

Full documentation arrives with Task 31.
Any account registered with a test-seed invite gets `is_test_account=TRUE`,
which keeps it out of real-user stats and leaderboards.

**Local dev:**

```bash
PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
SQL
```

**Staging:**

```bash
ssh root@129.212.150.189 \
  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SQL
```

### 3. Seed test accounts

```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed

# Staging
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed
```

This registers 16 accounts via the invite code and caches their credentials
in `.env.stresstest`. It only needs to run once: subsequent runs reuse the
cached credentials (logging in again if tokens expire).

### 4. Verify with a smoke test

```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh
```

Expected: one game plays to completion in about 60 seconds and the script exits 0.

## Usage

### Populate scoreboards (recommended first run)

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --watch=dashboard
```

This runs 4 rooms x 10 games x 9 holes with varied CPU personalities.
The dashboard opens automatically at `http://localhost:7777`.

### Quick smoke against staging

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=1 --holes=1 \
  --watch=dashboard
```

### Stress test with chaos injection

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --games-per-room=5 \
  --watch=dashboard
```

Rapid 1-hole games with random chaos events (rapid clicks, tab blur,
brief network outage) injected during gameplay.

### Headless mode (CI / overnight)

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate --watch=none
```

Outputs structured JSONL to stdout. Pipe to `jq` for filtering:

```bash
bun run soak -- --scenario=populate --watch=none 2>&1 | jq 'select(.msg == "game_complete")'
```
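
When a `jq` one-liner is not enough, the same selection can be done in TypeScript. The following is a minimal sketch and not part of the harness: `parseJsonl`, `selectByMsg`, and the `room` field in the sample data are hypothetical names chosen for illustration. Only the `msg` field and the `game_complete` value are taken from the `jq` example. The parser skips lines that are not JSON objects, which can matter when stderr is merged into the stream with `2>&1`.

```typescript
// Hypothetical log-line shape: only `msg` comes from the jq example;
// all other fields are left open.
interface SoakLogLine {
  msg: string;
  [key: string]: unknown;
}

// Parse JSONL text, skipping blank lines and anything that is not a JSON
// object carrying a string `msg`.
function parseJsonl(text: string): SoakLogLine[] {
  const out: SoakLogLine[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { msg?: unknown }).msg === "string"
      ) {
        out.push(parsed as SoakLogLine);
      }
    } catch {
      // Not valid JSON: ignore the line.
    }
  }
  return out;
}

// Equivalent of jq 'select(.msg == "...")'.
function selectByMsg(lines: SoakLogLine[], msg: string): SoakLogLine[] {
  return lines.filter((l) => l.msg === msg);
}

// Made-up sample input; the `room` field is an assumption.
const sample = [
  '{"msg":"game_start","room":1}',
  "not json",
  '{"msg":"game_complete","room":1}',
  "",
  '{"msg":"game_complete","room":2}',
].join("\n");

const completed = selectByMsg(parseJsonl(sample), "game_complete");
console.log(completed.length);
```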

### Tiled mode (native browser windows)

```bash
bun run soak -- --scenario=populate --rooms=2 --watch=tiled
```

Opens visible Chromium windows for each room's host session. Useful for
hands-on debugging with DevTools.

## CLI flags

```
--scenario=populate|stress    required; which scenario to run
--accounts=<n>                total sessions (default: from scenario)
--rooms=<n>                   parallel rooms (default: from scenario)
--cpus-per-room=<n>           CPU opponents per room (default: from scenario)
--games-per-room=<n>          games per room (default: from scenario)
--holes=<n>                   holes per game (default: from scenario)
--watch=none|dashboard|tiled  visualization mode (default: dashboard)
--dashboard-port=<n>          dashboard server port (default: 7777)
--target=<url>                override TEST_URL env var
--run-id=<string>             custom run identifier (default: timestamp)
--list                        print available scenarios and exit
--dry-run                     validate config without running
```

`accounts` must be evenly divisible by `rooms`.
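
The divisibility rule can be sketched as a small check. This is an illustrative helper only: `sessionsPerRoom` and its error messages are hypothetical, and the harness's real validation may differ.

```typescript
// Hypothetical helper: returns how many sessions land in each room, or
// throws when the accounts cannot be split evenly across rooms.
function sessionsPerRoom(accounts: number, rooms: number): number {
  if (
    !Number.isInteger(accounts) ||
    !Number.isInteger(rooms) ||
    accounts < 1 ||
    rooms < 1
  ) {
    throw new Error("accounts and rooms must be positive integers");
  }
  if (accounts % rooms !== 0) {
    throw new Error(
      `accounts (${accounts}) must be evenly divisible by rooms (${rooms})`,
    );
  }
  return accounts / rooms;
}

// The scenario defaults: 16 accounts across 4 rooms.
console.log(sessionsPerRoom(16, 4));

// 4 accounts cannot be split across 3 rooms.
try {
  sessionsPerRoom(4, 3);
} catch (err) {
  console.log((err as Error).message);
}
```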

## Environment variables

| Variable | Description | Default |
|---|---|---|
| `TEST_URL` | Target server base URL | `http://localhost:8000` |
| `SOAK_INVITE_CODE` | Invite code for account seeding | `SOAKTEST` |
| `SOAK_HOLES` | Env-var equivalent of `--holes` | (unset) |
| `SOAK_ROOMS` | Env-var equivalent of `--rooms` | (unset) |
| `SOAK_ACCOUNTS` | Env-var equivalent of `--accounts` | (unset) |
| `SOAK_CPUS_PER_ROOM` | Env-var equivalent of `--cpus-per-room` | (unset) |
| `SOAK_GAMES_PER_ROOM` | Env-var equivalent of `--games-per-room` | (unset) |
| `SOAK_WATCH` | Env-var equivalent of `--watch` | (unset) |
| `SOAK_DASHBOARD_PORT` | Env-var equivalent of `--dashboard-port` | (unset) |

Config precedence: CLI flags > env vars > scenario defaults.
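
The precedence rule can be sketched as a resolver for one numeric setting. This is a sketch under assumptions, not the harness's config code: `resolveCount`, `parseCount`, and the `source` field are hypothetical names. The example values use `--holes`, `SOAK_HOLES`, and the `populate` default of 9 holes.

```typescript
// Hypothetical resolver illustrating only the documented order:
// CLI flags > env vars > scenario defaults.
type Source = "cli" | "env" | "scenario";

interface Resolved {
  value: number;
  source: Source;
}

// Accept only non-negative integers; reject empty or malformed input.
function parseCount(raw: string, source: Source): number {
  const n = Number(raw);
  if (raw.trim() === "" || !Number.isInteger(n) || n < 0) {
    throw new Error(`invalid ${source} value: "${raw}"`);
  }
  return n;
}

function resolveCount(
  cliValue: string | undefined,
  envValue: string | undefined,
  scenarioDefault: number,
): Resolved {
  if (cliValue !== undefined) {
    return { value: parseCount(cliValue, "cli"), source: "cli" };
  }
  if (envValue !== undefined) {
    return { value: parseCount(envValue, "env"), source: "env" };
  }
  return { value: scenarioDefault, source: "scenario" };
}

// --holes=1 on the command line beats SOAK_HOLES=3 and the default of 9.
const fromCli = resolveCount("1", "3", 9);
// Without the flag, SOAK_HOLES applies.
const fromEnv = resolveCount(undefined, "3", 9);
// With neither set, the scenario default is used.
const fromScenario = resolveCount(undefined, undefined, 9);

console.log(fromCli, fromEnv, fromScenario);
```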

## Watch modes

### `dashboard` (default)

Opens `http://localhost:7777` with a live status grid:

- 2x2 room tiles showing phase, current player, move count, progress bar
- Activity log at the bottom
- **Click any player tile** to watch their live session via CDP screencast
- Press Esc or click Close to stop the video feed
- WS connection status indicator

The dashboard runs **locally on your machine**: the runner's headless
browsers connect to the target server remotely while the dashboard UI
is served from your workstation.

### `tiled`

Opens native Chromium windows for each room's host session, positioned
in a grid. Joiners stay headless. Useful for interactive debugging with
DevTools. The viewport is sized at 960x900 to show the full game table.

### `none`

Pure headless, structured JSONL to stdout. Use for CI, overnight runs,
or piping to `jq`.

## Scenarios

### `populate`

Long multi-round games to populate scoreboards with realistic data.

| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 1 |
| Games per room | 10 |
| Holes | 9 |
| Decks | 2 |
| Think time | 800-2200ms |

### `stress`

Rapid short games with chaos injection for stability testing.

| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 2 |
| Games per room | 50 |
| Holes | 1 |
| Decks | 1 |
| Think time | 50-150ms |
| Chaos chance | 5% per turn |

Chaos events: `rapid_clicks`, `tab_blur`, `brief_offline`

### Adding new scenarios

Create `scenarios/<name>.ts` exporting a `Scenario` object, then register
it in `scenarios/index.ts`. See existing scenarios for the pattern.

## Error handling

- **Per-room isolation**: a failure in one room never unwinds other rooms
  (`Promise.allSettled`)
- **Watchdog**: 60s per-room timeout that fires if no heartbeat arrives
- **Health probes**: `GET /health` every 30s; 3 consecutive failures = fatal abort
- **Graceful shutdown**: Ctrl-C finishes the current turn, then cleans up (10s timeout).
  Double Ctrl-C = immediate force exit
- **Artifacts**: on failure, screenshots + HTML + game state JSON are saved to
  `artifacts/<run-id>/`. Old artifacts are auto-pruned after 7 days
- **Exit codes**: `0` = success, `1` = errors, `2` = interrupted
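
Three of the rules above (per-room isolation, the consecutive-failure health rule, and the exit codes) can be sketched as follows. This is a hedged sketch, not the harness's implementation: `RoomResult`, `runRoomsIsolated`, `exitCodeFor`, and `HealthTracker` are hypothetical names, and giving an interrupt priority over errors when choosing the exit code is an assumption.

```typescript
// Hypothetical result shapes, for illustration only.
interface RoomResult {
  room: number;
  gamesCompleted: number;
}

interface RunSummary {
  completed: RoomResult[];
  errors: string[];
}

// Per-room isolation: Promise.allSettled waits for every room, so one
// rejected room does not cancel or hide the others. Each task should be an
// async function so that failures surface as rejections.
async function runRoomsIsolated(
  rooms: Array<() => Promise<RoomResult>>,
): Promise<RunSummary> {
  const settled = await Promise.allSettled(rooms.map((run) => run()));
  const summary: RunSummary = { completed: [], errors: [] };
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") {
      summary.completed.push(outcome.value);
    } else {
      const reason: unknown = outcome.reason;
      summary.errors.push(reason instanceof Error ? reason.message : String(reason));
    }
  }
  return summary;
}

// Exit codes from the list: 0 = success, 1 = errors, 2 = interrupted.
// Assumption: an interrupt takes priority over recorded errors.
function exitCodeFor(summary: RunSummary, interrupted: boolean): 0 | 1 | 2 {
  if (interrupted) return 2;
  return summary.errors.length > 0 ? 1 : 0;
}

// Health rule: abort after `limit` consecutive failed probes; any
// successful probe resets the count.
class HealthTracker {
  private consecutiveFailures = 0;
  private readonly limit: number;

  constructor(limit: number = 3) {
    this.limit = limit;
  }

  // Returns true when the run should abort.
  record(probeOk: boolean): boolean {
    this.consecutiveFailures = probeOk ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.limit;
  }
}

// Demo with made-up rooms: room 1 fails, rooms 0 and 2 still finish.
const demoRooms: Array<() => Promise<RoomResult>> = [
  async () => ({ room: 0, gamesCompleted: 10 }),
  async () => {
    throw new Error("room 1: watchdog timeout");
  },
  async () => ({ room: 2, gamesCompleted: 10 }),
];

const demoRun = runRoomsIsolated(demoRooms).then((summary) => {
  console.log(summary.completed.length, summary.errors, exitCodeFor(summary, false));
  return summary;
});
```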

## Test account filtering

Soak accounts are flagged `is_test_account=TRUE` in the database. They are:

- **Hidden by default** from public leaderboards and stats (`?include_test=false`)
- **Visible to admins** by default in the admin panel
- **Togglable** via the "Include test accounts" checkbox in the admin panel
- **Badged** with `[Test]` in the admin user list and `[Test-seed]` on the invite code

## Unit tests

```bash
bun run test
```

27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config.
Integration-level modules (SessionPool, scenarios, dashboard) are verified
by the smoke test and live runs.

## Architecture

```
runner.ts                 CLI entry: parses flags, wires everything, runs scenario
core/
  session-pool.ts         Owns browser contexts, seeds/logs in accounts
  room-coordinator        Deferred-based host→joiners room code handoff
  watchdog.ts             Per-room timeout detector
  screencaster.ts         CDP Page.startScreencast for live video
  logger.ts               Structured JSONL logger with child contexts
  artifacts.ts            Screenshot/HTML/state capture on failure
  types.ts                Scenario/Session/Logger contracts
scenarios/
  populate.ts             Long multi-round games
  stress.ts               Rapid games with chaos injection
  shared/
    multiplayer-game.ts   Shared "play one game" loop
    chaos.ts              Chaos event injector
dashboard/
  server.ts               HTTP + WS server
  index.html              Status grid UI
  dashboard.js            WS client + click-to-watch
scripts/
  seed-accounts.ts        Account seeding CLI
  smoke.sh                End-to-end canary (~60s)
```

Reuses `tests/e2e/bot/golf-bot.ts` unchanged for all game interactions.
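
The architecture listing describes the room coordinator as a Deferred-based handoff of the room code from host to joiners. The sketch below shows one way such a handoff can work. It is not the harness's code: `createDeferred`, `RoomCodeHandoff`, `announce`, `waitForCode`, the room id `room-0`, and the code `ABCD` are all hypothetical.

```typescript
// A Deferred exposes a promise together with its resolve/reject functions,
// so one party can await a value that another party supplies later.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason: unknown) => void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve: (value: T) => void = () => {};
  let reject: (reason: unknown) => void = () => {};
  // The executor runs synchronously, so both functions are assigned
  // before this function returns.
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical coordinator: joiners wait on a per-room Deferred that the
// host resolves once it has created the room and knows the code.
class RoomCodeHandoff {
  private readonly slots = new Map<string, Deferred<string>>();

  private slot(roomId: string): Deferred<string> {
    let deferred = this.slots.get(roomId);
    if (deferred === undefined) {
      deferred = createDeferred<string>();
      this.slots.set(roomId, deferred);
    }
    return deferred;
  }

  // Called by the host session after it creates the room.
  announce(roomId: string, code: string): void {
    this.slot(roomId).resolve(code);
  }

  // Called by joiner sessions; works whether or not the host has
  // announced the code yet.
  waitForCode(roomId: string): Promise<string> {
    return this.slot(roomId).promise;
  }
}

// Demo: two joiners start waiting before the host announces the code.
const handoff = new RoomCodeHandoff();
const joinerA = handoff.waitForCode("room-0");
const joinerB = handoff.waitForCode("room-0");
handoff.announce("room-0", "ABCD");

const joined = Promise.all([joinerA, joinerB]);
joined.then((codes) => console.log(codes.join(",")));
```

Because the Deferred is created on first access from either side, the order in which the host and joiners reach the coordinator does not matter.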

## Related docs

- [Design spec](../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md)
- [Bring-up steps](../../docs/soak-harness-bringup.md)
- [Implementation plan](../../docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md)