diff --git a/tests/soak/CHECKLIST.md b/tests/soak/CHECKLIST.md
new file mode 100644
index 0000000..ec17fe1
--- /dev/null
+++ b/tests/soak/CHECKLIST.md
@@ -0,0 +1,64 @@
+# Soak Harness Validation Checklist
+
+Run after significant changes or before calling the harness implementation complete.
+
+## Post-deploy schema verification
+
+Run after the server-side changes deploy to each environment.
+
+- [ ] Server restarted (docker compose up -d or CI/CD deploy)
+- [ ] Server logs show `User store schema initialized` after restart
+- [ ] `\d users_v2` shows `is_test_account` column with default `false`
+- [ ] `\d invite_codes` shows `marks_as_test` column with default `false`
+- [ ] `\d leaderboard_overall` shows `is_test_account` column
+- [ ] `\di idx_users_test_account` shows the partial index
+- [ ] Leaderboard query still works: `curl .../api/stats/leaderboard` returns entries
+- [ ] `?include_test=true` parameter is accepted (no 422/500)
+
+## Bring-up
+
+- [ ] Invite code flagged with `marks_as_test=TRUE` on target environment
+- [ ] `bun run seed` creates/updates accounts in `.env.stresstest`
+- [ ] All seeded users show `is_test_account=TRUE` in the DB
+
+## Smoke test
+
+- [ ] `bash scripts/smoke.sh` exits 0 within 60s
+
+## Scenarios
+
+- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
+- [ ] `--scenario=populate --rooms=2 --games-per-room=2` runs multiple rooms and multiple games
+- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events and completes
+
+## Watch modes
+
+- [ ] `--watch=none` produces JSONL on stdout, nothing else
+- [ ] `--watch=dashboard` opens http://localhost:7777, grid renders, WS shows `healthy`
+- [ ] Clicking a player tile opens the video modal with live JPEG frames
+- [ ] Closing the modal (Esc or Close) stops the screencast (check logs for `screencast_stopped`)
+- [ ] `--watch=tiled` opens native Chromium windows sized to show the full game table
+
+## Failure handling
+
+- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
+- [ ] Double Ctrl-C → immediate hard exit (130)
+- [ ] Health probes detect server down (3 consecutive failures → fatal abort)
+- [ ] Artifacts directory contains screenshots + state JSON on failure
+- [ ] Artifacts older than 7 days are pruned on next startup
+
+## Server-side filtering
+
+- [ ] `GET /api/stats/leaderboard` (default) hides soak accounts
+- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak accounts
+- [ ] Admin panel user list shows `[Test]` badge on soak accounts
+- [ ] Admin panel invite codes tab shows `[Test-seed]` badge
+- [ ] "Include test accounts" checkbox toggles visibility in admin
+
+## Staging bring-up
+
+- [ ] `5VC2MCCN` flagged with `marks_as_test=TRUE` on staging DB
+- [ ] 16 accounts seeded via `SOAK_INVITE_CODE=5VC2MCCN bun run seed`
+- [ ] Populate run against staging completes with `--watch=dashboard`
+- [ ] Staging leaderboard default does NOT show soak accounts
+- [ ] Staging leaderboard with `?include_test=true` does show them
diff --git a/tests/soak/README.md b/tests/soak/README.md
index d1fb8f9..a79459f 100644
--- a/tests/soak/README.md
+++ b/tests/soak/README.md
@@ -1,21 +1,296 @@
 # Golf Soak & UX Test Harness
 
-Runs 16 authenticated browser sessions across 4 rooms to populate
-staging scoreboards and stress-test multiplayer stability.
+Standalone Playwright-based runner that drives multiple authenticated
+browser sessions playing real multiplayer games. Used for:
 
-**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
-**Bring-up:** `docs/soak-harness-bringup.md`
+- **Scoreboard population** — fill staging leaderboards with realistic data
+- **Stability stress testing** — hunt race conditions, WebSocket leaks, cleanup bugs
+- **Live monitoring** — watch bot sessions play in real time via CDP screencast
 
-## Quick start
+## Prerequisites
+
+- [Bun](https://bun.sh/) (or Node.js + npm)
+- Chromium browser binary (installed via `bunx playwright install chromium`)
+- A running Golf Card Game server (local dev or staging)
+- An invite code flagged as `marks_as_test=TRUE` (see [First-time setup](#first-time-setup))
+
+## First-time setup
+
+### 1. Install dependencies
 
 ```bash
 cd tests/soak
 bun install
-bun run seed # first run only
-TEST_URL=http://localhost:8000 bun run smoke
+bunx playwright install chromium
 ```
 
-(The scripts also work with `npm run`, `pnpm run`, etc. — bun is what's installed
-on this dev machine.)
+### 2. Flag the invite code as test-seed
 
-Full documentation arrives with Task 31.
+Any account registered with a test-seed invite gets `is_test_account=TRUE`,
+which keeps it out of real-user stats and leaderboards.
+
+**Local dev:**
+
+```bash
+PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
+INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
+SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
+FROM users_v2 LIMIT 1
+ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
+SQL
+```
+
+**Staging:**
+
+```bash
+ssh root@129.212.150.189 \
+  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
+UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
+SQL
+```
+
+### 3. Seed test accounts
+
+```bash
+# Local dev
+TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed
+
+# Staging
+TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed
+```
+
+This registers 16 accounts via the invite code and caches their credentials
+in `.env.stresstest`. Only needs to run once — subsequent runs reuse the
+cached credentials (re-logging in if tokens expire).
+
+### 4. Verify with a smoke test
+
+```bash
+# Local dev
+TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh
+```
+
+Expected: one game plays to completion in ~60 seconds, exits 0.
+
+## Usage
+
+### Populate scoreboards (recommended first run)
+
+```bash
+TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
+  --scenario=populate \
+  --watch=dashboard
+```
+
+This runs 4 rooms x 10 games x 9 holes with varied CPU personalities.
+The dashboard opens automatically at `http://localhost:7777`.
+
+### Quick smoke against staging
+
+```bash
+TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
+  --scenario=populate \
+  --accounts=2 --rooms=1 --cpus-per-room=0 \
+  --games-per-room=1 --holes=1 \
+  --watch=dashboard
+```
+
+### Stress test with chaos injection
+
+```bash
+TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
+  --scenario=stress \
+  --accounts=4 --rooms=1 --games-per-room=5 \
+  --watch=dashboard
+```
+
+Rapid 1-hole games with random chaos events (rapid clicks, tab blur,
+brief network outage) injected during gameplay.
+
+### Headless mode (CI / overnight)
+
+```bash
+TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
+  --scenario=populate --watch=none
+```
+
+Outputs structured JSONL to stdout. Pipe to `jq` for filtering:
+
+```bash
+bun run soak -- --scenario=populate --watch=none 2>&1 | jq 'select(.msg == "game_complete")'
+```
+
+### Tiled mode (native browser windows)
+
+```bash
+bun run soak -- --scenario=populate --rooms=2 --watch=tiled
+```
+
+Opens visible Chromium windows for each room's host session. Useful for
+hands-on debugging with DevTools.
+
+## CLI flags
+
+```
+--scenario=populate|stress    required — which scenario to run
+--accounts=                   total sessions (default: from scenario)
+--rooms=                      parallel rooms (default: from scenario)
+--cpus-per-room=              CPU opponents per room (default: from scenario)
+--games-per-room=             games per room (default: from scenario)
+--holes=                      holes per game (default: from scenario)
+--watch=none|dashboard|tiled  visualization mode (default: dashboard)
+--dashboard-port=             dashboard server port (default: 7777)
+--target=                     override TEST_URL env var
+--run-id=                     custom run identifier (default: timestamp)
+--list                        print available scenarios and exit
+--dry-run                     validate config without running
+```
+
+`--accounts` must be evenly divisible by `--rooms`.
+
+## Environment variables
+
+| Variable | Description | Default |
+|---|---|---|
+| `TEST_URL` | Target server base URL | `http://localhost:8000` |
+| `SOAK_INVITE_CODE` | Invite code for account seeding | `SOAKTEST` |
+| `SOAK_HOLES` | Override `--holes` | — |
+| `SOAK_ROOMS` | Override `--rooms` | — |
+| `SOAK_ACCOUNTS` | Override `--accounts` | — |
+| `SOAK_CPUS_PER_ROOM` | Override `--cpus-per-room` | — |
+| `SOAK_GAMES_PER_ROOM` | Override `--games-per-room` | — |
+| `SOAK_WATCH` | Override `--watch` | — |
+| `SOAK_DASHBOARD_PORT` | Override `--dashboard-port` | — |
+
+Config precedence: CLI flags > env vars > scenario defaults, so each `SOAK_*` variable overrides the scenario default but not an explicit CLI flag.
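The precedence rule above (CLI flags > env vars > scenario defaults) can be illustrated with a small merge. This is a minimal sketch, not the harness's actual config code: `SoakConfig`, `fromFlags`, `fromEnv`, and `resolveConfig` are hypothetical names, and only three of the documented settings are modelled.

```typescript
// Hypothetical config shape; the real harness types are not shown in this diff.
interface SoakConfig {
  rooms: number;
  accounts: number;
  holes: number;
}

// Assumed mapping between the documented SOAK_* variables and config keys.
const ENV_KEYS: ReadonlyArray<[string, keyof SoakConfig]> = [
  ["SOAK_ROOMS", "rooms"],
  ["SOAK_ACCOUNTS", "accounts"],
  ["SOAK_HOLES", "holes"],
];

// Collect numeric `--key=value` flags; unknown arguments are ignored.
function fromFlags(argv: string[]): Partial<SoakConfig> {
  const out: Partial<SoakConfig> = {};
  for (const arg of argv) {
    const m = /^--(rooms|accounts|holes)=(\d+)$/.exec(arg);
    if (m) out[m[1] as keyof SoakConfig] = Number(m[2]);
  }
  return out;
}

// Collect SOAK_* variables that are actually set (unset ones stay absent).
function fromEnv(env: Record<string, string | undefined>): Partial<SoakConfig> {
  const out: Partial<SoakConfig> = {};
  for (const [name, key] of ENV_KEYS) {
    const raw = env[name];
    if (raw !== undefined && raw !== "") out[key] = Number(raw);
  }
  return out;
}

// Later spreads win, so the order below encodes: defaults < env < flags.
function resolveConfig(
  defaults: SoakConfig,
  env: Partial<SoakConfig>,
  flags: Partial<SoakConfig>,
): SoakConfig {
  const merged: SoakConfig = { ...defaults, ...env, ...flags };
  if (merged.accounts % merged.rooms !== 0) {
    throw new Error("--accounts must be evenly divisible by --rooms");
  }
  return merged;
}
```

With the `populate` defaults, setting `SOAK_ROOMS=2` and passing `--rooms=8` would resolve to 8 rooms under this sketch, because the flag is applied last.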
+
+## Watch modes
+
+### `dashboard` (default)
+
+Opens `http://localhost:7777` with a live status grid:
+
+- 2x2 room tiles showing phase, current player, move count, progress bar
+- Activity log at the bottom
+- **Click any player tile** to watch their live session via CDP screencast
+- Press Esc or click Close to stop the video feed
+- WS connection status indicator
+
+The dashboard runs **locally on your machine** — the runner's headless
+browsers connect to the target server remotely while the dashboard UI
+is served from your workstation.
+
+### `tiled`
+
+Opens native Chromium windows for each room's host session, positioned
+in a grid. Joiners stay headless. Useful for interactive debugging with
+DevTools. The viewport is sized at 960x900 to show the full game table.
+
+### `none`
+
+Pure headless, structured JSONL to stdout. Use for CI, overnight runs,
+or piping to `jq`.
+
+## Scenarios
+
+### `populate`
+
+Long multi-round games to populate scoreboards with realistic data.
+
+| Setting | Default |
+|---|---|
+| Accounts | 16 |
+| Rooms | 4 |
+| CPUs per room | 1 |
+| Games per room | 10 |
+| Holes | 9 |
+| Decks | 2 |
+| Think time | 800-2200ms |
+
+### `stress`
+
+Rapid short games with chaos injection for stability testing.
+
+| Setting | Default |
+|---|---|
+| Accounts | 16 |
+| Rooms | 4 |
+| CPUs per room | 2 |
+| Games per room | 50 |
+| Holes | 1 |
+| Decks | 1 |
+| Think time | 50-150ms |
+| Chaos chance | 5% per turn |
+
+Chaos events: `rapid_clicks`, `tab_blur`, `brief_offline`
+
+### Adding new scenarios
+
+Create `scenarios/.ts` exporting a `Scenario` object, then register
+it in `scenarios/index.ts`. See existing scenarios for the pattern.
+
+## Error handling
+
+- **Per-room isolation**: a failure in one room never unwinds other rooms
+  (`Promise.allSettled`)
+- **Watchdog**: 60s per-room timeout — fires if no heartbeat arrives
+- **Health probes**: `GET /health` every 30s, 3 consecutive failures = fatal abort
+- **Graceful shutdown**: Ctrl-C finishes current turn, then cleans up (10s timeout).
+  Double Ctrl-C = immediate force exit
+- **Artifacts**: on failure, screenshots + HTML + game state JSON saved to
+  `artifacts//`. Old artifacts auto-pruned after 7 days
+- **Exit codes**: `0` = success, `1` = errors, `2` = interrupted, `130` = forced exit on double Ctrl-C
+
+## Test account filtering
+
+Soak accounts are flagged `is_test_account=TRUE` in the database. They are:
+
+- **Hidden by default** from public leaderboards and stats (`?include_test=false`)
+- **Visible to admins** by default in the admin panel
+- **Togglable** via the "Include test accounts" checkbox in the admin panel
+- **Badged** with `[Test]` in the admin user list and `[Test-seed]` on the invite code
+
+## Unit tests
+
+```bash
+bun run test
+```
+
+27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config.
+Integration-level modules (SessionPool, scenarios, dashboard) are verified
+by the smoke test and live runs.
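The per-room isolation and exit codes listed under Error handling can be sketched roughly as follows. This is an illustration under assumptions: `RoomResult`, `runRooms`, and `exitCodeFor` are hypothetical names, and the choice to report "interrupted" ahead of "errors" is a guess rather than something the README states.

```typescript
// Hypothetical per-room result; the real harness shape is not shown in this diff.
type RoomResult = { roomId: number; gamesCompleted: number };

interface RunSummary {
  ok: RoomResult[];
  errors: string[];
}

// Run every room concurrently. Promise.allSettled waits for all of them,
// so one room rejecting never cancels or hides the others.
async function runRooms(rooms: Array<() => Promise<RoomResult>>): Promise<RunSummary> {
  // Wrapping in an async arrow turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(rooms.map(async (run) => run()));
  const summary: RunSummary = { ok: [], errors: [] };
  settled.forEach((result, index) => {
    if (result.status === "fulfilled") {
      summary.ok.push(result.value);
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      summary.errors.push(`room ${index}: ${reason}`);
    }
  });
  return summary;
}

// Map a finished run to the documented exit codes (0 / 1 / 2).
// Assumption: an interrupted run reports 2 even if some rooms also failed.
function exitCodeFor(summary: RunSummary, interrupted: boolean): number {
  if (interrupted) return 2;
  return summary.errors.length > 0 ? 1 : 0;
}
```

Collecting failures as strings keeps the summary printable; a fuller version would likely keep the original error objects so artifacts can be captured per room.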
+
+## Architecture
+
+```
+runner.ts                 CLI entry — parses flags, wires everything, runs scenario
+core/
+  session-pool.ts         Owns browser contexts, seeds/logs in accounts
+  room-coordinator        Deferred-based host→joiners room code handoff
+  watchdog.ts             Per-room timeout detector
+  screencaster.ts         CDP Page.startScreencast for live video
+  logger.ts               Structured JSONL logger with child contexts
+  artifacts.ts            Screenshot/HTML/state capture on failure
+  types.ts                Scenario/Session/Logger contracts
+scenarios/
+  populate.ts             Long multi-round games
+  stress.ts               Rapid games with chaos injection
+  shared/
+    multiplayer-game.ts   Shared "play one game" loop
+    chaos.ts              Chaos event injector
+dashboard/
+  server.ts               HTTP + WS server
+  index.html              Status grid UI
+  dashboard.js            WS client + click-to-watch
+scripts/
+  seed-accounts.ts        Account seeding CLI
+  smoke.sh                End-to-end canary (~60s)
+```
+
+Reuses `tests/e2e/bot/golf-bot.ts` unchanged for all game interactions.
+
+## Related docs
+
+- [Design spec](../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md)
+- [Bring-up steps](../../docs/soak-harness-bringup.md)
+- [Implementation plan](../../docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md)
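The architecture listing describes the room coordinator as a "Deferred-based host→joiners room code handoff". One way such a handoff could work is sketched below. The `Deferred` class is a common pattern, but `RoomCodeHandoff`, `announce`, and `awaitCode` are hypothetical names chosen for illustration and may not match the real module.

```typescript
// A promise whose resolve/reject handles are exposed to the caller.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason: unknown) => void;

  constructor() {
    this.promise = new Promise<T>((res, rej) => {
      this.resolve = res;
      this.reject = rej;
    });
  }
}

// Hypothetical coordinator: the host publishes a room code once,
// and any number of joiners can wait for it, before or after it is published.
class RoomCodeHandoff {
  private readonly slots = new Map<string, Deferred<string>>();

  // Get or lazily create the deferred for a room, so call order does not matter.
  private slot(roomId: string): Deferred<string> {
    let deferred = this.slots.get(roomId);
    if (!deferred) {
      deferred = new Deferred<string>();
      this.slots.set(roomId, deferred);
    }
    return deferred;
  }

  // Called by the host session once the server has issued a room code.
  announce(roomId: string, code: string): void {
    this.slot(roomId).resolve(code);
  }

  // Called by joiner sessions; rejects if the host never announces in time.
  awaitCode(roomId: string, timeoutMs: number): Promise<string> {
    const deferred = this.slot(roomId);
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`timed out waiting for room code: ${roomId}`)),
        timeoutMs,
      );
      deferred.promise.then(
        (code) => { clearTimeout(timer); resolve(code); },
        (err) => { clearTimeout(timer); reject(err); },
      );
    });
  }
}
```

Creating the deferred lazily on either side means joiners that start before the host simply wait, which avoids polling for the room code.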