Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | d5194f43ba | | |
64
tests/soak/CHECKLIST.md
Normal file
@@ -0,0 +1,64 @@
# Soak Harness Validation Checklist

Run after significant changes or before calling the harness implementation complete.

## Post-deploy schema verification

Run after the server-side changes deploy to each environment.

- [ ] Server restarted (docker compose up -d or CI/CD deploy)
- [ ] Server logs show `User store schema initialized` after restart
- [ ] `\d users_v2` shows `is_test_account` column with default `false`
- [ ] `\d invite_codes` shows `marks_as_test` column with default `false`
- [ ] `\d leaderboard_overall` shows `is_test_account` column
- [ ] `\di idx_users_test_account` shows the partial index
- [ ] Leaderboard query still works: `curl .../api/stats/leaderboard` returns entries
- [ ] `?include_test=true` parameter is accepted (no 422/500)

## Bring-up

- [ ] Invite code flagged with `marks_as_test=TRUE` on target environment
- [ ] `bun run seed` creates/updates accounts in `.env.stresstest`
- [ ] All seeded users show `is_test_account=TRUE` in the DB

## Smoke test

- [ ] `bash scripts/smoke.sh` exits 0 within 60s

## Scenarios

- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- [ ] `--scenario=populate --rooms=2 --games-per-room=2` runs multiple rooms and multiple games
- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events and completes

## Watch modes

- [ ] `--watch=none` produces JSONL on stdout, nothing else
- [ ] `--watch=dashboard` opens http://localhost:7777, grid renders, WS shows `healthy`
- [ ] Clicking a player tile opens the video modal with live JPEG frames
- [ ] Closing the modal (Esc or Close) stops the screencast (check logs for `screencast_stopped`)
- [ ] `--watch=tiled` opens native Chromium windows sized to show the full game table

## Failure handling

- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
- [ ] Double Ctrl-C → immediate hard exit (130)
- [ ] Health probes detect server down (3 consecutive failures → fatal abort)
- [ ] Artifacts directory contains screenshots + state JSON on failure
- [ ] Artifacts older than 7 days are pruned on next startup

## Server-side filtering

- [ ] `GET /api/stats/leaderboard` (default) hides soak accounts
- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak accounts
- [ ] Admin panel user list shows `[Test]` badge on soak accounts
- [ ] Admin panel invite codes tab shows `[Test-seed]` badge
- [ ] "Include test accounts" checkbox toggles visibility in admin

## Staging bring-up

- [ ] `5VC2MCCN` flagged with `marks_as_test=TRUE` on staging DB
- [ ] 16 accounts seeded via `SOAK_INVITE_CODE=5VC2MCCN bun run seed`
- [ ] Populate run against staging completes with `--watch=dashboard`
- [ ] Staging leaderboard default does NOT show soak accounts
- [ ] Staging leaderboard with `?include_test=true` does show them
@@ -1,21 +1,296 @@
# Golf Soak & UX Test Harness

Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.
Standalone Playwright-based runner that drives multiple authenticated
browser sessions playing real multiplayer games. Used for:

**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`
- **Scoreboard population** — fill staging leaderboards with realistic data
- **Stability stress testing** — hunt race conditions, WebSocket leaks, cleanup bugs
- **Live monitoring** — watch bot sessions play in real time via CDP screencast

## Quick start
## Prerequisites

- [Bun](https://bun.sh/) (or Node.js + npm)
- Chromium browser binary (installed via `bunx playwright install chromium`)
- A running Golf Card Game server (local dev or staging)
- An invite code flagged as `marks_as_test=TRUE` (see [Bring-up](#first-time-setup))

## First-time setup

### 1. Install dependencies

```bash
cd tests/soak
bun install
bun run seed # first run only
TEST_URL=http://localhost:8000 bun run smoke
bunx playwright install chromium
```

(The scripts also work with `npm run`, `pnpm run`, etc. — bun is what's installed
on this dev machine.)
### 2. Flag the invite code as test-seed

Full documentation arrives with Task 31.
Any account registered with a test-seed invite gets `is_test_account=TRUE`,
which keeps it out of real-user stats and leaderboards.

**Local dev:**

```bash
PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
SQL
```

**Staging:**

```bash
ssh root@129.212.150.189 \
  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SQL
```

### 3. Seed test accounts

```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed

# Staging
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed
```

This registers 16 accounts via the invite code and caches their credentials
in `.env.stresstest`. Only needs to run once — subsequent runs reuse the
cached credentials (re-logging in if tokens expire).

### 4. Verify with a smoke test

```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh
```

Expected: one game plays to completion in ~60 seconds, exits 0.

## Usage

### Populate scoreboards (recommended first run)

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --watch=dashboard
```

This runs 4 rooms x 10 games x 9 holes with varied CPU personalities.
The dashboard opens automatically at `http://localhost:7777`.

### Quick smoke against staging

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=1 --holes=1 \
  --watch=dashboard
```

### Stress test with chaos injection

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --games-per-room=5 \
  --watch=dashboard
```

Rapid 1-hole games with random chaos events (rapid clicks, tab blur,
brief network outage) injected during gameplay.

### Headless mode (CI / overnight)

```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate --watch=none
```

Outputs structured JSONL to stdout. Pipe to `jq` for filtering:

```bash
bun run soak -- --scenario=populate --watch=none 2>&1 | jq 'select(.msg == "game_complete")'
```
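
The same filter can be applied in code when `jq` is not at hand. The following is a minimal TypeScript sketch, assuming each stdout line is one JSON object with a `msg` field (as the `jq` filter above implies). The helper name `filterByMsg`, the `LogRecord` shape, and the `room` field in the sample data are hypothetical and not taken from the harness.

```typescript
// Hypothetical record shape: only `msg` is implied by the jq filter above.
interface LogRecord {
  msg?: string;
  [key: string]: unknown;
}

// Parse JSONL text and keep the records whose `msg` equals the given value.
function filterByMsg(jsonl: string, msg: string): LogRecord[] {
  const out: LogRecord[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const rec: unknown = JSON.parse(trimmed);
      if (typeof rec === "object" && rec !== null && (rec as LogRecord).msg === msg) {
        out.push(rec as LogRecord);
      }
    } catch {
      // Not JSON (for example stray stderr text merged in by 2>&1); skip it.
    }
  }
  return out;
}

// Made-up sample input: two JSON records and one non-JSON line.
const sample = [
  '{"msg":"game_complete","room":1}',
  "not json",
  '{"msg":"chaos_injected","room":1}',
].join("\n");

const hits = filterByMsg(sample, "game_complete");
```

Skipping unparsable lines is a design choice of this sketch: with `2>&1`, anything written to stderr ends up in the same stream.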

### Tiled mode (native browser windows)

```bash
bun run soak -- --scenario=populate --rooms=2 --watch=tiled
```

Opens visible Chromium windows for each room's host session. Useful for
hands-on debugging with DevTools.

## CLI flags

```
--scenario=populate|stress    required — which scenario to run
--accounts=<n>                total sessions (default: from scenario)
--rooms=<n>                   parallel rooms (default: from scenario)
--cpus-per-room=<n>           CPU opponents per room (default: from scenario)
--games-per-room=<n>          games per room (default: from scenario)
--holes=<n>                   holes per game (default: from scenario)
--watch=none|dashboard|tiled  visualization mode (default: dashboard)
--dashboard-port=<n>          dashboard server port (default: 7777)
--target=<url>                override TEST_URL env var
--run-id=<string>             custom run identifier (default: timestamp)
--list                        print available scenarios and exit
--dry-run                     validate config without running
```

`accounts / rooms` must divide evenly.
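
A minimal sketch of how the divisibility rule could be checked before a run starts. The function name `sessionsPerRoom` and the error messages are illustrative assumptions, not the harness's actual validation code.

```typescript
// Hypothetical helper: returns how many sessions land in each room,
// or throws when accounts cannot be split evenly across rooms.
function sessionsPerRoom(accounts: number, rooms: number): number {
  if (!Number.isInteger(accounts) || !Number.isInteger(rooms) || accounts <= 0 || rooms <= 0) {
    throw new Error("accounts and rooms must be positive integers");
  }
  if (accounts % rooms !== 0) {
    throw new Error(`accounts (${accounts}) must divide evenly by rooms (${rooms})`);
  }
  return accounts / rooms;
}

// The populate defaults (16 accounts, 4 rooms) give 4 sessions per room.
const perRoom = sessionsPerRoom(16, 4);
```

A check like this fits naturally behind `--dry-run`, since it needs no browser or server.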

## Environment variables

| Variable | Description | Default |
|---|---|---|
| `TEST_URL` | Target server base URL | `http://localhost:8000` |
| `SOAK_INVITE_CODE` | Invite code for account seeding | `SOAKTEST` |
| `SOAK_HOLES` | Override `--holes` | — |
| `SOAK_ROOMS` | Override `--rooms` | — |
| `SOAK_ACCOUNTS` | Override `--accounts` | — |
| `SOAK_CPUS_PER_ROOM` | Override `--cpus-per-room` | — |
| `SOAK_GAMES_PER_ROOM` | Override `--games-per-room` | — |
| `SOAK_WATCH` | Override `--watch` | — |
| `SOAK_DASHBOARD_PORT` | Override `--dashboard-port` | — |

Config precedence: CLI flags > env vars > scenario defaults.
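
One way to read the precedence rule, sketched for a single setting. The `CliFlags` shape and the `resolveRooms` helper are assumptions for illustration; only the `SOAK_ROOMS` variable name and the ordering come from the text above. The environment is passed in as a plain object so the sketch does not depend on any particular runtime.

```typescript
// Hypothetical parsed-flag shape; a missing key means the flag was not given.
interface CliFlags {
  rooms?: number;
}

type Env = Record<string, string | undefined>;

// Resolve the room count: CLI flag first, then SOAK_ROOMS, then the scenario default.
function resolveRooms(cli: CliFlags, env: Env, scenarioDefault: number): number {
  if (cli.rooms !== undefined) return cli.rooms;
  const raw = env["SOAK_ROOMS"];
  if (raw !== undefined && raw.trim() !== "") {
    const parsed = Number.parseInt(raw, 10);
    if (!Number.isNaN(parsed)) return parsed;
  }
  return scenarioDefault;
}

const fromCli = resolveRooms({ rooms: 1 }, { SOAK_ROOMS: "2" }, 4); // 1
const fromEnv = resolveRooms({}, { SOAK_ROOMS: "2" }, 4); // 2
const fromDefault = resolveRooms({}, {}, 4); // 4
```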

## Watch modes

### `dashboard` (default)

Opens `http://localhost:7777` with a live status grid:

- 2x2 room tiles showing phase, current player, move count, progress bar
- Activity log at the bottom
- **Click any player tile** to watch their live session via CDP screencast
- Press Esc or click Close to stop the video feed
- WS connection status indicator

The dashboard runs **locally on your machine** — the runner's headless
browsers connect to the target server remotely while the dashboard UI
is served from your workstation.

### `tiled`

Opens native Chromium windows for each room's host session, positioned
in a grid. Joiners stay headless. Useful for interactive debugging with
DevTools. The viewport is sized at 960x900 to show the full game table.

### `none`

Pure headless, structured JSONL to stdout. Use for CI, overnight runs,
or piping to `jq`.

## Scenarios

### `populate`

Long multi-round games to populate scoreboards with realistic data.

| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 1 |
| Games per room | 10 |
| Holes | 9 |
| Decks | 2 |
| Think time | 800-2200ms |

### `stress`

Rapid short games with chaos injection for stability testing.

| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 2 |
| Games per room | 50 |
| Holes | 1 |
| Decks | 1 |
| Think time | 50-150ms |
| Chaos chance | 5% per turn |

Chaos events: `rapid_clicks`, `tab_blur`, `brief_offline`

### Adding new scenarios

Create `scenarios/<name>.ts` exporting a `Scenario` object, then register
it in `scenarios/index.ts`. See existing scenarios for the pattern.
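
The real `Scenario` contract lives in `core/types.ts` and is not shown here, so the following is only a rough sketch of the export-then-register pattern. Every name in it (`ScenarioSketch`, its fields, the `marathon` scenario, `registry`, `getScenario`) is hypothetical; the field names simply mirror the rows of the defaults tables above.

```typescript
// Hypothetical stand-in for the real `Scenario` contract in core/types.ts.
interface ScenarioSketch {
  name: string;
  description: string;
  defaults: {
    accounts: number;
    rooms: number;
    cpusPerRoom: number;
    gamesPerRoom: number;
    holes: number;
  };
}

// What a new scenarios/<name>.ts might export (all values are made up).
const marathon: ScenarioSketch = {
  name: "marathon",
  description: "Few rooms, many long games",
  defaults: { accounts: 8, rooms: 2, cpusPerRoom: 1, gamesPerRoom: 20, holes: 18 },
};

// What registration in scenarios/index.ts might look like: a name-to-object map.
const registry = new Map<string, ScenarioSketch>([[marathon.name, marathon]]);

// Look up a scenario by the value passed to --scenario.
function getScenario(name: string): ScenarioSketch {
  const found = registry.get(name);
  if (found === undefined) throw new Error(`unknown scenario: ${name}`);
  return found;
}
```

Check the existing `populate.ts` and `stress.ts` for the actual field names before copying any of this.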

## Error handling

- **Per-room isolation**: a failure in one room never unwinds other rooms
  (`Promise.allSettled`)
- **Watchdog**: 60s per-room timeout — fires if no heartbeat arrives
- **Health probes**: `GET /health` every 30s, 3 consecutive failures = fatal abort
- **Graceful shutdown**: Ctrl-C finishes current turn, then cleans up (10s timeout).
  Double Ctrl-C = immediate force exit
- **Artifacts**: on failure, screenshots + HTML + game state JSON saved to
  `artifacts/<run-id>/`. Old artifacts auto-pruned after 7 days
- **Exit codes**: `0` = success, `1` = errors, `2` = interrupted
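
The health-probe rule in the list above (three consecutive failures abort the run) can be sketched as a small counter. This is an illustration under assumptions: the `HealthTracker` class and its `record` method are hypothetical names, and the actual probe (`GET /health` every 30s) is left out so the logic can be read on its own.

```typescript
// Hypothetical tracker: counts consecutive failed probes and signals a fatal abort.
class HealthTracker {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Record one probe result. Returns true when the run should abort.
  record(ok: boolean): boolean {
    // Any successful probe resets the streak; only back-to-back failures count.
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.threshold;
  }
}

const tracker = new HealthTracker();
const afterFirst = tracker.record(false); // false: 1 failure
const afterSecond = tracker.record(false); // false: 2 failures
const afterThird = tracker.record(false); // true: 3 in a row
```

Resetting on success is what makes the rule "consecutive" rather than "total", so a flaky but recovering server does not end the run.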

## Test account filtering

Soak accounts are flagged `is_test_account=TRUE` in the database. They are:

- **Hidden by default** from public leaderboards and stats (`?include_test=false`)
- **Visible to admins** by default in the admin panel
- **Togglable** via the "Include test accounts" checkbox in the admin panel
- **Badged** with `[Test]` in the admin user list and `[Test-seed]` on the invite code

## Unit tests

```bash
bun run test
```

27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config.
Integration-level modules (SessionPool, scenarios, dashboard) are verified
by the smoke test and live runs.

## Architecture

```
runner.ts                 CLI entry — parses flags, wires everything, runs scenario
core/
  session-pool.ts         Owns browser contexts, seeds/logs in accounts
  room-coordinator        Deferred-based host→joiners room code handoff
  watchdog.ts             Per-room timeout detector
  screencaster.ts         CDP Page.startScreencast for live video
  logger.ts               Structured JSONL logger with child contexts
  artifacts.ts            Screenshot/HTML/state capture on failure
  types.ts                Scenario/Session/Logger contracts
scenarios/
  populate.ts             Long multi-round games
  stress.ts               Rapid games with chaos injection
  shared/
    multiplayer-game.ts   Shared "play one game" loop
    chaos.ts              Chaos event injector
dashboard/
  server.ts               HTTP + WS server
  index.html              Status grid UI
  dashboard.js            WS client + click-to-watch
scripts/
  seed-accounts.ts        Account seeding CLI
  smoke.sh                End-to-end canary (~60s)
```

Reuses `tests/e2e/bot/golf-bot.ts` unchanged for all game interactions.
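
The architecture listing describes the room coordinator as a "Deferred-based" handoff of the room code from host to joiners. The general Deferred pattern is sketched below; it is not the harness's actual `Deferred` or `RoomCoordinator` code, and the `joiner` function and the `"ABCD"` room code are made up for illustration.

```typescript
// A Deferred exposes a promise together with the functions that settle it,
// so one party can wait while another party decides when the value is ready.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason?: unknown) => void;

  constructor() {
    this.promise = new Promise<T>((res, rej) => {
      this.resolve = res;
      this.reject = rej;
    });
  }
}

// One deferred per room: the host publishes the code, the joiners wait for it.
const roomCode = new Deferred<string>();

// Hypothetical joiner: blocks until the host has created the room.
async function joiner(id: number): Promise<string> {
  const code = await roomCode.promise;
  return `joiner ${id} joined ${code}`;
}

// Joiners can start waiting before the room exists.
const joiners = [joiner(1), joiner(2)];

// Later, the host creates the room and hands the code over.
roomCode.resolve("ABCD");

// Settles once every joiner has received the code.
const joined: Promise<string[]> = Promise.all(joiners);
```

The appeal of this shape is that joiners need no polling: they await a promise that the host settles exactly once.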

## Related docs

- [Design spec](../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md)
- [Bring-up steps](../../docs/soak-harness-bringup.md)
- [Implementation plan](../../docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md)