Replaces the Task 31 stub README with complete documentation: quickstart, first-time setup (invite flagging, seeding, smoke), usage examples for all three watch modes, CLI flag reference, env var table, scenario descriptions, error handling summary, test account filtering explanation, and architecture overview. Adds CHECKLIST.md with post-deploy verification, bring-up, scenario, watch mode, failure handling, and staging gate items. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Golf Soak & UX Test Harness
Standalone Playwright-based runner that drives multiple authenticated browser sessions playing real multiplayer games. Used for:
- Scoreboard population — fill staging leaderboards with realistic data
- Stability stress testing — hunt race conditions, WebSocket leaks, cleanup bugs
- Live monitoring — watch bot sessions play in real time via CDP screencast
Prerequisites
- Bun (or Node.js + npm)
- Chromium browser binary (installed via `bunx playwright install chromium`)
- A running Golf Card Game server (local dev or staging)
- An invite code flagged as `marks_as_test=TRUE` (see step 2 of First-time setup)
First-time setup
1. Install dependencies
```bash
cd tests/soak
bun install
bunx playwright install chromium
```
2. Flag the invite code as test-seed
Any account registered with a test-seed invite gets `is_test_account=TRUE`,
which keeps it out of real-user stats and leaderboards.
Local dev:
```bash
PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
SQL
```
Staging:
```bash
ssh root@129.212.150.189 \
  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SQL
```
3. Seed test accounts
```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed
# Staging
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed
```
This registers 16 accounts via the invite code and caches their credentials
in `.env.stresstest`. It only needs to run once: subsequent runs reuse the
cached credentials (logging in again if tokens expire).
4. Verify with a smoke test
```bash
# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh
```
Expected: one game plays to completion in ~60 seconds, exits 0.
Usage
Populate scoreboards (recommended first run)
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --watch=dashboard
```
This runs 4 rooms x 10 games x 9 holes with varied CPU personalities.
The dashboard opens automatically at http://localhost:7777.
Quick smoke against staging
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=1 --holes=1 \
  --watch=dashboard
```
Stress test with chaos injection
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --games-per-room=5 \
  --watch=dashboard
```
Rapid 1-hole games with random chaos events (rapid clicks, tab blur, brief network outage) injected during gameplay.
Headless mode (CI / overnight)
```bash
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate --watch=none
```
Outputs structured JSONL to stdout. Pipe to jq for filtering:
```bash
bun run soak -- --scenario=populate --watch=none 2>&1 | jq 'select(.msg == "game_complete")'
```
Tiled mode (native browser windows)
```bash
bun run soak -- --scenario=populate --rooms=2 --watch=tiled
```
Opens visible Chromium windows for each room's host session. Useful for hands-on debugging with DevTools.
CLI flags
```
--scenario=populate|stress      (required) which scenario to run
--accounts=<n>                  total sessions (default: from scenario)
--rooms=<n>                     parallel rooms (default: from scenario)
--cpus-per-room=<n>             CPU opponents per room (default: from scenario)
--games-per-room=<n>            games per room (default: from scenario)
--holes=<n>                     holes per game (default: from scenario)
--watch=none|dashboard|tiled    visualization mode (default: dashboard)
--dashboard-port=<n>            dashboard server port (default: 7777)
--target=<url>                  override TEST_URL env var
--run-id=<string>               custom run identifier (default: timestamp)
--list                          print available scenarios and exit
--dry-run                       validate config without running
```
`--accounts` must be evenly divisible by `--rooms`.
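The reason for the divisibility rule is easiest to see when splitting sessions into rooms. This sketch assumes (the README does not confirm it) that the first session in each room hosts and the rest join; `assignRooms` and `RoomAssignment` are hypothetical names.

```typescript
interface RoomAssignment {
  room: number;
  host: string;
  joiners: string[];
}

// Splits account names into equally sized rooms; rejects uneven splits up front.
function assignRooms(accounts: string[], rooms: number): RoomAssignment[] {
  if (!Number.isInteger(rooms) || rooms < 1) {
    throw new Error(`--rooms must be a positive integer, got ${rooms}`);
  }
  if (accounts.length === 0) {
    throw new Error("--accounts must be at least 1");
  }
  if (accounts.length % rooms !== 0) {
    throw new Error(`--accounts=${accounts.length} is not divisible by --rooms=${rooms}`);
  }
  const perRoom = accounts.length / rooms;
  const result: RoomAssignment[] = [];
  for (let r = 0; r < rooms; r++) {
    const members = accounts.slice(r * perRoom, (r + 1) * perRoom);
    // Assumption: the first account in each slice hosts; the rest join.
    result.push({ room: r, host: members[0], joiners: members.slice(1) });
  }
  return result;
}
```

Under that assumption, the `populate` defaults (16 accounts, 4 rooms) give each room one host and three joiners.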
Environment variables
| Variable | Description | Default |
|---|---|---|
| `TEST_URL` | Target server base URL | `http://localhost:8000` |
| `SOAK_INVITE_CODE` | Invite code for account seeding | `SOAKTEST` |
| `SOAK_HOLES` | Equivalent to `--holes` | (none) |
| `SOAK_ROOMS` | Equivalent to `--rooms` | (none) |
| `SOAK_ACCOUNTS` | Equivalent to `--accounts` | (none) |
| `SOAK_CPUS_PER_ROOM` | Equivalent to `--cpus-per-room` | (none) |
| `SOAK_GAMES_PER_ROOM` | Equivalent to `--games-per-room` | (none) |
| `SOAK_WATCH` | Equivalent to `--watch` | (none) |
| `SOAK_DASHBOARD_PORT` | Equivalent to `--dashboard-port` | (none) |
Config precedence: CLI flags > env vars > scenario defaults.
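A sketch of that precedence rule for the numeric settings, using the env var names from the table above. `SoakConfig` and `resolveConfig` are hypothetical names; `--watch` and `--dashboard-port` would follow the same pattern and are omitted for brevity.

```typescript
interface SoakConfig {
  accounts: number;
  rooms: number;
  cpusPerRoom: number;
  gamesPerRoom: number;
  holes: number;
}

// Maps each numeric setting to its env var name.
const ENV_NAMES: Record<keyof SoakConfig, string> = {
  accounts: "SOAK_ACCOUNTS",
  rooms: "SOAK_ROOMS",
  cpusPerRoom: "SOAK_CPUS_PER_ROOM",
  gamesPerRoom: "SOAK_GAMES_PER_ROOM",
  holes: "SOAK_HOLES",
};

// Resolves each setting as: CLI flag if given, else env var if set, else scenario default.
function resolveConfig(
  cli: Partial<SoakConfig>,
  env: Record<string, string | undefined>,
  defaults: SoakConfig,
): SoakConfig {
  const out: SoakConfig = { ...defaults };
  for (const key of Object.keys(ENV_NAMES) as (keyof SoakConfig)[]) {
    const raw = env[ENV_NAMES[key]];
    const fromEnv = raw !== undefined && raw !== "" ? Number(raw) : undefined;
    if (fromEnv !== undefined && Number.isNaN(fromEnv)) {
      throw new Error(`${ENV_NAMES[key]}=${raw} is not a number`);
    }
    out[key] = cli[key] ?? fromEnv ?? defaults[key];
  }
  return out;
}
```

For example, `SOAK_HOLES=3` combined with `--holes=1` yields 1 hole, while `SOAK_ROOMS=2` alone replaces the scenario's room count.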
Watch modes
dashboard (default)
Opens http://localhost:7777 with a live status grid:
- 2x2 room tiles showing phase, current player, move count, progress bar
- Activity log at the bottom
- Click any player tile to watch their live session via CDP screencast
- Press Esc or click Close to stop the video feed
- WS connection status indicator
The dashboard runs locally on your machine: the runner's headless browsers connect to the target server remotely, while the dashboard UI is served from your workstation.
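The click-to-watch feed relies on the Chrome DevTools Protocol. A minimal sketch of the screencast loop follows. It uses the real CDP commands `Page.startScreencast`, `Page.screencastFrameAck` and `Page.stopScreencast` and the `Page.screencastFrame` event, but talks to a hypothetical structural `CdpLike` interface so it can be exercised without a browser. In Playwright, `browserContext.newCDPSession(page)` returns a session with `send()` and `on()` methods of roughly this shape. The JPEG quality and size limits are assumptions, and `screencaster.ts` may differ.

```typescript
// Structural subset of a CDP session (Playwright's CDPSession exposes send() and on()).
interface CdpLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
  on(event: string, listener: (payload: any) => void): unknown;
}

// Fields of the Page.screencastFrame event payload used here.
interface ScreencastFrame {
  data: string; // base64-encoded image
  sessionId: number; // must be echoed back in Page.screencastFrameAck
}

// Starts streaming frames to `onFrame`; resolves to a function that stops the stream.
async function startScreencast(
  cdp: CdpLike,
  onFrame: (imageBase64: string) => void,
): Promise<() => Promise<void>> {
  cdp.on("Page.screencastFrame", (frame: ScreencastFrame) => {
    onFrame(frame.data);
    // Acknowledge each frame; Chrome throttles the stream until frames are acked.
    void cdp.send("Page.screencastFrameAck", { sessionId: frame.sessionId });
  });
  await cdp.send("Page.startScreencast", {
    format: "jpeg",
    quality: 60, // assumed trade-off between clarity and bandwidth
    maxWidth: 960,
    maxHeight: 900,
  });
  return async () => {
    await cdp.send("Page.stopScreencast");
  };
}
```

Returning a stop function keeps teardown next to setup, which fits the Esc/Close behavior described above.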
tiled
Opens native Chromium windows for each room's host session, positioned in a grid. Joiners stay headless. Useful for interactive debugging with DevTools. The viewport is sized at 960x900 to show the full game table.
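A sketch of how a grid of host windows could be positioned. It assumes one headed Chromium launch per host (command-line switches apply per browser process, not per context) and uses the standard Chromium switches `--window-position` and `--window-size`. `tileFor` and the 80 px allowance for browser UI are hypothetical.

```typescript
// Viewport from the README; the window-chrome allowance is an assumption to tune per OS.
const VIEWPORT = { width: 960, height: 900 };
const CHROME_ALLOWANCE_PX = 80; // hypothetical height of tab strip + address bar

interface TilePlacement {
  args: string[]; // Chromium command-line switches for this window
  viewport: { width: number; height: number };
}

// Places host window `index` in a grid with `columns` columns, left to right, top to bottom.
function tileFor(index: number, columns: number): TilePlacement {
  const winWidth = VIEWPORT.width;
  const winHeight = VIEWPORT.height + CHROME_ALLOWANCE_PX;
  const col = index % columns;
  const row = Math.floor(index / columns);
  return {
    args: [
      `--window-position=${col * winWidth},${row * winHeight}`,
      `--window-size=${winWidth},${winHeight}`,
    ],
    viewport: { ...VIEWPORT },
  };
}
```

The result would feed Playwright's `chromium.launch({ headless: false, args })` and `browser.newContext({ viewport })`.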
none
Pure headless, structured JSONL to stdout. Use for CI, overnight runs,
or piping to jq.
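The JSONL stream is what makes the `jq` filter shown under Usage work. Below is a sketch of a logger with child contexts, assuming records carry a `msg` field (as the `jq` example implies). The `ts`, `level`, `room` and `run_id` fields and the `JsonlLogger` name are hypothetical, and `core/logger.ts` may be structured differently.

```typescript
type Fields = Record<string, unknown>;

// Minimal JSONL logger: one JSON object per line; child() binds extra context fields.
class JsonlLogger {
  private readonly bound: Fields;
  private readonly write: (line: string) => void;

  constructor(bound: Fields = {}, write: (line: string) => void = (line) => console.log(line)) {
    this.bound = bound;
    this.write = write;
  }

  // Returns a logger that stamps `fields` (e.g. room, account) on every record.
  child(fields: Fields): JsonlLogger {
    return new JsonlLogger({ ...this.bound, ...fields }, this.write);
  }

  info(msg: string, fields: Fields = {}): void {
    this.emit("info", msg, fields);
  }

  error(msg: string, fields: Fields = {}): void {
    this.emit("error", msg, fields);
  }

  private emit(level: string, msg: string, fields: Fields): void {
    const record = { ts: new Date().toISOString(), level, msg, ...this.bound, ...fields };
    this.write(JSON.stringify(record));
  }
}
```

Binding context once in `child()` keeps per-room fields consistent, so a filter such as `jq 'select(.room == 2)'` could isolate one room's log.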
Scenarios
populate
Long multi-round games to populate scoreboards with realistic data.
| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 1 |
| Games per room | 10 |
| Holes | 9 |
| Decks | 2 |
| Think time | 800-2200ms |
stress
Rapid short games with chaos injection for stability testing.
| Setting | Default |
|---|---|
| Accounts | 16 |
| Rooms | 4 |
| CPUs per room | 2 |
| Games per room | 50 |
| Holes | 1 |
| Decks | 1 |
| Think time | 50-150ms |
| Chaos chance | 5% per turn |
Chaos events: `rapid_clicks`, `tab_blur`, `brief_offline`
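A sketch of the per-turn roll behind the 5% chaos chance. It assumes each turn rolls independently and picks an event uniformly; `rollChaos` is a hypothetical name, and the random source is injectable so tests are deterministic.

```typescript
type ChaosEvent = "rapid_clicks" | "tab_blur" | "brief_offline";

const CHAOS_EVENTS: readonly ChaosEvent[] = ["rapid_clicks", "tab_blur", "brief_offline"];

// Rolls once per turn: with probability `chance`, returns a uniformly chosen chaos event.
function rollChaos(chance: number, rng: () => number = Math.random): ChaosEvent | null {
  if (rng() >= chance) return null; // most turns: no chaos
  const pick = Math.floor(rng() * CHAOS_EVENTS.length);
  // Guard against an rng that returns exactly 1.
  return CHAOS_EVENTS[Math.min(pick, CHAOS_EVENTS.length - 1)];
}
```

How each event is applied is up to `chaos.ts`. For `brief_offline`, Playwright's `browserContext.setOffline(true)` followed later by `setOffline(false)` is one way to simulate a short outage.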
Adding new scenarios
Create `scenarios/<name>.ts` exporting a `Scenario` object, then register
it in `scenarios/index.ts`. See existing scenarios for the pattern.
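A sketch of the add-and-register flow. The `Scenario` interface here is a hypothetical stand-in for the contract in `core/types.ts` (its real fields are not listed in this README), `marathon` is an invented example, and a `Map` keyed by name is an assumed shape for `scenarios/index.ts`.

```typescript
// Hypothetical stand-in for the Scenario contract in core/types.ts.
interface Scenario {
  name: string;
  description: string;
  defaults: {
    accounts: number;
    rooms: number;
    cpusPerRoom: number;
    gamesPerRoom: number;
    holes: number;
  };
  run(log: (msg: string) => void): Promise<void>;
}

// scenarios/marathon.ts (invented example): few rooms, long games.
const marathon: Scenario = {
  name: "marathon",
  description: "Few rooms, 18-hole games",
  defaults: { accounts: 4, rooms: 2, cpusPerRoom: 1, gamesPerRoom: 3, holes: 18 },
  async run(log) {
    log("marathon started");
  },
};

// scenarios/index.ts (assumed shape): name -> scenario, validated on registration.
const scenarios = new Map<string, Scenario>();

function registerScenario(s: Scenario): void {
  if (scenarios.has(s.name)) throw new Error(`duplicate scenario name: ${s.name}`);
  if (s.defaults.accounts % s.defaults.rooms !== 0) {
    throw new Error(`${s.name}: accounts must be divisible by rooms`);
  }
  scenarios.set(s.name, s);
}

registerScenario(marathon);
```

Checking divisibility at registration time surfaces a bad default before any browser launches.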
Error handling
- Per-room isolation: a failure in one room never unwinds other rooms (`Promise.allSettled`)
- Watchdog: 60s per-room timeout; fires if no heartbeat arrives within that window
- Health probes: `GET /health` every 30s; 3 consecutive failures trigger a fatal abort
- Graceful shutdown: Ctrl-C finishes the current turn, then cleans up (10s timeout). A second Ctrl-C forces an immediate exit
- Artifacts: on failure, screenshots, HTML, and game state JSON are saved to `artifacts/<run-id>/`. Artifacts older than 7 days are pruned automatically
- Exit codes: `0` = success, `1` = errors, `2` = interrupted
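A sketch of per-room isolation and the heartbeat watchdog working together: each room runs under its own watchdog, and `Promise.allSettled` collects outcomes so one room's rejection never cancels the others. `Watchdog`, `runRoomWithWatchdog` and `runAllRooms` are hypothetical names. One simplification: a timed-out room is reported as failed but is not forcibly stopped here.

```typescript
// Fires `onTimeout` if beat() is not called again within `timeoutMs`.
class Watchdog {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly timeoutMs: number;
  private readonly onTimeout: () => void;

  constructor(timeoutMs: number, onTimeout: () => void) {
    this.timeoutMs = timeoutMs;
    this.onTimeout = onTimeout;
  }

  beat(): void {
    this.stop(); // restart the countdown on every heartbeat
    this.timer = setTimeout(this.onTimeout, this.timeoutMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }
}

type RoomTask = (heartbeat: () => void) => Promise<void>;

// Runs one room; it fails if it throws or goes silent for longer than `timeoutMs`.
function runRoomWithWatchdog(room: RoomTask, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const dog = new Watchdog(timeoutMs, () => reject(new Error(`no heartbeat for ${timeoutMs}ms`)));
    dog.beat();
    room(() => dog.beat()).then(
      () => {
        dog.stop();
        resolve();
      },
      (err) => {
        dog.stop();
        reject(err);
      },
    );
  });
}

// Runs all rooms concurrently; allSettled means one rejection never cancels the others.
async function runAllRooms(
  rooms: RoomTask[],
  timeoutMs: number = 60_000,
): Promise<{ ok: number; failed: number }> {
  const results = await Promise.allSettled(rooms.map((r) => runRoomWithWatchdog(r, timeoutMs)));
  const failed = results.filter((r) => r.status === "rejected").length;
  return { ok: results.length - failed, failed };
}
```

A runner could then map `failed > 0` to exit code `1`, matching the exit codes listed above.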
Test account filtering
Soak accounts are flagged `is_test_account=TRUE` in the database. They are:
- Hidden by default from public leaderboards and stats (`?include_test=false`)
- Visible to admins by default in the admin panel
- Togglable via the "Include test accounts" checkbox in the admin panel
- Badged with `[Test]` in the admin user list and `[Test-seed]` on the invite code
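The default-hidden behavior can be illustrated with a small filter keyed on the `include_test` query parameter. This is a sketch only: the server-side implementation is not part of this harness, `LeaderboardRow` and `filterLeaderboard` are hypothetical, and treating only the literal value `true` as opt-in is an assumption.

```typescript
interface LeaderboardRow {
  username: string;
  score: number;
  isTestAccount: boolean;
}

// Public default: test accounts are hidden unless the caller passes ?include_test=true.
function filterLeaderboard(rows: LeaderboardRow[], url: URL): LeaderboardRow[] {
  const includeTest = url.searchParams.get("include_test") === "true";
  return includeTest ? rows : rows.filter((r) => !r.isTestAccount);
}
```

The admin panel's default would correspond to passing `include_test=true`, with the checkbox flipping it.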
Unit tests
```bash
bun run test
```
27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config. Integration-level modules (SessionPool, scenarios, dashboard) are verified by the smoke test and live runs.
Architecture
```
runner.ts               CLI entry: parses flags, wires everything, runs scenario
core/
session-pool.ts         Owns browser contexts, seeds/logs in accounts
room-coordinator        Deferred-based host→joiners room code handoff
watchdog.ts             Per-room timeout detector
screencaster.ts         CDP Page.startScreencast for live video
logger.ts               Structured JSONL logger with child contexts
artifacts.ts            Screenshot/HTML/state capture on failure
types.ts                Scenario/Session/Logger contracts
scenarios/
populate.ts             Long multi-round games
stress.ts               Rapid games with chaos injection
shared/
multiplayer-game.ts     Shared "play one game" loop
chaos.ts                Chaos event injector
dashboard/
server.ts               HTTP + WS server
index.html              Status grid UI
dashboard.js            WS client + click-to-watch
scripts/
seed-accounts.ts        Account seeding CLI
smoke.sh                End-to-end canary (~60s)
```
The harness reuses `tests/e2e/bot/golf-bot.ts` unchanged for all game interactions.
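The host-to-joiners handoff in `room-coordinator` can be sketched with a `Deferred`: the host resolves a per-room promise with the room code, and joiners await it with a timeout. The method names (`announce`, `waitForCode`) and the timeout handling are assumptions, not the module's actual API.

```typescript
// A promise whose resolve/reject are exposed, so one party can await a value another supplies later.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason: unknown) => void;

  constructor() {
    this.promise = new Promise<T>((res, rej) => {
      this.resolve = res;
      this.reject = rej;
    });
  }
}

// Hypothetical coordinator: one Deferred room code per room index.
class RoomCoordinator {
  private readonly codes = new Map<number, Deferred<string>>();

  // Creates the slot lazily so joiners may start waiting before the host announces.
  private slot(room: number): Deferred<string> {
    let d = this.codes.get(room);
    if (!d) {
      d = new Deferred<string>();
      this.codes.set(room, d);
    }
    return d;
  }

  // Called by the host once the server has assigned a room code.
  announce(room: number, code: string): void {
    this.slot(room).resolve(code);
  }

  // Called by each joiner; rejects if the host has not announced within `timeoutMs`.
  waitForCode(room: number, timeoutMs: number): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`room ${room}: no code after ${timeoutMs}ms`)),
        timeoutMs,
      );
      this.slot(room).promise.then(
        (code) => {
          clearTimeout(timer);
          resolve(code);
        },
        (err) => {
          clearTimeout(timer);
          reject(err);
        },
      );
    });
  }
}
```

Because the slot is created on first use by either side, the order in which host and joiner sessions start does not matter, which avoids a startup race.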