golfgame/tests/soak
adlee-was-taken d5194f43ba docs(soak): full README + validation checklist
Replaces the Task 31 stub README with complete documentation:
quickstart, first-time setup (invite flagging, seeding, smoke),
usage examples for all three watch modes, CLI flag reference, env
var table, scenario descriptions, error handling summary, test
account filtering explanation, and architecture overview.

Adds CHECKLIST.md with post-deploy verification, bring-up,
scenario, watch mode, failure handling, and staging gate items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 23:05:22 -04:00

Golf Soak & UX Test Harness

Standalone Playwright-based runner that drives multiple authenticated browser sessions playing real multiplayer games. Used for:

  • Scoreboard population — fill staging leaderboards with realistic data
  • Stability stress testing — hunt race conditions, WebSocket leaks, cleanup bugs
  • Live monitoring — watch bot sessions play in real time via CDP screencast

Prerequisites

  • Bun (or Node.js + npm)
  • Chromium browser binary (installed via bunx playwright install chromium)
  • A running Golf Card Game server (local dev or staging)
  • An invite code flagged as marks_as_test=TRUE (see step 2 of First-time setup)

First-time setup

1. Install dependencies

cd tests/soak
bun install
bunx playwright install chromium

2. Flag the invite code as test-seed

Any account registered with a test-seed invite gets is_test_account=TRUE, which keeps it out of real-user stats and leaderboards.

Local dev:

PGPASSWORD=devpassword psql -h localhost -U golf -d golf <<'SQL'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
SQL

Staging:

ssh root@129.212.150.189 \
  'docker compose -f /opt/golfgame/docker-compose.staging.yml exec -T postgres psql -U postgres -d golfgame' <<'SQL'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SQL

3. Seed test accounts

# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bun run seed

# Staging
TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run seed

This registers 16 accounts via the invite code and caches their credentials in .env.stresstest. Seeding only needs to run once: subsequent runs reuse the cached credentials and log in again if the tokens have expired.

4. Verify with a smoke test

# Local dev
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST bash scripts/smoke.sh

Expected: one game plays to completion in ~60 seconds, exits 0.

Usage

TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --watch=dashboard

This runs 4 rooms x 10 games x 9 holes with varied CPU personalities. The dashboard opens automatically at http://localhost:7777.

Quick smoke against staging

TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=1 --holes=1 \
  --watch=dashboard

Stress test with chaos injection

TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --games-per-room=5 \
  --watch=dashboard

Rapid 1-hole games with random chaos events (rapid clicks, tab blur, brief network outage) injected during gameplay.

Headless mode (CI / overnight)

TEST_URL=https://staging.adlee.work SOAK_INVITE_CODE=5VC2MCCN bun run soak -- \
  --scenario=populate --watch=none

Outputs structured JSONL to stdout. Pipe to jq for filtering:

bun run soak -- --scenario=populate --watch=none 2>&1 | jq 'select(.msg == "game_complete")'
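
Where jq is not available, the same filter can be approximated in TypeScript. This is a minimal sketch, assuming each output line is a single JSON object with a string msg field (as the jq filter implies). The helper name filterByMsg and the LogRecord type are hypothetical and not part of the harness.

```typescript
// Hypothetical helper (not part of the harness): keep only records whose
// `msg` equals the requested value. Assumes one JSON object per line.
type LogRecord = { msg?: unknown } & Record<string, unknown>;

function filterByMsg(jsonl: string, msg: string): LogRecord[] {
  const records: LogRecord[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // ignore non-JSON lines, e.g. stderr text merged in by 2>&1
    }
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      const record = parsed as LogRecord;
      if (record.msg === msg) records.push(record);
    }
  }
  return records;
}
```

Unlike a bare jq pipeline, this version skips lines that are not valid JSON instead of aborting on them.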

Tiled mode (native browser windows)

bun run soak -- --scenario=populate --rooms=2 --watch=tiled

Opens visible Chromium windows for each room's host session. Useful for hands-on debugging with DevTools.

CLI flags

--scenario=populate|stress    required — which scenario to run
--accounts=<n>                total sessions (default: from scenario)
--rooms=<n>                   parallel rooms (default: from scenario)
--cpus-per-room=<n>           CPU opponents per room (default: from scenario)
--games-per-room=<n>          games per room (default: from scenario)
--holes=<n>                   holes per game (default: from scenario)
--watch=none|dashboard|tiled  visualization mode (default: dashboard)
--dashboard-port=<n>          dashboard server port (default: 7777)
--target=<url>                override TEST_URL env var
--run-id=<string>             custom run identifier (default: timestamp)
--list                        print available scenarios and exit
--dry-run                     validate config without running

--accounts must be evenly divisible by --rooms (for example, 16 accounts across 4 rooms).
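
The divisibility rule can be illustrated with a small check. This is a sketch only: the function name sessionsPerRoom and its error messages are assumptions, and the harness may validate its config differently.

```typescript
// Hypothetical validation helper: returns how many sessions land in each room,
// or throws if the accounts cannot be split evenly across the rooms.
function sessionsPerRoom(accounts: number, rooms: number): number {
  if (!Number.isInteger(accounts) || !Number.isInteger(rooms) || accounts < 1 || rooms < 1) {
    throw new Error(`accounts and rooms must be positive integers (got ${accounts}, ${rooms})`);
  }
  if (accounts % rooms !== 0) {
    throw new Error(`--accounts=${accounts} does not divide evenly into --rooms=${rooms}`);
  }
  return accounts / rooms;
}
```

With the populate defaults (16 accounts, 4 rooms) this gives 4 sessions per room, while a combination such as 10 accounts and 4 rooms is rejected.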

Environment variables

Variable              Description                      Default
TEST_URL              Target server base URL           http://localhost:8000
SOAK_INVITE_CODE      Invite code for account seeding  SOAKTEST
SOAK_HOLES            Override --holes
SOAK_ROOMS            Override --rooms
SOAK_ACCOUNTS         Override --accounts
SOAK_CPUS_PER_ROOM    Override --cpus-per-room
SOAK_GAMES_PER_ROOM   Override --games-per-room
SOAK_WATCH            Override --watch
SOAK_DASHBOARD_PORT   Override --dashboard-port

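Config precedence: CLI flags > env vars > scenario defaults. An env var therefore overrides the scenario default for its setting, but an explicit CLI flag still wins.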
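
The precedence order can be sketched as a three-layer lookup. The code below is illustrative and covers only three of the settings. The names envOverrides and resolveConfig are hypothetical, and how the harness actually parses env values is an assumption.

```typescript
type Key = "holes" | "rooms" | "accounts";
type Overrides = Partial<Record<Key, number | undefined>>;
type Resolved = Record<Key, number>;

// Hypothetical: read numeric SOAK_* variables from an env-like record.
// Unset, empty, or non-numeric values are treated as "not provided".
function envOverrides(env: Record<string, string | undefined>): Overrides {
  const int = (name: string): number | undefined => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return undefined;
    const parsed = Number.parseInt(raw, 10);
    return Number.isNaN(parsed) ? undefined : parsed;
  };
  return { holes: int("SOAK_HOLES"), rooms: int("SOAK_ROOMS"), accounts: int("SOAK_ACCOUNTS") };
}

// CLI flag first, then env var, then the scenario default.
function resolveConfig(cli: Overrides, env: Overrides, scenarioDefaults: Resolved): Resolved {
  const pick = (key: Key): number => cli[key] ?? env[key] ?? scenarioDefaults[key];
  return { holes: pick("holes"), rooms: pick("rooms"), accounts: pick("accounts") };
}
```

For example, with --holes=1 on the command line, SOAK_HOLES=3 and SOAK_ROOMS=2 in the environment, and the populate defaults, the resolved values are 1 hole, 2 rooms, and 16 accounts.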

Watch modes

dashboard (default)

Opens http://localhost:7777 with a live status grid:

  • 2x2 room tiles showing phase, current player, move count, progress bar
  • Activity log at the bottom
  • Click any player tile to watch their live session via CDP screencast
  • Press Esc or click Close to stop the video feed
  • WS connection status indicator

The dashboard UI is served locally from your workstation, while the runner's headless browsers connect to the remote target server.
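
The click-to-watch feed relies on the Chrome DevTools Protocol screencast commands. The sketch below shows the general shape of that exchange, not the contents of core/screencaster.ts. CdpSessionLike is a hypothetical minimal interface (Playwright's CDPSession, obtained via context.newCDPSession(page), exposes send and on with a similar shape, though a cast may be needed), and the quality and everyNthFrame values are arbitrary assumptions.

```typescript
// Hypothetical minimal view of a CDP session: just the two methods used here.
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
  on(event: string, handler: (payload: any) => void): unknown;
}

type FrameSink = (jpegBase64: string) => void;

// Starts a screencast and forwards each frame to `onFrame`.
// Resolves to a function that stops the screencast.
async function startScreencast(
  session: CdpSessionLike,
  onFrame: FrameSink,
): Promise<() => Promise<void>> {
  session.on("Page.screencastFrame", (frame: { data: string; sessionId: number }) => {
    onFrame(frame.data); // base64-encoded image data
    // Frames are acknowledged so the browser keeps sending new ones.
    session
      .send("Page.screencastFrameAck", { sessionId: frame.sessionId })
      .catch(() => { /* the session may already be closed */ });
  });
  await session.send("Page.startScreencast", { format: "jpeg", quality: 60, everyNthFrame: 2 });
  return async () => {
    await session.send("Page.stopScreencast");
  };
}
```

A dashboard server could pass an onFrame callback that relays each frame to the browser over its WebSocket connection, and call the returned stop function when the viewer presses Esc or clicks Close.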

tiled

Opens native Chromium windows for each room's host session, positioned in a grid. Joiners stay headless. Useful for interactive debugging with DevTools. The viewport is sized at 960x900 to show the full game table.
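
One way to compute such a grid is sketched below. The layout rule (a roughly square grid, filled row by row) and the function names are assumptions for illustration; the harness may position its windows differently. The two Chromium flags used, --window-position and --window-size, are standard Chromium command-line switches.

```typescript
interface WindowPlacement {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical layout helper: places `count` windows in a near-square grid,
// left to right and top to bottom, using the 960x900 size by default.
function tileWindows(count: number, width = 960, height = 900): WindowPlacement[] {
  if (!Number.isInteger(count) || count < 0) {
    throw new Error("count must be a non-negative integer");
  }
  const columns = Math.max(1, Math.ceil(Math.sqrt(count)));
  return Array.from({ length: count }, (_, i) => ({
    x: (i % columns) * width,
    y: Math.floor(i / columns) * height,
    width,
    height,
  }));
}

// Chromium launch arguments for one placement.
function chromiumArgs(p: WindowPlacement): string[] {
  return [`--window-position=${p.x},${p.y}`, `--window-size=${p.width},${p.height}`];
}
```

With four rooms this yields a 2x2 arrangement of host windows.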

none

Pure headless, structured JSONL to stdout. Use for CI, overnight runs, or piping to jq.

Scenarios

populate

Long multi-round games to populate scoreboards with realistic data.

Setting          Default
Accounts         16
Rooms            4
CPUs per room    1
Games per room   10
Holes            9
Decks            2
Think time       800-2200ms

stress

Rapid short games with chaos injection for stability testing.

Setting          Default
Accounts         16
Rooms            4
CPUs per room    2
Games per room   50
Holes            1
Decks            1
Think time       50-150ms
Chaos chance     5% per turn

Chaos events: rapid_clicks, tab_blur, brief_offline
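
The per-turn chaos roll can be sketched as follows. This is an illustration rather than the contents of shared/chaos.ts: the function name maybePickChaos is hypothetical, and choosing uniformly among the three events is an assumption. The random source is injectable so the behaviour can be tested deterministically.

```typescript
type ChaosEvent = "rapid_clicks" | "tab_blur" | "brief_offline";

const CHAOS_EVENTS: readonly ChaosEvent[] = ["rapid_clicks", "tab_blur", "brief_offline"];

// Hypothetical helper: with probability `chance`, pick one chaos event for this
// turn; otherwise return null. `rng` must return a float in [0, 1).
function maybePickChaos(chance: number, rng: () => number = Math.random): ChaosEvent | null {
  if (rng() >= chance) return null; // most turns: no chaos
  const index = Math.floor(rng() * CHAOS_EVENTS.length);
  return CHAOS_EVENTS[index] ?? null; // uniform choice (assumed)
}
```

Calling maybePickChaos(0.05) once per turn corresponds to the 5% default of the stress scenario.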

Adding new scenarios

Create scenarios/<name>.ts exporting a Scenario object, then register it in scenarios/index.ts. See existing scenarios for the pattern.
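
As a rough illustration of those two steps, the sketch below defines a scenario and a name-based registry. Every shape here is hypothetical: the real Scenario contract lives in core/types.ts and may use different fields and signatures, so treat the existing scenarios as the source of truth.

```typescript
// HYPOTHETICAL shapes for illustration; the real contract is in core/types.ts.
interface ScenarioDefaults {
  accounts: number;
  rooms: number;
  cpusPerRoom: number;
  gamesPerRoom: number;
  holes: number;
}

interface Scenario {
  name: string;
  description: string;
  defaults: ScenarioDefaults;
  run(config: ScenarioDefaults): Promise<void>;
}

// Step 1: scenarios/<name>.ts would export an object along these lines.
const quick: Scenario = {
  name: "quick",
  description: "Single short game for fast feedback (illustrative only)",
  defaults: { accounts: 2, rooms: 1, cpusPerRoom: 0, gamesPerRoom: 1, holes: 1 },
  async run(_config) {
    // A real scenario would drive sessions through its games here.
  },
};

// Step 2: scenarios/index.ts would register it under its name.
const scenarios: Record<string, Scenario> = { [quick.name]: quick };

function getScenario(name: string): Scenario {
  const scenario = scenarios[name];
  if (!scenario) {
    const available = Object.keys(scenarios).join(", ");
    throw new Error(`Unknown scenario "${name}". Available: ${available}`);
  }
  return scenario;
}
```

A lookup by name also gives --list something to enumerate and lets an unknown --scenario value fail with a clear message.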

Error handling

  • Per-room isolation: a failure in one room never unwinds other rooms (Promise.allSettled)
  • Watchdog: 60s per-room timeout — fires if no heartbeat arrives
  • Health probes: GET /health every 30s, 3 consecutive failures = fatal abort
  • Graceful shutdown: Ctrl-C finishes current turn, then cleans up (10s timeout). Double Ctrl-C = immediate force exit
  • Artifacts: on failure, screenshots + HTML + game state JSON saved to artifacts/<run-id>/. Old artifacts auto-pruned after 7 days
  • Exit codes: 0 = success, 1 = errors, 2 = interrupted
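
Per-room isolation and the exit-code mapping can be sketched together. This is a minimal illustration built on Promise.allSettled, not the runner's actual code: the names runRooms and exitCodeFor are hypothetical, and letting an interrupt take priority over errors is an assumption.

```typescript
type RoomTask = () => Promise<void>;

interface RunSummary {
  succeeded: number;
  failed: number;
  errors: string[];
}

// Runs every room concurrently. allSettled waits for all rooms to finish,
// so one rejected room does not cancel or hide the others.
async function runRooms(tasks: RoomTask[]): Promise<RunSummary> {
  const results = await Promise.allSettled(tasks.map(async (task) => task()));
  const summary: RunSummary = { succeeded: 0, failed: 0, errors: [] };
  results.forEach((result, index) => {
    if (result.status === "fulfilled") {
      summary.succeeded += 1;
    } else {
      summary.failed += 1;
      summary.errors.push(`room ${index}: ${String(result.reason)}`);
    }
  });
  return summary;
}

// Maps a finished run to the documented exit codes.
// Assumption: an interrupted run reports 2 even if some rooms also failed.
function exitCodeFor(summary: RunSummary, interrupted: boolean): 0 | 1 | 2 {
  if (interrupted) return 2;
  return summary.failed > 0 ? 1 : 0;
}
```

With Promise.all, the first rejection would surface immediately while the remaining rooms kept running unobserved; allSettled keeps each room's outcome separate.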

Test account filtering

Soak accounts are flagged is_test_account=TRUE in the database. They are:

  • Hidden by default from public leaderboards and stats (?include_test=false)
  • Visible to admins by default in the admin panel
  • Togglable via the "Include test accounts" checkbox in the admin panel
  • Badged with [Test] in the admin user list and [Test-seed] on the invite code
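
The default-hidden behaviour amounts to a filter with an opt-in flag. The sketch below is purely illustrative: the row shape and the function name visibleRows are hypothetical, and the server may well apply this filter in SQL rather than in application code.

```typescript
// Hypothetical row shape for illustration only.
interface LeaderboardRow {
  username: string;
  score: number;
  isTestAccount: boolean;
}

// Test accounts are excluded unless the caller explicitly opts in,
// mirroring the include_test=false default.
function visibleRows(rows: LeaderboardRow[], includeTest = false): LeaderboardRow[] {
  return includeTest ? rows : rows.filter((row) => !row.isTestAccount);
}
```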

Unit tests

bun run test

27 tests covering Deferred, RoomCoordinator, Watchdog, Logger, and Config. Integration-level modules (SessionPool, scenarios, dashboard) are verified by the smoke test and live runs.

Architecture

runner.ts          CLI entry — parses flags, wires everything, runs scenario
core/
  session-pool.ts  Owns browser contexts, seeds/logs in accounts
  room-coordinator Deferred-based host→joiners room code handoff
  watchdog.ts      Per-room timeout detector
  screencaster.ts  CDP Page.startScreencast for live video
  logger.ts        Structured JSONL logger with child contexts
  artifacts.ts     Screenshot/HTML/state capture on failure
  types.ts         Scenario/Session/Logger contracts
scenarios/
  populate.ts      Long multi-round games
  stress.ts        Rapid games with chaos injection
  shared/
    multiplayer-game.ts  Shared "play one game" loop
    chaos.ts             Chaos event injector
dashboard/
  server.ts        HTTP + WS server
  index.html       Status grid UI
  dashboard.js     WS client + click-to-watch
scripts/
  seed-accounts.ts Account seeding CLI
  smoke.sh         End-to-end canary (~60s)

Reuses tests/e2e/bot/golf-bot.ts unchanged for all game interactions.
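
The "Deferred-based host→joiners room code handoff" listed for room-coordinator can be pictured with the sketch below. It illustrates the general pattern and is not the module's actual code: the method names announce and awaitCode and the 30-second default timeout are assumptions.

```typescript
// A promise whose resolve/reject functions are exposed to the outside.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason: unknown) => void;

  constructor() {
    this.promise = new Promise<T>((resolve, reject) => {
      this.resolve = resolve;
      this.reject = reject;
    });
  }
}

// Hypothetical coordinator: the host publishes a room code once, and any
// number of joiners can await it, whether they ask before or after.
class RoomCoordinator {
  private readonly rooms = new Map<string, Deferred<string>>();

  private slot(roomId: string): Deferred<string> {
    let deferred = this.rooms.get(roomId);
    if (!deferred) {
      deferred = new Deferred<string>();
      this.rooms.set(roomId, deferred);
    }
    return deferred;
  }

  announce(roomId: string, code: string): void {
    this.slot(roomId).resolve(code);
  }

  async awaitCode(roomId: string, timeoutMs = 30_000): Promise<string> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`room ${roomId}: no code after ${timeoutMs}ms`)),
        timeoutMs,
      );
    });
    try {
      return await Promise.race([this.slot(roomId).promise, timeout]);
    } finally {
      clearTimeout(timer); // avoid a dangling timer once the race is decided
    }
  }
}
```

Because the Deferred is created on first access by either side, joiners do not need to poll and the order in which host and joiners start does not matter.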