Batched remaining harness tasks (27-30, 33):
Task 27 — Artifact capture on failure: screenshots, HTML snapshots,
game state JSON, and console error tails are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Successful
runs get a summary.json. Old runs (>7d) are pruned on startup.
Task 28 — Graceful shutdown: first SIGINT/SIGTERM flips the abort
signal (scenarios finish current turn then unwind). 10s after, a
hard-kill fires if cleanup hangs. Double Ctrl-C = immediate exit.
Exit codes: 0 success, 1 errors, 2 interrupted.
Task 29 — Periodic health probes: every 30s GET /health against the
target server. Three consecutive failures abort the run with
health_fatal, preventing staging outages from being misattributed
to harness bugs. Corrected endpoint from /api/health to /health
per server/routers/health.py.
Task 30 — Smoke test script: tests/soak/scripts/smoke.sh, a 60s
end-to-end canary that health-probes the target, seeds if needed,
and runs one minimal populate game.
Task 33 — Version bump to v3.3.4: both index.html footers (was
v3.1.6), new footer added to admin.html (had none), pyproject.toml.
Also fixes discovered during stress testing:
- SessionPool sets baseURL on all contexts so relative goto('/')
resolves correctly between games (was "invalid URL" error)
- RoomCoordinator key is now unique per game-start (Date.now
suffix) so Deferred promises don't carry stale room codes from
previous games
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Watchdog class with 4 Vitest tests (27 total now), wired into
ctx.heartbeat in the runner. One watchdog per room with a 60s
timeout; firing logs an error, marks the room's dashboard tile
as errored, and triggers the abort signal so the scenario unwinds.
Watchdogs are explicitly stopped in the runner's finally block
so pending timers don't keep the node process alive on exit.
Also fixes a multi-game bug discovered during stress scenario
verification: after a game ends sessions stay parked on the
game_over screen, which hides the lobby and makes a subsequent
#create-room-btn click time out. runOneMultiplayerGame now
navigates every session to / before each game — localStorage
auth persists so nothing re-logs in.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rapid short games with a parallel chaos loop that has a 5% per-turn
chance of firing one of:
- rapid_clicks: 5 quick clicks at the player's own cards
- tab_blur: window blur/focus event pair
- brief_offline: 300ms network outage via context.setOffline
Chaos counts roll up into ScenarioResult.customMetrics.chaos_fired.
Important detail: chaos loop has a 3-second initial delay so room
creation, joiners, and game start can complete without interference.
Chaos during lobby setup (especially brief_offline) was causing
#create-room-btn to go unstable.
Verified: stress smoke with --games-per-room=3, 4 accounts + 1 CPU,
first game completed with 37 turns and chaos events fired across all
three event types.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SessionPool accepts headedHostCount; when > 0 it launches a second
Chromium in headed mode and creates the first N sessions in it.
Joiners (sessions N..count) stay headless in the main browser.
Each headed context gets a 960×900 viewport — tall enough to show
the full game table (deck + opponent row + own 2×3 card grid +
status area) without clipping. Horizontal tiling still fits two
windows side-by-side on a 1920-wide display.
window.moveTo is kept as a best-effort tile-placement hint, but
viewport from newContext() is what actually sizes the window
(window.resizeTo is a no-op on modern Chromium / Wayland).
Verified: 1-room tiled run plays a full game cleanly; 2-room
parallel tiled had one window get closed mid-run, which is
consistent with a user manually dismissing a window — tiled mode
is a best-effort hands-on debugging aid, not an automation mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runner creates a Screencaster before sessions are acquired, then
wires its start/stop into DashboardServer.onStartStream / onStopStream
after sessions exist (the handlers close over a sessionsByKey map).
Clicking a player tile in the dashboard starts a CDP screencast on
that session's page, forwards JPEG frames as WS "frame" messages.
Closing the modal (or disconnecting the WS) stops all screencasts.
Verified end-to-end: programmatically connected WS, sent start_stream,
received 5 frames (13.7KB each), sent stop_stream, screencast_stopped
log line fired, run completed cleanly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Attach/detach CDP sessions per Playwright Page, start/stop JPEG
screencasts with configurable quality and frame rate, forward each
frame to a callback. Used by the dashboard for click-to-watch
live video (wired in Task 23).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Starts DashboardServer on port 7777 (or --dashboard-port), uses its
reporter as ctx.dashboard, auto-opens the URL via xdg-open/open/start.
Cleans up on exit. WS client connections logged as info events so
you can see when the browser attaches.
Verified: 2-account populate run with --watch=dashboard serves the
static page on :7777, accepts WS connections, cleanly shuts down
when the run completes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Static HTML page served by DashboardServer. Renders the 2×2 room
grid with progress bars and player tiles, subscribes to WS events,
updates tiles live. Click-to-watch modal is wired but receives
frames once the CDP screencaster ships in Task 22.
Adds escapeHtml() on all user-controlled strings (roomId, player
key) — not strictly needed for our trusted bot traffic but cheap
XSS hardening against future scenarios that accept user input.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Serves one static HTML page, accepts WS connections, broadcasts
room_state/log/metric messages to all clients. Replays current
state to late-joining clients so refreshing the dashboard during
a run shows the right grid. Exposes a reporter() method that
returns a DashboardReporter scenarios can call without knowing
about sockets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
First full end-to-end milestone: parses CLI, builds SessionPool +
RoomCoordinator, loads a scenario by name, runs it, reports results,
cleans up. Watch modes other than "none" log a warning and fall back
until Tasks 19-24 implement them.
Smoke test passed against local dev:
bun run soak -- --scenario=populate --accounts=2 --rooms=1
--cpus-per-room=0 --games-per-room=1 --holes=1 --watch=none
→ Games completed: 1, Errors: 0, Duration: 78.2s, exit 0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parseArgs pulls --scenario/--rooms/--watch/etc from argv,
mergeConfig layers scenarioDefaults → env → CLI so CLI flags
always win. 12 Vitest unit tests cover both parse happy/edge
paths and the 4-way merge precedence matrix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Partitions sessions into N rooms, runs gamesPerRoom games per room
in parallel via Promise.allSettled so a failure in one room never
unwinds the others. Errors roll up into ScenarioResult.errors.
Verified via tsx: listScenarios() returns [populate], getScenario()
resolves by name and returns undefined for unknown names.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thin standalone entry for pre-seeding N accounts before the first
harness run. Wraps SessionPool.seed and writes .env.stresstest.
End-to-end verified: ran against local dev with --count=4, all 4
accounts landed in the DB with is_test_account=TRUE, cred file
written with correct format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owns BrowserContexts, seeds via POST /api/auth/register with the
invite code on cold start, warm-starts via localStorage injection of
the cached JWT, falls back to POST /api/auth/login if the token is
rejected. Exposes acquire(n) for scenarios.
Infrastructure changes needed to import the real GolfBot class from
tests/e2e/bot/ without the Task-10 structural-interface workaround:
- Add @playwright/test as devDep so value-imports in e2e/bot/*.ts
resolve at runtime (Page/Locator/expect are pulled even as types)
- Remove rootDir from tsconfig so TS follows cross-package imports;
add a paths entry so TS can resolve @playwright/test from the soak
package's node_modules when compiling files under tests/e2e/bot
- Drop the local GolfBot structural interface + its placeholder
GamePhase/StartGameOptions/TurnResult types; re-export the real
class from types.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single file, no transport, writes one JSON line per call to stdout.
Child loggers inherit parent meta so scenarios can bind room/game
context once and forget about it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lazy Deferred per roomId with a timeout on await. Lets concurrent
joiner sessions block until their host announces the room code
without polling or page scraping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Establishes the Scenario/Session/Logger/DashboardReporter contracts
the rest of the harness builds on. Deferred is the building block
for RoomCoordinator's host→joiners handoff.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Placeholder runner, tsconfig with @bot alias to tests/e2e/bot,
gitignored .env.stresstest + artifacts. Real behavior follows
in Task 10 onward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>