Batched remaining harness tasks (27-30, 33):
Task 27 — Artifact capture on failure: screenshots, HTML snapshots,
game state JSON, and console error tails are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Successful
runs get a summary.json. Old runs (>7d) are pruned on startup.
Task 28 — Graceful shutdown: first SIGINT/SIGTERM flips the abort
signal (scenarios finish current turn then unwind). 10s after, a
hard-kill fires if cleanup hangs. Double Ctrl-C = immediate exit.
Exit codes: 0 success, 1 errors, 2 interrupted.
Task 29 — Periodic health probes: every 30s GET /health against the
target server. Three consecutive failures abort the run with
health_fatal, preventing staging outages from being misattributed
to harness bugs. Corrected endpoint from /api/health to /health
per server/routers/health.py.
Task 30 — Smoke test script: tests/soak/scripts/smoke.sh, a 60s
end-to-end canary that health-probes the target, seeds if needed,
and runs one minimal populate game.
Task 33 — Version bump to v3.3.4: both index.html footers (was
v3.1.6), new footer added to admin.html (had none), pyproject.toml.
Also fixes discovered during stress testing:
- SessionPool sets baseURL on all contexts so relative goto('/')
resolves correctly between games (was "invalid URL" error)
- RoomCoordinator key is now unique per game-start (Date.now
suffix) so Deferred promises don't carry stale room codes from
previous games
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Watchdog class with 4 Vitest tests (27 total now), wired into
ctx.heartbeat in the runner. One watchdog per room with a 60s
timeout; firing logs an error, marks the room's dashboard tile
as errored, and triggers the abort signal so the scenario unwinds.
Watchdogs are explicitly stopped in the runner's finally block
so pending timers don't keep the node process alive on exit.
Also fixes a multi-game bug discovered during stress scenario
verification: after a game ends sessions stay parked on the
game_over screen, which hides the lobby and makes a subsequent
#create-room-btn click time out. runOneMultiplayerGame now
navigates every session to / before each game — localStorage
auth persists so nothing re-logs in.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>