22 Commits

Author SHA1 Message Date
adlee-was-taken
b8bc432175 feat(soak): artifacts, graceful shutdown, health probes, smoke script, v3.3.4
Batched remaining harness tasks (27-30, 33):

Task 27 — Artifact capture on failure: screenshots, HTML snapshots,
game state JSON, and console error tails are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Successful
runs get a summary.json. Old runs (>7d) are pruned on startup.

Task 28 — Graceful shutdown: first SIGINT/SIGTERM flips the abort
signal (scenarios finish current turn then unwind). 10s after, a
hard-kill fires if cleanup hangs. Double Ctrl-C = immediate exit.
Exit codes: 0 success, 1 errors, 2 interrupted.

Task 29 — Periodic health probes: every 30s GET /health against the
target server. Three consecutive failures abort the run with
health_fatal, preventing staging outages from being misattributed
to harness bugs. Corrected endpoint from /api/health to /health
per server/routers/health.py.

Task 30 — Smoke test script: tests/soak/scripts/smoke.sh, a 60s
end-to-end canary that health-probes the target, seeds if needed,
and runs one minimal populate game.

Task 33 — Version bump to v3.3.4: both index.html footers (was
v3.1.6), new footer added to admin.html (had none), pyproject.toml.

Also fixes discovered during stress testing:
- SessionPool sets baseURL on all contexts so relative goto('/')
  resolves correctly between games (was "invalid URL" error)
- RoomCoordinator key is now unique per game-start (Date.now
  suffix) so Deferred promises don't carry stale room codes from
  previous games

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:57:15 -04:00
adlee-was-taken
d3b468575b feat(soak): per-room watchdog + heartbeat wiring + multi-game lobby fix
Watchdog class with 4 Vitest tests (27 total now), wired into
ctx.heartbeat in the runner. One watchdog per room with a 60s
timeout; firing logs an error, marks the room's dashboard tile
as errored, and triggers the abort signal so the scenario unwinds.
Watchdogs are explicitly stopped in the runner's finally block
so pending timers don't keep the node process alive on exit.

Also fixes a multi-game bug discovered during stress scenario
verification: after a game ends sessions stay parked on the
game_over screen, which hides the lobby and makes a subsequent
#create-room-btn click time out. runOneMultiplayerGame now
navigates every session to / before each game — localStorage
auth persists so nothing re-logs in.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:52:49 -04:00
adlee-was-taken
921a6ad984 feat(soak): stress scenario with chaos injection
Rapid short games with a parallel chaos loop that has a 5% per-turn
chance of firing one of:
  - rapid_clicks: 5 quick clicks at the player's own cards
  - tab_blur:    window blur/focus event pair
  - brief_offline: 300ms network outage via context.setOffline

Chaos counts roll up into ScenarioResult.customMetrics.chaos_fired.

Important detail: chaos loop has a 3-second initial delay so room
creation, joiners, and game start can complete without interference.
Chaos during lobby setup (especially brief_offline) was causing
#create-room-btn to go unstable.

Verified: stress smoke with --games-per-room=3, 4 accounts + 1 CPU,
first game completed with 37 turns and chaos events fired across all
three event types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:49:30 -04:00
adlee-was-taken
c307027dd0 feat(soak): --watch=tiled launches N headed host windows
SessionPool accepts headedHostCount; when > 0 it launches a second
Chromium in headed mode and creates the first N sessions in it.
Joiners (sessions N..count) stay headless in the main browser.

Each headed context gets a 960×900 viewport — tall enough to show
the full game table (deck + opponent row + own 2×3 card grid +
status area) without clipping. Horizontal tiling still fits two
windows side-by-side on a 1920-wide display.

window.moveTo is kept as a best-effort tile-placement hint, but
viewport from newContext() is what actually sizes the window
(window.resizeTo is a no-op on modern Chromium / Wayland).

Verified: 1-room tiled run plays a full game cleanly; 2-room
parallel tiled had one window get closed mid-run, which is
consistent with a user manually dismissing a window — tiled mode
is a best-effort hands-on debugging aid, not an automation mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:01:21 -04:00
adlee-was-taken
21fe53eaf7 feat(soak): click-to-watch live video via CDP screencast
Runner creates a Screencaster before sessions are acquired, then
wires its start/stop into DashboardServer.onStartStream / onStopStream
after sessions exist (the handlers close over a sessionsByKey map).
Clicking a player tile in the dashboard starts a CDP screencast on
that session's page, forwards JPEG frames as WS "frame" messages.
Closing the modal (or disconnecting the WS) stops all screencasts.

Verified end-to-end: programmatically connected WS, sent start_stream,
received 5 frames (13.7KB each), sent stop_stream, screencast_stopped
log line fired, run completed cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 20:54:52 -04:00
adlee-was-taken
34ce7d1d32 feat(soak): Screencaster — CDP Page.startScreencast wrapper
Attach/detach CDP sessions per Playwright Page, start/stop JPEG
screencasts with configurable quality and frame rate, forward each
frame to a callback. Used by the dashboard for click-to-watch
live video (wired in Task 23).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:06:17 -04:00
adlee-was-taken
796de876b7 feat(soak): wire --watch=dashboard in runner
Starts DashboardServer on port 7777 (or --dashboard-port), uses its
reporter as ctx.dashboard, auto-opens the URL via xdg-open/open/start.
Cleans up on exit. WS client connections logged as info events so
you can see when the browser attaches.

Verified: 2-account populate run with --watch=dashboard serves the
static page on :7777, accepts WS connections, cleanly shuts down
when the run completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:05:44 -04:00
adlee-was-taken
a35e789eb9 feat(soak): dashboard status grid UI
Static HTML page served by DashboardServer. Renders the 2×2 room
grid with progress bars and player tiles, subscribes to WS events,
updates tiles live. Click-to-watch modal is wired but receives
frames once the CDP screencaster ships in Task 22.

Adds escapeHtml() on all user-controlled strings (roomId, player
key) — not strictly needed for our trusted bot traffic but cheap
XSS hardening against future scenarios that accept user input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:55:21 -04:00
adlee-was-taken
9d1d4f899b feat(soak): DashboardServer — vanilla http + ws
Serves one static HTML page, accepts WS connections, broadcasts
room_state/log/metric messages to all clients. Replays current
state to late-joining clients so refreshing the dashboard during
a run shows the right grid. Exposes a reporter() method that
returns a DashboardReporter scenarios can call without knowing
about sockets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:53:59 -04:00
adlee-was-taken
d1688aae0b feat(soak): runner.ts end-to-end with --watch=none
First full end-to-end milestone: parses CLI, builds SessionPool +
RoomCoordinator, loads a scenario by name, runs it, reports results,
cleans up. Watch modes other than "none" log a warning and fall back
until Tasks 19-24 implement them.

Smoke test passed against local dev:
  bun run soak -- --scenario=populate --accounts=2 --rooms=1
    --cpus-per-room=0 --games-per-room=1 --holes=1 --watch=none
  → Games completed: 1, Errors: 0, Duration: 78.2s, exit 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:52:08 -04:00
adlee-was-taken
a6a276b509 fix(bot): GolfBot handles authenticated sessions + num-decks stepper
Two small fixes to tests/e2e/bot/golf-bot.ts needed to run the bot
from the soak harness with authenticated accounts:

1. createGame and joinGame now check whether #player-name is
   visible before filling it. Authenticated sessions hide that
   input (the server uses the logged-in username); guest sessions
   still fill it as before. Existing e2e tests behave identically
   since they register guests who always see the input.

2. startGame's 'decks' option was calling selectOption on
   #num-decks, which is a hidden input inside a stepper widget,
   not a <select>. Replaced with stepper-click logic that
   increments/decrements until the hidden input matches the
   target value.

End-to-end verified via the soak runner: 2 authenticated sessions
played a complete 1-hole game against local dev, 26 turns, exit 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:52:07 -04:00
adlee-was-taken
6df81e6f8d feat(soak): CLI parsing + config precedence
parseArgs pulls --scenario/--rooms/--watch/etc from argv,
mergeConfig layers scenarioDefaults → env → CLI so CLI flags
always win. 12 Vitest unit tests cover both parse happy/edge
paths and the 4-way merge precedence matrix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:25:04 -04:00
adlee-was-taken
2c20b6c7b5 feat(soak): populate scenario + scenario registry
Partitions sessions into N rooms, runs gamesPerRoom games per room
in parallel via Promise.allSettled so a failure in one room never
unwinds the others. Errors roll up into ScenarioResult.errors.

Verified via tsx: listScenarios() returns [populate], getScenario()
resolves by name and returns undefined for unknown names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:23:56 -04:00
adlee-was-taken
722934bdf2 feat(soak): shared runOneMultiplayerGame helper
Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:23:00 -04:00
adlee-was-taken
2a86b3cc54 feat(soak): scripts/seed-accounts.ts CLI wrapper
Thin standalone entry for pre-seeding N accounts before the first
harness run. Wraps SessionPool.seed and writes .env.stresstest.

End-to-end verified: ran against local dev with --count=4, all 4
accounts landed in the DB with is_test_account=TRUE, cred file
written with correct format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:22:10 -04:00
adlee-was-taken
3bc0270eb9 feat(soak): SessionPool — seed, login, acquire contexts
Owns BrowserContexts, seeds via POST /api/auth/register with the
invite code on cold start, warm-starts via localStorage injection of
the cached JWT, falls back to POST /api/auth/login if the token is
rejected. Exposes acquire(n) for scenarios.

Infrastructure changes needed to import the real GolfBot class from
tests/e2e/bot/ without the Task-10 structural-interface workaround:
 - Add @playwright/test as devDep so value-imports in e2e/bot/*.ts
   resolve at runtime (Page/Locator/expect are pulled even as types)
 - Remove rootDir from tsconfig so TS follows cross-package imports;
   add a paths entry so TS can resolve @playwright/test from the soak
   package's node_modules when compiling files under tests/e2e/bot
 - Drop the local GolfBot structural interface + its placeholder
   GamePhase/StartGameOptions/TurnResult types; re-export the real
   class from types.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:19:39 -04:00
adlee-was-taken
066e482f06 feat(soak): structured JSONL logger with child contexts
Single file, no transport, writes one JSON line per call to stdout.
Child loggers inherit parent meta so scenarios can bind room/game
context once and forget about it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:12:27 -04:00
adlee-was-taken
02642840da feat(soak): RoomCoordinator with host→joiners handoff
Lazy Deferred per roomId with a timeout on await. Lets concurrent
joiner sessions block until their host announces the room code
without polling or page scraping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:11:05 -04:00
adlee-was-taken
1565046ab7 feat(soak): core types + Deferred primitive
Establishes the Scenario/Session/Logger/DashboardReporter contracts
the rest of the harness builds on. Deferred is the building block
for RoomCoordinator's host→joiners handoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:09:36 -04:00
adlee-was-taken
5478a4299e feat(soak): scaffold tests/soak package
Placeholder runner, tsconfig with @bot alias to tests/e2e/bot,
gitignored .env.stresstest + artifacts. Real behavior follows
in Task 10 onward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:19:09 -04:00
adlee-was-taken
9fc6b83bba v3.0.0: V3 features, server refactoring, and documentation overhaul
- Extract WebSocket handlers from main.py into handlers.py
- Add V3 feature docs (dealer rotation, dealing animation, round end reveal,
  column pair celebration, final turn urgency, opponent thinking, score tallying,
  card hover/selection, knock early drama, column pair indicator, swap animation
  improvements, draw source distinction, card value tooltips, active rules context,
  discard pile history, realistic card sounds)
- Add V3 refactoring docs (ai.py, main.py/game.py, misc improvements)
- Add installation guide with Docker, systemd, and nginx setup
- Add helper scripts (install.sh, dev-server.sh, docker-build.sh)
- Add animation flow diagrams documentation
- Add test files for handlers, rooms, and V3 features
- Add e2e test specs for V3 features
- Update README with complete project structure and current tech stack
- Update CLAUDE.md with full architecture tree and server layer descriptions
- Update .env.example to reflect PostgreSQL (remove SQLite references)
- Update .gitignore to exclude virtualenv files, .claude/, and .db files
- Remove tracked virtualenv files (bin/, lib64, pyvenv.cfg)
- Remove obsolete game_log.py (SQLite) and games.db

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:03:45 -05:00
Aaron D. Lee
6950769bc3 Version 2.0.0: Animation fixes, timing improvements, and E2E test suite
Animation fixes:
- Fix held card positioning bug (was appearing at bottom of page)
- Fix discard pile blank/white flash on turn transitions
- Fix blank card at round end by skipping animations during round_over/game_over
- Set card content before triggering flip animation to prevent flash
- Center suit symbol on 10 cards

Timing improvements:
- Reduce post-discard delay from 700ms to 500ms
- Reduce post-swap delay from 1800ms to 1000ms
- Speed up swap flip animation from 1150ms to 550ms
- Reduce CPU initial thinking delay from 150-250ms to 80-150ms
- Pause now happens after swap completes (showing result) instead of before

E2E test suite:
- Add Playwright-based test bot that plays full games
- State parser extracts game state from DOM for validation
- AI brain ports decision logic for automated play
- Freeze detector monitors for UI hangs
- Visual validator checks CSS states
- Full game, stress, and visual test specs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 18:33:28 -05:00