Multiplayer Soak & UX Test Harness — Design
Date: 2026-04-10
Status: Design approved, pending implementation plan
Context
Golf Card Game is a real-time multiplayer WebSocket application with event-sourced game state, a leaderboard system, and an aggressive animation pipeline. Current test coverage is:
- `server/` — pytest unit/integration tests
- `tests/e2e/specs/` — Playwright tests exercising single-context flows (full game, stress with rapid clicks, visual regression, v3 features)
What's missing: a way to exercise the system with many concurrent authenticated users playing real multiplayer games for long durations. We can't currently:
- Populate staging scoreboards with realistic game history for demos and visual verification
- Hunt race conditions, WebSocket leaks, and room cleanup bugs under sustained concurrent load
- Validate multiplayer UX end-to-end across rooms without manual coordination
- Exercise authentication, room lifecycle, and stats aggregation as a cohesive system
This spec defines a multi-scenario soak and UX test harness: a standalone Playwright-based runner that drives 16 authenticated browser sessions across 4 concurrent rooms playing many games against each other (plus optional CPU opponents). It starts as a soak tool with two scenarios (populate, stress) and grows into the project's general-purpose multi-user UX test platform.
Goals
- Scoreboard population — run long multi-round games against staging with varied CPU personalities to produce realistic scoreboard data
- Stability stress — run rapid short games with chaos injection to surface race conditions and cleanup bugs
- Extensibility — new scenarios (reconnect, invite flow, admin workflow, mobile) slot in without runner changes
- Watchability — a dashboard mode with click-to-watch live video of any player, usable for demos and debugging
- Per-run isolation — test account traffic must be cleanly separable from real user traffic in stats queries
Non-goals
- Replacing the existing `tests/e2e/specs/` Playwright tests (they serve a different purpose — single-context edge cases)
- Distributed runner across multiple machines
- Concurrent scenario execution (one scenario per run for MVP)
- Grafana/OTEL integration
- Auto-promoting findings to regression tests
Constraints
- Staging auth gate — staging runs `INVITE_ONLY=true`; seeding must go through the register endpoint with an invite code
- Invite code `5VC2MCCN` — provisioned with 16 uses, used once per test account on first-ever run, cached afterward
- Per-IP rate limiting — `DAILY_SIGNUPS_PER_IP=20` on prod, lower default elsewhere; seeding must stay within budget
- Room idle cleanup — `ROOM_IDLE_TIMEOUT_SECONDS=300` means the scenario must keep rooms active or tolerate cleanup cascades
- Existing bot code — `tests/e2e/bot/golf-bot.ts` already provides `createGame`, `joinGame`, `addCPU`, `playTurn`, `playGame`; the harness reuses it verbatim
Architecture
Module layout
runner.ts (entry)
├─ SessionPool owns 16 BrowserContexts, seeds/logs in, allocates
├─ Scenario pluggable interface, per-scenario file
├─ RoomCoordinator host→joiners room-code handoff via Deferred<string>
├─ Dashboard (optional) HTTP + WS server, status grid + click-to-watch video
└─ GolfBot (reused) tests/e2e/bot/golf-bot.ts, unchanged
Default: one browser, 16 contexts (lowest RAM, fastest startup). WATCH=tiled is the exception — it launches two browsers, one headed (hosts) and one headless (joiners), because Chromium's headed/headless flag is browser-scoped, not context-scoped. See the tiled implementation detail below.
Location
New sibling directory tests/soak/ — does not modify tests/e2e/. Shares GolfBot via direct import from ../e2e/bot/.
Rationale: Playwright Test is designed for short isolated tests. A single test() running 16 contexts for hours fights the test model (worker limits, all-or-nothing failure, single giant trace file). A standalone node script gives first-class CLI flags, full control over the event loop, clean home for the dashboard server, and reuses the GolfBot class unchanged. Existing tests/e2e/specs/stress.spec.ts stays as-is for single-context edge cases.
Components
SessionPool
Owns the lifecycle of 16 authenticated BrowserContexts.
Responsibilities:
- On first run: register 16 accounts via `POST /api/auth/register` with invite code `5VC2MCCN`, cache credentials to `.env.stresstest`
- On subsequent runs: read cached credentials, create contexts, inject auth into each (localStorage token, or re-login via cached password if token rejected)
- Expose `acquire({ count }): Promise<Session[]>` — scenarios request N authenticated sessions without caring how they got there
- On scenario completion: close all contexts cleanly
Session shape:
interface Session {
context: BrowserContext;
page: Page;
bot: GolfBot;
account: Account; // { username, password, token }
key: string; // stable identifier, e.g., "soak_07"
}
.env.stresstest format (gitignored, local-only, plaintext — this is a test tool):
SOAK_ACCOUNT_00=soak_00_a7bx:Hunter2!xK9mQ:eyJhbGc...
SOAK_ACCOUNT_01=soak_01_c3pz:Kc82!wQm4Rt:eyJhbGc...
...
SOAK_ACCOUNT_15=soak_15_m9fy:Px7!eR4sTn2:eyJhbGc...
Line format: username:password:token. Password kept so the pool can recover from token expiry automatically.
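The pool's loader can parse each cached entry with a helper along these lines — a sketch, where `parseAccountLine` and the `Account` field names are illustrative. It splits on `:` into exactly three fields, which assumes seeded passwords never contain a colon (the seeder should enforce this; JWTs never contain one).

```typescript
interface Account {
  username: string;
  password: string;
  token: string;
}

// Parse one SOAK_ACCOUNT_NN value ("username:password:token").
// Assumes the seeder generates colon-free passwords.
function parseAccountLine(value: string): Account {
  const parts = value.split(':');
  if (parts.length !== 3) {
    throw new Error(
      `Malformed account entry: expected username:password:token, got ${parts.length} fields`,
    );
  }
  const [username, password, token] = parts;
  return { username, password, token };
}
```

Failing loudly on a malformed line is deliberate: a silently skipped account would surface later as a confusing room-size mismatch.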
Scenario interface
export interface ScenarioNeeds {
accounts: number;
rooms?: number;
cpusPerRoom?: number;
}
export interface ScenarioContext {
config: ScenarioConfig; // CLI flags merged with scenario defaults
sessions: Session[]; // pre-authenticated, pre-navigated
coordinator: RoomCoordinator;
dashboard: DashboardReporter; // no-op when watch mode doesn't use it
logger: Logger;
signal: AbortSignal; // graceful shutdown
heartbeat(roomId: string): void; // resets the per-room watchdog
}
export interface ScenarioResult {
gamesCompleted: number;
errors: ScenarioError[];
durationMs: number;
customMetrics?: Record<string, number>;
}
export interface Scenario {
name: string;
description: string;
defaultConfig: ScenarioConfig;
needs: ScenarioNeeds;
run(ctx: ScenarioContext): Promise<ScenarioResult>;
}
Scenarios are plain objects exported as default from files in tests/soak/scenarios/. The runner discovers them via a registry (scenarios/index.ts) that maps name → module. No filesystem scanning, no magic.
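A minimal sketch of that registry and lookup — the `Scenario` shape is abbreviated here, and the error wording is illustrative; the point is that an unknown name fails with the list of valid ones rather than a bare miss:

```typescript
// Abbreviated scenario shape; the real one is the full interface above.
type Scenario = { name: string; description: string };

// scenarios/index.ts: explicit name → module registry, no filesystem scanning.
const registry: Record<string, Scenario> = {
  populate: { name: 'populate', description: 'Long multi-round games to populate scoreboards' },
  stress: { name: 'stress', description: 'Rapid short games for stability & race condition hunting' },
};

function loadScenario(name: string): Scenario {
  const scenario = registry[name];
  if (!scenario) {
    const known = Object.keys(registry).join(', ');
    throw new Error(`Unknown scenario "${name}". Available: ${known}`);
  }
  return scenario;
}
```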
RoomCoordinator
~30 lines. Solves host→joiners room-code handoff:
class RoomCoordinator {
private rooms = new Map<string, Deferred<string>>();
  announce(roomId: string, code: string) { this.get(roomId).resolve(code); }
  async await(roomId: string): Promise<string> { return this.get(roomId).promise; }
private get(roomId: string) {
if (!this.rooms.has(roomId)) this.rooms.set(roomId, deferred());
return this.rooms.get(roomId)!;
}
}
Usage:
// Host
const code = await host.bot.createGame(host.account.username);
coordinator.announce('room-1', code);
// Joiners (concurrent)
const code = await coordinator.await('room-1');
await joiner.bot.joinGame(code, joiner.account.username);
No polling, no sleeps, no cross-page scraping.
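The coordinator leans on a `deferred()` helper that isn't shown above; a minimal sketch of it (the standard exposed-resolver pattern, nothing project-specific). Because the promise is created on first access, it works whether the host announces before or after the joiners start awaiting:

```typescript
// A promise whose resolve/reject are exposed, so one party (the host)
// can settle it while others (the joiners) await it.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (err: unknown) => void;
}

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (err: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}
```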
Dashboard
Optional — only instantiated when WATCH=dashboard.
Server side (dashboard/server.ts): vanilla http + ws module. Serves a single static HTML page, accepts WebSocket connections, relays messages between scenarios and the browser.
Client side (dashboard/index.html + dashboard.js): 2×2 room grid, per-player tiles with live status (current player, score, held card, phase, moves), progress bars per hole, activity log at the bottom. No framework, ~300 lines total.
Click-to-watch: clicking a player tile sends start_stream(sessionKey) over WS. The runner attaches a CDP session to that player's page via context.newCDPSession(page), calls Page.startScreencast with {format: 'jpeg', quality: 60, maxWidth: 640, maxHeight: 360, everyNthFrame: 2}, and forwards each Page.screencastFrame event to the dashboard as { sessionKey, jpeg_b64 }. The dashboard renders it into an <img> that swaps src on each frame.
Returning to the grid sends stop_stream(sessionKey) and the runner detaches the CDP session. On WS disconnect, all active screencasts stop. This keeps CPU cost zero except while someone is actively watching.
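One CDP subtlety worth capturing in the implementation: Chromium stops delivering frames unless each `Page.screencastFrame` event is acknowledged with `Page.screencastFrameAck`. A sketch of the relay, written against a minimal `CdpLike` shape (an assumption mirroring Playwright's `CDPSession` `send`/`on`) so the logic can be exercised without a browser:

```typescript
// Minimal shape matching Playwright's CDPSession surface we need.
interface CdpLike {
  send(method: string, params?: object): Promise<unknown>;
  on(event: string, handler: (params: any) => void): void;
}

async function startScreencast(
  cdp: CdpLike,
  onFrame: (jpegB64: string) => void,
): Promise<void> {
  cdp.on('Page.screencastFrame', async ({ data, sessionId }) => {
    onFrame(data); // base64 JPEG, forwarded to the dashboard over WS
    // Without this ack, Chromium sends no further frames.
    await cdp.send('Page.screencastFrameAck', { sessionId });
  });
  await cdp.send('Page.startScreencast', {
    format: 'jpeg', quality: 60, maxWidth: 640, maxHeight: 360, everyNthFrame: 2,
  });
}
```

Detaching the CDP session (`stop_stream`) tears the handler down with it, so no explicit `Page.stopScreencast` bookkeeping is needed per frame.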
DashboardReporter interface exposed to scenarios:
interface DashboardReporter {
update(roomId: string, state: Partial<RoomState>): void;
log(level: 'info'|'warn'|'error', msg: string, meta?: object): void;
incrementMetric(name: string, by?: number): void;
}
When WATCH is not dashboard, all three methods are no-ops; structured logs still go to stdout.
Runner
runner.ts is the CLI entry point. Parses flags, resolves config precedence, launches browser(s), instantiates SessionPool + RoomCoordinator + (optional) Dashboard, loads the requested scenario by name, executes it, reports results, cleans up.
Scenarios
Scenario 1: populate
Goal: produce realistic scoreboard data for staging demos.
Config:
{
name: 'populate',
description: 'Long multi-round games to populate scoreboards',
needs: { accounts: 16, rooms: 4, cpusPerRoom: 1 },
defaultConfig: {
gamesPerRoom: 10,
holes: 9,
decks: 2,
cpuPersonalities: ['Sofia', 'Marcus', 'Kenji', 'Priya'],
thinkTimeMs: [800, 2200],
interGamePauseMs: 3000,
},
}
Shape: 4 rooms × 4 accounts + 1 CPU each. Each room runs gamesPerRoom sequential games. Inside a room: host creates game → joiners join → host adds CPU → host starts game → all sessions loop on isMyTurn() + playTurn() with randomized human-like think time between turns. Between games, rooms pause briefly to mimic natural pacing.
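The per-room turn loop described above can be sketched as follows — the `TurnBot` shape is narrowed to the two calls the scenario actually uses (`isMyTurn`, `playTurn`), and the `sleep`/`jitter` helpers and parameter names are illustrative:

```typescript
interface TurnBot {
  isMyTurn(): Promise<boolean>;
  playTurn(): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));
// Uniform random delay within the configured [min, max] think-time range.
const jitter = ([min, max]: [number, number]) => min + Math.random() * (max - min);

async function runTurnLoop(
  bots: TurnBot[],
  thinkTimeMs: [number, number],
  heartbeat: () => void,
  gameOver: () => boolean,
): Promise<void> {
  while (!gameOver()) {
    heartbeat(); // resets the 60s per-room watchdog each pass
    for (const bot of bots) {
      if (gameOver()) break;
      if (await bot.isMyTurn()) {
        await bot.playTurn();
        await sleep(jitter(thinkTimeMs)); // human-like pacing between turns
      }
    }
  }
}
```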
Scenario 2: stress
Goal: hunt race conditions and stability bugs.
Config:
{
name: 'stress',
description: 'Rapid short games for stability & race condition hunting',
needs: { accounts: 16, rooms: 4, cpusPerRoom: 2 },
defaultConfig: {
gamesPerRoom: 50,
holes: 1,
decks: 1,
thinkTimeMs: [50, 150],
interGamePauseMs: 200,
chaosChance: 0.05,
},
}
Shape: same as populate but tight loops, 1-hole games, and a chaos injector that fires with 5% probability per turn. Chaos events:
- Rapid concurrent clicks on multiple cards
- Random tab-navigation away and back
- Simultaneous click on card + discard button
- Brief WebSocket drop via Playwright's `context.setOffline()` followed by reconnect
Each chaos event is logged with enough context to reproduce (room, player, turn, event type).
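The dispatch itself is tiny; a sketch with an injected RNG so a failing run's chaos sequence can be made deterministic when reproducing (event names here are illustrative shorthand for the four events above):

```typescript
type ChaosEvent = 'rapid_clicks' | 'tab_away_and_back' | 'card_plus_discard' | 'ws_drop';

const CHAOS_EVENTS: ChaosEvent[] = [
  'rapid_clicks', 'tab_away_and_back', 'card_plus_discard', 'ws_drop',
];

// Fire with `chance` probability per turn; pick one event uniformly.
// `rng` defaults to Math.random but can be a seeded generator for replays.
function pickChaos(chance: number, rng: () => number = Math.random): ChaosEvent | null {
  if (rng() >= chance) return null;
  return CHAOS_EVENTS[Math.floor(rng() * CHAOS_EVENTS.length)];
}
```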
Future scenarios (not MVP, design anticipates them)
- `reconnect` — 2 accounts, deliberate mid-game disconnect, verify recovery
- `invite-flow` — 0 accounts (fresh signups), exercise invite request → approval → first-game pipeline
- `admin-workflow` — 1 admin account, drive the admin panel
- `mobile-populate` — reuses `populate` with `devices['iPhone 13']` context options
- `replay-viewer` — watches completed games via the replay UI
Each is a new file in tests/soak/scenarios/, zero runner changes.
Data flow
Cold start (first-ever run)
- Runner reads `.env.stresstest` → file missing
- `SessionPool.seedAccounts()`:
  - For `i` in `0..15`: `POST /api/auth/register` with `{ username, password, email, invite_code: '5VC2MCCN' }`
  - Receive `{ user, token, expires_at }`, write to `.env.stresstest`
- Server sets `is_test_account=true` automatically because the invite code has `marks_as_test=true` (see Server changes)
- Runner proceeds to normal startup
Warm start (subsequent runs)
- Runner reads `.env.stresstest` → 16 entries
- `SessionPool` creates 16 `BrowserContext`s
- For each context: inject token into localStorage using the key the client app reads on load (resolved during implementation by inspecting `client/app.js`; see Open Questions)
- Each session navigates to `/` and lands post-auth
- If any token is rejected (401), pool silently re-logs in via cached password and refreshes the token in `.env.stresstest`
Seeding: explicit script vs automatic fallback
Two paths to the same result, for flexibility:
- Preferred: explicit `npm run seed` — runs `scripts/seed-accounts.ts` once during bring-up. Gives clear feedback, fails loudly on rate limits or network issues, lets you verify the accounts exist before a real run.
- Fallback: auto-seed on cold start — if `runner.ts` starts and `.env.stresstest` is missing, `SessionPool` invokes the same seeding logic transparently. Useful for CI or fresh clones where nobody ran the explicit step.
Both paths share the same code in core/session-pool.ts; the script is a thin CLI wrapper around SessionPool.seedAccounts(). Documented in tests/soak/README.md with "run npm run seed first" as the happy path.
Room code handoff
Host session calls createGame → receives room code → coordinator.announce(roomId, code). Joiner sessions await coordinator.await(roomId) → receive code → call joinGame. All in-process, no polling.
Watch modes
| Mode | Flag | Rendering | When to use |
|---|---|---|---|
| `none` | `WATCH=none` | Pure headless, JSONL stdout | CI, overnight unattended |
| `dashboard` | `WATCH=dashboard` (default) | HTML status grid + click-to-watch live video | Interactive runs, demos, debugging |
| `tiled` | `WATCH=tiled` | 4 native Chromium windows positioned 2×2 | Hands-on power-user debugging |
tiled implementation detail
Two browsers launched: one headed (headless: false, slowMo: 50) for the 4 host contexts, one headless for the 12 joiner contexts. Host windows positioned via page.evaluate(() => window.moveTo(x, y)) after load. Grid computed from screen size with a default of 1920×1080.
Server-side changes
All changes are additive and fit the existing inline migration pattern in server/stores/user_store.py.
1. Schema
Two new columns + one partial index:
ALTER TABLE users_v2 ADD COLUMN IF NOT EXISTS is_test_account BOOLEAN DEFAULT FALSE;
CREATE INDEX IF NOT EXISTS idx_users_v2_is_test_account ON users_v2(is_test_account)
WHERE is_test_account = TRUE;
ALTER TABLE invite_codes ADD COLUMN IF NOT EXISTS marks_as_test BOOLEAN DEFAULT FALSE;
Partial index because ~99% of rows will be FALSE; we only want to accelerate the "show test accounts" admin queries, not pay index-maintenance cost on every normal write.
2. Register flow propagates the flag
In services/auth_service.py, after resolving the invite code, read marks_as_test and pass through to user_store.create_user:
invite = await admin_service.get_invite_code(invite_code)
is_test = bool(invite and invite.marks_as_test)
user = await user_store.create_user(
username=..., password_hash=..., email=...,
is_test_account=is_test,
)
Users signing up without an invite or with a non-test invite are unaffected.
3. One-time: flag 5VC2MCCN as test-seed
Executed once against staging (and any other environment the harness runs against):
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
Documented in the seeder script as a comment, and in tests/soak/README.md as a bring-up step. No admin UI for flagging invites as test-seed in MVP — add later if needed.
4. Stats filtering
Add include_test: bool = False parameter to stats queries in services/stats_service.py:
async def get_leaderboard(self, limit=50, include_test=False):
query = """
SELECT ... FROM player_stats ps
JOIN users_v2 u ON u.id = ps.user_id
WHERE ($1 OR NOT u.is_test_account)
ORDER BY ps.total_points DESC
LIMIT $2
"""
return await conn.fetch(query, include_test, limit)
Router in routers/stats.py exposes include_test as an optional query parameter. Default False — real users visiting the site never see soak traffic. Admin panel and debugging views pass ?include_test=true.
Same treatment for:
- `get_player_stats(user_id, include_test)` — gates individual profile lookups
- `get_recent_games(include_test)` — hides games where any participant is a test account by default
5. Admin panel surfacing
Small additions to client/admin.html + client/admin.js:
- User list: "Test" badge column for `is_test_account=true` rows
- Invite codes: "Test-seed" indicator next to `marks_as_test=true` codes
- Leaderboard + user list: "Include test accounts" toggle → passes `?include_test=true`
Out of scope (server-side)
- New admin endpoint for marking existing accounts as test
- Admin UI for flagging invites as test-seed at creation time
- Separate "test stats only" aggregation (admins invert their mental filter)
- `test_only=true` query mode
Error handling
Failure taxonomy
| Category | Example | Strategy |
|---|---|---|
| Recoverable game error | Animation flag stuck, click missed target | Log, continue, bot retries via existing GolfBot fallbacks |
| Recoverable session error | WS disconnect for one player, token expires | Reconnect session, rejoin game if possible, abort that room only if unrecoverable |
| Unrecoverable room error | Room stuck >60s, impossible state | Kill the room, capture artifacts, let other rooms continue |
| Fatal runner error | Staging unreachable, invite code exhausted, OOM | Stop everything cleanly, dump summary, exit non-zero |
Core principle: per-room isolation. A failure in room 3 never unwinds rooms 1, 2, 4. Each room runs in its own Promise.allSettled branch.
Per-room watchdog
Each room gets a watchdog that resets on every ctx.heartbeat(roomId) call. If a room hasn't heartbeat'd in 60s, the watchdog captures artifacts, aborts that room only, and the runner continues with the remaining rooms.
Scenarios call heartbeat at each significant progress point (turn played, game started, game finished). The helper DashboardReporter.update() internally calls heartbeat as a convenience, so scenarios that use the dashboard reporter get watchdog resets for free. Scenarios that run with WATCH=none still need to call heartbeat explicitly at least once per 60s — a single call at the top of the per-turn loop is sufficient.
Artifact capture on failure
Captured per-room into tests/soak/artifacts/<run-id>/<room-id>/:
- Screenshot of every context in the affected room
- `page.content()` HTML snapshot per context
- Last 200 console log messages per context (already captured by `GolfBot`)
- Game state JSON from the state parser
- Error stack trace
- Scenario config snapshot
Directory structure:
tests/soak/artifacts/
2026-04-10-populate-14.23.05/
run.log # structured JSONL, full run
summary.json # final stats
room-0/
screenshot-host.png
screenshot-joiner-1.png
page-host.html
console.txt
state.json
error.txt
Artifacts directory is gitignored. Runs older than 7 days auto-pruned on startup.
Structured logging
Single logger, JSON Lines to stdout, pretty mirror to the dashboard. Every log line carries run_id, scenario, room (when applicable), and timestamp. Grep-friendly and jq-friendly.
Graceful shutdown
SIGINT / SIGTERM trigger shutdown via AbortController:
- Global `AbortSignal` flips to aborted
- Scenarios check `ctx.signal.aborted` in loops, finish current turn, exit cleanly
- Runner waits up to 10s for scenarios to unwind
- After 10s, force-closes all contexts + browser
- Writes final `summary.json` and prints results
- Exit codes: `0` = all rooms completed target games, `1` = any room failed, `2` = interrupted before completion
Double Ctrl-C = immediate force exit.
Periodic health probes
Every 30s during a run:
- `GET /api/health` against the target server
- Count of open browser contexts vs expected
- Runner memory usage
If /api/health fails 3 consecutive times, declare fatal error, capture artifacts, stop. This prevents staging outages from being misattributed to bot bugs.
Retry policy
Retry only at the session level, never at the scenario level.
- WS drop → reconnect session, rejoin game if possible, 3 attempts max
- Token rejected → re-login via cached password, 1 attempt
- Click missed → existing `GolfBot` retry (already built in)
Never retry: whole games, whole scenarios, fatal errors.
Cleanup guarantees
Three cleanup points, all going through the same cleanup() function wrapped in top-level try/finally:
- Success — close contexts, close browsers, flush logs, write summary
- Exception — capture artifacts first, then close contexts, flush logs, write partial summary
- Signal interrupt — graceful shutdown as above, best-effort artifact capture
File layout
tests/soak/
├── package.json # standalone (separate from tests/e2e/)
├── tsconfig.json
├── README.md # quickstart + flag reference + bring-up steps
├── .env.stresstest.example # template (real file gitignored)
│
├── runner.ts # CLI entry — `npm run soak`
├── config.ts # CLI parsing + defaults merging
│
├── core/
│ ├── session-pool.ts
│ ├── room-coordinator.ts
│ ├── screencaster.ts # CDP attach/detach on demand
│ ├── watchdog.ts
│ ├── artifacts.ts
│ ├── logger.ts
│ └── types.ts # Scenario, Session, ScenarioContext interfaces
│
├── scenarios/
│ ├── populate.ts
│ ├── stress.ts
│ └── index.ts # name → module registry
│
├── dashboard/
│ ├── server.ts # http + ws
│ ├── index.html
│ ├── dashboard.css
│ └── dashboard.js
│
├── scripts/
│ ├── seed-accounts.ts # one-shot seeding
│ ├── reset-accounts.ts # future: wipe test account stats
│ └── smoke.sh # bring-up validation
│
└── artifacts/ # gitignored, auto-pruned 7d
└── <run-id>/...
Dependencies
New tests/soak/package.json:
{
"name": "golf-soak",
"private": true,
"scripts": {
"soak": "tsx runner.ts",
"soak:populate": "tsx runner.ts --scenario=populate",
"soak:stress": "tsx runner.ts --scenario=stress",
"seed": "tsx scripts/seed-accounts.ts",
"smoke": "scripts/smoke.sh"
},
"dependencies": {
"playwright-core": "^1.40.0",
"ws": "^8.16.0"
},
"devDependencies": {
"tsx": "^4.7.0",
"@types/ws": "^8.5.0",
"@types/node": "^20.10.0",
"typescript": "^5.3.0"
}
}
Three runtime deps: playwright-core (already in tests/e2e/), ws (WebSocket for dashboard), tsx (dev-only, runs TypeScript directly). No HTTP framework, no bundler, no build step.
CLI flags
--scenario=populate|stress required
--accounts=<n> total sessions (default: scenario.needs.accounts)
--rooms=<n> default from scenario.needs
--cpus-per-room=<n> default from scenario.needs
--games-per-room=<n> default from scenario.defaultConfig
--holes=<n> default from scenario.defaultConfig
--watch=none|dashboard|tiled default: dashboard
--dashboard-port=<n> default: 7777
--target=<url> default: TEST_URL env or http://localhost:8000
--run-id=<string> default: ISO timestamp
--list print available scenarios and exit
--dry-run validate config without running
Derived: accounts-per-room = accounts / rooms. Must divide evenly; runner errors out with a clear message if not.
Config precedence: CLI flags → environment variables → scenario defaultConfig → runner defaults.
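That precedence chain is just a layered merge where higher-precedence sources overwrite lower ones — a sketch with illustrative key names; a later layer only fills keys a source actually set:

```typescript
type Config = Record<string, unknown>;

// Precedence: cli > env > scenarioDefaults > runnerDefaults.
function resolveConfig(
  cli: Config,
  env: Config,
  scenarioDefaults: Config,
  runnerDefaults: Config,
): Config {
  const out: Config = {};
  // Apply lowest precedence first; each later layer overwrites.
  for (const layer of [runnerDefaults, scenarioDefaults, env, cli]) {
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) out[key] = value;
    }
  }
  return out;
}
```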
Meta-testing
Unit tests (Vitest, minimal)
- `room-coordinator.ts` — announce/await correctness, timeout behavior
- `watchdog.ts` — fires on timeout, resets on heartbeat, cancels cleanly
- `config.ts` — CLI precedence, required field validation
Bring-up smoke test (tests/soak/scripts/smoke.sh)
Runs against local dev server with minimum viable config:
TEST_URL=http://localhost:8000 \
npm run soak -- \
--scenario=populate \
--accounts=2 \
--rooms=1 \
--cpus-per-room=0 \
--games-per-room=1 \
--holes=1 \
--watch=none
Exit 0 = full harness works end-to-end. ~30 seconds. Run after any change.
Manual validation checklist
Documented in tests/soak/CHECKLIST.md:
- Seed 16 accounts against staging using the invite code
- `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- `--scenario=populate --rooms=4 --games-per-room=1` — 4 rooms in parallel, no cross-contamination
- `--watch=dashboard` opens browser, grid renders, progress updates
- Click a player tile → live video appears, Esc → stops
- `--watch=tiled` opens 4 browser windows in 2×2 grid
- Ctrl-C during a run → graceful shutdown, summary printed, exit 2
- Kill the target server mid-run → runner detects, captures artifacts, exits 1
- Stats query `?include_test=false` hides soak accounts, `?include_test=true` shows them
- Full stress run (`--scenario=stress --games-per-room=10`) — no console errors, all rooms complete
Implementation order
Sequenced so each step produces something demonstrable before moving on. The writing-plans skill will break this into concrete tasks.
1. Server-side changes — schema alters, register flow, stats filter, admin badge. Independent, ships first, unblocks local testing.
2. Scaffold `tests/soak/` — package.json, tsconfig, core/types, logger. No behavior yet.
3. `SessionPool` + `scripts/seed-accounts.ts` — end-to-end auth: seed, cache, load, validate login.
4. `RoomCoordinator` + minimal `populate` scenario body — proves multi-room orchestration.
5. `runner.ts` — CLI, config merging, scenario loading, top-level error handling.
6. `--watch=none` works — runs against local dev, produces clean logs, exits 0. First end-to-end milestone.
7. `--watch=dashboard` status grid — HTML + WS + tile updates (no video yet).
8. CDP screencast / click-to-watch — the live video feature.
9. `--watch=tiled` mode — native windows via `page.evaluate(window.moveTo)`.
10. `stress` scenario — chaos injection, rapid games.
11. Failure handling — watchdog, artifact capture, graceful shutdown.
12. Smoke test script + `CHECKLIST.md` — validation.
13. Run against staging for real — populate scoreboard, hunt bugs, report findings.
If step 6 takes longer than planned, steps 1–5 are still useful standalone.
Out of scope for MVP
- Mobile viewport scenarios (future `mobile-populate`)
- Reconnect-storm scenarios
- Admin workflow scenarios
- Concurrent scenario execution
- Distributed runner
- Grafana / OTEL / custom metrics push
- Test account stat reset tooling
- Auto-promoting stress findings into Playwright regression tests
- New admin endpoints for account marking
- Admin UI for flagging invites as test-seed
All of these are cheap to add later because the scenario interface and session pool don't presuppose them.
Open questions (to resolve during implementation)
- localStorage auth key — exact keys used by `client/app.js` to persist the JWT and user blob; verified by reading the file during step 3.
- Chaos event set for `stress` scenario — finalize which chaos events are in scope for MVP vs added incrementally (start with rapid clicks + tab nav + `setOffline`, add more as the server proves robust).
- CDP screencast frame rate tuning — start at `everyNthFrame: 2` (~15fps), adjust down if bandwidth/CPU is excessive on long runs.
- Screen bounds detection for `tiled` mode — default to 1920×1080, expose override via `--tiled-bounds=WxH`; auto-detect later if useful.