golfgame/docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md
adlee-was-taken 97036be319 docs: multiplayer soak & UX test harness design
Design for a standalone Playwright-based soak runner that drives 16
authenticated browser sessions across 4 concurrent rooms to populate
staging scoreboards and hunt stability bugs. Architected as a
pluggable scenario harness so future UX test scenarios (reconnect,
invite flow, admin workflows, mobile) slot in cleanly.

Also gitignores .superpowers/ (brainstorming session artifacts).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:03:28 -04:00


Multiplayer Soak & UX Test Harness — Design

Date: 2026-04-10
Status: Design approved, pending implementation plan

Context

Golf Card Game is a real-time multiplayer WebSocket application with event-sourced game state, a leaderboard system, and an aggressive animation pipeline. Current test coverage is:

  • server/ — pytest unit/integration tests
  • tests/e2e/specs/ — Playwright tests exercising single-context flows (full game, stress with rapid clicks, visual regression, v3 features)

What's missing: a way to exercise the system with many concurrent authenticated users playing real multiplayer games for long durations. We can't currently:

  1. Populate staging scoreboards with realistic game history for demos and visual verification
  2. Hunt race conditions, WebSocket leaks, and room cleanup bugs under sustained concurrent load
  3. Validate multiplayer UX end-to-end across rooms without manual coordination
  4. Exercise authentication, room lifecycle, and stats aggregation as a cohesive system

This spec defines a multi-scenario soak and UX test harness: a standalone Playwright-based runner that drives 16 authenticated browser sessions across 4 concurrent rooms playing many games against each other (plus optional CPU opponents). It starts as a soak tool with two scenarios (populate, stress) and grows into the project's general-purpose multi-user UX test platform.

Goals

  1. Scoreboard population — run long multi-round games against staging with varied CPU personalities to produce realistic scoreboard data
  2. Stability stress — run rapid short games with chaos injection to surface race conditions and cleanup bugs
  3. Extensibility — new scenarios (reconnect, invite flow, admin workflow, mobile) slot in without runner changes
  4. Watchability — a dashboard mode with click-to-watch live video of any player, usable for demos and debugging
  5. Per-run isolation — test account traffic must be cleanly separable from real user traffic in stats queries

Non-goals

  • Replacing the existing tests/e2e/specs/ Playwright tests (they serve a different purpose — single-context edge cases)
  • Distributed runner across multiple machines
  • Concurrent scenario execution (one scenario per run for MVP)
  • Grafana/OTEL integration
  • Auto-promoting findings to regression tests

Constraints

  • Staging auth gate — staging runs INVITE_ONLY=true; seeding must go through the register endpoint with an invite code
  • Invite code 5VC2MCCN — provisioned with 16 uses, used once per test account on first-ever run, cached afterward
  • Per-IP rate limiting — DAILY_SIGNUPS_PER_IP=20 on prod, lower default elsewhere; seeding must stay within budget
  • Room idle cleanup — ROOM_IDLE_TIMEOUT_SECONDS=300 means the scenario must keep rooms active or tolerate cleanup cascades
  • Existing bot code — tests/e2e/bot/golf-bot.ts already provides createGame, joinGame, addCPU, playTurn, playGame; the harness reuses it verbatim

Architecture

Module layout

runner.ts (entry)
  ├─ SessionPool          owns 16 BrowserContexts, seeds/logs in, allocates
  ├─ Scenario             pluggable interface, per-scenario file
  ├─ RoomCoordinator      host→joiners room-code handoff via Deferred<string>
  ├─ Dashboard (optional) HTTP + WS server, status grid + click-to-watch video
  └─ GolfBot (reused)     tests/e2e/bot/golf-bot.ts, unchanged

Default: one browser, 16 contexts (lowest RAM, fastest startup). WATCH=tiled is the exception — it launches two browsers, one headed (hosts) and one headless (joiners), because Chromium's headed/headless flag is browser-scoped, not context-scoped. See the tiled implementation detail below.

Location

New sibling directory tests/soak/ — does not modify tests/e2e/. Shares GolfBot via direct import from ../e2e/bot/.

Rationale: Playwright Test is designed for short isolated tests. A single test() running 16 contexts for hours fights the test model (worker limits, all-or-nothing failure, single giant trace file). A standalone node script gives first-class CLI flags, full control over the event loop, clean home for the dashboard server, and reuses the GolfBot class unchanged. Existing tests/e2e/specs/stress.spec.ts stays as-is for single-context edge cases.

Components

SessionPool

Owns the lifecycle of 16 authenticated BrowserContexts.

Responsibilities:

  • On first run: register 16 accounts via POST /api/auth/register with invite code 5VC2MCCN, cache credentials to .env.stresstest
  • On subsequent runs: read cached credentials, create contexts, inject auth into each (localStorage token, or re-login via cached password if token rejected)
  • Expose acquire({ count }): Promise<Session[]> — scenarios request N authenticated sessions without caring how they got there
  • On scenario completion: close all contexts cleanly

Session shape:

interface Session {
  context: BrowserContext;
  page: Page;
  bot: GolfBot;
  account: Account;  // { username, password, token }
  key: string;       // stable identifier, e.g., "soak_07"
}

.env.stresstest format (gitignored, local-only, plaintext — this is a test tool):

SOAK_ACCOUNT_00=soak_00_a7bx:Hunter2!xK9mQ:eyJhbGc...
SOAK_ACCOUNT_01=soak_01_c3pz:Kc82!wQm4Rt:eyJhbGc...
...
SOAK_ACCOUNT_15=soak_15_m9fy:Px7!eR4sTn2:eyJhbGc...

Line format: username:password:token. Password kept so the pool can recover from token expiry automatically.
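A minimal sketch of parsing one such line. The helper name parseAccountLine is hypothetical; the key assumption (stated above) is that the username and the JWT contain no colons, so splitting on the first and last colon lets the password keep any colons it happens to contain.

```typescript
// Hypothetical parser for one SOAK_ACCOUNT_* value in .env.stresstest.
// Assumes the username:password:token layout described above; usernames
// and JWTs contain no colons, so only the password may.
interface Account {
  username: string;
  password: string;
  token: string;
}

function parseAccountLine(line: string): Account {
  const first = line.indexOf(":");
  const last = line.lastIndexOf(":");
  if (first === -1 || last === first) {
    throw new Error(`malformed account line: ${line}`);
  }
  return {
    username: line.slice(0, first),
    password: line.slice(first + 1, last), // may itself contain colons
    token: line.slice(last + 1),
  };
}
```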

Scenario interface

export interface ScenarioNeeds {
  accounts: number;
  rooms?: number;
  cpusPerRoom?: number;
}

export interface ScenarioContext {
  config: ScenarioConfig;        // CLI flags merged with scenario defaults
  sessions: Session[];           // pre-authenticated, pre-navigated
  coordinator: RoomCoordinator;
  dashboard: DashboardReporter;  // no-op when watch mode doesn't use it
  logger: Logger;
  signal: AbortSignal;           // graceful shutdown
  heartbeat(roomId: string): void;  // resets the per-room watchdog
}

export interface ScenarioResult {
  gamesCompleted: number;
  errors: ScenarioError[];
  durationMs: number;
  customMetrics?: Record<string, number>;
}

export interface Scenario {
  name: string;
  description: string;
  defaultConfig: ScenarioConfig;
  needs: ScenarioNeeds;
  run(ctx: ScenarioContext): Promise<ScenarioResult>;
}

Scenarios are plain objects exported as default from files in tests/soak/scenarios/. The runner discovers them via a registry (scenarios/index.ts) that maps name → module. No filesystem scanning, no magic.
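The registry can be sketched as follows. The Scenario type and the two scenario stubs are inlined here so the example is self-contained; the real scenarios/index.ts would import them from ../core/types, ./populate, and ./stress.

```typescript
// Sketch of scenarios/index.ts: a plain name → module map, no filesystem
// scanning. Scenario is reduced to two fields here for brevity.
interface Scenario {
  name: string;
  description: string;
}

const populate: Scenario = { name: "populate", description: "Long multi-round games" };
const stress: Scenario = { name: "stress", description: "Rapid short games" };

const registry: Record<string, Scenario> = { populate, stress };

export function loadScenario(name: string): Scenario {
  const scenario = registry[name];
  if (!scenario) {
    const known = Object.keys(registry).join(", ");
    throw new Error(`Unknown scenario "${name}". Available: ${known}`);
  }
  return scenario;
}

// Backs the --list flag.
export function listScenarios(): string[] {
  return Object.keys(registry);
}
```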

RoomCoordinator

~30 lines. Solves host→joiners room-code handoff:

class RoomCoordinator {
  private rooms = new Map<string, Deferred<string>>();

  announce(roomId: string, code: string) { this.get(roomId).resolve(code); }
  async await(roomId: string): Promise<string> { return this.get(roomId).promise; }
  private get(roomId: string) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, deferred());
    return this.rooms.get(roomId)!;
  }
}

Usage:

// Host
const code = await host.bot.createGame(host.account.username);
coordinator.announce('room-1', code);

// Joiners (concurrent)
const code = await coordinator.await('room-1');
await joiner.bot.joinGame(code, joiner.account.username);

No polling, no sleeps, no cross-page scraping.
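The coordinator above leans on a small deferred() helper — a promise whose resolve function is exposed so one session can settle it for another. A minimal sketch:

```typescript
// Promise with externally exposed resolve/reject. Because the coordinator's
// get() lazily creates the entry, announce-before-await and
// await-before-announce both work.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (err: Error) => void;
}

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (err: Error) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}
```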

Dashboard

Optional — only instantiated when WATCH=dashboard.

Server side (dashboard/server.ts): vanilla http + ws module. Serves a single static HTML page, accepts WebSocket connections, relays messages between scenarios and the browser.

Client side (dashboard/index.html + dashboard.js): 2×2 room grid, per-player tiles with live status (current player, score, held card, phase, moves), progress bars per hole, activity log at the bottom. No framework, ~300 lines total.

Click-to-watch: clicking a player tile sends start_stream(sessionKey) over WS. The runner attaches a CDP session to that player's page via context.newCDPSession(page), calls Page.startScreencast with {format: 'jpeg', quality: 60, maxWidth: 640, maxHeight: 360, everyNthFrame: 2}, and forwards each Page.screencastFrame event to the dashboard as { sessionKey, jpeg_b64 }. The dashboard renders it into an <img> that swaps src on each frame.

Returning to the grid sends stop_stream(sessionKey) and the runner detaches the CDP session. On WS disconnect, all active screencasts stop. This keeps CPU cost zero except while someone is actively watching.

DashboardReporter interface exposed to scenarios:

interface DashboardReporter {
  update(roomId: string, state: Partial<RoomState>): void;
  log(level: 'info'|'warn'|'error', msg: string, meta?: object): void;
  incrementMetric(name: string, by?: number): void;
}

When WATCH is not dashboard, all three methods are no-ops; structured logs still go to stdout.
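One possible shape for that no-op reporter — keeping metric counts in memory for the final summary is an assumption on my part, not something the spec mandates:

```typescript
interface RoomState { currentPlayer?: string; score?: number; phase?: string; }

interface DashboardReporter {
  update(roomId: string, state: Partial<RoomState>): void;
  log(level: "info" | "warn" | "error", msg: string, meta?: object): void;
  incrementMetric(name: string, by?: number): void;
}

// Reporter used when WATCH !== dashboard: grid updates vanish, structured
// logs still reach stdout, and metrics accumulate for summary.json.
function createNoopReporter(): DashboardReporter & { metrics: Record<string, number> } {
  const metrics: Record<string, number> = {};
  return {
    metrics,
    update() {}, // no grid to update
    log(level, msg, meta) {
      console.log(JSON.stringify({ level, msg, ...meta }));
    },
    incrementMetric(name, by = 1) {
      metrics[name] = (metrics[name] ?? 0) + by;
    },
  };
}
```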

Runner

runner.ts is the CLI entry point. Parses flags, resolves config precedence, launches browser(s), instantiates SessionPool + RoomCoordinator + (optional) Dashboard, loads the requested scenario by name, executes it, reports results, cleans up.

Scenarios

Scenario 1: populate

Goal: produce realistic scoreboard data for staging demos.

Config:

{
  name: 'populate',
  description: 'Long multi-round games to populate scoreboards',
  needs: { accounts: 16, rooms: 4, cpusPerRoom: 1 },
  defaultConfig: {
    gamesPerRoom: 10,
    holes: 9,
    decks: 2,
    cpuPersonalities: ['Sofia', 'Marcus', 'Kenji', 'Priya'],
    thinkTimeMs: [800, 2200],
    interGamePauseMs: 3000,
  },
}

Shape: 4 rooms × 4 accounts + 1 CPU each. Each room runs gamesPerRoom sequential games. Inside a room: host creates game → joiners join → host adds CPU → host starts game → all sessions loop on isMyTurn() + playTurn() with randomized human-like think time between turns. Between games, rooms pause briefly to mimic natural pacing.
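The per-session turn loop described above can be sketched like this. TurnBot is a hypothetical slice of GolfBot — isMyTurn and playTurn exist per the Constraints section; gameOver is assumed here purely for illustration:

```typescript
interface TurnBot {
  isMyTurn(): Promise<boolean>;
  playTurn(): Promise<void>;
  gameOver(): Promise<boolean>; // assumed helper, not confirmed GolfBot API
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
const randBetween = ([lo, hi]: [number, number]) => lo + Math.random() * (hi - lo);

// Loop one session through a game: wait for our turn, pause a human-like
// randomized interval, play, repeat until the game ends or shutdown begins.
async function playGameLoop(
  bot: TurnBot,
  thinkTimeMs: [number, number],
  signal: AbortSignal,
): Promise<number> {
  let turns = 0;
  while (!signal.aborted && !(await bot.gameOver())) {
    if (await bot.isMyTurn()) {
      await sleep(randBetween(thinkTimeMs)); // human-like think time
      await bot.playTurn();
      turns++;
    } else {
      await sleep(100); // cheap poll; the real GolfBot may expose an event instead
    }
  }
  return turns;
}
```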

Scenario 2: stress

Goal: hunt race conditions and stability bugs.

Config:

{
  name: 'stress',
  description: 'Rapid short games for stability & race condition hunting',
  needs: { accounts: 16, rooms: 4, cpusPerRoom: 2 },
  defaultConfig: {
    gamesPerRoom: 50,
    holes: 1,
    decks: 1,
    thinkTimeMs: [50, 150],
    interGamePauseMs: 200,
    chaosChance: 0.05,
  },
}

Shape: same as populate but tight loops, 1-hole games, and a chaos injector that fires with 5% probability per turn. Chaos events:

  • Rapid concurrent clicks on multiple cards
  • Random tab-navigation away and back
  • Simultaneous click on card + discard button
  • Brief WebSocket drop via Playwright's context.setOffline() followed by reconnect

Each chaos event is logged with enough context to reproduce (room, player, turn, event type).
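A sketch of the 5%-per-turn chaos gate. The rng parameter is injectable so the choice is deterministic under test; the event names are shorthand labels for the four events listed above, not identifiers from the codebase:

```typescript
type ChaosEvent = "rapid_clicks" | "tab_nav" | "click_race" | "ws_drop";

const CHAOS_EVENTS: ChaosEvent[] = ["rapid_clicks", "tab_nav", "click_race", "ws_drop"];

// Returns null most turns; with probability `chance`, picks one chaos event
// uniformly. The caller logs room/player/turn/event before executing it.
function maybePickChaos(
  chance: number,
  rng: () => number = Math.random,
): ChaosEvent | null {
  if (rng() >= chance) return null;
  const i = Math.floor(rng() * CHAOS_EVENTS.length);
  return CHAOS_EVENTS[i];
}
```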

Future scenarios (not MVP, design anticipates them)

  • reconnect — 2 accounts, deliberate mid-game disconnect, verify recovery
  • invite-flow — 0 accounts (fresh signups), exercise invite request → approval → first-game pipeline
  • admin-workflow — 1 admin account, drive the admin panel
  • mobile-populate — reuses populate with devices['iPhone 13'] context options
  • replay-viewer — watches completed games via the replay UI

Each is a new file in tests/soak/scenarios/, zero runner changes.

Data flow

Cold start (first-ever run)

  1. Runner reads .env.stresstest → file missing
  2. SessionPool.seedAccounts():
    • For i in 0..15: POST /api/auth/register with { username, password, email, invite_code: '5VC2MCCN' }
    • Receive { user, token, expires_at }, write to .env.stresstest
  3. Server sets is_test_account=true automatically because the invite code has marks_as_test=true (see Server changes)
  4. Runner proceeds to normal startup

Warm start (subsequent runs)

  1. Runner reads .env.stresstest → 16 entries
  2. SessionPool creates 16 BrowserContexts
  3. For each context: inject token into localStorage using the key the client app reads on load (resolved during implementation by inspecting client/app.js; see Open Questions)
  4. Each session navigates to / and lands post-auth
  5. If any token is rejected (401), pool silently re-logs in via cached password and refreshes the token in .env.stresstest

Seeding: explicit script vs automatic fallback

Two paths to the same result, for flexibility:

  • Preferred: explicit npm run seed — runs scripts/seed-accounts.ts once during bring-up. Gives clear feedback, fails loudly on rate limits or network issues, lets you verify the accounts exist before a real run.
  • Fallback: auto-seed on cold start — if runner.ts starts and .env.stresstest is missing, SessionPool invokes the same seeding logic transparently. Useful for CI or fresh clones where nobody ran the explicit step.

Both paths share the same code in core/session-pool.ts; the script is a thin CLI wrapper around SessionPool.seedAccounts(). Documented in tests/soak/README.md with "run npm run seed first" as the happy path.

Room code handoff

Host session calls createGame → receives room code → coordinator.announce(roomId, code). Joiner sessions await coordinator.await(roomId) → receive code → call joinGame. All in-process, no polling.

Watch modes

| Mode | Flag | Rendering | When to use |
| --- | --- | --- | --- |
| none | WATCH=none | Pure headless, JSONL stdout | CI, overnight unattended |
| dashboard | WATCH=dashboard (default) | HTML status grid + click-to-watch live video | Interactive runs, demos, debugging |
| tiled | WATCH=tiled | 4 native Chromium windows positioned 2×2 | Hands-on power-user debugging |

tiled implementation detail

Two browsers launched: one headed (headless: false, slowMo: 50) for the 4 host contexts, one headless for the 12 joiner contexts. Host windows positioned via page.evaluate(() => window.moveTo(x, y)) after load. Grid computed from screen size with a default of 1920×1080.

Server-side changes

All changes are additive and fit the existing inline migration pattern in server/stores/user_store.py.

1. Schema

Two new columns + one partial index:

ALTER TABLE users_v2 ADD COLUMN IF NOT EXISTS is_test_account BOOLEAN DEFAULT FALSE;
CREATE INDEX IF NOT EXISTS idx_users_v2_is_test_account ON users_v2(is_test_account)
  WHERE is_test_account = TRUE;
ALTER TABLE invite_codes ADD COLUMN IF NOT EXISTS marks_as_test BOOLEAN DEFAULT FALSE;

Partial index because ~99% of rows will be FALSE; we only want to accelerate the "show test accounts" admin queries, not pay index-maintenance cost on every normal write.

2. Register flow propagates the flag

In services/auth_service.py, after resolving the invite code, read marks_as_test and pass through to user_store.create_user:

invite = await admin_service.get_invite_code(invite_code)
is_test = bool(invite and invite.marks_as_test)
user = await user_store.create_user(
    username=..., password_hash=..., email=...,
    is_test_account=is_test,
)

Users signing up without an invite or with a non-test invite are unaffected.

3. One-time: flag 5VC2MCCN as test-seed

Executed once against staging (and any other environment the harness runs against):

UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';

Documented in the seeder script as a comment, and in tests/soak/README.md as a bring-up step. No admin UI for flagging invites as test-seed in MVP — add later if needed.

4. Stats filtering

Add include_test: bool = False parameter to stats queries in services/stats_service.py:

async def get_leaderboard(self, limit=50, include_test=False):
    query = """
      SELECT ... FROM player_stats ps
      JOIN users_v2 u ON u.id = ps.user_id
      WHERE ($1 OR NOT u.is_test_account)
      ORDER BY ps.total_points DESC
      LIMIT $2
    """
    return await conn.fetch(query, include_test, limit)

Router in routers/stats.py exposes include_test as an optional query parameter. Default False — real users visiting the site never see soak traffic. Admin panel and debugging views pass ?include_test=true.

Same treatment for:

  • get_player_stats(user_id, include_test) — gates individual profile lookups
  • get_recent_games(include_test) — hides games where any participant is a test account by default

5. Admin panel surfacing

Small additions to client/admin.html + client/admin.js:

  • User list: "Test" badge column for is_test_account=true rows
  • Invite codes: "Test-seed" indicator next to marks_as_test=true codes
  • Leaderboard + user list: "Include test accounts" toggle → passes ?include_test=true

Out of scope (server-side)

  • New admin endpoint for marking existing accounts as test
  • Admin UI for flagging invites as test-seed at creation time
  • Separate "test stats only" aggregation (admins invert their mental filter)
  • test_only=true query mode

Error handling

Failure taxonomy

| Category | Example | Strategy |
| --- | --- | --- |
| Recoverable game error | Animation flag stuck, click missed target | Log, continue, bot retries via existing GolfBot fallbacks |
| Recoverable session error | WS disconnect for one player, token expires | Reconnect session, rejoin game if possible, abort that room only if unrecoverable |
| Unrecoverable room error | Room stuck >60s, impossible state | Kill the room, capture artifacts, let other rooms continue |
| Fatal runner error | Staging unreachable, invite code exhausted, OOM | Stop everything cleanly, dump summary, exit non-zero |

Core principle: per-room isolation. A failure in room 3 never unwinds rooms 1, 2, 4. Each room runs in its own Promise.allSettled branch.

Per-room watchdog

Each room gets a watchdog that resets on every ctx.heartbeat(roomId) call. If a room goes 60s without a heartbeat, the watchdog captures artifacts, aborts that room only, and the runner continues with the remaining rooms.

Scenarios call heartbeat at each significant progress point (turn played, game started, game finished). The helper DashboardReporter.update() internally calls heartbeat as a convenience, so scenarios that use the dashboard reporter get watchdog resets for free. Scenarios that run with WATCH=none still need to call heartbeat explicitly at least once per 60s — a single call at the top of the per-turn loop is sufficient.
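One way the watchdog could look. This sketch is timestamp-based with an explicit check() — driven, say, from the periodic health-probe loop — rather than timer-based; that choice is mine, made to keep the logic trivially testable, and the real implementation may prefer setTimeout:

```typescript
// Per-room watchdog: heartbeat() records progress, check() fires onTimeout
// once if the room has been silent longer than timeoutMs. All times are
// injectable epoch-milliseconds so tests need no real clock.
class Watchdog {
  private lastBeat: number;
  private fired = false;

  constructor(
    private readonly timeoutMs: number,
    private readonly onTimeout: () => void,
    now: number = Date.now(),
  ) {
    this.lastBeat = now;
  }

  heartbeat(now: number = Date.now()): void {
    this.lastBeat = now;
  }

  check(now: number = Date.now()): boolean {
    if (!this.fired && now - this.lastBeat > this.timeoutMs) {
      this.fired = true; // fire once; the room is already being torn down
      this.onTimeout();
    }
    return this.fired;
  }
}
```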

Artifact capture on failure

Captured per-room into tests/soak/artifacts/<run-id>/<room-id>/:

  • Screenshot of every context in the affected room
  • page.content() HTML snapshot per context
  • Last 200 console log messages per context (already captured by GolfBot)
  • Game state JSON from the state parser
  • Error stack trace
  • Scenario config snapshot

Directory structure:

tests/soak/artifacts/
  2026-04-10-populate-14.23.05/
    run.log           # structured JSONL, full run
    summary.json      # final stats
    room-0/
      screenshot-host.png
      screenshot-joiner-1.png
      page-host.html
      console.txt
      state.json
      error.txt

Artifacts directory is gitignored. Runs older than 7 days auto-pruned on startup.

Structured logging

Single logger, JSON Lines to stdout, pretty mirror to the dashboard. Every log line carries run_id, scenario, room (when applicable), and timestamp. Grep-friendly and jq-friendly.
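A minimal sketch of such a logger. The injectable sink (stdout in production, an array under test) is my addition; the field names mirror the ones listed above:

```typescript
interface LogFields {
  run_id: string;
  scenario: string;
  room?: string;
}

// JSONL logger: one JSON object per line, always carrying run_id, scenario,
// optional room, and an ISO timestamp, merged with per-call metadata.
function createLogger(
  base: LogFields,
  sink: (line: string) => void = console.log,
) {
  return (level: "info" | "warn" | "error", msg: string, meta: object = {}) => {
    sink(JSON.stringify({ ts: new Date().toISOString(), level, msg, ...base, ...meta }));
  };
}
```

Lines like these pipe cleanly into jq, e.g. `jq 'select(.room == "room-0")' run.log`.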

Graceful shutdown

SIGINT / SIGTERM trigger shutdown via AbortController:

  1. Global AbortSignal flips to aborted
  2. Scenarios check ctx.signal.aborted in loops, finish current turn, exit cleanly
  3. Runner waits up to 10s for scenarios to unwind
  4. After 10s, force-closes all contexts + browser
  5. Writes final summary.json and prints results
  6. Exit codes: 0 = all rooms completed target games, 1 = any room failed, 2 = interrupted before completion

Double Ctrl-C = immediate force exit.
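The per-room isolation principle and the exit-code mapping above reduce to a few lines. The helper names are illustrative; Promise.allSettled is what guarantees one room's crash never unwinds the others:

```typescript
type RoomOutcome = PromiseSettledResult<void>;

// Run every room body independently; a rejection in one branch leaves the
// others running to completion.
async function runRoomsIsolated(rooms: Array<() => Promise<void>>): Promise<RoomOutcome[]> {
  return Promise.allSettled(rooms.map((room) => room()));
}

// 0 = all rooms completed, 1 = any room failed, 2 = interrupted first.
function exitCode(outcomes: RoomOutcome[], interrupted: boolean): number {
  if (interrupted) return 2;
  return outcomes.some((o) => o.status === "rejected") ? 1 : 0;
}
```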

Periodic health probes

Every 30s during a run:

  • GET /api/health against the target server
  • Count of open browser contexts vs expected
  • Runner memory usage

If /api/health fails 3 consecutive times, declare fatal error, capture artifacts, stop. This prevents staging outages from being misattributed to bot bugs.
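The "3 consecutive failures" rule is just a counter that resets on success — a sketch, with a hypothetical class name:

```typescript
// Tracks consecutive health-probe failures; record() returns true once the
// run should be declared fatally unhealthy. A success resets the streak.
class HealthTracker {
  private consecutiveFailures = 0;

  constructor(private readonly fatalAfter = 3) {}

  record(ok: boolean): boolean {
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.fatalAfter;
  }
}
```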

Retry policy

Retry only at the session level, never at the scenario level.

  • WS drop → reconnect session, rejoin game if possible, 3 attempts max
  • Token rejected → re-login via cached password, 1 attempt
  • Click missed → existing GolfBot retry (already built in)

Never retry: whole games, whole scenarios, fatal errors.
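Both session-level retries can share one bounded helper — 3 attempts for WS reconnect, 1 for re-login. A sketch (the name withRetries is mine):

```typescript
// Run fn up to maxAttempts times, rethrowing the last error if all fail.
// Only ever wraps session-level recovery, never whole games or scenarios.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts: number): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}
```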

Cleanup guarantees

Three cleanup points, all going through the same cleanup() function wrapped in top-level try/finally:

  1. Success — close contexts, close browsers, flush logs, write summary
  2. Exception — capture artifacts first, then close contexts, flush logs, write partial summary
  3. Signal interrupt — graceful shutdown as above, best-effort artifact capture

File layout

tests/soak/
├── package.json              # standalone (separate from tests/e2e/)
├── tsconfig.json
├── README.md                 # quickstart + flag reference + bring-up steps
├── .env.stresstest.example   # template (real file gitignored)
│
├── runner.ts                 # CLI entry — `npm run soak`
├── config.ts                 # CLI parsing + defaults merging
│
├── core/
│   ├── session-pool.ts
│   ├── room-coordinator.ts
│   ├── screencaster.ts       # CDP attach/detach on demand
│   ├── watchdog.ts
│   ├── artifacts.ts
│   ├── logger.ts
│   └── types.ts              # Scenario, Session, ScenarioContext interfaces
│
├── scenarios/
│   ├── populate.ts
│   ├── stress.ts
│   └── index.ts              # name → module registry
│
├── dashboard/
│   ├── server.ts             # http + ws
│   ├── index.html
│   ├── dashboard.css
│   └── dashboard.js
│
├── scripts/
│   ├── seed-accounts.ts      # one-shot seeding
│   ├── reset-accounts.ts     # future: wipe test account stats
│   └── smoke.sh              # bring-up validation
│
└── artifacts/                # gitignored, auto-pruned 7d
    └── <run-id>/...

Dependencies

New tests/soak/package.json:

{
  "name": "golf-soak",
  "private": true,
  "scripts": {
    "soak": "tsx runner.ts",
    "soak:populate": "tsx runner.ts --scenario=populate",
    "soak:stress": "tsx runner.ts --scenario=stress",
    "seed": "tsx scripts/seed-accounts.ts",
    "smoke": "scripts/smoke.sh"
  },
  "dependencies": {
    "playwright-core": "^1.40.0",
    "ws": "^8.16.0"
  },
  "devDependencies": {
    "tsx": "^4.7.0",
    "@types/ws": "^8.5.0",
    "@types/node": "^20.10.0",
    "typescript": "^5.3.0"
  }
}

Three runtime deps: playwright-core (already in tests/e2e/), ws (WebSocket for dashboard), tsx (dev-only, runs TypeScript directly). No HTTP framework, no bundler, no build step.

CLI flags

--scenario=populate|stress    required
--accounts=<n>                total sessions (default: scenario.needs.accounts)
--rooms=<n>                   default from scenario.needs
--cpus-per-room=<n>           default from scenario.needs
--games-per-room=<n>          default from scenario.defaultConfig
--holes=<n>                   default from scenario.defaultConfig
--watch=none|dashboard|tiled  default: dashboard
--dashboard-port=<n>          default: 7777
--target=<url>                default: TEST_URL env or http://localhost:8000
--run-id=<string>             default: ISO timestamp
--list                        print available scenarios and exit
--dry-run                     validate config without running

Derived: accounts-per-room = accounts / rooms. Must divide evenly; runner errors out with a clear message if not.

Config precedence: CLI flags → environment variables → scenario defaultConfig → runner defaults.
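The precedence chain and the divisibility check amount to a spread merge plus one guard. Function names here are illustrative, not from the codebase:

```typescript
type ConfigLayer = Record<string, unknown>;

// Object spread: later spreads override earlier ones, so layers are listed
// lowest-precedence first (runner defaults → scenario defaults → env → CLI).
function resolveConfig(
  cli: ConfigLayer,
  env: ConfigLayer,
  scenarioDefaults: ConfigLayer,
  runnerDefaults: ConfigLayer,
): ConfigLayer {
  return { ...runnerDefaults, ...scenarioDefaults, ...env, ...cli };
}

// Derived accounts-per-room; errors out clearly on uneven splits.
function accountsPerRoom(accounts: number, rooms: number): number {
  if (accounts % rooms !== 0) {
    throw new Error(`--accounts (${accounts}) must divide evenly by --rooms (${rooms})`);
  }
  return accounts / rooms;
}
```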

Meta-testing

Unit tests (Vitest, minimal)

  • room-coordinator.ts — announce/await correctness, timeout behavior
  • watchdog.ts — fires on timeout, resets on heartbeat, cancels cleanly
  • config.ts — CLI precedence, required field validation

Bring-up smoke test (tests/soak/scripts/smoke.sh)

Runs against local dev server with minimum viable config:

TEST_URL=http://localhost:8000 \
  npm run soak -- \
  --scenario=populate \
  --accounts=2 \
  --rooms=1 \
  --cpus-per-room=0 \
  --games-per-room=1 \
  --holes=1 \
  --watch=none

Exit 0 = full harness works end-to-end. ~30 seconds. Run after any change.

Manual validation checklist

Documented in tests/soak/CHECKLIST.md:

  • Seed 16 accounts against staging using the invite code
  • --scenario=populate --rooms=1 --games-per-room=1 completes cleanly
  • --scenario=populate --rooms=4 --games-per-room=1 — 4 rooms in parallel, no cross-contamination
  • --watch=dashboard opens browser, grid renders, progress updates
  • Click a player tile → live video appears, Esc → stops
  • --watch=tiled opens 4 browser windows in 2×2 grid
  • Ctrl-C during a run → graceful shutdown, summary printed, exit 2
  • Kill the target server mid-run → runner detects, captures artifacts, exits 1
  • Stats query ?include_test=false hides soak accounts, ?include_test=true shows them
  • Full stress run (--scenario=stress --games-per-room=10) — no console errors, all rooms complete

Implementation order

Sequenced so each step produces something demonstrable before moving on. The writing-plans skill will break this into concrete tasks.

  1. Server-side changes — schema alters, register flow, stats filter, admin badge. Independent, ships first, unblocks local testing.
  2. Scaffold tests/soak/ — package.json, tsconfig, core/types, logger. No behavior yet.
  3. SessionPool + scripts/seed-accounts.ts — end-to-end auth: seed, cache, load, validate login.
  4. RoomCoordinator + minimal populate scenario body — proves multi-room orchestration.
  5. runner.ts — CLI, config merging, scenario loading, top-level error handling.
  6. --watch=none works — runs against local dev, produces clean logs, exits 0. First end-to-end milestone.
  7. --watch=dashboard status grid — HTML + WS + tile updates (no video yet).
  8. CDP screencast / click-to-watch — the live video feature.
  9. --watch=tiled mode — native windows via page.evaluate(window.moveTo).
  10. stress scenario — chaos injection, rapid games.
  11. Failure handling — watchdog, artifact capture, graceful shutdown.
  12. Smoke test script + CHECKLIST.md — validation.
  13. Run against staging for real — populate scoreboard, hunt bugs, report findings.

If step 6 takes longer than planned, steps 1–5 are still useful standalone.

Out of scope for MVP

  • Mobile viewport scenarios (future mobile-populate)
  • Reconnect-storm scenarios
  • Admin workflow scenarios
  • Concurrent scenario execution
  • Distributed runner
  • Grafana / OTEL / custom metrics push
  • Test account stat reset tooling
  • Auto-promoting stress findings into Playwright regression tests
  • New admin endpoints for account marking
  • Admin UI for flagging invites as test-seed

All of these are cheap to add later because the scenario interface and session pool don't presuppose them.

Open questions (to resolve during implementation)

  1. localStorage auth key — exact keys used by client/app.js to persist the JWT and user blob; verified by reading the file during step 3.
  2. Chaos event set for stress scenario — finalize which chaos events are in scope for MVP vs added incrementally (start with rapid clicks + tab nav + setOffline, add more as the server proves robust).
  3. CDP screencast frame rate tuning — start at everyNthFrame: 2 (~15fps), adjust down if bandwidth/CPU is excessive on long runs.
  4. Screen bounds detection for tiled mode — default to 1920×1080, expose override via --tiled-bounds=WxH; auto-detect later if useful.