adlee-was-taken e8051b256b docs(plan): harden soak-harness schema migration for deploy

Makes the deployment path explicit in Task 1: traces the existing
lifespan → get_user_store → initialize_schema → conn.execute(SCHEMA_SQL)
flow, notes that the DO $$/IF NOT EXISTS pattern is the same one
every post-v1 column migration uses, and explains why rollback is
safe (additive changes only).

Adds two new verification steps to Task 1:
 - Step 7: post-deploy psql checks against staging
 - Step 8: same against production

Adds a "Post-deploy schema verification" block to CHECKLIST.md so
the schema state is verified after every server restart against
each target environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-10 23:40:28 -04:00

Multiplayer Soak & UX Test Harness — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a standalone Playwright-based soak runner in tests/soak/ that drives 16 authenticated browser sessions across 4 concurrent rooms playing many multiplayer games, with pluggable scenarios, a click-to-watch dashboard via CDP screencast, and strict per-room failure isolation.

Architecture: Single-process node runner reusing the existing GolfBot class from tests/e2e/bot/. One shared browser (16 contexts) by default; WATCH=tiled uses a second headed browser for the 4 host contexts. Scenarios are plain TS modules exported from tests/soak/scenarios/. Dashboard is a tiny HTTP+WS server serving one static page that pushes live status and on-demand CDP screencast frames.

Tech Stack: TypeScript + tsx (no build step), Playwright Core, ws (WebSocket server), Vitest for unit tests, FastAPI + asyncpg (existing server), PostgreSQL (existing).

Spec: docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md

Testing Strategy Notes

Server-side Python changes: The existing test suite mocks stores with AsyncMock and has no real-Postgres fixtures. Rather than inventing a new fixture pattern for this plan, server tasks use curl-based verification against a running local dev server as the explicit verification step after each commit. Run python server/main.py in another terminal (requires Postgres + Redis running — see docs/INSTALL.md).
TypeScript harness logic: Unit-tested with Vitest for pure modules (Deferred, RoomCoordinator, Watchdog, Config). Integration-level modules (SessionPool, Dashboard, Screencaster, Scenarios) are verified by running the harness itself via the smoke test.
End-to-end validation: tests/soak/scripts/smoke.sh is the canary — after every non-trivial change, run it against local dev and expect exit 0 within ~30s.

Phase 1 — Server-side changes (independent, ships first)

Task 1: Schema migration for `is_test_account` and `marks_as_test`

Add two columns, one partial index, and rebuild the leaderboard_overall materialized view to include is_test_account (so the filter works through the view fast path).

Deploy path (this is load-bearing — read before editing):

The existing codebase applies schema changes via inline DO $$ BEGIN IF NOT EXISTS (...) THEN ALTER TABLE ... END IF; END $$; blocks inside SCHEMA_SQL in server/stores/user_store.py. That string is executed on every server startup by UserStore.create() → initialize_schema() → conn.execute(SCHEMA_SQL), which is called from the FastAPI lifespan via get_user_store(config.POSTGRES_URL) in server/main.py. The same pattern added every other post-v1 column (is_banned, force_password_reset, last_seen_at, rating, and many others — see the existing DO blocks in SCHEMA_SQL).

What this means for deploy:

No separate migration tool needed. CI/CD rebuilds the image, docker compose up -d restarts the container, lifespan fires, SCHEMA_SQL executes, the new DO $$ blocks see the missing columns and ALTER TABLE ADD COLUMN them in place.
Idempotent by construction. Re-running against an already-migrated DB is a no-op — the IF NOT EXISTS guard in each DO block skips the ALTER.
Fresh installs work. CREATE TABLE IF NOT EXISTS users_v2 uses the current column list; the ADD COLUMN DO blocks are no-ops because the column is already there from the CREATE.
Matview rebuild is atomic. The DO $$ block that DROPs+CREATEs leaderboard_overall runs inside a single transaction. CREATE MATERIALIZED VIEW ... AS SELECT populates immediately (no WITH NO DATA), so concurrent readers never see an empty or missing view — they see either the old version (pre-commit) or the new version (post-commit).
Rollback is safe. All changes are additive. If you have to revert the code, the new columns just sit unused — old code never references them, so nothing breaks.
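
The "idempotent by construction" property above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for Postgres so the example runs without a database server; the helper name add_column_if_missing and the trimmed-down users_v2 table are hypothetical, and the real migration stays in the DO $$ blocks inside SCHEMA_SQL.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` unless it already exists. Returns True if ALTER ran."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False  # already migrated: a no-op, like the IF NOT EXISTS guard
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True


# Hypothetical, trimmed-down table with one pre-existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_v2 (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users_v2 (username) VALUES ('existing_user')")

# Simulate two server startups running the same migration.
first_run = add_column_if_missing(conn, "users_v2", "is_test_account", "BOOLEAN DEFAULT 0")
second_run = add_column_if_missing(conn, "users_v2", "is_test_account", "BOOLEAN DEFAULT 0")

# The pre-existing row reads back the default, so old data stays valid.
flag = conn.execute("SELECT is_test_account FROM users_v2").fetchone()[0]
```

The first call alters the table and the second does nothing, which is the behavior the deploy path relies on when the container restarts against an already-migrated database.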

Files:

Modify: server/stores/user_store.py — append to SCHEMA_SQL (ALTER blocks near L79–L98 and the matview block near L298–L335)

Step 1: Add column migration to SCHEMA_SQL

Open server/stores/user_store.py. Inside the first DO $$ BEGIN ... END $$; block (around lines 80–98, the one that handles admin columns), append the is_test_account column check. Then add a second ALTER for invite_codes.marks_as_test in a new DO $$ block right after it.

Add after the existing last_seen_at check (before END $$; on line ~98):

    IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                   WHERE table_name = 'users_v2' AND column_name = 'is_test_account') THEN
        ALTER TABLE users_v2 ADD COLUMN is_test_account BOOLEAN DEFAULT FALSE;
    END IF;

Then, immediately after the END $$; that closes the users_v2 admin block, add a new block for invite_codes:

-- Add marks_as_test to invite_codes if not exists
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                   WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test') THEN
        ALTER TABLE invite_codes ADD COLUMN marks_as_test BOOLEAN DEFAULT FALSE;
    END IF;
END $$;

Step 2: Add partial index on is_test_account

Find the indexes block near line 338. After the existing idx_users_banned index (line ~344), add:

CREATE INDEX IF NOT EXISTS idx_users_v2_is_test_account ON users_v2(is_test_account)
    WHERE is_test_account = TRUE;

Step 3: Rebuild leaderboard_overall materialized view to include is_test_account

Find the existing matview block at line ~298. Modify the version-check DO block so the view is dropped and recreated if it lacks the is_test_account column. Replace the existing block with:

-- Leaderboard materialized view (refreshed periodically)
-- Drop and recreate if missing is_test_account column (soak harness migration)
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
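        -- Check if is_test_account column exists in the view.
        -- information_schema.columns does not list materialized views,
        -- so look the column up in pg_attribute instead.
        IF NOT EXISTS (
            SELECT 1 FROM pg_attribute
            WHERE attrelid = 'leaderboard_overall'::regclass
              AND attname = 'is_test_account'
              AND NOT attisdropped
        ) THEN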
            DROP MATERIALIZED VIEW leaderboard_overall;
        END IF;
    END IF;

    IF NOT EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
        EXECUTE '
            CREATE MATERIALIZED VIEW leaderboard_overall AS
            SELECT
                u.id as user_id,
                u.username,
                COALESCE(u.is_test_account, FALSE) as is_test_account,
                s.games_played,
                s.games_won,
                ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
                s.rounds_won,
                ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
                s.best_score as best_round_score,
                s.knockouts,
                s.best_win_streak,
                COALESCE(s.rating, 1500) as rating,
                s.last_game_at
            FROM player_stats s
            JOIN users_v2 u ON s.user_id = u.id
            WHERE s.games_played >= 5
            AND u.deleted_at IS NULL
            AND (u.is_banned = false OR u.is_banned IS NULL)
        ';
    END IF;
END $$;

Note: the only differences from the existing block are the changed comment, the changed column-existence check (is_test_account instead of rating), and the new COALESCE(u.is_test_account, FALSE) as is_test_account column in the SELECT. Everything else stays identical.

Step 4: Start the server to run migrations

Run (in another terminal, with Postgres + Redis up):

cd /home/alee/Sources/golfgame
python server/main.py

Expected: server starts cleanly, no errors about is_test_account or marks_as_test or leaderboard_overall.

Step 5: Verify schema via psql

Connect to the dev database and confirm:

psql -d golfgame -c "\d users_v2" | grep is_test_account
psql -d golfgame -c "\d invite_codes" | grep marks_as_test
psql -d golfgame -c "\d leaderboard_overall" | grep is_test_account
psql -d golfgame -c "\di idx_users_v2_is_test_account"

Expected: all four commands return matching rows.

Step 6: Commit

git add server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): add is_test_account + marks_as_test schema

New columns support separating soak-harness test traffic from real
user traffic in stats queries. Rebuilds leaderboard_overall matview
to include is_test_account so the fast path stays filterable.

Migration is idempotent via DO $$ / IF NOT EXISTS blocks inside
SCHEMA_SQL, which runs on every server startup — same mechanism
every existing post-v1 column migration uses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Step 7: Post-deploy verification (staging)

After this commit ships to staging via CI/CD (or docker compose up -d on the staging host), verify the migration actually applied:

ssh root@129.212.150.189 << 'REMOTE'
  cd /opt/golfgame
  # Find the postgres container name (it may vary across compose files)
  PG_CONTAINER=$(docker compose -f docker-compose.staging.yml ps -q postgres)
  docker exec -i $PG_CONTAINER psql -U postgres -d golfgame << 'SQL'
    -- Confirm columns exist
    \d users_v2
    \d invite_codes
    \d leaderboard_overall

    -- Targeted checks
    SELECT column_name, data_type, column_default
    FROM information_schema.columns
    WHERE table_name = 'users_v2' AND column_name = 'is_test_account';

    SELECT column_name, data_type, column_default
    FROM information_schema.columns
    WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test';

    -- information_schema.columns does not list materialized views; use pg_attribute
    SELECT attname FROM pg_attribute
    WHERE attrelid = 'leaderboard_overall'::regclass
      AND attname = 'is_test_account' AND NOT attisdropped;

    -- Partial index
    SELECT indexname, indexdef FROM pg_indexes
    WHERE indexname = 'idx_users_v2_is_test_account';
SQL
REMOTE

Expected (all four present):

users_v2.is_test_account with default false
invite_codes.marks_as_test with default false
leaderboard_overall has an is_test_account column
idx_users_v2_is_test_account exists

If any of these are missing, the server didn't actually restart (or restarted but the container has a stale image). Check docker compose logs golfgame for the line User store schema initialized — if it's not there, the migration never ran.

Step 8: Post-deploy verification (production)

Same check, against prod, after the prod deploy:

ssh root@165.245.152.51 << 'REMOTE'
  cd /opt/golfgame
  PG_CONTAINER=$(docker compose -f docker-compose.prod.yml ps -q postgres)
  # No -i here: these commands read nothing from stdin, and docker exec -i
  # could consume the remaining lines of this heredoc before the shell runs them.
  docker exec "$PG_CONTAINER" psql -U postgres -d golfgame -c "\d users_v2" | grep is_test_account
  docker exec "$PG_CONTAINER" psql -U postgres -d golfgame -c "\d invite_codes" | grep marks_as_test
  docker exec "$PG_CONTAINER" psql -U postgres -d golfgame -c "\d leaderboard_overall" | grep is_test_account
REMOTE

Expected: three matching rows. If prod migration fails, the rollback story is clean — revert the commit, redeploy, old code keeps working because it never referenced the new columns.

Task 2: Propagate `is_test_account` through `User` model and `user_store`

Wire the new column into the User dataclass, create_user signature, _row_to_user mapping, and every SELECT list that already pulls user columns.

Files:

Modify: server/models/user.py — User dataclass (L22–L68) + to_dict (L82–L116) + from_dict (L118+)
Modify: server/stores/user_store.py — create_user (L454–L501), _row_to_user (L997–L1020), get_user_by_id/get_user_by_username/get_user_by_email SELECT lists (L503–L570)

Step 1: Add is_test_account to the User dataclass

In server/models/user.py, add a new field to the User dataclass (after force_password_reset on L68):

    is_test_account: bool = False

Update the docstring Attributes: block around L45 to include:

        is_test_account: True for accounts created by the soak test harness.

Step 2: Include is_test_account in to_dict and from_dict

In User.to_dict at L82, add to the d dict (after force_password_reset):

            "is_test_account": self.is_test_account,

In User.from_dict, add the corresponding parse — find where force_password_reset is parsed and add the same pattern:

            is_test_account=d.get("is_test_account", False),
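
The reason from_dict uses d.get with a False default can be shown with a minimal sketch. UserSketch is a hypothetical, trimmed-down stand-in for the real User dataclass (which has many more fields); the assumption illustrated is that payloads serialized before this change carry no is_test_account key and must still parse.

```python
from dataclasses import dataclass


@dataclass
class UserSketch:
    """Hypothetical stand-in for the real User dataclass."""
    username: str
    force_password_reset: bool = False
    is_test_account: bool = False

    def to_dict(self) -> dict:
        return {
            "username": self.username,
            "force_password_reset": self.force_password_reset,
            "is_test_account": self.is_test_account,
        }

    @classmethod
    def from_dict(cls, d: dict) -> "UserSketch":
        return cls(
            username=d["username"],
            force_password_reset=d.get("force_password_reset", False),
            # Dicts written before this change have no such key: default to False.
            is_test_account=d.get("is_test_account", False),
        )


# A legacy payload without the new key still parses.
legacy = UserSketch.from_dict({"username": "alice"})
# A flagged user survives a to_dict / from_dict round trip.
flagged = UserSketch.from_dict(UserSketch("soaktest_1", is_test_account=True).to_dict())
```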

Step 3: Add is_test_account parameter to create_user

In server/stores/user_store.py at L454, add a new parameter:

    async def create_user(
        self,
        username: str,
        password_hash: str,
        email: Optional[str] = None,
        role: UserRole = UserRole.USER,
        guest_id: Optional[str] = None,
        verification_token: Optional[str] = None,
        verification_expires: Optional[datetime] = None,
        is_test_account: bool = False,
    ) -> Optional[User]:

Update the docstring to add a line in Args: describing is_test_account.

Change the INSERT SQL block to include the new column:

                row = await conn.fetchrow(
                    """
                    INSERT INTO users_v2 (username, password_hash, email, role, guest_id,
                                          verification_token, verification_expires,
                                          is_test_account)
                    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
                    RETURNING id, username, email, password_hash, role, email_verified,
                              verification_token, verification_expires, reset_token, reset_expires,
                              guest_id, deleted_at, preferences, created_at, last_login, last_seen_at,
                              is_active, is_banned, ban_reason, force_password_reset, is_test_account
                    """,
                    username,
                    password_hash,
                    email,
                    role.value,
                    guest_id,
                    verification_token,
                    verification_expires,
                    is_test_account,
                )

Step 4: Update _row_to_user mapping

In server/stores/user_store.py at L997, add to the User(...) call (after force_password_reset):

            is_test_account=row.get("is_test_account", False) or False,
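
The double fallback in that expression covers two different cases, sketched below. A plain dict stands in for the database row here (an assumption for the sake of a runnable example), and read_is_test_account is a hypothetical helper name, not part of the codebase.

```python
from typing import Any, Mapping


def read_is_test_account(row: Mapping[str, Any]) -> bool:
    # .get() handles SELECT lists that do not include the column yet;
    # "or False" handles a NULL stored in the column.
    return row.get("is_test_account", False) or False


missing = read_is_test_account({"username": "alice"})
null_value = read_is_test_account({"username": "bob", "is_test_account": None})
flagged_row = read_is_test_account({"username": "soaktest_1", "is_test_account": True})
```

This is why Step 5 can be rolled out query by query: a SELECT list that has not been updated yet still maps to a valid User with the flag set to False.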

Step 5: Update all other SELECT lists in user_store

Find every query in server/stores/user_store.py that returns a full user row and passes it to _row_to_user. Add is_test_account to the SELECT column list for each. Grep to find them:

grep -n "is_active, is_banned, ban_reason, force_password_reset" server/stores/user_store.py

For each match, append , is_test_account to the SELECT list. Expected locations:

create_user INSERT ... RETURNING (already updated in Step 3)
get_user_by_id at L503
get_user_by_username at L519
get_user_by_email (find it)
Any other SELECT ... FROM users_v2 that calls _row_to_user

Step 6: Restart server, verify no errors

# Kill and restart the dev server
python server/main.py

Expected: server starts cleanly. Any query that touches users now returns is_test_account correctly.

Step 7: Smoke test via curl

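# Register a throwaway test user. No invite code is needed if DAILY_OPEN_SIGNUPS > 0 locally;
# if INVITE_ONLY=true, an invite code is required.
# 5VC2MCCN is staging's code and probably does not exist locally (see Task 4 Step 3).
# If registration returns 400 "Invalid or expired invite code", drop the invite_code
# field or substitute a code that exists in your local DB.
# Set PW to any password of your choice (>= 8 chars).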
PW='SomeTestPw_1!'
curl -sX POST http://localhost:8000/api/auth/register \
  -H 'Content-Type: application/json' \
  -d "{\"username\":\"soaktest_smoke1\",\"password\":\"$PW\",\"email\":\"soaktest_smoke1@example.com\",\"invite_code\":\"5VC2MCCN\"}"

Expected: a JSON body of the form {"user":{...},"token":"..."} with HTTP 200 (add -i to the curl command to see the status line). The registration path now runs through the new column without errors even though the value is still always FALSE at this stage.

Step 8: Commit

git add server/models/user.py server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): propagate is_test_account through User model & store

User dataclass, create_user, and all SELECT lists now round-trip the
new column. Value is always FALSE until Task 4 wires the register
flow to the invite code's marks_as_test flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 3: Expose `marks_as_test` on `InviteCode` and add lookup helper

validate_invite_code currently returns a bare bool, so the register flow has no way to read marks_as_test from it. Add a helper that returns the invite row; the register handler calls it once validation has succeeded.

Files:

Modify: server/services/admin_service.py — InviteCode dataclass (L115–L138), get_invite_codes SELECT (L1106–L1141), add new get_invite_code_details method

Step 1: Add marks_as_test field to InviteCode dataclass

In server/services/admin_service.py at L115:

@dataclass
class InviteCode:
    """Invite code details."""
    code: str
    created_by: str
    created_by_username: str
    created_at: datetime
    expires_at: datetime
    max_uses: int
    use_count: int
    is_active: bool
    marks_as_test: bool = False

Update to_dict at L127 to include the field:

    def to_dict(self) -> dict:
        return {
            "code": self.code,
            "created_by": self.created_by,
            "created_by_username": self.created_by_username,
            "created_at": self.created_at.isoformat() if self.created_at else None,
            "expires_at": self.expires_at.isoformat() if self.expires_at else None,
            "max_uses": self.max_uses,
            "use_count": self.use_count,
            "is_active": self.is_active,
            "remaining_uses": max(0, self.max_uses - self.use_count),
            "marks_as_test": self.marks_as_test,
        }

Step 2: Update get_invite_codes SELECT to include marks_as_test

Find get_invite_codes at L1106. Modify the SQL to pull the column and pass it through:

    async def get_invite_codes(self, include_expired: bool = False) -> List[InviteCode]:
        """List all invite codes."""
        async with self.pool.acquire() as conn:
            sql = """
                SELECT c.code, c.created_by, u.username as created_by_username,
                       c.created_at, c.expires_at,
                       c.max_uses, c.use_count, c.is_active,
                       COALESCE(c.marks_as_test, FALSE) as marks_as_test
                FROM invite_codes c
                LEFT JOIN users_v2 u ON c.created_by = u.id
            """

Find the list comprehension that constructs InviteCode(...) objects and add the new kwarg:

                InviteCode(
                    code=row["code"],
                    created_by=str(row["created_by"]),
                    created_by_username=row["created_by_username"] or "unknown",
                    created_at=row["created_at"].replace(tzinfo=timezone.utc) if row["created_at"] else None,
                    expires_at=row["expires_at"].replace(tzinfo=timezone.utc) if row["expires_at"] else None,
                    max_uses=row["max_uses"],
                    use_count=row["use_count"],
                    is_active=row["is_active"],
                    marks_as_test=row["marks_as_test"],
                )

Step 3: Add new get_invite_code_details method

Add a new method between validate_invite_code and use_invite_code (around L1214) that returns the invite row, including marks_as_test. The register flow will call it to resolve the flag:

    async def get_invite_code_details(self, code: str) -> Optional[dict]:
        """
        Look up an invite code's row including marks_as_test.

        Returns None if the code does not exist. Does NOT validate expiry
        or usage — use validate_invite_code for that. This is purely a
        helper for the register flow to discover the test-seed flag.
        """
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow(
                """
                SELECT code, max_uses, use_count, is_active,
                       COALESCE(marks_as_test, FALSE) as marks_as_test
                FROM invite_codes
                WHERE code = $1
                """,
                code,
            )
            if not row:
                return None
            return {
                "code": row["code"],
                "max_uses": row["max_uses"],
                "use_count": row["use_count"],
                "is_active": row["is_active"],
                "marks_as_test": row["marks_as_test"],
            }

Step 4: Verify with curl via admin panel endpoint

Assuming you have an admin token from a local dev user, hit the existing admin invites listing:

# Replace TOKEN with a valid admin JWT
curl -s http://localhost:8000/api/admin/invites \
  -H "Authorization: Bearer $TOKEN" | jq '.codes[0]'

Expected: the returned object includes a "marks_as_test" key (false unless that code has already been flagged).

Step 5: Commit

git add server/services/admin_service.py
git commit -m "$(cat <<'EOF'
feat(server): expose marks_as_test on InviteCode

Adds the field to the dataclass, SELECT list in get_invite_codes,
and a new get_invite_code_details helper that the register flow
will use to discover whether an invite should flag new accounts
as test accounts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 4: Wire register flow to set `is_test_account` from invite

When a user registers with an invite whose marks_as_test=TRUE, the new account is flagged. The plumbing lives in two places: the router reads the flag and passes it to the service; the service passes it to the store.

Files:

Modify: server/routers/auth.py — register handler (L224–L320)
Modify: server/services/auth_service.py — register method (L98–L178)

Step 1: Add is_test_account parameter to auth_service.register

In server/services/auth_service.py at L98, add the new parameter:

    async def register(
        self,
        username: str,
        password: str,
        email: Optional[str] = None,
        guest_id: Optional[str] = None,
        is_test_account: bool = False,
    ) -> RegistrationResult:

Update the docstring Args: block:

            is_test_account: Mark this user as a soak-harness test account.

Pass the value through to create_user at L146:

        user = await self.user_store.create_user(
            username=username,
            password_hash=password_hash,
            email=email,
            role=UserRole.USER,
            guest_id=guest_id,
            verification_token=verification_token,
            verification_expires=verification_expires,
            is_test_account=is_test_account,
        )

Step 2: Update the router to resolve marks_as_test and pass it through

In server/routers/auth.py, find the register handler at L224. Extend the existing invite-code validation block (around L248–L252) so that it fetches the invite details and computes is_test_account:

    # --- Invite code validation ---
    is_test_account = False
    if has_invite:
        if not _admin_service:
            raise HTTPException(status_code=503, detail="Admin service not initialized")
        if not await _admin_service.validate_invite_code(request_body.invite_code):
            raise HTTPException(status_code=400, detail="Invalid or expired invite code")
        # Check if this invite flags new accounts as test accounts
        invite_details = await _admin_service.get_invite_code_details(request_body.invite_code)
        if invite_details and invite_details.get("marks_as_test"):
            is_test_account = True

Then pass it to auth_service.register at L276:

    # --- Create the account ---
    result = await auth_service.register(
        username=request_body.username,
        password=request_body.password,
        email=request_body.email,
        is_test_account=is_test_account,
    )
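
The flag resolution in Step 2 can be exercised without a database by mocking the admin service with AsyncMock, the same mocking style the existing test suite uses for stores. This is a sketch under assumptions: resolve_is_test_account and make_admin_service are hypothetical names, and ValueError stands in for the HTTPException(400) the real handler raises.

```python
import asyncio
from unittest.mock import AsyncMock


async def resolve_is_test_account(admin_service, invite_code: str) -> bool:
    """Hypothetical helper mirroring the handler's invite logic."""
    if not await admin_service.validate_invite_code(invite_code):
        # The real handler raises HTTPException(status_code=400) here.
        raise ValueError("Invalid or expired invite code")
    details = await admin_service.get_invite_code_details(invite_code)
    # A missing row or a falsy flag both resolve to False.
    return bool(details and details.get("marks_as_test"))


def make_admin_service(valid: bool, details):
    """Build a mocked admin service with canned return values."""
    service = AsyncMock()
    service.validate_invite_code.return_value = valid
    service.get_invite_code_details.return_value = details
    return service


test_invite = asyncio.run(
    resolve_is_test_account(make_admin_service(True, {"marks_as_test": True}), "SOAKTEST")
)
normal_invite = asyncio.run(
    resolve_is_test_account(make_admin_service(True, {"marks_as_test": False}), "NORMAL01")
)
# Validation passed but the details lookup found no row: stay unflagged.
vanished_invite = asyncio.run(
    resolve_is_test_account(make_admin_service(True, None), "GONE0001")
)
```

Only an invite whose row carries marks_as_test=TRUE produces a flagged account; every other path leaves is_test_account at its False default.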

Step 3: Flag the dev invite code for testing

Before we can test end-to-end locally, we need an invite code with marks_as_test=TRUE in the local dev DB. Run (once, manually):

# First, check if 5VC2MCCN exists locally (it probably doesn't — that's staging's code).
# Create a local test invite code and flag it:
psql -d golfgame <<'EOF'
-- Create a local dev test-seed invite if not exists
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;

-- Verify
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = 'SOAKTEST';
EOF

Expected: marks_as_test | t in the last row.

Step 4: Verify register flow sets is_test_account

Restart the dev server, then (with PW still set in your shell as in Task 2 Step 7):

curl -sX POST http://localhost:8000/api/auth/register \
  -H 'Content-Type: application/json' \
  -d "{\"username\":\"soaktest_register1\",\"password\":\"$PW\",\"email\":\"soaktest_register1@example.com\",\"invite_code\":\"SOAKTEST\"}"

# Verify in DB
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'soaktest_register1';"

Expected: is_test_account | t.

Step 5: Verify non-test invite does NOT flag new accounts

# Create a non-test invite
psql -d golfgame <<'EOF'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'NORMAL01', id, NOW() + INTERVAL '10 years', 10, TRUE, FALSE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = FALSE;
EOF

curl -sX POST http://localhost:8000/api/auth/register \
  -H 'Content-Type: application/json' \
  -d "{\"username\":\"realuser_smoke1\",\"password\":\"$PW\",\"email\":\"realuser_smoke1@example.com\",\"invite_code\":\"NORMAL01\"}"

psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'realuser_smoke1';"

Expected: is_test_account | f.

Step 6: Commit

git add server/routers/auth.py server/services/auth_service.py
git commit -m "$(cat <<'EOF'
feat(server): register flow flags accounts from test-seed invites

When a user registers with an invite_code whose marks_as_test=TRUE,
their users_v2.is_test_account is set to TRUE. Normal invite codes
and invite-less signups are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 5: Stats filtering (`include_test` parameter)

Thread an include_test: bool = False parameter through get_leaderboard, get_player_rank, and the corresponding router handlers. Default is False — real users never see soak traffic.

Files:

Modify: server/services/stats_service.py — get_leaderboard (L169), get_player_rank (L249)
Modify: server/routers/stats.py — get_leaderboard route (L157), get_player_rank route (L227), get_my_rank route (L348)

Step 1: Add include_test to get_leaderboard service method

In server/services/stats_service.py at L169:

    async def get_leaderboard(
        self,
        metric: str = "wins",
        limit: int = 50,
        offset: int = 0,
        include_test: bool = False,
    ) -> List[LeaderboardEntry]:

Inside the method, find both SQL paths (materialized view and fallback). In the view path at L208, change the WHERE clause:

            if view_exists:
                # Use materialized view for performance
                rows = await conn.fetch(f"""
                    SELECT
                        user_id, username, games_played, games_won,
                        win_rate, avg_score, knockouts, best_win_streak,
                        COALESCE(rating, 1500) as rating,
                        ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                    FROM leaderboard_overall
                    WHERE ($3 OR NOT is_test_account)
                    ORDER BY {column} {direction}
                    LIMIT $1 OFFSET $2
                """, limit, offset, include_test)

In the fallback path at L220, add the WHERE clause and parameter:

            else:
                # Fall back to direct query
                rows = await conn.fetch(f"""
                    SELECT
                        s.user_id, u.username, s.games_played, s.games_won,
                        ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
                        ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
                        s.knockouts, s.best_win_streak,
                        COALESCE(s.rating, 1500) as rating,
                        ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                    FROM player_stats s
                    JOIN users_v2 u ON s.user_id = u.id
                    WHERE s.games_played >= 5
                    AND u.deleted_at IS NULL
                    AND (u.is_banned = false OR u.is_banned IS NULL)
                    AND ($3 OR NOT COALESCE(u.is_test_account, FALSE))
                    ORDER BY {column} {direction}
                    LIMIT $1 OFFSET $2
                """, limit, offset, include_test)

Step 2: Apply the same pattern to get_player_rank

In server/services/stats_service.py at L249:

    async def get_player_rank(
        self,
        user_id: str,
        metric: str = "wins",
        include_test: bool = False,
    ) -> Optional[int]:

Update both SQL paths to include the include_test filter. View path at L287:

            if view_exists:
                row = await conn.fetchrow(f"""
                    SELECT rank FROM (
                        SELECT user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                        FROM leaderboard_overall
                        WHERE ($2 OR NOT is_test_account)
                    ) ranked
                    WHERE user_id = $1
                """, user_id, include_test)

Fallback path at L294:

            else:
                row = await conn.fetchrow(f"""
                    SELECT rank FROM (
                        SELECT s.user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                        FROM player_stats s
                        JOIN users_v2 u ON s.user_id = u.id
                        WHERE s.games_played >= 5
                        AND u.deleted_at IS NULL
                        AND (u.is_banned = false OR u.is_banned IS NULL)
                        AND ($2 OR NOT COALESCE(u.is_test_account, FALSE))
                    ) ranked
                    WHERE user_id = $1
                """, user_id, include_test)

Step 3: Expose include_test as a query parameter on the leaderboard route

In server/routers/stats.py at L157:

@router.get("/leaderboard", response_model=LeaderboardResponse)
async def get_leaderboard(
    metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
    limit: int = Query(50, ge=1, le=100),
    offset: int = Query(0, ge=0),
    include_test: bool = Query(False, description="Include soak-harness test accounts"),
    service: StatsService = Depends(get_stats_service_dep),
):
    """
    Get leaderboard by metric.

    Metrics:
    - wins: Total games won
    - win_rate: Win percentage (requires 5+ games)
    - avg_score: Average points per round (lower is better)
    - knockouts: Times going out first
    - streak: Best win streak

    Players must have 5+ games to appear on leaderboards.
    By default, soak-harness test accounts are hidden.
    """
    entries = await service.get_leaderboard(metric, limit, offset, include_test)

Step 4: Same for get_player_rank and get_my_rank routes

At L227:

@router.get("/players/{user_id}/rank", response_model=PlayerRankResponse)
async def get_player_rank(
    user_id: str,
    metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
    include_test: bool = Query(False),
    service: StatsService = Depends(get_stats_service_dep),
):
    """Get player's rank on a leaderboard."""
    rank = await service.get_player_rank(user_id, metric, include_test)

At L348:

@router.get("/me/rank", response_model=PlayerRankResponse)
async def get_my_rank(
    metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
    include_test: bool = Query(False),
    user: User = Depends(require_user),
    service: StatsService = Depends(get_stats_service_dep),
):
    """Get current user's rank on a leaderboard."""
    rank = await service.get_player_rank(user.id, metric, include_test)

Step 5: Verify filtering works via curl

# Mark a test user we registered earlier as having games played (synthetic)
psql -d golfgame <<'EOF'
INSERT INTO player_stats (user_id, games_played, games_won, total_points, total_rounds, rounds_won)
SELECT id, 10, 8, 50, 30, 20 FROM users_v2 WHERE username = 'soaktest_register1'
ON CONFLICT (user_id) DO UPDATE SET games_played = 10, games_won = 8;

-- Refresh the matview so the test account shows up
REFRESH MATERIALIZED VIEW leaderboard_overall;
EOF

# Default (include_test=false) should NOT include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soaktest_"))'

# include_test=true should include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soaktest_"))'

Expected: first command returns nothing, second returns a JSON object for soaktest_register1.
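The same check can be written as a small helper for later harness assertions. This is a sketch only: the payload shape is inferred from the jq filter above (`.entries[].username`), and `soakEntries` is a hypothetical name.

```typescript
// Shape inferred from the jq filter above; the real response model may carry more fields.
interface LeaderboardEntry {
  username: string;
}
interface LeaderboardPayload {
  entries: LeaderboardEntry[];
}

// Equivalent of: jq '.entries[] | select(.username | startswith("soaktest_"))'
function soakEntries(payload: LeaderboardPayload, prefix = 'soaktest_'): LeaderboardEntry[] {
  return payload.entries.filter((e) => e.username.startsWith(prefix));
}

// Sample payloads standing in for the two curl responses.
const defaultPayload: LeaderboardPayload = { entries: [{ username: 'alice' }] };
const optInPayload: LeaderboardPayload = {
  entries: [{ username: 'alice' }, { username: 'soaktest_register1' }],
};

console.log(soakEntries(defaultPayload).length); // 0
console.log(soakEntries(optInPayload).map((e) => e.username)); // [ 'soaktest_register1' ]
```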

Step 6: Commit

git add server/services/stats_service.py server/routers/stats.py
git commit -m "$(cat <<'EOF'
feat(server): stats queries support include_test filter

Leaderboard and rank queries take an optional include_test param
(default false). Real users never see soak-harness traffic unless
they explicitly opt in via ?include_test=true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 6: Admin service + route surfaces `is_test_account`

UserDetails exposes the flag, search_users selects it, and the list_users admin route accepts an include_test query parameter.

Files:

Modify: server/services/admin_service.py — UserDetails (L24–L58), search_users (L312–L382), get_user (L384–L428)
Modify: server/routers/admin.py — list_users route (L80–L107)
Step 1: Add field to UserDetails dataclass

In server/services/admin_service.py at L24, add to the dataclass:

@dataclass
class UserDetails:
    """Extended user info for admin view."""
    id: str
    username: str
    email: Optional[str]
    role: str
    email_verified: bool
    is_banned: bool
    ban_reason: Optional[str]
    force_password_reset: bool
    created_at: datetime
    last_login: Optional[datetime]
    last_seen_at: Optional[datetime]
    is_active: bool
    games_played: int
    games_won: int
    is_test_account: bool = False

Update to_dict to include it:

    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "username": self.username,
            "email": self.email,
            "role": self.role,
            "email_verified": self.email_verified,
            "is_banned": self.is_banned,
            "ban_reason": self.ban_reason,
            "force_password_reset": self.force_password_reset,
            "created_at": self.created_at.isoformat() if self.created_at else None,
            "last_login": self.last_login.isoformat() if self.last_login else None,
            "last_seen_at": self.last_seen_at.isoformat() if self.last_seen_at else None,
            "is_active": self.is_active,
            "games_played": self.games_played,
            "games_won": self.games_won,
            "is_test_account": self.is_test_account,
        }

Step 2: Update search_users to SELECT and filter on is_test_account

In server/services/admin_service.py at L312, add an include_test parameter to the signature:

    async def search_users(
        self,
        query: str = "",
        limit: int = 50,
        offset: int = 0,
        include_banned: bool = True,
        include_deleted: bool = False,
        include_test: bool = True,
    ) -> List[UserDetails]:

Modify the SQL to pull is_test_account:

            sql = """
                SELECT u.id, u.username, u.email, u.role,
                       u.email_verified, u.is_banned, u.ban_reason,
                       u.force_password_reset, u.created_at, u.last_login,
                       u.last_seen_at, u.is_active,
                       COALESCE(u.is_test_account, FALSE) as is_test_account,
                       COALESCE(s.games_played, 0) as games_played,
                       COALESCE(s.games_won, 0) as games_won
                FROM users_v2 u
                LEFT JOIN player_stats s ON u.id = s.user_id
                WHERE 1=1
            """

After the existing include_deleted check, add:

            if not include_test:
                sql += " AND (u.is_test_account = false OR u.is_test_account IS NULL)"

Update the UserDetails(...) construction in the list comprehension to include is_test_account=row["is_test_account"].

Step 3: Update get_user (single-user lookup) similarly

In server/services/admin_service.py at L384, add COALESCE(u.is_test_account, FALSE) as is_test_account to the SELECT and is_test_account=row["is_test_account"] to the UserDetails(...) construction. The get_user method does NOT need the filter parameter — admins looking up individual users should always see them.

Step 4: Add include_test to the admin list_users route

In server/routers/admin.py at L80:

@router.get("/users")
async def list_users(
    query: str = "",
    limit: int = 50,
    offset: int = 0,
    include_banned: bool = True,
    include_deleted: bool = False,
    include_test: bool = True,
    admin: User = Depends(require_admin_v2),
    service: AdminService = Depends(get_admin_service_dep),
):
    """
    Search and list users.

    Args:
        query: Search by username or email.
        limit: Maximum results to return.
        offset: Results to skip.
        include_banned: Include banned users.
        include_deleted: Include soft-deleted users.
        include_test: Include soak-harness test accounts (default true for admins).
    """
    users = await service.search_users(
        query=query,
        limit=limit,
        offset=offset,
        include_banned=include_banned,
        include_deleted=include_deleted,
        include_test=include_test,
    )
    return {"users": [u.to_dict() for u in users]}

Note: default is True for the admin path — admins should see everything by default. The client-side toggle will explicitly pass false when the admin wants to hide test accounts.

Step 5: Verify via curl

# Assuming admin token in $TOKEN env var
curl -s "http://localhost:8000/api/admin/users?query=soaktest" \
  -H "Authorization: Bearer $TOKEN" | jq '.users[] | {username, is_test_account}'

curl -s "http://localhost:8000/api/admin/users?query=soaktest&include_test=false" \
  -H "Authorization: Bearer $TOKEN" | jq '.users[]'

Expected: first returns users with is_test_account: true; second returns empty (test accounts filtered out).

Step 6: Commit

git add server/services/admin_service.py server/routers/admin.py
git commit -m "$(cat <<'EOF'
feat(server): admin users list surfaces is_test_account

UserDetails carries the new column, search_users selects and
optionally filters on it, and the /api/admin/users route accepts
?include_test=false to hide soak-harness accounts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 7: Admin panel UI — Test badge and filter toggle

Add a visible [Test] badge on test accounts in the admin user list, a [Test-seed] indicator on invite codes that mark new accounts as test, and an "Include test accounts" checkbox next to the existing "Include banned" toggle.

Files:

Modify: client/admin.html — add the new toggle near the existing #include-banned checkbox
Modify: client/admin.js — loadUsers (L305), getStatusBadge (L246), the invite codes renderer (L443)
Step 1: Add the "Include test accounts" checkbox to admin.html

In client/admin.html, find the existing #include-banned checkbox (it's in the users tab filter bar — grep for it). Add a sibling checkbox right after:

grep -n "include-banned" client/admin.html

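Add next to that line. The box starts checked so the UI matches the server-side default of include_test=true and the "Uncheck" step in the manual verification below works as written:

<label>
  <input type="checkbox" id="include-test" checked />
  Include test accounts
</label>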

Step 2: Read the new checkbox in loadUsers and pass to getUsers

In client/admin.js at L305:

async function loadUsers() {
    try {
        const query = document.getElementById('user-search').value;
        const includeBanned = document.getElementById('include-banned').checked;
        const includeTest = document.getElementById('include-test').checked;
        const data = await getUsers(query, usersPage * PAGE_SIZE, includeBanned, includeTest);

Find getUsers at L70 and add the new parameter:

async function getUsers(query = '', offset = 0, includeBanned = true, includeTest = true) {
    const params = new URLSearchParams({
        query,
        limit: PAGE_SIZE,
        offset,
        include_banned: includeBanned,
        include_test: includeTest,
    });
    return apiRequest(`/api/admin/users?${params}`);
}

Note: the existing signature builds a URLSearchParams — check the actual code at L70 and match its style; the key change is adding include_test: includeTest to the params.
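One detail worth knowing when matching the existing style: URLSearchParams stores strings, so booleans and numbers end up as `true`/`false` and digits in the query string. A typed sketch of the same call follows; the `PAGE_SIZE` value of 50 and the name `buildUserQuery` are assumptions for illustration, not taken from admin.js.

```typescript
const PAGE_SIZE = 50; // assumed value; use whatever admin.js already defines

function buildUserQuery(
  query = '',
  offset = 0,
  includeBanned = true,
  includeTest = true,
): string {
  // URLSearchParams only holds strings, so convert explicitly to satisfy the type checker.
  const params = new URLSearchParams({
    query,
    limit: String(PAGE_SIZE),
    offset: String(offset),
    include_banned: String(includeBanned),
    include_test: String(includeTest),
  });
  return `/api/admin/users?${params.toString()}`;
}

console.log(buildUserQuery('soaktest', 0, true, false));
// /api/admin/users?query=soaktest&limit=50&offset=0&include_banned=true&include_test=false
```

FastAPI converts the lowercase strings `true` and `false` to bool for parameters declared as `bool`, so no extra conversion is needed on the server side.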

Step 3: Add a "Test" badge to the user table row

In client/admin.js at L314, modify the table row template to render a Test badge inline with the status badge:

        data.users.forEach(user => {
            const testBadge = user.is_test_account
                ? '<span class="badge badge-info" title="Soak harness test account">Test</span>'
                : '';
            tbody.innerHTML += `
                <tr>
                    <td>${escapeHtml(user.username)} ${testBadge}</td>
                    <td>${escapeHtml(user.email || '-')}</td>
                    <td><span class="badge badge-${user.role === 'admin' ? 'info' : 'muted'}">${user.role}</span></td>
                    <td>${getStatusBadge(user)}</td>
                    <td>${user.games_played} (${user.games_won} wins)</td>
                    <td>${formatDateShort(user.created_at)}</td>
                    <td>
                        <button class="btn btn-small" data-action="view-user" data-id="${user.id}">View</button>
                    </td>
                </tr>
            `;
        });

Step 4: Add Test-seed indicator to invite codes list

In client/admin.js around L443 (invite codes list renderer), add a [Test-seed] badge to the row template when invite.marks_as_test is set. First locate the template:

grep -n "invite.is_active\|invite.code\|invites-tbody\|invites-table" client/admin.js | head

Once located, modify the row template to include:

            const testSeedBadge = invite.marks_as_test
                ? '<span class="badge badge-info" title="Creates test accounts">Test-seed</span>'
                : '';
            // Insert testSeedBadge into the invite code column, e.g.
            // <td>${escapeHtml(invite.code)} ${testSeedBadge}</td>
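Put together, the invite-code cell might look like the sketch below. The `escapeHtml` here is a stand-in for the helper admin.js already uses, and `renderInviteCodeCell` plus the `InviteRow` type are hypothetical names; only `code` and `marks_as_test` come from the plan.

```typescript
interface InviteRow {
  code: string;
  marks_as_test?: boolean;
}

// Stand-in for the existing admin.js helper of the same name.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Renders only the code column; the rest of the row template stays as it is.
function renderInviteCodeCell(invite: InviteRow): string {
  const testSeedBadge = invite.marks_as_test
    ? ' <span class="badge badge-info" title="Creates test accounts">Test-seed</span>'
    : '';
  return `<td>${escapeHtml(invite.code)}${testSeedBadge}</td>`;
}

console.log(renderInviteCodeCell({ code: 'SOAKTEST', marks_as_test: true }));
console.log(renderInviteCodeCell({ code: 'ABC123' }));
```

Escaping the code before interpolation keeps the cell safe even if an invite code ever contains markup characters, which matches how the users table treats usernames.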

Step 5: Wire the checkbox change event to reload users

Find where #include-banned has its change listener attached (grep for it in admin.js):

grep -n "include-banned" client/admin.js

Add a parallel listener for #include-test that calls loadUsers():

document.getElementById('include-test').addEventListener('change', () => {
    usersPage = 0;
    loadUsers();
});

Step 6: Manual verification in browser

Open http://localhost:8000/admin.html
Log in as admin
Navigate to Users tab
Search for "soaktest"
Confirm the [Test] badge appears next to soaktest_register1
Uncheck "Include test accounts" — the row should disappear
Re-check it — the row should return
Navigate to Invite Codes tab
Confirm the [Test-seed] badge appears next to the SOAKTEST code

Step 7: Commit

git add client/admin.html client/admin.js
git commit -m "$(cat <<'EOF'
feat(admin): visible Test/Test-seed badges + filter toggle

Users table shows [Test] next to soak-harness accounts, invite codes
list shows [Test-seed] next to codes that flag new accounts as test,
and a new "Include test accounts" checkbox lets admins hide bot
traffic from the user list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 8: Document the one-time staging setup step

The staging invite code 5VC2MCCN needs to be flagged as test-seed before the harness can run against staging. This is a manual one-liner; document it in a new bring-up doc.

Files:

Create: docs/soak-harness-bringup.md
Step 1: Create the bring-up doc

cat > docs/soak-harness-bringup.md <<'EOF'
# Soak Harness Bring-Up

One-time setup steps before running `tests/soak` against an environment.

## Prerequisites

- An invite code exists with 16+ available uses
- You have psql access to the target DB (or admin SQL access via some other means)

## 1. Flag the invite code as test-seed

Any account registered with a `marks_as_test=TRUE` invite code gets
`users_v2.is_test_account=TRUE`, which keeps it out of real-user stats.

### Staging

Invite code: `5VC2MCCN` (16 uses, provisioned 2026-04-10).

```sql
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
```

Expected: `marks_as_test | t`.

### Local dev

The dev DB already has a `SOAKTEST` invite created during Task 4 of the implementation plan. If you wiped the DB since, recreate it:

```sql
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
```

## 2. Run the harness

```bash
cd tests/soak
npm install
npm run seed                                  # first run only, populates .env.stresstest
TEST_URL=http://localhost:8000 npm run smoke  # 30s end-to-end check
```

For staging:

```bash
TEST_URL=https://staging.adlee.work npm run soak -- --scenario=populate
```

See `tests/soak/README.md` for the full flag reference.
EOF

Step 2: Commit

git add docs/soak-harness-bringup.md
git commit -m "$(cat <<'EOF'
docs: soak harness bring-up steps

Documents the one-time UPDATE invite_codes SET marks_as_test = TRUE
step required before running tests/soak against each environment,
plus the local dev SOAKTEST invite recreation SQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 2 — Harness scaffolding

Task 9: Create the `tests/soak/` package skeleton

Bare minimum to get tsx running against an empty entry point. No behavior yet.

Files:

Create: tests/soak/package.json
Create: tests/soak/tsconfig.json
Create: tests/soak/.gitignore
Create: tests/soak/.env.stresstest.example
Create: tests/soak/README.md (stub)
Create: tests/soak/runner.ts (stub — prints "hello")
Step 1: Create tests/soak/package.json

{
  "name": "golf-soak",
  "version": "0.1.0",
  "private": true,
  "description": "Multiplayer soak & UX test harness for Golf Card Game",
  "scripts": {
    "soak": "tsx runner.ts",
    "soak:populate": "tsx runner.ts --scenario=populate",
    "soak:stress": "tsx runner.ts --scenario=stress",
    "seed": "tsx scripts/seed-accounts.ts",
    "smoke": "bash scripts/smoke.sh",
    "test": "vitest run"
  },
  "dependencies": {
    "playwright-core": "^1.40.0",
    "ws": "^8.16.0"
  },
  "devDependencies": {
    "tsx": "^4.7.0",
    "@types/ws": "^8.5.0",
    "@types/node": "^20.10.0",
    "typescript": "^5.3.0",
    "vitest": "^1.2.0"
  }
}

Step 2: Create tests/soak/tsconfig.json

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "moduleResolution": "node",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": false,
    "sourceMap": true,
    "outDir": "./dist",
    "rootDir": ".",
    "baseUrl": ".",
    "lib": ["ES2022", "DOM"],
    "paths": {
      "@soak/*": ["./*"],
      "@bot/*": ["../e2e/bot/*"]
    }
  },
  "include": ["**/*.ts"],
  "exclude": ["node_modules", "dist", "artifacts"]
}

Step 3: Create tests/soak/.gitignore

node_modules/
dist/
artifacts/
.env.stresstest
*.log

Step 4: Create tests/soak/.env.stresstest.example

# Soak harness account cache.
# This file is AUTO-GENERATED on first run; do not edit by hand.
# Format: SOAK_ACCOUNT_NN=username:password:token
#
# Example (delete before first real run):
# SOAK_ACCOUNT_00=soak_00_a7bx:<generated-password>:<jwt-token>

Step 5: Create tests/soak/README.md (stub — expanded in Task 31)

# Golf Soak & UX Test Harness

Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.

**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`

## Quick start

```bash
npm install
npm run seed                    # first run only
TEST_URL=http://localhost:8000 npm run smoke
```

Full documentation arrives with Task 31.

Step 6: Create tests/soak/runner.ts as a placeholder

#!/usr/bin/env tsx
/**
 * Golf Soak Harness — entry point.
 *
 * Placeholder. Full runner lands in Task 17.
 */

async function main(): Promise<void> {
  console.log('golf-soak runner (placeholder)');
  console.log('Full implementation lands in Task 17 of the plan.');
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Step 7: Install deps and verify runner executes

cd tests/soak
npm install
npx tsx runner.ts

Expected output:

golf-soak runner (placeholder)
Full implementation lands in Task 17 of the plan.

Step 8: Commit

git add tests/soak/package.json tests/soak/package-lock.json tests/soak/tsconfig.json tests/soak/.gitignore tests/soak/.env.stresstest.example tests/soak/README.md tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): scaffold tests/soak package

Placeholder runner, tsconfig with @bot alias to tests/e2e/bot,
gitignored .env.stresstest + artifacts. Real behavior follows
in Task 10 onward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 10: Core types and `Deferred` helper

Pure TypeScript with Vitest tests. No browser, no network. Establishes the type surface the rest of the harness will target.

Files:

Create: tests/soak/core/types.ts
Create: tests/soak/core/deferred.ts
Create: tests/soak/tests/deferred.test.ts
Step 1: Write the failing test for Deferred

Create tests/soak/tests/deferred.test.ts:

import { describe, it, expect } from 'vitest';
import { deferred } from '../core/deferred';

describe('deferred', () => {
  it('resolves with the given value', async () => {
    const d = deferred<string>();
    d.resolve('hello');
    await expect(d.promise).resolves.toBe('hello');
  });

  it('rejects with the given error', async () => {
    const d = deferred<string>();
    const err = new Error('boom');
    d.reject(err);
    await expect(d.promise).rejects.toBe(err);
  });

  it('ignores second resolve calls', async () => {
    const d = deferred<number>();
    d.resolve(1);
    d.resolve(2);
    await expect(d.promise).resolves.toBe(1);
  });
});

Step 2: Run the test to verify it fails

cd tests/soak
npx vitest run tests/deferred.test.ts

Expected: FAIL — module ../core/deferred does not exist.

Step 3: Implement deferred

Create tests/soak/core/deferred.ts:

/**
 * Promise deferred primitive — lets external code resolve or reject
 * a promise. Used by RoomCoordinator for host→joiners handoff.
 */

export interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: unknown): void;
}

export function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

Step 4: Run tests to verify they pass

npx vitest run tests/deferred.test.ts

Expected: 3 passed.
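To make the intended use concrete, here is a minimal sketch of a host handing a room code to two waiting joiners. `deferred` is repeated so the snippet runs on its own; `demoHandoff` and the player names are illustrative assumptions, not part of the plan.

```typescript
interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: unknown): void;
}

// Same shape as core/deferred.ts, repeated so the sketch runs on its own.
function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Two joiners wait on one deferred; the host settles it once.
async function demoHandoff(): Promise<string[]> {
  const roomCode = deferred<string>();
  const join = async (player: string): Promise<string> =>
    `${player} joined ${await roomCode.promise}`;
  const joiners = [join('p2'), join('p3')];
  roomCode.resolve('ABCD');
  roomCode.resolve('ZZZZ'); // no effect: a promise settles only once
  return Promise.all(joiners);
}

demoHandoff().then((lines) => console.log(lines));
```

Both joiners see `ABCD`. The second resolve is ignored because a promise keeps its first settled value, which is the behavior the third test pins down.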

Step 5: Create core/types.ts with the scenario interfaces

/**
 * Core type definitions for the soak harness.
 *
 * Contracts here are consumed by runner.ts, SessionPool, scenarios,
 * and the dashboard. Keep this file small and stable.
 */

import type { BrowserContext, Page } from 'playwright-core';
import type { GolfBot } from '../../e2e/bot/golf-bot';

// =============================================================================
// Accounts & sessions
// =============================================================================

export interface Account {
  /** Stable key used in logs, e.g. "soak_00". */
  key: string;
  username: string;
  password: string;
  /** JWT returned from /api/auth/login, may be refreshed by SessionPool. */
  token: string;
}

export interface Session {
  account: Account;
  context: BrowserContext;
  page: Page;
  bot: GolfBot;
  /** Convenience mirror of account.key. */
  key: string;
}

// =============================================================================
// Scenarios
// =============================================================================

export interface ScenarioNeeds {
  /** Total number of authenticated sessions the scenario requires. */
  accounts: number;
  /** How many rooms to partition sessions into (default: 1). */
  rooms?: number;
  /** CPUs to add per room (default: 0). */
  cpusPerRoom?: number;
}

/** Free-form per-scenario config merged with CLI flags. */
export type ScenarioConfig = Record<string, unknown>;

export interface ScenarioError {
  room: string;
  reason: string;
  detail?: string;
  timestamp: number;
}

export interface ScenarioResult {
  gamesCompleted: number;
  errors: ScenarioError[];
  durationMs: number;
  customMetrics?: Record<string, number>;
}

export interface ScenarioContext {
  /** Merged config: CLI flags → env → scenario defaults → runner defaults. */
  config: ScenarioConfig;
  /** Pre-authenticated sessions; ordered. */
  sessions: Session[];
  coordinator: RoomCoordinatorApi;
  dashboard: DashboardReporter;
  logger: Logger;
  signal: AbortSignal;
  /** Reset the per-room watchdog. Call at each progress point. */
  heartbeat(roomId: string): void;
}

export interface Scenario {
  name: string;
  description: string;
  defaultConfig: ScenarioConfig;
  needs: ScenarioNeeds;
  run(ctx: ScenarioContext): Promise<ScenarioResult>;
}

// =============================================================================
// Room coordination
// =============================================================================

export interface RoomCoordinatorApi {
  announce(roomId: string, code: string): void;
  await(roomId: string, timeoutMs?: number): Promise<string>;
}

// =============================================================================
// Dashboard reporter
// =============================================================================

export interface RoomState {
  phase?: string;
  currentPlayer?: string;
  hole?: number;
  totalHoles?: number;
  game?: number;
  totalGames?: number;
  moves?: number;
  players?: Array<{ key: string; score: number | null; isActive: boolean }>;
  message?: string;
}

export interface DashboardReporter {
  update(roomId: string, state: Partial<RoomState>): void;
  log(level: 'info' | 'warn' | 'error', msg: string, meta?: object): void;
  incrementMetric(name: string, by?: number): void;
}

// =============================================================================
// Logger
// =============================================================================

export type LogLevel = 'debug' | 'info' | 'warn' | 'error';

export interface Logger {
  debug(msg: string, meta?: object): void;
  info(msg: string, meta?: object): void;
  warn(msg: string, meta?: object): void;
  error(msg: string, meta?: object): void;
  child(meta: object): Logger;
}
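As a rough illustration of how a scenario module would target these contracts, the sketch below defines a no-op scenario. The `Mini*` interfaces are trimmed stand-ins for the types above so the snippet runs alone, and `noopScenario` with its `games` config key is hypothetical.

```typescript
// Trimmed stand-ins for the contracts in core/types.ts so the sketch runs alone.
interface MiniResult {
  gamesCompleted: number;
  errors: Array<{ room: string; reason: string; timestamp: number }>;
  durationMs: number;
}
interface MiniContext {
  config: Record<string, unknown>;
  signal: AbortSignal;
  heartbeat(roomId: string): void;
}
interface MiniScenario {
  name: string;
  description: string;
  defaultConfig: Record<string, unknown>;
  needs: { accounts: number; rooms?: number; cpusPerRoom?: number };
  run(ctx: MiniContext): Promise<MiniResult>;
}

// Hypothetical no-op scenario: "plays" N games by ticking the watchdog.
const noopScenario: MiniScenario = {
  name: 'noop',
  description: 'Counts games without touching a browser',
  defaultConfig: { games: 3 },
  needs: { accounts: 4, rooms: 1 },
  async run(ctx) {
    const started = Date.now();
    const games = Number(ctx.config.games ?? 3);
    let gamesCompleted = 0;
    for (let i = 0; i < games; i++) {
      if (ctx.signal.aborted) break; // stop promptly when the runner aborts
      ctx.heartbeat('room-0'); // progress point: reset the per-room watchdog
      gamesCompleted++;
    }
    return { gamesCompleted, errors: [], durationMs: Date.now() - started };
  },
};

const beats: string[] = [];
noopScenario
  .run({
    config: { ...noopScenario.defaultConfig, games: 2 }, // override wins over the default
    signal: new AbortController().signal,
    heartbeat: (roomId) => { beats.push(roomId); },
  })
  .then((result) => console.log(result.gamesCompleted, beats)); // 2 [ 'room-0', 'room-0' ]
```

A real scenario would also use `sessions`, `coordinator`, `dashboard`, and `logger` from ScenarioContext; they are left out here to keep the shape of `run` visible.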

Step 6: Verify tsx still parses the runner

cd tests/soak
npx tsx runner.ts

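Expected: still prints the placeholder output. tsx transpiles without type-checking and nothing imports the new core/ files yet, so this step only confirms that the runner still starts; it says nothing about type errors in core/types.ts.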

Step 7: Commit

git add tests/soak/core/deferred.ts tests/soak/core/types.ts tests/soak/tests/deferred.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): core types + Deferred primitive

Establishes the Scenario/Session/Logger/DashboardReporter contracts
the rest of the harness builds on. Deferred is the building block
for RoomCoordinator's host→joiners handoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 11: RoomCoordinator with tests

Tiny abstraction over Deferred keyed by room ID, with a timeout on await.

Files:

Create: tests/soak/core/room-coordinator.ts
Create: tests/soak/tests/room-coordinator.test.ts
Step 1: Write failing tests

// tests/soak/tests/room-coordinator.test.ts
import { describe, it, expect } from 'vitest';
import { RoomCoordinator } from '../core/room-coordinator';

describe('RoomCoordinator', () => {
  it('resolves await with the announced code (announce then await)', async () => {
    const rc = new RoomCoordinator();
    rc.announce('room-1', 'ABCD');
    await expect(rc.await('room-1')).resolves.toBe('ABCD');
  });

  it('resolves await with the announced code (await then announce)', async () => {
    const rc = new RoomCoordinator();
    const p = rc.await('room-2');
    rc.announce('room-2', 'WXYZ');
    await expect(p).resolves.toBe('WXYZ');
  });

  it('rejects await after timeout if not announced', async () => {
    const rc = new RoomCoordinator();
    await expect(rc.await('room-3', 50)).rejects.toThrow(/timed out/i);
  });

  it('isolates rooms — announcing room-A does not unblock room-B', async () => {
    const rc = new RoomCoordinator();
    const pB = rc.await('room-B', 100);
    rc.announce('room-A', 'A-CODE');
    await expect(pB).rejects.toThrow(/timed out/i);
  });
});

Step 2: Run tests to verify they fail

npx vitest run tests/room-coordinator.test.ts

Expected: FAIL — module not found.

Step 3: Implement RoomCoordinator

// tests/soak/core/room-coordinator.ts
import { deferred, Deferred } from './deferred';
import type { RoomCoordinatorApi } from './types';

export class RoomCoordinator implements RoomCoordinatorApi {
  private rooms = new Map<string, Deferred<string>>();

  announce(roomId: string, code: string): void {
    this.getOrCreate(roomId).resolve(code);
  }

  async await(roomId: string, timeoutMs: number = 30_000): Promise<string> {
    const d = this.getOrCreate(roomId);
    let timer: NodeJS.Timeout | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        reject(new Error(`RoomCoordinator: room "${roomId}" timed out after ${timeoutMs}ms`));
      }, timeoutMs);
    });
    try {
      return await Promise.race([d.promise, timeout]);
    } finally {
      if (timer) clearTimeout(timer);
    }
  }

  private getOrCreate(roomId: string): Deferred<string> {
    let d = this.rooms.get(roomId);
    if (!d) {
      d = deferred<string>();
      this.rooms.set(roomId, d);
    }
    return d;
  }
}

Step 4: Verify tests pass

npx vitest run tests/room-coordinator.test.ts

Expected: 4 passed.
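The timeout in `await` is an instance of a general pattern: race the real work against a timer, then always clear the timer. A generic sketch of that pattern is shown below; `withTimeout` is a hypothetical helper and not part of the planned module.

```typescript
// Generic form of the race used in RoomCoordinator.await; withTimeout is a hypothetical helper.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so a fast result does not keep the event loop alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

const pendingForever = new Promise<string>(() => {});
withTimeout(Promise.resolve('ABCD'), 1000, 'room-1').then((code) => console.log(code));
withTimeout(pendingForever, 20, 'room-2').catch((err: Error) => console.log(err.message));
```

Clearing the timer in `finally` matters for a long soak run: without it, every successful handoff would leave a 30-second timer pending.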

Step 5: Commit

git add tests/soak/core/room-coordinator.ts tests/soak/tests/room-coordinator.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): RoomCoordinator with host→joiners handoff

Lazy Deferred per roomId with a timeout on await. Lets concurrent
joiner sessions block until their host announces the room code
without polling or page scraping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 12: Structured JSONL logger

Single module, no transport, writes to process.stdout. Supports child loggers with bound metadata (so scenarios can emit logs with room / game context without repeating it).

Files:

Create: tests/soak/core/logger.ts
Create: tests/soak/tests/logger.test.ts
Step 1: Write failing tests

// tests/soak/tests/logger.test.ts
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { createLogger } from '../core/logger';

describe('logger', () => {
  let writes: string[];
  let write: (s: string) => boolean;

  beforeEach(() => {
    writes = [];
    write = (s: string) => {
      writes.push(s);
      return true;
    };
  });

  it('emits a JSON line per call with level and msg', () => {
    const log = createLogger({ runId: 'r1', write });
    log.info('hello');
    expect(writes).toHaveLength(1);
    const parsed = JSON.parse(writes[0]);
    expect(parsed.level).toBe('info');
    expect(parsed.msg).toBe('hello');
    expect(parsed.runId).toBe('r1');
    expect(parsed.timestamp).toBeTypeOf('string');
  });

  it('merges meta into the log line', () => {
    const log = createLogger({ runId: 'r1', write });
    log.warn('slow', { turnMs: 3000 });
    const parsed = JSON.parse(writes[0]);
    expect(parsed.turnMs).toBe(3000);
    expect(parsed.level).toBe('warn');
  });

  it('child logger inherits parent meta', () => {
    const log = createLogger({ runId: 'r1', write });
    const roomLog = log.child({ room: 'room-1' });
    roomLog.info('game_start');
    const parsed = JSON.parse(writes[0]);
    expect(parsed.room).toBe('room-1');
    expect(parsed.runId).toBe('r1');
  });

  it('respects minimum level', () => {
    const log = createLogger({ runId: 'r1', write, minLevel: 'warn' });
    log.debug('nope');
    log.info('nope');
    log.warn('yes');
    log.error('yes');
    expect(writes).toHaveLength(2);
  });
});

Step 2: Run tests to verify they fail

npx vitest run tests/logger.test.ts

Expected: FAIL — module not found.

Step 3: Implement the logger

// tests/soak/core/logger.ts
import type { Logger, LogLevel } from './types';

const LEVEL_ORDER: Record<LogLevel, number> = {
  debug: 0,
  info: 1,
  warn: 2,
  error: 3,
};

export interface LoggerOptions {
  runId: string;
  minLevel?: LogLevel;
  /** Defaults to process.stdout.write bound to stdout. Override for tests. */
  write?: (line: string) => boolean;
  baseMeta?: Record<string, unknown>;
}

export function createLogger(opts: LoggerOptions): Logger {
  const minLevel = opts.minLevel ?? 'info';
  const write = opts.write ?? ((s: string) => process.stdout.write(s));
  const baseMeta = opts.baseMeta ?? {};

  function emit(level: LogLevel, msg: string, meta?: object): void {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    const line = JSON.stringify({
      timestamp: new Date().toISOString(),
      level,
      msg,
      runId: opts.runId,
      ...baseMeta,
      ...(meta ?? {}),
    }) + '\n';
    write(line);
  }

  const logger: Logger = {
    debug: (msg, meta) => emit('debug', msg, meta),
    info: (msg, meta) => emit('info', msg, meta),
    warn: (msg, meta) => emit('warn', msg, meta),
    error: (msg, meta) => emit('error', msg, meta),
    child: (meta) =>
      createLogger({
        runId: opts.runId,
        minLevel,
        write,
        baseMeta: { ...baseMeta, ...meta },
      }),
  };

  return logger;
}

Step 4: Verify tests pass

npx vitest run tests/logger.test.ts

Expected: 4 passed.
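Because every line is standalone JSON, the output is easy to post-process. The sketch below counts lines per room and level; `summarizeJsonl` is a hypothetical helper, the field names follow the logger and its tests above, and the sample lines are made up.

```typescript
// Field names (level, room) follow the logger above; summarizeJsonl is a hypothetical helper.
interface LogLine {
  timestamp: string;
  level: 'debug' | 'info' | 'warn' | 'error';
  msg: string;
  runId: string;
  room?: string;
  [key: string]: unknown;
}

// Count log lines per room and level; lines without a room go under "(global)".
function summarizeJsonl(text: string): Record<string, Record<string, number>> {
  const summary: Record<string, Record<string, number>> = {};
  for (const raw of text.split('\n')) {
    if (raw.trim() === '') continue; // the trailing newline yields one empty chunk
    const line = JSON.parse(raw) as LogLine;
    const room = line.room ?? '(global)';
    const perRoom = summary[room] ?? (summary[room] = {});
    perRoom[line.level] = (perRoom[line.level] ?? 0) + 1;
  }
  return summary;
}

// Made-up sample in the shape the logger emits.
const sample =
  '{"timestamp":"2026-04-10T00:00:00.000Z","level":"info","msg":"run_start","runId":"r1"}\n' +
  '{"timestamp":"2026-04-10T00:00:01.000Z","level":"info","msg":"game_start","runId":"r1","room":"room-1"}\n' +
  '{"timestamp":"2026-04-10T00:00:02.000Z","level":"warn","msg":"slow","runId":"r1","room":"room-1","turnMs":3000}\n';

console.log(summarizeJsonl(sample));
```

Binding `room` once through a child logger is what makes this kind of grouping possible without parsing message text.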

Step 5: Commit

git add tests/soak/core/logger.ts tests/soak/tests/logger.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): structured JSONL logger with child contexts

Single file, no transport, writes one JSON line per call to stdout.
Child loggers inherit parent meta so scenarios can bind room/game
context once and forget about it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 3 — SessionPool and seeding

Task 13: SessionPool with HTTP registration and localStorage warm-start

This is the biggest single module. It owns browser context lifecycle, seeds accounts on cold start, logs in on warm start, and exposes a simple acquire() API to scenarios.

Files:

Create: tests/soak/core/session-pool.ts

Testing: manual via scripts/seed-accounts.ts in Task 14 and the first real runner invocation in Task 18. No Vitest test for this. It's an integration module that needs a real browser.

Step 1: Create tests/soak/core/session-pool.ts — imports and types

// tests/soak/core/session-pool.ts
import * as fs from 'fs';
import * as path from 'path';
import {
  Browser,
  BrowserContext,
  chromium,
} from 'playwright-core';
import { GolfBot } from '../../e2e/bot/golf-bot';
import type { Account, Session, Logger } from './types';

export interface SeedOptions {
  /** Full base URL of the target server, e.g. https://staging.adlee.work. */
  targetUrl: string;
  /** Invite code to pass to /api/auth/register. */
  inviteCode: string;
  /** Number of accounts to create. */
  count: number;
}

export interface SessionPoolOptions {
  targetUrl: string;
  inviteCode: string;
  credFile: string;   // absolute path to .env.stresstest
  logger: Logger;
  /** Optional override for the browser to attach contexts to. If absent, SessionPool launches its own. */
  browser?: Browser;
  /** Passed through to browser.newContext. Useful for viewport overrides in tests. */
  contextOptions?: Parameters<Browser['newContext']>[0];
}

Step 2: Implement cred-file read/write

Append to session-pool.ts:

function readCredFile(filePath: string): Account[] | null {
  if (!fs.existsSync(filePath)) return null;
  const content = fs.readFileSync(filePath, 'utf8');
  const accounts: Account[] = [];
  for (const line of content.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    // SOAK_ACCOUNT_NN=username:password:token
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq);
    const value = trimmed.slice(eq + 1);
    const m = key.match(/^SOAK_ACCOUNT_(\d+)$/);
    if (!m) continue;
    const [username, password, token] = value.split(':');
    if (!username || !password || !token) continue;
    const idx = parseInt(m[1], 10);
    accounts.push({
      key: `soak_${String(idx).padStart(2, '0')}`,
      username,
      password,
      token,
    });
  }
  return accounts.length > 0 ? accounts : null;
}

function writeCredFile(filePath: string, accounts: Account[]): void {
  const lines: string[] = [
    '# Soak harness account cache — auto-generated, do not hand-edit',
    '# Format: SOAK_ACCOUNT_NN=username:password:token',
  ];
  for (const acc of accounts) {
    const idx = parseInt(acc.key.replace('soak_', ''), 10);
    const key = `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}`;
    lines.push(`${key}=${acc.username}:${acc.password}:${acc.token}`);
  }
  fs.writeFileSync(filePath, lines.join('\n') + '\n', { mode: 0o600 });
}
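The line format can be sketched as a round trip. This is an illustration only: formatLine and parseLine are hypothetical single-line helpers that mirror the logic above, and the Account shape is inferred from how the fields are used. It also shows why no field may contain a colon.

```typescript
// Illustrative round trip of one cred-file line. The helper names here
// (formatLine, parseLine) are hypothetical; the harness itself uses
// readCredFile/writeCredFile.
interface Account {
  key: string;
  username: string;
  password: string;
  token: string;
}

function formatLine(acc: Account): string {
  const idx = parseInt(acc.key.replace('soak_', ''), 10);
  return `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}=${acc.username}:${acc.password}:${acc.token}`;
}

function parseLine(text: string): Account | null {
  const eq = text.indexOf('=');
  if (eq === -1) return null;
  const m = text.slice(0, eq).match(/^SOAK_ACCOUNT_(\d+)$/);
  if (!m) return null;
  // Colon is the field separator, so none of the three fields may contain one.
  const [username, password, token] = text.slice(eq + 1).split(':');
  if (!username || !password || !token) return null;
  const idx = parseInt(m[1], 10);
  return { key: `soak_${String(idx).padStart(2, '0')}`, username, password, token };
}

const sample: Account = {
  key: 'soak_03',
  username: 'soak_03_ab12',
  password: 'xK7pQ2mN9rT4vB6!',
  token: 'header.payload.signature', // placeholder, not a real JWT
};

const credLine = formatLine(sample);
console.log(credLine);
// SOAK_ACCOUNT_03=soak_03_ab12:xK7pQ2mN9rT4vB6!:header.payload.signature
console.log(JSON.stringify(parseLine(credLine)) === JSON.stringify(sample)); // true

// A colon inside the password shifts every later field by one position.
console.log(parseLine('SOAK_ACCOUNT_04=user:pa:ss:token')?.token); // ss
```

The generated passwords and usernames never contain a colon, so the plain split is safe for accounts the harness creates itself. Hand-edited entries are the case to watch.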

Step 3: Implement the HTTP register call

interface RegisterResponse {
  user: { id: string; username: string };
  token: string;
  expires_at: string;
}

async function registerAccount(
  targetUrl: string,
  username: string,
  password: string,
  email: string,
  inviteCode: string,
): Promise<string> {
  const res = await fetch(`${targetUrl}/api/auth/register`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password, email, invite_code: inviteCode }),
  });
  if (!res.ok) {
    const body = await res.text().catch(() => '<no body>');
    throw new Error(`register failed: ${res.status} ${body}`);
  }
  const data = (await res.json()) as RegisterResponse;
  if (!data.token) {
    throw new Error(`register returned no token: ${JSON.stringify(data)}`);
  }
  return data.token;
}

async function loginAccount(
  targetUrl: string,
  username: string,
  password: string,
): Promise<string> {
  const res = await fetch(`${targetUrl}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password }),
  });
  if (!res.ok) {
    const body = await res.text().catch(() => '<no body>');
    throw new Error(`login failed: ${res.status} ${body}`);
  }
  const data = (await res.json()) as RegisterResponse;
  return data.token;
}

function randomSuffix(): string {
  return Math.random().toString(36).slice(2, 6);
}

function generatePassword(): string {
  // 16 chars: letters + digits + one symbol. Meets 8-char minimum from auth_service.
  // Split across halves so repo secret-scanners don't flag the string as base64
  const lower = 'abcdefghijkm' + 'npqrstuvwxyz'; // pragma: allowlist secret
  const upper = 'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ'; // pragma: allowlist secret
  const digits = '23456789';
  const chars = lower + upper + digits;
  let out = '';
  for (let i = 0; i < 15; i++) {
    out += chars[Math.floor(Math.random() * chars.length)];
  }
  return out + '!';
}
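A small sketch of the properties the password generator relies on, with the random source injected so the result can be checked deterministically. The rng parameter and the names makePassword and alphabet are additions for this example only. It assumes Math.random is acceptable here because the accounts are throwaway test accounts.

```typescript
// Same alphabet as generatePassword. The `rng` parameter is an addition for
// this sketch so the output can be pinned in a check.
const lowerChars = 'abcdefghijkm' + 'npqrstuvwxyz';
const upperChars = 'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ';
const digitChars = '23456789';
const alphabet = lowerChars + upperChars + digitChars;

function makePassword(rng: () => number = Math.random): string {
  let out = '';
  for (let i = 0; i < 15; i++) {
    out += alphabet[Math.floor(rng() * alphabet.length)];
  }
  return out + '!';
}

const pw = makePassword();
console.log(pw.length); // 16
console.log(alphabet.length); // 56

// No colon (the cred-file separator) and none of the look-alike characters
// l, o, I, O, 0, 1 appear in the alphabet.
console.log(/[:lIoO01]/.test(alphabet)); // false
```

The absence of a colon is what lets the cred file use username:password:token without escaping.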

Step 4: Implement the SessionPool class

export class SessionPool {
  private accounts: Account[] = [];
  private ownedBrowser: Browser | null = null;
  private browser: Browser | null;
  private activeSessions: Session[] = [];

  constructor(private opts: SessionPoolOptions) {
    this.browser = opts.browser ?? null;
  }

  /**
   * Seed `count` accounts via the register endpoint and write them to credFile.
   * Safe to call multiple times — skips accounts already in the file.
   */
  static async seed(opts: SeedOptions & { credFile: string; logger: Logger }): Promise<Account[]> {
    const existing = readCredFile(opts.credFile) ?? [];
    const existingKeys = new Set(existing.map((a) => a.key));
    const created: Account[] = [...existing];

    for (let i = 0; i < opts.count; i++) {
      const key = `soak_${String(i).padStart(2, '0')}`;
      if (existingKeys.has(key)) continue;

      const suffix = randomSuffix();
      const username = `${key}_${suffix}`;
      const password = generatePassword();
      const email = `${key}_${suffix}@soak.test`;

      opts.logger.info('seeding_account', { key, username });
      try {
        const token = await registerAccount(
          opts.targetUrl,
          username,
          password,
          email,
          opts.inviteCode,
        );
        created.push({ key, username, password, token });
        writeCredFile(opts.credFile, created);
      } catch (err) {
        opts.logger.error('seed_failed', {
          key,
          error: err instanceof Error ? err.message : String(err),
        });
        throw err;
      }
    }
    return created;
  }

  /**
   * Load accounts from credFile, auto-seeding if the file is missing.
   */
  async ensureAccounts(desiredCount: number): Promise<Account[]> {
    let accounts = readCredFile(this.opts.credFile);
    if (!accounts || accounts.length < desiredCount) {
      this.opts.logger.warn('cred_file_missing_or_short', {
        found: accounts?.length ?? 0,
        desired: desiredCount,
      });
      accounts = await SessionPool.seed({
        targetUrl: this.opts.targetUrl,
        inviteCode: this.opts.inviteCode,
        count: desiredCount,
        credFile: this.opts.credFile,
        logger: this.opts.logger,
      });
    }
    this.accounts = accounts.slice(0, desiredCount);
    return this.accounts;
  }

  /**
   * Launch the browser if not provided, create N contexts, log each in via
   * localStorage injection (falling back to POST /api/auth/login if the
   * cached token is rejected), and return the live sessions.
   */
  async acquire(count: number): Promise<Session[]> {
    await this.ensureAccounts(count);
    if (!this.browser) {
      this.ownedBrowser = await chromium.launch({ headless: true });
      this.browser = this.ownedBrowser;
    }

    const sessions: Session[] = [];
    for (let i = 0; i < count; i++) {
      const account = this.accounts[i];
      const context = await this.browser.newContext(this.opts.contextOptions);
      await this.injectAuth(context, account);
      const page = await context.newPage();
      await page.goto(this.opts.targetUrl);
      const bot = new GolfBot(page);
      sessions.push({ account, context, page, bot, key: account.key });
    }
    this.activeSessions = sessions;
    return sessions;
  }

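  /**
   * Inject the cached JWT into localStorage BEFORE any page loads.
   * Uses addInitScript so the token is present on the first navigation.
   * Note: addInitScript only registers the script and never contacts the
   * server, so the login fallback below runs only if registration itself
   * throws. An expired or revoked token is not detected at this point.
   */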
  private async injectAuth(context: BrowserContext, account: Account): Promise<void> {
    // Try the cached token first
    try {
      await context.addInitScript(
        ({ token, username }) => {
          window.localStorage.setItem('authToken', token);
          window.localStorage.setItem(
            'authUser',
            JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
          );
        },
        { token: account.token, username: account.username },
      );
    } catch (err) {
      this.opts.logger.warn('inject_auth_failed', {
        account: account.key,
        error: err instanceof Error ? err.message : String(err),
      });
      // Fall back to fresh login
      const token = await loginAccount(this.opts.targetUrl, account.username, account.password);
      account.token = token;
      writeCredFile(this.opts.credFile, this.accounts);
      await context.addInitScript(
        ({ token, username }) => {
          window.localStorage.setItem('authToken', token);
          window.localStorage.setItem(
            'authUser',
            JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
          );
        },
        { token, username: account.username },
      );
    }
  }

  /** Close all active contexts. Safe to call multiple times. */
  async release(): Promise<void> {
    for (const session of this.activeSessions) {
      try {
        await session.context.close();
      } catch {
        // ignore
      }
    }
    this.activeSessions = [];
    if (this.ownedBrowser) {
      try {
        await this.ownedBrowser.close();
      } catch {
        // ignore
      }
      this.ownedBrowser = null;
      this.browser = null;
    }
  }
}
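The skip-existing rule in SessionPool.seed can be sketched without any network calls. keysToSeed is a hypothetical helper written for this example. It shows which account keys a retry would still register, given that the cred file is rewritten after every successful registration.

```typescript
// Sketch of the skip-existing rule in SessionPool.seed. `keysToSeed` is a
// hypothetical helper, not part of the harness.
function keysToSeed(existingKeys: string[], count: number): string[] {
  const have = new Set(existingKeys);
  const missing: string[] = [];
  for (let i = 0; i < count; i++) {
    const key = `soak_${String(i).padStart(2, '0')}`;
    if (!have.has(key)) missing.push(key);
  }
  return missing;
}

// A first run that died after two registrations leaves soak_00 and soak_01
// in the cred file; the retry only registers the remainder.
const afterPartialRun = keysToSeed(['soak_00', 'soak_01'], 4);
console.log(afterPartialRun); // [ 'soak_02', 'soak_03' ]

// A fully seeded file means no register calls at all.
const afterFullRun = keysToSeed(['soak_00', 'soak_01', 'soak_02', 'soak_03'], 4);
console.log(afterFullRun); // []
```

Because keys are positional (soak_00 through soak_NN), raising the count later only adds the new tail and leaves existing accounts untouched.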

Step 5: Syntax-check by invoking tsx

cd tests/soak
npx tsx -e "import('./core/session-pool').then(() => console.log('ok'))"

Expected: ok. Note that tsx only transpiles, so this catches syntax and import-resolution errors, not type errors.

Step 6: Commit

git add tests/soak/core/session-pool.ts
git commit -m "$(cat <<'EOF'
feat(soak): SessionPool — seed, login, acquire contexts

Owns 16 BrowserContexts, seeds via POST /api/auth/register with the
invite code on cold start, warm-starts via localStorage injection of
the cached JWT, falls back to POST /api/auth/login if the token is
rejected. Exposes acquire(n) for scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 14: seed-accounts.ts CLI wrapper

Tiny standalone entry point that lets you pre-seed before the first harness run. Reuses SessionPool.seed.

Files:

Create: tests/soak/scripts/seed-accounts.ts

Step 1: Write the script

#!/usr/bin/env tsx
/**
 * Seed N soak-harness accounts via the register endpoint.
 *
 * Usage:
 *   TEST_URL=http://localhost:8000 \
 *   SOAK_INVITE_CODE=SOAKTEST \
 *     npm run seed -- --count=16
 */

import * as path from 'path';
import { SessionPool } from '../core/session-pool';
import { createLogger } from '../core/logger';

function parseArgs(argv: string[]): { count: number } {
  const result = { count: 16 };
  for (const arg of argv.slice(2)) {
    const m = arg.match(/^--count=(\d+)$/);
    if (m) result.count = parseInt(m[1], 10);
  }
  return result;
}

async function main(): Promise<void> {
  const { count } = parseArgs(process.argv);
  const targetUrl = process.env.TEST_URL ?? 'http://localhost:8000';
  const inviteCode = process.env.SOAK_INVITE_CODE;
  if (!inviteCode) {
    console.error('SOAK_INVITE_CODE env var is required');
    console.error('  Local dev: SOAK_INVITE_CODE=SOAKTEST');
    console.error('  Staging:   SOAK_INVITE_CODE=5VC2MCCN');
    process.exit(2);
  }

  const credFile = path.resolve(__dirname, '..', '.env.stresstest');
  const logger = createLogger({ runId: `seed-${Date.now()}` });

  logger.info('seed_start', { count, targetUrl, credFile });
  try {
    const accounts = await SessionPool.seed({
      targetUrl,
      inviteCode,
      count,
      credFile,
      logger,
    });
    logger.info('seed_complete', { created: accounts.length });
    console.error(`Seeded ${accounts.length} accounts → ${credFile}`);
  } catch (err) {
    logger.error('seed_failed', {
      error: err instanceof Error ? err.message : String(err),
    });
    process.exit(1);
  }
}

main();

Step 2: Run it against local dev to verify end-to-end

With the dev server running and the SOAKTEST invite flagged:

cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed -- --count=4

Expected:

Log lines seeding_account × 4
Log line seed_complete
tests/soak/.env.stresstest file created with 4 SOAK_ACCOUNT_NN=... lines

Verify (from the repo root):

head tests/soak/.env.stresstest

Expected: 4 account lines.

Also verify the accounts got flagged:

psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username LIKE 'soak_%' ORDER BY username;"

Expected: 4 rows, all with is_test_account | t.

Step 3: Commit

git add tests/soak/scripts/seed-accounts.ts
git commit -m "$(cat <<'EOF'
feat(soak): scripts/seed-accounts.ts CLI wrapper

Thin standalone entry for pre-seeding N accounts before the first
harness run. Wraps SessionPool.seed and writes .env.stresstest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 4 — First scenario, config, runner (end-to-end milestone)

Task 15: Shared multiplayer-game helper

Pulls the "run one full game in one room" logic out of the scenarios so populate and stress share it. Takes a room's sessions and a config, loops until the game ends.

Files:

Create: tests/soak/scenarios/shared/multiplayer-game.ts

Step 1: Create the helper module

// tests/soak/scenarios/shared/multiplayer-game.ts
import type { Session, ScenarioContext } from '../../core/types';

export interface MultiplayerGameOptions {
  roomId: string;
  holes: number;
  decks: number;
  cpusPerRoom: number;
  cpuPersonality?: string;
  /** Per-turn think time in [min, max] ms. */
  thinkTimeMs: [number, number];
  /** Max wall-clock time before giving up on the game (ms). */
  maxDurationMs?: number;
}

export interface MultiplayerGameResult {
  completed: boolean;
  turns: number;
  durationMs: number;
  error?: string;
}

function randomInt(min: number, max: number): number {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Host + joiners play one full multiplayer game end to end.
 * The host creates the room, announces the code via the coordinator,
 * joiners wait for the code, the host adds CPUs and starts, everyone
 * loops on isMyTurn/playTurn until round_over or game_over.
 */
export async function runOneMultiplayerGame(
  ctx: ScenarioContext,
  sessions: Session[],
  opts: MultiplayerGameOptions,
): Promise<MultiplayerGameResult> {
  const start = Date.now();
  const [host, ...joiners] = sessions;
  const maxDuration = opts.maxDurationMs ?? 5 * 60_000;

  try {
    // Host creates game
    const code = await host.bot.createGame(host.account.username);
    ctx.coordinator.announce(opts.roomId, code);
    ctx.heartbeat(opts.roomId);
    ctx.dashboard.update(opts.roomId, { phase: 'lobby' });
    ctx.logger.info('room_created', { room: opts.roomId, code });

    // Joiners join concurrently
    await Promise.all(
      joiners.map(async (joiner) => {
        const awaited = await ctx.coordinator.await(opts.roomId);
        await joiner.bot.joinGame(awaited, joiner.account.username);
      }),
    );
    ctx.heartbeat(opts.roomId);

    // Host adds CPUs (if any) and starts
    for (let i = 0; i < opts.cpusPerRoom; i++) {
      await host.bot.addCPU(opts.cpuPersonality);
    }
    await host.bot.startGame({ holes: opts.holes, decks: opts.decks });
    ctx.heartbeat(opts.roomId);
    ctx.dashboard.update(opts.roomId, { phase: 'playing', totalHoles: opts.holes });

    // Concurrent turn loops, one per session. Each loop reports why it
    // stopped so a timeout is not mistaken for a finished game.
    const turnCounts: number[] = new Array(sessions.length).fill(0);

    async function sessionLoop(
      sessionIdx: number,
    ): Promise<'finished' | 'aborted' | 'timeout'> {
      const session = sessions[sessionIdx];
      while (true) {
        if (ctx.signal.aborted) return 'aborted';
        if (Date.now() - start > maxDuration) return 'timeout';

        const phase = await session.bot.getGamePhase();
        if (phase === 'game_over' || phase === 'round_over') return 'finished';

        if (await session.bot.isMyTurn()) {
          await session.bot.playTurn();
          turnCounts[sessionIdx]++;
          ctx.heartbeat(opts.roomId);
          ctx.dashboard.update(opts.roomId, {
            currentPlayer: session.account.username,
            moves: turnCounts.reduce((a, b) => a + b, 0),
          });
          const thinkMs = randomInt(opts.thinkTimeMs[0], opts.thinkTimeMs[1]);
          await sleep(thinkMs);
        } else {
          await sleep(200);
        }
      }
    }

    const outcomes = await Promise.all(sessions.map((_, i) => sessionLoop(i)));
    const totalTurns = turnCounts.reduce((a, b) => a + b, 0);

    // Hitting maxDurationMs means the game did not finish; report it as a failure.
    if (outcomes.includes('timeout')) {
      return {
        completed: false,
        turns: totalTurns,
        durationMs: Date.now() - start,
        error: `game exceeded maxDurationMs (${maxDuration} ms)`,
      };
    }

    ctx.dashboard.update(opts.roomId, { phase: 'round_over' });
    return {
      completed: true,
      turns: totalTurns,
      durationMs: Date.now() - start,
    };
  } catch (err) {
    return {
      completed: false,
      turns: 0,
      durationMs: Date.now() - start,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
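A quick check of the think-time bounds, as a sketch. randomIntWith uses the same formula as randomInt above but takes the random source as a parameter. That parameter is an addition for this example so the extremes can be pinned.

```typescript
// Same formula as randomInt in multiplayer-game.ts, with the random source
// injected so the bounds can be checked deterministically. The `rng`
// parameter is an addition for this sketch only.
function randomIntWith(rng: () => number, min: number, max: number): number {
  return Math.floor(rng() * (max - min + 1)) + min;
}

const thinkTimeMs: [number, number] = [800, 2200];

// rng() is in [0, 1), so the extremes map to min and max inclusive.
const lowest = randomIntWith(() => 0, thinkTimeMs[0], thinkTimeMs[1]);
const highest = randomIntWith(() => 0.999999, thinkTimeMs[0], thinkTimeMs[1]);
console.log(lowest, highest); // 800 2200

// Mean pause per human turn, useful for estimating how long a game will run.
const meanThinkMs = (thinkTimeMs[0] + thinkTimeMs[1]) / 2;
console.log(meanThinkMs); // 1500
```

With the populate defaults this puts the average pause at about 1.5 seconds per turn, which is worth keeping in mind when choosing maxDurationMs.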

Step 2: Syntax-check

cd tests/soak
npx tsx -e "import('./scenarios/shared/multiplayer-game').then(() => console.log('ok'))"

Expected: ok.

Step 3: Commit

git add tests/soak/scenarios/shared/multiplayer-game.ts
git commit -m "$(cat <<'EOF'
feat(soak): shared runOneMultiplayerGame helper

Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 16: Populate scenario (minimal version)

Partitions sessions into rooms, runs gamesPerRoom games per room in parallel, aggregates results.

Files:

Create: tests/soak/scenarios/populate.ts
Create: tests/soak/scenarios/index.ts

Step 1: Create scenarios/populate.ts

// tests/soak/scenarios/populate.ts
import type {
  Scenario,
  ScenarioContext,
  ScenarioResult,
  ScenarioError,
  Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';

const CPU_PERSONALITIES = ['Sofia', 'Marcus', 'Kenji', 'Priya'];

interface PopulateConfig {
  gamesPerRoom: number;
  holes: number;
  decks: number;
  rooms: number;
  cpusPerRoom: number;
  thinkTimeMs: [number, number];
  interGamePauseMs: number;
}

function chunk<T>(arr: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function runRoom(
  ctx: ScenarioContext,
  cfg: PopulateConfig,
  roomIdx: number,
  sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[] }> {
  const roomId = `room-${roomIdx}`;
  const cpuPersonality = CPU_PERSONALITIES[roomIdx % CPU_PERSONALITIES.length];
  let completed = 0;
  const errors: ScenarioError[] = [];

  for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
    if (ctx.signal.aborted) break;
    ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });
    ctx.logger.info('game_start', { room: roomId, game: gameNum + 1 });

    const result = await runOneMultiplayerGame(ctx, sessions, {
      roomId,
      holes: cfg.holes,
      decks: cfg.decks,
      cpusPerRoom: cfg.cpusPerRoom,
      cpuPersonality,
      thinkTimeMs: cfg.thinkTimeMs,
    });

    if (result.completed) {
      completed++;
      ctx.logger.info('game_complete', {
        room: roomId,
        game: gameNum + 1,
        turns: result.turns,
        durationMs: result.durationMs,
      });
    } else {
      errors.push({
        room: roomId,
        reason: 'game_failed',
        detail: result.error,
        timestamp: Date.now(),
      });
      ctx.logger.error('game_failed', { room: roomId, game: gameNum + 1, error: result.error });
    }

    if (gameNum < cfg.gamesPerRoom - 1) {
      await sleep(cfg.interGamePauseMs);
    }
  }

  return { completed, errors };
}

const populate: Scenario = {
  name: 'populate',
  description: 'Long multi-round games to populate scoreboards',
  needs: { accounts: 16, rooms: 4, cpusPerRoom: 1 },
  defaultConfig: {
    gamesPerRoom: 10,
    holes: 9,
    decks: 2,
    rooms: 4,
    cpusPerRoom: 1,
    thinkTimeMs: [800, 2200],
    interGamePauseMs: 3000,
  },

  async run(ctx: ScenarioContext): Promise<ScenarioResult> {
    const start = Date.now();
    const cfg = ctx.config as unknown as PopulateConfig;

    const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
    if (perRoom * cfg.rooms !== ctx.sessions.length) {
      throw new Error(
        `populate: ${ctx.sessions.length} sessions does not divide evenly into ${cfg.rooms} rooms`,
      );
    }
    const roomSessions = chunk(ctx.sessions, perRoom);

    const results = await Promise.allSettled(
      roomSessions.map((sessions, idx) => runRoom(ctx, cfg, idx, sessions)),
    );

    let gamesCompleted = 0;
    const errors: ScenarioError[] = [];
    results.forEach((r, idx) => {
      if (r.status === 'fulfilled') {
        gamesCompleted += r.value.completed;
        errors.push(...r.value.errors);
      } else {
        errors.push({
          room: `room-${idx}`,
          reason: 'room_threw',
          detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
          timestamp: Date.now(),
        });
      }
    });

    return {
      gamesCompleted,
      errors,
      durationMs: Date.now() - start,
    };
  },
};

export default populate;
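The partition-then-roll-up flow can be sketched with plain data in place of browser sessions. partition, rollUp, and RoomOutcome are illustrative names for this example. The real scenario works with Session and ScenarioError objects and runs each room through runRoom.

```typescript
// Sketch of populate's partition-then-roll-up flow with plain data instead of
// browser sessions. `partition`, `rollUp`, and `RoomOutcome` are illustrative
// names for this example only.
interface RoomOutcome {
  completed: number;
  errors: string[];
}

function partition<T>(items: T[], rooms: number): T[][] {
  const perRoom = Math.floor(items.length / rooms);
  if (perRoom * rooms !== items.length) {
    throw new Error(`${items.length} sessions does not divide evenly into ${rooms} rooms`);
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += perRoom) {
    out.push(items.slice(i, i + perRoom));
  }
  return out;
}

function rollUp(settled: PromiseSettledResult<RoomOutcome>[]): RoomOutcome {
  const total: RoomOutcome = { completed: 0, errors: [] };
  settled.forEach((r, idx) => {
    if (r.status === 'fulfilled') {
      total.completed += r.value.completed;
      total.errors.push(...r.value.errors);
    } else {
      // A room that threw contributes one error and nothing else.
      total.errors.push(`room-${idx}: ${String(r.reason)}`);
    }
  });
  return total;
}

const groups = partition(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], 4);
console.log(groups.map((g) => g.length)); // [ 2, 2, 2, 2 ]

const summary = rollUp([
  { status: 'fulfilled', value: { completed: 10, errors: [] } },
  { status: 'rejected', reason: 'browser crashed' },
  { status: 'fulfilled', value: { completed: 9, errors: ['game_failed'] } },
]);
console.log(summary.completed, summary.errors);
// 19 [ 'room-1: browser crashed', 'game_failed' ]
```

Promise.allSettled is what makes the roll-up possible: a rejected room still yields an entry, so the other rooms' counts survive.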

Step 2: Create scenarios/index.ts registry

// tests/soak/scenarios/index.ts
import type { Scenario } from '../core/types';
import populate from './populate';

const registry: Record<string, Scenario> = {
  populate,
};

export function getScenario(name: string): Scenario | undefined {
  return registry[name];
}

export function listScenarios(): Scenario[] {
  return Object.values(registry);
}

Step 3: Syntax-check

cd tests/soak
npx tsx -e "import('./scenarios/index').then((m) => console.log(m.listScenarios().map(s => s.name)))"

Expected: ['populate'].
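A sketch of how the registry grows when a second scenario lands. The stub type and the stress entry are placeholders for illustration. The lookup also guards against inherited property names, a hardening that the getScenario above does not include and that is offered here only as an option.

```typescript
// Stand-in type so the snippet runs alone; the real Scenario type lives in
// core/types and has more fields (needs, defaultConfig, run).
interface ScenarioStub {
  name: string;
  description: string;
}

const populateStub: ScenarioStub = {
  name: 'populate',
  description: 'Long multi-round games to populate scoreboards',
};
// `stress` is a placeholder here: it stands for whichever scenario is added next.
const stressStub: ScenarioStub = {
  name: 'stress',
  description: 'placeholder for a later scenario',
};

const scenarioRegistry: Record<string, ScenarioStub> = {
  populate: populateStub,
  stress: stressStub,
};

function lookup(scenarioName: string): ScenarioStub | undefined {
  // Own-property check so names like "toString" do not resolve to members
  // inherited from Object.prototype.
  return Object.prototype.hasOwnProperty.call(scenarioRegistry, scenarioName)
    ? scenarioRegistry[scenarioName]
    : undefined;
}

console.log(Object.keys(scenarioRegistry)); // [ 'populate', 'stress' ]
console.log(lookup('bogus')); // undefined
console.log(lookup('toString')); // undefined
```

With a plain registry[name] lookup, a name such as toString would return a function instead of undefined, so an "unknown scenario" check built on truthiness would let it through.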

Step 4: Commit

git add tests/soak/scenarios/populate.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): populate scenario + scenario registry

Partitions sessions into N rooms, runs gamesPerRoom games per room
in parallel via Promise.allSettled so a failure in one room never
unwinds the others. Errors roll up into ScenarioResult.errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 17: Config parsing with tests

CLI flags, env vars, scenario defaults, runner defaults — merged in that precedence order.

Files:

Create: tests/soak/config.ts
Create: tests/soak/tests/config.test.ts

Step 1: Write failing tests

// tests/soak/tests/config.test.ts
import { describe, it, expect } from 'vitest';
import { parseArgs, mergeConfig } from '../config';

describe('parseArgs', () => {
  it('parses --scenario and numeric flags', () => {
    const r = parseArgs(['--scenario=populate', '--rooms=4', '--games-per-room=10']);
    expect(r.scenario).toBe('populate');
    expect(r.rooms).toBe(4);
    expect(r.gamesPerRoom).toBe(10);
  });

  it('parses watch mode', () => {
    const r = parseArgs(['--scenario=populate', '--watch=none']);
    expect(r.watch).toBe('none');
  });

  it('rejects unknown watch mode', () => {
    expect(() => parseArgs(['--scenario=populate', '--watch=bogus'])).toThrow();
  });

  it('--list sets listOnly', () => {
    const r = parseArgs(['--list']);
    expect(r.listOnly).toBe(true);
  });
});

describe('mergeConfig', () => {
  it('CLI flags override scenario defaults', () => {
    const cfg = mergeConfig(
      { gamesPerRoom: 20 },            // CLI
      {},                              // env
      { gamesPerRoom: 5, holes: 9 },   // defaults
    );
    expect(cfg.gamesPerRoom).toBe(20);
  });

  it('env overrides scenario defaults but CLI overrides env', () => {
    const cfg = mergeConfig(
      { holes: 5 },                 // CLI
      { SOAK_HOLES: '3' },          // env
      { holes: 9 },                 // defaults
    );
    expect(cfg.holes).toBe(5);      // CLI wins
    // With no CLI value, env wins over the scenario default.
    expect(mergeConfig({}, { SOAK_HOLES: '3' }, { holes: 9 }).holes).toBe(3);
  });

  it('scenario defaults fill in unset values', () => {
    const cfg = mergeConfig(
      { gamesPerRoom: 3 },          // CLI
      {},                           // env
      { games: 5, holes: 9 },       // defaults
    );
    expect(cfg.games).toBe(5);
    expect(cfg.holes).toBe(9);
    expect(cfg.gamesPerRoom).toBe(3);
  });
});

Note: mergeConfig takes its arguments in the order (cli, env, defaults), and the middle test pins the precedence "CLI > env > defaults". It is the corrected version of an earlier draft that asserted the wrong winner.

Step 2: Run tests to verify they fail

npx vitest run tests/config.test.ts

Expected: FAIL — module not found.

Step 3: Implement config.ts

// tests/soak/config.ts

export type WatchMode = 'none' | 'dashboard' | 'tiled';

export interface CliArgs {
  scenario?: string;
  accounts?: number;
  rooms?: number;
  cpusPerRoom?: number;
  gamesPerRoom?: number;
  holes?: number;
  watch?: WatchMode;
  dashboardPort?: number;
  target?: string;
  runId?: string;
  dryRun?: boolean;
  listOnly?: boolean;
}

const VALID_WATCH: WatchMode[] = ['none', 'dashboard', 'tiled'];

function parseInt10(s: string, name: string): number {
  const n = parseInt(s, 10);
  if (Number.isNaN(n)) throw new Error(`Invalid integer for ${name}: ${s}`);
  return n;
}

export function parseArgs(argv: string[]): CliArgs {
  const out: CliArgs = {};
  for (const arg of argv) {
    if (arg === '--list') {
      out.listOnly = true;
      continue;
    }
    if (arg === '--dry-run') {
      out.dryRun = true;
      continue;
    }
    const m = arg.match(/^--([a-z][a-z0-9-]*)=(.*)$/);
    if (!m) continue;
    const [, key, value] = m;
    switch (key) {
      case 'scenario':
        out.scenario = value;
        break;
      case 'accounts':
        out.accounts = parseInt10(value, '--accounts');
        break;
      case 'rooms':
        out.rooms = parseInt10(value, '--rooms');
        break;
      case 'cpus-per-room':
        out.cpusPerRoom = parseInt10(value, '--cpus-per-room');
        break;
      case 'games-per-room':
        out.gamesPerRoom = parseInt10(value, '--games-per-room');
        break;
      case 'holes':
        out.holes = parseInt10(value, '--holes');
        break;
      case 'watch':
        if (!VALID_WATCH.includes(value as WatchMode)) {
          throw new Error(`Invalid --watch value: ${value} (expected ${VALID_WATCH.join('|')})`);
        }
        out.watch = value as WatchMode;
        break;
      case 'dashboard-port':
        out.dashboardPort = parseInt10(value, '--dashboard-port');
        break;
      case 'target':
        out.target = value;
        break;
      case 'run-id':
        out.runId = value;
        break;
      default:
        // Unknown flag — ignore so scenario-specific flags can be added later
        break;
    }
  }
  return out;
}

/**
 * Merge in order: scenarioDefaults → env → cli (later wins).
 */
export function mergeConfig(
  cli: Record<string, unknown>,
  env: Record<string, string | undefined>,
  defaults: Record<string, unknown>,
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...defaults };

  // Env overlay — SOAK_UPPER_SNAKE → lowerCamel in cli space.
  const envMap: Record<string, string> = {
    SOAK_HOLES: 'holes',
    SOAK_ROOMS: 'rooms',
    SOAK_ACCOUNTS: 'accounts',
    SOAK_CPUS_PER_ROOM: 'cpusPerRoom',
    SOAK_GAMES_PER_ROOM: 'gamesPerRoom',
    SOAK_WATCH: 'watch',
    SOAK_DASHBOARD_PORT: 'dashboardPort',
  };
  for (const [envKey, cfgKey] of Object.entries(envMap)) {
    const v = env[envKey];
    if (v !== undefined) {
      // Numeric keys go through parseInt10 so a bad value fails loudly
      // instead of silently becoming NaN.
      if (/^(holes|rooms|accounts|cpusPerRoom|gamesPerRoom|dashboardPort)$/.test(cfgKey)) {
        merged[cfgKey] = parseInt10(v, envKey);
      } else {
        merged[cfgKey] = v;
      }
    }
  }

  // CLI overlay — wins over env and defaults.
  for (const [k, v] of Object.entries(cli)) {
    if (v !== undefined) merged[k] = v;
  }

  return merged;
}
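The precedence can be traced for a single knob. This is a minimal sketch: resolveHoles is a hypothetical helper that applies the same three layers to holes only, whereas mergeConfig handles every key.

```typescript
// Minimal sketch of the three-layer merge for a single knob. `resolveHoles`
// is a hypothetical helper; the harness's mergeConfig handles every key.
function resolveHoles(
  cli: { holes?: number },
  env: Record<string, string | undefined>,
  defaults: { holes: number },
): number {
  let holes = defaults.holes; // lowest priority
  const fromEnv = env.SOAK_HOLES;
  if (fromEnv !== undefined) holes = parseInt(fromEnv, 10);
  if (cli.holes !== undefined) holes = cli.holes; // highest priority
  return holes;
}

const scenarioDefaults = { holes: 9 };

const defaultOnly = resolveHoles({}, {}, scenarioDefaults);
const envOverDefault = resolveHoles({}, { SOAK_HOLES: '3' }, scenarioDefaults);
const cliOverEnv = resolveHoles({ holes: 5 }, { SOAK_HOLES: '3' }, scenarioDefaults);

console.log(defaultOnly); // 9
console.log(envOverDefault); // 3
console.log(cliOverEnv); // 5
```

In practice this means SOAK_HOLES can live in a shell profile for day-to-day runs while a one-off --holes flag still wins.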

Step 4: Confirm the middle test matches the precedence

Open tests/soak/tests/config.test.ts and confirm that the second it(...) block inside describe('mergeConfig') is the corrected version from Step 1, so that it asserts CLI > env > defaults.

Step 5: Run tests to verify they pass

npx vitest run tests/config.test.ts

Expected: all passing.

Step 6: Commit

git add tests/soak/config.ts tests/soak/tests/config.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): CLI parsing + config precedence

parseArgs pulls --scenario/--rooms/--watch/etc from argv, mergeConfig
layers scenarioDefaults → env → CLI so CLI flags always win. Unit
tested.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 18: runner.ts entry point (first end-to-end milestone)

Replaces the placeholder runner with the real thing: parse args, build dependencies, load scenario, acquire sessions, run scenario, clean up, print summary. Supports --watch=none only at this stage.

Files:

Modify: tests/soak/runner.ts (replace placeholder)

Step 1: Rewrite runner.ts

#!/usr/bin/env tsx
/**
 * Golf Soak Harness — entry point.
 *
 * Usage:
 *   TEST_URL=http://localhost:8000 \
 *   SOAK_INVITE_CODE=SOAKTEST \
 *     npm run soak -- --scenario=populate --rooms=1 --accounts=2 \
 *       --cpus-per-room=0 --games-per-room=1 --holes=1 --watch=none
 */

import * as path from 'path';
import { parseArgs, mergeConfig, CliArgs } from './config';
import { createLogger } from './core/logger';
import { SessionPool } from './core/session-pool';
import { RoomCoordinator } from './core/room-coordinator';
import { getScenario, listScenarios } from './scenarios';
import type { DashboardReporter, ScenarioContext } from './core/types';

function noopDashboard(): DashboardReporter {
  return {
    update: () => {},
    log: () => {},
    incrementMetric: () => {},
  };
}

function printScenarioList(): void {
  console.log('Available scenarios:');
  for (const s of listScenarios()) {
    console.log(`  ${s.name.padEnd(12)} ${s.description}`);
    console.log(`    needs: accounts=${s.needs.accounts}, rooms=${s.needs.rooms ?? 1}, cpus=${s.needs.cpusPerRoom ?? 0}`);
  }
}

async function main(): Promise<void> {
  const cli: CliArgs = parseArgs(process.argv.slice(2));

  if (cli.listOnly) {
    printScenarioList();
    return;
  }

  if (!cli.scenario) {
    console.error('Error: --scenario=<name> is required. Use --list to see scenarios.');
    process.exit(2);
  }

  const scenario = getScenario(cli.scenario);
  if (!scenario) {
    console.error(`Error: unknown scenario "${cli.scenario}". Use --list to see scenarios.`);
    process.exit(2);
  }

  const runId = cli.runId ?? `${cli.scenario}-${new Date().toISOString().replace(/[:.]/g, '-')}`;
  const targetUrl = cli.target ?? process.env.TEST_URL ?? 'http://localhost:8000';
  const inviteCode = process.env.SOAK_INVITE_CODE ?? 'SOAKTEST';
  const watch = cli.watch ?? 'dashboard';

  const logger = createLogger({ runId });
  logger.info('run_start', {
    scenario: scenario.name,
    targetUrl,
    watch,
    cli,
  });

  // Resolve final config
  const config = mergeConfig(
    cli as Record<string, unknown>,
    process.env,
    scenario.defaultConfig,
  );
  // Ensure core knobs exist
  const accounts = Number(config.accounts ?? scenario.needs.accounts);
  const rooms = Number(config.rooms ?? scenario.needs.rooms ?? 1);
  const cpusPerRoom = Number(config.cpusPerRoom ?? scenario.needs.cpusPerRoom ?? 0);
  if (accounts % rooms !== 0) {
    console.error(`Error: --accounts=${accounts} does not divide evenly into --rooms=${rooms}`);
    process.exit(2);
  }
  config.rooms = rooms;
  config.cpusPerRoom = cpusPerRoom;

  if (cli.dryRun) {
    logger.info('dry_run', { config });
    console.log('Dry run OK. Resolved config:');
    console.log(JSON.stringify(config, null, 2));
    return;
  }

  if (watch !== 'none') {
    logger.warn('watch_mode_not_yet_implemented', { watch });
    console.warn(`Watch mode "${watch}" not yet implemented — falling back to "none".`);
  }

  // Build dependencies
  const credFile = path.resolve(__dirname, '.env.stresstest');
  const pool = new SessionPool({
    targetUrl,
    inviteCode,
    credFile,
    logger,
  });
  const coordinator = new RoomCoordinator();
  const dashboard = noopDashboard();
  const abortController = new AbortController();

  const onSignal = (sig: string) => {
    logger.warn('signal_received', { signal: sig });
    abortController.abort();
  };
  process.on('SIGINT', () => onSignal('SIGINT'));
  process.on('SIGTERM', () => onSignal('SIGTERM'));

  let exitCode = 0;
  try {
    const sessions = await pool.acquire(accounts);
    logger.info('sessions_acquired', { count: sessions.length });

    const ctx: ScenarioContext = {
      config,
      sessions,
      coordinator,
      dashboard,
      logger,
      signal: abortController.signal,
      heartbeat: () => {}, // Task 26 wires this up
    };

    const result = await scenario.run(ctx);
    logger.info('run_complete', {
      gamesCompleted: result.gamesCompleted,
      errors: result.errors.length,
      durationMs: result.durationMs,
    });
    console.log(`Games completed: ${result.gamesCompleted}`);
    console.log(`Errors:          ${result.errors.length}`);
    console.log(`Duration:        ${(result.durationMs / 1000).toFixed(1)}s`);
    if (result.errors.length > 0) {
      console.log('Errors:');
      for (const e of result.errors) {
        console.log(`  ${e.room}: ${e.reason}${e.detail ? ' — ' + e.detail : ''}`);
      }
      exitCode = 1;
    }
  } catch (err) {
    logger.error('run_failed', {
      error: err instanceof Error ? err.message : String(err),
      stack: err instanceof Error ? err.stack : undefined,
    });
    exitCode = 1;
  } finally {
    await pool.release();
  }

  if (abortController.signal.aborted && exitCode === 0) exitCode = 2;
  process.exit(exitCode);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
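
The runner only flips `abortController.signal`; stopping early is up to the scenario. A minimal sketch of a loop that honors the signal between games, assuming a hypothetical `playGames` helper (real scenarios are async and drive `GolfBot`, which this sketch leaves out):

```typescript
// Hypothetical helper, not part of the plan: plays up to `total` games and
// checks the abort signal before starting each one.
function playGames(
  signal: AbortSignal,
  total: number,
  playOne: (gameIndex: number) => void,
): number {
  let completed = 0;
  for (let i = 0; i < total; i++) {
    if (signal.aborted) break; // never start a new game after SIGINT/SIGTERM
    playOne(i);
    completed++;
  }
  return completed;
}

const soakAbort = new AbortController();
const gamesPlayed = playGames(soakAbort.signal, 5, (i) => {
  // Simulate Ctrl-C arriving while the second game is in progress.
  if (i === 1) soakAbort.abort();
});
console.log(gamesPlayed); // prints 2
```

The in-flight game is allowed to finish, so the summary still counts it; the runner then turns an otherwise clean aborted run into exit code 2.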

Step 2: Run a minimal --watch=none smoke against local dev

With the server running and 4 soak accounts already seeded from Task 14:

cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=2 \
  --rooms=1 \
  --cpus-per-room=0 \
  --games-per-room=1 \
  --holes=1 \
  --watch=none

Expected output (abbreviated):

{"timestamp":"...","level":"info","msg":"run_start",...}
{"timestamp":"...","level":"info","msg":"sessions_acquired","count":2}
{"timestamp":"...","level":"info","msg":"game_start","room":"room-0","game":1}
{"timestamp":"...","level":"info","msg":"room_created","code":"XXXX"}
{"timestamp":"...","level":"info","msg":"game_complete","room":"room-0","turns":...}
{"timestamp":"...","level":"info","msg":"run_complete","gamesCompleted":1,"errors":0}
Games completed: 1
Errors:          0
Duration:        X.Xs

Exit code 0.
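
For reference, the exit codes that runner.ts produces can be summarized as one pure function. This is a sketch of the policy as written in Step 1, not code the plan asks you to add; `RunOutcome` and `exitCodeFor` are hypothetical names.

```typescript
// Hypothetical summary of the exit-code policy in runner.ts.
interface RunOutcome {
  usageError: boolean; // missing/unknown scenario, accounts not divisible by rooms
  threw: boolean;      // scenario.run() or pool.acquire() rejected
  roomErrors: number;  // result.errors.length
  aborted: boolean;    // SIGINT or SIGTERM was received
}

function exitCodeFor(o: RunOutcome): number {
  if (o.usageError) return 2;                   // process.exit(2) before any work starts
  if (o.threw || o.roomErrors > 0) return 1;    // failures win over the abort code
  if (o.aborted) return 2;                      // clean run that was interrupted
  return 0;
}
```

Note that 2 is shared by usage errors and interrupted runs, so a CI wrapper cannot tell them apart from the exit code alone; the `signal_received` log line disambiguates.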

This is the first end-to-end milestone. Stop here if debugging is needed — fix issues before moving on.

Step 3: Commit

git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): runner.ts end-to-end with --watch=none

First full end-to-end milestone: parses CLI, builds SessionPool +
RoomCoordinator, loads a scenario by name, runs it, reports results,
cleans up. Watch modes other than "none" log a warning and fall back
until Tasks 19-24 implement them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 5 — Dashboard status grid

Task 19: Dashboard HTTP + WS server

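Vanilla Node http + ws. Serves one static HTML page (plus its CSS and JS), accepts WS connections, broadcasts room-state updates, and replays the current room states and metrics to clients that connect mid-run.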

Files:

Create: tests/soak/dashboard/server.ts

Step 1: Implement dashboard/server.ts

// tests/soak/dashboard/server.ts
import * as http from 'http';
import * as fs from 'fs';
import * as path from 'path';
import { WebSocketServer, WebSocket } from 'ws';
import type { DashboardReporter, Logger, RoomState } from '../core/types';

export type DashboardIncoming =
  | { type: 'start_stream'; sessionKey: string }
  | { type: 'stop_stream'; sessionKey: string };

export type DashboardOutgoing =
  | { type: 'room_state'; roomId: string; state: Partial<RoomState> }
  | { type: 'log'; level: string; msg: string; meta?: object; timestamp: number }
  | { type: 'metric'; name: string; value: number }
  | { type: 'frame'; sessionKey: string; jpegBase64: string };

export interface DashboardHandlers {
  onStartStream?(sessionKey: string): void;
  onStopStream?(sessionKey: string): void;
  onDisconnect?(): void;
}

export class DashboardServer {
  private httpServer!: http.Server;
  private wsServer!: WebSocketServer;
  private clients = new Set<WebSocket>();
  private metrics: Record<string, number> = {};
  private roomStates: Record<string, Partial<RoomState>> = {};

  constructor(
    private port: number,
    private logger: Logger,
    private handlers: DashboardHandlers = {},
  ) {}

  async start(): Promise<void> {
    const htmlPath = path.resolve(__dirname, 'index.html');
    const cssPath = path.resolve(__dirname, 'dashboard.css');
    const jsPath = path.resolve(__dirname, 'dashboard.js');

    this.httpServer = http.createServer((req, res) => {
      const url = req.url ?? '/';
      if (url === '/' || url === '/index.html') {
        res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
        fs.createReadStream(htmlPath).pipe(res);
      } else if (url === '/dashboard.css') {
        res.writeHead(200, { 'Content-Type': 'text/css' });
        fs.createReadStream(cssPath).pipe(res);
      } else if (url === '/dashboard.js') {
        res.writeHead(200, { 'Content-Type': 'application/javascript' });
        fs.createReadStream(jsPath).pipe(res);
      } else {
        res.writeHead(404);
        res.end('not found');
      }
    });

    this.wsServer = new WebSocketServer({ server: this.httpServer });
    this.wsServer.on('connection', (ws) => {
      this.clients.add(ws);
      this.logger.info('dashboard_client_connected', { count: this.clients.size });

      // Replay current state to the new client
      for (const [roomId, state] of Object.entries(this.roomStates)) {
        ws.send(JSON.stringify({ type: 'room_state', roomId, state } as DashboardOutgoing));
      }
      for (const [name, value] of Object.entries(this.metrics)) {
        ws.send(JSON.stringify({ type: 'metric', name, value } as DashboardOutgoing));
      }

      ws.on('message', (data) => {
        try {
          const parsed = JSON.parse(data.toString()) as DashboardIncoming;
          if (parsed.type === 'start_stream' && this.handlers.onStartStream) {
            this.handlers.onStartStream(parsed.sessionKey);
          } else if (parsed.type === 'stop_stream' && this.handlers.onStopStream) {
            this.handlers.onStopStream(parsed.sessionKey);
          }
        } catch (err) {
          this.logger.warn('dashboard_ws_parse_error', {
            error: err instanceof Error ? err.message : String(err),
          });
        }
      });

      ws.on('close', () => {
        this.clients.delete(ws);
        this.logger.info('dashboard_client_disconnected', { count: this.clients.size });
        if (this.clients.size === 0 && this.handlers.onDisconnect) {
          this.handlers.onDisconnect();
        }
      });
    });

    await new Promise<void>((resolve) => {
      this.httpServer.listen(this.port, () => resolve());
    });
    this.logger.info('dashboard_listening', { url: `http://localhost:${this.port}` });
  }

  async stop(): Promise<void> {
    for (const ws of this.clients) {
      try {
        ws.close();
      } catch {
        // ignore
      }
    }
    this.clients.clear();
    await new Promise<void>((resolve) => {
      this.wsServer.close(() => resolve());
    });
    await new Promise<void>((resolve) => {
      this.httpServer.close(() => resolve());
    });
  }

  broadcast(msg: DashboardOutgoing): void {
    const payload = JSON.stringify(msg);
    for (const ws of this.clients) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(payload);
      }
    }
  }

  /** Create a DashboardReporter wired to this server. */
  reporter(): DashboardReporter {
    return {
      update: (roomId, state) => {
        this.roomStates[roomId] = { ...this.roomStates[roomId], ...state };
        this.broadcast({ type: 'room_state', roomId, state });
      },
      log: (level, msg, meta) => {
        this.broadcast({ type: 'log', level, msg, meta, timestamp: Date.now() });
      },
      incrementMetric: (name, by = 1) => {
        this.metrics[name] = (this.metrics[name] ?? 0) + by;
        this.broadcast({ type: 'metric', name, value: this.metrics[name] });
      },
    };
  }
}

Step 2: Syntax-check

cd tests/soak
npx tsx -e "import('./dashboard/server').then(() => console.log('ok'))"

Expected: ok.
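
The server broadcasts each partial update as-is but keeps a merged copy per room, so a client that connects mid-run gets a full snapshot. A socket-free sketch of that behavior, assuming a hypothetical `RoomStateCache` class and a trimmed-down `RoomState` shape:

```typescript
// Hypothetical subset of RoomState, for illustration only.
type RoomStatePatch = { phase?: string; moves?: number; hole?: number };

type RoomStateMessage = { type: 'room_state'; roomId: string; state: RoomStatePatch };

class RoomStateCache {
  private states: Record<string, RoomStatePatch> = {};

  /** Merge a partial update and return the message to broadcast (patch only). */
  update(roomId: string, patch: RoomStatePatch): RoomStateMessage {
    this.states[roomId] = { ...this.states[roomId], ...patch };
    return { type: 'room_state', roomId, state: patch };
  }

  /** Messages to replay to a newly connected client (merged state per room). */
  snapshot(): RoomStateMessage[] {
    return Object.entries(this.states).map(([roomId, state]) => ({
      type: 'room_state' as const,
      roomId,
      state,
    }));
  }
}

const cache = new RoomStateCache();
cache.update('room-0', { phase: 'playing' });
cache.update('room-0', { moves: 3 });
const replay = cache.snapshot();
console.log(JSON.stringify(replay[0]?.state)); // prints {"phase":"playing","moves":3}
```

Because the client applies each field independently, replaying the merged state and streaming patches afterwards converge on the same tile contents.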

Step 3: Commit

git add tests/soak/dashboard/server.ts
git commit -m "$(cat <<'EOF'
feat(soak): DashboardServer — vanilla http + ws

Serves one static HTML page, accepts WS connections, broadcasts
room_state/log/metric messages to all clients. Exposes a
reporter() method that returns a DashboardReporter scenarios can
call without knowing about sockets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 20: Dashboard HTML/CSS/JS status grid

Single static HTML page + stylesheet + client script. Renders the 2×2 room grid, subscribes to WS, updates tiles on each message.

Files:

Create: tests/soak/dashboard/index.html
Create: tests/soak/dashboard/dashboard.css
Create: tests/soak/dashboard/dashboard.js

Step 1: Create dashboard/index.html

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Golf Soak Dashboard</title>
<link rel="stylesheet" href="/dashboard.css">
</head>
<body>
<header class="dash-header">
  <h1>⛳ Golf Soak Dashboard</h1>
  <div class="meta">
    <span id="run-id">run —</span>
    <span id="elapsed">00:00:00</span>
  </div>
</header>

<div class="meta-bar">
  <div class="stat"><span class="label">Games</span><span id="metric-games">0</span></div>
  <div class="stat"><span class="label">Moves</span><span id="metric-moves">0</span></div>
  <div class="stat"><span class="label">Errors</span><span id="metric-errors">0</span></div>
  <div class="stat"><span class="label">WS</span><span id="ws-status">connecting</span></div>
</div>

<div class="rooms" id="rooms">
  <!-- Room tiles injected by dashboard.js -->
</div>

<section class="log">
  <div class="log-header">Activity Log</div>
  <ul id="log-list"></ul>
</section>

<!-- Modal for focused live video (Task 23) -->
<div id="video-modal" class="video-modal hidden">
  <div class="video-modal-content">
    <div class="video-modal-header">
      <span id="video-modal-title">Watching —</span>
      <button id="video-modal-close">Close</button>
    </div>
    <img id="video-frame" alt="Live screencast" />
  </div>
</div>

<script src="/dashboard.js"></script>
</body>
</html>

Step 2: Create dashboard/dashboard.css

:root {
  --bg: #0a0e16;
  --panel: #0e1420;
  --border: #1a2230;
  --text: #c8d4e4;
  --accent: #7fbaff;
  --good: #6fd08f;
  --warn: #ffb84d;
  --err: #ff5c6c;
  --muted: #556577;
}

* { box-sizing: border-box; }

body {
  margin: 0;
  font-family: -apple-system, system-ui, 'SF Mono', Consolas, monospace;
  background: var(--bg);
  color: var(--text);
}

.dash-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 12px 20px;
  background: linear-gradient(135deg, #0f1823, #0a1018);
  border-bottom: 1px solid var(--border);
}
.dash-header h1 { margin: 0; font-size: 16px; color: var(--accent); }
.dash-header .meta { font-size: 11px; color: var(--muted); }
.dash-header .meta span + span { margin-left: 12px; }

.meta-bar {
  display: flex;
  gap: 24px;
  padding: 10px 20px;
  background: #0c131d;
  border-bottom: 1px solid var(--border);
  font-size: 12px;
}
.meta-bar .stat .label { color: var(--muted); margin-right: 6px; }
.meta-bar .stat span:last-child { color: #fff; font-weight: 600; }

.rooms {
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 1px;
  background: var(--border);
}
.room {
  background: var(--panel);
  padding: 14px 18px;
  min-height: 180px;
}
.room-title {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 10px;
}
.room-title .name { font-size: 13px; color: var(--accent); font-weight: 600; }
.room-title .phase {
  font-size: 10px;
  padding: 2px 8px;
  border-radius: 10px;
  background: #1a3a2a;
  color: var(--good);
}
.room-title .phase.lobby { background: #3a2a1a; color: var(--warn); }
.room-title .phase.err { background: #3a1a1a; color: var(--err); }

.players {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 4px;
  font-size: 11px;
  margin-bottom: 8px;
}
.player {
  display: flex;
  justify-content: space-between;
  padding: 4px 8px;
  background: #0a0f18;
  border-radius: 3px;
  cursor: pointer;
  border: 1px solid transparent;
}
.player:hover { border-color: var(--accent); }
.player.active {
  background: #1a2a40;
  border-left: 2px solid var(--accent);
}
.player .score { color: var(--muted); }

.progress-bar {
  height: 4px;
  background: var(--border);
  border-radius: 2px;
  overflow: hidden;
  margin-top: 6px;
}
.progress-fill {
  height: 100%;
  background: linear-gradient(90deg, var(--accent), var(--good));
  transition: width 0.3s;
}
.room-meta {
  font-size: 10px;
  color: var(--muted);
  display: flex;
  gap: 12px;
  margin-top: 6px;
}

.log {
  border-top: 1px solid var(--border);
  background: #080c13;
  max-height: 160px;
  overflow-y: auto;
}
.log .log-header {
  padding: 6px 20px;
  font-size: 10px;
  text-transform: uppercase;
  color: var(--muted);
  border-bottom: 1px solid var(--border);
}
.log ul { list-style: none; margin: 0; padding: 4px 20px; font-size: 10px; }
.log li { line-height: 1.5; font-family: monospace; color: var(--muted); }
.log li.warn { color: var(--warn); }
.log li.error { color: var(--err); }

.video-modal {
  position: fixed;
  inset: 0;
  background: rgba(0, 0, 0, 0.85);
  display: flex;
  align-items: center;
  justify-content: center;
  z-index: 100;
}
.video-modal.hidden { display: none; }
.video-modal-content {
  background: var(--panel);
  border: 1px solid var(--border);
  border-radius: 6px;
  padding: 16px;
  max-width: 90vw;
  max-height: 90vh;
}
.video-modal-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 12px;
  color: var(--accent);
  font-size: 13px;
}
.video-modal-header button {
  background: var(--border);
  color: var(--text);
  border: none;
  padding: 4px 12px;
  border-radius: 3px;
  cursor: pointer;
}
#video-frame {
  display: block;
  max-width: 100%;
  max-height: 70vh;
  border: 1px solid var(--border);
}

Step 3: Create dashboard/dashboard.js

// tests/soak/dashboard/dashboard.js
(() => {
  const ws = new WebSocket(`ws://${location.host}`);
  const roomsEl = document.getElementById('rooms');
  const logEl = document.getElementById('log-list');
  const wsStatusEl = document.getElementById('ws-status');
  const metricGames = document.getElementById('metric-games');
  const metricMoves = document.getElementById('metric-moves');
  const metricErrors = document.getElementById('metric-errors');
  const elapsedEl = document.getElementById('elapsed');

  const roomTiles = new Map();
  const startTime = Date.now();
  let currentWatchedKey = null;

  // Video modal
  const videoModal = document.getElementById('video-modal');
  const videoFrame = document.getElementById('video-frame');
  const videoTitle = document.getElementById('video-modal-title');
  const videoClose = document.getElementById('video-modal-close');

  function fmtElapsed(ms) {
    const s = Math.floor(ms / 1000);
    const h = Math.floor(s / 3600);
    const m = Math.floor((s % 3600) / 60);
    const sec = s % 60;
    return `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}:${String(sec).padStart(2, '0')}`;
  }
  setInterval(() => {
    elapsedEl.textContent = fmtElapsed(Date.now() - startTime);
  }, 1000);

  function ensureRoomTile(roomId) {
    if (roomTiles.has(roomId)) return roomTiles.get(roomId);
    const tile = document.createElement('div');
    tile.className = 'room';
    tile.innerHTML = `
      <div class="room-title">
        <div class="name">${roomId}</div>
        <div class="phase lobby">waiting</div>
      </div>
      <div class="players"></div>
      <div class="progress-bar"><div class="progress-fill" style="width:0%"></div></div>
      <div class="room-meta">
        <span class="moves">0 moves</span>
        <span class="game">game —</span>
      </div>
    `;
    roomsEl.appendChild(tile);
    roomTiles.set(roomId, tile);
    return tile;
  }

  function renderRoomState(roomId, state) {
    const tile = ensureRoomTile(roomId);
    if (state.phase !== undefined) {
      const phaseEl = tile.querySelector('.phase');
      phaseEl.textContent = state.phase;
      phaseEl.classList.toggle('lobby', state.phase === 'lobby' || state.phase === 'waiting');
      phaseEl.classList.toggle('err', state.phase === 'error');
    }
    if (state.players !== undefined) {
      const playersEl = tile.querySelector('.players');
      playersEl.innerHTML = state.players
        .map(
          (p) => `
            <div class="player ${p.isActive ? 'active' : ''}" data-session="${p.key}">
              <span>${p.isActive ? '▶ ' : ''}${p.key}</span>
              <span class="score">${p.score ?? '—'}</span>
            </div>
          `,
        )
        .join('');
    }
    if (state.hole !== undefined && state.totalHoles !== undefined) {
      const fill = tile.querySelector('.progress-fill');
      const pct = state.totalHoles > 0 ? Math.round((state.hole / state.totalHoles) * 100) : 0;
      fill.style.width = `${pct}%`;
    }
    if (state.moves !== undefined) {
      tile.querySelector('.moves').textContent = `${state.moves} moves`;
    }
    if (state.game !== undefined && state.totalGames !== undefined) {
      tile.querySelector('.game').textContent = `game ${state.game}/${state.totalGames}`;
    }
  }

  function appendLog(level, msg, meta) {
    const li = document.createElement('li');
    li.className = level;
    const ts = new Date().toLocaleTimeString();
    li.textContent = `[${ts}] ${msg} ${meta ? JSON.stringify(meta) : ''}`;
    logEl.insertBefore(li, logEl.firstChild);
    // Cap log length
    while (logEl.children.length > 100) {
      logEl.removeChild(logEl.lastChild);
    }
  }

  function applyMetric(name, value) {
    if (name === 'games_completed') metricGames.textContent = value;
    else if (name === 'moves_total') metricMoves.textContent = value;
    else if (name === 'errors') metricErrors.textContent = value;
  }

  ws.addEventListener('open', () => {
    wsStatusEl.textContent = 'healthy';
    wsStatusEl.style.color = 'var(--good)';
  });
  ws.addEventListener('close', () => {
    wsStatusEl.textContent = 'disconnected';
    wsStatusEl.style.color = 'var(--err)';
  });
  ws.addEventListener('message', (event) => {
    let msg;
    try {
      msg = JSON.parse(event.data);
    } catch {
      return;
    }
    if (msg.type === 'room_state') {
      renderRoomState(msg.roomId, msg.state);
    } else if (msg.type === 'log') {
      appendLog(msg.level, msg.msg, msg.meta);
    } else if (msg.type === 'metric') {
      applyMetric(msg.name, msg.value);
    } else if (msg.type === 'frame') {
      if (msg.sessionKey === currentWatchedKey) {
        videoFrame.src = `data:image/jpeg;base64,${msg.jpegBase64}`;
      }
    }
  });

  // Click-to-watch (wired in Task 23)
  roomsEl.addEventListener('click', (e) => {
    const playerEl = e.target.closest('.player');
    if (!playerEl) return;
    const key = playerEl.dataset.session;
    if (!key) return;
    currentWatchedKey = key;
    videoTitle.textContent = `Watching ${key}`;
    videoModal.classList.remove('hidden');
    ws.send(JSON.stringify({ type: 'start_stream', sessionKey: key }));
  });

  function closeVideo() {
    if (currentWatchedKey) {
      ws.send(JSON.stringify({ type: 'stop_stream', sessionKey: currentWatchedKey }));
    }
    currentWatchedKey = null;
    videoModal.classList.add('hidden');
    videoFrame.src = '';
  }
  videoClose.addEventListener('click', closeVideo);
  document.addEventListener('keydown', (e) => {
    if (e.key === 'Escape') closeVideo();
  });
})();
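
dashboard.js interpolates `roomId` and `p.key` straight into `innerHTML`. Both come from the harness itself, so this is low risk, but if keys ever include user-controlled text an escaping helper is cheap insurance. A sketch, assuming hypothetical `escapeHtml` and `roomTitleHtml` helpers (written in TypeScript here; the same body works as plain JS in dashboard.js):

```typescript
// Characters that must be escaped before interpolating into HTML text or attributes.
const HTML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: coerce to string, then replace each special character.
function escapeHtml(value: unknown): string {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Hypothetical helper mirroring the room-title markup from ensureRoomTile.
function roomTitleHtml(roomId: string): string {
  return `<div class="name">${escapeHtml(roomId)}</div>`;
}

console.log(roomTitleHtml('room-<0>')); // prints <div class="name">room-&lt;0&gt;</div>
```

The log list already uses `textContent`, which needs no escaping; only the template-string paths would call the helper.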

Step 4: Commit

git add tests/soak/dashboard/index.html tests/soak/dashboard/dashboard.css tests/soak/dashboard/dashboard.js
git commit -m "$(cat <<'EOF'
feat(soak): dashboard status grid UI

Static HTML page served by DashboardServer. Renders the 2×2 room
grid with progress bars and player tiles, subscribes to WS events,
updates tiles live. Click-to-watch modal is wired but only receives
frames once the CDP screencaster lands in Tasks 22-23.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 21: Wire `WATCH=dashboard` in runner

Start the dashboard server when --watch=dashboard, auto-open the URL in the user's browser, and pass its reporter() to the scenario as ctx.dashboard.

Files:

Modify: tests/soak/runner.ts

Step 1: Import and instantiate DashboardServer in runner.ts

At the top of runner.ts, add:

import { DashboardServer } from './dashboard/server';
import { spawn } from 'child_process';

Replace the block that creates dashboard with:

  // Build dashboard if requested
  let dashboardServer: DashboardServer | null = null;
  let dashboard: DashboardReporter = noopDashboard();
  if (watch === 'dashboard') {
    const port = Number(config.dashboardPort ?? 7777);
    dashboardServer = new DashboardServer(port, logger, {
      onStartStream: (_key) => {
        logger.info('stream_start_requested', { sessionKey: _key });
        // Wired in Task 23
      },
      onStopStream: (_key) => {
        logger.info('stream_stop_requested', { sessionKey: _key });
      },
    });
    await dashboardServer.start();
    dashboard = dashboardServer.reporter();
    const url = `http://localhost:${port}`;
    console.log(`Dashboard: ${url}`);
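    // Best-effort auto-open
    try {
      // `start` is a cmd.exe builtin, not an executable, so Windows goes through cmd
      const isWin = process.platform === 'win32';
      const opener = process.platform === 'darwin' ? 'open' : isWin ? 'cmd' : 'xdg-open';
      const openerArgs = isWin ? ['/c', 'start', '', url] : [url];
      const child = spawn(opener, openerArgs, { stdio: 'ignore', detached: true });
      // Spawn failures (e.g. xdg-open missing) arrive as an async 'error' event, not a throw
      child.on('error', () => {});
      child.unref();
    } catch {
      // If auto-open fails, the URL is already printed
    }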
  } else if (watch === 'tiled') {
    logger.warn('tiled_not_yet_implemented');
    console.warn('Watch mode "tiled" not yet implemented (Task 24). Falling back to none.');
  }

And in the finally block, shut down the server:

  } finally {
    await pool.release();
    if (dashboardServer) {
      await dashboardServer.stop();
    }
  }

Also remove the earlier if (watch !== 'none') warning block — it's replaced by the dispatch above.
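
If the opener selection grows, it can be isolated in a pure helper so it is unit-testable without spawning anything. A sketch, assuming a hypothetical `openCommandFor` function; on Windows it goes through `cmd /c start` because `start` is a shell builtin rather than an executable, and the empty string fills the window-title argument that `start` expects before the URL:

```typescript
interface OpenCommand {
  command: string;
  args: string[];
}

// Hypothetical helper: map a Node platform string to the command that opens a URL.
function openCommandFor(platform: string, url: string): OpenCommand {
  if (platform === 'darwin') return { command: 'open', args: [url] };
  if (platform === 'win32') return { command: 'cmd', args: ['/c', 'start', '', url] };
  // Everything else is assumed to be a desktop Linux/BSD with xdg-utils installed.
  return { command: 'xdg-open', args: [url] };
}

const openCmd = openCommandFor('linux', 'http://localhost:7777');
console.log(openCmd.command, openCmd.args.join(' ')); // prints xdg-open http://localhost:7777
```

The runner would pass `process.platform` and hand the result to `spawn(command, args, ...)`. A URL containing `&` would need extra quoting under cmd; the localhost dashboard URL does not.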

Step 2: Run smoke against dev with dashboard

TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 --games-per-room=1 --holes=1 \
  --watch=dashboard

Expected:

Dashboard: http://localhost:7777 printed
Browser auto-opens (or you open it manually)
Page shows the dashboard with WS: healthy
During the game, the room-0 tile shows phase: playing, increments moves, updates progress
After game completes, the runner exits 0 and the dashboard stops

Step 3: Commit

git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): wire --watch=dashboard in runner

Starts DashboardServer on 7777 (configurable), uses its reporter as
ctx.dashboard, auto-opens the URL. Cleans up on exit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 6 — Live video click-to-watch

Task 22: CDP screencast module

Attach a CDP session to a given page, start screencasting JPEG frames at a fixed rate, forward each frame to a callback, detach on stop.

Files:

Create: tests/soak/core/screencaster.ts

Step 1: Implement core/screencaster.ts

// tests/soak/core/screencaster.ts
import type { Page, CDPSession } from 'playwright-core';
import type { Logger } from './types';

export interface ScreencastOptions {
  format?: 'jpeg' | 'png';
  quality?: number;
  maxWidth?: number;
  maxHeight?: number;
  everyNthFrame?: number;
}

export type FrameCallback = (jpegBase64: string) => void;

export class Screencaster {
  private sessions = new Map<string, CDPSession>();

  constructor(private logger: Logger) {}

  /**
   * Attach a CDP session to the given page and start forwarding frames.
   * If already streaming, this is a no-op.
   */
  async start(
    sessionKey: string,
    page: Page,
    onFrame: FrameCallback,
    opts: ScreencastOptions = {},
  ): Promise<void> {
    if (this.sessions.has(sessionKey)) {
      this.logger.warn('screencast_already_running', { sessionKey });
      return;
    }
    const client = await page.context().newCDPSession(page);
    this.sessions.set(sessionKey, client);

    client.on('Page.screencastFrame', async (evt: { data: string; sessionId: number }) => {
      try {
        onFrame(evt.data);
        await client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
      } catch (err) {
        this.logger.warn('screencast_frame_error', {
          sessionKey,
          error: err instanceof Error ? err.message : String(err),
        });
      }
    });

    await client.send('Page.startScreencast', {
      format: opts.format ?? 'jpeg',
      quality: opts.quality ?? 60,
      maxWidth: opts.maxWidth ?? 640,
      maxHeight: opts.maxHeight ?? 360,
      everyNthFrame: opts.everyNthFrame ?? 2,
    });
    this.logger.info('screencast_started', { sessionKey });
  }

  async stop(sessionKey: string): Promise<void> {
    const client = this.sessions.get(sessionKey);
    if (!client) return;
    try {
      await client.send('Page.stopScreencast');
      await client.detach();
    } catch (err) {
      this.logger.warn('screencast_stop_error', {
        sessionKey,
        error: err instanceof Error ? err.message : String(err),
      });
    }
    this.sessions.delete(sessionKey);
    this.logger.info('screencast_stopped', { sessionKey });
  }

  async stopAll(): Promise<void> {
    const keys = Array.from(this.sessions.keys());
    await Promise.all(keys.map((k) => this.stop(k)));
  }
}

Step 2: Syntax-check

cd tests/soak
npx tsx -e "import('./core/screencaster').then(() => console.log('ok'))"

Expected: ok.
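
The import check does not exercise the frame loop. The forward-then-ack behavior can be checked without a browser by swapping in a fake CDP client. A sketch, assuming hypothetical `CdpLike`, `FakeCdpSession`, and `forwardFrames` names that mirror only the `on`/`send` calls Screencaster makes:

```typescript
type FrameEvent = { data: string; sessionId: number };

// Hypothetical stand-in for the two CDPSession methods the screencaster relies on.
interface CdpLike {
  on(event: string, handler: (evt: FrameEvent) => void): void;
  send(method: string, params?: Record<string, unknown>): Promise<void>;
}

class FakeCdpSession implements CdpLike {
  sent: Array<{ method: string; params: Record<string, unknown> | undefined }> = [];
  private handlers = new Map<string, (evt: FrameEvent) => void>();

  on(event: string, handler: (evt: FrameEvent) => void): void {
    this.handlers.set(event, handler);
  }

  async send(method: string, params?: Record<string, unknown>): Promise<void> {
    this.sent.push({ method, params }); // recorded synchronously, before any await
  }

  /** Test hook: pretend the browser produced a frame. */
  emitFrame(data: string, sessionId: number): void {
    this.handlers.get('Page.screencastFrame')?.({ data, sessionId });
  }
}

// Same shape as the handler in Screencaster.start(): forward the frame, then ack it.
function forwardFrames(client: CdpLike, onFrame: (jpegBase64: string) => void): void {
  client.on('Page.screencastFrame', (evt) => {
    onFrame(evt.data);
    void client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
  });
}

const fakeCdp = new FakeCdpSession();
const receivedFrames: string[] = [];
forwardFrames(fakeCdp, (frame) => receivedFrames.push(frame));
fakeCdp.emitFrame('AAAA', 7);
console.log(receivedFrames.length, fakeCdp.sent[0]?.method); // prints 1 Page.screencastFrameAck
```

Each ack echoes the `sessionId` from the frame event, which is how the real handler tells the browser which frame was consumed.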

Step 3: Commit

git add tests/soak/core/screencaster.ts
git commit -m "$(cat <<'EOF'
feat(soak): Screencaster — CDP Page.startScreencast wrapper

Attach/detach CDP sessions per Playwright Page, start/stop JPEG
screencasts with configurable quality and frame rate, forward each
frame to a callback. Used by the dashboard for click-to-watch
live video.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 23: Wire screencaster to dashboard click-to-watch

Runner creates a Screencaster, passes callbacks into DashboardServer.onStartStream/onStopStream that look up the right session and start/stop streaming. Each frame is broadcast to the dashboard.

Files:

Modify: tests/soak/runner.ts

Step 1: Import Screencaster and hold a sessions map

In runner.ts, add at the top:

import { Screencaster } from './core/screencaster';

After const sessions = await pool.acquire(accounts);, build a lookup map:

    const sessionsByKey = new Map<string, typeof sessions[number]>();
    for (const s of sessions) sessionsByKey.set(s.key, s);

Create the screencaster before the try block so the finally block can still reach it (it only needs the logger):

    const screencaster = new Screencaster(logger);

Step 2: Replace the onStartStream/onStopStream no-ops with real wiring

The handlers need to close over screencaster and sessionsByKey, but sessionsByKey does not exist until the sessions are acquired. Move the DashboardServer construction to just after sessions = await pool.acquire(accounts), leaving the let dashboardServer and let dashboard declarations above the try block so the finally block can still see them. Then:

    if (watch === 'dashboard') {
      const port = Number(config.dashboardPort ?? 7777);
      dashboardServer = new DashboardServer(port, logger, {
        onStartStream: (key) => {
          const session = sessionsByKey.get(key);
          if (!session) {
            logger.warn('stream_start_unknown_session', { sessionKey: key });
            return;
          }
          screencaster
            .start(key, session.page, (jpegBase64) => {
              dashboardServer!.broadcast({ type: 'frame', sessionKey: key, jpegBase64 });
            })
            .catch((err) =>
              logger.error('screencast_start_failed', {
                key,
                error: err instanceof Error ? err.message : String(err),
              }),
            );
        },
        onStopStream: (key) => {
          screencaster.stop(key).catch(() => {});
        },
        onDisconnect: () => {
          screencaster.stopAll().catch(() => {});
        },
      });
      await dashboardServer.start();
      dashboard = dashboardServer.reporter();
      const url = `http://localhost:${port}`;
      console.log(`Dashboard: ${url}`);
      // ... auto-open
    }

Make sure the ctx.dashboard assignment happens AFTER the dashboard setup (it already does — const ctx = { ... dashboard, ... } comes later).

In the finally block, add this ahead of pool.release() so streams detach while their pages are still open:

    await screencaster.stopAll();

Step 3: Manual test end-to-end

Run a longer populate game so there's time to click:

TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=4 --rooms=1 --cpus-per-room=0 --games-per-room=2 --holes=3 \
  --watch=dashboard

Expected:

Dashboard opens, shows 1 room with 4 players
Click on any player tile (soak_00, soak_01, ...)
Modal opens, shows live JPEG frames of that player's view of the game
Close modal (Esc or Close button) — frames stop, screencast detaches
Run completes cleanly

Step 4: Commit

git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): click-to-watch live video via CDP screencast

Runner creates a Screencaster and wires its start/stop into
DashboardServer.onStartStream/onStopStream. Clicking a player tile
in the dashboard starts a CDP screencast on that session's page,
forwards JPEG frames as WS "frame" messages, closes on modal
dismiss or WS disconnect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 7 — Tiled mode

Task 24: `--watch=tiled` native windows

Launch a second headed browser for the 4 host contexts, position their windows in a 2×2 grid using page.evaluate(window.moveTo).

Files:

Modify: tests/soak/core/session-pool.ts — add optional headed-host support
Modify: tests/soak/runner.ts — enable tiled mode
Step 1: Extend SessionPool to support headed host contexts

Add a new option and method to SessionPool. In core/session-pool.ts:

export interface SessionPoolOptions {
  targetUrl: string;
  inviteCode: string;
  credFile: string;
  logger: Logger;
  browser?: Browser;
  contextOptions?: Parameters<Browser['newContext']>[0];
  /** If set, the first `headedHostCount` sessions use a separate headed browser. */
  headedHostCount?: number;
}

Inside the class, add a headedBrowser field and extend acquire:

  private headedBrowser: Browser | null = null;

  // ... in acquire(), before the loop:

  if ((this.opts.headedHostCount ?? 0) > 0 && !this.headedBrowser) {
    this.headedBrowser = await chromium.launch({
      headless: false,
      slowMo: 50,
    });
  }

  for (let i = 0; i < count; i++) {
    const account = this.accounts[i];
    const useHeaded = i < (this.opts.headedHostCount ?? 0);
    const targetBrowser = useHeaded ? this.headedBrowser! : this.browser!;
    const context = await targetBrowser.newContext({
      ...this.opts.contextOptions,
      ...(useHeaded ? { viewport: { width: 960, height: 540 } } : {}),
    });
    await this.injectAuth(context, account);
    const page = await context.newPage();
    await page.goto(this.opts.targetUrl);

    // Position headed windows in a 2×2 grid
    if (useHeaded) {
      const col = i % 2;
      const row = Math.floor(i / 2);
      const x = col * 960;
      const y = row * 560;
      await page.evaluate(
        ([x, y, w, h]) => {
          window.moveTo(x, y);
          window.resizeTo(w, h);
        },
        [x, y, 960, 540] as [number, number, number, number],
      );
    }

    const bot = new GolfBot(page);
    sessions.push({ account, context, page, bot, key: account.key });
  }
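
The grid arithmetic in the loop above can be pulled into a small pure function, which makes the layout easy to unit-test without launching a browser. This is only a sketch: tileRect, TileRect, and the rowGap parameter are hypothetical names and not part of SessionPool; the defaults mirror the 960×540 viewport and the 560-pixel row stride used above.

```typescript
// Hypothetical helper (not part of SessionPool): computes where a headed
// host window should go in a grid of `cols` columns.
interface TileRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function tileRect(
  index: number,
  cols = 2,
  width = 960,
  height = 540,
  rowGap = 20, // assumed allowance for the native title bar between rows
): TileRect {
  if (!Number.isInteger(index) || index < 0) {
    throw new RangeError(`index must be a non-negative integer, got ${index}`);
  }
  if (!Number.isInteger(cols) || cols < 1) {
    throw new RangeError(`cols must be a positive integer, got ${cols}`);
  }
  const col = index % cols;
  const row = Math.floor(index / cols);
  return { x: col * width, y: row * (height + rowGap), width, height };
}
```

With the defaults, tiles 0 and 1 share the top row and tiles 2 and 3 sit one row stride (560 px) lower, which matches the inline arithmetic in acquire().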

Update release to close the headed browser too:

  async release(): Promise<void> {
    for (const session of this.activeSessions) {
      try { await session.context.close(); } catch { /* ignore */ }
    }
    this.activeSessions = [];
    if (this.ownedBrowser) {
      try { await this.ownedBrowser.close(); } catch { /* ignore */ }
      this.ownedBrowser = null;
      this.browser = null;
    }
    if (this.headedBrowser) {
      try { await this.headedBrowser.close(); } catch { /* ignore */ }
      this.headedBrowser = null;
    }
  }

Step 2: Wire watch === 'tiled' in the runner

In runner.ts, replace the existing tiled_not_yet_implemented warning with:

  const headedHostCount = watch === 'tiled' ? rooms : 0;

  const pool = new SessionPool({
    targetUrl,
    inviteCode,
    credFile,
    logger,
    headedHostCount,
  });

(Move that pool creation up so it's aware of watch.)

Step 3: Test tiled mode

TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=4 --rooms=2 --cpus-per-room=0 --games-per-room=1 --holes=1 \
  --watch=tiled

Expected: 2 native Chromium windows appear (one per host), each sized ~960×540 and placed side by side along the top of the screen (at x=0 and x=960). They play the game visibly. On exit, the windows close.

Step 4: Commit

git add tests/soak/core/session-pool.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): --watch=tiled launches N headed host windows

SessionPool accepts headedHostCount; when > 0 it launches a second
Chromium in headed mode, creates those contexts there, and positions
each host window in a 2×2 grid via window.moveTo/resizeTo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 8 — Stress scenario

Task 25: Chaos injector + stress scenario

Short 1-hole games in tight loops, with a background chaos loop that gives each session a 5% chance every 500 ms of injecting a chaos event (rapid clicks, tab blur, brief offline toggle).

Files:

Create: tests/soak/scenarios/stress.ts
Create: tests/soak/scenarios/shared/chaos.ts
Modify: tests/soak/scenarios/index.ts — register stress
Step 1: Create scenarios/shared/chaos.ts

// tests/soak/scenarios/shared/chaos.ts
import type { Session, Logger } from '../../core/types';

export type ChaosEvent =
  | 'rapid_clicks'
  | 'tab_blur'
  | 'brief_offline';

const ALL_EVENTS: ChaosEvent[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];

function pickEvent(): ChaosEvent {
  return ALL_EVENTS[Math.floor(Math.random() * ALL_EVENTS.length)];
}

export async function maybeInjectChaos(
  session: Session,
  probability: number,
  logger: Logger,
  roomId: string,
): Promise<ChaosEvent | null> {
  if (Math.random() >= probability) return null;

  const event = pickEvent();
  logger.info('chaos_injected', { room: roomId, session: session.key, event });
  try {
    switch (event) {
      case 'rapid_clicks': {
        // Fire 5 rapid clicks at the player's own cards
        for (let i = 0; i < 5; i++) {
          await session.page.locator(`#player-cards .card:nth-child(${(i % 6) + 1})`)
            .click({ timeout: 300 })
            .catch(() => {});
        }
        break;
      }
      case 'tab_blur': {
        // Briefly dispatch blur then focus
        await session.page.evaluate(() => {
          window.dispatchEvent(new Event('blur'));
          setTimeout(() => window.dispatchEvent(new Event('focus')), 200);
        });
        break;
      }
      case 'brief_offline': {
        await session.context.setOffline(true);
        await new Promise((r) => setTimeout(r, 300));
        await session.context.setOffline(false);
        break;
      }
    }
  } catch (err) {
    logger.warn('chaos_error', {
      event,
      error: err instanceof Error ? err.message : String(err),
    });
  }
  return event;
}
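
To sanity-check the chaosChance setting, it helps to estimate how many chaos events a game should produce. The sketch below assumes the loop ticks once every tickMs and rolls once per session per tick; expectedChaosEvents is a hypothetical helper, not part of the harness. Because each injected event also takes time to run, the real tick period is somewhat longer, so treat the result as an upper bound.

```typescript
// Hypothetical estimator (not part of the harness): expected number of chaos
// events for one game, assuming one roll per session per tick.
function expectedChaosEvents(
  sessionCount: number,
  probability: number,
  gameDurationMs: number,
  tickMs = 500,
): number {
  if (probability < 0 || probability > 1) {
    throw new RangeError(`probability must be in [0, 1], got ${probability}`);
  }
  if (sessionCount < 0 || gameDurationMs < 0 || tickMs <= 0) {
    throw new RangeError('sessionCount and gameDurationMs must be >= 0, tickMs > 0');
  }
  // Only whole ticks get a chance to roll.
  const ticks = Math.floor(gameDurationMs / tickMs);
  return ticks * sessionCount * probability;
}
```

For example, 4 sessions at a 5% chance over a 30-second game give 60 ticks × 4 sessions × 0.05, or about 12 events. If the smoke run logs far fewer chaos_injected lines than that, the games are probably finishing faster than expected.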

Step 2: Create scenarios/stress.ts

// tests/soak/scenarios/stress.ts
import type {
  Scenario,
  ScenarioContext,
  ScenarioResult,
  ScenarioError,
  Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';
import { maybeInjectChaos } from './shared/chaos';

interface StressConfig {
  gamesPerRoom: number;
  holes: number;
  decks: number;
  rooms: number;
  cpusPerRoom: number;
  thinkTimeMs: [number, number];
  interGamePauseMs: number;
  chaosChance: number;
}

function chunk<T>(arr: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
  return out;
}

async function sleep(ms: number): Promise<void> {
  return new Promise((r) => setTimeout(r, ms));
}

async function runStressRoom(
  ctx: ScenarioContext,
  cfg: StressConfig,
  roomIdx: number,
  sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[]; chaosFired: number }> {
  const roomId = `room-${roomIdx}`;
  let completed = 0;
  let chaosFired = 0;
  const errors: ScenarioError[] = [];

  for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
    if (ctx.signal.aborted) break;

    ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });

    // Start a background chaos loop for this game
    let chaosActive = true;
    const chaosLoop = (async () => {
      while (chaosActive && !ctx.signal.aborted) {
        await sleep(500);
        for (const session of sessions) {
          const e = await maybeInjectChaos(session, cfg.chaosChance, ctx.logger, roomId);
          if (e) chaosFired++;
        }
      }
    })();

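    // Stop the chaos loop even if the game throws, so it cannot outlive the room.
    let result: Awaited<ReturnType<typeof runOneMultiplayerGame>>;
    try {
      result = await runOneMultiplayerGame(ctx, sessions, {
        roomId,
        holes: cfg.holes,
        decks: cfg.decks,
        cpusPerRoom: cfg.cpusPerRoom,
        thinkTimeMs: cfg.thinkTimeMs,
      });
    } finally {
      chaosActive = false;
      await chaosLoop;
    }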

    if (result.completed) {
      completed++;
      ctx.logger.info('game_complete', { room: roomId, game: gameNum + 1, turns: result.turns });
    } else {
      errors.push({
        room: roomId,
        reason: 'game_failed',
        detail: result.error,
        timestamp: Date.now(),
      });
      ctx.logger.error('game_failed', { room: roomId, error: result.error });
    }

    await sleep(cfg.interGamePauseMs);
  }

  return { completed, errors, chaosFired };
}

const stress: Scenario = {
  name: 'stress',
  description: 'Rapid short games for stability & race condition hunting',
  needs: { accounts: 16, rooms: 4, cpusPerRoom: 2 },
  defaultConfig: {
    gamesPerRoom: 50,
    holes: 1,
    decks: 1,
    rooms: 4,
    cpusPerRoom: 2,
    thinkTimeMs: [50, 150],
    interGamePauseMs: 200,
    chaosChance: 0.05,
  },

  async run(ctx: ScenarioContext): Promise<ScenarioResult> {
    const start = Date.now();
    const cfg = ctx.config as unknown as StressConfig;
    const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
    const roomSessions = chunk(ctx.sessions, perRoom);

    const results = await Promise.allSettled(
      roomSessions.map((s, idx) => runStressRoom(ctx, cfg, idx, s)),
    );

    let gamesCompleted = 0;
    let chaosFired = 0;
    const errors: ScenarioError[] = [];
    results.forEach((r, idx) => {
      if (r.status === 'fulfilled') {
        gamesCompleted += r.value.completed;
        chaosFired += r.value.chaosFired;
        errors.push(...r.value.errors);
      } else {
        errors.push({
          room: `room-${idx}`,
          reason: 'room_threw',
          detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
          timestamp: Date.now(),
        });
      }
    });

    return {
      gamesCompleted,
      errors,
      durationMs: Date.now() - start,
      customMetrics: { chaos_fired: chaosFired },
    };
  },
};

export default stress;
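
The chunk() call in run() silently produces an extra room when the session count does not divide evenly, and it never terminates if perRoom works out to 0. A stricter partition could look like the sketch below; partitionIntoRooms is a hypothetical name, and the plan may already enforce the "accounts / rooms must divide evenly" rule elsewhere (for example in config validation).

```typescript
// Hypothetical replacement for chunk(): split sessions into exactly `rooms`
// equal groups and fail loudly instead of producing a ragged last room.
function partitionIntoRooms<T>(sessions: readonly T[], rooms: number): T[][] {
  if (!Number.isInteger(rooms) || rooms < 1) {
    throw new RangeError(`rooms must be a positive integer, got ${rooms}`);
  }
  if (sessions.length === 0 || sessions.length % rooms !== 0) {
    throw new RangeError(
      `${sessions.length} sessions cannot be split evenly across ${rooms} rooms`,
    );
  }
  const perRoom = sessions.length / rooms;
  const out: T[][] = [];
  for (let i = 0; i < sessions.length; i += perRoom) {
    out.push(sessions.slice(i, i + perRoom));
  }
  return out;
}
```

Failing fast here keeps a misconfigured run from starting rooms with too few players, which would otherwise surface later as a confusing game-level error.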

Step 3: Register stress in the registry

Edit tests/soak/scenarios/index.ts:

import type { Scenario } from '../core/types';
import populate from './populate';
import stress from './stress';

const registry: Record<string, Scenario> = {
  populate,
  stress,
};

export function getScenario(name: string): Scenario | undefined {
  return registry[name];
}

export function listScenarios(): Scenario[] {
  return Object.values(registry);
}

Step 4: Smoke test stress scenario

TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --cpus-per-room=1 --games-per-room=3 --holes=1 \
  --watch=none

Expected: 3 quick games complete, chaos events in logs (look for chaos_injected), exit 0.

Step 5: Commit

git add tests/soak/scenarios/stress.ts tests/soak/scenarios/shared/chaos.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): stress scenario with chaos injection

Rapid 1-hole games with a parallel chaos loop that, every 500 ms,
gives each session a 5% chance of firing a rapid_clicks, tab_blur,
or brief_offline event. Chaos counts roll up into
ScenarioResult.customMetrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 9 — Failure handling

Task 26: Watchdog + heartbeat wiring

Per-room timeout that fires if no heartbeat arrives within N ms. Runner wires it into ctx.heartbeat. Vitest-tested.

Files:

Create: tests/soak/core/watchdog.ts
Create: tests/soak/tests/watchdog.test.ts
Modify: tests/soak/runner.ts — wire heartbeat to per-room watchdogs
Step 1: Write failing tests

// tests/soak/tests/watchdog.test.ts
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { Watchdog } from '../core/watchdog';

describe('Watchdog', () => {
  beforeEach(() => vi.useFakeTimers());
  afterEach(() => vi.useRealTimers());

  it('fires after timeout if no heartbeat', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    vi.advanceTimersByTime(1001);
    expect(onTimeout).toHaveBeenCalledOnce();
  });

  it('heartbeat resets the timer', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    vi.advanceTimersByTime(800);
    w.heartbeat();
    vi.advanceTimersByTime(800);
    expect(onTimeout).not.toHaveBeenCalled();
    vi.advanceTimersByTime(300);
    expect(onTimeout).toHaveBeenCalledOnce();
  });

  it('stop cancels pending timeout', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    w.stop();
    vi.advanceTimersByTime(2000);
    expect(onTimeout).not.toHaveBeenCalled();
  });

  it('does not fire again on a heartbeat after it has fired', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    vi.advanceTimersByTime(1001);
    w.heartbeat();
    vi.advanceTimersByTime(1001);
    expect(onTimeout).toHaveBeenCalledOnce();
  });
});

Step 2: Run to verify failure

npx vitest run tests/watchdog.test.ts

Expected: FAIL, because core/watchdog.ts does not exist yet.

Step 3: Implement core/watchdog.ts

// tests/soak/core/watchdog.ts
export class Watchdog {
  private timer: NodeJS.Timeout | null = null;
  private fired = false;

  constructor(
    private timeoutMs: number,
    private onTimeout: () => void,
  ) {}

  start(): void {
    this.stop();
    this.fired = false;
    this.timer = setTimeout(() => {
      if (this.fired) return;
      this.fired = true;
      this.onTimeout();
    }, this.timeoutMs);
  }

  heartbeat(): void {
    if (this.fired) return;
    this.start();
  }

  stop(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}

Step 4: Verify tests pass

npx vitest run tests/watchdog.test.ts

Expected: all passing.

Step 5: Wire watchdogs into the runner

In runner.ts, add before building ctx:

    const watchdogs = new Map<string, Watchdog>();
    const roomAborters = new Map<string, AbortController>();
    for (let i = 0; i < rooms; i++) {
      const roomId = `room-${i}`;
      const aborter = new AbortController();
      roomAborters.set(roomId, aborter);
      const w = new Watchdog(60_000, () => {
        logger.error('watchdog_fired', { room: roomId });
        aborter.abort();
        dashboard.update(roomId, { phase: 'error' });
      });
      w.start();
      watchdogs.set(roomId, w);
    }

Import at the top:

import { Watchdog } from './core/watchdog';

Set ctx.heartbeat to:

      heartbeat: (roomId: string) => {
        const w = watchdogs.get(roomId);
        if (w) w.heartbeat();
      },

In the finally block, stop all watchdogs:

    for (const w of watchdogs.values()) w.stop();

Note: for now the roomAborters aren't plumbed into scenario cancellation: scenarios see only the global ctx.signal, so a watchdog firing does not stop the stuck room. This is intentional; per-room abort requires scenario-side awareness and is deferred until a scenario genuinely misbehaves. The watchdog still detects stuck rooms, logs watchdog_fired, and marks that room's dashboard tile as errored.
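
If per-room abort is picked up later, one way to plumb it is to hand each room a signal that fires when either the global signal or that room's aborter fires. The sketch below is an assumption about how that could look, not part of the plan; linkSignals is a hypothetical helper. Newer runtimes also ship a built-in AbortSignal.any that does the same job.

```typescript
// Hypothetical helper: returns a signal that aborts as soon as any of the
// given signals aborts (or immediately, if one is already aborted).
function linkSignals(...signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (signal.aborted) {
      controller.abort();
      break;
    }
    // `once` removes the listener after it fires; listeners on signals that
    // never abort stay registered for the lifetime of those signals.
    signal.addEventListener('abort', () => controller.abort(), { once: true });
  }
  return controller.signal;
}
```

A scenario would then check the linked signal instead of ctx.signal inside its per-room loop, so a watchdog firing for room-2 stops only room-2 while Ctrl-C still stops every room.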

Step 6: Commit

git add tests/soak/core/watchdog.ts tests/soak/tests/watchdog.test.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): per-room watchdog with heartbeat

Watchdog class with Vitest tests, wired into ctx.heartbeat in the
runner. One watchdog per room, 60s timeout; firing logs an error
and marks the room's dashboard tile as errored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 27: Artifact capture on failure

When the runner catches an error, snapshot every session's page: screenshot, HTML, console log tail, game state JSON.

Files:

Create: tests/soak/core/artifacts.ts
Modify: tests/soak/runner.ts — call captureArtifacts in the catch block
Step 1: Implement core/artifacts.ts

// tests/soak/core/artifacts.ts
import * as fs from 'fs';
import * as path from 'path';
import type { Session, Logger } from './types';

export interface ArtifactsOptions {
  runId: string;
  /** Absolute path to the artifacts root, e.g., /path/to/tests/soak/artifacts */
  rootDir: string;
  logger: Logger;
}

export class Artifacts {
  readonly runDir: string;

  constructor(private opts: ArtifactsOptions) {
    this.runDir = path.join(opts.rootDir, opts.runId);
    fs.mkdirSync(this.runDir, { recursive: true });
  }

  /** Capture everything for a single session. */
  async captureSession(session: Session, roomId: string): Promise<void> {
    const dir = path.join(this.runDir, roomId);
    fs.mkdirSync(dir, { recursive: true });
    const prefix = session.key;

    try {
      const png = await session.page.screenshot({ fullPage: true });
      fs.writeFileSync(path.join(dir, `${prefix}.png`), png);
    } catch (err) {
      this.opts.logger.warn('artifact_screenshot_failed', {
        session: session.key,
        error: err instanceof Error ? err.message : String(err),
      });
    }

    try {
      const html = await session.page.content();
      fs.writeFileSync(path.join(dir, `${prefix}.html`), html);
    } catch (err) {
      this.opts.logger.warn('artifact_html_failed', {
        session: session.key,
        error: err instanceof Error ? err.message : String(err),
      });
    }

    try {
      const state = await session.bot.getGameState();
      fs.writeFileSync(
        path.join(dir, `${prefix}.state.json`),
        JSON.stringify(state, null, 2),
      );
    } catch (err) {
      this.opts.logger.warn('artifact_state_failed', {
        session: session.key,
        error: err instanceof Error ? err.message : String(err),
      });
    }

    try {
      const errors = session.bot.getConsoleErrors?.() ?? [];
      fs.writeFileSync(path.join(dir, `${prefix}.console.txt`), errors.join('\n'));
    } catch {
      // ignore — not all bots expose this
    }
  }

  async captureAll(sessions: Session[]): Promise<void> {
    // Best-effort: the caller has no session-to-room mapping here, so everything
    // is written under room-unknown/. Callers that know the room should call
    // captureSession() directly.
    await Promise.all(
      sessions.map((s) => this.captureSession(s, 'room-unknown')),
    );
  }

  writeSummary(summary: object): void {
    fs.writeFileSync(
      path.join(this.runDir, 'summary.json'),
      JSON.stringify(summary, null, 2),
    );
  }
}

/** Prune run directories older than `maxAgeMs`. */
export function pruneOldRuns(rootDir: string, maxAgeMs: number, logger: Logger): void {
  if (!fs.existsSync(rootDir)) return;
  const now = Date.now();
  for (const entry of fs.readdirSync(rootDir)) {
    const full = path.join(rootDir, entry);
    try {
      const stat = fs.statSync(full);
      if (stat.isDirectory() && now - stat.mtimeMs > maxAgeMs) {
        fs.rmSync(full, { recursive: true, force: true });
        logger.info('artifact_pruned', { runId: entry });
      }
    } catch {
      // ignore
    }
  }
}

Step 2: Call artifact capture from the runner's error path

In runner.ts, import:

import { Artifacts, pruneOldRuns } from './core/artifacts';

After const runId = ..., instantiate and prune:

  const artifactsRoot = path.resolve(__dirname, 'artifacts');
  const artifacts = new Artifacts({ runId, rootDir: artifactsRoot, logger });
  pruneOldRuns(artifactsRoot, 7 * 24 * 3600 * 1000, logger);
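
The retention window passed to pruneOldRuns is 7 × 24 × 3600 × 1000 = 604,800,000 ms. The staleness rule it applies can be sketched as a pure function; SEVEN_DAYS_MS and isStale are hypothetical names used only for illustration. Note the strict comparison: a directory exactly 7 days old is kept.

```typescript
// Hypothetical names for illustration; pruneOldRuns inlines this logic.
const SEVEN_DAYS_MS = 7 * 24 * 3600 * 1000; // 604_800_000

/** True when a directory last modified at `mtimeMs` is older than `maxAgeMs`. */
function isStale(mtimeMs: number, nowMs: number, maxAgeMs: number): boolean {
  return nowMs - mtimeMs > maxAgeMs;
}
```

Because the check uses mtime, a run directory that receives new files (for example a late summary.json) has its age reset.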

In the catch (err) block, after logging, capture:

  } catch (err) {
    logger.error('run_failed', {
      error: err instanceof Error ? err.message : String(err),
      stack: err instanceof Error ? err.stack : undefined,
    });
    try {
      const liveSessions = pool['activeSessions'] as Session[] | undefined;
      if (liveSessions && liveSessions.length > 0) {
        await artifacts.captureAll(liveSessions);
      }
    } catch (captureErr) {
      logger.warn('artifact_capture_failed', {
        error: captureErr instanceof Error ? captureErr.message : String(captureErr),
      });
    }
    exitCode = 1;
  }

(Note: the pool['activeSessions'] access bypasses visibility to avoid adding a public getter for one call site. Acceptable for an error path in a test harness.)

After successful run, write the summary:

    artifacts.writeSummary({
      runId,
      scenario: scenario.name,
      targetUrl,
      gamesCompleted: result.gamesCompleted,
      errors: result.errors,
      durationMs: result.durationMs,
      customMetrics: result.customMetrics,
    });

Import Session type:

import type { Session } from './core/types';

Step 3: Verify by forcing a failure

Kill the server mid-run and confirm artifacts are written:

# In one terminal
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=5 --holes=3 --watch=none

# In another: wait ~3 seconds then Ctrl-C the dev server
# The soak run should catch errors and write artifacts

ls tests/soak/artifacts/
ls tests/soak/artifacts/<run-id>/

Expected: a run directory exists with summary.json (if it got far enough) or per-session screenshots / HTML under room-unknown/.

Step 4: Commit

git add tests/soak/core/artifacts.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): artifact capture on failure + run summary

Screenshots, HTML, game state, and console errors are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Runs older
than 7 days are pruned on startup. Successful runs get a
summary.json inside the same run directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 28: Graceful shutdown (already partially in place) + exit codes

SIGINT/SIGTERM already flip the abort controller. Formalize the timeout-and-force-exit path and the three exit codes (0 success / 1 errors / 2 interrupted); a forced exit (second signal or cleanup timeout) uses 130.

Files:

Modify: tests/soak/runner.ts
Step 1: Add a graceful shutdown timeout

In runner.ts, replace the existing signal handlers with:

  let forceExitTimer: NodeJS.Timeout | null = null;
  const onSignal = (sig: string) => {
    if (abortController.signal.aborted) {
      // Second signal: force exit
      logger.warn('force_exit', { signal: sig });
      process.exit(130);
    }
    logger.warn('signal_received', { signal: sig });
    abortController.abort();
    // Hard-kill after 10s if cleanup hangs
    forceExitTimer = setTimeout(() => {
      logger.error('graceful_shutdown_timeout');
      process.exit(130);
    }, 10_000);
  };
  process.on('SIGINT', () => onSignal('SIGINT'));
  process.on('SIGTERM', () => onSignal('SIGTERM'));

In the finally block, clear the force-exit timer:

    if (forceExitTimer) clearTimeout(forceExitTimer);
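
The handlers above cover the forced-exit path (130) but do not show how the runner picks between 0, 1, and 2. One possible mapping is sketched below. RunOutcome and resolveExitCode are hypothetical names, and the precedence (an operator interrupt wins over errors) is an assumption. Here interrupted means a SIGINT/SIGTERM was received, not merely that the abort signal fired, since the health probe also aborts the run and that case is expected to exit 1.

```typescript
// Hypothetical sketch of the exit-code mapping; names and precedence are
// assumptions, not taken from the runner.
interface RunOutcome {
  /** True only when the operator sent SIGINT/SIGTERM. */
  interrupted: boolean;
  /** Number of scenario or runner errors recorded during the run. */
  errorCount: number;
}

function resolveExitCode(outcome: RunOutcome): 0 | 1 | 2 {
  if (outcome.interrupted) return 2; // interrupted by the operator
  return outcome.errorCount > 0 ? 1 : 0; // 1 = errors, 0 = clean run
}
```

The runner would track the interrupted flag inside onSignal and call process.exit(resolveExitCode(...)) after the finally block has finished cleanup.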

Step 2: Manual test — Ctrl-C a long run

TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=10 --holes=3 --watch=none

# After ~5 seconds: Ctrl-C

Expected: runner logs signal_received, finishes current turn, prints summary, exits with code 2 (check echo $?).

Step 3: Commit

git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): graceful shutdown with 10s hard-kill fallback

SIGINT/SIGTERM flips the abort signal; scenarios finish the current
turn then exit. If cleanup hangs >10s the runner force-exits. Second
Ctrl-C is an immediate hard kill. Exit codes: 0 success, 1 errors,
2 interrupted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 29: Periodic health probes

Every 30s, fetch /api/health on the target server. Three consecutive failures declare a fatal error and abort.

Files:

Modify: tests/soak/runner.ts
Step 1: Add a health probe interval

In runner.ts, after building the abort controller and before running the scenario:

  let healthFailures = 0;
  const healthTimer = setInterval(async () => {
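    try {
      // Time-box each probe so a hung server counts as a failure.
      const res = await fetch(`${targetUrl}/api/health`, {
        signal: AbortSignal.timeout(10_000),
      });
      if (!res.ok) throw new Error(`status ${res.status}`);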
      healthFailures = 0;
    } catch (err) {
      healthFailures++;
      logger.warn('health_probe_failed', {
        consecutive: healthFailures,
        error: err instanceof Error ? err.message : String(err),
      });
      if (healthFailures >= 3) {
        logger.error('health_fatal', { consecutive: healthFailures });
        abortController.abort();
      }
    }
  }, 30_000);

In the finally block:

    clearInterval(healthTimer);
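
The "three consecutive failures" rule is easy to get subtly wrong (for example by never resetting the counter), so it can be worth isolating it from the timer and the fetch call. The sketch below is one way to do that; ConsecutiveFailureTracker is a hypothetical class and not part of the runner.

```typescript
// Hypothetical helper: tracks consecutive probe failures independently of
// how the probe itself is performed.
class ConsecutiveFailureTracker {
  private failures = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    if (!Number.isInteger(threshold) || threshold < 1) {
      throw new RangeError(`threshold must be a positive integer, got ${threshold}`);
    }
    this.threshold = threshold;
  }

  /** Record one probe result; returns true once the threshold is reached. */
  record(ok: boolean): boolean {
    this.failures = ok ? 0 : this.failures + 1;
    return this.failures >= this.threshold;
  }

  get consecutive(): number {
    return this.failures;
  }
}
```

In the interval callback, the success branch would call record(true) and the catch branch would abort the run when record(false) returns true. A single successful probe resets the count, so two failures, a success, and two more failures do not trip a threshold of 3.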

Step 2: Commit

git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): periodic health probes against target server

Every 30s GET /api/health. Three consecutive failures abort the
run with a fatal error, so staging outages don't get misattributed
to harness bugs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Phase 10 — Polish and bring-up

Task 30: Smoke test script

tests/soak/scripts/smoke.sh — the canary run that takes ~30s against local dev.

Files:

Create: tests/soak/scripts/smoke.sh
Step 1: Create the script

#!/usr/bin/env bash
# Soak harness smoke test — end-to-end canary against local dev.
# Expected runtime: ~30 seconds.
set -euo pipefail

cd "$(dirname "$0")/.."

: "${TEST_URL:=http://localhost:8000}"
: "${SOAK_INVITE_CODE:=SOAKTEST}"

echo "Smoke target: $TEST_URL"
echo "Invite code:  $SOAK_INVITE_CODE"

# 1. Health probe
curl -fsS "$TEST_URL/api/health" > /dev/null || {
  echo "FAIL: target server unreachable at $TEST_URL"
  exit 1
}

# 2. Ensure minimum accounts
if [ ! -f .env.stresstest ]; then
  echo "Seeding accounts..."
  npm run seed -- --count=4
fi

# 3. Run minimum viable scenario
TEST_URL="$TEST_URL" SOAK_INVITE_CODE="$SOAK_INVITE_CODE" \
  npm run soak -- \
    --scenario=populate \
    --accounts=2 \
    --rooms=1 \
    --cpus-per-room=0 \
    --games-per-room=1 \
    --holes=1 \
    --watch=none

echo "Smoke PASSED"

Step 2: Make it executable and run it

chmod +x tests/soak/scripts/smoke.sh
cd tests/soak && bash scripts/smoke.sh

Expected: Smoke PASSED within ~30s.

Step 3: Commit

git add tests/soak/scripts/smoke.sh
git commit -m "$(cat <<'EOF'
feat(soak): smoke test script — 30s end-to-end canary

Confirms the harness works against local dev with the absolute
minimum config. Run after any change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 31: README + CHECKLIST

Replace the README stub with a full quickstart and flag reference. Add the manual validation checklist.

Files:

Modify: tests/soak/README.md
Create: tests/soak/CHECKLIST.md
Step 1: Rewrite tests/soak/README.md

# Golf Soak & UX Test Harness

Standalone Playwright-based runner that drives multi-user authenticated
game sessions for scoreboard population and stability testing.

**Spec:** `../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `../../docs/soak-harness-bringup.md`

## Quick start

```bash
cd tests/soak
npm install

# First run only: seed 16 accounts
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed

# 30-second end-to-end smoke test
bash scripts/smoke.sh

# Populate scoreboard (4 rooms × 4 accounts × 10 long games)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
  npm run soak:populate

# Stress test (4 rooms × 50 rapid games with chaos)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
  npm run soak:stress

```

## CLI flags

```
--scenario=populate|stress    required
--accounts=<n>                total sessions (default: scenario.needs.accounts)
--rooms=<n>                   default from scenario.needs
--cpus-per-room=<n>           default from scenario.needs
--games-per-room=<n>          default from scenario.defaultConfig
--holes=<n>                   default from scenario.defaultConfig
--watch=none|dashboard|tiled  default: dashboard
--dashboard-port=<n>          default: 7777
--target=<url>                default: TEST_URL env
--run-id=<string>             default: ISO timestamp
--list                        print scenarios and exit
--dry-run                     validate config, don't run
```

Derived: accounts / rooms must divide evenly.

## Environment variables

```
TEST_URL             target base URL (e.g. https://staging.adlee.work)
SOAK_INVITE_CODE     invite code flagged marks_as_test (staging: 5VC2MCCN)
SOAK_HOLES           override --holes
SOAK_ROOMS           override --rooms
SOAK_ACCOUNTS        override --accounts
SOAK_CPUS_PER_ROOM   override --cpus-per-room
SOAK_GAMES_PER_ROOM  override --games-per-room
SOAK_WATCH           override --watch
SOAK_DASHBOARD_PORT  override --dashboard-port
```

## Watch modes

- `none`: pure headless, JSON logs to stdout. Use for CI and overnight runs.
- `dashboard` (default): HTTP+WS server on localhost:7777 serving a live status grid. Click any player tile to watch their live session via CDP screencast.
- `tiled`: 4 native Chromium windows for the host of each room, positioned in a 2×2 grid. Joiners stay headless.

## Scenarios

| Name | Description |
|------|-------------|
| `populate` | Long 9-hole games with varied CPU personalities, realistic pacing, for populating scoreboards |
| `stress` | Rapid 1-hole games with chaos injection (rapid clicks, offline toggles, tab blur) for hunting race conditions |

Add new scenarios by creating `scenarios/<name>.ts` and registering in `scenarios/index.ts`.

## Architecture

See the design spec for full module breakdown. Key modules:

- `runner.ts`: CLI entry, wires everything together
- `core/session-pool.ts`: owns browser contexts, seeds/logs in 16 accounts
- `core/room-coordinator.ts`: host→joiners room-code handoff
- `core/watchdog.ts`: per-room timeout detection
- `core/screencaster.ts`: CDP `Page.startScreencast` for live video
- `dashboard/server.ts`: HTTP + WS server
- `scenarios/`: pluggable scenarios

Reuses `../../tests/e2e/bot/golf-bot.ts` unchanged.

## Running tests (unit)

```bash
npm test
```

Tests cover Deferred, RoomCoordinator, Watchdog, and config. Integration-level modules are verified by the smoke test.

Step 2: Create tests/soak/CHECKLIST.md

# Soak Harness Manual Validation Checklist

Run after any significant change or before calling the implementation complete.

## Bring-up

- [ ] Local dev server is running (`python server/main.py`)
- [ ] `SOAKTEST` invite code exists locally with `marks_as_test=TRUE`
- [ ] `npm install` in `tests/soak/` succeeded
- [ ] `npm run seed -- --count=16` creates/updates 16 accounts
- [ ] `.env.stresstest` has 16 `SOAK_ACCOUNT_NN=...` lines
- [ ] All seeded users show `is_test_account=TRUE` in the DB

## Smoke

- [ ] `bash scripts/smoke.sh` exits 0 within 60s

## Scenarios

- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- [ ] `--scenario=populate --rooms=4 --games-per-room=1` runs 4 rooms in parallel with no cross-contamination
- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events

## Watch modes

- [ ] `--watch=none` produces JSONL on stdout, nothing else
- [ ] `--watch=dashboard` opens http://localhost:7777, grid renders, tiles update live, WS status shows `healthy`
- [ ] Clicking any player tile opens the video modal and streams live JPEG frames (~10 fps)
- [ ] Closing the modal stops the screencast (check logs for `screencast_stopped`)
- [ ] `--watch=tiled` opens 4 native Chromium windows for the 4 hosts

## Failure modes

- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
- [ ] Double Ctrl-C → hard exit (130)
- [ ] Killing the dev server mid-run → health probes fail 3× → fatal abort, artifacts captured, exit 1
- [ ] Artifacts directory contains a subdirectory per failed run with screenshots and state.json
- [ ] Artifacts older than 7 days are pruned on next startup

## Server-side filtering

- [ ] `GET /api/stats/leaderboard` (default) hides soak_* accounts
- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak_* accounts
- [ ] Admin panel user list shows `[Test]` badge on soak_* accounts
- [ ] Admin panel "Include test accounts" checkbox toggles their visibility (hidden when unchecked)
- [ ] Admin panel invite codes tab shows `[Test-seed]` next to SOAKTEST

## Post-deploy schema verification

Run after the server-side changes (Tasks 1–7) ship to each environment.

- [ ] Server restarted (`docker compose up -d` or CI/CD deploy)
- [ ] Server logs show `User store schema initialized` after restart
- [ ] `\d users_v2` on target DB shows `is_test_account` column with default `false`
- [ ] `\d invite_codes` shows `marks_as_test` column with default `false`
- [ ] `\d leaderboard_overall` shows `is_test_account` column
- [ ] `\di idx_users_v2_is_test_account` shows the partial index
- [ ] `SELECT count(*) FROM leaderboard_overall` returns nonzero (view re-populated after rebuild)
- [ ] Default leaderboard query still works: `curl .../api/stats/leaderboard` returns entries
- [ ] `?include_test=true` parameter is accepted (no 422/500)

## Staging bring-up (final step)

- [ ] `UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';` run on staging
- [ ] `SOAK_INVITE_CODE=5VC2MCCN TEST_URL=https://staging.adlee.work npm run seed -- --count=16` seeds staging accounts
- [ ] Staging run with `--scenario=populate --watch=none` completes
- [ ] Staging leaderboard with `include_test=true` shows the soak accounts
- [ ] Staging leaderboard default (no param) does NOT show the soak accounts
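Outside the checklist file itself: the "Artifacts older than 7 days are pruned on next startup" item is easiest to validate if the age check is a pure selection step. A minimal sketch under stated assumptions; `ArtifactDir` and `selectStaleArtifacts` are hypothetical names, and the harness may compute age differently (for example, from the run timestamp in the directory name rather than from mtime).

```typescript
// Hypothetical shape: one entry per run subdirectory under the artifacts root.
interface ArtifactDir {
  dirName: string;
  mtimeMs: number; // last-modified time, epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Names of run directories strictly older than maxAgeDays at time nowMs. */
function selectStaleArtifacts(
  dirs: ArtifactDir[],
  nowMs: number,
  maxAgeDays = 7,
): string[] {
  const cutoff = nowMs - maxAgeDays * DAY_MS;
  return dirs.filter((d) => d.mtimeMs < cutoff).map((d) => d.dirName);
}
```

Keeping the selection separate from the deletion lets a unit test pass in a fixed `nowMs` instead of waiting a week for a real directory to age.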

Step 3: Commit

git add tests/soak/README.md tests/soak/CHECKLIST.md
git commit -m "$(cat <<'EOF'
docs(soak): full README + manual validation checklist

Quickstart, flag reference, env var reference, scenario table, and
the bring-up/validation checklist that gates calling the harness
implementation complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Task 32: Staging bring-up (manual, no code)

This is a documentation-only task: the actual run happens on your workstation. It is listed here so that the implementation plan is complete end to end.

Step 1: Flag 5VC2MCCN as test-seed on staging

From your workstation (requires DB access to staging):

ssh root@129.212.150.189 \
  'docker exec -i golfgame-postgres psql -U postgres -d golfgame' <<'EOF'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
EOF

Expected: marks_as_test | t.

(The exact docker container name may differ — adjust based on docker ps on the staging host.)

Step 2: Seed the 16 staging accounts

cd tests/soak
rm -f .env.stresstest
TEST_URL=https://staging.adlee.work \
  SOAK_INVITE_CODE=5VC2MCCN \
  npm run seed -- --count=16

Expected: .env.stresstest populated with 16 entries.

Step 3: Run populate against staging

TEST_URL=https://staging.adlee.work \
  SOAK_INVITE_CODE=5VC2MCCN \
  npm run soak -- \
    --scenario=populate \
    --rooms=4 \
    --games-per-room=3 \
    --holes=3 \
    --watch=dashboard

Expected: dashboard opens, 4 rooms play 3 games each, staging scoreboard accumulates data. Exit 0 at the end.
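If you wrap this run in a script, the exit codes named in this plan (0 here; 1, 2, and 130 in the CHECKLIST failure modes) can be folded into a small lookup. A minimal sketch; `SOAK_EXIT_CODES` and `describeExit` are hypothetical helpers, not part of the harness, and the labels are paraphrases of the checklist items.

```typescript
// Exit codes as stated elsewhere in this plan.
const SOAK_EXIT_CODES: Record<number, string> = {
  0: "run completed",
  1: "fatal abort (for example, health probes failed 3 times)",
  2: "graceful shutdown after Ctrl-C, summary printed",
  130: "hard exit after a second Ctrl-C",
};

/** Human-readable label for a runner exit code. */
function describeExit(code: number): string {
  return code in SOAK_EXIT_CODES
    ? SOAK_EXIT_CODES[code]
    : "unexpected exit code " + code;
}
```

Anything other than 0 on a staging run is worth checking against the artifacts directory before re-running.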

Step 4: Verify scoreboard filtering on staging

# Should NOT contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soak_"))'

# Should contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soak_"))'

Expected: first returns nothing, second returns entries.
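The same two checks can be expressed as one function over the two JSON payloads, which is handy if you save the curl output and want a single pass/fail. This is a sketch, not harness code: the response shape is inferred only from the jq filters above (`.entries[].username`), and `soakUsernames` and `checkLeaderboardFiltering` are hypothetical names.

```typescript
// Shape inferred from the jq filters; other response fields are ignored.
interface LeaderboardEntry {
  username: string;
}
interface LeaderboardResponse {
  entries: LeaderboardEntry[];
}

/** Usernames in a leaderboard payload that carry the soak prefix. */
function soakUsernames(
  payload: LeaderboardResponse,
  prefix = "soak_",
): string[] {
  return payload.entries
    .map((entry) => entry.username)
    .filter((username) => username.slice(0, prefix.length) === prefix);
}

/** Returns a list of problems; an empty list means both checks passed. */
function checkLeaderboardFiltering(
  defaultView: LeaderboardResponse,
  includeTestView: LeaderboardResponse,
): string[] {
  const problems: string[] = [];
  const leaked = soakUsernames(defaultView);
  if (leaked.length > 0) {
    problems.push("default leaderboard leaks test accounts: " + leaked.join(", "));
  }
  if (soakUsernames(includeTestView).length === 0) {
    problems.push("include_test=true leaderboard shows no soak accounts");
  }
  return problems;
}
```

Note that the second check only passes after Step 3 has produced stats for the soak accounts, so run it after at least one completed game.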

Step 5: Mark implementation complete

Check off all items in tests/soak/CHECKLIST.md that correspond to this plan. Commit the filled-in checklist if you want a record:

git add tests/soak/CHECKLIST.md
git commit -m "docs(soak): checklist passed on initial staging run"

Phase 11 — Version bump

Task 33: Bump to v3.3.4 and add footer to admin.html

Updates the client/index.html footers from v3.1.6 to v3.3.4, adds a footer to client/admin.html (which currently has none), and bumps the version in pyproject.toml.

Files:

Modify: client/index.html — both footer occurrences (L58, L291)
Modify: client/admin.html — add footer
Modify: pyproject.toml — version field

Step 1: Update client/index.html footers

grep -n "v3\.1\.6" client/index.html

For each match, replace v3.1.6 with v3.3.4. There should be exactly two matches.
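If you prefer to script the replacement, a literal replace that also reports the match count makes the "exactly two matches" expectation checkable. A minimal sketch; `bumpVersion` and `BumpResult` are hypothetical names, and reading and writing the file is left to the caller.

```typescript
interface BumpResult {
  text: string; // content with every occurrence replaced
  count: number; // how many occurrences were replaced
}

// Literal (non-regex) replacement, so the dots in "v3.1.6" match only dots.
// This mirrors the escaped dots in the grep pattern above.
function bumpVersion(content: string, from: string, to: string): BumpResult {
  const parts = content.split(from);
  return { text: parts.join(to), count: parts.length - 1 };
}
```

A caller would refuse to write the file back unless `count` is 2, which catches both a missed footer and an unexpected third occurrence.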

Step 2: Add footer to client/admin.html

Find the closing </body> in client/admin.html and add a footer just before it:

<footer class="app-footer" style="text-align: center; padding: 16px; color: var(--muted, #666); font-size: 12px;">v3.3.4 &copy; Aaron D. Lee</footer>
</body>

(The inline style is a fallback — admin.css may already have an .app-footer class; if so, drop the inline styles.)

grep -n "app-footer" client/admin.css 2>/dev/null

Step 3: Bump pyproject.toml

# GNU sed syntax; BSD/macOS sed needs an explicit empty backup suffix: sed -i '' '...'
sed -i 's/^version = "3\.1\.6"$/version = "3.3.4"/' pyproject.toml
grep '^version' pyproject.toml

Expected: version = "3.3.4".

Step 4: Verify in the browser

Restart the dev server, open http://localhost:8000 and http://localhost:8000/admin.html. Confirm both show v3.3.4 in the footer.

Step 5: Commit

git add client/index.html client/admin.html pyproject.toml
git commit -m "$(cat <<'EOF'
chore: bump version to v3.3.4

Updates client/index.html footer (×2) and pyproject.toml from
v3.1.6 → v3.3.4, and adds a matching footer to client/admin.html
which previously had none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"

Summary

33 tasks across 11 phases:

Phase	Tasks	Milestone
1 — Server changes	1–8	Stats filter works, test accounts are separable
2 — Harness scaffolding	9–12	Core pure-logic modules with Vitest tests pass
3 — SessionPool + seeding	13–14	`.env.stresstest` seeded via real HTTP
4 — First run	15–18	`--watch=none` smoke test passes end-to-end
5 — Dashboard	19–21	Live status grid in browser
6 — Live video	22–23	Click-to-watch CDP screencast
7 — Tiled mode	24	Native host windows
8 — Stress scenario	25	Chaos injection runs clean
9 — Failure handling	26–29	Watchdog + artifacts + graceful shutdown + health probes
10 — Polish	30–31	Smoke script + README + CHECKLIST
11 — Version bump	33	v3.3.4 everywhere

(Task 32 is the manual staging bring-up — no code.)

Dependencies between tasks:

Tasks 1–8 are independent of the harness (ship them first if you want immediate value for admins)
Tasks 9–18 are strictly sequential (each builds on the previous)
Tasks 19–21, 22–23, 24, 25 are independent of each other — can be done in any order after Task 18
Tasks 26–29 can be done after Task 18 but are most valuable after Task 25
Tasks 30–31 come last before staging
Task 33 is independent and can be done any time after Task 8
