golfgame/docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md

# Multiplayer Soak & UX Test Harness — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a standalone Playwright-based soak runner in `tests/soak/` that drives 16 authenticated browser sessions across 4 concurrent rooms playing many multiplayer games, with pluggable scenarios, a click-to-watch dashboard via CDP screencast, and strict per-room failure isolation.

**Architecture:** Single-process node runner reusing the existing `GolfBot` class from `tests/e2e/bot/`. One shared browser (16 contexts) by default; `WATCH=tiled` uses a second headed browser for the 4 host contexts. Scenarios are plain TS modules exported from `tests/soak/scenarios/`. Dashboard is a tiny HTTP+WS server serving one static page that pushes live status and on-demand CDP screencast frames.

**Tech Stack:** TypeScript + tsx (no build step), Playwright Core, ws (WebSocket server), Vitest for unit tests, FastAPI + asyncpg (existing server), PostgreSQL (existing).

**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`

---

## Testing Strategy Notes

- **Server-side Python changes:** The existing test suite mocks stores with `AsyncMock` and has no real-Postgres fixtures. Rather than inventing a new fixture pattern for this plan, server tasks use **curl-based verification against a running local dev server** as the explicit verification step after each commit. Run `python server/main.py` in another terminal (requires Postgres + Redis running — see `docs/INSTALL.md`).
- **TypeScript harness logic:** Unit-tested with Vitest for pure modules (Deferred, RoomCoordinator, Watchdog, Config). Integration-level modules (SessionPool, Dashboard, Screencaster, Scenarios) are verified by running the harness itself via the smoke test.
- **End-to-end validation:** `tests/soak/scripts/smoke.sh` is the canary — after every non-trivial change, run it against local dev and expect exit 0 within ~30s.

---

## Phase 1 — Server-side changes (independent, ships first)

### Task 1: Schema migration for `is_test_account` and `marks_as_test`

Add two columns and one partial index, and rebuild the `leaderboard_overall` materialized view to include `is_test_account` (so the filter works through the view fast path). This fits the existing inline-migration pattern in `user_store.py`.

**Files:**
- Modify: `server/stores/user_store.py` — append to `SCHEMA_SQL` (ALTER blocks near L79–L98 and the matview block near L298–L335)

- [ ] **Step 1: Add column migration to `SCHEMA_SQL`**

Open `server/stores/user_store.py`. Inside the first `DO $$ BEGIN ... END $$;` block (around lines 80–98, which handles the admin columns), append the `is_test_account` column check. Then add a second ALTER for `invite_codes.marks_as_test` in a new `DO $$` block right after it.

Add after the existing `last_seen_at` check (before `END $$;` on line ~98):

```sql
    IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                   WHERE table_name = 'users_v2' AND column_name = 'is_test_account') THEN
        ALTER TABLE users_v2 ADD COLUMN is_test_account BOOLEAN DEFAULT FALSE;
    END IF;
```

Then, immediately after the `END $$;` that closes the users_v2 admin block, add a new block for invite_codes:

```sql
-- Add marks_as_test to invite_codes if not exists
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                   WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test') THEN
        ALTER TABLE invite_codes ADD COLUMN marks_as_test BOOLEAN DEFAULT FALSE;
    END IF;
END $$;
```
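Both blocks follow the same guard-then-ALTER shape. As a rough sketch of that pattern, the hypothetical helper below (`add_column_if_missing_sql` is not part of `user_store.py`) builds the same kind of idempotent block from a table name, column name, and type. It interpolates identifiers directly, so it assumes trusted, hard-coded names only.

```python
def add_column_if_missing_sql(table: str, column: str, ddl_type: str) -> str:
    """Build an idempotent DO block that adds `column` to `table` once.

    Hypothetical helper for illustration only. Identifiers are interpolated
    as-is, so pass nothing but trusted, hard-coded names.
    """
    return (
        "DO $$\n"
        "BEGIN\n"
        "    IF NOT EXISTS (SELECT 1 FROM information_schema.columns\n"
        f"                   WHERE table_name = '{table}' AND column_name = '{column}') THEN\n"
        f"        ALTER TABLE {table} ADD COLUMN {column} {ddl_type};\n"
        "    END IF;\n"
        "END $$;"
    )


sql = add_column_if_missing_sql("invite_codes", "marks_as_test", "BOOLEAN DEFAULT FALSE")
print(sql)
```

Because the guard runs on every startup, re-running the migration against an already-migrated database is a no-op.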

- [ ] **Step 2: Add partial index on `is_test_account`**

Find the indexes block near line 338. After the existing `idx_users_banned` index (line ~344), add:

```sql
CREATE INDEX IF NOT EXISTS idx_users_v2_is_test_account ON users_v2(is_test_account)
    WHERE is_test_account = TRUE;
```

- [ ] **Step 3: Rebuild `leaderboard_overall` materialized view to include `is_test_account`**

Find the existing matview block at line ~298. Modify the version-check DO block so the view is dropped and recreated if it lacks the `is_test_account` column. Replace the existing block:

```sql
-- Leaderboard materialized view (refreshed periodically)
-- Drop and recreate if missing is_test_account column (soak harness migration)
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
        -- Check if is_test_account column exists in the view.
        -- information_schema.columns does not list materialized views,
        -- so look the column up in pg_attribute instead.
        IF NOT EXISTS (
            SELECT 1 FROM pg_attribute
            WHERE attrelid = 'leaderboard_overall'::regclass
              AND attname = 'is_test_account'
              AND NOT attisdropped
        ) THEN
            DROP MATERIALIZED VIEW leaderboard_overall;
        END IF;
    END IF;

    IF NOT EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
        EXECUTE '
            CREATE MATERIALIZED VIEW leaderboard_overall AS
            SELECT
                u.id as user_id,
                u.username,
                COALESCE(u.is_test_account, FALSE) as is_test_account,
                s.games_played,
                s.games_won,
                ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
                s.rounds_won,
                ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
                s.best_score as best_round_score,
                s.knockouts,
                s.best_win_streak,
                COALESCE(s.rating, 1500) as rating,
                s.last_game_at
            FROM player_stats s
            JOIN users_v2 u ON s.user_id = u.id
            WHERE s.games_played >= 5
            AND u.deleted_at IS NULL
            AND (u.is_banned = false OR u.is_banned IS NULL)
        ';
    END IF;
END $$;
```

Note: the differences from the existing block are the changed comment, the column-existence check, and the new `COALESCE(u.is_test_account, FALSE) as is_test_account` column in the SELECT. The check now looks for `is_test_account` instead of `rating`, and it reads `pg_attribute` because PostgreSQL's `information_schema.columns` does not list materialized-view columns; a check against `information_schema.columns` would never match, so the view would be dropped and rebuilt on every startup. Everything else stays identical.

- [ ] **Step 4: Start the server to run migrations**

Run (in another terminal, with Postgres + Redis up):

```bash
cd /home/alee/Sources/golfgame
python server/main.py
```

Expected: server starts cleanly, no errors about `is_test_account` or `marks_as_test` or `leaderboard_overall`.

- [ ] **Step 5: Verify schema via psql**

Connect to the dev database and confirm:

```bash
psql -d golfgame -c "\d users_v2" | grep is_test_account
psql -d golfgame -c "\d invite_codes" | grep marks_as_test
psql -d golfgame -c "\d leaderboard_overall" | grep is_test_account
psql -d golfgame -c "\di idx_users_v2_is_test_account"
```

Expected: all four commands return matching rows.

- [ ] **Step 6: Commit**

```bash
git add server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): add is_test_account + marks_as_test schema

New columns support separating soak-harness test traffic from real
user traffic in stats queries. Rebuilds leaderboard_overall matview
to include is_test_account so the fast path stays filterable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 2: Propagate `is_test_account` through `User` model and `user_store`

Wire the new column into the `User` dataclass, `create_user` signature, `_row_to_user` mapping, and every SELECT list that already pulls user columns.

**Files:**
- Modify: `server/models/user.py` — `User` dataclass (L22–L68) + `to_dict` (L82–L116) + `from_dict` (L118+)
- Modify: `server/stores/user_store.py` — `create_user` (L454–L501), `_row_to_user` (L997–L1020), `get_user_by_id`/`get_user_by_username`/`get_user_by_email` SELECT lists (L503–L570)

- [ ] **Step 1: Add `is_test_account` to the `User` dataclass**

In `server/models/user.py`, add a new field to the `User` dataclass (after `force_password_reset` on L68):

```python
    is_test_account: bool = False
```

Update the docstring `Attributes:` block around L45 to include:

```
        is_test_account: True for accounts created by the soak test harness.
```

- [ ] **Step 2: Include `is_test_account` in `to_dict` and `from_dict`**

In `User.to_dict` at L82, add to the `d` dict (after `force_password_reset`):

```python
            "is_test_account": self.is_test_account,
```

In `User.from_dict`, add the corresponding parse — find where `force_password_reset` is parsed and add the same pattern:

```python
            is_test_account=d.get("is_test_account", False),
```
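The `d.get(..., False)` default matters for dicts serialized before this change, which carry no `is_test_account` key. A minimal sketch of the round trip, using a hypothetical two-field `UserSketch` stand-in rather than the real `User` dataclass:

```python
from dataclasses import dataclass


@dataclass
class UserSketch:
    """Stand-in for the real User dataclass; only two fields are shown."""
    username: str
    is_test_account: bool = False

    def to_dict(self) -> dict:
        return {"username": self.username, "is_test_account": self.is_test_account}

    @classmethod
    def from_dict(cls, d: dict) -> "UserSketch":
        # Dicts written before this change have no is_test_account key,
        # so fall back to False instead of raising KeyError.
        return cls(
            username=d["username"],
            is_test_account=d.get("is_test_account", False),
        )


legacy = UserSketch.from_dict({"username": "alice"})
flagged = UserSketch.from_dict(UserSketch("soaktest_0", True).to_dict())
```

Here `legacy` ends up unflagged while `flagged` keeps its value across `to_dict` and `from_dict`.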

- [ ] **Step 3: Add `is_test_account` parameter to `create_user`**

In `server/stores/user_store.py` at L454, add a new parameter:

```python
    async def create_user(
        self,
        username: str,
        password_hash: str,
        email: Optional[str] = None,
        role: UserRole = UserRole.USER,
        guest_id: Optional[str] = None,
        verification_token: Optional[str] = None,
        verification_expires: Optional[datetime] = None,
        is_test_account: bool = False,
    ) -> Optional[User]:
```

Update the docstring to add a line in `Args:` describing `is_test_account`.

Change the INSERT SQL block to include the new column:

```python
                row = await conn.fetchrow(
                    """
                    INSERT INTO users_v2 (username, password_hash, email, role, guest_id,
                                          verification_token, verification_expires,
                                          is_test_account)
                    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
                    RETURNING id, username, email, password_hash, role, email_verified,
                              verification_token, verification_expires, reset_token, reset_expires,
                              guest_id, deleted_at, preferences, created_at, last_login, last_seen_at,
                              is_active, is_banned, ban_reason, force_password_reset, is_test_account
                    """,
                    username,
                    password_hash,
                    email,
                    role.value,
                    guest_id,
                    verification_token,
                    verification_expires,
                    is_test_account,
                )
```

- [ ] **Step 4: Update `_row_to_user` mapping**

In `server/stores/user_store.py` at L997, add to the `User(...)` call (after `force_password_reset`):

```python
            is_test_account=row.get("is_test_account", False) or False,
```
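The trailing `or False` covers two cases at once: a query that did not select the column, and a row where the column is NULL. A small sketch of that expression, assuming plain dicts as stand-ins for the database rows (the `read_is_test_account` name is hypothetical):

```python
def read_is_test_account(row) -> bool:
    """Mirror the `row.get(...) or False` expression used in _row_to_user."""
    return row.get("is_test_account", False) or False


# Plain dicts stand in for database rows in this sketch.
missing_column = {"username": "alice"}                       # column not selected
null_value = {"username": "bob", "is_test_account": None}    # column is NULL
flagged = {"username": "soaktest_0", "is_test_account": True}

results = [read_is_test_account(r) for r in (missing_column, null_value, flagged)]
```

This keeps `User.is_test_account` a strict `bool` even if a SELECT list is missed in Step 5.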

- [ ] **Step 5: Update all other SELECT lists in user_store**

Find every query in `server/stores/user_store.py` that returns a full user row and passes it to `_row_to_user`. Add `is_test_account` to the SELECT column list for each. Grep to find them:

```bash
grep -n "is_active, is_banned, ban_reason, force_password_reset" server/stores/user_store.py
```

For each match, append `, is_test_account` to the SELECT list. Expected locations:
- `create_user` INSERT ... RETURNING (already updated in Step 3)
- `get_user_by_id` at L503
- `get_user_by_username` at L519
- `get_user_by_email` (find it)
- Any other `SELECT` ... FROM users_v2 that calls `_row_to_user`
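After editing, the same grep marker can be used to spot SELECT lists that were missed. The sketch below is a hypothetical checker (not part of the repo) and assumes `, is_test_account` is appended on the same line as the marker, as the instruction above describes; a column added on a following line would show up as a false positive.

```python
from typing import List

MARKER = "is_active, is_banned, ban_reason, force_password_reset"


def lines_missing_flag(source: str) -> List[int]:
    """Return 1-based line numbers that contain MARKER but not is_test_account."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if MARKER in line and "is_test_account" not in line
    ]


# Hand-written sample: the first SELECT list is updated, the second is not.
sample = "\n".join([
    "SELECT id, is_active, is_banned, ban_reason, force_password_reset, is_test_account",
    "SELECT id, is_active, is_banned, ban_reason, force_password_reset",
])
missed = lines_missing_flag(sample)
```

To run it against the real file, pass the contents of `server/stores/user_store.py` (for example via `pathlib.Path(...).read_text()`); an empty list suggests every marked SELECT list has been updated.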

- [ ] **Step 6: Restart server, verify no errors**

```bash
# Kill and restart the dev server
python server/main.py
```

Expected: server starts cleanly. Any query that touches users now returns `is_test_account` correctly.

- [ ] **Step 7: Smoke test via curl**

```bash
# Register a throwaway test user. The payload below sends the 5VC2MCCN invite code
# (for INVITE_ONLY=true); no invite code is needed if DAILY_OPEN_SIGNUPS > 0 locally.
# Set PW to any password of your choice (>= 8 chars).
PW='SomeTestPw_1!'
curl -sX POST http://localhost:8000/api/auth/register \
  -H 'Content-Type: application/json' \
  -d "{\"username\":\"soaktest_smoke1\",\"password\":\"$PW\",\"email\":\"soaktest_smoke1@example.com\",\"invite_code\":\"5VC2MCCN\"}"
```

Expected: HTTP 200 with `{"user":{...},"token":"..."}`. The registration path now runs through the new column without errors even though the value is still always FALSE at this stage.

- [ ] **Step 8: Commit**

```bash
git add server/models/user.py server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): propagate is_test_account through User model & store

User dataclass, create_user, and all SELECT lists now round-trip the
new column. Value is always FALSE until Task 4 wires the register
flow to the invite code's marks_as_test flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 3: Expose `marks_as_test` on `InviteCode` and add lookup helper

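`validate_invite_code` currently returns a bare bool. We need a new helper that returns the row's fields so the register flow can read `marks_as_test` without changing the return type of `validate_invite_code`. As wired in Task 4, this is one extra lookup after validation.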

**Files:**
- Modify: `server/services/admin_service.py` — `InviteCode` dataclass (L115–L138), `get_invite_codes` SELECT (L1106–L1141), add new `get_invite_code_details` method

- [ ] **Step 1: Add `marks_as_test` field to `InviteCode` dataclass**

In `server/services/admin_service.py` at L115:

```python
@dataclass
class InviteCode:
    """Invite code details."""
    code: str
    created_by: str
    created_by_username: str
    created_at: datetime
    expires_at: datetime
    max_uses: int
    use_count: int
    is_active: bool
    marks_as_test: bool = False
```

Update `to_dict` at L127 to include the field:

```python
    def to_dict(self) -> dict:
        return {
            "code": self.code,
            "created_by": self.created_by,
            "created_by_username": self.created_by_username,
            "created_at": self.created_at.isoformat() if self.created_at else None,
            "expires_at": self.expires_at.isoformat() if self.expires_at else None,
            "max_uses": self.max_uses,
            "use_count": self.use_count,
            "is_active": self.is_active,
            "remaining_uses": max(0, self.max_uses - self.use_count),
            "marks_as_test": self.marks_as_test,
        }
```

- [ ] **Step 2: Update `get_invite_codes` SELECT to include `marks_as_test`**

Find `get_invite_codes` at L1106. Modify the SQL to pull the column and pass it through:

```python
    async def get_invite_codes(self, include_expired: bool = False) -> List[InviteCode]:
        """List all invite codes."""
        async with self.pool.acquire() as conn:
            sql = """
                SELECT c.code, c.created_by, u.username as created_by_username,
                       c.created_at, c.expires_at,
                       c.max_uses, c.use_count, c.is_active,
                       COALESCE(c.marks_as_test, FALSE) as marks_as_test
                FROM invite_codes c
                LEFT JOIN users_v2 u ON c.created_by = u.id
            """
```

Find the list comprehension that constructs `InviteCode(...)` objects and add the new kwarg:

```python
                InviteCode(
                    code=row["code"],
                    created_by=str(row["created_by"]),
                    created_by_username=row["created_by_username"] or "unknown",
                    created_at=row["created_at"].replace(tzinfo=timezone.utc) if row["created_at"] else None,
                    expires_at=row["expires_at"].replace(tzinfo=timezone.utc) if row["expires_at"] else None,
                    max_uses=row["max_uses"],
                    use_count=row["use_count"],
                    is_active=row["is_active"],
                    marks_as_test=row["marks_as_test"],
                )
```

- [ ] **Step 3: Add new `get_invite_code_details` method**

Add a new method between `validate_invite_code` and `use_invite_code` (around L1214) that returns the row's fields, including `marks_as_test`. The register flow will call it to resolve the flag:

```python
    async def get_invite_code_details(self, code: str) -> Optional[dict]:
        """
        Look up an invite code's row including marks_as_test.

        Returns None if the code does not exist. Does NOT validate expiry
        or usage — use validate_invite_code for that. This is purely a
        helper for the register flow to discover the test-seed flag.
        """
        async with self.pool.acquire() as conn:
            row = await conn.fetchrow(
                """
                SELECT code, max_uses, use_count, is_active,
                       COALESCE(marks_as_test, FALSE) as marks_as_test
                FROM invite_codes
                WHERE code = $1
                """,
                code,
            )
            if not row:
                return None
            return {
                "code": row["code"],
                "max_uses": row["max_uses"],
                "use_count": row["use_count"],
                "is_active": row["is_active"],
                "marks_as_test": row["marks_as_test"],
            }
```

- [ ] **Step 4: Verify with curl via admin panel endpoint**

Assuming you have an admin token from a local dev user, hit the existing admin invites listing:

```bash
# Replace TOKEN with a valid admin JWT
curl -s http://localhost:8000/api/admin/invites \
  -H "Authorization: Bearer $TOKEN" | jq '.codes[0]'
```

Expected: response includes `"marks_as_test": false` on at least one code.

- [ ] **Step 5: Commit**

```bash
git add server/services/admin_service.py
git commit -m "$(cat <<'EOF'
feat(server): expose marks_as_test on InviteCode

Adds the field to the dataclass, SELECT list in get_invite_codes,
and a new get_invite_code_details helper that the register flow
will use to discover whether an invite should flag new accounts
as test accounts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 4: Wire register flow to set `is_test_account` from invite

When a user registers with an invite whose `marks_as_test=TRUE`, the new account is flagged. The plumbing lives in two places: the router reads the flag and passes it to the service; the service passes it to the store.

**Files:**
- Modify: `server/routers/auth.py` — `register` handler (L224–L320)
- Modify: `server/services/auth_service.py` — `register` method (L98–L178)

- [ ] **Step 1: Add `is_test_account` parameter to `auth_service.register`**

In `server/services/auth_service.py` at L98, add the new parameter:

```python
    async def register(
        self,
        username: str,
        password: str,
        email: Optional[str] = None,
        guest_id: Optional[str] = None,
        is_test_account: bool = False,
    ) -> RegistrationResult:
```

Update the docstring `Args:` block:

```
            is_test_account: Mark this user as a soak-harness test account.
```

Pass the value through to `create_user` at L146:

```python
        user = await self.user_store.create_user(
            username=username,
            password_hash=password_hash,
            email=email,
            role=UserRole.USER,
            guest_id=guest_id,
            verification_token=verification_token,
            verification_expires=verification_expires,
            is_test_account=is_test_account,
        )
```

- [ ] **Step 2: Update the router to resolve `marks_as_test` and pass it through**

In `server/routers/auth.py`, find the `register` handler at L224. Extend the existing invite-code validation block (around L248–L252) so that it also fetches the invite details and computes `is_test_account`:

```python
    # --- Invite code validation ---
    is_test_account = False
    if has_invite:
        if not _admin_service:
            raise HTTPException(status_code=503, detail="Admin service not initialized")
        if not await _admin_service.validate_invite_code(request_body.invite_code):
            raise HTTPException(status_code=400, detail="Invalid or expired invite code")
        # Check if this invite flags new accounts as test accounts
        invite_details = await _admin_service.get_invite_code_details(request_body.invite_code)
        if invite_details and invite_details.get("marks_as_test"):
            is_test_account = True
```

Then pass it to `auth_service.register` at L276:

```python
    # --- Create the account ---
    result = await auth_service.register(
        username=request_body.username,
        password=request_body.password,
        email=request_body.email,
        is_test_account=is_test_account,
    )
```
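The decision logic in this step can be exercised in isolation. The sketch below is an assumption-laden stand-in: `FakeAdminService` and `resolve_is_test_account` are hypothetical names, the fake keeps codes in a dict instead of Postgres, and it raises `ValueError` where the real handler raises `HTTPException`.

```python
import asyncio
from typing import Optional


class FakeAdminService:
    """Hypothetical in-memory stand-in for the admin service (sketch only)."""

    def __init__(self, codes: dict):
        self._codes = codes

    async def validate_invite_code(self, code: str) -> bool:
        return code in self._codes

    async def get_invite_code_details(self, code: str) -> Optional[dict]:
        return self._codes.get(code)


async def resolve_is_test_account(service, invite_code: Optional[str]) -> bool:
    """Return True only when a valid invite carries marks_as_test."""
    if not invite_code:
        return False  # invite-less signups are never flagged
    if not await service.validate_invite_code(invite_code):
        raise ValueError("Invalid or expired invite code")
    details = await service.get_invite_code_details(invite_code)
    return bool(details and details.get("marks_as_test"))


service = FakeAdminService({
    "SOAKTEST": {"marks_as_test": True},
    "NORMAL01": {"marks_as_test": False},
})
flagged = asyncio.run(resolve_is_test_account(service, "SOAKTEST"))
normal = asyncio.run(resolve_is_test_account(service, "NORMAL01"))
no_invite = asyncio.run(resolve_is_test_account(service, None))
```

The three cases correspond to the checks in Steps 4 and 5 below plus the invite-less signup path.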

- [ ] **Step 3: Flag the dev invite code for testing**

Before we can test end-to-end locally, we need an invite code with `marks_as_test=TRUE` in the local dev DB. Run (once, manually):

```bash
# First, check if 5VC2MCCN exists locally (it probably doesn't — that's staging's code).
# Create a local test invite code and flag it:
psql -d golfgame <<'EOF'
-- Create a local dev test-seed invite if not exists
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;

-- Verify
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = 'SOAKTEST';
EOF
```

Expected: `marks_as_test | t` in the last row.

- [ ] **Step 4: Verify register flow sets `is_test_account`**

Restart the dev server, then:

```bash
# Reuses PW from Task 2, Step 7; set it again if this is a new shell.
curl -sX POST http://localhost:8000/api/auth/register \
  -H 'Content-Type: application/json' \
  -d "{\"username\":\"soaktest_register1\",\"password\":\"$PW\",\"email\":\"soaktest_register1@example.com\",\"invite_code\":\"SOAKTEST\"}"

# Verify in DB
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'soaktest_register1';"
```

Expected: `is_test_account | t`.

- [ ] **Step 5: Verify non-test invite does NOT flag new accounts**

```bash
# Create a non-test invite
psql -d golfgame <<'EOF'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'NORMAL01', id, NOW() + INTERVAL '10 years', 10, TRUE, FALSE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = FALSE;
EOF

curl -sX POST http://localhost:8000/api/auth/register \
  -H 'Content-Type: application/json' \
  -d "{\"username\":\"realuser_smoke1\",\"password\":\"$PW\",\"email\":\"realuser_smoke1@example.com\",\"invite_code\":\"NORMAL01\"}"

psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'realuser_smoke1';"
```

Expected: `is_test_account | f`.

- [ ] **Step 6: Commit**

```bash
git add server/routers/auth.py server/services/auth_service.py
git commit -m "$(cat <<'EOF'
feat(server): register flow flags accounts from test-seed invites

When a user registers with an invite_code whose marks_as_test=TRUE,
their users_v2.is_test_account is set to TRUE. Normal invite codes
and invite-less signups are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 5: Stats filtering (`include_test` parameter)

Thread an `include_test: bool = False` parameter through `get_leaderboard`, `get_player_rank`, and the corresponding router handlers. Default is `False` — real users never see soak traffic.

**Files:**
- Modify: `server/services/stats_service.py` — `get_leaderboard` (L169), `get_player_rank` (L249)
- Modify: `server/routers/stats.py` — `get_leaderboard` route (L157), `get_player_rank` route (L227), `get_my_rank` route (L348)

- [ ] **Step 1: Add `include_test` to `get_leaderboard` service method**

In `server/services/stats_service.py` at L169:

```python
    async def get_leaderboard(
        self,
        metric: str = "wins",
        limit: int = 50,
        offset: int = 0,
        include_test: bool = False,
    ) -> List[LeaderboardEntry]:
```

Inside the method, find both SQL paths (materialized view and fallback). In the view path at L208, change the WHERE clause:

```python
            if view_exists:
                # Use materialized view for performance
                rows = await conn.fetch(f"""
                    SELECT
                        user_id, username, games_played, games_won,
                        win_rate, avg_score, knockouts, best_win_streak,
                        COALESCE(rating, 1500) as rating,
                        ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                    FROM leaderboard_overall
                    WHERE ($3 OR NOT is_test_account)
                    ORDER BY {column} {direction}
                    LIMIT $1 OFFSET $2
                """, limit, offset, include_test)
```

In the fallback path at L220, add the WHERE clause and parameter:

```python
            else:
                # Fall back to direct query
                rows = await conn.fetch(f"""
                    SELECT
                        s.user_id, u.username, s.games_played, s.games_won,
                        ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
                        ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
                        s.knockouts, s.best_win_streak,
                        COALESCE(s.rating, 1500) as rating,
                        ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                    FROM player_stats s
                    JOIN users_v2 u ON s.user_id = u.id
                    WHERE s.games_played >= 5
                    AND u.deleted_at IS NULL
                    AND (u.is_banned = false OR u.is_banned IS NULL)
                    AND ($3 OR NOT COALESCE(u.is_test_account, FALSE))
                    ORDER BY {column} {direction}
                    LIMIT $1 OFFSET $2
                """, limit, offset, include_test)
```
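The `($3 OR NOT ...)` predicate keeps a row when the caller opts in or when the row is not a test account. As a sketch, the same truth table in Python (the `is_visible` name and the sample rows are made up for illustration):

```python
def is_visible(include_test: bool, is_test_account) -> bool:
    """Python equivalent of `$3 OR NOT COALESCE(is_test_account, FALSE)`."""
    return include_test or not bool(is_test_account)


# Hand-written sample rows; None models a NULL column value.
rows = [
    {"username": "alice", "is_test_account": False},
    {"username": "soaktest_0", "is_test_account": True},
    {"username": "legacy", "is_test_account": None},
]
default_view = [r["username"] for r in rows if is_visible(False, r["is_test_account"])]
full_view = [r["username"] for r in rows if is_visible(True, r["is_test_account"])]
```

The view path can omit the `COALESCE` because the materialized view already coalesces the column to `FALSE`; the fallback path reads `users_v2` directly and so keeps it.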

- [ ] **Step 2: Apply the same pattern to `get_player_rank`**

In `server/services/stats_service.py` at L249:

```python
    async def get_player_rank(
        self,
        user_id: str,
        metric: str = "wins",
        include_test: bool = False,
    ) -> Optional[int]:
```

Update both SQL paths to include the `include_test` filter. View path at L287:

```python
            if view_exists:
                row = await conn.fetchrow(f"""
                    SELECT rank FROM (
                        SELECT user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                        FROM leaderboard_overall
                        WHERE ($2 OR NOT is_test_account)
                    ) ranked
                    WHERE user_id = $1
                """, user_id, include_test)
```

Fallback path at L294:

```python
            else:
                row = await conn.fetchrow(f"""
                    SELECT rank FROM (
                        SELECT s.user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
                        FROM player_stats s
                        JOIN users_v2 u ON s.user_id = u.id
                        WHERE s.games_played >= 5
                        AND u.deleted_at IS NULL
                        AND (u.is_banned = false OR u.is_banned IS NULL)
                        AND ($2 OR NOT COALESCE(u.is_test_account, FALSE))
                    ) ranked
                    WHERE user_id = $1
                """, user_id, include_test)
```
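Because the filter sits inside the ranked subquery, hiding test accounts also shifts everyone else's rank, and a test account looked up with the default `include_test=False` gets no row at all. A rough in-memory sketch of that behavior, assuming a single `games_won` metric and hypothetical sample players:

```python
from typing import List, Optional


def rank_of(players: List[dict], user_id: str, include_test: bool = False) -> Optional[int]:
    """Return the 1-based rank by games_won (descending) among visible players."""
    visible = [p for p in players if include_test or not p["is_test_account"]]
    ordered = sorted(visible, key=lambda p: p["games_won"], reverse=True)
    for rank, player in enumerate(ordered, start=1):
        if player["user_id"] == user_id:
            return rank
    return None  # filtered out or not on the board


# Hand-written sample data for the sketch.
players = [
    {"user_id": "bot", "games_won": 90, "is_test_account": True},
    {"user_id": "alice", "games_won": 40, "is_test_account": False},
    {"user_id": "bob", "games_won": 25, "is_test_account": False},
]
alice_default = rank_of(players, "alice")
alice_with_tests = rank_of(players, "alice", include_test=True)
bot_default = rank_of(players, "bot")
```

A soak account that wants its own rank therefore needs to pass `include_test=true`.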

- [ ] **Step 3: Expose `include_test` as a query parameter on the leaderboard route**

In `server/routers/stats.py` at L157:

```python
@router.get("/leaderboard", response_model=LeaderboardResponse)
async def get_leaderboard(
    metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
    limit: int = Query(50, ge=1, le=100),
    offset: int = Query(0, ge=0),
    include_test: bool = Query(False, description="Include soak-harness test accounts"),
    service: StatsService = Depends(get_stats_service_dep),
):
    """
    Get leaderboard by metric.

    Metrics:
    - wins: Total games won
    - win_rate: Win percentage (requires 5+ games)
    - avg_score: Average points per round (lower is better)
    - knockouts: Times going out first
    - streak: Best win streak

    Players must have 5+ games to appear on leaderboards.
    By default, soak-harness test accounts are hidden.
    """
    entries = await service.get_leaderboard(metric, limit, offset, include_test)
```

- [ ] **Step 4: Same for `get_player_rank` and `get_my_rank` routes**

At L227:

```python
@router.get("/players/{user_id}/rank", response_model=PlayerRankResponse)
async def get_player_rank(
    user_id: str,
    metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
    include_test: bool = Query(False),
    service: StatsService = Depends(get_stats_service_dep),
):
    """Get player's rank on a leaderboard."""
    rank = await service.get_player_rank(user_id, metric, include_test)
```

At L348:

```python
@router.get("/me/rank", response_model=PlayerRankResponse)
async def get_my_rank(
    metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
    include_test: bool = Query(False),
    user: User = Depends(require_user),
    service: StatsService = Depends(get_stats_service_dep),
):
    """Get current user's rank on a leaderboard."""
    rank = await service.get_player_rank(user.id, metric, include_test)
```

- [ ] **Step 5: Verify filtering works via curl**

```bash
# Mark a test user we registered earlier as having games played (synthetic)
psql -d golfgame <<'EOF'
INSERT INTO player_stats (user_id, games_played, games_won, total_points, total_rounds, rounds_won)
SELECT id, 10, 8, 50, 30, 20 FROM users_v2 WHERE username = 'soaktest_register1'
ON CONFLICT (user_id) DO UPDATE SET games_played = 10, games_won = 8;

-- Refresh the matview so the test account shows up
REFRESH MATERIALIZED VIEW leaderboard_overall;
EOF

# Default (include_test=false) should NOT include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soaktest_"))'

# include_test=true should include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soaktest_"))'
```

Expected: first command returns nothing, second returns a JSON object for `soaktest_register1`.
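If the check is scripted rather than eyeballed, the `jq` filter above translates to a few lines of Python. This is a sketch only: `soak_usernames` is a hypothetical helper, and the two payloads are hand-written samples that assume the `entries[].username` shape used by the `jq` expression.

```python
import json
from typing import List


def soak_usernames(payload: dict) -> List[str]:
    """Return usernames in a leaderboard payload that start with 'soaktest_'."""
    return [
        entry["username"]
        for entry in payload.get("entries", [])
        if entry["username"].startswith("soaktest_")
    ]


# Hand-written sample bodies standing in for the two curl responses.
default_body = json.loads('{"entries": [{"username": "alice"}]}')
opt_in_body = json.loads(
    '{"entries": [{"username": "alice"}, {"username": "soaktest_register1"}]}'
)
hidden = soak_usernames(default_body)
shown = soak_usernames(opt_in_body)
```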

- [ ] **Step 6: Commit**

```bash
git add server/services/stats_service.py server/routers/stats.py
git commit -m "$(cat <<'EOF'
feat(server): stats queries support include_test filter

Leaderboard and rank queries take an optional include_test param
(default false). Real users never see soak-harness traffic unless
they explicitly opt in via ?include_test=true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 6: Admin service + route surface `is_test_account`

`UserDetails` exposes the flag, `search_users` selects it, and the `list_users` admin route accepts an `include_test` query parameter.

**Files:**
- Modify: `server/services/admin_service.py` — `UserDetails` (L24–L58), `search_users` (L312–L382), `get_user` (L384–L428)
- Modify: `server/routers/admin.py` — `list_users` route (L80–L107)

- [ ] **Step 1: Add field to `UserDetails` dataclass**

In `server/services/admin_service.py` at L24, add to the dataclass:

```python
@dataclass
class UserDetails:
    """Extended user info for admin view."""
    id: str
    username: str
    email: Optional[str]
    role: str
    email_verified: bool
    is_banned: bool
    ban_reason: Optional[str]
    force_password_reset: bool
    created_at: datetime
    last_login: Optional[datetime]
    last_seen_at: Optional[datetime]
    is_active: bool
    games_played: int
    games_won: int
    is_test_account: bool = False
```

Update `to_dict` to include it:

```python
    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "username": self.username,
            "email": self.email,
            "role": self.role,
            "email_verified": self.email_verified,
            "is_banned": self.is_banned,
            "ban_reason": self.ban_reason,
            "force_password_reset": self.force_password_reset,
            "created_at": self.created_at.isoformat() if self.created_at else None,
            "last_login": self.last_login.isoformat() if self.last_login else None,
            "last_seen_at": self.last_seen_at.isoformat() if self.last_seen_at else None,
            "is_active": self.is_active,
            "games_played": self.games_played,
            "games_won": self.games_won,
            "is_test_account": self.is_test_account,
        }
```

- [ ] **Step 2: Update `search_users` to SELECT and filter on `is_test_account`**

In `server/services/admin_service.py` at L312, add an `include_test` parameter to the signature:

```python
    async def search_users(
        self,
        query: str = "",
        limit: int = 50,
        offset: int = 0,
        include_banned: bool = True,
        include_deleted: bool = False,
        include_test: bool = True,
    ) -> List[UserDetails]:
```

Modify the SQL to pull `is_test_account`:

```python
            sql = """
                SELECT u.id, u.username, u.email, u.role,
                       u.email_verified, u.is_banned, u.ban_reason,
                       u.force_password_reset, u.created_at, u.last_login,
                       u.last_seen_at, u.is_active,
                       COALESCE(u.is_test_account, FALSE) as is_test_account,
                       COALESCE(s.games_played, 0) as games_played,
                       COALESCE(s.games_won, 0) as games_won
                FROM users_v2 u
                LEFT JOIN player_stats s ON u.id = s.user_id
                WHERE 1=1
            """
```

After the existing `include_deleted` check, add:

```python
            if not include_test:
                sql += " AND (u.is_test_account = false OR u.is_test_account IS NULL)"
```

Update the `UserDetails(...)` construction in the list comprehension to include `is_test_account=row["is_test_account"]`.

- [ ] **Step 3: Update `get_user` (single-user lookup) similarly**

In `server/services/admin_service.py` at L384, add `COALESCE(u.is_test_account, FALSE) as is_test_account` to the SELECT and `is_test_account=row["is_test_account"]` to the `UserDetails(...)` construction. The `get_user` method does NOT need the filter parameter — admins looking up individual users should always see them.

- [ ] **Step 4: Add `include_test` to the admin `list_users` route**

In `server/routers/admin.py` at L80:

```python
@router.get("/users")
async def list_users(
    query: str = "",
    limit: int = 50,
    offset: int = 0,
    include_banned: bool = True,
    include_deleted: bool = False,
    include_test: bool = True,
    admin: User = Depends(require_admin_v2),
    service: AdminService = Depends(get_admin_service_dep),
):
    """
    Search and list users.

    Args:
        query: Search by username or email.
        limit: Maximum results to return.
        offset: Results to skip.
        include_banned: Include banned users.
        include_deleted: Include soft-deleted users.
        include_test: Include soak-harness test accounts (default true for admins).
    """
    users = await service.search_users(
        query=query,
        limit=limit,
        offset=offset,
        include_banned=include_banned,
        include_deleted=include_deleted,
        include_test=include_test,
    )
    return {"users": [u.to_dict() for u in users]}
```

Note: default is `True` for the admin path — admins should see everything by default. The client-side toggle will explicitly pass `false` when the admin wants to hide test accounts.

- [ ] **Step 5: Verify via curl**

```bash
# Assuming admin token in $TOKEN env var
curl -s "http://localhost:8000/api/admin/users?query=soaktest" \
  -H "Authorization: Bearer $TOKEN" | jq '.users[] | {username, is_test_account}'

curl -s "http://localhost:8000/api/admin/users?query=soaktest&include_test=false" \
  -H "Authorization: Bearer $TOKEN" | jq '.users[]'
```

Expected: first returns users with `is_test_account: true`; second returns empty (test accounts filtered out).

- [ ] **Step 6: Commit**

```bash
git add server/services/admin_service.py server/routers/admin.py
git commit -m "$(cat <<'EOF'
feat(server): admin users list surfaces is_test_account

UserDetails carries the new column, search_users selects and
optionally filters on it, and the /api/admin/users route accepts
?include_test=false to hide soak-harness accounts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 7: Admin panel UI — Test badge and filter toggle

Add a visible `[Test]` badge on test accounts in the admin user list, a `[Test-seed]` indicator on invite codes that mark new accounts as test, and an "Include test accounts" checkbox next to the existing "Include banned" toggle.

**Files:**
- Modify: `client/admin.html` — add the new toggle near the existing `#include-banned` checkbox
- Modify: `client/admin.js` — `loadUsers` (L305), `getStatusBadge` (L246), the invite codes renderer (L443)

- [ ] **Step 1: Add the "Include test accounts" checkbox to admin.html**

In `client/admin.html`, find the existing `#include-banned` checkbox in the users tab filter bar:

```bash
grep -n "include-banned" client/admin.html
```

Add a sibling checkbox right after that line. It starts checked so the initial page load matches the server-side default of `include_test=true`, and so the "uncheck" step in the manual verification below has something to uncheck:

```html
<label>
  <input type="checkbox" id="include-test" checked />
  Include test accounts
</label>
```

- [ ] **Step 2: Read the new checkbox in `loadUsers` and pass to getUsers**

In `client/admin.js` at L305:

```javascript
async function loadUsers() {
    try {
        const query = document.getElementById('user-search').value;
        const includeBanned = document.getElementById('include-banned').checked;
        const includeTest = document.getElementById('include-test').checked;
        const data = await getUsers(query, usersPage * PAGE_SIZE, includeBanned, includeTest);
```

Find `getUsers` at L70 and add the new parameter:

```javascript
async function getUsers(query = '', offset = 0, includeBanned = true, includeTest = true) {
    const params = new URLSearchParams({
        query,
        limit: PAGE_SIZE,
        offset,
        include_banned: includeBanned,
        include_test: includeTest,
    });
    return apiRequest(`/api/admin/users?${params}`);
}
```

Note: the existing signature builds a URLSearchParams — check the actual code at L70 and match its style; the key change is adding `include_test: includeTest` to the params.
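
The query string this produces can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: `buildUserQuery` is a hypothetical name and `PAGE_SIZE = 50` is an assumed value. It shows that booleans end up as the literal strings `true` and `false`, which FastAPI accepts for `bool` query parameters.

```typescript
// Hypothetical helper for illustration; the real getUsers in client/admin.js may differ.
const PAGE_SIZE = 50; // assumed value, matching the route's default limit

function buildUserQuery(
  query: string,
  offset: number,
  includeBanned: boolean,
  includeTest: boolean,
): string {
  // URLSearchParams stores strings, so numbers and booleans are converted explicitly.
  const params = new URLSearchParams({
    query,
    limit: String(PAGE_SIZE),
    offset: String(offset),
    include_banned: String(includeBanned),
    include_test: String(includeTest),
  });
  return `/api/admin/users?${params.toString()}`;
}

console.log(buildUserQuery('soaktest', 0, true, false));
// → /api/admin/users?query=soaktest&limit=50&offset=0&include_banned=true&include_test=false
```

Passing `include_test` explicitly on every request keeps the client independent of the server default, so changing that default later would not silently alter what the admin panel shows.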

- [ ] **Step 3: Add a "Test" badge to the user table row**

In `client/admin.js` at L314, modify the table row template to render a Test badge inline with the status badge:

```javascript
        data.users.forEach(user => {
            const testBadge = user.is_test_account
                ? '<span class="badge badge-info" title="Soak harness test account">Test</span>'
                : '';
            tbody.innerHTML += `
                <tr>
                    <td>${escapeHtml(user.username)} ${testBadge}</td>
                    <td>${escapeHtml(user.email || '-')}</td>
                    <td><span class="badge badge-${user.role === 'admin' ? 'info' : 'muted'}">${user.role}</span></td>
                    <td>${getStatusBadge(user)}</td>
                    <td>${user.games_played} (${user.games_won} wins)</td>
                    <td>${formatDateShort(user.created_at)}</td>
                    <td>
                        <button class="btn btn-small" data-action="view-user" data-id="${user.id}">View</button>
                    </td>
                </tr>
            `;
        });
```

- [ ] **Step 4: Add Test-seed indicator to invite codes list**

In `client/admin.js` around L443 (invite codes list renderer), find the row template and add a `[Test-seed]` badge when `invite.marks_as_test`:

```bash
grep -n "invite.is_active\|invite.code\|invites-tbody\|invites-table" client/admin.js | head
```

Once located, modify the row template to include:

```javascript
            const testSeedBadge = invite.marks_as_test
                ? '<span class="badge badge-info" title="Creates test accounts">Test-seed</span>'
                : '';
            // Insert testSeedBadge into the invite code column, e.g.
            // <td>${escapeHtml(invite.code)} ${testSeedBadge}</td>
```

- [ ] **Step 5: Wire the checkbox change event to reload users**

Find where `#include-banned` has its `change` listener attached (grep for it in admin.js):

```bash
grep -n "include-banned" client/admin.js
```

Add a parallel listener for `#include-test` that calls `loadUsers()`:

```javascript
document.getElementById('include-test').addEventListener('change', () => {
    usersPage = 0;
    loadUsers();
});
```

- [ ] **Step 6: Manual verification in browser**

1. Open http://localhost:8000/admin.html
2. Log in as admin
3. Navigate to Users tab
4. Search for "soaktest"
5. Confirm the `[Test]` badge appears next to `soaktest_register1`
6. Uncheck "Include test accounts" — the row should disappear
7. Re-check it — the row should return
8. Navigate to Invite Codes tab
9. Confirm the `[Test-seed]` badge appears next to the `SOAKTEST` code

- [ ] **Step 7: Commit**

```bash
git add client/admin.html client/admin.js
git commit -m "$(cat <<'EOF'
feat(admin): visible Test/Test-seed badges + filter toggle

Users table shows [Test] next to soak-harness accounts, invite codes
list shows [Test-seed] next to codes that flag new accounts as test,
and a new "Include test accounts" checkbox lets admins hide bot
traffic from the user list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 8: Document the one-time staging setup step

The staging invite code `5VC2MCCN` needs to be flagged as test-seed before the harness can run against staging. This is a manual one-liner; document it in a new bring-up doc.

**Files:**
- Create: `docs/soak-harness-bringup.md`

- [ ] **Step 1: Create the bring-up doc**

````bash
cat > docs/soak-harness-bringup.md <<'EOF'
# Soak Harness Bring-Up

One-time setup steps before running `tests/soak` against an environment.

## Prerequisites

- An invite code exists with 16+ available uses
- You have psql access to the target DB (or admin SQL access via some other means)

## 1. Flag the invite code as test-seed

Any account registered with a `marks_as_test=TRUE` invite code gets
`users_v2.is_test_account=TRUE`, which keeps it out of real-user stats.

### Staging

Invite code: `5VC2MCCN` (16 uses, provisioned 2026-04-10).

```sql
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
```

Expected: `marks_as_test | t`.

### Local dev

The dev DB already has a `SOAKTEST` invite created during Task 4 of
the implementation plan. If you wiped the DB since, recreate it:

```sql
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
```

## 2. Run the harness

```bash
cd tests/soak
npm install
npm run seed                                  # first run only, populates .env.stresstest
TEST_URL=http://localhost:8000 npm run smoke  # 30s end-to-end check
```

For staging:

```bash
TEST_URL=https://staging.adlee.work npm run soak -- --scenario=populate
```

See `tests/soak/README.md` for the full flag reference.
EOF
````

- [ ] **Step 2: Commit**

```bash
git add docs/soak-harness-bringup.md
git commit -m "$(cat <<'EOF'
docs: soak harness bring-up steps

Documents the one-time UPDATE invite_codes SET marks_as_test = TRUE
step required before running tests/soak against each environment,
plus the local dev SOAKTEST invite recreation SQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 2 — Harness scaffolding

### Task 9: Create the `tests/soak/` package skeleton

Bare minimum to get `tsx` running against an empty entry point. No behavior yet.

**Files:**
- Create: `tests/soak/package.json`
- Create: `tests/soak/tsconfig.json`
- Create: `tests/soak/.gitignore`
- Create: `tests/soak/.env.stresstest.example`
- Create: `tests/soak/README.md` (stub)
- Create: `tests/soak/runner.ts` (stub that prints a placeholder message)

- [ ] **Step 1: Create `tests/soak/package.json`**

```json
{
  "name": "golf-soak",
  "version": "0.1.0",
  "private": true,
  "description": "Multiplayer soak & UX test harness for Golf Card Game",
  "scripts": {
    "soak": "tsx runner.ts",
    "soak:populate": "tsx runner.ts --scenario=populate",
    "soak:stress": "tsx runner.ts --scenario=stress",
    "seed": "tsx scripts/seed-accounts.ts",
    "smoke": "bash scripts/smoke.sh",
    "test": "vitest run"
  },
  "dependencies": {
    "playwright-core": "^1.40.0",
    "ws": "^8.16.0"
  },
  "devDependencies": {
    "tsx": "^4.7.0",
    "@types/ws": "^8.5.0",
    "@types/node": "^20.10.0",
    "typescript": "^5.3.0",
    "vitest": "^1.2.0"
  }
}
```

- [ ] **Step 2: Create `tests/soak/tsconfig.json`**

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "moduleResolution": "node",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": false,
    "sourceMap": true,
    "outDir": "./dist",
    "rootDir": ".",
    "baseUrl": ".",
    "lib": ["ES2022", "DOM"],
    "paths": {
      "@soak/*": ["./*"],
      "@bot/*": ["../e2e/bot/*"]
    }
  },
  "include": ["**/*.ts"],
  "exclude": ["node_modules", "dist", "artifacts"]
}
```

- [ ] **Step 3: Create `tests/soak/.gitignore`**

```
node_modules/
dist/
artifacts/
.env.stresstest
*.log
```

- [ ] **Step 4: Create `tests/soak/.env.stresstest.example`**

```
# Soak harness account cache.
# This file is AUTO-GENERATED on first run; do not edit by hand.
# Format: SOAK_ACCOUNT_NN=username:password:token
#
# Example (delete before first real run):
# SOAK_ACCOUNT_00=soak_00_a7bx:<generated-password>:<jwt-token>
```

- [ ] **Step 5: Create `tests/soak/README.md` (stub — expanded in Task 31)**

````markdown
# Golf Soak & UX Test Harness

Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.

**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`

## Quick start

```bash
npm install
npm run seed                    # first run only
TEST_URL=http://localhost:8000 npm run smoke
```

Full documentation arrives with Task 31.
````

- [ ] **Step 6: Create `tests/soak/runner.ts` as a placeholder**

```typescript
#!/usr/bin/env tsx
/**
 * Golf Soak Harness — entry point.
 *
 * Placeholder. Full runner lands in Task 17.
 */

async function main(): Promise<void> {
  console.log('golf-soak runner (placeholder)');
  console.log('Full implementation lands in Task 17 of the plan.');
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

- [ ] **Step 7: Install deps and verify runner executes**

```bash
cd tests/soak
npm install
npx tsx runner.ts
```

Expected output:

```
golf-soak runner (placeholder)
Full implementation lands in Task 17 of the plan.
```

- [ ] **Step 8: Commit**

```bash
git add tests/soak/package.json tests/soak/package-lock.json tests/soak/tsconfig.json tests/soak/.gitignore tests/soak/.env.stresstest.example tests/soak/README.md tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): scaffold tests/soak package

Placeholder runner, tsconfig with @bot alias to tests/e2e/bot,
gitignored .env.stresstest + artifacts. Real behavior follows
in Task 10 onward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 10: Core types and `Deferred` helper

Pure TypeScript with Vitest tests. No browser, no network. Establishes the type surface the rest of the harness will target.

**Files:**
- Create: `tests/soak/core/types.ts`
- Create: `tests/soak/core/deferred.ts`
- Create: `tests/soak/tests/deferred.test.ts`

- [ ] **Step 1: Write the failing test for `Deferred`**

Create `tests/soak/tests/deferred.test.ts`:

```typescript
import { describe, it, expect } from 'vitest';
import { deferred } from '../core/deferred';

describe('deferred', () => {
  it('resolves with the given value', async () => {
    const d = deferred<string>();
    d.resolve('hello');
    await expect(d.promise).resolves.toBe('hello');
  });

  it('rejects with the given error', async () => {
    const d = deferred<string>();
    const err = new Error('boom');
    d.reject(err);
    await expect(d.promise).rejects.toBe(err);
  });

  it('ignores second resolve calls', async () => {
    const d = deferred<number>();
    d.resolve(1);
    d.resolve(2);
    await expect(d.promise).resolves.toBe(1);
  });
});
```

- [ ] **Step 2: Run the test to verify it fails**

```bash
cd tests/soak
npx vitest run tests/deferred.test.ts
```

Expected: FAIL — module `../core/deferred` does not exist.

- [ ] **Step 3: Implement `deferred`**

Create `tests/soak/core/deferred.ts`:

```typescript
/**
 * Promise deferred primitive — lets external code resolve or reject
 * a promise. Used by RoomCoordinator for host→joiners handoff.
 */

export interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: unknown): void;
}

export function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}
```

- [ ] **Step 4: Run tests to verify they pass**

```bash
npx vitest run tests/deferred.test.ts
```

Expected: 3 passed.
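
To see what the primitive buys the harness, here is a minimal sketch of the host-to-joiners handoff it is meant for. The `handoffDemo` function, the joiner names, and the room code are all made up for illustration, and `deferred` is copied inline so the snippet runs on its own.

```typescript
// Inline copy of the deferred helper so the sketch is self-contained.
interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: unknown): void;
}

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical handoff: three joiners wait on one room code.
async function handoffDemo(): Promise<string[]> {
  const roomCode = deferred<string>();
  const joiners = ['j1', 'j2', 'j3'].map(async (name) => {
    const code = await roomCode.promise; // suspends until the host resolves
    return `${name} joined ${code}`;
  });
  roomCode.resolve('ABCD'); // host side, e.g. right after creating the room
  return Promise.all(joiners);
}

handoffDemo().then((lines) => console.log(lines.join('\n')));
```

Every joiner awaits the same promise, so one `resolve` call from the host releases all of them, and a joiner that starts waiting after the host has already resolved gets the code immediately.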

- [ ] **Step 5: Create `core/types.ts` with the scenario interfaces**

```typescript
/**
 * Core type definitions for the soak harness.
 *
 * Contracts here are consumed by runner.ts, SessionPool, scenarios,
 * and the dashboard. Keep this file small and stable.
 */

import type { BrowserContext, Page } from 'playwright-core';
import type { GolfBot } from '../../e2e/bot/golf-bot';

// =============================================================================
// Accounts & sessions
// =============================================================================

export interface Account {
  /** Stable key used in logs, e.g. "soak_00". */
  key: string;
  username: string;
  password: string;
  /** JWT returned from /api/auth/login, may be refreshed by SessionPool. */
  token: string;
}

export interface Session {
  account: Account;
  context: BrowserContext;
  page: Page;
  bot: GolfBot;
  /** Convenience mirror of account.key. */
  key: string;
}

// =============================================================================
// Scenarios
// =============================================================================

export interface ScenarioNeeds {
  /** Total number of authenticated sessions the scenario requires. */
  accounts: number;
  /** How many rooms to partition sessions into (default: 1). */
  rooms?: number;
  /** CPUs to add per room (default: 0). */
  cpusPerRoom?: number;
}

/** Free-form per-scenario config merged with CLI flags. */
export type ScenarioConfig = Record<string, unknown>;

export interface ScenarioError {
  room: string;
  reason: string;
  detail?: string;
  timestamp: number;
}

export interface ScenarioResult {
  gamesCompleted: number;
  errors: ScenarioError[];
  durationMs: number;
  customMetrics?: Record<string, number>;
}

export interface ScenarioContext {
  /** Merged config: CLI flags → env → scenario defaults → runner defaults. */
  config: ScenarioConfig;
  /** Pre-authenticated sessions; ordered. */
  sessions: Session[];
  coordinator: RoomCoordinatorApi;
  dashboard: DashboardReporter;
  logger: Logger;
  signal: AbortSignal;
  /** Reset the per-room watchdog. Call at each progress point. */
  heartbeat(roomId: string): void;
}

export interface Scenario {
  name: string;
  description: string;
  defaultConfig: ScenarioConfig;
  needs: ScenarioNeeds;
  run(ctx: ScenarioContext): Promise<ScenarioResult>;
}

// =============================================================================
// Room coordination
// =============================================================================

export interface RoomCoordinatorApi {
  announce(roomId: string, code: string): void;
  await(roomId: string, timeoutMs?: number): Promise<string>;
}

// =============================================================================
// Dashboard reporter
// =============================================================================

export interface RoomState {
  phase?: string;
  currentPlayer?: string;
  hole?: number;
  totalHoles?: number;
  game?: number;
  totalGames?: number;
  moves?: number;
  players?: Array<{ key: string; score: number | null; isActive: boolean }>;
  message?: string;
}

export interface DashboardReporter {
  update(roomId: string, state: Partial<RoomState>): void;
  log(level: 'info' | 'warn' | 'error', msg: string, meta?: object): void;
  incrementMetric(name: string, by?: number): void;
}

// =============================================================================
// Logger
// =============================================================================

export type LogLevel = 'debug' | 'info' | 'warn' | 'error';

export interface Logger {
  debug(msg: string, meta?: object): void;
  info(msg: string, meta?: object): void;
  warn(msg: string, meta?: object): void;
  error(msg: string, meta?: object): void;
  child(meta: object): Logger;
}
```
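
The `ScenarioContext.config` comment describes a four-layer merge. One way to read it, assuming the arrows list sources from highest to lowest precedence, is a plain object spread. `mergeConfig` and every key and value below are hypothetical and only illustrate the ordering.

```typescript
type ScenarioConfig = Record<string, unknown>;

// Later spreads win, so sources are listed from lowest to highest precedence.
function mergeConfig(
  runnerDefaults: ScenarioConfig,
  scenarioDefaults: ScenarioConfig,
  env: ScenarioConfig,
  cliFlags: ScenarioConfig,
): ScenarioConfig {
  return { ...runnerDefaults, ...scenarioDefaults, ...env, ...cliFlags };
}

// All keys and values here are invented for illustration.
const merged = mergeConfig(
  { games: 1, holes: 9, watch: 'none' }, // runner defaults
  { games: 10 },                         // scenario defaults
  { holes: 3 },                          // environment
  { games: 2 },                          // CLI flags
);
console.log(merged); // { games: 2, holes: 3, watch: 'none' }
```

A shallow spread copies keys whose value is `undefined`, so a real implementation would likely drop unset flags before merging to avoid erasing a lower-priority value.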

- [ ] **Step 6: Verify tsx still parses the runner**

```bash
cd tests/soak
npx tsx runner.ts
```

Expected: still prints the placeholder output. Note that tsx strips types without type-checking, and the new `core/` files aren't imported yet, so this step only confirms the runner still executes.

- [ ] **Step 7: Commit**

```bash
git add tests/soak/core/deferred.ts tests/soak/core/types.ts tests/soak/tests/deferred.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): core types + Deferred primitive

Establishes the Scenario/Session/Logger/DashboardReporter contracts
the rest of the harness builds on. Deferred is the building block
for RoomCoordinator's host→joiners handoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 11: RoomCoordinator with tests

Tiny abstraction over `Deferred` keyed by room ID, with a timeout on `await`.

**Files:**
- Create: `tests/soak/core/room-coordinator.ts`
- Create: `tests/soak/tests/room-coordinator.test.ts`

- [ ] **Step 1: Write failing tests**

```typescript
// tests/soak/tests/room-coordinator.test.ts
import { describe, it, expect } from 'vitest';
import { RoomCoordinator } from '../core/room-coordinator';

describe('RoomCoordinator', () => {
  it('resolves await with the announced code (announce then await)', async () => {
    const rc = new RoomCoordinator();
    rc.announce('room-1', 'ABCD');
    await expect(rc.await('room-1')).resolves.toBe('ABCD');
  });

  it('resolves await with the announced code (await then announce)', async () => {
    const rc = new RoomCoordinator();
    const p = rc.await('room-2');
    rc.announce('room-2', 'WXYZ');
    await expect(p).resolves.toBe('WXYZ');
  });

  it('rejects await after timeout if not announced', async () => {
    const rc = new RoomCoordinator();
    await expect(rc.await('room-3', 50)).rejects.toThrow(/timed out/i);
  });

  it('isolates rooms — announcing room-A does not unblock room-B', async () => {
    const rc = new RoomCoordinator();
    const pB = rc.await('room-B', 100);
    rc.announce('room-A', 'A-CODE');
    await expect(pB).rejects.toThrow(/timed out/i);
  });
});
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
npx vitest run tests/room-coordinator.test.ts
```

Expected: FAIL — module not found.

- [ ] **Step 3: Implement `RoomCoordinator`**

```typescript
// tests/soak/core/room-coordinator.ts
import { deferred, Deferred } from './deferred';
import type { RoomCoordinatorApi } from './types';

export class RoomCoordinator implements RoomCoordinatorApi {
  private rooms = new Map<string, Deferred<string>>();

  announce(roomId: string, code: string): void {
    this.getOrCreate(roomId).resolve(code);
  }

  async await(roomId: string, timeoutMs: number = 30_000): Promise<string> {
    const d = this.getOrCreate(roomId);
    let timer: NodeJS.Timeout | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        reject(new Error(`RoomCoordinator: room "${roomId}" timed out after ${timeoutMs}ms`));
      }, timeoutMs);
    });
    try {
      return await Promise.race([d.promise, timeout]);
    } finally {
      if (timer) clearTimeout(timer);
    }
  }

  private getOrCreate(roomId: string): Deferred<string> {
    let d = this.rooms.get(roomId);
    if (!d) {
      d = deferred<string>();
      this.rooms.set(roomId, d);
    }
    return d;
  }
}
```

- [ ] **Step 4: Verify tests pass**

```bash
npx vitest run tests/room-coordinator.test.ts
```

Expected: 4 passed.
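
The timeout in `await` is an instance of a general race-and-clean-up pattern. The sketch below pulls it out into a standalone helper to make the moving parts visible; `withTimeout` and `timeoutDemo` are hypothetical names that do not exist in the harness.

```typescript
// Hypothetical generic helper; RoomCoordinator inlines the same pattern.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a fast result does not leave a pending handle
    // that keeps the Node process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

async function timeoutDemo(): Promise<string[]> {
  const results: string[] = [];
  results.push(await withTimeout(Promise.resolve('ABCD'), 50, 'room-1'));
  try {
    await withTimeout(new Promise<string>(() => {}), 20, 'room-2'); // never settles
  } catch (err) {
    results.push((err as Error).message);
  }
  return results;
}

timeoutDemo().then((r) => console.log(r));
```

The `finally` block matters in a long soak run: without it, every successful handoff would leave a 30-second timer pending.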

- [ ] **Step 5: Commit**

```bash
git add tests/soak/core/room-coordinator.ts tests/soak/tests/room-coordinator.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): RoomCoordinator with host→joiners handoff

Lazy Deferred per roomId with a timeout on await. Lets concurrent
joiner sessions block until their host announces the room code
without polling or page scraping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 12: Structured JSONL logger

Single module, no transport, writes to `process.stdout`. Supports child loggers with bound metadata (so scenarios can emit logs with `room` / `game` context without repeating it).

**Files:**
- Create: `tests/soak/core/logger.ts`
- Create: `tests/soak/tests/logger.test.ts`

- [ ] **Step 1: Write failing tests**

```typescript
// tests/soak/tests/logger.test.ts
import { describe, it, expect, beforeEach } from 'vitest';
import { createLogger } from '../core/logger';

describe('logger', () => {
  let writes: string[];
  let write: (s: string) => boolean;

  beforeEach(() => {
    writes = [];
    write = (s: string) => {
      writes.push(s);
      return true;
    };
  });

  it('emits a JSON line per call with level and msg', () => {
    const log = createLogger({ runId: 'r1', write });
    log.info('hello');
    expect(writes).toHaveLength(1);
    const parsed = JSON.parse(writes[0]);
    expect(parsed.level).toBe('info');
    expect(parsed.msg).toBe('hello');
    expect(parsed.runId).toBe('r1');
    expect(parsed.timestamp).toBeTypeOf('string');
  });

  it('merges meta into the log line', () => {
    const log = createLogger({ runId: 'r1', write });
    log.warn('slow', { turnMs: 3000 });
    const parsed = JSON.parse(writes[0]);
    expect(parsed.turnMs).toBe(3000);
    expect(parsed.level).toBe('warn');
  });

  it('child logger inherits parent meta', () => {
    const log = createLogger({ runId: 'r1', write });
    const roomLog = log.child({ room: 'room-1' });
    roomLog.info('game_start');
    const parsed = JSON.parse(writes[0]);
    expect(parsed.room).toBe('room-1');
    expect(parsed.runId).toBe('r1');
  });

  it('respects minimum level', () => {
    const log = createLogger({ runId: 'r1', write, minLevel: 'warn' });
    log.debug('nope');
    log.info('nope');
    log.warn('yes');
    log.error('yes');
    expect(writes).toHaveLength(2);
  });
});
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
npx vitest run tests/logger.test.ts
```

Expected: FAIL — module not found.

- [ ] **Step 3: Implement the logger**

```typescript
// tests/soak/core/logger.ts
import type { Logger, LogLevel } from './types';

const LEVEL_ORDER: Record<LogLevel, number> = {
  debug: 0,
  info: 1,
  warn: 2,
  error: 3,
};

export interface LoggerOptions {
  runId: string;
  minLevel?: LogLevel;
  /** Defaults to process.stdout.write bound to stdout. Override for tests. */
  write?: (line: string) => boolean;
  baseMeta?: Record<string, unknown>;
}

export function createLogger(opts: LoggerOptions): Logger {
  const minLevel = opts.minLevel ?? 'info';
  const write = opts.write ?? ((s: string) => process.stdout.write(s));
  const baseMeta = opts.baseMeta ?? {};

  function emit(level: LogLevel, msg: string, meta?: object): void {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    const line = JSON.stringify({
      timestamp: new Date().toISOString(),
      level,
      msg,
      runId: opts.runId,
      ...baseMeta,
      ...(meta ?? {}),
    }) + '\n';
    write(line);
  }

  const logger: Logger = {
    debug: (msg, meta) => emit('debug', msg, meta),
    info: (msg, meta) => emit('info', msg, meta),
    warn: (msg, meta) => emit('warn', msg, meta),
    error: (msg, meta) => emit('error', msg, meta),
    child: (meta) =>
      createLogger({
        runId: opts.runId,
        minLevel,
        write,
        baseMeta: { ...baseMeta, ...meta },
      }),
  };

  return logger;
}
```

- [ ] **Step 4: Verify tests pass**

```bash
npx vitest run tests/logger.test.ts
```

Expected: 4 passed.
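
Because each line is a standalone JSON object, a run's output can be filtered without a custom parser. The sketch below assumes the field names exercised by the tests (`level`, `msg`, `runId`, plus a bound `room`); `linesForRoom` and the sample lines are invented for illustration.

```typescript
interface LogLine {
  timestamp: string;
  level: 'debug' | 'info' | 'warn' | 'error';
  msg: string;
  runId: string;
  [key: string]: unknown; // bound meta such as room or game
}

// Parse JSONL text and keep only entries whose `room` field matches.
function linesForRoom(jsonl: string, room: string): LogLine[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as LogLine)
    .filter((entry) => entry.room === room);
}

// Sample output is made up; real lines come from the harness's stdout.
const sample = [
  '{"timestamp":"2026-04-10T00:00:00.000Z","level":"info","msg":"game_start","runId":"r1","room":"room-1"}',
  '{"timestamp":"2026-04-10T00:00:01.000Z","level":"warn","msg":"slow","runId":"r1","room":"room-2","turnMs":3000}',
  '{"timestamp":"2026-04-10T00:00:02.000Z","level":"info","msg":"game_end","runId":"r1","room":"room-1"}',
].join('\n') + '\n';

console.log(linesForRoom(sample, 'room-1').map((e) => e.msg)); // [ 'game_start', 'game_end' ]
```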

- [ ] **Step 5: Commit**

```bash
git add tests/soak/core/logger.ts tests/soak/tests/logger.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): structured JSONL logger with child contexts

Single file, no transport, writes one JSON line per call to stdout.
Child loggers inherit parent meta so scenarios can bind room/game
context once and forget about it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 3 — SessionPool and seeding

### Task 13: SessionPool with HTTP registration and localStorage warm-start

This is the biggest single module. It owns browser context lifecycle, seeds accounts on cold start, logs in on warm start, and exposes a simple `acquire()` API to scenarios.

**Files:**
- Create: `tests/soak/core/session-pool.ts`

Testing: manual via `scripts/seed-accounts.ts` in Task 14 and the first real runner invocation in Task 17. No Vitest test for this — it's an integration module that needs a real browser.

- [ ] **Step 1: Create `tests/soak/core/session-pool.ts` — imports and types**

```typescript
// tests/soak/core/session-pool.ts
import * as fs from 'fs';
import * as path from 'path';
import {
  Browser,
  BrowserContext,
  chromium,
} from 'playwright-core';
import { GolfBot } from '../../e2e/bot/golf-bot';
import type { Account, Session, Logger } from './types';

export interface SeedOptions {
  /** Full base URL of the target server, e.g. https://staging.adlee.work. */
  targetUrl: string;
  /** Invite code to pass to /api/auth/register. */
  inviteCode: string;
  /** Number of accounts to create. */
  count: number;
}

export interface SessionPoolOptions {
  targetUrl: string;
  inviteCode: string;
  credFile: string;   // absolute path to .env.stresstest
  logger: Logger;
  /** Optional override for the browser to attach contexts to. If absent, SessionPool launches its own. */
  browser?: Browser;
  /** Passed through to browser.newContext. Useful for viewport overrides in tests. */
  contextOptions?: Parameters<Browser['newContext']>[0];
}
```

- [ ] **Step 2: Implement cred-file read/write**

Append to `session-pool.ts`:

```typescript
function readCredFile(filePath: string): Account[] | null {
  if (!fs.existsSync(filePath)) return null;
  const content = fs.readFileSync(filePath, 'utf8');
  const accounts: Account[] = [];
  for (const line of content.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    // SOAK_ACCOUNT_NN=username:password:token
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq);
    const value = trimmed.slice(eq + 1);
    const m = key.match(/^SOAK_ACCOUNT_(\d+)$/);
    if (!m) continue;
    const [username, password, token] = value.split(':');
    if (!username || !password || !token) continue;
    const idx = parseInt(m[1], 10);
    accounts.push({
      key: `soak_${String(idx).padStart(2, '0')}`,
      username,
      password,
      token,
    });
  }
  return accounts.length > 0 ? accounts : null;
}

function writeCredFile(filePath: string, accounts: Account[]): void {
  const lines: string[] = [
    '# Soak harness account cache — auto-generated, do not hand-edit',
    '# Format: SOAK_ACCOUNT_NN=username:password:token',
  ];
  for (const acc of accounts) {
    const idx = parseInt(acc.key.replace('soak_', ''), 10);
    const key = `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}`;
    lines.push(`${key}=${acc.username}:${acc.password}:${acc.token}`);
  }
  fs.writeFileSync(filePath, lines.join('\n') + '\n', { mode: 0o600 });
}
```
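
To make the line format concrete, here is a minimal standalone sketch of one round trip through it. `formatCredLine`, `parseCredLine`, and `CredAccount` are hypothetical names used only for this illustration; the module itself uses `readCredFile` and `writeCredFile` above. The sketch assumes, as the format implicitly does, that usernames, passwords, and tokens never contain `:`.

```typescript
// Illustrative only: a single-line round trip of the SOAK_ACCOUNT_NN format.
// formatCredLine / parseCredLine are hypothetical helpers, not part of session-pool.ts.
interface CredAccount {
  key: string; // e.g. "soak_03"
  username: string;
  password: string;
  token: string;
}

function formatCredLine(acc: CredAccount): string {
  const idx = parseInt(acc.key.replace('soak_', ''), 10);
  const envKey = `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}`;
  return `${envKey}=${acc.username}:${acc.password}:${acc.token}`;
}

function parseCredLine(line: string): CredAccount | null {
  const m = line.trim().match(/^SOAK_ACCOUNT_(\d+)=(.*)$/);
  if (!m) return null;
  const parts = m[2].split(':');
  // Stricter than readCredFile: require exactly three non-empty fields.
  // A ':' inside any field would break this format.
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) return null;
  const [username, password, token] = parts;
  const idx = parseInt(m[1], 10);
  return { key: `soak_${String(idx).padStart(2, '0')}`, username, password, token };
}

const sample: CredAccount = {
  key: 'soak_03',
  username: 'soak_03_ab12',
  password: 'pw-example!',
  token: 'aaa.bbb.ccc',
};
const line = formatCredLine(sample);
const roundTripped = parseCredLine(line);
```

Because the delimiter is a bare `:`, any future change to `generatePassword` that adds `:` to the alphabet would silently corrupt the cache, which is worth keeping in mind when editing Step 3.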

- [ ] **Step 3: Implement the HTTP register call**

```typescript
interface RegisterResponse {
  user: { id: string; username: string };
  token: string;
  expires_at: string;
}

async function registerAccount(
  targetUrl: string,
  username: string,
  password: string,
  email: string,
  inviteCode: string,
): Promise<string> {
  const res = await fetch(`${targetUrl}/api/auth/register`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password, email, invite_code: inviteCode }),
  });
  if (!res.ok) {
    const body = await res.text().catch(() => '<no body>');
    throw new Error(`register failed: ${res.status} ${body}`);
  }
  const data = (await res.json()) as RegisterResponse;
  if (!data.token) {
    throw new Error(`register returned no token: ${JSON.stringify(data)}`);
  }
  return data.token;
}

async function loginAccount(
  targetUrl: string,
  username: string,
  password: string,
): Promise<string> {
  const res = await fetch(`${targetUrl}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password }),
  });
  if (!res.ok) {
    const body = await res.text().catch(() => '<no body>');
    throw new Error(`login failed: ${res.status} ${body}`);
  }
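  // Assumes the login endpoint returns the same shape as register.
  const data = (await res.json()) as RegisterResponse;
  if (!data.token) {
    throw new Error(`login returned no token: ${JSON.stringify(data)}`);
  }
  return data.token;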
}

function randomSuffix(): string {
  return Math.random().toString(36).slice(2, 6);
}

function generatePassword(): string {
  // 16 chars: letters + digits + one symbol. Meets 8-char minimum from auth_service.
  // Split across halves so repo secret-scanners don't flag the string as base64
  const lower = 'abcdefghijkm' + 'npqrstuvwxyz'; // pragma: allowlist secret
  const upper = 'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ'; // pragma: allowlist secret
  const digits = '23456789';
  const chars = lower + upper + digits;
  let out = '';
  for (let i = 0; i < 15; i++) {
    out += chars[Math.floor(Math.random() * chars.length)];
  }
  return out + '!';
}
```
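
`Math.random()` is not a cryptographically secure source. That is probably acceptable for throwaway test accounts, but if the harness ever targets a shared environment, a variant built on Node's `crypto.randomInt` is a small change. The sketch below is an assumption about how that could look, not part of the plan; `generatePasswordSecure` and `SAFE_CHARS` are hypothetical names.

```typescript
import { randomInt } from 'node:crypto';

// Hypothetical alternative to generatePassword(): same shape
// (15 alphanumerics followed by '!'), but indices come from
// crypto.randomInt instead of Math.random.
const SAFE_CHARS =
  'abcdefghijkm' + 'npqrstuvwxyz' + 'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ' + '23456789';

function generatePasswordSecure(length = 16): string {
  if (length < 2) throw new Error('length must be at least 2');
  let out = '';
  for (let i = 0; i < length - 1; i++) {
    // randomInt(max) returns an integer in [0, max).
    out += SAFE_CHARS[randomInt(SAFE_CHARS.length)];
  }
  return out + '!';
}

const examplePassword = generatePasswordSecure();
```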

- [ ] **Step 4: Implement the `SessionPool` class**

```typescript
export class SessionPool {
  private accounts: Account[] = [];
  private ownedBrowser: Browser | null = null;
  private browser: Browser | null;
  private activeSessions: Session[] = [];

  constructor(private opts: SessionPoolOptions) {
    this.browser = opts.browser ?? null;
  }

  /**
   * Seed `count` accounts via the register endpoint and write them to credFile.
   * Safe to call multiple times — skips accounts already in the file.
   */
  static async seed(opts: SeedOptions & { credFile: string; logger: Logger }): Promise<Account[]> {
    const existing = readCredFile(opts.credFile) ?? [];
    const existingKeys = new Set(existing.map((a) => a.key));
    const created: Account[] = [...existing];

    for (let i = 0; i < opts.count; i++) {
      const key = `soak_${String(i).padStart(2, '0')}`;
      if (existingKeys.has(key)) continue;

      const suffix = randomSuffix();
      const username = `${key}_${suffix}`;
      const password = generatePassword();
      const email = `${key}_${suffix}@soak.test`;

      opts.logger.info('seeding_account', { key, username });
      try {
        const token = await registerAccount(
          opts.targetUrl,
          username,
          password,
          email,
          opts.inviteCode,
        );
        created.push({ key, username, password, token });
        writeCredFile(opts.credFile, created);
      } catch (err) {
        opts.logger.error('seed_failed', {
          key,
          error: err instanceof Error ? err.message : String(err),
        });
        throw err;
      }
    }
    return created;
  }

  /**
   * Load accounts from credFile, auto-seeding if the file is missing or holds fewer than desiredCount accounts.
   */
  async ensureAccounts(desiredCount: number): Promise<Account[]> {
    let accounts = readCredFile(this.opts.credFile);
    if (!accounts || accounts.length < desiredCount) {
      this.opts.logger.warn('cred_file_missing_or_short', {
        found: accounts?.length ?? 0,
        desired: desiredCount,
      });
      accounts = await SessionPool.seed({
        targetUrl: this.opts.targetUrl,
        inviteCode: this.opts.inviteCode,
        count: desiredCount,
        credFile: this.opts.credFile,
        logger: this.opts.logger,
      });
    }
    this.accounts = accounts.slice(0, desiredCount);
    return this.accounts;
  }

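  /**
   * Launch the browser if not provided, create N contexts, inject each
   * account's cached token into localStorage, open the target page, and
   * return the live sessions. Note: the POST /api/auth/login fallback in
   * injectAuth runs only if the injection itself throws; a token the
   * server rejects is not detected here.
   */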
  async acquire(count: number): Promise<Session[]> {
    await this.ensureAccounts(count);
    if (!this.browser) {
      this.ownedBrowser = await chromium.launch({ headless: true });
      this.browser = this.ownedBrowser;
    }

    const sessions: Session[] = [];
    for (let i = 0; i < count; i++) {
      const account = this.accounts[i];
      const context = await this.browser.newContext(this.opts.contextOptions);
      await this.injectAuth(context, account);
      const page = await context.newPage();
      await page.goto(this.opts.targetUrl);
      const bot = new GolfBot(page);
      sessions.push({ account, context, page, bot, key: account.key });
    }
    this.activeSessions = sessions;
    return sessions;
  }

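  /**
   * Inject the cached JWT into localStorage BEFORE any page loads.
   * Uses addInitScript so the token is present on the first navigation.
   * The catch branch only fires if addInitScript itself throws; it cannot
   * observe the server rejecting a stale token, so that case still needs
   * an explicit check before the login fallback can cover it.
   */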
  private async injectAuth(context: BrowserContext, account: Account): Promise<void> {
    // Try the cached token first
    try {
      await context.addInitScript(
        ({ token, username }) => {
          window.localStorage.setItem('authToken', token);
          window.localStorage.setItem(
            'authUser',
            JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
          );
        },
        { token: account.token, username: account.username },
      );
    } catch (err) {
      this.opts.logger.warn('inject_auth_failed', {
        account: account.key,
        error: err instanceof Error ? err.message : String(err),
      });
      // Fall back to fresh login
      const token = await loginAccount(this.opts.targetUrl, account.username, account.password);
      account.token = token;
      writeCredFile(this.opts.credFile, this.accounts);
      await context.addInitScript(
        ({ token, username }) => {
          window.localStorage.setItem('authToken', token);
          window.localStorage.setItem(
            'authUser',
            JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
          );
        },
        { token, username: account.username },
      );
    }
  }

  /** Close all active contexts. Safe to call multiple times. */
  async release(): Promise<void> {
    for (const session of this.activeSessions) {
      try {
        await session.context.close();
      } catch {
        // ignore
      }
    }
    this.activeSessions = [];
    if (this.ownedBrowser) {
      try {
        await this.ownedBrowser.close();
      } catch {
        // ignore
      }
      this.ownedBrowser = null;
      this.browser = null;
    }
  }
}
```
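
As written, `injectAuth` reaches its login fallback only if `addInitScript` itself throws, so a stale cached token would go unnoticed until the page rejects it. One way to catch that earlier is to decode the token locally before injecting it. The sketch below assumes the token is a standard three-part JWT whose payload carries a numeric `exp` claim in seconds, which the plan does not confirm; `isTokenExpired` and `makeDemoToken` are hypothetical helpers for illustration.

```typescript
// Hypothetical helper: decide locally whether a cached JWT looks expired.
// Assumes a header.payload.signature token with an `exp` claim in seconds
// since the epoch. It does NOT verify the signature.
function isTokenExpired(
  token: string,
  nowMs: number = Date.now(),
  skewMs = 30_000,
): boolean {
  const parts = token.split('.');
  if (parts.length !== 3) return true; // not a JWT: treat as unusable
  try {
    const payloadJson = Buffer.from(parts[1], 'base64url').toString('utf8');
    const payload = JSON.parse(payloadJson) as { exp?: unknown };
    if (typeof payload.exp !== 'number') return true; // no expiry claim: be conservative
    return payload.exp * 1000 <= nowMs + skewMs;
  } catch {
    return true; // malformed payload
  }
}

// Build an unsigned demo token so the sketch is self-contained.
function makeDemoToken(expSeconds: number): string {
  const enc = (obj: object) => Buffer.from(JSON.stringify(obj)).toString('base64url');
  return `${enc({ alg: 'none' })}.${enc({ exp: expSeconds })}.sig`;
}

const demoNowMs = 1_700_000_000_000;
const freshToken = makeDemoToken(1_700_000_000 + 3600); // expires in one hour
const staleToken = makeDemoToken(1_700_000_000 - 3600); // expired an hour ago
```

In `acquire()`, a check like this could gate a call to `loginAccount` before `injectAuth`. It would not catch a server-side revocation; detecting that still requires an authenticated request.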

- [ ] **Step 5: Syntax-check by invoking tsx**

```bash
cd tests/soak
npx tsx -e "import('./core/session-pool').then(() => console.log('ok'))"
```

Expected: `ok`. No TypeScript errors.

- [ ] **Step 6: Commit**

```bash
git add tests/soak/core/session-pool.ts
git commit -m "$(cat <<'EOF'
feat(soak): SessionPool — seed, login, acquire contexts

Owns 16 BrowserContexts, seeds via POST /api/auth/register with the
invite code on cold start, warm-starts via localStorage injection of
the cached JWT, falls back to POST /api/auth/login if the token is
rejected. Exposes acquire(n) for scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 14: `seed-accounts.ts` CLI wrapper

Tiny standalone entry point that lets you pre-seed before the first harness run. Reuses `SessionPool.seed`.

**Files:**
- Create: `tests/soak/scripts/seed-accounts.ts`

- [ ] **Step 1: Write the script**

```typescript
#!/usr/bin/env tsx
/**
 * Seed N soak-harness accounts via the register endpoint.
 *
 * Usage:
 *   TEST_URL=http://localhost:8000 \
 *   SOAK_INVITE_CODE=SOAKTEST \
 *     npm run seed -- --count=16
 */

import * as path from 'path';
import { SessionPool } from '../core/session-pool';
import { createLogger } from '../core/logger';

function parseArgs(argv: string[]): { count: number } {
  const result = { count: 16 };
  for (const arg of argv.slice(2)) {
    const m = arg.match(/^--count=(\d+)$/);
    if (m) result.count = parseInt(m[1], 10);
  }
  return result;
}

async function main(): Promise<void> {
  const { count } = parseArgs(process.argv);
  const targetUrl = process.env.TEST_URL ?? 'http://localhost:8000';
  const inviteCode = process.env.SOAK_INVITE_CODE;
  if (!inviteCode) {
    console.error('SOAK_INVITE_CODE env var is required');
    console.error('  Local dev: SOAK_INVITE_CODE=SOAKTEST');
    console.error('  Staging:   SOAK_INVITE_CODE=5VC2MCCN');
    process.exit(2);
  }

  const credFile = path.resolve(__dirname, '..', '.env.stresstest');
  const logger = createLogger({ runId: `seed-${Date.now()}` });

  logger.info('seed_start', { count, targetUrl, credFile });
  try {
    const accounts = await SessionPool.seed({
      targetUrl,
      inviteCode,
      count,
      credFile,
      logger,
    });
    logger.info('seed_complete', { created: accounts.length });
    console.error(`Seeded ${accounts.length} accounts → ${credFile}`);
  } catch (err) {
    logger.error('seed_failed', {
      error: err instanceof Error ? err.message : String(err),
    });
    process.exit(1);
  }
}

main();
```

- [ ] **Step 2: Run it against local dev to verify end-to-end**

With the dev server running and the `SOAKTEST` invite flagged:

```bash
cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed -- --count=4
```

Expected:
- Log lines `seeding_account` × 4
- Log line `seed_complete`
- `tests/soak/.env.stresstest` file created with 4 `SOAK_ACCOUNT_NN=...` lines

Verify:

```bash
head .env.stresstest
```

Expected: 4 account lines.

Also verify the accounts got flagged:

```bash
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username LIKE 'soak_%' ORDER BY username;"
```

Expected: 4 rows, all with `is_test_account | t`.

- [ ] **Step 3: Commit**

```bash
git add tests/soak/scripts/seed-accounts.ts
git commit -m "$(cat <<'EOF'
feat(soak): scripts/seed-accounts.ts CLI wrapper

Thin standalone entry for pre-seeding N accounts before the first
harness run. Wraps SessionPool.seed and writes .env.stresstest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 4 — First scenario, config, runner (end-to-end milestone)

### Task 15: Shared multiplayer-game helper

Pulls the "run one full game in one room" logic out of the scenarios so `populate` and `stress` share it. Takes a room's sessions and a config, loops until the game ends.

**Files:**
- Create: `tests/soak/scenarios/shared/multiplayer-game.ts`

- [ ] **Step 1: Create the helper module**

```typescript
// tests/soak/scenarios/shared/multiplayer-game.ts
import type { Session, ScenarioContext } from '../../core/types';

export interface MultiplayerGameOptions {
  roomId: string;
  holes: number;
  decks: number;
  cpusPerRoom: number;
  cpuPersonality?: string;
  /** Per-turn think time in [min, max] ms. */
  thinkTimeMs: [number, number];
  /** Max wall-clock time before giving up on the game (ms). */
  maxDurationMs?: number;
}

export interface MultiplayerGameResult {
  completed: boolean;
  turns: number;
  durationMs: number;
  error?: string;
}

function randomInt(min: number, max: number): number {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Host + joiners play one full multiplayer game end to end.
 * The host creates the room, announces the code via the coordinator,
 * joiners wait for the code, the host adds CPUs and starts, everyone
 * loops on isMyTurn/playTurn until round_over or game_over.
 */
export async function runOneMultiplayerGame(
  ctx: ScenarioContext,
  sessions: Session[],
  opts: MultiplayerGameOptions,
): Promise<MultiplayerGameResult> {
  const start = Date.now();
  const [host, ...joiners] = sessions;
  const maxDuration = opts.maxDurationMs ?? 5 * 60_000;

  try {
    // Host creates game
    const code = await host.bot.createGame(host.account.username);
    ctx.coordinator.announce(opts.roomId, code);
    ctx.heartbeat(opts.roomId);
    ctx.dashboard.update(opts.roomId, { phase: 'lobby' });
    ctx.logger.info('room_created', { room: opts.roomId, code });

    // Joiners join concurrently
    await Promise.all(
      joiners.map(async (joiner) => {
        const awaited = await ctx.coordinator.await(opts.roomId);
        await joiner.bot.joinGame(awaited, joiner.account.username);
      }),
    );
    ctx.heartbeat(opts.roomId);

    // Host adds CPUs (if any) and starts
    for (let i = 0; i < opts.cpusPerRoom; i++) {
      await host.bot.addCPU(opts.cpuPersonality);
    }
    await host.bot.startGame({ holes: opts.holes, decks: opts.decks });
    ctx.heartbeat(opts.roomId);
    ctx.dashboard.update(opts.roomId, { phase: 'playing', totalHoles: opts.holes });

    // Concurrent turn loops — one per session
    const turnCounts = new Array(sessions.length).fill(0);

    async function sessionLoop(sessionIdx: number): Promise<void> {
      const session = sessions[sessionIdx];
      while (true) {
        if (ctx.signal.aborted) return;
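        if (Date.now() - start > maxDuration) {
          // Surface the timeout as a failure rather than reporting completed: true.
          throw new Error(`game exceeded maxDurationMs (${maxDuration} ms)`);
        }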

        const phase = await session.bot.getGamePhase();
        if (phase === 'game_over' || phase === 'round_over') return;

        if (await session.bot.isMyTurn()) {
          await session.bot.playTurn();
          turnCounts[sessionIdx]++;
          ctx.heartbeat(opts.roomId);
          ctx.dashboard.update(opts.roomId, {
            currentPlayer: session.account.username,
            moves: turnCounts.reduce((a, b) => a + b, 0),
          });
          const thinkMs = randomInt(opts.thinkTimeMs[0], opts.thinkTimeMs[1]);
          await sleep(thinkMs);
        } else {
          await sleep(200);
        }
      }
    }

    await Promise.all(sessions.map((_, i) => sessionLoop(i)));

    const totalTurns = turnCounts.reduce((a, b) => a + b, 0);
    ctx.dashboard.update(opts.roomId, { phase: 'round_over' });
    return {
      completed: true,
      turns: totalTurns,
      durationMs: Date.now() - start,
    };
  } catch (err) {
    return {
      completed: false,
      turns: 0,
      durationMs: Date.now() - start,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```
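
The turn loop can stop for three different reasons (an abort signal, the wall-clock deadline, or the game actually reaching `round_over`/`game_over`), and only the last one means the game ran to completion. A minimal sketch of keeping those cases apart as a pure function follows; `stopReason`, `StopReason`, and `LoopState` are hypothetical names and are not part of the helper above.

```typescript
// Hypothetical pure helper: classify why a polling loop should stop.
type StopReason = 'aborted' | 'timeout' | 'finished';

interface LoopState {
  aborted: boolean;
  startedAtMs: number;
  nowMs: number;
  maxDurationMs: number;
  phase: string;
}

// Returns the reason to stop, or null to keep polling.
// Checks run in the same order as the loop body: abort, deadline, phase.
function stopReason(s: LoopState): StopReason | null {
  if (s.aborted) return 'aborted';
  if (s.nowMs - s.startedAtMs > s.maxDurationMs) return 'timeout';
  if (s.phase === 'game_over' || s.phase === 'round_over') return 'finished';
  return null;
}

const base: LoopState = {
  aborted: false,
  startedAtMs: 0,
  nowMs: 1_000,
  maxDurationMs: 5 * 60_000,
  phase: 'playing',
};

const reasons = [
  stopReason(base),
  stopReason({ ...base, aborted: true }),
  stopReason({ ...base, nowMs: 5 * 60_000 + 1 }),
  stopReason({ ...base, phase: 'round_over' }),
];
```

A caller could map `'finished'` to `completed: true` and the other two reasons to a failed or interrupted result, which keeps the dashboard and the final summary from counting cut-short games as played.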

- [ ] **Step 2: Syntax-check**

```bash
cd tests/soak
npx tsx -e "import('./scenarios/shared/multiplayer-game').then(() => console.log('ok'))"
```

Expected: `ok`.

- [ ] **Step 3: Commit**

```bash
git add tests/soak/scenarios/shared/multiplayer-game.ts
git commit -m "$(cat <<'EOF'
feat(soak): shared runOneMultiplayerGame helper

Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 16: Populate scenario (minimal version)

Partitions sessions into rooms, runs `gamesPerRoom` games per room in parallel, aggregates results.

**Files:**
- Create: `tests/soak/scenarios/populate.ts`
- Create: `tests/soak/scenarios/index.ts`

- [ ] **Step 1: Create `scenarios/populate.ts`**

```typescript
// tests/soak/scenarios/populate.ts
import type {
  Scenario,
  ScenarioContext,
  ScenarioResult,
  ScenarioError,
  Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';

const CPU_PERSONALITIES = ['Sofia', 'Marcus', 'Kenji', 'Priya'];

interface PopulateConfig {
  gamesPerRoom: number;
  holes: number;
  decks: number;
  rooms: number;
  cpusPerRoom: number;
  thinkTimeMs: [number, number];
  interGamePauseMs: number;
}

function chunk<T>(arr: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function runRoom(
  ctx: ScenarioContext,
  cfg: PopulateConfig,
  roomIdx: number,
  sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[] }> {
  const roomId = `room-${roomIdx}`;
  const cpuPersonality = CPU_PERSONALITIES[roomIdx % CPU_PERSONALITIES.length];
  let completed = 0;
  const errors: ScenarioError[] = [];

  for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
    if (ctx.signal.aborted) break;
    ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });
    ctx.logger.info('game_start', { room: roomId, game: gameNum + 1 });

    const result = await runOneMultiplayerGame(ctx, sessions, {
      roomId,
      holes: cfg.holes,
      decks: cfg.decks,
      cpusPerRoom: cfg.cpusPerRoom,
      cpuPersonality,
      thinkTimeMs: cfg.thinkTimeMs,
    });

    if (result.completed) {
      completed++;
      ctx.logger.info('game_complete', {
        room: roomId,
        game: gameNum + 1,
        turns: result.turns,
        durationMs: result.durationMs,
      });
    } else {
      errors.push({
        room: roomId,
        reason: 'game_failed',
        detail: result.error,
        timestamp: Date.now(),
      });
      ctx.logger.error('game_failed', { room: roomId, game: gameNum + 1, error: result.error });
    }

    if (gameNum < cfg.gamesPerRoom - 1) {
      await sleep(cfg.interGamePauseMs);
    }
  }

  return { completed, errors };
}

const populate: Scenario = {
  name: 'populate',
  description: 'Long multi-round games to populate scoreboards',
  needs: { accounts: 16, rooms: 4, cpusPerRoom: 1 },
  defaultConfig: {
    gamesPerRoom: 10,
    holes: 9,
    decks: 2,
    rooms: 4,
    cpusPerRoom: 1,
    thinkTimeMs: [800, 2200],
    interGamePauseMs: 3000,
  },

  async run(ctx: ScenarioContext): Promise<ScenarioResult> {
    const start = Date.now();
    const cfg = ctx.config as unknown as PopulateConfig;

    const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
    if (perRoom * cfg.rooms !== ctx.sessions.length) {
      throw new Error(
        `populate: ${ctx.sessions.length} sessions does not divide evenly into ${cfg.rooms} rooms`,
      );
    }
    const roomSessions = chunk(ctx.sessions, perRoom);

    const results = await Promise.allSettled(
      roomSessions.map((sessions, idx) => runRoom(ctx, cfg, idx, sessions)),
    );

    let gamesCompleted = 0;
    const errors: ScenarioError[] = [];
    results.forEach((r, idx) => {
      if (r.status === 'fulfilled') {
        gamesCompleted += r.value.completed;
        errors.push(...r.value.errors);
      } else {
        errors.push({
          room: `room-${idx}`,
          reason: 'room_threw',
          detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
          timestamp: Date.now(),
        });
      }
    });

    return {
      gamesCompleted,
      errors,
      durationMs: Date.now() - start,
    };
  },
};

export default populate;
```
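
The failure isolation in `run` comes from `Promise.allSettled`: a rejected room becomes one error entry instead of unwinding its siblings. The aggregation step can be sketched on its own with hand-built settled results. `summarize`, `RoomOutcome`, and `Summary` are hypothetical names, and errors are simplified to strings here rather than `ScenarioError` objects.

```typescript
// Hypothetical, simplified version of the aggregation in populate.run().
interface RoomOutcome {
  completed: number;
  errors: string[];
}

interface Summary {
  gamesCompleted: number;
  errors: string[];
}

function summarize(results: PromiseSettledResult<RoomOutcome>[]): Summary {
  const summary: Summary = { gamesCompleted: 0, errors: [] };
  results.forEach((r, idx) => {
    if (r.status === 'fulfilled') {
      // The room finished its loop; it may still carry per-game errors.
      summary.gamesCompleted += r.value.completed;
      summary.errors.push(...r.value.errors);
    } else {
      // The room itself threw; record it without affecting other rooms.
      const detail = r.reason instanceof Error ? r.reason.message : String(r.reason);
      summary.errors.push(`room-${idx}: room_threw: ${detail}`);
    }
  });
  return summary;
}

const settled: PromiseSettledResult<RoomOutcome>[] = [
  { status: 'fulfilled', value: { completed: 2, errors: [] } },
  { status: 'rejected', reason: new Error('browser crashed') },
  { status: 'fulfilled', value: { completed: 1, errors: ['room-2: game_failed'] } },
];

const summary = summarize(settled);
```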

- [ ] **Step 2: Create `scenarios/index.ts` registry**

```typescript
// tests/soak/scenarios/index.ts
import type { Scenario } from '../core/types';
import populate from './populate';

const registry: Record<string, Scenario> = {
  populate,
};

export function getScenario(name: string): Scenario | undefined {
  return registry[name];
}

export function listScenarios(): Scenario[] {
  return Object.values(registry);
}
```

- [ ] **Step 3: Syntax-check**

```bash
cd tests/soak
npx tsx -e "import('./scenarios/index').then((m) => console.log(m.listScenarios().map(s => s.name)))"
```

Expected: `[ 'populate' ]`.

- [ ] **Step 4: Commit**

```bash
git add tests/soak/scenarios/populate.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): populate scenario + scenario registry

Partitions sessions into N rooms, runs gamesPerRoom games per room
in parallel via Promise.allSettled so a failure in one room never
unwinds the others. Errors roll up into ScenarioResult.errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 17: Config parsing with tests

CLI flags, env vars, scenario defaults, runner defaults — merged in that precedence order.

**Files:**
- Create: `tests/soak/config.ts`
- Create: `tests/soak/tests/config.test.ts`

- [ ] **Step 1: Write failing tests**

```typescript
// tests/soak/tests/config.test.ts
import { describe, it, expect } from 'vitest';
import { parseArgs, mergeConfig } from '../config';

describe('parseArgs', () => {
  it('parses --scenario and numeric flags', () => {
    const r = parseArgs(['--scenario=populate', '--rooms=4', '--games-per-room=10']);
    expect(r.scenario).toBe('populate');
    expect(r.rooms).toBe(4);
    expect(r.gamesPerRoom).toBe(10);
  });

  it('parses watch mode', () => {
    const r = parseArgs(['--scenario=populate', '--watch=none']);
    expect(r.watch).toBe('none');
  });

  it('rejects unknown watch mode', () => {
    expect(() => parseArgs(['--scenario=populate', '--watch=bogus'])).toThrow();
  });

  it('--list sets listOnly', () => {
    const r = parseArgs(['--list']);
    expect(r.listOnly).toBe(true);
  });
});

describe('mergeConfig', () => {
  // Argument order is mergeConfig(cli, env, defaults).
  it('CLI flags override scenario defaults', () => {
    const cfg = mergeConfig(
      { gamesPerRoom: 20 },             // CLI
      {},                               // env
      { gamesPerRoom: 5, holes: 9 },    // defaults
    );
    expect(cfg.gamesPerRoom).toBe(20);
    expect(cfg.holes).toBe(9);
  });

  it('env overrides scenario defaults but CLI overrides env', () => {
    const envOnly = mergeConfig({}, { SOAK_HOLES: '3' }, { holes: 9 });
    expect(envOnly.holes).toBe(3);  // env beats defaults

    const cfg = mergeConfig(
      { holes: 5 },                 // CLI
      { SOAK_HOLES: '3' },          // env
      { holes: 9 },                 // defaults
    );
    expect(cfg.holes).toBe(5);      // CLI wins
  });

  it('scenario defaults fill in unset values', () => {
    const cfg = mergeConfig(
      { gamesPerRoom: 3 },          // CLI
      {},                           // env
      { games: 5, holes: 9 },       // defaults
    );
    expect(cfg.games).toBe(5);
    expect(cfg.holes).toBe(9);
    expect(cfg.gamesPerRoom).toBe(3);
  });
});
```

All three `mergeConfig` tests pass arguments in `(cli, env, defaults)` order and assert the precedence CLI > env > defaults.

- [ ] **Step 2: Run tests to verify they fail**

```bash
npx vitest run tests/config.test.ts
```

Expected: FAIL — module not found.

- [ ] **Step 3: Implement `config.ts`**

```typescript
// tests/soak/config.ts

export type WatchMode = 'none' | 'dashboard' | 'tiled';

export interface CliArgs {
  scenario?: string;
  accounts?: number;
  rooms?: number;
  cpusPerRoom?: number;
  gamesPerRoom?: number;
  holes?: number;
  watch?: WatchMode;
  dashboardPort?: number;
  target?: string;
  runId?: string;
  dryRun?: boolean;
  listOnly?: boolean;
}

const VALID_WATCH: WatchMode[] = ['none', 'dashboard', 'tiled'];

function parseInt10(s: string, name: string): number {
  const n = parseInt(s, 10);
  if (Number.isNaN(n)) throw new Error(`Invalid integer for ${name}: ${s}`);
  return n;
}

export function parseArgs(argv: string[]): CliArgs {
  const out: CliArgs = {};
  for (const arg of argv) {
    if (arg === '--list') {
      out.listOnly = true;
      continue;
    }
    if (arg === '--dry-run') {
      out.dryRun = true;
      continue;
    }
    const m = arg.match(/^--([a-z][a-z0-9-]*)=(.*)$/);
    if (!m) continue;
    const [, key, value] = m;
    switch (key) {
      case 'scenario':
        out.scenario = value;
        break;
      case 'accounts':
        out.accounts = parseInt10(value, '--accounts');
        break;
      case 'rooms':
        out.rooms = parseInt10(value, '--rooms');
        break;
      case 'cpus-per-room':
        out.cpusPerRoom = parseInt10(value, '--cpus-per-room');
        break;
      case 'games-per-room':
        out.gamesPerRoom = parseInt10(value, '--games-per-room');
        break;
      case 'holes':
        out.holes = parseInt10(value, '--holes');
        break;
      case 'watch':
        if (!VALID_WATCH.includes(value as WatchMode)) {
          throw new Error(`Invalid --watch value: ${value} (expected ${VALID_WATCH.join('|')})`);
        }
        out.watch = value as WatchMode;
        break;
      case 'dashboard-port':
        out.dashboardPort = parseInt10(value, '--dashboard-port');
        break;
      case 'target':
        out.target = value;
        break;
      case 'run-id':
        out.runId = value;
        break;
      default:
        // Unknown flag — ignore so scenario-specific flags can be added later
        break;
    }
  }
  return out;
}

/**
 * Merge in order: scenarioDefaults → env → cli (later wins).
 */
export function mergeConfig(
  cli: Record<string, unknown>,
  env: Record<string, string | undefined>,
  defaults: Record<string, unknown>,
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...defaults };

  // Env overlay — SOAK_UPPER_SNAKE → lowerCamel in cli space.
  const envMap: Record<string, string> = {
    SOAK_HOLES: 'holes',
    SOAK_ROOMS: 'rooms',
    SOAK_ACCOUNTS: 'accounts',
    SOAK_CPUS_PER_ROOM: 'cpusPerRoom',
    SOAK_GAMES_PER_ROOM: 'gamesPerRoom',
    SOAK_WATCH: 'watch',
    SOAK_DASHBOARD_PORT: 'dashboardPort',
  };
  for (const [envKey, cfgKey] of Object.entries(envMap)) {
    const v = env[envKey];
    if (v !== undefined) {
      // Numeric keys are validated so a non-numeric env value fails fast instead of becoming NaN.
      if (/^(holes|rooms|accounts|cpusPerRoom|gamesPerRoom|dashboardPort)$/.test(cfgKey)) {
        merged[cfgKey] = parseInt10(v, envKey);
      } else {
        merged[cfgKey] = v;
      }
    }
  }

  // CLI overlay — wins over env and defaults.
  for (const [k, v] of Object.entries(cli)) {
    if (v !== undefined) merged[k] = v;
  }

  return merged;
}
```
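
One caveat about `parseInt10`: `parseInt` accepts trailing garbage (`parseInt('4abc', 10)` evaluates to `4`), so a typo such as `--rooms=4x` passes validation. If stricter checking is wanted, a regex pre-check is one option. This is a sketch rather than part of the plan, and `parseStrictInt` is a hypothetical name.

```typescript
// Hypothetical stricter variant of parseInt10: the whole string must be
// an optionally signed run of digits, and the value must be a safe integer.
function parseStrictInt(raw: string, name: string): number {
  const trimmed = raw.trim();
  if (!/^-?\d+$/.test(trimmed)) {
    throw new Error(`Invalid integer for ${name}: ${raw}`);
  }
  const n = Number(trimmed);
  if (!Number.isSafeInteger(n)) {
    throw new Error(`Integer out of range for ${name}: ${raw}`);
  }
  return n;
}

// Small helper for the demo: capture the error message instead of throwing.
function tryParse(raw: string): number | string {
  try {
    return parseStrictInt(raw, '--rooms');
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const parsedOk = tryParse('16');
const parsedBad = tryParse('4x');
```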

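- [ ] **Step 4: Confirm the precedence test asserts CLI > env > defaults**

Open `tests/soak/tests/config.test.ts` and check the second `it(...)` block inside `describe('mergeConfig')`: it must pass arguments in `(cli, env, defaults)` order and expect the CLI value to win (`expect(cfg.holes).toBe(5)`). If it asserts anything else, update it before running.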

- [ ] **Step 5: Run tests to verify they pass**

```bash
npx vitest run tests/config.test.ts
```

Expected: all passing.

- [ ] **Step 6: Commit**

```bash
git add tests/soak/config.ts tests/soak/tests/config.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): CLI parsing + config precedence

parseArgs pulls --scenario/--rooms/--watch/etc from argv, mergeConfig
layers scenarioDefaults → env → CLI so CLI flags always win. Unit
tested.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 18: `runner.ts` entry point — first end-to-end milestone

Replaces the placeholder runner with the real thing: parse args, build dependencies, load scenario, acquire sessions, run scenario, clean up, print summary. Supports `--watch=none` only at this stage.

**Files:**
- Modify: `tests/soak/runner.ts` (replace placeholder)

- [ ] **Step 1: Rewrite `runner.ts`**

```typescript
#!/usr/bin/env tsx
/**
 * Golf Soak Harness — entry point.
 *
 * Usage:
 *   TEST_URL=http://localhost:8000 \
 *   SOAK_INVITE_CODE=SOAKTEST \
 *     npm run soak -- --scenario=populate --rooms=1 --accounts=2 \
 *       --cpus-per-room=0 --games-per-room=1 --holes=1 --watch=none
 */

import * as path from 'path';
import { parseArgs, mergeConfig, CliArgs } from './config';
import { createLogger } from './core/logger';
import { SessionPool } from './core/session-pool';
import { RoomCoordinator } from './core/room-coordinator';
import { getScenario, listScenarios } from './scenarios';
import type { DashboardReporter, ScenarioContext } from './core/types';

function noopDashboard(): DashboardReporter {
  return {
    update: () => {},
    log: () => {},
    incrementMetric: () => {},
  };
}

function printScenarioList(): void {
  console.log('Available scenarios:');
  for (const s of listScenarios()) {
    console.log(`  ${s.name.padEnd(12)} ${s.description}`);
    console.log(`    needs: accounts=${s.needs.accounts}, rooms=${s.needs.rooms ?? 1}, cpus=${s.needs.cpusPerRoom ?? 0}`);
  }
}

async function main(): Promise<void> {
  const cli: CliArgs = parseArgs(process.argv.slice(2));

  if (cli.listOnly) {
    printScenarioList();
    return;
  }

  if (!cli.scenario) {
    console.error('Error: --scenario=<name> is required. Use --list to see scenarios.');
    process.exit(2);
  }

  const scenario = getScenario(cli.scenario);
  if (!scenario) {
    console.error(`Error: unknown scenario "${cli.scenario}". Use --list to see scenarios.`);
    process.exit(2);
  }

  const runId = cli.runId ?? `${cli.scenario}-${new Date().toISOString().replace(/[:.]/g, '-')}`;
  const targetUrl = cli.target ?? process.env.TEST_URL ?? 'http://localhost:8000';
  const inviteCode = process.env.SOAK_INVITE_CODE ?? 'SOAKTEST';
  const watch = cli.watch ?? 'dashboard';

  const logger = createLogger({ runId });
  logger.info('run_start', {
    scenario: scenario.name,
    targetUrl,
    watch,
    cli,
  });

  // Resolve final config
  const config = mergeConfig(
    cli as Record<string, unknown>,
    process.env,
    scenario.defaultConfig,
  );
  // Ensure core knobs exist
  const accounts = Number(config.accounts ?? scenario.needs.accounts);
  const rooms = Number(config.rooms ?? scenario.needs.rooms ?? 1);
  const cpusPerRoom = Number(config.cpusPerRoom ?? scenario.needs.cpusPerRoom ?? 0);
  if (accounts % rooms !== 0) {
    console.error(`Error: --accounts=${accounts} does not divide evenly into --rooms=${rooms}`);
    process.exit(2);
  }
  config.rooms = rooms;
  config.cpusPerRoom = cpusPerRoom;

  if (cli.dryRun) {
    logger.info('dry_run', { config });
    console.log('Dry run OK. Resolved config:');
    console.log(JSON.stringify(config, null, 2));
    return;
  }

  if (watch !== 'none') {
    logger.warn('watch_mode_not_yet_implemented', { watch });
    console.warn(`Watch mode "${watch}" not yet implemented — falling back to "none".`);
  }

  // Build dependencies
  const credFile = path.resolve(__dirname, '.env.stresstest');
  const pool = new SessionPool({
    targetUrl,
    inviteCode,
    credFile,
    logger,
  });
  const coordinator = new RoomCoordinator();
  const dashboard = noopDashboard();
  const abortController = new AbortController();

  const onSignal = (sig: string) => {
    logger.warn('signal_received', { signal: sig });
    abortController.abort();
  };
  process.on('SIGINT', () => onSignal('SIGINT'));
  process.on('SIGTERM', () => onSignal('SIGTERM'));

  let exitCode = 0;
  try {
    const sessions = await pool.acquire(accounts);
    logger.info('sessions_acquired', { count: sessions.length });

    const ctx: ScenarioContext = {
      config,
      sessions,
      coordinator,
      dashboard,
      logger,
      signal: abortController.signal,
      heartbeat: () => {}, // Task 26 wires this up
    };

    const result = await scenario.run(ctx);
    logger.info('run_complete', {
      gamesCompleted: result.gamesCompleted,
      errors: result.errors.length,
      durationMs: result.durationMs,
    });
    console.log(`Games completed: ${result.gamesCompleted}`);
    console.log(`Errors:          ${result.errors.length}`);
    console.log(`Duration:        ${(result.durationMs / 1000).toFixed(1)}s`);
    if (result.errors.length > 0) {
      console.log('Errors:');
      for (const e of result.errors) {
        console.log(`  ${e.room}: ${e.reason}${e.detail ? ' — ' + e.detail : ''}`);
      }
      exitCode = 1;
    }
  } catch (err) {
    logger.error('run_failed', {
      error: err instanceof Error ? err.message : String(err),
      stack: err instanceof Error ? err.stack : undefined,
    });
    exitCode = 1;
  } finally {
    await pool.release();
  }

  if (abortController.signal.aborted && exitCode === 0) exitCode = 2;
  process.exit(exitCode);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
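The runner's exit-code rules are spread across the `try`/`catch` and the post-`finally` abort check. As a minimal sketch, the same policy can be restated as one pure function. `RunOutcome` and `resolveExitCode` are hypothetical names used only for illustration; they are not part of the harness.

```typescript
// Hypothetical helper: restates the runner's exit-code rules as a pure function.
interface RunOutcome {
  errorCount: number; // result.errors.length, or 0 if the scenario never returned
  threw: boolean;     // pool.acquire() or scenario.run() rejected
  aborted: boolean;   // SIGINT/SIGTERM fired the AbortController
}

function resolveExitCode(o: RunOutcome): number {
  if (o.threw || o.errorCount > 0) return 1; // failures take priority over an abort
  if (o.aborted) return 2;                   // otherwise-clean run that was interrupted
  return 0;
}
```

Note that exit code 2 is also what the runner uses for usage errors (missing or unknown `--scenario`, uneven `--accounts`/`--rooms`), so a caller cannot tell those apart from an interrupted run by exit code alone.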

- [ ] **Step 2: Run a minimal `--watch=none` smoke against local dev**

Server running, 4 soak accounts already seeded from Task 14:

```bash
cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=2 \
  --rooms=1 \
  --cpus-per-room=0 \
  --games-per-room=1 \
  --holes=1 \
  --watch=none
```

Expected output (abbreviated):

```
{"timestamp":"...","level":"info","msg":"run_start",...}
{"timestamp":"...","level":"info","msg":"sessions_acquired","count":2}
{"timestamp":"...","level":"info","msg":"game_start","room":"room-0","game":1}
{"timestamp":"...","level":"info","msg":"room_created","code":"XXXX"}
{"timestamp":"...","level":"info","msg":"game_complete","room":"room-0","turns":...}
{"timestamp":"...","level":"info","msg":"run_complete","gamesCompleted":1,"errors":0}
Games completed: 1
Errors:          0
Duration:        X.Xs
```

Exit code 0.

This is the first **end-to-end milestone**. Stop here if debugging is needed — fix issues before moving on.
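If the smoke run is checked by a script rather than by eye, the mixed stdout (JSON log lines plus the plain-text summary) can be scanned for the `run_complete` entry. The following is a sketch only: `findRunComplete` is a hypothetical helper, and it assumes the logger writes flat JSON objects with `msg`, `gamesCompleted`, and `errors` fields as in the expected output above.

```typescript
// Hypothetical checker for the smoke run's stdout; field names follow the expected output above.
interface RunComplete {
  gamesCompleted: number;
  errors: number;
}

function findRunComplete(stdout: string): RunComplete | null {
  for (const line of stdout.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{')) continue; // skip the plain-text summary lines
    try {
      const entry = JSON.parse(trimmed) as Record<string, unknown>;
      if (entry.msg === 'run_complete') {
        return {
          gamesCompleted: Number(entry.gamesCompleted),
          errors: Number(entry.errors),
        };
      }
    } catch {
      // Line looked like JSON but was not; ignore it
    }
  }
  return null; // the run never logged run_complete
}
```

A `null` result means the run ended before the scenario returned, which is worth treating as a failure even if the process exit code was not inspected.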

- [ ] **Step 3: Commit**

```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): runner.ts end-to-end with --watch=none

First full end-to-end milestone: parses CLI, builds SessionPool +
RoomCoordinator, loads a scenario by name, runs it, reports results,
cleans up. Watch modes other than "none" log a warning and fall back
until Tasks 19-24 implement them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 5 — Dashboard status grid

### Task 19: Dashboard HTTP + WS server

Vanilla node `http` + `ws`. Serves one static HTML page, accepts WS connections, broadcasts room-state updates.

**Files:**
- Create: `tests/soak/dashboard/server.ts`

- [ ] **Step 1: Implement `dashboard/server.ts`**

```typescript
// tests/soak/dashboard/server.ts
import * as http from 'http';
import * as fs from 'fs';
import * as path from 'path';
import { WebSocketServer, WebSocket } from 'ws';
import type { DashboardReporter, Logger, RoomState } from '../core/types';

export type DashboardIncoming =
  | { type: 'start_stream'; sessionKey: string }
  | { type: 'stop_stream'; sessionKey: string };

export type DashboardOutgoing =
  | { type: 'room_state'; roomId: string; state: Partial<RoomState> }
  | { type: 'log'; level: string; msg: string; meta?: object; timestamp: number }
  | { type: 'metric'; name: string; value: number }
  | { type: 'frame'; sessionKey: string; jpegBase64: string };

export interface DashboardHandlers {
  onStartStream?(sessionKey: string): void;
  onStopStream?(sessionKey: string): void;
  onDisconnect?(): void;
}

export class DashboardServer {
  private httpServer!: http.Server;
  private wsServer!: WebSocketServer;
  private clients = new Set<WebSocket>();
  private metrics: Record<string, number> = {};
  private roomStates: Record<string, Partial<RoomState>> = {};

  constructor(
    private port: number,
    private logger: Logger,
    private handlers: DashboardHandlers = {},
  ) {}

  async start(): Promise<void> {
    const htmlPath = path.resolve(__dirname, 'index.html');
    const cssPath = path.resolve(__dirname, 'dashboard.css');
    const jsPath = path.resolve(__dirname, 'dashboard.js');

    this.httpServer = http.createServer((req, res) => {
      const url = req.url ?? '/';
      if (url === '/' || url === '/index.html') {
        res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
        fs.createReadStream(htmlPath).pipe(res);
      } else if (url === '/dashboard.css') {
        res.writeHead(200, { 'Content-Type': 'text/css' });
        fs.createReadStream(cssPath).pipe(res);
      } else if (url === '/dashboard.js') {
        res.writeHead(200, { 'Content-Type': 'application/javascript' });
        fs.createReadStream(jsPath).pipe(res);
      } else {
        res.writeHead(404);
        res.end('not found');
      }
    });

    this.wsServer = new WebSocketServer({ server: this.httpServer });
    this.wsServer.on('connection', (ws) => {
      this.clients.add(ws);
      this.logger.info('dashboard_client_connected', { count: this.clients.size });

      // Replay current state to the new client
      for (const [roomId, state] of Object.entries(this.roomStates)) {
        ws.send(JSON.stringify({ type: 'room_state', roomId, state } as DashboardOutgoing));
      }
      for (const [name, value] of Object.entries(this.metrics)) {
        ws.send(JSON.stringify({ type: 'metric', name, value } as DashboardOutgoing));
      }

      ws.on('message', (data) => {
        try {
          const parsed = JSON.parse(data.toString()) as DashboardIncoming;
          if (parsed.type === 'start_stream' && this.handlers.onStartStream) {
            this.handlers.onStartStream(parsed.sessionKey);
          } else if (parsed.type === 'stop_stream' && this.handlers.onStopStream) {
            this.handlers.onStopStream(parsed.sessionKey);
          }
        } catch (err) {
          this.logger.warn('dashboard_ws_parse_error', {
            error: err instanceof Error ? err.message : String(err),
          });
        }
      });

      ws.on('close', () => {
        this.clients.delete(ws);
        this.logger.info('dashboard_client_disconnected', { count: this.clients.size });
        if (this.clients.size === 0 && this.handlers.onDisconnect) {
          this.handlers.onDisconnect();
        }
      });
    });

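    await new Promise<void>((resolve, reject) => {
      // The WebSocketServer re-emits the HTTP server's 'error' event, so a bind
      // failure (e.g. EADDRINUSE) rejects here instead of crashing the process.
      this.wsServer.once('error', reject);
      this.httpServer.listen(this.port, () => {
        this.wsServer.off('error', reject);
        resolve();
      });
    });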
    this.logger.info('dashboard_listening', { url: `http://localhost:${this.port}` });
  }

  async stop(): Promise<void> {
    for (const ws of this.clients) {
      try {
        ws.close();
      } catch {
        // ignore
      }
    }
    this.clients.clear();
    await new Promise<void>((resolve) => {
      this.wsServer.close(() => resolve());
    });
    await new Promise<void>((resolve) => {
      this.httpServer.close(() => resolve());
    });
  }

  broadcast(msg: DashboardOutgoing): void {
    const payload = JSON.stringify(msg);
    for (const ws of this.clients) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(payload);
      }
    }
  }

  /** Create a DashboardReporter wired to this server. */
  reporter(): DashboardReporter {
    return {
      update: (roomId, state) => {
        this.roomStates[roomId] = { ...this.roomStates[roomId], ...state };
        this.broadcast({ type: 'room_state', roomId, state });
      },
      log: (level, msg, meta) => {
        this.broadcast({ type: 'log', level, msg, meta, timestamp: Date.now() });
      },
      incrementMetric: (name, by = 1) => {
        this.metrics[name] = (this.metrics[name] ?? 0) + by;
        this.broadcast({ type: 'metric', name, value: this.metrics[name] });
      },
    };
  }
}
```
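The server broadcasts only the delta on each `update`, but keeps a merged snapshot per room so that a client connecting mid-run can be caught up. A minimal sketch of that merge-and-replay behaviour, isolated from sockets, is shown below. `RoomStateLike` is a stand-in for the real `RoomState` type with assumed fields, and `RoomStateCache` is a hypothetical name.

```typescript
// Stand-in for RoomState from ../core/types; the fields here are assumptions.
interface RoomStateLike {
  phase: string;
  moves: number;
  hole: number;
}

type Outgoing = { type: 'room_state'; roomId: string; state: Partial<RoomStateLike> };

class RoomStateCache {
  private states: Record<string, Partial<RoomStateLike>> = {};

  /** Merge a partial update and return the delta message to broadcast. */
  update(roomId: string, delta: Partial<RoomStateLike>): Outgoing {
    this.states[roomId] = { ...this.states[roomId], ...delta };
    return { type: 'room_state', roomId, state: delta };
  }

  /** Messages a newly connected client needs: one merged snapshot per room. */
  replay(): Outgoing[] {
    return Object.entries(this.states).map(([roomId, state]) => ({
      type: 'room_state' as const,
      roomId,
      state,
    }));
  }
}
```

Because the client applies each field independently, receiving one merged snapshot has the same effect as receiving every delta in order.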

- [ ] **Step 2: Syntax-check**

```bash
cd tests/soak
npx tsx -e "import('./dashboard/server').then(() => console.log('ok'))"
```

Expected: `ok`.

- [ ] **Step 3: Commit**

```bash
git add tests/soak/dashboard/server.ts
git commit -m "$(cat <<'EOF'
feat(soak): DashboardServer — vanilla http + ws

Serves one static HTML page, accepts WS connections, broadcasts
room_state/log/metric messages to all clients. Exposes a
reporter() method that returns a DashboardReporter scenarios can
call without knowing about sockets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 20: Dashboard HTML/CSS/JS status grid

Single static HTML page + stylesheet + client script. Renders the 2×2 room grid, subscribes to WS, updates tiles on each message.

**Files:**
- Create: `tests/soak/dashboard/index.html`
- Create: `tests/soak/dashboard/dashboard.css`
- Create: `tests/soak/dashboard/dashboard.js`

- [ ] **Step 1: Create `dashboard/index.html`**

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Golf Soak Dashboard</title>
<link rel="stylesheet" href="/dashboard.css">
</head>
<body>
<header class="dash-header">
  <h1>⛳ Golf Soak Dashboard</h1>
  <div class="meta">
    <span id="run-id">run —</span>
    <span id="elapsed">00:00:00</span>
  </div>
</header>

<div class="meta-bar">
  <div class="stat"><span class="label">Games</span><span id="metric-games">0</span></div>
  <div class="stat"><span class="label">Moves</span><span id="metric-moves">0</span></div>
  <div class="stat"><span class="label">Errors</span><span id="metric-errors">0</span></div>
  <div class="stat"><span class="label">WS</span><span id="ws-status">connecting</span></div>
</div>

<div class="rooms" id="rooms">
  <!-- Room tiles injected by dashboard.js -->
</div>

<section class="log">
  <div class="log-header">Activity Log</div>
  <ul id="log-list"></ul>
</section>

<!-- Modal for focused live video (Task 23) -->
<div id="video-modal" class="video-modal hidden">
  <div class="video-modal-content">
    <div class="video-modal-header">
      <span id="video-modal-title">Watching —</span>
      <button id="video-modal-close">Close</button>
    </div>
    <img id="video-frame" alt="Live screencast" />
  </div>
</div>

<script src="/dashboard.js"></script>
</body>
</html>
```

- [ ] **Step 2: Create `dashboard/dashboard.css`**

```css
:root {
  --bg: #0a0e16;
  --panel: #0e1420;
  --border: #1a2230;
  --text: #c8d4e4;
  --accent: #7fbaff;
  --good: #6fd08f;
  --warn: #ffb84d;
  --err: #ff5c6c;
  --muted: #556577;
}

* { box-sizing: border-box; }

body {
  margin: 0;
  font-family: -apple-system, system-ui, 'SF Mono', Consolas, monospace;
  background: var(--bg);
  color: var(--text);
}

.dash-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 12px 20px;
  background: linear-gradient(135deg, #0f1823, #0a1018);
  border-bottom: 1px solid var(--border);
}
.dash-header h1 { margin: 0; font-size: 16px; color: var(--accent); }
.dash-header .meta { font-size: 11px; color: var(--muted); }
.dash-header .meta span + span { margin-left: 12px; }

.meta-bar {
  display: flex;
  gap: 24px;
  padding: 10px 20px;
  background: #0c131d;
  border-bottom: 1px solid var(--border);
  font-size: 12px;
}
.meta-bar .stat .label { color: var(--muted); margin-right: 6px; }
.meta-bar .stat span:last-child { color: #fff; font-weight: 600; }

.rooms {
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 1px;
  background: var(--border);
}
.room {
  background: var(--panel);
  padding: 14px 18px;
  min-height: 180px;
}
.room-title {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 10px;
}
.room-title .name { font-size: 13px; color: var(--accent); font-weight: 600; }
.room-title .phase {
  font-size: 10px;
  padding: 2px 8px;
  border-radius: 10px;
  background: #1a3a2a;
  color: var(--good);
}
.room-title .phase.lobby { background: #3a2a1a; color: var(--warn); }
.room-title .phase.err { background: #3a1a1a; color: var(--err); }

.players {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 4px;
  font-size: 11px;
  margin-bottom: 8px;
}
.player {
  display: flex;
  justify-content: space-between;
  padding: 4px 8px;
  background: #0a0f18;
  border-radius: 3px;
  cursor: pointer;
  border: 1px solid transparent;
}
.player:hover { border-color: var(--accent); }
.player.active {
  background: #1a2a40;
  border-left: 2px solid var(--accent);
}
.player .score { color: var(--muted); }

.progress-bar {
  height: 4px;
  background: var(--border);
  border-radius: 2px;
  overflow: hidden;
  margin-top: 6px;
}
.progress-fill {
  height: 100%;
  background: linear-gradient(90deg, var(--accent), var(--good));
  transition: width 0.3s;
}
.room-meta {
  font-size: 10px;
  color: var(--muted);
  display: flex;
  gap: 12px;
  margin-top: 6px;
}

.log {
  border-top: 1px solid var(--border);
  background: #080c13;
  max-height: 160px;
  overflow-y: auto;
}
.log .log-header {
  padding: 6px 20px;
  font-size: 10px;
  text-transform: uppercase;
  color: var(--muted);
  border-bottom: 1px solid var(--border);
}
.log ul { list-style: none; margin: 0; padding: 4px 20px; font-size: 10px; }
.log li { line-height: 1.5; font-family: monospace; color: var(--muted); }
.log li.warn { color: var(--warn); }
.log li.error { color: var(--err); }

.video-modal {
  position: fixed;
  inset: 0;
  background: rgba(0, 0, 0, 0.85);
  display: flex;
  align-items: center;
  justify-content: center;
  z-index: 100;
}
.video-modal.hidden { display: none; }
.video-modal-content {
  background: var(--panel);
  border: 1px solid var(--border);
  border-radius: 6px;
  padding: 16px;
  max-width: 90vw;
  max-height: 90vh;
}
.video-modal-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 12px;
  color: var(--accent);
  font-size: 13px;
}
.video-modal-header button {
  background: var(--border);
  color: var(--text);
  border: none;
  padding: 4px 12px;
  border-radius: 3px;
  cursor: pointer;
}
#video-frame {
  display: block;
  max-width: 100%;
  max-height: 70vh;
  border: 1px solid var(--border);
}
```

- [ ] **Step 3: Create `dashboard/dashboard.js`**

```javascript
// tests/soak/dashboard/dashboard.js
(() => {
  const ws = new WebSocket(`ws://${location.host}`);
  const roomsEl = document.getElementById('rooms');
  const logEl = document.getElementById('log-list');
  const wsStatusEl = document.getElementById('ws-status');
  const metricGames = document.getElementById('metric-games');
  const metricMoves = document.getElementById('metric-moves');
  const metricErrors = document.getElementById('metric-errors');
  const elapsedEl = document.getElementById('elapsed');

  const roomTiles = new Map();
  const startTime = Date.now();
  let currentWatchedKey = null;

  // Video modal
  const videoModal = document.getElementById('video-modal');
  const videoFrame = document.getElementById('video-frame');
  const videoTitle = document.getElementById('video-modal-title');
  const videoClose = document.getElementById('video-modal-close');

  function fmtElapsed(ms) {
    const s = Math.floor(ms / 1000);
    const h = Math.floor(s / 3600);
    const m = Math.floor((s % 3600) / 60);
    const sec = s % 60;
    return `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}:${String(sec).padStart(2, '0')}`;
  }
  setInterval(() => {
    elapsedEl.textContent = fmtElapsed(Date.now() - startTime);
  }, 1000);

  function ensureRoomTile(roomId) {
    if (roomTiles.has(roomId)) return roomTiles.get(roomId);
    const tile = document.createElement('div');
    tile.className = 'room';
    tile.innerHTML = `
      <div class="room-title">
        <div class="name">${roomId}</div>
        <div class="phase lobby">waiting</div>
      </div>
      <div class="players"></div>
      <div class="progress-bar"><div class="progress-fill" style="width:0%"></div></div>
      <div class="room-meta">
        <span class="moves">0 moves</span>
        <span class="game">game —</span>
      </div>
    `;
    roomsEl.appendChild(tile);
    roomTiles.set(roomId, tile);
    return tile;
  }

  function renderRoomState(roomId, state) {
    const tile = ensureRoomTile(roomId);
    if (state.phase !== undefined) {
      const phaseEl = tile.querySelector('.phase');
      phaseEl.textContent = state.phase;
      phaseEl.classList.toggle('lobby', state.phase === 'lobby' || state.phase === 'waiting');
      phaseEl.classList.toggle('err', state.phase === 'error');
    }
    if (state.players !== undefined) {
      const playersEl = tile.querySelector('.players');
      playersEl.innerHTML = state.players
        .map(
          (p) => `
            <div class="player ${p.isActive ? 'active' : ''}" data-session="${p.key}">
              <span>${p.isActive ? '▶ ' : ''}${p.key}</span>
              <span class="score">${p.score ?? '—'}</span>
            </div>
          `,
        )
        .join('');
    }
    if (state.hole !== undefined && state.totalHoles !== undefined) {
      const fill = tile.querySelector('.progress-fill');
      const pct = state.totalHoles > 0 ? Math.round((state.hole / state.totalHoles) * 100) : 0;
      fill.style.width = `${pct}%`;
    }
    if (state.moves !== undefined) {
      tile.querySelector('.moves').textContent = `${state.moves} moves`;
    }
    if (state.game !== undefined && state.totalGames !== undefined) {
      tile.querySelector('.game').textContent = `game ${state.game}/${state.totalGames}`;
    }
  }

  function appendLog(level, msg, meta) {
    const li = document.createElement('li');
    li.className = level;
    const ts = new Date().toLocaleTimeString();
    li.textContent = `[${ts}] ${msg} ${meta ? JSON.stringify(meta) : ''}`;
    logEl.insertBefore(li, logEl.firstChild);
    // Cap log length
    while (logEl.children.length > 100) {
      logEl.removeChild(logEl.lastChild);
    }
  }

  function applyMetric(name, value) {
    if (name === 'games_completed') metricGames.textContent = value;
    else if (name === 'moves_total') metricMoves.textContent = value;
    else if (name === 'errors') metricErrors.textContent = value;
  }

  ws.addEventListener('open', () => {
    wsStatusEl.textContent = 'healthy';
    wsStatusEl.style.color = 'var(--good)';
  });
  ws.addEventListener('close', () => {
    wsStatusEl.textContent = 'disconnected';
    wsStatusEl.style.color = 'var(--err)';
  });
  ws.addEventListener('message', (event) => {
    let msg;
    try {
      msg = JSON.parse(event.data);
    } catch {
      return;
    }
    if (msg.type === 'room_state') {
      renderRoomState(msg.roomId, msg.state);
    } else if (msg.type === 'log') {
      appendLog(msg.level, msg.msg, msg.meta);
    } else if (msg.type === 'metric') {
      applyMetric(msg.name, msg.value);
    } else if (msg.type === 'frame') {
      if (msg.sessionKey === currentWatchedKey) {
        videoFrame.src = `data:image/jpeg;base64,${msg.jpegBase64}`;
      }
    }
  });

  // Click-to-watch (wired in Task 23)
  roomsEl.addEventListener('click', (e) => {
    const playerEl = e.target.closest('.player');
    if (!playerEl) return;
    const key = playerEl.dataset.session;
    if (!key) return;
    currentWatchedKey = key;
    videoTitle.textContent = `Watching ${key}`;
    videoModal.classList.remove('hidden');
    ws.send(JSON.stringify({ type: 'start_stream', sessionKey: key }));
  });

  function closeVideo() {
    if (currentWatchedKey) {
      ws.send(JSON.stringify({ type: 'stop_stream', sessionKey: currentWatchedKey }));
    }
    currentWatchedKey = null;
    videoModal.classList.add('hidden');
    videoFrame.src = '';
  }
  videoClose.addEventListener('click', closeVideo);
  document.addEventListener('keydown', (e) => {
    if (e.key === 'Escape') closeVideo();
  });
})();
```
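The two small calculations in the client script (elapsed-time formatting and progress percentage) are easy to check in isolation. The sketch below restates them as typed pure functions; `progressPct` is a name introduced here for illustration, while `fmtElapsed` mirrors the function in `dashboard.js`.

```typescript
// Typed restatement of the dashboard's elapsed-time formatter (HH:MM:SS).
function fmtElapsed(ms: number): string {
  const s = Math.floor(ms / 1000);
  const pad = (n: number): string => String(n).padStart(2, '0');
  return `${pad(Math.floor(s / 3600))}:${pad(Math.floor((s % 3600) / 60))}:${pad(s % 60)}`;
}

// Hypothetical name for the progress-bar calculation in renderRoomState().
function progressPct(hole: number, totalHoles: number): number {
  return totalHoles > 0 ? Math.round((hole / totalHoles) * 100) : 0;
}

console.log(fmtElapsed(3_723_000)); // 01:02:03
console.log(progressPct(3, 9));     // 33
```

The `totalHoles > 0` guard matters because a room tile can receive `hole` before `totalHoles` is known; without it the bar width would become `NaN%`.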

- [ ] **Step 4: Commit**

```bash
git add tests/soak/dashboard/index.html tests/soak/dashboard/dashboard.css tests/soak/dashboard/dashboard.js
git commit -m "$(cat <<'EOF'
feat(soak): dashboard status grid UI

Static HTML page served by DashboardServer. Renders the 2×2 room
grid with progress bars and player tiles, subscribes to WS events,
updates tiles live. The click-to-watch modal is wired, but it
receives frames only after the CDP screencaster lands (Task 22)
and is hooked up to the dashboard (Task 23).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 21: Wire `WATCH=dashboard` in runner

Start the dashboard server when `--watch=dashboard`, auto-open the URL in the user's browser, use its `reporter()` as the `ctx.dashboard`.

**Files:**
- Modify: `tests/soak/runner.ts`

- [ ] **Step 1: Import and instantiate DashboardServer in `runner.ts`**

At the top of `runner.ts`, add:

```typescript
import { DashboardServer } from './dashboard/server';
import { spawn } from 'child_process';
```

Replace the block that creates `dashboard` with:

```typescript
  // Build dashboard if requested
  let dashboardServer: DashboardServer | null = null;
  let dashboard: DashboardReporter = noopDashboard();
  if (watch === 'dashboard') {
    const port = Number(config.dashboardPort ?? 7777);
    dashboardServer = new DashboardServer(port, logger, {
      onStartStream: (_key) => {
        logger.info('stream_start_requested', { sessionKey: _key });
        // Wired in Task 22
      },
      onStopStream: (_key) => {
        logger.info('stream_stop_requested', { sessionKey: _key });
      },
    });
    await dashboardServer.start();
    dashboard = dashboardServer.reporter();
    const url = `http://localhost:${port}`;
    console.log(`Dashboard: ${url}`);
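    // Best-effort auto-open. `start` is a cmd.exe builtin, so Windows goes through `cmd /c`.
    const [opener, openerArgs]: [string, string[]] =
      process.platform === 'darwin' ? ['open', [url]]
      : process.platform === 'win32' ? ['cmd', ['/c', 'start', '', url]]
      : ['xdg-open', [url]];
    try {
      const child = spawn(opener, openerArgs, { stdio: 'ignore', detached: true });
      // A missing opener surfaces as an async 'error' event, which try/catch cannot see
      child.on('error', () => { /* the URL is already printed */ });
      child.unref();
    } catch {
      // If auto-open fails, the URL is already printed
    }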
  } else if (watch === 'tiled') {
    logger.warn('tiled_not_yet_implemented');
    console.warn('Watch mode "tiled" not yet implemented (Task 24). Falling back to none.');
  }
```

And in the `finally` block, shut down the server:

```typescript
  } finally {
    await pool.release();
    if (dashboardServer) {
      await dashboardServer.stop();
    }
  }
```

Also remove the earlier `if (watch !== 'none')` warning block — it's replaced by the dispatch above.
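The opener choice can be isolated in a small pure function, which also makes the Windows case explicit: `start` is a `cmd.exe` builtin rather than an executable, so it has to be launched through `cmd /c`. This is a sketch under that assumption; `openerCommand` is a hypothetical helper, not something the runner already defines.

```typescript
// Hypothetical helper: maps a Node platform string to an opener command line.
function openerCommand(platform: string, url: string): { cmd: string; args: string[] } {
  switch (platform) {
    case 'darwin':
      return { cmd: 'open', args: [url] };
    case 'win32':
      // The empty string fills `start`'s window-title slot so the URL is not
      // mistaken for a title.
      return { cmd: 'cmd', args: ['/c', 'start', '', url] };
    default:
      // Linux and other Unix-like desktops; may be absent on headless machines
      return { cmd: 'xdg-open', args: [url] };
  }
}
```

The result would be passed to `spawn(cmd, args, ...)`. Since a missing opener is reported through the child's `error` event rather than a thrown exception, attaching an `error` listener to the spawned child is what keeps auto-open truly best-effort.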

- [ ] **Step 2: Run smoke against dev with dashboard**

```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=2 --rooms=1 --cpus-per-room=0 --games-per-room=1 --holes=1 \
  --watch=dashboard
```

Expected:
- `Dashboard: http://localhost:7777` printed
- Browser auto-opens (or you open it manually)
- Page shows the dashboard with `WS: healthy`
- During the game, the `room-0` tile shows `phase: playing`, increments `moves`, updates progress
- After game completes, the runner exits 0 and the dashboard stops

- [ ] **Step 3: Commit**

```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): wire --watch=dashboard in runner

Starts DashboardServer on 7777 (configurable), uses its reporter as
ctx.dashboard, auto-opens the URL. Cleans up on exit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 6 — Live video click-to-watch

### Task 22: CDP screencast module

Attach a CDP session to a given page, start a JPEG screencast throttled by `everyNthFrame`, forward each frame to a callback, and detach on stop.

**Files:**
- Create: `tests/soak/core/screencaster.ts`

- [ ] **Step 1: Implement `core/screencaster.ts`**

```typescript
// tests/soak/core/screencaster.ts
import type { Page, CDPSession } from 'playwright-core';
import type { Logger } from './types';

export interface ScreencastOptions {
  format?: 'jpeg' | 'png';
  quality?: number;
  maxWidth?: number;
  maxHeight?: number;
  everyNthFrame?: number;
}

export type FrameCallback = (jpegBase64: string) => void;

export class Screencaster {
  private sessions = new Map<string, CDPSession>();

  constructor(private logger: Logger) {}

  /**
   * Attach a CDP session to the given page and start forwarding frames.
   * If already streaming, this is a no-op.
   */
  async start(
    sessionKey: string,
    page: Page,
    onFrame: FrameCallback,
    opts: ScreencastOptions = {},
  ): Promise<void> {
    if (this.sessions.has(sessionKey)) {
      this.logger.warn('screencast_already_running', { sessionKey });
      return;
    }
    const client = await page.context().newCDPSession(page);
    this.sessions.set(sessionKey, client);

    client.on('Page.screencastFrame', async (evt: { data: string; sessionId: number }) => {
      try {
        onFrame(evt.data);
        await client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
      } catch (err) {
        this.logger.warn('screencast_frame_error', {
          sessionKey,
          error: err instanceof Error ? err.message : String(err),
        });
      }
    });

    await client.send('Page.startScreencast', {
      format: opts.format ?? 'jpeg',
      quality: opts.quality ?? 60,
      maxWidth: opts.maxWidth ?? 640,
      maxHeight: opts.maxHeight ?? 360,
      everyNthFrame: opts.everyNthFrame ?? 2,
    });
    this.logger.info('screencast_started', { sessionKey });
  }

  async stop(sessionKey: string): Promise<void> {
    const client = this.sessions.get(sessionKey);
    if (!client) return;
    try {
      await client.send('Page.stopScreencast');
      await client.detach();
    } catch (err) {
      this.logger.warn('screencast_stop_error', {
        sessionKey,
        error: err instanceof Error ? err.message : String(err),
      });
    }
    this.sessions.delete(sessionKey);
    this.logger.info('screencast_stopped', { sessionKey });
  }

  async stopAll(): Promise<void> {
    const keys = Array.from(this.sessions.keys());
    await Promise.all(keys.map((k) => this.stop(k)));
  }
}
```
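The frame handler has two duties: hand the frame to the callback, then acknowledge it with the frame's own `sessionId` (Chrome is expected to hold back further frames until it receives the ack). Because `Screencaster` is not unit-tested in this plan, the sketch below shows how that handshake could be exercised against a fake client. `FakeCdpClient` and `attachFrameHandler` are hypothetical and only mimic the `on`/`send` surface used above.

```typescript
type FrameEvent = { data: string; sessionId: number };
type Handler = (evt: FrameEvent) => void | Promise<void>;

// Hypothetical stand-in for a CDPSession: records every command sent to it.
class FakeCdpClient {
  sent: Array<{ method: string; params?: unknown }> = [];
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    this.handlers.set(event, [...(this.handlers.get(event) ?? []), handler]);
  }

  emit(event: string, evt: FrameEvent): void {
    for (const h of this.handlers.get(event) ?? []) void h(evt);
  }

  async send(method: string, params?: unknown): Promise<void> {
    this.sent.push({ method, params });
  }
}

// Same handler shape as Screencaster.start(): forward the frame, then ack it.
function attachFrameHandler(client: FakeCdpClient, onFrame: (b64: string) => void): void {
  client.on('Page.screencastFrame', async (evt) => {
    onFrame(evt.data);
    await client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
  });
}

const fake = new FakeCdpClient();
const frames: string[] = [];
attachFrameHandler(fake, (b64) => frames.push(b64));
fake.emit('Page.screencastFrame', { data: 'AAAA', sessionId: 7 });
```

After the `emit`, `frames` holds the payload and `fake.sent` holds one `Page.screencastFrameAck` carrying `sessionId: 7`, which is the pairing the real handler relies on.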

- [ ] **Step 2: Syntax-check**

```bash
cd tests/soak
npx tsx -e "import('./core/screencaster').then(() => console.log('ok'))"
```

Expected: `ok`.

- [ ] **Step 3: Commit**

```bash
git add tests/soak/core/screencaster.ts
git commit -m "$(cat <<'EOF'
feat(soak): Screencaster — CDP Page.startScreencast wrapper

Attach/detach CDP sessions per Playwright Page, start/stop JPEG
screencasts with configurable quality and frame rate, forward each
frame to a callback. Used by the dashboard for click-to-watch
live video.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 23: Wire screencaster to dashboard click-to-watch

Runner creates a `Screencaster`, passes callbacks into `DashboardServer.onStartStream/onStopStream` that look up the right session and start/stop streaming. Each frame is broadcast to the dashboard.

**Files:**
- Modify: `tests/soak/runner.ts`

- [ ] **Step 1: Import Screencaster and hold a sessions map**

In `runner.ts`, add at the top:

```typescript
import { Screencaster } from './core/screencaster';
```

After `const sessions = await pool.acquire(accounts);`, build a lookup map:

```typescript
    const sessionsByKey = new Map<string, typeof sessions[number]>();
    for (const s of sessions) sessionsByKey.set(s.key, s);
```

Create the screencaster before the `try` block, next to the `dashboardServer` declaration, so that the `finally` block can still reach it (it only needs `logger`):

```typescript
  const screencaster = new Screencaster(logger);
```

- [ ] **Step 2: Replace the `onStartStream`/`onStopStream` no-ops with real wiring**

The handlers must close over `screencaster` and `sessionsByKey`, and `sessionsByKey` does not exist until sessions are acquired. Move the `DashboardServer` construction to after `const sessions = await pool.acquire(accounts)`, inside the `try` block. Leave the `let dashboardServer` and `let dashboard` declarations where they are so the `finally` block can still see them. Then:

```typescript
    if (watch === 'dashboard') {
      const port = Number(config.dashboardPort ?? 7777);
      dashboardServer = new DashboardServer(port, logger, {
        onStartStream: (key) => {
          const session = sessionsByKey.get(key);
          if (!session) {
            logger.warn('stream_start_unknown_session', { sessionKey: key });
            return;
          }
          screencaster
            .start(key, session.page, (jpegBase64) => {
              dashboardServer!.broadcast({ type: 'frame', sessionKey: key, jpegBase64 });
            })
            .catch((err) =>
              logger.error('screencast_start_failed', {
                sessionKey: key,
                error: err instanceof Error ? err.message : String(err),
              }),
            );
        },
        onStopStream: (key) => {
          screencaster.stop(key).catch(() => {});
        },
        onDisconnect: () => {
          screencaster.stopAll().catch(() => {});
        },
      });
      await dashboardServer.start();
      dashboard = dashboardServer.reporter();
      const url = `http://localhost:${port}`;
      console.log(`Dashboard: ${url}`);
      // ... auto-open
    }
```

Make sure the `ctx.dashboard` assignment happens AFTER the dashboard setup (it already does — `const ctx = { ... dashboard, ... }` comes later).

In the `finally` block, stop any active screencasts before `pool.release()` closes the pages they are attached to:

```typescript
    await screencaster.stopAll();
```

- [ ] **Step 3: Manual test end-to-end**

Run a longer populate game so there's time to click:

```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=4 --rooms=1 --cpus-per-room=0 --games-per-room=2 --holes=3 \
  --watch=dashboard
```

Expected:
1. Dashboard opens, shows 1 room with 4 players
2. Click on any player tile (`soak_00`, `soak_01`, ...)
3. Modal opens, shows live JPEG frames of that player's view of the game
4. Close modal (Esc or Close button) — frames stop, screencast detaches
5. Run completes cleanly

- [ ] **Step 4: Commit**

```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): click-to-watch live video via CDP screencast

Runner creates a Screencaster and wires its start/stop into
DashboardServer.onStartStream/onStopStream. Clicking a player tile
in the dashboard starts a CDP screencast on that session's page,
forwards JPEG frames as WS "frame" messages, closes on modal
dismiss or WS disconnect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 7 — Tiled mode

### Task 24: `--watch=tiled` native windows

Launch a second headed browser for the 4 host contexts, position their windows in a 2×2 grid using `page.evaluate(window.moveTo)`.

**Files:**
- Modify: `tests/soak/core/session-pool.ts` — add optional headed-host support
- Modify: `tests/soak/runner.ts` — enable tiled mode

- [ ] **Step 1: Extend `SessionPool` to support headed host contexts**

Add a new option and method to `SessionPool`. In `core/session-pool.ts`:

```typescript
export interface SessionPoolOptions {
  targetUrl: string;
  inviteCode: string;
  credFile: string;
  logger: Logger;
  browser?: Browser;
  contextOptions?: Parameters<Browser['newContext']>[0];
  /** If set, the first `headedHostCount` sessions use a separate headed browser. */
  headedHostCount?: number;
}
```

Inside the class, add a `headedBrowser` field and extend `acquire`:

```typescript
  private headedBrowser: Browser | null = null;

  // ... in acquire(), before the loop:

  if ((this.opts.headedHostCount ?? 0) > 0 && !this.headedBrowser) {
    this.headedBrowser = await chromium.launch({
      headless: false,
      slowMo: 50,
    });
  }

  for (let i = 0; i < count; i++) {
    const account = this.accounts[i];
    const useHeaded = i < (this.opts.headedHostCount ?? 0);
    const targetBrowser = useHeaded ? this.headedBrowser! : this.browser!;
    const context = await targetBrowser.newContext({
      ...this.opts.contextOptions,
      ...(useHeaded ? { viewport: { width: 960, height: 540 } } : {}),
    });
    await this.injectAuth(context, account);
    const page = await context.newPage();
    await page.goto(this.opts.targetUrl);

    // Position headed windows in a 2×2 grid
    if (useHeaded) {
      const col = i % 2;
      const row = Math.floor(i / 2);
      const x = col * 960;
      const y = row * 560;
      await page.evaluate(
        ([x, y, w, h]) => {
          window.moveTo(x, y);
          window.resizeTo(w, h);
        },
        [x, y, 960, 540] as [number, number, number, number],
      );
    }

    const bot = new GolfBot(page);
    sessions.push({ account, context, page, bot, key: account.key });
  }
```
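The grid arithmetic in the loop above can be pulled into a pure helper, which makes the layout easy to check without launching a browser. This is a minimal sketch: `tilePosition` and its parameters are hypothetical names, not part of the plan's `SessionPool`. The defaults mirror the 960 px column step and 560 px row step used above.

```typescript
interface TilePosition {
  x: number;
  y: number;
}

// Hypothetical helper: maps a session index to the top-left corner of its
// window in a grid with `columns` columns.
function tilePosition(
  index: number,
  columns = 2,
  colStep = 960,
  rowStep = 560,
): TilePosition {
  const col = index % columns;
  const row = Math.floor(index / columns);
  return { x: col * colStep, y: row * rowStep };
}

// Positions for the four host windows of a 2×2 grid.
const positions = [0, 1, 2, 3].map((i) => tilePosition(i));
console.log(positions);
```

Whether `window.moveTo` is honored can depend on the browser and window manager, so it is safer to treat these coordinates as a request rather than a guarantee.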

Update `release` to close the headed browser too:

```typescript
  async release(): Promise<void> {
    for (const session of this.activeSessions) {
      try { await session.context.close(); } catch { /* ignore */ }
    }
    this.activeSessions = [];
    if (this.ownedBrowser) {
      try { await this.ownedBrowser.close(); } catch { /* ignore */ }
      this.ownedBrowser = null;
      this.browser = null;
    }
    if (this.headedBrowser) {
      try { await this.headedBrowser.close(); } catch { /* ignore */ }
      this.headedBrowser = null;
    }
  }
```

- [ ] **Step 2: Wire `watch === 'tiled'` in the runner**

In `runner.ts`, replace the existing `tiled_not_yet_implemented` warning with:

```typescript
  const headedHostCount = watch === 'tiled' ? rooms : 0;

  const pool = new SessionPool({
    targetUrl,
    inviteCode,
    credFile,
    logger,
    headedHostCount,
  });
```

(Move that `pool` creation up so it's aware of `watch`.)

- [ ] **Step 3: Test tiled mode**

```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate \
  --accounts=4 --rooms=2 --cpus-per-room=0 --games-per-room=1 --holes=1 \
  --watch=tiled
```

Expected: 2 native Chromium windows appear (one per host), sized ~960×540 and positioned at the upper-left of the screen. They play the game visibly. On exit, windows close.
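One thing worth checking during this test: `useHeaded` selects the first `headedHostCount` sessions by index. If rooms are formed by slicing the session list into consecutive chunks and the first session of each chunk hosts (an assumption, since the host-selection code is not shown in this section), the hosts sit at indices `0, perRoom, 2 * perRoom, ...` rather than `0..rooms-1`. A hypothetical helper for that mapping might look like:

```typescript
// Hypothetical helper: indices of the host session in each room, assuming
// sessions are split into consecutive chunks and the first of each chunk hosts.
function hostIndices(accounts: number, rooms: number): number[] {
  if (!Number.isInteger(rooms) || rooms <= 0 || accounts % rooms !== 0) {
    throw new Error(`accounts (${accounts}) must divide evenly into rooms (${rooms})`);
  }
  const perRoom = accounts / rooms;
  return Array.from({ length: rooms }, (_, room) => room * perRoom);
}

console.log(hostIndices(4, 2));  // [0, 2]
console.log(hostIndices(16, 4)); // [0, 4, 8, 12]
```

Under that assumption, `--accounts=4 --rooms=2` would put sessions 0 and 1 in the headed browser, which are the host and a joiner of room 0. If both windows turn out to show the same room, selecting by host index instead of `i < headedHostCount` is one possible adjustment.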

- [ ] **Step 4: Commit**

```bash
git add tests/soak/core/session-pool.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): --watch=tiled launches N headed host windows

SessionPool accepts headedHostCount; when > 0 it launches a second
Chromium in headed mode, creates those contexts there, and positions
each host window in a 2×2 grid via window.moveTo/resizeTo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 8 — Stress scenario

### Task 25: Chaos injector + stress scenario

Short 1-hole games in tight loops. A background loop polls every 500 ms and gives each session a 5% chance per tick of injecting a chaos event (rapid clicks, brief offline toggle, tab blur).

**Files:**
- Create: `tests/soak/scenarios/stress.ts`
- Create: `tests/soak/scenarios/shared/chaos.ts`
- Modify: `tests/soak/scenarios/index.ts` — register `stress`

- [ ] **Step 1: Create `scenarios/shared/chaos.ts`**

```typescript
// tests/soak/scenarios/shared/chaos.ts
import type { Session, Logger } from '../../core/types';

export type ChaosEvent =
  | 'rapid_clicks'
  | 'tab_blur'
  | 'brief_offline';

const ALL_EVENTS: ChaosEvent[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];

function pickEvent(): ChaosEvent {
  return ALL_EVENTS[Math.floor(Math.random() * ALL_EVENTS.length)];
}

export async function maybeInjectChaos(
  session: Session,
  probability: number,
  logger: Logger,
  roomId: string,
): Promise<ChaosEvent | null> {
  if (Math.random() >= probability) return null;

  const event = pickEvent();
  logger.info('chaos_injected', { room: roomId, session: session.key, event });
  try {
    switch (event) {
      case 'rapid_clicks': {
        // Fire 5 rapid clicks at the player's own cards
        for (let i = 0; i < 5; i++) {
          await session.page.locator(`#player-cards .card:nth-child(${(i % 6) + 1})`)
            .click({ timeout: 300 })
            .catch(() => {});
        }
        break;
      }
      case 'tab_blur': {
        // Briefly dispatch blur then focus
        await session.page.evaluate(() => {
          window.dispatchEvent(new Event('blur'));
          setTimeout(() => window.dispatchEvent(new Event('focus')), 200);
        });
        break;
      }
      case 'brief_offline': {
        await session.context.setOffline(true);
        await new Promise((r) => setTimeout(r, 300));
        await session.context.setOffline(false);
        break;
      }
    }
  } catch (err) {
    logger.warn('chaos_error', {
      event,
      error: err instanceof Error ? err.message : String(err),
    });
  }
  return event;
}
```
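Because `pickEvent` calls `Math.random()` directly, a failing stress run cannot be replayed with the same chaos sequence. One option, sketched here under the assumption that reproducibility is useful for debugging, is to inject the random source. `mulberry32` is a small, widely used public-domain PRNG; `pickEventWith` is a hypothetical name and not part of the plan.

```typescript
type ChaosEvent = 'rapid_clicks' | 'tab_blur' | 'brief_offline';

const ALL_EVENTS: readonly ChaosEvent[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];

// Deterministic PRNG returning floats in [0, 1), seeded with a 32-bit integer.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical variant of pickEvent that takes its random source as an argument.
function pickEventWith(rng: () => number): ChaosEvent {
  return ALL_EVENTS[Math.floor(rng() * ALL_EVENTS.length)];
}

// Two generators with the same seed produce the same event sequence.
const rngA = mulberry32(42);
const rngB = mulberry32(42);
const seqA = Array.from({ length: 5 }, () => pickEventWith(rngA));
const seqB = Array.from({ length: 5 }, () => pickEventWith(rngB));
console.log(seqA.join(',') === seqB.join(','));
```

Logging the seed at startup would then be enough to reproduce which events were chosen, although the timing of those events would still vary between runs.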

- [ ] **Step 2: Create `scenarios/stress.ts`**

```typescript
// tests/soak/scenarios/stress.ts
import type {
  Scenario,
  ScenarioContext,
  ScenarioResult,
  ScenarioError,
  Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';
import { maybeInjectChaos } from './shared/chaos';

interface StressConfig {
  gamesPerRoom: number;
  holes: number;
  decks: number;
  rooms: number;
  cpusPerRoom: number;
  thinkTimeMs: [number, number];
  interGamePauseMs: number;
  chaosChance: number;
}

function chunk<T>(arr: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
  return out;
}

async function sleep(ms: number): Promise<void> {
  return new Promise((r) => setTimeout(r, ms));
}

async function runStressRoom(
  ctx: ScenarioContext,
  cfg: StressConfig,
  roomIdx: number,
  sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[]; chaosFired: number }> {
  const roomId = `room-${roomIdx}`;
  let completed = 0;
  let chaosFired = 0;
  const errors: ScenarioError[] = [];

  for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
    if (ctx.signal.aborted) break;

    ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });

    // Start a background chaos loop for this game
    let chaosActive = true;
    const chaosLoop = (async () => {
      while (chaosActive && !ctx.signal.aborted) {
        await sleep(500);
        for (const session of sessions) {
          const e = await maybeInjectChaos(session, cfg.chaosChance, ctx.logger, roomId);
          if (e) chaosFired++;
        }
      }
    })();

    const result = await runOneMultiplayerGame(ctx, sessions, {
      roomId,
      holes: cfg.holes,
      decks: cfg.decks,
      cpusPerRoom: cfg.cpusPerRoom,
      thinkTimeMs: cfg.thinkTimeMs,
    });

    chaosActive = false;
    await chaosLoop;

    if (result.completed) {
      completed++;
      ctx.logger.info('game_complete', { room: roomId, game: gameNum + 1, turns: result.turns });
    } else {
      errors.push({
        room: roomId,
        reason: 'game_failed',
        detail: result.error,
        timestamp: Date.now(),
      });
      ctx.logger.error('game_failed', { room: roomId, error: result.error });
    }

    await sleep(cfg.interGamePauseMs);
  }

  return { completed, errors, chaosFired };
}

const stress: Scenario = {
  name: 'stress',
  description: 'Rapid short games for stability & race condition hunting',
  needs: { accounts: 16, rooms: 4, cpusPerRoom: 2 },
  defaultConfig: {
    gamesPerRoom: 50,
    holes: 1,
    decks: 1,
    rooms: 4,
    cpusPerRoom: 2,
    thinkTimeMs: [50, 150],
    interGamePauseMs: 200,
    chaosChance: 0.05,
  },

  async run(ctx: ScenarioContext): Promise<ScenarioResult> {
    const start = Date.now();
    const cfg = ctx.config as unknown as StressConfig;
    const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
    const roomSessions = chunk(ctx.sessions, perRoom);

    const results = await Promise.allSettled(
      roomSessions.map((s, idx) => runStressRoom(ctx, cfg, idx, s)),
    );

    let gamesCompleted = 0;
    let chaosFired = 0;
    const errors: ScenarioError[] = [];
    results.forEach((r, idx) => {
      if (r.status === 'fulfilled') {
        gamesCompleted += r.value.completed;
        chaosFired += r.value.chaosFired;
        errors.push(...r.value.errors);
      } else {
        errors.push({
          room: `room-${idx}`,
          reason: 'room_threw',
          detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
          timestamp: Date.now(),
        });
      }
    });

    return {
      gamesCompleted,
      errors,
      durationMs: Date.now() - start,
      customMetrics: { chaos_fired: chaosFired },
    };
  },
};

export default stress;
```
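`run` computes `perRoom` with `Math.floor` and passes it to `chunk`. If `ctx.sessions.length` is smaller than `cfg.rooms`, `perRoom` is 0 and the `for` loop in `chunk` never advances; if the count does not divide evenly, `chunk` yields more groups than `cfg.rooms`. The CLI may already validate this, but a defensive sketch (with the hypothetical name `partitionSessions`) could look like:

```typescript
// Hypothetical replacement for chunk(): splits sessions into exactly `rooms`
// equal groups and fails fast on configurations that cannot be split evenly.
function partitionSessions<T>(sessions: T[], rooms: number): T[][] {
  if (!Number.isInteger(rooms) || rooms <= 0) {
    throw new Error(`rooms must be a positive integer, got ${rooms}`);
  }
  if (sessions.length === 0 || sessions.length % rooms !== 0) {
    throw new Error(
      `cannot split ${sessions.length} sessions evenly across ${rooms} rooms`,
    );
  }
  const perRoom = sessions.length / rooms;
  const out: T[][] = [];
  for (let i = 0; i < sessions.length; i += perRoom) {
    out.push(sessions.slice(i, i + perRoom));
  }
  return out;
}

console.log(partitionSessions(['a', 'b', 'c', 'd'], 2)); // [['a','b'], ['c','d']]
```

Throwing here surfaces a bad `--accounts` / `--rooms` combination immediately instead of as a hang or an unexpected extra room.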

- [ ] **Step 3: Register stress in the registry**

Edit `tests/soak/scenarios/index.ts`:

```typescript
import type { Scenario } from '../core/types';
import populate from './populate';
import stress from './stress';

const registry: Record<string, Scenario> = {
  populate,
  stress,
};

export function getScenario(name: string): Scenario | undefined {
  return registry[name];
}

export function listScenarios(): Scenario[] {
  return Object.values(registry);
}
```

- [ ] **Step 4: Smoke test stress scenario**

```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=stress \
  --accounts=4 --rooms=1 --cpus-per-room=1 --games-per-room=3 --holes=1 \
  --watch=none
```

Expected: 3 quick games complete, chaos events in logs (look for `chaos_injected`), exit 0.
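As a rough guide to how soon `chaos_injected` should appear: the chaos loop in `runStressRoom` wakes every 500 ms and rolls once per session at `chaosChance`. Ignoring the time the chaos actions themselves take (which lowers the real rate), the expected count is bounded above by ticks × sessions × probability. A small sketch with a hypothetical function name:

```typescript
// Upper-bound estimate of chaos events for one room over `durationMs`.
// Assumes one roll per session per tick, as in the stress scenario's loop.
function expectedChaosEvents(
  durationMs: number,
  sessions: number,
  chance: number,
  tickMs = 500,
): number {
  const ticks = Math.floor(durationMs / tickMs);
  return ticks * sessions * chance;
}

// 30 s of play, 4 sessions, 5% chance: 60 ticks × 4 × 0.05, roughly 12 events.
console.log(expectedChaosEvents(30_000, 4, 0.05));
```

If the three smoke games finish within a few seconds, seeing only one or two `chaos_injected` lines (or occasionally none) is consistent with these numbers.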

- [ ] **Step 5: Commit**

```bash
git add tests/soak/scenarios/stress.ts tests/soak/scenarios/shared/chaos.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): stress scenario with chaos injection

Rapid 1-hole games with a parallel chaos loop that polls every 500 ms
and gives each session a 5% chance per tick of firing rapid_clicks,
tab_blur, or brief_offline events. Chaos counts roll up into
ScenarioResult.customMetrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 9 — Failure handling

### Task 26: Watchdog + heartbeat wiring

Per-room timeout that fires if no heartbeat arrives within N ms. Runner wires it into `ctx.heartbeat`. Vitest-tested.

**Files:**
- Create: `tests/soak/core/watchdog.ts`
- Create: `tests/soak/tests/watchdog.test.ts`
- Modify: `tests/soak/runner.ts` — wire `heartbeat` to per-room watchdogs

- [ ] **Step 1: Write failing tests**

```typescript
// tests/soak/tests/watchdog.test.ts
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { Watchdog } from '../core/watchdog';

describe('Watchdog', () => {
  beforeEach(() => vi.useFakeTimers());
  afterEach(() => vi.useRealTimers());

  it('fires after timeout if no heartbeat', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    vi.advanceTimersByTime(1001);
    expect(onTimeout).toHaveBeenCalledOnce();
  });

  it('heartbeat resets the timer', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    vi.advanceTimersByTime(800);
    w.heartbeat();
    vi.advanceTimersByTime(800);
    expect(onTimeout).not.toHaveBeenCalled();
    vi.advanceTimersByTime(300);
    expect(onTimeout).toHaveBeenCalledOnce();
  });

  it('stop cancels pending timeout', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    w.stop();
    vi.advanceTimersByTime(2000);
    expect(onTimeout).not.toHaveBeenCalled();
  });

  it('does not fire again once it has fired, even after a heartbeat', () => {
    const onTimeout = vi.fn();
    const w = new Watchdog(1000, onTimeout);
    w.start();
    vi.advanceTimersByTime(1001);
    w.heartbeat();
    vi.advanceTimersByTime(1001);
    expect(onTimeout).toHaveBeenCalledOnce();
  });
});
```

- [ ] **Step 2: Run to verify failure**

```bash
npx vitest run tests/watchdog.test.ts
```

Expected: FAIL.

- [ ] **Step 3: Implement `core/watchdog.ts`**

```typescript
// tests/soak/core/watchdog.ts
export class Watchdog {
  private timer: NodeJS.Timeout | null = null;
  private fired = false;

  constructor(
    private timeoutMs: number,
    private onTimeout: () => void,
  ) {}

  start(): void {
    this.stop();
    this.fired = false;
    this.timer = setTimeout(() => {
      if (this.fired) return;
      this.fired = true;
      this.onTimeout();
    }, this.timeoutMs);
  }

  heartbeat(): void {
    if (this.fired) return;
    this.start();
  }

  stop(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

- [ ] **Step 4: Verify tests pass**

```bash
npx vitest run tests/watchdog.test.ts
```

Expected: all passing.

- [ ] **Step 5: Wire watchdogs into the runner**

In `runner.ts`, add before building `ctx`:

```typescript
    const watchdogs = new Map<string, Watchdog>();
    const roomAborters = new Map<string, AbortController>();
    for (let i = 0; i < rooms; i++) {
      const roomId = `room-${i}`;
      const aborter = new AbortController();
      roomAborters.set(roomId, aborter);
      const w = new Watchdog(60_000, () => {
        logger.error('watchdog_fired', { room: roomId });
        aborter.abort();
        dashboard.update(roomId, { phase: 'error' });
      });
      w.start();
      watchdogs.set(roomId, w);
    }
```

Import at the top:

```typescript
import { Watchdog } from './core/watchdog';
```

Set `ctx.heartbeat` to:

```typescript
      heartbeat: (roomId: string) => {
        const w = watchdogs.get(roomId);
        if (w) w.heartbeat();
      },
```

In the `finally` block, stop all watchdogs:

```typescript
    for (const w of watchdogs.values()) w.stop();
```

Note: for now the `roomAborters` aren't plumbed into scenario cancellation, so scenarios see only the global `ctx.signal`. This is intentional; per-room abort requires scenario-side awareness and is deferred until a scenario genuinely misbehaves. Until then, a firing watchdog does not stop the room: it logs `watchdog_fired` and marks that room's dashboard tile as errored, which is enough to spot a stuck run.
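If per-room cancellation is plumbed through later, one possible shape is to hand each room a signal that aborts when either the global controller or that room's controller aborts. This is a sketch of that idea only; `linkSignals` is a hypothetical helper and the plan does not commit to this design.

```typescript
// Hypothetical helper: returns a signal that aborts as soon as any input aborts.
function linkSignals(...signals: AbortSignal[]): AbortSignal {
  const linked = new AbortController();
  for (const s of signals) {
    if (s.aborted) {
      linked.abort();
      break;
    }
    s.addEventListener('abort', () => linked.abort(), { once: true });
  }
  return linked.signal;
}

// Usage: each room gets its own derived signal.
const globalAbort = new AbortController();
const room0Abort = new AbortController();
const room1Abort = new AbortController();
const room0Signal = linkSignals(globalAbort.signal, room0Abort.signal);
const room1Signal = linkSignals(globalAbort.signal, room1Abort.signal);

room0Abort.abort(); // e.g. room-0's watchdog fired
console.log(room0Signal.aborted, room1Signal.aborted, globalAbort.signal.aborted);
```

A scenario would then check its room's derived signal instead of `ctx.signal`, so a stuck room could be cancelled without disturbing the others. Recent Node releases also provide `AbortSignal.any`, which covers the same need without a helper.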

- [ ] **Step 6: Commit**

```bash
git add tests/soak/core/watchdog.ts tests/soak/tests/watchdog.test.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): per-room watchdog with heartbeat

Watchdog class with Vitest tests, wired into ctx.heartbeat in the
runner. One watchdog per room, 60s timeout; firing logs an error
and marks the room's dashboard tile as errored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 27: Artifact capture on failure

When the runner catches an error, snapshot every session's page: screenshot, HTML, console log tail, game state JSON.

**Files:**
- Create: `tests/soak/core/artifacts.ts`
- Modify: `tests/soak/runner.ts` — call `captureArtifacts` in the catch block

- [ ] **Step 1: Implement `core/artifacts.ts`**

```typescript
// tests/soak/core/artifacts.ts
import * as fs from 'fs';
import * as path from 'path';
import type { Session, Logger } from './types';

export interface ArtifactsOptions {
  runId: string;
  /** Absolute path to the artifacts root, e.g., /path/to/tests/soak/artifacts */
  rootDir: string;
  logger: Logger;
}

export class Artifacts {
  readonly runDir: string;

  constructor(private opts: ArtifactsOptions) {
    this.runDir = path.join(opts.rootDir, opts.runId);
    fs.mkdirSync(this.runDir, { recursive: true });
  }

  /** Capture everything for a single session. */
  async captureSession(session: Session, roomId: string): Promise<void> {
    const dir = path.join(this.runDir, roomId);
    fs.mkdirSync(dir, { recursive: true });
    const prefix = session.key;

    try {
      const png = await session.page.screenshot({ fullPage: true });
      fs.writeFileSync(path.join(dir, `${prefix}.png`), png);
    } catch (err) {
      this.opts.logger.warn('artifact_screenshot_failed', {
        session: session.key,
        error: err instanceof Error ? err.message : String(err),
      });
    }

    try {
      const html = await session.page.content();
      fs.writeFileSync(path.join(dir, `${prefix}.html`), html);
    } catch (err) {
      this.opts.logger.warn('artifact_html_failed', {
        session: session.key,
        error: err instanceof Error ? err.message : String(err),
      });
    }

    try {
      const state = await session.bot.getGameState();
      fs.writeFileSync(
        path.join(dir, `${prefix}.state.json`),
        JSON.stringify(state, null, 2),
      );
    } catch (err) {
      this.opts.logger.warn('artifact_state_failed', {
        session: session.key,
        error: err instanceof Error ? err.message : String(err),
      });
    }

    try {
      const errors = session.bot.getConsoleErrors?.() ?? [];
      fs.writeFileSync(path.join(dir, `${prefix}.console.txt`), errors.join('\n'));
    } catch {
      // ignore — not all bots expose this
    }
  }

  async captureAll(sessions: Session[]): Promise<void> {
    // Best-effort: room membership isn't known here, so every session is
    // written under room-unknown/. Callers that know the room should call
    // captureSession(session, roomId) directly.
    await Promise.all(
      sessions.map((s) => this.captureSession(s, 'room-unknown')),
    );
  }

  writeSummary(summary: object): void {
    fs.writeFileSync(
      path.join(this.runDir, 'summary.json'),
      JSON.stringify(summary, null, 2),
    );
  }
}

/** Prune run directories older than `maxAgeMs`. */
export function pruneOldRuns(rootDir: string, maxAgeMs: number, logger: Logger): void {
  if (!fs.existsSync(rootDir)) return;
  const now = Date.now();
  for (const entry of fs.readdirSync(rootDir)) {
    const full = path.join(rootDir, entry);
    try {
      const stat = fs.statSync(full);
      if (stat.isDirectory() && now - stat.mtimeMs > maxAgeMs) {
        fs.rmSync(full, { recursive: true, force: true });
        logger.info('artifact_pruned', { runId: entry });
      }
    } catch {
      // ignore
    }
  }
}
```

- [ ] **Step 2: Call artifact capture from the runner's error path**

In `runner.ts`, import:

```typescript
import { Artifacts, pruneOldRuns } from './core/artifacts';
```

After `const runId = ...`, instantiate and prune:

```typescript
  const artifactsRoot = path.resolve(__dirname, 'artifacts');
  const artifacts = new Artifacts({ runId, rootDir: artifactsRoot, logger });
  pruneOldRuns(artifactsRoot, 7 * 24 * 3600 * 1000, logger);
```
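The retention window passed to `pruneOldRuns` is 7 × 24 × 3600 × 1000 = 604,800,000 ms. The age test it applies can be expressed as a pure function, which is convenient for unit-testing without touching the filesystem. `selectExpired` and `RunEntry` are hypothetical names used for illustration.

```typescript
interface RunEntry {
  runId: string;
  mtimeMs: number;
}

const SEVEN_DAYS_MS = 7 * 24 * 3600 * 1000; // 604_800_000

// Returns the run IDs whose age is strictly greater than maxAgeMs,
// matching the `now - stat.mtimeMs > maxAgeMs` comparison in pruneOldRuns.
function selectExpired(
  entries: RunEntry[],
  nowMs: number,
  maxAgeMs: number = SEVEN_DAYS_MS,
): string[] {
  return entries
    .filter((e) => nowMs - e.mtimeMs > maxAgeMs)
    .map((e) => e.runId);
}

const now = 1_000_000_000_000;
console.log(
  selectExpired(
    [
      { runId: 'old-run', mtimeMs: now - SEVEN_DAYS_MS - 1 },
      { runId: 'fresh-run', mtimeMs: now - 60_000 },
    ],
    now,
  ),
); // ['old-run']
```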

In the `catch (err)` block, after logging, capture:

```typescript
  } catch (err) {
    logger.error('run_failed', {
      error: err instanceof Error ? err.message : String(err),
      stack: err instanceof Error ? err.stack : undefined,
    });
    try {
      const liveSessions = pool['activeSessions'] as Session[] | undefined;
      if (liveSessions && liveSessions.length > 0) {
        await artifacts.captureAll(liveSessions);
      }
    } catch (captureErr) {
      logger.warn('artifact_capture_failed', {
        error: captureErr instanceof Error ? captureErr.message : String(captureErr),
      });
    }
    exitCode = 1;
  }
```

(Note: the `pool['activeSessions']` access bypasses visibility to avoid adding a public getter for one call site. Acceptable for an error path in a test harness.)

After successful run, write the summary:

```typescript
    artifacts.writeSummary({
      runId,
      scenario: scenario.name,
      targetUrl,
      gamesCompleted: result.gamesCompleted,
      errors: result.errors,
      durationMs: result.durationMs,
      customMetrics: result.customMetrics,
    });
```

Import `Session` type:

```typescript
import type { Session } from './core/types';
```

- [ ] **Step 3: Verify by forcing a failure**

Kill the server mid-run and confirm artifacts are written:

```bash
# In one terminal
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=5 --holes=3 --watch=none

# In another: wait ~3 seconds then Ctrl-C the dev server
# The soak run should catch errors and write artifacts

ls tests/soak/artifacts/
ls tests/soak/artifacts/<run-id>/
```

Expected: a run directory exists with `summary.json` (if it got far enough) or per-session screenshots / HTML under `room-unknown/`.

- [ ] **Step 4: Commit**

```bash
git add tests/soak/core/artifacts.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): artifact capture on failure + run summary

Screenshots, HTML, game state, and console errors are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Runs older
than 7 days are pruned on startup. Successful runs get a
summary.json inside their run directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 28: Graceful shutdown (already partially in place) + exit codes

SIGINT/SIGTERM already flip the abort controller. Formalize the timeout-and-force-exit path and the three regular exit codes (`0` success, `1` errors, `2` interrupted). A forced exit (second signal or cleanup timeout) uses `130`.

**Files:**
- Modify: `tests/soak/runner.ts`

- [ ] **Step 1: Add a graceful shutdown timeout**

In `runner.ts`, replace the existing signal handlers with:

```typescript
  let forceExitTimer: NodeJS.Timeout | null = null;
  const onSignal = (sig: string) => {
    if (abortController.signal.aborted) {
      // Second signal: force exit
      logger.warn('force_exit', { signal: sig });
      process.exit(130);
    }
    logger.warn('signal_received', { signal: sig });
    abortController.abort();
    // Hard-kill after 10s if cleanup hangs
    forceExitTimer = setTimeout(() => {
      logger.error('graceful_shutdown_timeout');
      process.exit(130);
    }, 10_000);
  };
  process.on('SIGINT', () => onSignal('SIGINT'));
  process.on('SIGTERM', () => onSignal('SIGTERM'));
```

In the `finally` block, clear the force-exit timer:

```typescript
    if (forceExitTimer) clearTimeout(forceExitTimer);
```

- [ ] **Step 2: Manual test — Ctrl-C a long run**

```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
  --scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
  --games-per-room=10 --holes=3 --watch=none

# After ~5 seconds: Ctrl-C
```

Expected: runner logs `signal_received`, finishes current turn, prints summary, exits with code 2 (check `echo $?`).
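The snippets in Step 1 cover the signal handlers but not how the final exit code is chosen. One way to keep the three codes consistent is a single pure function evaluated at the end of the run. The sketch below is an assumption about how the runner might combine its state: `resolveExitCode` and the `RunOutcome` fields are hypothetical, and the precedence (interrupted wins over errors) is a design choice the plan does not spell out.

```typescript
interface RunOutcome {
  /** True only when SIGINT/SIGTERM was received (not for other aborts). */
  interrupted: boolean;
  /** True when the scenario threw and the catch block ran. */
  threw: boolean;
  /** Number of entries in ScenarioResult.errors. */
  errorCount: number;
}

// Hypothetical mapping onto the plan's exit codes: 0 success, 1 errors, 2 interrupted.
function resolveExitCode(outcome: RunOutcome): 0 | 1 | 2 {
  if (outcome.interrupted) return 2;
  if (outcome.threw || outcome.errorCount > 0) return 1;
  return 0;
}

console.log(resolveExitCode({ interrupted: false, threw: false, errorCount: 0 })); // 0
console.log(resolveExitCode({ interrupted: false, threw: false, errorCount: 3 })); // 1
console.log(resolveExitCode({ interrupted: true, threw: false, errorCount: 3 }));  // 2
```

Tracking the user signal in its own flag, rather than reading `abortController.signal.aborted`, would let aborts from other sources still report `1`.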

- [ ] **Step 3: Commit**

```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): graceful shutdown with 10s hard-kill fallback

SIGINT/SIGTERM flips the abort signal; scenarios finish the current
turn then exit. If cleanup hangs >10s the runner force-exits. Second
Ctrl-C is an immediate hard kill. Exit codes: 0 success, 1 errors,
2 interrupted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 29: Periodic health probes

Every 30s, fetch `/api/health` on the target server. Three consecutive failures declare a fatal error and abort.

**Files:**
- Modify: `tests/soak/runner.ts`

- [ ] **Step 1: Add a health probe interval**

In `runner.ts`, after building the abort controller and before running the scenario:

```typescript
  let healthFailures = 0;
  const healthTimer = setInterval(async () => {
    try {
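      // Time out the probe so a hung server counts as a failure instead of stalling.
      const res = await fetch(`${targetUrl}/api/health`, {
        signal: AbortSignal.timeout(10_000),
      });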
      if (!res.ok) throw new Error(`status ${res.status}`);
      healthFailures = 0;
    } catch (err) {
      healthFailures++;
      logger.warn('health_probe_failed', {
        consecutive: healthFailures,
        error: err instanceof Error ? err.message : String(err),
      });
      if (healthFailures >= 3) {
        logger.error('health_fatal', { consecutive: healthFailures });
        abortController.abort();
      }
    }
  }, 30_000);
```

In the `finally` block:

```typescript
    clearInterval(healthTimer);
```
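The consecutive-failure rule can be isolated from the timer and from `fetch`, which makes it testable with fake probe results. The sketch below is an illustration with hypothetical names (`HealthTracker`, `record`); it is not the runner's code.

```typescript
// Hypothetical helper: counts consecutive failed probes and reports when the
// threshold is reached. A single success resets the count.
class HealthTracker {
  private consecutive = 0;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  /** Record one probe result; returns true when the run should abort. */
  record(ok: boolean): boolean {
    this.consecutive = ok ? 0 : this.consecutive + 1;
    return this.consecutive >= this.threshold;
  }

  get failures(): number {
    return this.consecutive;
  }
}

const tracker = new HealthTracker();
const verdicts = [false, false, true, false, false, false].map((ok) => tracker.record(ok));
console.log(verdicts); // only the last probe reaches the threshold
```

With a 30 s interval, three consecutive failures means the abort arrives roughly 60 to 90 s after the server goes down, so a manual "kill the server" check needs to wait at least that long.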

- [ ] **Step 2: Commit**

```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): periodic health probes against target server

Every 30s GET /api/health. Three consecutive failures abort the
run with a fatal error, so staging outages don't get misattributed
to harness bugs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Phase 10 — Polish and bring-up

### Task 30: Smoke test script

`tests/soak/scripts/smoke.sh` — the canary run that takes ~30s against local dev.

**Files:**
- Create: `tests/soak/scripts/smoke.sh`

- [ ] **Step 1: Create the script**

```bash
#!/usr/bin/env bash
# Soak harness smoke test — end-to-end canary against local dev.
# Expected runtime: ~30 seconds.
set -euo pipefail

cd "$(dirname "$0")/.."

: "${TEST_URL:=http://localhost:8000}"
: "${SOAK_INVITE_CODE:=SOAKTEST}"

echo "Smoke target: $TEST_URL"
echo "Invite code:  $SOAK_INVITE_CODE"

# 1. Health probe
curl -fsS "$TEST_URL/api/health" > /dev/null || {
  echo "FAIL: target server unreachable at $TEST_URL"
  exit 1
}

# 2. Ensure minimum accounts
if [ ! -f .env.stresstest ]; then
  echo "Seeding accounts..."
  npm run seed -- --count=4
fi

# 3. Run minimum viable scenario
TEST_URL="$TEST_URL" SOAK_INVITE_CODE="$SOAK_INVITE_CODE" \
  npm run soak -- \
    --scenario=populate \
    --accounts=2 \
    --rooms=1 \
    --cpus-per-room=0 \
    --games-per-room=1 \
    --holes=1 \
    --watch=none

echo "Smoke PASSED"
```

- [ ] **Step 2: Make it executable and run it**

```bash
chmod +x tests/soak/scripts/smoke.sh
cd tests/soak && bash scripts/smoke.sh
```

Expected: `Smoke PASSED` within ~30s.

- [ ] **Step 3: Commit**

```bash
git add tests/soak/scripts/smoke.sh
git commit -m "$(cat <<'EOF'
feat(soak): smoke test script — 30s end-to-end canary

Confirms the harness works against local dev with the absolute
minimum config. Run after any change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 31: README + CHECKLIST

Replace the README stub with a full quickstart and flag reference. Add the manual validation checklist.

**Files:**
- Modify: `tests/soak/README.md`
- Create: `tests/soak/CHECKLIST.md`

- [ ] **Step 1: Rewrite `tests/soak/README.md`**

````markdown
# Golf Soak & UX Test Harness

Standalone Playwright-based runner that drives multi-user authenticated
game sessions for scoreboard population and stability testing.

**Spec:** `../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `../../docs/soak-harness-bringup.md`

## Quick start

```bash
cd tests/soak
npm install

# First run only: seed 16 accounts
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed

# 30-second end-to-end smoke test
bash scripts/smoke.sh

# Populate scoreboard (4 rooms × 4 accounts × 10 long games)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
  npm run soak:populate

# Stress test (4 rooms × 50 rapid games with chaos)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
  npm run soak:stress
```

## CLI flags

```
--scenario=populate|stress    required
--accounts=<n>                total sessions (default: scenario.needs.accounts)
--rooms=<n>                   default from scenario.needs
--cpus-per-room=<n>           default from scenario.needs
--games-per-room=<n>          default from scenario.defaultConfig
--holes=<n>                   default from scenario.defaultConfig
--watch=none|dashboard|tiled  default: dashboard
--dashboard-port=<n>          default: 7777
--target=<url>                default: TEST_URL env
--run-id=<string>             default: ISO timestamp
--list                        print scenarios and exit
--dry-run                     validate config, don't run
```

Constraint: `--accounts` must be evenly divisible by `--rooms`.

## Environment variables

```
TEST_URL             target base URL (e.g. https://staging.adlee.work)
SOAK_INVITE_CODE     invite code flagged marks_as_test (staging: 5VC2MCCN)
SOAK_HOLES           override --holes
SOAK_ROOMS           override --rooms
SOAK_ACCOUNTS        override --accounts
SOAK_CPUS_PER_ROOM   override --cpus-per-room
SOAK_GAMES_PER_ROOM  override --games-per-room
SOAK_WATCH           override --watch
SOAK_DASHBOARD_PORT  override --dashboard-port
```

## Watch modes

- **`none`** — pure headless, JSON logs to stdout. Use for CI and overnight runs.
- **`dashboard`** (default) — HTTP+WS server on localhost:7777 serving a live status grid. Click any player tile to watch their live session via CDP screencast.
- **`tiled`** — one native Chromium window per room host (4 with the default room count), positioned in a 2×2 grid. Joiners stay headless.

## Scenarios

| Name | Description |
|---|---|
| `populate` | Long 9-hole games with varied CPU personalities, realistic pacing, for populating scoreboards |
| `stress` | Rapid 1-hole games with chaos injection (rapid clicks, offline toggles, tab blur) for hunting race conditions |

Add new scenarios by creating `scenarios/<name>.ts` and registering in `scenarios/index.ts`.

## Architecture

See the design spec for full module breakdown. Key modules:

- `runner.ts` — CLI entry, wires everything together
- `core/session-pool.ts` — owns browser contexts, seeds/logs in 16 accounts
- `core/room-coordinator.ts` — host→joiners room-code handoff
- `core/watchdog.ts` — per-room timeout detection
- `core/screencaster.ts` — CDP Page.startScreencast for live video
- `dashboard/server.ts` — HTTP + WS server
- `scenarios/` — pluggable scenarios

Reuses `../../tests/e2e/bot/golf-bot.ts` unchanged.

## Running tests (unit)

```bash
npm test
```

Tests cover `Deferred`, `RoomCoordinator`, `Watchdog`, and `config`.
Integration-level modules are verified by the smoke test.
````

- [ ] **Step 2: Create `tests/soak/CHECKLIST.md`**

```markdown
# Soak Harness Manual Validation Checklist

Run after any significant change or before calling the implementation complete.

## Bring-up

- [ ] Local dev server is running (`python server/main.py`)
- [ ] `SOAKTEST` invite code exists locally with `marks_as_test=TRUE`
- [ ] `npm install` in `tests/soak/` succeeded
- [ ] `npm run seed -- --count=16` creates/updates 16 accounts
- [ ] `.env.stresstest` has 16 `SOAK_ACCOUNT_NN=...` lines
- [ ] All seeded users show `is_test_account=TRUE` in the DB

## Smoke

- [ ] `bash scripts/smoke.sh` exits 0 within 60s

## Scenarios

- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- [ ] `--scenario=populate --rooms=4 --games-per-room=1` runs 4 rooms in parallel with no cross-contamination
- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events

## Watch modes

- [ ] `--watch=none` produces JSONL on stdout, nothing else
- [ ] `--watch=dashboard` opens http://localhost:7777, grid renders, tiles update live, WS status shows `healthy`
- [ ] Clicking any player tile opens the video modal and streams live JPEG frames (~10 fps)
- [ ] Closing the modal stops the screencast (check logs for `screencast_stopped`)
- [ ] `--watch=tiled` opens 4 native Chromium windows for the 4 hosts

## Failure modes

- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
- [ ] Double Ctrl-C → hard exit (130)
- [ ] Killing the dev server mid-run → health probes fail 3× → fatal abort, artifacts captured, exit 1
- [ ] Artifacts directory contains a subdirectory per failed run with screenshots and state.json
- [ ] Artifacts older than 7 days are pruned on next startup

## Server-side filtering

- [ ] `GET /api/stats/leaderboard` (default) hides soak_* accounts
- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak_* accounts
- [ ] Admin panel user list shows `[Test]` badge on soak_* accounts
- [ ] Admin panel "Include test accounts" checkbox toggles their visibility (unchecked hides soak_* accounts)
- [ ] Admin panel invite codes tab shows `[Test-seed]` next to SOAKTEST

## Staging bring-up (final step)

- [ ] `UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';` run on staging
- [ ] `SOAK_INVITE_CODE=5VC2MCCN TEST_URL=https://staging.adlee.work npm run seed -- --count=16` seeds staging accounts
- [ ] Staging run with `--scenario=populate --watch=none` completes
- [ ] Staging leaderboard with `include_test=true` shows the soak accounts
- [ ] Staging leaderboard default (no param) does NOT show the soak accounts
```
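The exit codes named under "Failure modes" in the checklist can be checked mechanically rather than by eye. A minimal sketch, assuming the runner uses exactly the codes listed there plus 0 for a clean run; `describe_soak_exit` is a hypothetical helper name, not part of the harness:

```bash
# Map a soak-runner exit code to the outcome the checklist expects.
# Codes are taken from the "Failure modes" items; anything else is reported as unexpected.
describe_soak_exit() {
  case "$1" in
    0)   echo "clean run" ;;
    1)   echo "fatal abort" ;;
    2)   echo "graceful shutdown after Ctrl-C" ;;
    130) echo "hard exit after double Ctrl-C" ;;
    *)   echo "unexpected exit code $1" ;;
  esac
}
```

Call `describe_soak_exit $?` immediately after the run command so that `$?` still holds the runner's status.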

- [ ] **Step 3: Commit**

```bash
git add tests/soak/README.md tests/soak/CHECKLIST.md
git commit -m "$(cat <<'EOF'
docs(soak): full README + manual validation checklist

Quickstart, flag reference, env var reference, scenario table, and
the bring-up/validation checklist that gates calling the harness
implementation complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

### Task 32: Staging bring-up (manual, no code)

This task changes no code: the run happens from your workstation against staging. It is listed here so that the implementation plan is complete end to end.

- [ ] **Step 1: Flag `5VC2MCCN` as test-seed on staging**

From your workstation (requires SSH access to the staging host, where Postgres runs in Docker):

```bash
ssh root@129.212.150.189 \
  'docker exec -i golfgame-postgres psql -U postgres -d golfgame' <<'EOF'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
EOF
```

Expected: `UPDATE 1`, followed by a single row in which `marks_as_test` is `t`.

(The exact docker container name may differ — adjust based on `docker ps` on the staging host.)
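If the container name is in doubt, the lookup can be scripted. A minimal sketch: `pick_postgres_container` is a hypothetical helper that reads container names on stdin and prints the first one containing `postgres`. It assumes the Postgres container has `postgres` somewhere in its name.

```bash
# Print the first container name on stdin that contains "postgres" (case-insensitive).
# Returns non-zero when no name matches.
pick_postgres_container() {
  grep -i -m1 'postgres'
}

# Example (assumes the same SSH access as the command above):
#   ssh root@129.212.150.189 "docker ps --format '{{.Names}}'" | pick_postgres_container
```

Substitute the printed name for `golfgame-postgres` in the `docker exec` command if it differs.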

- [ ] **Step 2: Seed the 16 staging accounts**

```bash
cd tests/soak
rm -f .env.stresstest
TEST_URL=https://staging.adlee.work \
  SOAK_INVITE_CODE=5VC2MCCN \
  npm run seed -- --count=16
```

Expected: `.env.stresstest` populated with 16 entries.
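The entry count can be confirmed without opening the file. A minimal sketch, assuming each entry is a line of the form `SOAK_ACCOUNT_NN=...` with a two-digit `NN` (the pattern named in the checklist); `check_soak_accounts` is a hypothetical helper name:

```bash
# Verify that an env file holds the expected number of SOAK_ACCOUNT_NN=... lines.
# Assumes NN is a two-digit index, as the SOAK_ACCOUNT_NN pattern suggests.
check_soak_accounts() {
  local file="$1" expected="${2:-16}" found
  # grep -c prints the match count; it prints nothing if the file is missing.
  found=$(grep -c '^SOAK_ACCOUNT_[0-9][0-9]=' "$file" 2>/dev/null)
  found=${found:-0}
  if [ "$found" -eq "$expected" ]; then
    echo "ok: $found accounts in $file"
  else
    echo "mismatch: found $found accounts in $file, expected $expected" >&2
    return 1
  fi
}
```

From `tests/soak/`, `check_soak_accounts .env.stresstest 16` returns non-zero if seeding stopped early or left stale extra lines.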

- [ ] **Step 3: Run populate against staging**

```bash
TEST_URL=https://staging.adlee.work \
  SOAK_INVITE_CODE=5VC2MCCN \
  npm run soak -- \
    --scenario=populate \
    --rooms=4 \
    --games-per-room=3 \
    --holes=3 \
    --watch=dashboard
```

Expected: dashboard opens, 4 rooms play 3 games each, staging scoreboard accumulates data. Exit 0 at the end.

- [ ] **Step 4: Verify scoreboard filtering on staging**

```bash
# Should NOT contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soak_"))'

# Should contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soak_"))'
```

Expected: the first command prints nothing; the second prints the entries whose usernames start with `soak_`.
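The two `curl` checks above rely on reading the output by eye. A minimal sketch of a countable version, assuming the response keeps the `.entries[].username` shape used in the `jq` filters above; `count_soak_entries` is a hypothetical helper name:

```bash
# Read a leaderboard JSON body on stdin and print how many entries
# have a username starting with "soak_". Requires jq.
count_soak_entries() {
  jq '[.entries[] | select(.username | startswith("soak_"))] | length'
}

# Example:
#   curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins" | count_soak_entries
#   curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins&include_test=true" | count_soak_entries
```

The default request should print `0`, and the `include_test=true` request should print a number greater than zero once the soak accounts have finished at least one game.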

- [ ] **Step 5: Mark implementation complete**

Check off all items in `tests/soak/CHECKLIST.md` that correspond to this plan. Commit the filled-in checklist if you want a record:

```bash
git add tests/soak/CHECKLIST.md
git commit -m "docs(soak): checklist passed on initial staging run"
```

---

## Phase 11 — Version bump

### Task 33: Bump to v3.3.4 and add footer to admin.html

Updates every HTML footer from `v3.1.6` to `v3.3.4`, adds a footer to `client/admin.html` (which currently has none), and bumps the version in `pyproject.toml`.

**Files:**
- Modify: `client/index.html` — both footer occurrences (L58, L291)
- Modify: `client/admin.html` — add footer
- Modify: `pyproject.toml` — version field

- [ ] **Step 1: Update `client/index.html` footers**

```bash
grep -n "v3\.1\.6" client/index.html
```

For each match, replace `v3.1.6` with `v3.3.4`. There should be exactly two matches.

- [ ] **Step 2: Add footer to `client/admin.html`**

Find the closing `</body>` in `client/admin.html` and add a footer just before it:

```html
<footer class="app-footer" style="text-align: center; padding: 16px; color: var(--muted, #666); font-size: 12px;">v3.3.4 &copy; Aaron D. Lee</footer>
</body>
```

(The inline style is a fallback in case `client/admin.css` does not define an `.app-footer` class. Check with:)

```bash
grep -n "app-footer" client/admin.css 2>/dev/null
```

If the class exists, use just `<footer class="app-footer">v3.3.4 &copy; Aaron D. Lee</footer>`.

- [ ] **Step 3: Bump `pyproject.toml`**

```bash
# GNU sed syntax; BSD/macOS sed needs `-i ''` instead of `-i`.
sed -i 's/^version = "3\.1\.6"$/version = "3.3.4"/' pyproject.toml
grep '^version' pyproject.toml
```

Expected: `version = "3.3.4"`.
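To confirm that no reference to the old version was missed, the edited files can be scanned in one pass. A minimal sketch; `check_no_stale_version` is a hypothetical helper name, and it matches the version string literally, so an unrelated dependency pinned at the same number would also be reported:

```bash
# List any lines in the given files that still mention the old version.
# Returns 0 when no match is found, 1 when matches are printed, 2 on a grep error.
check_no_stale_version() {
  local old="$1"
  shift
  # -F matches the version literally, so the dots are not regex wildcards.
  grep -nHF -- "$old" "$@"
  case $? in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

For this task the call would be `check_no_stale_version "3.1.6" client/index.html client/admin.html pyproject.toml`, run from the repository root.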

- [ ] **Step 4: Verify in the browser**

Restart the dev server, open http://localhost:8000 and http://localhost:8000/admin.html. Confirm both show `v3.3.4` in the footer.

- [ ] **Step 5: Commit**

```bash
git add client/index.html client/admin.html pyproject.toml
git commit -m "$(cat <<'EOF'
chore: bump version to v3.3.4

Updates client/index.html footer (×2) and pyproject.toml from
v3.1.6 → v3.3.4, and adds a matching footer to client/admin.html
which previously had none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Summary

33 tasks across 11 phases:

| Phase | Tasks | Milestone |
|---|---|---|
| 1 — Server changes | 1–8 | Stats filter works, test accounts are separable |
| 2 — Harness scaffolding | 9–12 | Core pure-logic modules with Vitest tests pass |
| 3 — SessionPool + seeding | 13–14 | `.env.stresstest` seeded via real HTTP |
| 4 — First run | 15–18 | **`--watch=none` smoke test passes end-to-end** |
| 5 — Dashboard | 19–21 | Live status grid in browser |
| 6 — Live video | 22–23 | Click-to-watch CDP screencast |
| 7 — Tiled mode | 24 | Native host windows |
| 8 — Stress scenario | 25 | Chaos injection runs clean |
| 9 — Failure handling | 26–29 | Watchdog + artifacts + graceful shutdown + health probes |
| 10 — Polish | 30–31 | Smoke script + README + CHECKLIST |
| 11 — Version bump | 33 | v3.3.4 everywhere |

(Task 32 is the manual staging bring-up — no code.)

Dependencies between tasks:

- Tasks 1–8 are independent of the harness (ship them first if you want immediate value for admins)
- Tasks 9–18 are strictly sequential (each builds on the previous)
- Tasks 19–21, 22–23, 24, 25 are independent of each other — can be done in any order after Task 18
- Tasks 26–29 can be done after Task 18 but are most valuable after Task 25
- Tasks 30–31 come last before staging
- Task 33 is independent of the harness and can be done any time after Task 8
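The ordering constraints above can be written as "before after" pairs and fed to `tsort` to get one valid execution order. This is a sketch only: the group labels are hypothetical shorthand for the task ranges in the table, and only the dependencies stated in the list are encoded.

```bash
# Print one valid order for the task groups listed above.
# Each input line is "A B", meaning group A must finish before group B starts.
# Group labels are hypothetical shorthand for the task ranges in the table.
soak_task_order() {
  tsort <<'EOF'
core_9-18 dashboard_19-21
core_9-18 video_22-23
core_9-18 tiled_24
core_9-18 stress_25
core_9-18 failure_26-29
dashboard_19-21 polish_30-31
video_22-23 polish_30-31
tiled_24 polish_30-31
stress_25 polish_30-31
failure_26-29 polish_30-31
polish_30-31 staging_32
server_1-8 version_33
EOF
}
```

Several orders satisfy these constraints, so the output is one acceptable sequence rather than the only one.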