golfgame/docs/superpowers/plans/2026-04-10-multiplayer-soak-test.md
adlee-was-taken e8051b256b docs(plan): harden soak-harness schema migration for deploy
Makes the deployment path explicit in Task 1: traces the existing
lifespan → get_user_store → initialize_schema → conn.execute(SCHEMA_SQL)
flow, notes that the DO $$/IF NOT EXISTS pattern is the same one
every post-v1 column migration uses, and explains why rollback is
safe (additive changes only).

Adds two new verification steps to Task 1:
 - Step 7: post-deploy psql checks against staging
 - Step 8: same against production

Adds a "Post-deploy schema verification" block to CHECKLIST.md so
the schema state is verified after every server restart against
each target environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:40:28 -04:00

# Multiplayer Soak & UX Test Harness — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Build a standalone Playwright-based soak runner in `tests/soak/` that drives 16 authenticated browser sessions across 4 concurrent rooms playing many multiplayer games, with pluggable scenarios, a click-to-watch dashboard via CDP screencast, and strict per-room failure isolation.
**Architecture:** Single-process node runner reusing the existing `GolfBot` class from `tests/e2e/bot/`. One shared browser (16 contexts) by default; `WATCH=tiled` uses a second headed browser for the 4 host contexts. Scenarios are plain TS modules exported from `tests/soak/scenarios/`. Dashboard is a tiny HTTP+WS server serving one static page that pushes live status and on-demand CDP screencast frames.
**Tech Stack:** TypeScript + tsx (no build step), Playwright Core, ws (WebSocket server), Vitest for unit tests, FastAPI + asyncpg (existing server), PostgreSQL (existing).
**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
---
## Testing Strategy Notes
- **Server-side Python changes:** The existing test suite mocks stores with `AsyncMock` and has no real-Postgres fixtures. Rather than inventing a new fixture pattern for this plan, server tasks use **curl-based verification against a running local dev server** as the explicit verification step after each commit. Run `python server/main.py` in another terminal (requires Postgres + Redis running — see `docs/INSTALL.md`).
- **TypeScript harness logic:** Unit-tested with Vitest for pure modules (Deferred, RoomCoordinator, Watchdog, Config). Integration-level modules (SessionPool, Dashboard, Screencaster, Scenarios) are verified by running the harness itself via the smoke test.
- **End-to-end validation:** `tests/soak/scripts/smoke.sh` is the canary — after every non-trivial change, run it against local dev and expect exit 0 within ~30s.
---
## Phase 1 — Server-side changes (independent, ships first)
### Task 1: Schema migration for `is_test_account` and `marks_as_test`
Add two columns, one partial index, and rebuild the `leaderboard_overall` materialized view to include `is_test_account` (so the filter works through the view fast path).
**Deploy path (this is load-bearing — read before editing):**
The existing codebase applies schema changes via inline `DO $$ BEGIN IF NOT EXISTS (...) THEN ALTER TABLE ... END IF; END $$;` blocks inside `SCHEMA_SQL` in `server/stores/user_store.py`. That string gets executed on **every server startup** by `UserStore.create() → initialize_schema() → conn.execute(SCHEMA_SQL)`, which is called from the FastAPI lifespan via `get_user_store(config.POSTGRES_URL)` in `server/main.py`. The same pattern added every other post-v1 column (`is_banned`, `force_password_reset`, `last_seen_at`, `rating`, and many others; see the existing DO blocks in `SCHEMA_SQL`).
What this means for deploy:
- **No separate migration tool needed.** CI/CD rebuilds the image, `docker compose up -d` restarts the container, lifespan fires, `SCHEMA_SQL` executes, the new `DO $$` blocks see the missing columns and `ALTER TABLE ADD COLUMN` them in place.
- **Idempotent by construction.** Re-running against an already-migrated DB is a no-op — the `IF NOT EXISTS` guard in each DO block skips the ALTER.
- **Fresh installs work.** `CREATE TABLE IF NOT EXISTS users_v2` uses the current column list; the ADD COLUMN DO blocks are no-ops because the column is already there from the CREATE.
- **Matview rebuild is atomic.** The `DO $$` block that DROPs+CREATEs `leaderboard_overall` runs inside a single transaction. `CREATE MATERIALIZED VIEW ... AS SELECT` populates immediately (no `WITH NO DATA`), so concurrent readers never see an empty or missing view — they see either the old version (pre-commit) or the new version (post-commit).
- **Rollback is safe.** All changes are additive. If you have to revert the code, the new columns just sit unused — old code never references them, so nothing breaks.
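The idempotency property described above can be illustrated with a small, self-contained sketch. This is an analogy, not the project's migration code: it uses Python's built-in `sqlite3` in place of PostgreSQL, and `ensure_column` and `run_startup_schema` are hypothetical names. It only shows that a "check, then ALTER" guard makes a startup-time migration safe to run on every boot.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` only if it is missing. Returns True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False  # already migrated: a no-op, like the IF NOT EXISTS guard
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True


def run_startup_schema(conn: sqlite3.Connection) -> bool:
    """Stand-in for 'execute SCHEMA_SQL at startup' (assumed shape, heavily simplified)."""
    conn.execute("CREATE TABLE IF NOT EXISTS users_v2 (id INTEGER PRIMARY KEY, username TEXT)")
    return ensure_column(conn, "users_v2", "is_test_account", "BOOLEAN DEFAULT 0")


conn = sqlite3.connect(":memory:")
first = run_startup_schema(conn)   # first boot: the column gets added
second = run_startup_schema(conn)  # later boots: the guard skips the ALTER
print(first, second)
```

The second call does nothing because the guard finds the column already present, which is the behaviour the `DO $$` blocks rely on across restarts.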
**Files:**
- Modify: `server/stores/user_store.py`: append to `SCHEMA_SQL` (ALTER blocks near L79–L98 and the matview block near L298–L335)
- [ ] **Step 1: Add column migration to `SCHEMA_SQL`**
Open `server/stores/user_store.py`. Inside the first `DO $$ BEGIN ... END $$;` block (around lines 80–98, the one that handles admin columns), append the `is_test_account` column check. Then add a second ALTER for `invite_codes.marks_as_test` in a new `DO $$` block right after.
Add after the existing `last_seen_at` check (before `END $$;` on line ~98):
```sql
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'users_v2' AND column_name = 'is_test_account') THEN
ALTER TABLE users_v2 ADD COLUMN is_test_account BOOLEAN DEFAULT FALSE;
END IF;
```
Then, immediately after the `END $$;` that closes the users_v2 admin block, add a new block for invite_codes:
```sql
-- Add marks_as_test to invite_codes if not exists
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test') THEN
ALTER TABLE invite_codes ADD COLUMN marks_as_test BOOLEAN DEFAULT FALSE;
END IF;
END $$;
```
- [ ] **Step 2: Add partial index on `is_test_account`**
Find the indexes block near line 338. After the existing `idx_users_banned` index (line ~344), add:
```sql
CREATE INDEX IF NOT EXISTS idx_users_v2_is_test_account ON users_v2(is_test_account)
WHERE is_test_account = TRUE;
```
- [ ] **Step 3: Rebuild `leaderboard_overall` materialized view to include `is_test_account`**
Find the existing matview block at line ~298. Modify the version-check DO block so the view is dropped and recreated if it lacks the `is_test_account` column. Replace the existing block:
```sql
-- Leaderboard materialized view (refreshed periodically)
-- Drop and recreate if missing is_test_account column (soak harness migration)
DO $$
BEGIN
IF EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
-- Check if is_test_account column exists in the view
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_name = 'leaderboard_overall' AND column_name = 'is_test_account'
) THEN
DROP MATERIALIZED VIEW leaderboard_overall;
END IF;
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
EXECUTE '
CREATE MATERIALIZED VIEW leaderboard_overall AS
SELECT
u.id as user_id,
u.username,
COALESCE(u.is_test_account, FALSE) as is_test_account,
s.games_played,
s.games_won,
ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
s.rounds_won,
ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
s.best_score as best_round_score,
s.knockouts,
s.best_win_streak,
COALESCE(s.rating, 1500) as rating,
s.last_game_at
FROM player_stats s
JOIN users_v2 u ON s.user_id = u.id
WHERE s.games_played >= 5
AND u.deleted_at IS NULL
AND (u.is_banned = false OR u.is_banned IS NULL)
';
END IF;
END $$;
```
Note: the only differences from the existing block are the changed comment, the changed column-existence check (`is_test_account` instead of `rating`), and the new `COALESCE(u.is_test_account, FALSE) as is_test_account` column in the SELECT. Everything else stays identical.
- [ ] **Step 4: Start the server to run migrations**
Run (in another terminal, with Postgres + Redis up):
```bash
cd /home/alee/Sources/golfgame
python server/main.py
```
Expected: server starts cleanly, no errors about `is_test_account` or `marks_as_test` or `leaderboard_overall`.
- [ ] **Step 5: Verify schema via psql**
Connect to the dev database and confirm:
```bash
psql -d golfgame -c "\d users_v2" | grep is_test_account
psql -d golfgame -c "\d invite_codes" | grep marks_as_test
psql -d golfgame -c "\d leaderboard_overall" | grep is_test_account
psql -d golfgame -c "\di idx_users_v2_is_test_account"
```
Expected: all four commands return matching rows.
- [ ] **Step 6: Commit**
```bash
git add server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): add is_test_account + marks_as_test schema
New columns support separating soak-harness test traffic from real
user traffic in stats queries. Rebuilds leaderboard_overall matview
to include is_test_account so the fast path stays filterable.
Migration is idempotent via DO $$ / IF NOT EXISTS blocks inside
SCHEMA_SQL, which runs on every server startup — same mechanism
every existing post-v1 column migration uses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
- [ ] **Step 7: Post-deploy verification (staging)**
After this commit ships to staging via CI/CD (or `docker compose up -d` on the staging host), verify the migration actually applied:
```bash
ssh root@129.212.150.189 << 'REMOTE'
cd /opt/golfgame
# Find the postgres container name (it may vary across compose files)
PG_CONTAINER=$(docker compose -f docker-compose.staging.yml ps -q postgres)
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame << 'SQL'
-- Confirm columns exist
\d users_v2
\d invite_codes
\d leaderboard_overall
-- Targeted checks
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_name = 'users_v2' AND column_name = 'is_test_account';
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test';
SELECT column_name FROM information_schema.columns
WHERE table_name = 'leaderboard_overall' AND column_name = 'is_test_account';
-- Partial index
SELECT indexname, indexdef FROM pg_indexes
WHERE indexname = 'idx_users_v2_is_test_account';
SQL
REMOTE
```
Expected (all four present):
- `users_v2.is_test_account` with default `false`
- `invite_codes.marks_as_test` with default `false`
- `leaderboard_overall` has an `is_test_account` column
- `idx_users_v2_is_test_account` exists
If any of these are missing, the server didn't actually restart (or restarted but the container has a stale image). Check `docker compose logs golfgame` for the line `User store schema initialized` — if it's not there, the migration never ran.
- [ ] **Step 8: Post-deploy verification (production)**
Same check, against prod, after the prod deploy:
```bash
ssh root@165.245.152.51 << 'REMOTE'
cd /opt/golfgame
PG_CONTAINER=$(docker compose -f docker-compose.prod.yml ps -q postgres)
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame -c "\d users_v2" | grep is_test_account
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame -c "\d invite_codes" | grep marks_as_test
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame -c "\d leaderboard_overall" | grep is_test_account
REMOTE
```
Expected: three matching rows. If prod migration fails, the rollback story is clean — revert the commit, redeploy, old code keeps working because it never referenced the new columns.
---
### Task 2: Propagate `is_test_account` through `User` model and `user_store`
Wire the new column into the `User` dataclass, `create_user` signature, `_row_to_user` mapping, and every SELECT list that already pulls user columns.
**Files:**
- Modify: `server/models/user.py`: `User` dataclass (L22–L68) + `to_dict` (L82–L116) + `from_dict` (L118+)
- Modify: `server/stores/user_store.py`: `create_user` (L454–L501), `_row_to_user` (L997–L1020), `get_user_by_id`/`get_user_by_username`/`get_user_by_email` SELECT lists (L503–L570)
- [ ] **Step 1: Add `is_test_account` to the `User` dataclass**
In `server/models/user.py`, add a new field to the `User` dataclass (after `force_password_reset` on L68):
```python
is_test_account: bool = False
```
Update the docstring `Attributes:` block around L45 to include:
```
is_test_account: True for accounts created by the soak test harness.
```
- [ ] **Step 2: Include `is_test_account` in `to_dict` and `from_dict`**
In `User.to_dict` at L82, add to the `d` dict (after `force_password_reset`):
```python
"is_test_account": self.is_test_account,
```
In `User.from_dict`, add the corresponding parse — find where `force_password_reset` is parsed and add the same pattern:
```python
is_test_account=d.get("is_test_account", False),
```
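As a rough illustration of why the `d.get("is_test_account", False)` default matters, here is a minimal sketch. `UserSketch` is a hypothetical two-field stand-in, not the real `User` dataclass; the assumption is that serialized payloads written before this change lack the key entirely.

```python
from dataclasses import dataclass


@dataclass
class UserSketch:
    """Hypothetical stand-in for the real User dataclass; only two fields shown."""
    username: str
    is_test_account: bool = False

    def to_dict(self) -> dict:
        return {"username": self.username, "is_test_account": self.is_test_account}

    @classmethod
    def from_dict(cls, d: dict) -> "UserSketch":
        # .get() with a default keeps payloads that predate the field parseable.
        return cls(username=d["username"], is_test_account=d.get("is_test_account", False))


# A payload serialized before the field existed still loads, defaulting to False.
legacy = UserSketch.from_dict({"username": "alice"})
# A flagged user survives a to_dict/from_dict round trip.
flagged = UserSketch.from_dict(UserSketch("soaktest_1", True).to_dict())
print(legacy.is_test_account, flagged.is_test_account)
```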
- [ ] **Step 3: Add `is_test_account` parameter to `create_user`**
In `server/stores/user_store.py` at L454, add a new parameter:
```python
async def create_user(
self,
username: str,
password_hash: str,
email: Optional[str] = None,
role: UserRole = UserRole.USER,
guest_id: Optional[str] = None,
verification_token: Optional[str] = None,
verification_expires: Optional[datetime] = None,
is_test_account: bool = False,
) -> Optional[User]:
```
Update the docstring to add a line in `Args:` describing `is_test_account`.
Change the INSERT SQL block to include the new column:
```python
row = await conn.fetchrow(
"""
INSERT INTO users_v2 (username, password_hash, email, role, guest_id,
verification_token, verification_expires,
is_test_account)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
RETURNING id, username, email, password_hash, role, email_verified,
verification_token, verification_expires, reset_token, reset_expires,
guest_id, deleted_at, preferences, created_at, last_login, last_seen_at,
is_active, is_banned, ban_reason, force_password_reset, is_test_account
""",
username,
password_hash,
email,
role.value,
guest_id,
verification_token,
verification_expires,
is_test_account,
)
```
- [ ] **Step 4: Update `_row_to_user` mapping**
In `server/stores/user_store.py` at L997, add to the `User(...)` call (after `force_password_reset`):
```python
is_test_account=row.get("is_test_account", False) or False,
```
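The trailing `or False` in that mapping covers a case the `.get()` default alone does not: a key that is present but holds NULL. A minimal sketch, assuming a plain dict as a stand-in for the database row (`read_is_test_account` is a hypothetical helper name):

```python
def read_is_test_account(row) -> bool:
    # `row` only needs a dict-like .get(); a plain dict stands in for a DB row here.
    return row.get("is_test_account", False) or False


# Key absent (query did not select the column): the .get() default applies.
missing = read_is_test_account({"username": "alice"})
# Key present but NULL: .get() returns None, and `or False` normalizes it.
null_value = read_is_test_account({"username": "bob", "is_test_account": None})
# Key present and true: passed through unchanged.
flagged = read_is_test_account({"username": "soaktest_1", "is_test_account": True})
print(missing, null_value, flagged)
```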
- [ ] **Step 5: Update all other SELECT lists in user_store**
Find every query in `server/stores/user_store.py` that returns a full user row and passes it to `_row_to_user`. Add `is_test_account` to the SELECT column list for each. Grep to find them:
```bash
grep -n "is_active, is_banned, ban_reason, force_password_reset" server/stores/user_store.py
```
For each match, append `, is_test_account` to the SELECT list. Expected locations:
- `create_user` INSERT ... RETURNING (already updated in Step 3)
- `get_user_by_id` at L503
- `get_user_by_username` at L519
- `get_user_by_email` (find it)
- Any other `SELECT` ... FROM users_v2 that calls `_row_to_user`
- [ ] **Step 6: Restart server, verify no errors**
```bash
# Kill and restart the dev server
python server/main.py
```
Expected: server starts cleanly. Any query that touches users now returns `is_test_account` correctly.
- [ ] **Step 7: Smoke test via curl**
```bash
# Register a throwaway test user (no invite code needed if DAILY_OPEN_SIGNUPS > 0 locally,
# or use the 5VC2MCCN invite code if INVITE_ONLY=true)
# Set PW to any password of your choice (>= 8 chars).
PW='SomeTestPw_1!'
curl -sX POST http://localhost:8000/api/auth/register \
-H 'Content-Type: application/json' \
-d "{\"username\":\"soaktest_smoke1\",\"password\":\"$PW\",\"email\":\"soaktest_smoke1@example.com\",\"invite_code\":\"5VC2MCCN\"}"
```
Expected: HTTP 200 with `{"user":{...},"token":"..."}`. The registration path now runs through the new column without errors even though the value is still always FALSE at this stage.
- [ ] **Step 8: Commit**
```bash
git add server/models/user.py server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): propagate is_test_account through User model & store
User dataclass, create_user, and all SELECT lists now round-trip the
new column. Value is always FALSE until Task 4 wires the register
flow to the invite code's marks_as_test flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 3: Expose `marks_as_test` on `InviteCode` and add lookup helper
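`validate_invite_code` currently returns a bare bool, so the register flow cannot see `marks_as_test` through it. We need a new helper that returns the invite row's fields, including `marks_as_test`; the register flow calls it after validation, at the cost of one extra lookup.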
**Files:**
- Modify: `server/services/admin_service.py`: `InviteCode` dataclass (L115–L138), `get_invite_codes` SELECT (L1106–L1141), add new `get_invite_code_details` method
- [ ] **Step 1: Add `marks_as_test` field to `InviteCode` dataclass**
In `server/services/admin_service.py` at L115:
```python
@dataclass
class InviteCode:
"""Invite code details."""
code: str
created_by: str
created_by_username: str
created_at: datetime
expires_at: datetime
max_uses: int
use_count: int
is_active: bool
marks_as_test: bool = False
```
Update `to_dict` at L127 to include the field:
```python
def to_dict(self) -> dict:
return {
"code": self.code,
"created_by": self.created_by,
"created_by_username": self.created_by_username,
"created_at": self.created_at.isoformat() if self.created_at else None,
"expires_at": self.expires_at.isoformat() if self.expires_at else None,
"max_uses": self.max_uses,
"use_count": self.use_count,
"is_active": self.is_active,
"remaining_uses": max(0, self.max_uses - self.use_count),
"marks_as_test": self.marks_as_test,
}
```
- [ ] **Step 2: Update `get_invite_codes` SELECT to include `marks_as_test`**
Find `get_invite_codes` at L1106. Modify the SQL to pull the column and pass it through:
```python
async def get_invite_codes(self, include_expired: bool = False) -> List[InviteCode]:
"""List all invite codes."""
async with self.pool.acquire() as conn:
sql = """
SELECT c.code, c.created_by, u.username as created_by_username,
c.created_at, c.expires_at,
c.max_uses, c.use_count, c.is_active,
COALESCE(c.marks_as_test, FALSE) as marks_as_test
FROM invite_codes c
LEFT JOIN users_v2 u ON c.created_by = u.id
"""
```
Find the list comprehension that constructs `InviteCode(...)` objects and add the new kwarg:
```python
InviteCode(
code=row["code"],
created_by=str(row["created_by"]),
created_by_username=row["created_by_username"] or "unknown",
created_at=row["created_at"].replace(tzinfo=timezone.utc) if row["created_at"] else None,
expires_at=row["expires_at"].replace(tzinfo=timezone.utc) if row["expires_at"] else None,
max_uses=row["max_uses"],
use_count=row["use_count"],
is_active=row["is_active"],
marks_as_test=row["marks_as_test"],
)
```
- [ ] **Step 3: Add new `get_invite_code_details` method**
Add a new method right after `validate_invite_code` (around L1214) that returns the row with `marks_as_test`. The register flow will call this to resolve the flag. Place it between `validate_invite_code` and `use_invite_code`:
```python
async def get_invite_code_details(self, code: str) -> Optional[dict]:
"""
Look up an invite code's row including marks_as_test.
Returns None if the code does not exist. Does NOT validate expiry
or usage — use validate_invite_code for that. This is purely a
helper for the register flow to discover the test-seed flag.
"""
async with self.pool.acquire() as conn:
row = await conn.fetchrow(
"""
SELECT code, max_uses, use_count, is_active,
COALESCE(marks_as_test, FALSE) as marks_as_test
FROM invite_codes
WHERE code = $1
""",
code,
)
if not row:
return None
return {
"code": row["code"],
"max_uses": row["max_uses"],
"use_count": row["use_count"],
"is_active": row["is_active"],
"marks_as_test": row["marks_as_test"],
}
```
- [ ] **Step 4: Verify with curl via admin panel endpoint**
Assuming you have an admin token from a local dev user, hit the existing admin invites listing:
```bash
# Replace TOKEN with a valid admin JWT
curl -s http://localhost:8000/api/admin/invites \
-H "Authorization: Bearer $TOKEN" | jq '.codes[0]'
```
Expected: the first listed code (the one `jq` prints) includes `"marks_as_test": false`.
- [ ] **Step 5: Commit**
```bash
git add server/services/admin_service.py
git commit -m "$(cat <<'EOF'
feat(server): expose marks_as_test on InviteCode
Adds the field to the dataclass, SELECT list in get_invite_codes,
and a new get_invite_code_details helper that the register flow
will use to discover whether an invite should flag new accounts
as test accounts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 4: Wire register flow to set `is_test_account` from invite
When a user registers with an invite whose `marks_as_test=TRUE`, the new account is flagged. The plumbing lives in two places: the router reads the flag and passes it to the service; the service passes it to the store.
**Files:**
- Modify: `server/routers/auth.py`: `register` handler (L224–L320)
- Modify: `server/services/auth_service.py`: `register` method (L98–L178)
- [ ] **Step 1: Add `is_test_account` parameter to `auth_service.register`**
In `server/services/auth_service.py` at L98, add the new parameter:
```python
async def register(
self,
username: str,
password: str,
email: Optional[str] = None,
guest_id: Optional[str] = None,
is_test_account: bool = False,
) -> RegistrationResult:
```
Update the docstring `Args:` block:
```
is_test_account: Mark this user as a soak-harness test account.
```
Pass the value through to `create_user` at L146:
```python
user = await self.user_store.create_user(
username=username,
password_hash=password_hash,
email=email,
role=UserRole.USER,
guest_id=guest_id,
verification_token=verification_token,
verification_expires=verification_expires,
is_test_account=is_test_account,
)
```
- [ ] **Step 2: Update the router to resolve `marks_as_test` and pass it through**
In `server/routers/auth.py`, find the `register` handler at L224. After the existing invite-code validation block (around L248–L252), fetch the invite details and compute `is_test_account`:
```python
# --- Invite code validation ---
is_test_account = False
if has_invite:
if not _admin_service:
raise HTTPException(status_code=503, detail="Admin service not initialized")
if not await _admin_service.validate_invite_code(request_body.invite_code):
raise HTTPException(status_code=400, detail="Invalid or expired invite code")
# Check if this invite flags new accounts as test accounts
invite_details = await _admin_service.get_invite_code_details(request_body.invite_code)
if invite_details and invite_details.get("marks_as_test"):
is_test_account = True
```
Then pass it to `auth_service.register` at L276:
```python
# --- Create the account ---
result = await auth_service.register(
username=request_body.username,
password=request_body.password,
email=request_body.email,
is_test_account=is_test_account,
)
```
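The invite branch in this step can be exercised in isolation with a sketch like the following. Everything here is an assumption made for illustration: `FakeAdminService` is a hypothetical in-memory stand-in for the admin service, `resolve_is_test_account` is a made-up helper name, and `ValueError` replaces the router's `HTTPException` so the snippet has no FastAPI dependency.

```python
import asyncio
from typing import Optional


class FakeAdminService:
    """Hypothetical in-memory stand-in for the admin service; not the real class."""

    def __init__(self, codes: dict):
        self._codes = codes  # code -> {"valid": bool, "marks_as_test": bool}

    async def validate_invite_code(self, code: str) -> bool:
        entry = self._codes.get(code)
        return bool(entry and entry["valid"])

    async def get_invite_code_details(self, code: str) -> Optional[dict]:
        entry = self._codes.get(code)
        if entry is None:
            return None
        return {"code": code, "marks_as_test": entry["marks_as_test"]}


async def resolve_is_test_account(admin, invite_code: Optional[str]) -> bool:
    """Mirror of the router logic: validate first, then read the flag."""
    if not invite_code:
        return False  # invite-less signup is never flagged
    if not await admin.validate_invite_code(invite_code):
        raise ValueError("Invalid or expired invite code")
    details = await admin.get_invite_code_details(invite_code)
    return bool(details and details.get("marks_as_test"))


admin = FakeAdminService({
    "SOAKTEST": {"valid": True, "marks_as_test": True},
    "NORMAL01": {"valid": True, "marks_as_test": False},
})


async def _demo() -> list:
    return [await resolve_is_test_account(admin, c) for c in ("SOAKTEST", "NORMAL01", None)]


results = asyncio.run(_demo())
print(results)
```

Only a valid invite with `marks_as_test` set yields `True`; a normal invite and an invite-less signup both stay unflagged, which matches the behaviour Steps 4 and 5 verify against the real server.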
- [ ] **Step 3: Flag the dev invite code for testing**
Before we can test end-to-end locally, we need an invite code with `marks_as_test=TRUE` in the local dev DB. Run (once, manually):
```bash
# First, check if 5VC2MCCN exists locally (it probably doesn't — that's staging's code).
# Create a local test invite code and flag it:
psql -d golfgame <<'EOF'
-- Create a local dev test-seed invite if not exists
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
-- Verify
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = 'SOAKTEST';
EOF
```
Expected: `marks_as_test | t` in the last row.
- [ ] **Step 4: Verify register flow sets `is_test_account`**
Restart the dev server, then run the following (it reuses `$PW` from Task 2, Step 7, so set it again if you are in a new shell):
```bash
curl -sX POST http://localhost:8000/api/auth/register \
-H 'Content-Type: application/json' \
-d "{\"username\":\"soaktest_register1\",\"password\":\"$PW\",\"email\":\"soaktest_register1@example.com\",\"invite_code\":\"SOAKTEST\"}"
# Verify in DB
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'soaktest_register1';"
```
Expected: `is_test_account | t`.
- [ ] **Step 5: Verify non-test invite does NOT flag new accounts**
```bash
# Create a non-test invite
psql -d golfgame <<'EOF'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'NORMAL01', id, NOW() + INTERVAL '10 years', 10, TRUE, FALSE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = FALSE;
EOF
curl -sX POST http://localhost:8000/api/auth/register \
-H 'Content-Type: application/json' \
-d "{\"username\":\"realuser_smoke1\",\"password\":\"$PW\",\"email\":\"realuser_smoke1@example.com\",\"invite_code\":\"NORMAL01\"}"
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'realuser_smoke1';"
```
Expected: `is_test_account | f`.
- [ ] **Step 6: Commit**
```bash
git add server/routers/auth.py server/services/auth_service.py
git commit -m "$(cat <<'EOF'
feat(server): register flow flags accounts from test-seed invites
When a user registers with an invite_code whose marks_as_test=TRUE,
their users_v2.is_test_account is set to TRUE. Normal invite codes
and invite-less signups are unaffected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 5: Stats filtering (`include_test` parameter)
Thread an `include_test: bool = False` parameter through `get_leaderboard`, `get_player_rank`, and the corresponding router handlers. Default is `False` — real users never see soak traffic.
**Files:**
- Modify: `server/services/stats_service.py`: `get_leaderboard` (L169), `get_player_rank` (L249)
- Modify: `server/routers/stats.py`: `get_leaderboard` route (L157), `get_player_rank` route (L227), `get_my_rank` route (L348)
- [ ] **Step 1: Add `include_test` to `get_leaderboard` service method**
In `server/services/stats_service.py` at L169:
```python
async def get_leaderboard(
self,
metric: str = "wins",
limit: int = 50,
offset: int = 0,
include_test: bool = False,
) -> List[LeaderboardEntry]:
```
Inside the method, find both SQL paths (materialized view and fallback). In the view path at L208, change the WHERE clause:
```python
if view_exists:
# Use materialized view for performance
rows = await conn.fetch(f"""
SELECT
user_id, username, games_played, games_won,
win_rate, avg_score, knockouts, best_win_streak,
COALESCE(rating, 1500) as rating,
ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM leaderboard_overall
WHERE ($3 OR NOT is_test_account)
ORDER BY {column} {direction}
LIMIT $1 OFFSET $2
""", limit, offset, include_test)
```
In the fallback path at L220, add the WHERE clause and parameter:
```python
else:
# Fall back to direct query
rows = await conn.fetch(f"""
SELECT
s.user_id, u.username, s.games_played, s.games_won,
ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
s.knockouts, s.best_win_streak,
COALESCE(s.rating, 1500) as rating,
ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM player_stats s
JOIN users_v2 u ON s.user_id = u.id
WHERE s.games_played >= 5
AND u.deleted_at IS NULL
AND (u.is_banned = false OR u.is_banned IS NULL)
AND ($3 OR NOT COALESCE(u.is_test_account, FALSE))
ORDER BY {column} {direction}
LIMIT $1 OFFSET $2
""", limit, offset, include_test)
```
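The SQL predicate `($3 OR NOT is_test_account)` is compact enough that its truth table is worth spelling out. The sketch below mirrors it in plain Python; `is_visible` is a hypothetical name and the snippet does not touch the database.

```python
from itertools import product


def is_visible(include_test: bool, is_test_account: bool) -> bool:
    """Python mirror of the SQL predicate ($3 OR NOT is_test_account)."""
    return include_test or not is_test_account


# Enumerate all four combinations of (include_test, is_test_account).
table = {
    (inc, acct): is_visible(inc, acct)
    for inc, acct in product([False, True], repeat=2)
}
for (inc, acct), visible in table.items():
    print(f"include_test={inc!s:5} is_test_account={acct!s:5} -> visible={visible}")
```

The only hidden combination is a test account requested without `include_test`, so real users are never filtered out by this clause.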
- [ ] **Step 2: Apply the same pattern to `get_player_rank`**
In `server/services/stats_service.py` at L249:
```python
async def get_player_rank(
self,
user_id: str,
metric: str = "wins",
include_test: bool = False,
) -> Optional[int]:
```
Update both SQL paths to include the `include_test` filter. View path at L287:
```python
if view_exists:
row = await conn.fetchrow(f"""
SELECT rank FROM (
SELECT user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM leaderboard_overall
WHERE ($2 OR NOT is_test_account)
) ranked
WHERE user_id = $1
""", user_id, include_test)
```
Fallback path at L294:
```python
else:
row = await conn.fetchrow(f"""
SELECT rank FROM (
SELECT s.user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM player_stats s
JOIN users_v2 u ON s.user_id = u.id
WHERE s.games_played >= 5
AND u.deleted_at IS NULL
AND (u.is_banned = false OR u.is_banned IS NULL)
AND ($2 OR NOT COALESCE(u.is_test_account, FALSE))
) ranked
WHERE user_id = $1
""", user_id, include_test)
```
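One detail of these rank queries is that the test-account filter sits inside the subquery, so it runs before `ROW_NUMBER()` assigns ranks. A minimal in-memory sketch of that ordering, using made-up sample rows and a hypothetical `rank_of` helper (ranking by `games_won` only, for brevity):

```python
from typing import Optional

# Made-up sample data; not taken from any real database.
rows = [
    {"user_id": "bot", "games_won": 8, "is_test_account": True},
    {"user_id": "alice", "games_won": 5, "is_test_account": False},
    {"user_id": "bob", "games_won": 3, "is_test_account": False},
]


def rank_of(user_id: str, rows: list, include_test: bool = False) -> Optional[int]:
    # Filter first, like the WHERE clause inside the subquery...
    visible = [r for r in rows if include_test or not r["is_test_account"]]
    # ...then rank what is left, like ROW_NUMBER() OVER (ORDER BY ... DESC).
    ordered = sorted(visible, key=lambda r: r["games_won"], reverse=True)
    for rank, r in enumerate(ordered, start=1):
        if r["user_id"] == user_id:
            return rank
    return None  # user filtered out or absent


print(rank_of("alice", rows))                     # bot is excluded before ranking
print(rank_of("alice", rows, include_test=True))  # bot now occupies rank 1
print(rank_of("bot", rows))                       # hidden by default
```

Because filtering precedes ranking, a hidden test account never leaves a gap or pushes a real player down the default leaderboard.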
- [ ] **Step 3: Expose `include_test` as a query parameter on the leaderboard route**
In `server/routers/stats.py` at L157:
```python
@router.get("/leaderboard", response_model=LeaderboardResponse)
async def get_leaderboard(
metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
limit: int = Query(50, ge=1, le=100),
offset: int = Query(0, ge=0),
include_test: bool = Query(False, description="Include soak-harness test accounts"),
service: StatsService = Depends(get_stats_service_dep),
):
"""
Get leaderboard by metric.
Metrics:
- wins: Total games won
- win_rate: Win percentage (requires 5+ games)
- avg_score: Average points per round (lower is better)
- knockouts: Times going out first
- streak: Best win streak
Players must have 5+ games to appear on leaderboards.
By default, soak-harness test accounts are hidden.
"""
entries = await service.get_leaderboard(metric, limit, offset, include_test)
```
- [ ] **Step 4: Same for `get_player_rank` and `get_my_rank` routes**
At L227:
```python
@router.get("/players/{user_id}/rank", response_model=PlayerRankResponse)
async def get_player_rank(
user_id: str,
metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
include_test: bool = Query(False),
service: StatsService = Depends(get_stats_service_dep),
):
"""Get player's rank on a leaderboard."""
rank = await service.get_player_rank(user_id, metric, include_test)
```
At L348:
```python
@router.get("/me/rank", response_model=PlayerRankResponse)
async def get_my_rank(
metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
include_test: bool = Query(False),
user: User = Depends(require_user),
service: StatsService = Depends(get_stats_service_dep),
):
"""Get current user's rank on a leaderboard."""
rank = await service.get_player_rank(user.id, metric, include_test)
```
- [ ] **Step 5: Verify filtering works via curl**
```bash
# Mark a test user we registered earlier as having games played (synthetic)
psql -d golfgame <<'EOF'
INSERT INTO player_stats (user_id, games_played, games_won, total_points, total_rounds, rounds_won)
SELECT id, 10, 8, 50, 30, 20 FROM users_v2 WHERE username = 'soaktest_register1'
ON CONFLICT (user_id) DO UPDATE SET games_played = 10, games_won = 8;
-- Refresh the matview so the test account shows up
REFRESH MATERIALIZED VIEW leaderboard_overall;
EOF
# Default (include_test=false) should NOT include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soaktest_"))'
# include_test=true should include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soaktest_"))'
```
Expected: first command returns nothing, second returns a JSON object for `soaktest_register1`.
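If `jq` is not available, the same check can be done in Python on the decoded response body. This is a sketch under assumptions: the response shape (`entries[].username`) is inferred from the `jq` filter above, the payloads are made-up samples rather than real server output, and `soak_usernames` is a hypothetical helper.

```python
from typing import List


def soak_usernames(payload: dict, prefix: str = "soaktest_") -> List[str]:
    """Return usernames in a leaderboard payload that start with `prefix`."""
    return [
        entry["username"]
        for entry in payload.get("entries", [])
        if entry.get("username", "").startswith(prefix)
    ]


# Made-up sample payloads standing in for the two curl responses.
default_response = {"entries": [{"username": "alice"}, {"username": "bob"}]}
include_test_response = {
    "entries": [{"username": "soaktest_register1"}, {"username": "alice"}]
}

hidden = soak_usernames(default_response)
shown = soak_usernames(include_test_response)
print(hidden, shown)
```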
- [ ] **Step 6: Commit**
```bash
git add server/services/stats_service.py server/routers/stats.py
git commit -m "$(cat <<'EOF'
feat(server): stats queries support include_test filter
Leaderboard and rank queries take an optional include_test param
(default false). Real users never see soak-harness traffic unless
they explicitly opt in via ?include_test=true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 6: Admin service + route surfaces `is_test_account`
`UserDetails` exposes the flag, `search_users` selects it, and the `list_users` admin route accepts an `include_test` query parameter.
**Files:**
- Modify: `server/services/admin_service.py` (`UserDetails` L24-L58, `search_users` L312-L382, `get_user` L384-L428)
- Modify: `server/routers/admin.py` (`list_users` route, L80-L107)
- [ ] **Step 1: Add field to `UserDetails` dataclass**
In `server/services/admin_service.py` at L24, add to the dataclass:
```python
@dataclass
class UserDetails:
"""Extended user info for admin view."""
id: str
username: str
email: Optional[str]
role: str
email_verified: bool
is_banned: bool
ban_reason: Optional[str]
force_password_reset: bool
created_at: datetime
last_login: Optional[datetime]
last_seen_at: Optional[datetime]
is_active: bool
games_played: int
games_won: int
is_test_account: bool = False
```
Update `to_dict` to include it:
```python
def to_dict(self) -> dict:
return {
"id": self.id,
"username": self.username,
"email": self.email,
"role": self.role,
"email_verified": self.email_verified,
"is_banned": self.is_banned,
"ban_reason": self.ban_reason,
"force_password_reset": self.force_password_reset,
"created_at": self.created_at.isoformat() if self.created_at else None,
"last_login": self.last_login.isoformat() if self.last_login else None,
"last_seen_at": self.last_seen_at.isoformat() if self.last_seen_at else None,
"is_active": self.is_active,
"games_played": self.games_played,
"games_won": self.games_won,
"is_test_account": self.is_test_account,
}
```
- [ ] **Step 2: Update `search_users` to SELECT and filter on `is_test_account`**
In `server/services/admin_service.py` at L312, add `include_test` parameter and column to the SELECT:
```python
async def search_users(
self,
query: str = "",
limit: int = 50,
offset: int = 0,
include_banned: bool = True,
include_deleted: bool = False,
include_test: bool = True,
) -> List[UserDetails]:
```
Modify the SQL to pull `is_test_account`:
```python
sql = """
SELECT u.id, u.username, u.email, u.role,
u.email_verified, u.is_banned, u.ban_reason,
u.force_password_reset, u.created_at, u.last_login,
u.last_seen_at, u.is_active,
COALESCE(u.is_test_account, FALSE) as is_test_account,
COALESCE(s.games_played, 0) as games_played,
COALESCE(s.games_won, 0) as games_won
FROM users_v2 u
LEFT JOIN player_stats s ON u.id = s.user_id
WHERE 1=1
"""
```
After the existing `include_deleted` check, add:
```python
if not include_test:
sql += " AND (u.is_test_account = false OR u.is_test_account IS NULL)"
```
Update the `UserDetails(...)` construction in the list comprehension to include `is_test_account=row["is_test_account"]`.
- [ ] **Step 3: Update `get_user` (single-user lookup) similarly**
In `server/services/admin_service.py` at L384, add `COALESCE(u.is_test_account, FALSE) as is_test_account` to the SELECT and `is_test_account=row["is_test_account"]` to the `UserDetails(...)` construction. The `get_user` method does NOT need the filter parameter — admins looking up individual users should always see them.
- [ ] **Step 4: Add `include_test` to the admin `list_users` route**
In `server/routers/admin.py` at L80:
```python
@router.get("/users")
async def list_users(
query: str = "",
limit: int = 50,
offset: int = 0,
include_banned: bool = True,
include_deleted: bool = False,
include_test: bool = True,
admin: User = Depends(require_admin_v2),
service: AdminService = Depends(get_admin_service_dep),
):
"""
Search and list users.
Args:
query: Search by username or email.
limit: Maximum results to return.
offset: Results to skip.
include_banned: Include banned users.
include_deleted: Include soft-deleted users.
include_test: Include soak-harness test accounts (default true for admins).
"""
users = await service.search_users(
query=query,
limit=limit,
offset=offset,
include_banned=include_banned,
include_deleted=include_deleted,
include_test=include_test,
)
return {"users": [u.to_dict() for u in users]}
```
Note: default is `True` for the admin path — admins should see everything by default. The client-side toggle will explicitly pass `false` when the admin wants to hide test accounts.
- [ ] **Step 5: Verify via curl**
```bash
# Assuming admin token in $TOKEN env var
curl -s "http://localhost:8000/api/admin/users?query=soaktest" \
-H "Authorization: Bearer $TOKEN" | jq '.users[] | {username, is_test_account}'
curl -s "http://localhost:8000/api/admin/users?query=soaktest&include_test=false" \
-H "Authorization: Bearer $TOKEN" | jq '.users[]'
```
Expected: first returns users with `is_test_account: true`; second returns empty (test accounts filtered out).
- [ ] **Step 6: Commit**
```bash
git add server/services/admin_service.py server/routers/admin.py
git commit -m "$(cat <<'EOF'
feat(server): admin users list surfaces is_test_account
UserDetails carries the new column, search_users selects and
optionally filters on it, and the /api/admin/users route accepts
?include_test=false to hide soak-harness accounts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 7: Admin panel UI — Test badge and filter toggle
Add a visible `[Test]` badge on test accounts in the admin user list, a `[Test-seed]` indicator on invite codes that mark new accounts as test, and an "Include test accounts" checkbox next to the existing "Include banned" toggle.
**Files:**
- Modify: `client/admin.html` — add the new toggle near the existing `#include-banned` checkbox
- Modify: `client/admin.js` (`loadUsers` at L305, `getStatusBadge` at L246, the invite codes renderer at L443)
- [ ] **Step 1: Add the "Include test accounts" checkbox to admin.html**
In `client/admin.html`, find the existing `#include-banned` checkbox in the users tab filter bar:
```bash
grep -n "include-banned" client/admin.html
```
Add next to that line:
```html
<label>
<input type="checkbox" id="include-test" checked />
Include test accounts
</label>
```
- [ ] **Step 2: Read the new checkbox in `loadUsers` and pass to getUsers**
In `client/admin.js` at L305:
```javascript
async function loadUsers() {
try {
const query = document.getElementById('user-search').value;
const includeBanned = document.getElementById('include-banned').checked;
const includeTest = document.getElementById('include-test').checked;
const data = await getUsers(query, usersPage * PAGE_SIZE, includeBanned, includeTest);
```
Find `getUsers` at L70 and add the new parameter:
```javascript
async function getUsers(query = '', offset = 0, includeBanned = true, includeTest = true) {
const params = new URLSearchParams({
query,
limit: PAGE_SIZE,
offset,
include_banned: includeBanned,
include_test: includeTest,
});
return apiRequest(`/api/admin/users?${params}`);
}
```
Note: the existing signature builds a URLSearchParams — check the actual code at L70 and match its style; the key change is adding `include_test: includeTest` to the params.
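The plain-JavaScript `getUsers` above leans on `URLSearchParams` coercing numbers and booleans to strings. If the helper were typed (as the soak harness code is), TypeScript's DOM typings would reject non-string values, so each one needs an explicit `String(...)`. A minimal sketch, assuming a hypothetical `buildUserQuery` helper and a `PAGE_SIZE` of 50 (the real value lives in `admin.js` and is not shown in this plan):

```typescript
// Hypothetical helper for illustration; PAGE_SIZE = 50 is an assumption.
const PAGE_SIZE = 50;

interface UserQueryOptions {
  query?: string;
  offset?: number;
  includeBanned?: boolean;
  includeTest?: boolean;
}

function buildUserQuery(opts: UserQueryOptions = {}): string {
  const { query = '', offset = 0, includeBanned = true, includeTest = true } = opts;
  // URLSearchParams stores strings only, so convert every value explicitly.
  const params = new URLSearchParams({
    query,
    limit: String(PAGE_SIZE),
    offset: String(offset),
    include_banned: String(includeBanned),
    include_test: String(includeTest),
  });
  return `/api/admin/users?${params.toString()}`;
}
```

`String(false)` serializes as `include_test=false`, the same form the Task 6 curl check sends to the route.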
- [ ] **Step 3: Add a "Test" badge to the user table row**
In `client/admin.js` at L314, modify the table row template to render a Test badge inline with the status badge:
```javascript
data.users.forEach(user => {
const testBadge = user.is_test_account
? '<span class="badge badge-info" title="Soak harness test account">Test</span>'
: '';
tbody.innerHTML += `
<tr>
<td>${escapeHtml(user.username)} ${testBadge}</td>
<td>${escapeHtml(user.email || '-')}</td>
<td><span class="badge badge-${user.role === 'admin' ? 'info' : 'muted'}">${user.role}</span></td>
<td>${getStatusBadge(user)}</td>
<td>${user.games_played} (${user.games_won} wins)</td>
<td>${formatDateShort(user.created_at)}</td>
<td>
<button class="btn btn-small" data-action="view-user" data-id="${user.id}">View</button>
</td>
</tr>
`;
});
```
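The badge logic in the row template can also be read as a small pure function, which makes the two cases (flag set, flag missing or false) easy to check in isolation. This is only a sketch: `renderUsernameCell` and `escapeHtmlSketch` are hypothetical names, and the real `escapeHtml` in `admin.js` may behave differently.

```typescript
interface AdminUserRow {
  username: string;
  /** May be absent if the server predates the is_test_account column. */
  is_test_account?: boolean;
}

// Hypothetical stand-in for admin.js's escapeHtml.
function escapeHtmlSketch(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function renderUsernameCell(user: AdminUserRow): string {
  // The badge markup is a fixed string; only the username needs escaping.
  const testBadge = user.is_test_account
    ? ' <span class="badge badge-info" title="Soak harness test account">Test</span>'
    : '';
  return `<td>${escapeHtmlSketch(user.username)}${testBadge}</td>`;
}
```

Because the check is a plain truthiness test, a response that omits `is_test_account` renders the same as one where it is `false`.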
- [ ] **Step 4: Add Test-seed indicator to invite codes list**
In `client/admin.js` around L443 (invite codes list renderer), find the row template and add a `[Test-seed]` badge when `invite.marks_as_test`:
```bash
grep -n "invite.is_active\|invite.code\|invites-tbody\|invites-table" client/admin.js | head
```
Once located, modify the row template to include:
```javascript
const testSeedBadge = invite.marks_as_test
? '<span class="badge badge-info" title="Creates test accounts">Test-seed</span>'
: '';
// Insert testSeedBadge into the invite code column, e.g.
// <td>${escapeHtml(invite.code)} ${testSeedBadge}</td>
```
- [ ] **Step 5: Wire the checkbox change event to reload users**
Find where `#include-banned` has its `change` listener attached (grep for it in admin.js):
```bash
grep -n "include-banned.*addEventListener\|include-banned" client/admin.js
```
Add a parallel listener for `#include-test` that calls `loadUsers()`:
```javascript
document.getElementById('include-test').addEventListener('change', () => {
usersPage = 0;
loadUsers();
});
```
- [ ] **Step 6: Manual verification in browser**
1. Open http://localhost:8000/admin.html
2. Log in as admin
3. Navigate to Users tab
4. Search for "soaktest"
5. Confirm the `[Test]` badge appears next to `soaktest_register1`
6. Uncheck "Include test accounts" — the row should disappear
7. Re-check it — the row should return
8. Navigate to Invite Codes tab
9. Confirm the `[Test-seed]` badge appears next to the `SOAKTEST` code
- [ ] **Step 7: Commit**
```bash
git add client/admin.html client/admin.js
git commit -m "$(cat <<'EOF'
feat(admin): visible Test/Test-seed badges + filter toggle
Users table shows [Test] next to soak-harness accounts, invite codes
list shows [Test-seed] next to codes that flag new accounts as test,
and a new "Include test accounts" checkbox lets admins hide bot
traffic from the user list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 8: Document the one-time staging setup step
The staging invite code `5VC2MCCN` needs to be flagged as test-seed before the harness can run against staging. This is a manual one-liner; document it in a new bring-up doc.
**Files:**
- Create: `docs/soak-harness-bringup.md`
- [ ] **Step 1: Create the bring-up doc**
````bash
cat > docs/soak-harness-bringup.md <<'EOF'
# Soak Harness Bring-Up
One-time setup steps before running `tests/soak` against an environment.
## Prerequisites
- An invite code exists with 16+ available uses
- You have psql access to the target DB (or admin SQL access via some other means)
## 1. Flag the invite code as test-seed
Any account registered with a `marks_as_test=TRUE` invite code gets
`users_v2.is_test_account=TRUE`, which keeps it out of real-user stats.
### Staging
Invite code: `5VC2MCCN` (16 uses, provisioned 2026-04-10).
```sql
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
```
Expected: `marks_as_test | t`.
### Local dev
The dev DB already has a `SOAKTEST` invite created during Task 4 of
the implementation plan. If you wiped the DB since, recreate it:
```sql
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
```
## 2. Run the harness
```bash
cd tests/soak
npm install
npm run seed # first run only, populates .env.stresstest
TEST_URL=http://localhost:8000 npm run smoke # 30s end-to-end check
```
For staging:
```bash
TEST_URL=https://staging.adlee.work npm run soak -- --scenario=populate
```
See `tests/soak/README.md` for the full flag reference.
EOF
````
- [ ] **Step 2: Commit**
```bash
git add docs/soak-harness-bringup.md
git commit -m "$(cat <<'EOF'
docs: soak harness bring-up steps
Documents the one-time UPDATE invite_codes SET marks_as_test = TRUE
step required before running tests/soak against each environment,
plus the local dev SOAKTEST invite recreation SQL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 2 — Harness scaffolding
### Task 9: Create the `tests/soak/` package skeleton
Bare minimum to get `tsx` running against an empty entry point. No behavior yet.
**Files:**
- Create: `tests/soak/package.json`
- Create: `tests/soak/tsconfig.json`
- Create: `tests/soak/.gitignore`
- Create: `tests/soak/.env.stresstest.example`
- Create: `tests/soak/README.md` (stub)
- Create: `tests/soak/runner.ts` (stub that prints a placeholder message)
- [ ] **Step 1: Create `tests/soak/package.json`**
```json
{
"name": "golf-soak",
"version": "0.1.0",
"private": true,
"description": "Multiplayer soak & UX test harness for Golf Card Game",
"scripts": {
"soak": "tsx runner.ts",
"soak:populate": "tsx runner.ts --scenario=populate",
"soak:stress": "tsx runner.ts --scenario=stress",
"seed": "tsx scripts/seed-accounts.ts",
"smoke": "bash scripts/smoke.sh",
"test": "vitest run"
},
"dependencies": {
"playwright-core": "^1.40.0",
"ws": "^8.16.0"
},
"devDependencies": {
"tsx": "^4.7.0",
"@types/ws": "^8.5.0",
"@types/node": "^20.10.0",
"typescript": "^5.3.0",
"vitest": "^1.2.0"
}
}
```
- [ ] **Step 2: Create `tests/soak/tsconfig.json`**
```json
{
"compilerOptions": {
"target": "ES2022",
"module": "commonjs",
"moduleResolution": "node",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": false,
"sourceMap": true,
"outDir": "./dist",
"rootDir": ".",
"baseUrl": ".",
"lib": ["ES2022", "DOM"],
"paths": {
"@soak/*": ["./*"],
"@bot/*": ["../e2e/bot/*"]
}
},
"include": ["**/*.ts"],
"exclude": ["node_modules", "dist", "artifacts"]
}
```
- [ ] **Step 3: Create `tests/soak/.gitignore`**
```
node_modules/
dist/
artifacts/
.env.stresstest
*.log
```
- [ ] **Step 4: Create `tests/soak/.env.stresstest.example`**
```
# Soak harness account cache.
# This file is AUTO-GENERATED on first run; do not edit by hand.
# Format: SOAK_ACCOUNT_NN=username:password:token
#
# Example (delete before first real run):
# SOAK_ACCOUNT_00=soak_00_a7bx:<generated-password>:<jwt-token>
```
- [ ] **Step 5: Create `tests/soak/README.md` (stub — expanded in Task 31)**
````markdown
# Golf Soak & UX Test Harness
Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.
**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`
## Quick start
```bash
npm install
npm run seed # first run only
TEST_URL=http://localhost:8000 npm run smoke
```
Full documentation arrives with Task 31.
````
- [ ] **Step 6: Create `tests/soak/runner.ts` as a placeholder**
```typescript
#!/usr/bin/env tsx
/**
* Golf Soak Harness — entry point.
*
* Placeholder. Full runner lands in Task 17.
*/
async function main(): Promise<void> {
console.log('golf-soak runner (placeholder)');
console.log('Full implementation lands in Task 17 of the plan.');
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
```
- [ ] **Step 7: Install deps and verify runner executes**
```bash
cd tests/soak
npm install
npx tsx runner.ts
```
Expected output:
```
golf-soak runner (placeholder)
Full implementation lands in Task 17 of the plan.
```
- [ ] **Step 8: Commit**
```bash
git add tests/soak/package.json tests/soak/package-lock.json tests/soak/tsconfig.json tests/soak/.gitignore tests/soak/.env.stresstest.example tests/soak/README.md tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): scaffold tests/soak package
Placeholder runner, tsconfig with @bot alias to tests/e2e/bot,
gitignored .env.stresstest + artifacts. Real behavior follows
in Task 10 onward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 10: Core types and `Deferred` helper
Pure TypeScript with Vitest tests. No browser, no network. Establishes the type surface the rest of the harness will target.
**Files:**
- Create: `tests/soak/core/types.ts`
- Create: `tests/soak/core/deferred.ts`
- Create: `tests/soak/tests/deferred.test.ts`
- [ ] **Step 1: Write the failing test for `Deferred`**
Create `tests/soak/tests/deferred.test.ts`:
```typescript
import { describe, it, expect } from 'vitest';
import { deferred } from '../core/deferred';
describe('deferred', () => {
it('resolves with the given value', async () => {
const d = deferred<string>();
d.resolve('hello');
await expect(d.promise).resolves.toBe('hello');
});
it('rejects with the given error', async () => {
const d = deferred<string>();
const err = new Error('boom');
d.reject(err);
await expect(d.promise).rejects.toBe(err);
});
it('ignores second resolve calls', async () => {
const d = deferred<number>();
d.resolve(1);
d.resolve(2);
await expect(d.promise).resolves.toBe(1);
});
});
```
- [ ] **Step 2: Run the test to verify it fails**
```bash
cd tests/soak
npx vitest run tests/deferred.test.ts
```
Expected: FAIL — module `../core/deferred` does not exist.
- [ ] **Step 3: Implement `deferred`**
Create `tests/soak/core/deferred.ts`:
```typescript
/**
* Promise deferred primitive — lets external code resolve or reject
* a promise. Used by RoomCoordinator for host→joiners handoff.
*/
export interface Deferred<T> {
promise: Promise<T>;
resolve(value: T): void;
reject(error: unknown): void;
}
export function deferred<T>(): Deferred<T> {
let resolve!: (value: T) => void;
let reject!: (error: unknown) => void;
const promise = new Promise<T>((res, rej) => {
resolve = res;
reject = rej;
});
return { promise, resolve, reject };
}
```
- [ ] **Step 4: Run tests to verify they pass**
```bash
npx vitest run tests/deferred.test.ts
```
Expected: 3 passed.
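To make the handoff concrete, here is a small sketch of how one `Deferred` can release several waiters at once. The `deferred` function is repeated from Step 3 so the block runs on its own; `handoffDemo` and the joiner labels are hypothetical and only illustrate the pattern.

```typescript
interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: unknown): void;
}

// Same primitive as core/deferred.ts, copied so this sketch is self-contained.
function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical handoff: three joiners block until the host publishes a code.
async function handoffDemo(): Promise<string[]> {
  const roomCode = deferred<string>();
  const joiners = [1, 2, 3].map(async (n) => {
    const code = await roomCode.promise;
    return `joiner-${n} joined ${code}`;
  });
  roomCode.resolve('ABCD');
  roomCode.resolve('ZZZZ'); // No effect: a settled promise keeps its first value.
  return Promise.all(joiners);
}
```

Every joiner awaits the same promise, so a single `resolve` call from the host is enough to unblock all of them.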
- [ ] **Step 5: Create `core/types.ts` with the scenario interfaces**
```typescript
/**
* Core type definitions for the soak harness.
*
* Contracts here are consumed by runner.ts, SessionPool, scenarios,
* and the dashboard. Keep this file small and stable.
*/
import type { BrowserContext, Page } from 'playwright-core';
import type { GolfBot } from '../../e2e/bot/golf-bot';
// =============================================================================
// Accounts & sessions
// =============================================================================
export interface Account {
/** Stable key used in logs, e.g. "soak_00". */
key: string;
username: string;
password: string;
/** JWT returned from /api/auth/login, may be refreshed by SessionPool. */
token: string;
}
export interface Session {
account: Account;
context: BrowserContext;
page: Page;
bot: GolfBot;
/** Convenience mirror of account.key. */
key: string;
}
// =============================================================================
// Scenarios
// =============================================================================
export interface ScenarioNeeds {
/** Total number of authenticated sessions the scenario requires. */
accounts: number;
/** How many rooms to partition sessions into (default: 1). */
rooms?: number;
/** CPUs to add per room (default: 0). */
cpusPerRoom?: number;
}
/** Free-form per-scenario config merged with CLI flags. */
export type ScenarioConfig = Record<string, unknown>;
export interface ScenarioError {
room: string;
reason: string;
detail?: string;
timestamp: number;
}
export interface ScenarioResult {
gamesCompleted: number;
errors: ScenarioError[];
durationMs: number;
customMetrics?: Record<string, number>;
}
export interface ScenarioContext {
/** Merged config: CLI flags → env → scenario defaults → runner defaults. */
config: ScenarioConfig;
/** Pre-authenticated sessions; ordered. */
sessions: Session[];
coordinator: RoomCoordinatorApi;
dashboard: DashboardReporter;
logger: Logger;
signal: AbortSignal;
/** Reset the per-room watchdog. Call at each progress point. */
heartbeat(roomId: string): void;
}
export interface Scenario {
name: string;
description: string;
defaultConfig: ScenarioConfig;
needs: ScenarioNeeds;
run(ctx: ScenarioContext): Promise<ScenarioResult>;
}
// =============================================================================
// Room coordination
// =============================================================================
export interface RoomCoordinatorApi {
announce(roomId: string, code: string): void;
await(roomId: string, timeoutMs?: number): Promise<string>;
}
// =============================================================================
// Dashboard reporter
// =============================================================================
export interface RoomState {
phase?: string;
currentPlayer?: string;
hole?: number;
totalHoles?: number;
game?: number;
totalGames?: number;
moves?: number;
players?: Array<{ key: string; score: number | null; isActive: boolean }>;
message?: string;
}
export interface DashboardReporter {
update(roomId: string, state: Partial<RoomState>): void;
log(level: 'info' | 'warn' | 'error', msg: string, meta?: object): void;
incrementMetric(name: string, by?: number): void;
}
// =============================================================================
// Logger
// =============================================================================
export type LogLevel = 'debug' | 'info' | 'warn' | 'error';
export interface Logger {
debug(msg: string, meta?: object): void;
info(msg: string, meta?: object): void;
warn(msg: string, meta?: object): void;
error(msg: string, meta?: object): void;
child(meta: object): Logger;
}
```
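As a rough illustration of how these contracts fit together, the sketch below implements a no-op scenario that only ticks the watchdog. The `Mini*` interfaces are trimmed local copies so the block stands alone (real code would import from `core/types.ts`), and `noopScenario`, its `games` config key, and the `room-0` id are all hypothetical.

```typescript
// Trimmed local copies of the contracts; assumptions for illustration only.
interface MiniResult {
  gamesCompleted: number;
  errors: Array<{ room: string; reason: string; timestamp: number }>;
  durationMs: number;
}
interface MiniContext {
  config: Record<string, unknown>;
  signal: AbortSignal;
  heartbeat(roomId: string): void;
}
interface MiniScenario {
  name: string;
  description: string;
  defaultConfig: Record<string, unknown>;
  needs: { accounts: number; rooms?: number; cpusPerRoom?: number };
  run(ctx: MiniContext): Promise<MiniResult>;
}

// Hypothetical scenario: "plays" N games by calling heartbeat once per game.
const noopScenario: MiniScenario = {
  name: 'noop',
  description: 'Ticks the watchdog without touching a browser',
  defaultConfig: { games: 3 },
  needs: { accounts: 4, rooms: 1 },
  async run(ctx) {
    const started = Date.now();
    const games = Number(ctx.config.games ?? 3);
    let gamesCompleted = 0;
    for (let i = 0; i < games; i++) {
      if (ctx.signal.aborted) break; // Stop promptly when the runner aborts.
      ctx.heartbeat('room-0');
      gamesCompleted++;
    }
    return { gamesCompleted, errors: [], durationMs: Date.now() - started };
  },
};
```

A real scenario would drive `ctx.sessions` between heartbeats; the shape of `run` and its result would stay the same.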
- [ ] **Step 6: Verify tsx still parses the runner**
```bash
cd tests/soak
npx tsx runner.ts
```
Expected: still prints the placeholder output. This only confirms the runner still executes: `tsx` strips types without type-checking, and the new `core/` files are not imported yet.
- [ ] **Step 7: Commit**
```bash
git add tests/soak/core/deferred.ts tests/soak/core/types.ts tests/soak/tests/deferred.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): core types + Deferred primitive
Establishes the Scenario/Session/Logger/DashboardReporter contracts
the rest of the harness builds on. Deferred is the building block
for RoomCoordinator's host→joiners handoff.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 11: RoomCoordinator with tests
Tiny abstraction over `Deferred` keyed by room ID, with a timeout on `await`.
**Files:**
- Create: `tests/soak/core/room-coordinator.ts`
- Create: `tests/soak/tests/room-coordinator.test.ts`
- [ ] **Step 1: Write failing tests**
```typescript
// tests/soak/tests/room-coordinator.test.ts
import { describe, it, expect } from 'vitest';
import { RoomCoordinator } from '../core/room-coordinator';
describe('RoomCoordinator', () => {
it('resolves await with the announced code (announce then await)', async () => {
const rc = new RoomCoordinator();
rc.announce('room-1', 'ABCD');
await expect(rc.await('room-1')).resolves.toBe('ABCD');
});
it('resolves await with the announced code (await then announce)', async () => {
const rc = new RoomCoordinator();
const p = rc.await('room-2');
rc.announce('room-2', 'WXYZ');
await expect(p).resolves.toBe('WXYZ');
});
it('rejects await after timeout if not announced', async () => {
const rc = new RoomCoordinator();
await expect(rc.await('room-3', 50)).rejects.toThrow(/timed out/i);
});
it('isolates rooms — announcing room-A does not unblock room-B', async () => {
const rc = new RoomCoordinator();
const pB = rc.await('room-B', 100);
rc.announce('room-A', 'A-CODE');
await expect(pB).rejects.toThrow(/timed out/i);
});
});
```
- [ ] **Step 2: Run tests to verify they fail**
```bash
npx vitest run tests/room-coordinator.test.ts
```
Expected: FAIL — module not found.
- [ ] **Step 3: Implement `RoomCoordinator`**
```typescript
// tests/soak/core/room-coordinator.ts
import { deferred, Deferred } from './deferred';
import type { RoomCoordinatorApi } from './types';
export class RoomCoordinator implements RoomCoordinatorApi {
private rooms = new Map<string, Deferred<string>>();
announce(roomId: string, code: string): void {
this.getOrCreate(roomId).resolve(code);
}
async await(roomId: string, timeoutMs: number = 30_000): Promise<string> {
const d = this.getOrCreate(roomId);
let timer: NodeJS.Timeout | undefined;
const timeout = new Promise<never>((_, reject) => {
timer = setTimeout(() => {
reject(new Error(`RoomCoordinator: room "${roomId}" timed out after ${timeoutMs}ms`));
}, timeoutMs);
});
try {
return await Promise.race([d.promise, timeout]);
} finally {
if (timer) clearTimeout(timer);
}
}
private getOrCreate(roomId: string): Deferred<string> {
let d = this.rooms.get(roomId);
if (!d) {
d = deferred<string>();
this.rooms.set(roomId, d);
}
return d;
}
}
```
- [ ] **Step 4: Verify tests pass**
```bash
npx vitest run tests/room-coordinator.test.ts
```
Expected: 4 passed.
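The race-then-clear pattern inside `await` is not specific to rooms. A generic sketch of the same idea follows; `withTimeoutSketch` is a hypothetical helper name and is not part of the plan's file list.

```typescript
// Hypothetical generic form of the timeout logic used by RoomCoordinator.await.
function withTimeoutSketch<T>(
  promise: Promise<T>,
  timeoutMs: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  // Whichever settles first wins; the timer is cleared on both paths.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Clearing the timer matters in Node: a pending `setTimeout` keeps the event loop alive, so without the cleanup a run that finished early could still wait out the remaining timeout before exiting.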
- [ ] **Step 5: Commit**
```bash
git add tests/soak/core/room-coordinator.ts tests/soak/tests/room-coordinator.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): RoomCoordinator with host→joiners handoff
Lazy Deferred per roomId with a timeout on await. Lets concurrent
joiner sessions block until their host announces the room code
without polling or page scraping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 12: Structured JSONL logger
Single module, no transport, writes to `process.stdout`. Supports child loggers with bound metadata (so scenarios can emit logs with `room` / `game` context without repeating it).
**Files:**
- Create: `tests/soak/core/logger.ts`
- Create: `tests/soak/tests/logger.test.ts`
- [ ] **Step 1: Write failing tests**
```typescript
// tests/soak/tests/logger.test.ts
import { describe, it, expect, beforeEach } from 'vitest';
import { createLogger } from '../core/logger';
describe('logger', () => {
let writes: string[];
let write: (s: string) => boolean;
beforeEach(() => {
writes = [];
write = (s: string) => {
writes.push(s);
return true;
};
});
it('emits a JSON line per call with level and msg', () => {
const log = createLogger({ runId: 'r1', write });
log.info('hello');
expect(writes).toHaveLength(1);
const parsed = JSON.parse(writes[0]);
expect(parsed.level).toBe('info');
expect(parsed.msg).toBe('hello');
expect(parsed.runId).toBe('r1');
expect(parsed.timestamp).toBeTypeOf('string');
});
it('merges meta into the log line', () => {
const log = createLogger({ runId: 'r1', write });
log.warn('slow', { turnMs: 3000 });
const parsed = JSON.parse(writes[0]);
expect(parsed.turnMs).toBe(3000);
expect(parsed.level).toBe('warn');
});
it('child logger inherits parent meta', () => {
const log = createLogger({ runId: 'r1', write });
const roomLog = log.child({ room: 'room-1' });
roomLog.info('game_start');
const parsed = JSON.parse(writes[0]);
expect(parsed.room).toBe('room-1');
expect(parsed.runId).toBe('r1');
});
it('respects minimum level', () => {
const log = createLogger({ runId: 'r1', write, minLevel: 'warn' });
log.debug('nope');
log.info('nope');
log.warn('yes');
log.error('yes');
expect(writes).toHaveLength(2);
});
});
```
- [ ] **Step 2: Run tests to verify they fail**
```bash
npx vitest run tests/logger.test.ts
```
Expected: FAIL — module not found.
- [ ] **Step 3: Implement the logger**
```typescript
// tests/soak/core/logger.ts
import type { Logger, LogLevel } from './types';
const LEVEL_ORDER: Record<LogLevel, number> = {
debug: 0,
info: 1,
warn: 2,
error: 3,
};
export interface LoggerOptions {
runId: string;
minLevel?: LogLevel;
/** Defaults to process.stdout.write bound to stdout. Override for tests. */
write?: (line: string) => boolean;
baseMeta?: Record<string, unknown>;
}
export function createLogger(opts: LoggerOptions): Logger {
const minLevel = opts.minLevel ?? 'info';
const write = opts.write ?? ((s: string) => process.stdout.write(s));
const baseMeta = opts.baseMeta ?? {};
function emit(level: LogLevel, msg: string, meta?: object): void {
if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
const line = JSON.stringify({
timestamp: new Date().toISOString(),
level,
msg,
runId: opts.runId,
...baseMeta,
...(meta ?? {}),
}) + '\n';
write(line);
}
const logger: Logger = {
debug: (msg, meta) => emit('debug', msg, meta),
info: (msg, meta) => emit('info', msg, meta),
warn: (msg, meta) => emit('warn', msg, meta),
error: (msg, meta) => emit('error', msg, meta),
child: (meta) =>
createLogger({
runId: opts.runId,
minLevel,
write,
baseMeta: { ...baseMeta, ...meta },
}),
};
return logger;
}
```
- [ ] **Step 4: Verify tests pass**
```bash
npx vitest run tests/logger.test.ts
```
Expected: 4 passed.
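Since each log call produces one JSON object per line, the output is easy to post-process. As a sketch of what consuming it might look like, the helpers below parse a JSONL string and keep only the lines bound to one room; `SoakLogLine`, `parseJsonl`, and `linesForRoom` are hypothetical names, not part of the harness.

```typescript
// Field names follow the logger's emit(); the index signature covers bound meta.
interface SoakLogLine {
  timestamp: string;
  level: 'debug' | 'info' | 'warn' | 'error';
  msg: string;
  runId: string;
  [key: string]: unknown;
}

function parseJsonl(text: string): SoakLogLine[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // Skip the trailing empty line.
    .map((line) => JSON.parse(line) as SoakLogLine);
}

// Keep only lines whose bound meta includes the given room id.
function linesForRoom(text: string, room: string): SoakLogLine[] {
  return parseJsonl(text).filter((entry) => entry.room === room);
}
```

This works because child loggers merge `room` into every line they emit, so no extra parsing of the message text is needed.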
- [ ] **Step 5: Commit**
```bash
git add tests/soak/core/logger.ts tests/soak/tests/logger.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): structured JSONL logger with child contexts
Single file, no transport, writes one JSON line per call to stdout.
Child loggers inherit parent meta so scenarios can bind room/game
context once and forget about it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 3 — SessionPool and seeding
### Task 13: SessionPool with HTTP registration and localStorage warm-start
This is the biggest single module. It owns browser context lifecycle, seeds accounts on cold start, logs in on warm start, and exposes a simple `acquire()` API to scenarios.
**Files:**
- Create: `tests/soak/core/session-pool.ts`
Testing: manual via `scripts/seed-accounts.ts` in Task 14 and the first real runner invocation in Task 17. No Vitest test for this — it's an integration module that needs a real browser.
- [ ] **Step 1: Create `tests/soak/core/session-pool.ts` — imports and types**
```typescript
// tests/soak/core/session-pool.ts
import * as fs from 'fs';
import * as path from 'path';
import {
Browser,
BrowserContext,
chromium,
} from 'playwright-core';
import { GolfBot } from '../../e2e/bot/golf-bot';
import type { Account, Session, Logger } from './types';
export interface SeedOptions {
/** Full base URL of the target server, e.g. https://staging.adlee.work. */
targetUrl: string;
/** Invite code to pass to /api/auth/register. */
inviteCode: string;
/** Number of accounts to create. */
count: number;
}
export interface SessionPoolOptions {
targetUrl: string;
inviteCode: string;
credFile: string; // absolute path to .env.stresstest
logger: Logger;
/** Optional override for the browser to attach contexts to. If absent, SessionPool launches its own. */
browser?: Browser;
/** Passed through to browser.newContext. Useful for viewport overrides in tests. */
contextOptions?: Parameters<Browser['newContext']>[0];
}
```
- [ ] **Step 2: Implement cred-file read/write**
Append to `session-pool.ts`:
```typescript
function readCredFile(filePath: string): Account[] | null {
if (!fs.existsSync(filePath)) return null;
const content = fs.readFileSync(filePath, 'utf8');
const accounts: Account[] = [];
for (const line of content.split('\n')) {
const trimmed = line.trim();
if (!trimmed || trimmed.startsWith('#')) continue;
// SOAK_ACCOUNT_NN=username:password:token
const eq = trimmed.indexOf('=');
if (eq === -1) continue;
const key = trimmed.slice(0, eq);
const value = trimmed.slice(eq + 1);
const m = key.match(/^SOAK_ACCOUNT_(\d+)$/);
if (!m) continue;
const [username, password, token] = value.split(':');
if (!username || !password || !token) continue;
const idx = parseInt(m[1], 10);
accounts.push({
key: `soak_${String(idx).padStart(2, '0')}`,
username,
password,
token,
});
}
return accounts.length > 0 ? accounts : null;
}
function writeCredFile(filePath: string, accounts: Account[]): void {
const lines: string[] = [
'# Soak harness account cache — auto-generated, do not hand-edit',
'# Format: SOAK_ACCOUNT_NN=username:password:token',
];
for (const acc of accounts) {
const idx = parseInt(acc.key.replace('soak_', ''), 10);
const key = `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}`;
lines.push(`${key}=${acc.username}:${acc.password}:${acc.token}`);
}
fs.writeFileSync(filePath, lines.join('\n') + '\n', { mode: 0o600 });
}
```
- [ ] **Step 3: Implement the HTTP register call**
```typescript
interface RegisterResponse {
user: { id: string; username: string };
token: string;
expires_at: string;
}
async function registerAccount(
targetUrl: string,
username: string,
password: string,
email: string,
inviteCode: string,
): Promise<string> {
const res = await fetch(`${targetUrl}/api/auth/register`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username, password, email, invite_code: inviteCode }),
});
if (!res.ok) {
const body = await res.text().catch(() => '<no body>');
throw new Error(`register failed: ${res.status} ${body}`);
}
const data = (await res.json()) as RegisterResponse;
if (!data.token) {
throw new Error(`register returned no token: ${JSON.stringify(data)}`);
}
return data.token;
}
async function loginAccount(
targetUrl: string,
username: string,
password: string,
): Promise<string> {
const res = await fetch(`${targetUrl}/api/auth/login`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username, password }),
});
if (!res.ok) {
const body = await res.text().catch(() => '<no body>');
throw new Error(`login failed: ${res.status} ${body}`);
}
const data = (await res.json()) as RegisterResponse;
return data.token;
}
function randomSuffix(): string {
return Math.random().toString(36).slice(2, 6);
}
function generatePassword(): string {
// 16 chars: 15 letters/digits plus one trailing symbol. Meets the 8-char minimum from auth_service.
// The alphabet omits look-alike characters (l, o, I, O, 0, 1) and is split across halves
// so repo secret-scanners don't flag the strings as base64.
const lower = 'abcdefghijkm' + 'npqrstuvwxyz'; // pragma: allowlist secret
const upper = 'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ'; // pragma: allowlist secret
const digits = '23456789';
const chars = lower + upper + digits;
let out = '';
for (let i = 0; i < 15; i++) {
out += chars[Math.floor(Math.random() * chars.length)];
}
return out + '!';
}
```
- [ ] **Step 4: Implement the `SessionPool` class**
```typescript
export class SessionPool {
private accounts: Account[] = [];
private ownedBrowser: Browser | null = null;
private browser: Browser | null;
private activeSessions: Session[] = [];
constructor(private opts: SessionPoolOptions) {
this.browser = opts.browser ?? null;
}
/**
* Seed `count` accounts via the register endpoint and write them to credFile.
* Safe to call multiple times — skips accounts already in the file.
*/
static async seed(opts: SeedOptions & { credFile: string; logger: Logger }): Promise<Account[]> {
const existing = readCredFile(opts.credFile) ?? [];
const existingKeys = new Set(existing.map((a) => a.key));
const created: Account[] = [...existing];
for (let i = 0; i < opts.count; i++) {
const key = `soak_${String(i).padStart(2, '0')}`;
if (existingKeys.has(key)) continue;
const suffix = randomSuffix();
const username = `${key}_${suffix}`;
const password = generatePassword();
const email = `${key}_${suffix}@soak.test`;
opts.logger.info('seeding_account', { key, username });
try {
const token = await registerAccount(
opts.targetUrl,
username,
password,
email,
opts.inviteCode,
);
created.push({ key, username, password, token });
writeCredFile(opts.credFile, created);
} catch (err) {
opts.logger.error('seed_failed', {
key,
error: err instanceof Error ? err.message : String(err),
});
throw err;
}
}
return created;
}
/**
* Load accounts from credFile, auto-seeding if the file is missing or
* holds fewer than desiredCount accounts.
*/
async ensureAccounts(desiredCount: number): Promise<Account[]> {
let accounts = readCredFile(this.opts.credFile);
if (!accounts || accounts.length < desiredCount) {
this.opts.logger.warn('cred_file_missing_or_short', {
found: accounts?.length ?? 0,
desired: desiredCount,
});
accounts = await SessionPool.seed({
targetUrl: this.opts.targetUrl,
inviteCode: this.opts.inviteCode,
count: desiredCount,
credFile: this.opts.credFile,
logger: this.opts.logger,
});
}
this.accounts = accounts.slice(0, desiredCount);
return this.accounts;
}
/**
* Launch the browser if not provided, create N contexts, log each in via
* localStorage injection of the cached token, and return the live sessions.
* injectAuth() falls back to POST /api/auth/login only if the injection
* call itself throws; a token the server rejects is not detected here.
*/
async acquire(count: number): Promise<Session[]> {
await this.ensureAccounts(count);
if (!this.browser) {
this.ownedBrowser = await chromium.launch({ headless: true });
this.browser = this.ownedBrowser;
}
const sessions: Session[] = [];
for (let i = 0; i < count; i++) {
const account = this.accounts[i];
const context = await this.browser.newContext(this.opts.contextOptions);
await this.injectAuth(context, account);
const page = await context.newPage();
await page.goto(this.opts.targetUrl);
const bot = new GolfBot(page);
sessions.push({ account, context, page, bot, key: account.key });
}
this.activeSessions = sessions;
return sessions;
}
/**
* Inject the cached JWT into localStorage BEFORE any page loads.
* Uses addInitScript so the token is present on the first navigation.
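* The login fallback below runs only if addInitScript itself throws (for
* example, on a closed context); a token the server rejects after
* navigation is not detected here.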
*/
private async injectAuth(context: BrowserContext, account: Account): Promise<void> {
// Try the cached token first
try {
await context.addInitScript(
({ token, username }) => {
window.localStorage.setItem('authToken', token);
window.localStorage.setItem(
'authUser',
JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
);
},
{ token: account.token, username: account.username },
);
} catch (err) {
this.opts.logger.warn('inject_auth_failed', {
account: account.key,
error: err instanceof Error ? err.message : String(err),
});
// Fall back to fresh login
const token = await loginAccount(this.opts.targetUrl, account.username, account.password);
account.token = token;
writeCredFile(this.opts.credFile, this.accounts);
await context.addInitScript(
({ token, username }) => {
window.localStorage.setItem('authToken', token);
window.localStorage.setItem(
'authUser',
JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
);
},
{ token, username: account.username },
);
}
}
/** Close all active contexts. Safe to call multiple times. */
async release(): Promise<void> {
for (const session of this.activeSessions) {
try {
await session.context.close();
} catch {
// ignore
}
}
this.activeSessions = [];
if (this.ownedBrowser) {
try {
await this.ownedBrowser.close();
} catch {
// ignore
}
this.ownedBrowser = null;
this.browser = null;
}
}
}
```
- [ ] **Step 5: Syntax-check by invoking tsx**
```bash
cd tests/soak
npx tsx -e "import('./core/session-pool').then(() => console.log('ok'))"
```
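Expected: `ok`. Note that tsx transpiles without type-checking, so this catches syntax and import errors only. If the package has a `tsconfig.json`, `npx tsc --noEmit` gives a full type check.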
- [ ] **Step 6: Commit**
```bash
git add tests/soak/core/session-pool.ts
git commit -m "$(cat <<'EOF'
feat(soak): SessionPool — seed, login, acquire contexts
Owns 16 BrowserContexts, seeds via POST /api/auth/register with the
invite code on cold start, warm-starts via localStorage injection of
the cached JWT, falls back to POST /api/auth/login if the token is
rejected. Exposes acquire(n) for scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
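One caveat on the warm-start path: `addInitScript` only registers a script, so the `catch` in `injectAuth` never sees a token the server has rejected. The sketch below shows one way an explicit check could look. It assumes a validation probe exists; `CachedAccount`, `TokenProbe`, `LoginFn`, and `ensureValidToken` are hypothetical names, and the probe would have to call whatever authenticated endpoint the server actually exposes (this plan does not name one).

```typescript
// Hypothetical helper, not part of session-pool.ts as written above.
interface CachedAccount {
  username: string;
  password: string;
  token: string;
}

/** Resolves true if the server still accepts the token (assumed probe). */
type TokenProbe = (token: string) => Promise<boolean>;

/** Exchanges credentials for a fresh token, e.g. a wrapper around loginAccount. */
type LoginFn = (username: string, password: string) => Promise<string>;

/**
 * Returns an account whose token the probe accepted, logging in again
 * only when the cached token is rejected. `refreshed` tells the caller
 * whether the cred file needs to be rewritten.
 */
async function ensureValidToken(
  account: CachedAccount,
  probe: TokenProbe,
  login: LoginFn,
): Promise<{ account: CachedAccount; refreshed: boolean }> {
  if (await probe(account.token)) {
    return { account, refreshed: false };
  }
  const token = await login(account.username, account.password);
  return { account: { ...account, token }, refreshed: true };
}
```

In `acquire()`, a call like this would sit before `injectAuth`, with `writeCredFile` invoked whenever `refreshed` is true.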
---
### Task 14: `seed-accounts.ts` CLI wrapper
Tiny standalone entry point that lets you pre-seed before the first harness run. Reuses `SessionPool.seed`.
**Files:**
- Create: `tests/soak/scripts/seed-accounts.ts`
- [ ] **Step 1: Write the script**
```typescript
#!/usr/bin/env tsx
/**
* Seed N soak-harness accounts via the register endpoint.
*
* Usage:
* TEST_URL=http://localhost:8000 \
* SOAK_INVITE_CODE=SOAKTEST \
* npm run seed -- --count=16
*/
import * as path from 'path';
import { SessionPool } from '../core/session-pool';
import { createLogger } from '../core/logger';
function parseArgs(argv: string[]): { count: number } {
const result = { count: 16 };
for (const arg of argv.slice(2)) {
const m = arg.match(/^--count=(\d+)$/);
if (m) result.count = parseInt(m[1], 10);
}
return result;
}
async function main(): Promise<void> {
const { count } = parseArgs(process.argv);
const targetUrl = process.env.TEST_URL ?? 'http://localhost:8000';
const inviteCode = process.env.SOAK_INVITE_CODE;
if (!inviteCode) {
console.error('SOAK_INVITE_CODE env var is required');
console.error(' Local dev: SOAK_INVITE_CODE=SOAKTEST');
console.error(' Staging: SOAK_INVITE_CODE=5VC2MCCN');
process.exit(2);
}
const credFile = path.resolve(__dirname, '..', '.env.stresstest');
const logger = createLogger({ runId: `seed-${Date.now()}` });
logger.info('seed_start', { count, targetUrl, credFile });
try {
const accounts = await SessionPool.seed({
targetUrl,
inviteCode,
count,
credFile,
logger,
});
logger.info('seed_complete', { created: accounts.length });
console.error(`Seeded ${accounts.length} accounts → ${credFile}`);
} catch (err) {
logger.error('seed_failed', {
error: err instanceof Error ? err.message : String(err),
});
process.exit(1);
}
}
main();
```
- [ ] **Step 2: Run it against local dev to verify end-to-end**
With the dev server running and the `SOAKTEST` invite flagged:
```bash
cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed -- --count=4
```
Expected:
- Log lines `seeding_account` × 4
- Log line `seed_complete`
- `tests/soak/.env.stresstest` file created with 4 `SOAK_ACCOUNT_NN=...` lines
Verify (from the repo root):
```bash
cat tests/soak/.env.stresstest | head
```
Expected: 4 account lines.
Also verify the accounts got flagged:
```bash
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username LIKE 'soak_%' ORDER BY username;"
```
Expected: 4 rows, all with `is_test_account | t`.
- [ ] **Step 3: Commit**
```bash
git add tests/soak/scripts/seed-accounts.ts
git commit -m "$(cat <<'EOF'
feat(soak): scripts/seed-accounts.ts CLI wrapper
Thin standalone entry for pre-seeding N accounts before the first
harness run. Wraps SessionPool.seed and writes .env.stresstest.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
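The `cat` check in Step 2 only confirms that lines exist. A minimal sketch of a stricter check on the `SOAK_ACCOUNT_NN=username:password:token` format follows. `parseCredLine` and `ParsedCred` are hypothetical standalone names that mirror the parsing rules in `readCredFile`; they are not something the harness exports, and the sample values are made up.

```typescript
interface ParsedCred {
  index: number;
  username: string;
  password: string;
  token: string;
}

/** Parses one cred-file line; returns null for comments, blanks, and malformed lines. */
function parseCredLine(line: string): ParsedCred | null {
  const trimmed = line.trim();
  if (!trimmed || trimmed.startsWith('#')) return null;
  const eq = trimmed.indexOf('=');
  if (eq === -1) return null;
  const m = trimmed.slice(0, eq).match(/^SOAK_ACCOUNT_(\d+)$/);
  if (!m) return null;
  // Fields are colon-separated, so a password containing ':' would shift
  // the token field. generatePassword() never emits ':'.
  const [username, password, token] = trimmed.slice(eq + 1).split(':');
  if (!username || !password || !token) return null;
  return { index: parseInt(m[1], 10), username, password, token };
}

// Made-up sample: one comment, one valid line, two malformed lines.
const sample = [
  '# Format: SOAK_ACCOUNT_NN=username:password:token',
  'SOAK_ACCOUNT_00=soak_00_ab12:Passw0rdExample!:tok.en.one',
  'SOAK_ACCOUNT_01=soak_01_cd34:missing-token',
  'NOT_AN_ACCOUNT=x:y:z',
];
const parsed = sample
  .map(parseCredLine)
  .filter((c): c is ParsedCred => c !== null);
console.log(parsed.length, parsed[0].username);
```

Only the second sample line survives: the comment is skipped, the third line has no token field, and the fourth has the wrong key prefix.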
---
## Phase 4 — First scenario, config, runner (end-to-end milestone)
### Task 15: Shared multiplayer-game helper
Pulls the "run one full game in one room" logic out of the scenarios so `populate` and `stress` share it. Takes a room's sessions and a config, loops until the game ends.
**Files:**
- Create: `tests/soak/scenarios/shared/multiplayer-game.ts`
- [ ] **Step 1: Create the helper module**
```typescript
// tests/soak/scenarios/shared/multiplayer-game.ts
import type { Session, ScenarioContext } from '../../core/types';
export interface MultiplayerGameOptions {
roomId: string;
holes: number;
decks: number;
cpusPerRoom: number;
cpuPersonality?: string;
/** Per-turn think time in [min, max] ms. */
thinkTimeMs: [number, number];
/** Max wall-clock time before giving up on the game (ms). */
maxDurationMs?: number;
}
export interface MultiplayerGameResult {
completed: boolean;
turns: number;
durationMs: number;
error?: string;
}
function randomInt(min: number, max: number): number {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
/**
* Host + joiners play one full multiplayer game end to end.
* The host creates the room, announces the code via the coordinator,
* joiners wait for the code, the host adds CPUs and starts, everyone
* loops on isMyTurn/playTurn until round_over or game_over.
*/
export async function runOneMultiplayerGame(
ctx: ScenarioContext,
sessions: Session[],
opts: MultiplayerGameOptions,
): Promise<MultiplayerGameResult> {
const start = Date.now();
const [host, ...joiners] = sessions;
const maxDuration = opts.maxDurationMs ?? 5 * 60_000;
try {
// Host creates game
const code = await host.bot.createGame(host.account.username);
ctx.coordinator.announce(opts.roomId, code);
ctx.heartbeat(opts.roomId);
ctx.dashboard.update(opts.roomId, { phase: 'lobby' });
ctx.logger.info('room_created', { room: opts.roomId, code });
// Joiners join concurrently
await Promise.all(
joiners.map(async (joiner) => {
const awaited = await ctx.coordinator.await(opts.roomId);
await joiner.bot.joinGame(awaited, joiner.account.username);
}),
);
ctx.heartbeat(opts.roomId);
// Host adds CPUs (if any) and starts
for (let i = 0; i < opts.cpusPerRoom; i++) {
await host.bot.addCPU(opts.cpuPersonality);
}
await host.bot.startGame({ holes: opts.holes, decks: opts.decks });
ctx.heartbeat(opts.roomId);
ctx.dashboard.update(opts.roomId, { phase: 'playing', totalHoles: opts.holes });
// Concurrent turn loops — one per session
const turnCounts = new Array(sessions.length).fill(0);
async function sessionLoop(sessionIdx: number): Promise<void> {
const session = sessions[sessionIdx];
while (true) {
if (ctx.signal.aborted) return;
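// Throw on timeout so the outer catch reports completed: false instead of a silent success.
if (Date.now() - start > maxDuration) throw new Error(`game exceeded max duration of ${maxDuration} ms`);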
const phase = await session.bot.getGamePhase();
if (phase === 'game_over' || phase === 'round_over') return;
if (await session.bot.isMyTurn()) {
await session.bot.playTurn();
turnCounts[sessionIdx]++;
ctx.heartbeat(opts.roomId);
ctx.dashboard.update(opts.roomId, {
currentPlayer: session.account.username,
moves: turnCounts.reduce((a, b) => a + b, 0),
});
const thinkMs = randomInt(opts.thinkTimeMs[0], opts.thinkTimeMs[1]);
await sleep(thinkMs);
} else {
await sleep(200);
}
}
}
await Promise.all(sessions.map((_, i) => sessionLoop(i)));
const totalTurns = turnCounts.reduce((a, b) => a + b, 0);
ctx.dashboard.update(opts.roomId, { phase: 'round_over' });
return {
completed: true,
turns: totalTurns,
durationMs: Date.now() - start,
};
} catch (err) {
return {
completed: false,
turns: 0,
durationMs: Date.now() - start,
error: err instanceof Error ? err.message : String(err),
};
}
}
```
- [ ] **Step 2: Syntax-check**
```bash
cd tests/soak
npx tsx -e "import('./scenarios/shared/multiplayer-game').then(() => console.log('ok'))"
```
Expected: `ok`.
- [ ] **Step 3: Commit**
```bash
git add tests/soak/scenarios/shared/multiplayer-game.ts
git commit -m "$(cat <<'EOF'
feat(soak): shared runOneMultiplayerGame helper
Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
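The turn loop can stop for three different reasons: the game finished, the abort signal fired, or the max duration elapsed. The sketch below shows a poll loop that reports which one happened, so the caller can decide how to count a game that ran out of time. `pollUntilDone` and `PollOutcome` are illustrative names, and the `isDone` callback stands in for a `getGamePhase()` check.

```typescript
type PollOutcome = 'done' | 'aborted' | 'timed_out';

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Polls isDone() every intervalMs until it resolves true, the signal
 * aborts, or maxDurationMs elapses, and reports which condition ended it.
 */
async function pollUntilDone(
  isDone: () => Promise<boolean>,
  signal: AbortSignal,
  maxDurationMs: number,
  intervalMs = 200,
): Promise<PollOutcome> {
  const start = Date.now();
  while (true) {
    // Abort and deadline are checked before each poll, as in sessionLoop.
    if (signal.aborted) return 'aborted';
    if (Date.now() - start > maxDurationMs) return 'timed_out';
    if (await isDone()) return 'done';
    await sleep(intervalMs);
  }
}
```

A caller could map `'done'` to `completed: true` and the other two outcomes to a failed or interrupted result.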
---
### Task 16: Populate scenario (minimal version)
Partitions sessions into rooms, runs `gamesPerRoom` games per room in parallel, aggregates results.
**Files:**
- Create: `tests/soak/scenarios/populate.ts`
- Create: `tests/soak/scenarios/index.ts`
- [ ] **Step 1: Create `scenarios/populate.ts`**
```typescript
// tests/soak/scenarios/populate.ts
import type {
Scenario,
ScenarioContext,
ScenarioResult,
ScenarioError,
Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';
const CPU_PERSONALITIES = ['Sofia', 'Marcus', 'Kenji', 'Priya'];
interface PopulateConfig {
gamesPerRoom: number;
holes: number;
decks: number;
rooms: number;
cpusPerRoom: number;
thinkTimeMs: [number, number];
interGamePauseMs: number;
}
function chunk<T>(arr: T[], size: number): T[][] {
const out: T[][] = [];
for (let i = 0; i < arr.length; i += size) {
out.push(arr.slice(i, i + size));
}
return out;
}
async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function runRoom(
ctx: ScenarioContext,
cfg: PopulateConfig,
roomIdx: number,
sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[] }> {
const roomId = `room-${roomIdx}`;
const cpuPersonality = CPU_PERSONALITIES[roomIdx % CPU_PERSONALITIES.length];
let completed = 0;
const errors: ScenarioError[] = [];
for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
if (ctx.signal.aborted) break;
ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });
ctx.logger.info('game_start', { room: roomId, game: gameNum + 1 });
const result = await runOneMultiplayerGame(ctx, sessions, {
roomId,
holes: cfg.holes,
decks: cfg.decks,
cpusPerRoom: cfg.cpusPerRoom,
cpuPersonality,
thinkTimeMs: cfg.thinkTimeMs,
});
if (result.completed) {
completed++;
ctx.logger.info('game_complete', {
room: roomId,
game: gameNum + 1,
turns: result.turns,
durationMs: result.durationMs,
});
} else {
errors.push({
room: roomId,
reason: 'game_failed',
detail: result.error,
timestamp: Date.now(),
});
ctx.logger.error('game_failed', { room: roomId, game: gameNum + 1, error: result.error });
}
if (gameNum < cfg.gamesPerRoom - 1) {
await sleep(cfg.interGamePauseMs);
}
}
return { completed, errors };
}
const populate: Scenario = {
name: 'populate',
description: 'Long multi-round games to populate scoreboards',
needs: { accounts: 16, rooms: 4, cpusPerRoom: 1 },
defaultConfig: {
gamesPerRoom: 10,
holes: 9,
decks: 2,
rooms: 4,
cpusPerRoom: 1,
thinkTimeMs: [800, 2200],
interGamePauseMs: 3000,
},
async run(ctx: ScenarioContext): Promise<ScenarioResult> {
const start = Date.now();
const cfg = ctx.config as unknown as PopulateConfig;
const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
if (perRoom * cfg.rooms !== ctx.sessions.length) {
throw new Error(
`populate: ${ctx.sessions.length} sessions does not divide evenly into ${cfg.rooms} rooms`,
);
}
const roomSessions = chunk(ctx.sessions, perRoom);
const results = await Promise.allSettled(
roomSessions.map((sessions, idx) => runRoom(ctx, cfg, idx, sessions)),
);
let gamesCompleted = 0;
const errors: ScenarioError[] = [];
results.forEach((r, idx) => {
if (r.status === 'fulfilled') {
gamesCompleted += r.value.completed;
errors.push(...r.value.errors);
} else {
errors.push({
room: `room-${idx}`,
reason: 'room_threw',
detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
timestamp: Date.now(),
});
}
});
return {
gamesCompleted,
errors,
durationMs: Date.now() - start,
};
},
};
export default populate;
```
- [ ] **Step 2: Create `scenarios/index.ts` registry**
```typescript
// tests/soak/scenarios/index.ts
import type { Scenario } from '../core/types';
import populate from './populate';
const registry: Record<string, Scenario> = {
populate,
};
export function getScenario(name: string): Scenario | undefined {
return registry[name];
}
export function listScenarios(): Scenario[] {
return Object.values(registry);
}
```
- [ ] **Step 3: Syntax-check**
```bash
cd tests/soak
npx tsx -e "import('./scenarios/index').then((m) => console.log(m.listScenarios().map(s => s.name)))"
```
Expected: `['populate']`.
- [ ] **Step 4: Commit**
```bash
git add tests/soak/scenarios/populate.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): populate scenario + scenario registry
Partitions sessions into N rooms, runs gamesPerRoom games per room
in parallel via Promise.allSettled so a failure in one room never
unwinds the others. Errors roll up into ScenarioResult.errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
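The partitioning rule in `run()` can be exercised on its own. A minimal sketch follows, with `partitionIntoRooms` as a hypothetical wrapper that combines the divisibility check with `chunk`; plain strings stand in for `Session` objects, and the guard against a non-positive room count is an addition not present in the scenario code.

```typescript
function chunk<T>(arr: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

/** Splits sessions into `rooms` equal groups, or throws if they don't divide evenly. */
function partitionIntoRooms<T>(sessions: T[], rooms: number): T[][] {
  // Extra guard (assumption): reject zero, negative, or fractional room counts.
  if (!Number.isInteger(rooms) || rooms <= 0) {
    throw new Error(`rooms must be a positive integer, got ${rooms}`);
  }
  const perRoom = Math.floor(sessions.length / rooms);
  if (perRoom * rooms !== sessions.length) {
    throw new Error(
      `${sessions.length} sessions does not divide evenly into ${rooms} rooms`,
    );
  }
  return chunk(sessions, perRoom);
}

// 16 placeholder session keys split across 4 rooms.
const keys = Array.from({ length: 16 }, (_, i) => `soak_${String(i).padStart(2, '0')}`);
const roomGroups = partitionIntoRooms(keys, 4);
console.log(roomGroups.map((g) => g.length));
```

With the default `populate` needs (16 accounts, 4 rooms), each room gets 4 sessions and the first session in each group acts as host.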
---
### Task 17: Config parsing with tests
Merges CLI flags, env vars, scenario defaults, and runner defaults, in that order of precedence (CLI flags win).
**Files:**
- Create: `tests/soak/config.ts`
- Create: `tests/soak/tests/config.test.ts`
- [ ] **Step 1: Write failing tests**
```typescript
// tests/soak/tests/config.test.ts
import { describe, it, expect } from 'vitest';
import { parseArgs, mergeConfig } from '../config';
describe('parseArgs', () => {
it('parses --scenario and numeric flags', () => {
const r = parseArgs(['--scenario=populate', '--rooms=4', '--games-per-room=10']);
expect(r.scenario).toBe('populate');
expect(r.rooms).toBe(4);
expect(r.gamesPerRoom).toBe(10);
});
it('parses watch mode', () => {
const r = parseArgs(['--scenario=populate', '--watch=none']);
expect(r.watch).toBe('none');
});
it('rejects unknown watch mode', () => {
expect(() => parseArgs(['--scenario=populate', '--watch=bogus'])).toThrow();
});
it('--list sets listOnly', () => {
const r = parseArgs(['--list']);
expect(r.listOnly).toBe(true);
});
});
describe('mergeConfig', () => {
it('CLI flags override scenario defaults', () => {
const cfg = mergeConfig(
{ gamesPerRoom: 20 }, // CLI
{}, // env
{ gamesPerRoom: 5, holes: 9 }, // defaults
);
expect(cfg.gamesPerRoom).toBe(20);
});
it('env overrides scenario defaults but not CLI', () => {
const cfg = mergeConfig(
{ games: 5, holes: 9 },
{ SOAK_HOLES: '3' },
{ holes: 7 },
);
expect(cfg.holes).toBe(7); // CLI wins (7 was from scenario defaults? no — CLI not set here)
// Correction: CLI not set, so env wins over scenario default
});
it('scenario defaults fill in unset values', () => {
const cfg = mergeConfig(
{ gamesPerRoom: 3 }, // CLI
{}, // env
{ games: 5, holes: 9 }, // defaults
);
expect(cfg.games).toBe(5);
expect(cfg.holes).toBe(9);
expect(cfg.gamesPerRoom).toBe(3);
});
});
```
Note: the second `mergeConfig` test above is wrong as written. Its inline comments contradict each other, and its assertion does not match the precedence CLI > env > defaults. Corrected version:
```typescript
it('env overrides scenario defaults but CLI overrides env', () => {
const cfg = mergeConfig(
{ holes: 5 }, // CLI
{ SOAK_HOLES: '3' }, // env
{ holes: 9 }, // defaults
);
expect(cfg.holes).toBe(5); // CLI wins
});
```
Replace the second `it(...)` block above with this corrected version before running.
- [ ] **Step 2: Run tests to verify they fail**
```bash
npx vitest run tests/config.test.ts
```
Expected: FAIL — module not found.
- [ ] **Step 3: Implement `config.ts`**
```typescript
// tests/soak/config.ts
export type WatchMode = 'none' | 'dashboard' | 'tiled';
export interface CliArgs {
scenario?: string;
accounts?: number;
rooms?: number;
cpusPerRoom?: number;
gamesPerRoom?: number;
holes?: number;
watch?: WatchMode;
dashboardPort?: number;
target?: string;
runId?: string;
dryRun?: boolean;
listOnly?: boolean;
}
const VALID_WATCH: WatchMode[] = ['none', 'dashboard', 'tiled'];
function parseInt10(s: string, name: string): number {
const n = parseInt(s, 10);
if (Number.isNaN(n)) throw new Error(`Invalid integer for ${name}: ${s}`);
return n;
}
export function parseArgs(argv: string[]): CliArgs {
const out: CliArgs = {};
for (const arg of argv) {
if (arg === '--list') {
out.listOnly = true;
continue;
}
if (arg === '--dry-run') {
out.dryRun = true;
continue;
}
const m = arg.match(/^--([a-z][a-z0-9-]*)=(.*)$/);
if (!m) continue;
const [, key, value] = m;
switch (key) {
case 'scenario':
out.scenario = value;
break;
case 'accounts':
out.accounts = parseInt10(value, '--accounts');
break;
case 'rooms':
out.rooms = parseInt10(value, '--rooms');
break;
case 'cpus-per-room':
out.cpusPerRoom = parseInt10(value, '--cpus-per-room');
break;
case 'games-per-room':
out.gamesPerRoom = parseInt10(value, '--games-per-room');
break;
case 'holes':
out.holes = parseInt10(value, '--holes');
break;
case 'watch':
if (!VALID_WATCH.includes(value as WatchMode)) {
throw new Error(`Invalid --watch value: ${value} (expected ${VALID_WATCH.join('|')})`);
}
out.watch = value as WatchMode;
break;
case 'dashboard-port':
out.dashboardPort = parseInt10(value, '--dashboard-port');
break;
case 'target':
out.target = value;
break;
case 'run-id':
out.runId = value;
break;
default:
// Unknown flag — ignore so scenario-specific flags can be added later
break;
}
}
return out;
}
/**
* Merge in order: scenarioDefaults → env → cli (later wins).
*/
export function mergeConfig(
cli: Record<string, unknown>,
env: Record<string, string | undefined>,
defaults: Record<string, unknown>,
): Record<string, unknown> {
const merged: Record<string, unknown> = { ...defaults };
// Env overlay — SOAK_UPPER_SNAKE → lowerCamel in cli space.
const envMap: Record<string, string> = {
SOAK_HOLES: 'holes',
SOAK_ROOMS: 'rooms',
SOAK_ACCOUNTS: 'accounts',
SOAK_CPUS_PER_ROOM: 'cpusPerRoom',
SOAK_GAMES_PER_ROOM: 'gamesPerRoom',
SOAK_WATCH: 'watch',
SOAK_DASHBOARD_PORT: 'dashboardPort',
};
for (const [envKey, cfgKey] of Object.entries(envMap)) {
const v = env[envKey];
if (v !== undefined) {
// Heuristic: numeric keys
if (/^(holes|rooms|accounts|cpusPerRoom|gamesPerRoom|dashboardPort)$/.test(cfgKey)) {
// Reuse parseInt10 so a non-numeric env value throws instead of becoming NaN.
merged[cfgKey] = parseInt10(v, envKey);
} else {
merged[cfgKey] = v;
}
}
}
// CLI overlay — wins over env and defaults.
for (const [k, v] of Object.entries(cli)) {
if (v !== undefined) merged[k] = v;
}
return merged;
}
```
- [ ] **Step 4: Fix the failing middle test as noted in Step 1**
Edit `tests/soak/tests/config.test.ts` and replace the second `it(...)` block inside `describe('mergeConfig')` with the corrected version provided in Step 1.
- [ ] **Step 5: Run tests to verify they pass**
```bash
npx vitest run tests/config.test.ts
```
Expected: all passing.
- [ ] **Step 6: Commit**
```bash
git add tests/soak/config.ts tests/soak/tests/config.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): CLI parsing + config precedence
parseArgs pulls --scenario/--rooms/--watch/etc from argv, mergeConfig
layers scenarioDefaults → env → CLI so CLI flags always win. Unit
tested.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
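The precedence rule (CLI over env over scenario defaults) comes down to layering objects so that later layers win while `undefined` values are skipped. A stripped-down sketch follows; `layer` is a hypothetical stand-in that omits the `SOAK_*` key mapping `mergeConfig` performs, and the `env` object here is assumed to be already parsed.

```typescript
type Layer = Record<string, unknown>;

/** Later layers win; undefined values never overwrite earlier ones. */
function layer(...layers: Layer[]): Layer {
  const merged: Layer = {};
  for (const l of layers) {
    for (const [k, v] of Object.entries(l)) {
      if (v !== undefined) merged[k] = v;
    }
  }
  return merged;
}

const defaults: Layer = { holes: 9, gamesPerRoom: 10 };
const env: Layer = { holes: 3 }; // e.g. parsed from SOAK_HOLES=3
const cli: Layer = { holes: 5, rooms: undefined }; // --holes=5, --rooms not passed

// Lowest precedence first, highest last.
const resolved = layer(defaults, env, cli);
console.log(resolved);
```

Here `resolved.holes` is 5 (CLI beats env and defaults), `gamesPerRoom` keeps its default of 10, and `rooms` stays absent because the unset CLI value is skipped.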
---
### Task 18: `runner.ts` entry point — first end-to-end milestone
Replaces the placeholder runner with the real thing: parse args, build dependencies, load scenario, acquire sessions, run scenario, clean up, print summary. Supports `--watch=none` only at this stage.
**Files:**
- Modify: `tests/soak/runner.ts` (replace placeholder)
- [ ] **Step 1: Rewrite `runner.ts`**
```typescript
#!/usr/bin/env tsx
/**
* Golf Soak Harness — entry point.
*
* Usage:
* TEST_URL=http://localhost:8000 \
* SOAK_INVITE_CODE=SOAKTEST \
* npm run soak -- --scenario=populate --rooms=1 --accounts=2 \
* --cpus-per-room=0 --games-per-room=1 --holes=1 --watch=none
*/
import * as path from 'path';
import { parseArgs, mergeConfig, CliArgs } from './config';
import { createLogger } from './core/logger';
import { SessionPool } from './core/session-pool';
import { RoomCoordinator } from './core/room-coordinator';
import { getScenario, listScenarios } from './scenarios';
import type { DashboardReporter, ScenarioContext } from './core/types';
function noopDashboard(): DashboardReporter {
return {
update: () => {},
log: () => {},
incrementMetric: () => {},
};
}
function printScenarioList(): void {
console.log('Available scenarios:');
for (const s of listScenarios()) {
console.log(` ${s.name.padEnd(12)} ${s.description}`);
console.log(` needs: accounts=${s.needs.accounts}, rooms=${s.needs.rooms ?? 1}, cpus=${s.needs.cpusPerRoom ?? 0}`);
}
}
async function main(): Promise<void> {
const cli: CliArgs = parseArgs(process.argv.slice(2));
if (cli.listOnly) {
printScenarioList();
return;
}
if (!cli.scenario) {
console.error('Error: --scenario=<name> is required. Use --list to see scenarios.');
process.exit(2);
}
const scenario = getScenario(cli.scenario);
if (!scenario) {
console.error(`Error: unknown scenario "${cli.scenario}". Use --list to see scenarios.`);
process.exit(2);
}
const runId = cli.runId ?? `${cli.scenario}-${new Date().toISOString().replace(/[:.]/g, '-')}`;
const targetUrl = cli.target ?? process.env.TEST_URL ?? 'http://localhost:8000';
const inviteCode = process.env.SOAK_INVITE_CODE ?? 'SOAKTEST';
const watch = cli.watch ?? 'dashboard';
const logger = createLogger({ runId });
logger.info('run_start', {
scenario: scenario.name,
targetUrl,
watch,
cli,
});
// Resolve final config
const config = mergeConfig(
cli as Record<string, unknown>,
process.env,
scenario.defaultConfig,
);
// Ensure core knobs exist
const accounts = Number(config.accounts ?? scenario.needs.accounts);
const rooms = Number(config.rooms ?? scenario.needs.rooms ?? 1);
const cpusPerRoom = Number(config.cpusPerRoom ?? scenario.needs.cpusPerRoom ?? 0);
if (accounts % rooms !== 0) {
console.error(`Error: --accounts=${accounts} does not divide evenly into --rooms=${rooms}`);
process.exit(2);
}
config.rooms = rooms;
config.cpusPerRoom = cpusPerRoom;
if (cli.dryRun) {
logger.info('dry_run', { config });
console.log('Dry run OK. Resolved config:');
console.log(JSON.stringify(config, null, 2));
return;
}
if (watch !== 'none') {
logger.warn('watch_mode_not_yet_implemented', { watch });
console.warn(`Watch mode "${watch}" not yet implemented — falling back to "none".`);
}
// Build dependencies
const credFile = path.resolve(__dirname, '.env.stresstest');
const pool = new SessionPool({
targetUrl,
inviteCode,
credFile,
logger,
});
const coordinator = new RoomCoordinator();
const dashboard = noopDashboard();
const abortController = new AbortController();
const onSignal = (sig: string) => {
logger.warn('signal_received', { signal: sig });
abortController.abort();
};
process.on('SIGINT', () => onSignal('SIGINT'));
process.on('SIGTERM', () => onSignal('SIGTERM'));
let exitCode = 0;
try {
const sessions = await pool.acquire(accounts);
logger.info('sessions_acquired', { count: sessions.length });
const ctx: ScenarioContext = {
config,
sessions,
coordinator,
dashboard,
logger,
signal: abortController.signal,
heartbeat: () => {}, // Task 26 wires this up
};
const result = await scenario.run(ctx);
logger.info('run_complete', {
gamesCompleted: result.gamesCompleted,
errors: result.errors.length,
durationMs: result.durationMs,
});
console.log(`Games completed: ${result.gamesCompleted}`);
console.log(`Errors: ${result.errors.length}`);
console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);
if (result.errors.length > 0) {
console.log('Errors:');
for (const e of result.errors) {
console.log(` ${e.room}: ${e.reason}${e.detail ? ' — ' + e.detail : ''}`);
}
exitCode = 1;
}
} catch (err) {
logger.error('run_failed', {
error: err instanceof Error ? err.message : String(err),
stack: err instanceof Error ? err.stack : undefined,
});
exitCode = 1;
} finally {
await pool.release();
}
if (abortController.signal.aborted && exitCode === 0) exitCode = 2;
process.exit(exitCode);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
```
- [ ] **Step 2: Run a minimal `--watch=none` smoke against local dev**
With the dev server running and the 4 soak accounts from Task 14 already seeded:
```bash
cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=2 \
--rooms=1 \
--cpus-per-room=0 \
--games-per-room=1 \
--holes=1 \
--watch=none
```
Expected output (abbreviated):
```
{"timestamp":"...","level":"info","msg":"run_start",...}
{"timestamp":"...","level":"info","msg":"sessions_acquired","count":2}
{"timestamp":"...","level":"info","msg":"game_start","room":"room-0","game":1}
{"timestamp":"...","level":"info","msg":"room_created","code":"XXXX"}
{"timestamp":"...","level":"info","msg":"game_complete","room":"room-0","turns":...}
{"timestamp":"...","level":"info","msg":"run_complete","gamesCompleted":1,"errors":0}
Games completed: 1
Errors: 0
Duration: X.Xs
```
Exit code 0.
This is the first **end-to-end milestone**. Stop here if debugging is needed — fix issues before moving on.
- [ ] **Step 3: Commit**
```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): runner.ts end-to-end with --watch=none
First full end-to-end milestone: parses CLI, builds SessionPool +
RoomCoordinator, loads a scenario by name, runs it, reports results,
cleans up. Watch modes other than "none" log a warning and fall back
until Tasks 19-24 implement them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
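The runner's post-run exit codes follow from the logic in `main()`: 0 for a clean run, 1 when the scenario reported errors or threw, and 2 when a signal interrupted a run that otherwise had no errors (2 is also used earlier for usage errors). A sketch of that decision as a pure function follows; `resolveExitCode` and `RunOutcome` are hypothetical names for illustration, not something the plan defines.

```typescript
interface RunOutcome {
  /** Number of entries in ScenarioResult.errors. */
  errorCount: number;
  /** True if scenario.run() or session acquisition threw. */
  threw: boolean;
  /** True if SIGINT/SIGTERM aborted the run. */
  aborted: boolean;
}

/** Mirrors the exitCode bookkeeping at the end of main(). */
function resolveExitCode(o: RunOutcome): 0 | 1 | 2 {
  // Errors take priority: an aborted run with errors still exits 1.
  if (o.threw || o.errorCount > 0) return 1;
  return o.aborted ? 2 : 0;
}

console.log(resolveExitCode({ errorCount: 0, threw: false, aborted: false }));
```

Keeping this as a pure function would make the exit-code contract unit-testable alongside `config.test.ts`.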
---
## Phase 5 — Dashboard status grid
### Task 19: Dashboard HTTP + WS server
Uses Node's built-in `http` module plus the `ws` package. Serves one static HTML page, accepts WS connections, broadcasts room-state updates.
**Files:**
- Create: `tests/soak/dashboard/server.ts`
- [ ] **Step 1: Implement `dashboard/server.ts`**
```typescript
// tests/soak/dashboard/server.ts
import * as http from 'http';
import * as fs from 'fs';
import * as path from 'path';
import { WebSocketServer, WebSocket } from 'ws';
import type { DashboardReporter, Logger, RoomState } from '../core/types';
export type DashboardIncoming =
| { type: 'start_stream'; sessionKey: string }
| { type: 'stop_stream'; sessionKey: string };
export type DashboardOutgoing =
| { type: 'room_state'; roomId: string; state: Partial<RoomState> }
| { type: 'log'; level: string; msg: string; meta?: object; timestamp: number }
| { type: 'metric'; name: string; value: number }
| { type: 'frame'; sessionKey: string; jpegBase64: string };
export interface DashboardHandlers {
onStartStream?(sessionKey: string): void;
onStopStream?(sessionKey: string): void;
onDisconnect?(): void;
}
export class DashboardServer {
private httpServer!: http.Server;
private wsServer!: WebSocketServer;
private clients = new Set<WebSocket>();
private metrics: Record<string, number> = {};
private roomStates: Record<string, Partial<RoomState>> = {};
constructor(
private port: number,
private logger: Logger,
private handlers: DashboardHandlers = {},
) {}
async start(): Promise<void> {
const htmlPath = path.resolve(__dirname, 'index.html');
const cssPath = path.resolve(__dirname, 'dashboard.css');
const jsPath = path.resolve(__dirname, 'dashboard.js');
this.httpServer = http.createServer((req, res) => {
const url = req.url ?? '/';
if (url === '/' || url === '/index.html') {
res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
fs.createReadStream(htmlPath).pipe(res);
} else if (url === '/dashboard.css') {
res.writeHead(200, { 'Content-Type': 'text/css' });
fs.createReadStream(cssPath).pipe(res);
} else if (url === '/dashboard.js') {
res.writeHead(200, { 'Content-Type': 'application/javascript' });
fs.createReadStream(jsPath).pipe(res);
} else {
res.writeHead(404);
res.end('not found');
}
});
this.wsServer = new WebSocketServer({ server: this.httpServer });
this.wsServer.on('connection', (ws) => {
this.clients.add(ws);
this.logger.info('dashboard_client_connected', { count: this.clients.size });
// Replay current state to the new client
for (const [roomId, state] of Object.entries(this.roomStates)) {
ws.send(JSON.stringify({ type: 'room_state', roomId, state } as DashboardOutgoing));
}
for (const [name, value] of Object.entries(this.metrics)) {
ws.send(JSON.stringify({ type: 'metric', name, value } as DashboardOutgoing));
}
ws.on('message', (data) => {
try {
const parsed = JSON.parse(data.toString()) as DashboardIncoming;
if (parsed.type === 'start_stream' && this.handlers.onStartStream) {
this.handlers.onStartStream(parsed.sessionKey);
} else if (parsed.type === 'stop_stream' && this.handlers.onStopStream) {
this.handlers.onStopStream(parsed.sessionKey);
}
} catch (err) {
this.logger.warn('dashboard_ws_parse_error', {
error: err instanceof Error ? err.message : String(err),
});
}
});
ws.on('close', () => {
this.clients.delete(ws);
this.logger.info('dashboard_client_disconnected', { count: this.clients.size });
if (this.clients.size === 0 && this.handlers.onDisconnect) {
this.handlers.onDisconnect();
}
});
});
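await new Promise<void>((resolve, reject) => {
// Surface listen failures (e.g. port already in use) as a rejection so the caller's cleanup runs
this.httpServer.once('error', reject);
this.httpServer.listen(this.port, () => {
this.httpServer.off('error', reject);
resolve();
});
});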
this.logger.info('dashboard_listening', { url: `http://localhost:${this.port}` });
}
async stop(): Promise<void> {
for (const ws of this.clients) {
try {
ws.close();
} catch {
// ignore
}
}
this.clients.clear();
await new Promise<void>((resolve) => {
this.wsServer.close(() => resolve());
});
await new Promise<void>((resolve) => {
this.httpServer.close(() => resolve());
});
}
broadcast(msg: DashboardOutgoing): void {
const payload = JSON.stringify(msg);
for (const ws of this.clients) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(payload);
}
}
}
/** Create a DashboardReporter wired to this server. */
reporter(): DashboardReporter {
return {
update: (roomId, state) => {
this.roomStates[roomId] = { ...this.roomStates[roomId], ...state };
this.broadcast({ type: 'room_state', roomId, state });
},
log: (level, msg, meta) => {
this.broadcast({ type: 'log', level, msg, meta, timestamp: Date.now() });
},
incrementMetric: (name, by = 1) => {
this.metrics[name] = (this.metrics[name] ?? 0) + by;
this.broadcast({ type: 'metric', name, value: this.metrics[name] });
},
};
}
}
```
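The replay-on-connect behaviour depends on `reporter().update` merging each partial update into the cached room state. A minimal sketch of that merge in isolation follows; the cut-down `RoomStateSketch` type is an assumption made here for illustration (the real `RoomState` lives in `core/types` and may differ):

```typescript
// Cut-down stand-in for RoomState; the real type in core/types may differ.
interface RoomStateSketch {
  phase: string;
  moves: number;
  hole: number;
  totalHoles: number;
}

type RoomCache = Record<string, Partial<RoomStateSketch>>;

// Same shallow merge the reporter performs: later fields win, and fields from
// earlier updates are preserved for clients that connect late.
function mergeRoomUpdate(
  cache: RoomCache,
  roomId: string,
  update: Partial<RoomStateSketch>,
): RoomCache {
  return { ...cache, [roomId]: { ...cache[roomId], ...update } };
}

let roomCache: RoomCache = {};
roomCache = mergeRoomUpdate(roomCache, 'room-0', { phase: 'lobby', totalHoles: 3 });
roomCache = mergeRoomUpdate(roomCache, 'room-0', { phase: 'playing', hole: 1 });
roomCache = mergeRoomUpdate(roomCache, 'room-0', { moves: 12 });

// A client connecting now would be sent one room_state message with all fields.
console.log(JSON.stringify(roomCache['room-0']));
```

Because the merge is shallow, array-valued fields such as `players` are replaced wholesale rather than merged element by element.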
- [ ] **Step 2: Syntax-check**
```bash
cd tests/soak
npx tsx -e "import('./dashboard/server').then(() => console.log('ok'))"
```
Expected: `ok`.
- [ ] **Step 3: Commit**
```bash
git add tests/soak/dashboard/server.ts
git commit -m "$(cat <<'EOF'
feat(soak): DashboardServer — vanilla http + ws
Serves one static HTML page, accepts WS connections, broadcasts
room_state/log/metric messages to all clients. Exposes a
reporter() method that returns a DashboardReporter scenarios can
call without knowing about sockets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
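The `message` handler in `server.ts` casts the parsed JSON straight to `DashboardIncoming`, so a payload with a valid `type` but no `sessionKey` would still reach the handlers. One possible hardening is sketched below with a hypothetical `parseIncoming` helper; the message shapes are copied from `server.ts`, but the helper itself is not part of the plan:

```typescript
type IncomingSketch =
  | { type: 'start_stream'; sessionKey: string }
  | { type: 'stop_stream'; sessionKey: string };

// Returns a validated message, or null for anything that does not match.
function parseIncoming(raw: string): IncomingSketch | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;
  const { type: kind, sessionKey } = value as { type?: unknown; sessionKey?: unknown };
  if (typeof sessionKey !== 'string' || sessionKey.length === 0) return null;
  if (kind === 'start_stream') return { type: 'start_stream', sessionKey };
  if (kind === 'stop_stream') return { type: 'stop_stream', sessionKey };
  return null;
}

console.log(parseIncoming('{"type":"start_stream","sessionKey":"soak_00"}'));
console.log(parseIncoming('{"type":"start_stream"}')); // null
```

Returning `null` instead of throwing keeps the existing `try`/`catch` in the handler for genuinely broken JSON while letting the caller log and drop unknown messages.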
---
### Task 20: Dashboard HTML/CSS/JS status grid
Single static HTML page + stylesheet + client script. Renders the 2×2 room grid, subscribes to WS, updates tiles on each message.
**Files:**
- Create: `tests/soak/dashboard/index.html`
- Create: `tests/soak/dashboard/dashboard.css`
- Create: `tests/soak/dashboard/dashboard.js`
- [ ] **Step 1: Create `dashboard/index.html`**
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Golf Soak Dashboard</title>
<link rel="stylesheet" href="/dashboard.css">
</head>
<body>
<header class="dash-header">
<h1>⛳ Golf Soak Dashboard</h1>
<div class="meta">
<span id="run-id">run —</span>
<span id="elapsed">00:00:00</span>
</div>
</header>
<div class="meta-bar">
<div class="stat"><span class="label">Games</span><span id="metric-games">0</span></div>
<div class="stat"><span class="label">Moves</span><span id="metric-moves">0</span></div>
<div class="stat"><span class="label">Errors</span><span id="metric-errors">0</span></div>
<div class="stat"><span class="label">WS</span><span id="ws-status">connecting</span></div>
</div>
<div class="rooms" id="rooms">
<!-- Room tiles injected by dashboard.js -->
</div>
<section class="log">
<div class="log-header">Activity Log</div>
<ul id="log-list"></ul>
</section>
<!-- Modal for focused live video (Task 23) -->
<div id="video-modal" class="video-modal hidden">
<div class="video-modal-content">
<div class="video-modal-header">
<span id="video-modal-title">Watching —</span>
<button id="video-modal-close">Close</button>
</div>
<img id="video-frame" alt="Live screencast" />
</div>
</div>
<script src="/dashboard.js"></script>
</body>
</html>
```
- [ ] **Step 2: Create `dashboard/dashboard.css`**
```css
:root {
--bg: #0a0e16;
--panel: #0e1420;
--border: #1a2230;
--text: #c8d4e4;
--accent: #7fbaff;
--good: #6fd08f;
--warn: #ffb84d;
--err: #ff5c6c;
--muted: #556577;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: -apple-system, system-ui, 'SF Mono', Consolas, monospace;
background: var(--bg);
color: var(--text);
}
.dash-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 20px;
background: linear-gradient(135deg, #0f1823, #0a1018);
border-bottom: 1px solid var(--border);
}
.dash-header h1 { margin: 0; font-size: 16px; color: var(--accent); }
.dash-header .meta { font-size: 11px; color: var(--muted); }
.dash-header .meta span + span { margin-left: 12px; }
.meta-bar {
display: flex;
gap: 24px;
padding: 10px 20px;
background: #0c131d;
border-bottom: 1px solid var(--border);
font-size: 12px;
}
.meta-bar .stat .label { color: var(--muted); margin-right: 6px; }
.meta-bar .stat span:last-child { color: #fff; font-weight: 600; }
.rooms {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1px;
background: var(--border);
}
.room {
background: var(--panel);
padding: 14px 18px;
min-height: 180px;
}
.room-title {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 10px;
}
.room-title .name { font-size: 13px; color: var(--accent); font-weight: 600; }
.room-title .phase {
font-size: 10px;
padding: 2px 8px;
border-radius: 10px;
background: #1a3a2a;
color: var(--good);
}
.room-title .phase.lobby { background: #3a2a1a; color: var(--warn); }
.room-title .phase.err { background: #3a1a1a; color: var(--err); }
.players {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 4px;
font-size: 11px;
margin-bottom: 8px;
}
.player {
display: flex;
justify-content: space-between;
padding: 4px 8px;
background: #0a0f18;
border-radius: 3px;
cursor: pointer;
border: 1px solid transparent;
}
.player:hover { border-color: var(--accent); }
.player.active {
background: #1a2a40;
border-left: 2px solid var(--accent);
}
.player .score { color: var(--muted); }
.progress-bar {
height: 4px;
background: var(--border);
border-radius: 2px;
overflow: hidden;
margin-top: 6px;
}
.progress-fill {
height: 100%;
background: linear-gradient(90deg, var(--accent), var(--good));
transition: width 0.3s;
}
.room-meta {
font-size: 10px;
color: var(--muted);
display: flex;
gap: 12px;
margin-top: 6px;
}
.log {
border-top: 1px solid var(--border);
background: #080c13;
max-height: 160px;
overflow-y: auto;
}
.log .log-header {
padding: 6px 20px;
font-size: 10px;
text-transform: uppercase;
color: var(--muted);
border-bottom: 1px solid var(--border);
}
.log ul { list-style: none; margin: 0; padding: 4px 20px; font-size: 10px; }
.log li { line-height: 1.5; font-family: monospace; color: var(--muted); }
.log li.warn { color: var(--warn); }
.log li.error { color: var(--err); }
.video-modal {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.85);
display: flex;
align-items: center;
justify-content: center;
z-index: 100;
}
.video-modal.hidden { display: none; }
.video-modal-content {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 6px;
padding: 16px;
max-width: 90vw;
max-height: 90vh;
}
.video-modal-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 12px;
color: var(--accent);
font-size: 13px;
}
.video-modal-header button {
background: var(--border);
color: var(--text);
border: none;
padding: 4px 12px;
border-radius: 3px;
cursor: pointer;
}
#video-frame {
display: block;
max-width: 100%;
max-height: 70vh;
border: 1px solid var(--border);
}
```
- [ ] **Step 3: Create `dashboard/dashboard.js`**
```javascript
// tests/soak/dashboard/dashboard.js
(() => {
const ws = new WebSocket(`ws://${location.host}`);
const roomsEl = document.getElementById('rooms');
const logEl = document.getElementById('log-list');
const wsStatusEl = document.getElementById('ws-status');
const metricGames = document.getElementById('metric-games');
const metricMoves = document.getElementById('metric-moves');
const metricErrors = document.getElementById('metric-errors');
const elapsedEl = document.getElementById('elapsed');
const roomTiles = new Map();
const startTime = Date.now();
let currentWatchedKey = null;
// Video modal
const videoModal = document.getElementById('video-modal');
const videoFrame = document.getElementById('video-frame');
const videoTitle = document.getElementById('video-modal-title');
const videoClose = document.getElementById('video-modal-close');
function fmtElapsed(ms) {
const s = Math.floor(ms / 1000);
const h = Math.floor(s / 3600);
const m = Math.floor((s % 3600) / 60);
const sec = s % 60;
return `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}:${String(sec).padStart(2, '0')}`;
}
setInterval(() => {
elapsedEl.textContent = fmtElapsed(Date.now() - startTime);
}, 1000);
function ensureRoomTile(roomId) {
if (roomTiles.has(roomId)) return roomTiles.get(roomId);
const tile = document.createElement('div');
tile.className = 'room';
tile.innerHTML = `
<div class="room-title">
<div class="name">${roomId}</div>
<div class="phase lobby">waiting</div>
</div>
<div class="players"></div>
<div class="progress-bar"><div class="progress-fill" style="width:0%"></div></div>
<div class="room-meta">
<span class="moves">0 moves</span>
<span class="game">game —</span>
</div>
`;
roomsEl.appendChild(tile);
roomTiles.set(roomId, tile);
return tile;
}
function renderRoomState(roomId, state) {
const tile = ensureRoomTile(roomId);
if (state.phase !== undefined) {
const phaseEl = tile.querySelector('.phase');
phaseEl.textContent = state.phase;
phaseEl.classList.toggle('lobby', state.phase === 'lobby' || state.phase === 'waiting');
phaseEl.classList.toggle('err', state.phase === 'error');
}
if (state.players !== undefined) {
const playersEl = tile.querySelector('.players');
playersEl.innerHTML = state.players
.map(
(p) => `
<div class="player ${p.isActive ? 'active' : ''}" data-session="${p.key}">
<span>${p.isActive ? '▶ ' : ''}${p.key}</span>
<span class="score">${p.score ?? '—'}</span>
</div>
`,
)
.join('');
}
if (state.hole !== undefined && state.totalHoles !== undefined) {
const fill = tile.querySelector('.progress-fill');
const pct = state.totalHoles > 0 ? Math.round((state.hole / state.totalHoles) * 100) : 0;
fill.style.width = `${pct}%`;
}
if (state.moves !== undefined) {
tile.querySelector('.moves').textContent = `${state.moves} moves`;
}
if (state.game !== undefined && state.totalGames !== undefined) {
tile.querySelector('.game').textContent = `game ${state.game}/${state.totalGames}`;
}
}
function appendLog(level, msg, meta) {
const li = document.createElement('li');
li.className = level;
const ts = new Date().toLocaleTimeString();
li.textContent = `[${ts}] ${msg} ${meta ? JSON.stringify(meta) : ''}`;
logEl.insertBefore(li, logEl.firstChild);
// Cap log length
while (logEl.children.length > 100) {
logEl.removeChild(logEl.lastChild);
}
}
function applyMetric(name, value) {
if (name === 'games_completed') metricGames.textContent = value;
else if (name === 'moves_total') metricMoves.textContent = value;
else if (name === 'errors') metricErrors.textContent = value;
}
ws.addEventListener('open', () => {
wsStatusEl.textContent = 'healthy';
wsStatusEl.style.color = 'var(--good)';
});
ws.addEventListener('close', () => {
wsStatusEl.textContent = 'disconnected';
wsStatusEl.style.color = 'var(--err)';
});
ws.addEventListener('message', (event) => {
let msg;
try {
msg = JSON.parse(event.data);
} catch {
return;
}
if (msg.type === 'room_state') {
renderRoomState(msg.roomId, msg.state);
} else if (msg.type === 'log') {
appendLog(msg.level, msg.msg, msg.meta);
} else if (msg.type === 'metric') {
applyMetric(msg.name, msg.value);
} else if (msg.type === 'frame') {
if (msg.sessionKey === currentWatchedKey) {
videoFrame.src = `data:image/jpeg;base64,${msg.jpegBase64}`;
}
}
});
// Click-to-watch (wired in Task 23)
roomsEl.addEventListener('click', (e) => {
const playerEl = e.target.closest('.player');
if (!playerEl) return;
const key = playerEl.dataset.session;
if (!key) return;
currentWatchedKey = key;
videoTitle.textContent = `Watching ${key}`;
videoModal.classList.remove('hidden');
ws.send(JSON.stringify({ type: 'start_stream', sessionKey: key }));
});
function closeVideo() {
if (currentWatchedKey) {
ws.send(JSON.stringify({ type: 'stop_stream', sessionKey: currentWatchedKey }));
}
currentWatchedKey = null;
videoModal.classList.add('hidden');
videoFrame.src = '';
}
videoClose.addEventListener('click', closeVideo);
document.addEventListener('keydown', (e) => {
if (e.key === 'Escape') closeVideo();
});
})();
```
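`ensureRoomTile` and `renderRoomState` interpolate `roomId` and `p.key` into `innerHTML`. That is fine while both values are generated by the harness, but if either could ever carry arbitrary text, escaping first would be safer. A minimal sketch of such an escaper follows; `escapeHtml` and `playerRowHtml` are hypothetical helpers, written in TypeScript to match the rest of the harness (in `dashboard.js` they would be used without the type annotations):

```typescript
const HTML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Same markup shape as the player rows in renderRoomState, with the key escaped.
function playerRowHtml(sessionKey: string): string {
  const safeKey = escapeHtml(sessionKey);
  return `<div class="player" data-session="${safeKey}"><span>${safeKey}</span></div>`;
}

console.log(playerRowHtml('soak_00'));
console.log(playerRowHtml('<img src=x onerror=alert(1)>'));
```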
- [ ] **Step 4: Commit**
```bash
git add tests/soak/dashboard/index.html tests/soak/dashboard/dashboard.css tests/soak/dashboard/dashboard.js
git commit -m "$(cat <<'EOF'
feat(soak): dashboard status grid UI
Static HTML page served by DashboardServer. Renders the 2×2 room
grid with progress bars and player tiles, subscribes to WS events,
updates tiles live. The click-to-watch modal is wired but receives
no frames until the CDP screencaster is connected in Tasks 22-23.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 21: Wire `WATCH=dashboard` in runner
When `--watch=dashboard` is set, start the dashboard server, auto-open its URL in the user's browser, and use its `reporter()` as `ctx.dashboard`.
**Files:**
- Modify: `tests/soak/runner.ts`
- [ ] **Step 1: Import and instantiate DashboardServer in `runner.ts`**
At the top of `runner.ts`, add:
```typescript
import { DashboardServer } from './dashboard/server';
import { spawn } from 'child_process';
```
Replace the block that creates `dashboard` with:
```typescript
// Build dashboard if requested
let dashboardServer: DashboardServer | null = null;
let dashboard: DashboardReporter = noopDashboard();
if (watch === 'dashboard') {
  const port = Number(config.dashboardPort ?? 7777);
  dashboardServer = new DashboardServer(port, logger, {
    onStartStream: (key) => {
      logger.info('stream_start_requested', { sessionKey: key });
      // Wired in Task 23
    },
    onStopStream: (key) => {
      logger.info('stream_stop_requested', { sessionKey: key });
    },
  });
  await dashboardServer.start();
  dashboard = dashboardServer.reporter();
  const url = `http://localhost:${port}`;
  console.log(`Dashboard: ${url}`);
  // Best-effort auto-open. `start` is a cmd.exe builtin, so Windows goes through `cmd /c`.
  const [opener, openerArgs]: [string, string[]] =
    process.platform === 'darwin' ? ['open', [url]]
    : process.platform === 'win32' ? ['cmd', ['/c', 'start', '', url]]
    : ['xdg-open', [url]];
  try {
    const child = spawn(opener, openerArgs, { stdio: 'ignore', detached: true });
    // A missing launcher surfaces as an async 'error' event, not a throw
    child.on('error', () => { /* the URL is already printed */ });
    child.unref();
  } catch {
    // If auto-open fails, the URL is already printed
  }
} else if (watch === 'tiled') {
  logger.warn('tiled_not_yet_implemented');
  console.warn('Watch mode "tiled" not yet implemented (Task 24). Falling back to none.');
}
```
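The auto-open step depends on a per-platform launcher. A sketch of that selection as a pure function follows; `openerCommand` is a hypothetical helper name. It assumes `open` on macOS and `xdg-open` on Linux are on `PATH`, and it routes Windows through `cmd /c start` because `start` is a `cmd.exe` builtin rather than an executable that `spawn` can launch directly:

```typescript
interface OpenerCommand {
  command: string;
  args: string[];
}

function openerCommand(platform: string, url: string): OpenerCommand {
  switch (platform) {
    case 'darwin':
      return { command: 'open', args: [url] };
    case 'win32':
      // The empty string fills the window-title slot that `start` treats
      // specially, so the URL is never mistaken for a title.
      return { command: 'cmd', args: ['/c', 'start', '', url] };
    default:
      return { command: 'xdg-open', args: [url] };
  }
}

const dashboardOpener = openerCommand(process.platform, 'http://localhost:7777');
console.log(`${dashboardOpener.command} ${dashboardOpener.args.join(' ')}`);
```

The result would be passed to `spawn(command, args, { stdio: 'ignore', detached: true })` with an `'error'` listener attached, since a missing launcher is reported asynchronously rather than thrown.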
And in the `finally` block, shut down the server:
```typescript
} finally {
await pool.release();
if (dashboardServer) {
await dashboardServer.stop();
}
}
```
Also remove the earlier `if (watch !== 'none')` warning block — it's replaced by the dispatch above.
- [ ] **Step 2: Run smoke against dev with dashboard**
```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=2 --rooms=1 --cpus-per-room=0 --games-per-room=1 --holes=1 \
--watch=dashboard
```
Expected:
- `Dashboard: http://localhost:7777` printed
- Browser auto-opens (or you open it manually)
- Page shows the dashboard with `WS: healthy`
- During the game, the `room-0` tile shows `phase: playing`, increments `moves`, updates progress
- After game completes, the runner exits 0 and the dashboard stops
- [ ] **Step 3: Commit**
```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): wire --watch=dashboard in runner
Starts DashboardServer on 7777 (configurable), uses its reporter as
ctx.dashboard, auto-opens the URL. Cleans up on exit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 6 — Live video click-to-watch
### Task 22: CDP screencast module
Attach a CDP session to a given page, start a JPEG screencast that delivers every Nth rendered frame, forward each frame to a callback, and detach on stop.
**Files:**
- Create: `tests/soak/core/screencaster.ts`
- [ ] **Step 1: Implement `core/screencaster.ts`**
```typescript
// tests/soak/core/screencaster.ts
import type { Page, CDPSession } from 'playwright-core';
import type { Logger } from './types';
export interface ScreencastOptions {
format?: 'jpeg' | 'png';
quality?: number;
maxWidth?: number;
maxHeight?: number;
everyNthFrame?: number;
}
export type FrameCallback = (jpegBase64: string) => void;
export class Screencaster {
private sessions = new Map<string, CDPSession>();
constructor(private logger: Logger) {}
/**
* Attach a CDP session to the given page and start forwarding frames.
* If already streaming, this is a no-op.
*/
async start(
sessionKey: string,
page: Page,
onFrame: FrameCallback,
opts: ScreencastOptions = {},
): Promise<void> {
if (this.sessions.has(sessionKey)) {
this.logger.warn('screencast_already_running', { sessionKey });
return;
}
const client = await page.context().newCDPSession(page);
this.sessions.set(sessionKey, client);
client.on('Page.screencastFrame', async (evt: { data: string; sessionId: number }) => {
try {
onFrame(evt.data);
await client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
} catch (err) {
this.logger.warn('screencast_frame_error', {
sessionKey,
error: err instanceof Error ? err.message : String(err),
});
}
});
await client.send('Page.startScreencast', {
format: opts.format ?? 'jpeg',
quality: opts.quality ?? 60,
maxWidth: opts.maxWidth ?? 640,
maxHeight: opts.maxHeight ?? 360,
everyNthFrame: opts.everyNthFrame ?? 2,
});
this.logger.info('screencast_started', { sessionKey });
}
async stop(sessionKey: string): Promise<void> {
const client = this.sessions.get(sessionKey);
if (!client) return;
try {
await client.send('Page.stopScreencast');
await client.detach();
} catch (err) {
this.logger.warn('screencast_stop_error', {
sessionKey,
error: err instanceof Error ? err.message : String(err),
});
}
this.sessions.delete(sessionKey);
this.logger.info('screencast_stopped', { sessionKey });
}
async stopAll(): Promise<void> {
const keys = Array.from(this.sessions.keys());
await Promise.all(keys.map((k) => this.stop(k)));
}
}
```
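`everyNthFrame` thins the stream relative to how often the page repaints, so the delivered frame rate varies with page activity. If a hard cap is wanted, one option is to wrap the frame callback in a time-based throttle. A minimal sketch follows; `throttleFrames` and its injectable `now` clock are assumptions made for illustration and are not part of `screencaster.ts`:

```typescript
type FrameSink = (jpegBase64: string) => void;

// Wraps a sink so it receives at most one frame per `minIntervalMs`.
// Frames arriving too early are dropped, never queued.
function throttleFrames(
  sink: FrameSink,
  minIntervalMs: number,
  now: () => number = Date.now,
): FrameSink {
  let lastSent = Number.NEGATIVE_INFINITY;
  return (jpegBase64) => {
    const t = now();
    if (t - lastSent < minIntervalMs) return;
    lastSent = t;
    sink(jpegBase64);
  };
}

// Usage with a fake clock: frames at 0, 50, 100 and 250 ms with a 100 ms cap.
let fakeClockMs = 0;
const deliveredFrames: string[] = [];
const cappedSink = throttleFrames((f) => { deliveredFrames.push(f); }, 100, () => fakeClockMs);
for (const [at, frame] of [[0, 'a'], [50, 'b'], [100, 'c'], [250, 'd']] as const) {
  fakeClockMs = at;
  cappedSink(frame);
}
console.log(deliveredFrames.join(',')); // a,c,d
```

Dropping inside the callback leaves the `Page.screencastFrameAck` call in `Screencaster.start` untouched, so the browser keeps producing frames while the dashboard sees fewer of them.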
- [ ] **Step 2: Syntax-check**
```bash
cd tests/soak
npx tsx -e "import('./core/screencaster').then(() => console.log('ok'))"
```
Expected: `ok`.
- [ ] **Step 3: Commit**
```bash
git add tests/soak/core/screencaster.ts
git commit -m "$(cat <<'EOF'
feat(soak): Screencaster — CDP Page.startScreencast wrapper
Attach/detach CDP sessions per Playwright Page, start/stop JPEG
screencasts with configurable quality and frame rate, forward each
frame to a callback. Used by the dashboard for click-to-watch
live video.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 23: Wire screencaster to dashboard click-to-watch
Runner creates a `Screencaster`, passes callbacks into `DashboardServer.onStartStream/onStopStream` that look up the right session and start/stop streaming. Each frame is broadcast to the dashboard.
**Files:**
- Modify: `tests/soak/runner.ts`
- [ ] **Step 1: Import Screencaster and hold a sessions map**
In `runner.ts`, add at the top:
```typescript
import { Screencaster } from './core/screencaster';
```
After `const sessions = await pool.acquire(accounts);`, build a lookup map:
```typescript
const sessionsByKey = new Map<string, typeof sessions[number]>();
for (const s of sessions) sessionsByKey.set(s.key, s);
```
Create the screencaster before the dashboard (or right after sessions are acquired):
```typescript
const screencaster = new Screencaster(logger);
```
- [ ] **Step 2: Replace the `onStartStream`/`onStopStream` no-ops with real wiring**
The handlers need to close over `screencaster` and `sessionsByKey`, both of which are built after sessions are acquired. Move the `DashboardServer` construction to AFTER `sessions = await pool.acquire(accounts)`, then replace the no-op handlers:
```typescript
if (watch === 'dashboard') {
  const port = Number(config.dashboardPort ?? 7777);
  dashboardServer = new DashboardServer(port, logger, {
    onStartStream: (key) => {
      const session = sessionsByKey.get(key);
      if (!session) {
        logger.warn('stream_start_unknown_session', { sessionKey: key });
        return;
      }
      screencaster
        .start(key, session.page, (jpegBase64) => {
          dashboardServer!.broadcast({ type: 'frame', sessionKey: key, jpegBase64 });
        })
        .catch((err) =>
          logger.error('screencast_start_failed', {
            sessionKey: key,
            error: err instanceof Error ? err.message : String(err),
          }),
        );
    },
    onStopStream: (key) => {
      screencaster.stop(key).catch(() => {});
    },
    onDisconnect: () => {
      screencaster.stopAll().catch(() => {});
    },
  });
  await dashboardServer.start();
  dashboard = dashboardServer.reporter();
  const url = `http://localhost:${port}`;
  console.log(`Dashboard: ${url}`);
  // ... auto-open
}
```
Make sure the `ctx.dashboard` assignment happens AFTER the dashboard setup (it already does — `const ctx = { ... dashboard, ... }` comes later).
In the `finally` block, before `pool.release()` so the pages are still open when the CDP sessions detach, add:
```typescript
await screencaster.stopAll();
```
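With more than one dashboard tab open, this wiring lets one client's `stop_stream` end a stream that another client is still watching, because frames are broadcast to every client and `screencaster.stop` is keyed only by session. If that matters, one option is to count viewers per session and start or stop the screencast only for the first and last viewer. A sketch with a hypothetical `ViewerCounts` class (not part of the plan):

```typescript
class ViewerCounts {
  private counts = new Map<string, number>();

  /** Returns true when this is the first viewer, i.e. the stream should start. */
  add(sessionKey: string): boolean {
    const next = (this.counts.get(sessionKey) ?? 0) + 1;
    this.counts.set(sessionKey, next);
    return next === 1;
  }

  /** Returns true when the last viewer left, i.e. the stream should stop. */
  remove(sessionKey: string): boolean {
    const current = this.counts.get(sessionKey) ?? 0;
    if (current <= 1) {
      this.counts.delete(sessionKey);
      return current === 1;
    }
    this.counts.set(sessionKey, current - 1);
    return false;
  }
}

const viewers = new ViewerCounts();
console.log(viewers.add('soak_00'));    // true: first viewer, start the screencast
console.log(viewers.add('soak_00'));    // false: already streaming
console.log(viewers.remove('soak_00')); // false: one viewer still watching
console.log(viewers.remove('soak_00')); // true: last viewer left, stop
```

In the handlers, `onStartStream` would call `screencaster.start` only when `add` returns `true`, and `onStopStream` would call `screencaster.stop` only when `remove` returns `true`.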
- [ ] **Step 3: Manual test end-to-end**
Run a longer populate game so there's time to click:
```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=4 --rooms=1 --cpus-per-room=0 --games-per-room=2 --holes=3 \
--watch=dashboard
```
Expected:
1. Dashboard opens, shows 1 room with 4 players
2. Click on any player tile (`soak_00`, `soak_01`, ...)
3. Modal opens, shows live JPEG frames of that player's view of the game
4. Close modal (Esc or Close button) — frames stop, screencast detaches
5. Run completes cleanly
- [ ] **Step 4: Commit**
```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): click-to-watch live video via CDP screencast
Runner creates a Screencaster and wires its start/stop into
DashboardServer.onStartStream/onStopStream. Clicking a player tile
in the dashboard starts a CDP screencast on that session's page,
forwards JPEG frames as WS "frame" messages, closes on modal
dismiss or WS disconnect.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 7 — Tiled mode
### Task 24: `--watch=tiled` native windows
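Launch a second headed browser for the 4 host contexts and position their windows in a 2×2 grid by calling `window.moveTo`/`window.resizeTo` through `page.evaluate`. Browsers may ignore these calls for windows that were not opened by script, so treat the positioning as best-effort.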
**Files:**
- Modify: `tests/soak/core/session-pool.ts` — add optional headed-host support
- Modify: `tests/soak/runner.ts` — enable tiled mode
- [ ] **Step 1: Extend `SessionPool` to support headed host contexts**
Add a new option to `SessionPool`. In `core/session-pool.ts`:
```typescript
export interface SessionPoolOptions {
targetUrl: string;
inviteCode: string;
credFile: string;
logger: Logger;
browser?: Browser;
contextOptions?: Parameters<Browser['newContext']>[0];
/** If set, the first `headedHostCount` sessions use a separate headed browser. */
headedHostCount?: number;
}
```
Inside the class, add a `headedBrowser` field and extend `acquire`:
```typescript
private headedBrowser: Browser | null = null;
// ... in acquire(), before the loop:
if ((this.opts.headedHostCount ?? 0) > 0 && !this.headedBrowser) {
this.headedBrowser = await chromium.launch({
headless: false,
slowMo: 50,
});
}
for (let i = 0; i < count; i++) {
const account = this.accounts[i];
const useHeaded = i < (this.opts.headedHostCount ?? 0);
const targetBrowser = useHeaded ? this.headedBrowser! : this.browser!;
const context = await targetBrowser.newContext({
...this.opts.contextOptions,
...(useHeaded ? { viewport: { width: 960, height: 540 } } : {}),
});
await this.injectAuth(context, account);
const page = await context.newPage();
await page.goto(this.opts.targetUrl);
// Position headed windows in a 2×2 grid
if (useHeaded) {
const col = i % 2;
const row = Math.floor(i / 2);
const x = col * 960;
const y = row * 560;
await page.evaluate(
([x, y, w, h]) => {
window.moveTo(x, y);
window.resizeTo(w, h);
},
[x, y, 960, 540] as [number, number, number, number],
);
}
const bot = new GolfBot(page);
sessions.push({ account, context, page, bot, key: account.key });
}
```
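The grid arithmetic in `acquire` can be pulled into a small pure function, which also makes it easy to hand the same bounds to a different positioning mechanism if `window.moveTo` turns out to be ignored for these windows. A sketch follows, where `tileBounds` and its `gapY` parameter are illustrative names; the 20 px default gap mirrors the difference between the 560 px row stride and the 540 px viewport height used above:

```typescript
interface WindowBounds {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Bounds of the tile at `index`, filling the grid row by row.
function tileBounds(
  index: number,
  columns = 2,
  width = 960,
  height = 540,
  gapY = 20,
): WindowBounds {
  const col = index % columns;
  const row = Math.floor(index / columns);
  return { left: col * width, top: row * (height + gapY), width, height };
}

for (let i = 0; i < 4; i++) {
  console.log(i, JSON.stringify(tileBounds(i)));
}
```

On Chromium, the Chrome DevTools Protocol methods `Browser.getWindowForTarget` and `Browser.setWindowBounds` accept bounds of this shape and may be a more dependable way to place windows than in-page `window.moveTo`; whether that is needed here is untested.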
Update `release` to close the headed browser too:
```typescript
async release(): Promise<void> {
for (const session of this.activeSessions) {
try { await session.context.close(); } catch { /* ignore */ }
}
this.activeSessions = [];
if (this.ownedBrowser) {
try { await this.ownedBrowser.close(); } catch { /* ignore */ }
this.ownedBrowser = null;
this.browser = null;
}
if (this.headedBrowser) {
try { await this.headedBrowser.close(); } catch { /* ignore */ }
this.headedBrowser = null;
}
}
```
- [ ] **Step 2: Wire `watch === 'tiled'` in the runner**
In `runner.ts`, remove the `tiled_not_yet_implemented` warning branch and pass `headedHostCount` when constructing the pool:
```typescript
const headedHostCount = watch === 'tiled' ? rooms : 0;
const pool = new SessionPool({
targetUrl,
inviteCode,
credFile,
logger,
headedHostCount,
});
```
(Move that `pool` creation up so it's aware of `watch`.)
- [ ] **Step 3: Test tiled mode**
```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=4 --rooms=2 --cpus-per-room=0 --games-per-room=1 --holes=1 \
--watch=tiled
```
Expected: 2 native Chromium windows appear (one per host), sized ~960×540 and positioned at the upper-left of the screen. They play the game visibly. On exit, windows close.
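Which sessions count as hosts depends on how the scenario assigns sessions to rooms. Assuming the contiguous per-room chunking that `stress.ts` uses, and assuming the first session of each chunk acts as host (the plan does not state this here), the host indices can be computed as follows. `hostIndices` is a hypothetical helper:

```typescript
// Index of the first session in each room under contiguous chunking.
function hostIndices(totalSessions: number, rooms: number): number[] {
  if (rooms < 1) return [];
  const perRoom = Math.floor(totalSessions / rooms);
  if (perRoom < 1) return [];
  const hosts: number[] = [];
  for (let room = 0; room < rooms; room++) {
    hosts.push(room * perRoom);
  }
  return hosts;
}

console.log(hostIndices(4, 2));  // sessions 0 and 2
console.log(hostIndices(16, 4)); // sessions 0, 4, 8 and 12
```

Under those assumptions, the `i < headedHostCount` test in `acquire` selects sessions 0 and 1 for the 4-account, 2-room run, and both of them land in `room-0`. Testing `hostIndices(count, rooms).includes(i)` instead would give one headed window per room.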
- [ ] **Step 4: Commit**
```bash
git add tests/soak/core/session-pool.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): --watch=tiled launches N headed host windows
SessionPool accepts headedHostCount; when > 0 it launches a second
Chromium in headed mode, creates those contexts there, and positions
each host window in a 2×2 grid via window.moveTo/resizeTo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 8 — Stress scenario
### Task 25: Chaos injector + stress scenario
Short 1-hole games in tight loops. A background loop ticks every 500 ms during each game and gives every session a 5% chance per tick of a chaos event (rapid clicks, a brief offline toggle, or a tab blur/focus).
**Files:**
- Create: `tests/soak/scenarios/stress.ts`
- Create: `tests/soak/scenarios/shared/chaos.ts`
- Modify: `tests/soak/scenarios/index.ts` — register `stress`
- [ ] **Step 1: Create `scenarios/shared/chaos.ts`**
```typescript
// tests/soak/scenarios/shared/chaos.ts
import type { Session, Logger } from '../../core/types';
export type ChaosEvent =
| 'rapid_clicks'
| 'tab_blur'
| 'brief_offline';
const ALL_EVENTS: ChaosEvent[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];
function pickEvent(): ChaosEvent {
return ALL_EVENTS[Math.floor(Math.random() * ALL_EVENTS.length)];
}
export async function maybeInjectChaos(
session: Session,
probability: number,
logger: Logger,
roomId: string,
): Promise<ChaosEvent | null> {
if (Math.random() >= probability) return null;
const event = pickEvent();
logger.info('chaos_injected', { room: roomId, session: session.key, event });
try {
switch (event) {
case 'rapid_clicks': {
// Fire 5 rapid clicks at the player's own cards
for (let i = 0; i < 5; i++) {
await session.page.locator(`#player-cards .card:nth-child(${(i % 6) + 1})`)
.click({ timeout: 300 })
.catch(() => {});
}
break;
}
case 'tab_blur': {
// Briefly dispatch blur then focus
await session.page.evaluate(() => {
window.dispatchEvent(new Event('blur'));
setTimeout(() => window.dispatchEvent(new Event('focus')), 200);
});
break;
}
case 'brief_offline': {
await session.context.setOffline(true);
await new Promise((r) => setTimeout(r, 300));
await session.context.setOffline(false);
break;
}
}
} catch (err) {
logger.warn('chaos_error', {
event,
error: err instanceof Error ? err.message : String(err),
});
}
return event;
}
```
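`maybeInjectChaos` draws from `Math.random()`, so a failing run cannot be replayed with the same chaos sequence. If reproducibility is wanted, one option is to inject a seeded generator. The sketch below uses mulberry32, a small public-domain PRNG swapped in here purely for illustration; `seededRandom` and `pickChaosEvent` are hypothetical names, not part of `chaos.ts`:

```typescript
type ChaosEventName = 'rapid_clicks' | 'tab_blur' | 'brief_offline';
const CHAOS_EVENTS: readonly ChaosEventName[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];

// mulberry32: returns a function producing floats in [0, 1).
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same decision structure as maybeInjectChaos, with the random source injected.
function pickChaosEvent(rand: () => number, probability: number): ChaosEventName | null {
  if (rand() >= probability) return null;
  return CHAOS_EVENTS[Math.floor(rand() * CHAOS_EVENTS.length)] ?? null;
}

// Two generators with the same seed yield the same decisions.
const chaosRunA = seededRandom(42);
const chaosRunB = seededRandom(42);
const chaosSeqA = Array.from({ length: 20 }, () => pickChaosEvent(chaosRunA, 0.5));
const chaosSeqB = Array.from({ length: 20 }, () => pickChaosEvent(chaosRunB, 0.5));
console.log(JSON.stringify(chaosSeqA) === JSON.stringify(chaosSeqB)); // true
```

Logging the seed in `run_start` would then be enough to replay the chaos decisions of a failed run, although the timing of the game itself would still vary.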
- [ ] **Step 2: Create `scenarios/stress.ts`**
```typescript
// tests/soak/scenarios/stress.ts
import type {
Scenario,
ScenarioContext,
ScenarioResult,
ScenarioError,
Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';
import { maybeInjectChaos } from './shared/chaos';
interface StressConfig {
gamesPerRoom: number;
holes: number;
decks: number;
rooms: number;
cpusPerRoom: number;
thinkTimeMs: [number, number];
interGamePauseMs: number;
chaosChance: number;
}
function chunk<T>(arr: T[], size: number): T[][] {
const out: T[][] = [];
for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
return out;
}
async function sleep(ms: number): Promise<void> {
return new Promise((r) => setTimeout(r, ms));
}
async function runStressRoom(
ctx: ScenarioContext,
cfg: StressConfig,
roomIdx: number,
sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[]; chaosFired: number }> {
const roomId = `room-${roomIdx}`;
let completed = 0;
let chaosFired = 0;
const errors: ScenarioError[] = [];
for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
if (ctx.signal.aborted) break;
ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });
// Start a background chaos loop for this game
let chaosActive = true;
const chaosLoop = (async () => {
while (chaosActive && !ctx.signal.aborted) {
await sleep(500);
for (const session of sessions) {
const e = await maybeInjectChaos(session, cfg.chaosChance, ctx.logger, roomId);
if (e) chaosFired++;
}
}
})();
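let result: Awaited<ReturnType<typeof runOneMultiplayerGame>>;
try {
result = await runOneMultiplayerGame(ctx, sessions, {
roomId,
holes: cfg.holes,
decks: cfg.decks,
cpusPerRoom: cfg.cpusPerRoom,
thinkTimeMs: cfg.thinkTimeMs,
});
} finally {
// Stop the chaos loop even if the game throws, so it cannot outlive the room
chaosActive = false;
await chaosLoop;
}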
if (result.completed) {
completed++;
ctx.logger.info('game_complete', { room: roomId, game: gameNum + 1, turns: result.turns });
} else {
errors.push({
room: roomId,
reason: 'game_failed',
detail: result.error,
timestamp: Date.now(),
});
ctx.logger.error('game_failed', { room: roomId, error: result.error });
}
await sleep(cfg.interGamePauseMs);
}
return { completed, errors, chaosFired };
}
const stress: Scenario = {
name: 'stress',
description: 'Rapid short games for stability & race condition hunting',
needs: { accounts: 16, rooms: 4, cpusPerRoom: 2 },
defaultConfig: {
gamesPerRoom: 50,
holes: 1,
decks: 1,
rooms: 4,
cpusPerRoom: 2,
thinkTimeMs: [50, 150],
interGamePauseMs: 200,
chaosChance: 0.05,
},
async run(ctx: ScenarioContext): Promise<ScenarioResult> {
const start = Date.now();
const cfg = ctx.config as unknown as StressConfig;
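const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
// Guard: a chunk size of 0 would make chunk() loop forever.
if (perRoom < 1) {
throw new Error(`stress needs at least ${cfg.rooms} sessions, got ${ctx.sessions.length}`);
}
const roomSessions = chunk(ctx.sessions, perRoom);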
const results = await Promise.allSettled(
roomSessions.map((s, idx) => runStressRoom(ctx, cfg, idx, s)),
);
let gamesCompleted = 0;
let chaosFired = 0;
const errors: ScenarioError[] = [];
results.forEach((r, idx) => {
if (r.status === 'fulfilled') {
gamesCompleted += r.value.completed;
chaosFired += r.value.chaosFired;
errors.push(...r.value.errors);
} else {
errors.push({
room: `room-${idx}`,
reason: 'room_threw',
detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
timestamp: Date.now(),
});
}
});
return {
gamesCompleted,
errors,
durationMs: Date.now() - start,
customMetrics: { chaos_fired: chaosFired },
};
},
};
export default stress;
```
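The room split in `run()` relies on the session count dividing evenly by `rooms`; otherwise `chunk` yields an extra, undersized room. A minimal sketch of a stricter partition step follows. `partitionIntoRooms` is a hypothetical helper, not part of the plan's code, and it assumes the harness wants to fail fast rather than run a lopsided room.

```typescript
// Hypothetical helper: split sessions into exactly `rooms` equal groups,
// rejecting configurations that cannot be divided evenly.
function partitionIntoRooms<T>(sessions: T[], rooms: number): T[][] {
  if (!Number.isInteger(rooms) || rooms < 1) {
    throw new Error(`rooms must be a positive integer, got ${rooms}`);
  }
  if (sessions.length < rooms) {
    throw new Error(`need at least ${rooms} sessions, got ${sessions.length}`);
  }
  if (sessions.length % rooms !== 0) {
    throw new Error(
      `${sessions.length} sessions do not divide evenly into ${rooms} rooms`,
    );
  }
  const perRoom = sessions.length / rooms;
  const out: T[][] = [];
  for (let i = 0; i < sessions.length; i += perRoom) {
    out.push(sessions.slice(i, i + perRoom));
  }
  return out;
}
```

With 16 sessions and 4 rooms this returns four groups of four, in the same order `chunk` would produce.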
- [ ] **Step 3: Register stress in the registry**
Edit `tests/soak/scenarios/index.ts`:
```typescript
import type { Scenario } from '../core/types';
import populate from './populate';
import stress from './stress';
const registry: Record<string, Scenario> = {
populate,
stress,
};
export function getScenario(name: string): Scenario | undefined {
return registry[name];
}
export function listScenarios(): Scenario[] {
return Object.values(registry);
}
```
- [ ] **Step 4: Smoke test stress scenario**
```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=stress \
--accounts=4 --rooms=1 --cpus-per-room=1 --games-per-room=3 --holes=1 \
--watch=none
```
Expected: 3 quick games complete, chaos events in logs (look for `chaos_injected`), exit 0.
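To judge whether a short smoke run should show any `chaos_injected` lines at all, a rough estimate helps. The chaos loop in `runStressRoom` rolls once per session every 500 ms, so the expected count scales with sessions, game duration, and `chaosChance`. The sketch below assumes every roll is independent and fires with exactly the configured probability; both function names are illustrative. Because the real loop also waits for each injected event to finish, treat the result as an upper bound.

```typescript
// Rough expectation for chaos events in one game (illustrative helper).
// Assumes one independent roll per session per tick, as in the 500 ms loop.
function expectedChaosEvents(
  sessions: number,
  gameDurationMs: number,
  chaosChance: number,
  tickMs = 500,
): number {
  const ticks = Math.floor(gameDurationMs / tickMs);
  return sessions * ticks * chaosChance;
}

// Probability that a game of the given length sees no chaos event at all.
function probabilityOfNoChaos(
  sessions: number,
  gameDurationMs: number,
  chaosChance: number,
  tickMs = 500,
): number {
  const rolls = sessions * Math.floor(gameDurationMs / tickMs);
  return Math.pow(1 - chaosChance, rolls);
}

// Example: 4 sessions, a 30 s game, 5% chance per roll.
console.log(expectedChaosEvents(4, 30_000, 0.05)); // about 12 events
console.log(probabilityOfNoChaos(4, 30_000, 0.05)); // well under 1%
```

If games finish in only a few seconds, the expected count drops proportionally, which is worth keeping in mind before treating a quiet log as a failure.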
- [ ] **Step 5: Commit**
```bash
git add tests/soak/scenarios/stress.ts tests/soak/scenarios/shared/chaos.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): stress scenario with chaos injection
Rapid 1-hole games with a parallel chaos loop that rolls every 500 ms
per session, with a 5% chance of firing rapid_clicks, tab_blur, or
brief_offline events. Chaos counts roll up into
ScenarioResult.customMetrics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 9 — Failure handling
### Task 26: Watchdog + heartbeat wiring
Per-room timeout that fires if no heartbeat arrives within N ms. Runner wires it into `ctx.heartbeat`. Vitest-tested.
**Files:**
- Create: `tests/soak/core/watchdog.ts`
- Create: `tests/soak/tests/watchdog.test.ts`
- Modify: `tests/soak/runner.ts` — wire `heartbeat` to per-room watchdogs
- [ ] **Step 1: Write failing tests**
```typescript
// tests/soak/tests/watchdog.test.ts
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { Watchdog } from '../core/watchdog';
describe('Watchdog', () => {
beforeEach(() => vi.useFakeTimers());
afterEach(() => vi.useRealTimers());
it('fires after timeout if no heartbeat', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
vi.advanceTimersByTime(1001);
expect(onTimeout).toHaveBeenCalledOnce();
});
it('heartbeat resets the timer', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
vi.advanceTimersByTime(800);
w.heartbeat();
vi.advanceTimersByTime(800);
expect(onTimeout).not.toHaveBeenCalled();
vi.advanceTimersByTime(300);
expect(onTimeout).toHaveBeenCalledOnce();
});
it('stop cancels pending timeout', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
w.stop();
vi.advanceTimersByTime(2000);
expect(onTimeout).not.toHaveBeenCalled();
});
it('does not fire again when a heartbeat arrives after the timeout', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
vi.advanceTimersByTime(1001);
w.heartbeat();
vi.advanceTimersByTime(1001);
expect(onTimeout).toHaveBeenCalledOnce();
});
});
```
- [ ] **Step 2: Run to verify failure**
```bash
npx vitest run tests/watchdog.test.ts
```
Expected: FAIL.
- [ ] **Step 3: Implement `core/watchdog.ts`**
```typescript
// tests/soak/core/watchdog.ts
export class Watchdog {
private timer: NodeJS.Timeout | null = null;
private fired = false;
constructor(
private timeoutMs: number,
private onTimeout: () => void,
) {}
start(): void {
this.stop();
this.fired = false;
this.timer = setTimeout(() => {
if (this.fired) return;
this.fired = true;
this.onTimeout();
}, this.timeoutMs);
}
heartbeat(): void {
if (this.fired) return;
this.start();
}
stop(): void {
if (this.timer) {
clearTimeout(this.timer);
this.timer = null;
}
}
}
```
- [ ] **Step 4: Verify tests pass**
```bash
npx vitest run tests/watchdog.test.ts
```
Expected: all passing.
- [ ] **Step 5: Wire watchdogs into the runner**
In `runner.ts`, add before building `ctx`:
```typescript
const watchdogs = new Map<string, Watchdog>();
const roomAborters = new Map<string, AbortController>();
for (let i = 0; i < rooms; i++) {
const roomId = `room-${i}`;
const aborter = new AbortController();
roomAborters.set(roomId, aborter);
const w = new Watchdog(60_000, () => {
logger.error('watchdog_fired', { room: roomId });
aborter.abort();
dashboard.update(roomId, { phase: 'error' });
});
w.start();
watchdogs.set(roomId, w);
}
```
Import at the top:
```typescript
import { Watchdog } from './core/watchdog';
```
Set `ctx.heartbeat` to:
```typescript
heartbeat: (roomId: string) => {
const w = watchdogs.get(roomId);
if (w) w.heartbeat();
},
```
In the `finally` block, stop all watchdogs:
```typescript
for (const w of watchdogs.values()) w.stop();
```
Note: for now the `roomAborters` aren't plumbed into scenario cancellation; scenarios see only the global `ctx.signal`. This is intentional: per-room abort requires scenario-side awareness and is deferred until a scenario genuinely misbehaves. The watchdog still detects stuck rooms: it logs `watchdog_fired` and flips that room's dashboard tile to the error phase.
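If per-room cancellation is plumbed through later, one option is to hand each room a signal that aborts when either the global controller or that room's own controller fires. This is a minimal sketch, assuming scenarios would accept a per-room `AbortSignal`; `linkSignals` is a hypothetical helper, and recent Node versions provide `AbortSignal.any` for the same purpose.

```typescript
// Hypothetical helper: returns a signal that aborts as soon as any input aborts.
function linkSignals(...signals: AbortSignal[]): AbortSignal {
  const linked = new AbortController();
  for (const s of signals) {
    if (s.aborted) {
      linked.abort();
      break;
    }
    s.addEventListener('abort', () => linked.abort(), { once: true });
  }
  return linked.signal;
}

// Usage: a watchdog firing for room-0 cancels only room-0's signal.
const globalAbort = new AbortController();
const room0Abort = new AbortController();
const room1Abort = new AbortController();
const room0Signal = linkSignals(globalAbort.signal, room0Abort.signal);
const room1Signal = linkSignals(globalAbort.signal, room1Abort.signal);

room0Abort.abort();
console.log(room0Signal.aborted, room1Signal.aborted); // true false
```

A global abort (Ctrl-C or a fatal health probe) would still reach every room, since each linked signal also listens to the global one.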
- [ ] **Step 6: Commit**
```bash
git add tests/soak/core/watchdog.ts tests/soak/tests/watchdog.test.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): per-room watchdog with heartbeat
Watchdog class with Vitest tests, wired into ctx.heartbeat in the
runner. One watchdog per room, 60s timeout; firing logs an error
and marks the room's dashboard tile as errored.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 27: Artifact capture on failure
When the runner catches an error, snapshot every session's page: screenshot, HTML, console log tail, game state JSON.
**Files:**
- Create: `tests/soak/core/artifacts.ts`
- Modify: `tests/soak/runner.ts` — call `captureArtifacts` in the catch block
- [ ] **Step 1: Implement `core/artifacts.ts`**
```typescript
// tests/soak/core/artifacts.ts
import * as fs from 'fs';
import * as path from 'path';
import type { Session, Logger } from './types';
export interface ArtifactsOptions {
runId: string;
/** Absolute path to the artifacts root, e.g., /path/to/tests/soak/artifacts */
rootDir: string;
logger: Logger;
}
export class Artifacts {
readonly runDir: string;
constructor(private opts: ArtifactsOptions) {
this.runDir = path.join(opts.rootDir, opts.runId);
fs.mkdirSync(this.runDir, { recursive: true });
}
/** Capture everything for a single session. */
async captureSession(session: Session, roomId: string): Promise<void> {
const dir = path.join(this.runDir, roomId);
fs.mkdirSync(dir, { recursive: true });
const prefix = session.key;
try {
const png = await session.page.screenshot({ fullPage: true });
fs.writeFileSync(path.join(dir, `${prefix}.png`), png);
} catch (err) {
this.opts.logger.warn('artifact_screenshot_failed', {
session: session.key,
error: err instanceof Error ? err.message : String(err),
});
}
try {
const html = await session.page.content();
fs.writeFileSync(path.join(dir, `${prefix}.html`), html);
} catch (err) {
this.opts.logger.warn('artifact_html_failed', {
session: session.key,
error: err instanceof Error ? err.message : String(err),
});
}
try {
const state = await session.bot.getGameState();
fs.writeFileSync(
path.join(dir, `${prefix}.state.json`),
JSON.stringify(state, null, 2),
);
} catch (err) {
this.opts.logger.warn('artifact_state_failed', {
session: session.key,
error: err instanceof Error ? err.message : String(err),
});
}
try {
const errors = session.bot.getConsoleErrors?.() ?? [];
fs.writeFileSync(path.join(dir, `${prefix}.console.txt`), errors.join('\n'));
} catch {
// ignore — not all bots expose this
}
}
async captureAll(sessions: Session[]): Promise<void> {
// Best-effort: the caller here does not know which room each session
// belongs to, so everything lands under room-unknown/. Callers that do
// know should call captureSession() per room instead.
await Promise.all(
sessions.map((s) => this.captureSession(s, 'room-unknown')),
);
}
writeSummary(summary: object): void {
fs.writeFileSync(
path.join(this.runDir, 'summary.json'),
JSON.stringify(summary, null, 2),
);
}
}
/** Prune run directories older than `maxAgeMs`. */
export function pruneOldRuns(rootDir: string, maxAgeMs: number, logger: Logger): void {
if (!fs.existsSync(rootDir)) return;
const now = Date.now();
for (const entry of fs.readdirSync(rootDir)) {
const full = path.join(rootDir, entry);
try {
const stat = fs.statSync(full);
if (stat.isDirectory() && now - stat.mtimeMs > maxAgeMs) {
fs.rmSync(full, { recursive: true, force: true });
logger.info('artifact_pruned', { runId: entry });
}
} catch {
// ignore
}
}
}
```
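`captureAll` files everything under `room-unknown/`. If the runner knows how many sessions each room holds, it could pre-partition before calling `captureSession`. The sketch below assumes sessions are ordered room by room (as the stress scenario's `chunk` call implies) and uses a stand-in `SessionLike` type rather than the plan's `Session`; `groupByRoom` is hypothetical.

```typescript
// Stand-in for the plan's Session type; only `key` is needed here.
interface SessionLike {
  key: string;
}

// Hypothetical helper: map consecutive runs of `perRoom` sessions to room ids.
function groupByRoom<T extends SessionLike>(
  sessions: T[],
  perRoom: number,
): Map<string, T[]> {
  if (!Number.isInteger(perRoom) || perRoom < 1) {
    throw new Error(`perRoom must be a positive integer, got ${perRoom}`);
  }
  const byRoom = new Map<string, T[]>();
  sessions.forEach((session, i) => {
    const roomId = `room-${Math.floor(i / perRoom)}`;
    const bucket = byRoom.get(roomId) ?? [];
    bucket.push(session);
    byRoom.set(roomId, bucket);
  });
  return byRoom;
}
```

A caller could then loop over the map and invoke `artifacts.captureSession(session, roomId)` for each entry, so artifacts land under the matching `room-N/` directory.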
- [ ] **Step 2: Call artifact capture from the runner's error path**
In `runner.ts`, import:
```typescript
import { Artifacts, pruneOldRuns } from './core/artifacts';
```
After `const runId = ...`, instantiate and prune:
```typescript
const artifactsRoot = path.resolve(__dirname, 'artifacts');
const artifacts = new Artifacts({ runId, rootDir: artifactsRoot, logger });
pruneOldRuns(artifactsRoot, 7 * 24 * 3600 * 1000, logger);
```
In the `catch (err)` block, after logging, capture:
```typescript
} catch (err) {
logger.error('run_failed', {
error: err instanceof Error ? err.message : String(err),
stack: err instanceof Error ? err.stack : undefined,
});
try {
const liveSessions = pool['activeSessions'] as Session[] | undefined;
if (liveSessions && liveSessions.length > 0) {
await artifacts.captureAll(liveSessions);
}
} catch (captureErr) {
logger.warn('artifact_capture_failed', {
error: captureErr instanceof Error ? captureErr.message : String(captureErr),
});
}
exitCode = 1;
}
```
(Note: the `pool['activeSessions']` access bypasses visibility to avoid adding a public getter for one call site. Acceptable for an error path in a test harness.)
After a successful run, write the summary:
```typescript
artifacts.writeSummary({
runId,
scenario: scenario.name,
targetUrl,
gamesCompleted: result.gamesCompleted,
errors: result.errors,
durationMs: result.durationMs,
customMetrics: result.customMetrics,
});
```
Import `Session` type:
```typescript
import type { Session } from './core/types';
```
- [ ] **Step 3: Verify by forcing a failure**
Kill the server mid-run and confirm artifacts are written:
```bash
# In one terminal
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
--games-per-room=5 --holes=3 --watch=none
# In another: wait ~3 seconds then Ctrl-C the dev server
# The soak run should catch errors and write artifacts
ls tests/soak/artifacts/
ls tests/soak/artifacts/<run-id>/
```
Expected: a run directory exists with `summary.json` (if it got far enough) or per-session screenshots / HTML under `room-unknown/`.
- [ ] **Step 4: Commit**
```bash
git add tests/soak/core/artifacts.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): artifact capture on failure + run summary
Screenshots, HTML, game state, and console errors are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Runs older
than 7 days are pruned on startup. Successful runs get a
summary.json inside their run directory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 28: Graceful shutdown (already partially in place) + exit codes
SIGINT/SIGTERM already flip the abort controller. Formalize the timeout-and-force-exit path and the three exit codes (`0` / `1` / `2`).
**Files:**
- Modify: `tests/soak/runner.ts`
- [ ] **Step 1: Add a graceful shutdown timeout**
In `runner.ts`, replace the existing signal handlers with:
```typescript
let forceExitTimer: NodeJS.Timeout | null = null;
const onSignal = (sig: string) => {
if (abortController.signal.aborted) {
// Second signal: force exit
logger.warn('force_exit', { signal: sig });
process.exit(130);
}
logger.warn('signal_received', { signal: sig });
abortController.abort();
// Hard-kill after 10s if cleanup hangs
forceExitTimer = setTimeout(() => {
logger.error('graceful_shutdown_timeout');
process.exit(130);
}, 10_000);
};
process.on('SIGINT', () => onSignal('SIGINT'));
process.on('SIGTERM', () => onSignal('SIGTERM'));
```
In the `finally` block, clear the force-exit timer:
```typescript
if (forceExitTimer) clearTimeout(forceExitTimer);
```
- [ ] **Step 2: Manual test — Ctrl-C a long run**
```bash
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
--games-per-room=10 --holes=3 --watch=none
# After ~5 seconds: Ctrl-C
```
Expected: runner logs `signal_received`, finishes current turn, prints summary, exits with code 2 (check `echo $?`).
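The signal handler in Step 1 only covers the forced-exit path (130); the 0 / 1 / 2 mapping still has to be decided where the runner finishes. One way to keep that decision in a single place is a small pure function. This is a sketch under assumptions: the flag names are illustrative, and giving an operator interrupt precedence over recorded errors is a guess that matches the expected exit code 2 for Ctrl-C.

```typescript
// Exit codes named in this task; 130 stays reserved for the forced-exit path.
const EXIT_OK = 0;
const EXIT_ERRORS = 1;
const EXIT_INTERRUPTED = 2;

interface RunOutcome {
  /** True if SIGINT/SIGTERM was received (illustrative flag). */
  signalReceived: boolean;
  /** True if the scenario threw or a fatal health abort happened. */
  runFailed: boolean;
  /** Number of entries in ScenarioResult.errors. */
  errorCount: number;
}

// Assumption: an operator interrupt wins over errors recorded before it.
function resolveExitCode(outcome: RunOutcome): number {
  if (outcome.signalReceived) return EXIT_INTERRUPTED;
  if (outcome.runFailed || outcome.errorCount > 0) return EXIT_ERRORS;
  return EXIT_OK;
}
```

The runner would set its `exitCode` from this once, after the scenario returns or throws, rather than assigning it in several branches.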
- [ ] **Step 3: Commit**
```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): graceful shutdown with 10s hard-kill fallback
SIGINT/SIGTERM flips the abort signal; scenarios finish the current
turn then exit. If cleanup hangs >10s the runner force-exits. Second
Ctrl-C is an immediate hard kill. Exit codes: 0 success, 1 errors,
2 interrupted, 130 forced exit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 29: Periodic health probes
Every 30s, fetch `/api/health` on the target server. Three consecutive failures declare a fatal error and abort.
**Files:**
- Modify: `tests/soak/runner.ts`
- [ ] **Step 1: Add a health probe interval**
In `runner.ts`, after building the abort controller and before running the scenario:
```typescript
let healthFailures = 0;
const healthTimer = setInterval(async () => {
try {
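// Bound each probe so a hung connection still counts as a failure.
const res = await fetch(`${targetUrl}/api/health`, { signal: AbortSignal.timeout(5_000) });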
if (!res.ok) throw new Error(`status ${res.status}`);
healthFailures = 0;
} catch (err) {
healthFailures++;
logger.warn('health_probe_failed', {
consecutive: healthFailures,
error: err instanceof Error ? err.message : String(err),
});
if (healthFailures >= 3) {
logger.error('health_fatal', { consecutive: healthFailures });
abortController.abort();
}
}
}, 30_000);
```
In the `finally` block:
```typescript
clearInterval(healthTimer);
```
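The consecutive-failure rule is easy to get subtly wrong (for example, by forgetting to reset on success), so it can help to isolate it from the timer and the network call. A minimal sketch follows; `HealthTracker` is a hypothetical class, not part of the plan, and its default threshold of 3 mirrors the probe above.

```typescript
// Hypothetical helper: tracks a streak of failed probes.
class HealthTracker {
  private consecutiveFailures = 0;
  private readonly fatalAfter: number;

  constructor(fatalAfter = 3) {
    this.fatalAfter = fatalAfter;
  }

  /** Record one probe result; returns true once the failure streak is fatal. */
  record(ok: boolean): boolean {
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.fatalAfter;
  }

  get failures(): number {
    return this.consecutiveFailures;
  }
}

// Two failures, a recovery, then three failures in a row.
const tracker = new HealthTracker();
const verdicts = [false, false, true, false, false, false].map((ok) =>
  tracker.record(ok),
);
console.log(verdicts); // only the last entry is true
```

Keeping the counter pure also makes it straightforward to cover with a Vitest case alongside the `Watchdog` tests.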
- [ ] **Step 2: Commit**
```bash
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): periodic health probes against target server
Every 30s GET /api/health. Three consecutive failures abort the
run with a fatal error, so staging outages don't get misattributed
to harness bugs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Phase 10 — Polish and bring-up
### Task 30: Smoke test script
`tests/soak/scripts/smoke.sh` — the canary run that takes ~30s against local dev.
**Files:**
- Create: `tests/soak/scripts/smoke.sh`
- [ ] **Step 1: Create the script**
```bash
#!/usr/bin/env bash
# Soak harness smoke test — end-to-end canary against local dev.
# Expected runtime: ~30 seconds.
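set -euo pipefail
cd "$(dirname "$0")/.."
: "${TEST_URL:=http://localhost:8000}"
: "${SOAK_INVITE_CODE:=SOAKTEST}"
# Export so child processes (e.g. `npm run seed`) also see the defaults.
export TEST_URL SOAK_INVITE_CODE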
echo "Smoke target: $TEST_URL"
echo "Invite code: $SOAK_INVITE_CODE"
# 1. Health probe
curl -fsS "$TEST_URL/api/health" > /dev/null || {
echo "FAIL: target server unreachable at $TEST_URL"
exit 1
}
# 2. Ensure minimum accounts
if [ ! -f .env.stresstest ]; then
echo "Seeding accounts..."
npm run seed -- --count=4
fi
# 3. Run minimum viable scenario
TEST_URL="$TEST_URL" SOAK_INVITE_CODE="$SOAK_INVITE_CODE" \
npm run soak -- \
--scenario=populate \
--accounts=2 \
--rooms=1 \
--cpus-per-room=0 \
--games-per-room=1 \
--holes=1 \
--watch=none
echo "Smoke PASSED"
```
- [ ] **Step 2: Make it executable and run it**
```bash
chmod +x tests/soak/scripts/smoke.sh
cd tests/soak && bash scripts/smoke.sh
```
Expected: `Smoke PASSED` within ~30s.
- [ ] **Step 3: Commit**
```bash
git add tests/soak/scripts/smoke.sh
git commit -m "$(cat <<'EOF'
feat(soak): smoke test script — 30s end-to-end canary
Confirms the harness works against local dev with the absolute
minimum config. Run after any change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 31: README + CHECKLIST
Replace the README stub with a full quickstart and flag reference. Add the manual validation checklist.
**Files:**
- Modify: `tests/soak/README.md`
- Create: `tests/soak/CHECKLIST.md`
- [ ] **Step 1: Rewrite `tests/soak/README.md`**
````markdown
# Golf Soak & UX Test Harness
Standalone Playwright-based runner that drives multi-user authenticated
game sessions for scoreboard population and stability testing.
**Spec:** `../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `../../docs/soak-harness-bringup.md`
## Quick start
```bash
cd tests/soak
npm install
# First run only: seed 16 accounts
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed
# 30-second end-to-end smoke test
bash scripts/smoke.sh
# Populate scoreboard (4 rooms × 4 accounts × 10 long games)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
npm run soak:populate
# Stress test (4 rooms × 50 rapid games with chaos)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
npm run soak:stress
```
## CLI flags
```
--scenario=populate|stress required
--accounts=<n> total sessions (default: scenario.needs.accounts)
--rooms=<n> default from scenario.needs
--cpus-per-room=<n> default from scenario.needs
--games-per-room=<n> default from scenario.defaultConfig
--holes=<n> default from scenario.defaultConfig
--watch=none|dashboard|tiled default: dashboard
--dashboard-port=<n> default: 7777
--target=<url> default: TEST_URL env
--run-id=<string> default: ISO timestamp
--list print scenarios and exit
--dry-run validate config, don't run
```
Constraint: `--accounts` must be evenly divisible by `--rooms`.
## Environment variables
```
TEST_URL target base URL (e.g. https://staging.adlee.work)
SOAK_INVITE_CODE invite code flagged marks_as_test (staging: 5VC2MCCN)
SOAK_HOLES override --holes
SOAK_ROOMS override --rooms
SOAK_ACCOUNTS override --accounts
SOAK_CPUS_PER_ROOM override --cpus-per-room
SOAK_GAMES_PER_ROOM override --games-per-room
SOAK_WATCH override --watch
SOAK_DASHBOARD_PORT override --dashboard-port
```
## Watch modes
- **`none`** — pure headless, JSON logs to stdout. Use for CI and overnight runs.
- **`dashboard`** (default) — HTTP+WS server on localhost:7777 serving a live status grid. Click any player tile to watch their live session via CDP screencast.
- **`tiled`** — 4 native Chromium windows for the host of each room, positioned in a 2×2 grid. Joiners stay headless.
## Scenarios
| Name | Description |
|---|---|
| `populate` | Long 9-hole games with varied CPU personalities, realistic pacing, for populating scoreboards |
| `stress` | Rapid 1-hole games with chaos injection (rapid clicks, offline toggles, tab blur) for hunting race conditions |
Add new scenarios by creating `scenarios/<name>.ts` and registering in `scenarios/index.ts`.
## Architecture
See the design spec for full module breakdown. Key modules:
- `runner.ts` — CLI entry, wires everything together
- `core/session-pool.ts` — owns browser contexts, seeds/logs in 16 accounts
- `core/room-coordinator.ts` — host→joiners room-code handoff
- `core/watchdog.ts` — per-room timeout detection
- `core/screencaster.ts` — CDP Page.startScreencast for live video
- `dashboard/server.ts` — HTTP + WS server
- `scenarios/` — pluggable scenarios
Reuses `../../tests/e2e/bot/golf-bot.ts` unchanged.
## Running tests (unit)
```bash
npm test
```
Tests cover `Deferred`, `RoomCoordinator`, `Watchdog`, and `config`.
Integration-level modules are verified by the smoke test.
````
- [ ] **Step 2: Create `tests/soak/CHECKLIST.md`**
```markdown
# Soak Harness Manual Validation Checklist
Run after any significant change or before calling the implementation complete.
## Bring-up
- [ ] Local dev server is running (`python server/main.py`)
- [ ] `SOAKTEST` invite code exists locally with `marks_as_test=TRUE`
- [ ] `npm install` in `tests/soak/` succeeded
- [ ] `npm run seed -- --count=16` creates/updates 16 accounts
- [ ] `.env.stresstest` has 16 `SOAK_ACCOUNT_NN=...` lines
- [ ] All seeded users show `is_test_account=TRUE` in the DB
## Smoke
- [ ] `bash scripts/smoke.sh` exits 0 within 60s
## Scenarios
- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- [ ] `--scenario=populate --rooms=4 --games-per-room=1` runs 4 rooms in parallel with no cross-contamination
- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events
## Watch modes
- [ ] `--watch=none` produces JSONL on stdout, nothing else
- [ ] `--watch=dashboard` opens http://localhost:7777, grid renders, tiles update live, WS status shows `healthy`
- [ ] Clicking any player tile opens the video modal and streams live JPEG frames (~10 fps)
- [ ] Closing the modal stops the screencast (check logs for `screencast_stopped`)
- [ ] `--watch=tiled` opens 4 native Chromium windows for the 4 hosts
## Failure modes
- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
- [ ] Double Ctrl-C → hard exit (130)
- [ ] Killing the dev server mid-run → health probes fail 3× → fatal abort, artifacts captured, exit 1
- [ ] Artifacts directory contains a subdirectory per failed run with screenshots and state.json
- [ ] Artifacts older than 7 days are pruned on next startup
## Server-side filtering
- [ ] `GET /api/stats/leaderboard` (default) hides soak_* accounts
- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak_* accounts
- [ ] Admin panel user list shows `[Test]` badge on soak_* accounts
- [ ] Admin panel "Include test accounts" checkbox filters them out
- [ ] Admin panel invite codes tab shows `[Test-seed]` next to SOAKTEST
## Post-deploy schema verification
Run after the server-side changes (Tasks 1–7) ship to each environment.
- [ ] Server restarted (docker compose up -d or CI/CD deploy)
- [ ] Server logs show `User store schema initialized` after restart
- [ ] `\d users_v2` on target DB shows `is_test_account` column with default `false`
- [ ] `\d invite_codes` shows `marks_as_test` column with default `false`
- [ ] `\d leaderboard_overall` shows `is_test_account` column
- [ ] `\di idx_users_v2_is_test_account` shows the partial index
- [ ] `SELECT count(*) FROM leaderboard_overall` returns nonzero (view re-populated after rebuild)
- [ ] Default leaderboard query still works: `curl .../api/stats/leaderboard` returns entries
- [ ] `?include_test=true` parameter is accepted (no 422/500)
## Staging bring-up (final step)
- [ ] `UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';` run on staging
- [ ] `SOAK_INVITE_CODE=5VC2MCCN TEST_URL=https://staging.adlee.work npm run seed -- --count=16` seeds staging accounts
- [ ] Staging run with `--scenario=populate --watch=none` completes
- [ ] Staging leaderboard with `include_test=true` shows the soak accounts
- [ ] Staging leaderboard default (no param) does NOT show the soak accounts
```
- [ ] **Step 3: Commit**
```bash
git add tests/soak/README.md tests/soak/CHECKLIST.md
git commit -m "$(cat <<'EOF'
docs(soak): full README + manual validation checklist
Quickstart, flag reference, env var reference, scenario table, and
the bring-up/validation checklist that gates calling the harness
implementation complete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
### Task 32: Staging bring-up (manual, no code)
This is a documentation-only task — the actual run happens on your workstation. Listed here so the implementation plan is complete end to end.
- [ ] **Step 1: Flag `5VC2MCCN` as test-seed on staging**
From your workstation (requires DB access to staging):
```bash
ssh root@129.212.150.189 \
'docker exec -i golfgame-postgres psql -U postgres -d golfgame' <<'EOF'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
EOF
```
Expected: `marks_as_test | t`.
(The exact docker container name may differ — adjust based on `docker ps` on the staging host.)
- [ ] **Step 2: Seed the 16 staging accounts**
```bash
cd tests/soak
rm -f .env.stresstest
TEST_URL=https://staging.adlee.work \
SOAK_INVITE_CODE=5VC2MCCN \
npm run seed -- --count=16
```
Expected: `.env.stresstest` populated with 16 entries.
- [ ] **Step 3: Run populate against staging**
```bash
TEST_URL=https://staging.adlee.work \
SOAK_INVITE_CODE=5VC2MCCN \
npm run soak -- \
--scenario=populate \
--rooms=4 \
--games-per-room=3 \
--holes=3 \
--watch=dashboard
```
Expected: dashboard opens, 4 rooms play 3 games each, staging scoreboard accumulates data. Exit 0 at the end.
- [ ] **Step 4: Verify scoreboard filtering on staging**
```bash
# Should NOT contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soak_"))'
# Should contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soak_"))'
```
Expected: first returns nothing, second returns entries.
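The same check can be scripted for repeated runs. The sketch assumes the response shape implied by the `jq` filters above (an `entries` array whose items carry `username`); `soakUsernames` is an illustrative helper, and the sample payloads, including the `soak_00` username, are made up.

```typescript
// Response shape assumed from the jq filters; real payloads may carry more fields.
interface LeaderboardResponse {
  entries: { username: string }[];
}

// Illustrative helper: usernames that look like soak accounts.
function soakUsernames(body: LeaderboardResponse): string[] {
  return body.entries
    .map((entry) => entry.username)
    .filter((name) => name.startsWith('soak_'));
}

// Made-up payloads standing in for the two curl responses.
const defaultBoard: LeaderboardResponse = {
  entries: [{ username: 'alice' }],
};
const boardWithTest: LeaderboardResponse = {
  entries: [{ username: 'alice' }, { username: 'soak_00' }],
};

console.log(soakUsernames(defaultBoard).length); // 0
console.log(soakUsernames(boardWithTest).length); // 1
```

In a real check, the two payloads would come from `fetch` calls against the default and `include_test=true` URLs.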
- [ ] **Step 5: Mark implementation complete**
Check off all items in `tests/soak/CHECKLIST.md` that correspond to this plan. Commit the filled-in checklist if you want a record:
```bash
git add tests/soak/CHECKLIST.md
git commit -m "docs(soak): checklist passed on initial staging run"
```
---
## Phase 11 — Version bump
### Task 33: Bump to v3.3.4 and add footer to admin.html
Updates all HTML footers from `v3.1.6` to `v3.3.4`, adds a footer to admin.html which currently has none, bumps `pyproject.toml`.
**Files:**
- Modify: `client/index.html` — both footer occurrences (L58, L291)
- Modify: `client/admin.html` — add footer
- Modify: `pyproject.toml` — version field
- [ ] **Step 1: Update `client/index.html` footers**
```bash
grep -n "v3\.1\.6" client/index.html
```
For each match, replace `v3.1.6` with `v3.3.4`. There should be exactly two matches.
- [ ] **Step 2: Add footer to `client/admin.html`**
Find the closing `</body>` in `client/admin.html` and add a footer just before it:
```html
<footer class="app-footer" style="text-align: center; padding: 16px; color: var(--muted, #666); font-size: 12px;">v3.3.4 &copy; Aaron D. Lee</footer>
</body>
```
(The inline style is a fallback — admin.css may already have an `.app-footer` class; if so, drop the inline styles.)
```bash
grep -n "app-footer" client/admin.css 2>/dev/null
```
If the class exists, use just `<footer class="app-footer">v3.3.4 &copy; Aaron D. Lee</footer>`.
- [ ] **Step 3: Bump `pyproject.toml`**
```bash
sed -i 's/^version = "3\.1\.6"$/version = "3.3.4"/' pyproject.toml
grep -n '^version' pyproject.toml
```
Expected: `version = "3.3.4"`.
- [ ] **Step 4: Verify in the browser**
Restart the dev server, open http://localhost:8000 and http://localhost:8000/admin.html. Confirm both show `v3.3.4` in the footer.
- [ ] **Step 5: Commit**
```bash
git add client/index.html client/admin.html pyproject.toml
git commit -m "$(cat <<'EOF'
chore: bump version to v3.3.4
Updates client/index.html footer (×2) and pyproject.toml from
v3.1.6 → v3.3.4, and adds a matching footer to client/admin.html
which previously had none.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
```
---
## Summary
33 tasks across 11 phases:
| Phase | Tasks | Milestone |
|---|---|---|
| 1 — Server changes | 1–8 | Stats filter works, test accounts are separable |
| 2 — Harness scaffolding | 9–12 | Core pure-logic modules with Vitest tests pass |
| 3 — SessionPool + seeding | 13–14 | `.env.stresstest` seeded via real HTTP |
| 4 — First run | 15–18 | **`--watch=none` smoke test passes end-to-end** |
| 5 — Dashboard | 19–21 | Live status grid in browser |
| 6 — Live video | 22–23 | Click-to-watch CDP screencast |
| 7 — Tiled mode | 24 | Native host windows |
| 8 — Stress scenario | 25 | Chaos injection runs clean |
| 9 — Failure handling | 26–29 | Watchdog + artifacts + graceful shutdown + health probes |
| 10 — Polish | 30–31 | Smoke script + README + CHECKLIST |
| 11 — Version bump | 33 | v3.3.4 everywhere |
(Task 32 is the manual staging bring-up — no code.)
Dependencies between tasks:
- Tasks 1–8 are independent of the harness (ship them first if you want immediate value for admins)
- Tasks 9–18 are strictly sequential (each builds on the previous)
- Tasks 19–21, 22–23, 24, 25 are independent of each other and can be done in any order after Task 18
- Tasks 26–29 can be done after Task 18 but are most valuable after Task 25
- Tasks 30–31 come last before staging
- Task 33 is independent and can be done any time after Task 8