Multiplayer Soak & UX Test Harness — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Build a standalone Playwright-based soak runner in tests/soak/ that drives 16 authenticated browser sessions across 4 concurrent rooms playing many multiplayer games, with pluggable scenarios, a click-to-watch dashboard via CDP screencast, and strict per-room failure isolation.
Architecture: Single-process node runner reusing the existing GolfBot class from tests/e2e/bot/. One shared browser (16 contexts) by default; WATCH=tiled uses a second headed browser for the 4 host contexts. Scenarios are plain TS modules exported from tests/soak/scenarios/. Dashboard is a tiny HTTP+WS server serving one static page that pushes live status and on-demand CDP screencast frames.
Tech Stack: TypeScript + tsx (no build step), Playwright Core, ws (WebSocket server), Vitest for unit tests, FastAPI + asyncpg (existing server), PostgreSQL (existing).
Spec: docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md
Testing Strategy Notes
- Server-side Python changes: The existing test suite mocks stores with AsyncMock and has no real-Postgres fixtures. Rather than inventing a new fixture pattern for this plan, server tasks use curl-based verification against a running local dev server as the explicit verification step after each commit. Run python server/main.py in another terminal (requires Postgres + Redis running; see docs/INSTALL.md).
- TypeScript harness logic: Unit-tested with Vitest for pure modules (Deferred, RoomCoordinator, Watchdog, Config). Integration-level modules (SessionPool, Dashboard, Screencaster, Scenarios) are verified by running the harness itself via the smoke test.
- End-to-end validation: tests/soak/scripts/smoke.sh is the canary. After every non-trivial change, run it against local dev and expect exit 0 within ~30s.
Phase 1 — Server-side changes (independent, ships first)
Task 1: Schema migration for is_test_account and marks_as_test
Add two columns, one partial index, and rebuild the leaderboard_overall materialized view to include is_test_account (so the filter works through the view fast path).
Deploy path (this is load-bearing — read before editing):
The existing codebase applies schema changes via inline DO $$ BEGIN IF NOT EXISTS (...) THEN ALTER TABLE ... END IF; END $$; blocks inside SCHEMA_SQL in server/stores/user_store.py. That string is executed on every server startup by UserStore.create() → initialize_schema() → conn.execute(SCHEMA_SQL), which the FastAPI lifespan in server/main.py triggers via get_user_store(config.POSTGRES_URL). The same pattern added every other post-v1 column (is_banned, force_password_reset, last_seen_at, rating, and many others; see the existing DO blocks in SCHEMA_SQL).
What this means for deploy:
- No separate migration tool needed. CI/CD rebuilds the image, docker compose up -d restarts the container, lifespan fires, SCHEMA_SQL executes, the new DO $$ blocks see the missing columns and ALTER TABLE ADD COLUMN them in place.
- Idempotent by construction. Re-running against an already-migrated DB is a no-op: the IF NOT EXISTS guard in each DO block skips the ALTER.
- Fresh installs work. CREATE TABLE IF NOT EXISTS users_v2 uses the current column list; the ADD COLUMN DO blocks are no-ops because the column is already there from the CREATE.
- Matview rebuild is atomic. The DO $$ block that DROPs+CREATEs leaderboard_overall runs inside a single transaction. CREATE MATERIALIZED VIEW ... AS SELECT populates immediately (no WITH NO DATA), so concurrent readers never see an empty or missing view; they see either the old version (pre-commit) or the new version (post-commit).
- Rollback is safe. All changes are additive. If you have to revert the code, the new columns just sit unused; old code never references them, so nothing breaks.
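The idempotency and rollback claims above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses an in-memory SQLite database as a stand-in for Postgres, and column_exists and this initialize_schema are hypothetical helpers written for the sketch (SQLite has no DO blocks, so the guard lives in Python).

```python
import sqlite3


def column_exists(conn, table, column):
    # PRAGMA table_info plays the role that information_schema.columns
    # plays in the Postgres guard (assumption: SQLite stand-in only).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def initialize_schema(conn):
    """Hypothetical stand-in for the startup schema step; safe to call repeatedly."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users_v2 (id INTEGER PRIMARY KEY, username TEXT)"
    )
    # Guarded, additive change: only ALTER when the column is missing.
    if not column_exists(conn, "users_v2", "is_test_account"):
        conn.execute(
            "ALTER TABLE users_v2 ADD COLUMN is_test_account INTEGER DEFAULT 0"
        )


conn = sqlite3.connect(":memory:")
# Simulate a pre-migration database that already holds a user.
conn.execute("CREATE TABLE users_v2 (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users_v2 (username) VALUES ('alice')")

initialize_schema(conn)  # first startup after deploy: adds the column
initialize_schema(conn)  # later startups: the guard skips the ALTER

# New code sees the column with its default on existing rows.
rows = conn.execute("SELECT username, is_test_account FROM users_v2").fetchall()
# "Old" code that never mentions the column keeps working after a rollback.
old_code_rows = conn.execute("SELECT id, username FROM users_v2").fetchall()
```

Calling the schema step twice without an error is the whole point of the guard; the second query shows why reverting the application code does not require reverting the schema.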
Files:
- Modify: server/stores/user_store.py: append to SCHEMA_SQL (ALTER blocks near L79–L98 and the matview block near L298–L335)
- Step 1: Add column migration to SCHEMA_SQL
Open server/stores/user_store.py. Inside the first DO $$ BEGIN ... END $$; block (around lines 80–98, the one that handles admin columns), append the is_test_account column check. Then add a second ALTER for invite_codes.marks_as_test in a new DO $$ block right after.
Add after the existing last_seen_at check (before END $$; on line ~98):
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'users_v2' AND column_name = 'is_test_account') THEN
ALTER TABLE users_v2 ADD COLUMN is_test_account BOOLEAN DEFAULT FALSE;
END IF;
Then, immediately after the END $$; that closes the users_v2 admin block, add a new block for invite_codes:
-- Add marks_as_test to invite_codes if not exists
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test') THEN
ALTER TABLE invite_codes ADD COLUMN marks_as_test BOOLEAN DEFAULT FALSE;
END IF;
END $$;
- Step 2: Add partial index on is_test_account
Find the indexes block near line 338. After the existing idx_users_banned index (line ~344), add:
CREATE INDEX IF NOT EXISTS idx_users_v2_is_test_account ON users_v2(is_test_account)
WHERE is_test_account = TRUE;
- Step 3: Rebuild leaderboard_overall materialized view to include is_test_account
Find the existing matview block at line ~298. Modify the version-check DO block so the view is dropped and recreated if it lacks the is_test_account column. Replace the existing block:
-- Leaderboard materialized view (refreshed periodically)
-- Drop and recreate if missing is_test_account column (soak harness migration)
DO $$
BEGIN
IF EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
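-- Check if is_test_account column exists in the view.
-- information_schema.columns does not list materialized views, so a lookup
-- there never matches and the view would be rebuilt on every startup.
-- Query the system catalog instead.
IF NOT EXISTS (
SELECT 1 FROM pg_attribute
WHERE attrelid = 'leaderboard_overall'::regclass
AND attname = 'is_test_account' AND NOT attisdropped
) THEN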
DROP MATERIALIZED VIEW leaderboard_overall;
END IF;
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_matviews WHERE matviewname = 'leaderboard_overall') THEN
EXECUTE '
CREATE MATERIALIZED VIEW leaderboard_overall AS
SELECT
u.id as user_id,
u.username,
COALESCE(u.is_test_account, FALSE) as is_test_account,
s.games_played,
s.games_won,
ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
s.rounds_won,
ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
s.best_score as best_round_score,
s.knockouts,
s.best_win_streak,
COALESCE(s.rating, 1500) as rating,
s.last_game_at
FROM player_stats s
JOIN users_v2 u ON s.user_id = u.id
WHERE s.games_played >= 5
AND u.deleted_at IS NULL
AND (u.is_banned = false OR u.is_banned IS NULL)
';
END IF;
END $$;
Note: the only differences from the existing block are the changed comment, the changed column-existence check (is_test_account instead of rating), and the new COALESCE(u.is_test_account, FALSE) as is_test_account column in the SELECT. Everything else stays identical.
- Step 4: Start the server to run migrations
Run (in another terminal, with Postgres + Redis up):
cd /home/alee/Sources/golfgame
python server/main.py
Expected: server starts cleanly, no errors about is_test_account or marks_as_test or leaderboard_overall.
- Step 5: Verify schema via psql
Connect to the dev database and confirm:
psql -d golfgame -c "\d users_v2" | grep is_test_account
psql -d golfgame -c "\d invite_codes" | grep marks_as_test
psql -d golfgame -c "\d leaderboard_overall" | grep is_test_account
psql -d golfgame -c "\di idx_users_v2_is_test_account"
Expected: all four commands return matching rows.
- Step 6: Commit
git add server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): add is_test_account + marks_as_test schema
New columns support separating soak-harness test traffic from real
user traffic in stats queries. Rebuilds leaderboard_overall matview
to include is_test_account so the fast path stays filterable.
Migration is idempotent via DO $$ / IF NOT EXISTS blocks inside
SCHEMA_SQL, which runs on every server startup — same mechanism
every existing post-v1 column migration uses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
- Step 7: Post-deploy verification (staging)
After this commit ships to staging via CI/CD (or docker compose up -d on the staging host), verify the migration actually applied:
ssh root@129.212.150.189 << 'REMOTE'
cd /opt/golfgame
# Find the postgres container name (it may vary across compose files)
PG_CONTAINER=$(docker compose -f docker-compose.staging.yml ps -q postgres)
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame << 'SQL'
-- Confirm columns exist
\d users_v2
\d invite_codes
\d leaderboard_overall
-- Targeted checks
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_name = 'users_v2' AND column_name = 'is_test_account';
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_name = 'invite_codes' AND column_name = 'marks_as_test';
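-- information_schema.columns does not list materialized views; use the catalogs
SELECT a.attname AS column_name
FROM pg_attribute a JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relname = 'leaderboard_overall' AND c.relkind = 'm'
AND a.attname = 'is_test_account' AND NOT a.attisdropped;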
-- Partial index
SELECT indexname, indexdef FROM pg_indexes
WHERE indexname = 'idx_users_v2_is_test_account';
SQL
REMOTE
Expected (all four present):
- users_v2.is_test_account with default false
- invite_codes.marks_as_test with default false
- leaderboard_overall has an is_test_account column
- idx_users_v2_is_test_account exists
If any of these are missing, the server didn't actually restart (or restarted but the container has a stale image). Check docker compose logs golfgame for the line User store schema initialized — if it's not there, the migration never ran.
- Step 8: Post-deploy verification (production)
Same check, against prod, after the prod deploy:
ssh root@165.245.152.51 << 'REMOTE'
cd /opt/golfgame
PG_CONTAINER=$(docker compose -f docker-compose.prod.yml ps -q postgres)
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame -c "\d users_v2" | grep is_test_account
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame -c "\d invite_codes" | grep marks_as_test
docker exec -i $PG_CONTAINER psql -U postgres -d golfgame -c "\d leaderboard_overall" | grep is_test_account
REMOTE
Expected: three matching rows. If prod migration fails, the rollback story is clean — revert the commit, redeploy, old code keeps working because it never referenced the new columns.
Task 2: Propagate is_test_account through User model and user_store
Wire the new column into the User dataclass, create_user signature, _row_to_user mapping, and every SELECT list that already pulls user columns.
Files:
- Modify: server/models/user.py: User dataclass (L22–L68) + to_dict (L82–L116) + from_dict (L118+)
- Modify: server/stores/user_store.py: create_user (L454–L501), _row_to_user (L997–L1020), get_user_by_id / get_user_by_username / get_user_by_email SELECT lists (L503–L570)
- Step 1: Add is_test_account to the User dataclass
In server/models/user.py, add a new field to the User dataclass (after force_password_reset on L68):
is_test_account: bool = False
Update the docstring Attributes: block around L45 to include:
is_test_account: True for accounts created by the soak test harness.
- Step 2: Include is_test_account in to_dict and from_dict
In User.to_dict at L82, add to the d dict (after force_password_reset):
"is_test_account": self.is_test_account,
In User.from_dict, add the corresponding parse — find where force_password_reset is parsed and add the same pattern:
is_test_account=d.get("is_test_account", False),
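Using d.get with a False default matters because serialized users written before this change will not carry the key. A minimal sketch of the round trip follows; MiniUser is a hypothetical, cut-down stand-in for the real User dataclass, which has many more fields.

```python
from dataclasses import dataclass


@dataclass
class MiniUser:
    """Hypothetical stand-in for User, reduced to the fields this sketch needs."""

    username: str
    force_password_reset: bool = False
    is_test_account: bool = False

    def to_dict(self) -> dict:
        return {
            "username": self.username,
            "force_password_reset": self.force_password_reset,
            "is_test_account": self.is_test_account,
        }

    @classmethod
    def from_dict(cls, d: dict) -> "MiniUser":
        return cls(
            username=d["username"],
            force_password_reset=d.get("force_password_reset", False),
            # Payloads serialized before the migration have no such key.
            is_test_account=d.get("is_test_account", False),
        )


# A legacy payload without the new key still parses, defaulting to False.
legacy = MiniUser.from_dict({"username": "alice"})
# A flagged account survives a to_dict / from_dict round trip.
bot = MiniUser.from_dict(MiniUser(username="soak_bot", is_test_account=True).to_dict())
```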
- Step 3: Add is_test_account parameter to create_user
In server/stores/user_store.py at L454, add a new parameter:
async def create_user(
self,
username: str,
password_hash: str,
email: Optional[str] = None,
role: UserRole = UserRole.USER,
guest_id: Optional[str] = None,
verification_token: Optional[str] = None,
verification_expires: Optional[datetime] = None,
is_test_account: bool = False,
) -> Optional[User]:
Update the docstring to add a line in Args: describing is_test_account.
Change the INSERT SQL block to include the new column:
row = await conn.fetchrow(
"""
INSERT INTO users_v2 (username, password_hash, email, role, guest_id,
verification_token, verification_expires,
is_test_account)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
RETURNING id, username, email, password_hash, role, email_verified,
verification_token, verification_expires, reset_token, reset_expires,
guest_id, deleted_at, preferences, created_at, last_login, last_seen_at,
is_active, is_banned, ban_reason, force_password_reset, is_test_account
""",
username,
password_hash,
email,
role.value,
guest_id,
verification_token,
verification_expires,
is_test_account,
)
- Step 4: Update _row_to_user mapping
In server/stores/user_store.py at L997, add to the User(...) call (after force_password_reset):
is_test_account=row.get("is_test_account", False) or False,
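The trailing "or False" looks redundant but covers a case the default alone does not: a row where the column is selected and its value is NULL. The sketch below uses plain dicts as stand-ins for database rows (an assumption made for illustration), and read_flag is a hypothetical helper name.

```python
def read_flag(row: dict) -> bool:
    # .get() supplies False only when the key is absent; "or False"
    # additionally maps a present-but-None (SQL NULL) value to False.
    return row.get("is_test_account", False) or False


missing = read_flag({"username": "alice"})  # query did not select the column
null = read_flag({"username": "bob", "is_test_account": None})  # column is NULL
flagged = read_flag({"username": "soak_bot", "is_test_account": True})
```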
- Step 5: Update all other SELECT lists in user_store
Find every query in server/stores/user_store.py that returns a full user row and passes it to _row_to_user. Add is_test_account to the SELECT column list for each. Grep to find them:
grep -n "is_active, is_banned, ban_reason, force_password_reset" server/stores/user_store.py
For each match, append , is_test_account to the SELECT list. Expected locations:
- create_user INSERT ... RETURNING (already updated in Step 3)
- get_user_by_id at L503
- get_user_by_username at L519
- get_user_by_email (find it)
- Any other SELECT ... FROM users_v2 that calls _row_to_user
- Step 6: Restart server, verify no errors
# Kill and restart the dev server
python server/main.py
Expected: server starts cleanly. Any query that touches users now returns is_test_account correctly.
- Step 7: Smoke test via curl
# Register a throwaway test user (no invite code needed if DAILY_OPEN_SIGNUPS > 0 locally,
# or use the 5VC2MCCN invite code if INVITE_ONLY=true)
# Set PW to any password of your choice (>= 8 chars).
PW='SomeTestPw_1!'
curl -sX POST http://localhost:8000/api/auth/register \
-H 'Content-Type: application/json' \
-d "{\"username\":\"soaktest_smoke1\",\"password\":\"$PW\",\"email\":\"soaktest_smoke1@example.com\",\"invite_code\":\"5VC2MCCN\"}"
Expected: HTTP 200 with {"user":{...},"token":"..."}. The registration path now runs through the new column without errors even though the value is still always FALSE at this stage.
- Step 8: Commit
git add server/models/user.py server/stores/user_store.py
git commit -m "$(cat <<'EOF'
feat(server): propagate is_test_account through User model & store
User dataclass, create_user, and all SELECT lists now round-trip the
new column. Value is always FALSE until Task 4 wires the register
flow to the invite code's marks_as_test flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 3: Expose marks_as_test on InviteCode and add lookup helper
validate_invite_code currently returns a bare bool. We need a new helper that returns the invite row so the register flow can read marks_as_test without changing the return type of validate_invite_code.
Files:
- Modify: server/services/admin_service.py: InviteCode dataclass (L115–L138), get_invite_codes SELECT (L1106–L1141), add new get_invite_code_details method
- Step 1: Add marks_as_test field to InviteCode dataclass
In server/services/admin_service.py at L115:
@dataclass
class InviteCode:
"""Invite code details."""
code: str
created_by: str
created_by_username: str
created_at: datetime
expires_at: datetime
max_uses: int
use_count: int
is_active: bool
marks_as_test: bool = False
Update to_dict at L127 to include the field:
def to_dict(self) -> dict:
return {
"code": self.code,
"created_by": self.created_by,
"created_by_username": self.created_by_username,
"created_at": self.created_at.isoformat() if self.created_at else None,
"expires_at": self.expires_at.isoformat() if self.expires_at else None,
"max_uses": self.max_uses,
"use_count": self.use_count,
"is_active": self.is_active,
"remaining_uses": max(0, self.max_uses - self.use_count),
"marks_as_test": self.marks_as_test,
}
- Step 2: Update get_invite_codes SELECT to include marks_as_test
Find get_invite_codes at L1106. Modify the SQL to pull the column and pass it through:
async def get_invite_codes(self, include_expired: bool = False) -> List[InviteCode]:
"""List all invite codes."""
async with self.pool.acquire() as conn:
sql = """
SELECT c.code, c.created_by, u.username as created_by_username,
c.created_at, c.expires_at,
c.max_uses, c.use_count, c.is_active,
COALESCE(c.marks_as_test, FALSE) as marks_as_test
FROM invite_codes c
LEFT JOIN users_v2 u ON c.created_by = u.id
"""
Find the list comprehension that constructs InviteCode(...) objects and add the new kwarg:
InviteCode(
code=row["code"],
created_by=str(row["created_by"]),
created_by_username=row["created_by_username"] or "unknown",
created_at=row["created_at"].replace(tzinfo=timezone.utc) if row["created_at"] else None,
expires_at=row["expires_at"].replace(tzinfo=timezone.utc) if row["expires_at"] else None,
max_uses=row["max_uses"],
use_count=row["use_count"],
is_active=row["is_active"],
marks_as_test=row["marks_as_test"],
)
- Step 3: Add new get_invite_code_details method
Add a new method right after validate_invite_code (around L1214) that returns the row with marks_as_test. The register flow will call this to resolve the flag. Place it between validate_invite_code and use_invite_code:
async def get_invite_code_details(self, code: str) -> Optional[dict]:
"""
Look up an invite code's row including marks_as_test.
Returns None if the code does not exist. Does NOT validate expiry
or usage — use validate_invite_code for that. This is purely a
helper for the register flow to discover the test-seed flag.
"""
async with self.pool.acquire() as conn:
row = await conn.fetchrow(
"""
SELECT code, max_uses, use_count, is_active,
COALESCE(marks_as_test, FALSE) as marks_as_test
FROM invite_codes
WHERE code = $1
""",
code,
)
if not row:
return None
return {
"code": row["code"],
"max_uses": row["max_uses"],
"use_count": row["use_count"],
"is_active": row["is_active"],
"marks_as_test": row["marks_as_test"],
}
- Step 4: Verify with curl via admin panel endpoint
This assumes you have an admin token from a local dev user. Hit the existing admin invites listing:
# Replace TOKEN with a valid admin JWT
curl -s http://localhost:8000/api/admin/invites \
-H "Authorization: Bearer $TOKEN" | jq '.codes[0]'
Expected: response includes "marks_as_test": false on at least one code.
- Step 5: Commit
git add server/services/admin_service.py
git commit -m "$(cat <<'EOF'
feat(server): expose marks_as_test on InviteCode
Adds the field to the dataclass, SELECT list in get_invite_codes,
and a new get_invite_code_details helper that the register flow
will use to discover whether an invite should flag new accounts
as test accounts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 4: Wire register flow to set is_test_account from invite
When a user registers with an invite whose marks_as_test=TRUE, the new account is flagged. The plumbing lives in two places: the router reads the flag and passes it to the service; the service passes it to the store.
Files:
- Modify: server/routers/auth.py: register handler (L224–L320)
- Modify: server/services/auth_service.py: register method (L98–L178)
- Step 1: Add is_test_account parameter to auth_service.register
In server/services/auth_service.py at L98, add the new parameter:
async def register(
self,
username: str,
password: str,
email: Optional[str] = None,
guest_id: Optional[str] = None,
is_test_account: bool = False,
) -> RegistrationResult:
Update the docstring Args: block:
is_test_account: Mark this user as a soak-harness test account.
Pass the value through to create_user at L146:
user = await self.user_store.create_user(
username=username,
password_hash=password_hash,
email=email,
role=UserRole.USER,
guest_id=guest_id,
verification_token=verification_token,
verification_expires=verification_expires,
is_test_account=is_test_account,
)
- Step 2: Update the router to resolve marks_as_test and pass it through
In server/routers/auth.py, find the register handler at L224. After the existing invite-code validation checks (around L248–L252), fetch the invite details and compute is_test_account. The block should end up looking like this:
# --- Invite code validation ---
is_test_account = False
if has_invite:
if not _admin_service:
raise HTTPException(status_code=503, detail="Admin service not initialized")
if not await _admin_service.validate_invite_code(request_body.invite_code):
raise HTTPException(status_code=400, detail="Invalid or expired invite code")
# Check if this invite flags new accounts as test accounts
invite_details = await _admin_service.get_invite_code_details(request_body.invite_code)
if invite_details and invite_details.get("marks_as_test"):
is_test_account = True
Then pass it to auth_service.register at L276:
# --- Create the account ---
result = await auth_service.register(
username=request_body.username,
password=request_body.password,
email=request_body.email,
is_test_account=is_test_account,
)
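The decision the router makes can be isolated and exercised without a server. The sketch below is an illustration under stated assumptions: FakeAdminService is a hypothetical in-memory stand-in for the admin service, resolve_is_test_account is a hypothetical helper that mirrors the handler logic, and ValueError stands in for the HTTPException raised in the real handler.

```python
import asyncio
from typing import Optional


class FakeAdminService:
    """Hypothetical in-memory stand-in for the admin service."""

    def __init__(self, codes: dict):
        self._codes = codes

    async def validate_invite_code(self, code: str) -> bool:
        row = self._codes.get(code)
        return bool(row and row["is_active"])

    async def get_invite_code_details(self, code: str) -> Optional[dict]:
        return self._codes.get(code)


async def resolve_is_test_account(service, invite_code: Optional[str]) -> bool:
    # No invite at all: ordinary signup, never a test account.
    if not invite_code:
        return False
    # Validation comes first; the details lookup does not check validity.
    if not await service.validate_invite_code(invite_code):
        raise ValueError("Invalid or expired invite code")
    details = await service.get_invite_code_details(invite_code)
    return bool(details and details.get("marks_as_test"))


service = FakeAdminService(
    {
        "SOAKTEST": {"is_active": True, "marks_as_test": True},
        "NORMAL01": {"is_active": True, "marks_as_test": False},
    }
)
soak = asyncio.run(resolve_is_test_account(service, "SOAKTEST"))
normal = asyncio.run(resolve_is_test_account(service, "NORMAL01"))
no_invite = asyncio.run(resolve_is_test_account(service, None))
```

Only the first call yields True; an unknown code raises before the details lookup is reached.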
- Step 3: Flag the dev invite code for testing
Before we can test end-to-end locally, we need an invite code with marks_as_test=TRUE in the local dev DB. Run (once, manually):
# First, check if 5VC2MCCN exists locally (it probably doesn't — that's staging's code).
# Create a local test invite code and flag it:
psql -d golfgame <<'EOF'
-- Create a local dev test-seed invite if not exists
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
-- Verify
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = 'SOAKTEST';
EOF
Expected: marks_as_test | t in the last row.
- Step 4: Verify register flow sets is_test_account
Restart the dev server, then run the following (PW must still be set in this shell; see Task 2, Step 7):
curl -sX POST http://localhost:8000/api/auth/register \
-H 'Content-Type: application/json' \
-d "{\"username\":\"soaktest_register1\",\"password\":\"$PW\",\"email\":\"soaktest_register1@example.com\",\"invite_code\":\"SOAKTEST\"}"
# Verify in DB
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'soaktest_register1';"
Expected: is_test_account | t.
- Step 5: Verify non-test invite does NOT flag new accounts
# Create a non-test invite
psql -d golfgame <<'EOF'
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'NORMAL01', id, NOW() + INTERVAL '10 years', 10, TRUE, FALSE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = FALSE;
EOF
curl -sX POST http://localhost:8000/api/auth/register \
-H 'Content-Type: application/json' \
-d "{\"username\":\"realuser_smoke1\",\"password\":\"$PW\",\"email\":\"realuser_smoke1@example.com\",\"invite_code\":\"NORMAL01\"}"
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username = 'realuser_smoke1';"
Expected: is_test_account | f.
- Step 6: Commit
git add server/routers/auth.py server/services/auth_service.py
git commit -m "$(cat <<'EOF'
feat(server): register flow flags accounts from test-seed invites
When a user registers with an invite_code whose marks_as_test=TRUE,
their users_v2.is_test_account is set to TRUE. Normal invite codes
and invite-less signups are unaffected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 5: Stats filtering (include_test parameter)
Thread an include_test: bool = False parameter through get_leaderboard, get_player_rank, and the corresponding router handlers. Default is False — real users never see soak traffic.
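The default-hidden behavior comes down to one boolean predicate, (include_test OR NOT is_test_account), with include_test bound as a query parameter. A minimal sketch of that predicate, using an in-memory SQLite table as a stand-in for the Postgres view (table name, rows, and the leaderboard helper are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaderboard (username TEXT, games_won INTEGER, is_test_account INTEGER)"
)
conn.executemany(
    "INSERT INTO leaderboard VALUES (?, ?, ?)",
    [("alice", 12, 0), ("soak_bot", 40, 1), ("bob", 7, 0)],
)


def leaderboard(include_test: bool) -> list:
    # When include_test is true the predicate is true for every row;
    # when it is false only rows with is_test_account = 0 pass.
    rows = conn.execute(
        "SELECT username FROM leaderboard "
        "WHERE (? OR NOT is_test_account) "
        "ORDER BY games_won DESC",
        (include_test,),
    ).fetchall()
    return [r[0] for r in rows]


default_view = leaderboard(include_test=False)
full_view = leaderboard(include_test=True)
```

Binding the flag as a parameter keeps a single query text for both cases instead of assembling the WHERE clause conditionally.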
Files:
- Modify: server/services/stats_service.py: get_leaderboard (L169), get_player_rank (L249)
- Modify: server/routers/stats.py: get_leaderboard route (L157), get_player_rank route (L227), get_my_rank route (L348)
- Step 1: Add include_test to get_leaderboard service method
In server/services/stats_service.py at L169:
async def get_leaderboard(
self,
metric: str = "wins",
limit: int = 50,
offset: int = 0,
include_test: bool = False,
) -> List[LeaderboardEntry]:
Inside the method, find both SQL paths (materialized view and fallback). In the view path at L208, change the WHERE clause:
if view_exists:
# Use materialized view for performance
rows = await conn.fetch(f"""
SELECT
user_id, username, games_played, games_won,
win_rate, avg_score, knockouts, best_win_streak,
COALESCE(rating, 1500) as rating,
ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM leaderboard_overall
WHERE ($3 OR NOT is_test_account)
ORDER BY {column} {direction}
LIMIT $1 OFFSET $2
""", limit, offset, include_test)
In the fallback path at L220, add the WHERE clause and parameter:
else:
# Fall back to direct query
rows = await conn.fetch(f"""
SELECT
s.user_id, u.username, s.games_played, s.games_won,
ROUND(s.games_won::numeric / NULLIF(s.games_played, 0) * 100, 1) as win_rate,
ROUND(s.total_points::numeric / NULLIF(s.total_rounds, 0), 1) as avg_score,
s.knockouts, s.best_win_streak,
COALESCE(s.rating, 1500) as rating,
ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM player_stats s
JOIN users_v2 u ON s.user_id = u.id
WHERE s.games_played >= 5
AND u.deleted_at IS NULL
AND (u.is_banned = false OR u.is_banned IS NULL)
AND ($3 OR NOT COALESCE(u.is_test_account, FALSE))
ORDER BY {column} {direction}
LIMIT $1 OFFSET $2
""", limit, offset, include_test)
- Step 2: Apply the same pattern to get_player_rank
In server/services/stats_service.py at L249:
async def get_player_rank(
self,
user_id: str,
metric: str = "wins",
include_test: bool = False,
) -> Optional[int]:
Update both SQL paths to include the include_test filter. View path at L287:
if view_exists:
row = await conn.fetchrow(f"""
SELECT rank FROM (
SELECT user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM leaderboard_overall
WHERE ($2 OR NOT is_test_account)
) ranked
WHERE user_id = $1
""", user_id, include_test)
Fallback path at L294:
else:
row = await conn.fetchrow(f"""
SELECT rank FROM (
SELECT s.user_id, ROW_NUMBER() OVER (ORDER BY {column} {direction}) as rank
FROM player_stats s
JOIN users_v2 u ON s.user_id = u.id
WHERE s.games_played >= 5
AND u.deleted_at IS NULL
AND (u.is_banned = false OR u.is_banned IS NULL)
AND ($2 OR NOT COALESCE(u.is_test_account, FALSE))
) ranked
WHERE user_id = $1
""", user_id, include_test)
- Step 3: Expose include_test as a query parameter on the leaderboard route
In server/routers/stats.py at L157:
@router.get("/leaderboard", response_model=LeaderboardResponse)
async def get_leaderboard(
metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
limit: int = Query(50, ge=1, le=100),
offset: int = Query(0, ge=0),
include_test: bool = Query(False, description="Include soak-harness test accounts"),
service: StatsService = Depends(get_stats_service_dep),
):
"""
Get leaderboard by metric.
Metrics:
- wins: Total games won
- win_rate: Win percentage (requires 5+ games)
- avg_score: Average points per round (lower is better)
- knockouts: Times going out first
- streak: Best win streak
Players must have 5+ games to appear on leaderboards.
By default, soak-harness test accounts are hidden.
"""
entries = await service.get_leaderboard(metric, limit, offset, include_test)
- Step 4: Same for get_player_rank and get_my_rank routes
At L227:
@router.get("/players/{user_id}/rank", response_model=PlayerRankResponse)
async def get_player_rank(
user_id: str,
metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
include_test: bool = Query(False),
service: StatsService = Depends(get_stats_service_dep),
):
"""Get player's rank on a leaderboard."""
rank = await service.get_player_rank(user_id, metric, include_test)
At L348:
@router.get("/me/rank", response_model=PlayerRankResponse)
async def get_my_rank(
metric: str = Query("wins", pattern="^(wins|win_rate|avg_score|knockouts|streak|rating)$"),
include_test: bool = Query(False),
user: User = Depends(require_user),
service: StatsService = Depends(get_stats_service_dep),
):
"""Get current user's rank on a leaderboard."""
rank = await service.get_player_rank(user.id, metric, include_test)
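Because the filter sits inside the ranked subquery, ranks are computed over the visible set only, so a real player's rank can differ between the two modes. A pure-Python sketch of that behavior follows; rank_of and the sample players are hypothetical, and games_won stands in for whichever metric column is selected.

```python
from typing import Optional


def rank_of(user: str, players: list, include_test: bool = False) -> Optional[int]:
    # Filter first, then rank: the same order the SQL subquery uses.
    visible = [p for p in players if include_test or not p["is_test_account"]]
    ordered = sorted(visible, key=lambda p: p["games_won"], reverse=True)
    for position, p in enumerate(ordered, start=1):
        if p["username"] == user:
            return position
    return None  # user is filtered out or not present


players = [
    {"username": "alice", "games_won": 12, "is_test_account": False},
    {"username": "soak_bot", "games_won": 40, "is_test_account": True},
    {"username": "bob", "games_won": 7, "is_test_account": False},
]

alice_default = rank_of("alice", players)
alice_with_bots = rank_of("alice", players, include_test=True)
bot_default = rank_of("soak_bot", players)
```

One consequence worth knowing when reading soak results: a test account asking for its own rank with the default setting gets no rank back, since it is filtered out before ranking.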
- Step 5: Verify filtering works via curl
# Mark a test user we registered earlier as having games played (synthetic)
psql -d golfgame <<'EOF'
INSERT INTO player_stats (user_id, games_played, games_won, total_points, total_rounds, rounds_won)
SELECT id, 10, 8, 50, 30, 20 FROM users_v2 WHERE username = 'soaktest_register1'
ON CONFLICT (user_id) DO UPDATE SET games_played = 10, games_won = 8;
-- Refresh the matview so the test account shows up
REFRESH MATERIALIZED VIEW leaderboard_overall;
EOF
# Default (include_test=false) should NOT include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soaktest_"))'
# include_test=true should include soaktest_register1
curl -s "http://localhost:8000/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soaktest_"))'
Expected: first command returns nothing, second returns a JSON object for soaktest_register1.
- Step 6: Commit
git add server/services/stats_service.py server/routers/stats.py
git commit -m "$(cat <<'EOF'
feat(server): stats queries support include_test filter
Leaderboard and rank queries take an optional include_test param
(default false). Real users never see soak-harness traffic unless
they explicitly opt in via ?include_test=true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 6: Admin service + route surfaces is_test_account
UserDetails exposes the flag, search_users selects it, and the list_users admin route accepts an include_test query parameter.
Files:
- Modify: server/services/admin_service.py: UserDetails (L24–L58), search_users (L312–L382), get_user (L384–L428)
- Modify: server/routers/admin.py: list_users route (L80–L107)
- Step 1: Add field to UserDetails dataclass
In server/services/admin_service.py at L24, add to the dataclass:
@dataclass
class UserDetails:
"""Extended user info for admin view."""
id: str
username: str
email: Optional[str]
role: str
email_verified: bool
is_banned: bool
ban_reason: Optional[str]
force_password_reset: bool
created_at: datetime
last_login: Optional[datetime]
last_seen_at: Optional[datetime]
is_active: bool
games_played: int
games_won: int
is_test_account: bool = False
Update to_dict to include it:
def to_dict(self) -> dict:
return {
"id": self.id,
"username": self.username,
"email": self.email,
"role": self.role,
"email_verified": self.email_verified,
"is_banned": self.is_banned,
"ban_reason": self.ban_reason,
"force_password_reset": self.force_password_reset,
"created_at": self.created_at.isoformat() if self.created_at else None,
"last_login": self.last_login.isoformat() if self.last_login else None,
"last_seen_at": self.last_seen_at.isoformat() if self.last_seen_at else None,
"is_active": self.is_active,
"games_played": self.games_played,
"games_won": self.games_won,
"is_test_account": self.is_test_account,
}
- Step 2: Update search_users to SELECT and filter on is_test_account
In server/services/admin_service.py at L312, add an include_test parameter to the signature:
    async def search_users(
        self,
        query: str = "",
        limit: int = 50,
        offset: int = 0,
        include_banned: bool = True,
        include_deleted: bool = False,
        include_test: bool = True,
    ) -> List[UserDetails]:
Modify the SQL to pull is_test_account:
sql = """
SELECT u.id, u.username, u.email, u.role,
u.email_verified, u.is_banned, u.ban_reason,
u.force_password_reset, u.created_at, u.last_login,
u.last_seen_at, u.is_active,
COALESCE(u.is_test_account, FALSE) as is_test_account,
COALESCE(s.games_played, 0) as games_played,
COALESCE(s.games_won, 0) as games_won
FROM users_v2 u
LEFT JOIN player_stats s ON u.id = s.user_id
WHERE 1=1
"""
After the existing include_deleted check, add:
        if not include_test:
            sql += " AND (u.is_test_account = false OR u.is_test_account IS NULL)"
Update the UserDetails(...) construction in the list comprehension to include is_test_account=row["is_test_account"].
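The `IS NULL` branch matters because in SQL `NULL = false` evaluates to NULL rather than true, so a bare equality check would silently drop any row where the column was never populated. The same predicate can be sketched in TypeScript for illustration; the `UserRow` shape and the `filterUsers` name are hypothetical and not part of the server code.

```typescript
// Hypothetical row shape: only the column this step cares about.
interface UserRow {
  username: string;
  is_test_account: boolean | null;
}

// Mirrors the SQL clause:
//   includeTest = true  -> no extra filter
//   includeTest = false -> keep rows whose flag is false or NULL
function filterUsers(rows: UserRow[], includeTest: boolean): UserRow[] {
  if (includeTest) return rows;
  return rows.filter(
    (r) => r.is_test_account === false || r.is_test_account === null,
  );
}

const rows: UserRow[] = [
  { username: 'alice', is_test_account: false },
  { username: 'legacy_bob', is_test_account: null }, // flag never populated
  { username: 'soaktest_register1', is_test_account: true },
];

const visible = filterUsers(rows, false).map((r) => r.username);
console.log(visible);
```

With `includeTest` set to false, only the row flagged true is removed; the NULL row stays visible as a real user.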
- Step 3: Update `get_user` (single-user lookup) similarly
In server/services/admin_service.py at L384, add COALESCE(u.is_test_account, FALSE) as is_test_account to the SELECT and is_test_account=row["is_test_account"] to the UserDetails(...) construction. The get_user method does NOT need the filter parameter — admins looking up individual users should always see them.
- Step 4: Add `include_test` to the admin `list_users` route
In server/routers/admin.py at L80:
@router.get("/users")
async def list_users(
    query: str = "",
    limit: int = 50,
    offset: int = 0,
    include_banned: bool = True,
    include_deleted: bool = False,
    include_test: bool = True,
    admin: User = Depends(require_admin_v2),
    service: AdminService = Depends(get_admin_service_dep),
):
    """
    Search and list users.
    Args:
        query: Search by username or email.
        limit: Maximum results to return.
        offset: Results to skip.
        include_banned: Include banned users.
        include_deleted: Include soft-deleted users.
        include_test: Include soak-harness test accounts (default true for admins).
    """
    users = await service.search_users(
        query=query,
        limit=limit,
        offset=offset,
        include_banned=include_banned,
        include_deleted=include_deleted,
        include_test=include_test,
    )
    return {"users": [u.to_dict() for u in users]}
Note: default is True for the admin path — admins should see everything by default. The client-side toggle will explicitly pass false when the admin wants to hide test accounts.
- Step 5: Verify via curl
# Assuming admin token in $TOKEN env var
curl -s "http://localhost:8000/api/admin/users?query=soaktest" \
-H "Authorization: Bearer $TOKEN" | jq '.users[] | {username, is_test_account}'
curl -s "http://localhost:8000/api/admin/users?query=soaktest&include_test=false" \
-H "Authorization: Bearer $TOKEN" | jq '.users[]'
Expected: first returns users with is_test_account: true; second returns empty (test accounts filtered out).
- Step 6: Commit
git add server/services/admin_service.py server/routers/admin.py
git commit -m "$(cat <<'EOF'
feat(server): admin users list surfaces is_test_account
UserDetails carries the new column, search_users selects and
optionally filters on it, and the /api/admin/users route accepts
?include_test=false to hide soak-harness accounts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 7: Admin panel UI — Test badge and filter toggle
Add a visible [Test] badge on test accounts in the admin user list, a [Test-seed] indicator on invite codes that mark new accounts as test, and an "Include test accounts" checkbox next to the existing "Include banned" toggle.
Files:
- Modify: `client/admin.html` (add the new toggle near the existing `#include-banned` checkbox)
- Modify: `client/admin.js` (`loadUsers` L305, `getStatusBadge` L246, the invite codes renderer L443)
- Step 1: Add the "Include test accounts" checkbox to admin.html
In client/admin.html, find the existing #include-banned checkbox (it's in the users tab filter bar — grep for it). Add a sibling checkbox right after:
grep -n "include-banned" client/admin.html
Add next to that line, checked by default so the initial view matches the server's include_test=true default:
<label>
<input type="checkbox" id="include-test" checked />
Include test accounts
</label>
- Step 2: Read the new checkbox in `loadUsers` and pass it to `getUsers`
In client/admin.js at L305:
async function loadUsers() {
try {
const query = document.getElementById('user-search').value;
const includeBanned = document.getElementById('include-banned').checked;
const includeTest = document.getElementById('include-test').checked;
const data = await getUsers(query, usersPage * PAGE_SIZE, includeBanned, includeTest);
Find getUsers at L70 and add the new parameter:
async function getUsers(query = '', offset = 0, includeBanned = true, includeTest = true) {
const params = new URLSearchParams({
query,
limit: PAGE_SIZE,
offset,
include_banned: includeBanned,
include_test: includeTest,
});
return apiRequest(`/api/admin/users?${params}`);
}
Note: the existing signature builds a URLSearchParams — check the actual code at L70 and match its style; the key change is adding include_test: includeTest to the params.
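URLSearchParams stores every value as a string, so the checkbox state reaches the server as the literal `include_test=false`, which FastAPI's bool query parsing accepts. The sketch below makes the coercion explicit (TypeScript's type for the constructor requires string values). `buildUserQuery` is a hypothetical helper, and the `PAGE_SIZE` value of 50 is an assumption; the real constant lives in admin.js.

```typescript
const PAGE_SIZE = 50; // assumption for this sketch only

// Hypothetical helper: builds the same URL getUsers would request.
function buildUserQuery(
  query: string,
  offset: number,
  includeBanned: boolean,
  includeTest: boolean,
): string {
  const params = new URLSearchParams({
    query,
    limit: String(PAGE_SIZE),
    offset: String(offset),
    include_banned: String(includeBanned),
    include_test: String(includeTest),
  });
  return `/api/admin/users?${params}`;
}

const url = buildUserQuery('soaktest', 0, true, false);
console.log(url);
// /api/admin/users?query=soaktest&limit=50&offset=0&include_banned=true&include_test=false
```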
- Step 3: Add a "Test" badge to the user table row
In client/admin.js at L314, modify the table row template to render a Test badge inline with the status badge:
data.users.forEach(user => {
const testBadge = user.is_test_account
? '<span class="badge badge-info" title="Soak harness test account">Test</span>'
: '';
tbody.innerHTML += `
<tr>
<td>${escapeHtml(user.username)} ${testBadge}</td>
<td>${escapeHtml(user.email || '-')}</td>
<td><span class="badge badge-${user.role === 'admin' ? 'info' : 'muted'}">${user.role}</span></td>
<td>${getStatusBadge(user)}</td>
<td>${user.games_played} (${user.games_won} wins)</td>
<td>${formatDateShort(user.created_at)}</td>
<td>
<button class="btn btn-small" data-action="view-user" data-id="${user.id}">View</button>
</td>
</tr>
`;
});
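Building rows through `innerHTML` makes it easy to miss an escape. One option is to pull the cell markup into a pure function that can be checked without a DOM. The sketch below is illustrative only: `escapeHtml` is re-implemented locally as a stand-in for the helper admin.js already has (the real one may differ), and `usernameCell` is a hypothetical name.

```typescript
interface AdminUser {
  username: string;
  is_test_account?: boolean;
}

// Stand-in for the existing admin.js helper; assumed behavior.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Renders the username cell, appending the Test badge for flagged accounts.
function usernameCell(user: AdminUser): string {
  const testBadge = user.is_test_account
    ? ' <span class="badge badge-info" title="Soak harness test account">Test</span>'
    : '';
  return `<td>${escapeHtml(user.username)}${testBadge}</td>`;
}

console.log(usernameCell({ username: 'soaktest_register1', is_test_account: true }));
console.log(usernameCell({ username: '<b>mallory</b>' }));
```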
- Step 4: Add Test-seed indicator to invite codes list
In client/admin.js around L443 (invite codes list renderer), find the row template and add a [Test-seed] badge when invite.marks_as_test:
grep -n "invite.is_active\|invite.code\|invites-tbody\|invites-table" client/admin.js | head
Once located, modify the row template to include:
const testSeedBadge = invite.marks_as_test
? '<span class="badge badge-info" title="Creates test accounts">Test-seed</span>'
: '';
// Insert testSeedBadge into the invite code column, e.g.
// <td>${escapeHtml(invite.code)} ${testSeedBadge}</td>
- Step 5: Wire the checkbox change event to reload users
Find where #include-banned has its change listener attached (grep for it in admin.js):
grep -n "include-banned.*addEventListener\|include-banned" client/admin.js
Add a parallel listener for #include-test that calls loadUsers():
document.getElementById('include-test').addEventListener('change', () => {
usersPage = 0;
loadUsers();
});
- Step 6: Manual verification in browser
- Open http://localhost:8000/admin.html
- Log in as admin
- Navigate to Users tab
- Search for "soaktest"
- Confirm the `[Test]` badge appears next to `soaktest_register1`
- Uncheck "Include test accounts"; the row should disappear
- Re-check it; the row should return
- Navigate to Invite Codes tab
- Confirm the `[Test-seed]` badge appears next to the `SOAKTEST` code
- Step 7: Commit
git add client/admin.html client/admin.js
git commit -m "$(cat <<'EOF'
feat(admin): visible Test/Test-seed badges + filter toggle
Users table shows [Test] next to soak-harness accounts, invite codes
list shows [Test-seed] next to codes that flag new accounts as test,
and a new "Include test accounts" checkbox lets admins hide bot
traffic from the user list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 8: Document the one-time staging setup step
The staging invite code 5VC2MCCN needs to be flagged as test-seed before the harness can run against staging. This is a manual one-liner; document it in a new bring-up doc.
Files:
- Create: `docs/soak-harness-bringup.md`
- Step 1: Create the bring-up doc
cat > docs/soak-harness-bringup.md <<'EOF'
# Soak Harness Bring-Up
One-time setup steps before running `tests/soak` against an environment.
## Prerequisites
- An invite code exists with 16+ available uses
- You have psql access to the target DB (or admin SQL access via some other means)
## 1. Flag the invite code as test-seed
Any account registered with a `marks_as_test=TRUE` invite code gets
`users_v2.is_test_account=TRUE`, which keeps it out of real-user stats.
### Staging
Invite code: `5VC2MCCN` (16 uses, provisioned 2026-04-10).
```sql
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
```
Expected: marks_as_test | t.
### Local dev
The dev DB already has a `SOAKTEST` invite created during Task 4 of
the implementation plan. If you wiped the DB since, recreate it:
```sql
INSERT INTO invite_codes (code, created_by, expires_at, max_uses, is_active, marks_as_test)
SELECT 'SOAKTEST', id, NOW() + INTERVAL '10 years', 100, TRUE, TRUE
FROM users_v2 WHERE role = 'admin' LIMIT 1
ON CONFLICT (code) DO UPDATE SET marks_as_test = TRUE;
```
## 2. Run the harness
```bash
cd tests/soak
npm install
npm run seed # first run only, populates .env.stresstest
TEST_URL=http://localhost:8000 npm run smoke # 30s end-to-end check
```
For staging:
```bash
TEST_URL=https://staging.adlee.work npm run soak -- --scenario=populate
```
See `tests/soak/README.md` for the full flag reference.
EOF
- Step 2: Commit
git add docs/soak-harness-bringup.md
git commit -m "$(cat <<'EOF'
docs: soak harness bring-up steps
Documents the one-time UPDATE invite_codes SET marks_as_test = TRUE
step required before running tests/soak against each environment,
plus the local dev SOAKTEST invite recreation SQL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Phase 2 — Harness scaffolding
Task 9: Create the tests/soak/ package skeleton
Bare minimum to get tsx running against an empty entry point. No behavior yet.
Files:
- Create: `tests/soak/package.json`
- Create: `tests/soak/tsconfig.json`
- Create: `tests/soak/.gitignore`
- Create: `tests/soak/.env.stresstest.example`
- Create: `tests/soak/README.md` (stub)
- Create: `tests/soak/runner.ts` (stub that prints a placeholder message)
- Step 1: Create `tests/soak/package.json`
{
"name": "golf-soak",
"version": "0.1.0",
"private": true,
"description": "Multiplayer soak & UX test harness for Golf Card Game",
"scripts": {
"soak": "tsx runner.ts",
"soak:populate": "tsx runner.ts --scenario=populate",
"soak:stress": "tsx runner.ts --scenario=stress",
"seed": "tsx scripts/seed-accounts.ts",
"smoke": "bash scripts/smoke.sh",
"test": "vitest run"
},
"dependencies": {
"playwright-core": "^1.40.0",
"ws": "^8.16.0"
},
"devDependencies": {
"tsx": "^4.7.0",
"@types/ws": "^8.5.0",
"@types/node": "^20.10.0",
"typescript": "^5.3.0",
"vitest": "^1.2.0"
}
}
- Step 2: Create `tests/soak/tsconfig.json`
{
"compilerOptions": {
"target": "ES2022",
"module": "commonjs",
"moduleResolution": "node",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": false,
"sourceMap": true,
"outDir": "./dist",
"rootDir": ".",
"baseUrl": ".",
"lib": ["ES2022", "DOM"],
"paths": {
"@soak/*": ["./*"],
"@bot/*": ["../e2e/bot/*"]
}
},
"include": ["**/*.ts"],
"exclude": ["node_modules", "dist", "artifacts"]
}
- Step 3: Create `tests/soak/.gitignore`
node_modules/
dist/
artifacts/
.env.stresstest
*.log
- Step 4: Create `tests/soak/.env.stresstest.example`
# Soak harness account cache.
# This file is AUTO-GENERATED on first run; do not edit by hand.
# Format: SOAK_ACCOUNT_NN=username:password:token
#
# Example (delete before first real run):
# SOAK_ACCOUNT_00=soak_00_a7bx:<generated-password>:<jwt-token>
- Step 5: Create `tests/soak/README.md` (stub, expanded in Task 31)
# Golf Soak & UX Test Harness
Runs 16 authenticated browser sessions across 4 rooms to populate
staging scoreboards and stress-test multiplayer stability.
**Spec:** `docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `docs/soak-harness-bringup.md`
## Quick start
```bash
npm install
npm run seed # first run only
TEST_URL=http://localhost:8000 npm run smoke
```
Full documentation arrives with Task 31.
- Step 6: Create `tests/soak/runner.ts` as a placeholder
#!/usr/bin/env tsx
/**
* Golf Soak Harness — entry point.
*
* Placeholder. Full runner lands in Task 17.
*/
async function main(): Promise<void> {
console.log('golf-soak runner (placeholder)');
console.log('Full implementation lands in Task 17 of the plan.');
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
- Step 7: Install deps and verify runner executes
cd tests/soak
npm install
npx tsx runner.ts
Expected output:
golf-soak runner (placeholder)
Full implementation lands in Task 17 of the plan.
- Step 8: Commit
git add tests/soak/package.json tests/soak/package-lock.json tests/soak/tsconfig.json tests/soak/.gitignore tests/soak/.env.stresstest.example tests/soak/README.md tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): scaffold tests/soak package
Placeholder runner, tsconfig with @bot alias to tests/e2e/bot,
gitignored .env.stresstest + artifacts. Real behavior follows
in Task 10 onward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 10: Core types and Deferred helper
Pure TypeScript with Vitest tests. No browser, no network. Establishes the type surface the rest of the harness will target.
Files:
- Create: `tests/soak/core/types.ts`
- Create: `tests/soak/core/deferred.ts`
- Create: `tests/soak/tests/deferred.test.ts`
- Step 1: Write the failing test for `Deferred`
Create tests/soak/tests/deferred.test.ts:
import { describe, it, expect } from 'vitest';
import { deferred } from '../core/deferred';
describe('deferred', () => {
it('resolves with the given value', async () => {
const d = deferred<string>();
d.resolve('hello');
await expect(d.promise).resolves.toBe('hello');
});
it('rejects with the given error', async () => {
const d = deferred<string>();
const err = new Error('boom');
d.reject(err);
await expect(d.promise).rejects.toBe(err);
});
it('ignores second resolve calls', async () => {
const d = deferred<number>();
d.resolve(1);
d.resolve(2);
await expect(d.promise).resolves.toBe(1);
});
});
- Step 2: Run the test to verify it fails
cd tests/soak
npx vitest run tests/deferred.test.ts
Expected: FAIL — module ../core/deferred does not exist.
- Step 3: Implement `deferred`
Create tests/soak/core/deferred.ts:
/**
* Promise deferred primitive — lets external code resolve or reject
* a promise. Used by RoomCoordinator for host→joiners handoff.
*/
export interface Deferred<T> {
promise: Promise<T>;
resolve(value: T): void;
reject(error: unknown): void;
}
export function deferred<T>(): Deferred<T> {
let resolve!: (value: T) => void;
let reject!: (error: unknown) => void;
const promise = new Promise<T>((res, rej) => {
resolve = res;
reject = rej;
});
return { promise, resolve, reject };
}
- Step 4: Run tests to verify they pass
npx vitest run tests/deferred.test.ts
Expected: 3 passed.
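The "ignores second resolve calls" test passes without any guard in `deferred` because a native promise settles exactly once; later `resolve` or `reject` calls are no-ops. A short usage sketch of that behavior follows. The function is repeated so the snippet runs on its own, and `roomCode` and the joiner labels are illustrative names.

```typescript
interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: unknown): void;
}

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

const roomCode = deferred<string>();
const seen: string[] = [];

// Two waiters subscribe before any value exists.
roomCode.promise.then((code) => seen.push(`joiner-1:${code}`));
roomCode.promise.then((code) => seen.push(`joiner-2:${code}`));

roomCode.resolve('ABCD');            // settles the promise
roomCode.resolve('ZZZZ');            // ignored: already settled
roomCode.reject(new Error('late'));  // ignored: already settled

roomCode.promise.then(() => console.log(seen));
```

Every waiter observes the first value, which is the property RoomCoordinator relies on for the host→joiners handoff.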
- Step 5: Create `core/types.ts` with the scenario interfaces
/**
* Core type definitions for the soak harness.
*
* Contracts here are consumed by runner.ts, SessionPool, scenarios,
* and the dashboard. Keep this file small and stable.
*/
import type { BrowserContext, Page } from 'playwright-core';
import type { GolfBot } from '../../e2e/bot/golf-bot';
// =============================================================================
// Accounts & sessions
// =============================================================================
export interface Account {
/** Stable key used in logs, e.g. "soak_00". */
key: string;
username: string;
password: string;
/** JWT returned from /api/auth/login, may be refreshed by SessionPool. */
token: string;
}
export interface Session {
account: Account;
context: BrowserContext;
page: Page;
bot: GolfBot;
/** Convenience mirror of account.key. */
key: string;
}
// =============================================================================
// Scenarios
// =============================================================================
export interface ScenarioNeeds {
/** Total number of authenticated sessions the scenario requires. */
accounts: number;
/** How many rooms to partition sessions into (default: 1). */
rooms?: number;
/** CPUs to add per room (default: 0). */
cpusPerRoom?: number;
}
/** Free-form per-scenario config merged with CLI flags. */
export type ScenarioConfig = Record<string, unknown>;
export interface ScenarioError {
room: string;
reason: string;
detail?: string;
timestamp: number;
}
export interface ScenarioResult {
gamesCompleted: number;
errors: ScenarioError[];
durationMs: number;
customMetrics?: Record<string, number>;
}
export interface ScenarioContext {
/** Merged config: CLI flags → env → scenario defaults → runner defaults. */
config: ScenarioConfig;
/** Pre-authenticated sessions; ordered. */
sessions: Session[];
coordinator: RoomCoordinatorApi;
dashboard: DashboardReporter;
logger: Logger;
signal: AbortSignal;
/** Reset the per-room watchdog. Call at each progress point. */
heartbeat(roomId: string): void;
}
export interface Scenario {
name: string;
description: string;
defaultConfig: ScenarioConfig;
needs: ScenarioNeeds;
run(ctx: ScenarioContext): Promise<ScenarioResult>;
}
// =============================================================================
// Room coordination
// =============================================================================
export interface RoomCoordinatorApi {
announce(roomId: string, code: string): void;
await(roomId: string, timeoutMs?: number): Promise<string>;
}
// =============================================================================
// Dashboard reporter
// =============================================================================
export interface RoomState {
phase?: string;
currentPlayer?: string;
hole?: number;
totalHoles?: number;
game?: number;
totalGames?: number;
moves?: number;
players?: Array<{ key: string; score: number | null; isActive: boolean }>;
message?: string;
}
export interface DashboardReporter {
update(roomId: string, state: Partial<RoomState>): void;
log(level: 'info' | 'warn' | 'error', msg: string, meta?: object): void;
incrementMetric(name: string, by?: number): void;
}
// =============================================================================
// Logger
// =============================================================================
export type LogLevel = 'debug' | 'info' | 'warn' | 'error';
export interface Logger {
debug(msg: string, meta?: object): void;
info(msg: string, meta?: object): void;
warn(msg: string, meta?: object): void;
error(msg: string, meta?: object): void;
child(meta: object): Logger;
}
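`ScenarioNeeds` states how many accounts and rooms a scenario wants but leaves the actual split to the runner. As a rough illustration, sessions could be partitioned into contiguous, equal-sized groups with the first member of each group acting as host. Both `partitionIntoRooms` and the first-member-hosts convention are assumptions made for this sketch; the types above do not prescribe them.

```typescript
// Hypothetical helper: split an ordered list into `rooms` equal groups.
function partitionIntoRooms<T>(sessions: T[], rooms: number): T[][] {
  if (!Number.isInteger(rooms) || rooms < 1) {
    throw new Error(`rooms must be a positive integer, got ${rooms}`);
  }
  if (sessions.length % rooms !== 0) {
    throw new Error(
      `${sessions.length} sessions do not divide evenly into ${rooms} rooms`,
    );
  }
  const size = sessions.length / rooms;
  const out: T[][] = [];
  for (let i = 0; i < rooms; i++) {
    out.push(sessions.slice(i * size, (i + 1) * size));
  }
  return out;
}

// 16 sessions across 4 rooms, the shape described in the plan's goal.
const keys = Array.from(
  { length: 16 },
  (_, i) => `soak_${String(i).padStart(2, '0')}`,
);
const roomGroups = partitionIntoRooms(keys, 4);
const hosts = roomGroups.map((group) => group[0]);
console.log(hosts);
```

Failing loudly on an uneven split keeps a misconfigured `needs` block from silently producing a short-handed room.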
- Step 6: Verify tsx still parses the runner
cd tests/soak
npx tsx runner.ts
Expected: still prints the placeholder output. tsx transpiles without type-checking, and the new core/ files are not imported yet, so this step only confirms the runner still starts.
- Step 7: Commit
git add tests/soak/core/deferred.ts tests/soak/core/types.ts tests/soak/tests/deferred.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): core types + Deferred primitive
Establishes the Scenario/Session/Logger/DashboardReporter contracts
the rest of the harness builds on. Deferred is the building block
for RoomCoordinator's host→joiners handoff.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 11: RoomCoordinator with tests
Tiny abstraction over Deferred keyed by room ID, with a timeout on await.
Files:
- Create: `tests/soak/core/room-coordinator.ts`
- Create: `tests/soak/tests/room-coordinator.test.ts`
- Step 1: Write failing tests
// tests/soak/tests/room-coordinator.test.ts
import { describe, it, expect } from 'vitest';
import { RoomCoordinator } from '../core/room-coordinator';
describe('RoomCoordinator', () => {
it('resolves await with the announced code (announce then await)', async () => {
const rc = new RoomCoordinator();
rc.announce('room-1', 'ABCD');
await expect(rc.await('room-1')).resolves.toBe('ABCD');
});
it('resolves await with the announced code (await then announce)', async () => {
const rc = new RoomCoordinator();
const p = rc.await('room-2');
rc.announce('room-2', 'WXYZ');
await expect(p).resolves.toBe('WXYZ');
});
it('rejects await after timeout if not announced', async () => {
const rc = new RoomCoordinator();
await expect(rc.await('room-3', 50)).rejects.toThrow(/timed out/i);
});
it('isolates rooms — announcing room-A does not unblock room-B', async () => {
const rc = new RoomCoordinator();
const pB = rc.await('room-B', 100);
rc.announce('room-A', 'A-CODE');
await expect(pB).rejects.toThrow(/timed out/i);
});
});
- Step 2: Run tests to verify they fail
npx vitest run tests/room-coordinator.test.ts
Expected: FAIL — module not found.
- Step 3: Implement `RoomCoordinator`
// tests/soak/core/room-coordinator.ts
import { deferred, Deferred } from './deferred';
import type { RoomCoordinatorApi } from './types';
export class RoomCoordinator implements RoomCoordinatorApi {
private rooms = new Map<string, Deferred<string>>();
announce(roomId: string, code: string): void {
this.getOrCreate(roomId).resolve(code);
}
async await(roomId: string, timeoutMs: number = 30_000): Promise<string> {
const d = this.getOrCreate(roomId);
let timer: NodeJS.Timeout | undefined;
const timeout = new Promise<never>((_, reject) => {
timer = setTimeout(() => {
reject(new Error(`RoomCoordinator: room "${roomId}" timed out after ${timeoutMs}ms`));
}, timeoutMs);
});
try {
return await Promise.race([d.promise, timeout]);
} finally {
if (timer) clearTimeout(timer);
}
}
private getOrCreate(roomId: string): Deferred<string> {
let d = this.rooms.get(roomId);
if (!d) {
d = deferred<string>();
this.rooms.set(roomId, d);
}
return d;
}
}
- Step 4: Verify tests pass
npx vitest run tests/room-coordinator.test.ts
Expected: 4 passed.
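The handoff can be sketched end to end. `MiniCoordinator` below is a compact stand-in for the class above so the snippet runs alone; the session keys and room code are made up. It also shows a property that follows from the lazy one-Deferred-per-roomId design: the first `announce` for a room id wins and later ones are ignored.

```typescript
type Slot = { promise: Promise<string>; resolve: (code: string) => void };

// Compact stand-in for RoomCoordinator (same announce/await contract).
class MiniCoordinator {
  private rooms = new Map<string, Slot>();

  private slot(roomId: string): Slot {
    let s = this.rooms.get(roomId);
    if (!s) {
      let resolve!: (code: string) => void;
      const promise = new Promise<string>((res) => {
        resolve = res;
      });
      s = { promise, resolve };
      this.rooms.set(roomId, s);
    }
    return s;
  }

  announce(roomId: string, code: string): void {
    this.slot(roomId).resolve(code);
  }

  async await(roomId: string, timeoutMs = 30_000): Promise<string> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`room "${roomId}" timed out after ${timeoutMs}ms`)),
        timeoutMs,
      );
    });
    try {
      return await Promise.race([this.slot(roomId).promise, timeout]);
    } finally {
      if (timer) clearTimeout(timer);
    }
  }
}

async function demo(): Promise<string[]> {
  const rc = new MiniCoordinator();
  // Three joiners start waiting before the host has a code.
  const joiners = ['soak_01', 'soak_02', 'soak_03'].map(async (key) => {
    const code = await rc.await('room-1', 1_000);
    return `${key} joins ${code}`;
  });
  rc.announce('room-1', 'ABCD'); // host announces after creating the room
  rc.announce('room-1', 'EFGH'); // ignored: the first announce wins
  return Promise.all(joiners);
}

const handoff = demo();
handoff.then((lines) => console.log(lines.join('\n')));
```

A scenario that needs a fresh code for the same room later in a run would therefore presumably need a distinct room id per handoff, or a new coordinator instance.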
- Step 5: Commit
git add tests/soak/core/room-coordinator.ts tests/soak/tests/room-coordinator.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): RoomCoordinator with host→joiners handoff
Lazy Deferred per roomId with a timeout on await. Lets concurrent
joiner sessions block until their host announces the room code
without polling or page scraping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 12: Structured JSONL logger
Single module, no transport, writes to process.stdout. Supports child loggers with bound metadata (so scenarios can emit logs with room / game context without repeating it).
Files:
- Create: `tests/soak/core/logger.ts`
- Create: `tests/soak/tests/logger.test.ts`
- Step 1: Write failing tests
// tests/soak/tests/logger.test.ts
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { createLogger } from '../core/logger';
describe('logger', () => {
let writes: string[];
let write: (s: string) => boolean;
beforeEach(() => {
writes = [];
write = (s: string) => {
writes.push(s);
return true;
};
});
it('emits a JSON line per call with level and msg', () => {
const log = createLogger({ runId: 'r1', write });
log.info('hello');
expect(writes).toHaveLength(1);
const parsed = JSON.parse(writes[0]);
expect(parsed.level).toBe('info');
expect(parsed.msg).toBe('hello');
expect(parsed.runId).toBe('r1');
expect(parsed.timestamp).toBeTypeOf('string');
});
it('merges meta into the log line', () => {
const log = createLogger({ runId: 'r1', write });
log.warn('slow', { turnMs: 3000 });
const parsed = JSON.parse(writes[0]);
expect(parsed.turnMs).toBe(3000);
expect(parsed.level).toBe('warn');
});
it('child logger inherits parent meta', () => {
const log = createLogger({ runId: 'r1', write });
const roomLog = log.child({ room: 'room-1' });
roomLog.info('game_start');
const parsed = JSON.parse(writes[0]);
expect(parsed.room).toBe('room-1');
expect(parsed.runId).toBe('r1');
});
it('respects minimum level', () => {
const log = createLogger({ runId: 'r1', write, minLevel: 'warn' });
log.debug('nope');
log.info('nope');
log.warn('yes');
log.error('yes');
expect(writes).toHaveLength(2);
});
});
- Step 2: Run tests to verify they fail
npx vitest run tests/logger.test.ts
Expected: FAIL — module not found.
- Step 3: Implement the logger
// tests/soak/core/logger.ts
import type { Logger, LogLevel } from './types';
const LEVEL_ORDER: Record<LogLevel, number> = {
debug: 0,
info: 1,
warn: 2,
error: 3,
};
export interface LoggerOptions {
runId: string;
minLevel?: LogLevel;
/** Defaults to process.stdout.write bound to stdout. Override for tests. */
write?: (line: string) => boolean;
baseMeta?: Record<string, unknown>;
}
export function createLogger(opts: LoggerOptions): Logger {
const minLevel = opts.minLevel ?? 'info';
const write = opts.write ?? ((s: string) => process.stdout.write(s));
const baseMeta = opts.baseMeta ?? {};
function emit(level: LogLevel, msg: string, meta?: object): void {
if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
const line = JSON.stringify({
timestamp: new Date().toISOString(),
level,
msg,
runId: opts.runId,
...baseMeta,
...(meta ?? {}),
}) + '\n';
write(line);
}
const logger: Logger = {
debug: (msg, meta) => emit('debug', msg, meta),
info: (msg, meta) => emit('info', msg, meta),
warn: (msg, meta) => emit('warn', msg, meta),
error: (msg, meta) => emit('error', msg, meta),
child: (meta) =>
createLogger({
runId: opts.runId,
minLevel,
write,
baseMeta: { ...baseMeta, ...meta },
}),
};
return logger;
}
- Step 4: Verify tests pass
npx vitest run tests/logger.test.ts
Expected: 4 passed.
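Because every line is a standalone JSON object, a run's output can be sliced after the fact with very little code. The sketch below parses captured output and groups messages by the `room` field that `child()` binds. `parseJsonl` and `groupByRoom` are hypothetical helpers, and the sample lines are invented for illustration.

```typescript
interface LogLine {
  timestamp: string;
  level: string;
  msg: string;
  runId: string;
  room?: string;
  [key: string]: unknown;
}

// Parse JSONL text, skipping blank lines (including the trailing newline).
function parseJsonl(text: string): LogLine[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as LogLine);
}

// Group "level:msg" summaries under each room id.
function groupByRoom(lines: LogLine[]): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const line of lines) {
    const room = line.room ?? '(no room)';
    const msgs = out.get(room) ?? [];
    msgs.push(`${line.level}:${line.msg}`);
    out.set(room, msgs);
  }
  return out;
}

// Invented sample output from one run.
const sample =
  [
    '{"timestamp":"2026-04-10T12:00:00.000Z","level":"info","msg":"run_start","runId":"r1"}',
    '{"timestamp":"2026-04-10T12:00:01.000Z","level":"info","msg":"game_start","runId":"r1","room":"room-1"}',
    '{"timestamp":"2026-04-10T12:00:02.000Z","level":"warn","msg":"slow_turn","runId":"r1","room":"room-2","turnMs":3000}',
    '{"timestamp":"2026-04-10T12:00:03.000Z","level":"info","msg":"game_end","runId":"r1","room":"room-1"}',
  ].join('\n') + '\n';

const byRoom = groupByRoom(parseJsonl(sample));
console.log(byRoom);
```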
- Step 5: Commit
git add tests/soak/core/logger.ts tests/soak/tests/logger.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): structured JSONL logger with child contexts
Single file, no transport, writes one JSON line per call to stdout.
Child loggers inherit parent meta so scenarios can bind room/game
context once and forget about it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Phase 3 — SessionPool and seeding
Task 13: SessionPool with HTTP registration and localStorage warm-start
This is the biggest single module. It owns browser context lifecycle, seeds accounts on cold start, logs in on warm start, and exposes a simple acquire() API to scenarios.
Files:
- Create: `tests/soak/core/session-pool.ts`
Testing: manual via scripts/seed-accounts.ts in Task 14 and the first real runner invocation in Task 17. No Vitest test for this — it's an integration module that needs a real browser.
- Step 1: Create `tests/soak/core/session-pool.ts` (imports and types)
// tests/soak/core/session-pool.ts
import * as fs from 'fs';
import * as path from 'path';
import {
Browser,
BrowserContext,
chromium,
} from 'playwright-core';
import { GolfBot } from '../../e2e/bot/golf-bot';
import type { Account, Session, Logger } from './types';
export interface SeedOptions {
/** Full base URL of the target server, e.g. https://staging.adlee.work. */
targetUrl: string;
/** Invite code to pass to /api/auth/register. */
inviteCode: string;
/** Number of accounts to create. */
count: number;
}
export interface SessionPoolOptions {
targetUrl: string;
inviteCode: string;
credFile: string; // absolute path to .env.stresstest
logger: Logger;
/** Optional override for the browser to attach contexts to. If absent, SessionPool launches its own. */
browser?: Browser;
/** Passed through to browser.newContext. Useful for viewport overrides in tests. */
contextOptions?: Parameters<Browser['newContext']>[0];
}
- Step 2: Implement cred-file read/write
Append to session-pool.ts:
function readCredFile(filePath: string): Account[] | null {
if (!fs.existsSync(filePath)) return null;
const content = fs.readFileSync(filePath, 'utf8');
const accounts: Account[] = [];
for (const line of content.split('\n')) {
const trimmed = line.trim();
if (!trimmed || trimmed.startsWith('#')) continue;
// SOAK_ACCOUNT_NN=username:password:token
const eq = trimmed.indexOf('=');
if (eq === -1) continue;
const key = trimmed.slice(0, eq);
const value = trimmed.slice(eq + 1);
const m = key.match(/^SOAK_ACCOUNT_(\d+)$/);
if (!m) continue;
const [username, password, token] = value.split(':');
if (!username || !password || !token) continue;
const idx = parseInt(m[1], 10);
accounts.push({
key: `soak_${String(idx).padStart(2, '0')}`,
username,
password,
token,
});
}
return accounts.length > 0 ? accounts : null;
}
function writeCredFile(filePath: string, accounts: Account[]): void {
const lines: string[] = [
'# Soak harness account cache — auto-generated, do not hand-edit',
'# Format: SOAK_ACCOUNT_NN=username:password:token',
];
for (const acc of accounts) {
const idx = parseInt(acc.key.replace('soak_', ''), 10);
const key = `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}`;
lines.push(`${key}=${acc.username}:${acc.password}:${acc.token}`);
}
fs.writeFileSync(filePath, lines.join('\n') + '\n', { mode: 0o600 });
}
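The cache format depends on none of the three fields containing a colon. That holds for the generated usernames and passwords, and for JWTs in compact form (base64url segments joined by dots), but a hand-edited file could break it. A round-trip sketch follows; `formatCredLine` and `parseCredLine` are hypothetical names that mirror the logic above, with one deliberate difference: the parser rejects a line with extra colons instead of silently truncating the token.

```typescript
interface Account {
  key: string;
  username: string;
  password: string;
  token: string;
}

// Same line layout as writeCredFile: SOAK_ACCOUNT_NN=username:password:token
function formatCredLine(acc: Account): string {
  const idx = parseInt(acc.key.replace('soak_', ''), 10);
  const name = `SOAK_ACCOUNT_${String(idx).padStart(2, '0')}`;
  return `${name}=${acc.username}:${acc.password}:${acc.token}`;
}

// Stricter than the split(':') loop above: exactly three colon-free fields.
function parseCredLine(line: string): Account | null {
  const m = line.trim().match(/^SOAK_ACCOUNT_(\d+)=([^:]+):([^:]+):([^:]+)$/);
  if (!m) return null;
  const idx = parseInt(m[1], 10);
  return {
    key: `soak_${String(idx).padStart(2, '0')}`,
    username: m[2],
    password: m[3],
    token: m[4],
  };
}

// Invented example values.
const original: Account = {
  key: 'soak_03',
  username: 'soak_03_a7bx',
  password: 'Abc234xyzDEF567!',
  token: 'aaa.bbb.ccc',
};
const roundTripped = parseCredLine(formatCredLine(original));
console.log(roundTripped);
```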
- Step 3: Implement the HTTP register call
interface RegisterResponse {
user: { id: string; username: string };
token: string;
expires_at: string;
}
async function registerAccount(
targetUrl: string,
username: string,
password: string,
email: string,
inviteCode: string,
): Promise<string> {
const res = await fetch(`${targetUrl}/api/auth/register`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username, password, email, invite_code: inviteCode }),
});
if (!res.ok) {
const body = await res.text().catch(() => '<no body>');
throw new Error(`register failed: ${res.status} ${body}`);
}
const data = (await res.json()) as RegisterResponse;
if (!data.token) {
throw new Error(`register returned no token: ${JSON.stringify(data)}`);
}
return data.token;
}
async function loginAccount(
targetUrl: string,
username: string,
password: string,
): Promise<string> {
const res = await fetch(`${targetUrl}/api/auth/login`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username, password }),
});
if (!res.ok) {
const body = await res.text().catch(() => '<no body>');
throw new Error(`login failed: ${res.status} ${body}`);
}
const data = (await res.json()) as RegisterResponse;
if (!data.token) {
throw new Error(`login returned no token: ${JSON.stringify(data)}`);
}
return data.token;
}
function randomSuffix(): string {
return Math.random().toString(36).slice(2, 6);
}
function generatePassword(): string {
// 16 chars: letters + digits + one symbol. Meets 8-char minimum from auth_service.
// Split across halves so repo secret-scanners don't flag the string as base64
const lower = 'abcdefghijkm' + 'npqrstuvwxyz'; // pragma: allowlist secret
const upper = 'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ'; // pragma: allowlist secret
const digits = '23456789';
const chars = lower + upper + digits;
let out = '';
for (let i = 0; i < 15; i++) {
out += chars[Math.floor(Math.random() * chars.length)];
}
return out + '!';
}
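`Math.random()` is adequate for throwaway test accounts. If the harness is ever pointed at an environment where these passwords matter, Node's `crypto.randomInt` can supply the indices instead. The variant below is offered as an alternative, not as what the plan specifies; `generatePasswordSecure` and `ALPHABET` are hypothetical names, and the alphabet is copied from the function above.

```typescript
import { randomInt } from 'crypto';

// Same character set as generatePassword: no l/o, no I/O, no 0/1.
const ALPHABET =
  'abcdefghijkm' + 'npqrstuvwxyz' +
  'ABCDEFGHJKLM' + 'NPQRSTUVWXYZ' +
  '23456789';

// length - 1 random characters plus a trailing '!' as the symbol.
function generatePasswordSecure(length: number = 16): string {
  if (!Number.isInteger(length) || length < 2) {
    throw new Error('length must be an integer of at least 2');
  }
  let out = '';
  for (let i = 0; i < length - 1; i++) {
    // randomInt(max) returns an integer in [0, max) from the crypto RNG.
    out += ALPHABET[randomInt(ALPHABET.length)];
  }
  return out + '!';
}

const pw = generatePasswordSecure();
console.log(pw.length);
```

The output keeps the same shape as the original (16 characters ending in `!`), so it would slot into `SessionPool.seed` without touching the cred-file format.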
- Step 4: Implement the `SessionPool` class
export class SessionPool {
private accounts: Account[] = [];
private ownedBrowser: Browser | null = null;
private browser: Browser | null;
private activeSessions: Session[] = [];
constructor(private opts: SessionPoolOptions) {
this.browser = opts.browser ?? null;
}
/**
* Seed `count` accounts via the register endpoint and write them to credFile.
* Safe to call multiple times — skips accounts already in the file.
*/
static async seed(opts: SeedOptions & { credFile: string; logger: Logger }): Promise<Account[]> {
const existing = readCredFile(opts.credFile) ?? [];
const existingKeys = new Set(existing.map((a) => a.key));
const created: Account[] = [...existing];
for (let i = 0; i < opts.count; i++) {
const key = `soak_${String(i).padStart(2, '0')}`;
if (existingKeys.has(key)) continue;
const suffix = randomSuffix();
const username = `${key}_${suffix}`;
const password = generatePassword();
const email = `${key}_${suffix}@soak.test`;
opts.logger.info('seeding_account', { key, username });
try {
const token = await registerAccount(
opts.targetUrl,
username,
password,
email,
opts.inviteCode,
);
created.push({ key, username, password, token });
writeCredFile(opts.credFile, created);
} catch (err) {
opts.logger.error('seed_failed', {
key,
error: err instanceof Error ? err.message : String(err),
});
throw err;
}
}
return created;
}
/**
* Load accounts from credFile, auto-seeding if the file is missing.
*/
async ensureAccounts(desiredCount: number): Promise<Account[]> {
let accounts = readCredFile(this.opts.credFile);
if (!accounts || accounts.length < desiredCount) {
this.opts.logger.warn('cred_file_missing_or_short', {
found: accounts?.length ?? 0,
desired: desiredCount,
});
accounts = await SessionPool.seed({
targetUrl: this.opts.targetUrl,
inviteCode: this.opts.inviteCode,
count: desiredCount,
credFile: this.opts.credFile,
logger: this.opts.logger,
});
}
this.accounts = accounts.slice(0, desiredCount);
return this.accounts;
}
/**
* Launch the browser if not provided, create N contexts, inject each
* account's cached token into localStorage (with a POST /api/auth/login
* fallback if the injection step itself throws), and return the live sessions.
*/
async acquire(count: number): Promise<Session[]> {
await this.ensureAccounts(count);
if (!this.browser) {
this.ownedBrowser = await chromium.launch({ headless: true });
this.browser = this.ownedBrowser;
}
const sessions: Session[] = [];
for (let i = 0; i < count; i++) {
const account = this.accounts[i];
const context = await this.browser.newContext(this.opts.contextOptions);
await this.injectAuth(context, account);
const page = await context.newPage();
await page.goto(this.opts.targetUrl);
const bot = new GolfBot(page);
sessions.push({ account, context, page, bot, key: account.key });
}
this.activeSessions = sessions;
return sessions;
}
/**
* Inject the cached JWT into localStorage BEFORE any page loads.
* Uses addInitScript so the token is present on the first navigation.
* Note: the login fallback below only runs if addInitScript itself throws.
* A token that the server rejects at request time is not detected here.
*/
private async injectAuth(context: BrowserContext, account: Account): Promise<void> {
// Try the cached token first
try {
await context.addInitScript(
({ token, username }) => {
window.localStorage.setItem('authToken', token);
window.localStorage.setItem(
'authUser',
JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
);
},
{ token: account.token, username: account.username },
);
} catch (err) {
this.opts.logger.warn('inject_auth_failed', {
account: account.key,
error: err instanceof Error ? err.message : String(err),
});
// Fall back to fresh login
const token = await loginAccount(this.opts.targetUrl, account.username, account.password);
account.token = token;
writeCredFile(this.opts.credFile, this.accounts);
await context.addInitScript(
({ token, username }) => {
window.localStorage.setItem('authToken', token);
window.localStorage.setItem(
'authUser',
JSON.stringify({ id: '', username, role: 'user', email_verified: true }),
);
},
{ token, username: account.username },
);
}
}
/** Close all active contexts. Safe to call multiple times. */
async release(): Promise<void> {
for (const session of this.activeSessions) {
try {
await session.context.close();
} catch {
// ignore
}
}
this.activeSessions = [];
if (this.ownedBrowser) {
try {
await this.ownedBrowser.close();
} catch {
// ignore
}
this.ownedBrowser = null;
this.browser = null;
}
}
}
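As written, the login fallback in `injectAuth` only triggers when `addInitScript` throws, so an expired cached token would go unnoticed. One way to close that gap is to validate the token before injecting it. The sketch below is a hedged illustration: `resolveToken`, `TokenSource`, and `CachedAccount` are hypothetical names, and the `validate` callback stands in for whatever token-check route the server offers (the plan does not name one).

```typescript
// Hypothetical helper. `validate` and `login` are injected so the sketch does
// not assume any particular server endpoint.
export interface TokenSource {
  validate(token: string): Promise<boolean>;
  login(username: string, password: string): Promise<string>;
}

export interface CachedAccount {
  username: string;
  password: string;
  token: string;
}

/**
 * Return a usable token. When the cached token is rejected, log in again and
 * update account.token in place so the caller can persist it.
 */
export async function resolveToken(
  account: CachedAccount,
  src: TokenSource,
): Promise<{ token: string; refreshed: boolean }> {
  if (await src.validate(account.token)) {
    return { token: account.token, refreshed: false };
  }
  const token = await src.login(account.username, account.password);
  account.token = token;
  return { token, refreshed: true };
}
```

A caller could run this before `addInitScript` and rewrite the credential file only when `refreshed` is true.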
- Step 5: Syntax-check by invoking tsx
cd tests/soak
npx tsx -e "import('./core/session-pool').then(() => console.log('ok'))"
Expected: ok. Note that tsx transpiles without type-checking, so this only confirms that the module parses and its imports resolve.
- Step 6: Commit
git add tests/soak/core/session-pool.ts
git commit -m "$(cat <<'EOF'
feat(soak): SessionPool — seed, login, acquire contexts
Owns 16 BrowserContexts, seeds via POST /api/auth/register with the
invite code on cold start, warm-starts via localStorage injection of
the cached JWT, falls back to POST /api/auth/login if the token is
rejected. Exposes acquire(n) for scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 14: seed-accounts.ts CLI wrapper
Tiny standalone entry point that lets you pre-seed before the first harness run. Reuses SessionPool.seed.
Files:
- Create: `tests/soak/scripts/seed-accounts.ts`
- Step 1: Write the script
#!/usr/bin/env tsx
/**
* Seed N soak-harness accounts via the register endpoint.
*
* Usage:
* TEST_URL=http://localhost:8000 \
* SOAK_INVITE_CODE=SOAKTEST \
* npm run seed -- --count=16
*/
import * as path from 'path';
import { SessionPool } from '../core/session-pool';
import { createLogger } from '../core/logger';
function parseArgs(argv: string[]): { count: number } {
const result = { count: 16 };
for (const arg of argv.slice(2)) {
const m = arg.match(/^--count=(\d+)$/);
if (m) result.count = parseInt(m[1], 10);
}
return result;
}
async function main(): Promise<void> {
const { count } = parseArgs(process.argv);
const targetUrl = process.env.TEST_URL ?? 'http://localhost:8000';
const inviteCode = process.env.SOAK_INVITE_CODE;
if (!inviteCode) {
console.error('SOAK_INVITE_CODE env var is required');
console.error(' Local dev: SOAK_INVITE_CODE=SOAKTEST');
console.error(' Staging: SOAK_INVITE_CODE=5VC2MCCN');
process.exit(2);
}
const credFile = path.resolve(__dirname, '..', '.env.stresstest');
const logger = createLogger({ runId: `seed-${Date.now()}` });
logger.info('seed_start', { count, targetUrl, credFile });
try {
const accounts = await SessionPool.seed({
targetUrl,
inviteCode,
count,
credFile,
logger,
});
logger.info('seed_complete', { created: accounts.length });
console.error(`Seeded ${accounts.length} accounts → ${credFile}`);
} catch (err) {
logger.error('seed_failed', {
error: err instanceof Error ? err.message : String(err),
});
process.exit(1);
}
}
main();
- Step 2: Run it against local dev to verify end-to-end
With the dev server running and the SOAKTEST invite flagged:
cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed -- --count=4
Expected:
- Log lines `seeding_account` × 4
- Log line `seed_complete`
- `tests/soak/.env.stresstest` file created with 4 `SOAK_ACCOUNT_NN=...` lines
Verify:
cat tests/soak/.env.stresstest | head
Expected: 4 account lines.
Also verify the accounts got flagged:
psql -d golfgame -c "SELECT username, is_test_account FROM users_v2 WHERE username LIKE 'soak_%' ORDER BY username;"
Expected: 4 rows, all with is_test_account | t.
- Step 3: Commit
git add tests/soak/scripts/seed-accounts.ts
git commit -m "$(cat <<'EOF'
feat(soak): scripts/seed-accounts.ts CLI wrapper
Thin standalone entry for pre-seeding N accounts before the first
harness run. Wraps SessionPool.seed and writes .env.stresstest.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
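Re-running the seed script is safe because `SessionPool.seed` skips keys that are already in the credential file. The key-selection part of that logic can be sketched in isolation; `missingKeys` is a hypothetical helper name, while the `soak_NN` key format is taken from the code above.

```typescript
// Hypothetical helper mirroring the skip logic in SessionPool.seed.
export function missingKeys(existingKeys: string[], count: number): string[] {
  const have = new Set(existingKeys);
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    // Same zero-padded key format the seeder uses.
    const key = `soak_${String(i).padStart(2, '0')}`;
    if (!have.has(key)) out.push(key);
  }
  return out;
}

// With two of four accounts already seeded, only the gaps are registered.
console.log(missingKeys(['soak_00', 'soak_02'], 4));
```

Running the seeder with a larger `--count` later therefore only registers the additional accounts.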
Phase 4 — First scenario, config, runner (end-to-end milestone)
Task 15: Shared multiplayer-game helper
Pulls the "run one full game in one room" logic out of the scenarios so populate and stress share it. Takes a room's sessions and a config, loops until the game ends.
Files:
- Create: `tests/soak/scenarios/shared/multiplayer-game.ts`
- Step 1: Create the helper module
// tests/soak/scenarios/shared/multiplayer-game.ts
import type { Session, ScenarioContext } from '../../core/types';
export interface MultiplayerGameOptions {
roomId: string;
holes: number;
decks: number;
cpusPerRoom: number;
cpuPersonality?: string;
/** Per-turn think time in [min, max] ms. */
thinkTimeMs: [number, number];
/** Max wall-clock time before giving up on the game (ms). */
maxDurationMs?: number;
}
export interface MultiplayerGameResult {
completed: boolean;
turns: number;
durationMs: number;
error?: string;
}
function randomInt(min: number, max: number): number {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
/**
* Host + joiners play one full multiplayer game end to end.
* The host creates the room, announces the code via the coordinator,
* joiners wait for the code, the host adds CPUs and starts, everyone
* loops on isMyTurn/playTurn until round_over or game_over.
*/
export async function runOneMultiplayerGame(
ctx: ScenarioContext,
sessions: Session[],
opts: MultiplayerGameOptions,
): Promise<MultiplayerGameResult> {
const start = Date.now();
const [host, ...joiners] = sessions;
const maxDuration = opts.maxDurationMs ?? 5 * 60_000;
try {
// Host creates game
const code = await host.bot.createGame(host.account.username);
ctx.coordinator.announce(opts.roomId, code);
ctx.heartbeat(opts.roomId);
ctx.dashboard.update(opts.roomId, { phase: 'lobby' });
ctx.logger.info('room_created', { room: opts.roomId, code });
// Joiners join concurrently
await Promise.all(
joiners.map(async (joiner) => {
const awaited = await ctx.coordinator.await(opts.roomId);
await joiner.bot.joinGame(awaited, joiner.account.username);
}),
);
ctx.heartbeat(opts.roomId);
// Host adds CPUs (if any) and starts
for (let i = 0; i < opts.cpusPerRoom; i++) {
await host.bot.addCPU(opts.cpuPersonality);
}
await host.bot.startGame({ holes: opts.holes, decks: opts.decks });
ctx.heartbeat(opts.roomId);
ctx.dashboard.update(opts.roomId, { phase: 'playing', totalHoles: opts.holes });
// Concurrent turn loops — one per session
const turnCounts = new Array(sessions.length).fill(0);
async function sessionLoop(sessionIdx: number): Promise<void> {
const session = sessions[sessionIdx];
while (true) {
if (ctx.signal.aborted) return;
if (Date.now() - start > maxDuration) return;
const phase = await session.bot.getGamePhase();
if (phase === 'game_over' || phase === 'round_over') return;
if (await session.bot.isMyTurn()) {
await session.bot.playTurn();
turnCounts[sessionIdx]++;
ctx.heartbeat(opts.roomId);
ctx.dashboard.update(opts.roomId, {
currentPlayer: session.account.username,
moves: turnCounts.reduce((a, b) => a + b, 0),
});
const thinkMs = randomInt(opts.thinkTimeMs[0], opts.thinkTimeMs[1]);
await sleep(thinkMs);
} else {
await sleep(200);
}
}
}
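await Promise.all(sessions.map((_, i) => sessionLoop(i)));
const totalTurns = turnCounts.reduce((a, b) => a + b, 0);
// The session loops also exit when maxDuration is exceeded. Report that as a
// failed game so a hung room is not counted as a completed one.
if (!ctx.signal.aborted && Date.now() - start > maxDuration) {
return {
completed: false,
turns: totalTurns,
durationMs: Date.now() - start,
error: `timed out after ${maxDuration}ms`,
};
}
ctx.dashboard.update(opts.roomId, { phase: 'round_over' });
return {
completed: true,
turns: totalTurns,
durationMs: Date.now() - start,
};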
} catch (err) {
return {
completed: false,
turns: 0,
durationMs: Date.now() - start,
error: err instanceof Error ? err.message : String(err),
};
}
}
- Step 2: Syntax-check
cd tests/soak
npx tsx -e "import('./scenarios/shared/multiplayer-game').then(() => console.log('ok'))"
Expected: ok.
- Step 3: Commit
git add tests/soak/scenarios/shared/multiplayer-game.ts
git commit -m "$(cat <<'EOF'
feat(soak): shared runOneMultiplayerGame helper
Encapsulates the host-creates/joiners-join/loop-until-done flow so
populate and stress scenarios don't duplicate it. Honors abort
signal and a max-duration timeout, heartbeats on every turn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
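The helper depends on `ctx.coordinator.announce` and `ctx.coordinator.await` from the RoomCoordinator built in an earlier task. As a reminder of the handshake's shape, here is a minimal sketch assuming a one-Deferred-per-room design. `SketchCoordinator` and its `reset` method are illustrative names only; the real RoomCoordinator may differ.

```typescript
// A promise whose resolve function is exposed to the outside.
class Deferred<T> {
  promise: Promise<T>;
  resolve!: (value: T) => void;
  constructor() {
    this.promise = new Promise<T>((res) => {
      this.resolve = res;
    });
  }
}

// Illustrative stand-in for RoomCoordinator (an assumption, not the plan's code).
export class SketchCoordinator {
  private rooms = new Map<string, Deferred<string>>();

  private slot(roomId: string): Deferred<string> {
    let d = this.rooms.get(roomId);
    if (!d) {
      d = new Deferred<string>();
      this.rooms.set(roomId, d);
    }
    return d;
  }

  /** Host side: publish the room code. */
  announce(roomId: string, code: string): void {
    this.slot(roomId).resolve(code);
  }

  /** Joiner side: resolves once the host has announced, in either call order. */
  await(roomId: string): Promise<string> {
    return this.slot(roomId).promise;
  }

  /** Hypothetical: clear the slot so the next game in the same room gets a fresh code. */
  reset(roomId: string): void {
    this.rooms.delete(roomId);
  }
}
```

Because a resolved promise keeps its value, a room that plays several games in a row would hand joiners the previous code unless the slot is cleared between games. It is worth confirming how the real coordinator handles that case.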
Task 16: Populate scenario (minimal version)
Partitions sessions into rooms, runs gamesPerRoom games per room in parallel, aggregates results.
Files:
- Create: `tests/soak/scenarios/populate.ts`
- Create: `tests/soak/scenarios/index.ts`
- Step 1: Create `scenarios/populate.ts`
// tests/soak/scenarios/populate.ts
import type {
Scenario,
ScenarioContext,
ScenarioResult,
ScenarioError,
Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';
const CPU_PERSONALITIES = ['Sofia', 'Marcus', 'Kenji', 'Priya'];
interface PopulateConfig {
gamesPerRoom: number;
holes: number;
decks: number;
rooms: number;
cpusPerRoom: number;
thinkTimeMs: [number, number];
interGamePauseMs: number;
}
function chunk<T>(arr: T[], size: number): T[][] {
const out: T[][] = [];
for (let i = 0; i < arr.length; i += size) {
out.push(arr.slice(i, i + size));
}
return out;
}
async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
async function runRoom(
ctx: ScenarioContext,
cfg: PopulateConfig,
roomIdx: number,
sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[] }> {
const roomId = `room-${roomIdx}`;
const cpuPersonality = CPU_PERSONALITIES[roomIdx % CPU_PERSONALITIES.length];
let completed = 0;
const errors: ScenarioError[] = [];
for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
if (ctx.signal.aborted) break;
ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });
ctx.logger.info('game_start', { room: roomId, game: gameNum + 1 });
const result = await runOneMultiplayerGame(ctx, sessions, {
roomId,
holes: cfg.holes,
decks: cfg.decks,
cpusPerRoom: cfg.cpusPerRoom,
cpuPersonality,
thinkTimeMs: cfg.thinkTimeMs,
});
if (result.completed) {
completed++;
ctx.logger.info('game_complete', {
room: roomId,
game: gameNum + 1,
turns: result.turns,
durationMs: result.durationMs,
});
} else {
errors.push({
room: roomId,
reason: 'game_failed',
detail: result.error,
timestamp: Date.now(),
});
ctx.logger.error('game_failed', { room: roomId, game: gameNum + 1, error: result.error });
}
if (gameNum < cfg.gamesPerRoom - 1) {
await sleep(cfg.interGamePauseMs);
}
}
return { completed, errors };
}
const populate: Scenario = {
name: 'populate',
description: 'Long multi-round games to populate scoreboards',
needs: { accounts: 16, rooms: 4, cpusPerRoom: 1 },
defaultConfig: {
gamesPerRoom: 10,
holes: 9,
decks: 2,
rooms: 4,
cpusPerRoom: 1,
thinkTimeMs: [800, 2200],
interGamePauseMs: 3000,
},
async run(ctx: ScenarioContext): Promise<ScenarioResult> {
const start = Date.now();
const cfg = ctx.config as unknown as PopulateConfig;
const perRoom = Math.floor(ctx.sessions.length / cfg.rooms);
if (perRoom * cfg.rooms !== ctx.sessions.length) {
throw new Error(
`populate: ${ctx.sessions.length} sessions does not divide evenly into ${cfg.rooms} rooms`,
);
}
const roomSessions = chunk(ctx.sessions, perRoom);
const results = await Promise.allSettled(
roomSessions.map((sessions, idx) => runRoom(ctx, cfg, idx, sessions)),
);
let gamesCompleted = 0;
const errors: ScenarioError[] = [];
results.forEach((r, idx) => {
if (r.status === 'fulfilled') {
gamesCompleted += r.value.completed;
errors.push(...r.value.errors);
} else {
errors.push({
room: `room-${idx}`,
reason: 'room_threw',
detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
timestamp: Date.now(),
});
}
});
return {
gamesCompleted,
errors,
durationMs: Date.now() - start,
};
},
};
export default populate;
- Step 2: Create `scenarios/index.ts` registry
// tests/soak/scenarios/index.ts
import type { Scenario } from '../core/types';
import populate from './populate';
const registry: Record<string, Scenario> = {
populate,
};
export function getScenario(name: string): Scenario | undefined {
return registry[name];
}
export function listScenarios(): Scenario[] {
return Object.values(registry);
}
- Step 3: Syntax-check
cd tests/soak
npx tsx -e "import('./scenarios/index').then((m) => console.log(m.listScenarios().map(s => s.name)))"
Expected: ['populate'].
- Step 4: Commit
git add tests/soak/scenarios/populate.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): populate scenario + scenario registry
Partitions sessions into N rooms, runs gamesPerRoom games per room
in parallel via Promise.allSettled so a failure in one room never
unwinds the others. Errors roll up into ScenarioResult.errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
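The per-room failure isolation in this task comes from `Promise.allSettled`, which waits for every room instead of rejecting on the first failure. A stripped-down sketch of the pattern, with no Playwright involved, is below. `runRoomsIsolated` and `RoomOutcome` are hypothetical names used only for illustration.

```typescript
function chunk<T>(arr: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
  return out;
}

export interface RoomOutcome {
  room: string;
  ok: boolean;
  detail?: string;
}

/** Run one async job per room; a rejection in one room never cancels the others. */
export async function runRoomsIsolated<T>(
  members: T[],
  perRoom: number,
  runRoom: (roomIdx: number, group: T[]) => Promise<void>,
): Promise<RoomOutcome[]> {
  const groups = chunk(members, perRoom);
  const settled = await Promise.allSettled(groups.map((g, i) => runRoom(i, g)));
  return settled.map((r, i) =>
    r.status === 'fulfilled'
      ? { room: `room-${i}`, ok: true }
      : {
          room: `room-${i}`,
          ok: false,
          detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
        },
  );
}
```

With `Promise.all` the first rejected room would reject the whole scenario while the other rooms kept running unobserved; `allSettled` keeps every result available for the final error roll-up.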
Task 17: Config parsing with tests
CLI flags, env vars, scenario defaults, runner defaults — merged in that precedence order.
Files:
- Create: `tests/soak/config.ts`
- Create: `tests/soak/tests/config.test.ts`
- Step 1: Write failing tests
// tests/soak/tests/config.test.ts
import { describe, it, expect } from 'vitest';
import { parseArgs, mergeConfig } from '../config';
describe('parseArgs', () => {
it('parses --scenario and numeric flags', () => {
const r = parseArgs(['--scenario=populate', '--rooms=4', '--games-per-room=10']);
expect(r.scenario).toBe('populate');
expect(r.rooms).toBe(4);
expect(r.gamesPerRoom).toBe(10);
});
it('parses watch mode', () => {
const r = parseArgs(['--scenario=populate', '--watch=none']);
expect(r.watch).toBe('none');
});
it('rejects unknown watch mode', () => {
expect(() => parseArgs(['--scenario=populate', '--watch=bogus'])).toThrow();
});
it('--list sets listOnly', () => {
const r = parseArgs(['--list']);
expect(r.listOnly).toBe(true);
});
});
describe('mergeConfig', () => {
it('CLI flags override scenario defaults', () => {
const cfg = mergeConfig(
{ gamesPerRoom: 20 }, // CLI
{}, // env
{ gamesPerRoom: 5, holes: 9 }, // defaults
);
expect(cfg.gamesPerRoom).toBe(20);
});
it('env overrides scenario defaults but not CLI', () => {
const cfg = mergeConfig(
{ games: 5, holes: 9 },
{ SOAK_HOLES: '3' },
{ holes: 7 },
);
expect(cfg.holes).toBe(7); // CLI wins (7 was from scenario defaults? no — CLI not set here)
// Correction: CLI not set, so env wins over scenario default
});
it('scenario defaults fill in unset values', () => {
const cfg = mergeConfig(
{ gamesPerRoom: 3 }, // CLI
{}, // env
{ games: 5, holes: 9 }, // defaults
);
expect(cfg.games).toBe(5);
expect(cfg.holes).toBe(9);
expect(cfg.gamesPerRoom).toBe(3);
});
});
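Note: the middle test above is wrong as written. mergeConfig takes (cli, env, defaults), so its first argument is the CLI layer, `holes: 9` wins, and the assertion on 7 fails. Fix it so the assertion matches the precedence "CLI > env > defaults". Correct version: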
it('env overrides scenario defaults but CLI overrides env', () => {
const cfg = mergeConfig(
{ holes: 5 }, // CLI
{ SOAK_HOLES: '3' }, // env
{ holes: 9 }, // defaults
);
expect(cfg.holes).toBe(5); // CLI wins
});
Replace the second it(...) block above with this corrected version before running.
- Step 2: Run tests to verify they fail
npx vitest run tests/config.test.ts
Expected: FAIL — module not found.
- Step 3: Implement `config.ts`
// tests/soak/config.ts
export type WatchMode = 'none' | 'dashboard' | 'tiled';
export interface CliArgs {
scenario?: string;
accounts?: number;
rooms?: number;
cpusPerRoom?: number;
gamesPerRoom?: number;
holes?: number;
watch?: WatchMode;
dashboardPort?: number;
target?: string;
runId?: string;
dryRun?: boolean;
listOnly?: boolean;
}
const VALID_WATCH: WatchMode[] = ['none', 'dashboard', 'tiled'];
function parseInt10(s: string, name: string): number {
const n = parseInt(s, 10);
if (Number.isNaN(n)) throw new Error(`Invalid integer for ${name}: ${s}`);
return n;
}
export function parseArgs(argv: string[]): CliArgs {
const out: CliArgs = {};
for (const arg of argv) {
if (arg === '--list') {
out.listOnly = true;
continue;
}
if (arg === '--dry-run') {
out.dryRun = true;
continue;
}
const m = arg.match(/^--([a-z][a-z0-9-]*)=(.*)$/);
if (!m) continue;
const [, key, value] = m;
switch (key) {
case 'scenario':
out.scenario = value;
break;
case 'accounts':
out.accounts = parseInt10(value, '--accounts');
break;
case 'rooms':
out.rooms = parseInt10(value, '--rooms');
break;
case 'cpus-per-room':
out.cpusPerRoom = parseInt10(value, '--cpus-per-room');
break;
case 'games-per-room':
out.gamesPerRoom = parseInt10(value, '--games-per-room');
break;
case 'holes':
out.holes = parseInt10(value, '--holes');
break;
case 'watch':
if (!VALID_WATCH.includes(value as WatchMode)) {
throw new Error(`Invalid --watch value: ${value} (expected ${VALID_WATCH.join('|')})`);
}
out.watch = value as WatchMode;
break;
case 'dashboard-port':
out.dashboardPort = parseInt10(value, '--dashboard-port');
break;
case 'target':
out.target = value;
break;
case 'run-id':
out.runId = value;
break;
default:
// Unknown flag — ignore so scenario-specific flags can be added later
break;
}
}
return out;
}
/**
* Merge in order: scenarioDefaults → env → cli (later wins).
*/
export function mergeConfig(
cli: Record<string, unknown>,
env: Record<string, string | undefined>,
defaults: Record<string, unknown>,
): Record<string, unknown> {
const merged: Record<string, unknown> = { ...defaults };
// Env overlay — SOAK_UPPER_SNAKE → lowerCamel in cli space.
const envMap: Record<string, string> = {
SOAK_HOLES: 'holes',
SOAK_ROOMS: 'rooms',
SOAK_ACCOUNTS: 'accounts',
SOAK_CPUS_PER_ROOM: 'cpusPerRoom',
SOAK_GAMES_PER_ROOM: 'gamesPerRoom',
SOAK_WATCH: 'watch',
SOAK_DASHBOARD_PORT: 'dashboardPort',
};
for (const [envKey, cfgKey] of Object.entries(envMap)) {
const v = env[envKey];
if (v !== undefined) {
// Heuristic: numeric keys
if (/^(holes|rooms|accounts|cpusPerRoom|gamesPerRoom|dashboardPort)$/.test(cfgKey)) {
// Reuse parseInt10 so a non-numeric env value fails loudly instead of becoming NaN.
merged[cfgKey] = parseInt10(v, envKey);
} else {
merged[cfgKey] = v;
}
}
}
// CLI overlay — wins over env and defaults.
for (const [k, v] of Object.entries(cli)) {
if (v !== undefined) merged[k] = v;
}
return merged;
}
- Step 4: Fix the failing middle test as noted in Step 1
Edit tests/soak/tests/config.test.ts and replace the second it(...) block inside describe('mergeConfig') with the corrected version provided in Step 1.
- Step 5: Run tests to verify they pass
npx vitest run tests/config.test.ts
Expected: all passing.
- Step 6: Commit
git add tests/soak/config.ts tests/soak/tests/config.test.ts
git commit -m "$(cat <<'EOF'
feat(soak): CLI parsing + config precedence
parseArgs pulls --scenario/--rooms/--watch/etc from argv, mergeConfig
layers scenarioDefaults → env → CLI so CLI flags always win. Unit
tested.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
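The precedence rule in this task (defaults, then env, then CLI, with later layers winning) reduces to a small layered merge. The sketch below is a simplified illustration, not the plan's `mergeConfig`: the `layer` function is a hypothetical name and it skips the `SOAK_*` key mapping.

```typescript
type Layer = Record<string, unknown>;

// Later layers win. An undefined value never overwrites an earlier layer,
// which is what lets an unset CLI flag fall through to env or defaults.
export function layer(...layers: Layer[]): Layer {
  const out: Layer = {};
  for (const l of layers) {
    for (const [k, v] of Object.entries(l)) {
      if (v !== undefined) out[k] = v;
    }
  }
  return out;
}

const defaults: Layer = { holes: 9, rooms: 4 };
const env: Layer = { holes: 3 };
const cli: Layer = { holes: 5, rooms: undefined };

// holes comes from the CLI layer, rooms falls through to defaults.
export const resolved = layer(defaults, env, cli);
console.log(resolved);
```

Note the argument order: this sketch takes layers lowest-priority first, whereas `mergeConfig` above takes `(cli, env, defaults)`.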
Task 18: runner.ts entry point — first end-to-end milestone
Replaces the placeholder runner with the real thing: parse args, build dependencies, load scenario, acquire sessions, run scenario, clean up, print summary. Supports --watch=none only at this stage.
Files:
- Modify: `tests/soak/runner.ts` (replace placeholder)
- Step 1: Rewrite `runner.ts`
#!/usr/bin/env tsx
/**
* Golf Soak Harness — entry point.
*
* Usage:
* TEST_URL=http://localhost:8000 \
* SOAK_INVITE_CODE=SOAKTEST \
* npm run soak -- --scenario=populate --rooms=1 --accounts=2 \
* --cpus-per-room=0 --games-per-room=1 --holes=1 --watch=none
*/
import * as path from 'path';
import { parseArgs, mergeConfig, CliArgs } from './config';
import { createLogger } from './core/logger';
import { SessionPool } from './core/session-pool';
import { RoomCoordinator } from './core/room-coordinator';
import { getScenario, listScenarios } from './scenarios';
import type { DashboardReporter, ScenarioContext } from './core/types';
function noopDashboard(): DashboardReporter {
return {
update: () => {},
log: () => {},
incrementMetric: () => {},
};
}
function printScenarioList(): void {
console.log('Available scenarios:');
for (const s of listScenarios()) {
console.log(` ${s.name.padEnd(12)} ${s.description}`);
console.log(` needs: accounts=${s.needs.accounts}, rooms=${s.needs.rooms ?? 1}, cpus=${s.needs.cpusPerRoom ?? 0}`);
}
}
async function main(): Promise<void> {
const cli: CliArgs = parseArgs(process.argv.slice(2));
if (cli.listOnly) {
printScenarioList();
return;
}
if (!cli.scenario) {
console.error('Error: --scenario=<name> is required. Use --list to see scenarios.');
process.exit(2);
}
const scenario = getScenario(cli.scenario);
if (!scenario) {
console.error(`Error: unknown scenario "${cli.scenario}". Use --list to see scenarios.`);
process.exit(2);
}
const runId = cli.runId ?? `${cli.scenario}-${new Date().toISOString().replace(/[:.]/g, '-')}`;
const targetUrl = cli.target ?? process.env.TEST_URL ?? 'http://localhost:8000';
const inviteCode = process.env.SOAK_INVITE_CODE ?? 'SOAKTEST';
const watch = cli.watch ?? 'dashboard';
const logger = createLogger({ runId });
logger.info('run_start', {
scenario: scenario.name,
targetUrl,
watch,
cli,
});
// Resolve final config
const config = mergeConfig(
cli as Record<string, unknown>,
process.env,
scenario.defaultConfig,
);
// Ensure core knobs exist
const accounts = Number(config.accounts ?? scenario.needs.accounts);
const rooms = Number(config.rooms ?? scenario.needs.rooms ?? 1);
const cpusPerRoom = Number(config.cpusPerRoom ?? scenario.needs.cpusPerRoom ?? 0);
if (accounts % rooms !== 0) {
console.error(`Error: --accounts=${accounts} does not divide evenly into --rooms=${rooms}`);
process.exit(2);
}
config.rooms = rooms;
config.cpusPerRoom = cpusPerRoom;
if (cli.dryRun) {
logger.info('dry_run', { config });
console.log('Dry run OK. Resolved config:');
console.log(JSON.stringify(config, null, 2));
return;
}
if (watch !== 'none') {
logger.warn('watch_mode_not_yet_implemented', { watch });
console.warn(`Watch mode "${watch}" not yet implemented — falling back to "none".`);
}
// Build dependencies
const credFile = path.resolve(__dirname, '.env.stresstest');
const pool = new SessionPool({
targetUrl,
inviteCode,
credFile,
logger,
});
const coordinator = new RoomCoordinator();
const dashboard = noopDashboard();
const abortController = new AbortController();
const onSignal = (sig: string) => {
logger.warn('signal_received', { signal: sig });
abortController.abort();
};
process.on('SIGINT', () => onSignal('SIGINT'));
process.on('SIGTERM', () => onSignal('SIGTERM'));
let exitCode = 0;
try {
const sessions = await pool.acquire(accounts);
logger.info('sessions_acquired', { count: sessions.length });
const ctx: ScenarioContext = {
config,
sessions,
coordinator,
dashboard,
logger,
signal: abortController.signal,
heartbeat: () => {}, // Task 26 wires this up
};
const result = await scenario.run(ctx);
logger.info('run_complete', {
gamesCompleted: result.gamesCompleted,
errors: result.errors.length,
durationMs: result.durationMs,
});
console.log(`Games completed: ${result.gamesCompleted}`);
console.log(`Errors: ${result.errors.length}`);
console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);
if (result.errors.length > 0) {
console.log('Errors:');
for (const e of result.errors) {
console.log(` ${e.room}: ${e.reason}${e.detail ? ' — ' + e.detail : ''}`);
}
exitCode = 1;
}
} catch (err) {
logger.error('run_failed', {
error: err instanceof Error ? err.message : String(err),
stack: err instanceof Error ? err.stack : undefined,
});
exitCode = 1;
} finally {
await pool.release();
}
if (abortController.signal.aborted && exitCode === 0) exitCode = 2;
process.exit(exitCode);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
- Step 2: Run a minimal `--watch=none` smoke against local dev
Server running, 4 soak accounts already seeded from Task 14:
cd tests/soak
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=2 \
--rooms=1 \
--cpus-per-room=0 \
--games-per-room=1 \
--holes=1 \
--watch=none
Expected output (abbreviated):
{"timestamp":"...","level":"info","msg":"run_start",...}
{"timestamp":"...","level":"info","msg":"sessions_acquired","count":2}
{"timestamp":"...","level":"info","msg":"game_start","room":"room-0","game":1}
{"timestamp":"...","level":"info","msg":"room_created","code":"XXXX"}
{"timestamp":"...","level":"info","msg":"game_complete","room":"room-0","turns":...}
{"timestamp":"...","level":"info","msg":"run_complete","gamesCompleted":1,"errors":0}
Games completed: 1
Errors: 0
Duration: X.Xs
Exit code 0.
This is the first end-to-end milestone. Stop here if debugging is needed — fix issues before moving on.
- Step 3: Commit
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): runner.ts end-to-end with --watch=none
First full end-to-end milestone: parses CLI, builds SessionPool +
RoomCoordinator, loads a scenario by name, runs it, reports results,
cleans up. Watch modes other than "none" log a warning and fall back
until Tasks 19-24 implement them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
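The runner relies on two small conventions: scenario loops poll `signal.aborted` cooperatively, and the process exit code is 0 for a clean run, 1 when any error was recorded, and 2 when the run was interrupted without errors. Both can be sketched without a browser. `loopUntilAborted` and `exitCodeFor` are hypothetical names for illustration.

```typescript
/** Run `step` repeatedly until the signal aborts or maxSteps is reached. */
export async function loopUntilAborted(
  signal: AbortSignal,
  step: () => Promise<void>,
  maxSteps = 1000,
): Promise<number> {
  let steps = 0;
  // The check happens between steps, so an in-flight step always finishes.
  while (!signal.aborted && steps < maxSteps) {
    await step();
    steps++;
  }
  return steps;
}

/** Mirror of the runner's exit-code rule: errors take priority over abort. */
export function exitCodeFor(errorCount: number, aborted: boolean): number {
  if (errorCount > 0) return 1;
  return aborted ? 2 : 0;
}
```

Because cancellation is cooperative, SIGINT does not stop a turn mid-click; the loop notices the abort on its next iteration and the `finally` block still releases the browser contexts.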
Phase 5 — Dashboard status grid
Task 19: Dashboard HTTP + WS server
Vanilla node http + ws. Serves one static HTML page, accepts WS connections, broadcasts room-state updates.
Files:
- Create: `tests/soak/dashboard/server.ts`
- Step 1: Implement `dashboard/server.ts`
// tests/soak/dashboard/server.ts
import * as http from 'http';
import * as fs from 'fs';
import * as path from 'path';
import { WebSocketServer, WebSocket } from 'ws';
import type { DashboardReporter, Logger, RoomState } from '../core/types';
export type DashboardIncoming =
| { type: 'start_stream'; sessionKey: string }
| { type: 'stop_stream'; sessionKey: string };
export type DashboardOutgoing =
| { type: 'room_state'; roomId: string; state: Partial<RoomState> }
| { type: 'log'; level: string; msg: string; meta?: object; timestamp: number }
| { type: 'metric'; name: string; value: number }
| { type: 'frame'; sessionKey: string; jpegBase64: string };
export interface DashboardHandlers {
onStartStream?(sessionKey: string): void;
onStopStream?(sessionKey: string): void;
onDisconnect?(): void;
}
export class DashboardServer {
private httpServer!: http.Server;
private wsServer!: WebSocketServer;
private clients = new Set<WebSocket>();
private metrics: Record<string, number> = {};
private roomStates: Record<string, Partial<RoomState>> = {};
constructor(
private port: number,
private logger: Logger,
private handlers: DashboardHandlers = {},
) {}
async start(): Promise<void> {
const htmlPath = path.resolve(__dirname, 'index.html');
const cssPath = path.resolve(__dirname, 'dashboard.css');
const jsPath = path.resolve(__dirname, 'dashboard.js');
this.httpServer = http.createServer((req, res) => {
const url = req.url ?? '/';
if (url === '/' || url === '/index.html') {
res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
fs.createReadStream(htmlPath).pipe(res);
} else if (url === '/dashboard.css') {
res.writeHead(200, { 'Content-Type': 'text/css' });
fs.createReadStream(cssPath).pipe(res);
} else if (url === '/dashboard.js') {
res.writeHead(200, { 'Content-Type': 'application/javascript' });
fs.createReadStream(jsPath).pipe(res);
} else {
res.writeHead(404);
res.end('not found');
}
});
this.wsServer = new WebSocketServer({ server: this.httpServer });
this.wsServer.on('connection', (ws) => {
this.clients.add(ws);
this.logger.info('dashboard_client_connected', { count: this.clients.size });
// Replay current state to the new client
for (const [roomId, state] of Object.entries(this.roomStates)) {
ws.send(JSON.stringify({ type: 'room_state', roomId, state } as DashboardOutgoing));
}
for (const [name, value] of Object.entries(this.metrics)) {
ws.send(JSON.stringify({ type: 'metric', name, value } as DashboardOutgoing));
}
ws.on('message', (data) => {
try {
const parsed = JSON.parse(data.toString()) as DashboardIncoming;
if (parsed.type === 'start_stream' && this.handlers.onStartStream) {
this.handlers.onStartStream(parsed.sessionKey);
} else if (parsed.type === 'stop_stream' && this.handlers.onStopStream) {
this.handlers.onStopStream(parsed.sessionKey);
}
} catch (err) {
this.logger.warn('dashboard_ws_parse_error', {
error: err instanceof Error ? err.message : String(err),
});
}
});
ws.on('close', () => {
this.clients.delete(ws);
this.logger.info('dashboard_client_disconnected', { count: this.clients.size });
if (this.clients.size === 0 && this.handlers.onDisconnect) {
this.handlers.onDisconnect();
}
});
});
await new Promise<void>((resolve) => {
this.httpServer.listen(this.port, () => resolve());
});
this.logger.info('dashboard_listening', { url: `http://localhost:${this.port}` });
}
async stop(): Promise<void> {
for (const ws of this.clients) {
try {
ws.close();
} catch {
// ignore
}
}
this.clients.clear();
await new Promise<void>((resolve) => {
this.wsServer.close(() => resolve());
});
await new Promise<void>((resolve) => {
this.httpServer.close(() => resolve());
});
}
broadcast(msg: DashboardOutgoing): void {
const payload = JSON.stringify(msg);
for (const ws of this.clients) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(payload);
}
}
}
/** Create a DashboardReporter wired to this server. */
reporter(): DashboardReporter {
return {
update: (roomId, state) => {
this.roomStates[roomId] = { ...this.roomStates[roomId], ...state };
this.broadcast({ type: 'room_state', roomId, state });
},
log: (level, msg, meta) => {
this.broadcast({ type: 'log', level, msg, meta, timestamp: Date.now() });
},
incrementMetric: (name, by = 1) => {
this.metrics[name] = (this.metrics[name] ?? 0) + by;
this.broadcast({ type: 'metric', name, value: this.metrics[name] });
},
};
}
}
- Step 2: Syntax-check
cd tests/soak
npx tsx -e "import('./dashboard/server').then(() => console.log('ok'))"
Expected: ok.
- Step 3: Commit
git add tests/soak/dashboard/server.ts
git commit -m "$(cat <<'EOF'
feat(soak): DashboardServer — vanilla http + ws
Serves one static HTML page, accepts WS connections, broadcasts
room_state/log/metric messages to all clients. Exposes a
reporter() method that returns a DashboardReporter scenarios can
call without knowing about sockets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
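The reporter's merge behaviour is easiest to see without sockets. The sketch below is a stand-in that mirrors the shallow merge in `update` and the default increment in `incrementMetric` from the class above. `RecordingReporter`, `RoomState`, and `Outgoing` are hypothetical names, and the message shapes are simplified assumptions; the plan's real `DashboardOutgoing` type is defined elsewhere and may carry more fields.

```typescript
// Hypothetical, simplified shapes (assumptions for illustration only).
type RoomState = Record<string, unknown>;
type Outgoing =
  | { type: 'room_state'; roomId: string; state: RoomState }
  | { type: 'metric'; name: string; value: number };

class RecordingReporter {
  readonly sent: Outgoing[] = [];
  readonly roomStates: Record<string, RoomState> = {};
  readonly metrics: Record<string, number> = {};

  // Shallow-merge the partial update into the cached state,
  // but emit only the fields that changed.
  update(roomId: string, state: RoomState): void {
    this.roomStates[roomId] = { ...this.roomStates[roomId], ...state };
    this.sent.push({ type: 'room_state', roomId, state });
  }

  // Counters start at 0 and grow by `by` (default 1); the new total is emitted.
  incrementMetric(name: string, by = 1): void {
    const next = (this.metrics[name] ?? 0) + by;
    this.metrics[name] = next;
    this.sent.push({ type: 'metric', name, value: next });
  }
}

const reporter = new RecordingReporter();
reporter.update('room-0', { phase: 'playing', moves: 1 });
reporter.update('room-0', { moves: 2 });
reporter.incrementMetric('moves_total');
reporter.incrementMetric('moves_total', 4);
```

The cached state keeps `phase` after the second update even though that update only carried `moves`; each emitted message carries just the delta. A recorder like this could also serve as a test fake for scenarios that take a `DashboardReporter`.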
Task 20: Dashboard HTML/CSS/JS status grid
Single static HTML page + stylesheet + client script. Renders the 2×2 room grid, subscribes to WS, updates tiles on each message.
Files:
- Create: tests/soak/dashboard/index.html
- Create: tests/soak/dashboard/dashboard.css
- Create: tests/soak/dashboard/dashboard.js
- Step 1: Create dashboard/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Golf Soak Dashboard</title>
<link rel="stylesheet" href="/dashboard.css">
</head>
<body>
<header class="dash-header">
<h1>⛳ Golf Soak Dashboard</h1>
<div class="meta">
<span id="run-id">run —</span>
<span id="elapsed">00:00:00</span>
</div>
</header>
<div class="meta-bar">
<div class="stat"><span class="label">Games</span><span id="metric-games">0</span></div>
<div class="stat"><span class="label">Moves</span><span id="metric-moves">0</span></div>
<div class="stat"><span class="label">Errors</span><span id="metric-errors">0</span></div>
<div class="stat"><span class="label">WS</span><span id="ws-status">connecting</span></div>
</div>
<div class="rooms" id="rooms">
<!-- Room tiles injected by dashboard.js -->
</div>
<section class="log">
<div class="log-header">Activity Log</div>
<ul id="log-list"></ul>
</section>
<!-- Modal for focused live video (Task 23) -->
<div id="video-modal" class="video-modal hidden">
<div class="video-modal-content">
<div class="video-modal-header">
<span id="video-modal-title">Watching —</span>
<button id="video-modal-close">Close</button>
</div>
<img id="video-frame" alt="Live screencast" />
</div>
</div>
<script src="/dashboard.js"></script>
</body>
</html>
- Step 2: Create dashboard/dashboard.css
:root {
--bg: #0a0e16;
--panel: #0e1420;
--border: #1a2230;
--text: #c8d4e4;
--accent: #7fbaff;
--good: #6fd08f;
--warn: #ffb84d;
--err: #ff5c6c;
--muted: #556577;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: -apple-system, system-ui, 'SF Mono', Consolas, monospace;
background: var(--bg);
color: var(--text);
}
.dash-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 20px;
background: linear-gradient(135deg, #0f1823, #0a1018);
border-bottom: 1px solid var(--border);
}
.dash-header h1 { margin: 0; font-size: 16px; color: var(--accent); }
.dash-header .meta { font-size: 11px; color: var(--muted); }
.dash-header .meta span + span { margin-left: 12px; }
.meta-bar {
display: flex;
gap: 24px;
padding: 10px 20px;
background: #0c131d;
border-bottom: 1px solid var(--border);
font-size: 12px;
}
.meta-bar .stat .label { color: var(--muted); margin-right: 6px; }
.meta-bar .stat span:last-child { color: #fff; font-weight: 600; }
.rooms {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1px;
background: var(--border);
}
.room {
background: var(--panel);
padding: 14px 18px;
min-height: 180px;
}
.room-title {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 10px;
}
.room-title .name { font-size: 13px; color: var(--accent); font-weight: 600; }
.room-title .phase {
font-size: 10px;
padding: 2px 8px;
border-radius: 10px;
background: #1a3a2a;
color: var(--good);
}
.room-title .phase.lobby { background: #3a2a1a; color: var(--warn); }
.room-title .phase.err { background: #3a1a1a; color: var(--err); }
.players {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 4px;
font-size: 11px;
margin-bottom: 8px;
}
.player {
display: flex;
justify-content: space-between;
padding: 4px 8px;
background: #0a0f18;
border-radius: 3px;
cursor: pointer;
border: 1px solid transparent;
}
.player:hover { border-color: var(--accent); }
.player.active {
background: #1a2a40;
border-left: 2px solid var(--accent);
}
.player .score { color: var(--muted); }
.progress-bar {
height: 4px;
background: var(--border);
border-radius: 2px;
overflow: hidden;
margin-top: 6px;
}
.progress-fill {
height: 100%;
background: linear-gradient(90deg, var(--accent), var(--good));
transition: width 0.3s;
}
.room-meta {
font-size: 10px;
color: var(--muted);
display: flex;
gap: 12px;
margin-top: 6px;
}
.log {
border-top: 1px solid var(--border);
background: #080c13;
max-height: 160px;
overflow-y: auto;
}
.log .log-header {
padding: 6px 20px;
font-size: 10px;
text-transform: uppercase;
color: var(--muted);
border-bottom: 1px solid var(--border);
}
.log ul { list-style: none; margin: 0; padding: 4px 20px; font-size: 10px; }
.log li { line-height: 1.5; font-family: monospace; color: var(--muted); }
.log li.warn { color: var(--warn); }
.log li.error { color: var(--err); }
.video-modal {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.85);
display: flex;
align-items: center;
justify-content: center;
z-index: 100;
}
.video-modal.hidden { display: none; }
.video-modal-content {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 6px;
padding: 16px;
max-width: 90vw;
max-height: 90vh;
}
.video-modal-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 12px;
color: var(--accent);
font-size: 13px;
}
.video-modal-header button {
background: var(--border);
color: var(--text);
border: none;
padding: 4px 12px;
border-radius: 3px;
cursor: pointer;
}
#video-frame {
display: block;
max-width: 100%;
max-height: 70vh;
border: 1px solid var(--border);
}
- Step 3: Create dashboard/dashboard.js
// tests/soak/dashboard/dashboard.js
(() => {
const ws = new WebSocket(`ws://${location.host}`);
const roomsEl = document.getElementById('rooms');
const logEl = document.getElementById('log-list');
const wsStatusEl = document.getElementById('ws-status');
const metricGames = document.getElementById('metric-games');
const metricMoves = document.getElementById('metric-moves');
const metricErrors = document.getElementById('metric-errors');
const elapsedEl = document.getElementById('elapsed');
const roomTiles = new Map();
const startTime = Date.now();
let currentWatchedKey = null;
// Video modal
const videoModal = document.getElementById('video-modal');
const videoFrame = document.getElementById('video-frame');
const videoTitle = document.getElementById('video-modal-title');
const videoClose = document.getElementById('video-modal-close');
function fmtElapsed(ms) {
const s = Math.floor(ms / 1000);
const h = Math.floor(s / 3600);
const m = Math.floor((s % 3600) / 60);
const sec = s % 60;
return `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}:${String(sec).padStart(2, '0')}`;
}
setInterval(() => {
elapsedEl.textContent = fmtElapsed(Date.now() - startTime);
}, 1000);
function ensureRoomTile(roomId) {
if (roomTiles.has(roomId)) return roomTiles.get(roomId);
const tile = document.createElement('div');
tile.className = 'room';
tile.innerHTML = `
<div class="room-title">
<div class="name">${roomId}</div>
<div class="phase lobby">waiting</div>
</div>
<div class="players"></div>
<div class="progress-bar"><div class="progress-fill" style="width:0%"></div></div>
<div class="room-meta">
<span class="moves">0 moves</span>
<span class="game">game —</span>
</div>
`;
roomsEl.appendChild(tile);
roomTiles.set(roomId, tile);
return tile;
}
function renderRoomState(roomId, state) {
const tile = ensureRoomTile(roomId);
if (state.phase !== undefined) {
const phaseEl = tile.querySelector('.phase');
phaseEl.textContent = state.phase;
phaseEl.classList.toggle('lobby', state.phase === 'lobby' || state.phase === 'waiting');
phaseEl.classList.toggle('err', state.phase === 'error');
}
if (state.players !== undefined) {
const playersEl = tile.querySelector('.players');
playersEl.innerHTML = state.players
.map(
(p) => `
<div class="player ${p.isActive ? 'active' : ''}" data-session="${p.key}">
<span>${p.isActive ? '▶ ' : ''}${p.key}</span>
<span class="score">${p.score ?? '—'}</span>
</div>
`,
)
.join('');
}
if (state.hole !== undefined && state.totalHoles !== undefined) {
const fill = tile.querySelector('.progress-fill');
const pct = state.totalHoles > 0 ? Math.round((state.hole / state.totalHoles) * 100) : 0;
fill.style.width = `${pct}%`;
}
if (state.moves !== undefined) {
tile.querySelector('.moves').textContent = `${state.moves} moves`;
}
if (state.game !== undefined && state.totalGames !== undefined) {
tile.querySelector('.game').textContent = `game ${state.game}/${state.totalGames}`;
}
}
function appendLog(level, msg, meta) {
const li = document.createElement('li');
li.className = level;
const ts = new Date().toLocaleTimeString();
li.textContent = `[${ts}] ${msg} ${meta ? JSON.stringify(meta) : ''}`;
logEl.insertBefore(li, logEl.firstChild);
// Cap log length
while (logEl.children.length > 100) {
logEl.removeChild(logEl.lastChild);
}
}
function applyMetric(name, value) {
if (name === 'games_completed') metricGames.textContent = value;
else if (name === 'moves_total') metricMoves.textContent = value;
else if (name === 'errors') metricErrors.textContent = value;
}
ws.addEventListener('open', () => {
wsStatusEl.textContent = 'healthy';
wsStatusEl.style.color = 'var(--good)';
});
ws.addEventListener('close', () => {
wsStatusEl.textContent = 'disconnected';
wsStatusEl.style.color = 'var(--err)';
});
ws.addEventListener('message', (event) => {
let msg;
try {
msg = JSON.parse(event.data);
} catch {
return;
}
if (msg.type === 'room_state') {
renderRoomState(msg.roomId, msg.state);
} else if (msg.type === 'log') {
appendLog(msg.level, msg.msg, msg.meta);
} else if (msg.type === 'metric') {
applyMetric(msg.name, msg.value);
} else if (msg.type === 'frame') {
if (msg.sessionKey === currentWatchedKey) {
videoFrame.src = `data:image/jpeg;base64,${msg.jpegBase64}`;
}
}
});
// Click-to-watch (wired in Task 23)
roomsEl.addEventListener('click', (e) => {
const playerEl = e.target.closest('.player');
if (!playerEl) return;
const key = playerEl.dataset.session;
if (!key) return;
currentWatchedKey = key;
videoTitle.textContent = `Watching ${key}`;
videoModal.classList.remove('hidden');
ws.send(JSON.stringify({ type: 'start_stream', sessionKey: key }));
});
function closeVideo() {
if (currentWatchedKey) {
ws.send(JSON.stringify({ type: 'stop_stream', sessionKey: currentWatchedKey }));
}
currentWatchedKey = null;
videoModal.classList.add('hidden');
videoFrame.src = '';
}
videoClose.addEventListener('click', closeVideo);
document.addEventListener('keydown', (e) => {
if (e.key === 'Escape') closeVideo();
});
})();
- Step 4: Commit
git add tests/soak/dashboard/index.html tests/soak/dashboard/dashboard.css tests/soak/dashboard/dashboard.js
git commit -m "$(cat <<'EOF'
feat(soak): dashboard status grid UI
Static HTML page served by DashboardServer. Renders the 2×2 room
grid with progress bars and player tiles, subscribes to WS events,
updates tiles live. The click-to-watch modal is wired but shows no
frames until the CDP screencaster is hooked up in Tasks 22-23.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
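The message handler in dashboard.js dispatches on `msg.type`. As a sketch of that protocol in TypeScript, a discriminated union makes each branch's payload explicit. The field names below are read off the client code above; the plan's actual `DashboardOutgoing` type is defined elsewhere and may differ, and `summarize` is a hypothetical helper used only to show the dispatch.

```typescript
// Assumed message shapes, inferred from what dashboard.js reads.
type DashboardMessage =
  | { type: 'room_state'; roomId: string; state: Record<string, unknown> }
  | { type: 'log'; level: string; msg: string; meta?: unknown; timestamp: number }
  | { type: 'metric'; name: string; value: number }
  | { type: 'frame'; sessionKey: string; jpegBase64: string };

// Hypothetical helper: parse one WS payload and describe what the client would do.
// Returns null for malformed or unrecognised input, matching the client's
// behaviour of silently ignoring such messages.
function summarize(raw: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const msg = parsed as DashboardMessage;
  switch (msg.type) {
    case 'room_state':
      return `render ${msg.roomId}`;
    case 'log':
      return `log[${msg.level}] ${msg.msg}`;
    case 'metric':
      return `metric ${msg.name}=${msg.value}`;
    case 'frame':
      return `frame for ${msg.sessionKey}`;
    default:
      return null; // unknown message types are ignored
  }
}
```

Sharing one such type between server and client would let the compiler flag a renamed field on either side, at the cost of adding a build step for the dashboard script.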
Task 21: Wire WATCH=dashboard in runner
Start the dashboard server when --watch=dashboard, auto-open the URL in the user's browser, use its reporter() as the ctx.dashboard.
Files:
- Modify: tests/soak/runner.ts
- Step 1: Import and instantiate DashboardServer in runner.ts
At the top of runner.ts, add:
import { DashboardServer } from './dashboard/server';
import { spawn } from 'child_process';
Replace the block that creates dashboard with:
// Build dashboard if requested
let dashboardServer: DashboardServer | null = null;
let dashboard: DashboardReporter = noopDashboard();
if (watch === 'dashboard') {
const port = Number(config.dashboardPort ?? 7777);
dashboardServer = new DashboardServer(port, logger, {
onStartStream: (key) => {
logger.info('stream_start_requested', { sessionKey: key });
// Streaming is wired in Task 23
},
onStopStream: (key) => {
logger.info('stream_stop_requested', { sessionKey: key });
},
});
await dashboardServer.start();
dashboard = dashboardServer.reporter();
const url = `http://localhost:${port}`;
console.log(`Dashboard: ${url}`);
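// Best-effort auto-open. spawn() reports a missing opener through an async
// 'error' event rather than a throw, so both paths are handled.
try {
const opener: [string, string[]] =
process.platform === 'darwin' ? ['open', [url]]
: process.platform === 'win32' ? ['cmd', ['/c', 'start', '', url]] // start is a cmd.exe builtin
: ['xdg-open', [url]];
const child = spawn(opener[0], opener[1], { stdio: 'ignore', detached: true });
child.on('error', () => { /* opener missing; the URL is already printed */ });
child.unref();
} catch {
// If auto-open fails, the URL is already printed
}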
} else if (watch === 'tiled') {
logger.warn('tiled_not_yet_implemented');
console.warn('Watch mode "tiled" not yet implemented (Task 24). Falling back to none.');
}
And in the finally block, shut down the server:
} finally {
await pool.release();
if (dashboardServer) {
await dashboardServer.stop();
}
}
Also remove the earlier if (watch !== 'none') warning block — it's replaced by the dispatch above.
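Choosing the opener can be factored into a pure function, which makes the per-platform behaviour testable without spawning anything. This is a sketch, not the runner's actual code: `openerCommand` is a hypothetical helper, and the Windows branch assumes `start` has to be invoked through `cmd /c` because it is a shell builtin rather than an executable.

```typescript
interface OpenerCommand {
  cmd: string;
  args: string[];
}

// Hypothetical helper: map a Node platform string (process.platform)
// to the command that opens a URL in the default browser.
function openerCommand(platform: string, url: string): OpenerCommand {
  switch (platform) {
    case 'darwin':
      return { cmd: 'open', args: [url] };
    case 'win32':
      // `start` treats its first quoted argument as a window title,
      // so an empty title is passed ahead of the URL.
      return { cmd: 'cmd', args: ['/c', 'start', '', url] };
    default:
      return { cmd: 'xdg-open', args: [url] };
  }
}
```

The result would be handed to `spawn(cmd, args, { stdio: 'ignore', detached: true })`. Note that a missing opener binary surfaces as an asynchronous `'error'` event on the child process, not as a thrown exception, so the child needs an `'error'` listener in addition to any try/catch.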
- Step 2: Run smoke against dev with dashboard
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=2 --rooms=1 --cpus-per-room=0 --games-per-room=1 --holes=1 \
--watch=dashboard
Expected:
- Dashboard: http://localhost:7777 printed
- Browser auto-opens (or you open it manually)
- Page shows the dashboard with WS: healthy
- During the game, the room-0 tile shows phase: playing, increments moves, updates progress
- After game completes, the runner exits 0 and the dashboard stops
- Step 3: Commit
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): wire --watch=dashboard in runner
Starts DashboardServer on 7777 (configurable), uses its reporter as
ctx.dashboard, auto-opens the URL. Cleans up on exit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Phase 6 — Live video click-to-watch
Task 22: CDP screencast module
Attach a CDP session to a given page, start screencasting JPEG frames (throttled to every Nth frame), forward each frame to a callback, detach on stop.
Files:
- Create: tests/soak/core/screencaster.ts
- Step 1: Implement core/screencaster.ts
// tests/soak/core/screencaster.ts
import type { Page, CDPSession } from 'playwright-core';
import type { Logger } from './types';
export interface ScreencastOptions {
format?: 'jpeg' | 'png';
quality?: number;
maxWidth?: number;
maxHeight?: number;
everyNthFrame?: number;
}
export type FrameCallback = (jpegBase64: string) => void;
export class Screencaster {
private sessions = new Map<string, CDPSession>();
constructor(private logger: Logger) {}
/**
* Attach a CDP session to the given page and start forwarding frames.
* If already streaming, this is a no-op.
*/
async start(
sessionKey: string,
page: Page,
onFrame: FrameCallback,
opts: ScreencastOptions = {},
): Promise<void> {
if (this.sessions.has(sessionKey)) {
this.logger.warn('screencast_already_running', { sessionKey });
return;
}
const client = await page.context().newCDPSession(page);
this.sessions.set(sessionKey, client);
client.on('Page.screencastFrame', async (evt: { data: string; sessionId: number }) => {
try {
onFrame(evt.data);
await client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
} catch (err) {
this.logger.warn('screencast_frame_error', {
sessionKey,
error: err instanceof Error ? err.message : String(err),
});
}
});
await client.send('Page.startScreencast', {
format: opts.format ?? 'jpeg',
quality: opts.quality ?? 60,
maxWidth: opts.maxWidth ?? 640,
maxHeight: opts.maxHeight ?? 360,
everyNthFrame: opts.everyNthFrame ?? 2,
});
this.logger.info('screencast_started', { sessionKey });
}
async stop(sessionKey: string): Promise<void> {
const client = this.sessions.get(sessionKey);
if (!client) return;
try {
await client.send('Page.stopScreencast');
await client.detach();
} catch (err) {
this.logger.warn('screencast_stop_error', {
sessionKey,
error: err instanceof Error ? err.message : String(err),
});
}
this.sessions.delete(sessionKey);
this.logger.info('screencast_stopped', { sessionKey });
}
async stopAll(): Promise<void> {
const keys = Array.from(this.sessions.keys());
await Promise.all(keys.map((k) => this.stop(k)));
}
}
- Step 2: Syntax-check
cd tests/soak
npx tsx -e "import('./core/screencaster').then(() => console.log('ok'))"
Expected: ok.
- Step 3: Commit
git add tests/soak/core/screencaster.ts
git commit -m "$(cat <<'EOF'
feat(soak): Screencaster — CDP Page.startScreencast wrapper
Attach/detach CDP sessions per Playwright Page, start/stop JPEG
screencasts with configurable quality and frame rate, forward each
frame to a callback. Used by the dashboard for click-to-watch
live video.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
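The key protocol detail in the Screencaster's frame handler is that every `Page.screencastFrame` event is answered with `Page.screencastFrameAck` carrying the same `sessionId`; frames that go unacknowledged typically cause the stream to stall. The sketch below exercises that flow against a fake client. `FakeCdpClient` and `attachFrameForwarder` are hypothetical names for illustration, and the fake models only the two calls used here, not Playwright's `CDPSession`.

```typescript
interface ScreencastFrame {
  data: string; // base64-encoded image
  sessionId: number; // frame number to acknowledge
}

type FrameListener = (evt: ScreencastFrame) => Promise<void>;

// Minimal stand-in for a CDP session: records sends, lets the caller emit frames.
class FakeCdpClient {
  readonly sent: Array<{ method: string; params?: unknown }> = [];
  private listener: FrameListener | null = null;

  on(_event: 'Page.screencastFrame', listener: FrameListener): void {
    this.listener = listener;
  }

  async send(method: string, params?: unknown): Promise<void> {
    this.sent.push({ method, params });
  }

  async emitFrame(evt: ScreencastFrame): Promise<void> {
    if (this.listener) await this.listener(evt);
  }
}

// Forward each frame to the callback, then ack it so the next frame can be sent.
function attachFrameForwarder(client: FakeCdpClient, onFrame: (b64: string) => void): void {
  client.on('Page.screencastFrame', async (evt) => {
    onFrame(evt.data);
    await client.send('Page.screencastFrameAck', { sessionId: evt.sessionId });
  });
}

const fakeClient = new FakeCdpClient();
const receivedFrames: string[] = [];
attachFrameForwarder(fakeClient, (b64) => {
  receivedFrames.push(b64);
});
void fakeClient.emitFrame({ data: 'AAAA', sessionId: 7 });
```

Acknowledging after `onFrame` returns means a slow consumer naturally throttles the stream, which is one reason to keep the callback cheap.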
Task 23: Wire screencaster to dashboard click-to-watch
Runner creates a Screencaster, passes callbacks into DashboardServer.onStartStream/onStopStream that look up the right session and start/stop streaming. Each frame is broadcast to the dashboard.
Files:
- Modify: tests/soak/runner.ts
- Step 1: Import Screencaster and hold a sessions map
In runner.ts, add at the top:
import { Screencaster } from './core/screencaster';
After const sessions = await pool.acquire(accounts);, build a lookup map:
const sessionsByKey = new Map<string, typeof sessions[number]>();
for (const s of sessions) sessionsByKey.set(s.key, s);
Create the screencaster before the dashboard (or right after sessions are acquired):
const screencaster = new Screencaster(logger);
- Step 2: Replace the onStartStream/onStopStream no-ops with real wiring
The handlers must close over screencaster and sessionsByKey, both of which exist only once sessions have been acquired. Move the DashboardServer construction to after const sessions = await pool.acquire(accounts); and change it to:
if (watch === 'dashboard') {
const port = Number(config.dashboardPort ?? 7777);
dashboardServer = new DashboardServer(port, logger, {
onStartStream: (key) => {
const session = sessionsByKey.get(key);
if (!session) {
logger.warn('stream_start_unknown_session', { sessionKey: key });
return;
}
screencaster
.start(key, session.page, (jpegBase64) => {
dashboardServer!.broadcast({ type: 'frame', sessionKey: key, jpegBase64 });
})
.catch((err) =>
logger.error('screencast_start_failed', {
sessionKey: key,
error: err instanceof Error ? err.message : String(err),
}),
);
},
onStopStream: (key) => {
screencaster.stop(key).catch(() => {});
},
onDisconnect: () => {
screencaster.stopAll().catch(() => {});
},
});
await dashboardServer.start();
dashboard = dashboardServer.reporter();
const url = `http://localhost:${port}`;
console.log(`Dashboard: ${url}`);
// ... auto-open
}
Make sure the ctx.dashboard assignment happens AFTER the dashboard setup (it already does — const ctx = { ... dashboard, ... } comes later).
In the finally block, add:
await screencaster.stopAll();
- Step 3: Manual test end-to-end
Run a longer populate game so there's time to click:
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=4 --rooms=1 --cpus-per-room=0 --games-per-room=2 --holes=3 \
--watch=dashboard
Expected:
- Dashboard opens, shows 1 room with 4 players
- Click on any player tile (soak_00, soak_01, ...)
- Modal opens, shows live JPEG frames of that player's view of the game
- Close modal (Esc or Close button) — frames stop, screencast detaches
- Run completes cleanly
- Step 4: Commit
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): click-to-watch live video via CDP screencast
Runner creates a Screencaster and wires its start/stop into
DashboardServer.onStartStream/onStopStream. Clicking a player tile
in the dashboard starts a CDP screencast on that session's page,
forwards JPEG frames as WS "frame" messages, closes on modal
dismiss or WS disconnect.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Phase 7 — Tiled mode
Task 24: --watch=tiled native windows
Launch a second headed browser for the 4 host contexts, position their windows in a 2×2 grid using page.evaluate(window.moveTo).
Files:
- Modify: tests/soak/core/session-pool.ts (add optional headed-host support)
- Modify: tests/soak/runner.ts (enable tiled mode)
- Step 1: Extend SessionPool to support headed host contexts
Add a new option and method to SessionPool. In core/session-pool.ts:
export interface SessionPoolOptions {
targetUrl: string;
inviteCode: string;
credFile: string;
logger: Logger;
browser?: Browser;
contextOptions?: Parameters<Browser['newContext']>[0];
/** If set, the first `headedHostCount` sessions use a separate headed browser. */
headedHostCount?: number;
}
Inside the class, add a headedBrowser field and extend acquire:
private headedBrowser: Browser | null = null;
// ... in acquire(), before the loop:
if ((this.opts.headedHostCount ?? 0) > 0 && !this.headedBrowser) {
this.headedBrowser = await chromium.launch({
headless: false,
slowMo: 50,
});
}
for (let i = 0; i < count; i++) {
const account = this.accounts[i];
const useHeaded = i < (this.opts.headedHostCount ?? 0);
const targetBrowser = useHeaded ? this.headedBrowser! : this.browser!;
const context = await targetBrowser.newContext({
...this.opts.contextOptions,
...(useHeaded ? { viewport: { width: 960, height: 540 } } : {}),
});
await this.injectAuth(context, account);
const page = await context.newPage();
await page.goto(this.opts.targetUrl);
// Position headed windows in a 2×2 grid
if (useHeaded) {
const col = i % 2;
const row = Math.floor(i / 2);
const x = col * 960;
const y = row * 560;
await page.evaluate(
([x, y, w, h]) => {
window.moveTo(x, y);
window.resizeTo(w, h);
},
[x, y, 960, 540] as [number, number, number, number],
);
}
const bot = new GolfBot(page);
sessions.push({ account, context, page, bot, key: account.key });
}
Update release to close the headed browser too:
async release(): Promise<void> {
for (const session of this.activeSessions) {
try { await session.context.close(); } catch { /* ignore */ }
}
this.activeSessions = [];
if (this.ownedBrowser) {
try { await this.ownedBrowser.close(); } catch { /* ignore */ }
this.ownedBrowser = null;
this.browser = null;
}
if (this.headedBrowser) {
try { await this.headedBrowser.close(); } catch { /* ignore */ }
this.headedBrowser = null;
}
}
- Step 2: Wire watch === 'tiled' in the runner
In runner.ts, replace the existing tiled_not_yet_implemented warning with:
const headedHostCount = watch === 'tiled' ? rooms : 0;
const pool = new SessionPool({
targetUrl,
inviteCode,
credFile,
logger,
headedHostCount,
});
(Move that pool creation up so it's aware of watch.)
- Step 3: Test tiled mode
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate \
--accounts=4 --rooms=2 --cpus-per-room=0 --games-per-room=1 --holes=1 \
--watch=tiled
Expected: 2 native Chromium windows appear (one per host), sized ~960×540 and positioned at the upper-left of the screen. They play the game visibly. On exit, windows close.
- Step 4: Commit
git add tests/soak/core/session-pool.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): --watch=tiled launches N headed host windows
SessionPool accepts headedHostCount; when > 0 it launches a second
Chromium in headed mode, creates those contexts there, and positions
each host window in a 2×2 grid via window.moveTo/resizeTo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
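The grid arithmetic in acquire() can be isolated as a small function. The sketch below reproduces the same column/row mapping (two columns, 960×540 windows, 560 px row pitch); `tileBounds`, `TileLayout`, and `DEFAULT_LAYOUT` are hypothetical names. One caveat worth keeping in mind: browsers commonly ignore `window.moveTo` and `window.resizeTo` for windows that were not opened via `window.open`, so if the host windows stay stacked, the computed bounds may need to be applied some other way.

```typescript
interface TileLayout {
  cols: number;
  width: number;
  height: number;
  rowPitch: number; // vertical distance between rows, leaving room for window chrome
}

interface Bounds {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Values taken from the acquire() loop above.
const DEFAULT_LAYOUT: TileLayout = { cols: 2, width: 960, height: 540, rowPitch: 560 };

// Hypothetical helper: screen bounds for the index-th host window, filling row by row.
function tileBounds(index: number, layout: TileLayout = DEFAULT_LAYOUT): Bounds {
  const col = index % layout.cols;
  const row = Math.floor(index / layout.cols);
  return {
    left: col * layout.width,
    top: row * layout.rowPitch,
    width: layout.width,
    height: layout.height,
  };
}
```

With the default layout, hosts 0 and 1 share the top row and hosts 2 and 3 the second row, so four rooms need roughly a 1920×1100 desktop to avoid overlap.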
Phase 8 — Stress scenario
Task 25: Chaos injector + stress scenario
Short 1-hole games in tight loops, with a background chaos loop that every 500 ms gives each session a 5% chance of a chaos event (rapid clicks, tab blur, brief offline toggle).
Files:
- Create: tests/soak/scenarios/stress.ts
- Create: tests/soak/scenarios/shared/chaos.ts
- Modify: tests/soak/scenarios/index.ts (register stress)
- Step 1: Create scenarios/shared/chaos.ts
// tests/soak/scenarios/shared/chaos.ts
import type { Session, Logger } from '../../core/types';
export type ChaosEvent =
| 'rapid_clicks'
| 'tab_blur'
| 'brief_offline';
const ALL_EVENTS: ChaosEvent[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];
function pickEvent(): ChaosEvent {
return ALL_EVENTS[Math.floor(Math.random() * ALL_EVENTS.length)];
}
export async function maybeInjectChaos(
session: Session,
probability: number,
logger: Logger,
roomId: string,
): Promise<ChaosEvent | null> {
if (Math.random() >= probability) return null;
const event = pickEvent();
logger.info('chaos_injected', { room: roomId, session: session.key, event });
try {
switch (event) {
case 'rapid_clicks': {
// Fire 5 rapid clicks at the player's own cards
for (let i = 0; i < 5; i++) {
await session.page.locator(`#player-cards .card:nth-child(${(i % 6) + 1})`)
.click({ timeout: 300 })
.catch(() => {});
}
break;
}
case 'tab_blur': {
// Briefly dispatch blur then focus
await session.page.evaluate(() => {
window.dispatchEvent(new Event('blur'));
setTimeout(() => window.dispatchEvent(new Event('focus')), 200);
});
break;
}
case 'brief_offline': {
await session.context.setOffline(true);
await new Promise((r) => setTimeout(r, 300));
await session.context.setOffline(false);
break;
}
}
} catch (err) {
logger.warn('chaos_error', {
event,
error: err instanceof Error ? err.message : String(err),
});
}
return event;
}
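Because maybeInjectChaos draws from `Math.random()` directly, a failing run cannot be replayed with the same chaos sequence. If reproducibility matters, one option (an assumption, not something the plan requires) is to pass the random source in. The sketch below uses mulberry32, a small public-domain PRNG; `makeRng` and `rollChaos` are hypothetical names, and the event list is copied from chaos.ts above.

```typescript
type Rng = () => number; // returns a float in [0, 1)
type ChaosEvent = 'rapid_clicks' | 'tab_blur' | 'brief_offline';

const ALL_EVENTS: readonly ChaosEvent[] = ['rapid_clicks', 'tab_blur', 'brief_offline'];

// mulberry32: deterministic 32-bit PRNG; the same seed yields the same sequence.
function makeRng(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Decide whether to fire and, if so, which event, using only the injected RNG.
function rollChaos(probability: number, rng: Rng): ChaosEvent | null {
  if (rng() >= probability) return null;
  const idx = Math.floor(rng() * ALL_EVENTS.length);
  return ALL_EVENTS[idx] ?? null;
}
```

Logging the seed at startup would then be enough to replay the same sequence of chaos decisions, although the timing of the games themselves would still vary between runs.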
- Step 2: Create scenarios/stress.ts
// tests/soak/scenarios/stress.ts
import type {
Scenario,
ScenarioContext,
ScenarioResult,
ScenarioError,
Session,
} from '../core/types';
import { runOneMultiplayerGame } from './shared/multiplayer-game';
import { maybeInjectChaos } from './shared/chaos';
interface StressConfig {
gamesPerRoom: number;
holes: number;
decks: number;
rooms: number;
cpusPerRoom: number;
thinkTimeMs: [number, number];
interGamePauseMs: number;
chaosChance: number;
}
function chunk<T>(arr: T[], size: number): T[][] {
const out: T[][] = [];
for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
return out;
}
async function sleep(ms: number): Promise<void> {
return new Promise((r) => setTimeout(r, ms));
}
async function runStressRoom(
ctx: ScenarioContext,
cfg: StressConfig,
roomIdx: number,
sessions: Session[],
): Promise<{ completed: number; errors: ScenarioError[]; chaosFired: number }> {
const roomId = `room-${roomIdx}`;
let completed = 0;
let chaosFired = 0;
const errors: ScenarioError[] = [];
for (let gameNum = 0; gameNum < cfg.gamesPerRoom; gameNum++) {
if (ctx.signal.aborted) break;
ctx.dashboard.update(roomId, { game: gameNum + 1, totalGames: cfg.gamesPerRoom });
// Start a background chaos loop for this game
let chaosActive = true;
const chaosLoop = (async () => {
while (chaosActive && !ctx.signal.aborted) {
await sleep(500);
for (const session of sessions) {
const e = await maybeInjectChaos(session, cfg.chaosChance, ctx.logger, roomId);
if (e) chaosFired++;
}
}
})();
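// Always stop the chaos loop, even if the game throws, so it cannot outlive the room.
let result: Awaited<ReturnType<typeof runOneMultiplayerGame>>;
try {
result = await runOneMultiplayerGame(ctx, sessions, {
roomId,
holes: cfg.holes,
decks: cfg.decks,
cpusPerRoom: cfg.cpusPerRoom,
thinkTimeMs: cfg.thinkTimeMs,
});
} finally {
chaosActive = false;
await chaosLoop;
}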
if (result.completed) {
completed++;
ctx.logger.info('game_complete', { room: roomId, game: gameNum + 1, turns: result.turns });
} else {
errors.push({
room: roomId,
reason: 'game_failed',
detail: result.error,
timestamp: Date.now(),
});
ctx.logger.error('game_failed', { room: roomId, error: result.error });
}
await sleep(cfg.interGamePauseMs);
}
return { completed, errors, chaosFired };
}
const stress: Scenario = {
name: 'stress',
description: 'Rapid short games for stability & race condition hunting',
needs: { accounts: 16, rooms: 4, cpusPerRoom: 2 },
defaultConfig: {
gamesPerRoom: 50,
holes: 1,
decks: 1,
rooms: 4,
cpusPerRoom: 2,
thinkTimeMs: [50, 150],
interGamePauseMs: 200,
chaosChance: 0.05,
},
async run(ctx: ScenarioContext): Promise<ScenarioResult> {
const start = Date.now();
const cfg = ctx.config as unknown as StressConfig;
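// Guard against perRoom === 0 (more rooms than sessions), which would make chunk() loop forever.
const perRoom = Math.max(1, Math.floor(ctx.sessions.length / cfg.rooms));
// Drop any remainder chunk so no more than cfg.rooms rooms are started.
const roomSessions = chunk(ctx.sessions, perRoom).slice(0, cfg.rooms);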
const results = await Promise.allSettled(
roomSessions.map((s, idx) => runStressRoom(ctx, cfg, idx, s)),
);
let gamesCompleted = 0;
let chaosFired = 0;
const errors: ScenarioError[] = [];
results.forEach((r, idx) => {
if (r.status === 'fulfilled') {
gamesCompleted += r.value.completed;
chaosFired += r.value.chaosFired;
errors.push(...r.value.errors);
} else {
errors.push({
room: `room-${idx}`,
reason: 'room_threw',
detail: r.reason instanceof Error ? r.reason.message : String(r.reason),
timestamp: Date.now(),
});
}
});
return {
gamesCompleted,
errors,
durationMs: Date.now() - start,
customMetrics: { chaos_fired: chaosFired },
};
},
};
export default stress;
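The chunk() helper above never terminates when its size argument is 0, and a plain floor-division split can yield an extra short chunk when the session count is not a multiple of the room count. A defensive variant could look like the sketch below. `partitionIntoRooms` is a hypothetical helper, and leaving leftover sessions unused is an assumed policy, not one the plan specifies.

```typescript
// Hypothetical helper: split items into exactly `rooms` equal-sized groups.
function partitionIntoRooms<T>(items: readonly T[], rooms: number): T[][] {
  if (!Number.isInteger(rooms) || rooms <= 0) {
    throw new RangeError(`rooms must be a positive integer, got ${rooms}`);
  }
  const perRoom = Math.floor(items.length / rooms);
  if (perRoom === 0) {
    throw new RangeError(`need at least ${rooms} sessions, got ${items.length}`);
  }
  const out: T[][] = [];
  for (let r = 0; r < rooms; r++) {
    out.push(items.slice(r * perRoom, (r + 1) * perRoom));
  }
  // Any leftover items (items.length % rooms) are intentionally left unused.
  return out;
}
```

Failing fast on a bad room count turns a silent hang into an immediate, readable error at scenario start.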
- Step 3: Register stress in the registry
Edit tests/soak/scenarios/index.ts:
import type { Scenario } from '../core/types';
import populate from './populate';
import stress from './stress';
const registry: Record<string, Scenario> = {
populate,
stress,
};
export function getScenario(name: string): Scenario | undefined {
return registry[name];
}
export function listScenarios(): Scenario[] {
return Object.values(registry);
}
- Step 4: Smoke test stress scenario
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=stress \
--accounts=4 --rooms=1 --cpus-per-room=1 --games-per-room=3 --holes=1 \
--watch=none
Expected: 3 quick games complete, chaos events in logs (look for chaos_injected), exit 0.
- Step 5: Commit
git add tests/soak/scenarios/stress.ts tests/soak/scenarios/shared/chaos.ts tests/soak/scenarios/index.ts
git commit -m "$(cat <<'EOF'
feat(soak): stress scenario with chaos injection
Rapid 1-hole games with a parallel chaos loop: every 500 ms each
session has a 5% chance of a rapid_clicks, tab_blur, or brief_offline event.
Chaos counts roll up into ScenarioResult.customMetrics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Phase 9 — Failure handling
Task 26: Watchdog + heartbeat wiring
Per-room timeout that fires if no heartbeat arrives within N ms. Runner wires it into ctx.heartbeat. Vitest-tested.
Files:
- Create: tests/soak/core/watchdog.ts
- Create: tests/soak/tests/watchdog.test.ts
- Modify: tests/soak/runner.ts (wire heartbeat to per-room watchdogs)
- Step 1: Write failing tests
// tests/soak/tests/watchdog.test.ts
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { Watchdog } from '../core/watchdog';
describe('Watchdog', () => {
beforeEach(() => vi.useFakeTimers());
afterEach(() => vi.useRealTimers());
it('fires after timeout if no heartbeat', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
vi.advanceTimersByTime(1001);
expect(onTimeout).toHaveBeenCalledOnce();
});
it('heartbeat resets the timer', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
vi.advanceTimersByTime(800);
w.heartbeat();
vi.advanceTimersByTime(800);
expect(onTimeout).not.toHaveBeenCalled();
vi.advanceTimersByTime(300);
expect(onTimeout).toHaveBeenCalledOnce();
});
it('stop cancels pending timeout', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
w.stop();
vi.advanceTimersByTime(2000);
expect(onTimeout).not.toHaveBeenCalled();
});
it('does not re-arm or fire again once it has fired', () => {
const onTimeout = vi.fn();
const w = new Watchdog(1000, onTimeout);
w.start();
vi.advanceTimersByTime(1001);
w.heartbeat();
vi.advanceTimersByTime(1001);
expect(onTimeout).toHaveBeenCalledOnce();
});
});
- Step 2: Run to verify failure
npx vitest run tests/watchdog.test.ts
Expected: FAIL (core/watchdog.ts does not exist yet, so the import cannot resolve).
- Step 3: Implement core/watchdog.ts
// tests/soak/core/watchdog.ts
export class Watchdog {
private timer: NodeJS.Timeout | null = null;
private fired = false;
constructor(
private timeoutMs: number,
private onTimeout: () => void,
) {}
start(): void {
this.stop();
this.fired = false;
this.timer = setTimeout(() => {
if (this.fired) return;
this.fired = true;
this.onTimeout();
}, this.timeoutMs);
}
heartbeat(): void {
if (this.fired) return;
this.start();
}
stop(): void {
if (this.timer) {
clearTimeout(this.timer);
this.timer = null;
}
}
}
- Step 4: Verify tests pass
npx vitest run tests/watchdog.test.ts
Expected: all passing.
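The timing the tests exercise can be checked by hand with a small pure model. This is a sketch for reasoning only: `watchdogFiresAt` is a hypothetical name, not part of the harness, and it assumes `start()` is called at t = 0 with all times in milliseconds.

```typescript
// Pure model of the Watchdog timing (illustrative, not part of the harness).
// Assumption: start() is called at t = 0; all values are milliseconds.
function watchdogFiresAt(timeoutMs: number, heartbeats: number[]): number {
  let deadline = timeoutMs; // armed by start() at t = 0
  const ordered = [...heartbeats].sort((a, b) => a - b);
  for (const t of ordered) {
    if (t >= deadline) break; // already fired: later heartbeats are ignored
    deadline = t + timeoutMs; // a heartbeat before the deadline re-arms the timer
  }
  return deadline;
}

// Mirrors the 'heartbeat resets the timer' test: heartbeat at 800 ms, fires at 1800 ms.
console.log(watchdogFiresAt(1000, [800]));
```

A heartbeat that arrives after the deadline leaves the result unchanged, which matches the `fired` guard in `heartbeat()`.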
- Step 5: Wire watchdogs into the runner
In runner.ts, add before building ctx:
const watchdogs = new Map<string, Watchdog>();
const roomAborters = new Map<string, AbortController>();
for (let i = 0; i < rooms; i++) {
const roomId = `room-${i}`;
const aborter = new AbortController();
roomAborters.set(roomId, aborter);
const w = new Watchdog(60_000, () => {
logger.error('watchdog_fired', { room: roomId });
aborter.abort();
dashboard.update(roomId, { phase: 'error' });
});
w.start();
watchdogs.set(roomId, w);
}
Import at the top:
import { Watchdog } from './core/watchdog';
Set ctx.heartbeat to:
heartbeat: (roomId: string) => {
const w = watchdogs.get(roomId);
if (w) w.heartbeat();
},
In the finally block, stop all watchdogs:
for (const w of watchdogs.values()) w.stop();
Note: for now the roomAborters aren't plumbed into scenario cancellation; scenarios see only the global ctx.signal. This is intentional: per-room abort requires scenario-side awareness and is deferred until a scenario genuinely misbehaves. When a watchdog fires it still logs watchdog_fired, aborts that room's (currently unobserved) controller, and marks the room's dashboard tile as errored. It does not abort the run globally.
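If per-room cancellation is picked up later, one possible shape is to hand each scenario a signal that aborts when either the global signal or its room's signal aborts. The sketch below is an assumption about how that could look, not the plan's design; `linkSignals` is a hypothetical helper name.

```typescript
// Hypothetical helper: returns a signal that aborts when either input aborts.
// Not part of the runner as written; per-room abort is deferred in this plan.
function linkSignals(globalSignal: AbortSignal, roomSignal: AbortSignal): AbortSignal {
  const linked = new AbortController();
  if (globalSignal.aborted || roomSignal.aborted) {
    linked.abort(); // one side was already aborted before linking
    return linked.signal;
  }
  const forward = (): void => linked.abort();
  globalSignal.addEventListener('abort', forward, { once: true });
  roomSignal.addEventListener('abort', forward, { once: true });
  return linked.signal;
}

// Usage: one global controller, one controller per room.
const globalAbort = new AbortController();
const room0 = new AbortController();
const room1 = new AbortController();
const signal0 = linkSignals(globalAbort.signal, room0.signal);
const signal1 = linkSignals(globalAbort.signal, room1.signal);

room0.abort(); // e.g. room-0's watchdog fired
console.log(signal0.aborted, signal1.aborted);
```

Aborting one room leaves the other rooms' signals untouched, which is the isolation property the watchdog would need.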
- Step 6: Commit
git add tests/soak/core/watchdog.ts tests/soak/tests/watchdog.test.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): per-room watchdog with heartbeat
Watchdog class with Vitest tests, wired into ctx.heartbeat in the
runner. One watchdog per room, 60s timeout; firing logs an error
and marks the room's dashboard tile as errored.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 27: Artifact capture on failure
When the runner catches an error, snapshot every session's page: screenshot, HTML, console log tail, game state JSON.
Files:
- Create: `tests/soak/core/artifacts.ts`
- Modify: `tests/soak/runner.ts` (call `artifacts.captureAll` in the catch block)
- Step 1: Implement `core/artifacts.ts`
// tests/soak/core/artifacts.ts
import * as fs from 'fs';
import * as path from 'path';
import type { Session, Logger } from './types';
export interface ArtifactsOptions {
runId: string;
/** Absolute path to the artifacts root, e.g., /path/to/tests/soak/artifacts */
rootDir: string;
logger: Logger;
}
export class Artifacts {
readonly runDir: string;
constructor(private opts: ArtifactsOptions) {
this.runDir = path.join(opts.rootDir, opts.runId);
fs.mkdirSync(this.runDir, { recursive: true });
}
/** Capture everything for a single session. */
async captureSession(session: Session, roomId: string): Promise<void> {
const dir = path.join(this.runDir, roomId);
fs.mkdirSync(dir, { recursive: true });
const prefix = session.key;
try {
const png = await session.page.screenshot({ fullPage: true });
fs.writeFileSync(path.join(dir, `${prefix}.png`), png);
} catch (err) {
this.opts.logger.warn('artifact_screenshot_failed', {
session: session.key,
error: err instanceof Error ? err.message : String(err),
});
}
try {
const html = await session.page.content();
fs.writeFileSync(path.join(dir, `${prefix}.html`), html);
} catch (err) {
this.opts.logger.warn('artifact_html_failed', {
session: session.key,
error: err instanceof Error ? err.message : String(err),
});
}
try {
const state = await session.bot.getGameState();
fs.writeFileSync(
path.join(dir, `${prefix}.state.json`),
JSON.stringify(state, null, 2),
);
} catch (err) {
this.opts.logger.warn('artifact_state_failed', {
session: session.key,
error: err instanceof Error ? err.message : String(err),
});
}
try {
const errors = session.bot.getConsoleErrors?.() ?? [];
fs.writeFileSync(path.join(dir, `${prefix}.console.txt`), errors.join('\n'));
} catch {
// ignore — not all bots expose this
}
}
async captureAll(sessions: Session[]): Promise<void> {
// Best-effort: the room for each session is not known here, so everything
// is written under room-unknown/. Callers that know the room should
// pre-partition and call captureSession directly.
await Promise.all(
sessions.map((s) => this.captureSession(s, 'room-unknown')),
);
}
writeSummary(summary: object): void {
fs.writeFileSync(
path.join(this.runDir, 'summary.json'),
JSON.stringify(summary, null, 2),
);
}
}
/** Prune run directories older than `maxAgeMs`. */
export function pruneOldRuns(rootDir: string, maxAgeMs: number, logger: Logger): void {
if (!fs.existsSync(rootDir)) return;
const now = Date.now();
for (const entry of fs.readdirSync(rootDir)) {
const full = path.join(rootDir, entry);
try {
const stat = fs.statSync(full);
if (stat.isDirectory() && now - stat.mtimeMs > maxAgeMs) {
fs.rmSync(full, { recursive: true, force: true });
logger.info('artifact_pruned', { runId: entry });
}
} catch {
// ignore
}
}
}
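Since `captureAll` files everything under `room-unknown/`, a caller that wants per-room directories needs to partition sessions first. A minimal sketch, assuming sessions are ordered room by room and split evenly; `partitionByRoom` is a hypothetical helper, and the `room-${i}` ids follow the runner's naming.

```typescript
// Hypothetical helper, not part of artifacts.ts as written.
// Assumption: sessions are ordered room by room and split evenly across rooms.
function partitionByRoom<T>(sessions: T[], rooms: number): Map<string, T[]> {
  if (!Number.isInteger(rooms) || rooms <= 0 || sessions.length % rooms !== 0) {
    throw new Error(`cannot split ${sessions.length} sessions across ${rooms} rooms`);
  }
  const perRoom = sessions.length / rooms;
  const byRoom = new Map<string, T[]>();
  for (let i = 0; i < rooms; i++) {
    // Same room id format the runner uses for watchdogs: room-0, room-1, ...
    byRoom.set(`room-${i}`, sessions.slice(i * perRoom, (i + 1) * perRoom));
  }
  return byRoom;
}

// Usage with placeholder session keys (made up for illustration).
const grouped = partitionByRoom(['a', 'b', 'c', 'd'], 2);
console.log(grouped);
```

Each `[roomId, sessions]` pair could then be passed to `captureSession(session, roomId)` in place of the `room-unknown` fallback.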
- Step 2: Call artifact capture from the runner's error path
In runner.ts, import:
import { Artifacts, pruneOldRuns } from './core/artifacts';
After const runId = ..., instantiate and prune:
const artifactsRoot = path.resolve(__dirname, 'artifacts');
const artifacts = new Artifacts({ runId, rootDir: artifactsRoot, logger });
pruneOldRuns(artifactsRoot, 7 * 24 * 3600 * 1000, logger);
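The retention window passed to `pruneOldRuns` is plain millisecond arithmetic. The sketch below mirrors the age comparison without touching the filesystem; `isStale` is a hypothetical name, since `pruneOldRuns` inlines this check.

```typescript
// Illustrative only: the same comparison pruneOldRuns performs on stat.mtimeMs.
const SEVEN_DAYS_MS = 7 * 24 * 3600 * 1000; // 604,800,000 ms

// Hypothetical helper name; a directory is pruned only when strictly older than maxAgeMs.
function isStale(mtimeMs: number, nowMs: number, maxAgeMs: number): boolean {
  return nowMs - mtimeMs > maxAgeMs;
}

console.log(SEVEN_DAYS_MS, isStale(0, SEVEN_DAYS_MS + 1, SEVEN_DAYS_MS));
```

Note that the check uses the directory's modification time, so a run directory that is written to again resets its own clock.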
In the catch (err) block, after logging, capture:
} catch (err) {
logger.error('run_failed', {
error: err instanceof Error ? err.message : String(err),
stack: err instanceof Error ? err.stack : undefined,
});
try {
const liveSessions = pool['activeSessions'] as Session[] | undefined;
if (liveSessions && liveSessions.length > 0) {
await artifacts.captureAll(liveSessions);
}
} catch (captureErr) {
logger.warn('artifact_capture_failed', {
error: captureErr instanceof Error ? captureErr.message : String(captureErr),
});
}
exitCode = 1;
}
(Note: the pool['activeSessions'] access bypasses visibility to avoid adding a public getter for one call site. Acceptable for an error path in a test harness.)
After successful run, write the summary:
artifacts.writeSummary({
runId,
scenario: scenario.name,
targetUrl,
gamesCompleted: result.gamesCompleted,
errors: result.errors,
durationMs: result.durationMs,
customMetrics: result.customMetrics,
});
Import Session type:
import type { Session } from './core/types';
- Step 3: Verify by forcing a failure
Kill the server mid-run and confirm artifacts are written:
# In one terminal
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
--games-per-room=5 --holes=3 --watch=none
# In another: wait ~3 seconds then Ctrl-C the dev server
# The soak run should catch errors and write artifacts
ls tests/soak/artifacts/
ls tests/soak/artifacts/<run-id>/
Expected: a run directory exists with per-session screenshots, HTML, and state JSON under room-unknown/. summary.json appears only when a run completes without throwing.
- Step 4: Commit
git add tests/soak/core/artifacts.ts tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): artifact capture on failure + run summary
Screenshots, HTML, game state, and console errors are captured into
tests/soak/artifacts/<run-id>/ when a scenario throws. Runs older
than 7 days are pruned on startup. Successful runs get a
summary.json inside the run directory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 28: Graceful shutdown (already partially in place) + exit codes
SIGINT/SIGTERM already flip the abort controller. Formalize the timeout-and-force-exit path and the three exit codes (0 success / 1 errors / 2 interrupted); forced exits use 130.
Files:
- Modify: `tests/soak/runner.ts`
- Step 1: Add a graceful shutdown timeout
In runner.ts, replace the existing signal handlers with:
let forceExitTimer: NodeJS.Timeout | null = null;
const onSignal = (sig: string) => {
if (abortController.signal.aborted) {
// Second signal: force exit
logger.warn('force_exit', { signal: sig });
process.exit(130);
}
logger.warn('signal_received', { signal: sig });
abortController.abort();
// Hard-kill after 10s if cleanup hangs
forceExitTimer = setTimeout(() => {
logger.error('graceful_shutdown_timeout');
process.exit(130);
}, 10_000);
};
process.on('SIGINT', () => onSignal('SIGINT'));
process.on('SIGTERM', () => onSignal('SIGTERM'));
In the finally block, clear the force-exit timer:
if (forceExitTimer) clearTimeout(forceExitTimer);
- Step 2: Manual test — Ctrl-C a long run
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run soak -- \
--scenario=populate --accounts=2 --rooms=1 --cpus-per-room=0 \
--games-per-room=10 --holes=3 --watch=none
# After ~5 seconds: Ctrl-C
Expected: runner logs signal_received, finishes current turn, prints summary, exits with code 2 (check echo $?).
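The handler above only covers the forced-exit path (130); the mapping to 0 / 1 / 2 happens wherever the runner sets its final exit code. A minimal sketch of that mapping, assuming an interrupt takes precedence over recorded errors (the plan does not state the precedence); `resolveExitCode` and `RunOutcome` are hypothetical names.

```typescript
// Hypothetical shape: what the runner knows once the scenario has returned.
interface RunOutcome {
  interrupted: boolean; // true if SIGINT/SIGTERM flipped the abort signal
  errorCount: number;   // number of errors the scenario reported
}

// Assumption: an interrupted run exits 2 even if errors were also recorded.
function resolveExitCode(outcome: RunOutcome): 0 | 1 | 2 {
  if (outcome.interrupted) return 2;
  return outcome.errorCount > 0 ? 1 : 0;
}

console.log(resolveExitCode({ interrupted: true, errorCount: 0 }));
```

If the opposite precedence is preferred (errors win over interrupts), swapping the two checks is the only change needed.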
- Step 3: Commit
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): graceful shutdown with 10s hard-kill fallback
SIGINT/SIGTERM flips the abort signal; scenarios finish the current
turn then exit. If cleanup hangs >10s the runner force-exits. Second
Ctrl-C is an immediate hard kill. Exit codes: 0 success, 1 errors,
2 interrupted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 29: Periodic health probes
Every 30s, fetch /api/health on the target server. Three consecutive failures declare a fatal error and abort.
Files:
- Modify: `tests/soak/runner.ts`
- Step 1: Add a health probe interval
In runner.ts, after building the abort controller and before running the scenario:
let healthFailures = 0;
const healthTimer = setInterval(async () => {
try {
const res = await fetch(`${targetUrl}/api/health`);
if (!res.ok) throw new Error(`status ${res.status}`);
healthFailures = 0;
} catch (err) {
healthFailures++;
logger.warn('health_probe_failed', {
consecutive: healthFailures,
error: err instanceof Error ? err.message : String(err),
});
if (healthFailures >= 3) {
logger.error('health_fatal', { consecutive: healthFailures });
abortController.abort();
}
}
}, 30_000);
In the finally block:
clearInterval(healthTimer);
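The consecutive-failure rule is easy to get subtly wrong, so it can help to isolate it from `fetch` and timers. The sketch below is one way to do that, not the plan's implementation; `HealthTracker` is a hypothetical class name.

```typescript
// Hypothetical helper: tracks consecutive probe failures, independent of fetch/timers.
class HealthTracker {
  private consecutive = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  /** Records one probe result; returns true when the run should abort. */
  record(ok: boolean): boolean {
    // Any success resets the streak, matching `healthFailures = 0` in the runner.
    this.consecutive = ok ? 0 : this.consecutive + 1;
    return this.consecutive >= this.threshold;
  }
}

// Two failures, a success, then three failures: only the last call is fatal.
const tracker = new HealthTracker(3);
const results = [false, false, true, false, false, false].map((ok) => tracker.record(ok));
console.log(results);
```

With a 30s probe interval and a threshold of 3, a dead server is declared fatal roughly 90 seconds after the last successful probe.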
- Step 2: Commit
git add tests/soak/runner.ts
git commit -m "$(cat <<'EOF'
feat(soak): periodic health probes against target server
Every 30s GET /api/health. Three consecutive failures abort the
run with a fatal error, so staging outages don't get misattributed
to harness bugs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Phase 10 — Polish and bring-up
Task 30: Smoke test script
tests/soak/scripts/smoke.sh — the canary run that takes ~30s against local dev.
Files:
- Create: `tests/soak/scripts/smoke.sh`
- Step 1: Create the script
#!/usr/bin/env bash
# Soak harness smoke test — end-to-end canary against local dev.
# Expected runtime: ~30 seconds.
set -euo pipefail
cd "$(dirname "$0")/.."
: "${TEST_URL:=http://localhost:8000}"
: "${SOAK_INVITE_CODE:=SOAKTEST}"
echo "Smoke target: $TEST_URL"
echo "Invite code: $SOAK_INVITE_CODE"
# 1. Health probe
curl -fsS "$TEST_URL/api/health" > /dev/null || {
echo "FAIL: target server unreachable at $TEST_URL"
exit 1
}
# 2. Ensure minimum accounts
if [ ! -f .env.stresstest ]; then
echo "Seeding accounts..."
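# ':=' above sets shell variables without exporting them, so pass them explicitly.
TEST_URL="$TEST_URL" SOAK_INVITE_CODE="$SOAK_INVITE_CODE" npm run seed -- --count=4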
fi
# 3. Run minimum viable scenario
TEST_URL="$TEST_URL" SOAK_INVITE_CODE="$SOAK_INVITE_CODE" \
npm run soak -- \
--scenario=populate \
--accounts=2 \
--rooms=1 \
--cpus-per-room=0 \
--games-per-room=1 \
--holes=1 \
--watch=none
echo "Smoke PASSED"
- Step 2: Make it executable and run it
chmod +x tests/soak/scripts/smoke.sh
cd tests/soak && bash scripts/smoke.sh
Expected: Smoke PASSED within ~30s.
- Step 3: Commit
git add tests/soak/scripts/smoke.sh
git commit -m "$(cat <<'EOF'
feat(soak): smoke test script — 30s end-to-end canary
Confirms the harness works against local dev with the absolute
minimum config. Run after any change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 31: README + CHECKLIST
Replace the README stub with a full quickstart and flag reference. Add the manual validation checklist.
Files:
- Modify: `tests/soak/README.md`
- Create: `tests/soak/CHECKLIST.md`
- Step 1: Rewrite `tests/soak/README.md`
# Golf Soak & UX Test Harness
Standalone Playwright-based runner that drives multi-user authenticated
game sessions for scoreboard population and stability testing.
**Spec:** `../../docs/superpowers/specs/2026-04-10-multiplayer-soak-test-design.md`
**Bring-up:** `../../docs/soak-harness-bringup.md`
## Quick start
```bash
cd tests/soak
npm install
# First run only: seed 16 accounts
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST npm run seed
# 30-second end-to-end smoke test
bash scripts/smoke.sh
# Populate scoreboard (4 rooms × 4 accounts × 10 long games)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
npm run soak:populate
# Stress test (4 rooms × 50 rapid games with chaos)
TEST_URL=http://localhost:8000 SOAK_INVITE_CODE=SOAKTEST \
npm run soak:stress
```
CLI flags
--scenario=populate|stress required
--accounts=<n> total sessions (default: scenario.needs.accounts)
--rooms=<n> default from scenario.needs
--cpus-per-room=<n> default from scenario.needs
--games-per-room=<n> default from scenario.defaultConfig
--holes=<n> default from scenario.defaultConfig
--watch=none|dashboard|tiled default: dashboard
--dashboard-port=<n> default: 7777
--target=<url> default: TEST_URL env
--run-id=<string> default: ISO timestamp
--list print scenarios and exit
--dry-run validate config, don't run
Constraint: `--accounts` must be evenly divisible by `--rooms`; the quotient is the number of sessions per room.
Environment variables
TEST_URL target base URL (e.g. https://staging.adlee.work)
SOAK_INVITE_CODE invite code flagged marks_as_test (staging: 5VC2MCCN)
SOAK_HOLES override --holes
SOAK_ROOMS override --rooms
SOAK_ACCOUNTS override --accounts
SOAK_CPUS_PER_ROOM override --cpus-per-room
SOAK_GAMES_PER_ROOM override --games-per-room
SOAK_WATCH override --watch
SOAK_DASHBOARD_PORT override --dashboard-port
Watch modes
- `none`: pure headless, JSON logs to stdout. Use for CI and overnight runs.
- `dashboard` (default): HTTP+WS server on localhost:7777 serving a live status grid. Click any player tile to watch their live session via CDP screencast.
- `tiled`: 4 native Chromium windows for the host of each room, positioned in a 2×2 grid. Joiners stay headless.
Scenarios
| Name | Description |
|---|---|
| `populate` | Long 9-hole games with varied CPU personalities and realistic pacing, for populating scoreboards |
| `stress` | Rapid 1-hole games with chaos injection (rapid clicks, offline toggles, tab blur) for hunting race conditions |
Add new scenarios by creating scenarios/<name>.ts and registering in scenarios/index.ts.
Architecture
See the design spec for full module breakdown. Key modules:
- `runner.ts`: CLI entry, wires everything together
- `core/session-pool.ts`: owns browser contexts, seeds/logs in 16 accounts
- `core/room-coordinator.ts`: host→joiners room-code handoff
- `core/watchdog.ts`: per-room timeout detection
- `core/screencaster.ts`: CDP `Page.startScreencast` for live video
- `dashboard/server.ts`: HTTP + WS server
- `scenarios/`: pluggable scenarios
Reuses ../../tests/e2e/bot/golf-bot.ts unchanged.
Running tests (unit)
npm test
Tests cover Deferred, RoomCoordinator, Watchdog, and config.
Integration-level modules are verified by the smoke test.
- [ ] **Step 2: Create `tests/soak/CHECKLIST.md`**
```markdown
# Soak Harness Manual Validation Checklist
Run after any significant change or before calling the implementation complete.
## Bring-up
- [ ] Local dev server is running (`python server/main.py`)
- [ ] `SOAKTEST` invite code exists locally with `marks_as_test=TRUE`
- [ ] `npm install` in `tests/soak/` succeeded
- [ ] `npm run seed -- --count=16` creates/updates 16 accounts
- [ ] `.env.stresstest` has 16 `SOAK_ACCOUNT_NN=...` lines
- [ ] All seeded users show `is_test_account=TRUE` in the DB
## Smoke
- [ ] `bash scripts/smoke.sh` exits 0 within 60s
## Scenarios
- [ ] `--scenario=populate --rooms=1 --games-per-room=1` completes cleanly
- [ ] `--scenario=populate --rooms=4 --games-per-room=1` runs 4 rooms in parallel with no cross-contamination
- [ ] `--scenario=stress --games-per-room=3` logs `chaos_injected` events
## Watch modes
- [ ] `--watch=none` produces JSONL on stdout, nothing else
- [ ] `--watch=dashboard` opens http://localhost:7777, grid renders, tiles update live, WS status shows `healthy`
- [ ] Clicking any player tile opens the video modal and streams live JPEG frames (~10 fps)
- [ ] Closing the modal stops the screencast (check logs for `screencast_stopped`)
- [ ] `--watch=tiled` opens 4 native Chromium windows for the 4 hosts
## Failure modes
- [ ] Ctrl-C during a run → graceful shutdown, summary printed, exit code 2
- [ ] Double Ctrl-C → hard exit (130)
- [ ] Killing the dev server mid-run → health probes fail 3× → fatal abort, artifacts captured, exit 1
- [ ] Artifacts directory contains a subdirectory per failed run with screenshots and state.json
- [ ] Artifacts older than 7 days are pruned on next startup
## Server-side filtering
- [ ] `GET /api/stats/leaderboard` (default) hides soak_* accounts
- [ ] `GET /api/stats/leaderboard?include_test=true` shows soak_* accounts
- [ ] Admin panel user list shows `[Test]` badge on soak_* accounts
- [ ] Admin panel "Include test accounts" checkbox toggles their visibility (unchecked hides them)
- [ ] Admin panel invite codes tab shows `[Test-seed]` next to SOAKTEST
## Post-deploy schema verification
Run after the server-side changes (Tasks 1–7) ship to each environment.
- [ ] Server restarted (`docker compose up -d` or CI/CD deploy)
- [ ] Server logs show `User store schema initialized` after restart
- [ ] `\d users_v2` on target DB shows `is_test_account` column with default `false`
- [ ] `\d invite_codes` shows `marks_as_test` column with default `false`
- [ ] `\d leaderboard_overall` shows `is_test_account` column
- [ ] `\di idx_users_v2_is_test_account` shows the partial index
- [ ] `SELECT count(*) FROM leaderboard_overall` returns nonzero (view re-populated after rebuild)
- [ ] Default leaderboard query still works: `curl .../api/stats/leaderboard` returns entries
- [ ] `?include_test=true` parameter is accepted (no 422/500)
## Staging bring-up (final step)
- [ ] `UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';` run on staging
- [ ] `SOAK_INVITE_CODE=5VC2MCCN TEST_URL=https://staging.adlee.work npm run seed -- --count=16` seeds staging accounts
- [ ] Staging run with `--scenario=populate --watch=none` completes
- [ ] Staging leaderboard with `include_test=true` shows the soak accounts
- [ ] Staging leaderboard default (no param) does NOT show the soak accounts
```
- Step 3: Commit
git add tests/soak/README.md tests/soak/CHECKLIST.md
git commit -m "$(cat <<'EOF'
docs(soak): full README + manual validation checklist
Quickstart, flag reference, env var reference, scenario table, and
the bring-up/validation checklist that gates calling the harness
implementation complete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Task 32: Staging bring-up (manual, no code)
This is a documentation-only task — the actual run happens on your workstation. Listed here so the implementation plan is complete end to end.
- Step 1: Flag `5VC2MCCN` as test-seed on staging
From your workstation (requires DB access to staging):
ssh root@129.212.150.189 \
'docker exec -i golfgame-postgres psql -U postgres -d golfgame' <<'EOF'
UPDATE invite_codes SET marks_as_test = TRUE WHERE code = '5VC2MCCN';
SELECT code, max_uses, use_count, marks_as_test FROM invite_codes WHERE code = '5VC2MCCN';
EOF
Expected: marks_as_test | t.
(The exact docker container name may differ — adjust based on docker ps on the staging host.)
- Step 2: Seed the 16 staging accounts
cd tests/soak
rm -f .env.stresstest
TEST_URL=https://staging.adlee.work \
SOAK_INVITE_CODE=5VC2MCCN \
npm run seed -- --count=16
Expected: .env.stresstest populated with 16 entries.
- Step 3: Run populate against staging
TEST_URL=https://staging.adlee.work \
SOAK_INVITE_CODE=5VC2MCCN \
npm run soak -- \
--scenario=populate \
--rooms=4 \
--games-per-room=3 \
--holes=3 \
--watch=dashboard
Expected: dashboard opens, 4 rooms play 3 games each, staging scoreboard accumulates data. Exit 0 at the end.
- Step 4: Verify scoreboard filtering on staging
# Should NOT contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins" | jq '.entries[] | select(.username | startswith("soak_"))'
# Should contain soak_* usernames
curl -s "https://staging.adlee.work/api/stats/leaderboard?metric=wins&include_test=true" | jq '.entries[] | select(.username | startswith("soak_"))'
Expected: first returns nothing, second returns entries.
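The same check can be expressed as a pure filter, which is handy if it is ever folded into the harness. This sketch assumes the response shape `{ entries: [{ username }] }` implied by the jq filters above; `soakUsernames` and the sample data are made up for illustration.

```typescript
// Assumption: response shape inferred from the jq filters (.entries[].username).
interface LeaderboardEntry {
  username: string;
}
interface LeaderboardResponse {
  entries: LeaderboardEntry[];
}

// Hypothetical helper: returns the usernames that belong to soak accounts.
function soakUsernames(body: LeaderboardResponse): string[] {
  return body.entries
    .map((entry) => entry.username)
    .filter((name) => name.startsWith('soak_'));
}

// Made-up sample bodies standing in for the two curl responses.
const defaultBody: LeaderboardResponse = { entries: [{ username: 'alice' }] };
const includeTestBody: LeaderboardResponse = {
  entries: [{ username: 'alice' }, { username: 'soak_00' }],
};
console.log(soakUsernames(defaultBody), soakUsernames(includeTestBody));
```

The default response should yield an empty list and the `include_test=true` response a non-empty one.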
- Step 5: Mark implementation complete
Check off all items in tests/soak/CHECKLIST.md that correspond to this plan. Commit the filled-in checklist if you want a record:
git add tests/soak/CHECKLIST.md
git commit -m "docs(soak): checklist passed on initial staging run"
Phase 11 — Version bump
Task 33: Bump to v3.3.4 and add footer to admin.html
Updates all HTML footers from v3.1.6 to v3.3.4, adds a footer to admin.html which currently has none, bumps pyproject.toml.
Files:
- Modify: `client/index.html` (both footer occurrences: L58, L291)
- Modify: `client/admin.html` (add footer)
- Modify: `pyproject.toml` (version field)
- Step 1: Update `client/index.html` footers
grep -n "v3\.1\.6" client/index.html
For each match, replace v3.1.6 with v3.3.4. There should be exactly two matches.
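If the replacement is scripted instead of done by hand, the "exactly two matches" expectation can be enforced in code. A minimal sketch; `replaceVersion` is a hypothetical helper, and the HTML string is a stand-in, not the real file contents.

```typescript
// Hypothetical helper: replaces every occurrence of `from`, but only if the
// occurrence count matches what the plan expects.
function replaceVersion(html: string, from: string, to: string, expected: number): string {
  const parts = html.split(from); // literal split, so dots in the version are not regex wildcards
  const found = parts.length - 1;
  if (found !== expected) {
    throw new Error(`expected ${expected} occurrences of ${from}, found ${found}`);
  }
  return parts.join(to);
}

// Stand-in markup with two footer occurrences.
const before = '<footer>v3.1.6</footer><footer>v3.1.6</footer>';
const after = replaceVersion(before, 'v3.1.6', 'v3.3.4', 2);
console.log(after);
```

Failing loudly on an unexpected count guards against a stale line reference or a third occurrence added since the plan was written.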
- Step 2: Add footer to `client/admin.html`
Find the closing </body> in client/admin.html and add a footer just before it:
<footer class="app-footer" style="text-align: center; padding: 16px; color: var(--muted, #666); font-size: 12px;">v3.3.4 © Aaron D. Lee</footer>
</body>
(The inline style is a fallback — admin.css may already have an .app-footer class; if so, drop the inline styles.)
grep -n "app-footer" client/admin.css 2>/dev/null
If the class exists, use just <footer class="app-footer">v3.3.4 © Aaron D. Lee</footer>.
- Step 3: Bump `pyproject.toml`
# Note: this is GNU sed syntax; BSD/macOS sed requires an argument after -i (e.g. -i '').
sed -i 's/^version = "3\.1\.6"$/version = "3.3.4"/' pyproject.toml
grep version pyproject.toml
Expected: version = "3.3.4".
- Step 4: Verify in the browser
Restart the dev server, open http://localhost:8000 and http://localhost:8000/admin.html. Confirm both show v3.3.4 in the footer.
- Step 5: Commit
git add client/index.html client/admin.html pyproject.toml
git commit -m "$(cat <<'EOF'
chore: bump version to v3.3.4
Updates client/index.html footer (×2) and pyproject.toml from
v3.1.6 → v3.3.4, and adds a matching footer to client/admin.html
which previously had none.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EOF
)"
Summary
33 tasks across 11 phases:
| Phase | Tasks | Milestone |
|---|---|---|
| 1 — Server changes | 1–8 | Stats filter works, test accounts are separable |
| 2 — Harness scaffolding | 9–12 | Core pure-logic modules with Vitest tests pass |
| 3 — SessionPool + seeding | 13–14 | .env.stresstest seeded via real HTTP |
| 4 — First run | 15–18 | --watch=none smoke test passes end-to-end |
| 5 — Dashboard | 19–21 | Live status grid in browser |
| 6 — Live video | 22–23 | Click-to-watch CDP screencast |
| 7 — Tiled mode | 24 | Native host windows |
| 8 — Stress scenario | 25 | Chaos injection runs clean |
| 9 — Failure handling | 26–29 | Watchdog + artifacts + graceful shutdown + health probes |
| 10 — Polish | 30–31 | Smoke script + README + CHECKLIST |
| 11 — Version bump | 33 | v3.3.4 everywhere |
(Task 32 is the manual staging bring-up — no code.)
Dependencies between tasks:
- Tasks 1–8 are independent of the harness (ship them first if you want immediate value for admins)
- Tasks 9–18 are strictly sequential (each builds on the previous)
- Tasks 19–21, 22–23, 24, 25 are independent of each other — can be done in any order after Task 18
- Tasks 26–29 can be done after Task 18 but are most valuable after Task 25
- Tasks 30–31 come last before staging
- Task 33 is independent and can be done any time after Task 8