3 Commits

Author SHA1 Message Date
adlee-was-taken
121959ba27 docs(CLAUDE.md): staging deploy verification checklist
All checks were successful
Build & Deploy Staging / build-and-deploy (release) Successful in 29s
Encodes the lessons from the v3.3.5 → v3.3.5.1 hotfix cascade: CI green
is necessary but not sufficient. Walk the chain: clean worktree → tag
sha matches → container up recently → new code introspectable → env vars
present → DB state correct → end-to-end smoke.

Each step calls out a specific failure mode we just hit, so future-me
doesn't assume the next deploy will 'just work' when the primitives
underneath (git fetch tag-cache, compose env wiring, image reuse) can
silently skip changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:06:37 -04:00
adlee-was-taken
612eccf03b fix(ci): force-update tags on deploy fetch
All checks were successful
Build & Deploy Staging / build-and-deploy (release) Successful in 28s
git fetch origin won't replace a tag that already exists locally pointing
at a different commit. When v3.3.5 was force-moved on origin after a
first failed CI run, the staging runner kept the stale tag cached and
re-checked-out the old commit — the compose-env-wiring fix was never
actually applied and the container booted without LEADERBOARD_INCLUDE_TEST_DEFAULT.

--tags --force makes the behaviour safe for moved tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:02:06 -04:00
adlee-was-taken
76a9de27c2 chore(staging): wire LEADERBOARD_INCLUDE_TEST_DEFAULT into compose
All checks were successful
Build & Deploy Staging / build-and-deploy (release) Successful in 31s
The v3.3.5 router reads config.LEADERBOARD_INCLUDE_TEST_DEFAULT but the
staging compose file was never passing the env through to the container.
This change was applied manually on the staging host before but never
made it back into the repo — fixing that so CI deploys pick it up.

Value on staging is sourced from .env (already set to true). Production
leaves it unset, so the default of false applies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:55:00 -04:00
4 changed files with 61 additions and 4 deletions

@@ -29,8 +29,10 @@ jobs:
 docker pull "$IMAGE:$TAG"
 docker tag "$IMAGE:$TAG" golfgame-app:latest
-# Update code for compose/env changes
-git fetch origin
+# Update code for compose/env changes. `--tags --force` so a
+# moved tag (hotfix on top of existing version) updates locally
+# instead of silently checking out the stale cached position.
+git fetch origin --tags --force
 git checkout "$TAG"
 # Restart app

@@ -21,8 +21,11 @@ jobs:
 cd /opt/golfgame
-# Pull latest code and checkout the release tag
-git fetch origin
+# Pull latest code and checkout the release tag. `--tags --force`
+# so that a tag moved on origin (e.g. hotfix on top of an existing
+# version) actually updates locally instead of silently reusing a
+# stale cached tag position.
+git fetch origin --tags --force
 git checkout "$TAG"
 # Build the image

@@ -166,6 +166,57 @@ python server/simulate.py 100 --compare
- **Middleware** (`server/middleware/`): Security headers, request ID tracking, rate limiting
- **Handlers** (`server/handlers.py`): WebSocket message dispatch (extracted from main.py)
## Staging Deploy Verification Checklist
After any release that triggers a staging deploy (via `.gitea/workflows/deploy-staging.yml`), do NOT trust "CI went green"; walk the full chain end-to-end. A green CI run does not prove the deploy did what you intended: a plain `git fetch` won't move a tag that already exists locally, the compose YAML and `.env` can drift out of sync, and a stale container image can be reused with no visible signal. The v3.3.5 → v3.3.5.1 saga cost us two releases because each of these bit in turn.
**Run through every step before declaring a staging deploy successful:**
1. **Worktree is clean on staging BEFORE cutting the release.**
```bash
ssh root@staging.golfcards.club 'cd /opt/golfgame && git status --short'
```
The output must be empty. Modified or untracked files that collide with the release's changes will abort `git checkout $TAG` mid-pipeline. If you ever scp files to staging for hot-patching, either land those changes on main and commit before the release, or run `git reset --hard HEAD && git clean -fd` on staging first.
2. **Staging is actually at the new tag, not a stale cached position.**
```bash
ssh root@staging.golfcards.club 'cd /opt/golfgame && git rev-parse HEAD && git log --oneline -1'
```
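Compare the sha to `git rev-parse v3.x.y` locally (use `v3.x.y^{commit}` if the tag is annotated, since a bare `rev-parse` then returns the tag object's sha rather than the commit's). If they differ, a moved tag was force-pushed and the runner used its stale cache. Workflows now run `git fetch --tags --force`, but verify anyway.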
3. **Container is running the new image (not a recycled old one).**
```bash
ssh root@staging.golfcards.club 'docker ps --format "{{.Names}} {{.Status}}"; docker inspect golfgame-app-1 --format "{{.Created}}"'
```
`Up X seconds/minutes` with a recent `.Created` time; not `Up 13 hours`. A compose restart picks up the *current* `:latest` image — confirm that image was built by this release, not a prior one.
4. **New code is actually in the container.** Introspect a signature/attribute that changed in this release:
```bash
ssh root@staging.golfcards.club 'docker exec golfgame-app-1 python -c "
import sys, inspect; sys.path.insert(0, \"/app/server\")
from services.game_logger import GameLogger
print(inspect.signature(GameLogger.log_game_start_async).parameters.keys())
"'
```
5. **Container env has every var the code reads.** Any config added to `server/config.py` this release needs TWO edits to flow through: `.env` on the host AND `- FOO=${FOO:-default}` in the compose yaml's `environment:` block. Setting only `.env` silently does nothing.
```bash
ssh root@staging.golfcards.club 'docker exec golfgame-app-1 printenv | grep -iE "YOUR_NEW_VAR"'
```
6. **DB schema + invariants hold.** Sample the tables this release touches and confirm the new columns/values look right:
```bash
ssh root@staging.golfcards.club 'docker exec golfgame-postgres-1 psql -U golf -d golf -c "SELECT status, COUNT(*) FROM games_v2 GROUP BY status;"'
```
7. **End-to-end smoke.** For a feature visible through the API, curl it and verify the response shape and content match expectations:
```bash
curl -s 'https://staging.golfcards.club/api/stats/leaderboard?metric=wins' | python3 -m json.tool
```
For features that only fire on specific game events (GAME_OVER, abandonment, etc.), run a soak game or manual repro and re-check the DB — don't assume "code is deployed" = "code has executed."
**If any step fails, stop and diagnose before running the next release.** Cascading hotfixes amplify the problem — each force-moved tag is another chance for the runner's cache to lie to you.
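The sha comparison in step 2 can be scripted. The following is a minimal sketch, not part of the repository: `parse_ls_remote` and `tag_matches` are hypothetical helper names, the shas are placeholders, and it assumes you pass in the text printed by `git ls-remote --tags origin` together with the sha from `git rev-parse HEAD` on staging. For an annotated tag, `ls-remote` prints both the tag object and a peeled `^{}` entry; the peeled entry is the commit that a checkout lands on, so the sketch prefers it.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map tag name -> commit sha from `git ls-remote --tags` output.

    Each line is "<sha>\\t<ref>". Annotated tags appear twice: once as the
    tag object and once with a "^{}" suffix that points at the commit.
    """
    tags: dict[str, str] = {}
    peeled: set[str] = set()
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        ref = ref.strip()
        if not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        if name.endswith("^{}"):
            # Peeled entry: this sha is the commit, so it always wins.
            name = name[: -len("^{}")]
            tags[name] = sha
            peeled.add(name)
        elif name not in peeled:
            tags[name] = sha
    return tags


def tag_matches(remote_tags: dict[str, str], tag: str, deployed_sha: str) -> bool:
    """True only if the deployed HEAD is the commit origin's tag points at."""
    expected = remote_tags.get(tag)
    return expected is not None and expected == deployed_sha.strip()


if __name__ == "__main__":
    # Placeholder shas for illustration only.
    sample = (
        "1111111111111111111111111111111111111111\trefs/tags/v1.0.0\n"
        "2222222222222222222222222222222222222222\trefs/tags/v1.0.0^{}\n"
    )
    remote = parse_ls_remote(sample)
    print(tag_matches(remote, "v1.0.0", "2222222222222222222222222222222222222222"))
```

A `False` result means staging checked out something other than what origin's tag currently points at, which is the stale-cache case described above.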
## Common Development Tasks
### Adjusting Animation Speed

@@ -35,6 +35,7 @@ services:
 - BOOTSTRAP_ADMIN_USERNAME=${BOOTSTRAP_ADMIN_USERNAME:-}
 - BOOTSTRAP_ADMIN_PASSWORD=${BOOTSTRAP_ADMIN_PASSWORD:-}
 - MATCHMAKING_ENABLED=true
+- LEADERBOARD_INCLUDE_TEST_DEFAULT=${LEADERBOARD_INCLUDE_TEST_DEFAULT:-false}
 depends_on:
 postgres:
 condition: service_healthy
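
The `${VAR:-default}` form in the compose hunk above substitutes the default when the variable is unset or empty on the host. That is why staging (set to `true` in `.env`) and production (unset) end up with different values from the same line. A minimal Python sketch of that rule follows. It is illustrative only: `interpolate_default` and `parse_bool` are hypothetical helpers, and because this diff does not show how `server/config.py` parses the value, the boolean parsing is an assumption.

```python
def interpolate_default(host_env: dict[str, str], name: str, default: str) -> str:
    """Mimic compose's ${NAME:-default}: use default if NAME is unset OR empty."""
    value = host_env.get(name)
    return value if value else default


def parse_bool(raw: str) -> bool:
    """Assumed parsing of a boolean env var; the real config.py may differ."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    name = "LEADERBOARD_INCLUDE_TEST_DEFAULT"
    staging_env = {name: "true"}   # value sourced from .env on staging
    production_env = {}            # production leaves it unset

    for label, env in (("staging", staging_env), ("production", production_env)):
        resolved = interpolate_default(env, name, "false")
        print(label, resolved, parse_bool(resolved))
```

The sketch only models the substitution itself. Without the `environment:` entry the variable never reaches the container at all, regardless of what `.env` contains.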