vigilar/docs/superpowers/specs/2026-04-05-project-documentation-design.md
adlee-was-taken 1fd80ad31c docs: clarify NAS backup steps in documentation spec
Specify that backup timer snippets are inline in the guides, not
shipped as new unit files, to match the no-code-changes scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:17:58 -04:00


Vigilar Project Documentation — Design Spec

Date: 2026-04-05
Status: Approved, ready for implementation plan
Scope: Create user-facing and architectural documentation for Vigilar, plus a polished top-level README.md.


1. Goal

Vigilar currently has no top-level README.md, no user guide, and no architectural reference. Contributors and would-be home users have to read source code to understand what the project is or how to run it. This effort closes that gap with a single coordinated documentation pass.

The docs must:

  • Give a homeowner with a mini PC a clear linear path from bare hardware to working cameras on their phone.
  • Give a self-hoster a reference for config, CLI, secrets, backups, upgrades, and troubleshooting.
  • Give a contributor enough architectural context to navigate the codebase without reading every file.
  • Match the project's ethos: plain, no bloat, no cloud, no extra tooling.

2. Non-goals

  • No doc site / MkDocs build (files are organized as if MkDocs-ready, but no tooling is added).
  • No rewrites of existing code or config.
  • No changes to docs/camera-hardware-guide.md (already exists, untouched).
  • No changes to anything under docs/superpowers/ other than this spec.
  • No CI doc-linting, no link-checker automation beyond a one-time manual verification pass.
  • No recording-backup-to-NAS feature work. Docs describe only what exists today; future backup improvements are noted as planned, not documented as present.

3. Audience & doc layout (Approach 3 — Hybrid)

Three organizing principles, one tree:

  • User-facing guides are monolithic linear narratives (humans read top to bottom once).
  • Architecture docs are split reference material (contributors jump to the subsystem they're touching).
  • README is the front door that ties everything together.

Final tree:

README.md                              # NEW — top-level front door
docs/
├── home-user-guide.md                 # NEW — monolithic, linear
├── operator-guide.md                  # NEW — monolithic, reference
├── architecture/
│   ├── overview.md                    # NEW — process model, bus, data flow
│   ├── conventions.md                 # NEW — coding conventions distilled
│   └── subsystems/
│       ├── camera.md                  # NEW
│       ├── detection.md               # NEW
│       ├── events.md                  # NEW
│       ├── alerts.md                  # NEW
│       ├── sensors.md                 # NEW
│       ├── ups.md                     # NEW
│       ├── storage.md                 # NEW
│       ├── highlights.md              # NEW
│       ├── presence.md                # NEW
│       ├── pets.md                    # NEW
│       ├── health.md                  # NEW
│       └── web.md                     # NEW
└── camera-hardware-guide.md           # EXISTING — untouched

4. README.md (top-level)

Sells the project, orients newcomers, links into the doc tree.

Required sections, in order:

  1. Title + tagline — "Vigilar — DIY offline-first home security"
  2. One-line pitch (~20 words)
  3. Hero image placeholder — ![screenshot](docs/images/grid.png) with a comment noting it's a placeholder. Do not create the image.
  4. Why Vigilar — bullet list of differentiators: offline-first, HLS grid + MJPEG single, OpenCV MOG2 motion with 5-sec pre-motion ring buffer, event timeline + highlight reels + timelapses, visitor/pet/wildlife tracking, PWA with VAPID push (no Firebase), AES-256 encrypted recordings, NUT UPS monitoring, runs on a cheap mini PC.
  5. Quick paths table — 4 rows mapping intent → doc: home user guide, operator guide, architecture overview, camera hardware guide.
  6. 60-second overview — small ASCII diagram (cameras → mini PC → phone/browser) + 5 tech-stack bullets (Python 3.11, Flask+Bootstrap 5, SQLite WAL, MQTT, FFmpeg).
  7. Status — "Alpha. Works on the author's hardware. Expect rough edges and breaking changes."
  8. Installation TL;DR — 3-line fenced block (git clone, sudo ./scripts/install.sh, sudo systemctl start vigilar), followed by links to the full home-user and operator guides.
  9. Documentation — nested bullet list mirroring the docs/ tree, each entry with a one-line description.
  10. License — GPL-3.0. If no LICENSE file exists, the README states the intent and notes a LICENSE file will follow; we do not create the license file as part of this effort unless asked.
  11. Contributing — short stub: "Issues and PRs welcome. See CLAUDE.md for code conventions."

5. docs/home-user-guide.md

Monolithic, linear, ~1500–2500 words. Target reader: homeowner with a mini PC, some Linux comfort, wants cameras on phone.

Required sections, in order:

  1. What you'll end up with — outcome in 2 sentences plus a small ASCII diagram.
  2. What you need — hardware checklist: mini PC (x86_64, 4GB+ RAM, 128GB+ SSD), USB stick for OS install, one or more RTSP cameras, phone, optional NAS. Link to camera-hardware-guide.md for camera picks.
  3. Step 1 — Install Debian/Ubuntu Server on the mini PC — brief, points at upstream installer docs, tells the user to enable SSH. No hand-holding on the OS install itself.
  4. Step 2 — Get Vigilar onto the box — git clone, sudo ./scripts/install.sh, plus 3 bullets summarizing what install.sh does (read scripts/install.sh at write-time to ground these bullets).
  5. Step 3 — First boot — sudo systemctl enable --now vigilar, then open http://<mini-pc-ip>:49735 in a browser on the same LAN. Mention the port is configurable under [web] in vigilar.toml.
  6. Step 4 — Set your PIN — UI walkthrough, 2–3 sentences, screenshot placeholder.
  7. Step 5 — Add your first camera — UI walkthrough: RTSP URL, credentials, test stream, save. Point at camera-hardware-guide.md for URL formats.
  8. Step 6 — Phone push notifications (PWA) — open web UI on phone, "Add to Home Screen", allow notifications. Under-the-hood note: VAPID keys already generated by install.sh.
  9. Step 7 — Optional: NAS backup of config + database — mount NAS share at /mnt/nas/vigilar-backups, set VIGILAR_BACKUP_DIR, and set up a nightly run of scripts/backup.sh. The guide provides a copy-pasteable systemd service + timer snippet inline (no new units are shipped in the repo as part of this effort). Explicitly state that this backs up DB + /etc/vigilar (config + secrets) only, and that recordings stay local — point at a "planned" note for recording backup.
  10. Troubleshooting — camera won't connect, no push notifications, service won't start (journalctl -u vigilar), motion detection too sensitive, how to reset PIN.
  11. Where to go next — links to Operator Guide and Architecture Overview.

Grounding rule: every shell command must correspond to a real file in scripts/ or a real vigilar CLI subcommand. Verify before writing.
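
As a throwaway aid for this rule (a one-off helper for the manual pass, not tooling to commit), the script half of the check could be sketched as below. The function name and regex are illustrative assumptions. It covers only scripts/*.sh references; vigilar CLI subcommands still need to be checked by hand against vigilar/cli/.

```python
import re
from pathlib import Path

# Matches repo-relative script references such as scripts/install.sh.
SCRIPT_REF = re.compile(r"\bscripts/[\w.-]+\.sh\b")


def missing_scripts(guide_text: str, repo_root: Path) -> list[str]:
    # Return script paths mentioned in the guide that do not exist in the repo.
    referenced = sorted(set(SCRIPT_REF.findall(guide_text)))
    return [ref for ref in referenced if not (repo_root / ref).is_file()]
```

An empty result means every script the guide mentions is present; anything returned should be fixed in the guide or dropped from it.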

6. docs/operator-guide.md

Monolithic, reference-oriented, ~2500–4000 words. Target reader: self-hoster tuning, upgrading, securing.

Required sections, in order:

  1. Audience & scope — for admins, not first-time home users. Points at home-user-guide.md for initial setup.
  2. Layout on disk — table of /opt/vigilar, /etc/vigilar/{vigilar.toml, certs/, secrets/}, /var/vigilar/{data/vigilar.db, recordings/, hls/, backups/}.
  3. Installation — what scripts/install.sh does, systemd/vigilar.service summary, systemd/vigilar-mosquitto.conf summary, system dependencies (ffmpeg, mosquitto, sqlite3, Python 3.11+).
  4. Configuration reference (vigilar.toml) — one subsection per [section] in the default TOML. Each key: default, what it controls, when to change. Sections to cover (from current config/vigilar.toml): system, mqtt, web, zigbee2mqtt, ups, storage, remote, alerts.local, alerts.web_push, alerts.email, plus any additional sections discovered by re-reading the TOML at write-time.
  5. CLI reference (vigilar ...) — enumerated at write-time by reading vigilar/cli/. One subsection per top-level command. Do not guess commands.
  6. Secrets & security — /etc/vigilar/secrets/ layout and permissions; vigilar config set-pin; vigilar config set-password; TLS via scripts/gen_cert.sh and [web] tls_cert/tls_key; VAPID via scripts/gen_vapid_keys.sh; storage encryption key (storage.key) — explicit warning: do not lose it, recordings are unrecoverable without it; recommended firewall stance (LAN-only by default).
  7. UPS / NUT integration — scripts/setup_nut.sh, [ups] options, shutdown behavior, low-battery thresholds.
  8. Backups — what scripts/backup.sh captures (DB + /etc/vigilar) and what it does not (recordings); VIGILAR_BACKUP_DIR and VIGILAR_BACKUP_RETENTION_DAYS; copy-pasteable systemd service + timer snippet (inline in the doc; no new unit files added to the repo); restore procedure.
  9. Upgrades — git pull + pip install -e . + systemctl restart vigilar; rollback by restoring a backup tarball. If DB migrations exist, note how they're applied; if they don't, say so.
  10. Logs & health — journalctl -u vigilar, log_level in [system], health endpoints (enumerated at write-time by reading vigilar/web/blueprints/system.py and vigilar/health/).
  11. Remote access — [remote] section, tunnel-based remote HLS, bandwidth-shaped downscaled streams, reiterated not-a-cloud.
  12. Troubleshooting — service crash loops, MQTT broker won't start, camera worker thrashing, disk full / free_space_floor_gb triggered, HLS stalling.

Grounding rule: every TOML key, every CLI command, every file path, every endpoint must be verified against the current code before writing. Any that can't be verified must be omitted, not guessed.

7. docs/architecture/overview.md

~1000–1500 words. Target reader: contributor new to the codebase.

Required sections, in order:

  1. Design principles — offline-first, subsystem isolation via multiprocessing, loose coupling via local MQTT bus, SQLite WAL as single durable store, SQLAlchemy Core (not ORM), adaptive FPS (2 idle / 30 on motion) with ring buffer.
  2. Process topology — ASCII or Mermaid diagram showing parent supervisor + N subsystem processes + mosquitto + Flask web.
  3. The MQTT bus — broker location, topic naming convention vigilar/<subsystem>/<entity>/<event>, retention/QoS notes (verify at write-time), rationale for MQTT over an in-process queue.
  4. Data flow: the motion → alert path — numbered sequence from RTSP capture through motion detection, recording, event creation, highlight scoring, push notification, and UI update. Each step names the actual file/function where it happens (verify at write-time).
  5. Storage layout — SQLite table summary (enumerate at write-time by reading vigilar/storage/), recordings (.vge, AES-256-GCM, key path), HLS segments, backups.
  6. Configuration & secrets — TOML → Pydantic v2 validation, secrets as file paths (never inline), PIN & password hashing with constant-time compare.
  7. The web tier — Flask + Blueprints, Jinja2 + Bootstrap 5 dark, HLS grid + MJPEG single view rationale, PWA + VAPID.
  8. What's NOT in the critical path — remote access (optional), email alerts (optional), cloud (never).

8. docs/architecture/conventions.md

~400 words. Distilled from CLAUDE.md but written for human contributors, not the AI. Covers: StrEnum for string constants (vigilar/constants.py), SQLAlchemy Core only (no mapped ORM classes), type hints on public functions, no docstrings unless logic is non-obvious, Ruff line-length 100, multiprocessing-per-subsystem rule, MQTT topic naming, Pydantic-validated TOML config, secrets-as-file-paths.
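
The MQTT topic naming rule, vigilar/<subsystem>/<entity>/<event>, is easier to absorb with a small illustration. The helpers below are a hypothetical sketch, not functions from the codebase, and the example segment values are invented.

```python
TOPIC_ROOT = "vigilar"


def build_topic(subsystem: str, entity: str, event: str) -> str:
    parts = (subsystem, entity, event)
    for part in parts:
        # "/" separates topic levels; "+" and "#" are MQTT wildcards and are
        # not allowed in a topic name used for publishing.
        if not part or any(ch in part for ch in "/+#"):
            raise ValueError(f"invalid topic segment: {part!r}")
    return "/".join((TOPIC_ROOT, *parts))


def parse_topic(topic: str) -> tuple[str, str, str]:
    root, *rest = topic.split("/")
    if root != TOPIC_ROOT or len(rest) != 3:
        raise ValueError(f"not a vigilar/<subsystem>/<entity>/<event> topic: {topic!r}")
    subsystem, entity, event = rest
    return subsystem, entity, event
```

For example, build_topic("camera", "front_door", "motion") yields vigilar/camera/front_door/motion, and parse_topic reverses it. Whether the real code centralizes topic construction like this should be verified before the conventions doc says so.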

9. docs/architecture/subsystems/*.md (12 files)

One file per subsystem subdirectory under vigilar/: camera, detection, events, alerts, sensors, ups, storage, highlights, presence, pets, health, web.

Uniform template (≈150–400 words each):

# <Subsystem name>

## Purpose
One paragraph — what this subsystem is responsible for.

## Key files
- `vigilar/<sub>/foo.py` — role
- ...

## MQTT topics
**Subscribes:** `vigilar/...`
**Publishes:** `vigilar/...`

## Database tables
`table_name` — what it holds. Or "none."

## Depends on
- sister subsystem X (via topic Y)

## Consumed by
- sister subsystem Z (via topic W)

## Notes
Gotchas or perf notes, only if any.

Grounding rule (hard): every topic name, every table name, every file role must come from reading the actual code. If a topic cannot be found, the doc must say "no MQTT publishers found at time of writing" — not invent one. This rule is the most important verification step in the plan.
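
A first-pass aid for this rule could look like the sketch below, run once by hand rather than added to the repo. It assumes subsystem docs follow the template above (topics in backticks under "## MQTT topics") and that topics appear as string literals in the source. Both function names are hypothetical.

```python
import re
from pathlib import Path

BACKTICKED = re.compile(r"`([^`]+)`")


def documented_topics(doc_text: str) -> list[str]:
    # Collect backticked strings from the "## MQTT topics" section only.
    topics, in_section = [], False
    for line in doc_text.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## MQTT topics"
            continue
        if in_section:
            topics.extend(BACKTICKED.findall(line))
    return topics


def ungrounded_topics(doc_text: str, source_root: Path) -> list[str]:
    # Return documented topics that never appear literally in any .py file.
    sources = [p.read_text(encoding="utf-8") for p in sorted(source_root.rglob("*.py"))]
    return [t for t in documented_topics(doc_text) if not any(t in s for s in sources)]
```

A literal substring match will miss topics assembled at runtime (for example with f-strings), so a reported topic is a prompt to read the code, not proof that the topic is invented.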

10. Verification checklist (before completion)

  1. Link check — every relative link in every new file resolves to a real path.
  2. Command check — every shell command in the user guides exists as a real script under scripts/ or a real vigilar CLI subcommand.
  3. Grounding check — every topic name, table name, file path, and endpoint is verified against code, or omitted. Nothing guessed.
  4. TOML coverage check — every [section] in config/vigilar.toml is covered in the operator guide's configuration reference.
  5. Subsystem coverage check — every subdirectory in vigilar/ (matching the 12-file list) has a corresponding subsystem doc.
  6. Read-through pass — tone and terminology consistent across all files.
  7. README link check — all doc tree links in README.md resolve.
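
Items 1 and 7 can be done by eye, but a throwaway helper such as the sketch below makes the one-time pass quicker. It is an illustration only (the function name and regex are assumptions), it handles inline Markdown links and images but not reference-style links, and it is not meant to become the link-checker automation that section 2 rules out.

```python
import re
from pathlib import Path

# Inline links and images: [text](target) or ![alt](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
URL_SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(markdown_file: Path) -> list[str]:
    # Return relative link targets that do not resolve from the file's directory.
    broken = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in LINK.findall(text):
        if URL_SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        path_part = target.split("#", 1)[0]
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

The README hero image (docs/images/grid.png) is expected to be reported, since the image is a placeholder that this effort deliberately does not create.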

11. Out of scope (explicit)

  • LICENSE file creation (the README declares GPL-3.0; creating the file is a separate request).
  • Screenshot/image creation (placeholders only).
  • MkDocs configuration.
  • Any code changes.
  • Any changes to docs/camera-hardware-guide.md.
  • Any doc-linting CI.
  • Recording-backup-to-NAS feature or docs beyond the "planned" note.
  • Migration documentation beyond noting whether migrations exist.

12. Success criteria

  • A new homeowner can go from a bare mini PC to working cameras on their phone using only README.md + docs/home-user-guide.md.
  • A self-hoster can answer any "how do I configure / back up / upgrade / troubleshoot" question from docs/operator-guide.md alone.
  • A new contributor can identify which subsystem owns a given behavior within 5 minutes using docs/architecture/overview.md + the subsystem files.
  • Every claim in every doc is either verified against current code or explicitly flagged.