vigilar/docs/superpowers/specs/2026-04-05-project-documentation-design.md
adlee-was-taken 1fd80ad31c docs: clarify NAS backup steps in documentation spec
Specify that backup timer snippets are inline in the guides, not
shipped as new unit files, to match the no-code-changes scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:17:58 -04:00

# Vigilar Project Documentation — Design Spec
**Date:** 2026-04-05
**Status:** Approved, ready for implementation plan
**Scope:** Create user-facing and architectural documentation for Vigilar, plus a polished top-level `README.md`.
---
## 1. Goal
Vigilar currently has no top-level `README.md`, no user guide, and no architectural reference. Contributors and would-be home users have to read source code to understand what the project is or how to run it. This effort closes that gap with a single coordinated documentation pass.
The docs must:
- Give a homeowner with a mini PC a clear linear path from bare hardware to working cameras on their phone.
- Give a self-hoster a reference for config, CLI, secrets, backups, upgrades, and troubleshooting.
- Give a contributor enough architectural context to navigate the codebase without reading every file.
- Match the project's ethos: plain, no bloat, no cloud, no extra tooling.
## 2. Non-goals
- No doc site / MkDocs build (files are organized *as if* MkDocs-ready, but no tooling is added).
- No rewrites of existing code or config.
- No changes to `docs/camera-hardware-guide.md` (already exists, untouched).
- No changes to anything under `docs/superpowers/` other than this spec.
- No CI doc-linting, no link-checker automation beyond a one-time manual verification pass.
- No recording-backup-to-NAS feature work. Docs describe only what exists today; future backup improvements are noted as planned, not documented as present.
## 3. Audience & doc layout (Approach 3 — Hybrid)
Three organizing principles, one tree:
- **User-facing guides** are monolithic linear narratives (humans read top to bottom once).
- **Architecture docs** are split reference material (contributors jump to the subsystem they're touching).
- **README** is the front door that ties everything together.
Final tree:
```
README.md # NEW — top-level front door
docs/
├── home-user-guide.md # NEW — monolithic, linear
├── operator-guide.md # NEW — monolithic, reference
├── architecture/
│ ├── overview.md # NEW — process model, bus, data flow
│ ├── conventions.md # NEW — coding conventions distilled
│ └── subsystems/
│ ├── camera.md # NEW
│ ├── detection.md # NEW
│ ├── events.md # NEW
│ ├── alerts.md # NEW
│ ├── sensors.md # NEW
│ ├── ups.md # NEW
│ ├── storage.md # NEW
│ ├── highlights.md # NEW
│ ├── presence.md # NEW
│ ├── pets.md # NEW
│ ├── health.md # NEW
│ └── web.md # NEW
└── camera-hardware-guide.md # EXISTING — untouched
```
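The tree above can also be written down as data, which makes the later coverage checks mechanical. The following is a minimal sketch, not part of the planned deliverables (this effort adds no tooling); the function names `expected_doc_paths` and `missing_doc_paths` are hypothetical, and the paths are copied from the tree.

```python
from pathlib import Path

# The 12 subsystem docs named in the tree above.
SUBSYSTEMS = [
    "camera", "detection", "events", "alerts", "sensors", "ups",
    "storage", "highlights", "presence", "pets", "health", "web",
]


def expected_doc_paths() -> list[str]:
    """Return every file the final tree should contain, relative to the repo root."""
    paths = [
        "README.md",
        "docs/home-user-guide.md",
        "docs/operator-guide.md",
        "docs/architecture/overview.md",
        "docs/architecture/conventions.md",
    ]
    paths += [f"docs/architecture/subsystems/{name}.md" for name in SUBSYSTEMS]
    paths.append("docs/camera-hardware-guide.md")  # existing file, must stay present
    return paths


def missing_doc_paths(repo_root: Path) -> list[str]:
    """List expected files that do not exist under repo_root."""
    return [p for p in expected_doc_paths() if not (repo_root / p).is_file()]
```

Calling `missing_doc_paths(Path.cwd())` from the repository root would return an empty list once every file in the tree has been written.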
## 4. `README.md` (top-level)
Sells the project, orients newcomers, links into the doc tree.
**Required sections, in order:**
1. **Title + tagline** — "Vigilar — DIY offline-first home security"
2. **One-line pitch** (~20 words)
3. **Hero image placeholder** — `![screenshot](docs/images/grid.png)` with a comment noting it's a placeholder. Do not create the image.
4. **Why Vigilar** — bullet list of differentiators: offline-first, HLS grid + MJPEG single, OpenCV MOG2 motion with 5-sec pre-motion ring buffer, event timeline + highlight reels + timelapses, visitor/pet/wildlife tracking, PWA with VAPID push (no Firebase), AES-256 encrypted recordings, NUT UPS monitoring, runs on cheap mini PC.
5. **Quick paths table** — 4 rows mapping intent → doc: home user guide, operator guide, architecture overview, camera hardware guide.
6. **60-second overview** — small ASCII diagram (cameras → mini PC → phone/browser) + 5 tech-stack bullets (Python 3.11, Flask+Bootstrap 5, SQLite WAL, MQTT, FFmpeg).
7. **Status** — "Alpha. Works on the author's hardware. Expect rough edges and breaking changes."
8. **Installation TL;DR** — 3-line fenced block (`git clone`, `sudo ./scripts/install.sh`, `sudo systemctl start vigilar`), followed by links to the full home-user and operator guides.
9. **Documentation** — nested bullet list mirroring the `docs/` tree, each entry with a one-line description.
10. **License** — GPL-3.0. If no `LICENSE` file exists, the README states the intent and notes a `LICENSE` file will follow; we do **not** create the license file as part of this effort unless asked.
11. **Contributing** — short stub: "Issues and PRs welcome. See `CLAUDE.md` for code conventions."
## 5. `docs/home-user-guide.md`
Monolithic, linear, ~1500–2500 words. Target reader: homeowner with a mini PC, some Linux comfort, wants cameras on phone.
**Required sections, in order:**
1. **What you'll end up with** — outcome in 2 sentences plus a small ASCII diagram.
2. **What you need** — hardware checklist: mini PC (x86_64, 4GB+ RAM, 128GB+ SSD), USB stick for OS install, one or more RTSP cameras, phone, optional NAS. Link to `camera-hardware-guide.md` for camera picks.
3. **Step 1 — Install Debian/Ubuntu Server on the mini PC** — brief, points at upstream installer docs, tells the user to enable SSH. No hand-holding on the OS install itself.
4. **Step 2 — Get Vigilar onto the box** — `git clone`, `sudo ./scripts/install.sh`, plus 3 bullets summarizing what `install.sh` does (read `scripts/install.sh` at write-time to ground these bullets).
5. **Step 3 — First boot** — `sudo systemctl enable --now vigilar`, then open `http://<mini-pc-ip>:49735` in a browser on the same LAN. Mention the port is configurable under `[web]` in `vigilar.toml`.
6. **Step 4 — Set your PIN** — UI walkthrough, 2–3 sentences, screenshot placeholder.
7. **Step 5 — Add your first camera** — UI walkthrough: RTSP URL, credentials, test stream, save. Point at `camera-hardware-guide.md` for URL formats.
8. **Step 6 — Phone push notifications (PWA)** — open web UI on phone, "Add to Home Screen", allow notifications. Under-the-hood note: VAPID keys already generated by `install.sh`.
9. **Step 7 — Optional: NAS backup of config + database** — mount NAS share at `/mnt/nas/vigilar-backups`, set `VIGILAR_BACKUP_DIR`, and set up a nightly run of `scripts/backup.sh`. The guide provides a copy-pasteable systemd service + timer snippet inline (no new units are shipped in the repo as part of this effort). **Explicitly state** that this backs up DB + `/etc/vigilar` (config + secrets) only, and that **recordings stay local** — point at a "planned" note for recording backup.
10. **Troubleshooting** — camera won't connect, no push notifications, service won't start (`journalctl -u vigilar`), motion detection too sensitive, how to reset PIN.
11. **Where to go next** — links to Operator Guide and Architecture Overview.
**Grounding rule:** every shell command must correspond to a real file in `scripts/` or a real `vigilar` CLI subcommand. Verify before writing.
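One way to apply the grounding rule to script references is to pull every `scripts/*.sh` mention out of a guide and confirm the file exists. This is a minimal sketch under stated assumptions: the regex and the function names `script_references` and `unverified_scripts` are hypothetical, and it only covers scripts. `vigilar` CLI subcommands still need to be checked by reading `vigilar/cli/`.

```python
import re
from pathlib import Path

# Matches "scripts/<name>.sh", optionally written as "./scripts/<name>.sh".
# The lookbehind skips longer paths such as "/opt/vigilar/scripts/x.sh".
SCRIPT_REF = re.compile(r"(?<![\w/])(?:\./)?(scripts/[\w.-]+\.sh)\b")


def script_references(markdown: str) -> set[str]:
    """Return every distinct scripts/*.sh path mentioned in the text."""
    return set(SCRIPT_REF.findall(markdown))


def unverified_scripts(markdown: str, repo_root: Path) -> list[str]:
    """Return referenced scripts that do not exist under repo_root, sorted."""
    return sorted(
        ref for ref in script_references(markdown)
        if not (repo_root / ref).is_file()
    )
```

An empty result from `unverified_scripts(guide_text, repo_root)` means every script named in the guide is a real file; anything listed should be corrected or removed from the guide rather than left as a guess.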
## 6. `docs/operator-guide.md`
Monolithic, reference-oriented, ~2500–4000 words. Target reader: self-hoster tuning, upgrading, securing.
**Required sections, in order:**
1. **Audience & scope** — for admins, not first-time home users. Points at home-user-guide.md for initial setup.
2. **Layout on disk** — table of `/opt/vigilar`, `/etc/vigilar/{vigilar.toml, certs/, secrets/}`, `/var/vigilar/{data/vigilar.db, recordings/, hls/, backups/}`.
3. **Installation** — what `scripts/install.sh` does, `systemd/vigilar.service` summary, `systemd/vigilar-mosquitto.conf` summary, system dependencies (ffmpeg, mosquitto, sqlite3, Python 3.11+).
4. **Configuration reference (`vigilar.toml`)** — one subsection per `[section]` in the default TOML. Each key: default, what it controls, when to change. Sections to cover (from current `config/vigilar.toml`): `system`, `mqtt`, `web`, `zigbee2mqtt`, `ups`, `storage`, `remote`, `alerts.local`, `alerts.web_push`, `alerts.email`, plus any additional sections discovered by re-reading the TOML at write-time.
5. **CLI reference (`vigilar ...`)** — enumerated at write-time by reading `vigilar/cli/`. One subsection per top-level command. Do not guess commands.
6. **Secrets & security** — `/etc/vigilar/secrets/` layout and permissions; `vigilar config set-pin`; `vigilar config set-password`; TLS via `scripts/gen_cert.sh` and `[web] tls_cert/tls_key`; VAPID via `scripts/gen_vapid_keys.sh`; storage encryption key (`storage.key`) — **explicit warning: do not lose it, recordings are unrecoverable without it**; recommended firewall stance (LAN-only by default).
7. **UPS / NUT integration** — `scripts/setup_nut.sh`, `[ups]` options, shutdown behavior, low-battery thresholds.
8. **Backups** — what `scripts/backup.sh` captures (DB + `/etc/vigilar`) and what it does **not** (recordings); `VIGILAR_BACKUP_DIR` and `VIGILAR_BACKUP_RETENTION_DAYS`; copy-pasteable systemd service + timer snippet (inline in the doc; no new unit files added to the repo); restore procedure.
9. **Upgrades** — `git pull` + `pip install -e .` + `systemctl restart vigilar`; rollback by restoring a backup tarball. If DB migrations exist, note how they're applied; if they don't, say so.
10. **Logs & health** — `journalctl -u vigilar`, `log_level` in `[system]`, health endpoints (enumerated at write-time by reading `vigilar/web/blueprints/system.py` and `vigilar/health/`).
11. **Remote access** — `[remote]` section, tunnel-based remote HLS, bandwidth-shaped downscaled streams, reiterated not-a-cloud.
12. **Troubleshooting** — service crash loops, MQTT broker won't start, camera worker thrashing, disk full / `free_space_floor_gb` triggered, HLS stalling.
**Grounding rule:** every TOML key, every CLI command, every file path, every endpoint must be verified against the current code before writing. Any that can't be verified must be omitted, not guessed.
## 7. `docs/architecture/overview.md`
~1000–1500 words. Target reader: contributor new to the codebase.
**Required sections, in order:**
1. **Design principles** — offline-first, subsystem isolation via multiprocessing, loose coupling via local MQTT bus, SQLite WAL as single durable store, SQLAlchemy Core (not ORM), adaptive FPS (2 idle / 30 on motion) with ring buffer.
2. **Process topology** — ASCII or Mermaid diagram showing parent supervisor + N subsystem processes + mosquitto + Flask web.
3. **The MQTT bus** — broker location, topic naming convention `vigilar/<subsystem>/<entity>/<event>`, retention/QoS notes (verify at write-time), rationale for MQTT over an in-process queue.
4. **Data flow: the motion → alert path** — numbered sequence from RTSP capture through motion detection, recording, event creation, highlight scoring, push notification, and UI update. Each step names the actual file/function where it happens (verify at write-time).
5. **Storage layout** — SQLite table summary (enumerate at write-time by reading `vigilar/storage/`), recordings (`.vge`, AES-256-GCM, key path), HLS segments, backups.
6. **Configuration & secrets** — TOML → Pydantic v2 validation, secrets as file paths (never inline), PIN & password hashing with constant-time compare.
7. **The web tier** — Flask + Blueprints, Jinja2 + Bootstrap 5 dark, HLS grid + MJPEG single view rationale, PWA + VAPID.
8. **What's NOT in the critical path** — remote access (optional), email alerts (optional), cloud (never).
## 8. `docs/architecture/conventions.md`
~400 words. Distilled from `CLAUDE.md` but written for human contributors, not the AI. Covers: StrEnum for string constants (`vigilar/constants.py`), SQLAlchemy Core only (no mapped ORM classes), type hints on public functions, no docstrings unless logic is non-obvious, Ruff line-length 100, multiprocessing-per-subsystem rule, MQTT topic naming, Pydantic-validated TOML config, secrets-as-file-paths.
## 9. `docs/architecture/subsystems/*.md` (12 files)
One file per subsystem subdirectory under `vigilar/`: `camera`, `detection`, `events`, `alerts`, `sensors`, `ups`, `storage`, `highlights`, `presence`, `pets`, `health`, `web`.
**Uniform template** (≈150–400 words each):
```
# <Subsystem name>
## Purpose
One paragraph — what this subsystem is responsible for.
## Key files
- `vigilar/<sub>/foo.py` — role
- ...
## MQTT topics
**Subscribes:** `vigilar/...`
**Publishes:** `vigilar/...`
## Database tables
`table_name` — what it holds. Or "none."
## Depends on
- sister subsystem X (via topic Y)
## Consumed by
- sister subsystem Z (via topic W)
## Notes
Gotchas or perf notes, only if any.
```
**Grounding rule (hard):** every topic name, every table name, every file role must come from reading the actual code. If a topic cannot be found, the doc must say "no MQTT publishers found at time of writing" — not invent one. This rule is the most important verification step in the plan.
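Whether a subsystem doc follows the uniform template can be checked structurally, although the grounding rule itself still requires reading the code. A minimal sketch, assuming `## Notes` is optional (the template says "only if any") and that headings are written exactly as in the template; the function names are hypothetical.

```python
# Headings every subsystem doc must contain, in template order.
REQUIRED_HEADINGS = [
    "## Purpose",
    "## Key files",
    "## MQTT topics",
    "## Database tables",
    "## Depends on",
    "## Consumed by",
]


def missing_headings(markdown: str) -> list[str]:
    """Return required headings that never appear as a line of their own."""
    present = {line.strip() for line in markdown.splitlines()}
    return [h for h in REQUIRED_HEADINGS if h not in present]


def follows_template(markdown: str) -> bool:
    """True if each required heading appears exactly once and in template order."""
    found = [
        line.strip() for line in markdown.splitlines()
        if line.strip() in REQUIRED_HEADINGS
    ]
    return found == REQUIRED_HEADINGS
```

This only confirms the shape of the file. It cannot tell whether a listed topic or table is real, so it complements the manual verification rather than replacing it.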
## 10. Verification checklist (before completion)
1. **Link check** — every relative link in every new file resolves to a real path.
2. **Command check** — every shell command in the user guides exists as a real script under `scripts/` or a real `vigilar` CLI subcommand.
3. **Grounding check** — every topic name, table name, file path, and endpoint is verified against code, or omitted. Nothing guessed.
4. **TOML coverage check** — every `[section]` in `config/vigilar.toml` is covered in the operator guide's configuration reference.
5. **Subsystem coverage check** — every subdirectory in `vigilar/` (matching the 12-file list) has a corresponding subsystem doc.
6. **Read-through pass** — tone and terminology consistent across all files.
7. **README link check** — all doc tree links in `README.md` resolve.
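The two link checks are described as a one-time manual pass, but a throwaway script can do the tedious part. The sketch below is an assumption about how one might do it, not a tool this effort ships: it handles inline Markdown links and images only, ignores reference-style links, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Inline links and images: [text](target) or ![alt](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets with a URL scheme such as https: or mailto: are not relative paths.
SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def relative_targets(markdown: str) -> list[str]:
    """Return relative link targets with any #fragment removed."""
    targets = []
    for target in LINK.findall(markdown):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        targets.append(target.split("#", 1)[0])
    return targets


def broken_links(doc_path: Path) -> list[str]:
    """Return relative targets in doc_path that do not resolve on disk."""
    text = doc_path.read_text(encoding="utf-8")
    return [t for t in relative_targets(text) if not (doc_path.parent / t).exists()]
```

Note that the README's hero image (`docs/images/grid.png`) is a deliberate placeholder, so a check like this would report it; that one result is expected and should not be "fixed" by creating the image.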
## 11. Out of scope (explicit)
- `LICENSE` file creation (the README declares GPL-3.0; creating the file is a separate request).
- Screenshot/image creation (placeholders only).
- MkDocs configuration.
- Any code changes.
- Any changes to `docs/camera-hardware-guide.md`.
- Any doc-linting CI.
- Recording-backup-to-NAS feature or docs beyond the "planned" note.
- Migration documentation beyond noting whether migrations exist.
## 12. Success criteria
- A new homeowner can go from a bare mini PC to working cameras on their phone using only `README.md` + `docs/home-user-guide.md`.
- A self-hoster can answer any "how do I configure / back up / upgrade / troubleshoot" question from `docs/operator-guide.md` alone.
- A new contributor can identify which subsystem owns a given behavior within 5 minutes using `docs/architecture/overview.md` + the subsystem files.
- Every claim in every doc is either verified against current code or explicitly flagged.