docs: add design spec for project documentation effort

Captures scope and structure for top-level README, home user guide,
operator guide, and architecture docs (overview + conventions + 12
per-subsystem files). Approach 3 (hybrid): monolithic user guides,
split architecture reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
adlee-was-taken
2026-04-05 09:17:07 -04:00
parent 965dc3b13d
commit 4dc2db00e0

@@ -0,0 +1,202 @@
# Vigilar Project Documentation — Design Spec
**Date:** 2026-04-05
**Status:** Approved, ready for implementation plan
**Scope:** Create user-facing and architectural documentation for Vigilar, plus a polished top-level `README.md`.
---
## 1. Goal
Vigilar currently has no top-level `README.md`, no user guide, and no architectural reference. Contributors and would-be home users have to read source code to understand what the project is or how to run it. This effort closes that gap with a single coordinated documentation pass.
The docs must:
- Give a homeowner with a mini PC a clear linear path from bare hardware to working cameras on their phone.
- Give a self-hoster a reference for config, CLI, secrets, backups, upgrades, and troubleshooting.
- Give a contributor enough architectural context to navigate the codebase without reading every file.
- Match the project's ethos: plain, no bloat, no cloud, no extra tooling.
## 2. Non-goals
- No doc site / MkDocs build (files are organized *as if* MkDocs-ready, but no tooling is added).
- No rewrites of existing code or config.
- No changes to `docs/camera-hardware-guide.md` (already exists, untouched).
- No changes to anything under `docs/superpowers/` other than this spec.
- No CI doc-linting, no link-checker automation beyond a one-time manual verification pass.
- No recording-backup-to-NAS feature work. Docs describe only what exists today; future backup improvements are noted as planned, not documented as present.
## 3. Audience & doc layout (Approach 3 — Hybrid)
Three organizing principles, one tree:
- **User-facing guides** are monolithic linear narratives (humans read top to bottom once).
- **Architecture docs** are split reference material (contributors jump to the subsystem they're touching).
- **README** is the front door that ties everything together.
Final tree:
```
README.md # NEW — top-level front door
docs/
├── home-user-guide.md # NEW — monolithic, linear
├── operator-guide.md # NEW — monolithic, reference
├── architecture/
│ ├── overview.md # NEW — process model, bus, data flow
│ ├── conventions.md # NEW — coding conventions distilled
│ └── subsystems/
│ ├── camera.md # NEW
│ ├── detection.md # NEW
│ ├── events.md # NEW
│ ├── alerts.md # NEW
│ ├── sensors.md # NEW
│ ├── ups.md # NEW
│ ├── storage.md # NEW
│ ├── highlights.md # NEW
│ ├── presence.md # NEW
│ ├── pets.md # NEW
│ ├── health.md # NEW
│ └── web.md # NEW
└── camera-hardware-guide.md # EXISTING — untouched
```
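As a rough illustration, the tree above can be written down as data so that a later verification pass can report which files are still missing. The paths are copied from this spec; the names `SUBSYSTEMS`, `EXPECTED_DOCS`, and `missing_docs` are hypothetical helpers and do not exist in the repository.

```python
from pathlib import Path

# Subsystem names copied from the tree in this spec.
SUBSYSTEMS = [
    "camera", "detection", "events", "alerts", "sensors", "ups",
    "storage", "highlights", "presence", "pets", "health", "web",
]

# Every file the final tree is expected to contain (new and existing).
EXPECTED_DOCS = [
    "README.md",
    "docs/home-user-guide.md",
    "docs/operator-guide.md",
    "docs/architecture/overview.md",
    "docs/architecture/conventions.md",
    *[f"docs/architecture/subsystems/{name}.md" for name in SUBSYSTEMS],
    "docs/camera-hardware-guide.md",
]


def missing_docs(repo_root: Path) -> list[str]:
    """Return the expected doc paths that do not exist under repo_root."""
    return [rel for rel in EXPECTED_DOCS if not (repo_root / rel).is_file()]
```

Calling `missing_docs(Path("."))` from the repository root would list whatever is still unwritten; an empty list would mean the tree is complete.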
## 4. `README.md` (top-level)
Sells the project, orients newcomers, links into the doc tree.
**Required sections, in order:**
1. **Title + tagline** — "Vigilar — DIY offline-first home security"
2. **One-line pitch** (~20 words)
3. **Hero image placeholder** — `![screenshot](docs/images/grid.png)` with a comment noting it's a placeholder. Do not create the image.
4. **Why Vigilar** — bullet list of differentiators: offline-first, HLS grid + MJPEG single, OpenCV MOG2 motion with 5-sec pre-motion ring buffer, event timeline + highlight reels + timelapses, visitor/pet/wildlife tracking, PWA with VAPID push (no Firebase), AES-256 encrypted recordings, NUT UPS monitoring, runs on a cheap mini PC.
5. **Quick paths table** — 4 rows mapping intent → doc: home user guide, operator guide, architecture overview, camera hardware guide.
6. **60-second overview** — small ASCII diagram (cameras → mini PC → phone/browser) + 5 tech-stack bullets (Python 3.11, Flask+Bootstrap 5, SQLite WAL, MQTT, FFmpeg).
7. **Status** — "Alpha. Works on the author's hardware. Expect rough edges and breaking changes."
8. **Installation TL;DR** — 3-line fenced block (`git clone`, `sudo ./scripts/install.sh`, `sudo systemctl start vigilar`), followed by links to the full home-user and operator guides.
9. **Documentation** — nested bullet list mirroring the `docs/` tree, each entry with a one-line description.
10. **License** — GPL-3.0. If no `LICENSE` file exists, the README states the intent and notes a `LICENSE` file will follow; we do **not** create the license file as part of this effort unless asked.
11. **Contributing** — short stub: "Issues and PRs welcome. See `CLAUDE.md` for code conventions."
## 5. `docs/home-user-guide.md`
Monolithic, linear, ~1500–2500 words. Target reader: homeowner with a mini PC, some Linux comfort, wants cameras on phone.
**Required sections, in order:**
1. **What you'll end up with** — outcome in 2 sentences plus a small ASCII diagram.
2. **What you need** — hardware checklist: mini PC (x86_64, 4GB+ RAM, 128GB+ SSD), USB stick for OS install, one or more RTSP cameras, phone, optional NAS. Link to `camera-hardware-guide.md` for camera picks.
3. **Step 1 — Install Debian/Ubuntu Server on the mini PC** — brief, points at upstream installer docs, tells the user to enable SSH. No hand-holding on the OS install itself.
4. **Step 2 — Get Vigilar onto the box** — `git clone`, `sudo ./scripts/install.sh`, plus 3 bullets summarizing what `install.sh` does (read `scripts/install.sh` at write-time to ground these bullets).
5. **Step 3 — First boot** — `sudo systemctl enable --now vigilar`, then open `http://<mini-pc-ip>:49735` in a browser on the same LAN. Mention the port is configurable under `[web]` in `vigilar.toml`.
6. **Step 4 — Set your PIN** — UI walkthrough, 2–3 sentences, screenshot placeholder.
7. **Step 5 — Add your first camera** — UI walkthrough: RTSP URL, credentials, test stream, save. Point at `camera-hardware-guide.md` for URL formats.
8. **Step 6 — Phone push notifications (PWA)** — open web UI on phone, "Add to Home Screen", allow notifications. Under-the-hood note: VAPID keys already generated by `install.sh`.
9. **Step 7 — Optional: NAS backup of config + database** — mount NAS share at `/mnt/nas/vigilar-backups`, set `VIGILAR_BACKUP_DIR`, enable a systemd timer wrapping `scripts/backup.sh`. **Explicitly state** that this backs up DB + `/etc/vigilar` (config + secrets) only, and that **recordings stay local** — point at a "planned" note for recording backup.
10. **Troubleshooting** — camera won't connect, no push notifications, service won't start (`journalctl -u vigilar`), motion detection too sensitive, how to reset PIN.
11. **Where to go next** — links to Operator Guide and Architecture Overview.
**Grounding rule:** every shell command must correspond to a real file in `scripts/` or a real `vigilar` CLI subcommand. Verify before writing.
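A minimal sketch of how the script half of this rule could be checked mechanically, assuming the guides refer to scripts by their repo-relative `scripts/<name>` path. The helper names `script_refs` and `unverified_scripts` are hypothetical. The sketch does not cover `vigilar` CLI subcommands, which still need a manual check against `vigilar/cli/`.

```python
import re
from pathlib import Path

# Matches repo-relative script references such as scripts/install.sh,
# with or without a leading "./" or surrounding backticks.
SCRIPT_REF = re.compile(r"scripts/[\w.\-]+")


def script_refs(markdown: str) -> set[str]:
    """Collect every scripts/<name> path mentioned in the Markdown text."""
    # rstrip(".") drops a sentence-ending period picked up by the pattern.
    return {match.rstrip(".") for match in SCRIPT_REF.findall(markdown)}


def unverified_scripts(markdown: str, repo_root: Path) -> list[str]:
    """Return referenced script paths that do not exist under repo_root."""
    return sorted(
        ref for ref in script_refs(markdown) if not (repo_root / ref).is_file()
    )
```

Any path this returns would either be a typo in the guide or a script that does not exist, and under the grounding rule it would have to be fixed or removed before publishing.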
## 6. `docs/operator-guide.md`
Monolithic, reference-oriented, ~2500–4000 words. Target reader: self-hoster tuning, upgrading, securing.
**Required sections, in order:**
1. **Audience & scope** — for admins, not first-time home users. Points at home-user-guide.md for initial setup.
2. **Layout on disk** — table of `/opt/vigilar`, `/etc/vigilar/{vigilar.toml, certs/, secrets/}`, `/var/vigilar/{data/vigilar.db, recordings/, hls/, backups/}`.
3. **Installation** — what `scripts/install.sh` does, `systemd/vigilar.service` summary, `systemd/vigilar-mosquitto.conf` summary, system dependencies (ffmpeg, mosquitto, sqlite3, Python 3.11+).
4. **Configuration reference (`vigilar.toml`)** — one subsection per `[section]` in the default TOML. Each key: default, what it controls, when to change. Sections to cover (from current `config/vigilar.toml`): `system`, `mqtt`, `web`, `zigbee2mqtt`, `ups`, `storage`, `remote`, `alerts.local`, `alerts.web_push`, `alerts.email`, plus any additional sections discovered by re-reading the TOML at write-time.
5. **CLI reference (`vigilar ...`)** — enumerated at write-time by reading `vigilar/cli/`. One subsection per top-level command. Do not guess commands.
6. **Secrets & security** — `/etc/vigilar/secrets/` layout and permissions; `vigilar config set-pin`; `vigilar config set-password`; TLS via `scripts/gen_cert.sh` → `[web] tls_cert/tls_key`; VAPID via `scripts/gen_vapid_keys.sh`; storage encryption key (`storage.key`) — **explicit warning: do not lose it, recordings are unrecoverable without it**; recommended firewall stance (LAN-only by default).
7. **UPS / NUT integration** — `scripts/setup_nut.sh`, `[ups]` options, shutdown behavior, low-battery thresholds.
8. **Backups** — what `scripts/backup.sh` captures (DB + `/etc/vigilar`) and what it does **not** (recordings); `VIGILAR_BACKUP_DIR` and `VIGILAR_BACKUP_RETENTION_DAYS`; suggested systemd timer snippet; restore procedure.
9. **Upgrades** — `git pull` + `pip install -e .` + `systemctl restart vigilar`; rollback by restoring a backup tarball. If DB migrations exist, note how they're applied; if they don't, say so.
10. **Logs & health** — `journalctl -u vigilar`, `log_level` in `[system]`, health endpoints (enumerated at write-time by reading `vigilar/web/blueprints/system.py` and `vigilar/health/`).
11. **Remote access** — `[remote]` section, tunnel-based remote HLS, bandwidth-shaped downscaled streams, and a reminder that remote access is not a cloud service.
12. **Troubleshooting** — service crash loops, MQTT broker won't start, camera worker thrashing, disk full / `free_space_floor_gb` triggered, HLS stalling.
**Grounding rule:** every TOML key, every CLI command, every file path, every endpoint must be verified against the current code before writing. Any that can't be verified must be omitted, not guessed.
## 7. `docs/architecture/overview.md`
~1000–1500 words. Target reader: contributor new to the codebase.
**Required sections, in order:**
1. **Design principles** — offline-first, subsystem isolation via multiprocessing, loose coupling via local MQTT bus, SQLite WAL as single durable store, SQLAlchemy Core (not ORM), adaptive FPS (2 idle / 30 on motion) with ring buffer.
2. **Process topology** — ASCII or Mermaid diagram showing parent supervisor + N subsystem processes + mosquitto + Flask web.
3. **The MQTT bus** — broker location, topic naming convention `vigilar/<subsystem>/<entity>/<event>`, retention/QoS notes (verify at write-time), rationale for MQTT over an in-process queue.
4. **Data flow: the motion → alert path** — numbered sequence from RTSP capture through motion detection, recording, event creation, highlight scoring, push notification, and UI update. Each step names the actual file/function where it happens (verify at write-time).
5. **Storage layout** — SQLite table summary (enumerate at write-time by reading `vigilar/storage/`), recordings (`.vge`, AES-256-GCM, key path), HLS segments, backups.
6. **Configuration & secrets** — TOML → Pydantic v2 validation, secrets as file paths (never inline), PIN & password hashing with constant-time compare.
7. **The web tier** — Flask + Blueprints, Jinja2 + Bootstrap 5 dark, HLS grid + MJPEG single view rationale, PWA + VAPID.
8. **What's NOT in the critical path** — remote access (optional), email alerts (optional), cloud (never).
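The adaptive-FPS and ring-buffer principle from item 1 can be sketched roughly as follows. The rates (2 idle, 30 on motion) and the 5-second pre-motion window come from this spec. The class name, the function name, and the assumption that the buffer is filled at the idle rate are illustrative guesses, not a description of the actual camera code.

```python
from collections import deque

IDLE_FPS = 2            # from this spec: capture rate while idle
MOTION_FPS = 30         # from this spec: capture rate once motion is seen
PRE_MOTION_SECONDS = 5  # from the README bullet list in this spec


class PreMotionBuffer:
    """Keeps the most recent frames so a recording can start before the trigger."""

    def __init__(self, seconds: int = PRE_MOTION_SECONDS, fps: int = IDLE_FPS):
        # Assumption: frames arrive at the idle rate, so capacity = seconds * fps.
        self._frames = deque(maxlen=seconds * fps)

    def push(self, frame) -> None:
        # When the deque is full, the oldest frame is dropped automatically.
        self._frames.append(frame)

    def drain(self) -> list:
        """Return the buffered frames oldest-first and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


def target_fps(motion_active: bool) -> int:
    """Pick the capture rate for the current motion state."""
    return MOTION_FPS if motion_active else IDLE_FPS
```

In this sketch a recorder would call `drain()` when motion starts, write those frames first, and then continue at `target_fps(True)`. The overview doc should name the real file and function once they are verified.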
## 8. `docs/architecture/conventions.md`
~400 words. Distilled from `CLAUDE.md` but written for human contributors, not the AI. Covers: StrEnum for string constants (`vigilar/constants.py`), SQLAlchemy Core only (no mapped ORM classes), type hints on public functions, no docstrings unless logic is non-obvious, Ruff line-length 100, multiprocessing-per-subsystem rule, MQTT topic naming, Pydantic-validated TOML config, secrets-as-file-paths.
## 9. `docs/architecture/subsystems/*.md` (12 files)
One file per subsystem subdirectory under `vigilar/`: `camera`, `detection`, `events`, `alerts`, `sensors`, `ups`, `storage`, `highlights`, `presence`, `pets`, `health`, `web`.
**Uniform template** (≈150–400 words each):
```
# <Subsystem name>
## Purpose
One paragraph — what this subsystem is responsible for.
## Key files
- `vigilar/<sub>/foo.py` — role
- ...
## MQTT topics
**Subscribes:** `vigilar/...`
**Publishes:** `vigilar/...`
## Database tables
`table_name` — what it holds. Or "none."
## Depends on
- sister subsystem X (via topic Y)
## Consumed by
- sister subsystem Z (via topic W)
## Notes
Gotchas or perf notes, only if any.
```
**Grounding rule (hard):** every topic name, every table name, every file role must come from reading the actual code. If a topic cannot be found, the doc must say "no MQTT publishers found at time of writing" — not invent one. This rule is the most important verification step in the plan.
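A small sketch of how template conformance could be spot-checked across the 12 files. It only confirms that the headings are present; it cannot verify that topic or table names are real, so the grounding rule above still requires reading the code. Treating `Notes` as optional is an assumption based on the template's "only if any" wording, and the helper name is hypothetical.

```python
import re

# Headings from the uniform template; "Notes" is assumed optional.
REQUIRED_HEADINGS = [
    "Purpose",
    "Key files",
    "MQTT topics",
    "Database tables",
    "Depends on",
    "Consumed by",
]


def missing_headings(markdown: str) -> list[str]:
    """Return required level-2 headings that the subsystem doc lacks."""
    found = set(re.findall(r"^## +(.+?)\s*$", markdown, flags=re.MULTILINE))
    return [heading for heading in REQUIRED_HEADINGS if heading not in found]
```

Running this over each file in `docs/architecture/subsystems/` and expecting an empty list would catch a section dropped by accident.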
## 10. Verification checklist (before completion)
1. **Link check** — every relative link in every new file resolves to a real path.
2. **Command check** — every shell command in the user guides exists as a real script under `scripts/` or a real `vigilar` CLI subcommand.
3. **Grounding check** — every topic name, table name, file path, and endpoint is verified against code, or omitted. Nothing guessed.
4. **TOML coverage check** — every `[section]` in `config/vigilar.toml` is covered in the operator guide's configuration reference.
5. **Subsystem coverage check** — every subdirectory in `vigilar/` (matching the 12-file list) has a corresponding subsystem doc.
6. **Read-through pass** — tone and terminology consistent across all files.
7. **README link check** — all doc tree links in `README.md` resolve.
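Since the link checks in items 1 and 7 are a one-time manual pass, a throwaway helper along these lines could support them without adding tooling to the repository. It is a sketch under simple assumptions: only inline `[text](target)` links are considered, and the function name is hypothetical.

```python
import re
from pathlib import Path

# Inline Markdown links and images: [text](target) or ![alt](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URL scheme (https:, mailto:, ...) is external.
SCHEME = re.compile(r"[a-z][a-z0-9+.\-]*:", flags=re.IGNORECASE)


def broken_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file that do not resolve on disk."""
    broken = []
    for target in LINK.findall(md_file.read_text(encoding="utf-8")):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # skip external URLs and same-page anchors
        path = target.split("#", 1)[0]  # drop any #fragment before checking
        if not (md_file.parent / path).exists():
            broken.append(target)
    return broken
```

The README's hero image placeholder would be reported by this check because the image is deliberately not created, so that one entry would need to be excluded by hand.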
## 11. Out of scope (explicit)
- `LICENSE` file creation (the README declares GPL-3.0; creating the file is a separate request).
- Screenshot/image creation (placeholders only).
- MkDocs configuration.
- Any code changes.
- Any changes to `docs/camera-hardware-guide.md`.
- Any doc-linting CI.
- Recording-backup-to-NAS feature or docs beyond the "planned" note.
- Migration documentation beyond noting whether migrations exist.
## 12. Success criteria
- A new homeowner can go from a bare mini PC to working cameras on their phone using only `README.md` + `docs/home-user-guide.md`.
- A self-hoster can answer any "how do I configure / back up / upgrade / troubleshoot" question from `docs/operator-guide.md` alone.
- A new contributor can identify which subsystem owns a given behavior within 5 minutes using `docs/architecture/overview.md` + the subsystem files.
- Every claim in every doc is either verified against current code or explicitly flagged.