docs: add architecture overview

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
adlee-was-taken
2026-04-05 09:36:37 -04:00
parent 0e4e2c1ca7
commit d38b0c4e25

@@ -0,0 +1,170 @@
# Vigilar Architecture Overview
This document explains how Vigilar is put together for someone reading the
codebase for the first time. It is short on purpose — per-subsystem details
live under `subsystems/`.
## Design principles
- **Offline-first.** No external calls in the critical path. Cloud integrations,
if any, are opt-in and off the hot path.
- **Subsystem isolation.** Each subsystem runs in its own process. A crash in
one subsystem cannot take down another — the supervisor in `vigilar/main.py`
restarts crashed children with exponential backoff.
- **Loose coupling via MQTT.** Subsystems do not call each other directly.
They publish and subscribe to a local Mosquitto broker on `127.0.0.1:1883`.
- **SQLite (WAL) is the single durable store.** Access goes through
SQLAlchemy Core expressions (`vigilar/storage/schema.py`), not ORM mapped
classes. WAL mode and `synchronous=NORMAL` are set on every connection in
`vigilar/storage/db.py`.
- **Adaptive cost.** Cameras idle at 2 FPS and jump to 30 FPS on motion, with
a 5-second ring buffer so the moment leading up to the trigger is kept.
- **Configuration is typed.** `config/vigilar.toml` is loaded and validated
by Pydantic v2. Secrets are never inline — they are file paths under
`/etc/vigilar/secrets/`.
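
The pre-roll behaviour in the "Adaptive cost" principle can be pictured with a bounded deque. This is a minimal sketch, not the project's implementation: `FrameRingBuffer`, `push` and `flush` are hypothetical names, and strings stand in for real frames.

```python
from collections import deque


class FrameRingBuffer:
    """Hypothetical ring buffer: keeps only the last `seconds` of frames."""

    def __init__(self, fps: int, seconds: float) -> None:
        # A deque with maxlen silently drops the oldest frame on append.
        self._frames = deque(maxlen=max(1, int(fps * seconds)))

    def push(self, frame) -> None:
        self._frames.append(frame)

    def flush(self) -> list:
        """Return buffered frames oldest-first and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


# Idle rate and buffer length taken from the text: 2 FPS x 5 s = 10 frames.
buf = FrameRingBuffer(fps=2, seconds=5)
for i in range(25):
    buf.push(f"frame-{i}")

# On a motion trigger the recorder would write these before the live frames.
pre_roll = buf.flush()
```

Because the capacity is derived from the frame rate, memory use stays constant no matter how long a camera idles.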
## Process topology
`vigilar start` loads config and calls `run_supervisor()` in `vigilar/main.py`.
The supervisor spawns every subsystem as a `multiprocessing.Process` (via the
`SubsystemProcess` wrapper) and checks them every 2 seconds, restarting any that have exited.
Cameras are managed separately by `CameraManager`, which owns one child
process per configured camera.
```
                    systemd (vigilar.service)
                                |
                                v
               vigilar start (supervisor, main.py)
                                |
   +----------+-----------+----------+-----------+-----------+
   |          |           |          |           |           |
   v          v           v          v           v           v
  web      event-      sensor-     ups-      presence-    health-
(Flask)   processor    bridge     monitor     monitor     monitor

   CameraManager --> camera worker (front_door)
                 --> camera worker (backyard)
                 --> camera worker (side_yard)
                 --> camera worker (garage)

        ^                                               ^
        |                     MQTT                      |
        v                                               v
           mosquitto (127.0.0.1:1883, loopback only)
```
Every arrow touching the broker is a local TCP connection to loopback. The
`web` process is a Flask server (`vigilar/web/app.py:create_app`) with one
Blueprint per feature area under `vigilar/web/blueprints/`.
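
The restart behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `vigilar/main.py`: `backoff_delay`, `supervise`, the 1-second base and the 60-second cap are all hypothetical, and plain `multiprocessing.Process` stands in for the `SubsystemProcess` wrapper.

```python
import multiprocessing as mp
import time


def backoff_delay(restarts: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**restarts, capped at `cap` seconds."""
    return min(cap, base * (2 ** restarts))


def supervise(targets: dict, poll_interval: float = 2.0) -> None:
    """Start one process per target and restart any that exit (runs forever)."""
    procs = {name: mp.Process(target=fn, name=name) for name, fn in targets.items()}
    restarts = {name: 0 for name in targets}
    restart_at = {name: None for name in targets}
    for proc in procs.values():
        proc.start()
    while True:
        time.sleep(poll_interval)
        now = time.monotonic()
        for name, fn in targets.items():
            if procs[name].is_alive():
                continue
            if restart_at[name] is None:
                # First poll that sees the crash: schedule the restart.
                restart_at[name] = now + backoff_delay(restarts[name])
            elif now >= restart_at[name]:
                procs[name] = mp.Process(target=fn, name=name)
                procs[name].start()
                restarts[name] += 1
                restart_at[name] = None


# Delay schedule for the first eight consecutive crashes of one subsystem.
delays = [backoff_delay(n) for n in range(8)]
```

The sketch never resets the restart counter; a real supervisor would likely do so once a child has stayed up for a while.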
## The MQTT bus
- Broker: Mosquitto, bound to loopback only (see `systemd/vigilar-mosquitto.conf`:
`listener 1883 127.0.0.1`, `allow_anonymous true`, `persistence false`).
- Topic convention: every topic starts with `vigilar/` and is defined in
`vigilar/constants.py` via the `Topics` class (either static strings or
builder functions taking an ID). Real examples:
- `vigilar/camera/{camera_id}/motion/start`
- `vigilar/camera/{camera_id}/motion/end`
- `vigilar/camera/{camera_id}/heartbeat`
- `vigilar/sensor/{sensor_id}/{event_type}`
- `vigilar/ups/status`, `vigilar/ups/power_loss`, `vigilar/ups/low_battery`
- `vigilar/system/arm_state`, `vigilar/system/alert`
- Wildcard subscriptions: `vigilar/#`, `vigilar/camera/#`, `vigilar/sensor/#`
- Payloads are JSON dicts. Publishers use `bus.publish_event(topic, **kwargs)`
from `vigilar/bus.py`; callers are responsible for any new fields they add.
- Why MQTT rather than an in-process queue: crash isolation, introspection with
`mosquitto_sub`, and the option to move subsystems to separate hosts later
without changing the wire format.
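
To make the topic convention concrete, here is a small sketch of static topics, builder functions, wildcard matching and a JSON payload. The topic strings come from the list above; the attribute and function names (`UPS_STATUS`, `camera_motion_start`, `sensor_event`, `topic_matches`, `encode_payload`) are assumptions for illustration and may not match `vigilar/constants.py` or `vigilar/bus.py`.

```python
import json


class Topics:
    """Hypothetical shape of a topics class: static strings plus builders."""

    UPS_STATUS = "vigilar/ups/status"
    SYSTEM_ARM_STATE = "vigilar/system/arm_state"

    @staticmethod
    def camera_motion_start(camera_id: str) -> str:
        return f"vigilar/camera/{camera_id}/motion/start"

    @staticmethod
    def sensor_event(sensor_id: str, event_type: str) -> str:
        return f"vigilar/sensor/{sensor_id}/{event_type}"


def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT filter matching: '+' is one level, '#' is the rest (last level only)."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, part in enumerate(p_levels):
        if part == "#":
            return i == len(p_levels) - 1
        if i >= len(t_levels):
            return False
        if part != "+" and part != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)


def encode_payload(**fields) -> bytes:
    """Serialise keyword arguments as the JSON dict sent on the wire."""
    return json.dumps(fields).encode("utf-8")


topic = Topics.camera_motion_start("front_door")
payload = encode_payload(confidence=0.8, zones=2)
```

A subscriber to `vigilar/camera/#` would receive this message, while one subscribed to `vigilar/sensor/#` would not.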
## Data flow: from motion to phone notification
1. `vigilar/camera/worker.py:run_camera_worker` — opens the RTSP stream via
`cv2.VideoCapture(..., cv2.CAP_FFMPEG)` with reconnect/backoff, pushes every
frame into a ring buffer, and drives the capture loop.
2. `vigilar/camera/motion.py:MotionDetector.detect` — MOG2 background
subtraction on a downscaled frame; when a new motion edge is found,
`worker.py` publishes `vigilar/camera/{camera_id}/motion/start` with
confidence and zone count.
3. `vigilar/camera/recorder.py:AdaptiveRecorder.start_motion_recording`
stops any idle recording, launches a fresh FFmpeg subprocess at
`motion_fps` (default 30), and writes the flushed ring-buffer frames
(default 5 s of pre-roll) before the live frames. On stop, if
`VIGILAR_ENCRYPTION_KEY` is set, the MP4 is encrypted and replaced by a
`.vge` file via `vigilar/storage/encryption.py:encrypt_file` (AES-256-CTR).
4. `vigilar/events/processor.py:EventProcessor._handle_event` — subscribes
to `vigilar/#`, classifies the topic into an `EventType`/`Severity`, and
writes a row to the `events` table via
`vigilar/storage/queries.py:insert_event`. Wildlife and pet sightings also
get rows in `wildlife_sightings` / `pet_sightings`.
5. `vigilar/events/rules.py:RuleEngine.evaluate` — matches the event against
configured `[[rules]]` from `vigilar.toml` (AND/OR on arm state, sensor
event, camera motion, time window), honours per-rule cooldowns, and returns
a list of actions.
6. `vigilar/alerts/sender.py:send_alert` — for `alert_all` / `push_and_record`
actions, builds a notification from the `_CONTENT_MAP` table, loads the
VAPID key from `[alerts.web_push].vapid_private_key_file`, and calls
`pywebpush.webpush` for every row in `push_subscriptions`. Successes and
failures are recorded in `alert_log`; endpoints returning `410 Gone` are
pruned.
7. Web UI — the browser holds an open SSE connection to a handler in
`vigilar/web/blueprints/events.py` (`mimetype="text/event-stream"`), which
tails new event rows and pushes them to the timeline live.
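
Step 5 is the least obvious part of the pipeline, so here is a rough sketch of rule matching with a per-rule cooldown. It is a simplified model, not the real `RuleEngine`: the `Rule` fields, the `evaluate` signature, and the `armed_away` and `motion_start` values are hypothetical, and time windows are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule: conditions are field/value pairs on the event."""

    name: str
    conditions: dict
    actions: list
    mode: str = "AND"  # "AND": every condition must match; "OR": any one
    cooldown_s: float = 0.0
    last_fired: float = field(default=float("-inf"))


def evaluate(rules: list, event: dict, now: float) -> list:
    """Return the actions of every matching rule that is not cooling down."""
    actions = []
    for rule in rules:
        hits = [event.get(key) == value for key, value in rule.conditions.items()]
        matched = all(hits) if rule.mode == "AND" else any(hits)
        if not matched or now - rule.last_fired < rule.cooldown_s:
            continue
        rule.last_fired = now
        actions.extend(rule.actions)
    return actions


rule = Rule(
    name="front door while away",
    conditions={"arm_state": "armed_away", "event": "motion_start"},
    actions=["push_and_record"],
    cooldown_s=60,
)
event = {"arm_state": "armed_away", "event": "motion_start", "camera_id": "front_door"}

first = evaluate([rule], event, now=100.0)   # fires
second = evaluate([rule], event, now=130.0)  # suppressed: inside the cooldown
third = evaluate([rule], event, now=161.0)   # fires again
```

Passing `now` in explicitly keeps the function deterministic, which makes cooldown behaviour easy to test.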
## Storage layout
- `vigilar.db` under `[system] data_dir` (default `/var/vigilar/data`), SQLite
in WAL mode. Tables defined in `vigilar/storage/schema.py`:
`cameras`, `sensors`, `sensor_states`, `events`, `recordings`,
`system_events`, `arm_state_log`, `alert_log`, `push_subscriptions`,
`pets`, `pet_sightings`, `wildlife_sightings`, `package_events`,
`pet_training_images`, `pet_rules`, `face_profiles`, `face_embeddings`,
`visits`, `timelapse_schedules`.
- Recordings: `.vge` files under `[system] recordings_dir` (default
`/var/vigilar/recordings`), AES-256-CTR with a random 16-byte IV prefixed
to each file. Key at `/etc/vigilar/secrets/storage.key`. **Losing the key
means losing the recordings** — there is no recovery path.
- HLS: rolling segments under `[system] hls_dir` (default `/var/vigilar/hls`),
written by the per-camera `HLSStreamer` in `vigilar/camera/hls.py`.
- Backups: DB + `/etc/vigilar` tarball via `scripts/backup.sh`.
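
The per-connection SQLite settings can be checked with a few lines of standard-library code. This sketch uses `sqlite3` directly as a stand-in; the project goes through SQLAlchemy in `vigilar/storage/db.py`, whose connection hook is not shown here, and `connect` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def connect(path: str) -> sqlite3.Connection:
    """Open a connection and apply the pragmas named in the text."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # persists in the database file
    conn.execute("PRAGMA synchronous=NORMAL")  # per-connection, so set every time
    return conn


# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "vigilar.db")
conn = connect(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
conn.close()
```

`journal_mode` is stored in the database file, but `synchronous` is not, which is why the setting has to be applied on every connection rather than once at creation.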
## Configuration and secrets
- `config/vigilar.toml` is the only configuration file the app reads
(systemd points `VIGILAR_CONFIG` at `/etc/vigilar/vigilar.toml` in
production).
- Validated by Pydantic v2 at startup (`vigilar/config.py`).
- Secrets never live in the TOML; they are file paths under
`/etc/vigilar/secrets/` (`storage.key`, `vapid_private.pem`).
- The arm PIN and admin password are hashed; comparisons are constant-time
(see `vigilar/alerts/pin.py`). The PIN hash is written into the TOML via
`vigilar config set-pin`, never typed by hand.
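
A constant-time check of a hashed PIN might look like the sketch below. The hashing scheme is an assumption: this document does not say which algorithm `vigilar/alerts/pin.py` uses, so PBKDF2-HMAC-SHA256 with a random salt is used purely for illustration, and `hash_pin` and `verify_pin` are hypothetical names.

```python
import hashlib
import hmac
import os


def hash_pin(pin: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a hash from the PIN (assumed scheme: PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, iterations)


def verify_pin(candidate: str, salt: bytes, expected: bytes) -> bool:
    """Compare hashes with hmac.compare_digest so timing does not leak a match prefix."""
    return hmac.compare_digest(hash_pin(candidate, salt), expected)


salt = os.urandom(16)
stored = hash_pin("4821", salt)  # what a set-pin command would persist

ok = verify_pin("4821", salt, stored)
rejected = verify_pin("0000", salt, stored)
```

A plain `==` on bytes can return as soon as the first differing byte is found; `hmac.compare_digest` avoids that shortcut.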
## The web tier
- Flask with Blueprints, one per feature area under
`vigilar/web/blueprints/`: `cameras`, `events`, `kiosk`, `pets`,
`recordings`, `sensors`, `system`, `visitors`, `wildlife`. All registered
in `vigilar/web/app.py:create_app`.
- Jinja2 templates under `vigilar/web/templates/`, Bootstrap 5 dark theme,
static assets under `vigilar/web/static/`.
- Live view: `hls.js` grid for bandwidth efficiency, MJPEG single view for
low latency.
- Live timeline updates via Server-Sent Events from
`vigilar/web/blueprints/events.py`.
- PWA with VAPID web push — no Firebase, no Google Cloud Messaging. Service
worker at `vigilar/web/static/sw.js`.
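
The Server-Sent Events wire format used for the live timeline is simple enough to sketch without a web framework. `format_sse`, `stream` and the `vigilar-event` event name are hypothetical; in a Flask handler, a generator like `stream` would typically be wrapped in a `Response` with `mimetype="text/event-stream"`.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one message: optional 'event:' line, a 'data:' line, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines by default, so one data line is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream(rows: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE message per event row (rows stand in for new DB rows)."""
    for row in rows:
        yield format_sse(row, event="vigilar-event")


messages = list(stream([{"id": 1, "type": "motion_start"}, {"id": 2, "type": "motion_end"}]))
```

The blank line that ends each message is what tells the browser's `EventSource` to dispatch it.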
## What is NOT in the critical path
- Remote access (`[remote]` section) — optional, bandwidth-shaped HLS over
a WireGuard tunnel.
- Email alerts (`[alerts.email]`) and webhook alerts (`[alerts.webhook]`)
— optional, off by default.
- Any cloud service — never.
## Where to go next
- Conventions: `conventions.md`
- Per-subsystem details: `subsystems/`