# Vigilar Operator Guide
## Audience and scope
This guide is for administrators installing and operating Vigilar on a
server they already manage. It is a reference for the on-disk layout,
configuration keys, CLI, systemd integration, secrets, UPS integration,
backups, upgrades, health and pruning, remote access, and the current
set of known limitations.
If you are setting up a home system from a bare mini PC for the first
time, start with the [Home User Guide](home-user-guide.md) and return
here when you need reference-level detail.
## Layout on disk
`scripts/install.sh` lays the system out as follows. The paths are
fixed in the installer; the configuration keys that reference them are
shown in parentheses.
| Path | Owner | Mode | Purpose |
|---|---|---|---|
| `/opt/vigilar/` | `vigilar:vigilar` | 0755 | Install root and home of the service user |
| `/opt/vigilar/venv/` | `vigilar:vigilar` | 0755 | Python virtual environment with the `vigilar` entry point |
| `/etc/vigilar/` | `root:root` | 0755 | Configuration root |
| `/etc/vigilar/vigilar.toml` | `root:vigilar` | 0644 | Main config file (`VIGILAR_CONFIG`) |
| `/etc/vigilar/secrets/` | `root:root` | 0700 | Storage key, VAPID private key |
| `/etc/vigilar/secrets/storage.key` | `root:root` | 0600 | 32-byte AES-256 key for recording encryption |
| `/etc/vigilar/secrets/vapid_private.pem` | `root:root` | 0600 | VAPID signing key for Web Push |
| `/etc/vigilar/secrets/vapid_public.txt` | `root:vigilar` | 0644 | VAPID public key (base64url) |
| `/etc/vigilar/certs/` | `root:vigilar` | 0750 | TLS material |
| `/etc/vigilar/certs/cert.pem` | `root:vigilar` | 0644 | TLS certificate (`[web] tls_cert`) |
| `/etc/vigilar/certs/key.pem` | `root:vigilar` | 0640 | TLS private key (`[web] tls_key`) |
| `/var/vigilar/` | `vigilar:vigilar` | 0750 | Runtime data root |
| `/var/vigilar/data/` | `vigilar:vigilar` | 0750 | SQLite database and supporting files (`[system] data_dir`) |
| `/var/vigilar/data/vigilar.db` | `vigilar:vigilar` | 0640 | Main SQLite database (WAL mode) |
| `/var/vigilar/recordings/` | `vigilar:vigilar` | 0750 | `.vge` encrypted recordings and thumbnails (`[system] recordings_dir`) |
| `/var/vigilar/hls/` | `vigilar:vigilar` | 0750 | HLS segments and playlists (`[system] hls_dir`) |
| `/etc/systemd/system/vigilar.service` | `root:root` | 0644 | Systemd unit |
| `/etc/mosquitto/conf.d/vigilar.conf` | `root:root` | 0644 | Localhost-only MQTT config |

The `VIGILAR_CONFIG` environment variable is read by the CLI and the
web blueprints to locate `vigilar.toml`; the systemd unit sets it to
`/etc/vigilar/vigilar.toml`.
## Installation
`scripts/install.sh` is idempotent and supports Debian/Ubuntu (apt) and
Arch Linux (pacman). It performs eight phases:
1. **System dependencies.** On apt: `ffmpeg mosquitto python3
python3-venv python3-pip nut-client`. On pacman: `ffmpeg mosquitto
python python-virtualenv nut`.
2. **System user.** Creates the `vigilar` system user and group with
`/opt/vigilar` as the home directory and `/usr/sbin/nologin` as the
shell.
3. **Directories and permissions.** Creates
`/var/vigilar/{data,recordings,hls}` owned by `vigilar:vigilar` at 0750, plus
`/etc/vigilar/{secrets,certs}` with the modes shown above.
4. **Python venv.** Creates `/opt/vigilar/venv` as the `vigilar` user
and installs the project in place with `pip install "${PROJECT_DIR}"`.
5. **Storage encryption key.** Writes 32 random bytes from
`/dev/urandom` to `/etc/vigilar/secrets/storage.key` if it does not
already exist. This file is never rewritten by the installer.
6. **Sample config.** Copies the repository's `config/vigilar.toml` to
`/etc/vigilar/vigilar.toml` if a config does not already exist.
7. **Systemd unit.** Installs and enables `vigilar.service`.
8. **Mosquitto.** Installs `systemd/vigilar-mosquitto.conf` to
`/etc/mosquitto/conf.d/vigilar.conf` and restarts `mosquitto.service`.

The installer prints recommended follow-up steps: edit the TOML, then
run `gen_cert.sh`, `gen_vapid_keys.sh`, and `setup_nut.sh`, then start
the service.
### Systemd unit
`vigilar.service` runs `/opt/vigilar/venv/bin/vigilar start --config
/etc/vigilar/vigilar.toml` as `vigilar:vigilar` with
`VIGILAR_CONFIG=/etc/vigilar/vigilar.toml`. It requires
`mosquitto.service`, wants `nut-monitor.service`, and uses
`Restart=on-failure`, `RestartSec=10`, and `WatchdogSec=120`. The unit
applies `ProtectSystem=strict`, `ProtectHome`, `PrivateTmp`,
`PrivateDevices`, `NoNewPrivileges`, and a `@system-service` syscall
filter. `ReadWritePaths` is limited to
`/var/vigilar/{data,recordings,hls}`; `/etc/vigilar` is mounted read-only. Output goes to the journal
with `SyslogIdentifier=vigilar`.
### Mosquitto configuration
`vigilar-mosquitto.conf` binds a single listener on `127.0.0.1:1883`,
allows anonymous connections (localhost only), disables persistence
(all state lives in SQLite), and logs errors, warnings, notices, and
connection events to syslog. Vigilar never authenticates to the broker
and never exposes it beyond loopback.
## Configuration reference
The config file (`/etc/vigilar/vigilar.toml` on an installed system;
the repository sample is `config/vigilar.toml`) is parsed by
`tomllib`, then validated by the Pydantic models in
`vigilar/config.py`. The models are the source of truth: any unknown
key is rejected, and each section has a default so omitted sections
behave sensibly.
### `[system]`
- `name` (default `"Vigilar Home Security"`): display name used in
logs and the web UI.
- `timezone` (default `"UTC"`; sample ships as
`"America/New_York"`): used for daily digests, highlight scheduling,
and timestamped file paths.
- `data_dir` (default `/var/vigilar/data`): SQLite database and
derived state.
- `recordings_dir` (default `/var/vigilar/recordings`): encrypted
`.vge` files.
- `hls_dir` (default `/var/vigilar/hls`): HLS segment output.
- `log_level` (default `"INFO"`): one of DEBUG, INFO, WARNING, ERROR.
- `arm_pin_hash` (default `""`): **deprecated.** Still parsed but
ignored at runtime. Use `[security] pin_hash` instead; run
`vigilar config set-pin` to generate the canonical hash.
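
The deprecation rule for `arm_pin_hash` reduces to a small check: warn only when the legacy field is set and `[security] pin_hash` is empty, and stay silent on a fully migrated config. A hypothetical sketch; the function name and message wording are illustrative, not Vigilar's.

```python
import logging

log = logging.getLogger("vigilar.config")


def check_pin_migration(system: dict, security: dict) -> bool:
    """Return True (and log a warning) if only the legacy PIN field is set."""
    legacy = system.get("arm_pin_hash", "")
    canonical = security.get("pin_hash", "")
    if legacy and not canonical:
        log.warning(
            "[system] arm_pin_hash is deprecated and ignored; "
            "run 'vigilar config set-pin' and set [security] pin_hash"
        )
        return True
    # Migrated (or never configured): nothing to report.
    return False
```
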
### `[mqtt]`
- `host` (default `127.0.0.1`) and `port` (default `1883`): broker
address. Leave on loopback unless you deliberately run a shared
broker.
- `username`, `password` (default `""`): unused by the shipped
mosquitto config, present for operators who run their own broker.
### `[web]`
- `host` (default `0.0.0.0`) and `port` (default `49735`): Flask
listener. Change `host` to `127.0.0.1` if you front with a reverse
proxy.
- `tls_cert`, `tls_key` (default `""`): PEM paths. `gen_cert.sh`
fills these in.
- `username` (default `"admin"`): web UI login name.
- `password_hash` (default `""`): scrypt hash set via `vigilar config
set-password`.
- `session_timeout` (default `3600` seconds).
### `[zigbee2mqtt]`
- `mqtt_topic_prefix` (default `"zigbee2mqtt"`): used when subscribing
to sensor topics from an external Zigbee2MQTT bridge.
### `[ups]`
See also the UPS/NUT section below.
- `enabled` (default `true`).
- `nut_host` (default `127.0.0.1`), `nut_port` (default `3493`),
`ups_name` (default `"ups"`): matches the `[ups]` block generated by
`setup_nut.sh`.
- `poll_interval_s` (default `30`).
- `low_battery_threshold_pct` (default `20`, range 5 to 95).
- `critical_runtime_threshold_s` (default `300`).
- `shutdown_delay_s` (default `60`).
### `[storage]`
- `encrypt_recordings` (default `true`): toggles AES-256-CTR
encryption of new `.vge` files. Changing this does not re-encrypt
existing recordings.
- `key_file` (default `/etc/vigilar/secrets/storage.key`): 32-byte
raw key.
- `max_disk_usage_gb` (default `200`) and `free_space_floor_gb`
(default `10`): **legacy keys**. They are defined on the Pydantic
model and exposed in the settings UI, but no pruning or recording
code currently reads them. The real disk ceiling is the
percentage-based pruner in `[health]`. Do not rely on these two
fields to cap disk usage today.
### `[remote]`
- `enabled` (default `false`): turns on the remote-access bridge.
- `upload_bandwidth_mbps` (default `22.0`): informational ceiling.
- `remote_hls_resolution` (default `[426, 240]`), `remote_hls_fps`
(default `10`), `remote_hls_bitrate_kbps` (default `500`): quality
profile for HLS served over the tunnel.
- `max_remote_viewers` (default `4`; `0` = unlimited).
- `tunnel_ip` (default `"10.99.0.2"`): WireGuard address of the home
server, for display only.
### `[alerts.local]`
- `enabled` (default `true`).
- `syslog` (default `true`): the supervisor installs a `SysLogHandler`
on the `vigilar.alerts` logger when this is true.
- `desktop_notify` (default `false`): `notify-send` fallback for
operator-console deployments.
### `[alerts.web_push]`
- `enabled` (default `true`).
- `vapid_private_key_file` (default
`/etc/vigilar/secrets/vapid_private.pem`).
- `vapid_claim_email` (default `"mailto:admin@vigilar.local"`): used
as the VAPID `sub` claim.
### `[alerts.email]`
- `enabled` (default `false`).
- `smtp_host`, `smtp_port` (default `587`), `from_addr`, `to_addr`,
`use_tls` (default `true`).
### `[alerts.webhook]`
- `enabled` (default `false`).
- `url`: the endpoint that receives alerts. `secret`: the HMAC key
used to sign outbound webhook bodies.
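
On the receiving side, a webhook consumer should recompute the signature and compare it in constant time. This sketch assumes HMAC-SHA256 over the raw request body with a hex-encoded digest; this guide does not specify the algorithm, the encoding, or which header carries the signature, so check Vigilar's webhook sender before relying on it. Both helpers are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    # Assumption: HMAC-SHA256 over the raw body, hex-encoded.
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, body: bytes, received_sig: str) -> bool:
    """Receiver-side check; compare_digest avoids timing side channels."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, received_sig)


body = b'{"event": "motion", "camera": "front_door"}'
sig = sign_body("s3cret", body)
```

Verify against the exact bytes received, before any JSON parsing or re-serialisation, or the digests will not match.
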
### `[[cameras]]` (array of tables)
One block per camera. Keys:
- `id`, `display_name`, `rtsp_url`: required.
- `enabled` (default `true`).
- `record_continuous` (default `false`), `record_on_motion` (default
`true`).
- `motion_sensitivity` (default `0.7`, range 0.0 to 1.0) and
`motion_min_area_px` (default `500`).
- `motion_zones`, `zones`: polygon and named-zone overrides.
- `pre_motion_buffer_s` (default `5`) and `post_motion_buffer_s`
(default `30`).
- `idle_fps` (default `2`, range 1 to 30) and `motion_fps` (default
`30`, range 1 to 60): the adaptive FPS pair.
- `retention_days` (default `30`).
- `resolution_capture` (default `[1920, 1080]`) and
`resolution_motion` (default `[640, 360]`): capture size and the
downscale used for MOG2 motion detection.
- `location` (default `INTERIOR`): `CameraLocation` enum, used for
alert profiles.

Camera IDs must be unique; the Pydantic root validator rejects
duplicates.
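
A minimal sketch of the duplicate check, not the actual Pydantic root validator: the hypothetical `check_unique_ids` raises on the first section that repeats an `id`, checking `cameras` and `sensors` independently (the sensor list follows the same rule, see `[[sensors]]`).

```python
from collections import Counter


def find_duplicate_ids(blocks: list[dict]) -> list[str]:
    """Return IDs that appear more than once, in first-seen order."""
    counts = Counter(block["id"] for block in blocks)
    return [block_id for block_id in counts if counts[block_id] > 1]


def check_unique_ids(config: dict) -> None:
    # Each section is checked on its own: a camera and a sensor may
    # share an ID, but two cameras (or two sensors) may not.
    for section in ("cameras", "sensors"):
        dupes = find_duplicate_ids(config.get(section, []))
        if dupes:
            raise ValueError(f"duplicate {section} ids: {dupes}")
```
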
### `[[sensors]]` and `[sensors.gpio]`
Each `[[sensors]]` block has `id`, `display_name`, `type` (e.g.
`CONTACT`, `MOTION`, `TEMPERATURE`), `protocol` (`ZIGBEE`, `ZWAVE`,
`GPIO`), `device_address`, `location`, and `enabled` (default
`true`). `[sensors.gpio] bounce_time_ms` (default `50`) applies to all
GPIO sensors. Sensor IDs must also be unique.
### `[[rules]]`
Each rule has `id`, `description`, `conditions` (list of `{type,
value, sensor_id, event}` maps), `logic` (`AND` or `OR`, default
`AND`), `actions` (list of action names like `alert_all` or
`record_all_cameras`), and `cooldown_s` (default `60`).
### `[detection]` and `[vehicles]`
- `[detection] person_detection` (default `false`), `model_path`,
`model_config_path`, `confidence_threshold` (default `0.5`),
`cameras` (empty list means all cameras).
- `[[vehicles.known]]` entries define recognised vehicles with
`name`, `color_profile`, `size_class`, `calibration_file`.
### `[presence]`
- `enabled` (default `false`).
- `ping_interval_s` (default `30`) and `departure_delay_m` (default
`10`).
- `method`: `icmp` or `arping`.
- `[[presence.members]]` entries with `name`, `ip`, and `role`
(`adult` or `child`).
- `actions`: mapping of states (`EMPTY`, `ADULTS_HOME`, `KIDS_HOME`,
`ALL_HOME`) to arm states.
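
The reference above does not spell out how member presence maps to the four states. One plausible reading, sketched with hypothetical helpers: `EMPTY` when nobody is home, `ALL_HOME` when everyone is, `ADULTS_HOME` when at least one adult is, and `KIDS_HOME` when only children are; a member counts as gone only after `departure_delay_m` minutes without a ping reply. Treat these semantics as assumptions, not Vigilar's implementation.

```python
def household_state(home: dict[str, bool], roles: dict[str, str]) -> str:
    """Map per-member presence to a [presence] state (assumed semantics)."""
    present = [name for name, is_home in home.items() if is_home]
    if not present:
        return "EMPTY"
    if len(present) == len(home):
        return "ALL_HOME"
    if any(roles[name] == "adult" for name in present):
        return "ADULTS_HOME"
    return "KIDS_HOME"


def member_home(last_seen: float, now: float, departure_delay_m: int = 10) -> bool:
    # Assumed interpretation: departed only after departure_delay_m
    # minutes have passed since the last successful ping.
    return now - last_seen < departure_delay_m * 60
```

The resulting state would then be looked up in `actions` to choose an arm state.
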
### `[health]`
This is where pruning actually lives.
- `enabled` (default `true`).
- `disk_warn_pct` (default `85`): warning threshold on the partition
hosting `data_dir`.
- `disk_critical_pct` (default `95`): critical threshold. When crossed
and `auto_prune` is true, the health monitor runs the pruner.
- `auto_prune` (default `true`).
- `auto_prune_target_pct` (default `80`): pruner deletes the oldest
non-starred recordings until disk usage drops below this percentage.
- `daily_digest` (default `true`) and `daily_digest_time` (default
`"08:00"`).
### `[pets]`, `[visitors]`, `[highlights]`, `[kiosk]`
Subsystem-specific toggles. See the subsystem references under
`docs/architecture/` for per-key behaviour. Notable defaults: `[pets]
enabled = false`, `[visitors] enabled = false`, `[highlights] enabled
= true`, `[kiosk] ambient_enabled = true`.
### `[location]` and `[security]`
- `[location] latitude`, `longitude` (default `0.0`): used for sunrise
and sunset lookups.
- `[security] pin_hash` (canonical arm/disarm PIN store): populated by
`vigilar config set-pin`, which emits a PBKDF2-SHA256 hash to paste
into the `[security]` section. The legacy `[system] arm_pin_hash`
field is deprecated; see the `[system]` section above.
- `[security] recovery_passphrase_hash`: used by the web
`/system/api/reset-pin` endpoint to authenticate PIN-reset requests.
There is no CLI helper for this field today — set it by hashing a
passphrase manually with `vigilar.alerts.pin.hash_pin` and pasting
the result into `[security]`, or leave it unset to disable recovery.
## CLI reference
The entry point is `/opt/vigilar/venv/bin/vigilar`. All commands
accept `--version`. In production, run subcommands as the service user
so file ownership and venv paths line up:
```
sudo -u vigilar /opt/vigilar/venv/bin/vigilar <subcommand>
```
The CLI exposes exactly two top-level commands: `start` and `config`.
### `vigilar start`
Starts all services under the supervisor.
```
sudo -u vigilar /opt/vigilar/venv/bin/vigilar start \
--config /etc/vigilar/vigilar.toml
```
Options: `--config/-c PATH` (defaults to `$VIGILAR_CONFIG` then
`config/vigilar.toml`); `--log-level {DEBUG,INFO,WARNING,ERROR}`
(overrides `[system] log_level`). On invocation it loads and validates
the config, configures a console log formatter, prints a startup
summary (camera count, sensor count, UPS state), then hands off to
`vigilar.main.run_supervisor`.
### `vigilar config validate`
```
sudo -u vigilar /opt/vigilar/venv/bin/vigilar config validate \
-c /etc/vigilar/vigilar.toml
```
Parses and validates the TOML against the Pydantic models and prints
a summary. Exits non-zero if validation fails. Run this after every
edit before restarting the service.
### `vigilar config show`
```
sudo -u vigilar /opt/vigilar/venv/bin/vigilar config show \
-c /etc/vigilar/vigilar.toml
```
Dumps the parsed config as JSON with `web.password_hash`,
`security.pin_hash`, `security.recovery_passphrase_hash`, and
`alerts.webhook.secret` redacted. Useful for confirming which
defaults Pydantic applied for keys you did not set.
### `vigilar config set-password`
```
sudo -u vigilar /opt/vigilar/venv/bin/vigilar config set-password
```
Prompts for a web UI password (hidden, confirmed), derives a scrypt
hash (`n=16384, r=8, p=1`, random 16-byte salt, 32-byte output), and
prints a `password_hash = "salt_hex:key_hex"` line to paste into
`[web]`. It does not write the file.
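
To check a pasted hash by hand, the `salt_hex:key_hex` format can be reproduced with `hashlib.scrypt` and the parameters above. A sketch assuming UTF-8 password encoding; the helper names are hypothetical and this is not Vigilar's code.

```python
import hashlib
import hmac
import os


def scrypt_password_hash(password: str) -> str:
    """Produce "salt_hex:key_hex" with n=16384, r=8, p=1, 32-byte key."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=16384, r=8, p=1, dklen=32)
    return f"{salt.hex()}:{key.hex()}"


def scrypt_password_verify(password: str, stored: str) -> bool:
    try:
        salt_hex, key_hex = stored.split(":")
        salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(key_hex)
    except ValueError:
        return False  # malformed stored value: fail closed
    key = hashlib.scrypt(password.encode(), salt=salt, n=16384, r=8, p=1, dklen=32)
    return hmac.compare_digest(key, expected)
```
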
### `vigilar config set-pin`
```
sudo -u vigilar /opt/vigilar/venv/bin/vigilar config set-pin
```
Prompts for an arm/disarm PIN, derives a salted PBKDF2-SHA256 hash
(600,000 iterations) via `vigilar.alerts.pin.hash_pin`, and prints a
`pin_hash = "pbkdf2_sha256$salt$dk"` line to paste into `[security]`.
Again, no file write. The same hash format is verified identically by
the web arm/disarm endpoint and by `ArmStateFSM` in the event
processor — there is one canonical PIN store.
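
The `pbkdf2_sha256$salt$dk` format can be sketched with `hashlib.pbkdf2_hmac`. Assumptions: salt and derived key are hex-encoded, the salt is 16 bytes, and the PIN is UTF-8; `hash_pin_sketch` and `verify_pin_sketch` are hypothetical stand-ins for `vigilar.alerts.pin.hash_pin` and its verifier. The verifier fails closed: a malformed or foreign-scheme stored value rejects every PIN instead of raising.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000


def hash_pin_sketch(pin: str) -> str:
    """Build a "pbkdf2_sha256$salt$dk" string (hex encoding assumed)."""
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${salt.hex()}${dk.hex()}"


def verify_pin_sketch(pin: str, stored: str) -> bool:
    """Fail closed: any malformed stored hash rejects the PIN."""
    try:
        scheme, salt_hex, dk_hex = stored.split("$")
        salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(dk_hex)
    except ValueError:
        return False
    if scheme != "pbkdf2_sha256" or not salt or not expected:
        return False
    dk = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS)
    return hmac.compare_digest(dk, expected)
```
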
## Secrets and security
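- `/etc/vigilar/secrets/` is `root:root` mode `0700`. The `vigilar`
user cannot list or enter it. Individual files inside (for example
`vapid_public.txt`) carry group-`vigilar` read bits, but those bits
only take effect if the directory also grants search permission;
with the directory at `0700`, only root can open files inside it.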
- The storage encryption key is `/etc/vigilar/secrets/storage.key`:
32 raw bytes. **If this file is lost, every existing `.vge`
recording becomes unrecoverable.** The `backup.sh` archive includes
it, but once the system is in production also keep a separate,
offline copy stored apart from those archives.
- Recordings use **AES-256-CTR** (see `vigilar/storage/encryption.py`).
CTR provides confidentiality but no authentication: `.vge` files
are confidential but not tamper-evident. An attacker with write
access to the recordings directory can flip bits in a ciphertext
without detection. If tamper-evidence matters, keep the recordings
volume on integrity-verified storage (dm-integrity, ZFS with
checksums) or mirror to write-once media.
- The web UI password is a scrypt hash set by `vigilar config
set-password` and stored at `[web] password_hash`. The arm/disarm
PIN is a PBKDF2-SHA256 hash (600k iterations, salted) set by
`vigilar config set-pin` and stored at `[security] pin_hash`.
A legacy `[system] arm_pin_hash` field is still parsed but ignored
at runtime; if it's set and `[security] pin_hash` is empty, the
service logs a deprecation warning at startup and arm/disarm will
behave as if no PIN were configured until you re-run `set-pin`.
- TLS: `gen_cert.sh` uses `mkcert` if present, otherwise an `openssl`
ECDSA P-256 self-signed certificate valid for 3650 days with SANs
for `vigilar.local`, `localhost`, `127.0.0.1`, and the detected LAN
IP. It patches `[web] tls_cert`/`tls_key` into the config.
- VAPID: `gen_vapid_keys.sh` writes
`/etc/vigilar/secrets/vapid_private.pem` (mode 0600) and
`/etc/vigilar/secrets/vapid_public.txt` (the browser-side key).
- Firewall stance: the mosquitto broker and NUT daemon bind only to
`127.0.0.1`. The only port Vigilar exposes on the LAN is the web UI
port (default `49735`). Open that port only on the interface that
serves your LAN, and keep WAN exposure behind the WireGuard tunnel
described under `[remote]`.
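
To see why CTR alone is not tamper-evident: decryption XORs the same keystream back out, so flipping a ciphertext bit flips exactly that plaintext bit. This toy uses random bytes in place of the AES-CTR keystream (it is not Vigilar's `encryption.py`), but the effect on real CTR ciphertext is the same.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Toy stand-in: in real CTR mode the keystream comes from AES encrypting
# successive counter blocks; random bytes play that role here.
plaintext = b"camera=front_door;armed=1"
keystream = os.urandom(len(plaintext))
ciphertext = xor_bytes(plaintext, keystream)

# An attacker who can guess one plaintext byte changes it without the
# key by XORing (old ^ new) into the ciphertext at that position.
pos = plaintext.index(b"1")
tampered = bytearray(ciphertext)
tampered[pos] ^= ord("1") ^ ord("0")

recovered = xor_bytes(bytes(tampered), keystream)
print(recovered)  # b'camera=front_door;armed=0'
```

Nothing in the decryption step can notice the change; detecting it needs a MAC or authenticated mode, or integrity checking at the storage layer as suggested above.
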
## UPS and NUT integration
`scripts/setup_nut.sh` installs NUT, attempts to detect a USB UPS
(using `nut-scanner` first, then a short list of vendor IDs as a
fallback), and writes a standalone configuration:
- `/etc/nut/nut.conf` with `MODE=standalone`.
- `/etc/nut/ups.conf` with `[ups] driver=usbhid-ups port=auto` (the
block name `ups` matches the default `[ups] ups_name`).
- `/etc/nut/upsd.conf` with `LISTEN 127.0.0.1 3493` — loopback only.
- `/etc/nut/upsd.users` with a `vigilar` local monitoring user.
- `/etc/nut/upsmon.conf` pointing at `ups@localhost`.

It then enables `nut-driver`, `nut-server`, and `nut-monitor` (or
`upsd`/`upsmon` on distros that ship the old unit names). Test with
`upsc ups@localhost`. The Vigilar UPS subsystem polls this daemon
using the keys under `[ups]`.
## Backups
`scripts/backup.sh` produces
`${VIGILAR_BACKUP_DIR:-/var/vigilar/backups}/vigilar-backup-YYYYMMDD-HHMMSS.tar.gz`
and includes:
- A consistent SQLite snapshot produced with `sqlite3 … .backup` (or
a direct file copy if `sqlite3` is not available), plus any
`-wal`/`-shm` files.
- The entire `/etc/vigilar/` tree (config, secrets, certs).

It does **not** include `/var/vigilar/recordings` or
`/var/vigilar/hls`. Video is assumed to be either expendable or
handled by a separate storage tier.
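
Where the `sqlite3` shell is missing, a consistent snapshot can still be taken from Python's standard library, which exposes SQLite's online backup API. Unlike a plain file copy, it gives a consistent view even while the service is writing. A sketch; `snapshot_database` is a hypothetical helper, not part of `backup.sh`.

```python
import sqlite3


def snapshot_database(src_path: str, dest_path: str) -> None:
    """Write a consistent copy of a live SQLite database to dest_path.

    Uses SQLite's online backup API (the mechanism behind the sqlite3
    shell's .backup command), so committed changes still sitting in the
    WAL are included in the copy.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```
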
Environment variables:
- `VIGILAR_BACKUP_DIR` — destination directory (default
`/var/vigilar/backups`).
- `VIGILAR_BACKUP_RETENTION_DAYS` — age in days after which old
archives are pruned; set to `0` to keep forever (default `30`).

The archive is created `root:root` with mode `0600` because it
contains secrets.
### Scheduling
You can run it from cron, as the script comment suggests
(`0 3 * * * /opt/vigilar/scripts/backup.sh`), or via a dedicated
systemd timer. A minimal pair of units, kept in your local systemd
directory (not in the repo):
```
# /etc/systemd/system/vigilar-backup.service
[Unit]
Description=Vigilar nightly backup
After=vigilar.service
[Service]
Type=oneshot
Environment=VIGILAR_BACKUP_DIR=/srv/backups/vigilar
ExecStart=/opt/vigilar/scripts/backup.sh
```
```
# /etc/systemd/system/vigilar-backup.timer
[Unit]
Description=Run Vigilar backup nightly
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
[Install]
WantedBy=timers.target
```
Enable with `sudo systemctl enable --now vigilar-backup.timer`.
### Restore
1. `sudo systemctl stop vigilar.service`.
2. Extract the archive to a staging directory.
3. Copy `etc/vigilar/` back into `/etc/vigilar/`, preserving
permissions. Double-check `/etc/vigilar/secrets/storage.key` is
`root:root 0600`.
4. Copy the database snapshot to `/var/vigilar/data/vigilar.db` and
remove any stale `vigilar.db-wal`/`vigilar.db-shm` files.
5. `sudo chown -R vigilar:vigilar /var/vigilar/data`.
6. `sudo -u vigilar /opt/vigilar/venv/bin/vigilar config validate
-c /etc/vigilar/vigilar.toml`.
7. `sudo systemctl start vigilar.service` and watch the journal.
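
The storage-key check in step 3 can be scripted. A sketch with a hypothetical `check_storage_key` helper; run it as root, since `/etc/vigilar/secrets` is `0700`.

```python
import os
import stat


def check_storage_key(path: str = "/etc/vigilar/secrets/storage.key") -> list[str]:
    """Return a list of problems with the restored key (empty means OK)."""
    problems = []
    st = os.stat(path)
    if st.st_size != 32:
        problems.append(f"expected 32 bytes, found {st.st_size}")
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o600:
        problems.append(f"expected mode 0600, found {mode:04o}")
    if st.st_uid != 0 or st.st_gid != 0:
        problems.append("expected owner root:root")
    return problems
```

An empty list means the key has the expected size, mode, and owner; it cannot tell you whether it is the *same* key that encrypted your recordings.
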
## Upgrades
1. `sudo systemctl stop vigilar.service`.
2. `cd /path/to/vigilar && git pull`.
3. `sudo -u vigilar /opt/vigilar/venv/bin/pip install --upgrade .`
4. Diff the shipped `config/vigilar.toml` against
`/etc/vigilar/vigilar.toml` and merge any new keys by hand; Pydantic
will reject unknown keys but is tolerant of missing keys that have
defaults.
5. `sudo -u vigilar /opt/vigilar/venv/bin/vigilar config validate
-c /etc/vigilar/vigilar.toml`.
6. `sudo systemctl start vigilar.service`.

**Schema migrations:** there is no migration framework. Vigilar does
not ship Alembic; `vigilar/storage/schema.py` defines the tables
(`cameras`, `sensors`, `sensor_states`, `events`, `recordings`,
`system_events`, `arm_state_log`, `alert_log`, `push_subscriptions`,
`pets`, `pet_sightings`, `wildlife_sightings`, `package_events`,
`pet_training_images`, `pet_rules`, `face_profiles`, `face_embeddings`,
`visits`, `timelapse_schedules`), and any new columns are either added
by ad-hoc code at startup or not added at all. Take a backup before
every upgrade so you can roll back if a column assumption changes.
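
For illustration, an add-at-startup column change typically follows the pattern below: inspect `PRAGMA table_info` and issue `ALTER TABLE ... ADD COLUMN` only when the column is missing. `ensure_column` is hypothetical and not taken from `schema.py`; nothing in this pattern records a schema version, which is why rollbacks depend on backups.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if it is missing; return True if added.

    Table and column names are interpolated into SQL, so pass only
    trusted literals.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True
```
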
## Logs and health
All subsystem output goes to the journal under the `vigilar` syslog
identifier:
```
sudo journalctl -u vigilar.service -f
sudo journalctl -u vigilar.service --since "1 hour ago"
sudo journalctl -u vigilar.service -p warning
```
The alerts subsystem additionally mirrors messages to syslog via the
`vigilar.alerts` logger when `[alerts.local] syslog = true`, which is
the default; the supervisor installs the handler at startup.
Set `[system] log_level = "DEBUG"` (or pass `--log-level DEBUG` to
`vigilar start`) to trace MQTT traffic, motion scoring, and FFmpeg
invocations. Expect a significant volume increase; revert to `INFO`
once you have the evidence you need.
The only HTTP endpoint currently exposing health is
`GET /system/status` on the web UI, which returns a JSON blob with
arm state, camera counts, and sensor counts. The richer health data
(disk percentage, MQTT reachability) is published to the
`vigilar/system/health` MQTT topic by `HealthMonitor` every ten
seconds and is not yet surfaced as a REST endpoint.
## Pruning and disk management
`vigilar/health/monitor.py` runs a disk check every five minutes
against `[system] data_dir` using `shutil.disk_usage`. When usage
crosses `[health] disk_critical_pct` and `[health] auto_prune` is
true, it calls `vigilar.health.pruner.auto_prune`:
- Selects up to 20 unstarred recordings at a time, ordered oldest
first.
- Deletes the file on disk, any thumbnail, and the row from the
`recordings` table.
- Loops until disk usage drops below `[health] auto_prune_target_pct`
or no more candidates exist.

Starred recordings (`recordings.starred = 1`) are never auto-pruned.
Per-camera `retention_days` is enforced separately by the camera
subsystem. There is no hard byte ceiling; the pruner is entirely
percentage-driven. The `[storage] max_disk_usage_gb` and
`[storage] free_space_floor_gb` keys described above are not
consulted by the pruner.
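
A sketch of the loop described above, against a hypothetical `recordings` schema (`id`, `path`, `started_at`, `starred`; the real columns may differ). Thumbnail cleanup is left out, and `usage_pct` is injected so the loop can be exercised without a full disk. This is not `vigilar.health.pruner.auto_prune` itself.

```python
import shutil
import sqlite3
from pathlib import Path

BATCH = 20


def disk_pct(path: str) -> float:
    """Percentage of the partition holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def auto_prune_sketch(conn: sqlite3.Connection, target_pct: float, usage_pct) -> int:
    """Delete oldest unstarred recordings until usage_pct() < target_pct.

    usage_pct is a zero-argument callable, e.g. lambda: disk_pct(data_dir).
    Returns the number of recordings removed.
    """
    removed = 0
    while usage_pct() >= target_pct:
        rows = conn.execute(
            "SELECT id, path FROM recordings WHERE starred = 0 "
            "ORDER BY started_at LIMIT ?", (BATCH,)
        ).fetchall()
        if not rows:
            break  # only starred recordings remain
        for rec_id, path in rows:
            Path(path).unlink(missing_ok=True)
            conn.execute("DELETE FROM recordings WHERE id = ?", (rec_id,))
            removed += 1
        conn.commit()  # re-check usage after each batch
    return removed
```
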
## Remote access
`[remote]` controls the lower-bitrate HLS profile that Vigilar serves
through a WireGuard tunnel. The tunnel itself is not set up by this
project — you are expected to bring your own WireGuard server and
peer configuration. Once the tunnel is up:
- `enabled = true` turns on the remote bridge.
- `tunnel_ip` is the home server's address inside the tunnel (default
`10.99.0.2`), shown in the UI for reference.
- `upload_bandwidth_mbps` caps the advertised upstream.
- `remote_hls_resolution`, `remote_hls_fps`, `remote_hls_bitrate_kbps`
define the transcode profile used when a client connects through
the tunnel instead of the LAN.
- `max_remote_viewers` bounds concurrent remote sessions; set to `0`
for unlimited.

Do not expose port `49735` directly on the WAN; require the tunnel.
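
When sizing these keys, it helps to check which limit actually binds. With the defaults, 22 Mbps of upstream fits 22,000 / 500 = 44 streams at `remote_hls_bitrate_kbps = 500`, so `max_remote_viewers = 4` is the effective limit. The hypothetical helper below is a planning aid only; it ignores audio, container, and WireGuard overhead and is not something Vigilar computes.

```python
def remote_viewer_capacity(upload_mbps: float, bitrate_kbps: int,
                           max_remote_viewers: int) -> int:
    """Viewers that fit the upload budget, capped by max_remote_viewers."""
    by_bandwidth = int(upload_mbps * 1000 // bitrate_kbps)
    if max_remote_viewers == 0:  # 0 means unlimited
        return by_bandwidth
    return min(by_bandwidth, max_remote_viewers)


print(remote_viewer_capacity(22.0, 500, 4))  # 4
print(remote_viewer_capacity(22.0, 500, 0))  # 44
```
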
## Known limitations
- **Recording integrity is not authenticated.** AES-256-CTR gives you
confidentiality, not tamper-evidence. If an attacker reaches the
recordings directory they can modify ciphertext unnoticed. See the
security section.
- **Camera supervision is asymmetric.** Most subsystems run under
`SubsystemProcess` in `vigilar/main.py`, which polls every two
seconds and restarts a crashed subsystem with exponential backoff,
giving up after `max_restarts=10`.
Cameras do not: `CameraManager` in `vigilar/camera/manager.py`
owns its own per-camera child processes outside that supervisor.
A repeatedly crashing camera may thrash differently from, say, a
crashing UPS poller. Watch the journal for per-camera restart
messages independently from the top-level supervisor log.
- **Legacy storage keys.** `[storage] max_disk_usage_gb` and
`[storage] free_space_floor_gb` are editable but do nothing. Use
`[health]` for real disk policy.
- **No schema migrations.** There is no Alembic (or equivalent) in
the tree. Rollbacks rely on your backup discipline.
## Troubleshooting
**Supervisor crash loops.** `journalctl -u vigilar.service` will show
a subsystem crashing and the supervisor attempting to restart it. If
the same subsystem exceeds ten restarts, the supervisor gives up on
that subsystem and logs `exceeded max restarts, giving up`. Fix the
root cause (bad config, missing secret, missing model file for
detection) and restart the unit.
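
For a rough sense of how long a crash loop lasts before the supervisor gives up, here is capped exponential backoff over ten attempts. Only `max_restarts=10` comes from `vigilar/main.py`; the 1-second base and 60-second cap are assumptions for illustration, and the two-second poll interval is ignored.

```python
def restart_delays(max_restarts: int = 10, base_s: float = 1.0,
                   cap_s: float = 60.0) -> list[float]:
    """Delay before each restart attempt under capped exponential backoff."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_restarts)]


delays = restart_delays()
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0, 60.0]
print(sum(delays))  # 303.0 seconds in total under these assumptions
```
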
**Mosquitto will not start.** Confirm that
`/etc/mosquitto/conf.d/vigilar.conf` is present and that no other
listener is bound to `127.0.0.1:1883`. Run `sudo systemctl status
mosquitto.service` and `sudo journalctl -u mosquitto.service`. The
Vigilar unit `Requires=mosquitto.service`, so Vigilar will refuse to
start until mosquitto is healthy.
**Camera thrashing.** Because cameras are not under the main
supervisor's backoff, a camera whose RTSP URL is wrong or whose
remote end is rebooting can respawn quickly. Look for repeated
`camera <id>` messages in the journal. Disable the camera in the
config (`enabled = false`) while you fix the upstream, then
re-enable.
**Disk full.** Check `[health] disk_critical_pct` and confirm
`auto_prune` is on. The pruner only starts once usage crosses
`disk_critical_pct`, so a partition sitting between
`auto_prune_target_pct` and `disk_critical_pct` is expected to stay
there. If usage is above `disk_critical_pct` and nothing is being
deleted, there are no unstarred recordings left to prune; unstar
something or lower retention. The legacy `[storage]` keys will not
help here; see the pruning section.
**HLS stalls.** The HLS directory lives at `[system] hls_dir`
(default `/var/vigilar/hls`) and is one of the systemd unit's
`ReadWritePaths`. Stalls usually mean FFmpeg has died on a camera;
check the journal for FFmpeg stderr and verify the RTSP URL is still
reachable from the server with `ffprobe`.
**Config validation fails.** Run `sudo -u vigilar
/opt/vigilar/venv/bin/vigilar config validate -c
/etc/vigilar/vigilar.toml`. Pydantic error messages include the
section, key, and reason. The two common traps are duplicate camera
or sensor IDs (root validator rejects them) and a TOML table that
should be an array of tables (`[cameras]` instead of `[[cameras]]`).
**Forgotten arm PIN.** If a recovery passphrase is configured in
`[security] recovery_passphrase_hash` and you still know it, the web
`/system/api/reset-pin` endpoint can reset the PIN. Otherwise, run
`vigilar config set-pin` to mint a new hash, paste it into
`[security] pin_hash`, and restart the service.
**Forgotten web password.** Run `vigilar config set-password` and
paste the new hash into `[web] password_hash`, then restart. No
database state needs to change.