Add visitor recognition design spec (S3)
Local face recognition with visitor profiles, unknown clustering, household presence integration, and privacy-first opt-in model. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent f530f26530
commit 2c72743bec
371
docs/superpowers/specs/2026-04-03-visitor-recognition-design.md
Normal file
@@ -0,0 +1,371 @@
# Visitor Recognition — Design Spec

## Overview

Local face recognition with visitor logging, person profiles, and household presence integration. Opt-in only, all data stored locally, no cloud. Uses the `face_recognition` library (dlib-based) to compute 128-dimensional face embeddings, which are matched by cosine distance against a configurable threshold.

## Privacy Model

- Face recognition is **OFF by default** (`enabled = false`)
- Enabled per-camera in config: only cameras the user explicitly chooses
- No faces are stored until the feature is enabled
- All embeddings and crops are stored locally (SQLite + filesystem)
- Labeling a face requires explicit user action with a consent acknowledgment in the UI
- "Forget person" button permanently deletes all embeddings, crops, and visit history for a person
- No face detection on interior cameras unless explicitly enabled by the user
- No automatic enrollment: faces are detected and clustered, but naming requires human action

## Face Pipeline

### Detection Flow

```
Person detected (YOLO, existing) on opt-in camera
  → Crop person region from frame
  → face_recognition.face_locations(crop) → face bounding boxes
  → face_recognition.face_encodings(crop, locations) → 128-d embeddings
  → Compare against all labeled embeddings (cosine distance)
    → Distance < threshold (0.6): KNOWN_VISITOR event
    → Distance >= threshold: compare against unknowns
      → Match unknown cluster: add embedding, update visit
      → No match: create new unknown profile
  → No face found in crop: skip (back of head, obscured, etc.)
```

Note: 0.6 is the default tolerance that the `face_recognition` library applies to Euclidean distance between embeddings (`compare_faces` / `face_distance`). Because this spec matches by cosine distance, `match_threshold` should be calibrated on real captures rather than assumed to behave like the library default.

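The matching step in the flow above can be sketched in plain Python. This is only an illustration under stated assumptions: `cosine_distance` and `best_match` are hypothetical helper names, embeddings are shown as lists of floats rather than NumPy arrays, and the closest labeled encoding under the threshold wins.

```python
import math
from typing import Optional


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: a zero vector matches nothing
    return 1.0 - dot / (norm_a * norm_b)


def best_match(
    embedding: list[float],
    known: list[tuple[int, list[float]]],  # (profile_id, encoding) pairs
    threshold: float = 0.6,
) -> Optional[tuple[int, float]]:
    """Return (profile_id, distance) of the closest encoding below the threshold."""
    best: Optional[tuple[int, float]] = None
    for profile_id, encoding in known:
        distance = cosine_distance(embedding, encoding)
        if distance < threshold and (best is None or distance < best[1]):
            best = (profile_id, distance)
    return best
```

A `None` result corresponds to the "Distance >= threshold" branch, where the embedding is handed to the unknown-cluster comparison.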
### New File: `vigilar/detection/face.py`

`FaceRecognizer` class:

```python
import numpy as np
from sqlalchemy.engine import Engine  # assumption: Engine is SQLAlchemy's


class FaceRecognizer:
    def __init__(self, match_threshold: float = 0.6):
        self._threshold = match_threshold
        self._known_encodings: list[np.ndarray] = []
        self._known_profile_ids: list[int] = []
        self.is_loaded = False

    def load_profiles(self, engine: Engine) -> None:
        """Load all face embeddings from DB into memory for fast comparison."""

    # FaceResult is quoted because it is defined after this class.
    def identify(self, frame: np.ndarray) -> list["FaceResult"]:
        """Detect faces in frame, return identification results."""

    def add_encoding(self, profile_id: int, encoding: np.ndarray) -> None:
        """Add a new encoding to the in-memory index."""
```

`FaceResult` dataclass:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FaceResult:
    profile_id: int | None  # None if new unknown
    name: str | None  # None if unknown
    confidence: float  # 1 - distance (higher = more confident)
    face_crop: np.ndarray  # cropped face image
    bbox: tuple[int, int, int, int]  # face location in frame
```

### Integration Points

**Camera worker:** After YOLO detects a person on an opt-in camera, pass the frame (cropped to the person region, per the detection flow) to `FaceRecognizer.identify()`. This runs after the existing person detection, not instead of it, and only on cameras listed in the `visitors.cameras` config.

**Event processor:** Handle `KNOWN_VISITOR`, `UNKNOWN_VISITOR`, and `VISITOR_DEPARTED` events. Insert or update visit records in the DB.

**Throttling:** Face recognition runs at most once per 5 seconds per camera, because it is more expensive than YOLO and the set of faces in view changes slowly. It reuses the detection throttling pattern from the pet pipeline.

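The pet pipeline's throttling code is not shown in this spec, so the following is only a minimal sketch of a per-camera throttle with the same 5-second budget. `PerCameraThrottle` and its injectable `clock` parameter are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


class PerCameraThrottle:
    """Allow at most one face-recognition pass per camera per interval."""

    def __init__(self, interval_s: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self._interval_s = interval_s
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_run: dict[str, float] = {}

    def should_run(self, camera_id: str) -> bool:
        """Return True and record the time if the camera's interval has elapsed."""
        now = self._clock()
        last = self._last_run.get(camera_id)
        if last is not None and now - last < self._interval_s:
            return False
        self._last_run[camera_id] = now
        return True
```

The camera worker would call `should_run(camera_id)` after a YOLO person detection and skip `identify()` when it returns `False`. A monotonic clock is used so that wall-clock adjustments do not reopen or extend the window.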
## Database

### New Table: `face_profiles`

| Column | Type | Notes |
|--------|------|-------|
| id | Integer PK | Autoincrement |
| name | String | NULL for unknowns, set when labeled |
| is_household | Integer NOT NULL | 1 = linked to presence member, default 0 |
| presence_member | String | Name from presence config (nullable) |
| primary_photo_path | String | Best face crop for display |
| visit_count | Integer NOT NULL | Total visits, default 0 |
| first_seen_at | Float NOT NULL | Timestamp |
| last_seen_at | Float NOT NULL | Timestamp |
| ignored | Integer NOT NULL | 1 = hidden from UI, default 0 |
| created_at | Float NOT NULL | Timestamp |

Index: `idx_face_profiles_name` on (name) where name IS NOT NULL

### New Table: `face_embeddings`

| Column | Type | Notes |
|--------|------|-------|
| id | Integer PK | Autoincrement |
| profile_id | Integer NOT NULL | FK to face_profiles |
| embedding | Blob NOT NULL | 128 float32 values (512 bytes) |
| crop_path | String | Face crop image path |
| camera_id | String NOT NULL | Where captured |
| captured_at | Float NOT NULL | Timestamp |

Index: `idx_face_embeddings_profile` on (profile_id)

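To make the "128 float32 values (512 bytes)" note concrete, here is a small sketch of packing an embedding into the `embedding` blob and back using only the standard library. The helper names and the little-endian byte order are assumptions; the spec fixes only the element type and count.

```python
import struct

EMBEDDING_DIM = 128
_BLOB_FORMAT = f"<{EMBEDDING_DIM}f"  # assumption: little-endian float32


def embedding_to_blob(values: list[float]) -> bytes:
    """Pack 128 floats into a 512-byte float32 blob."""
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    return struct.pack(_BLOB_FORMAT, *values)


def blob_to_embedding(blob: bytes) -> list[float]:
    """Unpack a 512-byte blob back into 128 floats."""
    return list(struct.unpack(_BLOB_FORMAT, blob))
```

`face_recognition` typically returns encodings as float64 NumPy arrays, so storing float32 implies a narrowing cast (with NumPy, `encoding.astype(np.float32).tobytes()`); the small precision loss is the price of the 512-byte row size.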
### New Table: `visits`

| Column | Type | Notes |
|--------|------|-------|
| id | Integer PK | Autoincrement |
| profile_id | Integer NOT NULL | FK to face_profiles |
| camera_id | String NOT NULL | Which camera |
| arrived_at | Float NOT NULL | First detection timestamp |
| departed_at | Float | Last detection + timeout (NULL if still present) |
| duration_s | Float | Visit duration in seconds |
| event_id | Integer | Linked event |

Index: `idx_visits_profile_ts` on (profile_id, arrived_at DESC)
Index: `idx_visits_ts` on (arrived_at DESC)

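For reference, the three tables and their indexes could be expressed as raw SQLite DDL roughly as follows. This is a sketch, not the project's schema code: `vigilar/storage/schema.py` may define tables differently (for example through SQLAlchemy), and the mapping of String to `TEXT` and Float to `REAL` is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE face_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    is_household INTEGER NOT NULL DEFAULT 0,
    presence_member TEXT,
    primary_photo_path TEXT,
    visit_count INTEGER NOT NULL DEFAULT 0,
    first_seen_at REAL NOT NULL,
    last_seen_at REAL NOT NULL,
    ignored INTEGER NOT NULL DEFAULT 0,
    created_at REAL NOT NULL
);
-- Partial index: only labeled profiles are indexed by name.
CREATE INDEX idx_face_profiles_name ON face_profiles (name) WHERE name IS NOT NULL;

CREATE TABLE face_embeddings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL REFERENCES face_profiles (id),
    embedding BLOB NOT NULL,
    crop_path TEXT,
    camera_id TEXT NOT NULL,
    captured_at REAL NOT NULL
);
CREATE INDEX idx_face_embeddings_profile ON face_embeddings (profile_id);

CREATE TABLE visits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL REFERENCES face_profiles (id),
    camera_id TEXT NOT NULL,
    arrived_at REAL NOT NULL,
    departed_at REAL,
    duration_s REAL,
    event_id INTEGER
);
CREATE INDEX idx_visits_profile_ts ON visits (profile_id, arrived_at DESC);
CREATE INDEX idx_visits_ts ON visits (arrived_at DESC);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the visitor-recognition tables and indexes."""
    conn.executescript(SCHEMA)
```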
## Unknown Face Clustering

When a face doesn't match any labeled profile:

1. Compare the embedding against the embeddings of all unknown (unlabeled) profiles
2. If the cosine distance to an existing unknown cluster is below the threshold (0.6): add the embedding to that profile, increment its visit count, and update `last_seen_at`
3. If no unknown cluster matches: create a new unknown profile with this as its first embedding

This automatically groups repeat visitors. "Unknown #4 has visited 3 times" makes labeling easier because the user sees clusters, not individual face crops.

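The three steps above amount to a nearest-cluster assignment. A minimal in-memory sketch follows, with several assumptions: `UnknownProfile` and `assign_unknown` are hypothetical names, embeddings are plain float lists, a cluster matches if any one of its stored embeddings is within the threshold, and profile IDs are allocated from the list length as a stand-in for the database autoincrement.

```python
import math
from dataclasses import dataclass, field


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


@dataclass
class UnknownProfile:
    profile_id: int
    embeddings: list[list[float]] = field(default_factory=list)
    visit_count: int = 0


def assign_unknown(
    embedding: list[float],
    unknowns: list[UnknownProfile],
    threshold: float = 0.6,
) -> UnknownProfile:
    """Attach the embedding to the closest unknown cluster, or start a new one."""
    best = None
    best_distance = threshold
    for profile in unknowns:
        for stored in profile.embeddings:
            distance = cosine_distance(embedding, stored)
            if distance < best_distance:
                best, best_distance = profile, distance
    if best is None:
        # Step 3: nothing within the threshold, so create a new unknown profile.
        best = UnknownProfile(profile_id=len(unknowns) + 1)
        unknowns.append(best)
    # Step 2 (and the first embedding of step 3): store and count the sighting.
    best.embeddings.append(embedding)
    best.visit_count += 1
    return best
```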
### Embedding Storage Limit

Store up to 10 embeddings per profile (`max_embeddings_per_profile`). Beyond 10, keep the 10 with the highest quality, using face crop area as a proxy for quality (larger is better). This bounds storage and keeps the comparison set manageable.

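One way to apply this cap is to select the largest crops and treat the rest as deletion candidates. The sketch below assumes crop dimensions are known for each stored embedding; `StoredEmbedding` and `split_by_quality` are hypothetical names, and the `face_embeddings` table as specified does not store crop dimensions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class StoredEmbedding:
    embedding_id: int
    crop_width: int
    crop_height: int

    @property
    def area(self) -> int:
        """Crop area in pixels, used as the quality proxy."""
        return self.crop_width * self.crop_height


def split_by_quality(
    embeddings: list[StoredEmbedding], limit: int = 10
) -> tuple[list[StoredEmbedding], list[StoredEmbedding]]:
    """Return (keep, drop): the `limit` largest crops and everything else."""
    keep = heapq.nlargest(limit, embeddings, key=lambda e: e.area)
    keep_ids = {e.embedding_id for e in keep}
    drop = [e for e in embeddings if e.embedding_id not in keep_ids]
    return keep, drop
```

The `drop` list identifies which rows and crop files to delete after a new embedding pushes a profile over the limit.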
## Visit Tracking

### Visit State Machine

```
No face detected → person detection + face match → ARRIVED
  → ongoing detections within departure_timeout_s → PRESENT (extend visit)
  → no detection for departure_timeout_s → DEPARTED
```

Visits are managed per profile, per camera. A person can be "visiting" on multiple cameras simultaneously (for example, while walking between cameras).

### Departure Detection

A background timer in the event processor checks every 60 seconds for active visits where `departed_at IS NULL` and `last_seen_at < now - departure_timeout_s`. Here `last_seen_at` is the visit's most recent detection time for that profile and camera. The timer marks such visits as departed and logs a `VISITOR_DEPARTED` event.

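The departure check can be sketched over in-memory visit state. The names `ActiveVisit` and `sweep_departures` are hypothetical. Following the `visits` table notes, `departed_at` is set to the last detection plus the timeout; computing `duration_s` as `departed_at - arrived_at` is an assumption, since the spec does not say whether the timeout counts toward the duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActiveVisit:
    profile_id: int
    camera_id: str
    arrived_at: float
    last_seen_at: float  # most recent detection for this profile and camera
    departed_at: Optional[float] = None
    duration_s: Optional[float] = None


def sweep_departures(
    visits: list[ActiveVisit], now: float, departure_timeout_s: float = 300.0
) -> list[ActiveVisit]:
    """Close visits with no detection for departure_timeout_s; return those just closed."""
    departed = []
    for visit in visits:
        if visit.departed_at is None and visit.last_seen_at < now - departure_timeout_s:
            visit.departed_at = visit.last_seen_at + departure_timeout_s
            visit.duration_s = visit.departed_at - visit.arrived_at
            departed.append(visit)
    return departed
```

Each visit in the returned list would produce one `VISITOR_DEPARTED` event when the 60-second timer fires.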
## Event Types

Add to `EventType` enum:

| Event Type | Severity | When |
|-----------|----------|------|
| `KNOWN_VISITOR` | INFO | Recognized labeled face detected |
| `UNKNOWN_VISITOR` | INFO (or WARNING if visits > `unknown_alert_threshold`) | Unrecognized face detected |
| `VISITOR_DEPARTED` | INFO | Visitor no longer detected (logged only, no push) |

### Alert Integration

- `KNOWN_VISITOR`: default action `quiet_log` in alert profiles (friends arriving is not an alert). User can override per-profile.
- `UNKNOWN_VISITOR`: default action `record_only` at first, then `push_and_record` once the same unknown exceeds `unknown_alert_threshold` (default 3 visits). This surfaces repeat unknowns without spamming on every stranger.
- `VISITOR_DEPARTED`: always logged, never pushed.

New `detection_type` values for alert profile rules: `known_visitor`, `unknown_visitor`.

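The default `UNKNOWN_VISITOR` behavior reduces to a single comparison. In this sketch the function name is hypothetical, and "exceeds" is read as strictly greater than the threshold, which is an interpretation of the wording above rather than something the spec states outright.

```python
def unknown_visitor_action(visit_count: int, unknown_alert_threshold: int = 3) -> str:
    """Pick the default alert action for an UNKNOWN_VISITOR event."""
    if visit_count > unknown_alert_threshold:
        return "push_and_record"
    return "record_only"
```

With the default threshold, visits 1 through 3 from the same unknown are recorded quietly and the fourth triggers a push.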
## Household Integration

### Linking Faces to Presence Members

In the visitor settings UI, users can link a face profile to a presence member:

```
Link profile to: [-- Select --]
                 [Aaron]
                 [Other family member]
```

When linked:

- `face_profiles.is_household = 1`
- `face_profiles.presence_member = "Aaron"`
- The visitor dashboard shows face + phone presence side by side
- Presence events gain supplementary face confirmation data

### Dual Confirmation Display

On the visitor dashboard, household members show:

```
Aaron
  📱 Phone: HOME since 2:12 PM
  📷 Face: Confirmed Front Entrance 2:14 PM
```

This is display-only: face confirmation does NOT change the presence logic. Phone ping remains authoritative for arm/disarm decisions. Face is supplementary confirmation that is useful for the audit log and daily digest.

### Face-Enhanced Presence Events

When a household member's face is detected:

- The presence event (structured like the existing `PET_DETECTED` event) includes `face_confirmed: true` in the payload
- The daily digest can report: "Aaron arrived at 2:12 PM (face confirmed)"
- No behavioral change, just richer data in the event stream

## Configuration

```toml
[visitors]
enabled = false                  # master switch, off by default
match_threshold = 0.6            # cosine distance (lower = stricter)
cameras = []                     # opt-in cameras (empty = none, even if enabled)
unknown_alert_threshold = 3      # alert after N visits from same unknown
departure_timeout_s = 300        # 5 min no detection = departed
max_embeddings_per_profile = 10  # cap storage per person
face_crop_dir = "/var/vigilar/faces"  # where face crops are stored
```

Config model:

```python
from pydantic import BaseModel, Field


class VisitorsConfig(BaseModel):
    enabled: bool = False
    match_threshold: float = 0.6
    cameras: list[str] = Field(default_factory=list)
    unknown_alert_threshold: int = 3
    departure_timeout_s: int = 300
    max_embeddings_per_profile: int = 10
    face_crop_dir: str = "/var/vigilar/faces"
```

Add to `VigilarConfig` as `visitors: VisitorsConfig`.

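The two-level opt-in (master switch plus per-camera list) can be captured in one predicate. This sketch takes the parsed `[visitors]` section as a plain dict to stay dependency-free; the function name and the camera IDs in the usage example are hypothetical.

```python
def face_recognition_active(visitors_config: dict, camera_id: str) -> bool:
    """True only if the master switch is on AND the camera is explicitly opted in."""
    if not visitors_config.get("enabled", False):
        return False
    # An empty or missing camera list means no camera is opted in, even when enabled.
    return camera_id in visitors_config.get("cameras", [])
```

For example, `face_recognition_active({"enabled": True, "cameras": ["front_door"]}, "living_room")` is `False`, which is how interior cameras stay excluded unless the user lists them.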
## Web Blueprint

### New File: `vigilar/web/blueprints/visitors.py`

Routes:

| Route | Method | Purpose |
|-------|--------|---------|
| `/visitors/` | GET | Visitor dashboard page |
| `/visitors/api/profiles` | GET | All profiles with visit stats (filterable: known/unknown/household/ignored) |
| `/visitors/api/visits` | GET | Visit log (filterable by profile_id, camera_id, and date range; `limit` capped at 500) |
| `/visitors/<id>` | GET | Profile detail page (visit history, face crops, stats) |
| `/visitors/<id>/label` | POST | Label an unknown: `{name, consent: true}`. Consent required. |
| `/visitors/<id>/link` | POST | Link to household member: `{presence_member}` |
| `/visitors/<id>/unlink` | POST | Unlink from household member |
| `/visitors/<id>/forget` | DELETE | Permanently delete profile, all embeddings, all crops, all visits |
| `/visitors/<id>/ignore` | POST | Hide from UI (set ignored=1), keep data |
| `/visitors/<id>/unignore` | POST | Unhide (set ignored=0) |

### Label Endpoint Details

`POST /visitors/<id>/label`:

```json
{
  "name": "Bob",
  "consent": true
}
```

- `consent` must be `true`, otherwise the request is rejected with 400
- Updates `face_profiles.name`
- Returns the updated profile

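The consent rule can be checked before touching the database. Below is a framework-independent sketch; `validate_label_payload` is a hypothetical helper, and rejecting a missing or blank `name` with 400 is an assumption, since the spec only defines the consent rejection.

```python
from typing import Any


def validate_label_payload(payload: Any) -> tuple[int, dict]:
    """Return (http_status, body) for a label request payload."""
    if not isinstance(payload, dict):
        return 400, {"error": "JSON object required"}
    # Only the JSON literal true counts; "true", 1, or a missing key do not.
    if payload.get("consent") is not True:
        return 400, {"error": "consent must be true"}
    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        return 400, {"error": "name is required"}  # assumption, not in the spec
    return 200, {"name": name.strip()}
```

The identity check (`is not True`) is deliberate: truthy values such as `1` or `"yes"` should not pass as consent.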
### Forget Endpoint Details

`DELETE /visitors/<id>/forget`:

1. Delete all rows from `face_embeddings` where `profile_id = id`
2. Delete all rows from `visits` where `profile_id = id`
3. Delete all face crop image files for this profile
4. Delete the `face_profiles` row
5. Remove from in-memory FaceRecognizer index
6. Return `{"status": "forgotten"}`

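A sketch of the forget steps using `sqlite3` and the filesystem directly is shown below. It makes a few assumptions: `forget_profile` is a hypothetical name, crops live under `{face_crop_dir}/{profile_id}/` as in the File Storage layout, the three row deletions are grouped into one transaction (a slight reordering of the steps above), and step 5 (the in-memory index) is left to the caller.

```python
import shutil
import sqlite3
from pathlib import Path


def forget_profile(conn: sqlite3.Connection, face_crop_dir: Path, profile_id: int) -> dict:
    """Permanently delete a profile's rows and its crop directory."""
    with conn:  # commits all three deletes together, or rolls back on error
        conn.execute("DELETE FROM face_embeddings WHERE profile_id = ?", (profile_id,))
        conn.execute("DELETE FROM visits WHERE profile_id = ?", (profile_id,))
        conn.execute("DELETE FROM face_profiles WHERE id = ?", (profile_id,))
    # Remove crops after the rows are gone; a missing directory is not an error.
    shutil.rmtree(face_crop_dir / str(profile_id), ignore_errors=True)
    return {"status": "forgotten"}
```

Grouping the row deletions in one transaction avoids a half-forgotten profile if one statement fails. Deleting files afterwards means a crash can leave orphaned images but never rows pointing at missing files.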
## Template

### New File: `vigilar/web/templates/visitors/dashboard.html`

Bootstrap 5 dark theme. Sections:

**1. Household Members**
- Cards for each linked profile
- Face thumbnail + name
- Phone presence status (HOME/AWAY with timestamp)
- Face confirmation status (last seen camera + time)
- Visit count + last visit

**2. Known Visitors**
- Grid of labeled profiles sorted by last visit
- Face thumbnail + name + visit count + last seen
- Click for profile detail page

**3. Unknown Visitors**
- Grid of unlabeled clusters sorted by visit count (descending)
- Face thumbnail + "Unknown #N" + visit count + cameras seen on
- "Label" button (opens inline form with name input + consent checkbox)
- "Ignore" button (dims the card, moves to bottom)
- "Forget" button (confirmation dialog, then permanent delete)

**4. Recent Visits Log**
- Chronological table: face thumbnail, name (or Unknown #N), camera, arrived, departed, duration
- Link to recording for each visit
- Paginated, loaded via fetch()

### Profile Detail Page: `vigilar/web/templates/visitors/profile.html`

- Large face photo + name + edit button
- Stats: total visits, first seen, last seen, most common camera, usual time of day
- Visit history table
- Face crops gallery (all stored embeddings' crops)
- Household link controls (if applicable)
- "Forget this person" danger button

## File Storage

```
/var/vigilar/faces/
  {profile_id}/
    primary.jpg      # best face crop (display photo)
    embed_001.jpg    # face crop for embedding 1
    embed_002.jpg    # face crop for embedding 2
    ...
```

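Building crop paths that follow this layout is a one-liner with `pathlib`. The helper name is hypothetical, and the three-digit zero padding is inferred from the `embed_001.jpg` examples above.

```python
from pathlib import Path


def embedding_crop_path(face_crop_dir: str, profile_id: int, index: int) -> Path:
    """Return {face_crop_dir}/{profile_id}/embed_NNN.jpg for a 1-based embedding index."""
    return Path(face_crop_dir) / str(profile_id) / f"embed_{index:03d}.jpg"
```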
## Dependencies

### New Packages

- `face_recognition >= 1.3.0`: face detection + 128-d embedding computation (depends on dlib)
- `dlib >= 19.24`: required by face_recognition (may need cmake for build)

Note: compiling dlib requires cmake and a C++ compiler; document this in the installation instructions. Pre-built wheels may not be available for every platform, so installs should be prepared to build from source.

### New Files

| File | Purpose |
|------|---------|
| `vigilar/detection/face.py` | FaceRecognizer, FaceResult |
| `vigilar/web/blueprints/visitors.py` | Visitors web blueprint |
| `vigilar/web/templates/visitors/dashboard.html` | Visitor dashboard |
| `vigilar/web/templates/visitors/profile.html` | Profile detail page |

### Modified Files

| File | Changes |
|------|---------|
| `vigilar/constants.py` | Add KNOWN_VISITOR, UNKNOWN_VISITOR, VISITOR_DEPARTED to EventType |
| `vigilar/config.py` | Add VisitorsConfig model |
| `vigilar/storage/schema.py` | Add face_profiles, face_embeddings, visits tables |
| `vigilar/storage/queries.py` | Add face/visit CRUD + query functions |
| `vigilar/camera/worker.py` | Add face recognition to person detection path (opt-in cameras) |
| `vigilar/events/processor.py` | Handle visitor events, departure timer |
| `vigilar/web/app.py` | Register visitors blueprint |
| `pyproject.toml` | Add face_recognition dependency |

## Out of Scope

- Face recognition on interior cameras by default (the user can still enable it)
- Automatic face enrollment without user action
- Age/gender/emotion detection
- Face recognition on recorded video (live detection only)
- Multi-face tracking within a single frame (faces are processed independently)
- Face anti-spoofing (photo/screen detection): YAGNI for home use
- Integration with external face databases
- GDPR compliance features (data export, retention policies): this is a personal home system