fix: unify PIN hashing across CLI, FSM, and web (closes #2) #7

Merged
alee merged 16 commits from fix/issue-2-pin-unification into main 2026-04-05 16:59:51 +00:00
Owner

Closes #2.

Stacking: This PR is stacked on top of #5 (test isolation fixture, fix/issue-3-test-isolation). Merge #5 first; this will auto-rebase onto main.

Problem (recap from the issue)

Three incompatible PIN hashing schemes across two config keys:

Component Hash scheme Config key
Web arm/disarm/reset (via alerts/pin.py) PBKDF2-SHA256, 600k iterations, salted [security] pin_hash
ArmStateFSM (events/state.py) Unsalted SHA-256(pin) [system] arm_pin_hash
CLI set-pin (cli/cmd_config.py) HMAC-SHA256(random, pin) [system] arm_pin_hash

Consequences: vigilar config set-pin output was unusable by any verifier; the web arm endpoint checked the PIN but never transitioned state (stubbed {"ok": true} without calling the FSM); the FSM used trivially brute-forceable unsalted SHA-256.

Resolution

  • One canonical scheme: PBKDF2-SHA256 via alerts/pin (hash_pin / verify_pin).
  • One canonical key: [security] pin_hash.
  • Legacy field deprecated: [system] arm_pin_hash is still parsed (old configs don't fail validation) but ignored at runtime. A WARNING logs at startup if it's set and the new key isn't.
  • Web arm/disarm actually transitions now: POST /system/api/arm publishes a SYSTEM_ARM_REQUEST message on localhost MQTT with {mode, pin, triggered_by: "web"}. The event processor subscribes, ArmStateFSM verifies the PIN via alerts.pin.verify_pin and performs the transition. HTTP response is 202 Accepted with {"ok": true, "mode": ...} (matches the codebase convention used by cameras.py:108 and the apiPost JS helper). Clients poll /system/status for the new state.
  • /system/status no longer a stub: reads the real arm state from arm_state_log via get_current_arm_state. Without this, polling after the new async flow would have been a UX dead-end.
  • Adjacent fix: vigilar config show now redacts security.recovery_passphrase_hash (found during Task 3 code review — was leaking).
  • Reliability: _publish_arm_request guards against silent MQTT connect-timeout failures — paho's connect() logs and returns without raising on handshake timeout, which would have produced 202 responses with no MQTT delivery. We check bus.connected after connect and raise explicitly so the web returns 503.

Security model notes

  • The plaintext PIN travels over the loopback MQTT bus (127.0.0.1, anonymous, no persistence). Same trust boundary as every other internal bus message in Vigilar.
  • The HTTP layer no longer returns 401 on wrong PINs — every arm/disarm attempt becomes 202 and the FSM handles verification/audit. This is standard enumeration-resistance posture.
  • All arm/disarm attempts now get audited by the FSM regardless of PIN validity (wrong PINs were previously rejected at the HTTP layer and never reached the audit log).

Commits (16)

  1. cdd8cf4 feat(constants): add Topics.SYSTEM_ARM_REQUEST
  2. 7fda351 fix(events): ArmStateFSM uses PBKDF2 via alerts.pin
  3. af2ac56 refactor(events): drop verify_pin alias and clarify audit-log comment
  4. 8ffe1a3 fix(cli): set-pin emits PBKDF2 under [security] pin_hash
  5. 94cc206 fix(cli): redact security.recovery_passphrase_hash in show_cmd
  6. c9dd348 feat(config): deprecation warning for [system] arm_pin_hash
  7. 7006078 style(config): move log handle below import block
  8. f4d66dd feat(events): processor handles SYSTEM_ARM_REQUEST over MQTT
  9. c275404 refactor(events): drop forward-ref quote and test triggered_by default
  10. efd5c4a fix(web): arm/disarm actually transition the FSM via MQTT
  11. 9f203d8 fix(web): align arm/disarm 202 response shape with {"ok": true}
  12. 08c689e fix(web): raise on MQTT connect timeout in _publish_arm_request
  13. ac52198 test: end-to-end PIN unification regression guard
  14. d72c04e docs(operator-guide): PIN hashing is unified
  15. 038ab0a fix: address final-review items (status endpoint, docs, tests)

Plus 45afaf8 (plan doc) at the base of the branch.

Test plan

  • ArmStateFSM PBKDF2 verify pathtests/unit/test_events.py::TestArmStateFSM
  • CLI set-pin emits PBKDF2 under [security]tests/unit/test_cli_set_pin.py
  • Deprecation warning fires for legacy keytests/unit/test_config.py::test_deprecation_warning_for_arm_pin_hash
  • No warning on migrated configtests/unit/test_config.py::test_no_deprecation_warning_when_security_pin_hash_set
  • Processor dispatches SYSTEM_ARM_REQUESTtests/unit/test_events.py::TestArmRequestDispatch (happy path, bad mode, default triggered_by)
  • Web arm/disarm publish and return 202tests/unit/test_system_pin.py
  • /system/status reflects real FSM statetests/unit/test_web.py::test_system_status_reflects_fsm_arm_state
  • verify_pin fails closed on malformed hashtests/unit/test_pin.py::test_verify_pin_rejects_malformed_hash
  • End-to-end: CLI output → FSM verifytests/unit/test_pin_unification.py
  • Full suite: 372 passed (was 360 at branch start, +12 new tests)
  • config/vigilar.toml stays clean after pytest (thanks to the session fixture from #5)

Migration note for operators

If you previously ran vigilar config set-pin, you will see a deprecation warning on the next start. Re-run the command and paste the new pin_hash = "pbkdf2_sha256$..." line into [security] (not [system]). The old [system] arm_pin_hash can then be deleted.

Closes #2. **Stacking:** This PR is stacked on top of #5 (test isolation fixture, `fix/issue-3-test-isolation`). Merge #5 first; this will auto-rebase onto main. ## Problem (recap from the issue) Three incompatible PIN hashing schemes across two config keys: | Component | Hash scheme | Config key | |---|---|---| | Web arm/disarm/reset (via alerts/pin.py) | PBKDF2-SHA256, 600k iterations, salted | [security] pin_hash | | ArmStateFSM (events/state.py) | Unsalted SHA-256(pin) | [system] arm_pin_hash | | CLI set-pin (cli/cmd_config.py) | HMAC-SHA256(random, pin) | [system] arm_pin_hash | Consequences: `vigilar config set-pin` output was unusable by any verifier; the web arm endpoint checked the PIN but never transitioned state (stubbed `{"ok": true}` without calling the FSM); the FSM used trivially brute-forceable unsalted SHA-256. ## Resolution - **One canonical scheme:** PBKDF2-SHA256 via `alerts/pin` (`hash_pin` / `verify_pin`). - **One canonical key:** `[security] pin_hash`. - **Legacy field deprecated:** `[system] arm_pin_hash` is still parsed (old configs don't fail validation) but ignored at runtime. A WARNING logs at startup if it's set and the new key isn't. - **Web arm/disarm actually transitions now:** `POST /system/api/arm` publishes a `SYSTEM_ARM_REQUEST` message on localhost MQTT with `{mode, pin, triggered_by: "web"}`. The event processor subscribes, `ArmStateFSM` verifies the PIN via `alerts.pin.verify_pin` and performs the transition. HTTP response is `202 Accepted` with `{"ok": true, "mode": ...}` (matches the codebase convention used by `cameras.py:108` and the `apiPost` JS helper). Clients poll `/system/status` for the new state. - **`/system/status` no longer a stub:** reads the real arm state from `arm_state_log` via `get_current_arm_state`. Without this, polling after the new async flow would have been a UX dead-end. - **Adjacent fix:** `vigilar config show` now redacts `security.recovery_passphrase_hash` (found during Task 3 code review — was leaking). - **Reliability:** `_publish_arm_request` guards against silent MQTT connect-timeout failures — paho's `connect()` logs and returns without raising on handshake timeout, which would have produced 202 responses with no MQTT delivery. We check `bus.connected` after connect and raise explicitly so the web returns 503. ## Security model notes - The plaintext PIN travels over the loopback MQTT bus (127.0.0.1, anonymous, no persistence). Same trust boundary as every other internal bus message in Vigilar. - The HTTP layer no longer returns 401 on wrong PINs — every arm/disarm attempt becomes 202 and the FSM handles verification/audit. This is standard enumeration-resistance posture. - All arm/disarm attempts now get audited by the FSM regardless of PIN validity (wrong PINs were previously rejected at the HTTP layer and never reached the audit log). ## Commits (16) 1. `cdd8cf4` feat(constants): add Topics.SYSTEM_ARM_REQUEST 2. `7fda351` fix(events): ArmStateFSM uses PBKDF2 via alerts.pin 3. `af2ac56` refactor(events): drop verify_pin alias and clarify audit-log comment 4. `8ffe1a3` fix(cli): set-pin emits PBKDF2 under [security] pin_hash 5. `94cc206` fix(cli): redact security.recovery_passphrase_hash in show_cmd 6. `c9dd348` feat(config): deprecation warning for [system] arm_pin_hash 7. `7006078` style(config): move log handle below import block 8. `f4d66dd` feat(events): processor handles SYSTEM_ARM_REQUEST over MQTT 9. `c275404` refactor(events): drop forward-ref quote and test triggered_by default 10. `efd5c4a` fix(web): arm/disarm actually transition the FSM via MQTT 11. `9f203d8` fix(web): align arm/disarm 202 response shape with {"ok": true} 12. `08c689e` fix(web): raise on MQTT connect timeout in _publish_arm_request 13. `ac52198` test: end-to-end PIN unification regression guard 14. `d72c04e` docs(operator-guide): PIN hashing is unified 15. `038ab0a` fix: address final-review items (status endpoint, docs, tests) Plus `45afaf8` (plan doc) at the base of the branch. ## Test plan - **ArmStateFSM PBKDF2 verify path** — `tests/unit/test_events.py::TestArmStateFSM` - **CLI set-pin emits PBKDF2 under [security]** — `tests/unit/test_cli_set_pin.py` - **Deprecation warning fires for legacy key** — `tests/unit/test_config.py::test_deprecation_warning_for_arm_pin_hash` - **No warning on migrated config** — `tests/unit/test_config.py::test_no_deprecation_warning_when_security_pin_hash_set` - **Processor dispatches SYSTEM_ARM_REQUEST** — `tests/unit/test_events.py::TestArmRequestDispatch` (happy path, bad mode, default triggered_by) - **Web arm/disarm publish and return 202** — `tests/unit/test_system_pin.py` - **`/system/status` reflects real FSM state** — `tests/unit/test_web.py::test_system_status_reflects_fsm_arm_state` - **verify_pin fails closed on malformed hash** — `tests/unit/test_pin.py::test_verify_pin_rejects_malformed_hash` - **End-to-end: CLI output → FSM verify** — `tests/unit/test_pin_unification.py` - **Full suite: 372 passed** (was 360 at branch start, +12 new tests) - **`config/vigilar.toml` stays clean after pytest** (thanks to the session fixture from #5) ## Migration note for operators If you previously ran `vigilar config set-pin`, you will see a deprecation warning on the next start. Re-run the command and paste the new `pin_hash = "pbkdf2_sha256$..."` line into `[security]` (not `[system]`). The old `[system] arm_pin_hash` can then be deleted.
alee added 17 commits 2026-04-05 16:28:08 +00:00
Some web endpoint handlers call _save_and_reload(), which resolves the
target path via VIGILAR_CONFIG env var with a fallback to the relative
"config/vigilar.toml". Any test exercising such an endpoint without
setting the env var rewrites the repo's committed config file via a
Pydantic model_dump round-trip, stripping comments and non-default
fields. The culprit discovered was test_reset_pin_correct_passphrase
in tests/unit/test_system_pin.py.

Add an autouse session-scoped fixture in tests/conftest.py that points
VIGILAR_CONFIG at a path inside pytest's session tmp dir so no test
can touch the real file. Restore the previous env var value on teardown.

Fixes #3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plan document for issue #2 — the three-way PIN hash mismatch across
CLI, events FSM, and web arm/disarm. Proposes canonicalizing on
PBKDF2-SHA256 via alerts/pin and [security] pin_hash, deprecating
[system] arm_pin_hash, and wiring web arm/disarm through MQTT to the
FSM so the web buttons actually transition state.

Nine tasks, TDD throughout. No code changes in this commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Topic for web-originated arm/disarm requests that the event processor
will subscribe to and dispatch to ArmStateFSM.transition. Part of the
PIN unification work (issue #2).
Was: unsalted SHA-256 read from [system] arm_pin_hash.
Now: PBKDF2-SHA256 600k iterations read from [security] pin_hash,
matching the web arm/disarm path and the alerts/pin module.

Also drops the redundant pin re-hash on the arm_state_log audit row
(a fresh PBKDF2 salt made the column valueless for traceability).

Part of issue #2 PIN hashing unification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code review feedback on the Task 2 commit (7fda351):
- The 'verify_pin as _verify_pin_hash' alias was unnecessary — the
  method self.verify_pin and the module-level verify_pin do not
  collide (one is accessed via self, the other via the bare name).
  Removing the alias matches how web/blueprints/system.py already
  imports verify_pin and makes the call site read cleanly.
- The comment on the insert_arm_state None argument now explains
  WHY (PBKDF2 salt is fresh per call, so re-hashing is worthless for
  audit correlation) instead of only referencing the issue.

No behavior change. Part of issue #2.
Was: HMAC-SHA256(random, pin) written to [system] arm_pin_hash —
no verifier in the codebase accepted this output.
Now: PBKDF2-SHA256 via alerts.pin.hash_pin written to [security]
pin_hash, matching what the web and FSM paths verify against.

Also fixes show_cmd to redact the new location.
Adjacent secret leak in show_cmd noticed during Task 3 code review.
SecurityConfig has two sensitive fields and the redaction block only
covered pin_hash. vigilar config show would print the recovery
passphrase hash verbatim whenever one was configured.

One-line fix; same redaction pattern as the surrounding secrets.
Part of issue #2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If a config still has the legacy [system] arm_pin_hash set but no
[security] pin_hash, load_config logs a WARNING telling the operator
to re-run 'vigilar config set-pin'. The legacy field is still parsed
(so old configs don't fail validation) but ignored at runtime.

Part of issue #2 PIN hashing unification.
Code review follow-up on c9dd348 — the log = logging.getLogger(__name__)
assignment was interleaved between 'import X' and 'from X import Y'
statements. Move it below all imports per standard ordering.

No behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds _handle_arm_request and a dedicated bus.subscribe on
Topics.SYSTEM_ARM_REQUEST. Payload {mode, pin, triggered_by} is
dispatched to ArmStateFSM.transition, which verifies the PIN via
alerts.pin.verify_pin and performs the state change.

This is the missing link for web /system/api/arm to actually move
the system into an armed state. Part of issue #2.
Code review follow-up on f4d66dd:
- _handle_arm_request signature used "ArmStateFSM" as a string forward
  reference even though the type is imported at module top.
  _handle_event uses the bare form; match it for consistency.
- Add a test asserting that omitting triggered_by in an arm-request
  payload defaults to "unknown". That value feeds the audit log, so
  it deserves explicit regression coverage.

No behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was: /system/api/arm verified the PIN against [security] pin_hash and
returned {ok: true} without ever calling the FSM. State never changed.
Now: the endpoint publishes a SYSTEM_ARM_REQUEST message to the local
MQTT broker. The event processor (see previous commit) picks it up,
ArmStateFSM verifies the PIN via alerts.pin.verify_pin and performs
the transition. Response is 202 Accepted; clients poll /system/status
for the new state.

Design: PIN travels over localhost-only MQTT, which matches the
existing trust boundary for the internal bus.
Follow-up to efd5c4a. The plan invented {"accepted": True, ...} for
the new 202 responses, but every other 2xx endpoint in the Flask app
returns {"ok": True, ...} — including cameras.py:108 which is direct
prior art for a 202 with the same convention. The shared JS helper
at static/js/settings.js:54 does 'if (resp.ok && result.ok)' and was
falling into the error branch on our success responses, showing a
bogus "Save failed" toast after every arm/disarm click.

Keep the 202 status. Swap the body key from 'accepted' to 'ok'.
No JS change needed.
Code review on 9f203d8 caught a silent-failure mode: MessageBus.connect
logs and returns without raising when the MQTT handshake times out, so
an overloaded broker would let bus.publish() enqueue into paho's outbox
only to be discarded by the immediate disconnect(). The web endpoint
would return 202 even though the FSM never received the request.

Guard with 'if not bus.connected: raise RuntimeError'. The existing
try/except in arm_system/disarm_system catches the exception and turns
it into a 503 with the same log message as other bus failures.
Describes the canonical [security] pin_hash key, the PBKDF2 format
emitted by 'vigilar config set-pin', and the deprecation warning for
the legacy [system] arm_pin_hash. Drops the three-way mismatch
known-limitation.
Follow-up to the holistic review of the PIN-unification branch:

- /system/status now reads the real arm state from the arm_state_log
  table via get_current_arm_state, instead of returning a hardcoded
  'DISARMED' stub. Without this, polling after the new async 202
  arm/disarm flow was a UX dead-end — clients never saw the state
  change they just requested. DB read failures degrade gracefully.

- Operator guide: correct the claim that 'vigilar config set-pin'
  populates recovery_passphrase_hash. It doesn't. recovery_passphrase
  _hash has no CLI helper today; it must be set manually.

- Tests: add a fail-closed regression for verify_pin on malformed
  stored hashes, and a companion test confirming the deprecation
  warning stays silent on a fully migrated config.

All address specific review comments on the branch; no scope creep.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alee force-pushed fix/issue-2-pin-unification from 038ab0af12 to 5745388880 2026-04-05 16:58:44 +00:00 Compare
Author
Owner

Ship it.

Ship it.
alee merged commit 5745388880 into main 2026-04-05 16:59:51 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: alee/vigilar#7