# Security & Docs Polish — Design **Date:** 2026-05-04 **Status:** Proposed **Source:** `docs/superpowers/reviews/2026-05-04-architecture-review.md` (P1.1, P1.7, P1.8) **Effort estimate:** S (whole-PR; phase totals below) ## Summary This is the one-day security and documentation PR that ships first in the v0.5.x architecture-followup train, independent of Plan B (CLI restructure) and Plan C (extension restructure). It closes the single defense-in-depth crypto gap the review flagged (`SessionHandle` has no `impl Drop`, so wasm-bindgen's `.free()` is a cleanup no-op while the master key sits in WASM linear memory), removes the JS-side error swallow that was masking that fact, brings the one undocumented security-critical core file (`recovery_qr.rs`) up to the documentation density of `crypto.rs` / `imgsecret.rs` / `backup.rs` / `tar_safe.rs`, and confirms the relay test/launcher polish items. Together these are mechanical, low-risk changes that fix the only finding in the review where "what the code looks like it does" diverges dangerously from "what it actually does" — which is exactly the wall a tinkerer learning Rust would hit hardest. ## Findings addressed - **P1.1 — `SessionHandle` has no `impl Drop`; `.free()` is a cleanup no-op** — DEV-B (Rust headline) + DEV-C (JS partner finding). Files: `crates/relicario-wasm/src/lib.rs:15-23`, `crates/relicario-wasm/src/session.rs:1-58`, `extension/src/service-worker/session.ts:26`. - **Partner JS swallow** — DEV-C P2 in service-worker. File: `extension/src/service-worker/session.ts:26` (the `try { current.free(); } catch { /* already freed */ }` block). - **P1.7 — `recovery_qr.rs` is undocumented relative to the rest of core** — DEV-A. File: `crates/relicario-core/src/recovery_qr.rs:1-130`. - **P1.8 — `tools/relay/queue.test.ts` assertion update** — DEV-C. File: `tools/relay/queue.test.ts:54`. Confirmed during planning: **already fixed in `061facd`** — line 54 now reads `assert.ok(isRole("dev-c"));` and the negative case `assert.ok(!isRole("dev-d"));` is on line 55. No code work remains; the plan tracks this as a verification step. - **Launcher follow-up — `tools/relay/start.sh` still references three roles** — DEV-C P2. File: `tools/relay/start.sh:30-86`. Confirmed during planning: **not yet fixed.** Both `launch_tmux` and `launch_kitty` open only PM, Dev-A, Dev-B (no Dev-C); the manual-mode banner still says "Open 3 new terminals"; the post-launch summary lines (`:61, :81`) print "3 windows (PM, Dev-A, Dev-B)"; no `*-dev-c-prompt.md` is discovered or used. This phase is real. ## Approach The PR is mechanical and uniform across four phases: - **Adds** one `impl Drop for SessionHandle` block in `crates/relicario-wasm/src/lib.rs`, one Rust test that exercises construct → drop → registry-empty, one module-level `//!` doc-block in `recovery_qr.rs`, one ASCII layout diagram next to the magic constants there, four per-function doc-comments on the public API there, and one `DEV_C_PROMPT` discovery line + a fourth window in each launcher mode of `start.sh`. - **Removes** the `try { current.free(); } catch { /* already freed */ }` swallow at `extension/src/service-worker/session.ts:26`. - **Verifies** every `.free()` callsite under `extension/src/` (the audit deliverable for P1.1's JS side; today the audit itself is small — only one match — but the deliverable is the recorded grep + a check that every code path that holds a `SessionHandle` calls `wasm.lock(handle)` before `free()`). - **Confirms in passing** that `tools/relay/queue.test.ts:54` already matches the new role union (it does — committed in `061facd`). No public API changes on the Rust side beyond the `Drop` impl (which is observably equivalent to "calling `lock(handle)` was previously the only way to clean up"). No JS public API changes. No file moves. No new dependencies (`wasm-bindgen-test` is already in `[dev-dependencies]` for the wasm crate). ## Implementation phases ### Phase 1 — Rust-side `impl Drop for SessionHandle` + test - **Goal:** make wasm-bindgen's auto-generated `.free()` actually clear the master key from WASM linear memory, regardless of whether JS also called `lock(handle)`. - **Changes:** - `crates/relicario-wasm/src/lib.rs` — add `impl Drop for SessionHandle { fn drop(&mut self) { let _ = session::remove(self.0); } }` immediately after the existing `#[wasm_bindgen] impl SessionHandle { ... value() ... }` block (around line 23). Update the doc-comment on `SessionHandle` (line 15) to document the contract: "Dropping (or `.free()` from JS) removes the entry from the session registry and zeroizes the wrapped key/image_secret. `lock(handle)` remains available as the explicit early-cleanup path; `Drop` is the safety net." - `crates/relicario-wasm/src/session.rs` — no functional change required, but add a one-line module-level note above `pub fn remove(...)` explicitly stating that `Drop for SessionHandle` is the second caller of this function alongside `lock()`, so that a future reader removing `lock()` doesn't accidentally orphan the cleanup path. - **Tests:** - Add a `#[wasm_bindgen_test]` (the `wasm-bindgen-test` crate is already in `[dev-dependencies]`, line 25 of `Cargo.toml`) under `crates/relicario-wasm/tests/session_drop.rs`. The test constructs a `SessionHandle` (via the public `unlock` path with synthetic JPEG + tiny `KdfParams` for fast key derivation, mirroring the existing native `session_tests::manifest_round_trip_via_handle` shape), records the underlying handle id via `handle.value()`, drops the handle, then asserts `session::with(id, |_| ()).is_none()`. The `session::with` symbol must be re-exported from `tests/` scope; if it isn't, expose `pub fn with_for_tests(handle: u32) -> bool` on the `session` module guarded by `#[cfg(test)]` so the test does not need to reach into thread-locals. - As a fallback / belt-and-suspenders, add a parallel native `#[test]` under the existing `mod session_tests` in `lib.rs` that constructs a `SessionHandle(session::insert(...))`, drops it explicitly with `drop(handle)`, then asserts `session::with(h, |_| ()).is_none()`. Native tests run on every `cargo test`; the wasm-bindgen-test runs only when someone invokes `wasm-pack test`, so the native variant is the one that catches regressions in CI. - **Effort:** S - **Depends on:** none ### Phase 2 — JS-side `.free()` audit + remove the swallow - **Goal:** make the JS side stop hiding crypto-state-transition errors, and prove (via a recorded audit) that there are no other `.free()` callsites on `SessionHandle` (or on `EncryptedAttachment` / `TotpCode`) that bypass `wasm.lock(handle)` first. - **Changes:** - `extension/src/service-worker/session.ts:24-28` — replace the body of `clearCurrent()` so that `current.free()` is called unconditionally (no `try`/`catch`) and `current` is set to `null` afterwards. The doc-block at the top of the file (lines 1-9) gets one extra paragraph noting that with `impl Drop` on the Rust side (Phase 1), `free()` now removes the registry entry — `lock(handle)` becomes a redundant early-cleanup option, but the SW currently does not call it. Document whether to keep `clearCurrent()` minimal (just `free()`) or add an explicit `wasm.lock(current.value)` immediately before `free()` for symmetry with the CLI's session lifecycle. Recommended: just `free()` post-Phase-1 — calling `lock` first is now belt-and-suspenders that adds a redundant HashMap remove. Document the rationale in the comment. - **Audit deliverable:** run `grep -rn "\.free\b" extension/src/` and record the result in the PR description. Today this returns exactly one match (`extension/src/service-worker/session.ts:26`), confirming the SW is the only surface that calls `.free()` on a wasm-bindgen handle. If a future surface (e.g. an offscreen document, a future setup-flow refactor) adds a second callsite, this PR's grep becomes the baseline that flags the new one. - **Lifecycle audit (no code change required, but record findings in PR description):** confirm every code path that constructs a `SessionHandle` via `unlock(...)` ultimately reaches `clearCurrent()`. The two paths are (a) explicit user lock via the popup `lock` action → `service-worker/router/popup-only.ts` `lock` handler → `clearCurrent()`, and (b) inactivity timer expiry → `service-worker/index.ts:51-58` session-expired callback → `clearCurrent()`. Both are already wired; this audit just records that the wiring is complete. - **Tests:** - Vitest in `extension/`: add a small unit test in `extension/src/service-worker/__tests__/session.test.ts` (create if absent) that asserts `clearCurrent()` is a no-op when `current` is `null`, calls `setCurrent({ free: spy, value: 1 } as unknown as SessionHandle)` then `clearCurrent()`, and asserts `spy` was called exactly once and `getCurrent()` returns `null`. Use the existing `__stubs__/relicario_wasm.stub.ts` pattern for the handle shape if convenient. - Optionally extend the lifecycle test: a second case where `free()` throws — assert the exception now propagates out of `clearCurrent()` (regression coverage for the swallow removal). - **Effort:** S - **Depends on:** Phase 1 (the swallow removal is only safe once the Rust side actually does the cleanup work that `.free()` is now supposed to do) ### Phase 3 — `recovery_qr.rs` documentation pass - **Goal:** raise `crates/relicario-core/src/recovery_qr.rs` to the documentation density of `crypto.rs` / `imgsecret.rs` / `backup.rs` / `tar_safe.rs` so a reader landing in the file does not assume it is out-of-the-documented-zone scratch code. - **Changes:** all in `crates/relicario-core/src/recovery_qr.rs`: - **Module-level `//!` header** at the top of the file (above the `use` block on line 1). Three short paragraphs: (1) *what* the module produces (a 109-byte payload encoded as a QR v40 EcLevel::M SVG that, given the same passphrase, recovers the 32-byte image_secret); (2) *why the format is structured this way* — XChaCha20-Poly1305 wrapping over an Argon2id-derived "wrap key" derived from the recovery passphrase (`b"relicario-recovery-v1\0"` domain-separation prefix prevents the wrap key from ever colliding with a vault master key, even if the user reuses their vault passphrase here); (3) *parameter-pinning rationale* — `production_params()` deliberately re-states `KdfParams::default()`'s values rather than referencing them, because changing the default at the type level must not silently invalidate every recovery QR a user has printed. - **ASCII layout diagram** above the magic-constant block (lines 7-9): ``` // byte field length // ------ -------------- ------ // 0..4 MAGIC = "RREC" 4 // 4..5 VERSION = 0x01 1 // 5..37 kdf_salt 32 (random per QR) // 37..61 wrap_nonce 24 (random per QR) // 61..109 ciphertext 48 (32 image_secret + 16 AEAD tag) // ------------------------------ // total 109 ``` Pair the diagram with a `const _: () = assert!(PAYLOAD_LEN == 4 + 1 + 32 + 24 + 48);` static assertion so a future edit that changes the layout cannot drift from the documented total without a compile error. - **Doc-comments on the four public items:** - `RecoveryQrPayload` (line 11): one-paragraph summary; note that `as_bytes()` is the only accessor and that the bytes are an opaque sealed package — they should travel together (printing them as a QR is the supported transport, but a hex string also works for paranoid sneakernet). - `generate_recovery_qr` (line 45): documents inputs (passphrase, image_secret), output (the 109-byte payload), and the security properties (random kdf_salt + random wrap_nonce per call ⇒ two QRs from the same passphrase + image_secret are different bytes). Cross-reference `recovery_qr_to_svg` for rendering. - `unwrap_recovery_qr` (line 81): documents inputs/output, return-type rationale (`Zeroizing<[u8; 32]>` so the recovered image_secret zeroes when dropped), and the error vocabulary (`RecoveryQr` for format errors, `Decrypt` for AEAD failure / wrong passphrase — the deliberate non-distinguishing rejection from audit M4). - `recovery_qr_to_svg` (line 122): documents the rendering choice (QR v40, EcLevel::M, 140×140 minimum dimension) and the rationale (109 bytes fits comfortably in v40 at M, leaving recovery-error-correction headroom for a smudged print). - **`production_params` (line 32):** add a doc-comment explaining the deliberate divergence-tolerance from `KdfParams::default()` — wording: "Pinned at QR-format authoring time. `KdfParams::default()` may evolve as we re-tune Argon2 cost; this function MUST NOT change values that have ever shipped, because doing so silently invalidates every recovery QR printed under a previous parameter set." Either keep the function-with-comment shape, or per the synthesis suggestion, replace it with a `const RECOVERY_PRODUCTION_PARAMS: KdfParams = KdfParams { argon2_m: 65536, argon2_t: 3, argon2_p: 4 };` if `KdfParams` is `const`-constructible (it is — checked via `crypto.rs`). Either is acceptable; the `const` form is preferred because it makes the "this is not derived from defaults" property visible at the use site. - **Optional micro-cleanups while in the file** (cite explicitly so they aren't surprising in code review): - Replace the hand-counted `payload_bytes[5..37]`, `[37..61]`, `[61..109]` slice indices with named constants (`KDF_SALT_RANGE`, `WRAP_NONCE_RANGE`, `CIPHERTEXT_RANGE`) derived from the layout offsets. Improves diff-readability if the layout ever extends. - Add a one-line comment to `recovery_kdf_input` documenting that the length-prefix on `nfc_bytes` mirrors the same convention as `crypto::derive_master_key` (audit H1 in the design spec) — this is the connective tissue that explains "why is there a `(len as u64).to_be_bytes()` here too?" - **Tests:** - No new test logic needed — the existing `tests/recovery_qr.rs` and the `RecoveryQr`/`Decrypt` round-trips in `tests/integration.rs` already pin behaviour. - The `const _: () = assert!(...)` static assertion is itself the contract test for the layout diagram. - **Effort:** S - **Depends on:** none ### Phase 4 — Relay launcher (queue.test.ts already done in `061facd`) - **Goal:** confirm the test fix that already shipped, and bring the launcher script up to the four-role world. - **Changes:** - `tools/relay/queue.test.ts` — **no change.** Confirmed during planning: line 54 is already `assert.ok(isRole("dev-c"));` and line 55 is already `assert.ok(!isRole("dev-d"));` (committed in `061facd`). This phase records the verification step in the PR description so a reviewer can see the file was checked. - `tools/relay/start.sh` — real changes: - Add a `DEV_C_PROMPT` discovery line at lines 30-33 alongside the existing `PM_PROMPT` / `DEV_A_PROMPT` / `DEV_B_PROMPT` definitions: `DEV_C_PROMPT="$(ls -t "$COORD_DIR"/*-dev-c-prompt.md 2>/dev/null | head -1 || echo "(none found)")"`. - Manual mode (`print_manual_instructions`, lines 35-51) — change "Open 3 new terminals" to "Open 4 new terminals" and add the `Terminal 4 (Dev C): cat '$DEV_C_PROMPT'` line. - tmux mode (`launch_tmux`, lines 53-69) — add a fourth `tmux new-window -t "$SESSION:" -n "dev-c" "cd '$REPO_ROOT' && claude"` line; update the summary print on line 61 from "4 windows: relay, pm, dev-a, dev-b" to "5 windows: relay, pm, dev-a, dev-b, dev-c"; add `Dev C: $DEV_C_PROMPT` to the prompts list (currently lines 64-66). - kitty mode (`launch_kitty`, lines 71-86) — add a fourth `kitty @ launch --type=tab --tab-title "Dev-C" --hold -- bash -l -i -c "cd '$REPO_ROOT' && claude"` line; update the summary print on line 81 from "3 windows (PM, Dev-A, Dev-B)" to "4 windows (PM, Dev-A, Dev-B, Dev-C)"; add `Dev C: $DEV_C_PROMPT` to the prompts list. - **Tests:** none directly. The relay launcher has no automated tests; verification is "run `bash tools/relay/start.sh --manual` and confirm the banner mentions four terminals + the Dev-C prompt path; run `bash tools/relay/start.sh --kitty` on a kitty terminal and confirm a fourth tab opens." Both checks belong in the PR's manual-test plan. - **Effort:** S - **Depends on:** none ## Risks and mitigations - **Adding `impl Drop` could change observable behaviour for any JS code that today holds two references to the same `SessionHandle` and frees one of them.** wasm-bindgen does not in fact let JS hold two refs to the same Rust struct without explicit cloning — the `SessionHandle` is moved into JS at construction and `.free()` consumes the handle pointer. The audit in Phase 2 confirmed only one `.free()` callsite under `extension/src/`. The CLI does not touch the WASM crate at all (Rust-side; the CLI uses `relicario-core` directly). So the realistic blast radius is exactly the one SW callsite this plan also touches. Mitigation: Phase 1's native test explicitly covers "drop the handle, registry is empty," and Phase 2's vitest covers "free is called exactly once." - **Removing the `try { current.free() } catch { /* already freed */ }` swallow could expose latent bugs that were silently tolerated.** Specifically, if any code path today double-frees the handle (calls `clearCurrent()` twice without a `setCurrent()` in between), the catch is currently hiding it. Reading the SW lifecycle: `clearCurrent()` is called from (a) the `lock` message handler, (b) the inactivity-timer session-expired callback, and (c) potentially the unlock-failure cleanup path. The current `if (!current) return;` early-out at line 25 already guards the simplest double-free case (calling `clearCurrent()` twice in a row is safe because the second call no-ops). The only way the catch fires today is if something in WASM raises during `free()` — which post-Phase-1 should be impossible (Drop is infallible; `session::remove` returns a bool, not a Result). Mitigation: the swallow removal happens *after* Phase 1's Drop lands, so the catch should now have nothing to catch. If a regression test surfaces a real bug, the right fix is to find and fix the bug, not re-add the swallow. - **The `recovery_qr.rs` static assertion (`const _: () = assert!(PAYLOAD_LEN == ...);`) requires Rust 1.79+** for `const` panic in test position. The workspace's `rust-toolchain.toml` (if present) or the `Cargo.toml` `rust-version` field needs to be checked; if the repo is on an older toolchain, fall back to a runtime `#[test] fn payload_len_matches_layout() { assert_eq!(PAYLOAD_LEN, 4 + 1 + 32 + 24 + 48); }` — same effect, slightly less elegant. Verify during implementation; not a blocker. - **Phase 4's launcher change is untested by automation.** A typo in the kitty/tmux command lines won't surface until someone runs `start.sh --kitty` or `start.sh --tmux`. Mitigation: the PR description includes an explicit manual-test plan ("run all three modes and confirm 4 windows + Dev-C prompt resolution") and the change pattern is symmetric to the existing Dev-A/Dev-B lines, minimizing typo surface. - **The `wasm-bindgen-test` variant in Phase 1 only runs under `wasm-pack test --node`, which is not part of the project's default `cargo test` cycle.** The native `#[test]` in `mod session_tests` is the one that protects against regressions in CI. The wasm-bindgen-test is included for symmetry and to exercise the actual WASM target's Drop behaviour — it should be invoked at least once during PR review and added to a follow-up CI workflow if the project's test infrastructure ever expands to wasm-pack. ## Out of scope This plan deliberately does not address, and explicitly leaves to other plans or the P2/P3 tail: - Plan B's CLI restructure work in its entirety (P1.2 main.rs split, P1.3 git-shell error UX, P1.10 parser migration to core, plus the CLI P2/P3 entries). These touch `crates/relicario-cli/` and are independent of this PR. - Plan C's extension restructure work in its entirety (P1.4 setup.ts WASM extraction, P1.5 vault.ts split, P1.6 shared/state.ts typing, P1.9 SW router helper extraction, plus the extension P2/P3 entries). These touch `extension/src/` and are independent of this PR — except for the single SW file that Phase 2 of this plan touches, which is small and disjoint from Plan C's surfaces. - The P2 WASM cleanups DEV-B flagged: redundant `need_key + with(...).unwrap()` double-lookup, `Vec` getter clones, the `wasm_*_recovery_qr` naming inconsistency, and the `device.rs` vs `session.rs` concurrency-primitive split. Each is independently correct and worth doing, but each is its own one-paragraph change and the PR is already covering its theme. - DEV-C's other relay P2s: queue TTL/persistence/cap, the `tools/relay/call.py` and `call.ts` tracking decision, `server.ts` per-connection factory comment. None block this PR. - The eight "Open architectural decisions" at the bottom of the synthesis (deliberate-Drop-omission, parser migration timing, bootstrap rule, `Lock` subcommand visibility, Task 12 status, call.py tracking, snake_case-vs-camelCase WASM naming, `.gitea_env_vars` gitignore). Most are user-judgement calls; this PR confirms #1 (the omission was not deliberate; the fix lands here) but takes no position on #2-#8. - Anything touching the in-flight uncommitted v0.5.x work — manifests, glyphs, vault.css, relay tooling beyond `start.sh`. Those changes are on parallel branches and are explicitly hands-off for this PR. ## Done criteria A reviewer can confirm this plan shipped by checking: - [ ] `crates/relicario-wasm/src/lib.rs` contains `impl Drop for SessionHandle` whose body removes the registry entry. - [ ] `crates/relicario-wasm/tests/session_drop.rs` (or equivalent native `#[test]` inside `mod session_tests`) asserts that dropping a `SessionHandle` removes its handle id from the session registry. `cargo test -p relicario-wasm` passes the new test. - [ ] `extension/src/service-worker/session.ts` no longer contains `try { current.free(); } catch`. The body of `clearCurrent()` calls `current.free()` unconditionally, then sets `current = null`. - [ ] `grep -rn "\.free\b" extension/src/` returns exactly one match (the SW callsite). PR description records this audit. - [ ] `extension/src/service-worker/__tests__/session.test.ts` covers `clearCurrent()` with both the no-op (no current handle) and the propagating-throw cases. `npm test` from `extension/` passes. - [ ] `crates/relicario-core/src/recovery_qr.rs` has a module-level `//!` header explaining format + domain-separation + parameter-pinning, an ASCII layout diagram next to the magic constants, doc-comments on `RecoveryQrPayload`, `generate_recovery_qr`, `unwrap_recovery_qr`, and `recovery_qr_to_svg`, plus either a `const RECOVERY_PRODUCTION_PARAMS` or a doc-commented `production_params()` explaining why it deliberately diverges from `KdfParams::default()`. `cargo test -p relicario-core` still passes. - [ ] `tools/relay/queue.test.ts:54-55` verified to already match the new role union (no change required, recorded in PR description). - [ ] `tools/relay/start.sh` discovers a `*-dev-c-prompt.md`, opens a fourth window/terminal in tmux and kitty modes, and the manual-mode banner mentions all four roles. `bash tools/relay/start.sh --manual` shows "Open 4 new terminals" and lists the Dev-C prompt path.