fieldwitness/docs/architecture/chain-format.md
Aaron D. Lee 6325e86873

Chain Format Specification

Status: Design
Version: 1 (record format version)
Last updated: 2026-04-01

1. Overview

The attestation chain is an append-only sequence of signed records stored locally on the offline device. Each record includes a hash of the previous record, forming a tamper-evident chain analogous to git commits or blockchain blocks.

The chain wraps existing Verisoo attestation records. A Verisoo record's serialized bytes become the input to content_hash, preserving the original attestation while adding ordering, entropy witnesses, and chain integrity guarantees.

2. AttestationChainRecord

Field Definitions

| Field | CBOR Key | Type | Size | Description |
|---|---|---|---|---|
| version | 0 | unsigned int | 1 byte | Record format version. Currently 1. |
| record_id | 1 | byte string | 16 bytes | UUID v7 (RFC 9562). Time-ordered unique identifier. |
| chain_index | 2 | unsigned int | 8 bytes max | Monotonically increasing, 0-based. Genesis record is index 0. |
| prev_hash | 3 | byte string | 32 bytes | SHA-256 of canonical_bytes(previous_record). Genesis: 0x00 * 32. |
| content_hash | 4 | byte string | 32 bytes | SHA-256 of the wrapped content (e.g., Verisoo record bytes). |
| content_type | 5 | text string | variable | MIME-like type identifier. "verisoo/attestation-v1" for Verisoo records. |
| metadata | 6 | CBOR map | variable | Extensible key-value map. See §2.1. |
| claimed_ts | 7 | integer | 8 bytes max | Unix timestamp in microseconds (µs). Signed integer to handle pre-epoch dates. |
| entropy_witnesses | 8 | CBOR map | variable | System entropy snapshot. See §3. |
| signer_pubkey | 9 | byte string | 32 bytes | Ed25519 raw public key bytes. |
| signature | 10 | byte string | 64 bytes | Ed25519 signature over canonical_bytes(record) excluding the signature field. |
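
The field table can be mirrored as an in-memory type. The following is a minimal sketch, not the project's actual class: the name `AttestationChainRecord` comes from the section heading, but the validation logic, the `GENESIS_PREV_HASH` constant, and the choice to represent an unsigned record with an empty `signature` are assumptions made for illustration.

```python
from dataclasses import dataclass

GENESIS_PREV_HASH = bytes(32)  # 32 zero bytes, per the prev_hash row


@dataclass(frozen=True)
class AttestationChainRecord:
    version: int
    record_id: bytes        # 16-byte UUID v7
    chain_index: int        # unsigned, fits in 8 bytes
    prev_hash: bytes        # 32-byte SHA-256
    content_hash: bytes     # 32-byte SHA-256
    content_type: str
    metadata: dict
    claimed_ts: int         # Unix microseconds, signed
    entropy_witnesses: dict
    signer_pubkey: bytes    # 32-byte raw Ed25519 public key
    signature: bytes = b""  # assumption: empty until the record is signed

    def __post_init__(self):
        # Enforce the fixed sizes listed in the Size column.
        fixed = {"record_id": 16, "prev_hash": 32,
                 "content_hash": 32, "signer_pubkey": 32}
        for name, size in fixed.items():
            if len(getattr(self, name)) != size:
                raise ValueError(f"{name} must be {size} bytes")
        if self.signature and len(self.signature) != 64:
            raise ValueError("signature must be 64 bytes")
        if not 0 <= self.chain_index < 2**64:
            raise ValueError("chain_index must fit in an unsigned 64-bit integer")
        if not -(2**63) <= self.claimed_ts < 2**63:
            raise ValueError("claimed_ts must fit in a signed 64-bit integer")
```

Rejecting wrong-sized byte strings at construction time keeps malformed records from ever reaching the hashing and signing steps.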

2.1 Metadata Map

The metadata field is an open CBOR map with text string keys. Defined keys:

| Key | Type | Description |
|---|---|---|
| "backfilled" | bool | true if this record was created by the backfill migration |
| "caption" | text | Human-readable description of the attested content |
| "location" | text | Location name associated with the attestation |
| "original_ts" | integer | Original Verisoo timestamp (µs) if different from claimed_ts |
| "tags" | array of text | User-defined classification tags |

Applications may add custom keys. Unknown keys must be preserved during serialization.
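
One way to honor the preservation rule is to type-check only the defined keys and pass everything else through untouched. This is a sketch under assumptions: `validate_metadata` and `DEFINED_METADATA_TYPES` are hypothetical names, and the specification does not say that implementations must reject wrongly typed values.

```python
DEFINED_METADATA_TYPES = {
    "backfilled": bool,
    "caption": str,
    "location": str,
    "original_ts": int,
    "tags": list,
}


def validate_metadata(metadata: dict) -> dict:
    """Type-check the defined keys and return a copy that keeps unknown keys."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError("metadata keys must be text strings")
        expected = DEFINED_METADATA_TYPES.get(key)
        if expected is None:
            continue  # custom key: preserved as-is
        # bool is a subclass of int in Python, so reject it for integer fields.
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{key} must be an integer")
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
    if not all(isinstance(tag, str) for tag in metadata.get("tags", [])):
        raise TypeError("tags must be an array of text")
    return dict(metadata)
```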

3. Entropy Witnesses

Entropy witnesses are system-state snapshots collected at record creation time. They serve as soft evidence that the claimed timestamp is plausible. Fabricating convincing witnesses for a backdated record requires simulating the full system state at the claimed time.

| Field | CBOR Key | Type | Source | Fallback (non-Linux) |
|---|---|---|---|---|
| sys_uptime | 0 | float64 | time.monotonic() | Same (cross-platform) |
| fs_snapshot | 1 | byte string (16 bytes) | SHA-256 of os.stat() on chain DB, truncated to 16 bytes | SHA-256 of chain dir stat |
| proc_entropy | 2 | unsigned int | /proc/sys/kernel/random/entropy_avail | len(os.urandom(32)) (always 32, marker for non-Linux) |
| boot_id | 3 | text string | /proc/sys/kernel/random/boot_id | uuid.uuid4() cached per process lifetime |
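
The table above could be collected roughly as follows. This is an illustrative sketch only: the function names are hypothetical, and the byte layout fed into the `fs_snapshot` hash (decimal fields joined with colons) is an assumption, since the specification names the stat fields but not how they are serialized. The caller is expected to pass the chain DB path on Linux and the chain directory elsewhere.

```python
import hashlib
import os
import time
import uuid
from pathlib import Path

_FALLBACK_BOOT_ID = str(uuid.uuid4())  # cached for the process lifetime


def _read_text(path):
    """Return stripped file contents, or None if the file is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def fs_snapshot(stat_target) -> bytes:
    st = os.stat(stat_target)
    # Assumed layout: mtime, ctime, size, inode as colon-separated decimals.
    material = f"{st.st_mtime_ns}:{st.st_ctime_ns}:{st.st_size}:{st.st_ino}"
    return hashlib.sha256(material.encode()).digest()[:16]


def collect_entropy_witnesses(stat_target) -> dict:
    """Return the witness map keyed by CBOR key (0..3)."""
    entropy = _read_text("/proc/sys/kernel/random/entropy_avail")
    boot_id = _read_text("/proc/sys/kernel/random/boot_id")
    return {
        0: time.monotonic(),
        1: fs_snapshot(stat_target),
        # Fallback is always 32, which marks a non-Linux source.
        2: int(entropy) if entropy and entropy.isdigit() else len(os.urandom(32)),
        3: boot_id or _FALLBACK_BOOT_ID,
    }
```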

Witness Properties

  • sys_uptime: Monotonically increasing within a boot session; it cannot decrease. A record with sys_uptime < previous_record.sys_uptime and claimed_ts > previous_record.claimed_ts indicates either a reboot (in which case boot_id also changes) or clock manipulation.
  • fs_snapshot: Changes with every write to the chain DB. Hash includes mtime, ctime, size, and inode number.
  • proc_entropy: Varies naturally. On Linux, reflects kernel entropy pool state.
  • boot_id: Changes on every reboot. Identical boot_id across records implies same boot session — combined with sys_uptime, this constrains the timeline.
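
The witness properties suggest simple plausibility checks between consecutive records. The sketch below is one possible heuristic, not part of the specification: `witness_flags`, the flag strings, and the `tolerance_s` threshold are all hypothetical, and the drift comparison is an added assumption about how boot_id and sys_uptime together constrain the timeline. A monotonic clock may not advance during suspend, so a drift flag is a prompt for review rather than proof of tampering.

```python
def witness_flags(prev: dict, curr: dict, tolerance_s: float = 5.0) -> list:
    """Compare two records' witnesses; each dict has boot_id, sys_uptime, claimed_ts (µs)."""
    flags = []
    same_boot = prev["boot_id"] == curr["boot_id"]
    uptime_delta = curr["sys_uptime"] - prev["sys_uptime"]
    wall_delta = (curr["claimed_ts"] - prev["claimed_ts"]) / 1_000_000

    # Uptime went backwards while the claimed time moved forwards.
    if uptime_delta < 0 and wall_delta > 0:
        if same_boot:
            flags.append("uptime decreased within one boot session")
        else:
            flags.append("reboot between records")

    # Within one boot session, wall-clock and uptime deltas should roughly agree.
    if same_boot and uptime_delta >= 0 and abs(wall_delta - uptime_delta) > tolerance_s:
        flags.append("claimed_ts delta disagrees with uptime delta")
    return flags
```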

4. Serialization

4.1 Canonical Bytes

canonical_bytes(record) produces the deterministic byte representation used for hashing and signing. It is a CBOR map containing all fields except signature, encoded using CBOR canonical encoding (RFC 8949 §4.2):

  • Map keys sorted by integer value (0, 1, 2, ..., 9)
  • Integers use minimal-length encoding
  • No indefinite-length items
  • No duplicate keys

canonical_bytes(record) = cbor2.dumps({
    0: record.version,
    1: record.record_id,
    2: record.chain_index,
    3: record.prev_hash,
    4: record.content_hash,
    5: record.content_type,
    6: record.metadata,
    7: record.claimed_ts,
    8: {
        0: record.entropy_witnesses.sys_uptime,
        1: record.entropy_witnesses.fs_snapshot,
        2: record.entropy_witnesses.proc_entropy,
        3: record.entropy_witnesses.boot_id,
    },
    9: record.signer_pubkey,
}, canonical=True)
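
To make the encoding rules concrete, here is a small standard-library encoder that applies them by hand. It is a teaching sketch and an assumption throughout: it is not the cbor2 implementation, it covers only the types this record format uses, and it sorts map keys by the bytewise order of their encoded form as described in RFC 8949 §4.2.1, which yields numeric order for the integer keys 0 through 10.

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a major type plus argument in the shortest possible form."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                              (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | extra]) + struct.pack(fmt, value)
    raise ValueError("integer out of range for this sketch")


def _float(value: float) -> bytes:
    """Use the shortest float width that round-trips the value exactly."""
    if value != value:  # NaN
        return b"\xf9\x7e\x00"
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed
    return b"\xfb" + struct.pack(">d", value)


def encode(obj) -> bytes:
    if obj is None:
        return b"\xf6"
    if isinstance(obj, bool):  # must precede int: bool subclasses int
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        return _float(obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(encode(item) for item in obj)
    if isinstance(obj, dict):
        # Definite-length map with keys sorted by their encoded bytes.
        pairs = sorted((encode(k), encode(v)) for k, v in obj.items())
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

For example, `encode({1: 2, 0: 1})` yields the bytes `a2 00 01 01 02` regardless of the insertion order of the dictionary, which is the property hashing and signing depend on.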

4.2 Record Hash

compute_record_hash(record) = SHA-256(canonical_bytes(record))

This hash is used as prev_hash in the next record and as a Merkle tree leaf in export bundles.

4.3 Signature

record.signature = Ed25519_Sign(private_key, canonical_bytes(record))

Verification:

Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record))
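
The signing and verification steps might look like the following. The specification does not name a cryptography library, so the use of the third-party `cryptography` package here is an assumption, as are the helper names. The functions take the output of canonical_bytes(record) as plain bytes.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def raw_public_key(private_key: Ed25519PrivateKey) -> bytes:
    """Return the 32-byte raw public key stored in signer_pubkey."""
    return private_key.public_key().public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    )


def sign_canonical(private_key: Ed25519PrivateKey, canonical: bytes) -> bytes:
    """Return the 64-byte Ed25519 signature over the canonical bytes."""
    return private_key.sign(canonical)


def verify_canonical(signer_pubkey: bytes, signature: bytes, canonical: bytes) -> bool:
    """Return True if the signature is valid for these canonical bytes."""
    try:
        Ed25519PublicKey.from_public_bytes(signer_pubkey).verify(signature, canonical)
    except InvalidSignature:
        return False
    return True
```

Because the signature field is excluded from the canonical bytes, the same byte string serves as input to both signing and hashing.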

4.4 Full Serialization

serialize_record(record) produces the full CBOR encoding including the signature field (CBOR key 10). This is used for storage and transmission.

serialize_record(record) = cbor2.dumps({
    0: record.version,
    1: record.record_id,
    ...
    9: record.signer_pubkey,
    10: record.signature,
}, canonical=True)

5. Chain Rules

5.1 Genesis Record

The first record in a chain (index 0) has:

  • chain_index = 0
  • prev_hash = b'\x00' * 32 (32 zero bytes)

The chain ID is defined as SHA-256(canonical_bytes(genesis_record)). This permanently identifies the chain.

5.2 Append Rule

For record N (where N > 0):

record_N.chain_index == record_{N-1}.chain_index + 1
record_N.prev_hash == compute_record_hash(record_{N-1})
record_N.claimed_ts >= record_{N-1}.claimed_ts  (SHOULD, not MUST — clock skew possible)
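
The append rule can be checked mechanically. In this sketch the `Link` type is a hypothetical stand-in that carries only the fields the rule needs, with `canonical` holding the already computed canonical_bytes(record). The timestamp condition is reported as a warning rather than an error because the rule is a SHOULD.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    chain_index: int
    prev_hash: bytes
    claimed_ts: int
    canonical: bytes  # stand-in for canonical_bytes(record)


def compute_record_hash(link: Link) -> bytes:
    return hashlib.sha256(link.canonical).digest()


def check_append(prev: Link, new: Link):
    """Return (errors, warnings) for appending `new` after `prev`."""
    errors, warnings = [], []
    if new.chain_index != prev.chain_index + 1:
        errors.append("chain_index must be previous index + 1")
    if new.prev_hash != compute_record_hash(prev):
        errors.append("prev_hash does not match hash of previous record")
    if new.claimed_ts < prev.claimed_ts:
        warnings.append("claimed_ts is earlier than the previous record")
    return errors, warnings
```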

5.3 Verification

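Full chain verification performs the following checks for each record from index 0 to head. The genesis record has no predecessor, so for it check 3 compares prev_hash against 32 zero bytes (§5.1):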

  1. Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record)) — signature valid
  2. record.chain_index == expected_index — no gaps or duplicates
  3. record.prev_hash == compute_record_hash(previous_record) — chain link intact
  4. All signer_pubkey values are identical within a chain (single-signer chain)

Violation of rule 4 indicates a chain was signed by multiple identities, which may be legitimate (key rotation) or malicious (chain hijacking). Key rotation is out of scope for v1; implementations should flag this as a warning.
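
A full verification pass over the four checks could be sketched as below. The `StoredRecord` type and the injected `verify_signature` callable are hypothetical; the callable stands in for Ed25519_Verify so the loop stays independent of any particular cryptography library. A signer change is reported as a warning, following the guidance above.

```python
import hashlib
from typing import NamedTuple


class StoredRecord(NamedTuple):
    chain_index: int
    prev_hash: bytes
    signer_pubkey: bytes
    signature: bytes
    canonical: bytes  # stand-in for canonical_bytes(record)


def verify_chain(records, verify_signature):
    """Return (errors, warnings) after checking every record from index 0 to head."""
    errors, warnings = [], []
    expected_prev = bytes(32)  # genesis links to 32 zero bytes
    first_signer = None
    for expected_index, rec in enumerate(records):
        if not verify_signature(rec.signer_pubkey, rec.signature, rec.canonical):
            errors.append(f"record {expected_index}: invalid signature")
        if rec.chain_index != expected_index:
            errors.append(f"record {expected_index}: unexpected chain_index {rec.chain_index}")
        if rec.prev_hash != expected_prev:
            errors.append(f"record {expected_index}: broken chain link")
        if first_signer is None:
            first_signer = rec.signer_pubkey
        elif rec.signer_pubkey != first_signer:
            warnings.append(f"record {expected_index}: signer differs from genesis signer")
        expected_prev = hashlib.sha256(rec.canonical).digest()
    return errors, warnings
```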

6. Storage Format

6.1 chain.bin (Append-Only Log)

Records are stored sequentially as length-prefixed CBOR:

┌─────────────────────────────┐
│ uint32 BE: record_0 length  │
│ bytes: serialize(record_0)  │
├─────────────────────────────┤
│ uint32 BE: record_1 length  │
│ bytes: serialize(record_1)  │
├─────────────────────────────┤
│ ...                         │
└─────────────────────────────┘

  • Length prefix is 4 bytes, big-endian unsigned 32-bit integer
  • Maximum record size: 2^32 - 1 bytes (just under 4 GiB); the practical limit is much smaller
  • File is append-only; records are never modified or deleted
  • File locking via fcntl.flock(LOCK_EX) for single-writer safety
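
The log layout above maps directly onto a pair of helpers. This is a minimal sketch with hypothetical function names; it operates on already serialized record bytes, and the `fsync` call is an added assumption about durability that the specification does not state. `fcntl` is available on Unix-like systems only.

```python
import fcntl
import os
import struct

LENGTH_PREFIX = struct.Struct(">I")  # 4-byte big-endian unsigned length


def append_record(path, serialized: bytes) -> None:
    """Append one length-prefixed record while holding an exclusive lock."""
    if len(serialized) > 0xFFFFFFFF:
        raise ValueError("record exceeds the uint32 length prefix")
    with open(path, "ab") as log:
        fcntl.flock(log.fileno(), fcntl.LOCK_EX)
        try:
            log.write(LENGTH_PREFIX.pack(len(serialized)) + serialized)
            log.flush()
            os.fsync(log.fileno())  # assumption: flush to disk before unlocking
        finally:
            fcntl.flock(log.fileno(), fcntl.LOCK_UN)


def iter_records(path):
    """Yield each record's serialized bytes in chain order."""
    with open(path, "rb") as log:
        while True:
            prefix = log.read(LENGTH_PREFIX.size)
            if not prefix:
                return  # clean end of file
            if len(prefix) < LENGTH_PREFIX.size:
                raise ValueError("truncated length prefix")
            (length,) = LENGTH_PREFIX.unpack(prefix)
            body = log.read(length)
            if len(body) < length:
                raise ValueError("truncated record")
            yield body
```

Raising on a short read lets a caller detect a partially written tail record, for instance after a crash mid-append.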

6.2 state.cbor (Chain State Checkpoint)

A single CBOR map, atomically rewritten after each append:

{
    "chain_id": bytes[32],       # SHA-256(canonical_bytes(genesis))
    "head_index": uint,          # Index of the most recent record
    "head_hash": bytes[32],      # Hash of the most recent record
    "record_count": uint,        # Total records in chain
    "created_at": int,           # Unix µs when chain was created
    "last_append_at": int        # Unix µs of last append
}

This file is a performance optimization — the canonical state is always derivable from chain.bin. On corruption, state.cbor is rebuilt by scanning the log.
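
An atomic rewrite is commonly done by writing a temporary file in the same directory and renaming it over the target. The sketch below shows that pattern under assumptions: `atomic_write` is a hypothetical helper, the specification does not say which mechanism the implementation uses, and the function takes the already encoded CBOR bytes of the state map.

```python
import os
import tempfile


def atomic_write(path, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before the rename, the previous state.cbor remains in place, and because the checkpoint is derivable from chain.bin, a stale checkpoint can be detected and rebuilt.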

6.3 File Locations

~/.soosef/chain/
  chain.bin         Append-only record log
  state.cbor        Chain state checkpoint

Paths are defined in src/soosef/paths.py.

7. Migration from Verisoo-Only Attestations

Existing Verisoo attestations in ~/.soosef/attestations/ are not modified. The chain is a parallel structure. Migration is performed by the soosef chain backfill command:

  1. Iterate all records in Verisoo's LocalStorage (ordered by timestamp)
  2. For each record, compute content_hash = SHA-256(record.to_bytes())
  3. Create a chain record with:
    • content_type = "verisoo/attestation-v1"
    • claimed_ts set to the original Verisoo timestamp
    • metadata = {"backfilled": true, "original_ts": <verisoo_timestamp>}
    • Entropy witnesses collected at migration time (not original time)
  4. Append to chain

Backfilled records are distinguishable via the backfilled metadata flag. Their entropy witnesses reflect migration time, not original attestation time — this is honest and intentional.
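
The migration steps can be sketched as a generator that prepares the chain-specific fields for each legacy record. Everything here is illustrative: `LegacyRecord` is a hypothetical stand-in for a Verisoo record (with `raw` playing the role of record.to_bytes()), `plan_backfill` is not the actual command implementation, and `collect_witnesses` is an injected callable so the sketch does not depend on a particular witness collector.

```python
import hashlib
from typing import NamedTuple


class LegacyRecord(NamedTuple):
    timestamp_us: int  # original Verisoo timestamp in microseconds
    raw: bytes         # stand-in for record.to_bytes()


def plan_backfill(legacy_records, collect_witnesses):
    """Yield the fields for one chain record per legacy record, oldest first."""
    for rec in sorted(legacy_records, key=lambda r: r.timestamp_us):
        yield {
            "content_hash": hashlib.sha256(rec.raw).digest(),
            "content_type": "verisoo/attestation-v1",
            "claimed_ts": rec.timestamp_us,
            "metadata": {"backfilled": True, "original_ts": rec.timestamp_us},
            # Collected now, at migration time, not at the original time.
            "entropy_witnesses": collect_witnesses(),
        }
```

The remaining fields (chain_index, prev_hash, signer_pubkey, signature) come from the normal append path, so they are left out of this sketch.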

8. Content Types

The content_type field identifies what was hashed into content_hash. Defined types:

| Content Type | Description |
|---|---|
| verisoo/attestation-v1 | Verisoo AttestationRecord serialized bytes |
| soosef/raw-file-v1 | Raw file bytes (for non-image attestations, future) |
| soosef/metadata-only-v1 | No file content; metadata-only attestation (future) |

New content types may be added without changing the record format version.