# Chain Format Specification

**Status**: Design

**Version**: 1 (record format version)

**Last updated**: 2026-04-01

## 1. Overview

The attestation chain is an append-only sequence of signed records stored locally on the offline device. Each record includes a hash of the previous record, forming a tamper-evident chain analogous to git commits or blockchain blocks.

The chain wraps existing Verisoo attestation records. A Verisoo record's serialized bytes become the input to `content_hash`, preserving the original attestation while adding ordering, entropy witnesses, and chain integrity guarantees.

## 2. AttestationChainRecord

### Field Definitions

| Field | CBOR Key | Type | Size | Description |
|---|---|---|---|---|
| `version` | 0 | unsigned int | 1 byte | Record format version. Currently `1`. |
| `record_id` | 1 | byte string | 16 bytes | UUID v7 (RFC 9562). Time-ordered unique identifier. |
| `chain_index` | 2 | unsigned int | 8 bytes max | Monotonically increasing, 0-based. Genesis record is index 0. |
| `prev_hash` | 3 | byte string | 32 bytes | SHA-256 of `canonical_bytes(previous_record)`. Genesis: `0x00 * 32`. |
| `content_hash` | 4 | byte string | 32 bytes | SHA-256 of the wrapped content (e.g., Verisoo record bytes). |
| `content_type` | 5 | text string | variable | MIME-like type identifier. `"verisoo/attestation-v1"` for Verisoo records. |
| `metadata` | 6 | CBOR map | variable | Extensible key-value map. See §2.1. |
| `claimed_ts` | 7 | integer | 8 bytes max | Unix timestamp in microseconds (µs). Signed integer to handle pre-epoch dates. |
| `entropy_witnesses` | 8 | CBOR map | variable | System entropy snapshot. See §3. |
| `signer_pubkey` | 9 | byte string | 32 bytes | Ed25519 raw public key bytes. |
| `signature` | 10 | byte string | 64 bytes | Ed25519 signature over `canonical_bytes(record)`, which excludes the signature field itself. |

### 2.1 Metadata Map

The `metadata` field is an open CBOR map with text string keys.
Defined keys:

| Key | Type | Description |
|---|---|---|
| `"backfilled"` | bool | `true` if this record was created by the backfill migration |
| `"caption"` | text | Human-readable description of the attested content |
| `"location"` | text | Location name associated with the attestation |
| `"original_ts"` | integer | Original Verisoo timestamp (µs) if different from `claimed_ts` |
| `"tags"` | array of text | User-defined classification tags |

Applications may add custom keys. Unknown keys must be preserved when a record is deserialized and re-serialized; dropping them would change `canonical_bytes(record)` and invalidate the signature.

## 3. Entropy Witnesses

Entropy witnesses are system-state snapshots collected at record creation time. They serve as soft evidence that the claimed timestamp is plausible: fabricating convincing witnesses for a backdated record requires simulating the full system state at the claimed time.

| Field | CBOR Key | Type | Source | Fallback (non-Linux) |
|---|---|---|---|---|
| `sys_uptime` | 0 | float64 | `time.monotonic()` | Same (cross-platform) |
| `fs_snapshot` | 1 | byte string (16 bytes) | SHA-256 of `os.stat()` on chain DB, truncated to 16 bytes | SHA-256 of chain dir stat |
| `proc_entropy` | 2 | unsigned int | `/proc/sys/kernel/random/entropy_avail` | `len(os.urandom(32))` (always 32, marker for non-Linux) |
| `boot_id` | 3 | text string | `/proc/sys/kernel/random/boot_id` | `uuid.uuid4()` cached per process lifetime |

### Witness Properties

- **sys_uptime**: Monotonically increasing within a boot; it cannot decrease. Values are only comparable within one boot, since Python documents the reference point of `time.monotonic()` as undefined. A record with `sys_uptime < previous_record.sys_uptime` and `claimed_ts > previous_record.claimed_ts` is suspicious: it indicates either a reboot (expected if `boot_id` also changed) or clock manipulation.
- **fs_snapshot**: Changes with every write to the chain DB. The hash includes mtime, ctime, size, and inode number.
- **proc_entropy**: Varies naturally. On Linux, reflects the kernel entropy pool state.
- **boot_id**: On Linux, changes on every reboot; the non-Linux fallback is regenerated for every process instead. Identical `boot_id` values across records imply the same boot session (or, with the fallback, the same process); combined with `sys_uptime`, this constrains the timeline.

## 4. Serialization

### 4.1 Canonical Bytes

`canonical_bytes(record)` produces the deterministic byte representation used for hashing and signing. It is a CBOR map containing all fields **except** `signature`, encoded using CBOR canonical encoding (RFC 8949 §4.2):

- Map keys sorted by integer value (0, 1, 2, ..., 9)
- Integers use minimal-length encoding
- No indefinite-length items
- No duplicate keys

```python
import cbor2


def canonical_bytes(record) -> bytes:
    # Every field except `signature` (CBOR key 10).
    return cbor2.dumps({
        0: record.version,
        1: record.record_id,
        2: record.chain_index,
        3: record.prev_hash,
        4: record.content_hash,
        5: record.content_type,
        6: record.metadata,
        7: record.claimed_ts,
        8: {
            0: record.entropy_witnesses.sys_uptime,
            1: record.entropy_witnesses.fs_snapshot,
            2: record.entropy_witnesses.proc_entropy,
            3: record.entropy_witnesses.boot_id,
        },
        9: record.signer_pubkey,
    }, canonical=True)
```

### 4.2 Record Hash

```
compute_record_hash(record) = SHA-256(canonical_bytes(record))
```

This hash is used as `prev_hash` in the next record and as the Merkle tree leaves in export bundles.

### 4.3 Signature

```
record.signature = Ed25519_Sign(private_key, canonical_bytes(record))
```

Verification:

```
Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record))
```

### 4.4 Full Serialization

`serialize_record(record)` produces the full CBOR encoding, including the signature field (CBOR key 10). This is used for storage and transmission.

```python
import cbor2


def serialize_record(record) -> bytes:
    # Same map as canonical_bytes(), plus the signature under key 10.
    return cbor2.dumps({
        0: record.version,
        1: record.record_id,
        # ... (keys 2-8 as in §4.1)
        9: record.signer_pubkey,
        10: record.signature,
    }, canonical=True)
```

## 5. Chain Rules

### 5.1 Genesis Record

The first record in a chain (index 0) has:

- `chain_index = 0`
- `prev_hash = b'\x00' * 32` (32 zero bytes)

The **chain ID** is defined as `SHA-256(canonical_bytes(genesis_record))`, which is the same value as `compute_record_hash(genesis_record)`. This permanently identifies the chain.
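
To make the byte-level rules of §4.1, §4.2, and §5.1 concrete, here is a minimal, dependency-free sketch of a deterministic CBOR encoder for the value types a chain record uses, plus `canonical_bytes`, `compute_record_hash`, and the chain ID. It is an illustration under stated assumptions, not the reference implementation: a record is modeled as a plain `dict` keyed by the integer CBOR keys from §2 (a hypothetical in-memory layout), the names `cbor_encode`, `_head`, `chain_id`, and `ZERO_HASH` are hypothetical, and all field values are placeholders. Map keys are ordered by the bytewise comparison of their encoded form, and floats use the shortest IEEE 754 width that preserves their value, as RFC 8949 §4.2.1 requires; a `sys_uptime` value may therefore occupy fewer than 8 bytes on the wire. For the small integer keys at the top level and in the witness map every canonical ordering agrees, but for text-keyed `metadata` maps the output should be compared with what `cbor2.dumps(..., canonical=True)` produces before relying on byte-for-byte agreement.

```python
import hashlib
import struct

ZERO_HASH = b"\x00" * 32  # genesis prev_hash (§5.1)


def _head(major: int, arg: int) -> bytes:
    """CBOR initial byte plus argument in its shortest form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                           (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(major << 5) | ai]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def cbor_encode(value) -> bytes:
    """Deterministic CBOR for the value types a chain record uses (sketch)."""
    if value is False:
        return b"\xf4"
    if value is True:
        return b"\xf5"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        # Shortest of float16/float32/float64 that round-trips exactly.
        for ai, fmt in ((25, ">e"), (26, ">f"), (27, ">d")):
            try:
                packed = struct.pack(fmt, value)
            except OverflowError:  # too large for this width
                continue
            if struct.unpack(fmt, packed)[0] == value:
                return bytes([0xE0 | ai]) + packed
        raise ValueError("NaN is not handled in this sketch")
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        # Python dicts cannot hold duplicate keys; order entries by the
        # bytewise comparison of their encoded keys.
        entries = sorted((cbor_encode(k), cbor_encode(v)) for k, v in value.items())
        return _head(5, len(entries)) + b"".join(k + v for k, v in entries)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def canonical_bytes(record: dict) -> bytes:
    """Every field except `signature` (CBOR key 10), canonically encoded."""
    return cbor_encode({k: v for k, v in record.items() if k != 10})


def compute_record_hash(record: dict) -> bytes:
    return hashlib.sha256(canonical_bytes(record)).digest()


def chain_id(genesis: dict) -> bytes:
    """Hypothetical helper: SHA-256(canonical_bytes(genesis)), per §5.1."""
    if genesis[2] != 0 or genesis[3] != ZERO_HASH:
        raise ValueError("not a genesis record")
    return compute_record_hash(genesis)


# RFC 8949 Appendix A test vectors as spot checks.
assert cbor_encode(1.5) == bytes.fromhex("f93e00")
assert cbor_encode({1: 2, 3: 4}) == bytes.fromhex("a201020304")

# A genesis record with placeholder values, keyed by the CBOR keys from §2.
genesis = {
    0: 1,                                    # version
    1: bytes(16),                            # record_id (placeholder, not a real UUID v7)
    2: 0,                                    # chain_index
    3: ZERO_HASH,                            # prev_hash
    4: hashlib.sha256(b"example").digest(),  # content_hash
    5: "verisoo/attestation-v1",             # content_type
    6: {"caption": "example"},               # metadata
    7: 1_700_000_000_000_000,                # claimed_ts (µs)
    8: {0: 1234.5, 1: bytes(16), 2: 32, 3: "example-boot-id"},  # witnesses
    9: bytes(32),                            # signer_pubkey (placeholder)
}
# The signature (key 10) is excluded, so adding one leaves the hash unchanged.
assert compute_record_hash({**genesis, 10: bytes(64)}) == compute_record_hash(genesis)
print(chain_id(genesis).hex())
```

Because the chain ID and the genesis record hash are the same computation, the value printed here is also what record 1 must carry as its `prev_hash`.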
### 5.2 Append Rule

For record N (where N > 0):

```
record_N.chain_index == record_{N-1}.chain_index + 1
record_N.prev_hash == compute_record_hash(record_{N-1})
record_N.claimed_ts >= record_{N-1}.claimed_ts   (SHOULD, not MUST; clock skew is possible)
```

### 5.3 Verification

Full chain verification checks, for each record from index 0 to head:

1. `Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record))` (signature valid)
2. `record.chain_index == expected_index` (no gaps or duplicates)
3. `record.prev_hash == compute_record_hash(previous_record)` (chain link intact; for the genesis record, `prev_hash` must be 32 zero bytes)
4. All `signer_pubkey` values are identical within a chain (single-signer chain)

Violation of rule 4 indicates that a chain was signed by multiple identities, which may be legitimate (key rotation) or malicious (chain hijacking). Key rotation is out of scope for v1; implementations should flag this as a warning.

## 6. Storage Format

### 6.1 chain.bin (Append-Only Log)

Records are stored sequentially as length-prefixed CBOR:

```
┌─────────────────────────────┐
│ uint32 BE: record_0 length  │
│ bytes: serialize(record_0)  │
├─────────────────────────────┤
│ uint32 BE: record_1 length  │
│ bytes: serialize(record_1)  │
├─────────────────────────────┤
│ ...                         │
└─────────────────────────────┘
```

- Length prefix is 4 bytes, big-endian unsigned 32-bit integer
- Maximum record size: 4,294,967,295 bytes, just under 4 GiB (the practical limit is much smaller)
- File is append-only; records are never modified or deleted
- File locking via `fcntl.flock(fd, fcntl.LOCK_EX)` for single-writer safety

### 6.2 state.cbor (Chain State Checkpoint)

A single CBOR map, atomically rewritten after each append:

```cbor
{
  "chain_id": bytes[32],      # SHA-256(canonical_bytes(genesis))
  "head_index": uint,         # Index of the most recent record
  "head_hash": bytes[32],     # Hash of the most recent record
  "record_count": uint,       # Total records in chain
  "created_at": int,          # Unix µs when chain was created
  "last_append_at": int       # Unix µs of last append
}
```

This file is a performance optimization: the canonical state is always derivable from `chain.bin`. On corruption, `state.cbor` is rebuilt by scanning the log.

### 6.3 File Locations

```
~/.soosef/chain/
    chain.bin     Append-only record log
    state.cbor    Chain state checkpoint
```

Paths are defined in `src/soosef/paths.py`.

## 7. Migration from Verisoo-Only Attestations

Existing Verisoo attestations in `~/.soosef/attestations/` are not modified. The chain is a parallel structure.

Migration is performed by the `soosef chain backfill` command:

1. Iterate over all records in Verisoo's `LocalStorage` (ordered by timestamp)
2. For each record, compute `content_hash = SHA-256(record.to_bytes())`
3. Create a chain record with:
   - `content_type = "verisoo/attestation-v1"`
   - `claimed_ts` set to the original Verisoo timestamp
   - `metadata = {"backfilled": true, "original_ts": }`
   - Entropy witnesses collected at migration time (not original time)
4. Append to chain

Backfilled records are distinguishable via the `backfilled` metadata flag. Their entropy witnesses reflect migration time, not original attestation time; this is honest and intentional.

## 8. Content Types

The `content_type` field identifies what was hashed into `content_hash`.
Defined types:

| Content Type | Description |
|---|---|
| `verisoo/attestation-v1` | Verisoo `AttestationRecord` serialized bytes |
| `soosef/raw-file-v1` | Raw file bytes (for non-image attestations, future) |
| `soosef/metadata-only-v1` | No file content; metadata-only attestation (future) |

New content types may be added without changing the record format version.
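
Because §2 defines `content_hash` as SHA-256 over the wrapped bytes, checking it does not depend on `content_type`, so a verifier can treat types it does not recognize as opaque bytes. The following is a minimal sketch, assuming the wrapped bytes (a Verisoo record's serialized form or, for `soosef/raw-file-v1`, the raw file) are available as a binary stream. The function names, the 1 MiB chunk size, and the `dict` record layout keyed by CBOR integer keys are hypothetical. `soosef/metadata-only-v1` is left out because this specification does not yet say what its `content_hash` covers.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # hypothetical read size (1 MiB)


def content_hash_of(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> bytes:
    """SHA-256 of everything left in `stream`, read in fixed-size chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.digest()


def content_matches(record: dict, stream: BinaryIO) -> bool:
    """Compare a record's content_hash (CBOR key 4) with the wrapped bytes.

    The check is identical for every content_type (CBOR key 5), so
    unrecognized future types are handled as opaque bytes.
    """
    return content_hash_of(stream) == record[4]


# Usage with an in-memory stand-in for a raw file.
payload = b"example file bytes"
record = {4: hashlib.sha256(payload).digest(), 5: "soosef/raw-file-v1"}
print(content_matches(record, io.BytesIO(payload)))  # prints True
```

Reading in chunks keeps memory use flat for large raw files; for Verisoo records already held in memory, wrapping the bytes in `io.BytesIO` as above gives the same result as hashing them directly.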