fieldwitness/docs/architecture/chain-format.md
Aaron D. Lee 490f9d4a1d Rebrand SooSeF to FieldWitness
2026-04-02 15:05:13 -04:00


Chain Format Specification

Status: Design
Version: 1 (record format version)
Last updated: 2026-04-01

1. Overview

The attestation chain is an append-only sequence of signed records stored locally on the offline device. Each record includes a hash of the previous record, forming a tamper-evident chain analogous to git commits or blockchain blocks.

The chain wraps existing Attest attestation records. An Attest record's serialized bytes become the input to content_hash, preserving the original attestation while adding ordering, entropy witnesses, and chain integrity guarantees.

2. AttestationChainRecord

Field Definitions

| Field | CBOR Key | Type | Size | Description |
|---|---|---|---|---|
| version | 0 | unsigned int | 1 byte | Record format version. Currently 1. |
| record_id | 1 | byte string | 16 bytes | UUID v7 (RFC 9562). Time-ordered unique identifier. |
| chain_index | 2 | unsigned int | 8 bytes max | Monotonically increasing, 0-based. Genesis record is index 0. |
| prev_hash | 3 | byte string | 32 bytes | SHA-256 of canonical_bytes(previous_record). Genesis: 0x00 * 32. |
| content_hash | 4 | byte string | 32 bytes | SHA-256 of the wrapped content (e.g., Attest record bytes). |
| content_type | 5 | text string | variable | MIME-like type identifier. "attest/attestation-v1" for Attest records. |
| metadata | 6 | CBOR map | variable | Extensible key-value map. See §2.1. |
| claimed_ts | 7 | integer | 8 bytes max | Unix timestamp in microseconds (µs). Signed integer to handle pre-epoch dates. |
| entropy_witnesses | 8 | CBOR map | variable | System entropy snapshot. See §3. |
| signer_pubkey | 9 | byte string | 32 bytes | Ed25519 raw public key bytes. |
| signature | 10 | byte string | 64 bytes | Ed25519 signature over canonical_bytes(record) excluding the signature field. |
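
As an illustration of the field table above, the record could be held in memory as a frozen dataclass that enforces the fixed sizes. This is a minimal sketch, not the project's implementation: the class name `ChainRecord`, the `ZERO_HASH` constant, and the range checks (which read "8 bytes max" as a 64-bit bound) are assumptions made for this example.

```python
from dataclasses import dataclass

ZERO_HASH = b"\x00" * 32  # prev_hash of the genesis record


@dataclass(frozen=True)
class ChainRecord:
    """Hypothetical in-memory form of AttestationChainRecord."""

    version: int
    record_id: bytes          # 16-byte UUID v7
    chain_index: int          # 0-based
    prev_hash: bytes          # 32 bytes
    content_hash: bytes       # 32 bytes
    content_type: str
    metadata: dict
    claimed_ts: int           # Unix microseconds, may be negative
    entropy_witnesses: dict   # CBOR keys 0..3
    signer_pubkey: bytes      # 32 bytes
    signature: bytes = b""    # 64 bytes once the record is signed

    def __post_init__(self):
        # Enforce the fixed byte-string sizes from the field table.
        sizes = {"record_id": 16, "prev_hash": 32,
                 "content_hash": 32, "signer_pubkey": 32}
        for name, size in sizes.items():
            if len(getattr(self, name)) != size:
                raise ValueError(f"{name} must be {size} bytes")
        if self.signature and len(self.signature) != 64:
            raise ValueError("signature must be 64 bytes")
        # Assumption: "8 bytes max" is read as unsigned/signed 64-bit ranges.
        if not 0 <= self.chain_index < 2**64:
            raise ValueError("chain_index out of range")
        if not -(2**63) <= self.claimed_ts < 2**63:
            raise ValueError("claimed_ts out of range")
```

An unsigned record is represented here by an empty `signature`; a real implementation might prefer a separate type for unsigned records.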

2.1 Metadata Map

The metadata field is an open CBOR map with text string keys. Defined keys:

| Key | Type | Description |
|---|---|---|
| "backfilled" | bool | true if this record was created by the backfill migration |
| "caption" | text | Human-readable description of the attested content |
| "location" | text | Location name associated with the attestation |
| "original_ts" | integer | Original Attest timestamp (µs) if different from claimed_ts |
| "tags" | array of text | User-defined classification tags |

Applications may add custom keys. Unknown keys must be preserved during serialization.

3. Entropy Witnesses

Entropy witnesses are system-state snapshots collected at record creation time. They serve as soft evidence that the claimed timestamp is plausible. Fabricating convincing witnesses for a backdated record requires simulating the full system state at the claimed time.

| Field | CBOR Key | Type | Source | Fallback (non-Linux) |
|---|---|---|---|---|
| sys_uptime | 0 | float64 | time.monotonic() | Same (cross-platform) |
| fs_snapshot | 1 | byte string (16 bytes) | SHA-256 of os.stat() on chain DB, truncated to 16 bytes | SHA-256 of chain dir stat |
| proc_entropy | 2 | unsigned int | /proc/sys/kernel/random/entropy_avail | len(os.urandom(32)) (always 32, marker for non-Linux) |
| boot_id | 3 | text string | /proc/sys/kernel/random/boot_id | uuid.uuid4() cached per process lifetime |
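
The sources in the table above could be collected roughly as follows. This is a sketch under stated assumptions: the function name `collect_entropy_witnesses`, the helper `_read_text`, and the way the stat fields are joined into a string before hashing are illustrative choices, since the specification does not fix the exact byte layout fed to SHA-256.

```python
import hashlib
import os
import time
import uuid

# Fallback boot_id, generated once and cached for the process lifetime.
_FALLBACK_BOOT_ID = str(uuid.uuid4())


def _read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path, "r", encoding="ascii") as fh:
            return fh.read().strip()
    except OSError:
        return None


def collect_entropy_witnesses(chain_path):
    """Snapshot the four witnesses, keyed by their CBOR keys (0..3)."""
    st = os.stat(chain_path)
    # Assumption: mtime, ctime, size and inode are joined as decimal text.
    stat_repr = f"{st.st_mtime_ns}:{st.st_ctime_ns}:{st.st_size}:{st.st_ino}"
    fs_snapshot = hashlib.sha256(stat_repr.encode("ascii")).digest()[:16]

    entropy = _read_text("/proc/sys/kernel/random/entropy_avail")
    if entropy is not None and entropy.isdigit():
        proc_entropy = int(entropy)
    else:
        proc_entropy = len(os.urandom(32))  # always 32: non-Linux marker

    boot_id = _read_text("/proc/sys/kernel/random/boot_id") or _FALLBACK_BOOT_ID

    return {
        0: time.monotonic(),  # sys_uptime
        1: fs_snapshot,
        2: proc_entropy,
        3: boot_id,
    }
```

On a non-Linux host, `chain_path` would be the chain directory rather than the chain DB file, matching the fallback column.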

Witness Properties

  • sys_uptime: Monotonically increasing within a boot. Cannot decrease. A record with sys_uptime < previous_record.sys_uptime and claimed_ts > previous_record.claimed_ts is suspicious (reboot or clock manipulation).
  • fs_snapshot: Changes with every write to the chain DB. Hash includes mtime, ctime, size, and inode number.
  • proc_entropy: Varies naturally. On Linux, reflects kernel entropy pool state.
  • boot_id: Changes on every reboot. Identical boot_id across records implies same boot session — combined with sys_uptime, this constrains the timeline.
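
The witness properties above suggest a soft plausibility check between consecutive records. The sketch below is one possible reading, not a rule from this specification: the function name `witness_warnings`, the flattened dict shape, and the `tolerance_s` drift threshold are all hypothetical. The drift comparison in particular is an assumption, and a real check would need a generous tolerance because time.monotonic() may exclude time spent suspended on some platforms.

```python
def witness_warnings(prev, cur, tolerance_s=5.0):
    """Compare two consecutive records and return soft warnings.

    `prev` and `cur` are dicts with keys 'claimed_ts' (microseconds),
    'sys_uptime' (seconds) and 'boot_id' (a hypothetical flattened view).
    """
    warnings = []
    same_boot = prev["boot_id"] == cur["boot_id"]
    ts_advanced = cur["claimed_ts"] > prev["claimed_ts"]

    # Uptime went backwards while the claimed time went forwards.
    if cur["sys_uptime"] < prev["sys_uptime"] and ts_advanced:
        reason = "within one boot session" if same_boot else "across a reboot"
        warnings.append(f"sys_uptime decreased {reason}")

    # Assumption: within one boot, elapsed claimed time should roughly
    # match elapsed uptime.
    if same_boot:
        elapsed_claimed = (cur["claimed_ts"] - prev["claimed_ts"]) / 1_000_000
        elapsed_uptime = cur["sys_uptime"] - prev["sys_uptime"]
        if abs(elapsed_claimed - elapsed_uptime) > tolerance_s:
            warnings.append("claimed_ts and sys_uptime disagree on elapsed time")

    return warnings
```

A decrease across a reboot is expected and only informational; a decrease within the same boot_id is the case that points toward clock manipulation.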

4. Serialization

4.1 Canonical Bytes

canonical_bytes(record) produces the deterministic byte representation used for hashing and signing. It is a CBOR map containing all fields except signature, encoded using CBOR canonical encoding (RFC 8949 §4.2):

  • Map keys sorted by integer value (0, 1, 2, ..., 9)
  • Integers use minimal-length encoding
  • No indefinite-length items
  • No duplicate keys

canonical_bytes(record) = cbor2.dumps({
    0: record.version,
    1: record.record_id,
    2: record.chain_index,
    3: record.prev_hash,
    4: record.content_hash,
    5: record.content_type,
    6: record.metadata,
    7: record.claimed_ts,
    8: {
        0: record.entropy_witnesses.sys_uptime,
        1: record.entropy_witnesses.fs_snapshot,
        2: record.entropy_witnesses.proc_entropy,
        3: record.entropy_witnesses.boot_id,
    },
    9: record.signer_pubkey,
}, canonical=True)
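
To make the encoding rules listed above concrete, here is a small standard-library encoder for the value types a record uses. It is an illustrative sketch only and is not meant to replace cbor2: the names `encode`, `_head`, and `_encode_float` are hypothetical, map entries are ordered by the bytewise order of their encoded keys (which coincides with numeric order for the small integer keys used here), and floats are emitted in the shortest width that round-trips.

```python
import struct


def _head(major, n):
    """Encode a major type with the shortest possible argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                              (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
    raise ValueError("integer too large for this sketch")


def _encode_float(x):
    """Use the shortest float width that reproduces the value exactly."""
    if x != x:  # NaN never compares equal, so handle it explicitly
        return b"\xf9\x7e\x00"
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA), (">d", 0xFB)):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # value does not fit this width
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial]) + packed
    raise ValueError("unencodable float")


def encode(obj):
    """Deterministically encode ints, bytes, text, bools, floats, lists, maps."""
    if isinstance(obj, bool):  # checked before int: bool subclasses int
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        # Definite-length map, entries sorted by encoded key bytes.
        items = sorted((encode(k), encode(v)) for k, v in obj.items())
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    if isinstance(obj, float):
        return _encode_float(obj)
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

Because the output depends only on the values and not on dict insertion order, two implementations that follow the same rules produce identical bytes, which is what makes the hash and signature reproducible.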

4.2 Record Hash

compute_record_hash(record) = SHA-256(canonical_bytes(record))

This hash is used as prev_hash in the next record and as the record's Merkle tree leaf in export bundles.

4.3 Signature

record.signature = Ed25519_Sign(private_key, canonical_bytes(record))

Verification:

Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record))

4.4 Full Serialization

serialize_record(record) produces the full CBOR encoding including the signature field (CBOR key 10). This is used for storage and transmission.

serialize_record(record) = cbor2.dumps({
    0: record.version,
    1: record.record_id,
    ...
    9: record.signer_pubkey,
    10: record.signature,
}, canonical=True)

5. Chain Rules

5.1 Genesis Record

The first record in a chain (index 0) has:

  • chain_index = 0
  • prev_hash = b'\x00' * 32 (32 zero bytes)

The chain ID is defined as SHA-256(canonical_bytes(genesis_record)). This permanently identifies the chain.

5.2 Append Rule

For record N (where N > 0):

record_N.chain_index == record_{N-1}.chain_index + 1
record_N.prev_hash == compute_record_hash(record_{N-1})
record_N.claimed_ts >= record_{N-1}.claimed_ts  (SHOULD, not MUST — clock skew possible)
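
The append rule could be checked as in the sketch below. To stay independent of any CBOR library, each record is represented as a plain dict that carries its precomputed canonical bytes under a `"canonical"` key; that shape, and the name `check_append`, are assumptions of this example. The timestamp rule is reported as a warning rather than an error, mirroring the SHOULD above.

```python
import hashlib


def compute_record_hash(canonical):
    """SHA-256 over a record's canonical bytes."""
    return hashlib.sha256(canonical).digest()


def check_append(prev, new):
    """Check `new` against `prev`; return (errors, warnings) as lists of str."""
    errors, warnings = [], []

    if new["chain_index"] != prev["chain_index"] + 1:
        errors.append("chain_index is not previous index + 1")

    if new["prev_hash"] != compute_record_hash(prev["canonical"]):
        errors.append("prev_hash does not match hash of previous record")

    # SHOULD, not MUST: clock skew can make timestamps go backwards.
    if new["claimed_ts"] < prev["claimed_ts"]:
        warnings.append("claimed_ts is earlier than previous record")

    return errors, warnings
```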

5.3 Verification

Full chain verification checks, for each record from index 0 to head:

  1. Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record)) — signature valid
  2. record.chain_index == expected_index — no gaps or duplicates
  3. record.prev_hash == compute_record_hash(previous_record) — chain link intact
  4. All signer_pubkey values are identical within a chain (single-signer chain)

Violation of rule 4 indicates a chain was signed by multiple identities, which may be legitimate (key rotation) or malicious (chain hijacking). Key rotation is out of scope for v1; implementations should flag this as a warning.
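
The four checks can be combined into a single pass over the chain, sketched below. Signature verification is injected as a callable `verify_sig(pubkey, signature, message)` so the example does not assume any particular Ed25519 library; that callable, the dict-based record shape with precomputed `"canonical"` bytes, and the name `verify_chain` are hypothetical. A signer change is returned as a warning, in line with the guidance above.

```python
import hashlib

ZERO_HASH = b"\x00" * 32  # expected prev_hash of the genesis record


def verify_chain(records, verify_sig):
    """Verify records in index order; return (errors, warnings)."""
    errors, warnings = [], []
    expected_prev_hash = ZERO_HASH
    first_pubkey = None

    for expected_index, rec in enumerate(records):
        # 1. Signature over the canonical bytes.
        if not verify_sig(rec["signer_pubkey"], rec["signature"], rec["canonical"]):
            errors.append(f"record {expected_index}: invalid signature")

        # 2. No gaps or duplicates in chain_index.
        if rec["chain_index"] != expected_index:
            errors.append(f"record {expected_index}: unexpected chain_index")

        # 3. Link to the previous record (zero hash for genesis).
        if rec["prev_hash"] != expected_prev_hash:
            errors.append(f"record {expected_index}: prev_hash mismatch")

        # 4. Single signer; a change is flagged as a warning in v1.
        if first_pubkey is None:
            first_pubkey = rec["signer_pubkey"]
        elif rec["signer_pubkey"] != first_pubkey:
            warnings.append(f"record {expected_index}: signer_pubkey changed")

        expected_prev_hash = hashlib.sha256(rec["canonical"]).digest()

    return errors, warnings
```

In practice `verify_sig` would wrap a real Ed25519 verification routine and return False on failure instead of raising.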

6. Storage Format

6.1 chain.bin (Append-Only Log)

Records are stored sequentially as length-prefixed CBOR:

┌─────────────────────────────┐
│ uint32 BE: record_0 length  │
│ bytes: serialize(record_0)  │
├─────────────────────────────┤
│ uint32 BE: record_1 length  │
│ bytes: serialize(record_1)  │
├─────────────────────────────┤
│ ...                         │
└─────────────────────────────┘

  • Length prefix is 4 bytes, big-endian unsigned 32-bit integer
  • Maximum record size: 4 GiB − 1 byte (2^32 − 1, the largest value the length prefix can hold; the practical limit is much smaller)
  • File is append-only; records are never modified or deleted
  • File locking via fcntl.flock(LOCK_EX) for single-writer safety
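
The framing described above could be written and read back as in this sketch. The function names `append_record` and `iter_records` are illustrative, and the explicit flush and fsync before releasing the lock are a durability choice of this example rather than something the specification states. The fcntl module is only available on Unix-like systems.

```python
import fcntl
import os
import struct

MAX_RECORD_LEN = 0xFFFFFFFF  # largest value a uint32 length prefix can hold


def append_record(path, blob):
    """Append one length-prefixed record while holding an exclusive lock."""
    if len(blob) > MAX_RECORD_LEN:
        raise ValueError("record too large for a 4-byte length prefix")
    with open(path, "ab") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            fh.write(struct.pack(">I", len(blob)) + blob)
            fh.flush()
            os.fsync(fh.fileno())  # push the record to disk before unlocking
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def iter_records(path):
    """Yield each serialized record from the log, in order."""
    with open(path, "rb") as fh:
        while True:
            prefix = fh.read(4)
            if not prefix:
                return  # clean end of file
            if len(prefix) < 4:
                raise ValueError("truncated length prefix")
            (length,) = struct.unpack(">I", prefix)
            blob = fh.read(length)
            if len(blob) < length:
                raise ValueError("truncated record")
            yield blob
```

A reader that hits a truncated tail has most likely found an interrupted append; how to recover from that is left to the implementation.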

6.2 state.cbor (Chain State Checkpoint)

A single CBOR map, atomically rewritten after each append:

{
    "chain_id": bytes[32],       # SHA-256(canonical_bytes(genesis))
    "head_index": uint,          # Index of the most recent record
    "head_hash": bytes[32],      # Hash of the most recent record
    "record_count": uint,        # Total records in chain
    "created_at": int,           # Unix µs when chain was created
    "last_append_at": int        # Unix µs of last append
}

This file is a performance optimization — the canonical state is always derivable from chain.bin. On corruption, state.cbor is rebuilt by scanning the log.
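
One common way to get the atomic rewrite is to write a temporary file in the same directory and rename it over the target. The sketch below assumes that approach; the specification does not say how atomicity is achieved, and the name `atomic_write` and the temporary-file naming are hypothetical. The payload is taken as already-serialized bytes so the example does not depend on a CBOR library.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make the contents durable before renaming
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If a crash leaves a stale temporary file behind, it can be deleted safely, since state.cbor is always derivable from chain.bin.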

6.3 File Locations

~/.fwmetadata/chain/
  chain.bin         Append-only record log
  state.cbor        Chain state checkpoint

Paths are defined in src/fieldwitness/paths.py.

7. Migration from Attest-Only Attestations

Existing Attest attestations in ~/.fwmetadata/attestations/ are not modified. The chain is a parallel structure. Migration is performed by the fieldwitness chain backfill command:

  1. Iterate all records in Attest's LocalStorage (ordered by timestamp)
  2. For each record, compute content_hash = SHA-256(record.to_bytes())
  3. Create a chain record with:
    • content_type = "attest/attestation-v1"
    • claimed_ts set to the original Attest timestamp
    • metadata = {"backfilled": true, "original_ts": <attest_timestamp>}
    • Entropy witnesses collected at migration time (not original time)
  4. Append to chain

Backfilled records are distinguishable via the backfilled metadata flag. Their entropy witnesses reflect migration time, not original attestation time — this is honest and intentional.
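
The migration steps above can be sketched as a pure function that plans the chain records to create. Attest's LocalStorage is stood in for by an iterable of `(timestamp_us, record_bytes)` pairs; that shape and the name `plan_backfill` are assumptions of this example. Entropy witnesses, signing, and the actual append are left out, since they happen at migration time for each record.

```python
import hashlib


def plan_backfill(attestations):
    """Return the chain-record fields the migration would set, oldest first.

    `attestations` is an iterable of (timestamp_us, record_bytes) pairs,
    a hypothetical stand-in for iterating Attest's LocalStorage.
    """
    planned = []
    # Step 1: process records in timestamp order.
    for timestamp_us, raw in sorted(attestations, key=lambda item: item[0]):
        planned.append({
            # Step 2: hash the serialized Attest record.
            "content_hash": hashlib.sha256(raw).digest(),
            # Step 3: fields fixed by the migration.
            "content_type": "attest/attestation-v1",
            "claimed_ts": timestamp_us,
            "metadata": {"backfilled": True, "original_ts": timestamp_us},
        })
    return planned
```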

8. Content Types

The content_type field identifies what was hashed into content_hash. Defined types:

| Content Type | Description |
|---|---|
| attest/attestation-v1 | Attest AttestationRecord serialized bytes |
| fieldwitness/raw-file-v1 | Raw file bytes (for non-image attestations, future) |
| fieldwitness/metadata-only-v1 | No file content; metadata-only attestation (future) |

New content types may be added without changing the record format version.