
Federation Protocol Specification

Status: Design
Version: 1 (protocol version)
Last updated: 2026-04-01

1. Overview

The federation is a network of append-only log servers inspired by Certificate Transparency (RFC 6962). Each server acts as a "blind notary" — it stores encrypted attestation bundles, maintains a Merkle tree over them, and issues signed receipts proving when bundles were received. Servers gossip with peers to ensure consistency and replicate data.

Federation servers never decrypt attestation content. They operate at the "federation member" permission tier: they can verify chain summaries and signatures, but not read the underlying attestation data.

2. Terminology

| Term | Definition |
| --- | --- |
| Bundle | An encrypted export bundle (SOOSEFX1 format) containing chain records |
| STH | Signed Tree Head — a server's signed commitment to its current Merkle tree state |
| Receipt | A server-signed proof that a bundle was included in its log at a specific time |
| Inclusion proof | Merkle path from a leaf (bundle hash) to the tree root |
| Consistency proof | Proof that an older tree is a prefix of a newer tree (no entries removed) |
| Gossip | Peer-to-peer exchange of STHs and entries to maintain consistency |

3. Server Merkle Tree

3.1 Structure

The server maintains a single append-only Merkle tree. Each leaf commits to a received bundle's raw bytes, with the 0x00 domain-separation prefix defined below:

leaf[i] = SHA-256(0x00 || bundle_bytes[i])

Internal nodes follow standard Merkle tree construction:

node = SHA-256(0x01 || left || right)    # internal node
leaf = SHA-256(0x00 || data)             # leaf node (domain separation)

Domain separation prefixes (0x00 for leaves, 0x01 for internal nodes) prevent second-preimage attacks, following CT convention (RFC 6962 §2.1).
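
These hashing rules can be written directly with Python's hashlib. A minimal sketch (the recursive split follows the RFC 6962 Merkle Tree Hash; the function names themselves are illustrative, not part of this spec):

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    """Leaf node: SHA-256 over the 0x00 prefix plus the raw bundle bytes."""
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    """Internal node: SHA-256 over the 0x01 prefix plus the child hashes."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle Tree Hash (RFC 6962 style): split at the largest power of
    two strictly smaller than the number of leaves, then recurse."""
    n = len(leaves)
    if n == 1:
        return leaf_hash(leaves[0])
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))
```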

3.2 Signed Tree Head (STH)

After each append (or periodically in batch mode), the server computes and signs a new tree head:

{
    0: tree_size,        # uint — number of leaves
    1: root_hash,        # bytes[32] — Merkle tree root
    2: timestamp,        # int — Unix µs, server's clock
    3: server_id,        # text — server identifier (domain or pubkey fingerprint)
    4: server_pubkey,    # bytes[32] — Ed25519 public key
    5: signature,        # bytes[64] — Ed25519(cbor(fields 0-4))
}

The STH is the server's signed commitment: "My tree has N entries with this root at this time." Clients and peers can verify the signature and use consistency proofs to ensure the tree only grows (never shrinks or forks).

3.3 Inclusion Proof

Proves a specific bundle is at index i in a tree of size n:

proof = [(sibling_hash, direction), ...]

Verification:

current = SHA-256(0x00 || bundle_bytes)
for (sibling, direction) in proof:
    if direction == "L":
        current = SHA-256(0x01 || sibling || current)
    else:
        current = SHA-256(0x01 || current || sibling)
assert current == sth.root_hash

3.4 Consistency Proof

Proves that tree of size m is a prefix of tree of size n (where m < n). This guarantees the server hasn't removed or reordered entries.

The proof is a list of intermediate hashes that, combined with the old root, reconstruct the new root. Verification follows RFC 6962 §2.1.2.

4. API Endpoints

All endpoints use CBOR for request/response bodies (Content-Type: application/cbor), with one exception: the submit request body is raw bundle bytes (see §4.1).

4.1 Submit Bundle

POST /v1/submit

Request body: Raw bundle bytes (application/octet-stream)

Processing:

  1. Verify magic bytes b"SOOSEFX1" and version
  2. Parse chain summary
  3. Verify bundle_sig against signer_pubkey
  4. Compute bundle_hash = SHA-256(0x00 || bundle_bytes)
  5. Check for duplicate (bundle_hash already in tree) — if duplicate, return existing receipt
  6. Append bundle_hash to Merkle tree
  7. Store bundle bytes (encrypted blob, as-is)
  8. Generate and sign receipt
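
Steps 4–6 (hash, dedup, append) can be sketched with a toy in-memory log; the class and method names are illustrative, not part of the spec:

```python
import hashlib

class Log:
    """Toy append-only log showing the dedup rule from step 5:
    resubmitting the same bytes yields the existing tree index."""
    def __init__(self):
        self.leaves = []          # leaf hashes, in append order
        self.index_by_hash = {}   # leaf hash -> tree index

    def submit(self, bundle_bytes: bytes) -> tuple[int, bool]:
        h = hashlib.sha256(b"\x00" + bundle_bytes).digest()  # step 4
        if h in self.index_by_hash:                          # step 5
            return self.index_by_hash[h], True   # duplicate: reuse receipt
        idx = len(self.leaves)                               # step 6
        self.leaves.append(h)
        self.index_by_hash[h] = idx
        return idx, False
```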

Response (CBOR):

{
    0: bundle_id,          # bytes[16] — from chain summary
    1: bundle_hash,        # bytes[32] — leaf hash
    2: tree_size,          # uint — tree size after inclusion
    3: tree_index,         # uint — leaf index in tree
    4: timestamp,          # int — Unix µs, server's reception time
    5: inclusion_proof,    # array of bytes[32] — Merkle path
    6: sth,                # map — current STH (see §3.2)
    7: server_id,          # text — server identifier
    8: server_pubkey,      # bytes[32] — Ed25519 public key
    9: receipt_sig,        # bytes[64] — Ed25519(cbor(fields 0-8))
}

Auth: Federation member token required.

Errors:

  • 400 — Invalid bundle format, bad signature
  • 401 — Missing or invalid auth token
  • 409 — Duplicate bundle (the response carries the existing receipt)
  • 413 — Bundle exceeds max_bundle_size_bytes
  • 507 — Server storage full

4.2 Get Signed Tree Head

GET /v1/sth

Response (CBOR): STH map (see §3.2)

Auth: Public (no auth required).

4.3 Get Consistency Proof

GET /v1/consistency-proof?old={m}&new={n}

Parameters:

  • old — previous tree size (must be > 0)
  • new — current tree size (must be >= old)

Response (CBOR):

{
    0: old_size,        # uint
    1: new_size,        # uint
    2: proof,           # array of bytes[32]
}

Auth: Public.

4.4 Get Inclusion Proof

GET /v1/inclusion-proof?hash={hex}&tree_size={n}

Parameters:

  • hash — hex-encoded bundle hash (leaf hash)
  • tree_size — tree size for the proof (use current STH tree_size)

Response (CBOR):

{
    0: tree_index,      # uint — leaf index
    1: tree_size,       # uint
    2: proof,           # array of bytes[32]
}

Auth: Public.

4.5 Get Entries

GET /v1/entries?start={s}&end={e}

Parameters:

  • start — first tree index (inclusive)
  • end — last tree index (inclusive)

The maximum range is 1000 entries per request (see max_entries_per_request in §9).

Response (CBOR):

{
    0: entries,    # array of entry maps (see §4.5.1)
}

4.5.1 Entry Map

{
    0: tree_index,          # uint
    1: bundle_hash,         # bytes[32]
    2: chain_summary,       # CBOR map (from bundle, unencrypted)
    3: encrypted_blob,      # bytes — full SOOSEFX1 bundle
    4: receipt_ts,          # int — Unix µs when received
}

Auth: Federation member token required.

4.6 Audit Summary

GET /v1/audit/summary?bundle_id={hex}

Returns the chain summary for a specific bundle without the encrypted payload.

Response (CBOR):

{
    0: bundle_id,        # bytes[16]
    1: chain_summary,    # CBOR map (from bundle)
    2: tree_index,       # uint
    3: receipt_ts,       # int
    4: inclusion_proof,  # array of bytes[32] (against current STH)
}

Auth: Public.

5. Permission Tiers

5.1 Public Auditor

Access: Unauthenticated.

Endpoints: /v1/sth, /v1/consistency-proof, /v1/inclusion-proof, /v1/audit/summary

Can verify:

  • The log exists and has a specific size at a specific time
  • A specific bundle is included in the log at a specific position
  • The log has not been forked (consistency proofs between STHs)
  • Chain summary metadata (record count, hash range) for any bundle

Cannot see: Encrypted content, chain IDs, signer identities, raw bundles.

5.2 Federation Member

Access: Bearer token issued by server operator. Tokens are Ed25519-signed credentials binding a public key to a set of permissions.

{
    0: token_id,         # bytes[16] — UUID v7
    1: member_pubkey,    # bytes[32] — member's Ed25519 public key
    2: permissions,      # array of text — ["submit", "entries", "gossip"]
    3: issued_at,        # int — Unix µs
    4: expires_at,       # int — Unix µs (0 = no expiry)
    5: issuer_pubkey,    # bytes[32] — server's Ed25519 public key
    6: signature,        # bytes[64] — Ed25519(cbor(fields 0-5))
}

Endpoints: All public endpoints + /v1/submit, /v1/entries, gossip endpoints.

Can see: Everything a public auditor sees + chain IDs, signer public keys, full encrypted bundles (but not decrypted content).
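
Assuming the token's Ed25519 signature (field 6) has already been verified with a suitable library, the remaining expiry and permission checks are plain comparisons over the numbered fields above. A sketch:

```python
def check_token(token: dict, required_perm: str, now_us: int) -> bool:
    """Expiry and permission gate for a member token.
    Assumes the signature (field 6) was already verified elsewhere."""
    expires_at = token[4]
    if expires_at != 0 and now_us >= expires_at:
        return False                      # expired (0 means no expiry)
    return required_perm in token[2]      # permissions array (field 2)
```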

5.3 Authorized Recipient

Not enforced server-side. Recipients hold Ed25519 private keys whose corresponding public keys appear in the bundle's recipients array. They can decrypt bundle content locally after retrieving the encrypted blob via the entries endpoint.

The server has no knowledge of who can or cannot decrypt a given bundle.

6. Gossip Protocol

6.1 Overview

Federation servers maintain a list of known peers. Periodically (default: every 5 minutes), each server initiates gossip with its peers to:

  1. Exchange STHs — detect if any peer has entries the local server doesn't
  2. Verify consistency — ensure no peer is presenting a forked log
  3. Sync entries — pull missing entries from peers that have them

6.2 Gossip Flow

Server A                                Server B
   │                                       │
   │── POST /v1/gossip/sth ──────────────>│   (A sends its STH)
   │                                       │
   │<── response: B's STH ───────────────│    (B responds with its STH)
   │                                       │
   │  (A compares tree sizes)              │
   │  if B.tree_size > A.tree_size:        │
   │                                       │
   │── GET /v1/consistency-proof ────────>│   (verify B's tree extends A's)
   │<── proof ────────────────────────────│
   │                                       │
   │  (verify consistency proof)           │
   │                                       │
   │── GET /v1/entries?start=...&end=... >│   (pull missing entries)
   │<── entries ──────────────────────────│
   │                                       │
   │  (append entries to local tree)       │
   │  (recompute STH)                      │
   │                                       │
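
The "pull missing entries" step must respect the 1000-entry cap from §4.5. A sketch of the range planning a server might do after comparing tree sizes (the function name is illustrative):

```python
def plan_sync(local_size: int, peer_size: int, max_batch: int = 1000):
    """Split the missing index range [local_size, peer_size) into
    inclusive (start, end) request ranges, capped at max_batch entries."""
    ranges = []
    start = local_size
    while start < peer_size:
        end = min(start + max_batch - 1, peer_size - 1)
        ranges.append((start, end))
        start = end + 1
    return ranges
```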

6.3 Gossip Endpoints

POST /v1/gossip/sth

Request body (CBOR): Sender's current STH.

Response (CBOR): Receiver's current STH.

Auth: Federation member token with "gossip" permission.

6.4 Fork Detection

If server A receives an STH from server B where:

  • B.tree_size <= A.tree_size, but B.root_hash differs from the root hash A's own tree had at size B.tree_size

Then B is presenting a different history. This is a fork — a critical security event. The server should:

  1. Log the fork with both STHs as evidence
  2. Alert the operator
  3. Continue serving its own tree (do not merge the forked tree)
  4. Refuse to gossip further with the forked peer until operator resolution
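
Assuming the local server can recover the root its own tree had at any past size (e.g. from cached STHs), the fork test reduces to a single comparison. A sketch:

```python
def detect_fork(local_roots: dict[int, bytes],
                peer_size: int, peer_root: bytes) -> bool:
    """Return True if the peer's STH conflicts with local history.
    local_roots maps past tree sizes to the local root at that size."""
    known = local_roots.get(peer_size)
    return known is not None and known != peer_root
```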

6.5 Convergence

Under normal operation (no forks), servers converge to identical trees. The convergence time depends on gossip interval and network topology. With a 5-minute interval and full mesh topology among N servers, convergence after a new entry takes at most 5 minutes.

7. Receipts

7.1 Purpose

A receipt is the federation's proof that a bundle was received and included in the log at a specific time. It is the critical artifact that closes the timestamp gap: the offline device's claimed timestamp + the federation receipt = practical proof of timing.

7.2 Receipt Format

{
    0: bundle_id,          # bytes[16] — from chain summary
    1: bundle_hash,        # bytes[32] — leaf hash in server's tree
    2: tree_size,          # uint — tree size at inclusion
    3: tree_index,         # uint — leaf position
    4: timestamp,          # int — Unix µs, server's clock
    5: inclusion_proof,    # array of bytes[32] — Merkle path
    6: sth,                # map — STH at time of inclusion
    7: server_id,          # text — server identifier
    8: server_pubkey,      # bytes[32] — Ed25519 public key
    9: receipt_sig,        # bytes[64] — Ed25519(cbor(fields 0-8))
}

7.3 Receipt Verification

To verify a receipt:

  1. Ed25519_Verify(server_pubkey, receipt_sig, cbor(fields 0-8)) — receipt is authentic
  2. Verify inclusion_proof against sth.root_hash with bundle_hash at tree_index
  3. Verify sth.signature — the STH itself is authentic
  4. sth.tree_size >= tree_size — STH covers the inclusion
  5. sth.timestamp >= timestamp — STH is at or after receipt time
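
Steps 4 and 5 (plus a bounds check on the index) are plain integer comparisons over the receipt's numbered fields. A sketch that assumes steps 1–3, the signature and Merkle checks, are handled elsewhere:

```python
def check_receipt_structure(receipt: dict) -> bool:
    """Structural checks only: steps 4-5 plus an index bounds check.
    Signature and inclusion-proof verification are assumed done."""
    sth = receipt[6]
    if sth[0] < receipt[2]:          # step 4: sth.tree_size covers inclusion
        return False
    if sth[2] < receipt[4]:          # step 5: sth.timestamp >= receipt time
        return False
    return receipt[3] < receipt[2]   # tree_index must lie inside tree_size
```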

7.4 Receipt Lifecycle

1. Loader submits bundle to federation server
2. Server issues receipt in submit response
3. Loader stores receipt locally (receipts/ directory)
4. Loader exports receipts to USB (CBOR file)
5. Offline device imports receipts
6. Receipt is stored alongside chain records as proof of federation timestamp

7.5 Multi-Server Receipts

A bundle submitted to N servers produces N independent receipts. Each receipt is from a different server with a different timestamp and Merkle tree position. Multiple receipts strengthen the timestamp claim — an adversary would need to compromise all N servers to suppress evidence.

8. Storage Tiers

Federation servers manage bundle storage across three tiers based on age:

8.1 Hot Tier (0-30 days)

  • Format: Individual files, one per bundle
  • Location: data/hot/{tree_index}.bundle
  • Access: Direct file read, O(1)
  • Purpose: Fast access for recent entries, active gossip sync

8.2 Warm Tier (30-365 days)

  • Format: Zstd-compressed segments, 1000 bundles per segment
  • Location: data/warm/segment-{start}-{end}.zst
  • Access: Decompress segment, extract entry
  • Compression: Zstd level 3 (fast compression, moderate ratio)
  • Purpose: Reduced storage for medium-term retention

8.3 Cold Tier (>1 year)

  • Format: Zstd-compressed segments, maximum compression
  • Location: data/cold/segment-{start}-{end}.zst
  • Access: Decompress segment, extract entry
  • Compression: Zstd level 19 (slow compression, best ratio)
  • Purpose: Archival storage, rarely accessed

8.4 Tier Promotion

A background compaction process runs periodically (default: every 24 hours):

  1. Identify hot entries older than 30 days
  2. Group into segments of 1000
  3. Compress and write to warm tier
  4. Delete hot files
  5. Repeat for warm → cold at 365 days
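
The age thresholds can be captured in a small helper; timestamps are in microseconds as elsewhere in the protocol, and the defaults mirror hot_retention_days and warm_retention_days from §9 (the function name is illustrative):

```python
US_PER_DAY = 86_400_000_000  # microseconds per day

def tier_for(receipt_ts_us: int, now_us: int,
             hot_days: int = 30, warm_days: int = 365) -> str:
    """Storage tier by entry age, mirroring the compaction thresholds."""
    age_days = (now_us - receipt_ts_us) / US_PER_DAY
    if age_days <= hot_days:
        return "hot"
    if age_days <= warm_days:
        return "warm"
    return "cold"
```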

8.5 Merkle Tree Preservation

The Merkle tree is independent of storage tiers. Leaf hashes and the tree structure are maintained in a separate data structure (compact tree format, stored in SQLite or flat file). Moving bundles between storage tiers does not affect the tree.

Inclusion proofs and consistency proofs remain valid across tier promotions — they reference the tree, not the storage location.

8.6 Metadata Database

SQLite database tracking all bundles:

CREATE TABLE bundles (
    tree_index     INTEGER PRIMARY KEY,
    bundle_id      BLOB NOT NULL,          -- UUID v7
    bundle_hash    BLOB NOT NULL,          -- leaf hash
    chain_id       BLOB NOT NULL,          -- source chain ID
    signer_pubkey  BLOB NOT NULL,          -- Ed25519 public key
    record_count   INTEGER NOT NULL,       -- records in bundle
    range_start    INTEGER NOT NULL,       -- first chain index
    range_end      INTEGER NOT NULL,       -- last chain index
    receipt_ts     INTEGER NOT NULL,       -- Unix µs reception time
    storage_tier   TEXT NOT NULL DEFAULT 'hot',  -- 'hot', 'warm', 'cold'
    storage_key    TEXT NOT NULL,          -- file path or segment reference
    created_at     TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);

CREATE INDEX idx_bundles_bundle_id ON bundles(bundle_id);
CREATE INDEX idx_bundles_chain_id ON bundles(chain_id);
CREATE INDEX idx_bundles_bundle_hash ON bundles(bundle_hash);
CREATE INDEX idx_bundles_receipt_ts ON bundles(receipt_ts);
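
The default-to-'hot' behaviour can be exercised with Python's built-in sqlite3 module. The schema below is abridged to the columns used, and the inserted values are illustrative:

```python
import sqlite3

# In-memory database using an abridged version of the schema above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bundles (
        tree_index     INTEGER PRIMARY KEY,
        bundle_id      BLOB NOT NULL,
        bundle_hash    BLOB NOT NULL,
        chain_id       BLOB NOT NULL,
        receipt_ts     INTEGER NOT NULL,
        storage_tier   TEXT NOT NULL DEFAULT 'hot',
        storage_key    TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO bundles (tree_index, bundle_id, bundle_hash, chain_id,"
    " receipt_ts, storage_key) VALUES (?, ?, ?, ?, ?, ?)",
    (0, b"\x00" * 16, b"\x11" * 32, b"\x22" * 16,
     1_700_000_000_000_000, "data/hot/0.bundle"),
)
# storage_tier was not supplied, so the column default applies.
tier = conn.execute(
    "SELECT storage_tier FROM bundles WHERE tree_index = 0").fetchone()[0]
```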

9. Server Configuration

{
    "server_id": "my-server.example.org",
    "host": "0.0.0.0",
    "port": 8443,
    "data_dir": "/var/lib/soosef-federation",
    "identity_key_path": "/etc/soosef-federation/identity/private.pem",
    "peers": [
        {
            "url": "https://peer1.example.org:8443",
            "pubkey_hex": "abc123...",
            "name": "Peer One"
        }
    ],
    "gossip_interval_seconds": 300,
    "hot_retention_days": 30,
    "warm_retention_days": 365,
    "compaction_interval_hours": 24,
    "max_bundle_size_bytes": 10485760,
    "max_entries_per_request": 1000,
    "member_tokens": [
        {
            "name": "loader-1",
            "pubkey_hex": "def456...",
            "permissions": ["submit", "entries"]
        }
    ]
}

10. Error Codes

| HTTP Status | CBOR Error Code | Description |
| --- | --- | --- |
| 400 | "invalid_bundle" | Bundle format invalid or signature verification failed |
| 400 | "invalid_range" | Requested entry range is invalid |
| 401 | "unauthorized" | Missing or invalid auth token |
| 403 | "forbidden" | Token lacks required permission |
| 404 | "not_found" | Bundle or entry not found |
| 409 | "duplicate" | Bundle already in log (response carries the existing receipt) |
| 413 | "bundle_too_large" | Bundle exceeds max_bundle_size_bytes |
| 507 | "storage_full" | Server cannot accept new entries |

Error response format:

{
    0: error_code,     # text
    1: message,        # text — human-readable description
    2: details,        # map — optional additional context
}

11. Security Considerations

11.1 Server Compromise

A compromised server can:

  • Read bundle metadata (chain IDs, signer pubkeys, timestamps) — expected at member tier
  • Withhold entries from gossip — detectable: other servers will see inconsistent tree sizes
  • Present a forked tree — detectable: consistency proofs will fail
  • Issue false receipts — detectable: receipt's inclusion proof won't verify against other servers' STHs

A compromised server cannot:

  • Read attestation content (encrypted with recipient keys)
  • Forge attestation signatures (requires Ed25519 private key)
  • Modify bundle contents (GCM authentication would fail)
  • Reorder or remove entries from other servers' trees

11.2 Transport Security

All server-to-server and client-to-server communication should use TLS 1.3. The federation protocol provides its own authentication (Ed25519 signatures on STHs and receipts), but TLS prevents network-level attacks.

11.3 Clock Reliability

Federation server clocks should be synchronized via NTP. Receipt timestamps are only as reliable as the server's clock. Deploying servers across multiple time zones and operators provides cross-checks — wildly divergent receipt timestamps for the same bundle indicate clock problems or compromise.
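
One way to act on this is to compare receipt timestamps for the same bundle across servers and flag outliers. A sketch; the 10-minute threshold is illustrative, not specified by the protocol:

```python
def clock_spread_us(receipt_timestamps: list[int]) -> int:
    """Spread between earliest and latest receipt for one bundle (µs)."""
    return max(receipt_timestamps) - min(receipt_timestamps)

def flag_divergence(receipt_timestamps: list[int],
                    threshold_us: int = 10 * 60 * 1_000_000) -> bool:
    """Flag receipts that disagree by more than the threshold,
    suggesting clock problems or a compromised server."""
    return clock_spread_us(receipt_timestamps) > threshold_us
```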