# Federation Protocol Specification

**Status**: Design
**Version**: 1 (protocol version)
**Last updated**: 2026-04-01

## 1. Overview

The federation is a network of append-only log servers inspired by Certificate Transparency
(RFC 6962). Each server acts as a "blind notary": it stores encrypted attestation bundles,
maintains a Merkle tree over them, and issues signed receipts proving when bundles were
received. Servers gossip with peers to ensure consistency and replicate data.

Federation servers never decrypt attestation content. They operate at the "federation
member" permission tier: they can verify chain summaries and signatures, but not read
the underlying attestation data.

## 2. Terminology

| Term | Definition |
|---|---|
| **Bundle** | An encrypted export bundle (FIELDWITNESSX1 format) containing chain records |
| **STH** | Signed Tree Head: a server's signed commitment to its current Merkle tree state |
| **Receipt** | A server-signed proof that a bundle was included in its log at a specific time |
| **Inclusion proof** | Merkle path from a leaf (bundle hash) to the tree root |
| **Consistency proof** | Proof that an older tree is a prefix of a newer tree (no entries removed) |
| **Gossip** | Peer-to-peer exchange of STHs and entries to maintain consistency |

## 3. Server Merkle Tree

### 3.1 Structure

The server maintains a single append-only Merkle tree. Each leaf is the SHA-256 hash
of a received bundle's raw bytes, prefixed with the leaf domain separator `0x00`:

```
leaf[i] = SHA-256(0x00 || bundle_bytes[i])
```

Internal nodes follow standard Merkle tree construction:
```
node = SHA-256(0x01 || left || right)  # internal node
leaf = SHA-256(0x00 || data)           # leaf node (domain separation)
```

Domain separation prefixes (`0x00` for leaves, `0x01` for internal nodes) prevent
second-preimage attacks, following CT convention (RFC 6962 §2.1).

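The hashing rules above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names are hypothetical, and the way an unbalanced tree is split (largest power of two strictly less than the leaf count) is assumed from RFC 6962, since this section does not state it.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """Leaf hash with the 0x00 domain-separation prefix."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Internal node hash with the 0x01 domain-separation prefix."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(bundles: list[bytes]) -> bytes:
    """Root over raw bundle bytes, split as in RFC 6962 (assumed here)."""
    n = len(bundles)
    if n == 0:
        return hashlib.sha256(b"").digest()  # RFC 6962 empty-tree hash
    if n == 1:
        return leaf_hash(bundles[0])
    k = 1
    while k * 2 < n:  # largest power of two strictly less than n
        k *= 2
    return node_hash(merkle_root(bundles[:k]), merkle_root(bundles[k:]))
```

Because of the prefixes, a leaf hash can never be mistaken for an internal node hash, even if a bundle's bytes happen to look like two concatenated digests.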
### 3.2 Signed Tree Head (STH)

After each append (or periodically in batch mode), the server computes and signs a new
tree head:

```cbor
{
    0: tree_size,       # uint: number of leaves
    1: root_hash,       # bytes[32]: Merkle tree root
    2: timestamp,       # int: Unix µs, server's clock
    3: server_id,       # text: server identifier (domain or pubkey fingerprint)
    4: server_pubkey,   # bytes[32]: Ed25519 public key
    5: signature,       # bytes[64]: Ed25519(cbor(fields 0-4))
}
```

The STH is the server's signed commitment: "My tree has N entries with this root at this
time." Clients and peers can verify the signature and use consistency proofs to ensure
the tree only grows (never shrinks or forks).

### 3.3 Inclusion Proof

Proves a specific bundle is at index `i` in a tree of size `n`:

```
proof = [(sibling_hash, direction), ...]
```

Verification:
```
current = SHA-256(0x00 || bundle_bytes)
for (sibling, direction) in proof:
    if direction == "L":
        current = SHA-256(0x01 || sibling || current)
    else:
        current = SHA-256(0x01 || current || sibling)
assert current == sth.root_hash
```

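The verification loop above translates almost directly into runnable code. The sketch below also includes a proof builder so the example can be exercised end to end. All function names are hypothetical, `leaves` holds leaf hashes (not raw bundles), and the tree split rule is assumed from RFC 6962.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root_of(leaves: list[bytes]) -> bytes:
    """Root over a non-empty list of leaf hashes."""
    if len(leaves) == 1:
        return leaves[0]
    k = split_point(len(leaves))
    return node_hash(root_of(leaves[:k]), root_of(leaves[k:]))


def build_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, str]]:
    """Return [(sibling_hash, direction), ...] ordered from leaf to root."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index < k:
        # Target is in the left subtree, so the sibling sits on the right.
        return build_proof(leaves[:k], index) + [(root_of(leaves[k:]), "R")]
    # Target is in the right subtree, so the sibling sits on the left.
    return build_proof(leaves[k:], index - k) + [(root_of(leaves[:k]), "L")]


def verify_inclusion(bundle: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    """Fold the proof onto the bundle's leaf hash and compare with the root."""
    current = leaf_hash(bundle)
    for sibling, direction in proof:
        if direction == "L":
            current = node_hash(sibling, current)
        else:
            current = node_hash(current, sibling)
    return current == root
```

A verifier only needs `verify_inclusion` and a trusted `root` taken from a signed STH; `build_proof` is the server-side counterpart.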
### 3.4 Consistency Proof

Proves that tree of size `m` is a prefix of tree of size `n` (where `m < n`). This
guarantees the server hasn't removed or reordered entries.

The proof is a list of intermediate hashes that, combined with the old root, reconstruct
the new root. Verification follows RFC 6962 §2.1.2.

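As a sketch of what this verification can look like, the code below generates a proof with the RFC 6962 `SUBPROOF` recursion and checks it with the iterative procedure from RFC 9162 §2.1.4.2 (the successor to RFC 6962). Function names are hypothetical and the inputs are leaf hashes; treat it as an illustration rather than the reference implementation.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root_of(leaves: list[bytes]) -> bytes:
    if len(leaves) == 1:
        return leaves[0]
    k = split_point(len(leaves))
    return node_hash(root_of(leaves[:k]), root_of(leaves[k:]))


def consistency_proof(m: int, leaves: list[bytes], complete: bool = True) -> list[bytes]:
    """RFC 6962 SUBPROOF for old size m over leaf hashes (0 < m <= len(leaves))."""
    n = len(leaves)
    if m == n:
        return [] if complete else [root_of(leaves)]
    k = split_point(n)
    if m <= k:
        return consistency_proof(m, leaves[:k], complete) + [root_of(leaves[k:])]
    return consistency_proof(m - k, leaves[k:], False) + [root_of(leaves[:k])]


def verify_consistency(m: int, n: int, old_root: bytes, new_root: bytes,
                       proof: list[bytes]) -> bool:
    """Check that the tree of size m with old_root is a prefix of the tree of size n."""
    if not 0 < m <= n:
        return False
    if m == n:
        return not proof and old_root == new_root
    if not proof:
        return False
    path = list(proof)
    if (m & (m - 1)) == 0:  # m is a power of two: the old root starts the path
        path.insert(0, old_root)
    fn, sn = m - 1, n - 1
    while fn & 1:
        fn >>= 1
        sn >>= 1
    fr = sr = path[0]  # running reconstructions of the old and new roots
    for c in path[1:]:
        if sn == 0:
            return False
        if (fn & 1) or fn == sn:
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            while fn != 0 and not (fn & 1):
                fn >>= 1
                sn >>= 1
        else:
            sr = node_hash(sr, c)
        fn >>= 1
        sn >>= 1
    return fr == old_root and sr == new_root and sn == 0
```

The verifier rebuilds both roots from the same proof hashes, so a server cannot satisfy it unless the first `m` leaves of the new tree are exactly the leaves of the old one.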
## 4. API Endpoints

Unless noted otherwise, endpoints use CBOR for request and response bodies
(Content-Type: `application/cbor`).

### 4.1 Submit Bundle

```
POST /v1/submit
```

**Request body**: Raw bundle bytes (application/octet-stream)

**Processing**:
1. Verify magic bytes `b"FIELDWITNESSX1"` and version
2. Parse chain summary
3. Verify `bundle_sig` against `signer_pubkey`
4. Compute `bundle_hash = SHA-256(0x00 || bundle_bytes)`
5. Check for duplicate (`bundle_hash` already in tree); if duplicate, return existing receipt
6. Append `bundle_hash` to Merkle tree
7. Store bundle bytes (encrypted blob, as-is)
8. Generate and sign receipt

**Response** (CBOR):
```cbor
{
    0: bundle_id,         # bytes[16]: from chain summary
    1: bundle_hash,       # bytes[32]: leaf hash
    2: tree_size,         # uint: tree size after inclusion
    3: tree_index,        # uint: leaf index in tree
    4: timestamp,         # int: Unix µs, server's reception time
    5: inclusion_proof,   # array of bytes[32]: Merkle path
    6: sth,               # map: current STH (see §3.2)
    7: server_id,         # text: server identifier
    8: server_pubkey,     # bytes[32]: Ed25519 public key
    9: receipt_sig,       # bytes[64]: Ed25519(cbor(fields 0-8))
}
```

**Auth**: Federation member token required.

**Errors** (see §10 for the full list):
- `400`: Invalid bundle format or bad signature
- `401`: Missing or invalid auth token
- `409`: Bundle already in log (returns existing receipt)
- `413`: Bundle exceeds `max_bundle_size_bytes`
- `507`: Server storage full

### 4.2 Get Signed Tree Head

```
GET /v1/sth
```

**Response** (CBOR): STH map (see §3.2)

**Auth**: Public (no auth required).

### 4.3 Get Consistency Proof

```
GET /v1/consistency-proof?old={m}&new={n}
```

**Parameters**:
- `old`: previous tree size (must be > 0)
- `new`: current tree size (must be >= old)

**Response** (CBOR):
```cbor
{
    0: old_size,   # uint
    1: new_size,   # uint
    2: proof,      # array of bytes[32]
}
```

**Auth**: Public.

### 4.4 Get Inclusion Proof

```
GET /v1/inclusion-proof?hash={hex}&tree_size={n}
```

**Parameters**:
- `hash`: hex-encoded bundle hash (leaf hash)
- `tree_size`: tree size for the proof (use current STH tree_size)

**Response** (CBOR):
```cbor
{
    0: tree_index,   # uint: leaf index
    1: tree_size,    # uint
    2: proof,        # array of bytes[32]
}
```

**Auth**: Public.

### 4.5 Get Entries

```
GET /v1/entries?start={s}&end={e}
```

**Parameters**:
- `start`: first tree index (inclusive)
- `end`: last tree index (inclusive)
- Maximum range: 1000 entries per request

**Response** (CBOR):
```cbor
{
    0: entries,   # array of entry maps (see §4.5.1)
}
```

#### 4.5.1 Entry Map

```cbor
{
    0: tree_index,       # uint
    1: bundle_hash,      # bytes[32]
    2: chain_summary,    # CBOR map (from bundle, unencrypted)
    3: encrypted_blob,   # bytes: full FIELDWITNESSX1 bundle
    4: receipt_ts,       # int: Unix µs when received
}
```

**Auth**: Federation member token required.

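Because both bounds are inclusive and a request may cover at most 1000 entries, a client that needs a longer span has to split it. A minimal sketch of that splitting follows; the helper name is hypothetical, and rejecting negative indices assumes that tree indices start at 0.

```python
MAX_ENTRIES_PER_REQUEST = 1000  # per-request limit stated in the specification


def entry_ranges(start: int, end: int, limit: int = MAX_ENTRIES_PER_REQUEST):
    """Yield inclusive (start, end) pairs, each covering at most `limit` entries."""
    if start < 0 or end < start:
        raise ValueError("invalid_range")
    while start <= end:
        chunk_end = min(start + limit - 1, end)
        yield (start, chunk_end)
        start = chunk_end + 1
```

For example, `list(entry_ranges(0, 2499))` gives `[(0, 999), (1000, 1999), (2000, 2499)]`, and each pair maps onto one `GET /v1/entries?start=...&end=...` request.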
### 4.6 Audit Summary

```
GET /v1/audit/summary?bundle_id={hex}
```

Returns the chain summary for a specific bundle without the encrypted payload.

**Response** (CBOR):
```cbor
{
    0: bundle_id,         # bytes[16]
    1: chain_summary,     # CBOR map (from bundle)
    2: tree_index,        # uint
    3: receipt_ts,        # int
    4: inclusion_proof,   # array of bytes[32] (against current STH)
}
```

**Auth**: Public.

## 5. Permission Tiers

### 5.1 Public Auditor

**Access**: Unauthenticated.

**Endpoints**: `/v1/sth`, `/v1/consistency-proof`, `/v1/inclusion-proof`, `/v1/audit/summary`

**Can verify**:
- The log exists and has a specific size at a specific time
- A specific bundle is included in the log at a specific position
- The log has not been forked (consistency proofs between STHs)
- Chain summary metadata (record count, hash range) for any bundle

**Cannot see**: Encrypted content, chain IDs, signer identities, raw bundles.

### 5.2 Federation Member

**Access**: Bearer token issued by server operator. Tokens are Ed25519-signed
credentials binding a public key to a set of permissions.

```cbor
{
    0: token_id,        # bytes[16]: UUID v7
    1: member_pubkey,   # bytes[32]: member's Ed25519 public key
    2: permissions,     # array of text: ["submit", "entries", "gossip"]
    3: issued_at,       # int: Unix µs
    4: expires_at,      # int: Unix µs (0 = no expiry)
    5: issuer_pubkey,   # bytes[32]: server's Ed25519 public key
    6: signature,       # bytes[64]: Ed25519(cbor(fields 0-5))
}
```

**Endpoints**: All public endpoints + `/v1/submit`, `/v1/entries`, gossip endpoints.

**Can see**: Everything a public auditor sees + chain IDs, signer public keys, full
encrypted bundles (but not decrypted content).

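Once a token's signature has been verified, authorizing a request comes down to an expiry check and a permission lookup. The sketch below shows only that part. The `MemberToken` class and `is_allowed` function are hypothetical, the token is assumed to be already decoded from CBOR, and Ed25519 verification is deliberately left out because it needs a third-party library.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemberToken:
    """Decoded token fields; names follow the CBOR map above."""
    token_id: bytes
    member_pubkey: bytes
    permissions: tuple[str, ...]
    issued_at: int   # Unix µs
    expires_at: int  # Unix µs, 0 = no expiry


def is_allowed(token: MemberToken, permission: str,
               now_us: Optional[int] = None) -> bool:
    """Check expiry and permission; the signature is assumed verified already."""
    if now_us is None:
        now_us = time.time_ns() // 1000  # current time in microseconds
    if token.expires_at != 0 and now_us >= token.expires_at:
        return False
    return permission in token.permissions
```

A server would plausibly answer `401` when the token is missing, invalid, or expired and `403` when the permission is absent, matching the error codes in §10.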
### 5.3 Authorized Recipient

Not enforced server-side. Recipients hold Ed25519 private keys whose corresponding
public keys appear in the bundle's recipients array. They can decrypt bundle content
locally after retrieving the encrypted blob via the entries endpoint.

The server has no knowledge of who can or cannot decrypt a given bundle.

## 6. Gossip Protocol

### 6.1 Overview

Federation servers maintain a list of known peers. Periodically (default: every 5 minutes),
each server initiates gossip with its peers to:

1. Exchange STHs: detect if any peer has entries the local server doesn't
2. Verify consistency: ensure no peer is presenting a forked log
3. Sync entries: pull missing entries from peers that have them

### 6.2 Gossip Flow

```
Server A                                Server B
   │                                       │
   │── POST /v1/gossip/sth ───────────────>│  (A sends its STH)
   │                                       │
   │<── response: B's STH ─────────────────│  (B responds with its STH)
   │                                       │
   │   (A compares tree sizes)             │
   │   if B.tree_size > A.tree_size:       │
   │                                       │
   │── GET /v1/consistency-proof ─────────>│  (verify B's tree extends A's)
   │<── proof ─────────────────────────────│
   │                                       │
   │   (verify consistency proof)          │
   │                                       │
   │── GET /v1/entries?start=...&end=... ─>│  (pull missing entries)
   │<── entries ───────────────────────────│
   │                                       │
   │   (append entries to local tree)      │
   │   (recompute STH)                     │
   │                                       │
```

### 6.3 Gossip Endpoints

```
POST /v1/gossip/sth
```

**Request body** (CBOR): Sender's current STH.

**Response** (CBOR): Receiver's current STH.

**Auth**: Federation member token with `"gossip"` permission.

### 6.4 Fork Detection

If server A receives an STH from server B where:
- `B.tree_size <= A.tree_size`, but `B.root_hash` differs from the root of A's own tree at size `B.tree_size`

Then B is presenting a different history. This is a **fork**, a critical security event.
A consistency proof that fails to verify when `B.tree_size > A.tree_size` indicates the
same thing. The server should:

1. Log the fork with both STHs as evidence
2. Alert the operator
3. Continue serving its own tree (do not merge the forked tree)
4. Refuse to gossip further with the forked peer until operator resolution

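The comparison a server makes on receiving a peer's STH can be sketched as a small classifier. The function and the `local_root_at` callback (which returns the local tree's root at a given historical size) are hypothetical names introduced for illustration, and signature verification of the peer's STH is assumed to have happened before this point.

```python
from typing import Callable


def classify_peer_sth(
    local_size: int,
    local_root_at: Callable[[int], bytes],
    peer_size: int,
    peer_root: bytes,
) -> str:
    """Return "in_sync", "peer_behind", "peer_ahead", or "fork"."""
    if peer_size <= local_size:
        # We hold every leaf the peer claims, so we can recompute its root.
        if peer_size > 0 and local_root_at(peer_size) != peer_root:
            return "fork"
        return "in_sync" if peer_size == local_size else "peer_behind"
    # Peer is larger: request a consistency proof before pulling entries.
    return "peer_ahead"
```

On `"fork"` the caller would follow the four steps listed above; on `"peer_ahead"` it would continue with the consistency-proof and entries requests from §6.2.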
### 6.5 Convergence

Under normal operation (no forks), servers converge to identical trees. The convergence
time depends on gossip interval and network topology. With a 5-minute interval and full
mesh topology among N servers, convergence after a new entry takes at most one gossip
interval (5 minutes), plus the time needed to transfer the entry.

## 7. Receipts

### 7.1 Purpose

A receipt is the federation's proof that a bundle was received and included in the log
at a specific time. It is the critical artifact that closes the timestamp gap: the
offline device's claimed timestamp + the federation receipt = practical proof of timing.

### 7.2 Receipt Format

```cbor
{
    0: bundle_id,         # bytes[16]: from chain summary
    1: bundle_hash,       # bytes[32]: leaf hash in server's tree
    2: tree_size,         # uint: tree size at inclusion
    3: tree_index,        # uint: leaf position
    4: timestamp,         # int: Unix µs, server's clock
    5: inclusion_proof,   # array of bytes[32]: Merkle path
    6: sth,               # map: STH at time of inclusion
    7: server_id,         # text: server identifier
    8: server_pubkey,     # bytes[32]: Ed25519 public key
    9: receipt_sig,       # bytes[64]: Ed25519(cbor(fields 0-8))
}
```

### 7.3 Receipt Verification

To verify a receipt:

1. `Ed25519_Verify(server_pubkey, receipt_sig, cbor(fields 0-8))`: the receipt is authentic
2. Verify `inclusion_proof` against `sth.root_hash` with `bundle_hash` at `tree_index`
3. Verify `sth.signature`: the STH itself is authentic
4. `sth.tree_size >= tree_size`: the STH covers the inclusion
5. `sth.timestamp >= timestamp`: the STH is at or after receipt time

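Steps 2, 4 and 5 can be sketched without any cryptographic library. Since the receipt's `inclusion_proof` carries only hashes, the sketch derives left/right order from `tree_index` and the tree size using the procedure in RFC 9162 §2.1.3.2. Several things here are assumptions: the receipt is a dict keyed by field name rather than by CBOR integer key, the proof is taken relative to `sth.tree_size`, and the two Ed25519 checks (steps 1 and 3) are omitted.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_from_path(leaf: bytes, index: int, tree_size: int, path: list[bytes]):
    """Recompute the root from a hash-only path, or return None if malformed."""
    if not 0 <= index < tree_size:
        return None
    fn, sn, r = index, tree_size - 1, leaf
    for p in path:
        if sn == 0:
            return None  # path is longer than the tree is deep
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            while fn != 0 and not (fn & 1):
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return r if sn == 0 else None


def check_receipt(receipt: dict) -> bool:
    """Steps 2, 4 and 5 only; Ed25519 checks (steps 1 and 3) are out of scope."""
    sth = receipt["sth"]
    if sth["tree_size"] < receipt["tree_size"]:
        return False  # step 4: STH does not cover the inclusion
    if sth["timestamp"] < receipt["timestamp"]:
        return False  # step 5: STH predates the receipt
    root = root_from_path(receipt["bundle_hash"], receipt["tree_index"],
                          sth["tree_size"], receipt["inclusion_proof"])
    return root == sth["root_hash"]  # step 2
```

A complete verifier would run the two signature checks first and reject the receipt before looking at any other field.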
### 7.4 Receipt Lifecycle

```
1. Loader submits bundle to federation server
2. Server issues receipt in submit response
3. Loader stores receipt locally (receipts/ directory)
4. Loader exports receipts to USB (CBOR file)
5. Offline device imports receipts
6. Receipt is stored alongside chain records as proof of federation timestamp
```

### 7.5 Multi-Server Receipts

A bundle submitted to N servers produces N independent receipts. Each receipt is from a
different server with a different timestamp and Merkle tree position. Multiple receipts
strengthen the timestamp claim: an adversary would need to compromise all N servers to
suppress evidence.

## 8. Storage Tiers

Federation servers manage bundle storage across three tiers based on age:

### 8.1 Hot Tier (0-30 days)

- **Format**: Individual files, one per bundle
- **Location**: `data/hot/{tree_index}.bundle`
- **Access**: Direct file read, O(1)
- **Purpose**: Fast access for recent entries, active gossip sync

### 8.2 Warm Tier (30-365 days)

- **Format**: Zstd-compressed segments, 1000 bundles per segment
- **Location**: `data/warm/segment-{start}-{end}.zst`
- **Access**: Decompress segment, extract entry
- **Compression**: Zstd level 3 (fast compression, moderate ratio)
- **Purpose**: Reduced storage for medium-term retention

### 8.3 Cold Tier (>1 year)

- **Format**: Zstd-compressed segments, maximum compression
- **Location**: `data/cold/segment-{start}-{end}.zst`
- **Access**: Decompress segment, extract entry
- **Compression**: Zstd level 19 (slow compression, best ratio)
- **Purpose**: Archival storage, rarely accessed

### 8.4 Tier Promotion

A background compaction process runs periodically (default: every 24 hours):

1. Identify hot entries older than 30 days
2. Group into segments of 1000
3. Compress and write to warm tier
4. Delete hot files
5. Repeat for warm → cold at 365 days

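Steps 1 and 2 of the compaction pass can be sketched as a pure planning function. The names below are hypothetical. Two policies are assumed rather than taken from the specification: only full segments of 1000 are emitted, with leftovers staying hot until the next run, and `{start}`/`{end}` in the file name are the first and last tree index in the segment. Compression and file deletion are left out.

```python
from typing import NamedTuple

DAY_US = 24 * 60 * 60 * 1_000_000  # one day in microseconds
SEGMENT_SIZE = 1000                # bundles per warm segment


class Segment(NamedTuple):
    start: int  # first tree index in the segment
    end: int    # last tree index in the segment (inclusive)
    path: str   # target file under data/warm/


def plan_warm_segments(hot: list[tuple[int, int]], now_us: int,
                       retention_days: int = 30) -> list[Segment]:
    """Plan warm segments from (tree_index, receipt_ts) pairs of hot entries."""
    cutoff = now_us - retention_days * DAY_US
    old = sorted(idx for idx, ts in hot if ts < cutoff)
    segments = []
    # Step through the aged entries in full groups of SEGMENT_SIZE.
    for i in range(0, len(old) - SEGMENT_SIZE + 1, SEGMENT_SIZE):
        chunk = old[i:i + SEGMENT_SIZE]
        name = f"data/warm/segment-{chunk[0]}-{chunk[-1]}.zst"
        segments.append(Segment(chunk[0], chunk[-1], name))
    return segments
```

Keeping planning separate from I/O makes the promotion logic easy to test, and the same shape would apply to the warm-to-cold pass with a 365-day cutoff.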
### 8.5 Merkle Tree Preservation

The Merkle tree is independent of storage tiers. Leaf hashes and the tree structure
are maintained in a separate data structure (compact tree format, stored in SQLite or
flat file). Moving bundles between storage tiers does not affect the tree.

Inclusion proofs and consistency proofs remain valid across tier promotions: they
reference the tree, not the storage location.

### 8.6 Metadata Database

SQLite database tracking all bundles:

```sql
CREATE TABLE bundles (
    tree_index    INTEGER PRIMARY KEY,
    bundle_id     BLOB NOT NULL,                -- UUID v7
    bundle_hash   BLOB NOT NULL,                -- leaf hash
    chain_id      BLOB NOT NULL,                -- source chain ID
    signer_pubkey BLOB NOT NULL,                -- Ed25519 public key
    record_count  INTEGER NOT NULL,             -- records in bundle
    range_start   INTEGER NOT NULL,             -- first chain index
    range_end     INTEGER NOT NULL,             -- last chain index
    receipt_ts    INTEGER NOT NULL,             -- Unix µs reception time
    storage_tier  TEXT NOT NULL DEFAULT 'hot',  -- 'hot', 'warm', 'cold'
    storage_key   TEXT NOT NULL,                -- file path or segment reference
    created_at    TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);

CREATE INDEX idx_bundles_bundle_id ON bundles(bundle_id);
CREATE INDEX idx_bundles_chain_id ON bundles(chain_id);
CREATE INDEX idx_bundles_bundle_hash ON bundles(bundle_hash);
CREATE INDEX idx_bundles_receipt_ts ON bundles(receipt_ts);
```

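The `bundle_hash` index is what makes the duplicate check in the submit path cheap. The sketch below shows one way that lookup and append could work with Python's built-in `sqlite3` module. It uses a trimmed copy of the schema (only the columns the example touches), the `append_bundle` helper is hypothetical, and numbering `tree_index` from 0 is an assumption.

```python
import sqlite3

# Trimmed copy of the schema above; only the columns this sketch needs.
SCHEMA = """
CREATE TABLE bundles (
    tree_index   INTEGER PRIMARY KEY,
    bundle_id    BLOB NOT NULL,
    bundle_hash  BLOB NOT NULL,
    receipt_ts   INTEGER NOT NULL,
    storage_tier TEXT NOT NULL DEFAULT 'hot',
    storage_key  TEXT NOT NULL
);
CREATE INDEX idx_bundles_bundle_hash ON bundles(bundle_hash);
"""


def append_bundle(db: sqlite3.Connection, bundle_id: bytes,
                  bundle_hash: bytes, receipt_ts: int) -> tuple[int, bool]:
    """Return (tree_index, created); duplicates map to the existing index."""
    row = db.execute(
        "SELECT tree_index FROM bundles WHERE bundle_hash = ?", (bundle_hash,)
    ).fetchone()
    if row is not None:
        return row[0], False
    next_index = db.execute(
        "SELECT COALESCE(MAX(tree_index) + 1, 0) FROM bundles"
    ).fetchone()[0]
    with db:  # commit on success, roll back on error
        db.execute(
            "INSERT INTO bundles (tree_index, bundle_id, bundle_hash, receipt_ts, storage_key)"
            " VALUES (?, ?, ?, ?, ?)",
            (next_index, bundle_id, bundle_hash, receipt_ts,
             f"data/hot/{next_index}.bundle"),
        )
    return next_index, True
```

A server with concurrent writers would need to wrap the lookup and insert in a single transaction; this sketch assumes one writer.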
## 9. Server Configuration

```json
{
    "server_id": "my-server.example.org",
    "host": "0.0.0.0",
    "port": 8443,
    "data_dir": "/var/lib/fieldwitness-federation",
    "identity_key_path": "/etc/fieldwitness-federation/identity/private.pem",
    "peers": [
        {
            "url": "https://peer1.example.org:8443",
            "pubkey_hex": "abc123...",
            "name": "Peer One"
        }
    ],
    "gossip_interval_seconds": 300,
    "hot_retention_days": 30,
    "warm_retention_days": 365,
    "compaction_interval_hours": 24,
    "max_bundle_size_bytes": 10485760,
    "max_entries_per_request": 1000,
    "member_tokens": [
        {
            "name": "loader-1",
            "pubkey_hex": "def456...",
            "permissions": ["submit", "entries"]
        }
    ]
}
```

## 10. Error Codes

| HTTP Status | CBOR Error Code | Description |
|---|---|---|
| 400 | `"invalid_bundle"` | Bundle format invalid or signature verification failed |
| 400 | `"invalid_range"` | Requested entry range is invalid |
| 401 | `"unauthorized"` | Missing or invalid auth token |
| 403 | `"forbidden"` | Token lacks required permission |
| 404 | `"not_found"` | Bundle or entry not found |
| 409 | `"duplicate"` | Bundle already in log (returns existing receipt) |
| 413 | `"bundle_too_large"` | Bundle exceeds `max_bundle_size_bytes` |
| 507 | `"storage_full"` | Server cannot accept new entries |

Error response format:
```cbor
{
    0: error_code,   # text
    1: message,      # text: human-readable description
    2: details,      # map: optional additional context
}
```

## 11. Security Considerations

### 11.1 Server Compromise

A compromised server can:
- Read bundle metadata (chain IDs, signer pubkeys, timestamps): **expected at member tier**
- Withhold entries from gossip: **detectable**, because other servers will see inconsistent tree sizes
- Present a forked tree: **detectable**, because consistency proofs will fail
- Issue false receipts: **detectable**, because the receipt's inclusion proof won't verify against other servers' STHs

A compromised server **cannot**:
- Read attestation content (encrypted with recipient keys)
- Forge attestation signatures (requires Ed25519 private key)
- Modify bundle contents (GCM authentication would fail)
- Reorder or remove entries from other servers' trees

### 11.2 Transport Security

All server-to-server and client-to-server communication should use TLS 1.3. The
federation protocol provides its own authentication (Ed25519 signatures on STHs and
receipts), but TLS prevents network-level attacks.

### 11.3 Clock Reliability

Federation server clocks should be synchronized via NTP. Receipt timestamps are only as
reliable as the server's clock. Deploying servers across multiple time zones and operators
provides cross-checks: wildly divergent receipt timestamps for the same bundle indicate
clock problems or compromise.