fieldwitness/docs/architecture/export-bundle.md
Aaron D. Lee 490f9d4a1d Rebrand SooSeF to FieldWitness
2026-04-02 15:05:13 -04:00


Export Bundle Specification

Status: Design
Version: 1 (bundle format version)
Last updated: 2026-04-01

1. Overview

An export bundle packages a contiguous range of chain records into a portable, encrypted file suitable for transfer across an air gap. The bundle format is designed so that:

  • Auditors can verify chain integrity without decrypting content
  • Recipients with the correct key can decrypt and read attestation records
  • Anyone can detect tampering via Merkle root and signature verification
  • Steganographic embedding is optional — bundles can be hidden in JPEG images via DCT

The format follows the pattern established by keystore/export.py (SOOBNDL): magic bytes, version, structured binary payload.

2. Binary Layout

Offset   Size       Field
───────  ─────────  ──────────────────────────────────────
0        14         magic: b"FIELDWITNESSX1"
14       1          version: uint8 (1)
15       4          summary_len: uint32 BE
19       var        chain_summary: CBOR (see §3)
var      4          recipients_len: uint32 BE
var      var        recipients: CBOR array (see §4)
var      12         nonce: AES-256-GCM nonce
var      var        ciphertext: AES-256-GCM(zstd(CBOR(records)))
last 16  16         tag: AES-256-GCM authentication tag

All multi-byte integers are big-endian. The magic literal b"FIELDWITNESSX1" is 14 bytes long, so the fixed header (magic plus version) occupies 15 bytes. The total bundle size is: 15 + 4 + summary_len + 4 + recipients_len + 12 + ciphertext_len + 16

Parsing Without Decryption

To audit a bundle without decryption, read:

  1. Magic (14 bytes): verify b"FIELDWITNESSX1"
  2. Version (1 byte): verify 1
  3. Summary length (4 bytes BE): read the next N bytes as CBOR
  4. Chain summary: verify signature, inspect metadata

The encrypted payload and recipient list can be skipped for audit purposes.
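The read sequence above can be sketched with the standard library alone. This is a minimal illustration, not the project's parser: the names split_bundle and BundleParts are hypothetical, the magic length is taken from the literal itself rather than hard-coded, and the CBOR fields are returned undecoded so that no third-party decoder is assumed.

```python
import struct
from typing import NamedTuple

MAGIC = b"FIELDWITNESSX1"   # literal from the spec; its length is derived below
VERSION = 1
NONCE_LEN = 12
TAG_LEN = 16


class BundleParts(NamedTuple):
    """Hypothetical container for the raw, still-encoded bundle fields."""
    summary_cbor: bytes      # undecoded chain summary (CBOR)
    recipients_cbor: bytes   # undecoded recipients array (CBOR)
    nonce: bytes
    ciphertext: bytes
    tag: bytes


def split_bundle(data: bytes) -> BundleParts:
    """Split a bundle into raw fields without decrypting or decoding CBOR."""
    pos = len(MAGIC)
    if data[:pos] != MAGIC:
        raise ValueError("bad magic")
    if len(data) < pos + 1 or data[pos] != VERSION:
        raise ValueError("unsupported version")
    pos += 1

    def read_block(offset: int) -> tuple[bytes, int]:
        # Each variable-length field is preceded by a uint32 big-endian length.
        if offset + 4 > len(data):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", data, offset)
        start, end = offset + 4, offset + 4 + length
        if end > len(data):
            raise ValueError("truncated field")
        return data[start:end], end

    summary_cbor, pos = read_block(pos)
    recipients_cbor, pos = read_block(pos)

    # The rest is nonce + ciphertext + tag; the ciphertext length is implicit.
    if len(data) - pos < NONCE_LEN + TAG_LEN:
        raise ValueError("truncated payload")
    nonce = data[pos:pos + NONCE_LEN]
    ciphertext = data[pos + NONCE_LEN:len(data) - TAG_LEN]
    tag = data[len(data) - TAG_LEN:]
    return BundleParts(summary_cbor, recipients_cbor, nonce, ciphertext, tag)
```

An auditor would stop after summary_cbor; the remaining fields are split out here only to show that the ciphertext length is whatever remains between the nonce and the final 16-byte tag.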

3. Chain Summary

The chain summary sits outside the encryption envelope. It provides verifiable metadata about the bundle contents without revealing the actual attestation data.

CBOR map with integer keys:

CBOR Key  Field          Type              Description
────────  ─────────────  ────────────────  ────────────────────────────────────────────────
0         bundle_id      byte string (16)  UUID v7, unique bundle identifier
1         chain_id       byte string (32)  SHA-256(genesis record), identifies source chain
2         range_start    unsigned int      First record index (inclusive)
3         range_end      unsigned int      Last record index (inclusive)
4         record_count   unsigned int      Number of records in bundle
5         first_hash     byte string (32)  compute_record_hash(first_record)
6         last_hash      byte string (32)  compute_record_hash(last_record)
7         merkle_root    byte string (32)  Root of Merkle tree over record hashes (see §5)
8         created_ts     integer           Bundle creation timestamp (Unix µs)
9         signer_pubkey  byte string (32)  Ed25519 public key of bundle creator
10        bundle_sig     byte string (64)  Ed25519 signature (see §3.1)

3.1 Signature Computation

The signature covers all summary fields except bundle_sig itself:

summary_bytes = cbor2.dumps({
    0: bundle_id,
    1: chain_id,
    2: range_start,
    3: range_end,
    4: record_count,
    5: first_hash,
    6: last_hash,
    7: merkle_root,
    8: created_ts,
    9: signer_pubkey,
}, canonical=True)

bundle_sig = Ed25519_Sign(private_key, summary_bytes)

3.2 Verification Without Decryption

An auditor verifies a bundle as follows:

  1. Parse the chain summary and re-encode fields 0-9 as canonical CBOR to recover summary_bytes (see §3.1)
  2. Ed25519_Verify(signer_pubkey, bundle_sig, summary_bytes): the summary is authentic
  3. record_count == range_end - range_start + 1: the count matches the range
  4. If previous bundles from the same chain_id exist, verify that first_hash matches the expected continuation

The auditor now knows: "A chain with ID X contains records [start, end], the creator signed this claim, and the Merkle root commits to specific record contents." All without decrypting.
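The structural checks in steps 3 and 4 need no cryptography and can be sketched directly against the integer keys of the chain summary. The function names are hypothetical. The continuation test shown here is only an assumed reading of step 4: it checks that two bundles share a chain_id and have abutting index ranges, and leaves the hash comparison out because the record format is not defined in this document.

```python
def check_summary_ranges(summary: dict) -> None:
    """Step 3: record_count must equal the size of the inclusive index range."""
    start, end, count = summary[2], summary[3], summary[4]
    if end < start:
        raise ValueError("range_end precedes range_start")
    if count != end - start + 1:
        raise ValueError("record_count does not match range")


def is_contiguous(prev_summary: dict, next_summary: dict) -> bool:
    """Assumed reading of step 4: same chain, and the ranges abut with no gap."""
    return (
        prev_summary[1] == next_summary[1]            # chain_id matches
        and next_summary[2] == prev_summary[3] + 1    # range_start follows range_end
    )
```

Both functions operate on an already-decoded summary map, so they would run after the signature check in step 2 has succeeded.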

4. Envelope Encryption

4.1 Key Derivation

Ed25519 signing keys are converted to X25519 Diffie-Hellman keys for encryption:

x25519_private = Ed25519_to_X25519_Private(ed25519_private_key)
x25519_public  = Ed25519_to_X25519_Public(ed25519_public_key_bytes)

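This uses the birational map between the Ed25519 curve (edwards25519) and the X25519 curve (Curve25519). The cryptography library does not expose this conversion directly, so it has to be performed explicitly: the private scalar is the clamped first 32 bytes of SHA-512(seed), and the public key maps the Edwards y-coordinate to the Montgomery u-coordinate, u = (1 + y) / (1 - y) mod 2^255 - 19. PyNaCl exposes libsodium's equivalent conversions.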
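To make the map concrete, the following stdlib-only sketch shows the arithmetic behind both conversions. The function names are hypothetical, and the code is illustrative only: it does not check that the input is a valid point on the curve or reject low-order points, which a vetted library routine would do.

```python
import hashlib

P = 2**255 - 19  # field prime shared by edwards25519 and Curve25519


def ed25519_seed_to_x25519_private(seed: bytes) -> bytes:
    """Derive the X25519 scalar from a 32-byte Ed25519 seed (SHA-512, then clamp)."""
    if len(seed) != 32:
        raise ValueError("Ed25519 seed must be 32 bytes")
    h = bytearray(hashlib.sha512(seed).digest()[:32])
    h[0] &= 248     # clear the low 3 bits
    h[31] &= 127    # clear the top bit
    h[31] |= 64     # set the second-highest bit
    return bytes(h)


def ed25519_public_to_x25519_public(ed_pub: bytes) -> bytes:
    """Map an encoded Edwards point to the Montgomery u-coordinate, u = (1+y)/(1-y)."""
    if len(ed_pub) != 32:
        raise ValueError("Ed25519 public key must be 32 bytes")
    # The encoding is little-endian y with the sign of x stored in the top bit.
    y = int.from_bytes(ed_pub, "little") & ((1 << 255) - 1)
    if y >= P or y == 1:
        raise ValueError("invalid or degenerate point encoding")
    u = (1 + y) * pow((1 - y) % P, -1, P) % P   # modular inverse needs Python 3.8+
    return u.to_bytes(32, "little")
```

A quick sanity check on the map: the Ed25519 base point has y = 4/5, which gives u = (9/5) / (1/5) = 9, the X25519 base point.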

4.2 DEK Generation

A random 32-byte data encryption key (DEK) is generated per bundle:

dek = os.urandom(32)  # AES-256 key

4.3 DEK Wrapping (Per Recipient)

For each recipient, the DEK is wrapped using X25519 ECDH + HKDF + AES-256-GCM:

1. shared_secret = X25519_ECDH(sender_x25519_private, recipient_x25519_public)
2. derived_key = HKDF-SHA256(
       ikm=shared_secret,
       salt=bundle_id,          # binds to this specific bundle
       info=b"fieldwitness-dek-wrap-v1",
       length=32
   )
3. wrapped_dek = AES-256-GCM_Encrypt(
       key=derived_key,
       nonce=os.urandom(12),
       plaintext=dek,
       aad=bundle_id             # additional authenticated data
   )
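Step 2 is the part that binds the wrapping key to one bundle. As a rough illustration, HKDF-SHA256 can be written out from RFC 5869 using only hashlib and hmac. The helper names hkdf_sha256 and derive_wrap_key are hypothetical, and a real implementation would normally call a vetted HKDF rather than this sketch. The ECDH and AES-GCM steps are not shown because they need a cryptographic library.

```python
import hashlib
import hmac

INFO = b"fieldwitness-dek-wrap-v1"  # context label from the spec


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if length > 255 * 32:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: an empty salt defaults to HashLen zero bytes, per the RFC.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    # Expand: T(n) = HMAC(PRK, T(n-1) || info || n)
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_wrap_key(shared_secret: bytes, bundle_id: bytes) -> bytes:
    """Step 2 of the wrap: salting with bundle_id ties the key to one bundle."""
    return hkdf_sha256(shared_secret, salt=bundle_id, info=INFO, length=32)
```

Because bundle_id is the salt, the same sender and recipient pair derives a different wrapping key for every bundle even though their static ECDH shared secret never changes.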

4.4 Recipients Array

CBOR array of recipient entries:

[
    {
        0: recipient_pubkey,     # byte string (32) — Ed25519 public key
        1: wrap_nonce,           # byte string (12) — AES-GCM nonce for DEK wrap
        2: wrapped_dek,          # byte string (48) — encrypted DEK (32) + GCM tag (16)
    },
    ...
]

4.5 Payload Encryption

1. records_cbor = cbor2.dumps([serialize_record(r) for r in records], canonical=True)
2. compressed = zstd.compress(records_cbor, level=3)
3. nonce = os.urandom(12)
4. ciphertext, tag = AES-256-GCM_Encrypt(
       key=dek,
       nonce=nonce,
       plaintext=compressed,
       aad=summary_bytes         # binds ciphertext to this summary
   )

The summary_bytes (the same bytes that are signed) are used as additional authenticated data (AAD). This cryptographically binds the encrypted payload to the chain summary: any modification to the summary causes payload decryption to fail GCM authentication.

4.6 Decryption

A recipient decrypts a bundle:

1. Parse chain summary, verify bundle_sig
2. Find own pubkey in recipients array
3. shared_secret = X25519_ECDH(recipient_x25519_private, sender_x25519_public)
   (sender_x25519_public derived from summary.signer_pubkey)
4. derived_key = HKDF-SHA256(shared_secret, salt=bundle_id, info=b"fieldwitness-dek-wrap-v1", length=32)
5. dek = AES-256-GCM_Decrypt(derived_key, wrap_nonce, wrapped_dek, aad=bundle_id)
6. compressed = AES-256-GCM_Decrypt(dek, nonce, ciphertext, aad=summary_bytes)
7. records_cbor = zstd.decompress(compressed)
8. records = [deserialize_record(r) for r in cbor2.loads(records_cbor)]
9. Verify each record's signature and chain linkage

5. Merkle Tree

The Merkle tree provides compact proofs that specific records are included in a bundle.

5.1 Construction

Leaves are the record hashes in chain order:

leaf[i] = compute_record_hash(records[i])

Internal nodes:

node = SHA-256(left_child || right_child)

If a level has an odd number of nodes, the last node has no sibling and is promoted unchanged to the next level. Nothing is duplicated or padded. This can happen at any level whenever the number of leaves is not a power of 2.
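The construction can be sketched in a few lines with hashlib. The function name merkle_root is hypothetical, and the sketch assumes the promotion rule applies at every level where a node lacks a sibling. The leaves are expected to be the 32-byte outputs of compute_record_hash, which is not reproduced here.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root over record hashes; an unpaired last node is promoted unchanged."""
    if not leaves:
        raise ValueError("a bundle must contain at least one record")
    level = list(leaves)
    while len(level) > 1:
        # Hash adjacent pairs left to right.
        nxt = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:          # odd count: carry the last node up as-is
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

For three leaves a, b, c this yields SHA-256(SHA-256(a || b) || c): the third leaf is carried up one level and only hashed when it gains a sibling.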

5.2 Inclusion Proof

An inclusion proof for record at index i is a list of (sibling_hash, direction) pairs from the leaf to the root. Verification:

current = leaf[i]
for (sibling, direction) in proof:
    if direction == "L":
        current = SHA-256(sibling || current)
    else:
        current = SHA-256(current || sibling)
assert current == merkle_root
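For completeness, here is one way a proof in that (sibling_hash, direction) form could be generated and checked. build_proof and verify_proof are hypothetical names, and the sketch assumes an unpaired node is promoted without contributing a proof entry, so proofs for different leaves can differ in length.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_proof(leaves: list[bytes], index: int) -> tuple[list[tuple[bytes, str]], bytes]:
    """Return (proof, root) for leaves[index]; proof is [(sibling_hash, direction), ...]."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    proof, level = [], list(leaves)
    while len(level) > 1:
        sibling = index ^ 1                      # partner within the pair
        if sibling < len(level):
            # "L" means the sibling sits to the left of the current node.
            proof.append((level[sibling], "L" if sibling < index else "R"))
        # Otherwise the node has no partner and is promoted: no proof entry.
        nxt = [_sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, index = nxt, index // 2
    return proof, level[0]


def verify_proof(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    """Recompute the path from leaf to root using the rule given above."""
    current = leaf
    for sibling, direction in proof:
        current = _sha256(sibling + current) if direction == "L" else _sha256(current + sibling)
    return current == root
```

With five leaves, the proof for the last leaf contains a single entry (the hash of the first four), because that leaf is promoted twice before it meets a sibling.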

5.3 Usage

  • Export bundles: merkle_root in chain summary commits to exact record contents
  • Federation servers: Build a separate Merkle tree over bundle hashes (see federation-protocol.md)

These are two different trees:

  1. Record tree (this section) — leaves are record hashes within a bundle
  2. Bundle tree (federation) — leaves are bundle hashes across the federation log

6. Steganographic Embedding

Bundles can optionally be embedded in JPEG images using stego's DCT steganography:

1. bundle_bytes = create_export_bundle(chain, start, end, private_key, recipients)
2. stego_image = stego.encode(
       carrier=carrier_image,
       reference=reference_image,
       file_data=bundle_bytes,
       passphrase=passphrase,
       embed_mode="dct",
       channel_key=channel_key  # optional
   )

Extraction:

1. result = stego.decode(
       carrier=stego_image,
       reference=reference_image,
       passphrase=passphrase,
       channel_key=channel_key
   )
2. bundle_bytes = result.file_data
3. assert bundle_bytes.startswith(b"FIELDWITNESSX1")

6.1 Capacity Considerations

DCT steganography has limited capacity relative to the carrier image size. Approximate capacities:

Carrier Size       Approximate DCT Capacity  Records (est.)
─────────────────  ────────────────────────  ───────────────
1 MP (1024x1024)   ~10 KB                    ~20-40 records
4 MP (2048x2048)   ~40 KB                    ~80-160 records
12 MP (4000x3000)  ~100 KB                   ~200-400 records

Record size varies (~200-500 bytes each after CBOR serialization, before compression). Zstd compression typically achieves a 2-4x ratio on CBOR attestation data. Call check_capacity() before embedding to confirm that the bundle fits the chosen carrier.

6.2 Multiple Images

For large export ranges, split across multiple bundles embedded in multiple carrier images. Each bundle is self-contained with its own chain summary. The receiving side imports them in any order — the chain indices and hashes enable reassembly.
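Splitting and reassembly reduce to simple arithmetic on the inclusive record indices. The sketch below uses hypothetical helper names and a hypothetical max_records limit, which in practice would come from the carrier capacity; it is not part of the bundle format itself.

```python
def split_range(start: int, end: int, max_records: int) -> list[tuple[int, int]]:
    """Split the inclusive range [start, end] into inclusive sub-ranges."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    if end < start:
        raise ValueError("end precedes start")
    return [
        (lo, min(lo + max_records - 1, end))
        for lo in range(start, end + 1, max_records)
    ]


def reassembly_order(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort received ranges and confirm they tile the chain without gaps or overlap."""
    ordered = sorted(ranges)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start != prev_end + 1:
            raise ValueError("gap or overlap between bundles")
    return ordered
```

Each (start, end) pair corresponds to the range_start and range_end fields of one bundle's chain summary, which is what lets the receiver sort bundles regardless of arrival order.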

7. Recipient Management

7.1 Adding Recipients

Recipients are identified by their Ed25519 public keys. To encrypt a bundle for a recipient, the creator needs only their public key (no shared secret setup required).

7.2 Recipient Discovery

Recipients' Ed25519 public keys can be obtained via:

  • Direct exchange (QR code, USB transfer, verbal fingerprint verification)
  • Federation server identity registry (when available)
  • Attest's existing peers.json file

7.3 Self-Encryption

The bundle creator should always include their own public key in the recipients list. This allows them to decrypt their own exports (e.g., when restoring from backup).

8. Error Handling

Error                    Cause                                    Response
───────────────────────  ───────────────────────────────────────  ──────────────────────────────────────────────────────────────────────
Bad magic                Not a FIELDWITNESSX1 bundle              Reject with ExportError("not a FieldWitness export bundle")
Bad version              Unsupported format version               Reject with ExportError("unsupported bundle version")
Signature invalid        Tampered summary or wrong signer         Reject with ExportError("bundle signature verification failed")
No matching recipient    Decryptor's key not in recipients list   Reject with ExportError("not an authorized recipient")
GCM auth failure         Tampered ciphertext or wrong key         Reject with ExportError("decryption failed — bundle may be corrupted")
Decompression failure    Corrupted compressed data                Reject with ExportError("decompression failed")
Chain integrity failure  Records don't link correctly             Reject with ChainIntegrityError(...) after decryption
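As an illustration of how the first two rows might surface in code, the sketch below defines the two exception types named in the table and a header check that raises them with the listed messages. The class hierarchy and the check_header function are assumptions made for this example; the project's real exceptions may derive from a shared base class.

```python
class ExportError(Exception):
    """Raised for malformed, unauthorized, or undecryptable bundles."""


class ChainIntegrityError(Exception):
    """Raised after decryption when records fail to link."""


MAGIC = b"FIELDWITNESSX1"
SUPPORTED_VERSIONS = {1}


def check_header(data: bytes) -> int:
    """Run the magic and version checks and return the format version."""
    if not data.startswith(MAGIC):
        raise ExportError("not a FieldWitness export bundle")
    # A bundle truncated right after the magic is reported as a version error.
    if len(data) <= len(MAGIC) or data[len(MAGIC)] not in SUPPORTED_VERSIONS:
        raise ExportError("unsupported bundle version")
    return data[len(MAGIC)]
```

Ordering the checks this way means that arbitrary non-bundle data is always rejected with the "bad magic" message before any length fields are interpreted.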