# Export Bundle Specification

**Status**: Design
**Version**: 1 (bundle format version)
**Last updated**: 2026-04-01

## 1. Overview

An export bundle packages a contiguous range of chain records into a portable, encrypted
file suitable for transfer across an air gap. The bundle format is designed so that:

- **Auditors** can verify chain integrity without decrypting content
- **Recipients** with the correct key can decrypt and read attestation records
- **Anyone** can detect tampering via Merkle root and signature verification
- **Steganographic embedding** is optional — bundles can be hidden in JPEG images via DCT

The format follows the pattern established by `keystore/export.py` (SOOBNDL): magic bytes,
version, structured binary payload.

## 2. Binary Layout

```
Offset    Size   Field
───────   ─────  ──────────────────────────────────────
0         14     magic: b"FIELDWITNESSX1"
14        1      version: uint8 (1)
15        4      summary_len: uint32 BE
19        var    chain_summary: CBOR (see §3)
var       4      recipients_len: uint32 BE
var       var    recipients: CBOR array (see §4)
var       12     nonce: AES-256-GCM nonce
var       var    ciphertext: AES-256-GCM(zstd(CBOR(records)))
last 16   16     tag: AES-256-GCM authentication tag
```

All multi-byte integers are big-endian. The magic `b"FIELDWITNESSX1"` is 14 bytes long,
so the fixed prefix (magic plus version) is 15 bytes. The total bundle size is:
`15 + 4 + summary_len + 4 + recipients_len + 12 + ciphertext_len + 16`

### Parsing Without Decryption

To audit a bundle without decryption, read:
1. Magic (14 bytes) — verify `b"FIELDWITNESSX1"`
2. Version (1 byte) — verify `1`
3. Summary length (4 bytes BE) — read the next N bytes as CBOR
4. Chain summary — verify signature, inspect metadata

The encrypted payload and recipient list can be skipped for audit purposes.

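The layout above can be walked with nothing more than `struct`. The following is a minimal sketch, not the project's parser: `split_bundle` and its return shape are hypothetical names chosen for illustration, and the magic length is taken from the literal itself rather than hard-coded. The CBOR fields are returned still encoded.

```python
import struct

MAGIC = b"FIELDWITNESSX1"    # literal from the layout table
NONCE_LEN, TAG_LEN = 12, 16  # AES-256-GCM nonce and tag sizes


def split_bundle(data: bytes) -> dict:
    """Slice a bundle into its fields without decrypting anything."""
    if not data.startswith(MAGIC):
        raise ValueError("bad magic")
    pos = len(MAGIC)

    def take(n: int) -> bytes:
        # Read n bytes at the cursor, refusing to run past the end.
        nonlocal pos
        if pos + n > len(data):
            raise ValueError("truncated bundle")
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    version = take(1)[0]
    summary = take(struct.unpack(">I", take(4))[0])     # uint32 BE length prefix
    recipients = take(struct.unpack(">I", take(4))[0])  # uint32 BE length prefix
    nonce = take(NONCE_LEN)
    if len(data) - pos < TAG_LEN:
        raise ValueError("truncated bundle")
    return {
        "version": version,
        "summary": summary,        # CBOR bytes, not yet decoded
        "recipients": recipients,  # CBOR bytes, not yet decoded
        "nonce": nonce,
        "ciphertext": data[pos:len(data) - TAG_LEN],
        "tag": data[len(data) - TAG_LEN:],
    }
```

An auditor only needs `version` and `summary` from the result; the remaining fields matter to recipients.
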
## 3. Chain Summary

The chain summary sits **outside** the encryption envelope. It provides verifiable metadata
about the bundle contents without revealing the actual attestation data.

CBOR map with integer keys:

| CBOR Key | Field | Type | Description |
|---|---|---|---|
| 0 | `bundle_id` | byte string (16) | UUID v7, unique bundle identifier |
| 1 | `chain_id` | byte string (32) | SHA-256(genesis record) — identifies source chain |
| 2 | `range_start` | unsigned int | First record index (inclusive) |
| 3 | `range_end` | unsigned int | Last record index (inclusive) |
| 4 | `record_count` | unsigned int | Number of records in bundle |
| 5 | `first_hash` | byte string (32) | `compute_record_hash(first_record)` |
| 6 | `last_hash` | byte string (32) | `compute_record_hash(last_record)` |
| 7 | `merkle_root` | byte string (32) | Root of Merkle tree over record hashes (see §5) |
| 8 | `created_ts` | integer | Bundle creation timestamp (Unix µs) |
| 9 | `signer_pubkey` | byte string (32) | Ed25519 public key of bundle creator |
| 10 | `bundle_sig` | byte string (64) | Ed25519 signature (see §3.1) |

### 3.1 Signature Computation

The signature covers all summary fields except `bundle_sig` itself:

```
summary_bytes = cbor2.dumps({
    0: bundle_id,
    1: chain_id,
    2: range_start,
    3: range_end,
    4: record_count,
    5: first_hash,
    6: last_hash,
    7: merkle_root,
    8: created_ts,
    9: signer_pubkey,
}, canonical=True)

bundle_sig = Ed25519_Sign(private_key, summary_bytes)
```

### 3.2 Verification Without Decryption

An auditor verifies a bundle by:
1. Parse chain summary and rebuild `summary_bytes` from fields 0 through 9 using the
   canonical CBOR encoding of §3.1
2. `Ed25519_Verify(signer_pubkey, bundle_sig, summary_bytes)` — authentic summary
3. `record_count == range_end - range_start + 1` — count matches range
4. If previous bundles from the same `chain_id` exist, verify `first_hash` matches
   the expected continuation

The auditor now knows: "A chain with ID X contains records [start, end], the creator
signed this claim, and the Merkle root commits to specific record contents." All without
decrypting.

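The structural part of this audit (everything except the Ed25519 check) can be sketched with plain Python. This assumes the summary has already been decoded into a dict with the integer keys of §3; `check_summary_shape` is a hypothetical helper, not part of the codebase, and signature verification still has to be done separately with an Ed25519 library.

```python
from typing import Dict, List

# Expected byte-string lengths per CBOR key, taken from the table in section 3.
EXPECTED_LENGTHS = {0: 16, 1: 32, 5: 32, 6: 32, 7: 32, 9: 32, 10: 64}


def check_summary_shape(summary: Dict[int, object]) -> List[str]:
    """Return a list of problems found; an empty list means the shape is fine."""
    problems = [f"missing field {key}" for key in range(11) if key not in summary]
    if problems:
        return problems
    for key, length in EXPECTED_LENGTHS.items():
        if len(summary[key]) != length:
            problems.append(f"field {key} must be {length} bytes")
    range_start, range_end, record_count = summary[2], summary[3], summary[4]
    if range_end < range_start:
        problems.append("range_end precedes range_start")
    elif record_count != range_end - range_start + 1:
        problems.append("record_count does not match range")
    return problems
```

Returning a list rather than raising on the first problem lets an audit tool report every defect in one pass.
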
## 4. Envelope Encryption

### 4.1 Key Derivation

Ed25519 signing keys are converted to X25519 Diffie-Hellman keys for encryption:

```
x25519_private = Ed25519_to_X25519_Private(ed25519_private_key)
x25519_public  = Ed25519_to_X25519_Public(ed25519_public_key_bytes)
```

This uses the birational map between the twisted Edwards form (Ed25519) and the Montgomery
form (X25519) of Curve25519. The `cryptography` library does not appear to expose this
conversion as a public API; libsodium does (`crypto_sign_ed25519_sk_to_curve25519` and
`crypto_sign_ed25519_pk_to_curve25519`, available in Python through PyNaCl).

### 4.2 DEK Generation

A random 32-byte data encryption key (DEK) is generated per bundle:

```
dek = os.urandom(32)  # AES-256 key
```

### 4.3 DEK Wrapping (Per Recipient)

For each recipient, the DEK is wrapped using X25519 ECDH + HKDF + AES-256-GCM:

```
1. shared_secret = X25519_ECDH(sender_x25519_private, recipient_x25519_public)
2. derived_key = HKDF-SHA256(
       ikm=shared_secret,
       salt=bundle_id,    # binds to this specific bundle
       info=b"fieldwitness-dek-wrap-v1",
       length=32
   )
3. wrapped_dek = AES-256-GCM_Encrypt(
       key=derived_key,
       nonce=os.urandom(12),
       plaintext=dek,
       aad=bundle_id      # additional authenticated data
   )
```

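Step 2 can be illustrated with the standard library alone. The sketch below implements HKDF-SHA256 as defined in RFC 5869 on top of `hmac`; it is meant to show how `bundle_id` and the `info` label enter the derivation, not to replace a vetted HKDF implementation. `hkdf_sha256` and `derive_wrap_key` are hypothetical helper names.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                            # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_wrap_key(shared_secret: bytes, bundle_id: bytes) -> bytes:
    """Derive the 32-byte key that wraps the DEK for one recipient."""
    return hkdf_sha256(
        ikm=shared_secret,
        salt=bundle_id,  # binds the wrap key to this specific bundle
        info=b"fieldwitness-dek-wrap-v1",
        length=32,
    )
```

Because `bundle_id` is the salt, the same sender and recipient pair derives a different wrap key for every bundle.
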
### 4.4 Recipients Array

CBOR array of recipient entries:

```cbor
[
  {
    0: recipient_pubkey,  # byte string (32) — Ed25519 public key
    1: wrap_nonce,        # byte string (12) — AES-GCM nonce for DEK wrap
    2: wrapped_dek,       # byte string (48) — encrypted DEK (32) + GCM tag (16)
  },
  ...
]
```

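Once the recipients array has been CBOR-decoded, locating one's own entry is a linear scan. A minimal sketch, assuming the array decodes to a list of dicts with the integer keys shown above; `find_recipient_entry` is a hypothetical name, and the size checks mirror the lengths given in the comments.

```python
from typing import Dict, List, Optional, Tuple


def find_recipient_entry(
    recipients: List[Dict[int, bytes]], my_pubkey: bytes
) -> Optional[Tuple[bytes, bytes]]:
    """Return (wrap_nonce, wrapped_dek) for my_pubkey, or None if absent."""
    for entry in recipients:
        pubkey, wrap_nonce, wrapped_dek = entry[0], entry[1], entry[2]
        # Sizes from the spec: 32-byte key, 12-byte nonce, 32-byte DEK + 16-byte tag.
        if (len(pubkey), len(wrap_nonce), len(wrapped_dek)) != (32, 12, 48):
            raise ValueError("malformed recipient entry")
        if pubkey == my_pubkey:
            return wrap_nonce, wrapped_dek
    return None
```

A `None` result corresponds to the "no matching recipient" case in the error table.
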
### 4.5 Payload Encryption

```
1. records_cbor = cbor2.dumps([serialize_record(r) for r in records], canonical=True)
2. compressed = zstd.compress(records_cbor, level=3)
3. nonce = os.urandom(12)
4. ciphertext, tag = AES-256-GCM_Encrypt(
       key=dek,
       nonce=nonce,
       plaintext=compressed,
       aad=summary_bytes    # binds ciphertext to this summary
   )
```

The `summary_bytes` (the same bytes that are signed in §3.1) are used as additional
authenticated data (AAD). This cryptographically binds the encrypted payload to the chain
summary: changing any signed summary field causes GCM authentication, and therefore
decryption, to fail.

### 4.6 Decryption

A recipient decrypts a bundle:

```
1. Parse chain summary, verify bundle_sig
2. Find own pubkey in recipients array
3. shared_secret = X25519_ECDH(recipient_x25519_private, sender_x25519_public)
   (sender_x25519_public derived from summary.signer_pubkey)
4. derived_key = HKDF-SHA256(shared_secret, salt=bundle_id, info=b"fieldwitness-dek-wrap-v1", length=32)
5. dek = AES-256-GCM_Decrypt(derived_key, wrap_nonce, wrapped_dek, aad=bundle_id)
6. compressed = AES-256-GCM_Decrypt(dek, nonce, ciphertext, aad=summary_bytes)
7. records_cbor = zstd.decompress(compressed)
8. records = [deserialize_record(r) for r in cbor2.loads(records_cbor)]
9. Verify each record's signature and chain linkage
```

## 5. Merkle Tree

The Merkle tree provides compact proofs that specific records are included in a bundle.

### 5.1 Construction

Leaves are the record hashes in chain order:

```
leaf[i] = compute_record_hash(records[i])
```

Internal nodes:

```
node = SHA-256(left_child || right_child)
```

Whenever a level has an odd number of nodes, the unpaired last node is promoted unchanged
to the next level. This happens at some level whenever the number of leaves is not a
power of 2.

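The construction can be sketched in a few lines of `hashlib`. This is an illustrative implementation of the rule described above (pairwise SHA-256, unpaired last node promoted unchanged); `merkle_root` is a hypothetical function name, and the leaves are assumed to be the 32-byte record hashes already computed by `compute_record_hash`.

```python
import hashlib
from typing import List


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute the Merkle root, promoting an unpaired last node unchanged."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        # Hash adjacent pairs: (0,1), (2,3), ...
        nxt = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd node out moves up without rehashing
        level = nxt
    return level[0]
```

With three leaves `a`, `b`, `c`, the root is `SHA-256(SHA-256(a || b) || c)`; with a single leaf, the root is the leaf itself.
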
### 5.2 Inclusion Proof

An inclusion proof for record at index `i` is a list of `(sibling_hash, direction)` pairs
from the leaf to the root. Verification:

```
current = leaf[i]
for (sibling, direction) in proof:
    if direction == "L":
        current = SHA-256(sibling || current)
    else:
        current = SHA-256(current || sibling)
assert current == merkle_root
```

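A proof generator that matches this verification loop might look as follows. It is a sketch under the promotion rule of §5.1: a node that is promoted has no sibling at that level, so nothing is appended to the proof for it. `build_proof` and `verify_proof` are hypothetical names, and `"L"` means the sibling sits to the left of the current node.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, str]]


def build_proof(leaves: List[bytes], index: int) -> Tuple[Proof, bytes]:
    """Return (proof, root) for the leaf at `index`."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    proof: Proof = []
    level = list(leaves)
    while len(level) > 1:
        sibling = index ^ 1  # partner within the pair
        if sibling < len(level):
            proof.append((level[sibling], "L" if sibling < index else "R"))
        # Otherwise this node is the promoted odd one out: no sibling to record.
        nxt = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        index //= 2
    return proof, level[0]


def verify_proof(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Recompute the root from a leaf and its proof, as in the loop above."""
    current = leaf
    for sibling, direction in proof:
        if direction == "L":
            current = hashlib.sha256(sibling + current).digest()
        else:
            current = hashlib.sha256(current + sibling).digest()
    return current == root
```

The proof length is at most the tree height, so a verifier needs only a handful of hashes rather than the whole bundle.
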
### 5.3 Usage

- **Export bundles**: `merkle_root` in chain summary commits to exact record contents
- **Federation servers**: Build a separate Merkle tree over bundle hashes (see federation-protocol.md)

These are two different trees:
1. **Record tree** (this section) — leaves are record hashes within a bundle
2. **Bundle tree** (federation) — leaves are bundle hashes across the federation log

## 6. Steganographic Embedding

Bundles can optionally be embedded in JPEG images using stego's DCT steganography:

```
1. bundle_bytes = create_export_bundle(chain, start, end, private_key, recipients)
2. stego_image = stego.encode(
       carrier=carrier_image,
       reference=reference_image,
       file_data=bundle_bytes,
       passphrase=passphrase,
       embed_mode="dct",
       channel_key=channel_key  # optional
   )
```

Extraction:
```
1. result = stego.decode(
       carrier=stego_image,
       reference=reference_image,
       passphrase=passphrase,
       channel_key=channel_key
   )
2. bundle_bytes = result.file_data
3. assert bundle_bytes.startswith(b"FIELDWITNESSX1")
```

### 6.1 Capacity Considerations

DCT steganography has limited capacity relative to the carrier image size. Approximate
capacities:

| Carrier Size | Approximate DCT Capacity | Records (est.) |
|---|---|---|
| 1 MP (1024x1024) | ~10 KB | ~20-40 records |
| 4 MP (2048x2048) | ~40 KB | ~80-160 records |
| 12 MP (4000x3000) | ~100 KB | ~200-400 records |

Record size varies (~200-500 bytes each after CBOR serialization, before compression).
Zstd compression typically achieves 2-4x ratio on CBOR attestation data. Use
`check_capacity()` before embedding.

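The figures above allow a quick back-of-the-envelope check before picking a carrier. The sketch below is only arithmetic over the stated ranges (200-500 bytes per record, 2-4x compression); it is not `check_capacity()`, whose signature is not given here, and `estimate_payload_bytes` and `fits` are hypothetical helpers. Header, summary, and recipient overhead is excluded unless passed as `overhead_bytes`.

```python
from typing import Tuple


def estimate_payload_bytes(record_count: int, overhead_bytes: int = 0) -> Tuple[int, int]:
    """Return (best_case, worst_case) compressed payload size in bytes."""
    best = record_count * 200 // 4 + overhead_bytes   # small records, 4x compression
    worst = record_count * 500 // 2 + overhead_bytes  # large records, 2x compression
    return best, worst


def fits(record_count: int, capacity_bytes: int, overhead_bytes: int = 0) -> bool:
    """Conservative check: does the worst-case estimate fit in the carrier?"""
    _, worst = estimate_payload_bytes(record_count, overhead_bytes)
    return worst <= capacity_bytes
```

For example, 40 records give a worst case of 10,000 bytes, which just fits a carrier with roughly 10 KB of capacity; this lines up with the upper record estimate in the first table row.
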
### 6.2 Multiple Images

For large export ranges, split across multiple bundles embedded in multiple carrier images.
Each bundle is self-contained with its own chain summary. The receiving side imports them
in any order — the chain indices and hashes enable reassembly.

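Reassembly from out-of-order bundles can be sketched using only the summary fields. The example assumes each summary is a decoded dict with the integer keys of §3 (`2` for `range_start`, `3` for `range_end`) and that all summaries share one `chain_id`; `order_bundles` is a hypothetical helper that also reports which record indices are still missing.

```python
from typing import Dict, List, Tuple


def order_bundles(
    summaries: List[Dict[int, object]]
) -> Tuple[List[Dict[int, object]], List[Tuple[int, int]]]:
    """Sort summaries by range_start and list any gaps as (first, last) indices."""
    ordered = sorted(summaries, key=lambda s: s[2])
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[2] > prev[3] + 1:  # next bundle starts after a hole
            gaps.append((prev[3] + 1, cur[2] - 1))
    return ordered, gaps
```

An empty gap list means the imported bundles cover one contiguous range; hash linkage between neighbouring bundles still has to be checked after decryption.
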
## 7. Recipient Management

### 7.1 Adding Recipients

Recipients are identified by their Ed25519 public keys. To encrypt a bundle for a
recipient, the creator needs only their public key (no shared secret setup required).

### 7.2 Recipient Discovery

Recipients' Ed25519 public keys can be obtained via:
- Direct exchange (QR code, USB transfer, verbal fingerprint verification)
- Federation server identity registry (when available)
- Attest's existing `peers.json` file

### 7.3 Self-Encryption

The bundle creator should always include their own public key in the recipients list.
This allows them to decrypt their own exports (e.g., when restoring from backup).

## 8. Error Handling

| Error | Cause | Response |
|---|---|---|
| Bad magic | Not a FIELDWITNESSX1 bundle | Reject with `ExportError("not a FieldWitness export bundle")` |
| Bad version | Unsupported format version | Reject with `ExportError("unsupported bundle version")` |
| Signature invalid | Tampered summary or wrong signer | Reject with `ExportError("bundle signature verification failed")` |
| No matching recipient | Decryptor's key not in recipients list | Reject with `ExportError("not an authorized recipient")` |
| GCM auth failure | Tampered ciphertext or wrong key | Reject with `ExportError("decryption failed — bundle may be corrupted")` |
| Decompression failure | Corrupted compressed data | Reject with `ExportError("decompression failed")` |
| Chain integrity failure | Records don't link correctly | Reject with `ChainIntegrityError(...)` after decryption |

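The first two rows of the table could be enforced along these lines. This is a sketch only: the exception classes are declared locally as plain `Exception` subclasses because their real base class and module are not specified here, and `check_header` is a hypothetical function name. The messages are copied from the table.

```python
class ExportError(Exception):
    """Stand-in for the project's export error (real base class not shown here)."""


class ChainIntegrityError(Exception):
    """Stand-in for the error raised when records do not link correctly."""


MAGIC = b"FIELDWITNESSX1"
SUPPORTED_VERSIONS = {1}


def check_header(data: bytes) -> int:
    """Validate magic and version; return the version on success."""
    if not data.startswith(MAGIC):
        raise ExportError("not a FieldWitness export bundle")
    # The version byte follows the magic directly.
    if len(data) <= len(MAGIC) or data[len(MAGIC)] not in SUPPORTED_VERSIONS:
        raise ExportError("unsupported bundle version")
    return data[len(MAGIC)]
```

Checking the header before touching any length fields means malformed input is rejected before any variable-length read is attempted.
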