fieldwitness/docs/architecture/export-bundle.md
Aaron D. Lee 6325e86873
2026-04-01 23:31:47 -04:00


# Export Bundle Specification
**Status**: Design
**Version**: 1 (bundle format version)
**Last updated**: 2026-04-01
## 1. Overview
An export bundle packages a contiguous range of chain records into a portable, encrypted
file suitable for transfer across an air gap. The bundle format is designed so that:
- **Auditors** can verify chain integrity without decrypting content
- **Recipients** with the correct key can decrypt and read attestation records
- **Anyone** can detect tampering via Merkle root and signature verification
- **Steganographic embedding** is optional — bundles can be hidden in JPEG images via DCT
The format follows the pattern established by `keystore/export.py` (SOOBNDL): magic bytes,
version, structured binary payload.
## 2. Binary Layout
```
Offset   Size     Field
──────   ───────  ──────────────────────────────────────
0        8        magic: b"SOOSEFX1"
8        1        version: uint8 (1)
9        4        summary_len: uint32 BE
13       var      chain_summary: CBOR (see §3)
var      4        recipients_len: uint32 BE
var      var      recipients: CBOR array (see §4)
var      12       nonce: AES-256-GCM nonce
var      var      ciphertext: AES-256-GCM(zstd(CBOR(records)))
last 16  16       tag: AES-256-GCM authentication tag
```
All multi-byte integers are big-endian. The total bundle size is:
`9 + 4 + summary_len + 4 + recipients_len + 12 + ciphertext_len + 16`
### Parsing Without Decryption
To audit a bundle without decryption, read:
1. Magic (8 bytes) — verify `b"SOOSEFX1"`
2. Version (1 byte) — verify `1`
3. Summary length (4 bytes BE) — read the next N bytes as CBOR
4. Chain summary — verify signature, inspect metadata
The encrypted payload and recipient list can be skipped for audit purposes.
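
The audit-side parsing steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's parser: the function name `split_bundle`, the returned dictionary keys, and the use of `ValueError` are hypothetical. The CBOR sections are returned as raw bytes for a CBOR decoder to handle.

```python
import struct

MAGIC = b"SOOSEFX1"
VERSION = 1
NONCE_LEN = 12
TAG_LEN = 16


def split_bundle(data: bytes) -> dict:
    """Split a bundle into its raw sections without decrypting anything."""
    if len(data) < 13 or data[:8] != MAGIC:
        raise ValueError("not a SooSeF export bundle")
    if data[8] != VERSION:
        raise ValueError("unsupported bundle version")

    # summary_len is a big-endian uint32 at offset 9; the summary follows it.
    (summary_len,) = struct.unpack_from(">I", data, 9)
    summary_end = 13 + summary_len
    if len(data) < summary_end + 4:
        raise ValueError("truncated bundle")

    # recipients_len sits directly after the chain summary.
    (recipients_len,) = struct.unpack_from(">I", data, summary_end)
    recipients_start = summary_end + 4
    nonce_start = recipients_start + recipients_len
    ciphertext_start = nonce_start + NONCE_LEN
    tag_start = len(data) - TAG_LEN  # the tag is always the last 16 bytes
    if tag_start < ciphertext_start:
        raise ValueError("truncated bundle")

    return {
        "summary_cbor": data[13:summary_end],
        "recipients_cbor": data[recipients_start:nonce_start],
        "nonce": data[nonce_start:ciphertext_start],
        "ciphertext": data[ciphertext_start:tag_start],
        "tag": data[tag_start:],
    }
```

An auditor only needs `summary_cbor` from the result; the remaining sections are returned so the same helper can serve the decryption path.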
## 3. Chain Summary
The chain summary sits **outside** the encryption envelope. It provides verifiable metadata
about the bundle contents without revealing the actual attestation data.
CBOR map with integer keys:
| CBOR Key | Field | Type | Description |
|---|---|---|---|
| 0 | `bundle_id` | byte string (16) | UUID v7, unique bundle identifier |
| 1 | `chain_id` | byte string (32) | SHA-256(genesis record) — identifies source chain |
| 2 | `range_start` | unsigned int | First record index (inclusive) |
| 3 | `range_end` | unsigned int | Last record index (inclusive) |
| 4 | `record_count` | unsigned int | Number of records in bundle |
| 5 | `first_hash` | byte string (32) | `compute_record_hash(first_record)` |
| 6 | `last_hash` | byte string (32) | `compute_record_hash(last_record)` |
| 7 | `merkle_root` | byte string (32) | Root of Merkle tree over record hashes (see §5) |
| 8 | `created_ts` | integer | Bundle creation timestamp (Unix µs) |
| 9 | `signer_pubkey` | byte string (32) | Ed25519 public key of bundle creator |
| 10 | `bundle_sig` | byte string (64) | Ed25519 signature (see §3.1) |
### 3.1 Signature Computation
The signature covers all summary fields except `bundle_sig` itself:
```
summary_bytes = cbor2.dumps({
    0: bundle_id,
    1: chain_id,
    2: range_start,
    3: range_end,
    4: record_count,
    5: first_hash,
    6: last_hash,
    7: merkle_root,
    8: created_ts,
    9: signer_pubkey,
}, canonical=True)
bundle_sig = Ed25519_Sign(private_key, summary_bytes)
```
### 3.2 Verification Without Decryption
An auditor verifies a bundle as follows:
1. Parse the chain summary.
2. Rebuild `summary_bytes` from fields 0-9 (canonical CBOR, as in §3.1) and check
   `Ed25519_Verify(signer_pubkey, bundle_sig, summary_bytes)` to confirm the summary is authentic.
3. Check `record_count == range_end - range_start + 1` so the count matches the range.
4. If previous bundles from the same `chain_id` exist, verify that `first_hash` matches
   the expected continuation.
The auditor now knows: "A chain with ID X contains records [start, end], the creator
signed this claim, and the Merkle root commits to specific record contents." All without
decrypting.
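
The structural part of this audit can be sketched as below. Everything here is an assumption made for illustration: the function name `audit_summary`, the problem strings, and in particular the continuity rule, which only compares hashes when the previous bundle ends on the same record index where this one starts. Signature verification is left out because it needs an Ed25519 library.

```python
from typing import List, Optional

# Expected byte-string lengths from the table in section 3, keyed by CBOR key.
BYTE_LENGTHS = {0: 16, 1: 32, 5: 32, 6: 32, 7: 32, 9: 32, 10: 64}


def audit_summary(summary: dict, previous: Optional[dict] = None) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for key, length in BYTE_LENGTHS.items():
        value = summary.get(key)
        if not isinstance(value, bytes) or len(value) != length:
            problems.append(f"field {key}: expected a {length}-byte string")

    start, end, count = summary.get(2), summary.get(3), summary.get(4)
    if not all(isinstance(v, int) for v in (start, end, count)):
        problems.append("range fields must be integers")
    elif count != end - start + 1:
        problems.append("record_count does not match range")

    # Hypothetical continuity rule: the same index on the same chain must
    # always carry the same record hash.
    if previous is not None and previous.get(1) == summary.get(1):
        if previous.get(3) == start and previous.get(6) != summary.get(5):
            problems.append("first_hash disagrees with the previous bundle's last_hash")
    return problems
```

Returning a list rather than raising on the first failure lets an auditor report every inconsistency in one pass.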
## 4. Envelope Encryption
### 4.1 Key Derivation
Ed25519 signing keys are converted to X25519 Diffie-Hellman keys for encryption:
```
x25519_private = Ed25519_to_X25519_Private(ed25519_private_key)
x25519_public = Ed25519_to_X25519_Public(ed25519_public_key_bytes)
```
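This uses the birational map between the Ed25519 (twisted Edwards) and Curve25519 (Montgomery)
curve forms. libsodium exposes the conversion as `crypto_sign_ed25519_pk_to_curve25519` and
`crypto_sign_ed25519_sk_to_curve25519` (available in Python through PyNaCl). The `cryptography`
library does not appear to offer this conversion in its public API, so confirm which library
performs this step before implementation.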
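
For reference, the arithmetic behind both conversions fits in a few lines. This is a sketch of the math only, with hypothetical function names; production code should call a vetted library rather than this. The private side hashes the 32-byte Ed25519 seed with SHA-512 and clamps the first half; the public side maps the Edwards y-coordinate to the Montgomery u-coordinate via u = (1 + y) / (1 - y) mod p.

```python
import hashlib

P = 2**255 - 19  # field prime shared by Ed25519 and Curve25519


def ed25519_seed_to_x25519_private(seed: bytes) -> bytes:
    """Derive the X25519 scalar from a 32-byte Ed25519 seed."""
    if len(seed) != 32:
        raise ValueError("Ed25519 seed must be 32 bytes")
    h = bytearray(hashlib.sha512(seed).digest()[:32])
    h[0] &= 248    # clear the low 3 bits
    h[31] &= 127   # clear the top bit
    h[31] |= 64    # set the second-highest bit
    return bytes(h)


def ed25519_public_to_x25519_public(pk: bytes) -> bytes:
    """Map an encoded Ed25519 public key to its X25519 public key."""
    if len(pk) != 32:
        raise ValueError("Ed25519 public key must be 32 bytes")
    # The encoding is y in little-endian with the sign of x in the top bit.
    y = int.from_bytes(pk, "little") & ((1 << 255) - 1)
    if y >= P or y == 1:
        raise ValueError("invalid Ed25519 public key")
    # Modular inverse via Fermat's little theorem (P is prime).
    u = (1 + y) * pow((1 - y) % P, P - 2, P) % P
    return u.to_bytes(32, "little")
```

The Ed25519 base point (y = 4/5) maps to u = 9, the Curve25519 base point, which makes a convenient sanity check.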
### 4.2 DEK Generation
A random 32-byte data encryption key (DEK) is generated per bundle:
```
dek = os.urandom(32) # AES-256 key
```
### 4.3 DEK Wrapping (Per Recipient)
For each recipient, the DEK is wrapped using X25519 ECDH + HKDF + AES-256-GCM:
```
1. shared_secret = X25519_ECDH(sender_x25519_private, recipient_x25519_public)
2. derived_key = HKDF-SHA256(
       ikm=shared_secret,
       salt=bundle_id,            # binds to this specific bundle
       info=b"soosef-dek-wrap-v1",
       length=32
   )
3. wrapped_dek = AES-256-GCM_Encrypt(
       key=derived_key,
       nonce=os.urandom(12),
       plaintext=dek,
       aad=bundle_id              # additional authenticated data
   )
```
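
The key-derivation step (step 2) can be reproduced with the standard library, since HKDF is defined in terms of HMAC (RFC 5869). The sketch below is illustrative: `hkdf_sha256` and `derive_wrap_key` are hypothetical names, and the ECDH and AES-GCM steps are omitted because they need a cryptography library.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract: an empty salt defaults to one hash-length block of zeros.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, each tagged with a one-byte counter.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_wrap_key(shared_secret: bytes, bundle_id: bytes) -> bytes:
    """Derive the 32-byte DEK-wrapping key using the parameters from step 2."""
    return hkdf_sha256(shared_secret, salt=bundle_id,
                       info=b"soosef-dek-wrap-v1", length=32)
```

Because `bundle_id` is the salt, the same sender and recipient pair derives a different wrapping key for every bundle.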
### 4.4 Recipients Array
CBOR array of recipient entries:
```cbor
[
    {
        0: recipient_pubkey,    # byte string (32): Ed25519 public key
        1: wrap_nonce,          # byte string (12): AES-GCM nonce for DEK wrap
        2: wrapped_dek,         # byte string (48): encrypted DEK (32) + GCM tag (16)
    },
    ...
]
```
### 4.5 Payload Encryption
```
1. records_cbor = cbor2.dumps([serialize_record(r) for r in records], canonical=True)
2. compressed = zstd.compress(records_cbor, level=3)
3. nonce = os.urandom(12)
4. ciphertext, tag = AES-256-GCM_Encrypt(
       key=dek,
       nonce=nonce,
       plaintext=compressed,
       aad=summary_bytes          # binds ciphertext to this summary
   )
```
The `summary_bytes` (the same bytes that are signed in §3.1) are used as additional
authenticated data (AAD). This cryptographically binds the encrypted payload to the chain
summary: if any signed summary field is modified, GCM authentication fails and the payload
cannot be decrypted.
### 4.6 Decryption
A recipient decrypts a bundle:
```
1. Parse chain summary, verify bundle_sig
2. Find own pubkey in recipients array
3. shared_secret = X25519_ECDH(recipient_x25519_private, sender_x25519_public)
   (sender_x25519_public derived from summary.signer_pubkey)
4. derived_key = HKDF-SHA256(shared_secret, salt=bundle_id, info=b"soosef-dek-wrap-v1", length=32)
5. dek = AES-256-GCM_Decrypt(derived_key, wrap_nonce, wrapped_dek, aad=bundle_id)
6. compressed = AES-256-GCM_Decrypt(dek, nonce, ciphertext, aad=summary_bytes)
7. records_cbor = zstd.decompress(compressed)
8. records = [deserialize_record(r) for r in cbor2.loads(records_cbor)]
9. Verify each record's signature and chain linkage
## 5. Merkle Tree
The Merkle tree provides compact proofs that specific records are included in a bundle.
### 5.1 Construction
Leaves are the record hashes in chain order:
```
leaf[i] = compute_record_hash(records[i])
```
Internal nodes:
```
node = SHA-256(left_child || right_child)
```
If a level has an odd number of nodes (which happens whenever the number of leaves is not a
power of 2), the last node is promoted unchanged to the next level.
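
A minimal sketch of this construction, assuming the promotion rule applies at every level with an odd node count and that the leaves are already 32-byte record hashes. The function name `merkle_root` is hypothetical.

```python
import hashlib
from typing import List


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute the Merkle root over record hashes given in chain order."""
    if not leaves:
        raise ValueError("cannot build a Merkle tree with no leaves")
    level = list(leaves)
    while len(level) > 1:
        # Hash adjacent pairs: (0, 1), (2, 3), ...
        next_level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd node out: promote unchanged
        level = next_level
    return level[0]
```

With a single record the root equals that record's hash, and with three records the root is `SHA-256(SHA-256(leaf0 || leaf1) || leaf2)`.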
### 5.2 Inclusion Proof
An inclusion proof for record at index `i` is a list of `(sibling_hash, direction)` pairs
from the leaf to the root. Verification:
```
current = leaf[i]
for (sibling, direction) in proof:
    if direction == "L":
        current = SHA-256(sibling || current)
    else:
        current = SHA-256(current || sibling)
assert current == merkle_root
```
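
The verification loop above, together with one way to generate matching proofs, can be sketched as follows. The helper names are hypothetical, and the sketch assumes the promotion rule from §5.1: a promoted node has no sibling at that level, so it contributes no proof entry.

```python
import hashlib
from typing import List, Tuple


def _hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, from the leaves up to the single root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [_hash_pair(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2 == 1:
            nxt.append(cur[-1])  # odd node out: promote unchanged
        levels.append(nxt)
    return levels


def inclusion_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, str]]:
    """Collect (sibling_hash, direction) pairs from leaf `index` up to the root."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1  # partner within the pair
        if sibling < len(level):
            proof.append((level[sibling], "L" if sibling < index else "R"))
        index //= 2  # position of the parent (or promoted node) one level up
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[bytes, str]], root: bytes) -> bool:
    current = leaf
    for sibling, direction in proof:
        if direction == "L":
            current = _hash_pair(sibling, current)
        else:
            current = _hash_pair(current, sibling)
    return current == root
```

The direction flag records which side the sibling sits on, so a verifier needs only the leaf, the proof, and the `merkle_root` from the chain summary.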
### 5.3 Usage
- **Export bundles**: `merkle_root` in chain summary commits to exact record contents
- **Federation servers**: Build a separate Merkle tree over bundle hashes (see federation-protocol.md)
These are two different trees:
1. **Record tree** (this section) — leaves are record hashes within a bundle
2. **Bundle tree** (federation) — leaves are bundle hashes across the federation log
## 6. Steganographic Embedding
Bundles can optionally be embedded in JPEG images using stegasoo's DCT steganography:
```
1. bundle_bytes = create_export_bundle(chain, start, end, private_key, recipients)
2. stego_image = stegasoo.encode(
       carrier=carrier_image,
       reference=reference_image,
       file_data=bundle_bytes,
       passphrase=passphrase,
       embed_mode="dct",
       channel_key=channel_key    # optional
   )
```
Extraction:
```
1. result = stegasoo.decode(
       carrier=stego_image,
       reference=reference_image,
       passphrase=passphrase,
       channel_key=channel_key
   )
2. bundle_bytes = result.file_data
3. assert bundle_bytes[:8] == b"SOOSEFX1"
```
### 6.1 Capacity Considerations
DCT steganography has limited capacity relative to the carrier image size. Approximate
capacities:
| Carrier Size | Approximate DCT Capacity | Records (est.) |
|---|---|---|
| 1 MP (1024x1024) | ~10 KB | ~20-40 records |
| 4 MP (2048x2048) | ~40 KB | ~80-160 records |
| 12 MP (4000x3000) | ~100 KB | ~200-400 records |
Record size varies (~200-500 bytes each after CBOR serialization, before compression).
Zstd compression typically achieves 2-4x ratio on CBOR attestation data. Use
`check_capacity()` before embedding.
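
A rough planning estimate can be computed from these figures. The sketch below is a back-of-the-envelope calculation, not a substitute for `check_capacity()`: the function name `estimate_max_records` is hypothetical, and the 500-byte fixed overhead used in the example (header, chain summary, one or two recipient entries, nonce, tag) is an assumed value.

```python
import math


def estimate_max_records(capacity_bytes: int, record_size: int,
                         compression_ratio: float, fixed_overhead: int) -> int:
    """Estimate how many records fit into a carrier of the given capacity."""
    payload_budget = capacity_bytes - fixed_overhead
    if payload_budget <= 0:
        return 0
    # Average on-the-wire size of one record after compression.
    compressed_per_record = record_size / compression_ratio
    return math.floor(payload_budget / compressed_per_record)


# Example: a ~10 KB carrier with an assumed 500 bytes of fixed overhead.
no_compression = estimate_max_records(10_000, 500, 1.0, 500)
with_compression = estimate_max_records(10_000, 500, 2.0, 500)
```

With no compression credit, 500-byte and 250-byte records give about 19 and 38 records respectively, which is close to the first table row. The table therefore looks conservative relative to the stated 2-4x compression ratio.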
### 6.2 Multiple Images
For large export ranges, split across multiple bundles embedded in multiple carrier images.
Each bundle is self-contained with its own chain summary. The receiving side imports them
in any order — the chain indices and hashes enable reassembly.
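
Splitting a range and restoring order on import can be sketched as below. Both helpers are hypothetical; the summaries are assumed to be dictionaries keyed by the CBOR integer keys from §3 (2 = `range_start`, 3 = `range_end`).

```python
from typing import List, Tuple


def split_range(start: int, end: int, max_records: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [start, end] into chunks of at most max_records."""
    if max_records < 1 or end < start:
        raise ValueError("invalid range or chunk size")
    return [
        (lo, min(lo + max_records - 1, end))
        for lo in range(start, end + 1, max_records)
    ]


def order_for_import(summaries: List[dict]) -> List[dict]:
    """Sort received bundle summaries by range_start so they can be applied in chain order."""
    return sorted(summaries, key=lambda s: s[2])


def find_gaps(summaries: List[dict]) -> List[Tuple[int, int]]:
    """Report index ranges not covered by any received bundle."""
    gaps = []
    ordered = order_for_import(summaries)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt[2] > prev[3] + 1:
            gaps.append((prev[3] + 1, nxt[2] - 1))
    return gaps
```

Each chunk is passed to the bundle creator as its own `(start, end)` pair, and the gap check tells the receiver whether any carrier image is still missing.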
## 7. Recipient Management
### 7.1 Adding Recipients
Recipients are identified by their Ed25519 public keys. To encrypt a bundle for a
recipient, the creator needs only their public key (no shared secret setup required).
### 7.2 Recipient Discovery
Recipients' Ed25519 public keys can be obtained via:
- Direct exchange (QR code, USB transfer, verbal fingerprint verification)
- Federation server identity registry (when available)
- Verisoo's existing `peers.json` file
### 7.3 Self-Encryption
The bundle creator should always include their own public key in the recipients list.
This allows them to decrypt their own exports (e.g., when restoring from backup).
## 8. Error Handling
| Error | Cause | Response |
|---|---|---|
| Bad magic | Not a SOOSEFX1 bundle | Reject with `ExportError("not a SooSeF export bundle")` |
| Bad version | Unsupported format version | Reject with `ExportError("unsupported bundle version")` |
| Signature invalid | Tampered summary or wrong signer | Reject with `ExportError("bundle signature verification failed")` |
| No matching recipient | Decryptor's key not in recipients list | Reject with `ExportError("not an authorized recipient")` |
| GCM auth failure | Tampered ciphertext or wrong key | Reject with `ExportError("decryption failed — bundle may be corrupted")` |
| Decompression failure | Corrupted compressed data | Reject with `ExportError("decompression failed")` |
| Chain integrity failure | Records don't link correctly | Reject with `ChainIntegrityError(...)` after decryption |
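
The first rows of this table could map to code along the following lines. The exception names and messages come from the table; everything else (the function names, the decoded recipient entries as dictionaries with integer keys, and `ChainIntegrityError` being independent of `ExportError`) is an assumption for illustration.

```python
from typing import List


class ExportError(Exception):
    """Raised when a bundle cannot be parsed, authenticated, or decrypted."""


class ChainIntegrityError(Exception):
    """Raised after decryption when records do not link correctly."""


def check_header(data: bytes) -> None:
    """Reject input that is not a version 1 SOOSEFX1 bundle."""
    if data[:8] != b"SOOSEFX1":
        raise ExportError("not a SooSeF export bundle")
    if len(data) < 9 or data[8] != 1:
        raise ExportError("unsupported bundle version")


def find_recipient_entry(recipients: List[dict], my_pubkey: bytes) -> dict:
    """Return the recipients entry for my_pubkey (key 0 is recipient_pubkey)."""
    for entry in recipients:
        if entry.get(0) == my_pubkey:
            return entry
    raise ExportError("not an authorized recipient")
```

Checking the header and recipient list before any cryptographic work keeps the cheap rejections ahead of the expensive ones.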