# Chain Format Specification

**Status**: Design
**Version**: 1 (record format version)
**Last updated**: 2026-04-01

## 1. Overview

The attestation chain is an append-only sequence of signed records stored locally on the
offline device. Each record includes a hash of the previous record, forming a tamper-evident
chain analogous to git commits or blockchain blocks.

The chain wraps existing Attest attestation records. An Attest record's serialized bytes
become the input to `content_hash`, preserving the original attestation while adding
ordering, entropy witnesses, and chain integrity guarantees.

## 2. AttestationChainRecord

### Field Definitions

| Field | CBOR Key | Type | Size | Description |
|---|---|---|---|---|
| `version` | 0 | unsigned int | 1 byte | Record format version. Currently `1`. |
| `record_id` | 1 | byte string | 16 bytes | UUID v7 (RFC 9562). Time-ordered unique identifier. |
| `chain_index` | 2 | unsigned int | 8 bytes max | Monotonically increasing, 0-based. Genesis record is index 0. |
| `prev_hash` | 3 | byte string | 32 bytes | SHA-256 of `canonical_bytes(previous_record)`. Genesis: `0x00 * 32`. |
| `content_hash` | 4 | byte string | 32 bytes | SHA-256 of the wrapped content (e.g., Attest record bytes). |
| `content_type` | 5 | text string | variable | MIME-like type identifier. `"attest/attestation-v1"` for Attest records. |
| `metadata` | 6 | CBOR map | variable | Extensible key-value map. See §2.1. |
| `claimed_ts` | 7 | integer | 8 bytes max | Unix timestamp in microseconds (µs). Signed integer to handle pre-epoch dates. |
| `entropy_witnesses` | 8 | CBOR map | variable | System entropy snapshot. See §3. |
| `signer_pubkey` | 9 | byte string | 32 bytes | Ed25519 raw public key bytes. |
| `signature` | 10 | byte string | 64 bytes | Ed25519 signature over `canonical_bytes(record)`, which excludes the `signature` field. |

### 2.1 Metadata Map

The `metadata` field is an open CBOR map with text string keys. Defined keys:

| Key | Type | Description |
|---|---|---|
| `"backfilled"` | bool | `true` if this record was created by the backfill migration |
| `"caption"` | text | Human-readable description of the attested content |
| `"location"` | text | Location name associated with the attestation |
| `"original_ts"` | integer | Original Attest timestamp (µs) if different from `claimed_ts` |
| `"tags"` | array of text | User-defined classification tags |

Applications may add custom keys. Unknown keys must be preserved during serialization.

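As an illustration of the record layout in §2, the fields could be modeled in Python roughly as follows. This is a sketch, not the project's actual class: the dataclass shape, the `new_record_id` helper, and the `GENESIS_PREV_HASH` constant are assumptions introduced here. The UUID v7 is assembled by hand from the RFC 9562 layout (48-bit millisecond timestamp, version nibble, variant bits) so the example does not depend on a particular Python version.

```python
import os
import time
from dataclasses import dataclass, field

GENESIS_PREV_HASH = b"\x00" * 32  # prev_hash of the genesis record


def new_record_id() -> bytes:
    """Build a 16-byte UUID v7: 48-bit Unix ms timestamp, then random bits."""
    ts_ms = time.time_ns() // 1_000_000
    raw = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version 7 in the high nibble
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 9562 variant bits (10xx xxxx)
    return bytes(raw)


@dataclass(frozen=True)
class AttestationChainRecord:
    """Hypothetical in-memory form of one chain record."""
    version: int
    record_id: bytes
    chain_index: int
    prev_hash: bytes
    content_hash: bytes
    content_type: str
    claimed_ts: int                  # Unix microseconds, may be negative
    entropy_witnesses: dict
    signer_pubkey: bytes
    metadata: dict = field(default_factory=dict)  # unknown keys kept as-is
    signature: bytes = b""           # filled in after signing

    def __post_init__(self) -> None:
        # Enforce the fixed sizes from the field table.
        for name, size in (("record_id", 16), ("prev_hash", 32),
                           ("content_hash", 32), ("signer_pubkey", 32)):
            if len(getattr(self, name)) != size:
                raise ValueError(f"{name} must be {size} bytes")
        if self.signature and len(self.signature) != 64:
            raise ValueError("signature must be 64 bytes")
        if self.chain_index < 0:
            raise ValueError("chain_index must be non-negative")
```

Keeping `metadata` as a plain dict means custom keys pass through untouched, which is one simple way to satisfy the "unknown keys must be preserved" requirement.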
## 3. Entropy Witnesses

Entropy witnesses are system-state snapshots collected at record creation time. They serve
as soft evidence that the claimed timestamp is plausible. Fabricating convincing witnesses
for a backdated record requires simulating the full system state at the claimed time.

| Field | CBOR Key | Type | Source | Fallback (non-Linux) |
|---|---|---|---|---|
| `sys_uptime` | 0 | float64 | `time.monotonic()` | Same (cross-platform) |
| `fs_snapshot` | 1 | byte string (16 bytes) | SHA-256 of `os.stat()` on chain DB, truncated to 16 bytes | SHA-256 of chain dir stat |
| `proc_entropy` | 2 | unsigned int | `/proc/sys/kernel/random/entropy_avail` | `len(os.urandom(32))` (always 32, marker for non-Linux) |
| `boot_id` | 3 | text string | `/proc/sys/kernel/random/boot_id` | `uuid.uuid4()` cached per process lifetime |

### Witness Properties

- **sys_uptime**: Monotonically increasing within a boot; it cannot decrease. A record with
  `sys_uptime < previous_record.sys_uptime` and `claimed_ts > previous_record.claimed_ts`
  is suspicious (reboot or clock manipulation).
- **fs_snapshot**: Changes with every write to the chain DB. Hash includes mtime, ctime,
  size, and inode number.
- **proc_entropy**: Varies naturally. On Linux, reflects kernel entropy pool state. On recent
  kernels the reported value may be nearly constant, so treat it as a weak signal.
- **boot_id**: Changes on every reboot. Identical `boot_id` across records implies the same
  boot session; combined with `sys_uptime`, this constrains the timeline.

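The witness table and properties above could be collected along these lines. This is a minimal sketch under stated assumptions: the function names, the way the `os.stat()` fields are joined into a string before hashing, and the choice to stat the chain directory when the DB file does not exist yet are all illustrative, since the specification does not fix them.

```python
import hashlib
import os
import time
import uuid
from pathlib import Path
from typing import Optional

_FALLBACK_BOOT_ID = str(uuid.uuid4())  # cached for the process lifetime


def _read_text(path: str) -> Optional[str]:
    """Return stripped file contents, or None if the file is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def collect_entropy_witnesses(chain_db: Path) -> dict:
    """Return witnesses keyed by the CBOR keys 0-3 from the table."""
    # Assumption: stat the chain directory when the DB file is missing.
    target = chain_db if chain_db.exists() else chain_db.parent
    st = os.stat(target)
    # Assumption: mtime, ctime, size and inode joined as a colon-separated string.
    stat_repr = f"{st.st_mtime_ns}:{st.st_ctime_ns}:{st.st_size}:{st.st_ino}"
    fs_snapshot = hashlib.sha256(stat_repr.encode()).digest()[:16]

    entropy = _read_text("/proc/sys/kernel/random/entropy_avail")
    proc_entropy = int(entropy) if entropy else len(os.urandom(32))

    boot_id = _read_text("/proc/sys/kernel/random/boot_id") or _FALLBACK_BOOT_ID
    return {0: time.monotonic(), 1: fs_snapshot, 2: proc_entropy, 3: boot_id}


def uptime_regressed(prev: dict, cur: dict) -> bool:
    """True if sys_uptime went backwards within the same boot session."""
    return prev[3] == cur[3] and cur[0] < prev[0]
```

A verifier might call `uptime_regressed` on consecutive records and treat a `True` result as a reason for closer inspection rather than as proof of tampering.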
## 4. Serialization

### 4.1 Canonical Bytes

`canonical_bytes(record)` produces the deterministic byte representation used for hashing
and signing. It is a CBOR map containing all fields **except** `signature`, encoded using
CBOR canonical encoding (RFC 8949 §4.2):

- Map keys sorted by integer value (0, 1, 2, ..., 9)
- Integers use minimal-length encoding
- No indefinite-length items
- No duplicate keys

```
canonical_bytes(record) = cbor2.dumps({
    0: record.version,
    1: record.record_id,
    2: record.chain_index,
    3: record.prev_hash,
    4: record.content_hash,
    5: record.content_type,
    6: record.metadata,
    7: record.claimed_ts,
    8: {
        0: record.entropy_witnesses.sys_uptime,
        1: record.entropy_witnesses.fs_snapshot,
        2: record.entropy_witnesses.proc_entropy,
        3: record.entropy_witnesses.boot_id,
    },
    9: record.signer_pubkey,
}, canonical=True)
```

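To make the "minimal-length encoding" rule concrete, here is a small standalone sketch of how a CBOR unsigned integer (major type 0) is encoded in its shortest form. The `encode_uint` helper is written for illustration only and is not part of `cbor2` or of this project; a real implementation would rely on the CBOR library.

```python
import struct


def encode_uint(n: int) -> bytes:
    """Encode a CBOR unsigned integer (major type 0) in its shortest form."""
    if n < 0:
        raise ValueError("major type 0 holds non-negative integers only")
    if n < 24:
        return bytes([n])                      # value fits in the initial byte
    if n < 2**8:
        return b"\x18" + struct.pack(">B", n)  # 1-byte argument
    if n < 2**16:
        return b"\x19" + struct.pack(">H", n)  # 2-byte argument
    if n < 2**32:
        return b"\x1a" + struct.pack(">I", n)  # 4-byte argument
    if n < 2**64:
        return b"\x1b" + struct.pack(">Q", n)  # 8-byte argument
    raise ValueError("value does not fit in 64 bits")
```

Because the record's map keys 0 through 10 are all below 24, each key encodes to a single byte, so sorting keys by integer value and sorting by encoded bytes give the same order.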
### 4.2 Record Hash

```
compute_record_hash(record) = SHA-256(canonical_bytes(record))
```

This hash is used as `prev_hash` in the next record and as the record's Merkle tree leaf in
export bundles.

### 4.3 Signature

```
record.signature = Ed25519_Sign(private_key, canonical_bytes(record))
```

Verification:
```
Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record))
```

### 4.4 Full Serialization

`serialize_record(record)` produces the full CBOR encoding including the signature field
(CBOR key 10). This is used for storage and transmission.

```
serialize_record(record) = cbor2.dumps({
    0: record.version,
    1: record.record_id,
    ...
    9: record.signer_pubkey,
    10: record.signature,
}, canonical=True)
```

## 5. Chain Rules

### 5.1 Genesis Record

The first record in a chain (index 0) has:
- `chain_index = 0`
- `prev_hash = b'\x00' * 32` (32 zero bytes)

The **chain ID** is defined as `SHA-256(canonical_bytes(genesis_record))`. This permanently
identifies the chain.

### 5.2 Append Rule

For record N (where N > 0):
```
record_N.chain_index == record_{N-1}.chain_index + 1
record_N.prev_hash   == compute_record_hash(record_{N-1})
record_N.claimed_ts  >= record_{N-1}.claimed_ts  (SHOULD, not MUST; clock skew possible)
```

### 5.3 Verification

Full chain verification checks, for each record from index 0 to head:
1. `Ed25519_Verify(record.signer_pubkey, record.signature, canonical_bytes(record))`: signature valid
2. `record.chain_index == expected_index`: no gaps or duplicates
3. `record.prev_hash == compute_record_hash(previous_record)`: chain link intact (for index 0, `prev_hash` must be 32 zero bytes, per §5.1)
4. All `signer_pubkey` values are identical within a chain (single-signer chain)

Violation of rule 4 indicates a chain was signed by multiple identities, which may be
legitimate (key rotation) or malicious (chain hijacking). Key rotation is out of scope for
v1; implementations should flag this as a warning.

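The four checks can be sketched as a single pass over the records. This is an illustrative outline, not the project's verifier: `verify_chain` and its parameters are hypothetical names, and both the canonical encoder and the Ed25519 check are passed in as callables so the sketch stays independent of any particular CBOR or cryptography library.

```python
import hashlib
from typing import Callable, Iterable, List

GENESIS_PREV_HASH = b"\x00" * 32


def verify_chain(
    records: Iterable,
    canonical_bytes: Callable[[object], bytes],
    verify_signature: Callable[[bytes, bytes, bytes], bool],
) -> List[str]:
    """Return a list of problems found; an empty list means the chain verified."""
    problems: List[str] = []
    expected_prev = GENESIS_PREV_HASH
    first_signer = None
    for expected_index, rec in enumerate(records):
        payload = canonical_bytes(rec)
        # Rule 1: signature over the canonical bytes.
        if not verify_signature(rec.signer_pubkey, rec.signature, payload):
            problems.append(f"record {expected_index}: bad signature")
        # Rule 2: indices are contiguous and start at 0.
        if rec.chain_index != expected_index:
            problems.append(f"record {expected_index}: unexpected chain_index")
        # Rule 3: link to the previous record (zero bytes for genesis).
        if rec.prev_hash != expected_prev:
            problems.append(f"record {expected_index}: broken prev_hash link")
        # Rule 4: a signer change is reported as a warning in v1.
        if first_signer is None:
            first_signer = rec.signer_pubkey
        elif rec.signer_pubkey != first_signer:
            problems.append(f"record {expected_index}: signer changed (warning)")
        expected_prev = hashlib.sha256(payload).digest()
    return problems
```

Collecting problems instead of stopping at the first failure lets a caller report every broken link in one run; an implementation that prefers to fail fast could raise on the first entry instead.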
## 6. Storage Format

### 6.1 chain.bin (Append-Only Log)

Records are stored sequentially as length-prefixed CBOR:

```
┌─────────────────────────────┐
│ uint32 BE: record_0 length  │
│ bytes: serialize(record_0)  │
├─────────────────────────────┤
│ uint32 BE: record_1 length  │
│ bytes: serialize(record_1)  │
├─────────────────────────────┤
│ ...                         │
└─────────────────────────────┘
```

- Length prefix is 4 bytes, big-endian unsigned 32-bit integer
- Maximum record size: 2^32 − 1 bytes, just under 4 GiB (practical limit much smaller)
- File is append-only; records are never modified or deleted
- File locking via `fcntl.flock(fd, fcntl.LOCK_EX)` for single-writer safety

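The framing described above can be sketched with the standard library. The function names `append_record` and `iter_records` are assumptions made for this example, and `fcntl` is available on POSIX systems only.

```python
import fcntl
import struct
from typing import BinaryIO, Iterator

_LEN = struct.Struct(">I")  # 4-byte big-endian unsigned length prefix


def append_record(path: str, serialized: bytes) -> None:
    """Append one length-prefixed record while holding an exclusive lock."""
    with open(path, "ab") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            # struct raises if the record exceeds the uint32 range.
            f.write(_LEN.pack(len(serialized)) + serialized)
            f.flush()  # push the bytes out before releasing the lock
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


def iter_records(f: BinaryIO) -> Iterator[bytes]:
    """Yield each record's serialized bytes; raise on a truncated tail."""
    while True:
        prefix = f.read(4)
        if not prefix:
            return  # clean end of file
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (length,) = _LEN.unpack(prefix)
        body = f.read(length)
        if len(body) < length:
            raise ValueError("truncated record body")
        yield body
```

Reading the log back is then a matter of opening `chain.bin` in binary mode and iterating, for example `with open(path, "rb") as f: records = list(iter_records(f))`.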
### 6.2 state.cbor (Chain State Checkpoint)

A single CBOR map, atomically rewritten after each append:

```cbor
{
    "chain_id": bytes[32],       # SHA-256(canonical_bytes(genesis))
    "head_index": uint,          # Index of the most recent record
    "head_hash": bytes[32],      # Hash of the most recent record
    "record_count": uint,        # Total records in chain
    "created_at": int,           # Unix µs when chain was created
    "last_append_at": int        # Unix µs of last append
}
```

This file is a performance optimization; the canonical state is always derivable from
`chain.bin`. On corruption, `state.cbor` is rebuilt by scanning the log.

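One common way to get the atomic rewrite described above is to write a temporary file in the same directory and rename it over the target. The sketch below assumes that approach; the specification does not say how atomicity is achieved, and `write_state_atomically` is a hypothetical helper that takes the already-encoded CBOR bytes.

```python
import os
import tempfile


def write_state_atomically(state_path: str, encoded_state: bytes) -> None:
    """Replace state.cbor in one step so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(state_path))
    # The temp file must live in the same directory for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(encoded_state)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk first
        os.replace(tmp_path, state_path)  # atomic rename on the same filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace`, the previous `state.cbor` is left intact, and a stale checkpoint can still be rebuilt from `chain.bin` as noted above.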
### 6.3 File Locations

```
~/.fwmetadata/chain/
    chain.bin       Append-only record log
    state.cbor      Chain state checkpoint
```

Paths are defined in `src/fieldwitness/paths.py`.

## 7. Migration from Attest-Only Attestations

Existing Attest attestations in `~/.fwmetadata/attestations/` are not modified. The chain
is a parallel structure. Migration is performed by the `fieldwitness chain backfill` command:

1. Iterate all records in Attest's `LocalStorage` (ordered by timestamp)
2. For each record, compute `content_hash = SHA-256(record.to_bytes())`
3. Create a chain record with:
   - `content_type = "attest/attestation-v1"`
   - `claimed_ts` set to the original Attest timestamp
   - `metadata = {"backfilled": true, "original_ts": <attest_timestamp>}`
   - Entropy witnesses collected at migration time (not original time)
4. Append to chain

Backfilled records are distinguishable via the `backfilled` metadata flag. Their entropy
witnesses reflect migration time, not original attestation time; this is honest and
intentional.

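Steps 1 to 3 of the backfill can be sketched as a pure function over already-serialized Attest records. The names `backfill_fields` and `plan_backfill` are hypothetical, and each input is assumed to be a `(serialized_bytes, timestamp_in_microseconds)` pair rather than a real `LocalStorage` record; entropy witnesses, signing, and the append itself are left out.

```python
import hashlib

ATTEST_CONTENT_TYPE = "attest/attestation-v1"


def backfill_fields(attest_bytes: bytes, attest_ts_us: int) -> dict:
    """Derive the chain-record fields that come from one existing Attest record."""
    return {
        "content_hash": hashlib.sha256(attest_bytes).digest(),
        "content_type": ATTEST_CONTENT_TYPE,
        "claimed_ts": attest_ts_us,
        "metadata": {"backfilled": True, "original_ts": attest_ts_us},
    }


def plan_backfill(attest_records):
    """Order (bytes, timestamp) pairs by timestamp and map each to its fields."""
    ordered = sorted(attest_records, key=lambda pair: pair[1])
    return [backfill_fields(raw, ts) for raw, ts in ordered]
```

Sorting before mapping keeps `claimed_ts` non-decreasing along the chain, which matches the SHOULD-level ordering in the append rule.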
## 8. Content Types

The `content_type` field identifies what was hashed into `content_hash`. Defined types:

| Content Type | Description |
|---|---|
| `attest/attestation-v1` | Attest `AttestationRecord` serialized bytes |
| `fieldwitness/raw-file-v1` | Raw file bytes (for non-image attestations, future) |
| `fieldwitness/metadata-only-v1` | No file content; metadata-only attestation (future) |

New content types may be added without changing the record format version.