fieldwitness/docs/planning/c2pa-integration.md
Aaron D. Lee 490f9d4a1d Rebrand SooSeF to FieldWitness
Complete project rebrand for better positioning in the press freedom
and digital security space. FieldWitness communicates both field
deployment and evidence testimony — appropriate for the target audience
of journalists, NGOs, and human rights organizations.

Rename mapping:
- soosef → fieldwitness (package, CLI, all imports)
- soosef.stegasoo → fieldwitness.stego
- soosef.verisoo → fieldwitness.attest
- ~/.soosef/ → ~/.fwmetadata/ (innocuous data dir name)
- SOOSEF_DATA_DIR → FIELDWITNESS_DATA_DIR
- SoosefConfig → FieldWitnessConfig
- SoosefError → FieldWitnessError

Also includes:
- License switch from MIT to GPL-3.0
- C2PA bridge module (Phase 0-2 MVP): cert.py, export.py, vendor_assertions.py
- README repositioned to lead with provenance/federation, stego backgrounded
- Threat model skeleton at docs/security/threat-model.md
- Planning docs: docs/planning/c2pa-integration.md, docs/planning/gtm-feasibility.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:05:13 -04:00


# C2PA Integration Plan
**Audience:** FieldWitness developers and maintainers
**Status:** Planning (pre-implementation)
**Last updated:** 2026-04-01
## Overview
FieldWitness needs C2PA (Coalition for Content Provenance and Authenticity) export/import
capability. C2PA is the emerging industry standard for content provenance, backed by
Adobe, Microsoft, Google, and the BBC. ProofMode, Guardian Project, and Starling Lab
have all adopted C2PA. FieldWitness must speak C2PA to remain relevant in the provenance
space.
---
## C2PA Spec Essentials
- JUMBF-based provenance standard embedded in media files
- Core structures: **Manifest Store > Manifest > Claim + Assertions + Ingredients + Signature**
- Claims are CBOR maps with assertion references, signing algorithm, `claim_generator`,
  and timestamps
- Standard assertions:
  - `c2pa.actions` -- edit history
  - `c2pa.hash.data` -- hard binding (byte-range)
  - `c2pa.location.broad` -- city/region location
  - `c2pa.exif` -- EXIF metadata
  - `c2pa.creative.work` -- title, description, authorship
  - `c2pa.training-mining` -- AI training/mining consent
- Vendor-specific assertions under reverse-DNS (e.g., `org.fieldwitness.*`)
- Signing uses **COSE_Sign1** (RFC 9052)
- Supported algorithms: Ed25519 (OKP), ES256/ES384/ES512 (ECDSA), PS256/PS384/PS512 (RSA-PSS)
- **X.509 certificate chain required** -- embedded in COSE unprotected header; raw public
  keys are not sufficient
- Offline validation works with pre-installed trust anchors; self-signed certs work in
  "local trust anchor" mode
## Python Library: c2pa-python
- Official Python binding from the Content Authenticity Initiative (PyPI: `c2pa-python`, GitHub: `contentauth/c2pa-python`)
- Rust extension (`c2pa-rs` via PyO3), not pure Python
- Version ~0.6.x, API not fully stable
- Platform wheels: manylinux2014 x86_64/aarch64, macOS, Windows
- **No armv6/armv7 wheels** -- affects Tier 1 Raspberry Pi deployments
- Core API: `c2pa.Reader`, `c2pa.Builder`, `builder.sign()`, `c2pa.create_signer()`
- `create_signer` takes a callback, algorithm, certs PEM, optional timestamp URL
- `timestamp_url=None` skips RFC 3161 timestamping (acceptable for offline use)
---
## Concept Mapping: FieldWitness to C2PA
### Clean mappings
| FieldWitness | C2PA |
|--------|------|
| `AttestationRecord` | C2PA Manifest |
| `attestor_fingerprint` | Signer cert subject (wrapped in X.509) |
| `AttestationRecord.timestamp` | Claim `created` (ISO 8601) |
| `CaptureMetadata.captured_at` | `c2pa.exif` DateTimeOriginal |
| `CaptureMetadata.location` | `c2pa.location.broad` |
| `CaptureMetadata.device` | `c2pa.exif` Make/Model |
| `CaptureMetadata.caption` | `c2pa.creative.work` description |
| `ImageHashes.sha256` | `c2pa.hash.data` (hard binding) |
| Ed25519 private key | COSE_Sign1 signing key (needs X.509 wrapper) |
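
The metadata rows of the table could translate into assertion entries roughly as follows. This is a minimal sketch: `CaptureMetadata` here is a simplified stand-in for the real model in `attest/models.py`, `standard_assertions()` is a hypothetical helper, and the payload keys inside each `data` dict are illustrative rather than taken from the C2PA schemas. The hard binding and signer identity are omitted because they are handled at signing time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CaptureMetadata:
    """Hypothetical, simplified stand-in for the real FieldWitness model."""
    captured_at: Optional[datetime] = None
    location: Optional[str] = None  # assumed to be coarsened to city/region already
    device: Optional[str] = None
    caption: Optional[str] = None


def standard_assertions(meta: CaptureMetadata) -> list:
    """Build label/data assertion entries using the labels from the mapping table."""
    assertions = []

    exif = {}
    if meta.captured_at is not None:
        exif["DateTimeOriginal"] = meta.captured_at.isoformat()
    if meta.device is not None:
        exif["Model"] = meta.device
    if exif:
        assertions.append({"label": "c2pa.exif", "data": exif})

    if meta.location is not None:
        assertions.append({"label": "c2pa.location.broad", "data": {"location": meta.location}})

    if meta.caption is not None:
        assertions.append({"label": "c2pa.creative.work", "data": {"description": meta.caption}})

    return assertions
```

Fields that are unset produce no assertion, so a record with no caption or location exports only what it actually carries.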
### FieldWitness has, C2PA does not
- Perceptual hashes (phash, dhash) -- map to vendor assertion `org.fieldwitness.perceptual-hashes`
- Merkle log inclusion proofs -- map to vendor assertion `org.fieldwitness.merkle-proof`
- Chain records with entropy witnesses -- map to vendor assertion `org.fieldwitness.chain-record`
- Delivery acknowledgment records (entirely FieldWitness-specific)
- Cross-org gossip federation
- Perceptual matching for verification (survives recompression)
- Selective disclosure / redaction
### C2PA has, FieldWitness does not
- Hard file binding (byte-range exclusion zones)
- X.509 certificate trust chains
- Actions history (`c2pa.actions`: crop, rotate, AI-generate, etc.)
- AI training/mining consent
- Ingredient DAG (content derivation graph)
---
## Privacy Design
Three tiers of identity disclosure:
1. **Org-level cert (preferred):** One self-signed X.509 cert per organization, not per
person. Subject is org name. Individual reporters do not appear in the manifest.
2. **Pseudonym cert:** Subject is pseudonym or random UUID. Valid C2PA but unrecognized
by external trust anchors.
3. **No C2PA export:** For critical-threat presets, evidence stays in FieldWitness format until
reaching Tier 2.
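
One way the three tiers could drive the certificate subject is sketched below. The level names (`"org"`, `"pseudonym"`, `"none"`) and the helper `cert_subject_for()` are assumptions for illustration; the plan does not fix these identifiers.

```python
import uuid
from typing import Optional


def cert_subject_for(privacy_level: str, org_name: str) -> Optional[str]:
    """Return the X.509 subject to use, or None when C2PA export is disabled.

    The privacy_level values are hypothetical names for the three tiers.
    """
    if privacy_level == "org":
        return org_name  # tier 1: organization name, no individual identity
    if privacy_level == "pseudonym":
        return str(uuid.uuid4())  # tier 2: random UUID as the subject
    if privacy_level == "none":
        return None  # tier 3: no C2PA export at all
    raise ValueError(f"unknown privacy level: {privacy_level!r}")
```

Returning `None` for the third tier lets the caller skip export entirely instead of generating a cert it will never use.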
### GPS handling
C2PA's `c2pa.location.broad` is city/region level. FieldWitness captures precise GPS. On
export, downsample to city-level unless the operator explicitly opts in. Precise GPS
stays in FieldWitness record only.
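
A simple way to downsample is to round coordinates, sketched below. Rounding to one decimal place (roughly 11 km of latitude) is an assumption, not a decision made in this plan; snapping to a named city or region would be an alternative. `export_coordinates()` is a hypothetical helper name.

```python
from typing import Tuple


def export_coordinates(
    lat: float,
    lon: float,
    include_precise_gps: bool = False,
    decimals: int = 1,
) -> Tuple[float, float]:
    """Return coordinates for export, coarsened unless the operator opts in."""
    if include_precise_gps:
        return (lat, lon)
    # One decimal degree is about 11 km north-south; coarse enough for city level.
    return (round(lat, decimals), round(lon, decimals))
```

The default is the coarse path, so precise GPS only leaves the FieldWitness record when `include_precise_gps` is set explicitly.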
### Metadata handling
Strip all EXIF from the output file except what is intentionally placed in the
`c2pa.exif` assertion.
---
## Offline-First Constraints
- **Tier 1 (field, no internet):** C2PA manifests without RFC 3161 timestamp. FieldWitness
chain record provides timestamp anchoring via vendor assertion.
- **Tier 2 (org server, may have internet):** Optionally contact TSA at export time.
Connects to existing `anchors.py` infrastructure.
- Entropy witnesses embedded as vendor assertions provide soft timestamp evidence.
- Evidence packages include org cert PEM alongside C2PA manifest for offline verification.
- `c2pa-python` availability gated behind `has_c2pa()` -- not all hardware can run it.
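
The availability gate could look like the sketch below. It assumes the import name is `c2pa` (matching `c2pa.Reader` and `c2pa.Builder`) and only checks that the package can be located, not that its native extension loads on the current hardware. `require_c2pa()` is a hypothetical companion helper.

```python
import importlib.util


def has_c2pa() -> bool:
    """Return True if the optional `c2pa` package can be found on this system."""
    return importlib.util.find_spec("c2pa") is not None


def require_c2pa() -> None:
    """Raise a clear error when a C2PA feature is used without the extra installed."""
    if not has_c2pa():
        raise RuntimeError(
            "C2PA support is not available; install the 'c2pa' extra on a supported platform"
        )
```

Callers in the CLI and web UI can check `has_c2pa()` up front and hide or disable C2PA features instead of failing at import time.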
---
## Architecture
### New module: `src/fieldwitness/c2pa_bridge/`
```
src/fieldwitness/c2pa_bridge/
    __init__.py           # Public API: export, import, has_c2pa()
    cert.py               # Self-signed X.509 cert generation from Ed25519 key
    export.py             # AttestationRecord -> C2PA manifest
    importer.py           # C2PA manifest -> AttestationRecord (best-effort)
    vendor_assertions.py  # org.fieldwitness.* assertion schemas
    cli.py                # CLI subcommands: fieldwitness c2pa export / verify / import
```
### Module relationships
- `export.py` reads from `attest/models.py`, `federation/chain.py`,
`keystore/manager.py`; calls `cert.py` and `vendor_assertions.py`
- `importer.py` reads image bytes, writes `AttestationRecord` via
`attest/attestation.py`, parses vendor assertions
### Web UI
New routes in the `attest.py` blueprint:
- `GET /attest/<record_id>/c2pa` -- download C2PA-embedded image
- `POST /attest/import-c2pa` -- upload and import C2PA manifest
### Evidence packages
`evidence.py` gains `include_c2pa=True` option. Adds C2PA-embedded file variants and
org cert to the ZIP.
### pyproject.toml extra
```toml
c2pa = ["c2pa-python>=0.6.0", "fieldwitness[attest]"]
```
---
## Implementation Phases
### Phase 0 -- Prerequisites (~1h)
- `has_c2pa()` in `_availability.py`
- `c2pa` extra in `pyproject.toml`
### Phase 1 -- Certificate management (~3h)
- `c2pa_bridge/cert.py`
- Self-signed X.509 from Ed25519 identity key
- Configurable subject (org name default, pseudonym for high-threat)
- Store at `~/.fwmetadata/identity/c2pa_cert.pem`
- Regenerate on key rotation
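
A self-signed certificate can be produced from an Ed25519 key with the `cryptography` package, as in the sketch below. This assumes `cryptography` is available and that the identity key is (or can be loaded as) an `Ed25519PrivateKey`; `make_self_signed_cert()` is a hypothetical name. It deliberately omits the key-usage and extended-key-usage extensions that C2PA validators may require, so treat it as a starting point only.

```python
from datetime import datetime, timedelta, timezone

from cryptography import x509
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.x509.oid import NameOID


def make_self_signed_cert(
    key: Ed25519PrivateKey, subject: str, valid_days: int = 365
) -> bytes:
    """Return a PEM-encoded self-signed X.509 cert for the given Ed25519 key."""
    name = x509.Name([x509.NameAttribute(NameOID.ORGANIZATION_NAME, subject)])
    now = datetime.now(timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)  # issuer == subject marks the cert as self-signed
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + timedelta(days=valid_days))
        .sign(private_key=key, algorithm=None)  # Ed25519 takes no separate hash
    )
    return cert.public_bytes(serialization.Encoding.PEM)
```

Because the cert is derived from the identity key, rotating the key invalidates it, which is why regeneration on rotation is listed above.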
### Phase 2 -- Export path (~6h)
- `c2pa_bridge/export.py` + `vendor_assertions.py`
- Core function `export_c2pa()` takes image data, `AttestationRecord`, key, cert, options
- Builds assertions: `c2pa.actions`, `c2pa.hash.data`, `c2pa.exif`, `c2pa.creative.work`,
`org.fieldwitness.perceptual-hashes`, `org.fieldwitness.chain-record`, `org.fieldwitness.attestation-id`
- Vendor assertion schemas versioned (v1)
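
Versioning could be carried inside each vendor assertion payload, as sketched here. The `schema_version` key, the `phash`/`dhash` field names, and both helper functions are assumptions for illustration; only the label comes from this plan.

```python
SCHEMA_VERSION = 1  # v1, per the plan; bump when the payload shape changes

PERCEPTUAL_HASHES_LABEL = "org.fieldwitness.perceptual-hashes"


def perceptual_hashes_assertion(phash: str, dhash: str) -> dict:
    """Build a versioned vendor assertion entry (hypothetical payload shape)."""
    return {
        "label": PERCEPTUAL_HASHES_LABEL,
        "data": {"schema_version": SCHEMA_VERSION, "phash": phash, "dhash": dhash},
    }


def parse_vendor_assertion(assertion: dict) -> dict:
    """Return the payload, rejecting schema versions this code does not understand."""
    data = assertion["data"]
    version = data.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {version!r}")
    return data
```

Rejecting unknown versions on import keeps a future v2 payload from being silently misread by older installations.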
### Phase 3 -- Import path (~5h)
- `c2pa_bridge/importer.py`
- `import_c2pa()` reads C2PA manifest, produces `AttestationRecord`
- Maps C2PA fields to FieldWitness model
- Returns `C2PAImportResult` with `trust_status`
- Creates new FieldWitness attestation record over imported data
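
The result type might be shaped like the sketch below. Only `C2PAImportResult` and `trust_status` are named in this plan; the `TrustStatus` values, the other fields, and `classify_trust()` are assumptions about how trust could be reported.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TrustStatus(Enum):
    """Hypothetical trust levels for an imported manifest."""
    TRUSTED = "trusted"          # signature valid, chains to a known trust anchor
    SELF_SIGNED = "self-signed"  # signature valid, cert is self-signed and unanchored
    UNKNOWN = "unknown"          # signature valid, issuer not recognized
    INVALID = "invalid"          # signature or binding check failed


@dataclass
class C2PAImportResult:
    trust_status: TrustStatus
    record_id: Optional[str] = None  # ID of the new FieldWitness record, if created
    warnings: List[str] = field(default_factory=list)


def classify_trust(signature_valid: bool, anchored: bool, self_signed: bool) -> TrustStatus:
    """Collapse validation outcomes into a single status for display."""
    if not signature_valid:
        return TrustStatus.INVALID
    if anchored:
        return TrustStatus.TRUSTED
    return TrustStatus.SELF_SIGNED if self_signed else TrustStatus.UNKNOWN
```

Since import is best-effort, a `warnings` list gives the importer a place to record fields it could not map instead of failing outright.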
### Phase 4 -- CLI integration (~4h)
- `fieldwitness c2pa export/verify/import/show` subcommands
- Gated on `has_c2pa()`
### Phase 5 -- Web UI + evidence packages (~5h)
- Blueprint routes for export/import
- Evidence package C2PA option
### Phase 6 -- Threat-level presets (~2h)
- Add `c2pa` config block to each preset (`export_enabled`, `privacy_level`,
`include_precise_gps`, `timestamp_url`)
- `C2PAConfig` sub-dataclass in `FieldWitnessConfig`
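
The config block could be modeled as below. The four field names come from this plan; the defaults, the `privacy_level` values, and the trimmed-down `FieldWitnessConfig` (shown with only its `c2pa` field) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class C2PAConfig:
    export_enabled: bool = True
    privacy_level: str = "org"            # assumed values: "org", "pseudonym", "none"
    include_precise_gps: bool = False     # city-level location unless opted in
    timestamp_url: Optional[str] = None   # None skips RFC 3161 timestamping


@dataclass
class FieldWitnessConfig:
    """Stand-in showing only the new sub-config; the real class has more fields."""
    c2pa: C2PAConfig = field(default_factory=C2PAConfig)


# Hypothetical critical-threat preset: evidence stays in FieldWitness format.
CRITICAL_THREAT = FieldWitnessConfig(
    c2pa=C2PAConfig(export_enabled=False, privacy_level="none")
)
```

Using `default_factory` gives each config its own `C2PAConfig` instance, so editing one preset cannot leak into another.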
### MVP scope
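**Phases 0-2 (~10h):** Produces C2PA-compatible images that can be opened in Adobe Content
Credentials and other C2PA verifiers. Because the cert is self-signed, verifiers that do
not have the org cert installed as a trust anchor are expected to report the signer as
untrusted.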
---
## Key Decisions (Before Coding)
1. **Use existing Ed25519 identity key for cert** (not a separate key) -- preserves
single-key-domain design.
2. **Cert stored at `~/.fwmetadata/identity/c2pa_cert.pem`**, regenerated on key rotation.
3. **Tier 1 ARM fallback:** Tier 1 devices produce FieldWitness records only; Tier 2
generates the C2PA export on their behalf.
4. **Require `c2pa-python>=0.6.0`** (a lower bound, not an exact pin), and add a shim layer to absorb API changes.
5. **Hard binding computed by `c2pa-python` Builder** automatically.
---
## FieldWitness's Unique C2PA Value
- **Cross-org chain of custody** via gossip federation (delivery ack records as ingredients)
- **Perceptual hash matching** embedded in C2PA (survives JPEG recompression via
WhatsApp/Telegram)
- **Merkle log inclusion proofs** in manifest (proves attestation committed to append-only log)
- **Entropy witnesses** as soft timestamp attestation (makes backdating harder without
RFC 3161)
- **Privacy-preserving by design** (org certs, GPS downsampling, zero-identity mode)
- **Fully offline end-to-end verification** (bundled cert + `c2pa-python`, no network needed)