Files
fieldwitness/src/fieldwitness/attest/attestation.py
Aaron D. Lee 490f9d4a1d Rebrand SooSeF to FieldWitness
Complete project rebrand for better positioning in the press freedom
and digital security space. FieldWitness communicates both field
deployment and evidence testimony — appropriate for the target audience
of journalists, NGOs, and human rights organizations.

Rename mapping:
- soosef → fieldwitness (package, CLI, all imports)
- soosef.stegasoo → fieldwitness.stego
- soosef.verisoo → fieldwitness.attest
- ~/.soosef/ → ~/.fwmetadata/ (innocuous data dir name)
- SOOSEF_DATA_DIR → FIELDWITNESS_DATA_DIR
- SoosefConfig → FieldWitnessConfig
- SoosefError → FieldWitnessError

Also includes:
- License switch from MIT to GPL-3.0
- C2PA bridge module (Phase 0-2 MVP): cert.py, export.py, vendor_assertions.py
- README repositioned to lead with provenance/federation, stego backgrounded
- Threat model skeleton at docs/security/threat-model.md
- Planning docs: docs/planning/c2pa-integration.md, docs/planning/gtm-feasibility.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:05:13 -04:00

666 lines
27 KiB
Python

"""
Attestation Creation Module for Attest.
This module is the core of Attest's provenance system. An attestation is a
cryptographic proof that binds together:
1. AN IMAGE - identified by multiple hashes (SHA-256 + perceptual)
2. AN IDENTITY - the attestor's Ed25519 public key fingerprint
3. A TIMESTAMP - when the attestation was created
4. METADATA - optional context (location, device, caption, etc.)
The binding is secured by an Ed25519 digital signature over all components,
making it computationally infeasible to forge or modify after creation.
Architecture Overview:
----------------------
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Image Data │ │ Private │ │ Metadata │
│ (bytes) │ │ Key │ │ (optional) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
▼ │ ▼
┌─────────────┐ │ ┌─────────────┐
│ Compute │ │ │ Extract │
│ Hashes │ │ │ EXIF │
│ (SHA+pHash)│ │ │ (auto) │
└──────┬──────┘ │ └──────┬──────┘
│ │ │
└───────────┬───────┴───────────┬───────┘
│ │
▼ ▼
┌─────────────────────────────────┐
│ Build Canonical Signing Payload │
│ (deterministic byte string) │
└───────────────┬─────────────────┘
┌───────────────┐
│ Ed25519 Sign │
└───────┬───────┘
┌───────────────┐
│ Attestation │
│ Record │
└───────────────┘
Security Properties:
-------------------
- **Authenticity**: Only the private key holder can create valid signatures
- **Integrity**: Any modification invalidates the signature
- **Non-repudiation**: Attestor cannot deny creating the attestation
- **Timestamping**: Proves the image existed at attestation time
Usage Example:
-------------
from .attestation import create_attestation
from .crypto import load_private_key
# Load attestor's private key
private_key = load_private_key("~/.attest/private.pem")
# Create attestation with auto EXIF extraction
attestation = create_attestation(
image_data=open("photo.jpg", "rb").read(),
private_key=private_key,
metadata={"caption": "Street scene in Kyiv"},
)
# The attestation.record can now be stored in the append-only log
"""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from .crypto import create_identity, sign
from .hashing import hash_image
from .models import (
Attestation,
AttestationRecord,
CaptureDevice,
CaptureMetadata,
GeoLocation,
ImageHashes,
)
# =============================================================================
# EXIF METADATA EXTRACTION
# =============================================================================
#
# EXIF (Exchangeable Image File Format) is metadata embedded in images by
# cameras and phones. It contains valuable provenance information:
# - When the photo was taken (DateTimeOriginal)
# - Where it was taken (GPS coordinates)
# - What device captured it (Make, Model)
#
# We extract this automatically to enrich attestations, but user-provided
# metadata always takes precedence (in case EXIF is wrong or missing).
# =============================================================================
def extract_exif_metadata(image_data: bytes) -> dict[str, Any]:
"""
Extract EXIF metadata from image bytes for attestation enrichment.
This function parses embedded EXIF data from JPEG/TIFF images to extract
provenance-relevant information. The extracted data provides evidence of
when, where, and how an image was captured.
Extraction Priority:
-------------------
1. **Timestamp**: DateTimeOriginal > DateTimeDigitized > DateTime
- DateTimeOriginal: When shutter was pressed (most reliable)
- DateTimeDigitized: When image was digitized (scanner use case)
- DateTime: Last modification time (least reliable for capture)
2. **Location**: GPS coordinates with hemisphere references
- Converted from DMS (degrees/minutes/seconds) to decimal degrees
- Hemisphere (N/S, E/W) determines sign
3. **Device**: Camera/phone identification
- Make: Manufacturer (e.g., "Apple", "Canon")
- Model: Device model (e.g., "iPhone 15 Pro", "EOS R5")
- Software: Firmware/app version
4. **Dimensions**: Image width and height in pixels
Args:
image_data: Raw image bytes (JPEG, TIFF, or other EXIF-capable format)
Returns:
Dictionary with extracted fields. May contain:
- captured_at: datetime object (UTC)
- location: GeoLocation object
- device: CaptureDevice object
- width: int (pixels)
- height: int (pixels)
Returns empty dict if:
- exifread library not installed
- Image has no EXIF data
- EXIF parsing fails
Note:
Uses 'exifread' library (optional dependency). Gracefully degrades
to empty dict if unavailable, allowing attestations without EXIF.
Example:
>>> exif = extract_exif_metadata(jpeg_bytes)
>>> if exif.get("location"):
... print(f"Photo taken at {exif['location']}")
"""
# -------------------------------------------------------------------------
# Import exifread lazily - it's an optional dependency
# If not installed, we simply skip EXIF extraction (graceful degradation)
# -------------------------------------------------------------------------
try:
import io
import exifread
except ImportError:
# exifread not installed - return empty dict, attestation still works
return {}
# -------------------------------------------------------------------------
# Parse EXIF tags from image bytes
# details=False skips thumbnail and MakerNote (faster, less memory)
# -------------------------------------------------------------------------
try:
tags = exifread.process_file(io.BytesIO(image_data), details=False)
except Exception:
# Malformed EXIF or unsupported format - continue without EXIF
return {}
if not tags:
return {}
result: dict[str, Any] = {}
# -------------------------------------------------------------------------
# TIMESTAMP EXTRACTION
# Try multiple EXIF date fields in order of reliability for capture time
# EXIF stores dates as strings: "YYYY:MM:DD HH:MM:SS" (note colons in date)
# -------------------------------------------------------------------------
for tag_name in ["EXIF DateTimeOriginal", "EXIF DateTimeDigitized", "Image DateTime"]:
if tag_name in tags:
try:
dt_str = str(tags[tag_name])
# Parse EXIF date format (colons in date part, space separator)
dt = datetime.strptime(dt_str, "%Y:%m:%d %H:%M:%S")
# Assume UTC if no timezone info (EXIF doesn't store timezone)
result["captured_at"] = dt.replace(tzinfo=timezone.utc)
break # Use first valid timestamp found
except (ValueError, TypeError):
continue # Malformed date string, try next field
# -------------------------------------------------------------------------
# GPS COORDINATE EXTRACTION
# EXIF stores GPS as rational numbers in DMS (degrees, minutes, seconds)
# with separate reference tags for hemisphere (N/S for lat, E/W for lon)
# -------------------------------------------------------------------------
gps_lat = tags.get("GPS GPSLatitude")
gps_lat_ref = tags.get("GPS GPSLatitudeRef") # "N" or "S"
gps_lon = tags.get("GPS GPSLongitude")
gps_lon_ref = tags.get("GPS GPSLongitudeRef") # "E" or "W"
if gps_lat and gps_lon:
try:
# Convert DMS to decimal degrees
lat = _parse_gps_coord(gps_lat.values)
lon = _parse_gps_coord(gps_lon.values)
# Apply hemisphere sign (South and West are negative)
if gps_lat_ref and str(gps_lat_ref) == "S":
lat = -lat
if gps_lon_ref and str(gps_lon_ref) == "W":
lon = -lon
result["location"] = GeoLocation(latitude=lat, longitude=lon)
except (ValueError, TypeError, AttributeError):
# Malformed GPS data - skip location rather than fail
pass
# -------------------------------------------------------------------------
# DEVICE INFORMATION EXTRACTION
# Identifies the camera/phone that captured the image
# Useful for establishing authenticity ("taken with iPhone, not Photoshop")
# -------------------------------------------------------------------------
make = tags.get("Image Make") # e.g., "Apple", "Canon", "Nikon"
model = tags.get("Image Model") # e.g., "iPhone 15 Pro", "EOS R5"
software = tags.get("Image Software") # e.g., "iOS 17.4", "Lightroom"
if make or model or software:
result["device"] = CaptureDevice(
make=str(make).strip() if make else None,
model=str(model).strip() if model else None,
software=str(software).strip() if software else None,
)
# -------------------------------------------------------------------------
# IMAGE DIMENSIONS
# May differ from actual pixel dimensions if image was resized after EXIF
# -------------------------------------------------------------------------
width = tags.get("EXIF ExifImageWidth") or tags.get("Image ImageWidth")
height = tags.get("EXIF ExifImageLength") or tags.get("Image ImageLength")
if width:
try:
result["width"] = int(str(width))
except (ValueError, TypeError):
pass
if height:
try:
result["height"] = int(str(height))
except (ValueError, TypeError):
pass
return result
def _parse_gps_coord(coord_values: list) -> float:
"""
Convert EXIF GPS coordinate from DMS to decimal degrees.
EXIF stores GPS coordinates as three rational numbers representing
degrees, minutes, and seconds. Each rational has a numerator and
denominator (to handle fractional seconds precisely).
Formula: decimal = degrees + (minutes / 60) + (seconds / 3600)
Args:
coord_values: List of three IfdTag Ratio objects [degrees, minutes, seconds]
Each has .num (numerator) and .den (denominator) attributes
Returns:
Decimal degrees as float (always positive; caller applies hemisphere sign)
Example:
GPS coords "50° 27' 0.36"" stored as:
[Ratio(50/1), Ratio(27/1), Ratio(36/100)]
Returns: 50.45010 (50 + 27/60 + 0.36/3600)
"""
# Extract degrees, minutes, seconds as floats from rational values
d = float(coord_values[0].num) / float(coord_values[0].den) # Degrees
m = float(coord_values[1].num) / float(coord_values[1].den) # Minutes
s = float(coord_values[2].num) / float(coord_values[2].den) # Seconds
# Convert to decimal: degrees + minutes/60 + seconds/3600
return d + (m / 60.0) + (s / 3600.0)
# =============================================================================
# ATTESTATION CREATION
# =============================================================================
#
# The main entry point for creating attestations. This function orchestrates:
# 1. Metadata preparation (EXIF extraction + user overrides)
# 2. Image hashing (cryptographic + perceptual)
# 3. Payload construction (canonical, deterministic format)
# 4. Digital signing (Ed25519)
# 5. Record assembly
#
# The result is an Attestation object containing both the image data and the
# signed record, ready for storage in the append-only log.
# =============================================================================
def create_attestation(
image_data: bytes,
private_key: Ed25519PrivateKey,
metadata: dict[str, Any] | CaptureMetadata | None = None,
timestamp: datetime | None = None,
auto_exif: bool = True,
) -> Attestation:
"""
Create a cryptographically signed attestation for an image.
This is the primary function for creating image attestations. It binds
an image to an identity at a specific point in time, with optional
metadata providing additional context.
The Attestation Process:
-----------------------
Input Image ──► Hash (SHA-256 + pHash + dHash)
EXIF Data ───► Extract ─────┤
User Metadata ──────────────┤
┌───────────────────┐
│ Canonical Payload │
│ (deterministic) │
└─────────┬─────────┘
Private Key ──► Sign ─────┤
┌───────────────────┐
│ AttestationRecord │
└───────────────────┘
Metadata Precedence:
-------------------
When auto_exif=True, metadata is merged with the following precedence:
1. User-provided metadata (highest priority - always wins)
2. EXIF-extracted metadata (fills in gaps)
This allows users to correct wrong EXIF data (e.g., wrong camera time)
while still benefiting from automatic extraction when available.
Args:
image_data:
Raw image bytes. Supports any format PIL can read (JPEG, PNG,
TIFF, WebP, etc.). The bytes are hashed directly for SHA-256,
and decoded for perceptual hashing.
private_key:
Ed25519 private key for signing. This establishes WHO is making
the attestation. The corresponding public key fingerprint is
embedded in the record.
metadata:
Optional provenance metadata. Can be:
- dict: Raw key-value pairs (flexible schema)
- CaptureMetadata: Structured object with typed fields
- None: No user metadata (may still extract EXIF)
Common fields: location, caption, device, captured_at, tags
timestamp:
Override the attestation timestamp. Defaults to current UTC time.
Note: This is ATTESTATION time, not CAPTURE time. Capture time
should be in metadata (auto-extracted from EXIF or user-provided).
auto_exif:
If True (default), automatically extract EXIF metadata from the
image and merge with user-provided metadata. Set to False to:
- Skip EXIF extraction for speed
- Avoid including potentially sensitive EXIF data
- When EXIF is known to be incorrect/missing
Returns:
Attestation object containing:
- image_data: Original image bytes (for storage/verification)
- image_hashes: Computed hashes (SHA-256 + perceptual)
- record: Signed AttestationRecord ready for the log
Raises:
ValueError: If image cannot be decoded for hashing
TypeError: If private_key is not an Ed25519PrivateKey
Example:
>>> # Simple attestation with auto EXIF
>>> attestation = create_attestation(image_bytes, private_key)
>>> # With manual location override
>>> attestation = create_attestation(
... image_bytes,
... private_key,
... metadata={"location": {"lat": 50.45, "lon": 30.52, "name": "Kyiv"}},
... )
>>> # Disable EXIF extraction
>>> attestation = create_attestation(
... image_bytes,
... private_key,
... auto_exif=False,
... )
Security Notes:
- The signature covers ALL fields (hashes, fingerprint, timestamp, metadata)
- Changing any field invalidates the signature
- Timestamp is attestation time, not necessarily capture time
- Verify attestations using attest.verification module
"""
# -------------------------------------------------------------------------
# STEP 1: Establish attestation timestamp
# Use provided timestamp or current UTC time
# -------------------------------------------------------------------------
timestamp = timestamp or datetime.now(timezone.utc)
# -------------------------------------------------------------------------
# STEP 2: Normalize metadata to dict format
# Accept CaptureMetadata objects, dicts, or None
# -------------------------------------------------------------------------
if isinstance(metadata, CaptureMetadata):
# Convert structured object to dict for JSON serialization
metadata_dict = metadata.to_dict()
elif metadata is None:
metadata_dict = {}
else:
# Make a copy to avoid mutating caller's dict
metadata_dict = dict(metadata)
# -------------------------------------------------------------------------
# STEP 3: Auto-extract and merge EXIF metadata (if enabled)
# EXIF provides defaults; user metadata takes precedence
# -------------------------------------------------------------------------
if auto_exif:
exif_data = extract_exif_metadata(image_data)
if exif_data:
# Start with empty merged dict, add EXIF fields not in user metadata
merged: dict[str, Any] = {}
# Convert EXIF objects to JSON-serializable dict format
# Only include if not already provided by user
if "captured_at" in exif_data and "captured_at" not in metadata_dict:
merged["captured_at"] = exif_data["captured_at"].isoformat()
if "location" in exif_data and "location" not in metadata_dict:
merged["location"] = exif_data["location"].to_dict()
if "device" in exif_data and "device" not in metadata_dict:
merged["device"] = exif_data["device"].to_dict()
if "width" in exif_data and "width" not in metadata_dict:
merged["width"] = exif_data["width"]
if "height" in exif_data and "height" not in metadata_dict:
merged["height"] = exif_data["height"]
# User metadata overwrites EXIF (higher precedence)
merged.update(metadata_dict)
metadata_dict = merged
metadata = metadata_dict
# -------------------------------------------------------------------------
# STEP 4: Compute image hashes
# SHA-256 for exact matching, pHash/dHash for perceptual matching
# These survive social media compression, resizing, format conversion
# -------------------------------------------------------------------------
image_hashes = hash_image(image_data)
# -------------------------------------------------------------------------
# STEP 5: Get attestor identity from private key
# The fingerprint is SHA-256(public_key)[:16] in hex (32 chars)
# -------------------------------------------------------------------------
identity = create_identity(private_key)
# -------------------------------------------------------------------------
# STEP 6: Build canonical signing payload
# Deterministic byte string that will be signed
# Must be reproducible for verification
# -------------------------------------------------------------------------
signing_payload = _build_signing_payload(
image_hashes, identity.fingerprint, timestamp, metadata
)
# -------------------------------------------------------------------------
# STEP 7: Sign the payload with Ed25519
# Produces 64-byte signature
# -------------------------------------------------------------------------
signature = sign(private_key, signing_payload)
# -------------------------------------------------------------------------
# STEP 8: Assemble the attestation record
# This is what gets stored in the append-only log
# -------------------------------------------------------------------------
record = AttestationRecord(
image_hashes=image_hashes,
signature=signature,
attestor_fingerprint=identity.fingerprint,
timestamp=timestamp,
metadata=metadata,
)
# Return full Attestation with image data (for storage) and record
return Attestation(
image_data=image_data,
image_hashes=image_hashes,
record=record,
)
def create_attestation_from_hashes(
image_hashes: ImageHashes,
private_key: Ed25519PrivateKey,
metadata: dict[str, Any] | None = None,
timestamp: datetime | None = None,
) -> AttestationRecord:
"""
Create attestation record from pre-computed hashes (without image bytes).
This is a lower-level function for scenarios where you have image hashes
but not the original image data:
Use Cases:
---------
- **Distributed systems**: Hashes computed on one node, signed on another
- **Batch processing**: Pre-compute hashes, sign later
- **Re-attestation**: Create new attestation for known hashes
- **Testing**: Create records without actual images
Unlike create_attestation(), this function:
- Does NOT compute hashes (uses provided ImageHashes)
- Does NOT extract EXIF (no image bytes available)
- Returns AttestationRecord directly (no Attestation wrapper)
Args:
image_hashes: Pre-computed ImageHashes object with sha256, phash, dhash
private_key: Ed25519 private key for signing
metadata: Optional metadata dict (no auto-extraction)
timestamp: Override timestamp (defaults to now, UTC)
Returns:
AttestationRecord ready for storage (no image data attached)
Example:
>>> # Re-attest an image you've already hashed
>>> from .hashing import hash_image
>>> hashes = hash_image(image_bytes)
>>> # ... later, on a different system ...
>>> record = create_attestation_from_hashes(hashes, private_key)
"""
timestamp = timestamp or datetime.now(timezone.utc)
metadata = metadata or {}
identity = create_identity(private_key)
signing_payload = _build_signing_payload(
image_hashes, identity.fingerprint, timestamp, metadata
)
signature = sign(private_key, signing_payload)
return AttestationRecord(
image_hashes=image_hashes,
signature=signature,
attestor_fingerprint=identity.fingerprint,
timestamp=timestamp,
metadata=metadata,
)
# =============================================================================
# SIGNING PAYLOAD CONSTRUCTION
# =============================================================================
#
# The signing payload is the exact byte sequence that gets signed. It MUST be:
# 1. Deterministic - same inputs always produce same payload
# 2. Unambiguous - different inputs never produce same payload
# 3. Complete - covers all attested data
#
# We achieve this with:
# - Newline-separated fields (unambiguous delimiter)
# - Sorted JSON keys (deterministic object serialization)
# - No whitespace in JSON (canonical form)
# - UTF-8 encoding (explicit byte representation)
# =============================================================================
def _build_signing_payload(
image_hashes: ImageHashes,
attestor_fingerprint: str,
timestamp: datetime,
metadata: dict[str, Any],
) -> bytes:
"""
Build the canonical byte string for Ed25519 signing.
This function constructs a deterministic representation of all attested
data. The same inputs MUST always produce the same output bytes, as this
is required for signature verification.
Payload Format:
--------------
Line 1: SHA-256 hash (64 hex characters)
Line 2: pHash (perceptual hash, typically 16 hex chars)
Line 3: dHash (difference hash, typically 16 hex chars)
Line 4: Attestor fingerprint (32 hex characters)
Line 5: Timestamp (ISO 8601 format with timezone)
Line 6: Metadata (JSON, sorted keys, no whitespace)
Example payload:
a1b2c3d4... (SHA-256, 64 chars)
f8e7d6c5... (pHash, 16 chars)
1a2b3c4d... (dHash, 16 chars)
9f8e7d6c... (fingerprint, 32 chars)
2024-01-15T10:30:00+00:00
{"caption":"Test","location":{"lat":50.45,"lon":30.52}}
Canonicalization Rules:
----------------------
- JSON keys are sorted alphabetically (Python's sort_keys=True)
- No whitespace in JSON (separators=(",", ":"))
- Timestamp includes timezone (isoformat())
- All lines joined with single newline (no trailing newline)
- Final encoding is UTF-8
Args:
image_hashes: ImageHashes object with sha256, phash, dhash
attestor_fingerprint: Hex string identifying the attestor
timestamp: Datetime object (should have timezone info)
metadata: Dict to serialize as JSON
Returns:
UTF-8 encoded bytes ready for signing
Security Notes:
- Changing any field changes the payload (and invalidates signature)
- The format is designed to be unambiguous (no field can "bleed" into another)
- Verification must use IDENTICAL canonicalization
"""
import json
# Canonical JSON: sorted keys, no whitespace (compact, deterministic)
metadata_json = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
# Join all fields with newlines (unambiguous separator)
payload = "\n".join([
image_hashes.sha256, # Line 1: Cryptographic hash
image_hashes.phash, # Line 2: Perceptual hash (DCT-based)
image_hashes.dhash, # Line 3: Difference hash
attestor_fingerprint, # Line 4: Who is attesting
timestamp.isoformat(), # Line 5: When (ISO 8601)
metadata_json, # Line 6: Additional context
])
# Encode as UTF-8 bytes for signing
return payload.encode("utf-8")