More video work, planning, etc. -- Need to mark things EXPERIMENTAL.

Aaron D. Lee
2026-03-24 16:00:30 -04:00
parent 05382c4081
commit 14fce4d3ed
15 changed files with 2641 additions and 84 deletions


@@ -0,0 +1,294 @@
# Stegasoo Ideas Scout — Implementation Plans (2026-03-24)
Baseline: v4.3.0, Python >=3.11, FORMAT_VERSION 5, no existing users (no backward compat constraints).
---
## Tier 1 — Quick Wins
### 1. Platform-Calibrated DCT Presets
**Description**: `--platform telegram|discord|signal|whatsapp` flag for DCT encode. Bakes in each platform's known recompression parameters. Pre-verifies payload survives before outputting.
**Implementation approach**:
- New file `src/stegasoo/platform_presets.py`: `PlatformPreset` dataclass + `PRESETS` dict mapping platform → tuned `quant_step`, `jpeg_quality`, `embed_positions`, `max_dimension`, `recompress_quality`
- `dct_steganography.py`: `_embed_scipy_dct_safe()` / `_embed_jpegio()` accept optional preset overrides for `QUANT_STEP`, `DEFAULT_EMBED_POSITIONS`, output quality
- New `pre_verify_survival()` function: encode → re-save at platform quality → extract → pass/fail
- Thread `platform` param through `encode.py` → `steganography.py` → DCT functions
- `cli.py`: add `--platform` as `click.Choice` + `--verify/--no-verify` (pre-verification doubles encode time)
- LSB + `--platform` should error early — LSB data is destroyed by any JPEG recompression
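
The encode → re-save → extract → pass/fail loop described for `pre_verify_survival()` can be sketched as below. This is a minimal illustration, not the project's implementation: the three callables (`embed`, `recompress`, `extract`) and their signatures are assumptions standing in for the real DCT functions and a JPEG re-save at the platform's quality.

```python
from typing import Callable


def pre_verify_survival(
    payload: bytes,
    carrier: bytes,
    embed: Callable[[bytes, bytes], bytes],
    recompress: Callable[[bytes], bytes],
    extract: Callable[[bytes], bytes],
) -> bool:
    """Return True if the payload survives one simulated platform round trip.

    All three callables are hypothetical stand-ins for the real pipeline.
    """
    stego = embed(payload, carrier)      # 1. encode
    degraded = recompress(stego)         # 2. re-save at platform quality
    try:
        recovered = extract(degraded)    # 3. extract
    except Exception:
        return False                     # a failed extraction counts as "did not survive"
    return recovered == payload          # 4. pass/fail


# Toy stand-ins so the control flow can be exercised without image libraries.
CARRIER = b"CARRIER"


def toy_embed(payload: bytes, carrier: bytes) -> bytes:
    return carrier + payload


def toy_extract(stego: bytes) -> bytes:
    return stego[len(CARRIER):]


def lossless(stego: bytes) -> bytes:
    return stego


def lossy(stego: bytes) -> bytes:
    return stego[:-1]  # drops the last byte, simulating destructive recompression
```

Because the check runs a second encode-side pass plus an extraction, it fits the `--verify/--no-verify` toggle noted above.
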
**Known platform params** (from research):
| Platform | Quality | Max Dimension | Notes |
|----------|---------|---------------|-------|
| Telegram | ~82 | 2560×2560 | ~81KB embeddable |
| Discord | ~85 | Varies (Nitro) | |
| Signal | ~80 | Aggressive | |
| WhatsApp | ~70 | 1600×1600 | Most lossy |
**Go/No-Go metrics**:
- >95% payload survival rate per platform at 1KB message size in automated tests
- Pre-verification correctly predicts real platform behavior (manual validation per platform at least once)
**Complexity**: **M** — new file + parameter threading through 4-5 functions
**Risks**: Platform params change without notice. Add version/date stamps to presets and a `stegasoo tools verify-platform` test command.
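
A date-stamped preset table could look roughly like the sketch below. Only the quality and dimension figures from the table above are filled in; `quant_step` and `embed_positions` are left out because this plan gives no values for them. The field names `verified_on` and `get_preset()` are hypothetical, and the date is the plan's date used as a placeholder.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PlatformPreset:
    name: str
    recompress_quality: int        # approximate JPEG quality the platform re-saves at
    max_dimension: Optional[int]   # longest side in pixels; None where the table has no number
    verified_on: date              # date stamp so stale presets are easy to spot


# Placeholder stamp: the date of this plan, not a real verification date.
_PLAN_DATE = date(2026, 3, 24)

PRESETS = {
    "telegram": PlatformPreset("telegram", 82, 2560, _PLAN_DATE),
    "discord": PlatformPreset("discord", 85, None, _PLAN_DATE),
    "signal": PlatformPreset("signal", 80, None, _PLAN_DATE),
    "whatsapp": PlatformPreset("whatsapp", 70, 1600, _PLAN_DATE),
}


def get_preset(platform: str) -> PlatformPreset:
    """Look up a preset case-insensitively, failing loudly on unknown platforms."""
    try:
        return PRESETS[platform.lower()]
    except KeyError:
        raise ValueError(
            f"unknown platform {platform!r}; choose from {sorted(PRESETS)}"
        ) from None
```
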
---
### 2. Steganalysis Self-Check (`stegasoo check`)
**Description**: New CLI command running chi-square and RS (Regular-Singular) statistical analysis on stego images. Outputs detectability risk level (low/medium/high).
**Implementation approach**:
- New file `src/stegasoo/steganalysis.py`:
  - `chi_square_analysis(image_data) -> float` — chi-square statistic on LSB distribution per channel
  - `rs_analysis(image_data) -> float` — Regular-Singular groups analysis (requires numpy)
  - `assess_risk(chi_p, rs_estimate) -> str` — maps to "low"/"medium"/"high"
  - `check_image(image_data) -> dict` — orchestrator
- `cli.py`: new `@cli.command("check")` with `IMAGE` arg, `--json`, `--mode lsb|dct|auto`
- `constants.py`: threshold constants for chi-square p-value and RS boundaries
- `__init__.py`: export `check_image` in `__all__`
- Start LSB-only; DCT steganalysis (calibration attack) deferred
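
For orientation, the core of a pairs-of-values chi-square test fits in a few lines. The sketch below uses only the standard library and works on a flat sequence of 8-bit samples from one channel; the planned version would use numpy and operate on decoded image data. The function name is hypothetical. Turning the statistic into a p-value needs a chi-square survival function (for example `scipy.stats.chi2.sf`), which is omitted here.

```python
from collections import Counter
from typing import Iterable


def chi_square_pairs_statistic(samples: Iterable[int]) -> float:
    """Pairs-of-values chi-square statistic for 8-bit samples from one channel.

    Full-capacity LSB replacement tends to equalise the counts of values
    2k and 2k+1. A statistic near zero therefore means the pairs are
    suspiciously even; a large value means the pairs differ, as they
    usually do in an unmodified image.
    """
    hist = Counter(samples)
    statistic = 0.0
    for k in range(128):
        even = hist[2 * k]
        odd = hist[2 * k + 1]
        expected = (even + odd) / 2      # expected count if LSBs were randomised
        if expected > 0:                 # skip empty pairs to avoid division by zero
            statistic += (even - expected) ** 2 / expected
    return statistic


# Perfectly equalised pair (10, 11): the statistic is exactly zero.
equalised = [10, 11] * 500
# Heavily skewed pair: 900 tens and 100 elevens.
skewed = [10] * 900 + [11] * 100
```
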
**Go/No-Go metrics**:
- Clean images → consistently "low risk"
- Naive sequential LSB → "high risk"
- Stegasoo LSB at <50% capacity → "low" or "medium"
**Complexity**: **M** — ~150 lines numpy per test, straightforward CLI integration
---
### 3. Python 3.13 DCT Cleanup
**Description**: The `jpegio` → `jpeglib` migration is already done in code. Remaining work: rename stale `jpegio` references and verify on 3.13.
**Implementation approach**:
- `dct_steganography.py`: rename `HAS_JPEGIO` → `HAS_JPEGLIB`, `_jpegio_*` functions → `_jpeglib_*`, update constant names (`JPEGIO_MAGIC` → `JPEGLIB_MAGIC`, etc.)
- Verify `jpeglib.to_jpegio()` compatibility shim — if jpeglib plans to deprecate it, migrate to native API
- Run full test suite on Python 3.13
**Go/No-Go metrics**:
- All DCT tests pass on Python 3.13
- No deprecation warnings from jpeglib
**Complexity**: **S** — renaming and verification only
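
One way to make the "no deprecation warnings" criterion checkable in a test is to record warnings around a call and filter for deprecation categories. The helper below is a generic sketch using the standard `warnings` module; its name and the example callables are hypothetical and do not reflect any actual jpeglib behaviour.

```python
import warnings
from typing import Callable, List


def deprecation_warnings_from(fn: Callable[[], object]) -> List[str]:
    """Run fn() and return the messages of any deprecation warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # make sure nothing is suppressed or deduplicated
        fn()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, PendingDeprecationWarning))
    ]


def noisy_call() -> None:
    # Stand-in for a library call that still goes through a deprecated shim.
    warnings.warn("old_api is deprecated", DeprecationWarning)


def clean_call() -> None:
    # Stand-in for a call that uses only current APIs.
    return None
```

In a test suite, the DCT round-trip would be passed as `fn` and the result asserted to be empty.
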
---
## Tier 2 — Strategic
### 4. Content-Adaptive Embedding (S-UNIWARD/WOW-inspired)
**Description**: Replace uniform-random pixel selection with texture-weighted cost functions. Embed preferentially in busy/textured regions where changes are least detectable. Expected to be roughly 3-5x harder to detect statistically; to be confirmed with the Feature 2 self-check.
**Implementation approach**:
- New file `src/stegasoo/adaptive_cost.py`:
  - `compute_cost_map(image_data) -> np.ndarray` — per-pixel distortion cost via directional high-pass filters (Daubechies wavelet bank / KB filter)
  - `select_pixels_by_cost(cost_map, pixel_key, num_needed) -> list[int]` — weighted sampling, still ChaCha20-seeded for determinism
- `steganography.py`:
  - `generate_pixel_indices()`: add `cost_map` param, use weighted sampling when provided
  - `_embed_lsb()`: compute cost map when adaptive mode enabled
  - `_extract_lsb()`: must compute identical cost map to find same pixels
- `dct_steganography.py`: adapt `DEFAULT_EMBED_POSITIONS` per-block based on block texture energy
- Thread `adaptive: bool` through `encode.py`/`decode.py`
- `constants.py`: add `EMBED_MODE_ADAPTIVE_LSB`, filter kernels, cost thresholds
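
The determinism requirement is easier to reason about with integer-only selection. The sketch below quantizes costs to fixed-point integers and then ranks pixels by a keyed pseudorandom value multiplied by the cost, so cheap (textured) pixels tend to be chosen first. Two assumptions to flag: HMAC-SHA256 stands in for the ChaCha20-seeded generator, and the ranking rule is a simple cost-biased ordering for illustration, not the S-UNIWARD/WOW distortion function or exact proportional sampling.

```python
import hashlib
import hmac
from typing import Sequence


def quantize_costs(costs: Sequence[float], scale: int = 1024) -> list[int]:
    """Round float costs to positive fixed-point integers (minimum 1)."""
    return [max(1, int(round(c * scale))) for c in costs]


def select_pixels_by_cost(
    int_costs: Sequence[int], pixel_key: bytes, num_needed: int
) -> list[int]:
    """Pick num_needed distinct pixel indices, biased towards low-cost pixels.

    Only integer arithmetic is used after quantization, so encoder and
    decoder get the same answer whenever they agree on int_costs.
    """
    if num_needed > len(int_costs):
        raise ValueError("not enough pixels for the requested payload")

    def score(index: int) -> int:
        # Keyed 64-bit pseudorandom value per pixel (stand-in for ChaCha20 output).
        digest = hmac.new(pixel_key, index.to_bytes(8, "big"), hashlib.sha256).digest()
        u = int.from_bytes(digest[:8], "big")
        return u * int_costs[index]       # higher cost pushes the pixel down the ranking

    # Ties are broken by index so the ordering is total and reproducible.
    ranked = sorted(range(len(int_costs)), key=lambda i: (score(i), i))
    return ranked[:num_needed]
```

Quantization narrows the cross-platform risk but does not remove it: a float cost that lands near a rounding boundary could still quantize differently on two machines, which is why the round-trip tests on each target platform remain necessary.
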
**Go/No-Go metrics**:
- Chi-square test (Feature 2) shows measurable improvement vs uniform-random
- **Critical**: cost map computation is deterministic across platforms (quantize to fixed-point integers)
- Round-trip decode succeeds on Linux x86, Linux ARM, macOS
**Complexity**: **L** — novel algorithm, cross-platform determinism requirement, touches core embedding
**Risks**: Floating-point differences in wavelet computation could break extraction. Mitigate with integer quantization. Increases encode/decode time ~2-3x.
---
### 5. Per-Message Forward Secrecy via HKDF
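**Description**: Derive ephemeral per-message encryption keys using HKDF expansion from the Argon2id root key + random nonce. Compromising one message key doesn't reveal the keys of other messages. (A compromised root key still exposes every message, so this is per-message key isolation rather than forward secrecy in the strict sense.)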
**Implementation approach**:
- `crypto.py`:
  - Add `from cryptography.hazmat.primitives.kdf.hkdf import HKDFExpand`
  - `derive_message_key(root_key, nonce) -> bytes` — HKDF-Expand with SHA-256
  - `encrypt_message()`: generate 16-byte random nonce, derive per-message key, embed nonce in header
  - `decrypt_message()`: extract nonce, derive same key
  - Also derive pixel selection key via HKDF with different `info` param
- `constants.py`:
  - Bump `FORMAT_VERSION` to 6
  - `HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt"`, `HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"`
  - `MESSAGE_NONCE_SIZE = 16`
- Header grows from 66 → 82 bytes: add `message_nonce(16)` field
- Update `HEADER_OVERHEAD` / `ENCRYPTION_OVERHEAD` in `steganography.py`
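
The key derivation can be illustrated with a standard-library HKDF-Expand (RFC 5869). The plan itself calls for `HKDFExpand` from the `cryptography` package; the hand-rolled version below exists only to show the construction. One assumption to note: HKDF-Expand has no separate nonce or salt input, so this sketch appends the nonce to the `info` string. The function `derive_message_keys()` is a hypothetical name.

```python
import hashlib
import hmac

HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt"
HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"
MESSAGE_NONCE_SIZE = 16
_HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_expand_sha256(prk: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-Expand from RFC 5869 using HMAC-SHA256."""
    if length > 255 * _HASH_LEN:
        raise ValueError("requested output is too long for HKDF-Expand")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_message_keys(root_key: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    """Derive (encryption_key, pixel_key) for one message.

    Assumption: the per-message nonce is mixed in by appending it to info.
    """
    if len(nonce) != MESSAGE_NONCE_SIZE:
        raise ValueError(f"nonce must be {MESSAGE_NONCE_SIZE} bytes")
    encryption_key = hkdf_expand_sha256(root_key, HKDF_INFO_ENCRYPTION + nonce)
    pixel_key = hkdf_expand_sha256(root_key, HKDF_INFO_PIXEL + nonce)
    return encryption_key, pixel_key
```

The sender would generate the nonce with `os.urandom(MESSAGE_NONCE_SIZE)` and store it in the header; the receiver reads it back and calls the same function.
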
**Go/No-Go metrics**:
- Two messages with identical credentials produce different ciphertexts and different pixel locations
- `cryptography` library HKDF works with existing Argon2id output
**Complexity**: **M** — well-defined crypto change, touches security-critical header format
---
### 6. PWA Mobile Interface
**Description**: Convert Flask Web UI to Progressive Web App. Mobile-optimized, installable, offline-capable static pages.
**Implementation approach**:
- New files in `frontends/web/static/`: `manifest.json`, `sw.js`, icon set (192×192, 512×512)
- Base template: add manifest link, theme-color meta, viewport meta, service worker registration
- `app.py`: serve manifest with correct MIME, add cache headers for static assets
- Responsive CSS for encode/decode accordion forms
- Camera capture: `<input type="file" accept="image/*" capture="environment">` for reference photo
- Service worker caches static assets only — NOT encode/decode API endpoints
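
On the server side, the caching split (static assets cacheable, encode/decode never cached) and the manifest MIME type can be expressed as a small path-based policy. The sketch below is framework-agnostic and uses hypothetical paths and max-age values; in Flask it could be applied from an `after_request` hook.

```python
STATIC_PREFIX = "/static/"                  # Flask's default static URL prefix
MANIFEST_PATH = "/static/manifest.json"     # assumed location of the manifest


def response_headers_for(path: str) -> dict[str, str]:
    """Return the extra response headers to set for a given request path."""
    headers: dict[str, str] = {}
    if path == MANIFEST_PATH:
        # Registered media type for web app manifests.
        headers["Content-Type"] = "application/manifest+json"
        headers["Cache-Control"] = "public, max-age=3600"
    elif path.startswith(STATIC_PREFIX):
        headers["Cache-Control"] = "public, max-age=86400"
    else:
        # Everything else, including encode/decode endpoints, must never be cached.
        headers["Cache-Control"] = "no-store"
    return headers
```
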
**Go/No-Go metrics**:
- Lighthouse PWA score >= 90
- Installable on Android Chrome and iOS Safari
- Offline: static pages load, encode/decode shows graceful "offline" message
**Complexity**: **M** — frontend only, no core library changes
**Risks**: Camera capture requires HTTPS (already supported via `ssl_utils.py`).
---
## Tier 3 — Moonshot
### 7. Plausible Deniability / Dual-Payload Mode
**Description**: Two independent encrypted payloads in one carrier, each with different credentials. Reveal decoy under coercion; real payload stays hidden.
**Implementation approach**:
- New file `src/stegasoo/dual_payload.py`:
  - `encode_dual(message_a, message_b, carrier, creds_a, creds_b)`
  - Partition available pixels into two disjoint pools using different seeds
  - **Critical**: ALL images (single or dual) must fill unused pixel pool with random data so single-payload and dual-payload images are indistinguishable
- `steganography.py`: `generate_pixel_indices()` gets `exclude_indices` param
- `decode.py`: each credential set finds a different valid payload; wrong credentials produce garbage
- CLI + Web UI: dual-payload encode workflow
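
The pool partitioning can be illustrated with a toy index generator that accepts an exclusion set. Everything here is an assumption for illustration: the function name and signature are hypothetical, and `random.Random` stands in for the real keyed generator (it is not cryptographically secure). In this toy version, rebuilding pool B requires knowing pool A, which a real design would have to address.

```python
import random
import secrets
from typing import AbstractSet


def generate_pixel_indices_sketch(
    seed: bytes,
    total_pixels: int,
    num_needed: int,
    exclude_indices: AbstractSet[int] = frozenset(),
) -> list[int]:
    """Pick num_needed distinct indices, never touching exclude_indices."""
    candidates = [i for i in range(total_pixels) if i not in exclude_indices]
    if num_needed > len(candidates):
        raise ValueError("not enough free pixels")
    rng = random.Random(seed)   # stand-in PRNG; deterministic for a given seed
    return rng.sample(candidates, num_needed)


def partition_pools(
    seed_a: bytes, seed_b: bytes, total_pixels: int, per_pool: int
) -> tuple[list[int], list[int], list[int]]:
    """Return (pool_a, pool_b, filler) as three disjoint index lists."""
    pool_a = generate_pixel_indices_sketch(seed_a, total_pixels, per_pool)
    pool_b = generate_pixel_indices_sketch(
        seed_b, total_pixels, per_pool, exclude_indices=set(pool_a)
    )
    used = set(pool_a) | set(pool_b)
    filler = [i for i in range(total_pixels) if i not in used]
    return pool_a, pool_b, filler


def random_filler_bits(filler: list[int]) -> dict[int, int]:
    """Assign a random bit to every unused pixel so it looks like payload."""
    return {index: secrets.randbits(1) for index in filler}
```
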
**Go/No-Go metrics**:
- Single-payload and dual-payload images are statistically indistinguishable (chi-square can't differentiate)
- Each payload decodes independently
- Wrong credentials for one payload don't reveal other payload's existence
**Complexity**: **XL** — novel design, halves capacity per payload, challenging UX, needs rigorous security analysis
**Dependencies**: Feature 2 (validation), Feature 4 (detectability reduction)
---
## Architectural Improvements
### 8. EmbeddingBackend Protocol
**Description**: Typed plugin interface for all embedding algorithms. Replace if/elif dispatch in `steganography.py` with a registry.
**Implementation approach**:
- New package `src/stegasoo/backends/`:
  - `protocol.py`: `EmbeddingBackend(Protocol)` with `embed()`, `extract()`, `calculate_capacity()`, `is_available()`
  - `lsb.py`, `dct.py` — wrap existing functions
  - `registry.py`: `BackendRegistry` mapping mode strings to backends
- `steganography.py`: `embed_in_image()` / `extract_from_image()` dispatch via registry
- `__init__.py`: export protocol and `register_backend()`
**Complexity**: **M** — implement before Features 4 and 7 (they become new backends)
---
### 9. HKDF Key Separation
Subsumed by Feature 5. The HKDF expansion provides:
- Encryption key: `HKDF-Expand(root_key, info="stegasoo-v6-encrypt", nonce)`
- Pixel selection key: `HKDF-Expand(root_key, info="stegasoo-v6-pixel", nonce)`
- Future: MAC key, padding key, etc.
---
### 10. `[core]` Extra with Minimal Deps
**Description**: Move Pillow to `[image]` extra, base deps = `cryptography` + `argon2-cffi` + `zstandard` only.
**Complexity**: **S** — but Pillow is used in `crypto.py` for photo hashing (core to security model). Only worth it with a concrete headless use case. **Low priority.**
---
## Ecosystem Features
### 11. Aletheia Integration
Optional `--engine aletheia` backend for Feature 2's `stegasoo check`. BSD-licensed, provides SPA/RS/WS attacks + ML classifiers. **Complexity: S** (after Feature 2). **Depends on**: Feature 2.
### 12. C2PA/AI Provenance Watermarking
Embed C2PA metadata alongside stego payloads. **Complexity: L** — C2PA is a complex standard. Potentially conflicts with stego goals (adds detectable metadata). Research-heavy.
### 13. Signal/Matrix Bot
Bot that decodes stego images in a channel using configured channel key. **Complexity: M** — integration work, uses existing `decode()` API.
### 14. Homebrew Tap + Nix Flake
Package distribution for macOS/NixOS. **Complexity: S** — packaging only, no code changes.
---
## Summary Table
| # | Feature | Tier | Size | Dependencies | Primary Files |
|---|---------|------|------|-------------|---------------|
| 1 | Platform DCT Presets | T1 | M | — | new `platform_presets.py`, `dct_steganography.py`, `encode.py`, `cli.py` |
| 2 | Steganalysis Self-Check | T1 | M | — | new `steganalysis.py`, `cli.py`, `constants.py` |
| 3 | Python 3.13 DCT Cleanup | T1 | S | — | `dct_steganography.py` |
| 4 | Content-Adaptive Embedding | T2 | L | numpy, #2 | new `adaptive_cost.py`, `steganography.py`, `constants.py` |
| 5 | HKDF Forward Secrecy | T2 | M | — | `crypto.py`, `constants.py`, `steganography.py` |
| 6 | PWA Mobile Interface | T2 | M | — | `frontends/web/` templates + static |
| 7 | Dual-Payload Mode | T3 | XL | #2, #4 | new `dual_payload.py`, `steganography.py`, `cli.py` |
| 8 | EmbeddingBackend Protocol | Arch | M | — | new `backends/` package, `steganography.py` |
| 9 | HKDF Key Separation | Arch | — | Included in #5 | `crypto.py` |
| 10 | `[core]` Extra | Arch | S | — | `pyproject.toml` |
| 11 | Aletheia Integration | Eco | S | #2 | `steganalysis.py` |
| 12 | C2PA Watermarking | Eco | L | — | new module |
| 13 | Signal/Matrix Bot | Eco | M | — | new `bots/` package |
| 14 | Homebrew + Nix | Eco | S | — | packaging files only |
---
## Suggested Roadmap
### Phase 1 — Foundations (v4.4.0)
1. **#3** Python 3.13 DCT Cleanup (S) — unblocks CI on 3.13
2. **#8** EmbeddingBackend Protocol (M) — architectural cleanup before new embedding work
3. **#2** Steganalysis Self-Check (M) — validation tooling for everything that follows
### Phase 2 — Security & Robustness (v4.5.0)
4. **#5** HKDF Forward Secrecy (M) — FORMAT_VERSION bump to 6, improved crypto
5. **#1** Platform-Calibrated DCT Presets (M) — high user value for social media
6. **#14** Homebrew + Nix (S) — distribution expansion
### Phase 3 — Advanced Steganography (v5.0.0)
7. **#4** Content-Adaptive Embedding (L) — major security improvement
8. **#6** PWA Mobile Interface (M) — parallel frontend work stream
### Phase 4 — Moonshot (v5.x+)
9. **#7** Dual-Payload Mode (XL) — after #2 and #4 are solid
10. **#12** C2PA Watermarking (L) — research-heavy
11. **#13** Signal/Matrix Bot (M) — community-driven
---
## Additional Ideas (Backlog)
- **Animated GIF steganography** — LSB in GIF frames, natural multi-media extension
- **PDF steganography** — whitespace/font metric/embedded image payloads
- **Batch encode** — `stegasoo batch-encode --dir /photos/` with auto carrier selection (BATCH_* constants suggest this was planned)
- **Stego identification** — `stegasoo identify image.png` probes for known stego signatures
- **Per-device credential sync via QR** — channel key as stego image of reference photo
- **`stegasoo verify`** — decode + confirm message matches expected hash without revealing contents