Files
stegasoo/IdeasScout_PLANS_20260324.md

295 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Stegasoo Ideas Scout — Implementation Plans (2026-03-24)
Baseline: v4.3.0, Python >=3.11, FORMAT_VERSION 5, no existing users (no backward compat constraints).
---
## Tier 1 — Quick Wins
### 1. Platform-Calibrated DCT Presets
**Description**: `--platform telegram|discord|signal|whatsapp` flag for DCT encode. Bakes in each platform's known recompression parameters. Pre-verifies payload survives before outputting.
**Implementation approach**:
- New file `src/stegasoo/platform_presets.py``PlatformPreset` dataclass + `PRESETS` dict mapping platform → tuned `quant_step`, `jpeg_quality`, `embed_positions`, `max_dimension`, `recompress_quality`
- `dct_steganography.py`: `_embed_scipy_dct_safe()` / `_embed_jpegio()` accept optional preset overrides for `QUANT_STEP`, `DEFAULT_EMBED_POSITIONS`, output quality
- New `pre_verify_survival()` function: encode → re-save at platform quality → extract → pass/fail
- Thread `platform` param through `encode.py``steganography.py` → DCT functions
- `cli.py`: add `--platform` as `click.Choice` + `--verify/--no-verify` (pre-verification doubles encode time)
- LSB + `--platform` should error early — LSB data is destroyed by any JPEG recompression
**Known platform params** (from research):
| Platform | Quality | Max Dimension | Notes |
|----------|---------|---------------|-------|
| Telegram | ~82 | 2560×2560 | ~81KB embeddable |
| Discord | ~85 | Varies (Nitro) | |
| Signal | ~80 | Aggressive | |
| WhatsApp | ~70 | 1600×1600 | Most lossy |
**Go/No-Go metrics**:
- >95% payload survival rate per platform at 1KB message size in automated tests
- Pre-verification correctly predicts real platform behavior (manual validation per platform at least once)
**Complexity**: **M** — new file + parameter threading through 4-5 functions
**Risks**: Platform params change without notice. Add version/date stamps to presets and a `stegasoo tools verify-platform` test command.
---
### 2. Steganalysis Self-Check (`stegasoo check`)
**Description**: New CLI command running chi-square and RS (Regular-Singular) statistical analysis on stego images. Outputs detectability risk level (low/medium/high).
**Implementation approach**:
- New file `src/stegasoo/steganalysis.py`:
- `chi_square_analysis(image_data) -> float` — chi-square statistic on LSB distribution per channel
- `rs_analysis(image_data) -> float` — Regular-Singular groups analysis (requires numpy)
- `assess_risk(chi_p, rs_estimate) -> str` — maps to "low"/"medium"/"high"
- `check_image(image_data) -> dict` — orchestrator
- `cli.py`: new `@cli.command("check")` with `IMAGE` arg, `--json`, `--mode lsb|dct|auto`
- `constants.py`: threshold constants for chi-square p-value and RS boundaries
- `__init__.py`: export `check_image` in `__all__`
- Start LSB-only; DCT steganalysis (calibration attack) deferred
**Go/No-Go metrics**:
- Clean images → consistently "low risk"
- Naive sequential LSB → "high risk"
- Stegasoo LSB at <50% capacity → "low" or "medium"
**Complexity**: **M** — ~150 lines numpy per test, straightforward CLI integration
---
### 3. Python 3.13 DCT Cleanup
**Description**: The `jpegio``jpeglib` migration is already done in code. Remaining work: rename stale `jpegio` references and verify on 3.13.
**Implementation approach**:
- `dct_steganography.py`: rename `HAS_JPEGIO``HAS_JPEGLIB`, `_jpegio_*` functions → `_jpeglib_*`, update constant names (`JPEGIO_MAGIC``JPEGLIB_MAGIC`, etc.)
- Verify `jpeglib.to_jpegio()` compatibility shim — if jpeglib plans to deprecate it, migrate to native API
- Run full test suite on Python 3.13
**Go/No-Go metrics**:
- All DCT tests pass on Python 3.13
- No deprecation warnings from jpeglib
**Complexity**: **S** — renaming and verification only
---
## Tier 2 — Strategic
### 4. Content-Adaptive Embedding (S-UNIWARD/WOW-inspired)
**Description**: Replace uniform-random pixel selection with texture-weighted cost functions. Embed preferentially in busy/textured regions where changes are least detectable. 3-5x harder to detect statistically.
**Implementation approach**:
- New file `src/stegasoo/adaptive_cost.py`:
- `compute_cost_map(image_data) -> np.ndarray` — per-pixel distortion cost via directional high-pass filters (Daubechets wavelet bank / KB filter)
- `select_pixels_by_cost(cost_map, pixel_key, num_needed) -> list[int]` — weighted sampling, still ChaCha20-seeded for determinism
- `steganography.py`:
- `generate_pixel_indices()`: add `cost_map` param, use weighted sampling when provided
- `_embed_lsb()`: compute cost map when adaptive mode enabled
- `_extract_lsb()`: must compute identical cost map to find same pixels
- `dct_steganography.py`: adapt `DEFAULT_EMBED_POSITIONS` per-block based on block texture energy
- Thread `adaptive: bool` through `encode.py`/`decode.py`
- `constants.py`: add `EMBED_MODE_ADAPTIVE_LSB`, filter kernels, cost thresholds
**Go/No-Go metrics**:
- Chi-square test (Feature 2) shows measurable improvement vs uniform-random
- **Critical**: cost map computation is deterministic across platforms (quantize to fixed-point integers)
- Round-trip decode succeeds on Linux x86, Linux ARM, macOS
**Complexity**: **L** — novel algorithm, cross-platform determinism requirement, touches core embedding
**Risks**: Floating-point differences in wavelet computation could break extraction. Mitigate with integer quantization. Increases encode/decode time ~2-3x.
---
### 5. Per-Message Forward Secrecy via HKDF
**Description**: Derive ephemeral per-message encryption keys using HKDF expansion from the Argon2id root key + random nonce. Compromising one message doesn't reveal others.
**Implementation approach**:
- `crypto.py`:
- Add `from cryptography.hazmat.primitives.kdf.hkdf import HKDFExpand`
- `derive_message_key(root_key, nonce) -> bytes` — HKDF-Expand with SHA-256
- `encrypt_message()`: generate 16-byte random nonce, derive per-message key, embed nonce in header
- `decrypt_message()`: extract nonce, derive same key
- Also derive pixel selection key via HKDF with different `info` param
- `constants.py`:
- Bump `FORMAT_VERSION` to 6
- `HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt"`, `HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"`
- `MESSAGE_NONCE_SIZE = 16`
- Header grows from 66 → 82 bytes: add `message_nonce(16)` field
- Update `HEADER_OVERHEAD` / `ENCRYPTION_OVERHEAD` in `steganography.py`
**Go/No-Go metrics**:
- Two messages with identical credentials produce different ciphertexts and different pixel locations
- `cryptography` library HKDF works with existing Argon2id output
**Complexity**: **M** — well-defined crypto change, touches security-critical header format
---
### 6. PWA Mobile Interface
**Description**: Convert Flask Web UI to Progressive Web App. Mobile-optimized, installable, offline-capable static pages.
**Implementation approach**:
- New files in `frontends/web/static/`: `manifest.json`, `sw.js`, icon set (192×192, 512×512)
- Base template: add manifest link, theme-color meta, viewport meta, service worker registration
- `app.py`: serve manifest with correct MIME, add cache headers for static assets
- Responsive CSS for encode/decode accordion forms
- Camera capture: `<input type="file" accept="image/*" capture="environment">` for reference photo
- Service worker caches static assets only — NOT encode/decode API endpoints
**Go/No-Go metrics**:
- Lighthouse PWA score >= 90
- Installable on Android Chrome and iOS Safari
- Offline: static pages load, encode/decode shows graceful "offline" message
**Complexity**: **M** — frontend only, no core library changes
**Risks**: Camera capture requires HTTPS (already supported via `ssl_utils.py`).
---
## Tier 3 — Moonshot
### 7. Plausible Deniability / Dual-Payload Mode
**Description**: Two independent encrypted payloads in one carrier, each with different credentials. Reveal decoy under coercion; real payload stays hidden.
**Implementation approach**:
- New file `src/stegasoo/dual_payload.py`:
- `encode_dual(message_a, message_b, carrier, creds_a, creds_b)`
- Partition available pixels into two disjoint pools using different seeds
- **Critical**: ALL images (single or dual) must fill unused pixel pool with random data so single-payload and dual-payload images are indistinguishable
- `steganography.py`: `generate_pixel_indices()` gets `exclude_indices` param
- `decode.py`: each credential set finds a different valid payload; wrong credentials produce garbage
- CLI + Web UI: dual-payload encode workflow
**Go/No-Go metrics**:
- Single-payload and dual-payload images are statistically indistinguishable (chi-square can't differentiate)
- Each payload decodes independently
- Wrong credentials for one payload don't reveal other payload's existence
**Complexity**: **XL** — novel design, halves capacity per payload, challenging UX, needs rigorous security analysis
**Dependencies**: Feature 2 (validation), Feature 4 (detectability reduction)
---
## Architectural Improvements
### 8. EmbeddingBackend Protocol
**Description**: Typed plugin interface for all embedding algorithms. Replace if/elif dispatch in `steganography.py` with a registry.
**Implementation approach**:
- New package `src/stegasoo/backends/`:
- `protocol.py``EmbeddingBackend(Protocol)` with `embed()`, `extract()`, `calculate_capacity()`, `is_available()`
- `lsb.py`, `dct.py` — wrap existing functions
- `registry.py``BackendRegistry` mapping mode strings to backends
- `steganography.py`: `embed_in_image()` / `extract_from_image()` dispatch via registry
- `__init__.py`: export protocol and `register_backend()`
**Complexity**: **M** — implement before Features 4 and 7 (they become new backends)
---
### 9. HKDF Key Separation
Subsumed by Feature 5. The HKDF expansion provides:
- Encryption key: `HKDF-Expand(root_key, info="stegasoo-encrypt", nonce)`
- Pixel selection key: `HKDF-Expand(root_key, info="stegasoo-pixel", nonce)`
- Future: MAC key, padding key, etc.
---
### 10. `[core]` Extra with Minimal Deps
**Description**: Move Pillow to `[image]` extra, base deps = `cryptography` + `argon2-cffi` + `zstandard` only.
**Complexity**: **S** — but Pillow is used in `crypto.py` for photo hashing (core to security model). Only worth it with a concrete headless use case. **Low priority.**
---
## Ecosystem Features
### 11. Aletheia Integration
Optional `--engine aletheia` backend for Feature 2's `stegasoo check`. BSD-licensed, provides SPA/RS/WS attacks + ML classifiers. **Complexity: S** (after Feature 2). **Depends on**: Feature 2.
### 12. C2PA/AI Provenance Watermarking
Embed C2PA metadata alongside stego payloads. **Complexity: L** — C2PA is a complex standard. Potentially conflicts with stego goals (adds detectable metadata). Research-heavy.
### 13. Signal/Matrix Bot
Bot that decodes stego images in a channel using configured channel key. **Complexity: M** — integration work, uses existing `decode()` API.
### 14. Homebrew Tap + Nix Flake
Package distribution for macOS/NixOS. **Complexity: S** — packaging only, no code changes.
---
## Summary Table
| # | Feature | Tier | Size | Dependencies | Primary Files |
|---|---------|------|------|-------------|---------------|
| 1 | Platform DCT Presets | T1 | M | — | new `platform_presets.py`, `dct_steganography.py`, `encode.py`, `cli.py` |
| 2 | Steganalysis Self-Check | T1 | M | — | new `steganalysis.py`, `cli.py`, `constants.py` |
| 3 | Python 3.13 DCT Cleanup | T1 | S | — | `dct_steganography.py` |
| 4 | Content-Adaptive Embedding | T2 | L | numpy, #2 | new `adaptive_cost.py`, `steganography.py`, `constants.py` |
| 5 | HKDF Forward Secrecy | T2 | M | — | `crypto.py`, `constants.py`, `steganography.py` |
| 6 | PWA Mobile Interface | T2 | M | — | `frontends/web/` templates + static |
| 7 | Dual-Payload Mode | T3 | XL | #2, #4 | new `dual_payload.py`, `steganography.py`, `cli.py` |
| 8 | EmbeddingBackend Protocol | Arch | M | — | new `backends/` package, `steganography.py` |
| 9 | HKDF Key Separation | Arch | — | Included in #5 | `crypto.py` |
| 10 | `[core]` Extra | Arch | S | — | `pyproject.toml` |
| 11 | Aletheia Integration | Eco | S | #2 | `steganalysis.py` |
| 12 | C2PA Watermarking | Eco | L | — | new module |
| 13 | Signal/Matrix Bot | Eco | M | — | new `bots/` package |
| 14 | Homebrew + Nix | Eco | S | — | packaging files only |
---
## Suggested Roadmap
### Phase 1 — Foundations (v4.4.0)
1. **#3** Python 3.13 DCT Cleanup (S) — unblocks CI on 3.13
2. **#8** EmbeddingBackend Protocol (M) — architectural cleanup before new embedding work
3. **#2** Steganalysis Self-Check (M) — validation tooling for everything that follows
### Phase 2 — Security & Robustness (v4.5.0)
4. **#5** HKDF Forward Secrecy (M) — FORMAT_VERSION bump to 6, improved crypto
5. **#1** Platform-Calibrated DCT Presets (M) — high user value for social media
6. **#14** Homebrew + Nix (S) — distribution expansion
### Phase 3 — Advanced Steganography (v5.0.0)
7. **#4** Content-Adaptive Embedding (L) — major security improvement
8. **#6** PWA Mobile Interface (M) — parallel frontend work stream
### Phase 4 — Moonshot (v5.x+)
9. **#7** Dual-Payload Mode (XL) — after #2 and #4 are solid
10. **#12** C2PA Watermarking (L) — research-heavy
11. **#13** Signal/Matrix Bot (M) — community-driven
---
## Additional Ideas (Backlog)
- **Animated GIF steganography** — LSB in GIF frames, natural multi-media extension
- **PDF steganography** — whitespace/font metric/embedded image payloads
- **Batch encode** — `stegasoo batch-encode --dir /photos/` with auto carrier selection (BATCH_* constants suggest this was planned)
- **Stego identification** — `stegasoo identify image.png` probes for known stego signatures
- **Per-device credential sync via QR** — channel key as stego image of reference photo
- **`stegasoo verify`** — decode + confirm message matches expected hash without revealing contents