
Stegasoo Ideas Scout — Implementation Plans (2026-03-24)

Baseline: v4.3.0, Python >=3.11, FORMAT_VERSION 5, no existing users (no backward compat constraints).


Tier 1 — Quick Wins

1. Platform-Calibrated DCT Presets

Description: --platform telegram|discord|signal|whatsapp flag for DCT encode. Bakes in each platform's known recompression parameters and pre-verifies that the payload survives that recompression before writing the output.

Implementation approach:

  • New file src/stegasoo/platform_presets.py: PlatformPreset dataclass + PRESETS dict mapping platform → tuned quant_step, jpeg_quality, embed_positions, max_dimension, recompress_quality
  • dct_steganography.py: _embed_scipy_dct_safe() / _embed_jpegio() accept optional preset overrides for QUANT_STEP, DEFAULT_EMBED_POSITIONS, output quality
  • New pre_verify_survival() function: encode → re-save at platform quality → extract → pass/fail
  • Thread platform param through encode.py → steganography.py → DCT functions
  • cli.py: add --platform as click.Choice + --verify/--no-verify (pre-verification doubles encode time)
  • LSB + --platform should error early — LSB data is destroyed by any JPEG recompression

Known platform params (from research):

| Platform | Quality | Max Dimension | Notes |
|---|---|---|---|
| Telegram | ~82 | 2560×2560 | ~81KB embeddable |
| Discord | ~85 | Varies (Nitro) | |
| Signal | ~80 | | Aggressive |
| WhatsApp | ~70 | 1600×1600 | Most lossy |
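The sketch below covers the preset data and the pre-verification round trip. It assumes the field names listed above. The `recompress` and `extract` callables are hypothetical injection points: in practice they would be a Pillow re-save at the preset quality and Stegasoo's DCT extractor. Injecting them keeps the check independent of any particular image library and easy to unit-test. Quality and dimension values come from the table above; `quant_step` and `researched_on` are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class PlatformPreset:
    """Recompression profile for one platform (sketch; tuning fields are placeholders)."""
    name: str
    recompress_quality: int             # approximate JPEG quality the platform re-encodes at
    max_dimension: Optional[int]        # longest edge after resize; None = varies/unknown
    quant_step: Optional[float] = None  # to be tuned empirically per platform
    researched_on: str = "2026-03-24"   # date stamp so stale presets are easy to spot

PRESETS: dict[str, PlatformPreset] = {
    "telegram": PlatformPreset("telegram", 82, 2560),
    "discord": PlatformPreset("discord", 85, None),  # varies with Nitro
    "signal": PlatformPreset("signal", 80, None),
    "whatsapp": PlatformPreset("whatsapp", 70, 1600),
}

# recompress(image_bytes, quality, max_dimension) -> image_bytes
Recompressor = Callable[[bytes, int, Optional[int]], bytes]
# extract(image_bytes) -> payload bytes, or None if nothing valid was found
Extractor = Callable[[bytes], Optional[bytes]]

def pre_verify_survival(stego: bytes, expected_payload: bytes, preset: PlatformPreset,
                        recompress: Recompressor, extract: Extractor) -> bool:
    """Simulate the platform's re-encode, then check the payload still decodes intact."""
    simulated = recompress(stego, preset.recompress_quality, preset.max_dimension)
    try:
        recovered = extract(simulated)
    except Exception:
        # Any decode failure (bad header, failed auth tag, ...) counts as "did not survive".
        return False
    return recovered == expected_payload
```

A Pillow-based `recompress` would typically call `Image.thumbnail((d, d))` and then `save(buf, "JPEG", quality=q)`. Real platforms may also strip metadata or use different encoder settings, which is why the manual per-platform validation metric still matters.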

Go/No-Go metrics:

  • >= 95% payload survival rate per platform at 1KB message size in automated tests

  • Pre-verification correctly predicts real platform behavior (manual validation per platform at least once)

Complexity: M — new file + parameter threading through 4-5 functions

Risks: Platform params change without notice. Add version/date stamps to presets and a stegasoo tools verify-platform test command.


2. Steganalysis Self-Check (stegasoo check)

Description: New CLI command running chi-square and RS (Regular-Singular) statistical analysis on stego images. Outputs detectability risk level (low/medium/high).

Implementation approach:

  • New file src/stegasoo/steganalysis.py:
    • chi_square_analysis(image_data) -> float: chi-square (pairs-of-values) p-value on the LSB distribution per channel
    • rs_analysis(image_data) -> float — Regular-Singular groups analysis (requires numpy)
    • assess_risk(chi_p, rs_estimate) -> str — maps to "low"/"medium"/"high"
    • check_image(image_data) -> dict — orchestrator
  • cli.py: new @cli.command("check") with IMAGE arg, --json, --mode lsb|dct|auto
  • constants.py: threshold constants for chi-square p-value and RS boundaries
  • __init__.py: export check_image in __all__
  • Start LSB-only; DCT steganalysis (calibration attack) deferred
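The sketch below covers the chi-square part using only the standard library. It assumes one 8-bit channel is passed as a flat sequence of ints. It implements the Westfeld-Pfitzmann pairs-of-values test: LSB replacement tends to equalize the counts of each (2k, 2k+1) value pair, so the p-value approaches 1 as more of the sampled pixels carry payload. The `chi2_sf` helper (regularized incomplete gamma, Numerical Recipes style) stands in for `scipy.stats.chi2.sf` so the sketch has no dependencies. `min_expected` is a hypothetical threshold that would live in constants.py.

```python
import math
from collections import Counter
from collections.abc import Iterable

def chi2_sf(stat: float, df: int) -> float:
    """Chi-square survival function P(X >= stat) = Q(df/2, stat/2), stdlib only."""
    s, x = df / 2.0, stat / 2.0
    if x <= 0.0:
        return 1.0
    log_prefix = -x + s * math.log(x) - math.lgamma(s)
    if x < s + 1.0:
        # Series for the lower regularized gamma P(s, x); converges fast for x < s + 1.
        term = total = 1.0 / s
        denom = s
        for _ in range(10_000):
            denom += 1.0
            term *= x / denom
            total += term
            if term < total * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefix))
    # Modified Lentz continued fraction for the upper regularized gamma Q(s, x).
    tiny = 1e-300
    b = x + 1.0 - s
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefix) * h

def chi_square_analysis(channel: Iterable[int], min_expected: float = 5.0) -> float:
    """Westfeld-Pfitzmann pairs-of-values test on one 8-bit channel.

    Returns a p-value; values near 1.0 mean the (2k, 2k+1) histogram pairs are
    suspiciously equal, which is the signature of LSB replacement.
    """
    counts = Counter(channel)
    stat, pairs_used = 0.0, 0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2.0
        if expected < min_expected:
            continue  # sparse pairs make the chi-square approximation unreliable
        stat += (even - expected) ** 2 / expected
        pairs_used += 1
    if pairs_used < 2:
        return 0.0  # too little data to claim anything
    return chi2_sf(stat, pairs_used - 1)
```

The same statistic can be applied to growing prefixes of the pixel stream. That variant catches naive sequential embedding, which is the case the "naive sequential LSB → high risk" metric targets.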

Go/No-Go metrics:

  • Clean images → consistently "low risk"
  • Naive sequential LSB → "high risk"
  • Stegasoo LSB at <50% capacity → "low" or "medium"

Complexity: M (~150 lines of numpy per statistical test; straightforward CLI integration)


3. Python 3.13 DCT Cleanup

Description: The jpegio → jpeglib migration is already done in code. Remaining work: rename stale jpegio references and verify on 3.13.

Implementation approach:

  • dct_steganography.py: rename HAS_JPEGIO → HAS_JPEGLIB, _jpegio_* functions → _jpeglib_*, update constant names (JPEGIO_MAGIC → JPEGLIB_MAGIC, etc.)
  • Verify jpeglib.to_jpegio() compatibility shim — if jpeglib plans to deprecate it, migrate to native API
  • Run full test suite on Python 3.13

Go/No-Go metrics:

  • All DCT tests pass on Python 3.13
  • No deprecation warnings from jpeglib

Complexity: S — renaming and verification only


Tier 2 — Strategic

4. Content-Adaptive Embedding (S-UNIWARD/WOW-inspired)

Description: Replace uniform-random pixel selection with texture-weighted cost functions. Embed preferentially in busy/textured regions where changes are least detectable. Estimated to be 3-5x harder to detect statistically.

Implementation approach:

  • New file src/stegasoo/adaptive_cost.py:
    • compute_cost_map(image_data) -> np.ndarray: per-pixel distortion cost via directional high-pass filters (Daubechies wavelet bank / Ker-Böhme (KB) filter)
    • select_pixels_by_cost(cost_map, pixel_key, num_needed) -> list[int] — weighted sampling, still ChaCha20-seeded for determinism
  • steganography.py:
    • generate_pixel_indices(): add cost_map param, use weighted sampling when provided
    • _embed_lsb(): compute cost map when adaptive mode enabled
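    • _extract_lsb(): must compute the identical cost map to find the same pixels. The extractor only sees the stego image, so the cost map must ignore the bits being modified (e.g., compute it on pixel >> 1 for LSB embedding)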
  • dct_steganography.py: adapt DEFAULT_EMBED_POSITIONS per-block based on block texture energy
  • Thread adaptive: bool through encode.py/decode.py
  • constants.py: add EMBED_MODE_ADAPTIVE_LSB, filter kernels, cost thresholds
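The sketch below covers the selection side in pure Python. It assumes a grayscale image given as a list of rows. Two substitutions are deliberate. First, the cost is a simplified HILL-style residual from the Ker-Böhme kernel rather than the S-UNIWARD wavelet bank, expressed as integer weights where higher means busier texture. Second, the keyed stream is SHA-256 in counter mode, standing in for the ChaCha20 generator. Everything uses integer arithmetic, which is what makes the map reproducible across platforms. The map is computed on pixel >> 1, so LSB embedding cannot change it. `compute_weight_map`, `select_pixels_by_weight` and `MAX_WEIGHT` are hypothetical names.

```python
import bisect
import hashlib
from collections.abc import Iterator

# Ker-Böhme (KB) high-pass kernel, the residual filter used by HILL-style costs.
KB_KERNEL = ((-1, 2, -1), (2, -4, 2), (-1, 2, -1))
MAX_WEIGHT = 64  # hypothetical cap so a few very noisy pixels can't dominate

def compute_weight_map(pixels: list[list[int]]) -> list[int]:
    """Integer weight per pixel (row-major); higher = busier texture = cheaper to change.

    Uses pixel >> 1 so flipping LSBs never changes the map: the extractor can
    rebuild exactly the same weights from the stego image.
    """
    h, w = len(pixels), len(pixels[0])
    weights = [0] * (h * w)  # border pixels keep weight 0 and are never selected
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            residual = sum(
                KB_KERNEL[dy][dx] * (pixels[y + dy - 1][x + dx - 1] >> 1)
                for dy in range(3)
                for dx in range(3)
            )
            weights[y * w + x] = min(abs(residual), MAX_WEIGHT - 1) + 1
    return weights

def _keyed_ints(key: bytes) -> Iterator[int]:
    """Deterministic 64-bit integers: SHA-256 in counter mode (stand-in for ChaCha20)."""
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for i in range(0, 32, 8):
            yield int.from_bytes(block[i:i + 8], "big")
        counter += 1

def select_pixels_by_weight(weights: list[int], pixel_key: bytes, num_needed: int) -> list[int]:
    """Weighted sampling without replacement using integer arithmetic only."""
    if num_needed <= 0:
        return []
    if num_needed > sum(1 for wt in weights if wt > 0):
        raise ValueError("not enough selectable pixels")
    cumulative, total = [], 0
    for wt in weights:
        total += wt
        cumulative.append(total)
    chosen: list[int] = []
    seen: set[int] = set()
    for r in _keyed_ints(pixel_key):
        # Pixel i owns [cumulative[i-1], cumulative[i]); zero-weight pixels own nothing.
        # The modulo adds a negligible bias for a 64-bit source.
        idx = bisect.bisect_right(cumulative, r % total)
        if idx not in seen:
            seen.add(idx)
            chosen.append(idx)
            if len(chosen) == num_needed:
                break
    return chosen
```

A production version would vectorize the convolution with numpy integer dtypes and, like HILL, smooth the residual before inverting it. The determinism argument holds as long as no floating-point step is introduced.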

Go/No-Go metrics:

  • Chi-square test (Feature 2) shows measurable improvement vs uniform-random
  • Critical: cost map computation is deterministic across platforms (quantize to fixed-point integers)
  • Round-trip decode succeeds on Linux x86, Linux ARM, macOS

Complexity: L — novel algorithm, cross-platform determinism requirement, touches core embedding

Risks: Floating-point differences in wavelet computation across platforms could break extraction; mitigate with integer quantization. Expected to increase encode/decode time ~2-3x.


5. Per-Message Forward Secrecy via HKDF

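Description: Derive ephemeral per-message encryption keys by HKDF expansion of the Argon2id root key with a random per-message nonce. Compromising one message key doesn't reveal the keys of other messages. Strictly, this is key isolation rather than forward secrecy: anyone who obtains the root credentials can still re-derive every message key from the nonces stored in the headers.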

Implementation approach:

  • crypto.py:
    • Add from cryptography.hazmat.primitives.kdf.hkdf import HKDFExpand
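    • derive_message_key(root_key, nonce) -> bytes: HKDF-Expand with SHA-256 over the root key, with the nonce appended to the info string (HKDF-Expand has no salt or nonce input of its own)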
    • encrypt_message(): generate 16-byte random nonce, derive per-message key, embed nonce in header
    • decrypt_message(): extract nonce, derive same key
    • Also derive pixel selection key via HKDF with different info param
  • constants.py:
    • Bump FORMAT_VERSION to 6
    • HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt", HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"
    • MESSAGE_NONCE_SIZE = 16
  • Header grows from 66 → 82 bytes: add message_nonce(16) field
  • Update HEADER_OVERHEAD / ENCRYPTION_OVERHEAD in steganography.py
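The sketch below shows the key derivation using only the standard library. It assumes a 32-byte Argon2id root key. Because that key is already uniformly random, the HKDF-Extract step can be skipped, as RFC 5869 permits. HKDF-Expand is written out with `hmac` for transparency. In crypto.py, the equivalent call from the `cryptography` package would be `HKDFExpand(algorithm=hashes.SHA256(), length=32, info=...).derive(root_key)`. Binding the nonce through the info string is an assumption of this sketch; full HKDF with the nonce as salt would also work.

```python
import hashlib
import hmac
import os

HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt"
HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"
MESSAGE_NONCE_SIZE = 16

def hkdf_expand_sha256(prk: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF-Expand: T(i) = HMAC(PRK, T(i-1) | info | i)."""
    if length > 255 * 32:
        raise ValueError("HKDF-Expand output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_message_keys(root_key: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    """Per-message (encryption_key, pixel_key); distinct labels keep the keys independent."""
    if len(nonce) != MESSAGE_NONCE_SIZE:
        raise ValueError("nonce must be MESSAGE_NONCE_SIZE bytes")
    enc_key = hkdf_expand_sha256(root_key, HKDF_INFO_ENCRYPTION + nonce)
    pixel_key = hkdf_expand_sha256(root_key, HKDF_INFO_PIXEL + nonce)
    return enc_key, pixel_key

def new_message_keys(root_key: bytes) -> tuple[bytes, bytes, bytes]:
    """Encoder side: fresh random nonce (to be stored in the header) plus derived keys."""
    nonce = os.urandom(MESSAGE_NONCE_SIZE)
    return (nonce, *derive_message_keys(root_key, nonce))
```

Only the nonce travels in the header. The decoder reads it back and calls `derive_message_keys` with the same root key to recover both keys.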

Go/No-Go metrics:

  • Two messages with identical credentials produce different ciphertexts and different pixel locations
  • cryptography library HKDF works with existing Argon2id output

Complexity: M — well-defined crypto change, touches security-critical header format


6. PWA Mobile Interface

Description: Convert Flask Web UI to Progressive Web App. Mobile-optimized, installable, offline-capable static pages.

Implementation approach:

  • New files in frontends/web/static/: manifest.json, sw.js, icon set (192×192, 512×512)
  • Base template: add manifest link, theme-color meta, viewport meta, service worker registration
  • app.py: serve manifest with correct MIME, add cache headers for static assets
  • Responsive CSS for encode/decode accordion forms
  • Camera capture: <input type="file" accept="image/*" capture="environment"> for reference photo
  • Service worker caches static assets only — NOT encode/decode API endpoints
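The sketch below generates the web app manifest from Python, so app.py can serve it with the `application/manifest+json` MIME type, for example via Flask's `Response(build_manifest(), mimetype=MANIFEST_MIME)`. The icon paths and colors are hypothetical. The fields shown (name, start_url, display, and 192×192 plus 512×512 icons) are the ones Chromium-based browsers check when deciding whether an app is installable.

```python
import json

MANIFEST_MIME = "application/manifest+json"

def build_manifest(theme_color: str = "#1b1b1b") -> str:
    """Return manifest.json content for the web UI (icon paths are placeholders)."""
    manifest = {
        "name": "Stegasoo",
        "short_name": "Stegasoo",
        "start_url": "/",
        "display": "standalone",
        "theme_color": theme_color,  # should match the theme-color meta tag in the base template
        "background_color": "#ffffff",
        "icons": [
            {"src": "/static/icon-192.png", "sizes": "192x192", "type": "image/png"},
            {"src": "/static/icon-512.png", "sizes": "512x512", "type": "image/png"},
        ],
    }
    return json.dumps(manifest, indent=2)
```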

Go/No-Go metrics:

  • Lighthouse PWA score >= 90
  • Installable on Android Chrome and iOS Safari
  • Offline: static pages load, encode/decode shows graceful "offline" message

Complexity: M — frontend only, no core library changes

Risks: Service worker registration (and therefore offline support and installability) requires HTTPS, already supported via ssl_utils.py.


Tier 3 — Moonshot

7. Plausible Deniability / Dual-Payload Mode

Description: Two independent encrypted payloads in one carrier, each with different credentials. Reveal decoy under coercion; real payload stays hidden.

Implementation approach:

  • New file src/stegasoo/dual_payload.py:
    • encode_dual(message_a, message_b, carrier, creds_a, creds_b)
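    • Partition available pixels into two disjoint pools. A decoder holds only one credential set, so it must be able to locate its own pool without the other set's seed (e.g., a fixed, credential-independent split); each payload's pixel order within its pool is then seeded from its own credentials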
    • Critical: ALL images (single or dual) must fill unused pixel pool with random data so single-payload and dual-payload images are indistinguishable
  • steganography.py: generate_pixel_indices() gets exclude_indices param
  • decode.py: each credential set finds a different valid payload; wrong credentials produce garbage
  • CLI + Web UI: dual-payload encode workflow
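The sketch below shows one possible layout. It assumes the simplest credential-independent split (even vs. odd pixel indices) and that each decoder tries both slots, keeping whichever one authenticates. Every pixel ends up carrying either payload bits or filler, so a single-payload image writes the same set of pixels as a dual-payload one. `pool_for_slot`, `keyed_order` and `plan_lsb_bits` are hypothetical helpers. `random.Random` stands in for the ChaCha20-seeded permutation and is not suitable for real use.

```python
import hashlib
import random
import secrets
from typing import Optional

def pool_for_slot(num_pixels: int, slot: int) -> list[int]:
    """Fixed, credential-independent split: slot 0 = even indices, slot 1 = odd."""
    return list(range(slot, num_pixels, 2))

def keyed_order(pool: list[int], creds_key: bytes) -> list[int]:
    """Order one pool using a seed derived only from that payload's credentials."""
    seed = int.from_bytes(hashlib.sha256(b"dual-order" + creds_key).digest(), "big")
    order = pool.copy()
    # random.Random is NOT cryptographic; it stands in for the ChaCha20-seeded permutation.
    random.Random(seed).shuffle(order)
    return order

Slot = Optional[tuple[bytes, list[int]]]  # (creds_key, payload bits) or None if unused

def plan_lsb_bits(num_pixels: int, slots: tuple[Slot, Slot]) -> dict[int, int]:
    """Assign an LSB value to every pixel; unused capacity gets random filler."""
    plan: dict[int, int] = {}
    for slot_index, entry in enumerate(slots):
        pool = pool_for_slot(num_pixels, slot_index)
        if entry is None:
            order, bits = pool, []
        else:
            creds_key, bits = entry
            order = keyed_order(pool, creds_key)
            if len(bits) > len(order):
                raise ValueError("payload too large for its pool")
        for idx, bit in zip(order, bits):
            plan[idx] = bit
        for idx in order[len(bits):]:
            plan[idx] = secrets.randbits(1)  # filler resembles encrypted payload bits
    return plan
```

This layout conflicts with Feature 2. Rewriting every LSB with random-looking bits is full-capacity LSB replacement, which is exactly what the chi-square test flags. The indistinguishability requirement therefore has to be met at a filler rate that still passes the steganalysis metrics.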

Go/No-Go metrics:

  • Single-payload and dual-payload images are statistically indistinguishable (chi-square can't differentiate)
  • Each payload decodes independently
  • Wrong credentials for one payload don't reveal other payload's existence

Complexity: XL — novel design, halves capacity per payload, challenging UX, needs rigorous security analysis

Dependencies: Feature 2 (validation), Feature 4 (detectability reduction)


Architectural Improvements

8. EmbeddingBackend Protocol

Description: Typed plugin interface for all embedding algorithms. Replace if/elif dispatch in steganography.py with a registry.

Implementation approach:

  • New package src/stegasoo/backends/:
    • protocol.py: EmbeddingBackend(Protocol) with embed(), extract(), calculate_capacity(), is_available()
    • lsb.py, dct.py — wrap existing functions
    • registry.py: BackendRegistry mapping mode strings to backends
  • steganography.py: embed_in_image() / extract_from_image() dispatch via registry
  • __init__.py: export protocol and register_backend()
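The sketch below shows the protocol and registry. It assumes byte-in/byte-out method signatures, which are illustrative only; the real embed/extract arguments in steganography.py will differ. `embed_in_image` here is a simplified hypothetical stand-in showing how the if/elif dispatch collapses into a registry lookup.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class EmbeddingBackend(Protocol):
    """Interface every embedding algorithm implements (signatures are illustrative)."""
    mode: str

    def embed(self, carrier: bytes, payload: bytes, pixel_key: bytes) -> bytes: ...
    def extract(self, stego: bytes, pixel_key: bytes) -> bytes: ...
    def calculate_capacity(self, carrier: bytes) -> int: ...
    def is_available(self) -> bool: ...

class BackendRegistry:
    """Maps mode strings ("lsb", "dct", ...) to backend instances."""

    def __init__(self) -> None:
        self._backends: dict[str, EmbeddingBackend] = {}

    def register(self, backend: EmbeddingBackend) -> None:
        if not isinstance(backend, EmbeddingBackend):  # checks member names, not signatures
            raise TypeError(f"{backend!r} does not implement EmbeddingBackend")
        if backend.mode in self._backends:
            raise ValueError(f"backend {backend.mode!r} is already registered")
        self._backends[backend.mode] = backend

    def get(self, mode: str) -> EmbeddingBackend:
        backend = self._backends.get(mode)
        if backend is None:
            raise ValueError(f"unknown embedding mode {mode!r}")
        if not backend.is_available():
            # e.g. the DCT backend when its optional dependencies are missing
            raise RuntimeError(f"embedding mode {mode!r} is not available")
        return backend

_registry = BackendRegistry()

def register_backend(backend: EmbeddingBackend) -> None:
    _registry.register(backend)

def embed_in_image(carrier: bytes, payload: bytes, pixel_key: bytes, mode: str = "lsb") -> bytes:
    """Dispatch by registry lookup instead of branching on mode."""
    return _registry.get(mode).embed(carrier, payload, pixel_key)
```

The adaptive (Feature 4) and dual-payload (Feature 7) modes would then be new classes passed to `register_backend`, with no changes to the dispatch code.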

Complexity: M — implement before Features 4 and 7 (they become new backends)


9. HKDF Key Separation

Subsumed by Feature 5. The HKDF expansion provides:

  • Encryption key: HKDF-Expand(root_key, info=b"stegasoo-v6-encrypt" + nonce)
  • Pixel selection key: HKDF-Expand(root_key, info=b"stegasoo-v6-pixel" + nonce)
  • Future: MAC key, padding key, etc.

10. [core] Extra with Minimal Deps

Description: Move Pillow to [image] extra, base deps = cryptography + argon2-cffi + zstandard only.

Complexity: S — but Pillow is used in crypto.py for photo hashing (core to security model). Only worth it with a concrete headless use case. Low priority.
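If the split goes ahead, the usual pattern is a lazy import that turns a bare `ModuleNotFoundError` into an install hint. The sketch below uses `require_optional` as a hypothetical helper and takes the extra name from the description above.

```python
import importlib
from types import ModuleType

def require_optional(module: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint for the right extra."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"{module} is required for this feature; "
            f"install it with: pip install 'stegasoo[{extra}]'"
        ) from exc
```

Image code would call `require_optional("PIL.Image", "image")`. Because crypto.py's photo hashing would need the same call, a headless install without the extra could not use reference photos at all.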


Ecosystem Features

11. Aletheia Integration

Optional --engine aletheia backend for Feature 2's stegasoo check. BSD-licensed, provides SPA/RS/WS attacks + ML classifiers. Complexity: S (after Feature 2). Depends on: Feature 2.

12. C2PA/AI Provenance Watermarking

Embed C2PA metadata alongside stego payloads. Complexity: L — C2PA is a complex standard. Potentially conflicts with stego goals (adds detectable metadata). Research-heavy.

13. Signal/Matrix Bot

Bot that decodes stego images in a channel using configured channel key. Complexity: M — integration work, uses existing decode() API.

14. Homebrew Tap + Nix Flake

Package distribution for macOS/NixOS. Complexity: S — packaging only, no code changes.


Summary Table

| # | Feature | Tier | Size | Dependencies | Primary Files |
|---|---|---|---|---|---|
| 1 | Platform DCT Presets | T1 | M | | new platform_presets.py, dct_steganography.py, encode.py, cli.py |
| 2 | Steganalysis Self-Check | T1 | M | | new steganalysis.py, cli.py, constants.py |
| 3 | Python 3.13 DCT Cleanup | T1 | S | | dct_steganography.py |
| 4 | Content-Adaptive Embedding | T2 | L | numpy, #2 | new adaptive_cost.py, steganography.py, constants.py |
| 5 | HKDF Forward Secrecy | T2 | M | | crypto.py, constants.py, steganography.py |
| 6 | PWA Mobile Interface | T2 | M | | frontends/web/ templates + static |
| 7 | Dual-Payload Mode | T3 | XL | #2, #4 | new dual_payload.py, steganography.py, cli.py |
| 8 | EmbeddingBackend Protocol | Arch | M | | new backends/ package, steganography.py |
| 9 | HKDF Key Separation | Arch | Included in #5 | | crypto.py |
| 10 | [core] Extra | Arch | S | | pyproject.toml |
| 11 | Aletheia Integration | Eco | S | #2 | steganalysis.py |
| 12 | C2PA Watermarking | Eco | L | | new module |
| 13 | Signal/Matrix Bot | Eco | M | | new bots/ package |
| 14 | Homebrew + Nix | Eco | S | | packaging files only |

Suggested Roadmap

Phase 1 — Foundations (v4.4.0)

  1. #3 Python 3.13 DCT Cleanup (S) — unblocks CI on 3.13
  2. #8 EmbeddingBackend Protocol (M) — architectural cleanup before new embedding work
  3. #2 Steganalysis Self-Check (M) — validation tooling for everything that follows

Phase 2 — Security & Robustness (v4.5.0)

  1. #5 HKDF Forward Secrecy (M) — FORMAT_VERSION bump to 6, improved crypto
  2. #1 Platform-Calibrated DCT Presets (M) — high user value for social media
  3. #14 Homebrew + Nix (S) — distribution expansion

Phase 3 — Advanced Steganography (v5.0.0)

  1. #4 Content-Adaptive Embedding (L) — major security improvement
  2. #6 PWA Mobile Interface (M) — parallel frontend work stream

Phase 4 — Moonshot (v5.x+)

  1. #7 Dual-Payload Mode (XL) — after #2 and #4 are solid
  2. #12 C2PA Watermarking (L) — research-heavy
  3. #13 Signal/Matrix Bot (M) — community-driven

Additional Ideas (Backlog)

  • Animated GIF steganography — LSB in GIF frames, natural multi-media extension
  • PDF steganography — whitespace/font metric/embedded image payloads
  • Batch encode: stegasoo batch-encode --dir /photos/ with auto carrier selection (BATCH_* constants suggest this was planned)
  • Stego identification: stegasoo identify image.png probes for known stego signatures
  • Per-device credential sync via QR — channel key as stego image of reference photo
  • stegasoo verify — decode + confirm message matches expected hash without revealing contents
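For the proposed stegasoo verify, the comparison step could be as small as the sketch below. `verify_message` is a hypothetical name, and decoding is assumed to have already happened in memory.

```python
import hashlib
import hmac

def verify_message(decoded: bytes, expected_sha256_hex: str) -> bool:
    """Report only match / no match; the decoded plaintext is never printed."""
    actual = hashlib.sha256(decoded).hexdigest().encode()
    expected = expected_sha256_hex.strip().lower().encode()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(actual, expected)
```

The "without revealing contents" property holds only for high-entropy messages. A short or guessable message can be recovered from its SHA-256 by brute force, so the expected hash should itself be treated as sensitive.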