Stegasoo Ideas Scout — Implementation Plans (2026-03-24)
Baseline: v4.3.0, Python >=3.11, FORMAT_VERSION 5, no existing users (no backward compat constraints).
Tier 1 — Quick Wins
1. Platform-Calibrated DCT Presets
Description: --platform telegram|discord|signal|whatsapp flag for DCT encode. Bakes in each platform's known recompression parameters. Pre-verifies payload survives before outputting.
Implementation approach:
- New file src/stegasoo/platform_presets.py: PlatformPreset dataclass + PRESETS dict mapping platform → tuned quant_step, jpeg_quality, embed_positions, max_dimension, recompress_quality
- dct_steganography.py: _embed_scipy_dct_safe() / _embed_jpegio() accept optional preset overrides for QUANT_STEP, DEFAULT_EMBED_POSITIONS, output quality
- New pre_verify_survival() function: encode → re-save at platform quality → extract → pass/fail
- Thread platform param through encode.py → steganography.py → DCT functions
- cli.py: add --platform as click.Choice + --verify/--no-verify (pre-verification doubles encode time)
- LSB + --platform should error early, because LSB data is destroyed by any JPEG recompression
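The encode → re-save → extract → pass/fail loop could be sketched as follows. This is a minimal sketch, not the project's implementation: only the name pre_verify_survival comes from the plan, and the three callables are hypothetical stand-ins for the real DCT embed, a JPEG re-save at the platform's quality, and the DCT extract.

```python
from typing import Callable


def pre_verify_survival(
    payload: bytes,
    embed: Callable[[bytes], bytes],
    recompress: Callable[[bytes], bytes],
    extract: Callable[[bytes], bytes],
) -> bool:
    """Return True if the payload survives a simulated platform round trip.

    embed, recompress and extract are assumed wrappers (hypothetical names)
    around the real encode step, a re-save at the platform's JPEG quality,
    and the real decode step.
    """
    stego = embed(payload)
    try:
        recovered = extract(recompress(stego))
    except Exception:
        # Any decode failure after recompression counts as "did not survive".
        return False
    return recovered == payload
```

Because the check runs a full extra encode and decode, it is the part that --no-verify would skip.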
Known platform params (from research):
| Platform | Quality | Max Dimension | Notes |
|---|---|---|---|
| Telegram | ~82 | 2560×2560 | ~81KB embeddable |
| Discord | ~85 | Varies (Nitro) | |
| Signal | ~80 | Aggressive | |
| WhatsApp | ~70 | 1600×1600 | Most lossy |
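A preset table along these lines could back the --platform flag. This is a sketch under assumptions: the field set is reduced to the values the table actually gives (tuned quant_step and embed_positions are left out because no values are known yet), unknown dimensions are stored as None, the verified_on stamp is a placeholder using this plan's date, and get_preset is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlatformPreset:
    name: str
    recompress_quality: int          # approximate JPEG quality the platform re-saves at
    max_dimension: Optional[int]     # longest side in pixels, None if unknown/variable
    verified_on: str                 # date stamp, since platform params change without notice


# Values copied from the research table above; all are approximate.
PRESETS = {
    "telegram": PlatformPreset("telegram", 82, 2560, "2026-03-24"),
    "discord": PlatformPreset("discord", 85, None, "2026-03-24"),
    "signal": PlatformPreset("signal", 80, None, "2026-03-24"),
    "whatsapp": PlatformPreset("whatsapp", 70, 1600, "2026-03-24"),
}


def get_preset(platform: str, embed_mode: str) -> PlatformPreset:
    """Look up a preset, failing early for LSB mode (hypothetical helper)."""
    if embed_mode == "lsb":
        raise ValueError("--platform requires DCT mode: LSB does not survive JPEG recompression")
    try:
        return PRESETS[platform.lower()]
    except KeyError:
        raise ValueError(f"unknown platform {platform!r}") from None
```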
Go/No-Go metrics:
- 95% payload survival rate per platform at 1KB message size in automated tests
- Pre-verification correctly predicts real platform behavior (manual validation per platform at least once)
Complexity: M — new file + parameter threading through 4-5 functions
Risks: Platform params change without notice. Add version/date stamps to presets and a stegasoo tools verify-platform test command.
2. Steganalysis Self-Check (stegasoo check)
Description: New CLI command running chi-square and RS (Regular-Singular) statistical analysis on stego images. Outputs detectability risk level (low/medium/high).
Implementation approach:
- New file src/stegasoo/steganalysis.py:
  - chi_square_analysis(image_data) -> float: chi-square statistic on LSB distribution per channel
  - rs_analysis(image_data) -> float: Regular-Singular groups analysis (requires numpy)
  - assess_risk(chi_p, rs_estimate) -> str: maps to "low"/"medium"/"high"
  - check_image(image_data) -> dict: orchestrator
- cli.py: new @cli.command("check") with IMAGE arg, --json, --mode lsb|dct|auto
- constants.py: threshold constants for chi-square p-value and RS boundaries
- __init__.py: export check_image in __all__
- Start LSB-only; DCT steganalysis (calibration attack) deferred
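The chi-square test on LSB distribution rests on the pairs-of-values idea: LSB replacement tends to equalize the counts of each value pair (2k, 2k+1). A minimal pure-Python sketch of the statistic for one channel follows. The function name and the flat bytes input are assumptions for illustration; the planned chi_square_analysis would also convert the statistic to a p-value, which is omitted here.

```python
from collections import Counter


def chi_square_pairs_statistic(samples: bytes) -> tuple[float, int]:
    """Pairs-of-values chi-square statistic for one 8-bit channel.

    Returns (statistic, degrees_of_freedom). A statistic that is small
    relative to the degrees of freedom means the (2k, 2k+1) counts are
    nearly equal, which is what heavy LSB replacement produces.
    """
    hist = Counter(samples)
    statistic = 0.0
    bins_used = 0
    for k in range(0, 256, 2):
        even, odd = hist.get(k, 0), hist.get(k + 1, 0)
        expected = (even + odd) / 2
        if expected < 5:
            # Skip sparse pairs, the usual validity rule for chi-square bins.
            continue
        statistic += (even - expected) ** 2 / expected
        bins_used += 1
    return statistic, max(bins_used - 1, 0)
```

The threshold constants planned for constants.py would then decide which statistic or p-value ranges map to low, medium and high risk.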
Go/No-Go metrics:
- Clean images → consistently "low risk"
- Naive sequential LSB → "high risk"
- Stegasoo LSB at <50% capacity → "low" or "medium"
Complexity: M — ~150 lines numpy per test, straightforward CLI integration
3. Python 3.13 DCT Cleanup
Description: The jpegio → jpeglib migration is already done in code. Remaining work: rename stale jpegio references and verify on 3.13.
Implementation approach:
- dct_steganography.py: rename HAS_JPEGIO → HAS_JPEGLIB, _jpegio_* functions → _jpeglib_*, update constant names (JPEGIO_MAGIC → JPEGLIB_MAGIC, etc.)
- Verify jpeglib.to_jpegio() compatibility shim: if jpeglib plans to deprecate it, migrate to native API
- Run full test suite on Python 3.13
Go/No-Go metrics:
- All DCT tests pass on Python 3.13
- No deprecation warnings from jpeglib
Complexity: S — renaming and verification only
Tier 2 — Strategic
4. Content-Adaptive Embedding (S-UNIWARD/WOW-inspired)
Description: Replace uniform-random pixel selection with texture-weighted cost functions. Embed preferentially in busy/textured regions where changes are least detectable. 3-5x harder to detect statistically.
Implementation approach:
- New file src/stegasoo/adaptive_cost.py:
  - compute_cost_map(image_data) -> np.ndarray: per-pixel distortion cost via directional high-pass filters (Daubechies wavelet bank / KB filter)
  - select_pixels_by_cost(cost_map, pixel_key, num_needed) -> list[int]: weighted sampling, still ChaCha20-seeded for determinism
- steganography.py:
  - generate_pixel_indices(): add cost_map param, use weighted sampling when provided
  - _embed_lsb(): compute cost map when adaptive mode enabled
  - _extract_lsb(): must compute identical cost map to find same pixels
- dct_steganography.py: adapt DEFAULT_EMBED_POSITIONS per-block based on block texture energy
- Thread adaptive: bool through encode.py / decode.py
- constants.py: add EMBED_MODE_ADAPTIVE_LSB, filter kernels, cost thresholds
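The requirement that _extract_lsb() reproduces the embedder's cost map can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: a 4-neighbour Laplacian stands in for the wavelet filter bank, costs are integers computed from pixels with the LSB masked off (so embedding cannot change them), and a keyed HMAC tie-break with a lowest-cost cut stands in for ChaCha20-seeded weighted sampling.

```python
import hashlib
import hmac


def compute_cost_map(pixels: list[list[int]]) -> list[int]:
    """Integer cost per pixel, row-major: low in textured areas, high in flat ones."""
    height, width = len(pixels), len(pixels[0])
    # Mask the LSB so embedder and extractor see identical input.
    msb = [[p & 0xFE for p in row] for row in pixels]
    costs = []
    for y in range(height):
        for x in range(width):
            up = msb[max(y - 1, 0)][x]
            down = msb[min(y + 1, height - 1)][x]
            left = msb[y][max(x - 1, 0)]
            right = msb[y][min(x + 1, width - 1)]
            texture = abs(4 * msb[y][x] - up - down - left - right)
            # Integer-only arithmetic avoids cross-platform float drift.
            costs.append(1_000_000 // (1 + texture))
    return costs


def select_pixels_by_cost(cost_map: list[int], pixel_key: bytes, num_needed: int) -> list[int]:
    """Pick the num_needed cheapest pixels, ordering equal costs by a keyed tag."""
    def rank(index: int) -> tuple[int, bytes]:
        tag = hmac.new(pixel_key, index.to_bytes(4, "big"), hashlib.sha256).digest()
        return (cost_map[index], tag)

    return sorted(range(len(cost_map)), key=rank)[:num_needed]
```

Flipping the LSB of every selected pixel and recomputing the map yields the same selection, which is the property the round-trip metric below depends on. A pure lowest-cost cut concentrates changes more than true weighted sampling would, so it is a stand-in only.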
Go/No-Go metrics:
- Chi-square test (Feature 2) shows measurable improvement vs uniform-random
- Critical: cost map computation is deterministic across platforms (quantize to fixed-point integers)
- Round-trip decode succeeds on Linux x86, Linux ARM, macOS
Complexity: L — novel algorithm, cross-platform determinism requirement, touches core embedding
Risks: Floating-point differences in wavelet computation could break extraction. Mitigate with integer quantization. Increases encode/decode time ~2-3x.
5. Per-Message Forward Secrecy via HKDF
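Description: Derive ephemeral per-message encryption keys using HKDF expansion from the Argon2id root key + random nonce. Compromising one message's derived key doesn't reveal the keys of other messages; the Argon2id root key itself remains a single point of compromise.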
Implementation approach:
- crypto.py:
  - Add from cryptography.hazmat.primitives.kdf.hkdf import HKDFExpand
  - derive_message_key(root_key, nonce) -> bytes: HKDF-Expand with SHA-256
  - encrypt_message(): generate 16-byte random nonce, derive per-message key, embed nonce in header
  - decrypt_message(): extract nonce, derive same key
  - Also derive pixel selection key via HKDF with different info param
- constants.py:
  - Bump FORMAT_VERSION to 6
  - HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt", HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"
  - MESSAGE_NONCE_SIZE = 16
- Header grows from 66 → 82 bytes: add message_nonce(16) field
- Update HEADER_OVERHEAD / ENCRYPTION_OVERHEAD in steganography.py
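The key derivation could look roughly like this. To keep the sketch dependency-free it implements HKDF-Expand (RFC 5869) with the standard library instead of calling the cryptography package's HKDFExpand, which is what the plan names. Appending the nonce to the info string is an assumption about how the nonce enters the derivation, and derive_message_keys is a hypothetical helper name.

```python
import hashlib
import hmac
import secrets

HKDF_INFO_ENCRYPTION = b"stegasoo-v6-encrypt"
HKDF_INFO_PIXEL = b"stegasoo-v6-pixel"
MESSAGE_NONCE_SIZE = 16


def hkdf_expand_sha256(prk: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-Expand with HMAC-SHA-256, as specified in RFC 5869 section 2.3."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested output too long for HKDF-Expand")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_message_keys(root_key: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    """Return (encryption_key, pixel_key) for one message (assumed layout)."""
    if len(nonce) != MESSAGE_NONCE_SIZE:
        raise ValueError("nonce must be MESSAGE_NONCE_SIZE bytes")
    encryption_key = hkdf_expand_sha256(root_key, HKDF_INFO_ENCRYPTION + nonce)
    pixel_key = hkdf_expand_sha256(root_key, HKDF_INFO_PIXEL + nonce)
    return encryption_key, pixel_key


# Encoder side: a fresh nonce per message.
nonce = secrets.token_bytes(MESSAGE_NONCE_SIZE)
```

One point this sketch leaves open: if the pixel selection key depends on the nonce, the decoder must be able to read the nonce before it knows the pixel locations, so the nonce needs to live somewhere whose position does not depend on it.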
Go/No-Go metrics:
- Two messages with identical credentials produce different ciphertexts and different pixel locations
- cryptography library HKDF works with existing Argon2id output
Complexity: M — well-defined crypto change, touches security-critical header format
6. PWA Mobile Interface
Description: Convert Flask Web UI to Progressive Web App. Mobile-optimized, installable, offline-capable static pages.
Implementation approach:
- New files in frontends/web/static/: manifest.json, sw.js, icon set (192×192, 512×512)
- Base template: add manifest link, theme-color meta, viewport meta, service worker registration
- app.py: serve manifest with correct MIME, add cache headers for static assets
- Responsive CSS for encode/decode accordion forms
- Camera capture: <input type="file" accept="image/*" capture="environment"> for reference photo
- Service worker caches static assets only, NOT encode/decode API endpoints
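For the manifest, a small generator keeps the icon list and MIME type in one place. This is a sketch only: the app name, colors, icon paths and the build_manifest helper are all assumptions, and just the two icon sizes come from the plan.

```python
import json

# Media type registered for web app manifests.
MANIFEST_MIME = "application/manifest+json"


def build_manifest(name: str = "Stegasoo", start_url: str = "/") -> dict:
    """Build a minimal web app manifest (paths and colors are placeholders)."""
    return {
        "name": name,
        "short_name": name,
        "start_url": start_url,
        "display": "standalone",
        "theme_color": "#000000",
        "background_color": "#000000",
        "icons": [
            {
                "src": f"/static/icons/icon-{size}.png",  # hypothetical path
                "sizes": f"{size}x{size}",
                "type": "image/png",
            }
            for size in (192, 512)
        ],
    }


manifest_json = json.dumps(build_manifest(), indent=2)
```

app.py would then return manifest_json with MANIFEST_MIME as the content type.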
Go/No-Go metrics:
- Lighthouse PWA score >= 90
- Installable on Android Chrome and iOS Safari
- Offline: static pages load, encode/decode shows graceful "offline" message
Complexity: M — frontend only, no core library changes
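Risks: Service worker registration and PWA install require a secure context (HTTPS, already supported via ssl_utils.py). Camera capture through the file input has no such requirement, but behavior varies by mobile browser.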
Tier 3 — Moonshot
7. Plausible Deniability / Dual-Payload Mode
Description: Two independent encrypted payloads in one carrier, each with different credentials. Reveal decoy under coercion; real payload stays hidden.
Implementation approach:
- New file src/stegasoo/dual_payload.py:
  - encode_dual(message_a, message_b, carrier, creds_a, creds_b)
  - Partition available pixels into two disjoint pools using different seeds
  - Critical: ALL images (single or dual) must fill unused pixel pool with random data so single-payload and dual-payload images are indistinguishable
- steganography.py: generate_pixel_indices() gets exclude_indices param
- decode.py: each credential set finds a different valid payload; wrong credentials produce garbage
- CLI + Web UI: dual-payload encode workflow
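The "fill unused pool with random data" rule is the piece that makes single and dual images look alike, and it is simple to sketch. The function name and the flat bytearray of channel values are assumptions; the real carrier layout and pool bookkeeping are not specified in the plan.

```python
import secrets


def fill_unused_lsbs(pixels: bytearray, used_indices: set[int]) -> None:
    """Overwrite the LSB of every unused sample with a random bit, in place.

    used_indices holds the positions already carrying payload bits (from one
    or both payloads); those are left untouched.
    """
    for index in range(len(pixels)):
        if index in used_indices:
            continue
        pixels[index] = (pixels[index] & 0xFE) | secrets.randbits(1)
```

Filling every unused LSB pushes the whole image to full-capacity LSB statistics, which works against the detectability goals of Features 2 and 4, so the fill region probably needs to be bounded in a real design.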
Go/No-Go metrics:
- Single-payload and dual-payload images are statistically indistinguishable (chi-square can't differentiate)
- Each payload decodes independently
- Wrong credentials for one payload don't reveal other payload's existence
Complexity: XL — novel design, halves capacity per payload, challenging UX, needs rigorous security analysis
Dependencies: Feature 2 (validation), Feature 4 (detectability reduction)
Architectural Improvements
8. EmbeddingBackend Protocol
Description: Typed plugin interface for all embedding algorithms. Replace if/elif dispatch in steganography.py with a registry.
Implementation approach:
- New package src/stegasoo/backends/:
  - protocol.py: EmbeddingBackend(Protocol) with embed(), extract(), calculate_capacity(), is_available()
  - lsb.py, dct.py: wrap existing functions
  - registry.py: BackendRegistry mapping mode strings to backends
- steganography.py: embed_in_image() / extract_from_image() dispatch via registry
- __init__.py: export protocol and register_backend()
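A possible shape for the protocol and registry is sketched below. The four method names come from the plan; the parameter lists, the register/get method names and the ToyAppendBackend class are assumptions, and the toy backend only appends bytes to show the dispatch path (it hides nothing).

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingBackend(Protocol):
    def embed(self, carrier: bytes, payload: bytes, key: bytes) -> bytes: ...
    def extract(self, stego: bytes, key: bytes) -> bytes: ...
    def calculate_capacity(self, carrier: bytes) -> int: ...
    def is_available(self) -> bool: ...


class BackendRegistry:
    """Maps mode strings such as "lsb" or "dct" to backend instances."""

    def __init__(self) -> None:
        self._backends: dict[str, EmbeddingBackend] = {}

    def register(self, mode: str, backend: EmbeddingBackend) -> None:
        # runtime_checkable only verifies that the methods exist, not their signatures.
        if not isinstance(backend, EmbeddingBackend):
            raise TypeError(f"{type(backend).__name__} does not implement EmbeddingBackend")
        self._backends[mode] = backend

    def get(self, mode: str) -> EmbeddingBackend:
        try:
            backend = self._backends[mode]
        except KeyError:
            raise ValueError(f"unknown embed mode {mode!r}") from None
        if not backend.is_available():
            raise RuntimeError(f"backend {mode!r} is registered but its dependencies are missing")
        return backend


class ToyAppendBackend:
    """Illustration only: appends the payload plus a 4-byte length trailer."""

    def embed(self, carrier: bytes, payload: bytes, key: bytes) -> bytes:
        return carrier + payload + len(payload).to_bytes(4, "big")

    def extract(self, stego: bytes, key: bytes) -> bytes:
        length = int.from_bytes(stego[-4:], "big")
        return stego[-4 - length:-4]

    def calculate_capacity(self, carrier: bytes) -> int:
        return 2**32 - 1

    def is_available(self) -> bool:
        return True
```

With this in place, embed_in_image() reduces to a registry lookup followed by a call to embed(), and is_available() gives optional dependencies such as jpeglib one place to be checked.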
Complexity: M — implement before Features 4 and 7 (they become new backends)
9. HKDF Key Separation
Subsumed by Feature 5. The HKDF expansion provides:
- Encryption key: HKDF-Expand(root_key, info="stegasoo-encrypt", nonce)
- Pixel selection key: HKDF-Expand(root_key, info="stegasoo-pixel", nonce)
- Future: MAC key, padding key, etc.
10. [core] Extra with Minimal Deps
Description: Move Pillow to [image] extra, base deps = cryptography + argon2-cffi + zstandard only.
Complexity: S — but Pillow is used in crypto.py for photo hashing (core to security model). Only worth it with a concrete headless use case. Low priority.
Ecosystem Features
11. Aletheia Integration
Optional --engine aletheia backend for Feature 2's stegasoo check. BSD-licensed, provides SPA/RS/WS attacks + ML classifiers. Complexity: S (after Feature 2). Depends on: Feature 2.
12. C2PA/AI Provenance Watermarking
Embed C2PA metadata alongside stego payloads. Complexity: L — C2PA is a complex standard. Potentially conflicts with stego goals (adds detectable metadata). Research-heavy.
13. Signal/Matrix Bot
Bot that decodes stego images in a channel using configured channel key. Complexity: M — integration work, uses existing decode() API.
14. Homebrew Tap + Nix Flake
Package distribution for macOS/NixOS. Complexity: S — packaging only, no code changes.
Summary Table
| # | Feature | Tier | Size | Dependencies | Primary Files |
|---|---|---|---|---|---|
| 1 | Platform DCT Presets | T1 | M | — | new platform_presets.py, dct_steganography.py, encode.py, cli.py |
| 2 | Steganalysis Self-Check | T1 | M | — | new steganalysis.py, cli.py, constants.py |
| 3 | Python 3.13 DCT Cleanup | T1 | S | — | dct_steganography.py |
| 4 | Content-Adaptive Embedding | T2 | L | numpy, #2 | new adaptive_cost.py, steganography.py, constants.py |
| 5 | HKDF Forward Secrecy | T2 | M | — | crypto.py, constants.py, steganography.py |
| 6 | PWA Mobile Interface | T2 | M | — | frontends/web/ templates + static |
| 7 | Dual-Payload Mode | T3 | XL | #2, #4 | new dual_payload.py, steganography.py, cli.py |
| 8 | EmbeddingBackend Protocol | Arch | M | — | new backends/ package, steganography.py |
| 9 | HKDF Key Separation | Arch | — | Included in #5 | crypto.py |
| 10 | [core] Extra | Arch | S | — | pyproject.toml |
| 11 | Aletheia Integration | Eco | S | #2 | steganalysis.py |
| 12 | C2PA Watermarking | Eco | L | — | new module |
| 13 | Signal/Matrix Bot | Eco | M | — | new bots/ package |
| 14 | Homebrew + Nix | Eco | S | — | packaging files only |
Suggested Roadmap
Phase 1 — Foundations (v4.4.0)
- #3 Python 3.13 DCT Cleanup (S) — unblocks CI on 3.13
- #8 EmbeddingBackend Protocol (M) — architectural cleanup before new embedding work
- #2 Steganalysis Self-Check (M) — validation tooling for everything that follows
Phase 2 — Security & Robustness (v4.5.0)
- #5 HKDF Forward Secrecy (M) — FORMAT_VERSION bump to 6, improved crypto
- #1 Platform-Calibrated DCT Presets (M) — high user value for social media
- #14 Homebrew + Nix (S) — distribution expansion
Phase 3 — Advanced Steganography (v5.0.0)
- #4 Content-Adaptive Embedding (L) — major security improvement
- #6 PWA Mobile Interface (M) — parallel frontend work stream
Phase 4 — Moonshot (v5.x+)
- #7 Dual-Payload Mode (XL) — after #2 and #4 are solid
- #12 C2PA Watermarking (L) — research-heavy
- #13 Signal/Matrix Bot (M) — community-driven
Additional Ideas (Backlog)
- Animated GIF steganography — LSB in GIF frames, natural multi-media extension
- PDF steganography — whitespace/font metric/embedded image payloads
- Batch encode: stegasoo batch-encode --dir /photos/ with auto carrier selection (BATCH_* constants suggest this was planned)
- Stego identification: stegasoo identify image.png probes for known stego signatures
- Per-device credential sync via QR: channel key as stego image of reference photo
- stegasoo verify: decode + confirm message matches expected hash without revealing contents
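The stegasoo verify idea comes down to a hash comparison after decode. A minimal sketch, assuming SHA-256 and a hex digest supplied by the user (neither is specified in the backlog entry; the function name is hypothetical):

```python
import hashlib
import hmac


def message_matches_hash(decoded: bytes, expected_sha256_hex: str) -> bool:
    """Compare the decoded message's SHA-256 to an expected hex digest.

    Only a boolean is returned, so the message contents are never printed.
    """
    actual = hashlib.sha256(decoded).hexdigest().encode()
    expected = expected_sha256_hex.strip().lower().encode()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected)
```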