Document CLI and Web UI architecture for future devs

CLI module now explains:
- Click command group hierarchy (tree diagram)
- JSON output pattern for scriptability
- Secure input handling (hide_input, confirmation_prompt)
- Dry-run mode pattern
- Batch processing with variadic args and progress callbacks

Web UI now explains:
- Flask architecture overview with ASCII diagram
- Subprocess isolation pattern (why we run stegasoo in subprocesses)
- Async job management with polling flow diagram
- Context processors for template globals
- Secret key persistence for session survival
- Environment-based configuration (12-factor style)

If you're reading this code trying to learn Flask/Click patterns, these
comments should actually teach you something useful.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 23:58:59 -05:00
parent 93420704e8
commit aa99a258f4
2 changed files with 298 additions and 18 deletions
--- a/frontends/web/app.py
+++ b/frontends/web/app.py
@@ -2,23 +2,76 @@
 """
 Stegasoo Web Frontend (v4.0.0)
-Flask-based web UI for steganography operations.
+A production Flask application demonstrating proper web architecture patterns.
-Supports both text messages and file embedding.
+This isn't just a quick demo - it's built to run on a Raspberry Pi 24/7.
 ARCHITECTURE OVERVIEW
 =====================
    ┌─────────────────────────────────────────────────────────────────────┐
    │                         FLASK APPLICATION                            │
    ├─────────────────────────────────────────────────────────────────────┤
    │                                                                      │
    │   Routes (/encode, /decode, /api/*)                                  │
    │       │                                                              │
    │       ├── auth.py           # Session management, user accounts      │
    │       ├── temp_storage.py   # File-based temp storage with expiry    │
    │       ├── subprocess_stego.py  # Isolated encode/decode workers      │
    │       └── ssl_utils.py      # Self-signed cert generation            │
    │                                                                      │
    │   Templates (Jinja2)                                                 │
    │       └── base.html → encode.html, decode.html, etc.                │
    │                                                                      │
    │   Static assets (CSS, JS)                                           │
    │       └── Vanilla JS, no framework (keeps it simple)                │
    │                                                                      │
    └─────────────────────────────────────────────────────────────────────┘
 KEY PATTERNS
 ============
 1. SUBPROCESS ISOLATION
   Stegasoo's DCT mode uses scipy/jpegio which can crash on malformed input.
   We run encode/decode in subprocesses so crashes don't take down the server:
       subprocess_stego = SubprocessStego(timeout=180)
       result = subprocess_stego.encode(carrier, ref, message, ...)
   If the subprocess crashes, we catch it and return an error gracefully.
 2. ASYNC JOBS WITH PROGRESS
   Encoding large images can take 30+ seconds. We use ThreadPoolExecutor
   to run jobs in background threads with progress reporting:
       job_id = generate_job_id()
       _executor.submit(_run_encode_job, job_id, params)
       # Client polls /api/encode/progress/<job_id> for updates
 3. CONTEXT PROCESSORS
   @app.context_processor injects variables into ALL templates:
       return {"version": __version__, "has_dct": has_dct_support()}
   Now every template can use {{ version }} without passing it explicitly.
 4. BEFORE_REQUEST HOOKS
   @app.before_request runs before every request. We use it for:
   - First-run setup redirect (no users → /setup)
   - Session validation
   - Cleanup of old temp files
 5. SECURE SECRET KEY
   Flask sessions need a secret key. We persist it to a file so sessions
   survive server restarts (otherwise everyone gets logged out).
 CHANGES in v4.0.0:
 - Added channel key support for deployment/group isolation
 - New /api/channel/status endpoint
 - Channel key selector on encode/decode pages
 - Messages encoded with channel key require same key to decode
 CHANGES in v3.2.0:
 - Removed date dependency from all operations
 - Renamed day_phrase → passphrase
 - No date selection or tracking needed
 - Simplified user experience for asynchronous communications
 NEW in v3.0: LSB and DCT embedding modes with advanced options.
 NEW in v3.0.1: DCT output format selection (PNG or JPEG) and color mode (grayscale or color).
 """
 import io
@@ -157,8 +210,35 @@ except ImportError:
 # ============================================================================
 # SUBPROCESS ISOLATION FOR STEGASOO OPERATIONS
 # ============================================================================
-# Runs encode/decode/compare in subprocesses to prevent jpegio/scipy crashes
+#
-# from taking down the Flask server.
+# This is a critical reliability pattern. Here's the problem:
 #
 # scipy's DCT and jpegio can crash (segfault) on:
 # - Malformed JPEG files
 # - Very large images that exhaust memory
 # - Certain edge cases in coefficient manipulation
 #
 # If these crash in the main Flask process, your whole server dies.
 # Users get a connection reset, and the service goes down.
 #
 # The solution: Run stegasoo operations in separate Python processes.
 #
 #     Main Flask process                 Worker subprocess
 #     ┌─────────────────┐               ┌─────────────────┐
 #     │                 │   spawn       │                 │
 #     │  /api/encode    │──────────────>│  encode()       │
 #     │                 │               │                 │
 #     │  wait for       │<──────────────│  return result  │
 #     │  result         │    or crash   │  (or crash)     │
 #     │                 │               │                 │
 #     │  handle error   │               │  (process dies) │
 #     └─────────────────┘               └─────────────────┘
 #
 # If the subprocess crashes, we catch the error and return a friendly message.
 # The main server keeps running. Users can try again with different input.
 #
 # The subprocess_stego module handles all the pickling/unpickling of data.
 from subprocess_stego import (
    SubprocessStego,
    cleanup_progress_file,
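The isolation idea in this hunk can be sketched with the standard library alone. This is a minimal illustration, not the project's `subprocess_stego` module: `WORKER` and `run_isolated` are hypothetical names, and the "crash" is simulated with `os.abort()` rather than a real scipy/jpegio fault.

```python
import subprocess
import sys

# Hypothetical worker source: stands in for an encode step that may crash.
WORKER = """
import os, sys
if sys.argv[1] == "crash":
    os.abort()  # simulate a native-code crash (the process dies abruptly)
print(sys.argv[1].upper())
"""


def run_isolated(payload: str, timeout: float = 30.0) -> dict:
    """Run the worker in a child process so its crash cannot kill the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER, payload],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": "worker timed out"}
    if proc.returncode != 0:
        # On POSIX a negative return code means the child died from a signal.
        return {"ok": False, "error": f"worker exited with code {proc.returncode}"}
    return {"ok": True, "result": proc.stdout.strip()}


if __name__ == "__main__":
    print(run_isolated("hello"))
    print(run_isolated("crash"))
```

The parent only ever sees a return code and captured output, so a dead child becomes an ordinary error value that a route handler could turn into a friendly message.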
@@ -182,38 +262,89 @@ subprocess_stego = SubprocessStego(timeout=180)  # 3 minute timeout for large im
 # ============================================================================
 # FLASK APP CONFIGURATION
 # ============================================================================
 #
 # Flask configuration demonstrates several production patterns:
 #
 # 1. SECRET KEY PERSISTENCE
 #    Flask uses secret_key to sign session cookies. If it changes, all users
 #    get logged out. We save it to a file so it survives restarts.
 #
 # 2. CONTENT LENGTH LIMITS
 #    MAX_CONTENT_LENGTH prevents DoS via huge uploads. Flask will reject
 #    requests that exceed this before loading them into memory.
 #
 # 3. ENVIRONMENT-BASED CONFIG
 #    Settings come from environment variables, allowing:
 #    - Different settings per deployment (dev/staging/prod)
 #    - Docker/systemd to inject config without code changes
 #    - 12-factor app compliance
 #
 # 4. INSTANCE FOLDER
 #    Flask's instance_path is for per-deployment data (databases, keys).
 #    Flask does not gitignore it for you; keep instance/ out of version
 #    control yourself so secrets stored there are never committed.
 app = Flask(__name__)
 # Persist secret key so sessions survive restarts
 # Without this, every restart = everyone gets logged out
 _instance_path = Path(app.instance_path)
 _instance_path.mkdir(parents=True, exist_ok=True)
 _secret_key_file = _instance_path / ".secret_key"
 if _secret_key_file.exists():
    app.secret_key = _secret_key_file.read_text().strip()
 else:
-    app.secret_key = secrets.token_hex(32)
+    # First run: generate a new key and save it
    app.secret_key = secrets.token_hex(32)  # 256 bits of randomness
    _secret_key_file.write_text(app.secret_key)
-    _secret_key_file.chmod(0o600)
+    _secret_key_file.chmod(0o600)  # Only owner can read
 # Reject uploads larger than this (prevents memory exhaustion)
 app.config["MAX_CONTENT_LENGTH"] = MAX_FILE_SIZE
 # Auth configuration from environment
 # STEGASOO_AUTH_ENABLED=false disables login (for local/dev use)
 app.config["AUTH_ENABLED"] = os.environ.get("STEGASOO_AUTH_ENABLED", "true").lower() == "true"
 app.config["HTTPS_ENABLED"] = os.environ.get("STEGASOO_HTTPS_ENABLED", "false").lower() == "true"
-# Initialize auth module
+# Initialize auth module (sets up session handling, user DB)
 init_auth(app)
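The two configuration patterns above (a persisted secret key and environment-driven booleans) can be sketched without Flask. The helper names `load_or_create_secret_key` and `env_flag` are hypothetical and do not appear in the diff. One deliberate difference from the code shown: this sketch creates the key file with owner-only permissions up front instead of calling `chmod` after writing, which avoids a brief window where the file could be readable by others.

```python
import os
import secrets
from pathlib import Path


def load_or_create_secret_key(path: Path) -> str:
    """Return the key stored at `path`, generating and saving it on first run."""
    if path.exists():
        return path.read_text().strip()
    key = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    path.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL fails if the file already exists; 0o600 means owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(key)
    return key


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean setting from the environment ("true" / anything else)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"
```

Calling `load_or_create_secret_key` twice with the same path returns the same key, which is the property that keeps signed session cookies valid across restarts.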
 # ============================================================================
 # ASYNC JOB MANAGEMENT (v4.1.2)
 # ============================================================================
-# Encode operations can run in background threads with progress reporting
+#
 # Problem: DCT encoding a large image can take 30-60 seconds.
 # Solution: Run it in a background thread, let the client poll for progress.
 #
 # The flow:
 #
 #     Client                          Server
 #     ──────                          ──────
 #     POST /api/encode/async ──────>  Start background job
 #                            <──────  Return job_id
 #
 #     GET /api/encode/progress/123 ─>  Check job status
 #                            <──────  {"progress": 45, "phase": "embedding"}
 #
 #     GET /api/encode/progress/123 ─>  Check again
 #                            <──────  {"status": "complete", "file_id": "abc"}
 #
 #     GET /api/download/abc ────────>  Download result
 #                            <──────  Encoded image
 #
 # Why ThreadPoolExecutor instead of Celery/Redis?
 # - This runs on a Raspberry Pi with 1GB RAM
 # - We don't need distributed workers
 # - Keep it simple - threads are fine for 2 concurrent jobs
 #
 # The thread pool is limited to 2 workers because:
 # - Each encode loads the full image into memory
 # - Too many concurrent jobs = OOM on the Pi
 # Thread pool for background encode/decode operations
 _executor = ThreadPoolExecutor(max_workers=2)
-# Job storage: job_id -> {status, result, error, file_id, ...}
+# Job storage: job_id -> {status, result, error, file_id, created, ...}
 # We use a dict with a lock because threads access it concurrently
 _jobs = {}
 _jobs_lock = threading.Lock()
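A compact sketch of the job-tracking flow described in this hunk, using only the standard library. The function names (`start_job`, `get_status`, `_run_job`) and the fake "work" loop are assumptions for illustration; the real job functions and HTTP routes are not shown in the diff.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

_executor = ThreadPoolExecutor(max_workers=2)
_jobs: dict = {}
_jobs_lock = threading.Lock()


def _update(job_id: str, **fields) -> None:
    # Every write goes through the lock because worker threads and
    # request handlers touch the same dict.
    with _jobs_lock:
        _jobs[job_id].update(fields)


def _run_job(job_id: str, steps: int) -> None:
    """Hypothetical stand-in for an encode: reports progress after each step."""
    try:
        for i in range(1, steps + 1):
            time.sleep(0.01)  # pretend to do work
            _update(job_id, progress=i * 100 // steps)
        _update(job_id, status="complete", file_id=f"file-{job_id}")
    except Exception as exc:  # record failures instead of losing them
        _update(job_id, status="error", error=str(exc))


def start_job(steps: int = 5) -> str:
    """Register the job first, then hand it to the pool; return its id."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"status": "running", "progress": 0, "created": time.time()}
    _executor.submit(_run_job, job_id, steps)
    return job_id


def get_status(job_id: str) -> Optional[dict]:
    """What a polling endpoint would return; None for unknown ids."""
    with _jobs_lock:
        job = _jobs.get(job_id)
        return dict(job) if job else None  # copy so callers never see live state
```

A polling route would call `get_status(job_id)` and serialize the result; returning a copy keeps the response stable even if the worker updates the job a moment later.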
@@ -268,6 +399,27 @@ THUMBNAIL_FILES: dict[str, bytes] = {}  # Not used - see temp_storage.py
 # ============================================================================
 # TEMPLATE CONTEXT PROCESSOR
 # ============================================================================
 #
 # Context processors inject variables into EVERY template automatically.
 # Instead of passing the same data to every render_template() call:
 #
 #     # Bad: repetitive and error-prone
 #     return render_template("page.html", version=__version__, has_dct=...)
 #
 # We define it once here and it's available everywhere:
 #
 #     # In any template:
 #     <p>Version: {{ version }}</p>
 #     {% if has_dct %}DCT mode available{% endif %}
 #
 # This is great for:
 # - Version numbers (show in footer)
 # - Feature flags (has_dct, auth_enabled)
 # - User info (username, is_admin)
 # - Global config (max sizes, limits)
 #
 # The function runs on EVERY template render, so keep it fast.
 # Don't do expensive database queries here.
@app.context_processor
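For readers who want to see the context-processor behaviour in isolation, here is a small self-contained sketch. It assumes Flask is installed; the injected values and the `render_footer` helper are hypothetical and are not the values or functions used by the application in this diff.

```python
from flask import Flask, render_template_string

app = Flask(__name__)


@app.context_processor
def inject_globals():
    # Hypothetical values; the real app derives these from its own modules.
    return {"version": "4.0.0", "has_dct": True}


def render_footer() -> str:
    # No version/has_dct arguments are passed here; the context processor
    # supplies them when the template is rendered.
    template = "v{{ version }}{% if has_dct %} (DCT available){% endif %}"
    with app.test_request_context("/"):
        return render_template_string(template)


if __name__ == "__main__":
    print(render_footer())
```

Because the processor is invoked as part of rendering, anything slow inside `inject_globals` is paid for on each rendered page.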
--- a/src/stegasoo/cli.py
+++ b/src/stegasoo/cli.py
@@ -1,7 +1,69 @@
 """
 Stegasoo CLI Module (v3.2.0)
-Command-line interface with batch processing and compression support.
+A proper CLI architecture using Click. This module demonstrates several
 important patterns for building production-quality command-line tools:
 PATTERN: COMMAND GROUPS
 =======================
 Click's @group decorator creates a hierarchy of commands:
    stegasoo                     <- Main entry point
    ├── encode                   <- Simple commands at root level
    ├── decode
    ├── generate
    ├── info
    ├── batch/                   <- Group for related commands
    │   ├── encode
    │   ├── decode
    │   └── check
    ├── channel/                 <- Another group
    │   ├── generate
    │   ├── show
    │   ├── status
    │   ├── qr
    │   └── clear
    ├── tools/                   <- Utility group
    │   ├── capacity
    │   ├── strip
    │   ├── peek
    │   └── exif
    └── admin/                   <- Administration group
        ├── recover
        └── generate-key
 PATTERN: JSON OUTPUT MODE
 =========================
 The root group defines --json for machine-readable output, so pass it before
 the subcommand (e.g. `stegasoo --json info`). The pattern:
    @click.pass_context
    def my_command(ctx, ...):
        if ctx.obj.get("json"):
            click.echo(json.dumps(result, indent=2))
        else:
            # Human-readable output with colors/formatting
            click.echo(f"✓ Success: {result}")
 This makes the CLI scriptable - you can pipe to jq, use in shell scripts, etc.
 PATTERN: SENSITIVE INPUT
 ========================
 Passwords/secrets use Click's secure prompts:
    @click.option("--passphrase", prompt=True, hide_input=True,
                  confirmation_prompt=True, help="Passphrase")
 - prompt=True: Asks if not provided
 - hide_input=True: No echo (like sudo)
 - confirmation_prompt=True: "Repeat for confirmation"
 PATTERN: DRY-RUN MODE
 =====================
 For destructive or slow operations, --dry-run shows what WOULD happen:
    if dry_run:
        click.echo(f"Would encode to {output}")
        return
 Changes in v3.2.0:
 - Updated to use DEFAULT_PASSPHRASE_WORDS (consistency with v3.2.0 naming)
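The JSON-output and dry-run patterns from the docstring can be combined in one runnable sketch. It assumes Click is installed; the `encode` command here is a hypothetical, stripped-down stand-in (the real command takes many more options), and `CliRunner` is Click's own test helper.

```python
import json

import click
from click.testing import CliRunner


@click.group()
@click.option("--json", "json_output", is_flag=True, help="Output results as JSON")
@click.pass_context
def cli(ctx, json_output):
    # Shared state: every subcommand can read the flag from ctx.obj.
    ctx.ensure_object(dict)
    ctx.obj["json"] = json_output


@cli.command()
@click.argument("output")
@click.option("--dry-run", is_flag=True, help="Show what would happen, then exit")
@click.pass_context
def encode(ctx, output, dry_run):
    """Hypothetical encode command used only for this illustration."""
    if dry_run:
        click.echo(f"Would encode to {output}")
        return
    result = {"status": "ok", "output": output}
    if ctx.obj.get("json"):
        click.echo(json.dumps(result, indent=2))
    else:
        click.echo(f"Success: wrote {output}")


if __name__ == "__main__":
    runner = CliRunner()
    print(runner.invoke(cli, ["--json", "encode", "out.png"]).output)
```

Since `--json` belongs to the group in this sketch, it goes before the subcommand name, while `--dry-run` belongs to `encode` and goes after it.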
@@ -32,10 +94,23 @@ from .constants import (
    __version__,
 )
-# Click context settings
+# Click context settings - these apply to all commands
 # help_option_names lets users use either -h or --help
 CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
 # =============================================================================
 # ROOT GROUP - The main entry point
 # =============================================================================
 #
 # @click.group() creates a command group. The function becomes both:
 # 1. A callable that sets up shared state (ctx.obj)
 # 2. A container for subcommands via @cli.command() decorators
 #
 # The context object (ctx.obj) is passed down to all subcommands.
 # We use it to share the --json flag across the entire CLI.
@click.group(context_settings=CONTEXT_SETTINGS)
@click.version_option(__version__, "-v", "--version")
@click.option("--json", "json_output", is_flag=True, help="Output results as JSON")
@@ -46,6 +121,8 @@ def cli(ctx, json_output):
    Hide messages in images using PIN + passphrase security.
    """
    # ensure_object(dict) creates ctx.obj if it doesn't exist
    # This prevents "NoneType has no attribute" errors
    ctx.ensure_object(dict)
    ctx.obj["json"] = json_output
@@ -53,6 +130,31 @@ def cli(ctx, json_output):
 # =============================================================================
 # ENCODE COMMANDS
 # =============================================================================
 #
 # The encode command demonstrates several Click patterns:
 #
 # 1. ARGUMENT vs OPTION
 #    - Arguments are positional: `stegasoo encode photo.png`
 #    - Options have flags: `stegasoo encode -m "message" --pin 1234`
 #    Rule of thumb: required inputs → arguments, optional/secret → options
 #
 # 2. MUTUAL EXCLUSIVITY
 #    We need either --message OR --file, not both. Click doesn't have built-in
 #    mutual exclusivity, so we check manually. The "neither given" case:
 #
 #        if not message and not file_payload:
 #            raise click.UsageError("Either --message or --file is required")
 #
 # 3. TYPE VALIDATION
 #    Click validates types automatically:
 #    - type=click.Path(exists=True) → file must exist
 #    - type=click.Choice(["a", "b"]) → must be one of these values
 #    - type=int → must be an integer
 #
 # 4. DEFAULT VALUES
 #    Options can have smart defaults:
 #    - default="zlib" → use this if not specified
 #    - default=True with is_flag=True → boolean flag defaults to on
@cli.command()
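A manual exclusivity check usually has two halves: reject the case where neither option is given, and reject the case where both are. The sketch below shows both, assuming Click is installed. The command signature and the second error message are hypothetical; whether the project's real `encode` also rejects the "both given" case is not visible in this diff.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("carrier")
@click.option("-m", "--message", default=None, help="Text to hide")
@click.option("-f", "--file", "file_payload", default=None, help="File to hide")
def encode(carrier, message, file_payload):
    """Hypothetical encode command requiring exactly one payload source."""
    if message is None and file_payload is None:
        raise click.UsageError("Either --message or --file is required")
    if message is not None and file_payload is not None:
        raise click.UsageError("Use --message or --file, not both")
    kind = "message" if message is not None else "file"
    click.echo(f"{carrier}: embedding {kind}")


if __name__ == "__main__":
    runner = CliRunner()
    print(runner.invoke(encode, ["a.png", "-m", "hi"]).output)
```

Raising `click.UsageError` makes Click print the usage line plus the message and exit with status 2, the same way it reports its own parsing errors.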
@@ -320,6 +422,32 @@ def decode(ctx, image, reference, passphrase, pin, output):
 # =============================================================================
 # BATCH COMMANDS
 # =============================================================================
 #
 # Batch processing demonstrates:
 #
 # 1. SUBGROUPS
 #    @cli.group() creates a nested command group:
 #        stegasoo batch encode *.png
 #        stegasoo batch decode *.png
 #        stegasoo batch check *.png
 #
 # 2. VARIADIC ARGUMENTS
 #    nargs=-1 accepts multiple arguments:
 #        @click.argument("images", nargs=-1, required=True)
 #    This lets users do: `stegasoo batch encode img1.png img2.png img3.png`
 #    Or with shell expansion: `stegasoo batch encode *.png`
 #
 # 3. PROGRESS CALLBACKS
 #    We pass a callback to the BatchProcessor for real-time updates:
 #
 #        def progress(current, total, item):
 #            click.echo(f"[{current}/{total}] {item.input_path.name}")
 #
 #        processor.batch_encode(..., progress_callback=progress)
 #
 # 4. PARALLEL PROCESSING
 #    --jobs/-j controls worker count. Default is 4 for good balance between
 #    speed and memory usage. Each worker loads images into memory.
@cli.group()
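The subgroup, variadic-argument, and progress-callback patterns listed above fit together as in the following sketch. It assumes Click is installed; `process_all` is a hypothetical stand-in for the project's batch processor, whose real interface is only partially visible in the diff.

```python
import click
from click.testing import CliRunner


def process_all(paths, progress_callback=None):
    """Hypothetical stand-in for a batch processor."""
    total = len(paths)
    for index, path in enumerate(paths, start=1):
        # Real work on `path` would happen here.
        if progress_callback is not None:
            progress_callback(index, total, path)
    return total


@click.group()
def cli():
    """Root group."""


@cli.group()
def batch():
    """Batch operations (nested group)."""


@batch.command("check")
@click.argument("images", nargs=-1, required=True)
def batch_check(images):
    # nargs=-1 collects every remaining positional argument into a tuple.
    def progress(current, total, item):
        click.echo(f"[{current}/{total}] {item}")

    count = process_all(images, progress_callback=progress)
    click.echo(f"Checked {count} image(s)")


if __name__ == "__main__":
    print(CliRunner().invoke(cli, ["batch", "check", "a.png", "b.png"]).output)
```

Keeping the callback in the CLI layer means the processing function stays free of any `click` dependency and can be reused from the web frontend or tests.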