debate/.planning/research/ARCHITECTURE.md
Mikkel Georgsen c0ff95951e docs: add project research
Files:
- STACK.md: Technology stack recommendations (Python 3.12+, FastAPI, React 19+, Vite, Celery, PostgreSQL 18+)
- FEATURES.md: Feature landscape analysis (table stakes vs differentiators)
- ARCHITECTURE.md: Layered web-queue-worker architecture with SAT-based dependency resolution
- PITFALLS.md: Critical pitfalls and prevention strategies
- SUMMARY.md: Research synthesis with roadmap implications

Key findings:
- Stack: Modern 2026 async Python (FastAPI/Celery) + React/Three.js 3D frontend
- Architecture: Web-queue-worker pattern with sandboxed archiso builds
- Critical pitfall: Build sandboxing required from day one (CHAOS RAT AUR incident July 2025)

Recommended 9-phase roadmap: Infrastructure → Config → Dependency → Overlay → Build Queue → Frontend → Advanced SAT → 3D Viz → Optimization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 02:07:11 +00:00

Architecture Patterns: Linux Distribution Builder Platform

Domain: Web-based Linux distribution customization and ISO generation
Researched: 2026-01-25
Confidence: MEDIUM-HIGH

Executive Summary

Linux distribution builder platforms combine web interfaces with backend build systems, overlaying configuration layers onto base distributions to create customized bootable ISOs. Modern architectures (2026) leverage container-based immutable systems, asynchronous task queues, and SAT-solver dependency resolution. The Debate platform architecture aligns with established patterns from archiso, Universal Blue/Bazzite, and web-queue-worker patterns.

The Debate platform should follow a layered web-queue-worker architecture with these tiers:

┌─────────────────────────────────────────────────────────────────┐
│                    PRESENTATION LAYER                            │
│  React Frontend + Three.js 3D Visualization                      │
│  (User configuration interface, visual package builder)          │
└────────────────────┬────────────────────────────────────────────┘
                     │ HTTP/WebSocket
┌────────────────────▼────────────────────────────────────────────┐
│                      API LAYER                                   │
│  FastAPI (async endpoints, validation, session management)       │
└────────────────────┬────────────────────────────────────────────┘
                     │
         ┌───────────┼───────────┐
         │           │           │
┌────────▼──────┐ ┌─▼─────────┐ ┌▼───────────────┐
│  Dependency   │ │  Overlay  │ │  Build Queue   │
│  Resolver     │ │  Engine   │ │  Manager       │
│  (SAT solver) │ │  (Layers) │ │  (Celery)      │
└────────┬──────┘ └─┬─────────┘ └┬───────────────┘
         │          │             │
         └──────────┼─────────────┘
                    │
┌───────────────────▼─────────────────────────────────────────────┐
│                    PERSISTENCE LAYER                             │
│  PostgreSQL (config, user data, build metadata)                  │
│  Object Storage (ISO cache, build artifacts)                     │
└──────────────────────────────────────────────────────────────────┘
                    │
┌───────────────────▼─────────────────────────────────────────────┐
│                    BUILD EXECUTION LAYER                         │
│  Worker Nodes (Celery workers running archiso/mkarchiso)         │
│  - Profile generation                                            │
│  - Package installation to airootfs                              │
│  - Overlay application (OverlayFS concepts)                      │
│  - ISO generation with bootloader config                         │
└──────────────────────────────────────────────────────────────────┘

Component Boundaries

Core Components

| Component | Responsibility | Communicates With | State Management |
|---|---|---|---|
| React Frontend | User interaction, 3D visualization, configuration UI | API Layer (REST/WS) | Client-side state (React context/Redux) |
| Three.js Renderer | 3D package/layer visualization, visual debugging | React components | Scene state separate from app state |
| FastAPI Gateway | Request routing, validation, auth, session mgmt | All backend services | Stateless (session in DB/cache) |
| Dependency Resolver | Package conflict detection, SAT solving, suggestions | API Layer, Database | Computation-only (no persistent state) |
| Overlay Engine | Layer composition, configuration merging, precedence | Build Queue, Database | Configuration versioning in DB |
| Build Queue Manager | Job scheduling, worker coordination, priority mgmt | Celery broker (Redis/RabbitMQ) | Queue state in message broker |
| Celery Workers | ISO build execution, archiso orchestration | Build Queue, Object Storage | Job state tracked in result backend |
| PostgreSQL DB | User data, build configs, metadata, audit logs | All backend services | ACID transactional storage |
| Object Storage | ISO caching, build artifacts, profile storage | Workers, API (download endpoint) | Immutable blob storage |

Detailed Component Architecture

1. Presentation Layer (React + Three.js)

Purpose: Provide visual interface for distribution customization with 3D representation of layers.

Architecture Pattern:

  • State Management: Application state in React (configuration data) separate from scene state (3D objects). Changes flow from app state → scene rendering.
  • Performance: Use React Three Fiber (r3f) for declarative Three.js integration. Target 60 FPS, <100MB memory.
  • Optimization: InstancedMesh for repeated elements (packages), frustum culling, lazy loading with Suspense, GPU resource cleanup with dispose().
  • Model Format: GLTF/GLB for 3D assets.

Communication:

  • REST API for CRUD operations (save configuration, list builds)
  • WebSocket for real-time build progress updates
  • Server-Sent Events (SSE) alternative for progress streaming

2. API Layer (FastAPI)

Purpose: Asynchronous API gateway handling request validation, routing, and coordination.

Architecture Pattern:

  • Layered Structure: Separate routers (by domain), services (business logic), and data access layers.
  • Async I/O: Use async/await throughout to prevent blocking on database/queue operations.
  • Middleware: Custom logging, metrics, error handling middleware for observability.
  • Validation: Pydantic models for request/response validation.
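
To make the Pydantic validation bullet concrete, here is a minimal sketch of request models for the configuration endpoints. Model and field names are illustrative assumptions, not the platform's actual API:

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class LayerInput(BaseModel):
    """One of the five configuration layers (names are illustrative)."""
    layer_type: str                              # e.g. "platform", "rhetoric"
    layer_order: int = Field(ge=1, le=5)         # 1 = lowest precedence, 5 = highest
    packages: List[str] = []
    merge_strategy: str = "replace"


class ConfigInput(BaseModel):
    """Request body for POST /api/v1/configurations."""
    name: str = Field(min_length=1, max_length=255)
    description: Optional[str] = None
    layers: List[LayerInput] = []
```

FastAPI validates incoming JSON against these models automatically and returns a 422 response with field-level details on failure, so malformed configurations never reach the service layer.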

Endpoints:

  • /api/v1/configurations - CRUD for user configurations
  • /api/v1/packages - Package search, metadata, conflicts
  • /api/v1/builds - Submit build, query status, download ISO
  • /api/v1/layers - Layer definitions (Opening Statement, Platform, etc.)
  • /ws/builds/{build_id} - WebSocket for build progress

Performance: For I/O-bound operations, FastAPI's async model has benchmarked at roughly 3x the throughput of synchronous frameworks (2026 benchmarks).

3. Dependency Resolver

Purpose: Detect package conflicts, resolve dependencies, suggest alternatives using SAT solver algorithms.

Architecture Pattern:

  • SAT Solver Implementation: Use libsolv (openSUSE) or similar SAT-based approach. Translate package dependencies to logic clauses, apply CDCL algorithm.
  • Algorithm: Conflict-Driven Clause Learning (CDCL) solves NP-complete dependency problems in milliseconds for typical workloads.
  • Input: Package selection across 5 layers (Opening Statement, Platform, Rhetoric, Talking Points, Closing Argument).
  • Output: Valid package set or conflict report with suggested resolutions.

Data Structure:

Package Dependency Graph:
- Nodes: Packages (name, version, layer)
- Edges: Dependencies (requires, conflicts, provides, suggests)
- Constraints: Version ranges, mutual exclusions
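
The graph above can be sketched as a small in-memory structure; names and field choices are illustrative:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Package:
    """A node in the dependency graph."""
    name: str
    version: str
    layer: str                       # which of the 5 layers selected it
    requires: Tuple[str, ...] = ()   # hard dependencies
    conflicts: Tuple[str, ...] = ()  # mutual exclusions
    provides: Tuple[str, ...] = ()   # virtual capabilities


def build_graph(packages: List[Package]) -> Dict[str, Dict[str, List[str]]]:
    """Adjacency view of the graph: edges grouped by relation type."""
    return {
        pkg.name: {
            "requires": list(pkg.requires),
            "conflicts": list(pkg.conflicts),
            "provides": list(pkg.provides),
        }
        for pkg in packages
    }
```

Keeping packages immutable (frozen dataclasses) makes them hashable, which simplifies both SAT clause construction and result caching later on.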

Integration:

  • Called synchronously from API during configuration validation
  • Pre-compute common dependency sets for base layers (cache results)
  • Asynchronous deep resolution for full build validation
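
The pre-computation bullet can be sketched as a simple memoization layer. The resolver body here is a stand-in; a real implementation would invoke libsolv or a SAT library:

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple


@lru_cache(maxsize=128)
def resolve_base_set(package_names: Tuple[str, ...]) -> FrozenSet[str]:
    """Memoize resolution results for common base-layer selections.

    The key must be a sorted tuple so equivalent selections in any
    order share a single cache entry.
    """
    # Stand-in for the real SAT-based resolution step.
    return frozenset(package_names)


def resolve_cached(selection: List[str]) -> FrozenSet[str]:
    """Normalize the selection before hitting the cache."""
    return resolve_base_set(tuple(sorted(selection)))
```

The same normalize-then-memoize shape applies whether the cache lives in-process (as here) or in Redis keyed by a hash of the sorted selection.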

4. Overlay Engine

Purpose: Manage layered configuration packages, applying merge strategies and precedence rules.

Architecture Pattern:

  • Layer Model: 5 layers with defined precedence (Closing Argument > Talking Points > Rhetoric > Platform > Opening Statement).
  • OverlayFS Inspiration: Conceptually similar to OverlayFS union mounting, where upper layers override lower layers.
  • Configuration Merging: Files from higher layers replace/merge with lower layers based on merge strategy (replace, merge-append, merge-deep).

Layer Structure:

Layer Definition:
- id: unique identifier
- name: user-facing name (e.g., "Platform")
- order: precedence (1=lowest, 5=highest)
- packages: list of package selections
- files: custom files to overlay
- merge_strategy: how to handle conflicts

Merge Strategies:

  • Replace: Higher layer file completely replaces lower
  • Merge-Append: Concatenate files (e.g., package lists)
  • Merge-Deep: Smart merge (e.g., JSON/YAML key merging)
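
A minimal sketch of the Merge-Deep strategy, assuming configurations are plain nested dicts (as they would be after parsing JSON/YAML); the helper name deep_merge matches its use in the Overlay Engine pattern below:

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Merge-Deep strategy: recursively merge nested dicts.

    The overlay (higher-precedence layer) wins wherever both sides
    hold a scalar; nested dicts are merged key by key. Inputs are
    not mutated -- a new dict is returned.
    """
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into shared subtrees
        else:
            merged[key] = value                           # overlay replaces scalars/lists
    return merged
```

Note that lists are replaced wholesale here; if list concatenation is wanted, that is the Merge-Append strategy and should stay a separate code path.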

Output: Unified archiso profile with:

  • packages.x86_64 (merged package list)
  • airootfs/ directory (merged filesystem overlay)
  • profiledef.sh (combined metadata)


5. Build Queue Manager (Celery)

Purpose: Distributed task queue for asynchronous ISO build jobs with priority scheduling.

Architecture Pattern:

  • Web-Queue-Worker Pattern: Web frontend → Message queue → Worker pool
  • Message Broker: Redis (low latency) or RabbitMQ (high reliability) for job queue
  • Result Backend: Redis or PostgreSQL for job status/results
  • Worker Pool: Multiple Celery workers (one per build server core for CPU-bound builds)

Job Types:

  1. Quick Validation: Dependency resolution (seconds) - High priority
  2. Full Build: ISO generation (minutes) - Normal priority
  3. Cache Warming: Pre-build common configurations - Low priority

Scheduling:

  • Priority Queue: User-initiated builds > automated cache warming
  • Rate Limiting: Prevent queue flooding, enforce user quotas
  • Retry Logic: Automatic retry with exponential backoff for transient failures
  • Timeout: Per-job timeout (e.g., 30 min max for build)

Coordinator Pattern:

  • Single coordinator manages job assignment and worker health
  • Leader election for coordinator HA (if scaled beyond single instance)

Monitoring:

  • Job state transitions logged to PostgreSQL
  • Metrics: queue depth, worker utilization, average build time
  • Dead letter queue for failed jobs requiring manual investigation

6. Build Execution Workers (archiso-based)

Purpose: Execute ISO generation using archiso (mkarchiso) with custom profiles.

Architecture Pattern:

  • Profile-Based Build: Generate temporary archiso profile per build job
  • Isolation: Each build runs in isolated environment (separate working directory)
  • Stages: Profile generation → Package installation → Customization → ISO creation

Build Process Flow:

1. Profile Generation (Overlay Engine output)
   ├── Create temp directory
   ├── Write packages.x86_64 (merged package list)
   ├── Write profiledef.sh (metadata, permissions)
   ├── Copy airootfs/ overlay files
   └── Configure bootloaders (syslinux, grub, systemd-boot)

2. Package Installation
   ├── mkarchiso downloads packages (pacman cache)
   ├── Install to work_dir/x86_64/airootfs
   └── Apply package configurations

3. Customization (customize_airootfs.sh)
   ├── Enable systemd services
   ├── Apply user-specific configs
   ├── Run post-install scripts
   └── Set permissions

4. ISO Generation
   ├── Create kernel and initramfs images
   ├── Build squashfs filesystem
   ├── Assemble bootable ISO
   ├── Generate checksums
   └── Move to output directory

5. Post-Processing
   ├── Upload ISO to object storage
   ├── Update database (build status, ISO location)
   ├── Cache metadata for reuse
   └── Clean up working directory

Worker Configuration:

  • Resource Limits: 1 build per worker (CPU/memory intensive)
  • Concurrency: 6 workers max (6-core build server)
  • Working Directory: /tmp/archiso-tmp-{job_id} (cleaned after completion with -r flag)
  • Output Directory: Temporary → Object storage → Local cleanup

Optimizations:

  • Package Cache: Shared pacman cache across workers (prevent redundant downloads)
  • Layer Caching: Cache common base layers (Opening Statement variations)
  • Incremental Builds: Detect unchanged layers, reuse previous airootfs where possible

7. Persistence Layer (PostgreSQL + Object Storage)

Purpose: Store configuration data, build metadata, and build artifacts.

PostgreSQL Schema Design:

-- User configurations
CREATE SCHEMA configurations;

CREATE TABLE configurations.user_configs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE configurations.layers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    layer_type VARCHAR(50) NOT NULL, -- opening_statement, platform, rhetoric, etc.
    layer_order INT NOT NULL,
    merge_strategy VARCHAR(50) DEFAULT 'replace'
);

CREATE TABLE configurations.layer_packages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    package_name VARCHAR(255) NOT NULL,
    package_version VARCHAR(50),
    required BOOLEAN DEFAULT TRUE
);

CREATE TABLE configurations.layer_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    file_path VARCHAR(1024) NOT NULL, -- path in airootfs
    file_content TEXT, -- for small configs
    file_storage_url VARCHAR(2048), -- for large files in object storage
    permissions VARCHAR(4) DEFAULT '0644'
);

-- Build management
CREATE SCHEMA builds;

CREATE TABLE builds.build_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    status VARCHAR(50) NOT NULL, -- queued, running, success, failed
    priority INT DEFAULT 5,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    iso_url VARCHAR(2048), -- object storage location
    iso_checksum VARCHAR(128),
    error_message TEXT,
    build_log_url VARCHAR(2048)
);

CREATE TABLE builds.build_cache (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_hash VARCHAR(64) UNIQUE NOT NULL, -- hash of layer config
    iso_url VARCHAR(2048),
    created_at TIMESTAMP DEFAULT NOW(),
    last_accessed TIMESTAMP DEFAULT NOW(),
    access_count INT DEFAULT 0
);

-- Package metadata
CREATE SCHEMA packages;

CREATE TABLE packages.package_metadata (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    repository VARCHAR(100), -- core, extra, community, aur
    version VARCHAR(50),
    dependencies JSONB, -- {requires: [], conflicts: [], provides: []}
    last_updated TIMESTAMP DEFAULT NOW()
);

Schema Organization Best Practices (2026):

  • Separate schemas for functional areas (configurations, builds, packages)
  • Schema-level access control for security isolation
  • CI/CD integration with migration tools (Flyway, Alembic)
  • Indexes on frequently queried fields (config_id, status, config_hash)

Object Storage:

  • Purpose: Store ISOs (large files, 1-4GB), build logs, custom overlay files
  • Technology: S3-compatible (AWS S3, MinIO, Cloudflare R2)
  • Structure:
    • /isos/{build_id}.iso - Generated ISOs
    • /logs/{build_id}.log - Build logs
    • /overlays/{layer_id}/{file_path} - Custom files too large for DB
    • /cache/{config_hash}.iso - Cached ISOs for reuse
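
A sketch of uploading a finished ISO into this layout through an S3-compatible client. The bucket name and endpoint are deployment-specific assumptions; boto3 works unchanged against MinIO or R2:

```python
def iso_key(build_id: str) -> str:
    """Mirrors the /isos/{build_id}.iso layout above."""
    return f"isos/{build_id}.iso"


def upload_iso(local_path: str, build_id: str, bucket: str = "debate-artifacts") -> str:
    """Upload a built ISO and return its storage URL (names are illustrative)."""
    import boto3

    # endpoint_url points at any S3-compatible service (MinIO shown here).
    s3 = boto3.client("s3", endpoint_url="http://minio:9000")
    s3.upload_file(local_path, bucket, iso_key(build_id))
    return f"s3://{bucket}/{iso_key(build_id)}"
```

Only the returned URL and checksum go into PostgreSQL; the blob itself never touches the database (see Anti-Pattern 3).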

Data Flow

Configuration Creation Flow

User (Frontend)
    ↓ (1) Create/Edit configuration
API Layer (Validation)
    ↓ (2) Validate input
Dependency Resolver
    ↓ (3) Check conflicts
    ↓ (4) Return validation result
API Layer
    ↓ (5) Save configuration
PostgreSQL (configurations schema)
    ↓ (6) Return config_id
Frontend (Display confirmation)

Build Submission Flow

User (Frontend)
    ↓ (1) Submit build request
API Layer
    ↓ (2) Check cache (config hash)
PostgreSQL (build_cache)
    ├─→ (3a) Cache hit: return cached ISO URL
    └─→ (3b) Cache miss: create build job
Build Queue Manager (Celery)
    ↓ (4) Enqueue job with priority
Message Broker (Redis/RabbitMQ)
    ↓ (5) Job dispatched to worker
Celery Worker
    ↓ (6a) Fetch configuration from DB
    ↓ (6b) Generate archiso profile (Overlay Engine)
    ↓ (6c) Execute mkarchiso
    ↓ (6d) Upload ISO to object storage
    ↓ (6e) Update build status in DB
PostgreSQL + Object Storage
    ↓ (7) Job complete
API Layer (WebSocket)
    ↓ (8) Notify user
Frontend (Display download link)

Real-Time Progress Updates Flow

Celery Worker
    ↓ (1) Emit progress events during build
    ↓     (e.g., "downloading packages", "generating ISO")
Celery Result Backend
    ↓ (2) Store progress state
API Layer (WebSocket handler)
    ↓ (3) Poll/subscribe to job progress
    ↓ (4) Push updates to client
Frontend (WebSocket listener)
    ↓ (5) Update UI progress bar

Patterns to Follow

Pattern 1: Layered Configuration Precedence

What: Higher layers override lower layers with defined merge strategies.

When: User customizes configuration across multiple layers (Platform, Rhetoric, etc.).

Implementation:

from typing import List

class OverlayEngine:
    def merge_layers(self, layers: List[Layer]) -> Profile:
        """Merge layers from lowest to highest precedence."""
        sorted_layers = sorted(layers, key=lambda l: l.order)

        profile = Profile()
        for layer in sorted_layers:
            profile = self.apply_layer(profile, layer)

        return profile

    def apply_layer(self, profile: Profile, layer: Layer) -> Profile:
        """Apply layer based on merge strategy."""
        if layer.merge_strategy == "replace":
            profile.files.update(layer.files)  # Overwrite
        elif layer.merge_strategy == "merge-append":
            profile.packages.extend(layer.packages)  # Append
        elif layer.merge_strategy == "merge-deep":
            profile.config = deep_merge(profile.config, layer.config)

        return profile

Source: OverlayFS union mount concepts applied to configuration management.

Pattern 2: SAT-Based Dependency Resolution

What: Translate package dependencies to boolean satisfiability problem, solve with CDCL algorithm.

When: User adds package to configuration, system detects conflicts.

Implementation:

class DependencyResolver:
    def resolve(self, packages: List[Package]) -> Resolution:
        """Resolve dependencies using SAT solver."""
        clauses = self.build_clauses(packages)

        solver = SATSolver()
        result = solver.solve(clauses)

        if result.satisfiable:
            return Resolution(success=True, packages=result.model)
        else:
            conflicts = self.explain_conflicts(result.unsat_core)
            alternatives = self.suggest_alternatives(conflicts)
            return Resolution(success=False, conflicts=conflicts,
                            alternatives=alternatives)

    def build_clauses(self, packages: List[Package]) -> List[Clause]:
        """Convert the dependency graph to CNF clauses."""
        clauses = []
        for pkg in packages:
            # Selecting pkg forces each dependency:
            # Implies(pkg, dep) == (not pkg OR dep)
            for dep in pkg.requires:
                clauses.append(Implies(pkg, dep))
            # pkg and a conflicting package cannot both be selected:
            # Not(And(pkg, conflict)) == (not pkg OR not conflict)
            for conflict in pkg.conflicts:
                clauses.append(Not(And(pkg, conflict)))
        return clauses

Source: Libsolv implementation patterns

Pattern 3: Asynchronous Build Queue with Progress Tracking

What: Submit long-running build jobs to queue, track progress, notify on completion.

When: User submits build request (ISO generation takes minutes).

Implementation:

import subprocess
import uuid
from uuid import UUID

# API endpoint
@app.post("/api/v1/builds")
async def submit_build(config_id: UUID):
    # Check cache first
    cache_key = compute_hash(config_id)
    cached = await check_cache(cache_key)
    if cached:
        return {"status": "cached", "iso_url": cached.iso_url}

    # Enqueue build job (returns immediately; the worker runs asynchronously)
    job = build_iso.apply_async(
        args=[str(config_id)],  # stringify: UUIDs are not JSON-serializable by default
        priority=5,
        task_id=str(uuid.uuid4())
    )

    return {"status": "queued", "job_id": job.id}

# Celery task
@celery.task(bind=True)
def build_iso(self, config_id: str):
    self.update_state(state='DOWNLOADING', meta={'progress': 10})

    # Generate profile
    profile = overlay_engine.generate_profile(config_id)
    self.update_state(state='BUILDING', meta={'progress': 30})

    # Run mkarchiso (-r removes the work directory after the build)
    subprocess.run([
        'mkarchiso', '-v', '-r',
        '-w', f'/tmp/archiso-{self.request.id}',
        '-o', '/tmp/output',
        profile.path
    ], check=True)  # raise on non-zero exit so Celery marks the job failed
    self.update_state(state='UPLOADING', meta={'progress': 80})

    # Upload to object storage
    iso_url = upload_iso('/tmp/output/archlinux.iso')

    return {"iso_url": iso_url, "progress": 100}

Source: Celery best practices, Web-Queue-Worker pattern

Pattern 4: Cache-First Build Strategy

What: Hash configuration, check cache before building, reuse identical ISOs.

When: User submits build that may have been built previously.

Implementation:

import hashlib
import json
from datetime import datetime
from typing import Optional

def compute_config_hash(config_id: UUID) -> str:
    """Create a deterministic hash of a configuration."""
    config = db.query(Config).get(config_id)

    # Hash every layer, package, and file. Use sha256 for file contents:
    # Python's built-in hash() is salted per process and is NOT stable
    # across runs, which would silently defeat the cache.
    hash_input = {
        "layers": sorted([
            {
                "type": layer.type,
                "packages": sorted(layer.packages),
                "files": sorted([
                    {"path": f.path,
                     "content_hash": hashlib.sha256(f.content.encode()).hexdigest()}
                    for f in layer.files
                ], key=lambda x: x["path"])  # dicts need an explicit sort key
            }
            for layer in config.layers
        ], key=lambda x: x["type"])
    }

    return hashlib.sha256(
        json.dumps(hash_input, sort_keys=True).encode()
    ).hexdigest()

async def check_cache(config_hash: str) -> Optional[CachedBuild]:
    """Check if an ISO already exists for this configuration."""
    cached = await db.query(BuildCache).filter_by(
        config_hash=config_hash
    ).first()

    if cached and cached.iso_exists():
        # Update access metadata
        cached.last_accessed = datetime.now()
        cached.access_count += 1
        await db.commit()
        return cached

    return None

Benefit: Reduces build time from minutes to seconds for repeated configurations. Critical for popular base configurations (e.g., "KDE Desktop with development tools").

Anti-Patterns to Avoid

Anti-Pattern 1: Blocking API Calls During Build

What: Synchronously waiting for ISO build to complete in API endpoint.

Why bad: Ties up API worker for minutes, prevents handling other requests, poor user experience with timeout risks.

Instead: Use asynchronous task queue (Celery) with WebSocket/SSE for progress updates. API returns immediately with job_id, frontend polls or subscribes to updates.

Example:

# BAD: Blocking build
@app.post("/builds")
def build(config_id):
    iso = generate_iso(config_id)  # Takes 10 minutes!
    return {"iso_url": iso}

# GOOD: Async queue
@app.post("/builds")
async def build(config_id):
    job = build_iso.delay(config_id)
    return {"job_id": job.id, "status": "queued"}

Anti-Pattern 2: Duplicating State Between React and Three.js

What: Maintaining separate state trees for application data and 3D scene, manually syncing.

Why bad: State gets out of sync, bugs from inconsistent data, complexity in update logic.

Instead: Single source of truth in React state. Scene derives from state. User interactions → dispatch actions → update state → scene re-renders.

Example:

// BAD: Separate state
const [appState, setAppState] = useState({packages: []});
const [sceneObjects, setSceneObjects] = useState([]);

// GOOD: Scene derives from app state
const [config, setConfig] = useState({packages: []});

function Scene({packages}) {
  return packages.map(pkg => <PackageMesh key={pkg.id} {...pkg} />);
}

Source: React Three Fiber state management best practices

Anti-Pattern 3: Storing Large Files in PostgreSQL

What: Storing ISO files (1-4GB) or build logs (megabytes) as BYTEA in PostgreSQL.

Why bad: Database bloat, slow backups, memory pressure, poor performance for large blob operations.

Instead: Store large files in object storage (S3/MinIO), keep URLs/metadata in PostgreSQL.

Example:

-- BAD: ISO in database
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_data BYTEA  -- 2GB blob!
);

-- GOOD: URL reference
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_url VARCHAR(2048),  -- s3://bucket/isos/{id}.iso
    iso_checksum VARCHAR(128),
    iso_size_bytes BIGINT
);

Anti-Pattern 4: Running Multiple Builds Per Worker Concurrently

What: Allowing a single Celery worker to process multiple ISO builds in parallel.

Why bad: ISO generation is CPU and memory intensive (compressing filesystem, creating squashfs). Running multiple builds causes resource contention, thrashing, and OOM kills.

Instead: Configure Celery workers with concurrency=1 for build tasks. Run one build per worker. Scale horizontally with multiple workers.

Example:

# BAD: Multiple concurrent builds
celery -A app worker --concurrency=4  # 4 builds at once on 6-core machine

# GOOD: One build per worker
celery -A app worker --concurrency=1 -Q builds  # Start 6 workers for 6 cores

Anti-Pattern 5: No Dependency Validation Until Build Time

What: Allowing users to save configurations without checking package conflicts, discovering issues during ISO build.

Why bad: Wastes build resources (minutes of CPU time), poor user experience (delayed error feedback), difficult to debug which package caused failure.

Instead: Run dependency resolution in API layer during configuration save/update. Provide immediate feedback with conflict explanations and alternatives.

Example:

# BAD: Validate during build
@celery.task
def build_iso(config_id):
    packages = load_packages(config_id)
    result = resolve_dependencies(packages)  # Fails here after queueing!
    if not result.valid:
        raise BuildError("Conflicts detected")

# GOOD: Validate on save
@app.post("/configs")
async def save_config(config: ConfigInput):
    resolution = dependency_resolver.resolve(config.packages)
    if not resolution.valid:
        return {"error": "conflicts", "details": resolution.conflicts}

    await db.save(config)
    return {"success": True}

Scalability Considerations

| Concern | At 100 users | At 10K users | At 1M users |
|---|---|---|---|
| API Layer | Single FastAPI instance | Multiple instances behind load balancer | Auto-scaling group, CDN for static assets |
| Build Queue | Single Redis broker | Redis cluster or RabbitMQ | Kafka for high-throughput messaging |
| Workers | 1 build server (6 cores) | 3-5 build servers | Auto-scaling worker pool, spot instances |
| Database | Single PostgreSQL instance | Primary + read replicas | Sharded PostgreSQL or distributed SQL (CockroachDB) |
| Storage | Local MinIO | S3-compatible with CDN | Multi-region S3 with CloudFront |
| Caching | In-memory cache | Redis cache cluster | Multi-tier cache (Redis + CDN) |

Horizontal Scaling Strategy

API Layer:

  • Stateless FastAPI instances (session in DB/Redis)
  • Load balancer (Nginx, HAProxy, AWS ALB)
  • Auto-scaling based on CPU/request latency

Build Workers:

  • Independent Celery workers connecting to shared broker
  • Each worker runs 1 build at a time
  • Scale workers based on queue depth (add workers when >10 jobs queued)
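
The queue-depth rule can be isolated as a pure scaling decision. Thresholds and the cap are illustrative; in production the depth would come from the broker (e.g. Redis `LLEN` on the Celery queue list):

```python
def workers_to_add(queue_depth: int, active_workers: int,
                   max_workers: int = 6, threshold: int = 10) -> int:
    """How many workers to start: one per `threshold` queued jobs, capped.

    Pure function so the policy is trivially unit-testable; wiring it
    to real queue metrics is a separate concern.
    """
    if queue_depth <= threshold:
        return 0  # queue is shallow enough; don't scale
    desired = min(max_workers, active_workers + queue_depth // threshold)
    return max(0, desired - active_workers)
```

Keeping the policy separate from the metric source also makes it easy to swap in hysteresis (scale down only after N quiet intervals) without touching broker code.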

Database:

  • Read replicas for queries (config lookups)
  • Write operations to primary (build status updates)
  • Connection pooling (PgBouncer)

Storage:

  • Object storage is inherently scalable
  • CDN for ISO downloads (reduce egress costs)
  • Lifecycle policies (delete ISOs older than 30 days if not accessed)

Build Order Implications for Development

Phase 1: Core Infrastructure

What to build: Database schema, basic API scaffolding, object storage setup.
Why first: Foundation for all other components. No dependencies on complex logic.
Duration estimate: 1-2 weeks

Phase 2: Configuration Management

What to build: Layer data models, CRUD endpoints, basic validation.
Why second: Enables testing configuration storage before complex dependency resolution.
Duration estimate: 1-2 weeks

Phase 3: Dependency Resolver (Simplified)

What to build: Basic conflict detection (direct conflicts only, no SAT solver yet).
Why third: Provides early validation capability. Full SAT solver can wait.
Duration estimate: 1 week

Phase 4: Overlay Engine

What to build: Layer merging logic, profile generation for archiso.
Why fourth: Requires configuration data models from Phase 2. Produces profiles for builds.
Duration estimate: 2 weeks

Phase 5: Build Queue + Workers

What to build: Celery setup, basic build task, worker orchestration.
Why fifth: Depends on Overlay Engine for profile generation. Core value delivery.
Duration estimate: 2-3 weeks

Phase 6: Frontend (Basic)

What to build: React UI for configuration (forms, no 3D yet), build submission.
Why sixth: API must exist first. Provides usable interface for testing builds.
Duration estimate: 2-3 weeks

Phase 7: Advanced Dependency Resolution

What to build: Full SAT solver integration, conflict explanations, alternatives.
Why seventh: Complex feature. System works with basic validation from Phase 3.
Duration estimate: 2-3 weeks

Phase 8: 3D Visualization

What to build: Three.js integration, layer visualization, visual debugging.
Why eighth: Polish/differentiator feature. Core functionality works without it.
Duration estimate: 3-4 weeks

Phase 9: Caching + Optimization

What to build: Build cache, package cache, performance tuning.
Why ninth: Optimization after core features work. Requires usage data to tune.
Duration estimate: 1-2 weeks

Total estimated duration: 15-22 weeks (roughly 4-5 months), summing the per-phase estimates above.

Critical Architectural Decisions

Decision 1: Message Broker (Redis vs RabbitMQ)

Recommendation: Start with Redis, migrate to RabbitMQ if reliability requirements increase.

Rationale:

  • Redis: Lower latency, simpler setup, sufficient for <10K builds/day
  • RabbitMQ: Higher reliability, message persistence, better for >100K builds/day

When to switch: If experiencing message loss or need guaranteed delivery.

Decision 2: Container-Based vs. Direct archiso

Recommendation: Use direct archiso (mkarchiso) on bare metal workers initially.

Rationale:

  • Container-based (like Bazzite/Universal Blue) adds complexity (OCI image builds)
  • Direct archiso is simpler, well-documented, less abstraction
  • Can containerize workers later if isolation/portability becomes critical

When to reconsider: Multi-cloud deployment or need strong isolation between builds.

Decision 3: Monolithic vs. Microservices API

Recommendation: Start monolithic (single FastAPI app), split services if scaling demands.

Rationale:

  • Monolith: Faster development, easier debugging, sufficient for <100K users
  • Microservices: Adds operational complexity (service mesh, inter-service communication)

When to split: If specific services (e.g., dependency resolver) need independent scaling.

Decision 4: Real-Time Updates (WebSocket vs. SSE vs. Polling)

Recommendation: Use Server-Sent Events (SSE) for build progress.

Rationale:

  • WebSocket: Bidirectional, but overkill for one-way progress updates
  • SSE: Simpler, built-in reconnection, sufficient for progress streaming
  • Polling: Wasteful, higher latency

Implementation:

# EventSourceResponse is provided by the sse-starlette package, not FastAPI itself
import asyncio
import json

from sse_starlette.sse import EventSourceResponse

@app.get("/api/v1/builds/{job_id}/stream")
async def stream_progress(job_id: str):
    async def event_generator():
        while True:
            status = await get_job_status(job_id)
            # EventSourceResponse adds the "data: ...\n\n" SSE framing itself
            yield json.dumps(status)
            if status['state'] in ['SUCCESS', 'FAILURE']:
                break
            await asyncio.sleep(1)

    return EventSourceResponse(event_generator())

Confidence Assessment

  • Overall Architecture: MEDIUM-HIGH - Based on established patterns (web-queue-worker, archiso) with modern 2026 practices
  • Component Boundaries: HIGH - Clear separation of concerns, well-defined interfaces
  • Build Process: HIGH - archiso is well-documented, multiple reference implementations
  • Dependency Resolution: MEDIUM - SAT solver approach is proven, but integration complexity unknown
  • Scalability: MEDIUM - Patterns are sound, but specific bottlenecks depend on usage patterns
  • Frontend 3D: MEDIUM - Three.js + React patterns established, but performance depends on complexity