debate/.planning/research/ARCHITECTURE.md
Mikkel Georgsen c0ff95951e docs: add project research
Files:
- STACK.md: Technology stack recommendations (Python 3.12+, FastAPI, React 19+, Vite, Celery, PostgreSQL 18+)
- FEATURES.md: Feature landscape analysis (table stakes vs differentiators)
- ARCHITECTURE.md: Layered web-queue-worker architecture with SAT-based dependency resolution
- PITFALLS.md: Critical pitfalls and prevention strategies
- SUMMARY.md: Research synthesis with roadmap implications

Key findings:
- Stack: Modern 2026 async Python (FastAPI/Celery) + React/Three.js 3D frontend
- Architecture: Web-queue-worker pattern with sandboxed archiso builds
- Critical pitfall: Build sandboxing required from day one (CHAOS RAT AUR incident July 2025)

Recommended 9-phase roadmap: Infrastructure → Config → Dependency → Overlay → Build Queue → Frontend → Advanced SAT → 3D Viz → Optimization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 02:07:11 +00:00


# Architecture Patterns: Linux Distribution Builder Platform
**Domain:** Web-based Linux distribution customization and ISO generation
**Researched:** 2026-01-25
**Confidence:** MEDIUM-HIGH
## Executive Summary
Linux distribution builder platforms combine web interfaces with backend build systems, overlaying configuration layers onto base distributions to create customized bootable ISOs. Modern architectures (2026) leverage container-based immutable systems, asynchronous task queues, and SAT-solver dependency resolution. The Debate platform architecture aligns with established patterns from archiso and Universal Blue/Bazzite, and with the web-queue-worker pattern.
## Recommended Architecture
The Debate platform should follow a **layered web-queue-worker architecture** with these tiers:
```
┌─────────────────────────────────────────────────────────────────┐
│                       PRESENTATION LAYER                        │
│           React Frontend + Three.js 3D Visualization            │
│     (User configuration interface, visual package builder)      │
└────────────────────┬────────────────────────────────────────────┘
                     │ HTTP/WebSocket
┌────────────────────▼────────────────────────────────────────────┐
│                            API LAYER                            │
│    FastAPI (async endpoints, validation, session management)    │
└────────────────────┬────────────────────────────────────────────┘
         ┌───────────┼────────────┐
         │           │            │
┌────────▼──────┐  ┌─▼─────────┐ ┌▼───────────────┐
│  Dependency   │  │  Overlay  │ │  Build Queue   │
│   Resolver    │  │  Engine   │ │    Manager     │
│ (SAT solver)  │  │ (Layers)  │ │   (Celery)     │
└────────┬──────┘  └─┬─────────┘ └┬───────────────┘
         │           │            │
         └───────────┼────────────┘
┌────────────────────▼────────────────────────────────────────────┐
│                        PERSISTENCE LAYER                        │
│         PostgreSQL (config, user data, build metadata)          │
│           Object Storage (ISO cache, build artifacts)           │
└─────────────────────────────────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────────────┐
│                      BUILD EXECUTION LAYER                      │
│     Worker Nodes (Celery workers running archiso/mkarchiso)     │
│  - Profile generation                                           │
│  - Package installation to airootfs                             │
│  - Overlay application (OverlayFS concepts)                     │
│  - ISO generation with bootloader config                        │
└─────────────────────────────────────────────────────────────────┘
```
## Component Boundaries
### Core Components
| Component | Responsibility | Communicates With | State Management |
|-----------|---------------|-------------------|------------------|
| **React Frontend** | User interaction, 3D visualization, configuration UI | API Layer (REST/WS) | Client-side state (React context/Redux) |
| **Three.js Renderer** | 3D package/layer visualization, visual debugging | React components | Scene state separate from app state |
| **FastAPI Gateway** | Request routing, validation, auth, session mgmt | All backend services | Stateless (session in DB/cache) |
| **Dependency Resolver** | Package conflict detection, SAT solving, suggestions | API Layer, Database | Computation-only (no persistent state) |
| **Overlay Engine** | Layer composition, configuration merging, precedence | Build Queue, Database | Configuration versioning in DB |
| **Build Queue Manager** | Job scheduling, worker coordination, priority mgmt | Celery broker (Redis/RabbitMQ) | Queue state in message broker |
| **Celery Workers** | ISO build execution, archiso orchestration | Build Queue, Object Storage | Job state tracked in result backend |
| **PostgreSQL DB** | User data, build configs, metadata, audit logs | All backend services | ACID transactional storage |
| **Object Storage** | ISO caching, build artifacts, profile storage | Workers, API (download endpoint) | Immutable blob storage |
### Detailed Component Architecture
#### 1. Presentation Layer (React + Three.js)
**Purpose:** Provide visual interface for distribution customization with 3D representation of layers.
**Architecture Pattern:**
- **State Management:** Application state in React (configuration data) separate from scene state (3D objects). Changes flow from app state → scene rendering.
- **Performance:** Use React Three Fiber (r3f) for declarative Three.js integration. Target 60 FPS, <100MB memory.
- **Optimization:** InstancedMesh for repeated elements (packages), frustum culling, lazy loading with Suspense, GPU resource cleanup with dispose().
- **Model Format:** GLTF/GLB for 3D assets.
**Communication:**
- REST API for CRUD operations (save configuration, list builds)
- WebSocket for real-time build progress updates
- Server-Sent Events (SSE) as an alternative for progress streaming
**Sources:**
- [React Three Fiber vs. Three.js Performance Guide 2026](https://graffersid.com/react-three-fiber-vs-three-js/)
- [3D Data Visualization with React and Three.js](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432)
#### 2. API Layer (FastAPI)
**Purpose:** Asynchronous API gateway handling request validation, routing, and coordination.
**Architecture Pattern:**
- **Layered Structure:** Separate routers (by domain), services (business logic), and data access layers.
- **Async I/O:** Use async/await throughout to prevent blocking on database/queue operations.
- **Middleware:** Custom logging, metrics, error handling middleware for observability.
- **Validation:** Pydantic models for request/response validation.
**Endpoints:**
- `/api/v1/configurations` - CRUD for user configurations
- `/api/v1/packages` - Package search, metadata, conflicts
- `/api/v1/builds` - Submit build, query status, download ISO
- `/api/v1/layers` - Layer definitions (Opening Statement, Platform, etc.)
- `/ws/builds/{build_id}` - WebSocket for build progress
**Performance:** FastAPI's async model substantially outperforms synchronous frameworks for I/O-bound operations; 2026 benchmarks report improvements of up to roughly 300%, though results vary by workload.
**Sources:**
- [Modern FastAPI Architecture Patterns 2026](https://medium.com/algomart/modern-fastapi-architecture-patterns-for-scalable-production-systems-41a87b165a8b)
- [FastAPI for Microservices 2025](https://talent500.com/blog/fastapi-microservices-python-api-design-patterns-2025/)
#### 3. Dependency Resolver
**Purpose:** Detect package conflicts, resolve dependencies, suggest alternatives using SAT solver algorithms.
**Architecture Pattern:**
- **SAT Solver Implementation:** Use libsolv (openSUSE) or similar SAT-based approach. Translate package dependencies to logic clauses, apply CDCL algorithm.
- **Algorithm:** Conflict-Driven Clause Learning (CDCL). Dependency resolution is NP-complete in general, but CDCL solvers handle typical package workloads in milliseconds.
- **Input:** Package selection across 5 layers (Opening Statement, Platform, Rhetoric, Talking Points, Closing Argument).
- **Output:** Valid package set or conflict report with suggested resolutions.
**Data Structure:**
```
Package Dependency Graph:
- Nodes: Packages (name, version, layer)
- Edges: Dependencies (requires, conflicts, provides, suggests)
- Constraints: Version ranges, mutual exclusions
```
**Integration:**
- Called synchronously from API during configuration validation
- Pre-compute common dependency sets for base layers (cache results)
- Asynchronous deep resolution for full build validation
**Sources:**
- [Libsolv SAT Solver](https://github.com/openSUSE/libsolv)
- [Version SAT Research](https://research.swtch.com/version-sat)
- [Dependency Resolution Made Simple](https://borretti.me/article/dependency-resolution-made-simple)
#### 4. Overlay Engine
**Purpose:** Manage layered configuration packages, applying merge strategies and precedence rules.
**Architecture Pattern:**
- **Layer Model:** 5 layers with defined precedence (Closing Argument > Talking Points > Rhetoric > Platform > Opening Statement).
- **OverlayFS Inspiration:** Conceptually similar to OverlayFS union mounting, where upper layers override lower layers.
- **Configuration Merging:** Files from higher layers replace/merge with lower layers based on merge strategy (replace, merge-append, merge-deep).
**Layer Structure:**
```
Layer Definition:
- id: unique identifier
- name: user-facing name (e.g., "Platform")
- order: precedence (1=lowest, 5=highest)
- packages: list of package selections
- files: custom files to overlay
- merge_strategy: how to handle conflicts
```
**Merge Strategies:**
- **Replace:** Higher layer file completely replaces lower
- **Merge-Append:** Concatenate files (e.g., package lists)
- **Merge-Deep:** Smart merge (e.g., JSON/YAML key merging)
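The merge-deep strategy can be illustrated with a small recursive dictionary merge (a simplified sketch; the policy for lists and type mismatches shown here, "upper replaces", is an assumption that would need a project-specific decision):

```python
def deep_merge(lower: dict, upper: dict) -> dict:
    """Recursively merge `upper` onto `lower`; upper-layer keys win on conflict."""
    merged = dict(lower)
    for key, value in upper.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested maps
        else:
            merged[key] = value                           # replace scalars/lists
    return merged
```

For example, a Platform layer setting only `locale.keymap` would override that one key while keeping the base layer's `locale.lang` intact.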
**Output:** Unified archiso profile with:
- `packages.x86_64` (merged package list)
- `airootfs/` directory (merged filesystem overlay)
- `profiledef.sh` (combined metadata)
**Sources:**
- [OverlayFS Linux Kernel Documentation](https://docs.kernel.org/filesystems/overlayfs.html)
- [OverlayFS ArchWiki](https://wiki.archlinux.org/title/Overlay_filesystem)
#### 5. Build Queue Manager (Celery)
**Purpose:** Distributed task queue for asynchronous ISO build jobs with priority scheduling.
**Architecture Pattern:**
- **Web-Queue-Worker Pattern:** Web frontend → Message queue → Worker pool
- **Message Broker:** Redis (low latency) or RabbitMQ (high reliability) for job queue
- **Result Backend:** Redis or PostgreSQL for job status/results
- **Worker Pool:** Multiple Celery workers (one per build server core for CPU-bound builds)
**Job Types:**
1. **Quick Validation:** Dependency resolution (seconds) - High priority
2. **Full Build:** ISO generation (minutes) - Normal priority
3. **Cache Warming:** Pre-build common configurations - Low priority
**Scheduling:**
- **Priority Queue:** User-initiated builds > automated cache warming
- **Rate Limiting:** Prevent queue flooding, enforce user quotas
- **Retry Logic:** Automatic retry with exponential backoff for transient failures
- **Timeout:** Per-job timeout (e.g., 30 min max for build)
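The retry policy's delays can be computed as below (a sketch; Celery's task options `retry_backoff`, `retry_backoff_max`, and `retry_jitter` provide this behavior natively, and the base/cap values here are illustrative assumptions):

```python
import random

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0,
                  jitter: bool = True) -> float:
    """Delay before retry `attempt` (0-based): base * 2**attempt, capped,
    with optional full jitter to avoid thundering-herd retries."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay
```

Without jitter the sequence is 2, 4, 8, 16, ... seconds, capped at 300; jitter spreads simultaneous failures so workers don't all retry at once.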
**Coordinator Pattern:**
- Single coordinator manages job assignment and worker health
- Leader election for coordinator HA (if scaled beyond single instance)
**Monitoring:**
- Job state transitions logged to PostgreSQL
- Metrics: queue depth, worker utilization, average build time
- Dead letter queue for failed jobs requiring manual investigation
**Sources:**
- [Celery Distributed Task Queue](https://docs.celeryq.dev/)
- [Design Distributed Job Scheduler](https://www.systemdesignhandbook.com/guides/design-a-distributed-job-scheduler/)
- [Web-Queue-Worker Architecture - Azure](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker)
#### 6. Build Execution Workers (archiso-based)
**Purpose:** Execute ISO generation using archiso (mkarchiso) with custom profiles.
**Architecture Pattern:**
- **Profile-Based Build:** Generate temporary archiso profile per build job
- **Isolation:** Each build runs in isolated environment (separate working directory)
- **Stages:** Profile generation → Package installation → Customization → ISO creation
**Build Process Flow:**
```
1. Profile Generation (Overlay Engine output)
   ├── Create temp directory
   ├── Write packages.x86_64 (merged package list)
   ├── Write profiledef.sh (metadata, permissions)
   ├── Copy airootfs/ overlay files
   └── Configure bootloaders (syslinux, grub, systemd-boot)

2. Package Installation
   ├── mkarchiso downloads packages (pacman cache)
   ├── Install to work_dir/x86_64/airootfs
   └── Apply package configurations

3. Customization (customize_airootfs.sh)
   ├── Enable systemd services
   ├── Apply user-specific configs
   ├── Run post-install scripts
   └── Set permissions

4. ISO Generation
   ├── Create kernel and initramfs images
   ├── Build squashfs filesystem
   ├── Assemble bootable ISO
   ├── Generate checksums
   └── Move to output directory

5. Post-Processing
   ├── Upload ISO to object storage
   ├── Update database (build status, ISO location)
   ├── Cache metadata for reuse
   └── Clean up working directory
```
**Worker Configuration:**
- **Resource Limits:** 1 build per worker (CPU/memory intensive)
- **Concurrency:** 6 workers max (6-core build server)
- **Working Directory:** `/tmp/archiso-tmp-{job_id}` (cleaned after completion with -r flag)
- **Output Directory:** Temporary → Object storage → Local cleanup
**Optimizations:**
- **Package Cache:** Shared pacman cache across workers (prevent redundant downloads)
- **Layer Caching:** Cache common base layers (Opening Statement variations)
- **Incremental Builds:** Detect unchanged layers, reuse previous airootfs where possible
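The incremental-build idea can be sketched as a per-layer hash manifest: digest each layer's inputs, compare against the manifest saved by the previous build, and rebuild only from the first changed layer (hypothetical helpers, not part of archiso):

```python
import hashlib
import json

def layer_hash(layer: dict) -> str:
    """Deterministic digest of a layer's packages/files description."""
    canonical = json.dumps(layer, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def first_changed_layer(layers: list[dict], previous_manifest: list[str]) -> int:
    """Index of the first layer whose hash differs from the last build;
    a return value of len(layers) means everything is reusable."""
    current = [layer_hash(layer) for layer in layers]
    for i, digest in enumerate(current):
        if i >= len(previous_manifest) or previous_manifest[i] != digest:
            return i          # rebuild from this layer upward
    return len(current)       # nothing changed
```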
**Sources:**
- [Archiso ArchWiki](https://wiki.archlinux.org/title/Archiso)
- [Custom Archiso Tutorial](https://serverless.industries/2024/12/30/custom-archiso.en.html)
#### 7. Persistence Layer (PostgreSQL + Object Storage)
**Purpose:** Store configuration data, build metadata, and build artifacts.
**PostgreSQL Schema Design:**
```sql
-- User configurations
CREATE SCHEMA configurations;

CREATE TABLE configurations.user_configs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE configurations.layers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    layer_type VARCHAR(50) NOT NULL,  -- opening_statement, platform, rhetoric, etc.
    layer_order INT NOT NULL,
    merge_strategy VARCHAR(50) DEFAULT 'replace'
);

CREATE TABLE configurations.layer_packages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    package_name VARCHAR(255) NOT NULL,
    package_version VARCHAR(50),
    required BOOLEAN DEFAULT TRUE
);

CREATE TABLE configurations.layer_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    file_path VARCHAR(1024) NOT NULL,  -- path in airootfs
    file_content TEXT,                 -- for small configs
    file_storage_url VARCHAR(2048),    -- for large files in object storage
    permissions VARCHAR(4) DEFAULT '0644'
);

-- Build management
CREATE SCHEMA builds;

CREATE TABLE builds.build_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    status VARCHAR(50) NOT NULL,  -- queued, running, success, failed
    priority INT DEFAULT 5,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    iso_url VARCHAR(2048),        -- object storage location
    iso_checksum VARCHAR(128),
    error_message TEXT,
    build_log_url VARCHAR(2048)
);

CREATE TABLE builds.build_cache (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_hash VARCHAR(64) UNIQUE NOT NULL,  -- hash of layer config
    iso_url VARCHAR(2048),
    created_at TIMESTAMP DEFAULT NOW(),
    last_accessed TIMESTAMP DEFAULT NOW(),
    access_count INT DEFAULT 0
);

-- Package metadata
CREATE SCHEMA packages;

CREATE TABLE packages.package_metadata (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    repository VARCHAR(100),  -- core, extra, community, aur
    version VARCHAR(50),
    dependencies JSONB,       -- {requires: [], conflicts: [], provides: []}
    last_updated TIMESTAMP DEFAULT NOW()
);
```
**Schema Organization Best Practices (2026):**
- Separate schemas for functional areas (configurations, builds, packages)
- Schema-level access control for security isolation
- CI/CD integration with migration tools (Flyway, Alembic)
- Indexes on frequently queried fields (config_id, status, config_hash)
**Object Storage:**
- **Purpose:** Store ISOs (large files, 1-4GB), build logs, custom overlay files
- **Technology:** S3-compatible (AWS S3, MinIO, Cloudflare R2)
- **Structure:**
- `/isos/{build_id}.iso` - Generated ISOs
- `/logs/{build_id}.log` - Build logs
- `/overlays/{layer_id}/{file_path}` - Custom files too large for DB
- `/cache/{config_hash}.iso` - Cached ISOs for reuse
**Sources:**
- [PostgreSQL Schema Design Best Practices 2026](https://wiki.postgresql.org/wiki/Database_Schema_Recommendations_for_an_Application)
- [SQL Database Fundamentals 2026](https://www.nucamp.co/blog/sql-and-database-fundamentals-in-2026-queries-design-and-postgresql-essentials)
## Data Flow
### Configuration Creation Flow
```
User (Frontend)
   ↓ (1) Create/Edit configuration
API Layer (Validation)
   ↓ (2) Validate input
Dependency Resolver
   ↓ (3) Check conflicts
   ↓ (4) Return validation result
API Layer
   ↓ (5) Save configuration
PostgreSQL (configurations schema)
   ↓ (6) Return config_id
Frontend (Display confirmation)
```
### Build Submission Flow
```
User (Frontend)
   ↓ (1) Submit build request
API Layer
   ↓ (2) Check cache (config hash)
PostgreSQL (build_cache)
   ├─→ (3a) Cache hit: return cached ISO URL
   └─→ (3b) Cache miss: create build job
Build Queue Manager (Celery)
   ↓ (4) Enqueue job with priority
Message Broker (Redis/RabbitMQ)
   ↓ (5) Job dispatched to worker
Celery Worker
   ↓ (6a) Fetch configuration from DB
   ↓ (6b) Generate archiso profile (Overlay Engine)
   ↓ (6c) Execute mkarchiso
   ↓ (6d) Upload ISO to object storage
   ↓ (6e) Update build status in DB
PostgreSQL + Object Storage
   ↓ (7) Job complete
API Layer (WebSocket)
   ↓ (8) Notify user
Frontend (Display download link)
```
### Real-Time Progress Updates Flow
```
Celery Worker
   ↓ (1) Emit progress events during build
   ↓     (e.g., "downloading packages", "generating ISO")
Celery Result Backend
   ↓ (2) Store progress state
API Layer (WebSocket handler)
   ↓ (3) Poll/subscribe to job progress
   ↓ (4) Push updates to client
Frontend (WebSocket listener)
   ↓ (5) Update UI progress bar
```
## Patterns to Follow
### Pattern 1: Layered Configuration Precedence
**What:** Higher layers override lower layers with defined merge strategies.
**When:** User customizes configuration across multiple layers (Platform, Rhetoric, etc.).
**Implementation:**
```python
class OverlayEngine:
    def merge_layers(self, layers: List[Layer]) -> Profile:
        """Merge layers from lowest to highest precedence."""
        sorted_layers = sorted(layers, key=lambda l: l.order)
        profile = Profile()
        for layer in sorted_layers:
            profile = self.apply_layer(profile, layer)
        return profile

    def apply_layer(self, profile: Profile, layer: Layer) -> Profile:
        """Apply layer based on merge strategy."""
        if layer.merge_strategy == "replace":
            profile.files.update(layer.files)  # Overwrite
        elif layer.merge_strategy == "merge-append":
            profile.packages.extend(layer.packages)  # Append
        elif layer.merge_strategy == "merge-deep":
            profile.config = deep_merge(profile.config, layer.config)
        return profile
```
**Source:** OverlayFS union mount concepts applied to configuration management.
### Pattern 2: SAT-Based Dependency Resolution
**What:** Translate package dependencies to boolean satisfiability problem, solve with CDCL algorithm.
**When:** User adds package to configuration, system detects conflicts.
**Implementation:**
```python
class DependencyResolver:
    def resolve(self, packages: List[Package]) -> Resolution:
        """Resolve dependencies using SAT solver."""
        clauses = self.build_clauses(packages)
        solver = SATSolver()
        result = solver.solve(clauses)
        if result.satisfiable:
            return Resolution(success=True, packages=result.model)
        else:
            conflicts = self.explain_conflicts(result.unsat_core)
            alternatives = self.suggest_alternatives(conflicts)
            return Resolution(success=False, conflicts=conflicts,
                              alternatives=alternatives)

    def build_clauses(self, packages: List[Package]) -> List[Clause]:
        """Convert dependency graph to CNF clauses."""
        clauses = []
        for pkg in packages:
            # If package selected, all dependencies must be selected:
            # pkg -> dep, i.e. the clause (¬pkg ∨ dep)
            for dep in pkg.requires:
                clauses.append(Implies(pkg, dep))
            # If package selected, no conflicts can be selected:
            # ¬(pkg ∧ conflict), i.e. the clause (¬pkg ∨ ¬conflict)
            for conflict in pkg.conflicts:
                clauses.append(Not(And(pkg, conflict)))
        return clauses
```
**Source:** [Libsolv implementation patterns](https://github.com/openSUSE/libsolv)
### Pattern 3: Asynchronous Build Queue with Progress Tracking
**What:** Submit long-running build jobs to queue, track progress, notify on completion.
**When:** User submits build request (ISO generation takes minutes).
**Implementation:**
```python
import subprocess
import uuid

# API endpoint
@app.post("/api/v1/builds")
async def submit_build(config_id: UUID, background_tasks: BackgroundTasks):
    # Check cache first
    cache_key = compute_hash(config_id)
    cached = await check_cache(cache_key)
    if cached:
        return {"status": "cached", "iso_url": cached.iso_url}
    # Enqueue build job
    job = build_iso.apply_async(
        args=[config_id],
        priority=5,
        task_id=str(uuid.uuid4())
    )
    return {"status": "queued", "job_id": job.id}

# Celery task
@celery.task(bind=True)
def build_iso(self, config_id: UUID):
    self.update_state(state='DOWNLOADING', meta={'progress': 10})
    # Generate profile
    profile = overlay_engine.generate_profile(config_id)
    self.update_state(state='BUILDING', meta={'progress': 30})
    # Run mkarchiso; check=True raises on failure so Celery marks the task failed
    subprocess.run([
        'mkarchiso', '-v', '-r',
        '-w', f'/tmp/archiso-{self.request.id}',
        '-o', '/tmp/output',
        profile.path
    ], check=True)
    self.update_state(state='UPLOADING', meta={'progress': 80})
    # Upload to object storage
    iso_url = upload_iso('/tmp/output/archlinux.iso')
    return {"iso_url": iso_url, "progress": 100}
```
**Source:** [Celery best practices](https://docs.celeryq.dev/), [Web-Queue-Worker pattern](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker)
### Pattern 4: Cache-First Build Strategy
**What:** Hash configuration, check cache before building, reuse identical ISOs.
**When:** User submits build that may have been built previously.
**Implementation:**
```python
import hashlib
import json

def compute_config_hash(config_id: UUID) -> str:
    """Create deterministic hash of configuration."""
    config = db.query(Config).get(config_id)
    # Include all layers, packages, files in hash
    hash_input = {
        "layers": sorted([
            {
                "type": layer.type,
                "packages": sorted(layer.packages),
                "files": sorted([
                    # Use a stable digest; Python's built-in hash() is salted
                    # per-process and would break cache hits across restarts.
                    {"path": f.path,
                     "content_hash": hashlib.sha256(f.content.encode()).hexdigest()}
                    for f in layer.files
                ], key=lambda x: x["path"])
            }
            for layer in config.layers
        ], key=lambda x: x["type"])
    }
    return hashlib.sha256(
        json.dumps(hash_input, sort_keys=True).encode()
    ).hexdigest()

async def check_cache(config_hash: str) -> Optional[CachedBuild]:
    """Check if ISO exists for this configuration."""
    cached = await db.query(BuildCache).filter_by(
        config_hash=config_hash
    ).first()
    if cached and cached.iso_exists():
        # Update access metadata
        cached.last_accessed = datetime.now()
        cached.access_count += 1
        await db.commit()
        return cached
    return None
```
**Benefit:** Reduces build time from minutes to seconds for repeated configurations. Critical for popular base configurations (e.g., "KDE Desktop with development tools").
## Anti-Patterns to Avoid
### Anti-Pattern 1: Blocking API Calls During Build
**What:** Synchronously waiting for ISO build to complete in API endpoint.
**Why bad:** Ties up API worker for minutes, prevents handling other requests, poor user experience with timeout risks.
**Instead:** Use asynchronous task queue (Celery) with WebSocket/SSE for progress updates. API returns immediately with job_id, frontend polls or subscribes to updates.
**Example:**
```python
# BAD: Blocking build
@app.post("/builds")
def build(config_id):
    iso = generate_iso(config_id)  # Takes 10 minutes!
    return {"iso_url": iso}

# GOOD: Async queue
@app.post("/builds")
async def build(config_id):
    job = build_iso.delay(config_id)
    return {"job_id": job.id, "status": "queued"}
```
### Anti-Pattern 2: Duplicating State Between React and Three.js
**What:** Maintaining separate state trees for application data and 3D scene, manually syncing.
**Why bad:** State gets out of sync, bugs from inconsistent data, complexity in update logic.
**Instead:** Single source of truth in React state. Scene derives from state. User interactions → dispatch actions → update state → scene re-renders.
**Example:**
```javascript
// BAD: Separate state
const [appState, setAppState] = useState({packages: []});
const [sceneObjects, setSceneObjects] = useState([]);

// GOOD: Scene derives from app state
const [config, setConfig] = useState({packages: []});
function Scene({packages}) {
  return packages.map(pkg => <PackageMesh key={pkg.id} {...pkg} />);
}
```
**Source:** [React Three Fiber state management best practices](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432)
### Anti-Pattern 3: Storing Large Files in PostgreSQL
**What:** Storing ISO files (1-4GB) or build logs (megabytes) as BYTEA in PostgreSQL.
**Why bad:** Database bloat, slow backups, memory pressure, poor performance for large blob operations.
**Instead:** Store large files in object storage (S3/MinIO), keep URLs/metadata in PostgreSQL.
**Example:**
```sql
-- BAD: ISO in database
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_data BYTEA  -- 2GB blob!
);

-- GOOD: URL reference
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_url VARCHAR(2048),  -- s3://bucket/isos/{id}.iso
    iso_checksum VARCHAR(128),
    iso_size_bytes BIGINT
);
```
### Anti-Pattern 4: Running Multiple Builds Per Worker Concurrently
**What:** Allowing a single Celery worker to process multiple ISO builds in parallel.
**Why bad:** ISO generation is CPU and memory intensive (compressing filesystem, creating squashfs). Running multiple builds causes resource contention, thrashing, and OOM kills.
**Instead:** Configure Celery workers with concurrency=1 for build tasks. Run one build per worker. Scale horizontally with multiple workers.
**Example:**
```bash
# BAD: Multiple concurrent builds
celery -A app worker --concurrency=4 # 4 builds at once on 6-core machine
# GOOD: One build per worker
celery -A app worker --concurrency=1 -Q builds # Start 6 workers for 6 cores
```
### Anti-Pattern 5: No Dependency Validation Until Build Time
**What:** Allowing users to save configurations without checking package conflicts, discovering issues during ISO build.
**Why bad:** Wastes build resources (minutes of CPU time), poor user experience (delayed error feedback), difficult to debug which package caused failure.
**Instead:** Run dependency resolution in API layer during configuration save/update. Provide immediate feedback with conflict explanations and alternatives.
**Example:**
```python
# BAD: Validate during build
@celery.task
def build_iso(config_id):
    packages = load_packages(config_id)
    result = resolve_dependencies(packages)  # Fails here after queueing!
    if not result.valid:
        raise BuildError("Conflicts detected")

# GOOD: Validate on save
@app.post("/configs")
async def save_config(config: ConfigInput):
    resolution = dependency_resolver.resolve(config.packages)
    if not resolution.valid:
        return {"error": "conflicts", "details": resolution.conflicts}
    await db.save(config)
    return {"success": True}
```
## Scalability Considerations
| Concern | At 100 users | At 10K users | At 1M users |
|---------|--------------|--------------|-------------|
| **API Layer** | Single FastAPI instance | Multiple instances behind load balancer | Auto-scaling group, CDN for static assets |
| **Build Queue** | Single Redis broker | Redis cluster or RabbitMQ | Kafka for high-throughput messaging |
| **Workers** | 1 build server (6 cores) | 3-5 build servers | Auto-scaling worker pool, spot instances |
| **Database** | Single PostgreSQL instance | Primary + read replicas | Sharded PostgreSQL or distributed SQL (CockroachDB) |
| **Storage** | Local MinIO | S3-compatible with CDN | Multi-region S3 with CloudFront |
| **Caching** | In-memory cache | Redis cache cluster | Multi-tier cache (Redis + CDN) |
### Horizontal Scaling Strategy
**API Layer:**
- Stateless FastAPI instances (session in DB/Redis)
- Load balancer (Nginx, HAProxy, AWS ALB)
- Auto-scaling based on CPU/request latency
**Build Workers:**
- Independent Celery workers connecting to shared broker
- Each worker runs 1 build at a time
- Scale workers based on queue depth (add workers when >10 jobs queued)
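As a sketch, that scale-out rule might translate to the following (the `jobs_per_worker` ratio and pool bounds are illustrative assumptions, not measurements):

```python
import math

def desired_workers(queue_depth: int, jobs_per_worker: int = 2,
                    min_workers: int = 1, max_workers: int = 12) -> int:
    """Target worker count: roughly one worker per `jobs_per_worker` queued
    builds, clamped to the available pool size."""
    target = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, target))
```

An autoscaler would poll queue depth periodically and converge the pool toward this target, with a cooldown to avoid flapping.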
**Database:**
- Read replicas for queries (config lookups)
- Write operations to primary (build status updates)
- Connection pooling (PgBouncer)
**Storage:**
- Object storage is inherently scalable
- CDN for ISO downloads (reduce egress costs)
- Lifecycle policies (delete ISOs older than 30 days if not accessed)
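The 30-day retention rule could be enforced with a filter like this (a sketch; in practice S3/MinIO lifecycle rules can expire objects server-side without application code):

```python
from datetime import datetime, timedelta

def expired_isos(entries: list[tuple[str, datetime]],
                 now: datetime, max_age_days: int = 30) -> list[str]:
    """Return object keys of ISOs not accessed within `max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    return [key for key, last_accessed in entries if last_accessed < cutoff]
```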
## Build Order Implications for Development
### Phase 1: Core Infrastructure
**What to build:** Database schema, basic API scaffolding, object storage setup.
**Why first:** Foundation for all other components. No dependencies on complex logic.
**Duration estimate:** 1-2 weeks
### Phase 2: Configuration Management
**What to build:** Layer data models, CRUD endpoints, basic validation.
**Why second:** Enables testing configuration storage before complex dependency resolution.
**Duration estimate:** 1-2 weeks
### Phase 3: Dependency Resolver (Simplified)
**What to build:** Basic conflict detection (direct conflicts only, no SAT solver yet).
**Why third:** Provides early validation capability. Full SAT solver can wait.
**Duration estimate:** 1 week
### Phase 4: Overlay Engine
**What to build:** Layer merging logic, profile generation for archiso.
**Why fourth:** Requires configuration data models from Phase 2. Produces profiles for builds.
**Duration estimate:** 2 weeks
### Phase 5: Build Queue + Workers
**What to build:** Celery setup, basic build task, worker orchestration.
**Why fifth:** Depends on Overlay Engine for profile generation. Core value delivery.
**Duration estimate:** 2-3 weeks
### Phase 6: Frontend (Basic)
**What to build:** React UI for configuration (forms, no 3D yet), build submission.
**Why sixth:** API must exist first. Provides usable interface for testing builds.
**Duration estimate:** 2-3 weeks
### Phase 7: Advanced Dependency Resolution
**What to build:** Full SAT solver integration, conflict explanations, alternatives.
**Why seventh:** Complex feature. System works with basic validation from Phase 3.
**Duration estimate:** 2-3 weeks
### Phase 8: 3D Visualization
**What to build:** Three.js integration, layer visualization, visual debugging.
**Why eighth:** Polish/differentiator feature. Core functionality works without it.
**Duration estimate:** 3-4 weeks
### Phase 9: Caching + Optimization
**What to build:** Build cache, package cache, performance tuning.
**Why ninth:** Optimization after core features work. Requires usage data to tune.
**Duration estimate:** 1-2 weeks
**Total estimated duration:** 15-22 weeks (roughly 4-5 months)
## Critical Architectural Decisions
### Decision 1: Message Broker (Redis vs RabbitMQ)
**Recommendation:** Start with Redis, migrate to RabbitMQ if reliability requirements increase.
**Rationale:**
- Redis: Lower latency, simpler setup, sufficient for <10K builds/day
- RabbitMQ: Higher reliability, message persistence, better for >100K builds/day
**When to switch:** If experiencing message loss or need guaranteed delivery.
### Decision 2: Container-Based vs. Direct archiso
**Recommendation:** Use direct archiso (mkarchiso) on bare metal workers initially.
**Rationale:**
- Container-based (like Bazzite/Universal Blue) adds complexity (OCI image builds)
- Direct archiso is simpler, well-documented, less abstraction
- Can containerize workers later if isolation/portability becomes critical
**When to reconsider:** Multi-cloud deployment or need strong isolation between builds.
### Decision 3: Monolithic vs. Microservices API
**Recommendation:** Start monolithic (single FastAPI app), split services if scaling demands.
**Rationale:**
- Monolith: Faster development, easier debugging, sufficient for <100K users
- Microservices: Adds operational complexity (service mesh, inter-service communication)
**When to split:** If specific services (e.g., dependency resolver) need independent scaling.
### Decision 4: Real-Time Updates (WebSocket vs. SSE vs. Polling)
**Recommendation:** Use Server-Sent Events (SSE) for build progress.
**Rationale:**
- WebSocket: Bidirectional, but overkill for one-way progress updates
- SSE: Simpler, built-in reconnection, sufficient for progress streaming
- Polling: Wasteful, higher latency
**Implementation:**
```python
import asyncio
import json

from fastapi.responses import StreamingResponse

@app.get("/api/v1/builds/{job_id}/stream")
async def stream_progress(job_id: str):
    async def event_generator():
        while True:
            status = await get_job_status(job_id)
            # Manually frame each SSE message ("data: ...\n\n")
            yield f"data: {json.dumps(status)}\n\n"
            if status['state'] in ['SUCCESS', 'FAILURE']:
                break
            await asyncio.sleep(1)
    return StreamingResponse(event_generator(), media_type="text/event-stream")
```
## Sources
**Archiso & Build Systems:**
- [Archiso ArchWiki](https://wiki.archlinux.org/title/Archiso) - MEDIUM confidence
- [Custom Archiso Tutorial 2024](https://serverless.industries/2024/12/30/custom-archiso.en.html) - MEDIUM confidence
- [Bazzite ISO Build Process](https://deepwiki.com/ublue-os/bazzite/2.6-iso-build-process) - MEDIUM confidence
- [Universal Blue](https://universal-blue.org/) - MEDIUM confidence
**Dependency Resolution:**
- [Libsolv SAT Solver](https://github.com/openSUSE/libsolv) - HIGH confidence (official)
- [Version SAT Research](https://research.swtch.com/version-sat) - HIGH confidence
- [Dependency Resolution Made Simple](https://borretti.me/article/dependency-resolution-made-simple) - MEDIUM confidence
- [Package Conflict Resolution](https://distropack.dev/Blog/Post?slug=package-conflict-resolution-handling-conflicting-packages) - LOW confidence
**API & Queue Architecture:**
- [FastAPI Architecture Patterns 2026](https://medium.com/algomart/modern-fastapi-architecture-patterns-for-scalable-production-systems-41a87b165a8b) - MEDIUM confidence
- [Celery Documentation](https://docs.celeryq.dev/) - HIGH confidence (official)
- [Web-Queue-Worker Pattern - Azure](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker) - HIGH confidence (official)
- [Design Distributed Job Scheduler](https://www.systemdesignhandbook.com/guides/design-a-distributed-job-scheduler/) - MEDIUM confidence
**Storage & Database:**
- [PostgreSQL Schema Design Best Practices](https://wiki.postgresql.org/wiki/Database_Schema_Recommendations_for_an_Application) - HIGH confidence (official)
- [OverlayFS Linux Kernel Docs](https://docs.kernel.org/filesystems/overlayfs.html) - HIGH confidence (official)
**Frontend:**
- [React Three Fiber Performance 2026](https://graffersid.com/react-three-fiber-vs-three-js/) - MEDIUM confidence
- [3D Data Visualization with React](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432) - MEDIUM confidence
## Confidence Assessment
- **Overall Architecture:** MEDIUM-HIGH - Based on established patterns (web-queue-worker, archiso) with modern 2026 practices
- **Component Boundaries:** HIGH - Clear separation of concerns, well-defined interfaces
- **Build Process:** HIGH - archiso is well-documented, multiple reference implementations
- **Dependency Resolution:** MEDIUM - SAT solver approach is proven, but integration complexity unknown
- **Scalability:** MEDIUM - Patterns are sound, but specific bottlenecks depend on usage patterns
- **Frontend 3D:** MEDIUM - Three.js + React patterns established, but performance depends on complexity