# Architecture Patterns: Linux Distribution Builder Platform

**Domain:** Web-based Linux distribution customization and ISO generation
**Researched:** 2026-01-25
**Confidence:** MEDIUM-HIGH

## Executive Summary

Linux distribution builder platforms combine web interfaces with backend build systems, overlaying configuration layers onto base distributions to create customized bootable ISOs. Modern architectures (2026) leverage container-based immutable systems, asynchronous task queues, and SAT-solver dependency resolution. The Debate platform architecture aligns with established patterns from archiso, Universal Blue/Bazzite, and the web-queue-worker architecture.

## Recommended Architecture

The Debate platform should follow a **layered web-queue-worker architecture** with these tiers:

```
┌───────────────────────────────────────────────────────────┐
│                    PRESENTATION LAYER                     │
│         React Frontend + Three.js 3D Visualization        │
│  (User configuration interface, visual package builder)   │
└─────────────────────────────┬─────────────────────────────┘
                              │ HTTP/WebSocket
┌─────────────────────────────▼─────────────────────────────┐
│                         API LAYER                         │
│ FastAPI (async endpoints, validation, session management) │
└─────────────────────────────┬─────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
┌────────▼───────┐   ┌────────▼───────┐   ┌────────▼───────┐
│   Dependency   │   │    Overlay     │   │  Build Queue   │
│    Resolver    │   │     Engine     │   │    Manager     │
│  (SAT solver)  │   │    (Layers)    │   │    (Celery)    │
└────────┬───────┘   └────────┬───────┘   └────────┬───────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
┌─────────────────────────────▼─────────────────────────────┐
│                     PERSISTENCE LAYER                     │
│      PostgreSQL (config, user data, build metadata)       │
│        Object Storage (ISO cache, build artifacts)        │
└─────────────────────────────┬─────────────────────────────┘
                              │
┌─────────────────────────────▼─────────────────────────────┐
│                   BUILD EXECUTION LAYER                   │
│  Worker Nodes (Celery workers running archiso/mkarchiso)  │
│   - Profile generation                                    │
│   - Package installation to airootfs                      │
│   - Overlay application (OverlayFS concepts)              │
│   - ISO generation with bootloader config                 │
└───────────────────────────────────────────────────────────┘
```

## Component Boundaries

### Core Components

| Component | Responsibility | Communicates With | State Management |
|-----------|----------------|-------------------|------------------|
| **React Frontend** | User interaction, 3D visualization, configuration UI | API Layer (REST/WS) | Client-side state (React context/Redux) |
| **Three.js Renderer** | 3D package/layer visualization, visual debugging | React components | Scene state separate from app state |
| **FastAPI Gateway** | Request routing, validation, auth, session mgmt | All backend services | Stateless (session in DB/cache) |
| **Dependency Resolver** | Package conflict detection, SAT solving, suggestions | API Layer, Database | Computation-only (no persistent state) |
| **Overlay Engine** | Layer composition, configuration merging, precedence | Build Queue, Database | Configuration versioning in DB |
| **Build Queue Manager** | Job scheduling, worker coordination, priority mgmt | Celery broker (Redis/RabbitMQ) | Queue state in message broker |
| **Celery Workers** | ISO build execution, archiso orchestration | Build Queue, Object Storage | Job state tracked in result backend |
| **PostgreSQL DB** | User data, build configs, metadata, audit logs | All backend services | ACID transactional storage |
| **Object Storage** | ISO caching, build artifacts, profile storage | Workers, API (download endpoint) | Immutable blob storage |
### Detailed Component Architecture

#### 1. Presentation Layer (React + Three.js)

**Purpose:** Provide a visual interface for distribution customization with a 3D representation of layers.

**Architecture Pattern:**

- **State Management:** Application state in React (configuration data) separate from scene state (3D objects). Changes flow from app state → scene rendering.
- **Performance:** Use React Three Fiber (r3f) for declarative Three.js integration. Target 60 FPS, <100MB memory.
- **Optimization:** InstancedMesh for repeated elements (packages), frustum culling, lazy loading with Suspense, GPU resource cleanup with dispose().
- **Model Format:** GLTF/GLB for 3D assets.

**Communication:**

- REST API for CRUD operations (save configuration, list builds)
- WebSocket for real-time build progress updates
- Server-Sent Events (SSE) as an alternative for progress streaming

**Sources:**

- [React Three Fiber vs. Three.js Performance Guide 2026](https://graffersid.com/react-three-fiber-vs-three-js/)
- [3D Data Visualization with React and Three.js](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432)

#### 2. API Layer (FastAPI)

**Purpose:** Asynchronous API gateway handling request validation, routing, and coordination.

**Architecture Pattern:**

- **Layered Structure:** Separate routers (by domain), services (business logic), and data access layers.
- **Async I/O:** Use async/await throughout to prevent blocking on database/queue operations.
- **Middleware:** Custom logging, metrics, and error-handling middleware for observability.
- **Validation:** Pydantic models for request/response validation.

**Endpoints:**

- `/api/v1/configurations` - CRUD for user configurations
- `/api/v1/packages` - Package search, metadata, conflicts
- `/api/v1/builds` - Submit build, query status, download ISO
- `/api/v1/layers` - Layer definitions (Opening Statement, Platform, etc.)
- `/ws/builds/{build_id}` - WebSocket for build progress

**Performance:** Published 2026 benchmarks report roughly 3x higher throughput than synchronous frameworks for I/O-bound operations.

**Sources:**

- [Modern FastAPI Architecture Patterns 2026](https://medium.com/algomart/modern-fastapi-architecture-patterns-for-scalable-production-systems-41a87b165a8b)
- [FastAPI for Microservices 2025](https://talent500.com/blog/fastapi-microservices-python-api-design-patterns-2025/)
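A minimal sketch of the router/Pydantic split described above for the `/api/v1/configurations` endpoint; model fields, class names, and the returned stub are illustrative assumptions, not an existing codebase:

```python
# Sketch: configurations router with Pydantic request/response validation.
# Field names (LayerInput, ConfigurationInput) and the stub response are
# assumptions for illustration only.
from uuid import UUID, uuid4
from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/api/v1/configurations", tags=["configurations"])

class LayerInput(BaseModel):
    layer_type: str                       # opening_statement, platform, ...
    layer_order: int = Field(ge=1, le=5)  # precedence within the 5-layer model
    packages: list[str] = []
    merge_strategy: str = "replace"

class ConfigurationInput(BaseModel):
    name: str
    description: str | None = None
    layers: list[LayerInput]

class ConfigurationOut(BaseModel):
    id: UUID
    name: str

@router.post("", response_model=ConfigurationOut)
async def create_configuration(config: ConfigurationInput) -> ConfigurationOut:
    # Request validation happens in the Pydantic layer; a service layer (not
    # shown) would run dependency resolution and persist to PostgreSQL.
    return ConfigurationOut(id=uuid4(), name=config.name)
```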
#### 3. Dependency Resolver

**Purpose:** Detect package conflicts, resolve dependencies, and suggest alternatives using SAT solver algorithms.

**Architecture Pattern:**

- **SAT Solver Implementation:** Use libsolv (openSUSE) or a similar SAT-based approach. Translate package dependencies into logic clauses, apply the CDCL algorithm.
- **Algorithm:** Conflict-Driven Clause Learning (CDCL) solves these NP-complete dependency problems in milliseconds for typical workloads.
- **Input:** Package selection across 5 layers (Opening Statement, Platform, Rhetoric, Talking Points, Closing Argument).
- **Output:** Valid package set, or a conflict report with suggested resolutions.

**Data Structure:**

```
Package Dependency Graph:
- Nodes: Packages (name, version, layer)
- Edges: Dependencies (requires, conflicts, provides, suggests)
- Constraints: Version ranges, mutual exclusions
```

**Integration:**

- Called synchronously from the API during configuration validation
- Pre-compute common dependency sets for base layers (cache results)
- Asynchronous deep resolution for full build validation

**Sources:**

- [Libsolv SAT Solver](https://github.com/openSUSE/libsolv)
- [Version SAT Research](https://research.swtch.com/version-sat)
- [Dependency Resolution Made Simple](https://borretti.me/article/dependency-resolution-made-simple)

#### 4. Overlay Engine

**Purpose:** Manage layered configuration packages, applying merge strategies and precedence rules.

**Architecture Pattern:**

- **Layer Model:** 5 layers with defined precedence (Closing Argument > Talking Points > Rhetoric > Platform > Opening Statement).
- **OverlayFS Inspiration:** Conceptually similar to OverlayFS union mounting, where upper layers override lower layers.
- **Configuration Merging:** Files from higher layers replace or merge with lower layers based on merge strategy (replace, merge-append, merge-deep).

**Layer Structure:**

```
Layer Definition:
- id: unique identifier
- name: user-facing name (e.g., "Platform")
- order: precedence (1=lowest, 5=highest)
- packages: list of package selections
- files: custom files to overlay
- merge_strategy: how to handle conflicts
```

**Merge Strategies:**

- **Replace:** Higher layer file completely replaces lower
- **Merge-Append:** Concatenate files (e.g., package lists)
- **Merge-Deep:** Smart merge (e.g., JSON/YAML key merging), sketched below
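The merge-deep strategy is the only one that needs real logic; a minimal sketch of a recursive dictionary merge, assuming parsed JSON/YAML configs (the `deep_merge` helper referenced in Pattern 1 below is an assumption, not part of archiso):

```python
# Sketch of the merge-deep strategy: recursively merge nested dicts, with the
# higher-precedence layer winning on scalar values. Illustrative helper only.
from typing import Any

def deep_merge(lower: dict[str, Any], upper: dict[str, Any]) -> dict[str, Any]:
    merged = dict(lower)
    for key, upper_value in upper.items():
        lower_value = merged.get(key)
        if isinstance(lower_value, dict) and isinstance(upper_value, dict):
            merged[key] = deep_merge(lower_value, upper_value)  # recurse into nested keys
        else:
            merged[key] = upper_value  # higher layer overrides scalars/lists
    return merged

# Example: a lower layer enables sshd; a higher layer adds a firewall setting
# and a hostname without clobbering the lower layer's keys.
print(deep_merge({"services": {"sshd": True}},
                 {"services": {"ufw": True}, "hostname": "debate"}))
# -> {'services': {'sshd': True, 'ufw': True}, 'hostname': 'debate'}
```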
**Output:** Unified archiso profile with:

- `packages.x86_64` (merged package list)
- `airootfs/` directory (merged filesystem overlay)
- `profiledef.sh` (combined metadata)

**Sources:**

- [OverlayFS Linux Kernel Documentation](https://docs.kernel.org/filesystems/overlayfs.html)
- [OverlayFS ArchWiki](https://wiki.archlinux.org/title/Overlay_filesystem)

#### 5. Build Queue Manager (Celery)

**Purpose:** Distributed task queue for asynchronous ISO build jobs with priority scheduling.

**Architecture Pattern:**

- **Web-Queue-Worker Pattern:** Web frontend → Message queue → Worker pool
- **Message Broker:** Redis (low latency) or RabbitMQ (high reliability) for the job queue
- **Result Backend:** Redis or PostgreSQL for job status/results
- **Worker Pool:** Multiple Celery workers (one per build server core for CPU-bound builds)

**Job Types:**

1. **Quick Validation:** Dependency resolution (seconds) - High priority
2. **Full Build:** ISO generation (minutes) - Normal priority
3. **Cache Warming:** Pre-build common configurations - Low priority

**Scheduling:**

- **Priority Queue:** User-initiated builds > automated cache warming
- **Rate Limiting:** Prevent queue flooding, enforce user quotas
- **Retry Logic:** Automatic retry with exponential backoff for transient failures
- **Timeout:** Per-job timeout (e.g., 30 min max per build)
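These scheduling policies map onto standard Celery settings; a minimal sketch, assuming a Redis broker and task names such as `tasks.build_iso` (broker URLs, queue names, and task names are all assumptions):

```python
# Sketch: Celery configuration covering queues, priority, rate limiting,
# retries with backoff, and per-job timeouts. All names/URLs are assumptions.
from celery import Celery
from kombu import Queue

app = Celery("builder",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

app.conf.update(
    task_queues=(
        Queue("validation"),   # quick dependency checks (high priority)
        Queue("builds"),       # full ISO builds (normal priority)
        Queue("cache_warm"),   # pre-built common configurations (low priority)
    ),
    task_routes={
        "tasks.resolve_dependencies": {"queue": "validation"},
        "tasks.build_iso": {"queue": "builds"},
        "tasks.warm_cache": {"queue": "cache_warm"},
    },
    # Redis emulates message priorities with multiple lists per queue.
    broker_transport_options={"queue_order_strategy": "priority"},
    task_default_priority=5,
    # Keep a single user from flooding the build queue.
    task_annotations={"tasks.build_iso": {"rate_limit": "10/m"}},
)

@app.task(bind=True,
          autoretry_for=(ConnectionError,),  # retry only transient failures
          retry_backoff=True,                # exponential backoff between retries
          max_retries=3,
          soft_time_limit=1700,              # warn the task before the hard kill
          time_limit=1800)                   # 30-minute hard cap per build
def build_iso(self, config_id: str) -> dict:
    # Profile generation and mkarchiso execution are shown in the worker
    # section and in Pattern 3 below.
    return {"config_id": config_id, "status": "stub"}
```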
**Coordinator Pattern:**

- A single coordinator manages job assignment and worker health
- Leader election for coordinator HA (if scaled beyond a single instance)

**Monitoring:**

- Job state transitions logged to PostgreSQL
- Metrics: queue depth, worker utilization, average build time
- Dead letter queue for failed jobs requiring manual investigation

**Sources:**

- [Celery Distributed Task Queue](https://docs.celeryq.dev/)
- [Design Distributed Job Scheduler](https://www.systemdesignhandbook.com/guides/design-a-distributed-job-scheduler/)
- [Web-Queue-Worker Architecture - Azure](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker)

#### 6. Build Execution Workers (archiso-based)

**Purpose:** Execute ISO generation using archiso (mkarchiso) with custom profiles.

**Architecture Pattern:**

- **Profile-Based Build:** Generate a temporary archiso profile per build job
- **Isolation:** Each build runs in an isolated environment (separate working directory)
- **Stages:** Profile generation → Package installation → Customization → ISO creation

**Build Process Flow:**

```
1. Profile Generation (Overlay Engine output)
   ├── Create temp directory
   ├── Write packages.x86_64 (merged package list)
   ├── Write profiledef.sh (metadata, permissions)
   ├── Copy airootfs/ overlay files
   └── Configure bootloaders (syslinux, grub, systemd-boot)

2. Package Installation
   ├── mkarchiso downloads packages (pacman cache)
   ├── Install to work_dir/x86_64/airootfs
   └── Apply package configurations

3. Customization (customize_airootfs.sh)
   ├── Enable systemd services
   ├── Apply user-specific configs
   ├── Run post-install scripts
   └── Set permissions

4. ISO Generation
   ├── Create kernel and initramfs images
   ├── Build squashfs filesystem
   ├── Assemble bootable ISO
   ├── Generate checksums
   └── Move to output directory

5. Post-Processing
   ├── Upload ISO to object storage
   ├── Update database (build status, ISO location)
   ├── Cache metadata for reuse
   └── Clean up working directory
```

**Worker Configuration:**

- **Resource Limits:** 1 build per worker (CPU/memory intensive)
- **Concurrency:** 6 workers max (6-core build server)
- **Working Directory:** `/tmp/archiso-tmp-{job_id}` (cleaned after completion with the -r flag)
- **Output Directory:** Temporary → Object storage → Local cleanup

**Optimizations:**

- **Package Cache:** Shared pacman cache across workers (prevents redundant downloads)
- **Layer Caching:** Cache common base layers (Opening Statement variations)
- **Incremental Builds:** Detect unchanged layers, reuse previous airootfs where possible

**Sources:**

- [Archiso ArchWiki](https://wiki.archlinux.org/title/Archiso)
- [Custom Archiso Tutorial](https://serverless.industries/2024/12/30/custom-archiso.en.html)

#### 7. Persistence Layer (PostgreSQL + Object Storage)

**Purpose:** Store configuration data, build metadata, and build artifacts.

**PostgreSQL Schema Design:**

```sql
-- User configurations
CREATE SCHEMA configurations;

CREATE TABLE configurations.user_configs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE configurations.layers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    layer_type VARCHAR(50) NOT NULL,  -- opening_statement, platform, rhetoric, etc.
    layer_order INT NOT NULL,
    merge_strategy VARCHAR(50) DEFAULT 'replace'
);

CREATE TABLE configurations.layer_packages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    package_name VARCHAR(255) NOT NULL,
    package_version VARCHAR(50),
    required BOOLEAN DEFAULT TRUE
);

CREATE TABLE configurations.layer_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    file_path VARCHAR(1024) NOT NULL,   -- path in airootfs
    file_content TEXT,                  -- for small configs
    file_storage_url VARCHAR(2048),     -- for large files in object storage
    permissions VARCHAR(4) DEFAULT '0644'
);

-- Build management
CREATE SCHEMA builds;

CREATE TABLE builds.build_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    status VARCHAR(50) NOT NULL,        -- queued, running, success, failed
    priority INT DEFAULT 5,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    iso_url VARCHAR(2048),              -- object storage location
    iso_checksum VARCHAR(128),
    error_message TEXT,
    build_log_url VARCHAR(2048)
);

CREATE TABLE builds.build_cache (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_hash VARCHAR(64) UNIQUE NOT NULL,  -- hash of layer config
    iso_url VARCHAR(2048),
    created_at TIMESTAMP DEFAULT NOW(),
    last_accessed TIMESTAMP DEFAULT NOW(),
    access_count INT DEFAULT 0
);

-- Package metadata
CREATE SCHEMA packages;

CREATE TABLE packages.package_metadata (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    repository VARCHAR(100),            -- core, extra, community, aur
    version VARCHAR(50),
    dependencies JSONB,                 -- {requires: [], conflicts: [], provides: []}
    last_updated TIMESTAMP DEFAULT NOW()
);
```

**Schema Organization Best Practices (2026):**

- Separate schemas for functional areas (configurations, builds, packages)
- Schema-level access control for security isolation
- CI/CD integration with migration tools (Flyway, Alembic)
- Indexes on frequently queried fields (config_id, status, config_hash)

**Object Storage:**

- **Purpose:** Store ISOs (large files, 1-4GB), build logs, custom overlay files
- **Technology:** S3-compatible (AWS S3, MinIO, Cloudflare R2)
- **Structure:**
  - `/isos/{build_id}.iso` - Generated ISOs
  - `/logs/{build_id}.log` - Build logs
  - `/overlays/{layer_id}/{file_path}` - Custom files too large for DB
  - `/cache/{config_hash}.iso` - Cached ISOs for reuse
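A minimal sketch of a worker pushing a finished ISO into this key layout and handing back a time-limited download URL, using boto3 against any S3-compatible endpoint (bucket name, endpoint, and function name are assumptions):

```python
# Sketch: upload a built ISO into the object-storage layout above and return a
# presigned download URL. Bucket name, endpoint, and credentials are assumptions.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.internal:9000",  # MinIO or any S3-compatible store
)

def store_iso(build_id: str, iso_path: str, bucket: str = "debate-artifacts") -> str:
    key = f"isos/{build_id}.iso"
    s3.upload_file(iso_path, bucket, key)  # boto3 handles multipart upload for large files
    # A presigned URL lets the API hand out downloads without proxying gigabytes.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=3600,
    )
```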
**Sources:**

- [PostgreSQL Schema Design Best Practices 2026](https://wiki.postgresql.org/wiki/Database_Schema_Recommendations_for_an_Application)
- [SQL Database Fundamentals 2026](https://www.nucamp.co/blog/sql-and-database-fundamentals-in-2026-queries-design-and-postgresql-essentials)

## Data Flow

### Configuration Creation Flow

```
User (Frontend)
  ↓ (1) Create/Edit configuration
API Layer (Validation)
  ↓ (2) Validate input
Dependency Resolver
  ↓ (3) Check conflicts
  ↓ (4) Return validation result
API Layer
  ↓ (5) Save configuration
PostgreSQL (configurations schema)
  ↓ (6) Return config_id
Frontend (Display confirmation)
```

### Build Submission Flow

```
User (Frontend)
  ↓ (1) Submit build request
API Layer
  ↓ (2) Check cache (config hash)
PostgreSQL (build_cache)
  ├─→ (3a) Cache hit: return cached ISO URL
  └─→ (3b) Cache miss: create build job
Build Queue Manager (Celery)
  ↓ (4) Enqueue job with priority
Message Broker (Redis/RabbitMQ)
  ↓ (5) Job dispatched to worker
Celery Worker
  ↓ (6a) Fetch configuration from DB
  ↓ (6b) Generate archiso profile (Overlay Engine)
  ↓ (6c) Execute mkarchiso
  ↓ (6d) Upload ISO to object storage
  ↓ (6e) Update build status in DB
PostgreSQL + Object Storage
  ↓ (7) Job complete
API Layer (WebSocket)
  ↓ (8) Notify user
Frontend (Display download link)
```

### Real-Time Progress Updates Flow

```
Celery Worker
  ↓ (1) Emit progress events during build
        (e.g., "downloading packages", "generating ISO")
Celery Result Backend
  ↓ (2) Store progress state
API Layer (WebSocket handler)
  ↓ (3) Poll/subscribe to job progress
  ↓ (4) Push updates to client
Frontend (WebSocket listener)
  ↓ (5) Update UI progress bar
```
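The `/ws/builds/{build_id}` endpoint from the API layer can implement steps (3)-(5) by polling the Celery result backend; a minimal sketch, assuming the Redis broker/backend URLs used earlier and the state/meta fields emitted by the `build_iso` task in Pattern 3 below:

```python
# Sketch: WebSocket handler relaying Celery task state to the browser.
# Broker/backend URLs are assumptions; state/meta fields follow the build_iso
# task shown in Pattern 3.
import asyncio
from celery import Celery
from celery.result import AsyncResult
from fastapi import FastAPI, WebSocket

celery_app = Celery(broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")
app = FastAPI()

@app.websocket("/ws/builds/{build_id}")
async def build_progress(websocket: WebSocket, build_id: str):
    await websocket.accept()
    while True:
        result = AsyncResult(build_id, app=celery_app)  # reads the result backend
        meta = result.info if isinstance(result.info, dict) else {}
        await websocket.send_json({"state": result.state, "meta": meta})
        if result.state in ("SUCCESS", "FAILURE"):
            break
        await asyncio.sleep(1)  # poll interval; SSE (Decision 4) is an alternative
    await websocket.close()
```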
## Patterns to Follow

### Pattern 1: Layered Configuration Precedence

**What:** Higher layers override lower layers with defined merge strategies.

**When:** User customizes configuration across multiple layers (Platform, Rhetoric, etc.).

**Implementation:**

```python
class OverlayEngine:
    def merge_layers(self, layers: List[Layer]) -> Profile:
        """Merge layers from lowest to highest precedence."""
        sorted_layers = sorted(layers, key=lambda l: l.order)
        profile = Profile()
        for layer in sorted_layers:
            profile = self.apply_layer(profile, layer)
        return profile

    def apply_layer(self, profile: Profile, layer: Layer) -> Profile:
        """Apply layer based on merge strategy."""
        if layer.merge_strategy == "replace":
            profile.files.update(layer.files)        # Overwrite
        elif layer.merge_strategy == "merge-append":
            profile.packages.extend(layer.packages)  # Append
        elif layer.merge_strategy == "merge-deep":
            profile.config = deep_merge(profile.config, layer.config)
        return profile
```

**Source:** OverlayFS union mount concepts applied to configuration management.

### Pattern 2: SAT-Based Dependency Resolution

**What:** Translate package dependencies into a boolean satisfiability problem, solve with the CDCL algorithm.

**When:** User adds a package to the configuration, system detects conflicts.

**Implementation:**

```python
class DependencyResolver:
    def resolve(self, packages: List[Package]) -> Resolution:
        """Resolve dependencies using SAT solver."""
        clauses = self.build_clauses(packages)
        solver = SATSolver()
        result = solver.solve(clauses)
        if result.satisfiable:
            return Resolution(success=True, packages=result.model)
        else:
            conflicts = self.explain_conflicts(result.unsat_core)
            alternatives = self.suggest_alternatives(conflicts)
            return Resolution(success=False, conflicts=conflicts,
                              alternatives=alternatives)

    def build_clauses(self, packages: List[Package]) -> List[Clause]:
        """Convert dependency graph to CNF clauses."""
        clauses = []
        for pkg in packages:
            # If package selected, all dependencies must be selected
            for dep in pkg.requires:
                clauses.append(Implies(pkg, dep))
            # If package selected, no conflicts can be selected
            for conflict in pkg.conflicts:
                clauses.append(Not(And(pkg, conflict)))
        return clauses
```

**Source:** [Libsolv implementation patterns](https://github.com/openSUSE/libsolv)
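The `SATSolver`, `Implies`, `Not`, and `And` helpers above are pseudocode; an off-the-shelf solver can play that role. A minimal sketch using the python-sat package (package names and the variable numbering are illustrative assumptions):

```python
# Sketch: the clause construction from Pattern 2 with a concrete SAT solver
# (pip install python-sat). Package names and variable numbering are assumptions.
from pysat.solvers import Glucose3

# Map packages to SAT variables: 1=plasma, 2=gdm, 3=sddm
REQUIRES = {1: [3]}    # plasma requires sddm  (assumption for the example)
CONFLICTS = {2: [3]}   # gdm conflicts with sddm (assumption for the example)

solver = Glucose3()
for pkg, deps in REQUIRES.items():
    for dep in deps:
        solver.add_clause([-pkg, dep])    # pkg -> dep
for pkg, confs in CONFLICTS.items():
    for conf in confs:
        solver.add_clause([-pkg, -conf])  # not (pkg and conf)

# The user selected plasma (1) and gdm (2): assume both true and check.
if solver.solve(assumptions=[1, 2]):
    print("valid selection:", solver.get_model())
else:
    # The unsat core names the selections that cannot coexist.
    print("conflict, unsat core:", solver.get_core())
solver.delete()
```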
### Pattern 3: Asynchronous Build Queue with Progress Tracking

**What:** Submit long-running build jobs to a queue, track progress, notify on completion.

**When:** User submits a build request (ISO generation takes minutes).

**Implementation:**

```python
import subprocess
import uuid
from uuid import UUID

# API endpoint
@app.post("/api/v1/builds")
async def submit_build(config_id: UUID):
    # Check cache first
    cache_key = compute_config_hash(config_id)
    cached = await check_cache(cache_key)
    if cached:
        return {"status": "cached", "iso_url": cached.iso_url}

    # Enqueue build job
    job = build_iso.apply_async(
        args=[config_id],
        priority=5,
        task_id=str(uuid.uuid4())
    )
    return {"status": "queued", "job_id": job.id}

# Celery task
@celery.task(bind=True)
def build_iso(self, config_id: UUID):
    self.update_state(state='DOWNLOADING', meta={'progress': 10})

    # Generate profile
    profile = overlay_engine.generate_profile(config_id)
    self.update_state(state='BUILDING', meta={'progress': 30})

    # Run mkarchiso (check=True raises on a failed build)
    subprocess.run([
        'mkarchiso', '-v', '-r',
        '-w', f'/tmp/archiso-{self.request.id}',
        '-o', '/tmp/output',
        profile.path
    ], check=True)
    self.update_state(state='UPLOADING', meta={'progress': 80})

    # Upload to object storage
    iso_url = upload_iso('/tmp/output/archlinux.iso')
    return {"iso_url": iso_url, "progress": 100}
```

**Source:** [Celery best practices](https://docs.celeryq.dev/), [Web-Queue-Worker pattern](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker)

### Pattern 4: Cache-First Build Strategy

**What:** Hash the configuration, check the cache before building, reuse identical ISOs.

**When:** User submits a build that may have been built previously.

**Implementation:**

```python
import hashlib
import json
from datetime import datetime
from typing import Optional
from uuid import UUID

def compute_config_hash(config_id: UUID) -> str:
    """Create deterministic hash of configuration."""
    config = db.query(Config).get(config_id)
    # Include all layers, packages, files in hash
    hash_input = {
        "layers": sorted([
            {
                "type": layer.type,
                "packages": sorted(layer.packages),
                "files": sorted([
                    {"path": f.path,
                     "content_hash": hashlib.sha256(f.content.encode()).hexdigest()}
                    for f in layer.files
                ], key=lambda x: x["path"])
            }
            for layer in config.layers
        ], key=lambda x: x["type"])
    }
    return hashlib.sha256(
        json.dumps(hash_input, sort_keys=True).encode()
    ).hexdigest()

async def check_cache(config_hash: str) -> Optional[CachedBuild]:
    """Check if ISO exists for this configuration."""
    cached = await db.query(BuildCache).filter_by(
        config_hash=config_hash
    ).first()
    if cached and cached.iso_exists():
        # Update access metadata
        cached.last_accessed = datetime.now()
        cached.access_count += 1
        await db.commit()
        return cached
    return None
```

**Benefit:** Reduces build time from minutes to seconds for repeated configurations. Critical for popular base configurations (e.g., "KDE Desktop with development tools").

## Anti-Patterns to Avoid

### Anti-Pattern 1: Blocking API Calls During Build

**What:** Synchronously waiting for the ISO build to complete in an API endpoint.

**Why bad:** Ties up an API worker for minutes, prevents handling other requests, poor user experience with timeout risks.

**Instead:** Use an asynchronous task queue (Celery) with WebSocket/SSE for progress updates. The API returns immediately with a job_id; the frontend polls or subscribes to updates.

**Example:**

```python
# BAD: Blocking build
@app.post("/builds")
def build(config_id):
    iso = generate_iso(config_id)  # Takes 10 minutes!
    return {"iso_url": iso}

# GOOD: Async queue
@app.post("/builds")
async def build(config_id):
    job = build_iso.delay(config_id)
    return {"job_id": job.id, "status": "queued"}
```

### Anti-Pattern 2: Duplicating State Between React and Three.js

**What:** Maintaining separate state trees for application data and the 3D scene, manually syncing them.

**Why bad:** State gets out of sync, bugs from inconsistent data, complexity in update logic.

**Instead:** Single source of truth in React state. The scene derives from that state. User interactions → dispatch actions → update state → scene re-renders.

**Example:**

```javascript
// BAD: Separate state
const [appState, setAppState] = useState({packages: []});
const [sceneObjects, setSceneObjects] = useState([]);

// GOOD: Scene derives from app state
const [config, setConfig] = useState({packages: []});
function Scene({packages}) {
  return packages.map(pkg => <PackageMesh key={pkg.name} {...pkg} />);
}
```

**Source:** [React Three Fiber state management best practices](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432)

### Anti-Pattern 3: Storing Large Files in PostgreSQL

**What:** Storing ISO files (1-4GB) or build logs (megabytes) as BYTEA in PostgreSQL.

**Why bad:** Database bloat, slow backups, memory pressure, poor performance for large blob operations.

**Instead:** Store large files in object storage (S3/MinIO), keep URLs/metadata in PostgreSQL.

**Example:**

```sql
-- BAD: ISO in database
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_data BYTEA  -- 2GB blob!
);

-- GOOD: URL reference
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_url VARCHAR(2048),  -- s3://bucket/isos/{id}.iso
    iso_checksum VARCHAR(128),
    iso_size_bytes BIGINT
);
```
### Anti-Pattern 4: Running Multiple Builds Per Worker Concurrently

**What:** Allowing a single Celery worker to process multiple ISO builds in parallel.

**Why bad:** ISO generation is CPU and memory intensive (compressing the filesystem, creating squashfs). Running multiple builds causes resource contention, thrashing, and OOM kills.

**Instead:** Configure Celery workers with concurrency=1 for build tasks. Run one build per worker. Scale horizontally with multiple workers.

**Example:**

```bash
# BAD: Multiple concurrent builds
celery -A app worker --concurrency=4  # 4 builds at once on a 6-core machine

# GOOD: One build per worker
celery -A app worker --concurrency=1 -Q builds  # Start 6 workers for 6 cores
```

### Anti-Pattern 5: No Dependency Validation Until Build Time

**What:** Allowing users to save configurations without checking package conflicts, discovering issues during the ISO build.

**Why bad:** Wastes build resources (minutes of CPU time), poor user experience (delayed error feedback), difficult to debug which package caused the failure.

**Instead:** Run dependency resolution in the API layer during configuration save/update. Provide immediate feedback with conflict explanations and alternatives.

**Example:**

```python
# BAD: Validate during build
@celery.task
def build_iso(config_id):
    packages = load_packages(config_id)
    result = resolve_dependencies(packages)  # Fails here after queueing!
    if not result.valid:
        raise BuildError("Conflicts detected")

# GOOD: Validate on save
@app.post("/configs")
async def save_config(config: ConfigInput):
    resolution = dependency_resolver.resolve(config.packages)
    if not resolution.valid:
        return {"error": "conflicts", "details": resolution.conflicts}
    await db.save(config)
    return {"success": True}
```

## Scalability Considerations

| Concern | At 100 users | At 10K users | At 1M users |
|---------|--------------|--------------|-------------|
| **API Layer** | Single FastAPI instance | Multiple instances behind load balancer | Auto-scaling group, CDN for static assets |
| **Build Queue** | Single Redis broker | Redis cluster or RabbitMQ | Kafka for high-throughput messaging |
| **Workers** | 1 build server (6 cores) | 3-5 build servers | Auto-scaling worker pool, spot instances |
| **Database** | Single PostgreSQL instance | Primary + read replicas | Sharded PostgreSQL or distributed SQL (CockroachDB) |
| **Storage** | Local MinIO | S3-compatible with CDN | Multi-region S3 with CloudFront |
| **Caching** | In-memory cache | Redis cache cluster | Multi-tier cache (Redis + CDN) |

### Horizontal Scaling Strategy

**API Layer:**

- Stateless FastAPI instances (session in DB/Redis)
- Load balancer (Nginx, HAProxy, AWS ALB)
- Auto-scaling based on CPU/request latency

**Build Workers:**

- Independent Celery workers connecting to a shared broker
- Each worker runs 1 build at a time
- Scale workers based on queue depth (add workers when >10 jobs queued)

**Database:**

- Read replicas for queries (config lookups)
- Write operations to primary (build status updates)
- Connection pooling (PgBouncer)

**Storage:**

- Object storage is inherently scalable
- CDN for ISO downloads (reduce egress costs)
- Lifecycle policies (delete ISOs older than 30 days if not accessed)
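The queue-depth trigger under **Build Workers** above can be read straight from the broker; a minimal sketch, assuming the Redis broker and the `builds` queue name from the Celery sketch earlier (with priority emulation enabled, Celery may split a queue across several suffixed keys, so this counts only the default list):

```python
# Sketch: decide when to add build workers from broker queue depth.
# Assumes a Redis broker and the "builds" queue; the threshold mirrors the
# ">10 jobs queued" rule above. Illustrative only.
import redis

SCALE_UP_THRESHOLD = 10

def pending_builds(redis_url: str = "redis://localhost:6379/0") -> int:
    broker = redis.Redis.from_url(redis_url)
    return broker.llen("builds")  # a Celery queue is a Redis list by default

def should_add_worker() -> bool:
    return pending_builds() > SCALE_UP_THRESHOLD
```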
## Build Order Implications for Development

### Phase 1: Core Infrastructure

**What to build:** Database schema, basic API scaffolding, object storage setup.
**Why first:** Foundation for all other components. No dependencies on complex logic.
**Duration estimate:** 1-2 weeks

### Phase 2: Configuration Management

**What to build:** Layer data models, CRUD endpoints, basic validation.
**Why second:** Enables testing configuration storage before complex dependency resolution.
**Duration estimate:** 1-2 weeks

### Phase 3: Dependency Resolver (Simplified)

**What to build:** Basic conflict detection (direct conflicts only, no SAT solver yet).
**Why third:** Provides early validation capability. Full SAT solver can wait.
**Duration estimate:** 1 week

### Phase 4: Overlay Engine

**What to build:** Layer merging logic, profile generation for archiso.
**Why fourth:** Requires configuration data models from Phase 2. Produces profiles for builds.
**Duration estimate:** 2 weeks

### Phase 5: Build Queue + Workers

**What to build:** Celery setup, basic build task, worker orchestration.
**Why fifth:** Depends on Overlay Engine for profile generation. Core value delivery.
**Duration estimate:** 2-3 weeks

### Phase 6: Frontend (Basic)

**What to build:** React UI for configuration (forms, no 3D yet), build submission.
**Why sixth:** API must exist first. Provides a usable interface for testing builds.
**Duration estimate:** 2-3 weeks

### Phase 7: Advanced Dependency Resolution

**What to build:** Full SAT solver integration, conflict explanations, alternatives.
**Why seventh:** Complex feature. System works with basic validation from Phase 3.
**Duration estimate:** 2-3 weeks

### Phase 8: 3D Visualization

**What to build:** Three.js integration, layer visualization, visual debugging.
**Why eighth:** Polish/differentiator feature. Core functionality works without it.
**Duration estimate:** 3-4 weeks

### Phase 9: Caching + Optimization

**What to build:** Build cache, package cache, performance tuning.
**Why ninth:** Optimization after core features work. Requires usage data to tune.
**Duration estimate:** 1-2 weeks

**Total estimated duration:** 17-23 weeks (4-6 months)
## Critical Architectural Decisions

### Decision 1: Message Broker (Redis vs RabbitMQ)

**Recommendation:** Start with Redis, migrate to RabbitMQ if reliability requirements increase.

**Rationale:**

- Redis: Lower latency, simpler setup, sufficient for <10K builds/day
- RabbitMQ: Higher reliability, message persistence, better for >100K builds/day

**When to switch:** If experiencing message loss or needing guaranteed delivery.

### Decision 2: Container-Based vs. Direct archiso

**Recommendation:** Use direct archiso (mkarchiso) on bare metal workers initially.

**Rationale:**

- Container-based (like Bazzite/Universal Blue) adds complexity (OCI image builds)
- Direct archiso is simpler, well-documented, less abstraction
- Can containerize workers later if isolation/portability becomes critical

**When to reconsider:** Multi-cloud deployment or a need for strong isolation between builds.

### Decision 3: Monolithic vs. Microservices API

**Recommendation:** Start monolithic (single FastAPI app), split services if scaling demands it.

**Rationale:**

- Monolith: Faster development, easier debugging, sufficient for <100K users
- Microservices: Adds operational complexity (service mesh, inter-service communication)

**When to split:** If specific services (e.g., dependency resolver) need independent scaling.

### Decision 4: Real-Time Updates (WebSocket vs. SSE vs. Polling)

**Recommendation:** Use Server-Sent Events (SSE) for build progress.

**Rationale:**

- WebSocket: Bidirectional, but overkill for one-way progress updates
- SSE: Simpler, built-in reconnection, sufficient for progress streaming
- Polling: Wasteful, higher latency

**Implementation:**

```python
import asyncio
import json

from sse_starlette.sse import EventSourceResponse  # SSE support from the sse-starlette package

@app.get("/api/v1/builds/{job_id}/stream")
async def stream_progress(job_id: str):
    async def event_generator():
        while True:
            status = await get_job_status(job_id)
            yield {"data": json.dumps(status)}  # EventSourceResponse formats the SSE frame
            if status['state'] in ['SUCCESS', 'FAILURE']:
                break
            await asyncio.sleep(1)
    return EventSourceResponse(event_generator())
```

## Sources

**Archiso & Build Systems:**

- [Archiso ArchWiki](https://wiki.archlinux.org/title/Archiso) - MEDIUM confidence
- [Custom Archiso Tutorial 2024](https://serverless.industries/2024/12/30/custom-archiso.en.html) - MEDIUM confidence
- [Bazzite ISO Build Process](https://deepwiki.com/ublue-os/bazzite/2.6-iso-build-process) - MEDIUM confidence
- [Universal Blue](https://universal-blue.org/) - MEDIUM confidence

**Dependency Resolution:**

- [Libsolv SAT Solver](https://github.com/openSUSE/libsolv) - HIGH confidence (official)
- [Version SAT Research](https://research.swtch.com/version-sat) - HIGH confidence
- [Dependency Resolution Made Simple](https://borretti.me/article/dependency-resolution-made-simple) - MEDIUM confidence
- [Package Conflict Resolution](https://distropack.dev/Blog/Post?slug=package-conflict-resolution-handling-conflicting-packages) - LOW confidence

**API & Queue Architecture:**

- [FastAPI Architecture Patterns 2026](https://medium.com/algomart/modern-fastapi-architecture-patterns-for-scalable-production-systems-41a87b165a8b) - MEDIUM confidence
- [Celery Documentation](https://docs.celeryq.dev/) - HIGH confidence (official)
- [Web-Queue-Worker Pattern - Azure](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker) - HIGH confidence (official)
- [Design Distributed Job Scheduler](https://www.systemdesignhandbook.com/guides/design-a-distributed-job-scheduler/) - MEDIUM confidence

**Storage & Database:**

- [PostgreSQL Schema Design Best Practices](https://wiki.postgresql.org/wiki/Database_Schema_Recommendations_for_an_Application) - HIGH confidence (official)
- [OverlayFS Linux Kernel Docs](https://docs.kernel.org/filesystems/overlayfs.html) - HIGH confidence (official)

**Frontend:**

- [React Three Fiber Performance 2026](https://graffersid.com/react-three-fiber-vs-three-js/) - MEDIUM confidence
- [3D Data Visualization with React](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432) - MEDIUM confidence

## Confidence Assessment

- **Overall Architecture:** MEDIUM-HIGH - Based on established patterns (web-queue-worker, archiso) with modern 2026 practices
- **Component Boundaries:** HIGH - Clear separation of concerns, well-defined interfaces
- **Build Process:** HIGH - archiso is well documented, with multiple reference implementations
- **Dependency Resolution:** MEDIUM - SAT solver approach is proven, but integration complexity is unknown
- **Scalability:** MEDIUM - Patterns are sound, but specific bottlenecks depend on usage patterns
- **Frontend 3D:** MEDIUM - Three.js + React patterns are established, but performance depends on scene complexity