Architecture Patterns: Linux Distribution Builder Platform
Domain: Web-based Linux distribution customization and ISO generation
Researched: 2026-01-25
Confidence: MEDIUM-HIGH
Executive Summary
Linux distribution builder platforms combine web interfaces with backend build systems, overlaying configuration layers onto base distributions to create customized bootable ISOs. Modern architectures (2026) leverage container-based immutable systems, asynchronous task queues, and SAT-solver dependency resolution. The Debate platform architecture aligns with established patterns from archiso and Universal Blue/Bazzite, and with the web-queue-worker model.
Recommended Architecture
The Debate platform should follow a layered web-queue-worker architecture with these tiers:
┌─────────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER │
│ React Frontend + Three.js 3D Visualization │
│ (User configuration interface, visual package builder) │
└────────────────────┬────────────────────────────────────────────┘
│ HTTP/WebSocket
┌────────────────────▼────────────────────────────────────────────┐
│ API LAYER │
│ FastAPI (async endpoints, validation, session management) │
└────────────────────┬────────────────────────────────────────────┘
│
┌───────────┼───────────┐
│ │ │
┌────────▼──────┐ ┌─▼─────────┐ ┌▼───────────────┐
│ Dependency │ │ Overlay │ │ Build Queue │
│ Resolver │ │ Engine │ │ Manager │
│ (SAT solver) │ │ (Layers) │ │ (Celery) │
└────────┬──────┘ └─┬─────────┘ └┬───────────────┘
│ │ │
└──────────┼─────────────┘
│
┌───────────────────▼─────────────────────────────────────────────┐
│ PERSISTENCE LAYER │
│ PostgreSQL (config, user data, build metadata) │
│ Object Storage (ISO cache, build artifacts) │
└──────────────────────────────────────────────────────────────────┘
│
┌───────────────────▼─────────────────────────────────────────────┐
│ BUILD EXECUTION LAYER │
│ Worker Nodes (Celery workers running archiso/mkarchiso) │
│ - Profile generation │
│ - Package installation to airootfs │
│ - Overlay application (OverlayFS concepts) │
│ - ISO generation with bootloader config │
└──────────────────────────────────────────────────────────────────┘
Component Boundaries
Core Components
| Component | Responsibility | Communicates With | State Management |
|---|---|---|---|
| React Frontend | User interaction, 3D visualization, configuration UI | API Layer (REST/WS) | Client-side state (React context/Redux) |
| Three.js Renderer | 3D package/layer visualization, visual debugging | React components | Scene state separate from app state |
| FastAPI Gateway | Request routing, validation, auth, session mgmt | All backend services | Stateless (session in DB/cache) |
| Dependency Resolver | Package conflict detection, SAT solving, suggestions | API Layer, Database | Computation-only (no persistent state) |
| Overlay Engine | Layer composition, configuration merging, precedence | Build Queue, Database | Configuration versioning in DB |
| Build Queue Manager | Job scheduling, worker coordination, priority mgmt | Celery broker (Redis/RabbitMQ) | Queue state in message broker |
| Celery Workers | ISO build execution, archiso orchestration | Build Queue, Object Storage | Job state tracked in result backend |
| PostgreSQL DB | User data, build configs, metadata, audit logs | All backend services | ACID transactional storage |
| Object Storage | ISO caching, build artifacts, profile storage | Workers, API (download endpoint) | Immutable blob storage |
Detailed Component Architecture
1. Presentation Layer (React + Three.js)
Purpose: Provide visual interface for distribution customization with 3D representation of layers.
Architecture Pattern:
- State Management: Application state in React (configuration data) separate from scene state (3D objects). Changes flow from app state → scene rendering.
- Performance: Use React Three Fiber (r3f) for declarative Three.js integration. Target 60 FPS, <100MB memory.
- Optimization: InstancedMesh for repeated elements (packages), frustum culling, lazy loading with Suspense, GPU resource cleanup with dispose().
- Model Format: GLTF/GLB for 3D assets.
Communication:
- REST API for CRUD operations (save configuration, list builds)
- WebSocket for real-time build progress updates
- Server-Sent Events (SSE) alternative for progress streaming
2. API Layer (FastAPI)
Purpose: Asynchronous API gateway handling request validation, routing, and coordination.
Architecture Pattern:
- Layered Structure: Separate routers (by domain), services (business logic), and data access layers.
- Async I/O: Use async/await throughout to prevent blocking on database/queue operations.
- Middleware: Custom logging, metrics, error handling middleware for observability.
- Validation: Pydantic models for request/response validation.
Endpoints:
- /api/v1/configurations - CRUD for user configurations
- /api/v1/packages - Package search, metadata, conflicts
- /api/v1/builds - Submit build, query status, download ISO
- /api/v1/layers - Layer definitions (Opening Statement, Platform, etc.)
- /ws/builds/{build_id} - WebSocket for build progress
Performance: In 2026 benchmarks of I/O-bound workloads, FastAPI's async model delivers roughly 3x the throughput of comparable synchronous frameworks.
3. Dependency Resolver
Purpose: Detect package conflicts, resolve dependencies, suggest alternatives using SAT solver algorithms.
Architecture Pattern:
- SAT Solver Implementation: Use libsolv (openSUSE) or similar SAT-based approach. Translate package dependencies to logic clauses, apply CDCL algorithm.
- Algorithm: Conflict-Driven Clause Learning (CDCL). Dependency resolution is NP-complete in general, but CDCL solves typical package workloads in milliseconds.
- Input: Package selection across 5 layers (Opening Statement, Platform, Rhetoric, Talking Points, Closing Argument).
- Output: Valid package set or conflict report with suggested resolutions.
Data Structure:
Package Dependency Graph:
- Nodes: Packages (name, version, layer)
- Edges: Dependencies (requires, conflicts, provides, suggests)
- Constraints: Version ranges, mutual exclusions
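The graph above can be sketched with plain dataclasses. The names here are illustrative, and the helper checks only direct conflict edges; the SAT solver handles the general case:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Package:
    name: str
    version: str
    layer: str
    requires: tuple = ()   # names this package depends on
    conflicts: tuple = ()  # names this package cannot coexist with
    provides: tuple = ()   # virtual names this package satisfies

def direct_conflicts(selection: list) -> list:
    """Return (package, conflicting-package) name pairs within a selection."""
    selected = {p.name for p in selection}
    return [
        (p.name, c)
        for p in selection
        for c in p.conflicts
        if c in selected
    ]
```

A check like this is cheap enough to run on every configuration edit, reserving full SAT resolution for save and build time.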
Integration:
- Called synchronously from API during configuration validation
- Pre-compute common dependency sets for base layers (cache results)
- Asynchronous deep resolution for full build validation
4. Overlay Engine
Purpose: Manage layered configuration packages, applying merge strategies and precedence rules.
Architecture Pattern:
- Layer Model: 5 layers with defined precedence (Closing Argument > Talking Points > Rhetoric > Platform > Opening Statement).
- OverlayFS Inspiration: Conceptually similar to OverlayFS union mounting, where upper layers override lower layers.
- Configuration Merging: Files from higher layers replace/merge with lower layers based on merge strategy (replace, merge-append, merge-deep).
Layer Structure:
Layer Definition:
- id: unique identifier
- name: user-facing name (e.g., "Platform")
- order: precedence (1=lowest, 5=highest)
- packages: list of package selections
- files: custom files to overlay
- merge_strategy: how to handle conflicts
Merge Strategies:
- Replace: Higher layer file completely replaces lower
- Merge-Append: Concatenate files (e.g., package lists)
- Merge-Deep: Smart merge (e.g., JSON/YAML key merging)
Output: Unified archiso profile with:
- packages.x86_64 (merged package list)
- airootfs/ directory (merged filesystem overlay)
- profiledef.sh (combined metadata)
5. Build Queue Manager (Celery)
Purpose: Distributed task queue for asynchronous ISO build jobs with priority scheduling.
Architecture Pattern:
- Web-Queue-Worker Pattern: Web frontend → Message queue → Worker pool
- Message Broker: Redis (low latency) or RabbitMQ (high reliability) for job queue
- Result Backend: Redis or PostgreSQL for job status/results
- Worker Pool: Multiple Celery workers (one per build server core for CPU-bound builds)
Job Types:
- Quick Validation: Dependency resolution (seconds) - High priority
- Full Build: ISO generation (minutes) - Normal priority
- Cache Warming: Pre-build common configurations - Low priority
Scheduling:
- Priority Queue: User-initiated builds > automated cache warming
- Rate Limiting: Prevent queue flooding, enforce user quotas
- Retry Logic: Automatic retry with exponential backoff for transient failures
- Timeout: Per-job timeout (e.g., 30 min max for build)
Coordinator Pattern:
- Single coordinator manages job assignment and worker health
- Leader election for coordinator HA (if scaled beyond single instance)
Monitoring:
- Job state transitions logged to PostgreSQL
- Metrics: queue depth, worker utilization, average build time
- Dead letter queue for failed jobs requiring manual investigation
Sources:
- Celery Distributed Task Queue
- Design Distributed Job Scheduler
- Web-Queue-Worker Architecture - Azure
6. Build Execution Workers (archiso-based)
Purpose: Execute ISO generation using archiso (mkarchiso) with custom profiles.
Architecture Pattern:
- Profile-Based Build: Generate temporary archiso profile per build job
- Isolation: Each build runs in isolated environment (separate working directory)
- Stages: Profile generation → Package installation → Customization → ISO creation
Build Process Flow:
1. Profile Generation (Overlay Engine output)
├── Create temp directory
├── Write packages.x86_64 (merged package list)
├── Write profiledef.sh (metadata, permissions)
├── Copy airootfs/ overlay files
└── Configure bootloaders (syslinux, grub, systemd-boot)
2. Package Installation
├── mkarchiso downloads packages (pacman cache)
├── Install to work_dir/x86_64/airootfs
└── Apply package configurations
3. Customization (customize_airootfs.sh)
├── Enable systemd services
├── Apply user-specific configs
├── Run post-install scripts
└── Set permissions
4. ISO Generation
├── Create kernel and initramfs images
├── Build squashfs filesystem
├── Assemble bootable ISO
├── Generate checksums
└── Move to output directory
5. Post-Processing
├── Upload ISO to object storage
├── Update database (build status, ISO location)
├── Cache metadata for reuse
└── Clean up working directory
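Step 1 of the flow above might be sketched as follows. `write_profile` is a hypothetical helper, and the profiledef.sh fields shown are only a small subset of what a real archiso profile sets:

```python
from pathlib import Path

def write_profile(profile_dir: Path, packages: list, iso_name: str) -> None:
    """Write a minimal archiso profile skeleton for one build job."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    # Merged package list: one package per line, deduplicated and sorted
    (profile_dir / "packages.x86_64").write_text(
        "\n".join(sorted(set(packages))) + "\n"
    )
    # Minimal metadata; a real profiledef.sh sets many more variables
    (profile_dir / "profiledef.sh").write_text(
        f'iso_name="{iso_name}"\n'
        'arch="x86_64"\n'
        'bootmodes=("bios.syslinux.mbr" "uefi-x64.systemd-boot.esp")\n'
    )
    # Filesystem overlay root; the Overlay Engine's merged files land here
    (profile_dir / "airootfs").mkdir(exist_ok=True)
```

The worker would then pass `profile_dir` to mkarchiso, which reads these files to drive steps 2-4.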
Worker Configuration:
- Resource Limits: 1 build per worker (CPU/memory intensive)
- Concurrency: 6 workers max (6-core build server)
- Working Directory: /tmp/archiso-tmp-{job_id} (cleaned after completion with the -r flag)
- Output Directory: Temporary → Object storage → Local cleanup
Optimizations:
- Package Cache: Shared pacman cache across workers (prevent redundant downloads)
- Layer Caching: Cache common base layers (Opening Statement variations)
- Incremental Builds: Detect unchanged layers, reuse previous airootfs where possible
7. Persistence Layer (PostgreSQL + Object Storage)
Purpose: Store configuration data, build metadata, and build artifacts.
PostgreSQL Schema Design:
-- User configurations
CREATE SCHEMA configurations;
CREATE TABLE configurations.user_configs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
name VARCHAR(255) NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE configurations.layers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
config_id UUID REFERENCES configurations.user_configs(id),
layer_type VARCHAR(50) NOT NULL, -- opening_statement, platform, rhetoric, etc.
layer_order INT NOT NULL,
merge_strategy VARCHAR(50) DEFAULT 'replace'
);
CREATE TABLE configurations.layer_packages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
layer_id UUID REFERENCES configurations.layers(id),
package_name VARCHAR(255) NOT NULL,
package_version VARCHAR(50),
required BOOLEAN DEFAULT TRUE
);
CREATE TABLE configurations.layer_files (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
layer_id UUID REFERENCES configurations.layers(id),
file_path VARCHAR(1024) NOT NULL, -- path in airootfs
file_content TEXT, -- for small configs
file_storage_url VARCHAR(2048), -- for large files in object storage
permissions VARCHAR(4) DEFAULT '0644'
);
-- Build management
CREATE SCHEMA builds;
CREATE TABLE builds.build_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
config_id UUID REFERENCES configurations.user_configs(id),
status VARCHAR(50) NOT NULL, -- queued, running, success, failed
priority INT DEFAULT 5,
started_at TIMESTAMP,
completed_at TIMESTAMP,
iso_url VARCHAR(2048), -- object storage location
iso_checksum VARCHAR(128),
error_message TEXT,
build_log_url VARCHAR(2048)
);
CREATE TABLE builds.build_cache (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
config_hash VARCHAR(64) UNIQUE NOT NULL, -- hash of layer config
iso_url VARCHAR(2048),
created_at TIMESTAMP DEFAULT NOW(),
last_accessed TIMESTAMP DEFAULT NOW(),
access_count INT DEFAULT 0
);
-- Package metadata
CREATE SCHEMA packages;
CREATE TABLE packages.package_metadata (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) UNIQUE NOT NULL,
description TEXT,
repository VARCHAR(100), -- core, extra, community, aur
version VARCHAR(50),
dependencies JSONB, -- {requires: [], conflicts: [], provides: []}
last_updated TIMESTAMP DEFAULT NOW()
);
Schema Organization Best Practices (2026):
- Separate schemas for functional areas (configurations, builds, packages)
- Schema-level access control for security isolation
- CI/CD integration with migration tools (Flyway, Alembic)
- Indexes on frequently queried fields (config_id, status, config_hash)
Object Storage:
- Purpose: Store ISOs (large files, 1-4GB), build logs, custom overlay files
- Technology: S3-compatible (AWS S3, MinIO, Cloudflare R2)
- Structure:
- /isos/{build_id}.iso - Generated ISOs
- /logs/{build_id}.log - Build logs
- /overlays/{layer_id}/{file_path} - Custom files too large for DB
- /cache/{config_hash}.iso - Cached ISOs for reuse
Data Flow
Configuration Creation Flow
User (Frontend)
↓ (1) Create/Edit configuration
API Layer (Validation)
↓ (2) Validate input
Dependency Resolver
↓ (3) Check conflicts
↓ (4) Return validation result
API Layer
↓ (5) Save configuration
PostgreSQL (configurations schema)
↓ (6) Return config_id
Frontend (Display confirmation)
Build Submission Flow
User (Frontend)
↓ (1) Submit build request
API Layer
↓ (2) Check cache (config hash)
PostgreSQL (build_cache)
├─→ (3a) Cache hit: return cached ISO URL
└─→ (3b) Cache miss: create build job
Build Queue Manager (Celery)
↓ (4) Enqueue job with priority
Message Broker (Redis/RabbitMQ)
↓ (5) Job dispatched to worker
Celery Worker
↓ (6a) Fetch configuration from DB
↓ (6b) Generate archiso profile (Overlay Engine)
↓ (6c) Execute mkarchiso
↓ (6d) Upload ISO to object storage
↓ (6e) Update build status in DB
PostgreSQL + Object Storage
↓ (7) Job complete
API Layer (WebSocket)
↓ (8) Notify user
Frontend (Display download link)
Real-Time Progress Updates Flow
Celery Worker
↓ (1) Emit progress events during build
↓ (e.g., "downloading packages", "generating ISO")
Celery Result Backend
↓ (2) Store progress state
API Layer (WebSocket handler)
↓ (3) Poll/subscribe to job progress
↓ (4) Push updates to client
Frontend (WebSocket listener)
↓ (5) Update UI progress bar
Patterns to Follow
Pattern 1: Layered Configuration Precedence
What: Higher layers override lower layers with defined merge strategies.
When: User customizes configuration across multiple layers (Platform, Rhetoric, etc.).
Implementation:
class OverlayEngine:
    def merge_layers(self, layers: List[Layer]) -> Profile:
        """Merge layers from lowest to highest precedence."""
        sorted_layers = sorted(layers, key=lambda l: l.order)
        profile = Profile()
        for layer in sorted_layers:
            profile = self.apply_layer(profile, layer)
        return profile

    def apply_layer(self, profile: Profile, layer: Layer) -> Profile:
        """Apply layer based on merge strategy."""
        if layer.merge_strategy == "replace":
            profile.files.update(layer.files)  # Overwrite
        elif layer.merge_strategy == "merge-append":
            profile.packages.extend(layer.packages)  # Append
        elif layer.merge_strategy == "merge-deep":
            profile.config = deep_merge(profile.config, layer.config)
        return profile
Source: OverlayFS union mount concepts applied to configuration management.
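The deep_merge helper used by the merge-deep branch above is not defined in this document; a minimal recursive version, leaving both inputs untouched, could look like this:

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay into base; the overlay wins on scalar conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are dicts: descend and merge key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalar or type mismatch: higher layer replaces lower
            merged[key] = value
    return merged
```

This matches the precedence rule of the layer model: keys present only in the lower layer survive, keys present in both are overridden or merged depending on their type.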
Pattern 2: SAT-Based Dependency Resolution
What: Translate package dependencies to boolean satisfiability problem, solve with CDCL algorithm.
When: User adds package to configuration, system detects conflicts.
Implementation:
class DependencyResolver:
    def resolve(self, packages: List[Package]) -> Resolution:
        """Resolve dependencies using SAT solver."""
        clauses = self.build_clauses(packages)
        solver = SATSolver()
        result = solver.solve(clauses)
        if result.satisfiable:
            return Resolution(success=True, packages=result.model)
        else:
            conflicts = self.explain_conflicts(result.unsat_core)
            alternatives = self.suggest_alternatives(conflicts)
            return Resolution(success=False, conflicts=conflicts,
                              alternatives=alternatives)

    def build_clauses(self, packages: List[Package]) -> List[Clause]:
        """Convert dependency graph to CNF clauses."""
        clauses = []
        for pkg in packages:
            # If a package is selected, all of its dependencies must be selected
            for dep in pkg.requires:
                clauses.append(Implies(pkg, dep))
            # If a package is selected, none of its conflicts may be selected
            for conflict in pkg.conflicts:
                clauses.append(Not(And(pkg, conflict)))
        return clauses
Source: Libsolv implementation patterns
Pattern 3: Asynchronous Build Queue with Progress Tracking
What: Submit long-running build jobs to queue, track progress, notify on completion.
When: User submits build request (ISO generation takes minutes).
Implementation:
import subprocess
import uuid

# API endpoint
@app.post("/api/v1/builds")
async def submit_build(config_id: UUID):
    # Check cache first
    cache_key = compute_hash(config_id)
    cached = await check_cache(cache_key)
    if cached:
        return {"status": "cached", "iso_url": cached.iso_url}
    # Enqueue build job (UUIDs are stringified for JSON serialization)
    job = build_iso.apply_async(
        args=[str(config_id)],
        priority=5,
        task_id=str(uuid.uuid4()),
    )
    return {"status": "queued", "job_id": job.id}

# Celery task
@celery.task(bind=True)
def build_iso(self, config_id: str):
    self.update_state(state='DOWNLOADING', meta={'progress': 10})
    # Generate profile
    profile = overlay_engine.generate_profile(config_id)
    self.update_state(state='BUILDING', meta={'progress': 30})
    # Run mkarchiso; check=True surfaces build failures as exceptions
    subprocess.run([
        'mkarchiso', '-v', '-r',
        '-w', f'/tmp/archiso-{self.request.id}',
        '-o', '/tmp/output',
        profile.path,
    ], check=True)
    self.update_state(state='UPLOADING', meta={'progress': 80})
    # Upload to object storage
    iso_url = upload_iso('/tmp/output/archlinux.iso')
    return {"iso_url": iso_url, "progress": 100}
Source: Celery best practices, Web-Queue-Worker pattern
Pattern 4: Cache-First Build Strategy
What: Hash configuration, check cache before building, reuse identical ISOs.
When: User submits build that may have been built previously.
Implementation:
import hashlib
import json
from datetime import datetime
from typing import Optional

def compute_config_hash(config_id: UUID) -> str:
    """Create a deterministic hash of a configuration."""
    config = db.query(Config).get(config_id)
    # Include all layers, packages, and files in the hash
    hash_input = {
        "layers": sorted([
            {
                "type": layer.type,
                "packages": sorted(layer.packages),
                "files": sorted([
                    {
                        "path": f.path,
                        # Builtin hash() is randomized per process; use a stable digest
                        "content_hash": hashlib.sha256(f.content.encode()).hexdigest(),
                    }
                    for f in layer.files
                ], key=lambda x: x["path"]),
            }
            for layer in config.layers
        ], key=lambda x: x["type"])
    }
    return hashlib.sha256(
        json.dumps(hash_input, sort_keys=True).encode()
    ).hexdigest()

async def check_cache(config_hash: str) -> Optional[CachedBuild]:
    """Check if an ISO already exists for this configuration."""
    cached = await db.query(BuildCache).filter_by(
        config_hash=config_hash
    ).first()
    if cached and cached.iso_exists():
        # Update access metadata
        cached.last_accessed = datetime.now()
        cached.access_count += 1
        await db.commit()
        return cached
    return None
Benefit: Reduces build time from minutes to seconds for repeated configurations. Critical for popular base configurations (e.g., "KDE Desktop with development tools").
Anti-Patterns to Avoid
Anti-Pattern 1: Blocking API Calls During Build
What: Synchronously waiting for ISO build to complete in API endpoint.
Why bad: Ties up API worker for minutes, prevents handling other requests, poor user experience with timeout risks.
Instead: Use asynchronous task queue (Celery) with WebSocket/SSE for progress updates. API returns immediately with job_id, frontend polls or subscribes to updates.
Example:
# BAD: Blocking build
@app.post("/builds")
def build(config_id):
    iso = generate_iso(config_id)  # Takes 10 minutes!
    return {"iso_url": iso}

# GOOD: Async queue
@app.post("/builds")
async def build(config_id):
    job = build_iso.delay(config_id)
    return {"job_id": job.id, "status": "queued"}
Anti-Pattern 2: Duplicating State Between React and Three.js
What: Maintaining separate state trees for application data and 3D scene, manually syncing.
Why bad: State gets out of sync, bugs from inconsistent data, complexity in update logic.
Instead: Single source of truth in React state. Scene derives from state. User interactions → dispatch actions → update state → scene re-renders.
Example:
// BAD: Separate state trees that must be synced manually
const [appState, setAppState] = useState({packages: []});
const [sceneObjects, setSceneObjects] = useState([]);

// GOOD: Scene derives from app state
const [config, setConfig] = useState({packages: []});

function Scene({packages}) {
  return packages.map(pkg => <PackageMesh key={pkg.id} {...pkg} />);
}
Source: React Three Fiber state management best practices
Anti-Pattern 3: Storing Large Files in PostgreSQL
What: Storing ISO files (1-4GB) or build logs (megabytes) as BYTEA in PostgreSQL.
Why bad: Database bloat, slow backups, memory pressure, poor performance for large blob operations.
Instead: Store large files in object storage (S3/MinIO), keep URLs/metadata in PostgreSQL.
Example:
-- BAD: ISO in database
CREATE TABLE builds (
id UUID PRIMARY KEY,
iso_data BYTEA -- 2GB blob!
);
-- GOOD: URL reference
CREATE TABLE builds (
id UUID PRIMARY KEY,
iso_url VARCHAR(2048), -- s3://bucket/isos/{id}.iso
iso_checksum VARCHAR(128),
iso_size_bytes BIGINT
);
Anti-Pattern 4: Running Multiple Builds Per Worker Concurrently
What: Allowing a single Celery worker to process multiple ISO builds in parallel.
Why bad: ISO generation is CPU and memory intensive (compressing filesystem, creating squashfs). Running multiple builds causes resource contention, thrashing, and OOM kills.
Instead: Configure Celery workers with concurrency=1 for build tasks. Run one build per worker. Scale horizontally with multiple workers.
Example:
# BAD: Multiple concurrent builds
celery -A app worker --concurrency=4 # 4 builds at once on 6-core machine
# GOOD: One build per worker
celery -A app worker --concurrency=1 -Q builds # Start 6 workers for 6 cores
Anti-Pattern 5: No Dependency Validation Until Build Time
What: Allowing users to save configurations without checking package conflicts, discovering issues during ISO build.
Why bad: Wastes build resources (minutes of CPU time), poor user experience (delayed error feedback), difficult to debug which package caused failure.
Instead: Run dependency resolution in API layer during configuration save/update. Provide immediate feedback with conflict explanations and alternatives.
Example:
# BAD: Validate during build
@celery.task
def build_iso(config_id):
    packages = load_packages(config_id)
    result = resolve_dependencies(packages)  # Fails here, after queueing!
    if not result.valid:
        raise BuildError("Conflicts detected")

# GOOD: Validate on save
@app.post("/configs")
async def save_config(config: ConfigInput):
    resolution = dependency_resolver.resolve(config.packages)
    if not resolution.valid:
        return {"error": "conflicts", "details": resolution.conflicts}
    await db.save(config)
    return {"success": True}
Scalability Considerations
| Concern | At 100 users | At 10K users | At 1M users |
|---|---|---|---|
| API Layer | Single FastAPI instance | Multiple instances behind load balancer | Auto-scaling group, CDN for static assets |
| Build Queue | Single Redis broker | Redis cluster or RabbitMQ | Kafka for high-throughput messaging |
| Workers | 1 build server (6 cores) | 3-5 build servers | Auto-scaling worker pool, spot instances |
| Database | Single PostgreSQL instance | Primary + read replicas | Sharded PostgreSQL or distributed SQL (CockroachDB) |
| Storage | Local MinIO | S3-compatible with CDN | Multi-region S3 with CloudFront |
| Caching | In-memory cache | Redis cache cluster | Multi-tier cache (Redis + CDN) |
Horizontal Scaling Strategy
API Layer:
- Stateless FastAPI instances (session in DB/Redis)
- Load balancer (Nginx, HAProxy, AWS ALB)
- Auto-scaling based on CPU/request latency
Build Workers:
- Independent Celery workers connecting to shared broker
- Each worker runs 1 build at a time
- Scale workers based on queue depth (add workers when >10 jobs queued)
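The queue-depth trigger above can be expressed as a small scaling policy. The thresholds here are illustrative, not tuned values:

```python
def desired_worker_count(queue_depth: int, current: int,
                         min_workers: int = 1, max_workers: int = 12) -> int:
    """Add a worker while more than 10 jobs are queued; drain back
    one worker at a time once the queue empties."""
    if queue_depth > 10:
        return min(current + 1, max_workers)
    if queue_depth == 0 and current > min_workers:
        return current - 1
    return current
```

A periodic task (or an external autoscaler reading broker metrics) would call this and start or stop worker instances to match the returned count.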
Database:
- Read replicas for queries (config lookups)
- Write operations to primary (build status updates)
- Connection pooling (PgBouncer)
Storage:
- Object storage is inherently scalable
- CDN for ISO downloads (reduce egress costs)
- Lifecycle policies (delete ISOs older than 30 days if not accessed)
Build Order Implications for Development
Phase 1: Core Infrastructure
What to build: Database schema, basic API scaffolding, object storage setup.
Why first: Foundation for all other components. No dependencies on complex logic.
Duration estimate: 1-2 weeks
Phase 2: Configuration Management
What to build: Layer data models, CRUD endpoints, basic validation.
Why second: Enables testing configuration storage before complex dependency resolution.
Duration estimate: 1-2 weeks
Phase 3: Dependency Resolver (Simplified)
What to build: Basic conflict detection (direct conflicts only, no SAT solver yet).
Why third: Provides early validation capability. Full SAT solver can wait.
Duration estimate: 1 week
Phase 4: Overlay Engine
What to build: Layer merging logic, profile generation for archiso.
Why fourth: Requires configuration data models from Phase 2. Produces profiles for builds.
Duration estimate: 2 weeks
Phase 5: Build Queue + Workers
What to build: Celery setup, basic build task, worker orchestration.
Why fifth: Depends on Overlay Engine for profile generation. Core value delivery.
Duration estimate: 2-3 weeks
Phase 6: Frontend (Basic)
What to build: React UI for configuration (forms, no 3D yet), build submission.
Why sixth: API must exist first. Provides usable interface for testing builds.
Duration estimate: 2-3 weeks
Phase 7: Advanced Dependency Resolution
What to build: Full SAT solver integration, conflict explanations, alternatives.
Why seventh: Complex feature. System works with basic validation from Phase 3.
Duration estimate: 2-3 weeks
Phase 8: 3D Visualization
What to build: Three.js integration, layer visualization, visual debugging.
Why eighth: Polish/differentiator feature. Core functionality works without it.
Duration estimate: 3-4 weeks
Phase 9: Caching + Optimization
What to build: Build cache, package cache, performance tuning.
Why ninth: Optimization after core features work. Requires usage data to tune.
Duration estimate: 1-2 weeks
Total estimated duration: 17-23 weeks (4-6 months)
Critical Architectural Decisions
Decision 1: Message Broker (Redis vs RabbitMQ)
Recommendation: Start with Redis, migrate to RabbitMQ if reliability requirements increase.
Rationale:
- Redis: Lower latency, simpler setup, sufficient for <10K builds/day
- RabbitMQ: Higher reliability, message persistence, better for >100K builds/day
When to switch: If experiencing message loss or need guaranteed delivery.
Decision 2: Container-Based vs. Direct archiso
Recommendation: Use direct archiso (mkarchiso) on bare metal workers initially.
Rationale:
- Container-based (like Bazzite/Universal Blue) adds complexity (OCI image builds)
- Direct archiso is simpler, well-documented, less abstraction
- Can containerize workers later if isolation/portability becomes critical
When to reconsider: Multi-cloud deployment or need strong isolation between builds.
Decision 3: Monolithic vs. Microservices API
Recommendation: Start monolithic (single FastAPI app), split services if scaling demands.
Rationale:
- Monolith: Faster development, easier debugging, sufficient for <100K users
- Microservices: Adds operational complexity (service mesh, inter-service communication)
When to split: If specific services (e.g., dependency resolver) need independent scaling.
Decision 4: Real-Time Updates (WebSocket vs. SSE vs. Polling)
Recommendation: Use Server-Sent Events (SSE) for build progress.
Rationale:
- WebSocket: Bidirectional, but overkill for one-way progress updates
- SSE: Simpler, built-in reconnection, sufficient for progress streaming
- Polling: Wasteful, higher latency
Implementation:
@app.get("/api/v1/builds/{job_id}/stream")
async def stream_progress(job_id: str):
async def event_generator():
while True:
status = await get_job_status(job_id)
yield f"data: {json.dumps(status)}\n\n"
if status['state'] in ['SUCCESS', 'FAILURE']:
break
await asyncio.sleep(1)
return EventSourceResponse(event_generator())
Sources
Archiso & Build Systems:
- Archiso ArchWiki - MEDIUM confidence
- Custom Archiso Tutorial 2024 - MEDIUM confidence
- Bazzite ISO Build Process - MEDIUM confidence
- Universal Blue - MEDIUM confidence
Dependency Resolution:
- Libsolv SAT Solver - HIGH confidence (official)
- Version SAT Research - HIGH confidence
- Dependency Resolution Made Simple - MEDIUM confidence
- Package Conflict Resolution - LOW confidence
API & Queue Architecture:
- FastAPI Architecture Patterns 2026 - MEDIUM confidence
- Celery Documentation - HIGH confidence (official)
- Web-Queue-Worker Pattern - Azure - HIGH confidence (official)
- Design Distributed Job Scheduler - MEDIUM confidence
Storage & Database:
- PostgreSQL Schema Design Best Practices - HIGH confidence (official)
- OverlayFS Linux Kernel Docs - HIGH confidence (official)
Frontend:
- React Three Fiber Performance 2026 - MEDIUM confidence
- 3D Data Visualization with React - MEDIUM confidence
Confidence Assessment
- Overall Architecture: MEDIUM-HIGH - Based on established patterns (web-queue-worker, archiso) with modern 2026 practices
- Component Boundaries: HIGH - Clear separation of concerns, well-defined interfaces
- Build Process: HIGH - archiso is well-documented, multiple reference implementations
- Dependency Resolution: MEDIUM - SAT solver approach is proven, but integration complexity unknown
- Scalability: MEDIUM - Patterns are sound, but specific bottlenecks depend on usage patterns
- Frontend 3D: MEDIUM - Three.js + React patterns established, but performance depends on complexity