debate/.planning/research/ARCHITECTURE.md
Mikkel Georgsen c0ff95951e docs: add project research
Files:
- STACK.md: Technology stack recommendations (Python 3.12+, FastAPI, React 19+, Vite, Celery, PostgreSQL 18+)
- FEATURES.md: Feature landscape analysis (table stakes vs differentiators)
- ARCHITECTURE.md: Layered web-queue-worker architecture with SAT-based dependency resolution
- PITFALLS.md: Critical pitfalls and prevention strategies
- SUMMARY.md: Research synthesis with roadmap implications

Key findings:
- Stack: Modern 2026 async Python (FastAPI/Celery) + React/Three.js 3D frontend
- Architecture: Web-queue-worker pattern with sandboxed archiso builds
- Critical pitfall: Build sandboxing required from day one (CHAOS RAT AUR incident July 2025)

Recommended 9-phase roadmap: Infrastructure → Config → Dependency → Overlay → Build Queue → Frontend → Advanced SAT → 3D Viz → Optimization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 02:07:11 +00:00


# Architecture Patterns: Linux Distribution Builder Platform
**Domain:** Web-based Linux distribution customization and ISO generation
**Researched:** 2026-01-25
**Confidence:** MEDIUM-HIGH
## Executive Summary
Linux distribution builder platforms combine web interfaces with backend build systems, overlaying configuration layers onto base distributions to create customized bootable ISOs. Modern architectures (2026) leverage container-based immutable systems, asynchronous task queues, and SAT-solver dependency resolution. The Debate platform architecture aligns with established patterns from archiso and Universal Blue/Bazzite, and with the web-queue-worker pattern.
## Recommended Architecture
The Debate platform should follow a **layered web-queue-worker architecture** with these tiers:
```
┌─────────────────────────────────────────────────────────────────┐
│                       PRESENTATION LAYER                        │
│           React Frontend + Three.js 3D Visualization            │
│     (User configuration interface, visual package builder)      │
└────────────────────┬────────────────────────────────────────────┘
                     │ HTTP/WebSocket
┌────────────────────▼────────────────────────────────────────────┐
│                            API LAYER                            │
│    FastAPI (async endpoints, validation, session management)    │
└────────────────────┬────────────────────────────────────────────┘
         ┌───────────┼────────────┐
         │           │            │
┌────────▼──────┐  ┌─▼─────────┐ ┌▼───────────────┐
│  Dependency   │  │  Overlay  │ │  Build Queue   │
│   Resolver    │  │  Engine   │ │    Manager     │
│ (SAT solver)  │  │ (Layers)  │ │   (Celery)     │
└────────┬──────┘  └─┬─────────┘ └┬───────────────┘
         │           │            │
         └───────────┼────────────┘
┌────────────────────▼────────────────────────────────────────────┐
│                        PERSISTENCE LAYER                        │
│         PostgreSQL (config, user data, build metadata)          │
│           Object Storage (ISO cache, build artifacts)           │
└─────────────────────────────────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────────────┐
│                      BUILD EXECUTION LAYER                      │
│     Worker Nodes (Celery workers running archiso/mkarchiso)     │
│  - Profile generation                                           │
│  - Package installation to airootfs                             │
│  - Overlay application (OverlayFS concepts)                     │
│  - ISO generation with bootloader config                        │
└─────────────────────────────────────────────────────────────────┘
```
## Component Boundaries
### Core Components
| Component | Responsibility | Communicates With | State Management |
|-----------|---------------|-------------------|------------------|
| **React Frontend** | User interaction, 3D visualization, configuration UI | API Layer (REST/WS) | Client-side state (React context/Redux) |
| **Three.js Renderer** | 3D package/layer visualization, visual debugging | React components | Scene state separate from app state |
| **FastAPI Gateway** | Request routing, validation, auth, session mgmt | All backend services | Stateless (session in DB/cache) |
| **Dependency Resolver** | Package conflict detection, SAT solving, suggestions | API Layer, Database | Computation-only (no persistent state) |
| **Overlay Engine** | Layer composition, configuration merging, precedence | Build Queue, Database | Configuration versioning in DB |
| **Build Queue Manager** | Job scheduling, worker coordination, priority mgmt | Celery broker (Redis/RabbitMQ) | Queue state in message broker |
| **Celery Workers** | ISO build execution, archiso orchestration | Build Queue, Object Storage | Job state tracked in result backend |
| **PostgreSQL DB** | User data, build configs, metadata, audit logs | All backend services | ACID transactional storage |
| **Object Storage** | ISO caching, build artifacts, profile storage | Workers, API (download endpoint) | Immutable blob storage |
### Detailed Component Architecture
#### 1. Presentation Layer (React + Three.js)
**Purpose:** Provide visual interface for distribution customization with 3D representation of layers.
**Architecture Pattern:**
- **State Management:** Application state in React (configuration data) separate from scene state (3D objects). Changes flow from app state → scene rendering.
- **Performance:** Use React Three Fiber (r3f) for declarative Three.js integration. Target 60 FPS, <100MB memory.
- **Optimization:** InstancedMesh for repeated elements (packages), frustum culling, lazy loading with Suspense, GPU resource cleanup with dispose().
- **Model Format:** GLTF/GLB for 3D assets.
**Communication:**
- REST API for CRUD operations (save configuration, list builds)
- WebSocket for real-time build progress updates
- Server-Sent Events (SSE) as an alternative for progress streaming
**Sources:**
- [React Three Fiber vs. Three.js Performance Guide 2026](https://graffersid.com/react-three-fiber-vs-three-js/)
- [3D Data Visualization with React and Three.js](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432)
#### 2. API Layer (FastAPI)
**Purpose:** Asynchronous API gateway handling request validation, routing, and coordination.
**Architecture Pattern:**
- **Layered Structure:** Separate routers (by domain), services (business logic), and data access layers.
- **Async I/O:** Use async/await throughout to prevent blocking on database/queue operations.
- **Middleware:** Custom logging, metrics, error handling middleware for observability.
- **Validation:** Pydantic models for request/response validation.
**Endpoints:**
- `/api/v1/configurations` - CRUD for user configurations
- `/api/v1/packages` - Package search, metadata, conflicts
- `/api/v1/builds` - Submit build, query status, download ISO
- `/api/v1/layers` - Layer definitions (Opening Statement, Platform, etc.)
- `/ws/builds/{build_id}` - WebSocket for build progress
**Performance:** FastAPI's async model substantially outperforms synchronous frameworks for I/O-bound operations; 2026 benchmarks report improvements of up to roughly 300%, though results vary by workload.
**Sources:**
- [Modern FastAPI Architecture Patterns 2026](https://medium.com/algomart/modern-fastapi-architecture-patterns-for-scalable-production-systems-41a87b165a8b)
- [FastAPI for Microservices 2025](https://talent500.com/blog/fastapi-microservices-python-api-design-patterns-2025/)
#### 3. Dependency Resolver
**Purpose:** Detect package conflicts, resolve dependencies, suggest alternatives using SAT solver algorithms.
**Architecture Pattern:**
- **SAT Solver Implementation:** Use libsolv (openSUSE) or similar SAT-based approach. Translate package dependencies to logic clauses, apply CDCL algorithm.
- **Algorithm:** Conflict-Driven Clause Learning (CDCL). Dependency resolution is NP-complete in general, but CDCL solvers handle typical package workloads in milliseconds.
- **Input:** Package selection across 5 layers (Opening Statement, Platform, Rhetoric, Talking Points, Closing Argument).
- **Output:** Valid package set or conflict report with suggested resolutions.
**Data Structure:**
```
Package Dependency Graph:
- Nodes: Packages (name, version, layer)
- Edges: Dependencies (requires, conflicts, provides, suggests)
- Constraints: Version ranges, mutual exclusions
```
**Integration:**
- Called synchronously from API during configuration validation
- Pre-compute common dependency sets for base layers (cache results)
- Asynchronous deep resolution for full build validation
**Sources:**
- [Libsolv SAT Solver](https://github.com/openSUSE/libsolv)
- [Version SAT Research](https://research.swtch.com/version-sat)
- [Dependency Resolution Made Simple](https://borretti.me/article/dependency-resolution-made-simple)
#### 4. Overlay Engine
**Purpose:** Manage layered configuration packages, applying merge strategies and precedence rules.
**Architecture Pattern:**
- **Layer Model:** 5 layers with defined precedence (Closing Argument > Talking Points > Rhetoric > Platform > Opening Statement).
- **OverlayFS Inspiration:** Conceptually similar to OverlayFS union mounting, where upper layers override lower layers.
- **Configuration Merging:** Files from higher layers replace/merge with lower layers based on merge strategy (replace, merge-append, merge-deep).
**Layer Structure:**
```
Layer Definition:
- id: unique identifier
- name: user-facing name (e.g., "Platform")
- order: precedence (1=lowest, 5=highest)
- packages: list of package selections
- files: custom files to overlay
- merge_strategy: how to handle conflicts
```
**Merge Strategies:**
- **Replace:** Higher layer file completely replaces lower
- **Merge-Append:** Concatenate files (e.g., package lists)
- **Merge-Deep:** Smart merge (e.g., JSON/YAML key merging)
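The merge-deep strategy can be illustrated with a small recursive dictionary merge (a simplified sketch; the policy for lists and type mismatches shown here, "upper replaces", is an assumption that would need a project-specific decision):

```python
def deep_merge(lower: dict, upper: dict) -> dict:
    """Recursively merge `upper` onto `lower`; upper-layer keys win on conflict."""
    merged = dict(lower)
    for key, value in upper.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested maps
        else:
            merged[key] = value                           # replace scalars/lists
    return merged
```

For example, a Platform layer setting only `locale.keymap` would override that one key while keeping the base layer's `locale.lang` intact.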
**Output:** Unified archiso profile with:
- `packages.x86_64` (merged package list)
- `airootfs/` directory (merged filesystem overlay)
- `profiledef.sh` (combined metadata)
**Sources:**
- [OverlayFS Linux Kernel Documentation](https://docs.kernel.org/filesystems/overlayfs.html)
- [OverlayFS ArchWiki](https://wiki.archlinux.org/title/Overlay_filesystem)
#### 5. Build Queue Manager (Celery)
**Purpose:** Distributed task queue for asynchronous ISO build jobs with priority scheduling.
**Architecture Pattern:**
- **Web-Queue-Worker Pattern:** Web frontend → Message queue → Worker pool
- **Message Broker:** Redis (low latency) or RabbitMQ (high reliability) for job queue
- **Result Backend:** Redis or PostgreSQL for job status/results
- **Worker Pool:** Multiple Celery workers (one per build server core for CPU-bound builds)
**Job Types:**
1. **Quick Validation:** Dependency resolution (seconds) - High priority
2. **Full Build:** ISO generation (minutes) - Normal priority
3. **Cache Warming:** Pre-build common configurations - Low priority
**Scheduling:**
- **Priority Queue:** User-initiated builds > automated cache warming
- **Rate Limiting:** Prevent queue flooding, enforce user quotas
- **Retry Logic:** Automatic retry with exponential backoff for transient failures
- **Timeout:** Per-job timeout (e.g., 30 min max for build)
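The retry policy's delays can be computed as below (a sketch; Celery's task options `retry_backoff`, `retry_backoff_max`, and `retry_jitter` provide this behavior natively, and the base/cap values here are illustrative assumptions):

```python
import random

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0,
                  jitter: bool = True) -> float:
    """Delay before retry `attempt` (0-based): base * 2**attempt, capped,
    with optional full jitter to avoid thundering-herd retries."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay
```

Without jitter the sequence is 2, 4, 8, 16, ... seconds, capped at 300; jitter spreads simultaneous failures so workers don't all retry at once.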
**Coordinator Pattern:**
- Single coordinator manages job assignment and worker health
- Leader election for coordinator HA (if scaled beyond single instance)
**Monitoring:**
- Job state transitions logged to PostgreSQL
- Metrics: queue depth, worker utilization, average build time
- Dead letter queue for failed jobs requiring manual investigation
**Sources:**
- [Celery Distributed Task Queue](https://docs.celeryq.dev/)
- [Design Distributed Job Scheduler](https://www.systemdesignhandbook.com/guides/design-a-distributed-job-scheduler/)
- [Web-Queue-Worker Architecture - Azure](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker)
#### 6. Build Execution Workers (archiso-based)
**Purpose:** Execute ISO generation using archiso (mkarchiso) with custom profiles.
**Architecture Pattern:**
- **Profile-Based Build:** Generate temporary archiso profile per build job
- **Isolation:** Each build runs in isolated environment (separate working directory)
- **Stages:** Profile generation → Package installation → Customization → ISO creation
**Build Process Flow:**
```
1. Profile Generation (Overlay Engine output)
   ├── Create temp directory
   ├── Write packages.x86_64 (merged package list)
   ├── Write profiledef.sh (metadata, permissions)
   ├── Copy airootfs/ overlay files
   └── Configure bootloaders (syslinux, grub, systemd-boot)

2. Package Installation
   ├── mkarchiso downloads packages (pacman cache)
   ├── Install to work_dir/x86_64/airootfs
   └── Apply package configurations

3. Customization (customize_airootfs.sh)
   ├── Enable systemd services
   ├── Apply user-specific configs
   ├── Run post-install scripts
   └── Set permissions

4. ISO Generation
   ├── Create kernel and initramfs images
   ├── Build squashfs filesystem
   ├── Assemble bootable ISO
   ├── Generate checksums
   └── Move to output directory

5. Post-Processing
   ├── Upload ISO to object storage
   ├── Update database (build status, ISO location)
   ├── Cache metadata for reuse
   └── Clean up working directory
```
**Worker Configuration:**
- **Resource Limits:** 1 build per worker (CPU/memory intensive)
- **Concurrency:** 6 workers max (6-core build server)
- **Working Directory:** `/tmp/archiso-tmp-{job_id}` (cleaned after completion with -r flag)
- **Output Directory:** Temporary → Object storage → Local cleanup
**Optimizations:**
- **Package Cache:** Shared pacman cache across workers (prevent redundant downloads)
- **Layer Caching:** Cache common base layers (Opening Statement variations)
- **Incremental Builds:** Detect unchanged layers, reuse previous airootfs where possible
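The incremental-build idea can be sketched as a per-layer hash manifest: digest each layer's inputs, compare against the manifest saved by the previous build, and rebuild only from the first changed layer (hypothetical helpers, not part of archiso):

```python
import hashlib
import json

def layer_hash(layer: dict) -> str:
    """Deterministic digest of a layer's packages/files description."""
    canonical = json.dumps(layer, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def first_changed_layer(layers: list[dict], previous_manifest: list[str]) -> int:
    """Index of the first layer whose hash differs from the last build;
    a return value of len(layers) means everything is reusable."""
    current = [layer_hash(layer) for layer in layers]
    for i, digest in enumerate(current):
        if i >= len(previous_manifest) or previous_manifest[i] != digest:
            return i          # rebuild from this layer upward
    return len(current)       # nothing changed
```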
**Sources:**
- [Archiso ArchWiki](https://wiki.archlinux.org/title/Archiso)
- [Custom Archiso Tutorial](https://serverless.industries/2024/12/30/custom-archiso.en.html)
#### 7. Persistence Layer (PostgreSQL + Object Storage)
**Purpose:** Store configuration data, build metadata, and build artifacts.
**PostgreSQL Schema Design:**
```sql
-- User configurations
CREATE SCHEMA configurations;

CREATE TABLE configurations.user_configs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE configurations.layers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    layer_type VARCHAR(50) NOT NULL,  -- opening_statement, platform, rhetoric, etc.
    layer_order INT NOT NULL,
    merge_strategy VARCHAR(50) DEFAULT 'replace'
);

CREATE TABLE configurations.layer_packages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    package_name VARCHAR(255) NOT NULL,
    package_version VARCHAR(50),
    required BOOLEAN DEFAULT TRUE
);

CREATE TABLE configurations.layer_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    layer_id UUID REFERENCES configurations.layers(id),
    file_path VARCHAR(1024) NOT NULL,  -- path in airootfs
    file_content TEXT,                 -- for small configs
    file_storage_url VARCHAR(2048),    -- for large files in object storage
    permissions VARCHAR(4) DEFAULT '0644'
);

-- Build management
CREATE SCHEMA builds;

CREATE TABLE builds.build_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_id UUID REFERENCES configurations.user_configs(id),
    status VARCHAR(50) NOT NULL,  -- queued, running, success, failed
    priority INT DEFAULT 5,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    iso_url VARCHAR(2048),        -- object storage location
    iso_checksum VARCHAR(128),
    error_message TEXT,
    build_log_url VARCHAR(2048)
);

CREATE TABLE builds.build_cache (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    config_hash VARCHAR(64) UNIQUE NOT NULL,  -- hash of layer config
    iso_url VARCHAR(2048),
    created_at TIMESTAMP DEFAULT NOW(),
    last_accessed TIMESTAMP DEFAULT NOW(),
    access_count INT DEFAULT 0
);

-- Package metadata
CREATE SCHEMA packages;

CREATE TABLE packages.package_metadata (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    repository VARCHAR(100),  -- core, extra, community, aur
    version VARCHAR(50),
    dependencies JSONB,       -- {requires: [], conflicts: [], provides: []}
    last_updated TIMESTAMP DEFAULT NOW()
);
```
**Schema Organization Best Practices (2026):**
- Separate schemas for functional areas (configurations, builds, packages)
- Schema-level access control for security isolation
- CI/CD integration with migration tools (Flyway, Alembic)
- Indexes on frequently queried fields (config_id, status, config_hash)
**Object Storage:**
- **Purpose:** Store ISOs (large files, 1-4GB), build logs, custom overlay files
- **Technology:** S3-compatible (AWS S3, MinIO, Cloudflare R2)
- **Structure:**
- `/isos/{build_id}.iso` - Generated ISOs
- `/logs/{build_id}.log` - Build logs
- `/overlays/{layer_id}/{file_path}` - Custom files too large for DB
- `/cache/{config_hash}.iso` - Cached ISOs for reuse
**Sources:**
- [PostgreSQL Schema Design Best Practices 2026](https://wiki.postgresql.org/wiki/Database_Schema_Recommendations_for_an_Application)
- [SQL Database Fundamentals 2026](https://www.nucamp.co/blog/sql-and-database-fundamentals-in-2026-queries-design-and-postgresql-essentials)
## Data Flow
### Configuration Creation Flow
```
User (Frontend)
   ↓ (1) Create/Edit configuration
API Layer (Validation)
   ↓ (2) Validate input
Dependency Resolver
   ↓ (3) Check conflicts
   ↓ (4) Return validation result
API Layer
   ↓ (5) Save configuration
PostgreSQL (configurations schema)
   ↓ (6) Return config_id
Frontend (Display confirmation)
```
### Build Submission Flow
```
User (Frontend)
   ↓ (1) Submit build request
API Layer
   ↓ (2) Check cache (config hash)
PostgreSQL (build_cache)
   ├─→ (3a) Cache hit: return cached ISO URL
   └─→ (3b) Cache miss: create build job
Build Queue Manager (Celery)
   ↓ (4) Enqueue job with priority
Message Broker (Redis/RabbitMQ)
   ↓ (5) Job dispatched to worker
Celery Worker
   ↓ (6a) Fetch configuration from DB
   ↓ (6b) Generate archiso profile (Overlay Engine)
   ↓ (6c) Execute mkarchiso
   ↓ (6d) Upload ISO to object storage
   ↓ (6e) Update build status in DB
PostgreSQL + Object Storage
   ↓ (7) Job complete
API Layer (WebSocket)
   ↓ (8) Notify user
Frontend (Display download link)
```
### Real-Time Progress Updates Flow
```
Celery Worker
   ↓ (1) Emit progress events during build
   ↓     (e.g., "downloading packages", "generating ISO")
Celery Result Backend
   ↓ (2) Store progress state
API Layer (WebSocket handler)
   ↓ (3) Poll/subscribe to job progress
   ↓ (4) Push updates to client
Frontend (WebSocket listener)
   ↓ (5) Update UI progress bar
```
## Patterns to Follow
### Pattern 1: Layered Configuration Precedence
**What:** Higher layers override lower layers with defined merge strategies.
**When:** User customizes configuration across multiple layers (Platform, Rhetoric, etc.).
**Implementation:**
```python
class OverlayEngine:
    def merge_layers(self, layers: List[Layer]) -> Profile:
        """Merge layers from lowest to highest precedence."""
        sorted_layers = sorted(layers, key=lambda l: l.order)
        profile = Profile()
        for layer in sorted_layers:
            profile = self.apply_layer(profile, layer)
        return profile

    def apply_layer(self, profile: Profile, layer: Layer) -> Profile:
        """Apply layer based on merge strategy."""
        if layer.merge_strategy == "replace":
            profile.files.update(layer.files)  # Overwrite
        elif layer.merge_strategy == "merge-append":
            profile.packages.extend(layer.packages)  # Append
        elif layer.merge_strategy == "merge-deep":
            profile.config = deep_merge(profile.config, layer.config)
        return profile
```
**Source:** OverlayFS union mount concepts applied to configuration management.
### Pattern 2: SAT-Based Dependency Resolution
**What:** Translate package dependencies to boolean satisfiability problem, solve with CDCL algorithm.
**When:** User adds package to configuration, system detects conflicts.
**Implementation:**
```python
class DependencyResolver:
    def resolve(self, packages: List[Package]) -> Resolution:
        """Resolve dependencies using SAT solver."""
        clauses = self.build_clauses(packages)
        solver = SATSolver()
        result = solver.solve(clauses)
        if result.satisfiable:
            return Resolution(success=True, packages=result.model)
        else:
            conflicts = self.explain_conflicts(result.unsat_core)
            alternatives = self.suggest_alternatives(conflicts)
            return Resolution(success=False, conflicts=conflicts,
                              alternatives=alternatives)

    def build_clauses(self, packages: List[Package]) -> List[Clause]:
        """Convert dependency graph to CNF clauses."""
        clauses = []
        for pkg in packages:
            # If package selected, all dependencies must be selected:
            # pkg -> dep, i.e. the clause (¬pkg ∨ dep)
            for dep in pkg.requires:
                clauses.append(Implies(pkg, dep))
            # If package selected, no conflicts can be selected:
            # ¬(pkg ∧ conflict), i.e. the clause (¬pkg ∨ ¬conflict)
            for conflict in pkg.conflicts:
                clauses.append(Not(And(pkg, conflict)))
        return clauses
```
**Source:** [Libsolv implementation patterns](https://github.com/openSUSE/libsolv)
### Pattern 3: Asynchronous Build Queue with Progress Tracking
**What:** Submit long-running build jobs to queue, track progress, notify on completion.
**When:** User submits build request (ISO generation takes minutes).
**Implementation:**
```python
import subprocess
import uuid

# API endpoint
@app.post("/api/v1/builds")
async def submit_build(config_id: UUID, background_tasks: BackgroundTasks):
    # Check cache first
    cache_key = compute_hash(config_id)
    cached = await check_cache(cache_key)
    if cached:
        return {"status": "cached", "iso_url": cached.iso_url}
    # Enqueue build job
    job = build_iso.apply_async(
        args=[config_id],
        priority=5,
        task_id=str(uuid.uuid4())
    )
    return {"status": "queued", "job_id": job.id}

# Celery task
@celery.task(bind=True)
def build_iso(self, config_id: UUID):
    self.update_state(state='DOWNLOADING', meta={'progress': 10})
    # Generate profile
    profile = overlay_engine.generate_profile(config_id)
    self.update_state(state='BUILDING', meta={'progress': 30})
    # Run mkarchiso; check=True raises on failure so Celery marks the task failed
    subprocess.run([
        'mkarchiso', '-v', '-r',
        '-w', f'/tmp/archiso-{self.request.id}',
        '-o', '/tmp/output',
        profile.path
    ], check=True)
    self.update_state(state='UPLOADING', meta={'progress': 80})
    # Upload to object storage
    iso_url = upload_iso('/tmp/output/archlinux.iso')
    return {"iso_url": iso_url, "progress": 100}
```
**Source:** [Celery best practices](https://docs.celeryq.dev/), [Web-Queue-Worker pattern](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker)
### Pattern 4: Cache-First Build Strategy
**What:** Hash configuration, check cache before building, reuse identical ISOs.
**When:** User submits build that may have been built previously.
**Implementation:**
```python
import hashlib
import json

def compute_config_hash(config_id: UUID) -> str:
    """Create deterministic hash of configuration."""
    config = db.query(Config).get(config_id)
    # Include all layers, packages, files in hash
    hash_input = {
        "layers": sorted([
            {
                "type": layer.type,
                "packages": sorted(layer.packages),
                "files": sorted([
                    # Use a stable digest; Python's built-in hash() is salted
                    # per-process and would break cache hits across restarts.
                    {"path": f.path,
                     "content_hash": hashlib.sha256(f.content.encode()).hexdigest()}
                    for f in layer.files
                ], key=lambda x: x["path"])
            }
            for layer in config.layers
        ], key=lambda x: x["type"])
    }
    return hashlib.sha256(
        json.dumps(hash_input, sort_keys=True).encode()
    ).hexdigest()

async def check_cache(config_hash: str) -> Optional[CachedBuild]:
    """Check if ISO exists for this configuration."""
    cached = await db.query(BuildCache).filter_by(
        config_hash=config_hash
    ).first()
    if cached and cached.iso_exists():
        # Update access metadata
        cached.last_accessed = datetime.now()
        cached.access_count += 1
        await db.commit()
        return cached
    return None
```
**Benefit:** Reduces build time from minutes to seconds for repeated configurations. Critical for popular base configurations (e.g., "KDE Desktop with development tools").
## Anti-Patterns to Avoid
### Anti-Pattern 1: Blocking API Calls During Build
**What:** Synchronously waiting for ISO build to complete in API endpoint.
**Why bad:** Ties up API worker for minutes, prevents handling other requests, poor user experience with timeout risks.
**Instead:** Use asynchronous task queue (Celery) with WebSocket/SSE for progress updates. API returns immediately with job_id, frontend polls or subscribes to updates.
**Example:**
```python
# BAD: Blocking build
@app.post("/builds")
def build(config_id):
    iso = generate_iso(config_id)  # Takes 10 minutes!
    return {"iso_url": iso}

# GOOD: Async queue
@app.post("/builds")
async def build(config_id):
    job = build_iso.delay(config_id)
    return {"job_id": job.id, "status": "queued"}
```
### Anti-Pattern 2: Duplicating State Between React and Three.js
**What:** Maintaining separate state trees for application data and 3D scene, manually syncing.
**Why bad:** State gets out of sync, bugs from inconsistent data, complexity in update logic.
**Instead:** Single source of truth in React state. Scene derives from state. User interactions → dispatch actions → update state → scene re-renders.
**Example:**
```javascript
// BAD: Separate state
const [appState, setAppState] = useState({packages: []});
const [sceneObjects, setSceneObjects] = useState([]);

// GOOD: Scene derives from app state
const [config, setConfig] = useState({packages: []});
function Scene({packages}) {
  return packages.map(pkg => <PackageMesh key={pkg.id} {...pkg} />);
}
```
**Source:** [React Three Fiber state management best practices](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432)
### Anti-Pattern 3: Storing Large Files in PostgreSQL
**What:** Storing ISO files (1-4GB) or build logs (megabytes) as BYTEA in PostgreSQL.
**Why bad:** Database bloat, slow backups, memory pressure, poor performance for large blob operations.
**Instead:** Store large files in object storage (S3/MinIO), keep URLs/metadata in PostgreSQL.
**Example:**
```sql
-- BAD: ISO in database
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_data BYTEA  -- 2GB blob!
);

-- GOOD: URL reference
CREATE TABLE builds (
    id UUID PRIMARY KEY,
    iso_url VARCHAR(2048),  -- s3://bucket/isos/{id}.iso
    iso_checksum VARCHAR(128),
    iso_size_bytes BIGINT
);
```
### Anti-Pattern 4: Running Multiple Builds Per Worker Concurrently
**What:** Allowing a single Celery worker to process multiple ISO builds in parallel.
**Why bad:** ISO generation is CPU and memory intensive (compressing filesystem, creating squashfs). Running multiple builds causes resource contention, thrashing, and OOM kills.
**Instead:** Configure Celery workers with concurrency=1 for build tasks. Run one build per worker. Scale horizontally with multiple workers.
**Example:**
```bash
# BAD: Multiple concurrent builds
celery -A app worker --concurrency=4 # 4 builds at once on 6-core machine
# GOOD: One build per worker
celery -A app worker --concurrency=1 -Q builds # Start 6 workers for 6 cores
```
### Anti-Pattern 5: No Dependency Validation Until Build Time
**What:** Allowing users to save configurations without checking package conflicts, discovering issues during ISO build.
**Why bad:** Wastes build resources (minutes of CPU time), poor user experience (delayed error feedback), difficult to debug which package caused failure.
**Instead:** Run dependency resolution in API layer during configuration save/update. Provide immediate feedback with conflict explanations and alternatives.
**Example:**
```python
# BAD: Validate during build
@celery.task
def build_iso(config_id):
    packages = load_packages(config_id)
    result = resolve_dependencies(packages)  # Fails here after queueing!
    if not result.valid:
        raise BuildError("Conflicts detected")

# GOOD: Validate on save
@app.post("/configs")
async def save_config(config: ConfigInput):
    resolution = dependency_resolver.resolve(config.packages)
    if not resolution.valid:
        return {"error": "conflicts", "details": resolution.conflicts}
    await db.save(config)
    return {"success": True}
```
## Scalability Considerations
| Concern | At 100 users | At 10K users | At 1M users |
|---------|--------------|--------------|-------------|
| **API Layer** | Single FastAPI instance | Multiple instances behind load balancer | Auto-scaling group, CDN for static assets |
| **Build Queue** | Single Redis broker | Redis cluster or RabbitMQ | Kafka for high-throughput messaging |
| **Workers** | 1 build server (6 cores) | 3-5 build servers | Auto-scaling worker pool, spot instances |
| **Database** | Single PostgreSQL instance | Primary + read replicas | Sharded PostgreSQL or distributed SQL (CockroachDB) |
| **Storage** | Local MinIO | S3-compatible with CDN | Multi-region S3 with CloudFront |
| **Caching** | In-memory cache | Redis cache cluster | Multi-tier cache (Redis + CDN) |
### Horizontal Scaling Strategy
**API Layer:**
- Stateless FastAPI instances (session in DB/Redis)
- Load balancer (Nginx, HAProxy, AWS ALB)
- Auto-scaling based on CPU/request latency
**Build Workers:**
- Independent Celery workers connecting to shared broker
- Each worker runs 1 build at a time
- Scale workers based on queue depth (add workers when >10 jobs queued)
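As a sketch, that scale-out rule might translate to the following (the `jobs_per_worker` ratio and pool bounds are illustrative assumptions, not measurements):

```python
import math

def desired_workers(queue_depth: int, jobs_per_worker: int = 2,
                    min_workers: int = 1, max_workers: int = 12) -> int:
    """Target worker count: roughly one worker per `jobs_per_worker` queued
    builds, clamped to the available pool size."""
    target = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, target))
```

An autoscaler would poll queue depth periodically and converge the pool toward this target, with a cooldown to avoid flapping.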
**Database:**
- Read replicas for queries (config lookups)
- Write operations to primary (build status updates)
- Connection pooling (PgBouncer)
**Storage:**
- Object storage is inherently scalable
- CDN for ISO downloads (reduce egress costs)
- Lifecycle policies (delete ISOs older than 30 days if not accessed)
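The 30-day retention rule could be enforced with a filter like this (a sketch; in practice S3/MinIO lifecycle rules can expire objects server-side without application code):

```python
from datetime import datetime, timedelta

def expired_isos(entries: list[tuple[str, datetime]],
                 now: datetime, max_age_days: int = 30) -> list[str]:
    """Return object keys of ISOs not accessed within `max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    return [key for key, last_accessed in entries if last_accessed < cutoff]
```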
## Build Order Implications for Development
### Phase 1: Core Infrastructure
**What to build:** Database schema, basic API scaffolding, object storage setup.
**Why first:** Foundation for all other components. No dependencies on complex logic.
**Duration estimate:** 1-2 weeks
### Phase 2: Configuration Management
**What to build:** Layer data models, CRUD endpoints, basic validation.
**Why second:** Enables testing configuration storage before complex dependency resolution.
**Duration estimate:** 1-2 weeks
### Phase 3: Dependency Resolver (Simplified)
**What to build:** Basic conflict detection (direct conflicts only, no SAT solver yet).
**Why third:** Provides early validation capability. Full SAT solver can wait.
**Duration estimate:** 1 week
### Phase 4: Overlay Engine
**What to build:** Layer merging logic, profile generation for archiso.
**Why fourth:** Requires configuration data models from Phase 2. Produces profiles for builds.
**Duration estimate:** 2 weeks
### Phase 5: Build Queue + Workers
**What to build:** Celery setup, basic build task, worker orchestration.
**Why fifth:** Depends on Overlay Engine for profile generation. Core value delivery.
**Duration estimate:** 2-3 weeks
### Phase 6: Frontend (Basic)
**What to build:** React UI for configuration (forms, no 3D yet), build submission.
**Why sixth:** API must exist first. Provides usable interface for testing builds.
**Duration estimate:** 2-3 weeks
### Phase 7: Advanced Dependency Resolution
**What to build:** Full SAT solver integration, conflict explanations, alternatives.
**Why seventh:** Complex feature. System works with basic validation from Phase 3.
**Duration estimate:** 2-3 weeks
### Phase 8: 3D Visualization
**What to build:** Three.js integration, layer visualization, visual debugging.
**Why eighth:** Polish/differentiator feature. Core functionality works without it.
**Duration estimate:** 3-4 weeks
### Phase 9: Caching + Optimization
**What to build:** Build cache, package cache, performance tuning.
**Why ninth:** Optimization after core features work. Requires usage data to tune.
**Duration estimate:** 1-2 weeks
**Total estimated duration:** 15-22 weeks (roughly 4-5 months)
## Critical Architectural Decisions
### Decision 1: Message Broker (Redis vs RabbitMQ)
**Recommendation:** Start with Redis, migrate to RabbitMQ if reliability requirements increase.
**Rationale:**
- Redis: Lower latency, simpler setup, sufficient for <10K builds/day
- RabbitMQ: Higher reliability, message persistence, better for >100K builds/day
**When to switch:** If experiencing message loss or need guaranteed delivery.
### Decision 2: Container-Based vs. Direct archiso
**Recommendation:** Use direct archiso (mkarchiso) on bare metal workers initially.
**Rationale:**
- Container-based (like Bazzite/Universal Blue) adds complexity (OCI image builds)
- Direct archiso is simpler, well-documented, less abstraction
- Can containerize workers later if isolation/portability becomes critical
**When to reconsider:** Multi-cloud deployment or need strong isolation between builds.
### Decision 3: Monolithic vs. Microservices API
**Recommendation:** Start monolithic (single FastAPI app), split services if scaling demands.
**Rationale:**
- Monolith: Faster development, easier debugging, sufficient for <100K users
- Microservices: Adds operational complexity (service mesh, inter-service communication)
**When to split:** If specific services (e.g., dependency resolver) need independent scaling.
### Decision 4: Real-Time Updates (WebSocket vs. SSE vs. Polling)
**Recommendation:** Use Server-Sent Events (SSE) for build progress.
**Rationale:**
- WebSocket: Bidirectional, but overkill for one-way progress updates
- SSE: Simpler, built-in reconnection, sufficient for progress streaming
- Polling: Wasteful, higher latency
**Implementation:**
```python
import asyncio
import json

from fastapi.responses import StreamingResponse

@app.get("/api/v1/builds/{job_id}/stream")
async def stream_progress(job_id: str):
    async def event_generator():
        while True:
            status = await get_job_status(job_id)
            # Manually frame each SSE message ("data: ...\n\n")
            yield f"data: {json.dumps(status)}\n\n"
            if status['state'] in ['SUCCESS', 'FAILURE']:
                break
            await asyncio.sleep(1)
    return StreamingResponse(event_generator(), media_type="text/event-stream")
```
## Sources
**Archiso & Build Systems:**
- [Archiso ArchWiki](https://wiki.archlinux.org/title/Archiso) - MEDIUM confidence
- [Custom Archiso Tutorial 2024](https://serverless.industries/2024/12/30/custom-archiso.en.html) - MEDIUM confidence
- [Bazzite ISO Build Process](https://deepwiki.com/ublue-os/bazzite/2.6-iso-build-process) - MEDIUM confidence
- [Universal Blue](https://universal-blue.org/) - MEDIUM confidence
**Dependency Resolution:**
- [Libsolv SAT Solver](https://github.com/openSUSE/libsolv) - HIGH confidence (official)
- [Version SAT Research](https://research.swtch.com/version-sat) - HIGH confidence
- [Dependency Resolution Made Simple](https://borretti.me/article/dependency-resolution-made-simple) - MEDIUM confidence
- [Package Conflict Resolution](https://distropack.dev/Blog/Post?slug=package-conflict-resolution-handling-conflicting-packages) - LOW confidence
**API & Queue Architecture:**
- [FastAPI Architecture Patterns 2026](https://medium.com/algomart/modern-fastapi-architecture-patterns-for-scalable-production-systems-41a87b165a8b) - MEDIUM confidence
- [Celery Documentation](https://docs.celeryq.dev/) - HIGH confidence (official)
- [Web-Queue-Worker Pattern - Azure](https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/web-queue-worker) - HIGH confidence (official)
- [Design Distributed Job Scheduler](https://www.systemdesignhandbook.com/guides/design-a-distributed-job-scheduler/) - MEDIUM confidence
**Storage & Database:**
- [PostgreSQL Schema Design Best Practices](https://wiki.postgresql.org/wiki/Database_Schema_Recommendations_for_an_Application) - HIGH confidence (official)
- [OverlayFS Linux Kernel Docs](https://docs.kernel.org/filesystems/overlayfs.html) - HIGH confidence (official)
**Frontend:**
- [React Three Fiber Performance 2026](https://graffersid.com/react-three-fiber-vs-three-js/) - MEDIUM confidence
- [3D Data Visualization with React](https://medium.com/cortico/3d-data-visualization-with-react-and-three-js-7272fb6de432) - MEDIUM confidence
## Confidence Assessment
- **Overall Architecture:** MEDIUM-HIGH - Based on established patterns (web-queue-worker, archiso) with modern 2026 practices
- **Component Boundaries:** HIGH - Clear separation of concerns, well-defined interfaces
- **Build Process:** HIGH - archiso is well-documented, multiple reference implementations
- **Dependency Resolution:** MEDIUM - SAT solver approach is proven, but integration complexity unknown
- **Scalability:** MEDIUM - Patterns are sound, but specific bottlenecks depend on usage patterns
- **Frontend 3D:** MEDIUM - Three.js + React patterns established, but performance depends on complexity