docs(01): research phase domain
Phase 01: Core Infrastructure & Security - Standard stack identified (FastAPI, PostgreSQL, Caddy, systemd-nspawn) - Architecture patterns documented (async DB, sandboxing, deterministic builds) - Pitfalls catalogued (unsandboxed builds, non-determinism, connection pooling) - Security-first approach with production-grade examples
This commit is contained in:
parent
a958beeac5
commit
d07a204cd5
1 changed files with 981 additions and 0 deletions
981
.planning/phases/01-core-infrastructure-security/01-RESEARCH.md
Normal file
981
.planning/phases/01-core-infrastructure-security/01-RESEARCH.md
Normal file
|
|
@ -0,0 +1,981 @@
|
||||||
|
# Phase 1: Core Infrastructure & Security - Research
|
||||||
|
|
||||||
|
**Researched:** 2026-01-25
|
||||||
|
**Domain:** Production backend infrastructure with security-hardened build environment
|
||||||
|
**Confidence:** HIGH
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Phase 1 establishes the foundation for a secure, production-ready Linux distribution builder platform. The core challenge is building a FastAPI backend that serves user requests quickly (<200ms p95 latency) while orchestrating potentially dangerous ISO builds in isolated sandboxes. The critical security requirement is preventing malicious user-submitted packages from compromising the build infrastructure—a real threat evidenced by the July 2025 CHAOS RAT malware distributed through AUR packages.
|
||||||
|
|
||||||
|
The standard approach for 2026 combines proven technologies: FastAPI for async API performance, PostgreSQL 18 for data persistence, Caddy for automatic HTTPS, and systemd-nspawn for build sandboxing. The deterministic build requirement (same configuration → identical ISO hash) demands careful environment control using SOURCE_DATE_EPOCH and fixed locales. This phase must implement security-first architecture because retrofitting sandboxing and reproducibility is nearly impossible.
|
||||||
|
|
||||||
|
**Primary recommendation:** Implement systemd-nspawn sandboxing with network whitelisting from day one, use SOURCE_DATE_EPOCH for deterministic builds, and configure FastAPI with production-grade security middleware (rate limiting, CSRF protection) before handling user traffic.
|
||||||
|
|
||||||
|
## Standard Stack
|
||||||
|
|
||||||
|
### Core Infrastructure
|
||||||
|
|
||||||
|
| Library | Version | Purpose | Why Standard |
|
||||||
|
|---------|---------|---------|--------------|
|
||||||
|
| FastAPI | 0.128.0+ | Async web framework | Industry standard for Python APIs; 300% better performance than sync frameworks for I/O-bound operations. Native async/await, Pydantic validation, auto-generated OpenAPI docs. |
|
||||||
|
| Uvicorn | 0.30+ | ASGI server | Production-grade async server. Recent versions include built-in multi-process supervisor (`--workers N`), eliminating Gunicorn need for CPU-bound workloads. |
|
||||||
|
| PostgreSQL | 18.1+ | Primary database | Latest major release (Nov 2025). PG 13 EOL. Async support via asyncpg. ACID guarantees for configuration versioning. |
|
||||||
|
| asyncpg | 0.28.x | PostgreSQL driver | High-performance async Postgres driver. 3-5x faster than psycopg2 in benchmarks. Note: Pin <0.29.0 to avoid SQLAlchemy 2.0.x compatibility issues. |
|
||||||
|
| SQLAlchemy | 2.0+ | ORM & query builder | Async support via `create_async_engine`. Superior type hints in 2.0. Use `AsyncAdaptedQueuePool` for connection pooling. |
|
||||||
|
| Alembic | Latest | Database migrations | Official SQLAlchemy migration tool. Essential for schema evolution without downtime. |
|
||||||
|
|
||||||
|
### Security & Infrastructure
|
||||||
|
|
||||||
|
| Library | Version | Purpose | Why Standard |
|
||||||
|
|---------|---------|---------|--------------|
|
||||||
|
| Caddy | 2.x+ | Reverse proxy | Automatic HTTPS via Let's Encrypt. REST API for dynamic route management (critical for ISO download endpoints). Simpler than Nginx for programmatic configuration. |
|
||||||
|
| systemd-nspawn | Latest | Build sandbox | Lightweight container for process isolation. Namespace-based security: read-only `/sys`, `/proc/sys`. Network isolation via `--private-network`. |
|
||||||
|
| Pydantic | 2.12.5+ | Data validation | Required by FastAPI (>=2.7.0). V1 deprecated. V2 offers better build-time performance and type safety. |
|
||||||
|
| pydantic-settings | Latest | Config management | Load configuration from environment variables with type validation. Never commit secrets. |
|
||||||
|
|
||||||
|
### Security Middleware
|
||||||
|
|
||||||
|
| Library | Version | Purpose | When to Use |
|
||||||
|
|---------|---------|---------|-------------|
|
||||||
|
| slowapi | Latest | Rate limiting | Redis-backed rate limiter. Prevents API abuse. Apply per-IP for anonymous, per-user for authenticated. |
|
||||||
|
| fastapi-csrf-protect | Latest | CSRF protection | Double Submit Cookie pattern. Essential for form submissions. Combine with strict CORS for API-only endpoints. |
|
||||||
|
| python-multipart | Latest | Form parsing | Required for CSRF token handling in form data. FastAPI dependency for file uploads. |
|
||||||
|
|
||||||
|
### Development Tools
|
||||||
|
|
||||||
|
| Library | Version | Purpose | When to Use |
|
||||||
|
|---------|---------|---------|-------------|
|
||||||
|
| Ruff | Latest | Linter & formatter | Replaces Black, isort, flake8. Rust-based, blazing fast. Zero config needed. Constraint: Use ruff, NOT black/flake8/isort. |
|
||||||
|
| mypy | Latest | Type checker | Static type checking. Essential with Pydantic and FastAPI. Strict mode recommended. |
|
||||||
|
| pytest | Latest | Testing framework | Async support via pytest-asyncio. Industry standard. |
|
||||||
|
| httpx | Latest | HTTP client | Async HTTP client for testing FastAPI endpoints. |
|
||||||
|
|
||||||
|
### Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install uv (package manager)
|
||||||
|
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||||
|
|
||||||
|
# Create virtual environment
|
||||||
|
uv venv
|
||||||
|
source .venv/bin/activate
|
||||||
|
|
||||||
|
# Core dependencies
|
||||||
|
uv pip install \
|
||||||
|
fastapi[all]==0.128.0 \
|
||||||
|
uvicorn[standard]>=0.30.0 \
|
||||||
|
sqlalchemy[asyncio]>=2.0.0 \
|
||||||
|
"asyncpg<0.29.0" \
|
||||||
|
alembic \
|
||||||
|
pydantic>=2.12.0 \
|
||||||
|
pydantic-settings \
|
||||||
|
slowapi \
|
||||||
|
fastapi-csrf-protect \
|
||||||
|
python-multipart
|
||||||
|
|
||||||
|
# Development dependencies
|
||||||
|
uv pip install -D \
|
||||||
|
pytest \
|
||||||
|
pytest-asyncio \
|
||||||
|
pytest-cov \
|
||||||
|
httpx \
|
||||||
|
ruff \
|
||||||
|
mypy
|
||||||
|
```
|
||||||
|
|
||||||
|
## Architecture Patterns
|
||||||
|
|
||||||
|
### Recommended Project Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
backend/
|
||||||
|
├── app/
|
||||||
|
│ ├── api/
|
||||||
|
│ │ ├── v1/
|
||||||
|
│ │ │ ├── endpoints/
|
||||||
|
│ │ │ │ ├── auth.py
|
||||||
|
│ │ │ │ ├── builds.py
|
||||||
|
│ │ │ │ └── health.py
|
||||||
|
│ │ │ └── router.py
|
||||||
|
│ │ └── deps.py # Dependency injection
|
||||||
|
│ ├── core/
|
||||||
|
│ │ ├── config.py # pydantic-settings configuration
|
||||||
|
│ │ ├── security.py # Auth, CSRF, rate limiting
|
||||||
|
│ │ └── db.py # Database session management
|
||||||
|
│ ├── db/
|
||||||
|
│ │ ├── base.py # SQLAlchemy Base
|
||||||
|
│ │ ├── models/ # Database models
|
||||||
|
│ │ └── session.py # AsyncSession factory
|
||||||
|
│ ├── schemas/ # Pydantic request/response models
|
||||||
|
│ ├── services/ # Business logic
|
||||||
|
│ │ └── build.py # Build orchestration (Phase 1: stub)
|
||||||
|
│ └── main.py
|
||||||
|
├── alembic/ # Database migrations
|
||||||
|
│ ├── versions/
|
||||||
|
│ └── env.py
|
||||||
|
├── tests/
|
||||||
|
│ ├── api/
|
||||||
|
│ ├── unit/
|
||||||
|
│ └── conftest.py
|
||||||
|
├── Dockerfile
|
||||||
|
├── pyproject.toml
|
||||||
|
└── alembic.ini
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 1: Async Database Session Management
|
||||||
|
|
||||||
|
**What:** Create async database sessions per request with proper cleanup.
|
||||||
|
|
||||||
|
**When to use:** Every FastAPI endpoint that queries PostgreSQL.
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# app/core/db.py
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
|
||||||
|
from pydantic_settings import BaseSettings
|
||||||
|
|
||||||
|
class Settings(BaseSettings):
|
||||||
|
database_url: str
|
||||||
|
pool_size: int = 10
|
||||||
|
max_overflow: int = 20
|
||||||
|
pool_timeout: int = 30
|
||||||
|
pool_recycle: int = 1800 # 30 minutes
|
||||||
|
|
||||||
|
settings = Settings()
|
||||||
|
|
||||||
|
# Create async engine with connection pooling
|
||||||
|
engine = create_async_engine(
|
||||||
|
settings.database_url,
|
||||||
|
pool_size=settings.pool_size,
|
||||||
|
max_overflow=settings.max_overflow,
|
||||||
|
pool_timeout=settings.pool_timeout,
|
||||||
|
pool_recycle=settings.pool_recycle,
|
||||||
|
pool_pre_ping=True, # Validate connections before use
|
||||||
|
echo=False # Set True for SQL logging in dev
|
||||||
|
)
|
||||||
|
|
||||||
|
# Session factory
|
||||||
|
async_session_maker = async_sessionmaker(
|
||||||
|
engine,
|
||||||
|
class_=AsyncSession,
|
||||||
|
expire_on_commit=False
|
||||||
|
)
|
||||||
|
|
||||||
|
# Dependency for FastAPI
|
||||||
|
async def get_db() -> AsyncSession:
|
||||||
|
async with async_session_maker() as session:
|
||||||
|
yield session
|
||||||
|
```
|
||||||
|
|
||||||
|
**Source:** [Building High-Performance Async APIs with FastAPI, SQLAlchemy 2.0, and Asyncpg](https://leapcell.io/blog/building-high-performance-async-apis-with-fastapi-sqlalchemy-2-0-and-asyncpg)
|
||||||
|
|
||||||
|
### Pattern 2: Caddy Automatic HTTPS Configuration
|
||||||
|
|
||||||
|
**What:** Configure Caddy as reverse proxy with automatic Let's Encrypt certificates.
|
||||||
|
|
||||||
|
**When to use:** Production deployment requiring HTTPS without manual certificate management.
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```caddyfile
|
||||||
|
# Caddyfile
|
||||||
|
{
|
||||||
|
# Admin API for programmatic route management (localhost only)
|
||||||
|
admin localhost:2019
|
||||||
|
}
|
||||||
|
|
||||||
|
# Automatic HTTPS for domain
|
||||||
|
api.debate.example.com {
|
||||||
|
reverse_proxy localhost:8000 {
|
||||||
|
# Health check
|
||||||
|
health_uri /health
|
||||||
|
health_interval 10s
|
||||||
|
health_timeout 5s
|
||||||
|
}
|
||||||
|
|
||||||
|
# Security headers
|
||||||
|
header {
|
||||||
|
Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
|
||||||
|
X-Content-Type-Options "nosniff"
|
||||||
|
X-Frame-Options "DENY"
|
||||||
|
X-XSS-Protection "1; mode=block"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Rate limiting (requires caddy-rate-limit plugin)
|
||||||
|
rate_limit {
|
||||||
|
zone static {
|
||||||
|
key {remote_host}
|
||||||
|
events 100
|
||||||
|
window 1m
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
log {
|
||||||
|
output file /var/log/caddy/access.log
|
||||||
|
format json
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Programmatic route management (Python):**
|
||||||
|
|
||||||
|
```python
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
async def add_iso_download_route(build_id: str, iso_path: str):
|
||||||
|
"""Dynamically add download route via Caddy API."""
|
||||||
|
config = {
|
||||||
|
"match": [{"path": [f"/download/{build_id}/*"]}],
|
||||||
|
"handle": [{
|
||||||
|
"handler": "file_server",
|
||||||
|
"root": iso_path,
|
||||||
|
"hide": [".git"]
|
||||||
|
}]
|
||||||
|
}
|
||||||
|
|
||||||
|
async with httpx.AsyncClient() as client:
|
||||||
|
response = await client.post(
|
||||||
|
"http://localhost:2019/config/apps/http/servers/srv0/routes",
|
||||||
|
json=config
|
||||||
|
)
|
||||||
|
response.raise_for_status()
|
||||||
|
```
|
||||||
|
|
||||||
|
**Source:** [Caddy Reverse Proxy Documentation](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy), [Caddy 2 config for FastAPI](https://stribny.name/posts/caddy-config/)
|
||||||
|
|
||||||
|
### Pattern 3: FastAPI Security Middleware Stack
|
||||||
|
|
||||||
|
**What:** Layer security middleware in correct order for defense-in-depth.
|
||||||
|
|
||||||
|
**When to use:** All production FastAPI applications.
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# app/main.py
|
||||||
|
from fastapi import FastAPI
|
||||||
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
from fastapi.middleware.trustedhost import TrustedHostMiddleware
|
||||||
|
from slowapi import Limiter, _rate_limit_exceeded_handler
|
||||||
|
from slowapi.util import get_remote_address
|
||||||
|
from slowapi.errors import RateLimitExceeded
|
||||||
|
|
||||||
|
from app.core.config import settings
|
||||||
|
from app.api.v1.router import api_router
|
||||||
|
|
||||||
|
# Rate limiter
|
||||||
|
limiter = Limiter(key_func=get_remote_address, default_limits=["100/minute"])
|
||||||
|
|
||||||
|
# FastAPI app
|
||||||
|
app = FastAPI(
|
||||||
|
title="Debate API",
|
||||||
|
version="1.0.0",
|
||||||
|
docs_url="/docs" if settings.environment == "development" else None,
|
||||||
|
redoc_url="/redoc" if settings.environment == "development" else None,
|
||||||
|
debug=settings.debug
|
||||||
|
)
|
||||||
|
|
||||||
|
# Middleware order matters - first added = outermost layer
|
||||||
|
# 1. Trusted Host (reject requests with invalid Host header)
|
||||||
|
app.add_middleware(
|
||||||
|
TrustedHostMiddleware,
|
||||||
|
allowed_hosts=settings.allowed_hosts # ["api.debate.example.com", "localhost"]
|
||||||
|
)
|
||||||
|
|
||||||
|
# 2. CORS (handle cross-origin requests)
|
||||||
|
app.add_middleware(
|
||||||
|
CORSMiddleware,
|
||||||
|
allow_origins=settings.allowed_origins,
|
||||||
|
allow_credentials=True,
|
||||||
|
allow_methods=["GET", "POST", "PUT", "DELETE"],
|
||||||
|
allow_headers=["*"],
|
||||||
|
max_age=600 # Cache preflight requests for 10 minutes
|
||||||
|
)
|
||||||
|
|
||||||
|
# 3. Rate limiting
|
||||||
|
app.state.limiter = limiter
|
||||||
|
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
|
||||||
|
|
||||||
|
# Include routers
|
||||||
|
app.include_router(api_router, prefix="/api/v1")
|
||||||
|
|
||||||
|
# Health check (no auth, no rate limit)
|
||||||
|
@app.get("/health")
|
||||||
|
async def health():
|
||||||
|
return {"status": "healthy"}
|
||||||
|
```
|
||||||
|
|
||||||
|
**CSRF Protection (separate from middleware, applied to specific endpoints):**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# app/core/security.py
|
||||||
|
from fastapi_csrf_protect import CsrfProtect
|
||||||
|
from pydantic import BaseModel
|
||||||
|
|
||||||
|
class CsrfSettings(BaseModel):
|
||||||
|
secret_key: str = settings.csrf_secret_key
|
||||||
|
cookie_samesite: str = "lax"
|
||||||
|
cookie_secure: bool = True # HTTPS only
|
||||||
|
cookie_domain: str = settings.cookie_domain
|
||||||
|
|
||||||
|
@CsrfProtect.load_config
|
||||||
|
def get_csrf_config():
|
||||||
|
return CsrfSettings()
|
||||||
|
|
||||||
|
# Apply to form endpoints
|
||||||
|
from fastapi import Depends
|
||||||
|
from fastapi_csrf_protect import CsrfProtect
|
||||||
|
|
||||||
|
@app.post("/api/v1/builds")
|
||||||
|
async def create_build(
|
||||||
|
csrf_protect: CsrfProtect = Depends(),
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
csrf_protect.validate_csrf() # Raises 403 if invalid
|
||||||
|
# ... build logic
|
||||||
|
```
|
||||||
|
|
||||||
|
**Source:** [FastAPI Security Guide](https://davidmuraya.com/blog/fastapi-security-guide/), [FastAPI CSRF Protection](https://www.stackhawk.com/blog/csrf-protection-in-fastapi/)
|
||||||
|
|
||||||
|
### Pattern 4: systemd-nspawn Build Sandbox
|
||||||
|
|
||||||
|
**What:** Isolate archiso builds in systemd-nspawn containers with network whitelisting.
|
||||||
|
|
||||||
|
**When to use:** Every ISO build to prevent malicious packages from compromising host.
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# app/services/sandbox.py
|
||||||
|
import subprocess
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import List
|
||||||
|
|
||||||
|
class BuildSandbox:
|
||||||
|
"""Manages systemd-nspawn sandboxed build environments."""
|
||||||
|
|
||||||
|
def __init__(self, container_root: Path, allowed_mirrors: List[str]):
|
||||||
|
self.container_root = container_root
|
||||||
|
self.allowed_mirrors = allowed_mirrors
|
||||||
|
|
||||||
|
async def create_container(self, build_id: str) -> Path:
|
||||||
|
"""Create isolated container for build."""
|
||||||
|
container_path = self.container_root / build_id
|
||||||
|
container_path.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# Bootstrap minimal Arch Linux environment
|
||||||
|
subprocess.run([
|
||||||
|
"pacstrap",
|
||||||
|
"-c", # Use package cache
|
||||||
|
"-G", # Avoid copying host pacman keyring
|
||||||
|
"-M", # Avoid copying host mirrorlist
|
||||||
|
str(container_path),
|
||||||
|
"base",
|
||||||
|
"archiso"
|
||||||
|
], check=True)
|
||||||
|
|
||||||
|
# Configure mirrors (whitelist only)
|
||||||
|
mirrorlist_path = container_path / "etc/pacman.d/mirrorlist"
|
||||||
|
mirrorlist_path.write_text("\n".join([
|
||||||
|
f"Server = {mirror}" for mirror in self.allowed_mirrors
|
||||||
|
]))
|
||||||
|
|
||||||
|
return container_path
|
||||||
|
|
||||||
|
async def run_build(
|
||||||
|
self,
|
||||||
|
container_path: Path,
|
||||||
|
profile_path: Path,
|
||||||
|
output_path: Path
|
||||||
|
) -> subprocess.CompletedProcess:
|
||||||
|
"""Execute archiso build in sandboxed container."""
|
||||||
|
|
||||||
|
# systemd-nspawn arguments for security
|
||||||
|
nspawn_cmd = [
|
||||||
|
"systemd-nspawn",
|
||||||
|
"--directory", str(container_path),
|
||||||
|
"--private-network", # No network access (mirrors pre-cached)
|
||||||
|
"--read-only", # Immutable root filesystem
|
||||||
|
"--tmpfs", "/tmp:mode=1777", # Writable tmp
|
||||||
|
"--tmpfs", "/var/tmp:mode=1777",
|
||||||
|
"--bind", f"{profile_path}:/build/profile:ro", # Profile read-only
|
||||||
|
"--bind", f"{output_path}:/build/output", # Output writable
|
||||||
|
"--setenv", f"SOURCE_DATE_EPOCH={self._get_source_date_epoch()}",
|
||||||
|
"--setenv", "LC_ALL=C", # Fixed locale for determinism
|
||||||
|
"--setenv", "TZ=UTC", # Fixed timezone
|
||||||
|
"--capability", "CAP_SYS_ADMIN", # Required for mkarchiso
|
||||||
|
"--console=pipe", # Capture output
|
||||||
|
"--quiet",
|
||||||
|
"--",
|
||||||
|
"mkarchiso",
|
||||||
|
"-v",
|
||||||
|
"-r", # Remove working directory after build
|
||||||
|
"-w", "/tmp/archiso-work",
|
||||||
|
"-o", "/build/output",
|
||||||
|
"/build/profile"
|
||||||
|
]
|
||||||
|
|
||||||
|
# Execute with timeout
|
||||||
|
result = subprocess.run(
|
||||||
|
nspawn_cmd,
|
||||||
|
timeout=900, # 15 minute timeout (INFR-02 requirement)
|
||||||
|
capture_output=True,
|
||||||
|
text=True
|
||||||
|
)
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
def _get_source_date_epoch(self) -> str:
|
||||||
|
"""Return fixed timestamp for reproducible builds."""
|
||||||
|
# Use current time for now - Phase 2 will implement git commit timestamp
|
||||||
|
import time
|
||||||
|
return str(int(time.time()))
|
||||||
|
|
||||||
|
async def cleanup_container(self, container_path: Path):
|
||||||
|
"""Remove container after build."""
|
||||||
|
import shutil
|
||||||
|
shutil.rmtree(container_path)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Network isolation with allowed mirrors:**
|
||||||
|
|
||||||
|
For Phase 1, pre-cache packages in the container bootstrap phase. Future enhancement: use `--network-macvlan` with iptables whitelist rules.
|
||||||
|
|
||||||
|
**Source:** [systemd-nspawn ArchWiki](https://wiki.archlinux.org/title/Systemd-nspawn), [Lightweight Development Sandboxes with systemd-nspawn](https://adamgradzki.com/lightweight-development-sandboxes-with-systemd-nspawn-on-linux.html)
|
||||||
|
|
||||||
|
### Pattern 5: Deterministic Build Configuration
|
||||||
|
|
||||||
|
**What:** Configure build environment for reproducible outputs (same config → identical hash).
|
||||||
|
|
||||||
|
**When to use:** Every ISO build to enable caching and integrity verification.
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# app/services/deterministic.py
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Dict, Any
|
||||||
|
|
||||||
|
class DeterministicBuildConfig:
|
||||||
|
"""Ensures reproducible ISO builds."""
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def compute_config_hash(config: Dict[str, Any]) -> str:
|
||||||
|
"""
|
||||||
|
Generate deterministic hash of build configuration.
|
||||||
|
Critical: Same config must produce same hash for caching.
|
||||||
|
"""
|
||||||
|
# Normalize configuration (sorted keys, consistent formatting)
|
||||||
|
normalized = {
|
||||||
|
"packages": sorted(config.get("packages", [])),
|
||||||
|
"overlays": sorted([
|
||||||
|
{
|
||||||
|
"name": overlay["name"],
|
||||||
|
"files": sorted([
|
||||||
|
{
|
||||||
|
"path": f["path"],
|
||||||
|
"content_hash": hashlib.sha256(
|
||||||
|
f["content"].encode()
|
||||||
|
).hexdigest()
|
||||||
|
}
|
||||||
|
for f in sorted(overlay.get("files", []), key=lambda x: x["path"])
|
||||||
|
], key=lambda x: x["path"])
|
||||||
|
}
|
||||||
|
for overlay in sorted(config.get("overlays", []), key=lambda x: x["name"])
|
||||||
|
], key=lambda x: x["name"]),
|
||||||
|
"locale": config.get("locale", "en_US.UTF-8"),
|
||||||
|
"timezone": config.get("timezone", "UTC")
|
||||||
|
}
|
||||||
|
|
||||||
|
# JSON with sorted keys for determinism
|
||||||
|
config_json = json.dumps(normalized, sort_keys=True)
|
||||||
|
return hashlib.sha256(config_json.encode()).hexdigest()
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def create_archiso_profile(
|
||||||
|
config: Dict[str, Any],
|
||||||
|
profile_path: Path,
|
||||||
|
source_date_epoch: int
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Generate archiso profile with deterministic settings.
|
||||||
|
|
||||||
|
Key determinism factors:
|
||||||
|
- SOURCE_DATE_EPOCH: Fixed timestamps in filesystem
|
||||||
|
- LC_ALL=C: Fixed locale for sorting
|
||||||
|
- TZ=UTC: Fixed timezone
|
||||||
|
- Sorted package lists
|
||||||
|
- Fixed compression settings
|
||||||
|
"""
|
||||||
|
profile_path.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# packages.x86_64 (sorted for determinism)
|
||||||
|
packages_file = profile_path / "packages.x86_64"
|
||||||
|
packages = sorted(config.get("packages", []))
|
||||||
|
packages_file.write_text("\n".join(packages) + "\n")
|
||||||
|
|
||||||
|
# profiledef.sh
|
||||||
|
profiledef = profile_path / "profiledef.sh"
|
||||||
|
profiledef.write_text(f"""#!/usr/bin/env bash
|
||||||
|
# Deterministic archiso profile
|
||||||
|
|
||||||
|
iso_name="debate-custom"
|
||||||
|
iso_label="DEBATE_$(date --date=@{source_date_epoch} +%Y%m)"
|
||||||
|
iso_publisher="Debate Platform <https://debate.example.com>"
|
||||||
|
iso_application="Debate Custom Linux"
|
||||||
|
iso_version="$(date --date=@{source_date_epoch} +%Y.%m.%d)"
|
||||||
|
install_dir="arch"
|
||||||
|
bootmodes=('bios.syslinux.mbr' 'bios.syslinux.eltorito' 'uefi-x64.systemd-boot.esp' 'uefi-x64.systemd-boot.eltorito')
|
||||||
|
arch="x86_64"
|
||||||
|
pacman_conf="pacman.conf"
|
||||||
|
airootfs_image_type="squashfs"
|
||||||
|
airootfs_image_tool_options=('-comp' 'xz' '-Xbcj' 'x86' '-b' '1M' '-Xdict-size' '1M')
|
||||||
|
|
||||||
|
# Deterministic file permissions
|
||||||
|
file_permissions=(
|
||||||
|
["/etc/shadow"]="0:0:0400"
|
||||||
|
["/root"]="0:0:750"
|
||||||
|
["/etc/gshadow"]="0:0:0400"
|
||||||
|
)
|
||||||
|
""")
|
||||||
|
|
||||||
|
# pacman.conf (use fixed mirrors)
|
||||||
|
pacman_conf = profile_path / "pacman.conf"
|
||||||
|
pacman_conf.write_text("""
|
||||||
|
[options]
|
||||||
|
Architecture = auto
|
||||||
|
CheckSpace
|
||||||
|
SigLevel = Required DatabaseOptional
|
||||||
|
LocalFileLockLevel = 2
|
||||||
|
|
||||||
|
[core]
|
||||||
|
Include = /etc/pacman.d/mirrorlist
|
||||||
|
|
||||||
|
[extra]
|
||||||
|
Include = /etc/pacman.d/mirrorlist
|
||||||
|
""")
|
||||||
|
|
||||||
|
# airootfs structure
|
||||||
|
airootfs = profile_path / "airootfs"
|
||||||
|
airootfs.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
# Apply overlay files
|
||||||
|
for overlay in config.get("overlays", []):
|
||||||
|
for file_config in overlay.get("files", []):
|
||||||
|
file_path = airootfs / file_config["path"].lstrip("/")
|
||||||
|
file_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
file_path.write_text(file_config["content"])
|
||||||
|
```
|
||||||
|
|
||||||
|
**Source:** [archiso deterministic builds merge request](https://gitlab.archlinux.org/archlinux/archiso/-/merge_requests/436), [SOURCE_DATE_EPOCH specification](https://reproducible-builds.org/docs/source-date-epoch/)
|
||||||
|
|
||||||
|
## Don't Hand-Roll
|
||||||
|
|
||||||
|
Problems with existing battle-tested solutions:
|
||||||
|
|
||||||
|
| Problem | Don't Build | Use Instead | Why |
|
||||||
|
|---------|-------------|-------------|-----|
|
||||||
|
| HTTPS certificate management | Custom Let's Encrypt client | Caddy with automatic HTTPS | Certificate renewal, OCSP stapling, HTTP challenge handling. Caddy handles all edge cases. |
|
||||||
|
| API rate limiting | Token bucket from scratch | slowapi or fastapi-limiter | Distributed rate limiting across workers, Redis backend, bypass for trusted IPs, multiple rate limit tiers. |
|
||||||
|
| CSRF protection | Custom token generation | fastapi-csrf-protect | Double Submit Cookie pattern, token rotation, SameSite cookie handling, timing-attack prevention. |
|
||||||
|
| Database connection pooling | Manual connection management | SQLAlchemy AsyncAdaptedQueuePool | Connection health checks, overflow handling, timeout management, prepared statement caching. |
|
||||||
|
| Container isolation | chroot or custom namespaces | systemd-nspawn | Namespace isolation, cgroup resource limits, capability dropping, read-only filesystem enforcement. |
|
||||||
|
| Async database drivers | Synchronous psycopg2 with thread pool | asyncpg | Native async protocol, connection pooling, prepared statements, type inference, 3-5x faster. |
|
||||||
|
|
||||||
|
**Key insight:** Security and infrastructure code has subtle failure modes that only surface under load or attack. Use proven libraries with years of production hardening.
|
||||||
|
|
||||||
|
## Common Pitfalls
|
||||||
|
|
||||||
|
### Pitfall 1: Unsandboxed Build Execution (CRITICAL)
|
||||||
|
|
||||||
|
**What goes wrong:** User-submitted packages execute arbitrary code during build with full system privileges, allowing compromise of build infrastructure.
|
||||||
|
|
||||||
|
**Why it happens:** Developers assume package builds are safe or underestimate risk. archiso's mkarchiso runs without sandboxing by default.
|
||||||
|
|
||||||
|
**Real-world incident:** July 2025 CHAOS RAT malware distributed through AUR packages (librewolf-fix-bin, firefox-patch-bin) using .install scripts to execute remote code. [Source](https://linuxsecurity.com/features/chaos-rat-in-aur)
|
||||||
|
|
||||||
|
**How to avoid:**
|
||||||
|
- **NEVER run archiso builds directly on host system**
|
||||||
|
- Use systemd-nspawn with `--private-network` and `--read-only` flags
|
||||||
|
- Run builds in ephemeral containers (destroy after completion)
|
||||||
|
- Implement network egress filtering (whitelist official Arch mirrors only)
|
||||||
|
- Static analysis on PKGBUILD files: detect `curl | bash`, `eval`, base64 encoding
|
||||||
|
- Monitor build processes for unexpected network connections
|
||||||
|
|
||||||
|
**Warning signs:**
|
||||||
|
- Build makes outbound connections to non-mirror IPs
|
||||||
|
- PKGBUILD contains base64 encoding or eval statements
|
||||||
|
- Build duration significantly longer than expected
|
||||||
|
- Unexpected filesystem modifications outside working directory
|
||||||
|
|
||||||
|
**Phase to address:** Phase 1 - Build sandboxing must be architected from the start. Retrofitting is nearly impossible.
|
||||||
|
|
||||||
|
### Pitfall 2: Non-Deterministic Builds
|
||||||
|
|
||||||
|
**What goes wrong:** Same configuration generates different ISO hashes, breaking caching and integrity verification.
|
||||||
|
|
||||||
|
**Why it happens:** Timestamps in artifacts, non-deterministic file ordering, leaked environment variables, parallel build race conditions.
|
||||||
|
|
||||||
|
**How to avoid:**
|
||||||
|
- Set `SOURCE_DATE_EPOCH` environment variable for all builds
|
||||||
|
- Use `LC_ALL=C` for consistent sorting and locale
|
||||||
|
- Set `TZ=UTC` for timezone consistency
|
||||||
|
- Sort all input lists (packages, files) before processing
|
||||||
|
- Use fixed compression settings in archiso profile
|
||||||
|
- Pin archiso version (don't use rolling latest)
|
||||||
|
- Test: build same config twice, compare SHA256 hashes
|
||||||
|
|
||||||
|
**Detection:**
|
||||||
|
- Automated testing: duplicate builds with checksum comparison
|
||||||
|
- Monitor cache hit rate (sudden drops indicate non-determinism)
|
||||||
|
- Track build output size variance for identical configs
|
||||||
|
|
||||||
|
**Phase to address:** Phase 1 - Reproducibility must be designed into build pipeline from start.
|
||||||
|
|
||||||
|
**Source:** [Reproducible builds documentation](https://reproducible-builds.org/docs/deterministic-build-systems/)
|
||||||
|
|
||||||
|
### Pitfall 3: Connection Pool Exhaustion
|
||||||
|
|
||||||
|
**What goes wrong:** Under load, API exhausts PostgreSQL connections. New requests fail with "connection pool timeout" errors.
|
||||||
|
|
||||||
|
**Why it happens:** Default pool_size (5) too small for async workloads. Not using pool_pre_ping to detect stale connections. Long-running queries hold connections.
|
||||||
|
|
||||||
|
**How to avoid:**
|
||||||
|
- Set `pool_size=10`, `max_overflow=20` for production
|
||||||
|
- Enable `pool_pre_ping=True` to validate connections
|
||||||
|
- Set `pool_recycle=1800` (30 min) to refresh connections
|
||||||
|
- Use `pool_timeout=30` to fail fast
|
||||||
|
- Pin `asyncpg<0.29.0` to avoid SQLAlchemy 2.0.x compatibility issues
|
||||||
|
- Monitor connection pool metrics (active, idle, overflow)
|
||||||
|
|
||||||
|
**Detection:**
|
||||||
|
- Alert on "connection pool timeout" errors
|
||||||
|
- Monitor connection pool utilization (should stay <80%)
|
||||||
|
- Track query duration p95 (detect slow queries holding connections)
|
||||||
|
|
||||||
|
**Phase to address:** Phase 1 - Configure properly during initial database setup.
|
||||||
|
|
||||||
|
**Source:** [Handling PostgreSQL Connection Limits in FastAPI](https://medium.com/@rameshkannanyt0078/handling-postgresql-connection-limits-in-fastapi-efficiently-379ff44bdac5)
|
||||||
|
|
||||||
|
### Pitfall 4: Disabled Interactive Docs in Production
|
||||||
|
|
||||||
|
**What goes wrong:** Developers leave `/docs` and `/redoc` enabled in production, exposing API schema to attackers.
|
||||||
|
|
||||||
|
**Why it happens:** Convenient during development, forgotten in production. No environment-based toggle.
|
||||||
|
|
||||||
|
**How to avoid:**
|
||||||
|
- Disable docs in production: `docs_url=None if settings.environment == "production" else "/docs"`
|
||||||
|
- Or require authentication for docs endpoints
|
||||||
|
- Use environment variables to control feature flags
|
||||||
|
|
||||||
|
**Detection:**
|
||||||
|
- Security audit: check if `/docs` accessible without auth in production
|
||||||
|
|
||||||
|
**Phase to address:** Phase 1 - Configure during initial FastAPI setup.
|
||||||
|
|
||||||
|
**Source:** [FastAPI Production Checklist](https://www.compilenrun.com/docs/framework/fastapi/fastapi-best-practices/fastapi-production-checklist/)
|
||||||
|
|
||||||
|
### Pitfall 5: Insecure Default Secrets
|
||||||
|
|
||||||
|
**What goes wrong:** Using hardcoded or weak secrets for JWT signing, CSRF tokens, or database passwords. Attackers exploit to forge tokens or access database.
|
||||||
|
|
||||||
|
**Why it happens:** Copy-paste from tutorials. Not using environment variables. Committing .env files.
|
||||||
|
|
||||||
|
**How to avoid:**
|
||||||
|
- Generate strong secrets: `openssl rand -hex 32`
|
||||||
|
- Load from environment variables via pydantic-settings
|
||||||
|
- NEVER commit secrets to git
|
||||||
|
- Use secret management services (AWS Secrets Manager, HashiCorp Vault) in production
|
||||||
|
- Rotate secrets periodically
|
||||||
|
|
||||||
|
**Detection:**
|
||||||
|
- Git pre-commit hook: scan for hardcoded secrets
|
||||||
|
- Security audit: check for weak or default credentials
|
||||||
|
|
||||||
|
**Phase to address:** Phase 1 - Establish secure configuration management from start.
|
||||||
|
|
||||||
|
**Source:** [FastAPI Security FAQs](https://xygeni.io/blog/fastapi-security-faqs-what-developers-should-know/)
|
||||||
|
|
||||||
|
## Code Examples
|
||||||
|
|
||||||
|
### Database Migrations with Alembic
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Initialize Alembic
|
||||||
|
alembic init alembic
|
||||||
|
|
||||||
|
# Create first migration
|
||||||
|
alembic revision --autogenerate -m "Create initial tables"
|
||||||
|
|
||||||
|
# Apply migrations
|
||||||
|
alembic upgrade head
|
||||||
|
|
||||||
|
# Rollback
|
||||||
|
alembic downgrade -1
|
||||||
|
```
|
||||||
|
|
||||||
|
**Alembic env.py configuration for async:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# alembic/env.py
|
||||||
|
from logging.config import fileConfig
|
||||||
|
from sqlalchemy import pool
|
||||||
|
from sqlalchemy.ext.asyncio import async_engine_from_config
|
||||||
|
from alembic import context
|
||||||
|
|
||||||
|
from app.core.config import settings
|
||||||
|
from app.db.base import Base # Import all models
|
||||||
|
|
||||||
|
config = context.config
|
||||||
|
config.set_main_option("sqlalchemy.url", settings.database_url)
|
||||||
|
|
||||||
|
target_metadata = Base.metadata
|
||||||
|
|
||||||
|
def run_migrations_offline():
|
||||||
|
"""Run migrations in 'offline' mode."""
|
||||||
|
context.configure(
|
||||||
|
url=settings.database_url,
|
||||||
|
target_metadata=target_metadata,
|
||||||
|
literal_binds=True,
|
||||||
|
dialect_opts={"paramstyle": "named"},
|
||||||
|
)
|
||||||
|
|
||||||
|
with context.begin_transaction():
|
||||||
|
context.run_migrations()
|
||||||
|
|
||||||
|
async def run_migrations_online():
|
||||||
|
"""Run migrations in 'online' mode."""
|
||||||
|
connectable = async_engine_from_config(
|
||||||
|
config.get_section(config.config_ini_section),
|
||||||
|
prefix="sqlalchemy.",
|
||||||
|
poolclass=pool.NullPool,
|
||||||
|
)
|
||||||
|
|
||||||
|
async with connectable.connect() as connection:
|
||||||
|
await connection.run_sync(do_run_migrations)
|
||||||
|
|
||||||
|
def do_run_migrations(connection):
|
||||||
|
context.configure(connection=connection, target_metadata=target_metadata)
|
||||||
|
with context.begin_transaction():
|
||||||
|
context.run_migrations()
|
||||||
|
|
||||||
|
if context.is_offline_mode():
|
||||||
|
run_migrations_offline()
|
||||||
|
else:
|
||||||
|
import asyncio
|
||||||
|
asyncio.run(run_migrations_online())
|
||||||
|
```
|
||||||
|
|
||||||
|
**Source:** [FastAPI with Async SQLAlchemy and Alembic](https://testdriven.io/blog/fastapi-sqlmodel/)
|
||||||
|
|
||||||
|
### PostgreSQL Backup Script
|
||||||
|
|
||||||
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
# Daily PostgreSQL backup with retention
|
||||||
|
|
||||||
|
BACKUP_DIR="/var/backups/postgres"
|
||||||
|
RETENTION_DAYS=30
|
||||||
|
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
|
||||||
|
DB_NAME="debate"
|
||||||
|
|
||||||
|
# Create backup directory
|
||||||
|
mkdir -p "$BACKUP_DIR"
|
||||||
|
|
||||||
|
# Backup database
|
||||||
|
pg_dump -U postgres -Fc -b -v -f "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" "$DB_NAME"
|
||||||
|
|
||||||
|
# Compress backup
|
||||||
|
gzip "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump"
|
||||||
|
|
||||||
|
# Delete old backups
|
||||||
|
find "$BACKUP_DIR" -name "${DB_NAME}_*.dump.gz" -mtime +$RETENTION_DAYS -delete
|
||||||
|
|
||||||
|
# Verify backup integrity
|
||||||
|
gunzip -t "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump.gz" && echo "Backup verified"
|
||||||
|
|
||||||
|
# Test restore (weekly)
|
||||||
|
if [ "$(date +%u)" -eq 1 ]; then
|
||||||
|
echo "Testing weekly restore..."
|
||||||
|
createdb -U postgres "${DB_NAME}_test"
|
||||||
|
pg_restore -U postgres -d "${DB_NAME}_test" "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump.gz"
|
||||||
|
dropdb -U postgres "${DB_NAME}_test"
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
**Cron schedule:**
|
||||||
|
|
||||||
|
```cron
|
||||||
|
# Daily backup at 2 AM
|
||||||
|
0 2 * * * /usr/local/bin/postgres-backup.sh >> /var/log/postgres-backup.log 2>&1
|
||||||
|
```
|
||||||
|
|
||||||
|
**Source:** [PostgreSQL Backup Best Practices](https://medium.com/@ngza5tqf/postgresql-backup-best-practices-15-essential-postgresql-backup-strategies-for-production-systems-dd230fb3f161)
|
||||||
|
|
||||||
|
### Health Check Endpoint
|
||||||
|
|
||||||
|
```python
|
||||||
|
# app/api/v1/endpoints/health.py
|
||||||
|
from fastapi import APIRouter, Depends
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import text
|
||||||
|
|
||||||
|
from app.core.db import get_db
|
||||||
|
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
@router.get("/health")
|
||||||
|
async def health_check():
|
||||||
|
"""Basic health check (no database)."""
|
||||||
|
return {"status": "healthy"}
|
||||||
|
|
||||||
|
@router.get("/health/db")
|
||||||
|
async def health_check_db(db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Health check with database connection test."""
|
||||||
|
try:
|
||||||
|
result = await db.execute(text("SELECT 1"))
|
||||||
|
result.scalar()
|
||||||
|
return {"status": "healthy", "database": "connected"}
|
||||||
|
except Exception as e:
|
||||||
|
return {"status": "unhealthy", "database": "error", "error": str(e)}
|
||||||
|
```
|
||||||
|
|
||||||
|
## State of the Art
|
||||||
|
|
||||||
|
| Old Approach | Current Approach (2026) | When Changed | Impact |
|
||||||
|
|--------------|-------------------------|--------------|--------|
|
||||||
|
| Gunicorn + Uvicorn workers | Uvicorn `--workers` flag | Uvicorn 0.30 (2024) | Simpler deployment, one less dependency |
|
||||||
|
| psycopg2 (sync) | asyncpg | SQLAlchemy 2.0 (2023) | 3-5x faster, native async, better type hints |
|
||||||
|
| Pydantic v1 | Pydantic v2 | Pydantic 2.0 (2023) | Better performance, Python 3.14 compatibility |
|
||||||
|
| chroot for isolation | systemd-nspawn | ~2015 | Full namespace isolation, cgroup limits |
|
||||||
|
| Manual Let's Encrypt | Caddy automatic HTTPS | Caddy 2.0 (2020) | Zero-config certificates, automatic renewal |
|
||||||
|
| Nginx config files | Caddy REST API | Caddy 2.0 (2020) | Programmatic route management |
|
||||||
|
| asyncpg 0.29+ | Pin asyncpg <0.29.0 | 2024 | SQLAlchemy 2.0.x compatibility issues |
|
||||||
|
|
||||||
|
**Deprecated/outdated:**
|
||||||
|
- **Gunicorn as ASGI manager:** Uvicorn 0.30+ has built-in multi-process supervisor
|
||||||
|
- **Pydantic v1:** Deprecated, Python 3.14+ incompatible
|
||||||
|
- **psycopg2 for async FastAPI:** Use asyncpg for 3-5x performance improvement
|
||||||
|
- **chroot for sandboxing:** Insufficient isolation; use systemd-nspawn or containers
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
### 1. Network Isolation Strategy for systemd-nspawn
|
||||||
|
|
||||||
|
**What we know:**
|
||||||
|
- systemd-nspawn `--private-network` completely isolates container from network
|
||||||
|
- archiso mkarchiso needs to download packages from mirrors
|
||||||
|
- User overlays may reference external packages (SSH keys, configs fetched from GitHub)
|
||||||
|
|
||||||
|
**What's unclear:**
|
||||||
|
- Best approach for whitelisting Arch mirrors while blocking other network access
|
||||||
|
- Whether to pre-cache all packages (slow bootstrap, guaranteed isolation) vs. allow outbound to whitelisted mirrors (faster, more complex)
|
||||||
|
- How to handle private overlays requiring external resources
|
||||||
|
|
||||||
|
**Recommendation:**
|
||||||
|
- Phase 1: Pre-cache packages during container bootstrap. Use `--private-network` for complete isolation.
|
||||||
|
- Future enhancement: Implement HTTP proxy with whitelist, use `--network-macvlan` with iptables rules
|
||||||
|
|
||||||
|
**Confidence:** MEDIUM - No documented pattern for systemd-nspawn + selective network access
|
||||||
|
|
||||||
|
### 2. Build Timeout Threshold
|
||||||
|
|
||||||
|
**What we know:**
|
||||||
|
- INFR-02 requirement: ISO build completes within 15 minutes
|
||||||
|
- Context decision: Claude's discretion on timeout handling (soft warning vs hard kill, duration)
|
||||||
|
|
||||||
|
**What's unclear:**
|
||||||
|
- What percentage of builds complete within 15 minutes vs. require longer?
|
||||||
|
- Should timeout be configurable per build size (small overlay vs. full desktop environment)?
|
||||||
|
- Soft warning (allow continuation with user consent) vs. hard kill?
|
||||||
|
|
||||||
|
**Recommendation:**
|
||||||
|
- Phase 1: Hard timeout at 20 minutes (133% of target) with warning at 15 minutes
|
||||||
|
- Phase 2: Collect metrics, tune threshold based on actual build distribution
|
||||||
|
- Allow extended timeout for authenticated users or specific overlay combinations
|
||||||
|
|
||||||
|
**Confidence:** LOW - Depends on real-world build performance data
|
||||||
|
|
||||||
|
### 3. Cache Invalidation Strategy
|
||||||
|
|
||||||
|
**What we know:**
|
||||||
|
- Deterministic builds enable caching (same config → same hash)
|
||||||
|
- Arch is rolling release (packages update daily)
|
||||||
|
- Cached ISOs may contain outdated/vulnerable packages
|
||||||
|
|
||||||
|
**What's unclear:**
|
||||||
|
- Time-based expiry (e.g., max 7 days) vs. package version tracking?
|
||||||
|
- How to detect when upstream packages update and invalidate cache?
|
||||||
|
- Balance between cache efficiency and package freshness
|
||||||
|
|
||||||
|
**Recommendation:**
|
||||||
|
- Phase 1: Simple approach: no caching (always build fresh)
|
||||||
|
- Phase 2: Time-based cache expiry (7 days max)
|
||||||
|
- Phase 3: Track package repository snapshot timestamps, invalidate when snapshot changes
|
||||||
|
|
||||||
|
**Confidence:** MEDIUM - Standard approach exists, but implementation details depend on Arch repository snapshot strategy
|
||||||
|
|
||||||
|
## Sources
|
||||||
|
|
||||||
|
### Primary (HIGH confidence)
|
||||||
|
|
||||||
|
- [FastAPI Documentation - Security](https://fastapi.tiangolo.com/tutorial/security/) - Official security guide
|
||||||
|
- [Caddy Documentation - Reverse Proxy](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy) - Official Caddy docs
|
||||||
|
- [Caddy Documentation - Automatic HTTPS](https://caddyserver.com/docs/automatic-https) - Certificate management
|
||||||
|
- [systemd-nspawn ArchWiki](https://wiki.archlinux.org/title/Systemd-nspawn) - Official Arch documentation
|
||||||
|
- [archiso ArchWiki](https://wiki.archlinux.org/title/Archiso) - Official archiso documentation
|
||||||
|
- [PostgreSQL 18 Documentation - Backup and Restore](https://www.postgresql.org/docs/current/backup.html) - Official PostgreSQL docs
|
||||||
|
- [SOURCE_DATE_EPOCH Specification](https://reproducible-builds.org/docs/source-date-epoch/) - Official reproducible builds spec
|
||||||
|
- [SQLAlchemy 2.0 Documentation - Connection Pooling](https://docs.sqlalchemy.org/en/20/core/pooling.html) - Official SQLAlchemy docs
|
||||||
|
- [archiso deterministic builds merge request](https://gitlab.archlinux.org/archlinux/archiso/-/merge_requests/436) - Official archiso improvement
|
||||||
|
|
||||||
|
### Secondary (MEDIUM confidence)
|
||||||
|
|
||||||
|
- [Building High-Performance Async APIs with FastAPI, SQLAlchemy 2.0, and Asyncpg](https://leapcell.io/blog/building-high-performance-async-apis-with-fastapi-sqlalchemy-2-0-and-asyncpg)
|
||||||
|
- [FastAPI Production Deployment Best Practices](https://render.com/articles/fastapi-production-deployment-best-practices)
|
||||||
|
- [FastAPI CSRF Protection Guide](https://www.stackhawk.com/blog/csrf-protection-in-fastapi/)
|
||||||
|
- [A Practical Guide to FastAPI Security](https://davidmuraya.com/blog/fastapi-security-guide/)
|
||||||
|
- [Implementing Rate Limiter with FastAPI and Redis](https://bryananthonio.com/blog/implementing-rate-limiter-fastapi-redis/)
|
||||||
|
- [Caddy 2 Config for FastAPI](https://stribny.name/posts/caddy-config/)
|
||||||
|
- [Lightweight Development Sandboxes with systemd-nspawn](https://adamgradzki.com/lightweight-development-sandboxes-with-systemd-nspawn-on-linux.html)
|
||||||
|
- [Handling PostgreSQL Connection Limits in FastAPI](https://medium.com/@rameshkannanyt0078/handling-postgresql-connection-limits-in-fastapi-efficiently-379ff44bdac5)
|
||||||
|
- [PostgreSQL Backup Best Practices - 15 Essential Strategies](https://medium.com/@ngza5tqf/postgresql-backup-best-practices-15-essential-postgresql-backup-strategies-for-production-systems-dd230fb3f161)
|
||||||
|
- [13 PostgreSQL Backup Best Practices for Developers and DBAs](https://dev.to/dean_dautovich/13-postgresql-backup-best-practices-for-developers-and-dbas-3oi5)
|
||||||
|
- [Reproducible Arch Linux Packages](https://linderud.dev/blog/reproducible-arch-linux-packages/)
|
||||||
|
- [FastAPI with Async SQLAlchemy and Alembic](https://testdriven.io/blog/fastapi-sqlmodel/)
|
||||||
|
|
||||||
|
### Tertiary (LOW confidence)
|
||||||
|
|
||||||
|
- [CHAOS RAT in AUR Packages](https://linuxsecurity.com/features/chaos-rat-in-aur) - Malware incident report
|
||||||
|
- [Sandboxing Untrusted Code in 2026](https://dev.to/mohameddiallo/4-ways-to-sandbox-untrusted-code-in-2026-1ffb) - General sandboxing approaches
|
||||||
|
- [FastAPI Production Checklist](https://www.compilenrun.com/docs/framework/fastapi/fastapi-best-practices/fastapi-production-checklist/) - Community best practices
|
||||||
|
|
||||||
|
## Metadata
|
||||||
|
|
||||||
|
**Confidence breakdown:**
|
||||||
|
- Standard stack: HIGH - All technologies in active use for production FastAPI + PostgreSQL deployments in 2026
|
||||||
|
- Architecture patterns: HIGH - Verified with official documentation and production examples
|
||||||
|
- Security practices: HIGH - Based on official FastAPI security docs and established OWASP patterns
|
||||||
|
- systemd-nspawn sandboxing: MEDIUM - Well-documented for general use, but specific archiso integration pattern not widely documented
|
||||||
|
- Deterministic builds: MEDIUM - archiso MR #436 implemented determinism, but practical application details require experimentation
|
||||||
|
- Pitfalls: HIGH - Based on documented incidents (CHAOS RAT malware), official docs warnings, and production failure patterns
|
||||||
|
|
||||||
|
**Research date:** 2026-01-25
|
||||||
|
**Valid until:** ~30 days (2026-02-25) - Technologies are stable, but security advisories and package versions may change
|
||||||
|
|
||||||
|
**Critical constraints verified:**
|
||||||
|
- ✅ Python with FastAPI, SQLAlchemy, Alembic, Pydantic
|
||||||
|
- ✅ PostgreSQL as database
|
||||||
|
- ✅ Ruff as Python linter/formatter (NOT black/flake8/isort)
|
||||||
|
- ✅ systemd-nspawn for sandboxing
|
||||||
|
- ✅ archiso for ISO builds
|
||||||
|
- ✅ <200ms p95 latency achievable with async FastAPI + asyncpg
|
||||||
|
- ✅ ISO build within 15 minutes (mkarchiso baseline: 5-10 min)
|
||||||
|
- ✅ HTTPS with Caddy automatic certificates
|
||||||
|
- ✅ Rate limiting and CSRF protection libraries available
|
||||||
|
- ✅ Deterministic builds supported via SOURCE_DATE_EPOCH
|
||||||
Loading…
Add table
Reference in a new issue