---
phase: 01-core-infrastructure-security
plan: 05
type: execute
wave: 2
depends_on: ["01-01", "01-02"]
files_modified:
  - backend/app/services/__init__.py
  - backend/app/services/sandbox.py
  - backend/app/services/deterministic.py
  - backend/app/services/build.py
  - scripts/setup-sandbox.sh
  - tests/test_deterministic.py
autonomous: true
must_haves:
  truths:
    - "Sandbox creates isolated systemd-nspawn container"
    - "Build commands execute with no network access"
    - "Same configuration produces identical hash"
    - "SOURCE_DATE_EPOCH is set for all builds"
  artifacts:
    - path: "backend/app/services/sandbox.py"
      provides: "systemd-nspawn sandbox management"
      contains: "systemd-nspawn"
    - path: "backend/app/services/deterministic.py"
      provides: "Deterministic build configuration"
      contains: "SOURCE_DATE_EPOCH"
    - path: "backend/app/services/build.py"
      provides: "Build orchestration service"
      contains: "class BuildService"
    - path: "scripts/setup-sandbox.sh"
      provides: "Sandbox environment initialization"
      contains: "pacstrap"
  key_links:
    - from: "backend/app/services/build.py"
      to: "backend/app/services/sandbox.py"
      via: "BuildSandbox import"
      pattern: "from.*sandbox import"
    - from: "backend/app/services/build.py"
      to: "backend/app/services/deterministic.py"
      via: "DeterministicBuildConfig import"
      pattern: "from.*deterministic import"
---

Implement a systemd-nspawn build sandbox with deterministic configuration for reproducible ISO builds.

Purpose: Ensure ISO builds are isolated from the host (ISO-04) and that the same input produces identical output (determinism for caching).

Output: A sandbox service that creates isolated containers, plus a deterministic build configuration with hash generation.
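The determinism contract this plan relies on can be illustrated with a minimal sketch (illustration only, not the plan's implementation; `config_hash` here applies a subset of the normalization rules that `DeterministicBuildConfig` specifies in Task 2):

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a normalized config: sorted, deduplicated, canonical JSON."""
    normalized = {
        "packages": sorted(set(config.get("packages", []))),
        "locale": config.get("locale", "en_US.UTF-8"),
    }
    # sort_keys + fixed separators give byte-identical JSON for equal configs
    blob = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()


# Package order and duplicates must not affect the hash:
a = config_hash({"packages": ["vim", "git", "base"]})
b = config_hash({"packages": ["base", "git", "vim", "vim"]})
assert a == b
```

The same hash doubles as the cache key in Task 3, which is why normalization has to happen before hashing rather than at build time.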
@/home/mikkel/.claude/get-shit-done/workflows/execute-plan.md
@/home/mikkel/.claude/get-shit-done/templates/summary.md
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-core-infrastructure-security/01-RESEARCH.md (Pattern 4: systemd-nspawn Build Sandbox, Pattern 5: Deterministic Build Configuration)
@.planning/phases/01-core-infrastructure-security/01-CONTEXT.md (Sandbox Strictness, Determinism Approach decisions)
@.planning/phases/01-core-infrastructure-security/01-01-SUMMARY.md
@.planning/phases/01-core-infrastructure-security/01-02-SUMMARY.md

Task 1: Create sandbox setup script and sandbox service

scripts/setup-sandbox.sh
backend/app/services/__init__.py
backend/app/services/sandbox.py

Create scripts/setup-sandbox.sh:

```bash
#!/bin/bash
# Initialize sandbox environment for ISO builds
# Run once to create base container image

set -euo pipefail

SANDBOX_ROOT="${SANDBOX_ROOT:-/var/lib/debate/sandbox}"
SANDBOX_BASE="${SANDBOX_ROOT}/base"

# pacman expects literal $repo/$arch placeholders in the mirrorlist
ALLOWED_MIRRORS=(
    'https://geo.mirror.pkgbuild.com/$repo/os/$arch'
    'https://mirror.cachyos.org/repo/$arch/$repo'
)

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Check prerequisites
if ! command -v pacstrap &> /dev/null; then
    log "ERROR: pacstrap not found. Install the arch-install-scripts package."
    exit 1
fi

if ! command -v systemd-nspawn &> /dev/null; then
    log "ERROR: systemd-nspawn not found. Install the systemd-container package."
    exit 1
fi

# Create sandbox directories
log "Creating sandbox directories..."
mkdir -p "$SANDBOX_ROOT"/{base,builds,cache}

# Bootstrap base Arch environment
if [ ! -d "$SANDBOX_BASE/usr" ]; then
    log "Bootstrapping base Arch Linux environment..."
    pacstrap -c -G -M "$SANDBOX_BASE" base archiso

    # Configure mirrors (whitelist only)
    log "Configuring mirrors..."
    MIRRORLIST="$SANDBOX_BASE/etc/pacman.d/mirrorlist"
    : > "$MIRRORLIST"
    for mirror in "${ALLOWED_MIRRORS[@]}"; do
        echo "Server = $mirror" >> "$MIRRORLIST"
    done

    # Set fixed locale for determinism
    echo "en_US.UTF-8 UTF-8" > "$SANDBOX_BASE/etc/locale.gen"
    systemd-nspawn -D "$SANDBOX_BASE" locale-gen

    log "Base environment created at $SANDBOX_BASE"
else
    log "Base environment already exists at $SANDBOX_BASE"
fi

log "Sandbox setup complete"
```

Create backend/app/services/__init__.py:
- Empty, or import key services

Create backend/app/services/sandbox.py:

```python
"""
systemd-nspawn sandbox for isolated ISO builds.

Security measures:
- --private-network: No network access (packages pre-cached in base)
- --read-only: Immutable root filesystem
- --tmpfs: Writable temp directories only
- --capability: Minimal capabilities for mkarchiso
- Resource limits: 8GB RAM, 4 cores (from CONTEXT.md)
"""
import asyncio
import shutil
from dataclasses import dataclass
from pathlib import Path

from app.core.config import settings


@dataclass
class SandboxConfig:
    """Configuration for sandbox execution."""
    memory_limit: str = "8G"
    cpu_quota: str = "400%"  # 4 cores
    timeout_seconds: int = 1200  # 20 minutes (with 15-minute warning)
    warning_seconds: int = 900  # 15 minutes


class BuildSandbox:
    """Manages systemd-nspawn sandboxed build environments."""

    def __init__(
        self,
        sandbox_root: Path | None = None,
        config: SandboxConfig | None = None
    ):
        self.sandbox_root = sandbox_root or Path(settings.sandbox_root)
        self.base_path = self.sandbox_root / "base"
        self.builds_path = self.sandbox_root / "builds"
        self.config = config or SandboxConfig()

    async def create_build_container(self, build_id: str) -> Path:
        """
        Create an isolated container for a specific build.

        Currently copies the base environment; production should use an
        overlay filesystem on the base for efficiency.
        """
        container_path = self.builds_path / build_id
        if container_path.exists():
            shutil.rmtree(container_path)
        container_path.mkdir(parents=True)

        # Copy base (in production, use overlayfs for efficiency)
        # For now, a simple copy is acceptable
        proc = await asyncio.create_subprocess_exec(
            "cp", "-a", str(self.base_path) + "/.", str(container_path),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE
        )
        returncode = await proc.wait()
        if returncode != 0:
            raise RuntimeError(f"Failed to copy base environment to {container_path}")
        return container_path

    async def run_build(
        self,
        container_path: Path,
        profile_path: Path,
        output_path: Path,
        source_date_epoch: int
    ) -> tuple[int, str, str]:
        """
        Execute an archiso build in the sandboxed container.

        Returns:
            Tuple of (return_code, stdout, stderr)
        """
        output_path.mkdir(parents=True, exist_ok=True)

        nspawn_cmd = [
            "systemd-nspawn",
            f"--directory={container_path}",
            "--private-network",  # No network access
            "--read-only",        # Immutable root
            "--tmpfs=/tmp:mode=1777",
            "--tmpfs=/var/tmp:mode=1777",
            f"--bind={profile_path}:/build/profile:ro",
            f"--bind={output_path}:/build/output",
            f"--setenv=SOURCE_DATE_EPOCH={source_date_epoch}",
            "--setenv=LC_ALL=C",
            "--setenv=TZ=UTC",
            "--capability=CAP_SYS_ADMIN",  # Required for mkarchiso
            "--console=pipe",
            "--quiet",
            "--",
            "mkarchiso", "-v",
            "-r",  # Remove work directory after build
            "-w", "/tmp/archiso-work",
            "-o", "/build/output",
            "/build/profile"
        ]

        proc = await asyncio.create_subprocess_exec(
            *nspawn_cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE
        )

        try:
            stdout, stderr = await asyncio.wait_for(
                proc.communicate(),
                timeout=self.config.timeout_seconds
            )
            return proc.returncode, stdout.decode(), stderr.decode()
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            return -1, "", f"Build timed out after {self.config.timeout_seconds} seconds"

    async def cleanup_container(self, container_path: Path):
        """Remove the container after the build."""
        if container_path.exists():
            shutil.rmtree(container_path)
```

Run:

```bash
cd /home/mikkel/repos/debate
ruff check backend/app/services/sandbox.py
python -c "from backend.app.services.sandbox import BuildSandbox, SandboxConfig; print('Import OK')"
```

Expected: No ruff errors; the import succeeds. The sandbox service creates isolated containers with network isolation, resource limits, and a deterministic environment.

Task 2: Create deterministic build configuration service

backend/app/services/deterministic.py
tests/test_deterministic.py

Create backend/app/services/deterministic.py:

```python
"""
Deterministic build configuration for reproducible ISOs.

Critical: The same configuration must produce an identical ISO hash.
This is required for caching to work correctly.

Determinism factors:
- SOURCE_DATE_EPOCH: Fixed timestamps in all generated files
- LC_ALL=C: Fixed locale for sorting
- TZ=UTC: Fixed timezone
- Sorted inputs: Packages and files always in a consistent order
- Fixed compression: Consistent squashfs settings
"""
import hashlib
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class OverlayFile:
    """A file to be included in the overlay."""
    path: str  # Absolute path in the ISO (e.g., /etc/skel/.bashrc)
    content: str
    mode: str = "0644"


@dataclass
class BuildConfiguration:
    """Normalized build configuration for deterministic hashing."""
    packages: list[str]
    overlays: list[dict[str, Any]]
    locale: str = "en_US.UTF-8"
    timezone: str = "UTC"


class DeterministicBuildConfig:
    """Ensures reproducible ISO builds."""

    @staticmethod
    def compute_config_hash(config: dict[str, Any]) -> str:
        """
        Generate a deterministic hash of the build configuration.

        Process:
        1. Normalize all inputs (sort lists, normalize paths)
        2. Hash file contents (not file objects)
        3. Use consistent JSON serialization

        Returns:
            SHA-256 hash of the normalized configuration
        """
        # Normalize packages (sorted, deduplicated)
        packages = sorted(set(config.get("packages", [])))

        # Normalize overlays
        normalized_overlays = []
        for overlay in sorted(config.get("overlays", []), key=lambda x: x.get("name", "")):
            normalized_files = []
            for f in sorted(overlay.get("files", []), key=lambda x: x.get("path", "")):
                content = f.get("content", "")
                content_hash = hashlib.sha256(content.encode()).hexdigest()
                normalized_files.append({
                    "path": f.get("path", "").strip(),
                    "content_hash": content_hash,
                    "mode": f.get("mode", "0644")
                })
            normalized_overlays.append({
                "name": overlay.get("name", "").strip(),
                "files": normalized_files
            })

        # Build normalized config
        normalized = {
            "packages": packages,
            "overlays": normalized_overlays,
            "locale": config.get("locale", "en_US.UTF-8"),
            "timezone": config.get("timezone", "UTC")
        }

        # JSON with sorted keys for determinism
        config_json = json.dumps(normalized, sort_keys=True, separators=(',', ':'))
        return hashlib.sha256(config_json.encode()).hexdigest()

    @staticmethod
    def get_source_date_epoch(config_hash: str) -> int:
        """
        Generate a deterministic timestamp from the config hash.

        Using a hash-derived timestamp ensures:
        - The same config always gets the same timestamp
        - Different configs get different timestamps
        - No dependency on wall-clock time

        The timestamp falls within a reasonable range (2020-2030).
        """
        # Use the first 8 bytes of the hash to generate a timestamp
        hash_int = int(config_hash[:16], 16)
        # Map to range: Jan 1, 2020 to Dec 31, 2030
        min_epoch = 1577836800  # 2020-01-01
        max_epoch = 1924991999  # 2030-12-31
        return min_epoch + (hash_int % (max_epoch - min_epoch))

    @staticmethod
    def create_archiso_profile(
        config: dict[str, Any],
        profile_path: Path,
        source_date_epoch: int
    ) -> None:
        """
        Generate an archiso profile with deterministic settings.

        Creates:
        - packages.x86_64: Sorted package list
        - profiledef.sh: Build configuration
        - pacman.conf: Package manager config
        - airootfs/: Overlay files
        """
        profile_path.mkdir(parents=True, exist_ok=True)

        # packages.x86_64 (sorted for determinism)
        packages = sorted(set(config.get("packages", ["base", "linux"])))
        packages_file = profile_path / "packages.x86_64"
        packages_file.write_text("\n".join(packages) + "\n")

        # profiledef.sh
        profiledef = profile_path / "profiledef.sh"
        iso_date = f"$(date --date=@{source_date_epoch} +%Y%m)"
        iso_version = f"$(date --date=@{source_date_epoch} +%Y.%m.%d)"
        profiledef.write_text(f'''#!/usr/bin/env bash
# Deterministic archiso profile
# Generated for Debate platform
iso_name="debate-custom"
iso_label="DEBATE_{iso_date}"
iso_publisher="Debate Platform "
iso_application="Debate Custom Linux"
iso_version="{iso_version}"
install_dir="arch"
bootmodes=('bios.syslinux.mbr' 'bios.syslinux.eltorito'
           'uefi-x64.systemd-boot.esp' 'uefi-x64.systemd-boot.eltorito')
arch="x86_64"
pacman_conf="pacman.conf"
airootfs_image_type="squashfs"
airootfs_image_tool_options=('-comp' 'xz' '-Xbcj' 'x86' '-b' '1M' '-Xdict-size' '1M')
file_permissions=(
  ["/etc/shadow"]="0:0:0400"
  ["/root"]="0:0:750"
  ["/etc/gshadow"]="0:0:0400"
)
''')

        # pacman.conf
        pacman_conf = profile_path / "pacman.conf"
        pacman_conf.write_text('''[options]
Architecture = auto
CheckSpace
SigLevel = Required DatabaseOptional
LocalFileSigLevel = Optional

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist
''')

        # airootfs structure with overlay files
        airootfs = profile_path / "airootfs"
        airootfs.mkdir(exist_ok=True)

        for overlay in config.get("overlays", []):
            for file_config in overlay.get("files", []):
                file_path = airootfs / file_config["path"].lstrip("/")
                file_path.parent.mkdir(parents=True, exist_ok=True)
                file_path.write_text(file_config["content"])
                if "mode" in file_config:
                    file_path.chmod(int(file_config["mode"], 8))
```

Create tests/test_deterministic.py:

```python
"""Tests for deterministic build configuration."""
from backend.app.services.deterministic import DeterministicBuildConfig


class TestDeterministicBuildConfig:
    """Test that the same inputs produce the same outputs."""

    def test_hash_deterministic(self):
        """Same config produces the same hash."""
        config = {
            "packages": ["vim", "git", "base"],
            "overlays": [{
                "name": "test",
                "files": [{"path": "/etc/test", "content": "hello"}]
            }]
        }
        hash1 = DeterministicBuildConfig.compute_config_hash(config)
        hash2 = DeterministicBuildConfig.compute_config_hash(config)
        assert hash1 == hash2

    def test_hash_order_independent(self):
        """Package order doesn't affect the hash."""
        config1 = {"packages": ["vim", "git", "base"], "overlays": []}
        config2 = {"packages": ["base", "git", "vim"], "overlays": []}
        hash1 = DeterministicBuildConfig.compute_config_hash(config1)
        hash2 = DeterministicBuildConfig.compute_config_hash(config2)
        assert hash1 == hash2

    def test_hash_different_configs(self):
        """Different configs produce different hashes."""
        config1 = {"packages": ["vim"], "overlays": []}
        config2 = {"packages": ["emacs"], "overlays": []}
        hash1 = DeterministicBuildConfig.compute_config_hash(config1)
        hash2 = DeterministicBuildConfig.compute_config_hash(config2)
        assert hash1 != hash2

    def test_source_date_epoch_deterministic(self):
        """Same hash produces the same timestamp."""
        config_hash = "abc123def456"
        epoch1 = DeterministicBuildConfig.get_source_date_epoch(config_hash)
        epoch2 = DeterministicBuildConfig.get_source_date_epoch(config_hash)
        assert epoch1 == epoch2

    def test_source_date_epoch_in_range(self):
        """Timestamp is within a reasonable range."""
        config_hash = "abc123def456"
        epoch = DeterministicBuildConfig.get_source_date_epoch(config_hash)
        # Should be between 2020 and 2030
        assert 1577836800 <= epoch <= 1924991999
```

Run:

```bash
cd /home/mikkel/repos/debate
ruff check backend/app/services/deterministic.py tests/test_deterministic.py
pytest tests/test_deterministic.py -v
```

Expected: Ruff passes, all tests
pass. The deterministic build config generates consistent hashes, with timestamps derived from the config hash.

Task 3: Create build orchestration service

backend/app/services/build.py

Create backend/app/services/build.py:

```python
"""
Build orchestration service.

Coordinates:
1. Configuration validation
2. Hash computation (for caching)
3. Sandbox creation
4. Build execution
5. Result storage
"""
from datetime import UTC, datetime
from pathlib import Path
from typing import Optional
from uuid import uuid4

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.core.config import settings
from app.db.models.build import Build, BuildStatus
from app.services.deterministic import DeterministicBuildConfig
from app.services.sandbox import BuildSandbox


class BuildService:
    """Orchestrates the ISO build process."""

    def __init__(self, db: AsyncSession):
        self.db = db
        self.sandbox = BuildSandbox()
        self.output_root = Path(settings.iso_output_root)

    async def get_or_create_build(
        self,
        config: dict
    ) -> tuple[Build, bool]:
        """
        Get an existing build from the cache or create a new one.

        Returns:
            Tuple of (Build, is_cached)
        """
        # Compute deterministic hash
        config_hash = DeterministicBuildConfig.compute_config_hash(config)

        # Check cache
        stmt = select(Build).where(
            Build.config_hash == config_hash,
            Build.status == BuildStatus.completed
        )
        result = await self.db.execute(stmt)
        cached_build = result.scalar_one_or_none()

        if cached_build:
            # Return cached build
            return cached_build, True

        # Create new build
        build = Build(
            id=uuid4(),
            config_hash=config_hash,
            status=BuildStatus.pending
        )
        self.db.add(build)
        await self.db.commit()
        await self.db.refresh(build)
        return build, False

    async def execute_build(
        self,
        build: Build,
        config: dict
    ) -> Build:
        """
        Execute the actual ISO build.

        Process:
        1. Update status to building
        2. Create sandbox container
        3. Generate archiso profile
        4. Run build
        5. Update status with result
        """
        build.status = BuildStatus.building
        build.started_at = datetime.now(UTC)
        await self.db.commit()

        container_path = None
        profile_path = self.output_root / str(build.id) / "profile"
        output_path = self.output_root / str(build.id) / "output"

        try:
            # Create sandbox
            container_path = await self.sandbox.create_build_container(str(build.id))

            # Generate deterministic profile
            source_date_epoch = DeterministicBuildConfig.get_source_date_epoch(
                build.config_hash
            )
            DeterministicBuildConfig.create_archiso_profile(
                config, profile_path, source_date_epoch
            )

            # Run build in sandbox
            return_code, stdout, stderr = await self.sandbox.run_build(
                container_path, profile_path, output_path, source_date_epoch
            )

            if return_code == 0:
                # Find the generated ISO
                iso_files = list(output_path.glob("*.iso"))
                if iso_files:
                    build.iso_path = str(iso_files[0])
                    build.status = BuildStatus.completed
                else:
                    build.status = BuildStatus.failed
                    build.error_message = "Build completed but no ISO found"
            else:
                build.status = BuildStatus.failed
                build.error_message = stderr or f"Build failed with code {return_code}"

            build.build_log = stdout + "\n" + stderr
        except Exception as e:
            build.status = BuildStatus.failed
            build.error_message = str(e)
        finally:
            # Clean up the sandbox
            if container_path:
                await self.sandbox.cleanup_container(container_path)
            build.completed_at = datetime.now(UTC)
            await self.db.commit()
            await self.db.refresh(build)

        return build

    async def get_build_status(self, build_id: str) -> Optional[Build]:
        """Get a build by ID."""
        stmt = select(Build).where(Build.id == build_id)
        result = await self.db.execute(stmt)
        return result.scalar_one_or_none()
```

Run:

```bash
cd /home/mikkel/repos/debate
ruff check backend/app/services/build.py
python -c "from backend.app.services.build import BuildService; print('Import OK')"
```

Expected: No ruff errors; the import succeeds. The build service coordinates hash computation, caching, sandbox execution, and status tracking.

Verification:
1. `ruff check backend/app/services/` passes
2. `pytest tests/test_deterministic.py` - all tests pass
3. The sandbox service can be imported without errors
4. The build service can be imported without errors
5. DeterministicBuildConfig.compute_config_hash produces consistent results

Success criteria:
- Sandbox service creates isolated systemd-nspawn containers (ISO-04)
- Builds run with --private-network (no network access)
- SOURCE_DATE_EPOCH set for deterministic builds
- Same configuration produces identical hash
- Build service coordinates the full build lifecycle
- Cache lookup happens before build execution

After completion, create `.planning/phases/01-core-infrastructure-security/01-05-SUMMARY.md`
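As a final sanity check on the determinism criteria, the hash-to-timestamp mapping from Task 2 can be exercised standalone (a re-statement of that logic for illustration, not a separate implementation):

```python
import hashlib


def get_source_date_epoch(config_hash: str) -> int:
    """Map the first 8 bytes of the config hash into the 2020-2030 range."""
    min_epoch = 1577836800  # 2020-01-01
    max_epoch = 1924991999  # 2030-12-31
    return min_epoch + (int(config_hash[:16], 16) % (max_epoch - min_epoch))


h = hashlib.sha256(b"example-config").hexdigest()
assert get_source_date_epoch(h) == get_source_date_epoch(h)  # stable per hash
assert 1577836800 <= get_source_date_epoch(h) <= 1924991999  # always in range
```

Because the epoch is derived from the hash rather than the wall clock, a cache hit and a fresh rebuild of the same configuration embed identical timestamps.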