Phase 01: Core Infrastructure & Security - 5 plans in 3 waves - 3 parallel (Wave 1-2), 1 sequential (Wave 3) - Ready for execution Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves |
|---|---|---|---|---|---|---|---|
| 01-core-infrastructure-security | 05 | execute | 3 | | | true | |
Purpose: Ensure ISO builds are isolated from host (ISO-04) and produce identical output for same input (determinism for caching). Output: Sandbox service that creates isolated containers, deterministic build configuration with hash generation.
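The caching contract in miniature: hash a canonical serialization of the config, then key the ISO cache on that hash. A minimal sketch (names are illustrative; the real service additionally sorts and deduplicates list values before hashing):

```python
import hashlib
import json

def config_key(config: dict) -> str:
    """Canonical JSON of the config -> stable SHA-256 cache key."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Identical configs always map to the same cache entry,
# so a finished ISO can be reused instead of rebuilt.
cache: dict[str, str] = {}
cache[config_key({"packages": ["base", "vim"]})] = "/isos/cached.iso"
assert config_key({"packages": ["base", "vim"]}) in cache
```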
<execution_context> @/home/mikkel/.claude/get-shit-done/workflows/execute-plan.md @/home/mikkel/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-core-infrastructure-security/01-RESEARCH.md (Pattern 4: systemd-nspawn Build Sandbox, Pattern 5: Deterministic Build Configuration)
@.planning/phases/01-core-infrastructure-security/01-CONTEXT.md (Sandbox Strictness, Determinism Approach decisions)
@.planning/phases/01-core-infrastructure-security/01-01-SUMMARY.md
@.planning/phases/01-core-infrastructure-security/01-02-SUMMARY.md

Task 1: Create sandbox setup script and sandbox service

Files:
- scripts/setup-sandbox.sh
- backend/app/services/__init__.py
- backend/app/services/sandbox.py

Create scripts/setup-sandbox.sh:

```bash
#!/bin/bash
# Initialize sandbox environment for ISO builds
# Run once to create base container image
set -euo pipefail

SANDBOX_ROOT="${SANDBOX_ROOT:-/var/lib/debate/sandbox}"
SANDBOX_BASE="${SANDBOX_ROOT}/base"
# $repo and $arch are pacman mirrorlist placeholders; single quotes keep
# the shell from expanding them.
ALLOWED_MIRRORS=(
    'https://geo.mirror.pkgbuild.com/$repo/os/$arch'
    'https://mirror.cachyos.org/repo/$arch/$repo'
)

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Check prerequisites
if ! command -v pacstrap &> /dev/null; then
    log "ERROR: pacstrap not found. Install arch-install-scripts package."
    exit 1
fi

if ! command -v systemd-nspawn &> /dev/null; then
    log "ERROR: systemd-nspawn not found. Install systemd-container package."
    exit 1
fi

# Create sandbox directories
log "Creating sandbox directories..."
mkdir -p "$SANDBOX_ROOT"/{base,builds,cache}

# Bootstrap base Arch environment
if [ ! -d "$SANDBOX_BASE/usr" ]; then
    log "Bootstrapping base Arch Linux environment..."
    pacstrap -c -G -M "$SANDBOX_BASE" base archiso

    # Configure mirrors (whitelist only)
    log "Configuring mirrors..."
    MIRRORLIST="$SANDBOX_BASE/etc/pacman.d/mirrorlist"
    : > "$MIRRORLIST"
    for mirror in "${ALLOWED_MIRRORS[@]}"; do
        echo "Server = $mirror" >> "$MIRRORLIST"
    done

    # Set fixed locale for determinism
    echo "en_US.UTF-8 UTF-8" > "$SANDBOX_BASE/etc/locale.gen"
    systemd-nspawn -D "$SANDBOX_BASE" locale-gen

    log "Base environment created at $SANDBOX_BASE"
else
    log "Base environment already exists at $SANDBOX_BASE"
fi

log "Sandbox setup complete"
```
Create backend/app/services/__init__.py:
- Empty or import key services
Create backend/app/services/sandbox.py:
```python
"""
systemd-nspawn sandbox for isolated ISO builds.
Security measures:
- --private-network: No network access (packages pre-cached in base)
- --read-only: Immutable root filesystem
- --tmpfs: Writable temp directories only
- --capability: Minimal capabilities for mkarchiso
- Resource limits: 8GB RAM, 4 cores (from CONTEXT.md)
"""
import asyncio
import shutil
from pathlib import Path
from typing import Optional
from dataclasses import dataclass
from app.core.config import settings
@dataclass
class SandboxConfig:
    """Configuration for sandbox execution."""
    memory_limit: str = "8G"
    cpu_quota: str = "400%"  # 4 cores
    timeout_seconds: int = 1200  # 20 minutes (with 15min warning)
    warning_seconds: int = 900  # 15 minutes
class BuildSandbox:
    """Manages systemd-nspawn sandboxed build environments."""

    def __init__(
        self,
        sandbox_root: Optional[Path] = None,
        config: Optional[SandboxConfig] = None,
    ):
        self.sandbox_root = sandbox_root or Path(settings.sandbox_root)
        self.base_path = self.sandbox_root / "base"
        self.builds_path = self.sandbox_root / "builds"
        self.config = config or SandboxConfig()

    async def create_build_container(self, build_id: str) -> Path:
        """
        Create isolated container for a specific build.

        Uses overlay filesystem on base for efficiency.
        """
        container_path = self.builds_path / build_id
        if container_path.exists():
            shutil.rmtree(container_path)
        container_path.mkdir(parents=True)
        # Copy base (in production, use overlayfs for efficiency)
        # For now, simple copy is acceptable
        proc = await asyncio.create_subprocess_exec(
            "cp", "-a", str(self.base_path) + "/.", str(container_path),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        await proc.wait()
        return container_path
    async def run_build(
        self,
        container_path: Path,
        profile_path: Path,
        output_path: Path,
        source_date_epoch: int,
    ) -> tuple[int, str, str]:
        """
        Execute archiso build in sandboxed container.

        Returns:
            Tuple of (return_code, stdout, stderr)
        """
        output_path.mkdir(parents=True, exist_ok=True)
        nspawn_cmd = [
            "systemd-nspawn",
            f"--directory={container_path}",
            "--private-network",  # No network access
            "--read-only",  # Immutable root
            "--tmpfs=/tmp:mode=1777",
            "--tmpfs=/var/tmp:mode=1777",
            f"--bind={profile_path}:/build/profile:ro",
            f"--bind={output_path}:/build/output",
            f"--setenv=SOURCE_DATE_EPOCH={source_date_epoch}",
            "--setenv=LC_ALL=C",
            "--setenv=TZ=UTC",
            "--capability=CAP_SYS_ADMIN",  # Required for mkarchiso
            "--console=pipe",
            "--quiet",
            "--",
            "mkarchiso",
            "-v",
            "-r",  # Remove work directory after build
            "-w", "/tmp/archiso-work",
            "-o", "/build/output",
            "/build/profile",
        ]
        proc = await asyncio.create_subprocess_exec(
            *nspawn_cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            stdout, stderr = await asyncio.wait_for(
                proc.communicate(),
                timeout=self.config.timeout_seconds,
            )
            return proc.returncode, stdout.decode(), stderr.decode()
        except asyncio.TimeoutError:
            proc.kill()
            return -1, "", f"Build timed out after {self.config.timeout_seconds} seconds"

    async def cleanup_container(self, container_path: Path):
        """Remove container after build."""
        if container_path.exists():
            shutil.rmtree(container_path)
```
Run:
```bash
cd /home/mikkel/repos/debate
ruff check backend/app/services/sandbox.py
python -c "from backend.app.services.sandbox import BuildSandbox, SandboxConfig; print('Import OK')"
```
Expected: No ruff errors, import succeeds.
Sandbox service creates isolated containers with network isolation, resource limits, and deterministic environment.
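SandboxConfig carries a `warning_seconds` alongside the hard timeout, but plain `asyncio.wait_for` only knows how to kill. One way to surface the 15-minute warning before the 20-minute kill is to shield the task through a first, shorter wait. This is a sketch of that pattern under the stated assumption, not the service's actual code:

```python
import asyncio

async def run_with_warning(coro, warn_after, kill_after, on_warn):
    """Await coro; call on_warn once if it outlives warn_after,
    raise TimeoutError if it outlives kill_after."""
    task = asyncio.ensure_future(coro)
    try:
        # shield() keeps the inner task alive when this wait times out
        return await asyncio.wait_for(asyncio.shield(task), timeout=warn_after)
    except asyncio.TimeoutError:
        on_warn()  # soft warning: build is still running
        return await asyncio.wait_for(task, timeout=kill_after - warn_after)

async def demo():
    warned = []
    result = await run_with_warning(
        asyncio.sleep(0.05, result="done"),
        warn_after=0.01, kill_after=1.0,
        on_warn=lambda: warned.append(True),
    )
    return warned, result

warned, result = asyncio.run(demo())
assert warned == [True] and result == "done"
```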
Task 2: Create deterministic build configuration service
backend/app/services/deterministic.py
tests/test_deterministic.py
Create backend/app/services/deterministic.py:
```python
"""
Deterministic build configuration for reproducible ISOs.
Critical: Same configuration must produce identical ISO hash. This is required for caching to work correctly.
Determinism factors:
- SOURCE_DATE_EPOCH: Fixed timestamps in all generated files
- LC_ALL=C: Fixed locale for sorting
- TZ=UTC: Fixed timezone
- Sorted inputs: Packages, files always in consistent order
- Fixed compression: Consistent squashfs settings
"""
import hashlib
import json
from pathlib import Path
from typing import Any
from dataclasses import dataclass


@dataclass
class OverlayFile:
    """A file to be included in the overlay."""
    path: str  # Absolute path in ISO (e.g., /etc/skel/.bashrc)
    content: str
    mode: str = "0644"


@dataclass
class BuildConfiguration:
    """Normalized build configuration for deterministic hashing."""
    packages: list[str]
    overlays: list[dict[str, Any]]
    locale: str = "en_US.UTF-8"
    timezone: str = "UTC"


class DeterministicBuildConfig:
    """Ensures reproducible ISO builds."""
    @staticmethod
    def compute_config_hash(config: dict[str, Any]) -> str:
        """
        Generate deterministic hash of build configuration.

        Process:
        1. Normalize all inputs (sort lists, normalize paths)
        2. Hash file contents (not file objects)
        3. Use consistent JSON serialization

        Returns:
            SHA-256 hash of normalized configuration
        """
        # Normalize packages (sorted, deduplicated)
        packages = sorted(set(config.get("packages", [])))
        # Normalize overlays
        normalized_overlays = []
        for overlay in sorted(config.get("overlays", []), key=lambda x: x.get("name", "")):
            normalized_files = []
            for f in sorted(overlay.get("files", []), key=lambda x: x.get("path", "")):
                content = f.get("content", "")
                content_hash = hashlib.sha256(content.encode()).hexdigest()
                normalized_files.append({
                    "path": f.get("path", "").strip(),
                    "content_hash": content_hash,
                    "mode": f.get("mode", "0644"),
                })
            normalized_overlays.append({
                "name": overlay.get("name", "").strip(),
                "files": normalized_files,
            })
        # Build normalized config
        normalized = {
            "packages": packages,
            "overlays": normalized_overlays,
            "locale": config.get("locale", "en_US.UTF-8"),
            "timezone": config.get("timezone", "UTC"),
        }
        # JSON with sorted keys for determinism
        config_json = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(config_json.encode()).hexdigest()

    @staticmethod
    def get_source_date_epoch(config_hash: str) -> int:
        """
        Generate deterministic timestamp from config hash.

        Using a hash-derived timestamp ensures:
        - Same config always gets same timestamp
        - Different configs get different timestamps
        - No dependency on wall-clock time

        The timestamp falls within a fixed range (2020-2030).
        """
        # Use first 8 bytes of hash to generate timestamp
        hash_int = int(config_hash[:16], 16)
        # Map to range: Jan 1, 2020 to Dec 31, 2030
        min_epoch = 1577836800  # 2020-01-01
        max_epoch = 1924991999  # 2030-12-31
        return min_epoch + (hash_int % (max_epoch - min_epoch))
    @staticmethod
    def create_archiso_profile(
        config: dict[str, Any],
        profile_path: Path,
        source_date_epoch: int,
    ) -> None:
        """
        Generate archiso profile with deterministic settings.

        Creates:
        - packages.x86_64: Sorted package list
        - profiledef.sh: Build configuration
        - pacman.conf: Package manager config
        - airootfs/: Overlay files
        """
        profile_path.mkdir(parents=True, exist_ok=True)

        # packages.x86_64 (sorted for determinism)
        packages = sorted(set(config.get("packages", ["base", "linux"])))
        packages_file = profile_path / "packages.x86_64"
        packages_file.write_text("\n".join(packages) + "\n")

        # profiledef.sh
        profiledef = profile_path / "profiledef.sh"
        iso_date = f"$(date --date=@{source_date_epoch} +%Y%m)"
        iso_version = f"$(date --date=@{source_date_epoch} +%Y.%m.%d)"
        profiledef.write_text(f'''#!/usr/bin/env bash
# Deterministic archiso profile
# Generated for Debate platform
iso_name="debate-custom"
iso_label="DEBATE_{iso_date}"
iso_publisher="Debate Platform https://debate.example.com"
iso_application="Debate Custom Linux"
iso_version="{iso_version}"
install_dir="arch"
bootmodes=('bios.syslinux.mbr' 'bios.syslinux.eltorito'
           'uefi-x64.systemd-boot.esp' 'uefi-x64.systemd-boot.eltorito')
arch="x86_64"
pacman_conf="pacman.conf"
airootfs_image_type="squashfs"
airootfs_image_tool_options=('-comp' 'xz' '-Xbcj' 'x86' '-b' '1M' '-Xdict-size' '1M')
file_permissions=(
  ["/etc/shadow"]="0:0:0400"
  ["/root"]="0:0:750"
  ["/etc/gshadow"]="0:0:0400"
)
''')

        # pacman.conf
        pacman_conf = profile_path / "pacman.conf"
        pacman_conf.write_text('''[options]
Architecture = auto
CheckSpace
SigLevel = Required DatabaseOptional
LocalFileSigLevel = Optional

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist
''')

        # airootfs structure with overlay files
        airootfs = profile_path / "airootfs"
        airootfs.mkdir(exist_ok=True)
        for overlay in config.get("overlays", []):
            for file_config in overlay.get("files", []):
                file_path = airootfs / file_config["path"].lstrip("/")
                file_path.parent.mkdir(parents=True, exist_ok=True)
                file_path.write_text(file_config["content"])
                if "mode" in file_config:
                    file_path.chmod(int(file_config["mode"], 8))
```
Create tests/test_deterministic.py:
```python
"""Tests for deterministic build configuration."""
from backend.app.services.deterministic import DeterministicBuildConfig
class TestDeterministicBuildConfig:
    """Test that same inputs produce same outputs."""

    def test_hash_deterministic(self):
        """Same config produces same hash."""
        config = {
            "packages": ["vim", "git", "base"],
            "overlays": [{
                "name": "test",
                "files": [{"path": "/etc/test", "content": "hello"}],
            }],
        }
        hash1 = DeterministicBuildConfig.compute_config_hash(config)
        hash2 = DeterministicBuildConfig.compute_config_hash(config)
        assert hash1 == hash2

    def test_hash_order_independent(self):
        """Package order doesn't affect hash."""
        config1 = {"packages": ["vim", "git", "base"], "overlays": []}
        config2 = {"packages": ["base", "git", "vim"], "overlays": []}
        hash1 = DeterministicBuildConfig.compute_config_hash(config1)
        hash2 = DeterministicBuildConfig.compute_config_hash(config2)
        assert hash1 == hash2

    def test_hash_different_configs(self):
        """Different configs produce different hashes."""
        config1 = {"packages": ["vim"], "overlays": []}
        config2 = {"packages": ["emacs"], "overlays": []}
        hash1 = DeterministicBuildConfig.compute_config_hash(config1)
        hash2 = DeterministicBuildConfig.compute_config_hash(config2)
        assert hash1 != hash2

    def test_source_date_epoch_deterministic(self):
        """Same hash produces same timestamp."""
        config_hash = "abc123def456"
        epoch1 = DeterministicBuildConfig.get_source_date_epoch(config_hash)
        epoch2 = DeterministicBuildConfig.get_source_date_epoch(config_hash)
        assert epoch1 == epoch2

    def test_source_date_epoch_in_range(self):
        """Timestamp is within reasonable range."""
        config_hash = "abc123def456"
        epoch = DeterministicBuildConfig.get_source_date_epoch(config_hash)
        # Should be between 2020 and 2030
        assert 1577836800 <= epoch <= 1924991999
```
Run:
```bash
cd /home/mikkel/repos/debate
ruff check backend/app/services/deterministic.py tests/test_deterministic.py
pytest tests/test_deterministic.py -v
```
Expected: Ruff passes, all tests pass.
Deterministic build config generates consistent hashes, timestamps derived from config hash.
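Both invariants (order-independent hashing and the bounded epoch window) can be exercised standalone. The sketch below re-implements just the package-normalization slice of compute_config_hash and the epoch mapping, so it runs without the project on the path:

```python
import hashlib
import json

def compute_hash(config: dict) -> str:
    """Sort + dedupe packages, then hash canonical JSON (mirrors the service)."""
    normalized = {"packages": sorted(set(config.get("packages", [])))}
    blob = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

def source_date_epoch(config_hash: str) -> int:
    """Map the first 8 bytes of the hash into the 2020-2030 epoch window."""
    min_epoch, max_epoch = 1577836800, 1924991999  # 2020-01-01 .. 2030-12-31
    return min_epoch + (int(config_hash[:16], 16) % (max_epoch - min_epoch))

h1 = compute_hash({"packages": ["vim", "git", "base"]})
h2 = compute_hash({"packages": ["base", "git", "vim"]})
assert h1 == h2  # package order is irrelevant
assert 1577836800 <= source_date_epoch(h1) < 1924991999
```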
Task 3: Create build orchestration service
backend/app/services/build.py
Create backend/app/services/build.py:
```python
"""
Build orchestration service.
Coordinates:
- Configuration validation
- Hash computation (for caching)
- Sandbox creation
- Build execution
- Result storage
"""
from pathlib import Path
from typing import Optional
from uuid import uuid4
from datetime import datetime, UTC

from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import select

from app.core.config import settings
from app.db.models.build import Build, BuildStatus
from app.services.sandbox import BuildSandbox
from app.services.deterministic import DeterministicBuildConfig
class BuildService:
    """Orchestrates ISO build process."""
    def __init__(self, db: AsyncSession):
        self.db = db
        self.sandbox = BuildSandbox()
        self.output_root = Path(settings.iso_output_root)

    async def get_or_create_build(
        self,
        config: dict,
    ) -> tuple[Build, bool]:
        """
        Get existing build from cache or create new one.

        Returns:
            Tuple of (Build, is_cached)
        """
        # Compute deterministic hash
        config_hash = DeterministicBuildConfig.compute_config_hash(config)
        # Check cache
        stmt = select(Build).where(
            Build.config_hash == config_hash,
            Build.status == BuildStatus.completed,
        )
        result = await self.db.execute(stmt)
        cached_build = result.scalar_one_or_none()
        if cached_build:
            # Return cached build
            return cached_build, True
        # Create new build
        build = Build(
            id=uuid4(),
            config_hash=config_hash,
            status=BuildStatus.pending,
        )
        self.db.add(build)
        await self.db.commit()
        await self.db.refresh(build)
        return build, False

    async def execute_build(
        self,
        build: Build,
        config: dict,
    ) -> Build:
        """
        Execute the actual ISO build.

        Process:
        1. Update status to building
        2. Create sandbox container
        3. Generate archiso profile
        4. Run build
        5. Update status with result
        """
        build.status = BuildStatus.building
        build.started_at = datetime.now(UTC)
        await self.db.commit()

        container_path = None
        profile_path = self.output_root / str(build.id) / "profile"
        output_path = self.output_root / str(build.id) / "output"
        try:
            # Create sandbox
            container_path = await self.sandbox.create_build_container(str(build.id))
            # Generate deterministic profile
            source_date_epoch = DeterministicBuildConfig.get_source_date_epoch(
                build.config_hash
            )
            DeterministicBuildConfig.create_archiso_profile(
                config, profile_path, source_date_epoch
            )
            # Run build in sandbox
            return_code, stdout, stderr = await self.sandbox.run_build(
                container_path, profile_path, output_path, source_date_epoch
            )
            if return_code == 0:
                # Find generated ISO
                iso_files = list(output_path.glob("*.iso"))
                if iso_files:
                    build.iso_path = str(iso_files[0])
                    build.status = BuildStatus.completed
                else:
                    build.status = BuildStatus.failed
                    build.error_message = "Build completed but no ISO found"
            else:
                build.status = BuildStatus.failed
                build.error_message = stderr or f"Build failed with code {return_code}"
            build.build_log = stdout + "\n" + stderr
        except Exception as e:
            build.status = BuildStatus.failed
            build.error_message = str(e)
        finally:
            # Cleanup sandbox
            if container_path:
                await self.sandbox.cleanup_container(container_path)
        build.completed_at = datetime.now(UTC)
        await self.db.commit()
        await self.db.refresh(build)
        return build

    async def get_build_status(self, build_id: str) -> Optional[Build]:
        """Get build by ID."""
        stmt = select(Build).where(Build.id == build_id)
        result = await self.db.execute(stmt)
        return result.scalar_one_or_none()
```
</action>
<verify>
Run:
```bash
cd /home/mikkel/repos/debate
ruff check backend/app/services/build.py
python -c "from backend.app.services.build import BuildService; print('Import OK')"
```

Expected: No ruff errors, import succeeds.

Build service coordinates hash computation, caching, sandbox execution, and status tracking.
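The statuses the service moves through form a small state machine; the sketch below captures the legal transitions as inferred from execute_build (an illustration, not the actual Build model):

```python
from enum import Enum

class BuildStatus(str, Enum):
    pending = "pending"
    building = "building"
    completed = "completed"
    failed = "failed"

# Legal transitions implied by execute_build: pending builds start,
# running builds either finish or fail; terminal states never move.
TRANSITIONS = {
    BuildStatus.pending: {BuildStatus.building},
    BuildStatus.building: {BuildStatus.completed, BuildStatus.failed},
}

def can_transition(src: BuildStatus, dst: BuildStatus) -> bool:
    return dst in TRANSITIONS.get(src, set())

assert can_transition(BuildStatus.pending, BuildStatus.building)
assert not can_transition(BuildStatus.completed, BuildStatus.building)
```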
1. `ruff check backend/app/services/` passes
2. `pytest tests/test_deterministic.py` - all tests pass
3. Sandbox service can be imported without errors
4. Build service can be imported without errors
5. DeterministicBuildConfig.compute_config_hash produces consistent results

<success_criteria>
- Sandbox service creates isolated systemd-nspawn containers (ISO-04)
- Builds run with --private-network (no network access)
- SOURCE_DATE_EPOCH set for deterministic builds
- Same configuration produces identical hash
- Build service coordinates full build lifecycle
- Cache lookup happens before build execution
</success_criteria>