homelab/.planning/codebase/ARCHITECTURE.md
Mikkel Georgsen a639a53b0b docs: add codebase map and domain research
Codebase: 7 documents (stack, architecture, structure, conventions, testing, integrations, concerns)
Research: 5 documents (stack, features, architecture, pitfalls, summary)
2026-02-04 13:50:03 +00:00

Architecture

Analysis Date: 2026-02-04

Pattern Overview

Overall: Hub-and-spoke service orchestration with API-driven infrastructure management.

Key Characteristics:

  • Centralized management container (VMID 102 - mgmt) coordinating all infrastructure
  • Layered abstraction: CLI helpers → REST APIs → external services
  • Event-driven notifications (Telegram bot bridges management layer to user)
  • Credential-based authentication for all service integrations

Layers

Management Layer:

  • Purpose: Orchestration and automation entry point for the homelab
  • Location: /home/mikkel/homelab (git repository in mgmt container)
  • Contains: CLI helper scripts (~/bin/*), Telegram bot, documentation
  • Depends on: Remote SSH access to container/VM IP addresses, Proxmox API, service REST APIs
  • Used by: Claude Code automation, Telegram bot commands, cron jobs

API Integration Layer:

  • Purpose: Abstracts service APIs into simple CLI interfaces
  • Location: ~/bin/ (pve, npm-api, dns, pbs, beszel, kuma, updates, telegram)
  • Contains: Python and Bash wrappers around external service APIs
  • Depends on: Proxmox API, Nginx Proxy Manager API, Technitium DNS API, PBS REST API, Beszel PocketBase, Uptime Kuma REST API, Telegram Bot API
  • Used by: Telegram bot, CI/CD automation, interactive CLI usage

Service Layer:

  • Purpose: Individual hosted services providing infrastructure capabilities
  • Location: Distributed across containers (NPM, DNS, PBS, Dockge, Forgejo, etc.)
  • Contains: Docker containers, LXC services, backup systems
  • Depends on: PVE host networking, shared storage, external integrations
  • Used by: API layer, end-user access via web UI or CLI

Data & Communication Layer:

  • Purpose: State persistence and inter-service communication
  • Location: Shared storage (~/stuff - ZFS bind mount), credential files (~/.config/*/credentials)
  • Contains: Backup data, configuration files, Telegram inbox/images/files
  • Depends on: PVE ZFS dataset, filesystem access
  • Used by: All services, backup/restore operations

Data Flow

Infrastructure Query Flow (e.g., pve list):

  1. User invokes CLI helper: ~/bin/pve list
  2. Helper loads credentials from ~/.config/pve/credentials
  3. Helper authenticates to Proxmox API at core.georgsen.dk:8006 using token auth
  4. Proxmox returns cluster resource state (VMs/containers)
  5. Helper formats and displays output to user
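
Steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration, not the actual ~/bin/pve source: the credential key names (user, token_name, token_value), the function names, and the sample data are assumptions. The PVEAPIToken header format and the /api2/json/cluster/resources endpoint follow Proxmox's documented API conventions.

```python
def build_auth_header(creds):
    """Build the Proxmox API token header from a credentials dict (key names are assumed)."""
    token_id = f"{creds['user']}!{creds['token_name']}"
    return {"Authorization": f"PVEAPIToken={token_id}={creds['token_value']}"}


def format_resources(resources):
    """Render VM/container entries as aligned text rows, sorted by VMID."""
    guests = [r for r in resources if r.get("type") in ("lxc", "qemu")]
    lines = [f"{'VMID':<6}{'TYPE':<6}{'STATUS':<9}NAME"]
    for r in sorted(guests, key=lambda r: r["vmid"]):
        lines.append(f"{r['vmid']:<6}{r['type']:<6}{r['status']:<9}{r['name']}")
    return "\n".join(lines)


# Hypothetical values; the real ones live in ~/.config/pve/credentials.
creds = {
    "user": "root@pam",
    "token_name": "mgmt",
    "token_value": "00000000-0000-0000-0000-000000000000",
}
headers = build_auth_header(creds)
url = "https://core.georgsen.dk:8006/api2/json/cluster/resources"

# Made-up sample shaped like the "data" list returned by the resources endpoint.
sample = [
    {"vmid": 102, "type": "lxc", "status": "running", "name": "mgmt"},
    {"type": "storage", "storage": "local"},
]
print(format_resources(sample))
```

Keeping header construction and formatting as pure functions lets both be checked without a live Proxmox host; the HTTP call itself would sit between them.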

Service Management Flow (e.g., dns add myhost 10.5.0.50):

  1. User invokes: ~/bin/dns add myhost 10.5.0.50
  2. DNS helper loads credentials and authenticates to Technitium at 10.5.0.2:5380
  3. Helper makes HTTP API call to add A record
  4. Technitium stores the new record in the zone file and updates its DNS records
  5. Helper confirms success to user
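
The request in step 3 might be built like this. The endpoint path and parameter names (token, zone, domain, type, ipAddress) are an assumption modelled on Technitium's HTTP API and should be checked against its API documentation. The zone lab.example, the token value, and the function name are hypothetical, and the real helper is a Bash script that may authenticate differently.

```python
import ipaddress
from urllib.parse import urlencode


def build_add_record_url(base_url, token, zone, name, ip):
    """Return the URL for adding an A record (endpoint and parameters are assumed)."""
    ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not an IPv4 address
    query = urlencode({
        "token": token,
        "zone": zone,
        "domain": f"{name}.{zone}",
        "type": "A",
        "ipAddress": ip,
    })
    return f"{base_url}/api/zones/records/add?{query}"


# Hypothetical zone and token; host and port are taken from the flow above.
url = build_add_record_url("http://10.5.0.2:5380", "EXAMPLE-TOKEN", "lab.example", "myhost", "10.5.0.50")
print(url)
```

Rejecting a malformed IP before the request is sent gives the user an immediate local error instead of an API failure.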

Backup Status Flow (e.g., /pbs command in Telegram):

  1. Telegram user sends /pbs command
  2. Bot handler in telegram/bot.py executes ~/bin/pbs status
  3. PBS helper connects over SSH to 10.5.0.6 as root
  4. SSH command reads backup logs and GC status from PBS container
  5. Helper formats human-readable output
  6. Bot sends result back to Telegram chat (truncated to 4000 chars to stay under Telegram's 4096-character message limit)
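
A rough sketch of steps 2, 5 and 6: run a helper as a subprocess, then trim the reply before sending it. The function names and the truncation marker are assumptions, and a small Python one-liner stands in for ~/bin/pbs status so the example runs anywhere.

```python
import subprocess
import sys

REPLY_LIMIT = 4000  # value from the flow above; Telegram caps a message at 4096 characters


def run_helper(argv):
    """Run a helper command and return its combined stdout/stderr as text."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return (result.stdout + result.stderr).strip() or "(no output)"


def truncate_for_telegram(text, limit=REPLY_LIMIT):
    """Cut the reply to at most `limit` characters, marking the cut so it is visible."""
    marker = "\n... (truncated)"
    if len(text) <= limit:
        return text
    return text[: limit - len(marker)] + marker


# Stand-in command that prints far more than one Telegram message can hold.
demo_cmd = [sys.executable, "-c", "print('backup ok\\n' * 1000)"]
reply = truncate_for_telegram(run_helper(demo_cmd))
print(len(reply))
```

Reserving room for the marker inside the limit keeps the final message at or below 4000 characters even after the notice is appended.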

State Management:

  • Credentials: Stored in ~/.config/*/credentials files (sourced at runtime)
  • Telegram messages: Appended to telegram/inbox file for Claude to read
  • Media uploads: Saved to telegram/images/ and telegram/files/ with timestamps
  • Authorization: telegram/authorized_users file maintains allowlist of chat IDs
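
The allowlist check and inbox append could look roughly like this. The on-disk formats (one chat ID per line, a timestamped line per message) and the function names are assumptions; the demo writes to a temporary directory rather than the real telegram/ folder.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_authorized_ids(path):
    """Read one chat ID per line, ignoring blank lines and '#' comments (format is assumed)."""
    ids = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(int(line))
    return ids


def append_to_inbox(path, chat_id, text):
    """Append one timestamped message line to the inbox file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"[{stamp}] {chat_id}: {text}\n")


# Demo files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
allowlist = workdir / "authorized_users"
inbox = workdir / "inbox"
allowlist.write_text("# chat IDs\n12345\n")

authorized = load_authorized_ids(allowlist)
for chat_id, text in [(12345, "restart dns"), (99999, "hello")]:
    if chat_id in authorized:  # messages from unknown chats are dropped
        append_to_inbox(inbox, chat_id, text)

print(inbox.read_text(), end="")
```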

Key Abstractions

Helper Scripts (API Adapters):

  • Purpose: Translate user intent into remote service API calls
  • Examples: ~/bin/pve, ~/bin/dns, ~/bin/pbs, ~/bin/beszel, ~/bin/kuma
  • Pattern: Load credentials → authenticate → execute command → format output
  • Language: Mix of Python (pve, updates, telegram) and Bash (dns, pbs, beszel, kuma)

Telegram Bot:

  • Purpose: Provides two-way interactive access to management functions
  • Implementation: telegram/bot.py using python-telegram-bot library
  • Pattern: Command handlers dispatch to helper scripts, results sent back to user
  • Channels: Commands (e.g., /pbs), free-text messages saved to inbox, photos/files downloaded

Service Registry (Documentation):

  • Purpose: Centralized reference for service locations and access patterns
  • Implementation: homelab-documentation.md and CLAUDE.md
  • Contents: IP addresses, ports, authentication methods, SSH targets, network topology

Entry Points

CLI Usage (Direct):

  • Location: ~/bin/{helper} scripts
  • Triggers: Manual invocation by user or cron jobs
  • Responsibilities: Execute service operations, format output, validate inputs

Telegram Bot:

  • Location: telegram/bot.py (systemd service: telegram-bot.service)
  • Triggers: Telegram message or command from authorized user
  • Responsibilities: Authenticate user, route command/message, execute via helper scripts, send response

Automation Scripts:

  • Location: Cron jobs or other scheduled tasks, where configured
  • Triggers: Time-based scheduling
  • Responsibilities: Execute periodic management tasks (e.g., backup checks, updates)

Manual Execution:

  • Location: Interactive shell in mgmt container
  • Triggers: User SSH session
  • Responsibilities: Run helpers for ad-hoc infrastructure management

Error Handling

Strategy: Graceful degradation with informative messaging.

Patterns:

  • CLI helpers return non-zero exit codes on failure (exception handling in Python, set -e in Bash)
  • Timeout protection: Telegram bot commands have 30-second timeout (configurable per command)
  • Service unavailability: Caught in try/except blocks, falling back to the next option (e.g., pve tries LXC first, then QEMU)
  • Credential failures: Load-time validation, clear error message if credentials file missing
  • Network errors: SSH timeouts and API connection failures are logged to stdout/stderr
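
Two of these patterns, the command timeout and the ordered fallback, can be sketched as below. The function names are hypothetical, the exit code 124 is borrowed from the GNU timeout convention rather than taken from the bot, and the LXC/QEMU lookups are stubs standing in for real API calls.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout=30):
    """Return (exit_code, output); a timeout is reported as a result instead of raised."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 124, f"Command timed out after {timeout}s"
    return result.returncode, (result.stdout + result.stderr).strip()


def first_success(attempts):
    """Call each (name, function) pair in order; return the first result that does not raise."""
    errors = []
    for name, attempt in attempts:
        try:
            return name, attempt()
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))


def lxc_status():
    raise LookupError("no such container")  # stub: pretend the VMID is not an LXC guest


def qemu_status():
    return "running"  # stub: pretend the VMID is a QEMU guest


kind, status = first_success([("lxc", lxc_status), ("qemu", qemu_status)])
print(kind, status)  # qemu running

code, output = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=1)
print(code, output)
```

Collecting each failure message before giving up means the final error explains why every option was rejected, which fits the informative-messaging goal.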

Cross-Cutting Concerns

Logging:

  • Telegram bot uses Python stdlib logging (INFO level, writes to systemd journal)
  • CLI helpers write directly to stdout/stderr
  • PBS helper uses SSH error output for remote command failures

Validation:

  • Telegram bot validates hostnames (alphanumeric + dots + hyphens only) before ping
  • DNS helper validates that name and IP are provided before API call
  • PVE helper validates VMID is integer before API call
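
A minimal sketch of the hostname and VMID checks, assuming they are implemented as plain string validation; the function names and the exact regular expressions are illustrative rather than copied from the helpers.

```python
import re

HOSTNAME_RE = re.compile(r"[A-Za-z0-9.-]+")
VMID_RE = re.compile(r"[0-9]+")


def is_safe_hostname(value):
    """Accept only letters, digits, dots and hyphens, as described above."""
    return bool(HOSTNAME_RE.fullmatch(value))


def parse_vmid(value):
    """Return the VMID as an int, or raise ValueError if it is not a plain decimal number."""
    if not VMID_RE.fullmatch(value):
        raise ValueError(f"VMID must be an integer, got {value!r}")
    return int(value)


print(is_safe_hostname("core.georgsen.dk"))  # True
print(is_safe_hostname("host; rm -rf /"))    # False
print(parse_vmid("102"))                     # 102
```

Using fullmatch rather than search matters here: a partial match would accept a valid prefix followed by shell metacharacters. A value that starts with a hyphen still passes this character-set check, so a stricter rule may be worth adding before the value reaches ping.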

Authentication:

  • Credentials stored in ~/.config/{service}/credentials as simple key=value files
  • Sourced at runtime (Bash) or read at startup (Python)
  • Token-based auth for Proxmox (no password in memory)
  • Basic auth for DNS and other REST APIs (credentials URL-encoded if needed)
  • Bearer token for Uptime Kuma (API key-based)
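
For the Python side, reading a key=value credentials file and turning it into request headers might look like this. The key names, the quote stripping, and the function names are assumptions; only the Basic and Bearer header formats are standard HTTP.

```python
import base64


def parse_credentials(text):
    """Parse simple key=value lines, skipping blanks, comments and malformed lines."""
    creds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip().strip("'\"")
    return creds


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_header(api_key):
    """Build an HTTP Bearer Authorization header."""
    return {"Authorization": f"Bearer {api_key}"}


# Hypothetical file content; real files live under ~/.config/{service}/credentials.
creds = parse_credentials('# example\nUSER=user\nPASS="pass"\n')
print(basic_auth_header(creds["USER"], creds["PASS"]))
print(bearer_header("EXAMPLE-API-KEY"))
```

Splitting on the first "=" only (via partition) keeps values that themselves contain "=" intact, which is common for tokens and base64 strings.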

Architecture analysis: 2026-02-04