homelab/.planning/codebase/ARCHITECTURE.md
Mikkel Georgsen a639a53b0b docs: add codebase map and domain research
Codebase: 7 documents (stack, architecture, structure, conventions, testing, integrations, concerns)
Research: 5 documents (stack, features, architecture, pitfalls, summary)
2026-02-04 13:50:03 +00:00

# Architecture
**Analysis Date:** 2026-02-04
## Pattern Overview
**Overall:** Hub-and-spoke service orchestration with API-driven infrastructure management.
**Key Characteristics:**
- Centralized management container (VMID 102, `mgmt`) coordinating all infrastructure
- Layered abstraction: CLI helpers → REST APIs → external services
- Event-driven notifications (Telegram bot bridges management layer to user)
- Credential-based authentication for all service integrations
## Layers
**Management Layer:**
- Purpose: Orchestration and automation entry point for the homelab
- Location: `/home/mikkel/homelab` (git repository in mgmt container)
- Contains: CLI helper scripts (`~/bin/*`), Telegram bot, documentation
- Depends on: Remote SSH access to container/VM IP addresses, Proxmox API, service REST APIs
- Used by: Claude Code automation, Telegram bot commands, cron jobs
**API Integration Layer:**
- Purpose: Abstracts service APIs into simple CLI interfaces
- Location: `~/bin/` (pve, npm-api, dns, pbs, beszel, kuma, updates, telegram)
- Contains: Python and Bash wrappers around external service APIs
- Depends on: Proxmox API, Nginx Proxy Manager API, Technitium DNS API, PBS REST API, Beszel (PocketBase) API, Uptime Kuma REST API, Telegram Bot API
- Used by: Telegram bot, CI/CD automation, interactive CLI usage
**Service Layer:**
- Purpose: Individual hosted services providing infrastructure capabilities
- Location: Distributed across containers (NPM, DNS, PBS, Dockge, Forgejo, etc.)
- Contains: Docker containers, LXC services, backup systems
- Depends on: PVE host networking, shared storage, external integrations
- Used by: API layer, end-user access via web UI or CLI
**Data & Communication Layer:**
- Purpose: State persistence and inter-service communication
- Location: Shared storage (`~/stuff` - ZFS bind mount), credential files (`~/.config/*/credentials`)
- Contains: Backup data, configuration files, Telegram inbox/images/files
- Depends on: PVE ZFS dataset, filesystem access
- Used by: All services, backup/restore operations
## Data Flow
**Infrastructure Query Flow (e.g., `pve list`):**
1. User invokes CLI helper: `~/bin/pve list`
2. Helper loads credentials from `~/.config/pve/credentials`
3. Helper authenticates to Proxmox API at `core.georgsen.dk:8006` using token auth
4. Proxmox returns cluster resource state (VMs/containers)
5. Helper formats and displays output to user
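
The query flow above can be sketched as follows. This is a minimal illustration, not the actual `~/bin/pve` code: the function names are hypothetical, and it assumes the helper uses the standard Proxmox `PVEAPIToken` header and the `/api2/json/cluster/resources` endpoint, which the source does not confirm. The request is only built here, not sent.

```python
from urllib.request import Request


def build_resources_request(host: str, token_id: str, token_secret: str) -> Request:
    """Build (but do not send) a GET request for the cluster resource list.

    token_id is expected in the Proxmox form USER@REALM!TOKENNAME.
    """
    url = f"https://{host}/api2/json/cluster/resources"
    # Proxmox API tokens are sent as: PVEAPIToken=USER@REALM!TOKENNAME=SECRET
    headers = {"Authorization": f"PVEAPIToken={token_id}={token_secret}"}
    return Request(url, headers=headers, method="GET")


def format_resources(resources: list[dict]) -> str:
    """Render only guests (LXC containers and QEMU VMs) as aligned text rows."""
    rows = []
    for item in resources:
        if item.get("type") not in ("lxc", "qemu"):
            continue  # skip nodes, storage, and other resource types
        rows.append(
            f"{item.get('vmid', '-'):>5}  {item.get('type'):<5}  "
            f"{item.get('status', '?'):<8}  {item.get('name', '')}"
        )
    return "\n".join(rows)
```

Keeping request construction separate from formatting makes both halves testable without a live Proxmox host.
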
**Service Management Flow (e.g., `dns add myhost 10.5.0.50`):**
1. User invokes: `~/bin/dns add myhost 10.5.0.50`
2. DNS helper loads credentials and authenticates to Technitium at `10.5.0.2:5380`
3. Helper makes HTTP API call to add A record
4. Technitium stores the record in the zone file and updates its DNS records
5. Helper confirms success to user
**Backup Status Flow (e.g., `/pbs` command in Telegram):**
1. Telegram user sends `/pbs` command
2. Bot handler in `telegram/bot.py` executes `~/bin/pbs status`
3. PBS helper connects over SSH to `10.5.0.6` as root
4. SSH command reads backup logs and GC status from PBS container
5. Helper formats human-readable output
6. Bot sends result back to Telegram chat (truncated to 4000 characters to stay under Telegram's 4096-character message limit)
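
Steps 2, 5, and 6 of this flow might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in `telegram/bot.py`: `run_helper` and `truncate_for_telegram` are hypothetical names, and the 30-second default mirrors the timeout described under Error Handling.

```python
import subprocess

TELEGRAM_SAFE_LIMIT = 4000  # limit used in the flow above; Telegram's hard cap is 4096


def run_helper(argv: list[str], timeout: float = 30.0) -> str:
    """Run a helper script and return text suitable for sending to the chat."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout:g}s"
    except FileNotFoundError:
        return f"Helper not found: {argv[0]}"
    # Prefer stdout; fall back to stderr when stdout is empty.
    output = (proc.stdout or proc.stderr).strip()
    if proc.returncode != 0:
        return f"Command failed (exit {proc.returncode}):\n{output}"
    return output or "(no output)"


def truncate_for_telegram(text: str, limit: int = TELEGRAM_SAFE_LIMIT) -> str:
    """Cut text to at most `limit` characters, marking the cut."""
    if len(text) <= limit:
        return text
    suffix = "\n... (truncated)"
    return text[: limit - len(suffix)] + suffix
```

A handler would then send `truncate_for_telegram(run_helper(["pbs", "status"]))` back to the chat. Passing an argument list instead of a shell string avoids shell interpretation of user input.
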
**State Management:**
- Credentials: Stored in `~/.config/*/credentials` files (sourced at runtime)
- Telegram messages: Appended to `telegram/inbox` file for Claude to read
- Media uploads: Saved to `telegram/images/` and `telegram/files/` with timestamps
- Authorization: `telegram/authorized_users` file maintains allowlist of chat IDs
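
The file-based state above can be handled with a few lines of standard-library code. The sketch below is hypothetical: the source does not describe the on-disk formats, so it assumes one chat ID per line in `telegram/authorized_users` and one timestamped line per message in `telegram/inbox`.

```python
from datetime import datetime, timezone
from pathlib import Path


def load_authorized_ids(path: Path) -> set[int]:
    """Read an allowlist of chat IDs, one per line (assumed format).

    Blank lines, '#' comments, and non-numeric entries are ignored.
    Group chat IDs in Telegram are negative, so a leading '-' is accepted.
    """
    if not path.exists():
        return set()  # no file means nobody is authorized
    ids: set[int] = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            ids.add(int(entry))
        except ValueError:
            continue
    return ids


def append_to_inbox(inbox: Path, sender: str, text: str) -> None:
    """Append one message to the inbox file with a UTC timestamp."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with inbox.open("a", encoding="utf-8") as fh:
        fh.write(f"[{stamp}] {sender}: {text}\n")
```

Treating a missing allowlist as empty fails closed: the bot rejects everyone rather than accepting everyone.
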
## Key Abstractions
**Helper Scripts (API Adapters):**
- Purpose: Translate user intent into remote service API calls
- Examples: `~/bin/pve`, `~/bin/dns`, `~/bin/pbs`, `~/bin/beszel`, `~/bin/kuma`
- Pattern: Load credentials → authenticate → execute command → format output
- Language: Mix of Python (pve, updates, telegram) and Bash (dns, pbs, beszel, kuma)
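
The first stage of the pattern, loading credentials, is common to every helper. A minimal sketch of how a Python helper might read a `key=value` credentials file is shown below. The function names and the example keys are hypothetical, and tolerating an `export ` prefix and quotes is an assumption made so the same file could also be sourced from Bash.

```python
from pathlib import Path
from typing import Optional


def parse_credentials(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks, comments, and malformed lines."""
    creds: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        key = key.removeprefix("export ").strip()
        creds[key] = value.strip().strip("'\"")
    return creds


def load_credentials(service: str, home: Optional[Path] = None) -> dict[str, str]:
    """Load ~/.config/<service>/credentials, exiting with a clear message if missing."""
    path = (home or Path.home()) / ".config" / service / "credentials"
    if not path.is_file():
        raise SystemExit(f"error: credentials file missing: {path}")
    return parse_credentials(path.read_text(encoding="utf-8"))
```

Raising `SystemExit` with a message prints it to stderr and yields a non-zero exit code, which matches the error-handling behaviour described for the CLI helpers.
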
**Telegram Bot:**
- Purpose: Provides two-way interactive access to management functions
- Implementation: `telegram/bot.py` using python-telegram-bot library
- Pattern: Command handlers dispatch to helper scripts, results sent back to user
- Channels: Commands (e.g., `/pbs`), free-text messages saved to inbox, photos/files downloaded
**Service Registry (Documentation):**
- Purpose: Centralized reference for service locations and access patterns
- Implementation: `homelab-documentation.md` and `CLAUDE.md`
- Contents: IP addresses, ports, authentication methods, SSH targets, network topology
## Entry Points
**CLI Usage (Direct):**
- Location: `~/bin/{helper}` scripts
- Triggers: Manual invocation by user or cron jobs
- Responsibilities: Execute service operations, format output, validate inputs
**Telegram Bot:**
- Location: `telegram/bot.py` (systemd service: `telegram-bot.service`)
- Triggers: Telegram message or command from authorized user
- Responsibilities: Authenticate user, route command/message, execute via helper scripts, send response
**Automation Scripts:**
- Location: Potential cron jobs or scheduled tasks
- Triggers: Time-based scheduling
- Responsibilities: Execute periodic management tasks (e.g., backup checks, updates)
**Manual Execution:**
- Location: Interactive shell in mgmt container
- Triggers: User SSH session
- Responsibilities: Run helpers for ad-hoc infrastructure management
## Error Handling
**Strategy:** Graceful degradation with informative messaging.
**Patterns:**
- CLI helpers return non-zero exit codes on failure (exception handling in Python, `set -e` in Bash)
- Timeout protection: Telegram bot commands have 30-second timeout (configurable per command)
- Service unavailability: Caught in try/except blocks; the helper falls back to the next option (e.g., `pve` tries LXC first, then QEMU)
- Credential failures: Load-time validation, clear error message if credentials file missing
- Network errors: SSH timeouts, API connection failures logged to stdout/stderr
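
The fallback and exit-code patterns above combine naturally. The sketch below is illustrative only: `ServiceError`, `first_successful`, and `main` are hypothetical names, and the callables stand in for real API requests such as an LXC lookup followed by a QEMU lookup.

```python
import sys
from typing import Callable, Sequence


class ServiceError(Exception):
    """Raised when a single attempt against a service fails."""


def first_successful(attempts: Sequence[tuple[str, Callable[[], str]]]) -> str:
    """Try each labelled attempt in order and return the first result.

    If every attempt fails, raise one ServiceError that lists all failures.
    """
    errors: list[str] = []
    for label, attempt in attempts:
        try:
            return attempt()
        except ServiceError as exc:
            errors.append(f"{label}: {exc}")
    raise ServiceError("; ".join(errors) or "no attempts given")


def main(attempts: Sequence[tuple[str, Callable[[], str]]]) -> int:
    """Print the result to stdout, or the error to stderr with exit code 1."""
    try:
        print(first_successful(attempts))
    except ServiceError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A helper would end with `sys.exit(main([("lxc", query_lxc), ("qemu", query_qemu)]))`, where both query functions are placeholders for the real lookups.
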
## Cross-Cutting Concerns
**Logging:**
- Telegram bot uses Python stdlib logging (INFO level, writes to systemd journal)
- CLI helpers write directly to stdout/stderr
- PBS helper uses SSH error output for remote command failures
**Validation:**
- Telegram bot validates hostnames (alphanumeric + dots + hyphens only) before ping
- DNS helper validates that name and IP are provided before API call
- PVE helper validates VMID is integer before API call
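
Two of these checks can be sketched as below. The function names are hypothetical and the exact rules in the real scripts may differ. In particular, rejecting negative numbers in `parse_vmid` is an assumption that goes beyond "is an integer".

```python
import re

# Alphanumerics, dots, and hyphens only, as described above.
_HOSTNAME_RE = re.compile(r"[A-Za-z0-9.-]+")
_VMID_RE = re.compile(r"[0-9]+")


def is_valid_hostname(value: str) -> bool:
    """Return True only if the whole string consists of allowed characters."""
    return bool(_HOSTNAME_RE.fullmatch(value))


def parse_vmid(value: str) -> int:
    """Convert a VMID argument to int, raising ValueError on anything else."""
    if not _VMID_RE.fullmatch(value):
        raise ValueError(f"VMID must be an integer, got {value!r}")
    return int(value)
```

Using `fullmatch` rather than `match` matters here: it rejects input such as `host; rm -rf /` that merely starts with valid characters. A character allowlist alone still accepts a value beginning with a hyphen, so passing the hostname as a separate argument-list element remains worthwhile.
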
**Authentication:**
- Credentials stored in `~/.config/{service}/credentials` as simple key=value files
- Sourced at runtime (Bash) or read at startup (Python)
- Token-based auth for Proxmox (no password in memory)
- Basic auth for DNS and other REST APIs (credentials URL-encoded if needed)
- Bearer token for Uptime Kuma (API key-based)
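
For reference, the Basic and Bearer schemes listed above reduce to different `Authorization` header values, and URL-encoding applies when credentials travel in a query string. The sketch below uses hypothetical function names and shows only the standard HTTP forms. Which parameters each service actually expects is not stated in this document.

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """HTTP Basic auth: base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(api_key: str) -> dict[str, str]:
    """Bearer auth: the API key is sent as-is after the scheme name."""
    return {"Authorization": f"Bearer {api_key}"}


def encode_query(params: dict[str, str]) -> str:
    """URL-encode query parameters so characters like '&' and '@' survive."""
    return urlencode(params)
```
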
---
*Architecture analysis: 2026-02-04*