docs: add codebase map and domain research

Codebase: 7 documents (stack, architecture, structure, conventions, testing, integrations, concerns)
Research: 5 documents (stack, features, architecture, pitfalls, summary)
2026-02-04 13:50:03 +00:00 · 2026-02-04 13:50:03 +00:00 · a639a53b0b
commit a639a53b0b
parent 6cf6bfb8d1
7 changed files with 1662 additions and 0 deletions
--- a/.planning/codebase/ARCHITECTURE.md
+++ b/.planning/codebase/ARCHITECTURE.md
@@ -0,0 +1,151 @@
+# Architecture
+
+**Analysis Date:** 2026-02-04
+
+## Pattern Overview
+
+**Overall:** Hub-and-spoke service orchestration with API-driven infrastructure management.
+
+**Key Characteristics:**
+- Centralized management container (VMID 102 - mgmt) coordinating all infrastructure
+- Layered abstraction: CLI helpers → REST APIs → external services
+- Event-driven notifications (Telegram bot bridges management layer to user)
+- Credential-based authentication for all service integrations
+
+## Layers
+
+**Management Layer:**
+- Purpose: Orchestration and automation entry point for the homelab
+- Location: `/home/mikkel/homelab` (git repository in mgmt container)
+- Contains: CLI helper scripts (`~/bin/*`), Telegram bot, documentation
+- Depends on: Remote SSH access to container/VM IP addresses, Proxmox API, service REST APIs
+- Used by: Claude Code automation, Telegram bot commands, cron jobs
+
+**API Integration Layer:**
+- Purpose: Abstracts service APIs into simple CLI interfaces
+- Location: `~/bin/` (pve, npm-api, dns, pbs, beszel, kuma, updates, telegram)
+- Contains: Python and Bash wrappers around external service APIs
+- Depends on: Proxmox API, Nginx Proxy Manager API, Technitium DNS API, PBS REST API, Beszel PocketBase, Uptime Kuma REST API, Telegram Bot API
+- Used by: Telegram bot, CI/CD automation, interactive CLI usage
+
+**Service Layer:**
+- Purpose: Individual hosted services providing infrastructure capabilities
+- Location: Distributed across containers (NPM, DNS, PBS, Dockge, Forgejo, etc.)
+- Contains: Docker containers, LXC services, backup systems
+- Depends on: PVE host networking, shared storage, external integrations
+- Used by: API layer, end-user access via web UI or CLI
+
+**Data & Communication Layer:**
+- Purpose: State persistence and inter-service communication
+- Location: Shared storage (`~/stuff` - ZFS bind mount), credential files (`~/.config/*/credentials`)
+- Contains: Backup data, configuration files, Telegram inbox/images/files
+- Depends on: PVE ZFS dataset, filesystem access
+- Used by: All services, backup/restore operations
+
+## Data Flow
+
+**Infrastructure Query Flow (e.g., `pve list`):**
+
+1. User invokes CLI helper: `~/bin/pve list`
+2. Helper loads credentials from `~/.config/pve/credentials`
+3. Helper authenticates to Proxmox API at `core.georgsen.dk:8006` using token auth
+4. Proxmox returns cluster resource state (VMs/containers)
+5. Helper formats and displays output to user
+
+**Service Management Flow (e.g., `dns add myhost 10.5.0.50`):**
+
+1. User invokes: `~/bin/dns add myhost 10.5.0.50`
+2. DNS helper loads credentials and authenticates to Technitium at `10.5.0.2:5380`
+3. Helper makes HTTP API call to add A record
+4. Technitium stores in zone file and updates DNS records
+5. Helper confirms success to user
+
+**Backup Status Flow (e.g., `/pbs` command in Telegram):**
+
+1. Telegram user sends `/pbs` command
+2. Bot handler in `telegram/bot.py` executes `~/bin/pbs status`
+3. PBS helper connects over SSH to `10.5.0.6` as root
+4. SSH command reads backup logs and GC status from PBS container
+5. Helper formats human-readable output
+6. Bot sends result back to Telegram chat (truncated to 4000 chars to stay under Telegram's 4096-character message limit)
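
Steps 2 to 6 of the backup status flow can be sketched roughly as follows. This is a minimal illustration, not the code in `telegram/bot.py`: the function signature, the `MAX_CHARS` name, and the stand-in command are assumptions, while the 30-second default timeout and the 4000-character cut-off come from the description above.

```python
import subprocess
import sys

MAX_CHARS = 4000  # stays under Telegram's 4096-character message limit


def run_command(cmd: list[str], timeout: int = 30) -> str:
    """Run a helper script and return text that fits in one Telegram message."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "Command timed out"
    except OSError as e:  # e.g. the helper script does not exist
        return f"Error: {e}"
    output = result.stdout or result.stderr or "No output"
    if len(output) > MAX_CHARS:
        output = output[:MAX_CHARS] + "\n... (truncated)"
    return output


if __name__ == "__main__":
    # Stand-in for `~/bin/pbs status`: any command that prints to stdout.
    print(run_command([sys.executable, "-c", "print('ok')"]))
```

Passing the command as a list rather than a shell string keeps user-supplied text out of shell parsing, which matters for a bot that forwards chat input to helpers.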
+
+**State Management:**
+- Credentials: Stored in `~/.config/*/credentials` files (sourced at runtime)
+- Telegram messages: Appended to `telegram/inbox` file for Claude to read
+- Media uploads: Saved to `telegram/images/` and `telegram/files/` with timestamps
+- Authorization: `telegram/authorized_users` file maintains allowlist of chat IDs
+
+## Key Abstractions
+
+**Helper Scripts (API Adapters):**
+- Purpose: Translate user intent into remote service API calls
+- Examples: `~/bin/pve`, `~/bin/dns`, `~/bin/pbs`, `~/bin/beszel`, `~/bin/kuma`
+- Pattern: Load credentials → authenticate → execute command → format output
+- Language: Mix of Python (pve, updates, telegram) and Bash (dns, pbs, beszel, kuma)
+
+**Telegram Bot:**
+- Purpose: Provides two-way interactive access to management functions
+- Implementation: `telegram/bot.py` using python-telegram-bot library
+- Pattern: Command handlers dispatch to helper scripts, results sent back to user
+- Channels: Commands (e.g., `/pbs`), free-text messages saved to inbox, photos/files downloaded
+
+**Service Registry (Documentation):**
+- Purpose: Centralized reference for service locations and access patterns
+- Implementation: `homelab-documentation.md` and `CLAUDE.md`
+- Contents: IP addresses, ports, authentication methods, SSH targets, network topology
+
+## Entry Points
+
+**CLI Usage (Direct):**
+- Location: `~/bin/{helper}` scripts
+- Triggers: Manual invocation by user or cron jobs
+- Responsibilities: Execute service operations, format output, validate inputs
+
+**Telegram Bot:**
+- Location: `telegram/bot.py` (systemd service: `telegram-bot.service`)
+- Triggers: Telegram message or command from authorized user
+- Responsibilities: Authenticate user, route command/message, execute via helper scripts, send response
+
+**Automation Scripts:**
+- Location: Potential cron jobs or scheduled tasks
+- Triggers: Time-based scheduling
+- Responsibilities: Execute periodic management tasks (e.g., backup checks, updates)
+
+**Manual Execution:**
+- Location: Interactive shell in mgmt container
+- Triggers: User SSH session
+- Responsibilities: Run helpers for ad-hoc infrastructure management
+
+## Error Handling
+
+**Strategy:** Graceful degradation with informative messaging.
+
+**Patterns:**
+- CLI helpers return non-zero exit codes on failure (exception handling in Python; `set -e` in the Bash scripts that enable it)
+- Timeout protection: Telegram bot commands have 30-second timeout (configurable per command)
+- Service unavailability: Caught in try/except blocks, fall back to next option (e.g., `pve` tries LXC first, then QEMU)
+- Credential failures: Load-time validation, clear error message if credentials file missing
+- Network errors: SSH timeouts, API connection failures logged to stdout/stderr
+
+## Cross-Cutting Concerns
+
+**Logging:**
+- Telegram bot uses Python stdlib logging (INFO level, writes to systemd journal)
+- CLI helpers write directly to stdout/stderr
+- PBS helper uses SSH error output for remote command failures
+
+**Validation:**
+- Telegram bot validates hostnames (alphanumeric + dots + hyphens only) before ping
+- DNS helper validates that name and IP are provided before API call
+- PVE helper validates VMID is integer before API call
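
The hostname rule above (alphanumeric characters, dots, and hyphens only) could be checked along these lines. This is a sketch, not the bot's actual check: the function name is hypothetical, and both the 253-character cap and the rejection of a leading hyphen are added assumptions, the latter so a value cannot be read as a `ping` option.

```python
import re

# Letters, digits, dots and hyphens only. The length cap is an assumption
# borrowed from the DNS limit on full domain names.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9.-]{1,253}")


def is_valid_hostname(host: str) -> bool:
    """Return True if host is safe to pass to ping as a single argument."""
    if host.startswith("-"):  # would otherwise be parsed as an option
        return False
    return HOSTNAME_RE.fullmatch(host) is not None


if __name__ == "__main__":
    for candidate in ("core.georgsen.dk", "10.5.0.2", "host; rm -rf /", "-c"):
        print(f"{candidate!r}: {is_valid_hostname(candidate)}")
```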
+
+**Authentication:**
+- Credentials stored in `~/.config/{service}/credentials` as simple key=value files
+- Sourced at runtime (Bash) or read at startup (Python)
+- Token-based auth for Proxmox (no password in memory)
+- Basic auth for DNS and other REST APIs (credentials URL-encoded if needed)
+- Bearer token for Uptime Kuma (API key-based)
+
+---
+
+*Architecture analysis: 2026-02-04*
--- a/.planning/codebase/CONCERNS.md
+++ b/.planning/codebase/CONCERNS.md
@@ -0,0 +1,272 @@
+# Codebase Concerns
+
+**Analysis Date:** 2026-02-04
+
+## Tech Debt
+
+**IP Addressing Scheme Inconsistency:**
+- Issue: Container IPs don't follow VMID convention. NPM (VMID 100) is at .1, Dockge (VMID 101) at .10, PBS (VMID 106) at .6, instead of matching .100, .101, .106
+- Files: `homelab-documentation.md` (lines 139-159)
+- Impact: Manual IP tracking required, DNS records must be maintained separately, new containers require manual IP assignment planning, documentation drift risk
+- Fix approach: Execute TODO task to reorganize vmbr1 to VMID=IP scheme (.100-.253 range), update NPM proxy hosts, DNS records (lab.georgsen.dk), and documentation
+
+**DNS Record Maintenance Manual:**
+- Issue: Internal DNS (Technitium) and external DNS (dns.services) require manual updates when IPs/domains change
+- Files: `homelab-documentation.md` (lines 432-449), `~/bin/dns` script
+- Impact: Risk of records becoming stale after IP migrations, no automation for new containers
+- Fix approach: Implement `dns-services` helper script (TODO.md line 27) with API integration for automatic updates
+
+**Unimplemented Helper Scripts:**
+- Issue: `dns-services` API integration promised in TODO but not implemented
+- Files: `TODO.md` (line 27), `dns-services/credentials` exists but script doesn't
+- Impact: Manual dns.services operations required, cannot automate domain setup
+- Fix approach: Create `~/bin/dns-services` wrapper (endpoint documented in TODO)
+
+**Ping Capability Missing on 14 Containers:**
+- Issue: Unprivileged LXC containers drop cap_net_raw, breaking ping on VMIDs 100, 101, 102, 103, 104, 105, 107, 108, 110, 111, 112, 114, 115, 1000
+- Files: `TODO.md` (lines 31-33), `CLAUDE.md` (lines 252-255)
+- Impact: Health monitoring fails, network diagnostics broken, Telegram bot status checks incomplete (bot has no ping on home network itself), Uptime Kuma monitors may show false negatives
+- Fix approach: Run `setcap cap_net_raw+ep /bin/ping` on each container (must be reapplied after iputils-ping updates)
+
+**Version Pinning Warnings:**
+- Issue: `CLAUDE.md` (lines 227-241) warns about hardcoded versions becoming stale
+- Files: `homelab-documentation.md` (lines 217, 228, 239), `~/bin/updates` script shows version checking is implemented but some configs have `latest` tags
+- Impact: Security patch delays, incompatibilities when manually deploying services
+- Fix approach: Always query GitHub API for latest versions (updates script does this correctly for discovery phase)
+
+## Known Bugs
+
+**Telegram Bot Inbox Storage Race Condition:**
+- Symptoms: Concurrent message writes could corrupt inbox file, messages may be lost
+- Files: `telegram/bot.py` (lines 39, 200-220 message handling), `~/bin/telegram` (lines 73-79 clear command)
+- Trigger: Multiple rapid messages from admin or concurrent bot operations
+- Workaround: Clear inbox frequently and check for corruption; bot currently appends to file without locking
+- Root cause: File-based inbox with no atomic writes or mutex protection
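
One possible way to close this race is an advisory lock held during both append and clear. The sketch below assumes a Unix host and uses `fcntl.flock`; the function names are hypothetical and nothing here is taken from `telegram/bot.py` or `~/bin/telegram`. The lock only helps if every writer and the clear command all take it.

```python
import fcntl
import os
import tempfile
from pathlib import Path


def append_message(inbox: Path, text: str) -> None:
    """Append one line to the inbox while holding an exclusive lock."""
    with open(inbox, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            f.write(text.rstrip("\n") + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def drain_inbox(inbox: Path) -> list[str]:
    """Read and clear the inbox under the same lock, so no write lands in between."""
    with open(inbox, "a+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            lines = f.read().splitlines()
            f.truncate(0)
            return lines
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "inbox"
    append_message(demo, "first message")
    append_message(demo, "second message")
    print(drain_inbox(demo))
```

Reading and truncating inside one locked section is the important part: clearing the file in a separate step would reopen the window in which a new message can be lost.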
+
+**PBS Backup Mount Dependency Not Enforced:**
+- Symptoms: PBS services may start before Synology CIFS mount is available, backup path unreachable
+- Files: `homelab-documentation.md` (lines 372-384), container 106 config
+- Trigger: System reboot when Tailscale connectivity is delayed
+- Workaround: Manual restart of proxmox-backup-proxy and proxmox-backup services
+- Root cause: `After=mnt-synology.mount` only orders startup; it does not make the service require the mount or wait for it to succeed, so PBS can start while the share is still unavailable
+
+**DragonflyDB Password in Plain Text in Documentation:**
+- Symptoms: Database password visible in compose file and documentation
+- Files: `homelab-documentation.md` (lines 248-250)
+- Trigger: Anyone reading docs or inspecting git history
+- Workaround: Consider password non-critical if container only accessible on internal network
+- Root cause: Password stored in version control and documentation rather than .env or secrets file
+
+**NPM Proxy Host 18 (mh.datalos.dk) Not Configured:**
+- Symptoms: Domain does not resolve; the DNS record is missing even though an NPM entry (ID 18) is mentioned in TODO
+- Files: `TODO.md` (line 29), `homelab-documentation.md` (proxy hosts section)
+- Trigger: Accessing mh.datalos.dk from browser
+- Workaround: Must be configured manually via NPM web UI
+- Root cause: Setup referenced in TODO but not completed
+
+## Security Considerations
+
+**Exposed Credentials in Git History:**
+- Risk: Credential files committed (credentials, SSH keys, telegram token examples)
+- Files: All credential files in `telegram/`, `pve/`, `forgejo/`, `dns/`, `dockge/`, `uptime-kuma/`, `beszel/`, `dns-services/` directories (8+ files)
+- Current mitigation: Files are .gitignored in main repo but present in working directory
+- Recommendations: Rotate all credentials listed, audit git log for historical commits, use HashiCorp Vault or pass for credential storage, document secret rotation procedure
+
+**Public IP Hardcoded in Documentation:**
+- Risk: Home IP 83.89.248.247 exposed in multiple locations
+- Files: `homelab-documentation.md` (lines 98, 102), `CLAUDE.md` (line 256)
+- Current mitigation: IP is already public/static, used for whitelist access
+- Recommendations: Document that whitelisting this IP is intentional; confirm no other PII appears alongside it
+
+**Telegram Bot Authorization Model Too Permissive:**
+- Risk: First user to message bot becomes admin automatically with no verification
+- Files: `telegram/bot.py` (lines 86-95)
+- Current mitigation: Bot only responds to authorized user, requires bot discovery
+- Recommendations: Require multi-factor authorization on first start (e.g., PIN from environment variable), implement audit logging of all bot commands
+
+**Database Credentials in Environment Variables:**
+- Risk: DragonflyDB password passed via Docker command line (visible in `docker ps`, logs, process listings)
+- Files: `homelab-documentation.md` (line 248)
+- Current mitigation: Container only accessible on internal vmbr1 network
+- Recommendations: Use Docker secrets or mounted .env files instead of command-line arguments
+
+**Synology CIFS Credentials in fstab:**
+- Risk: SMB credentials stored in plaintext in fstab file with mode 0644 (world-readable)
+- Files: `homelab-documentation.md` (line 369)
+- Current mitigation: Mounted on container-only network, requires PBS container access
+- Recommendations: Use credentials file with mode 0600, rotate credentials regularly, monitor file permissions
+
+**SSH Keys Included in Documentation:**
+- Risk: Public SSH keys hardcoded in CLAUDE.md setup examples
+- Files: `CLAUDE.md` and `homelab-documentation.md` SSH key examples
+- Current mitigation: Public keys only (not private), used for container access
+- Recommendations: Rotate these keys if documentation is ever exposed, don't include in public repos
+
+## Performance Bottlenecks
+
+**NVMe RAID0 Storage Without Local Redundancy:**
+- Problem: Core server has 2x1TB NVMe in RAID0 (striped, no redundancy)
+- Files: `homelab-documentation.md` (lines 17-24)
+- Cause: Cost optimization for Hetzner dedicated server
+- Impact: Single drive failure = total data loss; database corruption risk from RAID0 stripe inconsistency
+- Improvement path: (1) Ensure PBS backups run successfully to Synology, (2) Test backup restore procedure monthly, (3) Plan upgrade path if budget allows (3-way mirror or RAID1)
+
+**Backup Dependency on Single Tailscale Gateway:**
+- Problem: All PBS backups to Synology go through Tailscale relay (10.5.0.134), single point of failure
+- Files: `homelab-documentation.md` (lines 317-427)
+- Cause: Synology only accessible via Tailscale network, relay container required
+- Impact: Tailscale relay downtime = backup failure; no local backup option
+- Improvement path: (1) Add second Tailscale relay for redundancy, (2) Explore PBS direct SSH backup mode, (3) Monitor relay container health
+
+**DNS Queries All Route Through Single Technitium Container:**
+- Problem: All internal DNS (lab.georgsen.dk) goes through container 115, DHCP defaults to this server
+- Files: `homelab-documentation.md` (lines 309-315), container config
+- Cause: Single container architecture
+- Impact: DNS outage = network unreachable (containers can't resolve any hostnames)
+- Improvement path: (1) Deploy DNS replica on another container, (2) Configure DHCP to use multiple DNS servers, (3) Set upstream DNS fallback
+
+**Script Execution via Telegram Bot with Subprocess Timeout:**
+- Problem: Bot runs helper scripts with 30-second timeout, commands like PBS backup query can exceed limit
+- Files: `telegram/bot.py` (lines 60-78, 191)
+- Cause: Helper scripts do remote SSH execution, network latency variable
+- Impact: Commands truncated mid-execution, incomplete status reports, timeouts on slow networks
+- Improvement path: Increase timeout selectively, implement command queuing, cache results for frequently-called commands
+
+## Fragile Areas
+
+**Installer Shell Script with Unimplemented Sections:**
+- Files: `pve-homelab-kit/install.sh` (495+ lines with TODO comments)
+- Why fragile: Multiple TODO placeholders indicate incomplete implementation; wizard UI done but ~30 implementation TODOs remain
+- Safe modification: (1) Don't merge branches without running through full install, (2) Test each section independently, (3) Add shell `set -e` error handling
+- Test coverage: Script has no tests, no dry-run mode, no rollback capability
+
+**Container Configuration Manual in LXC Config Files:**
+- Files: `/etc/pve/lxc/*.conf` across Proxmox host (not in repo, not version controlled)
+- Why fragile: Critical settings (features, ulimits, AppArmor) outside version control, drift risk after manual fixes
+- Safe modification: Keep backup copies in `homelab-documentation.md` (already done for PBS), automate via Terraform/Ansible if future containers added
+- Test coverage: Config changes only tested on live container (no staging env)
+
+**Helper Scripts with Hardcoded IPs and Paths:**
+- Files: `~/bin/updates` (lines 16-17, 130), `~/bin/pbs`, `~/bin/pve`, `~/bin/dns`
+- Why fragile: DOCKGE_HOST, PVE_HOST hardcoded; if IPs change during migration, all scripts must be updated manually
+- Safe modification: Extract to config file (e.g., `/etc/homelab/config.sh` or environment variables)
+- Test coverage: Scripts tested against live infrastructure only
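
For the Python helpers, the extraction suggested above might look like the sketch below. `PVE_HOST` and `DOCKGE_HOST` are the names mentioned in this section; the dataclass, the `load_config` function, and the default hostnames are placeholders for illustration, not values from the repository.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HomelabConfig:
    pve_host: str
    dockge_host: str


def load_config(env=os.environ) -> HomelabConfig:
    """Read host settings from the environment, with placeholder defaults."""
    # The defaults below are illustrative only, not the real addresses.
    return HomelabConfig(
        pve_host=env.get("PVE_HOST", "pve.example.internal"),
        dockge_host=env.get("DOCKGE_HOST", "dockge.example.internal"),
    )


if __name__ == "__main__":
    print(load_config())
```

Accepting the mapping as a parameter lets a script be exercised against a plain dict instead of the live environment, which partly addresses the "tested against live infrastructure only" gap.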
+
+**SSH-Based Container Access Without Key Verification:**
+- Files: `~/bin/updates` (lines 115-131), scripts use the `-q` flag, which suppresses SSH warnings and diagnostics
+- Why fragile: `ssh -q` does not disable StrictHostKeyChecking, but it hides host-key warnings that would reveal a changed or spoofed host (MITM); scripts also assume SSH keys are pre-installed
+- Safe modification: Add `-o StrictHostKeyChecking=accept-new` so new hosts are recorded on first connection and changed keys are rejected, document key distribution procedure
+- Test coverage: SSH connectivity assumed working
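
A helper that builds the SSH command line in one place makes the host-key policy explicit. The sketch below is an assumption about how that could look, not code from `~/bin/updates`; the function name and defaults are hypothetical. The options themselves are standard OpenSSH client options (`accept-new` needs OpenSSH 7.6 or later).

```python
import subprocess


def ssh_argv(host: str, remote_cmd: str, user: str = "root",
             connect_timeout: int = 10) -> list[str]:
    """Build an ssh command that records new host keys and rejects changed ones."""
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=accept-new",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "-o", f"ConnectTimeout={connect_timeout}",
        f"{user}@{host}",
        remote_cmd,
    ]


def run_remote(host: str, remote_cmd: str, timeout: int = 30) -> subprocess.CompletedProcess:
    """Run remote_cmd on host; the caller inspects returncode, stdout and stderr."""
    return subprocess.run(ssh_argv(host, remote_cmd), capture_output=True,
                          text=True, timeout=timeout)


if __name__ == "__main__":
    # Print the command only; 10.5.0.6 is the PBS address named in this document.
    print(" ".join(ssh_argv("10.5.0.6", "uptime")))
```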
+
+**Backup Monitoring Without Alerting on Failure:**
+- Files: `~/bin/pbs`, `telegram/bot.py` (status command only, no automatic failure alerts)
+- Why fragile: Failed backups only visible if manually checked; no monitoring of backup completion
+- Safe modification: Add systemd timer to check PBS status hourly, send Telegram alert on failure
+- Test coverage: Manual checks only
+
+## Scaling Limits
+
+**Container IP Space Exhaustion:**
+- Current capacity: vmbr1 is /24 (256 IPs, .0-.255), DHCP range .100-.200 (101 IPs available for DHCP), static IPs scattered
+- Limit: After ~150 containers, IP fragmentation becomes difficult to manage; DHCP range conflicts with static allocation
+- Scaling path: (1) Implement TODO IP scheme (VMID=IP), (2) Expand to /23 (512 IPs) if more containers needed, (3) Use vmbr2 (vSwitch) for secondary network
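
The VMID=IP scheme from the scaling path can be expressed as a small mapping. This sketch assumes vmbr1 is `10.5.0.0/24`, inferred from the `10.5.0.x` addresses used elsewhere in this analysis; the function name is hypothetical. Note that VMID 1000 cannot fit the `.100-.253` range and would need a manual exception.

```python
import ipaddress

# Assumed from the 10.5.0.x addresses in this document; adjust if vmbr1 differs.
VMBR1 = ipaddress.ip_network("10.5.0.0/24")


def ip_for_vmid(vmid: int, network=VMBR1):
    """Map a VMID to the host address with the same last octet (.100-.253)."""
    if not 100 <= vmid <= 253:
        raise ValueError(f"VMID {vmid} is outside the .100-.253 range")
    return network.network_address + vmid


if __name__ == "__main__":
    for vmid in (100, 101, 106, 115):
        print(vmid, ip_for_vmid(vmid))
```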
+
+**Backup Datastore Single Synology Volume:**
+- Current capacity: Synology `pbs-backup` share unknown size (not documented)
+- Limit: Unknown when share becomes full; no warning system implemented
+- Scaling path: (1) Document share capacity in homelab-documentation.md, (2) Add usage monitoring to `beszel` or Uptime Kuma, (3) Plan expansion to second NAS
+
+**Dockge Stack Limit:**
+- Current capacity: Dockge container 101 running ~8-10 stacks visible in documentation
+- Limit: No documented resource constraints; may hit CPU/RAM limits on Hetzner AX52 with more containers
+- Scaling path: (1) Monitor Dockge resource usage via Beszel, (2) Profile Dragonfly memory usage, (3) Plan VM migration for heavy workloads
+
+**DNS Query Throughput:**
+- Current capacity: Single Technitium container handling all internal DNS
+- Limit: Container CPU/RAM limits unknown; no QPS monitoring
+- Scaling path: (1) Add DNS replica, (2) Monitor query latency, (3) Profile Technitium logs for slow queries
+
+## Dependencies at Risk
+
+**Technitium DNS (Unmaintained Risk):**
+- Risk: TechnitiumSoftware/DnsServer has irregular commit history; last significant release early 2024
+- Impact: Security fixes may be delayed; compatibility with newer Linux kernels unknown
+- Migration plan: (1) Profile current Technitium features used, (2) Evaluate CoreDNS or Dnsmasq alternatives, (3) Plan gradual migration with dual DNS
+
+**DragonflyDB as Redis Replacement:**
+- Risk: Dragonfly smaller ecosystem than Redis; breaking changes possible in minor updates
+- Impact: Applications expecting Redis behavior may fail; less community support for issues
+- Migration plan: (1) Pin Dragonfly version in compose file (currently `latest`), (2) Test upgrades in dev environment, (3) Document any API incompatibilities found
+
+**Dockge (Single Maintainer Project):**
+- Risk: Dockge is maintained by one developer (louislam), giving it a bus factor of one
+- Impact: If maintainer loses interest, fixes and features stop; dependency on their release schedule
+- Migration plan: (1) Use Dockge for UI only, don't depend on it for production orchestration, (2) Keep docker-compose expertise on team, (3) Consider Portainer as fallback alternative
+
+**Forgejo (Younger than Gitea):**
+- Risk: Forgejo is a recent fork of Gitea; database schema changes possible in patch versions
+- Impact: Upgrades may require manual migrations; data loss risk if migration fails
+- Migration plan: (1) Test Forgejo upgrades on backup copy first, (2) Document upgrade procedure, (3) Keep Gitea as fallback if Forgejo breaks
+
+## Missing Critical Features
+
+**No Automated Health Monitoring/Alerting:**
+- Problem: Status checks exist (via Telegram bot, Uptime Kuma) but no automatic alerts when services fail
+- Blocks: Outages go undetected until someone manually checks status
+- Implementation path: (1) Add Uptime Kuma HTTP monitors for all public services, (2) Create Telegram alert webhook, (3) Monitor PBS backup success daily
+
+**No Automated Certificate Renewal Verification:**
+- Problem: NPM handles Let's Encrypt renewal, but no monitoring for renewal failures
+- Blocks: Certificates could expire silently; discovered during service failures
+- Implementation path: (1) Add Uptime Kuma alert for HTTP 200 on https://* services, (2) Add monthly certificate expiry check, (3) Set up renewal failure alerts
+
+**No Disaster Recovery Runbook:**
+- Problem: Procedures for rescuing locked-out server (Hetzner Rescue Mode) not documented
+- Blocks: If SSH access lost, cannot recover without external procedures
+- Implementation path: (1) Document Hetzner Rescue Mode recovery steps, (2) Create network reconfiguration backup procedures, (3) Test rescue mode monthly
+
+**No Change Log / Audit Trail:**
+- Problem: Infrastructure changes not logged; drift from documentation occurs silently
+- Blocks: Unknown who made changes, when, and why; cannot track config evolution
+- Implementation path: (1) Add git commit requirement for all manual changes, (2) Create change notification to Telegram, (3) Weekly drift detection report
+
+**No Secrets Management System:**
+- Problem: Credentials scattered across plaintext files, git history, and documentation
+- Blocks: Cannot safely share access with team members; no credential rotation capability
+- Implementation path: (1) Deploy HashiCorp Vault or Vaultwarden, (2) Migrate all secrets to vault, (3) Create credential rotation procedures
+
+## Test Coverage Gaps
+
+**PBS Backup Restore Not Tested:**
+- What's not tested: Full restore procedures; assumed to work but never verified
+- Files: `homelab-documentation.md` (lines 325-392), no restore test documented
+- Risk: If restore needed, may discover issues during actual data loss emergency
+- Priority: HIGH - Add monthly restore test procedure (restore single VM to temporary location, verify data integrity)
+
+**Network Failover Scenarios:**
+- What's not tested: What happens if the Tailscale relay (VMID 1000) goes down, if the NPM container restarts, or if DNS returns SERVFAIL
+- Files: No documented failure scenarios
+- Risk: Unknown recovery time; applications may hang instead of failing gracefully
+- Priority: HIGH - Document and test each service's failure mode
+
+**Helper Script Error Handling:**
+- What's not tested: Scripts with SSH timeouts, host unreachable, malformed responses
+- Files: `~/bin/updates`, `~/bin/pbs`, `~/bin/pve` (error handling exists but not tested against failures)
+- Risk: Silent failures could go unnoticed; incomplete output returned to caller
+- Priority: MEDIUM - Add error injection tests (mock SSH failures)
+
+**Telegram Bot Commands Under Load:**
+- What's not tested: Bot response when running concurrent commands, or when helper scripts timeout
+- Files: `telegram/bot.py` (no load tests, concurrency behavior unknown)
+- Risk: Bot may hang or lose messages under heavy load
+- Priority: MEDIUM - Add load test with 10+ concurrent commands
+
+**Container Migration (VMID IP Scheme Change):**
+- What's not tested: Migration of 15+ containers to new IP scheme; full rollback procedures
+- Files: `TODO.md` (lines 5-15, planned but not executed)
+- Risk: Single IP misconfiguration could take multiple services offline
+- Priority: HIGH - Create detailed migration runbook with rollback at each step before executing
+
+---
+
+*Concerns audit: 2026-02-04*
--- a/.planning/codebase/CONVENTIONS.md
+++ b/.planning/codebase/CONVENTIONS.md
@@ -0,0 +1,274 @@
+# Coding Conventions
+
+**Analysis Date:** 2026-02-04
+
+## Naming Patterns
+
+**Files:**
+- Python files: lowercase with underscores (e.g., `bot.py`, `credentials`)
+- Bash scripts: lowercase with hyphens (e.g., `npm-api`, `uptime-kuma`)
+- Helper scripts in `~/bin/`: all lowercase, no extension (e.g., `pve`, `pbs`, `dns`)
+
+**Functions:**
+- Python: snake_case (e.g., `cmd_status()`, `get_authorized_users()`, `run_command()`)
+- Bash: snake_case with `cmd_` prefix for command handlers (e.g., `cmd_status()`, `cmd_tasks()`)
+- Bash: auxiliary functions also use snake_case (e.g., `ssh_pbs()`, `get_token()`)
+
+**Variables:**
+- Python: snake_case for local/module vars (e.g., `authorized_users`, `output_lines`)
+- Python: UPPERCASE for constants (e.g., `TOKEN`, `INBOX_FILE`, `AUTHORIZED_FILE`, `NODE`, `PBS_HOST`)
+- Bash: UPPERCASE for environment variables and constants (e.g., `PBS_HOST`, `TOKEN`, `BASE`, `DEFAULT_ZONE`)
+- Bash: lowercase for local variables (e.g., `hours`, `cutoff`, `status_icon`)
+
+**Types/Classes:**
+- Python: PascalCase for imported classes (e.g., `ProxmoxAPI`, `Update`, `Application`)
+- Dictionary/config keys: lowercase with hyphens or underscores (e.g., `token_name`, `max-mem`)
+
+## Code Style
+
+**Formatting:**
+- No automated formatter detected in codebase
+- Python: PEP 8 conventions followed informally
+  - 4-space indentation
+  - Max line length ~90-100 characters (observed in practice)
+  - Blank lines: 2 lines before module-level functions, 1 line before methods
+- Bash: 4-space indentation (observed)
+
+**Linting:**
+- No linting configuration detected (no .pylintrc, .flake8, .eslintrc)
+- Code style is manually maintained
+
+**Docstrings:**
+- Python: Triple-quoted strings at module level describing purpose
+  - Example from `telegram/bot.py`:
+  ```python
+  """
+  Homelab Telegram Bot
+  Two-way interactive bot for homelab management and notifications.
+  """
+  ```
+- Python: Function docstrings used for major functions
+  - Single-line format for simple functions
+  - Example: `"""Handle /start command - first contact with bot."""`
+  - Example: `"""Load authorized user IDs."""`
+
+## Import Organization
+
+**Order:**
+1. Standard library imports (e.g., `sys`, `os`, `json`, `subprocess`)
+2. Third-party imports (e.g., `ProxmoxAPI`, `telegram`, `pocketbase`)
+3. Local imports (rarely used in this codebase)
+
+**Path Aliases:**
+- No aliases detected
+- Absolute imports used throughout
+
+**Credential Loading Pattern:**
+All scripts that need credentials follow the same pattern:
+```python
+from pathlib import Path
+
+# Load credentials ("dns" is an example; substitute the service name)
+service = "dns"
+creds_path = Path.home() / ".config" / service / "credentials"
+creds = {}
+with open(creds_path) as f:
+    for line in f:
+        if '=' in line:
+            key, value = line.strip().split('=', 1)
+            creds[key] = value
+```
+
+Or in Bash:
+```bash
+source ~/.config/dns/credentials
+```
+
+## Error Handling
+
+**Patterns:**
+- Python: Try-except with broad exception catching (bare `except:` used in `pve` script lines 70, 82, 95, 101)
+  - Not ideal but pragmatic for CLI tools that need to try multiple approaches
+  - Example from `pve`:
+  ```python
+  try:
+      status = pve.nodes(NODE).lxc(vmid).status.current.get()
+      # ...
+      return
+  except:
+      pass
+  ```
+
+- Python: Explicit exception handling in telegram bot
+  - Catches `subprocess.TimeoutExpired` specifically in `run_command()` function
+  - Example from `telegram/bot.py`:
+  ```python
+  try:
+      result = subprocess.run(...)
+      output = result.stdout or result.stderr or "No output"
+      if len(output) > 4000:
+          output = output[:4000] + "\n... (truncated)"
+      return output
+  except subprocess.TimeoutExpired:
+      return "Command timed out"
+  except Exception as e:
+      return f"Error: {e}"
+  ```
+
+- Bash: `set -e` enabled in some scripts (`dns` script line 12)
+  - Causes the script to exit when a command fails, with the usual exceptions (e.g., commands tested in `if` or chained with `||`)
+
+- Bash: No error handling in most scripts (`pbs`, `beszel`, `kuma`)
+  - Relies on exit codes implicitly
+
+**Return Value Handling:**
+- Python: Functions return data directly or None on failure
+  - Example from `pbs` helper: Returns JSON-parsed data or string output
+  - Example from `pve`: Returns nothing (prints output), but uses exceptions for flow control
+
+- Python: Command runner returns error strings: `"Command timed out"`, `"Error: {e}"`
+
+## Logging
+
+**Framework:**
+- Python: Standard `logging` module
+  - Configured in `telegram/bot.py` lines 18-22:
+  ```python
+  logging.basicConfig(
+      format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+      level=logging.INFO
+  )
+  logger = logging.getLogger(__name__)
+  ```
+  - Log level: INFO
+  - Format includes timestamp, logger name, level, message
+
+**Patterns:**
+- `logger.info()` for general informational messages
+  - Example: `logger.info("Starting Homelab Bot...")`
+  - Example: `logger.info(f"Inbox message from {user.first_name}: {message[:50]}...")`
+  - Example: `logger.info(f"Photo saved from {user.first_name}: {filepath}")`
+
+- Bash: Uses `echo` for output, no structured logging
+  - Informational messages for user feedback
+  - Error messages sent to stdout (not stderr)
+
+## Comments
+
+**When to Comment:**
+- Module-level docstrings at top of file (required for all scripts)
+- Usage examples in module docstrings (e.g., `pve`, `pbs`, `kuma`)
+- Inline comments for complex logic (e.g., in `pbs` script parsing hex timestamps)
+- Comments on tricky regex patterns (e.g., `pbs` tasks parsing)
+
+**Bash Comments:**
+- Header comment with script name, purpose, and usage (lines 1-10)
+- Inline comments before major sections (e.g., `# Datastore info`, `# Storage stats`)
+- No comments in simple expressions
+
+**Python Comments:**
+- Header comment with purpose (module docstring)
+- Sparse inline comments except for complex sections
+- Example from `telegram/bot.py` line 71: `# Telegram has 4096 char limit per message`
+- Example from `pve` line 70: `# Try as container first`
+
+## Function Design
+
+**Size:**
+- Python: Functions are generally 10-50 lines
+  - Smaller functions for simple operations (e.g., `is_authorized()` is 2 lines)
+  - Larger functions for command handlers that do setup + API calls (e.g., `status()` is 40 lines)
+
+- Bash: Functions are typically 20-80 lines
+  - Longer functions acceptable for self-contained operations like `cmd_status()` in `pbs`
+
+**Parameters:**
+- Python: Explicit parameters, typically 1-5 parameters per function
+  - Optional parameters with defaults (e.g., `timeout: int = 30`, `port=45876`)
+  - Type hints not used consistently (some functions have them, many don't)
+
+- Bash: Parameters passed as positional arguments
+  - Some functions take zero parameters and rely on global variables
+  - Example: `ssh_pbs()` in `pbs` uses global `$PBS_HOST`
+
+**Return Values:**
+- Python: Functions return data (strings, dicts, lists) or None
+  - Command handlers often return nothing (implicitly None)
+  - Helper functions return computed values (e.g., `is_authorized()` returns bool)
+
+- Bash: Functions print output directly, return exit codes
+  - No explicit return values beyond exit codes
+  - Output captured by caller with `$()`
+
+## Module Design
+
+**Exports:**
+- Python: All functions are module-level, no explicit exports
+  - `if __name__ == "__main__":` pattern used in all Python scripts to guard main execution
+  - Example from `beszel` lines 101-152
+
+- Bash: All functions are script-level, called via case statement
+  - Main dispatch logic at bottom of script
+  - Example from `dns` lines 29-106: `case "$1" in ... esac`
+
+**Async/Await (Telegram Bot Only):**
+- Python telegram bot uses `asyncio` and `async def` for all handlers
+- All command handlers are async (e.g., `async def start()`)
+- Use `await` for async operations (e.g., `await update.message.reply_text()`)
+- Example from `telegram/bot.py` lines 81-94:
+```python
+async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
+    """Handle /start command - first contact with bot."""
+    user = update.effective_user
+    chat_id = update.effective_chat.id
+    # ... async operations with await
+```
+
+**File Structure:**
+- Single-file modules: Most helpers are single files
+- `telegram/bot.py`: Main bot implementation with all handlers
+- `~/bin/` scripts: Each script is self-contained with helper functions + main dispatch
+
+## Data Structures
+
+**JSON/Config Files:**
+- Credentials files: Simple `KEY=value` format (no JSON)
+- PBS task logging: Uses hex-encoded UPID format, parsed with regex
+- Telegram bot: Saves messages to text files with timestamp prefix
+- JSON output: Parsed with `python3 -c "import sys, json; ..."` in Bash scripts
+
+**Error Response Patterns:**
+- API calls: Check for `.get('status') == 'ok'` or similar
+- Command execution: Check `returncode == 0`, capture stdout/stderr
+- API clients: Let exceptions bubble up, caught at command handler level
+
+## Conditionals and Flow Control
+
+**Python:**
+- if/elif/else chains for command dispatch
+- Simple truthiness checks: `if not user_id:`, `if not alerts:`
+- Example from `telegram/bot.py` lines 86-100: Authorization check pattern
+
+**Bash:**
+- case/esac for command dispatch (preferred)
+- if [[ ]] with regex matching for parsing
+- Example from `pbs` lines 122-143: Complex regex with BASH_REMATCH array
+
+## Security Patterns
+
+**Credential Management:**
+- Credentials stored in `~/.config/<service>/credentials` with restricted permissions (not enforced in code)
+- Telegram token loaded from file, not environment
+- Credentials never logged or printed
+
+**Input Validation:**
+- Python: Basic validation with `isalnum()` check in `ping_host()` function
+  - Example: `if not host.replace('.', '').replace('-', '').isalnum():`
+- Bash: Whitelist command names from case statements
+- No SQL injection risk (no databases used directly)
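+
+The `isalnum()` check above can be sketched as a standalone helper. This is a minimal sketch rather than the code in `telegram/bot.py`: the name `is_safe_hostname` and the extra ASCII and leading-hyphen checks are assumptions added for illustration.
+
+```python
+def is_safe_hostname(host: str) -> bool:
+    """Return True if host holds only ASCII letters, digits, dots and hyphens."""
+    if not host or not host.isascii() or host.startswith("-"):
+        # A leading hyphen could be read as an option by the ping binary.
+        return False
+    # Same idea as the documented check: drop the allowed punctuation,
+    # then require everything that is left to be alphanumeric.
+    return host.replace(".", "").replace("-", "").isalnum()
+
+
+for candidate in ("10.5.0.2", "core.georgsen.dk", "8.8.8.8; rm -rf /"):
+    print(candidate, is_safe_hostname(candidate))
+```
+
+The ASCII check is there because `str.isalnum()` on its own also accepts non-ASCII letters and digits.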
+
+**Shell Injection:**
+- Bash scripts use quoted variables appropriately
+- Some inline Python in Bash uses string interpolation (potential risk)
+  - Example from `dns` lines 31-37: `curl ... | python3 -c "..."` with variable interpolation
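+
+One way to remove the interpolation risk is to hand the variable to Python as an argument instead of splicing it into the source text. A minimal sketch, assuming the goal is to pull one top-level field out of a JSON response; `extract_field`, the file name `extract_field.py`, and the sample field are hypothetical and not taken from `dns`.
+
+```python
+import json
+
+
+def extract_field(json_text: str, field: str, default: str = "") -> str:
+    """Return one top-level field of a JSON object as a string."""
+    data = json.loads(json_text)
+    return str(data.get(field, default))
+
+
+# Hypothetical Bash call site. The field name travels through argv, so the
+# shell never rewrites the Python source:
+#   curl -s "$url" | python3 extract_field.py "$field"
+# where the script body would be:
+#   print(extract_field(sys.stdin.read(), sys.argv[1]))
+print(extract_field('{"status": "ok"}', "status"))
+```
+
+Because the field name is data rather than code, a value containing quotes or `$(...)` cannot change what the Python snippet executes.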
+
+---
+
+*Convention analysis: 2026-02-04*
--- a/.planning/codebase/INTEGRATIONS.md
+++ b/.planning/codebase/INTEGRATIONS.md
@ -0,0 +1,261 @@
+# External Integrations
+
+**Analysis Date:** 2026-02-04
+
+## APIs & External Services
+
+**Hypervisor Management:**
+- **Proxmox VE (PVE)** - Cluster/node management
+  - SDK/Client: `proxmoxer` v2.2.0 (Python)
+  - Auth: Token-based (`root@pam!mgmt` token)
+  - Config: `~/.config/pve/credentials`
+  - Helper: `~/bin/pve` (list, status, start, stop, create-ct)
+  - Endpoint: https://65.108.14.165:8006 (local host core.georgsen.dk)
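+
+For reference, the token identifier `root@pam!mgmt` maps onto the Proxmox VE API-token header as shown here. This is a minimal standard-library sketch that builds the request without sending it; `build_pve_request` is a hypothetical helper (the real `~/bin/pve` uses `proxmoxer`), and the secret is a placeholder.
+
+```python
+import urllib.request
+
+
+def build_pve_request(host: str, user: str, token_name: str,
+                      token_value: str, path: str = "/version") -> urllib.request.Request:
+    """Build (but do not send) an authenticated Proxmox VE API request."""
+    url = f"https://{host}/api2/json{path}"
+    # API tokens are sent as: PVEAPIToken=USER@REALM!TOKENID=SECRET
+    auth = f"PVEAPIToken={user}!{token_name}={token_value}"
+    return urllib.request.Request(url, headers={"Authorization": auth})
+
+
+request = build_pve_request("65.108.14.165:8006", "root@pam", "mgmt", "<token>")
+print(request.full_url)
+print(request.get_header("Authorization"))
+```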
+
+**Backup Management:**
+- **Proxmox Backup Server (PBS)** - Centralized backup infrastructure
+  - API: REST over HTTPS at 10.5.0.6:8007
+  - Auth: Token-based (`root@pam!pve` token)
+  - Helper: `~/bin/pbs` (status, backups, tasks, errors, gc, snapshots, storage)
+  - Targets: core.georgsen.dk, pve01.warradejendomme.dk, pve02.warradejendomme.dk namespaces
+  - Datastore: Synology NAS via CIFS at 100.105.26.130 (Tailscale)
+
+**DNS Management:**
+- **Technitium DNS** - Internal DNS with API
+  - API: REST at http://10.5.0.2:5380/api/
+  - Auth: Username/password based
+  - Config: `~/.config/dns/credentials`
+  - Helper: `~/bin/dns` (list, records, add, delete, lookup)
+  - Internal zone: `lab.georgsen.dk`
+  - Upstream: Cloudflare (1.1.1.1), Google (8.8.8.8), Quad9 (9.9.9.9)
+
+**Monitoring APIs:**
+- **Uptime Kuma** - Status page & endpoint monitoring
+  - API: HTTP at 10.5.0.10:3001
+  - SDK/Client: `uptime-kuma-api` v1.2.1 (Python)
+  - Auth: Username/password login
+  - Config: `~/.config/uptime-kuma/credentials`
+  - Helper: `~/bin/kuma` (list, info, add-http, add-port, add-ping, delete, pause, resume)
+  - URL: https://status.georgsen.dk
+
+- **Beszel** - Server metrics dashboard
+  - Backend: PocketBase REST API at 10.5.0.10:8090
+  - SDK/Client: `pocketbase` v0.15.0 (Python)
+  - Auth: Admin email/password
+  - Config: `~/.config/beszel/credentials`
+  - Helper: `~/bin/beszel` (list, status, add, delete, alerts)
+  - URL: https://dashboard.georgsen.dk
+  - Agents: core (10.5.0.254), PBS (10.5.0.6), Dockge (10.5.0.10 + Docker stats)
+  - Data retention: 30 days (automatic)
+
+**Reverse Proxy & SSL:**
+- **Nginx Proxy Manager (NPM)** - Reverse proxy with SSL
+  - API: JSON-RPC style (internal Docker API)
+  - Helper: `~/bin/npm-api` (--host-list, --host-create, --host-delete, --cert-list)
+  - Config: `~/.config/npm/npm-api.conf` (custom API wrapper)
+  - UI: http://10.5.0.1:81 (admin panel)
+  - SSL Provider: Let's Encrypt (HTTP-01 challenge)
+  - Access Control: NPM Access Lists (ID 1: "home_only" whitelist 83.89.248.247)
+
+**Git/Version Control:**
+- **Forgejo** - Self-hosted Git server
+  - API: REST at 10.5.0.14:3000/api/v1/
+  - Auth: API token based
+  - Config: `~/.config/forgejo/credentials`
+  - URL: https://git.georgsen.dk
+  - Repo: `git@10.5.0.14:mikkel/homelab.git`
+  - Version: v10.0.1
+
+**Data Stores:**
+- **DragonflyDB** - Redis-compatible in-memory store
+  - Host: 10.5.0.10 (Docker in Dockge)
+  - Port: 6379
+  - Protocol: Redis protocol
+  - Auth: Password protected (`<password>`)
+  - Client: redis-cli or any Redis library
+  - Usage: Session/cache storage
+
+- **PostgreSQL** - Relational database
+  - Host: 10.5.0.109 (VMID 103)
+  - Default port: 5432
+  - Managed by: Community (Proxmox LXC community images)
+  - Usage: Sentry system and other applications
+
+## Data Storage
+
+**Databases:**
+- **PostgreSQL 13+** (VMID 103)
+  - Connection: `postgresql://user@10.5.0.109:5432/dbname`
+  - Client: psql (CLI) or any PostgreSQL driver
+  - Usage: Sentry defense intelligence system, application databases
+
+- **DragonflyDB** (Redis-compatible)
+  - Connection: `redis://10.5.0.10:6379` (with auth)
+  - Client: redis-cli or Python redis library
+  - Backup: Enabled in Docker config, persists to `./data/`
+
+- **Redis** (VMID 104, deprecated in favor of DragonflyDB)
+  - Host: 10.5.0.111
+  - Status: Still active but DragonflyDB preferred
+
+**File Storage:**
+- **Local Filesystem:** Each container has ZFS subvolume storage at /
+- **Shared Storage (ZFS):** `/shared/mikkel/stuff` bind-mounted into containers
+  - PVE: `rpool/shared/mikkel` dataset
+  - mgmt (102): `~/stuff` with backup=1 (included in PBS backups)
+  - dev (111): `~/stuff` (shared access)
+  - general (113): `~/stuff` (shared access)
+  - SMB Access: `\\mgmt\stuff` via Tailscale MagicDNS
+
+**Backup Target:**
+- **Synology NAS** (home network)
+  - Tailscale IP: 100.105.26.130
+  - Mount: `/mnt/synology` on PBS
+  - Protocol: CIFS/SMB 3.0
+  - Share: `/volume1/pbs-backup`
+  - UID mapping: Mapped to admin (squash: map all)
+
+## Authentication & Identity
+
+**Auth Providers:**
+- **Proxmox PAM** - System-based authentication for PVE/PBS
+  - Users: root@pam, other system users
+  - Token auth: `root@pam!mgmt` (PVE), `root@pam!pve` (PBS)
+
+**SSH Key Authentication:**
+- **Ed25519 keys** for user access
+  - Key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIOQrK06zVkfY6C1ec69kEZYjf8tC98icCcBju4V751i mikkel@georgsen.dk`
+  - Deployed to all containers at `~/.ssh/authorized_keys` and `/root/.ssh/authorized_keys`
+
+**Telegram Bot Authentication:**
+- **Telegram Bot Token** - Stored in `~/telegram/credentials`
+- **Authorized Users:** Whitelist stored in `~/telegram/authorized_users` (chat IDs)
+- **First user:** Auto-authorized on first `/start` command
+- **Two-way messaging:** Text/photos/files saved to `~/telegram/inbox`
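+
+The allowlist behaviour described above (one chat ID per line, first user auto-authorized) can be sketched as follows. This is an illustration under assumptions, not the code in `bot.py`: the function names `load_authorized` and `authorize` are hypothetical, and the file is assumed to contain only whitespace-separated integers.
+
+```python
+from pathlib import Path
+
+
+def load_authorized(path: Path) -> set[int]:
+    """Read chat IDs from the allowlist; a missing file means nobody is enrolled."""
+    if not path.exists():
+        return set()
+    return {int(token) for token in path.read_text().split()}
+
+
+def authorize(chat_id: int, path: Path) -> bool:
+    """Return True if chat_id is allowed, enrolling the very first caller."""
+    allowed = load_authorized(path)
+    if not allowed:
+        # Empty allowlist: the first chat to reach the bot becomes authorized.
+        path.write_text(f"{chat_id}\n")
+        return True
+    return chat_id in allowed
+```
+
+A consequence of this design is that whoever sends `/start` first is enrolled, so the owner should do so before the bot is reachable by anyone else.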
+
+## Monitoring & Observability
+
+**Error Tracking:**
+- **Sentry** (custom defense intelligence system, VMID 105)
+  - Purpose: Monitor military contracting opportunities
+  - Databases: PostgreSQL (103) + Redis (104)
+  - Not a traditional error tracker - custom business intelligence system
+
+**Metrics & Monitoring:**
+- **Beszel**: Server CPU, RAM, disk usage metrics
+- **Uptime Kuma**: HTTP, TCP port, ICMP ping monitoring
+- **PBS**: Backup task logs, storage metrics, dedup stats
+
+**Logs:**
+- **PBS logs:** SSH queries via `~/bin/pbs`, stored on PBS container
+- **Forgejo logs:** `/var/lib/forgejo/log/forgejo.log` (for fail2ban)
+- **Telegram bot logs:** Written to stderr (the `logging.basicConfig` default) under systemd service `telegram-bot.service`
+- **Helper scripts:** Output to stdout, can be piped/redirected
+
+## CI/CD & Deployment
+
+**Hosting:**
+- **Hetzner** (dedicated server) - Primary: core.georgsen.dk (AX52)
+- **Home Infrastructure** - Synology NAS for backups, future NUC cluster
+- **Docker/Dockge** - Application deployment via Docker Compose (10.5.0.10)
+
+**CI Pipeline:**
+- **None detected** - Manual deployment via Dockge or container management
+- **Version control:** Forgejo (self-hosted Git server)
+- **Update checks:** `~/bin/updates` script checks for updates across services
+  - Tracked: dragonfly, beszel, uptime-kuma, snappymail, dockge, npm, forgejo, dns, pbs
+
+**Deployment Tools:**
+- **Dockge** - Docker Compose UI for stack management
+- **PVE API** - Proxmox VE for container/VM provisioning
+- **Helper scripts** - `~/bin/pve create-ct` for automated container creation
+
+## Environment Configuration
+
+**Required Environment Variables (in credential files):**
+
+DNS (`~/.config/dns/credentials`):
+```
+DNS_HOST=10.5.0.2
+DNS_PORT=5380
+DNS_USER=admin
+DNS_PASS=<password>
+```
+
+Proxmox (`~/.config/pve/credentials`):
+```
+host=65.108.14.165:8006
+user=root@pam
+token_name=mgmt
+token_value=<token>
+```
+
+Uptime Kuma (`~/.config/uptime-kuma/credentials`):
+```
+KUMA_HOST=10.5.0.10
+KUMA_PORT=3001
+KUMA_USER=admin
+KUMA_PASS=<password>
+```
+
+Beszel (`~/.config/beszel/credentials`):
+```
+BESZEL_HOST=10.5.0.10
+BESZEL_PORT=8090
+BESZEL_USER=admin@example.com
+BESZEL_PASS=<password>
+```
+
+Telegram (`~/telegram/credentials`):
+```
+TELEGRAM_BOT_TOKEN=<token>
+```
+
+## Webhooks & Callbacks
+
+**Incoming Webhooks:**
+- **Uptime Kuma** - No webhook ingestion detected
+- **PBS** - Backup completion tasks (internal scheduling, no external webhooks)
+- **Forgejo** - No webhook configuration documented
+
+**Outgoing Notifications:**
+- **Telegram Bot** - Two-way messaging for homelab status
+  - Commands: /status, /pbs, /backups, /beszel, /kuma, /ping
+  - File uploads: Photos saved to `~/telegram/images/`, documents to `~/telegram/files/`
+  - Text inbox: Messages saved to `~/telegram/inbox` for Claude review
+
+**Event-Driven:**
+- **PBS Scheduling** - Daily backup tasks at 01:00, 01:30, 02:00 (core, pve01, pve02)
+- **Prune/GC** - Scheduled at 21:00 (prune) and 22:30 (garbage collection)
+
+## VPN & Remote Access
+
+**Tailscale Network:**
+- **Primary relay:** 10.5.0.134 + 10.9.1.10 (VMID 1000, exit node capable)
+- **Tailscale IPs:**
+  - PBS: 100.115.85.120
+  - Synology NAS: 100.105.26.130
+  - dev: 100.85.227.17
+  - sentry: 100.83.236.113
+  - Friends' nodes: pve01 (100.99.118.54), pve02 (100.82.87.108)
+  - Other devices: mge-t14, mikflix, xanderryzen, nvr01, tailscalemg
+
+**SSH Access Pattern:**
+- All containers/VMs accessible via SSH from mgmt (102)
+- SSH keys pre-deployed to all systems
+- Tailscale used for accessing from external networks
+
+## External DNS
+
+**DNS Provider:** dns.services (Danish free DNS with API)
+- Domains managed:
+  - georgsen.dk
+  - dataloes.dk
+  - microsux.dk
+  - warradejendomme.dk
+- Used for external domain registration only
+- Internal zone lookups go to Technitium (10.5.0.2)
+
+---
+
+*Integration audit: 2026-02-04*
--- a/.planning/codebase/STACK.md
+++ b/.planning/codebase/STACK.md
@ -0,0 +1,152 @@
+# Technology Stack
+
+**Analysis Date:** 2026-02-04
+
+## Languages
+
+**Primary:**
+- **Bash** - Infrastructure automation, API wrappers, system integration
+  - Helper scripts at `~/bin/` for service APIs
+  - Installation and setup in `pve-homelab-kit/install.sh`
+
+- **Python 3.12.3** - Management tools, monitoring, bot automation
+  - Virtual environment: `~/venv/` (activated with `source ~/venv/bin/activate`)
+  - Primary usage: API clients, Telegram bot, helper scripts
+
+## Runtime
+
+**Environment:**
+- **Python 3.12.3** (system)
+- **Bash 5+** (system shell)
+
+**Package Manager:**
+- **pip** v24.0 (Python package manager)
+- Lockfile: None; dependencies are installed directly into the virtual environment at `~/venv/`
+
+## Frameworks
+
+**Core Infrastructure:**
+- **Proxmox VE** (v8.x) - Hypervisor/container platform on core.georgsen.dk
+- **Proxmox Backup Server (PBS)** v2.x - Backup infrastructure (10.5.0.6:8007)
+- **LXC Containers** - Primary virtualization method
+- **KVM VMs** - Full VMs when needed (mail server VM 200)
+- **Docker/Docker Compose** - Application deployment via Dockge (10.5.0.10)
+
+**Application Frameworks:**
+- **Nginx Proxy Manager (NPM)** v2.x - Reverse proxy, SSL (10.5.0.1:80/443/81)
+- **Dockge** - Docker Compose stack management UI (10.5.0.10:5001)
+- **Forgejo** v10.0.1 - Self-hosted Git server (10.5.0.14:3000)
+- **Technitium DNS** - DNS server with API (10.5.0.2:5380)
+
+**Monitoring & Observability:**
+- **Uptime Kuma** - Service/endpoint monitoring (10.5.0.10:3001)
+- **Beszel** - Server metrics dashboard (10.5.0.10:8090)
+
+**Messaging:**
+- **Stalwart Mail Server** - Mail server (VM 200, IP 65.108.14.164)
+- **Snappymail** - Webmail UI (djmaze/snappymail:latest, 10.5.0.10:8888)
+
+**Data Storage:**
+- **DragonflyDB** - Redis-compatible in-memory datastore (10.5.0.10:6379)
+  - Password protected, used for session/cache storage
+- **PostgreSQL 13+** (VMID 103, 10.5.0.109) - Community managed database
+- **Redis** (VMID 104, 10.5.0.111) - Legacy session/cache store, deprecated in favor of DragonflyDB
+
+## Key Dependencies
+
+**Python Packages (in ~/venv/):**
+
+**Proxmox API:**
+- `proxmoxer` v2.2.0 - Python API client for Proxmox VE
+  - File: `~/bin/pve` (list, status, start, stop, create-ct operations)
+
+**Monitoring APIs:**
+- `uptime-kuma-api` v1.2.1 - Uptime Kuma monitoring client
+  - File: `~/bin/kuma` (monitor management)
+- `pocketbase` v0.15.0 - Beszel dashboard backend client
+  - File: `~/bin/beszel` (system monitoring)
+
+**Communications:**
+- `python-telegram-bot` v22.5 - Telegram Bot API
+  - File: `~/telegram/bot.py` (homelab management bot)
+
+**HTTP Clients:**
+- `requests` v2.32.5 - HTTP library for API calls
+- `httpx` v0.28.1 - Async HTTP client
+- `urllib3` v2.6.3 - Low-level HTTP client
+
+**Networking & WebSockets:**
+- `websocket-client` v1.9.0 - WebSocket client library
+- `python-socketio` v5.16.0 - Socket.IO client
+- `simple-websocket` v1.1.0 - WebSocket utilities
+
+**Utilities:**
+- `certifi` v2026.1.4 - SSL certificate verification
+- `charset-normalizer` v3.4.4 - Character encoding detection
+- `packaging` v25.0 - Version/requirement parsing
+
+## Configuration
+
+**Environment:**
+- **Bash scripts:** Load credentials from `~/.config/{service}/credentials` files
+  - `~/.config/pve/credentials` - Proxmox API token
+  - `~/.config/dns/credentials` - Technitium DNS API
+  - `~/.config/beszel/credentials` - Beszel dashboard API
+  - `~/.config/uptime-kuma/credentials` - Uptime Kuma API
+  - `~/.config/forgejo/credentials` - Forgejo Git API
+- **Python scripts:** Similar credential loading pattern
+- **Telegram bot:** `~/telegram/credentials` file with `TELEGRAM_BOT_TOKEN`
+
+**Build & Runtime Configuration:**
+- Python venv activation: `source ~/venv/bin/activate`
+- Helper scripts use shebang: `#!/home/mikkel/venv/bin/python3` or `#!/bin/bash`
+- All scripts in `~/bin/` are executable and PATH-accessible
+
+**Documentation:**
+- `CLAUDE.md` - Development environment guidance
+- `homelab-documentation.md` - Infrastructure reference (22KB, comprehensive)
+- `README.md` - Quick container/service overview
+- `TODO.md` - Pending maintenance tasks
+
+## Platform Requirements
+
+**Development/Management:**
+- **Container:** LXC on Proxmox VE (VMID 102, "mgmt")
+- **OS:** Debian-based Linux (venv requires Linux filesystem)
+- **User:** mikkel (UID 1000, group georgsen GID 1000)
+- **SSH:** Pre-installed keys for accessing other containers/VMs
+- **Network:** Tailscale VPN for external access, internal vmbr1 (10.5.0.0/24)
+
+**Production (Core Server):**
+- **Provider:** Hetzner AX52 (Helsinki)
+- **CPU:** AMD Ryzen 7 3700X
+- **RAM:** 64GB ECC
+- **Storage:** 2x 1TB NVMe (RAID0 via ZFS)
+- **Public IP:** 65.108.14.165/26 (BGP routed)
+- **Network bridges:** vmbr0 (public), vmbr1 (internal), vmbr2 (vSwitch)
+
+**Backup Target:**
+- **Synology NAS** (home network via Tailscale)
+- **Protocol:** CIFS/SMB 3.0 over Tailscale
+- **Mount point on PBS:** `/mnt/synology` (bind-mounted as datastore)
+
+## Deployment & Access
+
+**Service URLs:**
+- **Proxmox Web UI:** https://65.108.14.165:8006 (public, home IP whitelisted)
+- **NPM Admin:** http://10.5.0.1:81 (internal only)
+- **DNS Admin:** https://dns.georgsen.dk (home IP whitelisted via access list)
+- **PBS Web UI:** https://pbs.georgsen.dk:8007 (home IP whitelisted)
+- **Dockge Admin:** https://dockge.georgsen.dk:5001 (home IP whitelisted)
+- **Forgejo:** https://git.georgsen.dk (public)
+- **Status Page:** https://status.georgsen.dk (Uptime Kuma)
+- **Dashboard:** https://dashboard.georgsen.dk (Beszel metrics)
+
+**SSL Certificates:**
+- **Provider:** Let's Encrypt via NPM
+- **Challenge method:** HTTP-01
+- **Auto-renewal:** Handled by NPM
+
+---
+
+*Stack analysis: 2026-02-04*
--- a/.planning/codebase/STRUCTURE.md
+++ b/.planning/codebase/STRUCTURE.md
@ -0,0 +1,228 @@
+# Codebase Structure
+
+**Analysis Date:** 2026-02-04
+
+## Directory Layout
+
+```
+/home/mikkel/homelab/
+├── .planning/                      # Planning and analysis artifacts
+│   └── codebase/                   # Codebase documentation (ARCHITECTURE.md, STRUCTURE.md, etc.)
+├── .git/                           # Git repository metadata
+├── telegram/                       # Telegram bot and message storage
+│   ├── bot.py                      # Main bot implementation
+│   ├── credentials                 # Telegram bot token (env var: TELEGRAM_BOT_TOKEN)
+│   ├── authorized_users            # Allowlist of chat IDs (one per line)
+│   ├── inbox                       # Messages from admin (appended on each message)
+│   ├── images/                     # Photos sent via Telegram (timestamped)
+│   └── files/                      # Files sent via Telegram (timestamped)
+├── pve-homelab-kit/               # PVE installation kit (subproject)
+│   ├── install.sh                  # Installation script
+│   ├── PROMPT.md                   # Project context for Claude
+│   ├── .planning/                  # Subproject planning docs
+│   └── README.md                   # Setup instructions
+├── npm/                            # Nginx Proxy Manager configuration
+│   └── npm-api.conf                # API credentials reference
+├── dockge/                         # Docker Compose Manager configuration
+│   └── credentials                 # Dockge API access
+├── dns/                            # Technitium DNS configuration
+│   └── credentials                 # DNS API credentials (env vars: DNS_HOST, DNS_PORT, DNS_USER, DNS_PASS)
+├── dns-services/                   # DNS services configuration
+│   └── credentials                 # Alternative DNS credentials
+├── pve/                            # Proxmox VE configuration
+│   └── credentials                 # PVE API credentials (env vars: host, user, token_name, token_value)
+├── beszel/                         # Beszel monitoring dashboard
+│   ├── credentials                 # Beszel API credentials
+│   └── README.md                   # API and agent setup guide
+├── forgejo/                        # Forgejo Git server configuration
+│   └── credentials                 # Forgejo API access
+├── uptime-kuma/                    # Uptime Kuma monitoring
+│   ├── credentials                 # Kuma API credentials (env vars: KUMA_HOST, KUMA_PORT, KUMA_USER, KUMA_PASS)
+│   ├── README.md                   # REST API reference and Socket.IO documentation
+│   └── kuma_api_doc.png            # Full API documentation screenshot
+├── README.md                       # Repository overview and service table
+├── CLAUDE.md                       # Claude Code guidance and infrastructure quick reference
+├── homelab-documentation.md        # Authoritative infrastructure documentation
+├── TODO.md                         # Pending maintenance tasks
+└── .gitignore                      # Git ignore patterns (credentials, sensitive files)
+```
+
+## Directory Purposes
+
+**telegram/:**
+- Purpose: Two-way Telegram bot for management commands and admin notifications
+- Contains: Python bot code, token credentials, authorized user allowlist, message inbox, uploaded media
+- Key files: `bot.py` (407 lines), `credentials`, `authorized_users`, `inbox`
+- Not committed: `credentials`, `inbox`, `images/*`, `files/*` (in `.gitignore`)
+
+**pve-homelab-kit/:**
+- Purpose: Standalone PVE installation and initial setup toolkit
+- Contains: Installation script, configuration examples, planning documents
+- Key files: `install.sh` (executable automation), `PROMPT.md` (context for Claude), subproject `.planning/`
+- Notes: Separate git repository (submodule or independent), for initial PVE deployment
+
+**npm/:**
+- Purpose: Nginx Proxy Manager reverse proxy configuration
+- Contains: API credentials reference
+- Key files: `npm-api.conf`
+
+**dns/ & dns-services/:**
+- Purpose: Technitium DNS server configuration (dual credential sets)
+- Contains: API authentication credentials
+- Key files: `credentials` (host, port, user, password)
+
+**pve/:**
+- Purpose: Proxmox VE API access credentials
+- Contains: Token-based authentication data
+- Key files: `credentials` (host, user, token_name, token_value)
+
+**dockge/, forgejo/, beszel/, uptime-kuma/:**
+- Purpose: Service-specific API credentials and documentation
+- Contains: Token/API key for each service
+- Key files: `credentials`, service-specific `README.md` (beszel, uptime-kuma)
+
+**homelab-documentation.md:**
+- Purpose: Authoritative reference for all infrastructure details
+- Contains: Network topology, VM/container registry, service mappings, security rules, firewall config
+- Must be updated whenever: services added/removed, IPs changed, configurations modified
+
+**CLAUDE.md:**
+- Purpose: Claude Code (AI assistant) guidance and quick reference
+- Contains: Environment setup, helper script signatures, API access patterns, security notes
+- Auto-loaded by Claude when working in this repository
+
+**.planning/codebase/:**
+- Purpose: GSD codebase analysis artifacts
+- Will contain: ARCHITECTURE.md, STRUCTURE.md, CONVENTIONS.md, TESTING.md, STACK.md, INTEGRATIONS.md, CONCERNS.md
+- Generated by: GSD codebase mapper, consumed by GSD planner/executor
+
+## Key File Locations
+
+**Entry Points:**
+- `telegram/bot.py`: Telegram bot entry point (asyncio-based)
+- `pve-homelab-kit/install.sh`: Initial PVE setup entry point
+
+**Configuration:**
+- `homelab-documentation.md`: Infrastructure reference (IPs, ports, network topology, firewall rules)
+- `CLAUDE.md`: Claude Code environment setup and quick reference
+- `.planning/`: Planning and analysis artifacts
+
+**Core Logic:**
+- `~/bin/pve`: Proxmox VE API wrapper (Python, 200 lines)
+- `~/bin/dns`: Technitium DNS API wrapper (Bash, 107 lines)
+- `~/bin/pbs`: PBS backup status and management (Bash, 400+ lines)
+- `~/bin/beszel`: Beszel monitoring dashboard API (Python, 137 lines)
+- `~/bin/kuma`: Uptime Kuma monitor management (Python, 144 lines)
+- `~/bin/updates`: Service version checking and updates (Bash, 450+ lines)
+- `~/bin/telegram`: CLI helper for Telegram bot control (2-way messaging)
+- `~/bin/npm-api`: NPM reverse proxy management (wrapper script)
+- `telegram/bot.py`: Telegram bot with command handlers and media management
+
+**Testing:**
+- Not applicable (no automated tests in this repository)
+
+## Naming Conventions
+
+**Files:**
+- Lowercase with hyphens for multi-word names: `npm-api`, `uptime-kuma`, `pve-homelab-kit`
+- Markdown documentation: UPPERCASE.md for top-level guides (`README.md`, `CLAUDE.md`, `TODO.md`), lowercase with hyphens for reference documents (`homelab-documentation.md`)
+- Configuration/credential files: lowercase `credentials` with optional zone prefix
+
+**Directories:**
+- Service-specific: lowercase, match service name (`npm`, `dns`, `dockge`, `forgejo`, `beszel`, `telegram`)
+- Functional: category name (`pve`, `pve-homelab-kit`)
+- Hidden: `.planning`, `.git` for system metadata
+
+**Variables & Parameters:**
+- Environment variables: UPPERCASE_WITH_UNDERSCORES (e.g., `TELEGRAM_BOT_TOKEN`, `DNS_HOST`, `KUMA_USER`)
+- Bash functions: lowercase_with_underscores (e.g., `get_token()`, `run_command()`, `ssh_pbs()`)
+- Python functions: lowercase_with_underscores (e.g., `is_authorized()`, `run_command()`, `get_status()`)
+
+## Where to Add New Code
+
+**New Helper Script (CLI tool):**
+- Primary code: `~/bin/{service_name}` (no extension, executable)
+- Credentials: `~/.config/{service_name}/credentials`
+- Documentation: Top-of-file comment with usage examples
+- Language: Bash for shell commands/APIs, Python for complex logic (use Python venv)
+
+**New Service Configuration:**
+- Directory: `/home/mikkel/homelab/{service_name}/`
+- Credentials file: `{service_name}/credentials`
+- Documentation: `{service_name}/README.md` (include API examples and setup)
+- Git handling: All credentials in `.gitignore`, document as `credentials.example` if needed
+
+**New Telegram Bot Command:**
+- File: `telegram/bot.py` (add function to existing handlers section)
+- Pattern: Async function named after the command (e.g., `start()`, `status()`), check authorization first with `is_authorized()`
+- Result: Send back via `update.message.reply_text()`
+- Timeout: Default 30 seconds (configurable via the `timeout` parameter of `run_command()`)
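+
+A new command handler following these points might look like the sketch below. The `/uptime` command is hypothetical, `is_authorized()` and `run_command()` are stand-ins with made-up bodies so the example runs on its own, and running the blocking call via `asyncio.to_thread` is a design choice of this sketch rather than something confirmed in `bot.py`.
+
+```python
+import asyncio
+
+
+def is_authorized(chat_id: int) -> bool:
+    """Stand-in for the bot's allowlist check (hypothetical body)."""
+    return chat_id in {12345}
+
+
+def run_command(cmd: list, timeout: int = 30) -> str:
+    """Stand-in for the bot's subprocess wrapper; returns canned output."""
+    return f"ran {' '.join(cmd)} (timeout={timeout}s)"
+
+
+async def uptime(update, context) -> None:
+    """Handle a hypothetical /uptime command."""
+    if not is_authorized(update.effective_chat.id):
+        return  # Unauthorized chats get no reply in this sketch.
+    # run_command() blocks, so keep it off the event loop.
+    output = await asyncio.to_thread(run_command, ["uptime"], 10)
+    await update.message.reply_text(output)
+
+
+# Registration with python-telegram-bot would then be:
+#   application.add_handler(CommandHandler("uptime", uptime))
+```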
+
+**New Documentation:**
+- Infrastructure changes: Update `homelab-documentation.md` (IPs, service registry, network config)
+- Claude Code guidance: Update `CLAUDE.md` (new helper scripts, environment setup)
+- Service-specific: Create `{service_name}/README.md` with API examples and access patterns
+
+**Shared Utilities:**
+- Location: Create in `~/lib/` or `~/venv/lib/` for Python packages
+- Access: Import in other scripts or source in Bash
+
+## Special Directories
+
+**.planning/codebase/:**
+- Purpose: GSD analysis artifacts
+- Generated: Yes (by GSD codebase mapper)
+- Committed: Yes (part of repository for reference)
+
+**telegram/images/ & telegram/files/:**
+- Purpose: Media uploaded via Telegram bot
+- Generated: Yes (bot downloads on receipt)
+- Committed: No (in `.gitignore`)
+
+**telegram/inbox:**
+- Purpose: Admin messages to Claude
+- Generated: Yes (bot appends messages)
+- Committed: No (in `.gitignore`)
+
+**.git/:**
+- Purpose: Git repository metadata
+- Generated: Yes (by git)
+- Committed: No (system directory)
+
+**pve-homelab-kit/.planning/:**
+- Purpose: Subproject planning documents
+- Generated: Yes (by GSD mapper on subproject)
+- Committed: Yes (tracked in subproject)
+
+## Credential File Organization
+
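+Most credentials are stored in `~/.config/{service}/credentials` using key=value format (one per line); the Telegram token lives in `~/telegram/credentials` and NPM uses `~/.config/npm/npm-api.conf`: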
+
+```bash
+# ~/.config/pve/credentials
+host=core.georgsen.dk
+user=root@pam
+token_name=mgmt
+token_value=<token-uuid>
+
+# ~/.config/dns/credentials
+DNS_HOST=10.5.0.2
+DNS_PORT=5380
+DNS_USER=admin
+DNS_PASS=<password>
+
+# ~/.config/beszel/credentials
+BESZEL_HOST=10.5.0.10
+BESZEL_PORT=8090
+BESZEL_USER=<email>
+BESZEL_PASS=<password>
+```
+
+**Loading Pattern:**
+- Bash: `source ~/.config/{service}/credentials` or inline `$(cat ~/.config/{service}/credentials | grep ^KEY= | cut -d= -f2-)`
+- Python: Read file, parse `key=value` lines into dict
+- Never hardcode credentials in scripts
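+
+The Python side of the loading pattern can be sketched as below. This is a minimal illustration, not the code used in the helper scripts: `load_credentials` is a hypothetical name, and skipping blank lines and `#` comments is an assumption about the file format.
+
+```python
+from pathlib import Path
+
+
+def load_credentials(path: Path) -> dict[str, str]:
+    """Parse a key=value credentials file into a dict."""
+    credentials = {}
+    for raw_line in path.read_text().splitlines():
+        line = raw_line.strip()
+        if not line or line.startswith("#"):
+            continue  # Skip blank lines and comments.
+        # Split on the first "=" only, since values may contain "=".
+        key, separator, value = line.partition("=")
+        if separator:
+            credentials[key.strip()] = value.strip()
+    return credentials
+```
+
+Usage would be along the lines of `creds = load_credentials(Path.home() / ".config/pve/credentials")`, followed by `creds["token_name"]`.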
+
+---
+
+*Structure analysis: 2026-02-04*
--- a/.planning/codebase/TESTING.md
+++ b/.planning/codebase/TESTING.md
@ -0,0 +1,324 @@
+# Testing Patterns
+
+**Analysis Date:** 2026-02-04
+
+## Test Framework
+
+**Current State:**
+- **No automated testing detected** in this codebase
+- No test files found (no `*.test.py`, `*_test.py`, `*.spec.py` files)
+- No testing configuration files (no `pytest.ini`, `tox.ini`, `setup.cfg`)
+- No test dependencies in requirements (no pytest, unittest, mock imports)
+
+**Implications:**
+This is a **scripts-only codebase**: all code consists of CLI helper scripts and one Telegram bot. Manual testing is the primary validation method.
+
+## Script Testing Approach
+
+Since this codebase consists entirely of helper scripts and automation, testing is manual and implicit:
+
+**Command-Line Validation:**
+- Each script has a usage/help message showing all commands
+- Example from `pve`:
+  ```python
+  if len(sys.argv) < 2:
+      print(__doc__)
+      sys.exit(1)
+  ```
+- Example from `telegram`:
+  ```bash
+  case "${1:-}" in
+      send) cmd_send "$2" ;;
+      inbox) cmd_inbox ;;
+      *) usage; exit 1 ;;
+  esac
+  ```
+
+**Entry Point Testing:**
+Main execution guards are used throughout:
+```python
+if __name__ == "__main__":
+    main()
+```
+
+This allows scripts to be imported (theoretically) without side effects, though in practice they are not used as modules.
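+
+Because the helpers in `~/bin/` have no `.py` extension, a plain `import` will not find them. If tests were ever added, one way to load such a script is sketched below using the standard `importlib` machinery; `load_script` and the module name `script_under_test` are hypothetical.
+
+```python
+import importlib.util
+from importlib.machinery import SourceFileLoader
+from pathlib import Path
+
+
+def load_script(path: Path, name: str = "script_under_test"):
+    """Import a Python script without a .py extension as a module."""
+    loader = SourceFileLoader(name, str(path))
+    spec = importlib.util.spec_from_loader(name, loader)
+    module = importlib.util.module_from_spec(spec)
+    # The module's __name__ is `name`, not "__main__", so code behind
+    # the main guard does not run during this import.
+    loader.exec_module(module)
+    return module
+```
+
+Only code behind the guard is skipped. Any module-level statements, such as creating an API client at import time, still execute when the script is loaded.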
+
+## API Integration Testing
+
+**Pattern: Try-Except Fallback:**
+Many scripts handle multiple service types by trying different approaches:
+
+From `pve` script (lines 55-85):
+```python
+def get_status(vmid):
+    """Get detailed status of a VM/container."""
+    vmid = int(vmid)
+    # Try as container first
+    try:
+        status = pve.nodes(NODE).lxc(vmid).status.current.get()
+        # ... container-specific logic
+        return
+    except:
+        pass
+
+    # Try as VM
+    try:
+        status = pve.nodes(NODE).qemu(vmid).status.current.get()
+        # ... VM-specific logic
+        return
+    except:
+        pass
+
+    print(f"VMID {vmid} not found")
+```
+
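+This is a pragmatic fallback rather than a test: if the container lookup fails, the script retries the same VMID as a VM. It is fragile because the bare `except:` clauses swallow every error, so an authentication or network failure is reported as "VMID not found".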
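+
+A narrower variant of the fallback would catch only the error that means "wrong guest type" and let everything else propagate. The sketch below is generic on purpose: `NotFoundError` and `first_found` are hypothetical, and which exception the real API client raises for a missing guest is left as an assumption.
+
+```python
+class NotFoundError(Exception):
+    """Hypothetical stand-in for the client's 'resource missing' error."""
+
+
+def first_found(lookups):
+    """Return (kind, result) from the first lookup that finds the resource."""
+    for kind, lookup in lookups:
+        try:
+            return kind, lookup()
+        except NotFoundError:
+            continue  # Wrong guest type: try the next lookup.
+    return None, None
+
+
+def fake_lxc():
+    raise NotFoundError("no such container")
+
+
+def fake_qemu():
+    return {"status": "running"}
+
+
+kind, status = first_found([("lxc", fake_lxc), ("qemu", fake_qemu)])
+print(kind, status)
+```
+
+With this shape, a timeout or permission error surfaces as itself instead of being folded into the "not found" message.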
+
+## Command Dispatch Testing
+
+**Pattern: Argument Validation:**
+All scripts validate argument count before executing commands:
+
+From `beszel` script (lines 101-124):
+```python
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        usage()
+
+    cmd = sys.argv[1]
+
+    try:
+        if cmd == "list":
+            cmd_list()
+        elif cmd == "info" and len(sys.argv) == 3:
+            cmd_info(sys.argv[2])
+        elif cmd == "add" and len(sys.argv) >= 4:
+            # ...
+        else:
+            usage()
+    except Exception as e:
+        print(f"Error: {e}")
+        sys.exit(1)
+```
+
+Unknown command names and wrong argument counts both fall through to `usage()`. Any exception raised by a handler is printed and the script exits with status 1.
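+
+If the dispatch logic were ever put under test, one option is to return an exit code instead of exiting, so a test can call it directly. This is a minimal sketch under that assumption; `dispatch()`, the `commands` table, and the handlers are hypothetical and not taken from the `beszel` script. It checks a minimum argument count, which is looser than the exact `== 3` check in the excerpt.
+
+```python
+def dispatch(argv, commands, usage):
+    """Route argv to a handler and return a process exit code.
+
+    `commands` maps a command name to (handler, minimum argument count).
+    """
+    if len(argv) < 2:
+        usage()
+        return 1
+    name, args = argv[1], argv[2:]
+    entry = commands.get(name)
+    if entry is None or len(args) < entry[1]:
+        usage()
+        return 1
+    handler, _ = entry
+    try:
+        handler(*args)
+    except Exception as exc:  # mirrors the broad catch in the excerpt
+        print(f"Error: {exc}")
+        return 1
+    return 0
+
+
+calls = []
+usage_shown = []
+
+
+def cmd_list():
+    calls.append(("list",))
+
+
+def cmd_info(name):
+    calls.append(("info", name))
+
+
+def show_usage():
+    usage_shown.append(True)
+
+
+commands = {"list": (cmd_list, 0), "info": (cmd_info, 1)}
+
+print(dispatch(["beszel", "info", "mgmt"], commands, show_usage))  # valid call
+print(dispatch(["beszel", "lsit"], commands, show_usage))  # typo in command name
+```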
+
+## Data Processing Testing
+
+**Bash String Parsing:**
+Complex regex patterns used in `pbs` script require careful testing:
+
+From `pbs` (lines 122-143):
+```bash
+ssh_pbs 'tail -500 /var/log/proxmox-backup/tasks/archive 2>/dev/null' | while IFS= read -r line; do
+    if [[ "$line" =~ UPID:pbs:[^:]+:[^:]+:[^:]+:([0-9A-Fa-f]+):([^:]+):([^:]+):.*\ [0-9A-Fa-f]+\ (OK|ERROR|WARNINGS[^$]*) ]]; then
+        task_time=$((16#${BASH_REMATCH[1]}))
+        task_type="${BASH_REMATCH[2]}"
+        task_target="${BASH_REMATCH[3]}"
+        status="${BASH_REMATCH[4]}"
+        # ... process matched groups
+    fi
+done
+```
+
+**Manual Testing Approach:**
+- Run command against live services
+- Inspect output format visually
+- Verify JSON parsing with inline Python:
+  ```bash
+  echo "$gc_json" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('disk-bytes',0))"
+  ```
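+
+The task-log regex could also be exercised offline by mirroring it in Python. The sketch below copies the pattern from the `pbs` excerpt into Python's `re` syntax; that the two regex dialects behave identically here is an assumption worth checking against real log lines. `parse_task_line()` and the sample line are made up for illustration.
+
+```python
+import re
+
+# Same pattern as the Bash regex in the `pbs` excerpt, written for Python's `re`.
+TASK_RE = re.compile(
+    r"UPID:pbs:[^:]+:[^:]+:[^:]+:([0-9A-Fa-f]+):([^:]+):([^:]+):"
+    r".* [0-9A-Fa-f]+ (OK|ERROR|WARNINGS[^$]*)"
+)
+
+
+def parse_task_line(line):
+    """Return (start_time, task_type, task_target, status), or None if no match."""
+    match = TASK_RE.search(line)
+    if match is None:
+        return None
+    start_hex, task_type, task_target, status = match.groups()
+    # Equivalent of the Bash arithmetic $((16#...)): hex string to integer.
+    return int(start_hex, 16), task_type, task_target, status
+
+
+# Made-up sample line; the field values are illustrative only.
+sample = (
+    "UPID:pbs:00000A1B:0000C3D4:00000001:65BF1234:backup:store1:root@pam: "
+    "65BF1300 OK"
+)
+print(parse_task_line(sample))
+```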
+
+## Mock Testing Pattern (Telegram Bot)
+
+The Telegram bot contains no mocks, but `run_command()` is a natural seam for them, since handlers such as `backups()` shell out through this one wrapper:
+
+From `telegram/bot.py` (lines 60-78):
+```python
+def run_command(cmd: list, timeout: int = 30) -> str:
+    """Run a shell command and return output."""
+    try:
+        result = subprocess.run(
+            cmd,
+            capture_output=True,
+            text=True,
+            timeout=timeout,
+            env={**os.environ, 'PATH': f"/home/mikkel/bin:{os.environ.get('PATH', '')}"}
+        )
+        output = result.stdout or result.stderr or "No output"
+        # Telegram has 4096 char limit per message
+        if len(output) > 4000:
+            output = output[:4000] + "\n... (truncated)"
+        return output
+    except subprocess.TimeoutExpired:
+        return "Command timed out"
+    except Exception as e:
+        return f"Error: {e}"
+```
+
+This function:
+- Runs external commands with timeout protection
+- Returns stdout, falling back to stderr only when stdout is empty
+- Truncates output to 4000 characters to stay under Telegram's 4096-character message limit
+- Returns error messages instead of raising exceptions
+
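+Because the function returns a string in every case, command handlers can be tested by patching `run_command()` (or `subprocess.run()`) to return canned output instead of running real commands.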
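+
+As a rough illustration, the truncation and stderr fallback can be checked with `unittest.mock` without running any real command. The `run_command()` below is a simplified copy of the excerpt (the `env` handling is dropped) so the sketch is self-contained; `fake_result()` is a hypothetical helper.
+
+```python
+import subprocess
+from unittest import mock
+
+
+def run_command(cmd, timeout=30):
+    """Simplified copy of the bot's wrapper, reproduced for this sketch."""
+    try:
+        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
+        output = result.stdout or result.stderr or "No output"
+        if len(output) > 4000:
+            output = output[:4000] + "\n... (truncated)"
+        return output
+    except subprocess.TimeoutExpired:
+        return "Command timed out"
+    except Exception as e:
+        return f"Error: {e}"
+
+
+def fake_result(stdout="", stderr=""):
+    """Build the object subprocess.run() would normally return."""
+    return subprocess.CompletedProcess(args=[], returncode=0, stdout=stdout, stderr=stderr)
+
+
+# Oversized stdout should be cut to 4000 characters plus the marker.
+with mock.patch("subprocess.run", return_value=fake_result(stdout="x" * 5000)):
+    long_output = run_command(["pve", "list"])
+
+# Empty stdout should fall back to stderr.
+with mock.patch("subprocess.run", return_value=fake_result(stderr="boom")):
+    stderr_output = run_command(["pve", "list"])
+
+# Nothing on either stream should yield the placeholder text.
+with mock.patch("subprocess.run", return_value=fake_result()):
+    empty_output = run_command(["pve", "list"])
+
+print(len(long_output), stderr_output, empty_output)
+```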
+
+## Timeout Testing
+
+The telegram bot handles timeouts explicitly:
+
+From `telegram/bot.py`:
+```python
+result = subprocess.run(
+    ["ping", "-c", "3", "-W", "2", host],
+    capture_output=True,
+    text=True,
+    timeout=10  # 10 second timeout
+)
+```
+
+Different commands have different timeouts:
+- `ping_host()`: 10 second timeout
+- `run_command()`: 30 second default (configurable)
+- `backups()`: 60 second timeout (passed to run_command)
+
+This prevents the bot from hanging on slow/unresponsive services.
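+
+The timeout behaviour itself can be demonstrated with a deliberately slow child process. This is a sketch rather than the bot's code: `run_with_timeout()` is a hypothetical helper, and the child commands use the current Python interpreter so the example does not depend on `ping` being installed.
+
+```python
+import subprocess
+import sys
+
+
+def run_with_timeout(cmd, timeout):
+    """Run cmd and return its stdout, or a fixed message if it exceeds timeout."""
+    try:
+        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
+        return result.stdout
+    except subprocess.TimeoutExpired:
+        return "Command timed out"
+
+
+# A child process that sleeps far longer than the timeout allows.
+slow = [sys.executable, "-c", "import time; time.sleep(5)"]
+# A child process that finishes almost immediately.
+fast = [sys.executable, "-c", "print('pong')"]
+
+print(run_with_timeout(slow, timeout=0.5))
+print(run_with_timeout(fast, timeout=10).strip())
+```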
+
+## Error Message Testing
+
+Scripts validate successful API responses:
+
+From `dns` script (lines 62-69):
+```bash
+curl -s "$BASE/zones/records/add?..." | python3 -c "
+import sys, json
+data = json.load(sys.stdin)
+if data['status'] == 'ok':
+    print(f\"Added: {data['response']['addedRecord']['name']} -> ...\")
+else:
+    print(f\"Error: {data.get('errorMessage', 'Unknown error')}\")
+"
+```
+
+This pattern:
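+- Parses the JSON response
+- Checks the `status` field
+- Prints a user-friendly error message on failure
+
+It does not cover a non-JSON reply (for example an empty body when `curl -s` cannot connect), in which case `json.load()` raises.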
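+
+Moving the inline Python into a function would make the response handling testable without a DNS server. The sketch below assumes only the fields visible in the excerpt (`status`, `response.addedRecord.name`, `errorMessage`); `format_add_result()` and the sample payloads are hypothetical.
+
+```python
+import json
+
+
+def format_add_result(raw):
+    """Turn the API's raw reply into the one-line message shown to the user."""
+    try:
+        data = json.loads(raw)
+    except json.JSONDecodeError as exc:
+        return f"Error: invalid JSON response ({exc.msg})"
+    if data.get("status") == "ok":
+        name = data["response"]["addedRecord"]["name"]
+        return f"Added: {name}"
+    return f"Error: {data.get('errorMessage', 'Unknown error')}"
+
+
+# Made-up payloads shaped like the fields used in the excerpt.
+ok = '{"status": "ok", "response": {"addedRecord": {"name": "app.example.test"}}}'
+failed = '{"status": "error", "errorMessage": "Zone not found"}'
+
+print(format_add_result(ok))
+print(format_add_result(failed))
+print(format_add_result("<html>502 Bad Gateway</html>"))
+```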
+
+## Credential Testing
+
+Scripts assume credentials exist and are properly formatted:
+
+From `pve` (lines 17-34):
+```python
+creds_path = Path.home() / ".config" / "pve" / "credentials"
+creds = {}
+with open(creds_path) as f:
+    for line in f:
+        if "=" in line:
+            key, value = line.strip().split("=", 1)
+            creds[key] = value
+
+pve = ProxmoxAPI(
+    creds["host"],
+    user=creds["user"],
+    token_name=creds["token_name"],
+    token_value=creds["token_value"],
+    verify_ssl=False
+)
+```
+
+**Missing Error Handling:**
+- No check that credentials file exists
+- No check that required keys are present
+- No validation that API connection succeeds
+- Crashes with `FileNotFoundError` if the file is missing, or with `KeyError` if a required key is absent
+
+**Recommendation for Testing:**
+Add pre-flight validation:
+```python
+required_keys = ["host", "user", "token_name", "token_value"]
+missing = [k for k in required_keys if k not in creds]
+if missing:
+    print(f"Error: Missing credentials: {', '.join(missing)}")
+    sys.exit(1)
+```
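+
+A fuller version of this check might wrap the whole load step in one function that tests can call with a temporary file. This is a sketch under assumptions: `load_credentials()` and `CredentialsError` are hypothetical names, and skipping `#` comment lines is an addition that the current parser does not have.
+
+```python
+import tempfile
+from pathlib import Path
+
+REQUIRED_KEYS = ["host", "user", "token_name", "token_value"]
+
+
+class CredentialsError(Exception):
+    """Hypothetical error type for unusable credentials files."""
+
+
+def load_credentials(path, required=REQUIRED_KEYS):
+    """Parse key=value lines; raise one clear error if anything is missing."""
+    path = Path(path)
+    if not path.is_file():
+        raise CredentialsError(f"Credentials file not found: {path}")
+    creds = {}
+    for line in path.read_text().splitlines():
+        line = line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue  # skip blanks, comments (an assumption), and malformed lines
+        key, value = line.split("=", 1)  # values may themselves contain "="
+        creds[key.strip()] = value.strip()
+    missing = [k for k in required if k not in creds]
+    if missing:
+        raise CredentialsError(f"Missing credentials: {', '.join(missing)}")
+    return creds
+
+
+with tempfile.TemporaryDirectory() as tmp:
+    good = Path(tmp) / "credentials"
+    good.write_text(
+        "host=pve.example.test\nuser=root@pam\ntoken_name=t\ntoken_value=s=ecret\n"
+    )
+    print(load_credentials(good)["token_value"])
+    try:
+        load_credentials(Path(tmp) / "absent")
+    except CredentialsError as exc:
+        print(exc)
+```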
+
+## File I/O Testing
+
+Telegram bot handles file operations defensively:
+
+From `telegram/bot.py` (lines 277-286):
+```python
+# Create images directory
+images_dir = Path(__file__).parent / 'images'
+images_dir.mkdir(exist_ok=True)
+
+# Get the largest photo (best quality)
+photo = update.message.photo[-1]
+file = await context.bot.get_file(photo.file_id)
+
+# Download the image
+filename = f"{file_timestamp}.jpg"
+filepath = images_dir / filename
+await file.download_to_drive(filepath)
+```
+
+**Patterns:**
+- `mkdir(exist_ok=True)`: Safely creates the directory and does not raise if it already exists
+- Timestamp-based filenames to avoid collisions: `f"{file_timestamp}.jpg"` for photos, as in the excerpt above
+- Pathlib for cross-platform path handling
+
+## What to Test If Writing Tests
+
+If converting to automated tests, prioritize:
+
+**High Priority:**
+1. **Telegram bot command dispatch** (`telegram/bot.py` lines 107-366)
+   - Each command handler should have unit tests
+   - Mock `subprocess.run()` to avoid calling actual commands
+   - Test authorization checks (`is_authorized()`)
+   - Test output truncation for large responses
+
+2. **Credential loading** (all helper scripts)
+   - Test missing credentials file error
+   - Test malformed credentials
+   - Test missing required keys
+
+3. **API response parsing** (`dns`, `pbs`, `beszel`, `kuma`)
+   - Test JSON parsing errors
+   - Test malformed responses
+   - Test status code handling
+
+**Medium Priority:**
+1. **Bash regex parsing** (`pbs` task/error log parsing)
+   - Test hex timestamp conversion
+   - Test status code extraction
+   - Test task target parsing with special characters
+
+2. **Timeout handling** (all `run_command()` calls)
+   - Test command timeout
+   - Test output truncation
+   - Test error message formatting
+
+**Low Priority:**
+1. Integration tests with real services (kept in separate test suite)
+2. Performance tests for large data sets
+
+## Current Test Coverage
+
+**Implicit Testing:**
+- Manual CLI testing during development
+- Live service testing (commands run against real PVE, PBS, DNS, etc.)
+- User/admin interaction testing (Telegram bot testing via /start, /status, etc.)
+
+**Gaps:**
+- No regression testing
+- No automated validation of API response formats
+- No error case testing
+- No refactoring safety net
+
+---
+
+*Testing analysis: 2026-02-04*