docs: add codebase map and domain research

Codebase: 7 documents (stack, architecture, structure, conventions, testing, integrations, concerns)
Research: 5 documents (stack, features, architecture, pitfalls, summary)
Mikkel Georgsen 2026-02-04 13:50:03 +00:00
parent 6cf6bfb8d1
commit a639a53b0b
7 changed files with 1662 additions and 0 deletions


@ -0,0 +1,151 @@
# Architecture
**Analysis Date:** 2026-02-04
## Pattern Overview
**Overall:** Hub-and-spoke service orchestration with API-driven infrastructure management.
**Key Characteristics:**
- Centralized management container (VMID 102 - mgmt) coordinating all infrastructure
- Layered abstraction: CLI helpers → REST APIs → external services
- Event-driven notifications (Telegram bot bridges management layer to user)
- Credential-based authentication for all service integrations
## Layers
**Management Layer:**
- Purpose: Orchestration and automation entry point for the homelab
- Location: `/home/mikkel/homelab` (git repository in mgmt container)
- Contains: CLI helper scripts (`~/bin/*`), Telegram bot, documentation
- Depends on: Remote SSH access to container/VM IP addresses, Proxmox API, service REST APIs
- Used by: Claude Code automation, Telegram bot commands, cron jobs
**API Integration Layer:**
- Purpose: Abstracts service APIs into simple CLI interfaces
- Location: `~/bin/` (pve, npm-api, dns, pbs, beszel, kuma, updates, telegram)
- Contains: Python and Bash wrappers around external service APIs
- Depends on: Proxmox API, Nginx Proxy Manager API, Technitium DNS API, PBS REST API, Beszel PocketBase, Uptime Kuma REST API, Telegram Bot API
- Used by: Telegram bot, CI/CD automation, interactive CLI usage
**Service Layer:**
- Purpose: Individual hosted services providing infrastructure capabilities
- Location: Distributed across containers (NPM, DNS, PBS, Dockge, Forgejo, etc.)
- Contains: Docker containers, LXC services, backup systems
- Depends on: PVE host networking, shared storage, external integrations
- Used by: API layer, end-user access via web UI or CLI
**Data & Communication Layer:**
- Purpose: State persistence and inter-service communication
- Location: Shared storage (`~/stuff` - ZFS bind mount), credential files (`~/.config/*/credentials`)
- Contains: Backup data, configuration files, Telegram inbox/images/files
- Depends on: PVE ZFS dataset, filesystem access
- Used by: All services, backup/restore operations
## Data Flow
**Infrastructure Query Flow (e.g., `pve list`):**
1. User invokes CLI helper: `~/bin/pve list`
2. Helper loads credentials from `~/.config/pve/credentials`
3. Helper authenticates to Proxmox API at `core.georgsen.dk:8006` using token auth
4. Proxmox returns cluster resource state (VMs/containers)
5. Helper formats and displays output to user
**Service Management Flow (e.g., `dns add myhost 10.5.0.50`):**
1. User invokes: `~/bin/dns add myhost 10.5.0.50`
2. DNS helper loads credentials and authenticates to Technitium at `10.5.0.2:5380`
3. Helper makes HTTP API call to add A record
4. Technitium stores in zone file and updates DNS records
5. Helper confirms success to user
**Backup Status Flow (e.g., `/pbs` command in Telegram):**
1. Telegram user sends `/pbs` command
2. Bot handler in `telegram/bot.py` executes `~/bin/pbs status`
3. PBS helper connects via SSH to `10.5.0.6` as root
4. SSH command reads backup logs and GC status from PBS container
5. Helper formats human-readable output
6. Bot sends result back to Telegram chat (truncated to 4000 chars to stay under Telegram's 4096-character message limit)
**State Management:**
- Credentials: Stored in `~/.config/*/credentials` files (sourced at runtime)
- Telegram messages: Appended to `telegram/inbox` file for Claude to read
- Media uploads: Saved to `telegram/images/` and `telegram/files/` with timestamps
- Authorization: `telegram/authorized_users` file maintains allowlist of chat IDs
## Key Abstractions
**Helper Scripts (API Adapters):**
- Purpose: Translate user intent into remote service API calls
- Examples: `~/bin/pve`, `~/bin/dns`, `~/bin/pbs`, `~/bin/beszel`, `~/bin/kuma`
- Pattern: Load credentials → authenticate → execute command → format output
- Language: Mix of Python (pve, updates, telegram) and Bash (dns, pbs, beszel, kuma)
**Telegram Bot:**
- Purpose: Provides two-way interactive access to management functions
- Implementation: `telegram/bot.py` using python-telegram-bot library
- Pattern: Command handlers dispatch to helper scripts, results sent back to user
- Channels: Commands (e.g., `/pbs`), free-text messages saved to inbox, photos/files downloaded
**Service Registry (Documentation):**
- Purpose: Centralized reference for service locations and access patterns
- Implementation: `homelab-documentation.md` and `CLAUDE.md`
- Contents: IP addresses, ports, authentication methods, SSH targets, network topology
## Entry Points
**CLI Usage (Direct):**
- Location: `~/bin/{helper}` scripts
- Triggers: Manual invocation by user or cron jobs
- Responsibilities: Execute service operations, format output, validate inputs
**Telegram Bot:**
- Location: `telegram/bot.py` (systemd service: `telegram-bot.service`)
- Triggers: Telegram message or command from authorized user
- Responsibilities: Authenticate user, route command/message, execute via helper scripts, send response
**Automation Scripts:**
- Location: Potential cron jobs or scheduled tasks
- Triggers: Time-based scheduling
- Responsibilities: Execute periodic management tasks (e.g., backup checks, updates)
**Manual Execution:**
- Location: Interactive shell in mgmt container
- Triggers: User SSH session
- Responsibilities: Run helpers for ad-hoc infrastructure management
## Error Handling
**Strategy:** Graceful degradation with informative messaging.
**Patterns:**
- CLI helpers return non-zero exit codes on failure (exception handling in Python, `set -e` in some Bash scripts)
- Timeout protection: Telegram bot commands have 30-second timeout (configurable per command)
- Service unavailability: Caught in try/except blocks, fall back to next option (e.g., `pve` tries LXC first, then QEMU)
- Credential failures: Load-time validation, clear error message if credentials file missing
- Network errors: SSH timeouts, API connection failures logged to stdout/stderr
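The fallback pattern above (try one backend, then the next) can be written without swallowing every exception. The following is a minimal sketch, not the `pve` script's actual code: `LookupFailed`, `first_success`, and the two stub lookups are hypothetical names standing in for the LXC and QEMU API calls.

```python
from typing import Callable, Iterable, Optional


class LookupFailed(Exception):
    """Hypothetical error raised when one backend cannot resolve a VMID."""


def first_success(attempts: Iterable[Callable[[], dict]]) -> Optional[dict]:
    """Return the result of the first attempt that does not raise LookupFailed."""
    for attempt in attempts:
        try:
            return attempt()
        except LookupFailed:
            continue  # fall through to the next backend
    return None


# Stub backends standing in for the LXC and QEMU API calls (assumptions).
def lookup_lxc() -> dict:
    raise LookupFailed("not a container")


def lookup_qemu() -> dict:
    return {"type": "qemu", "status": "running"}


result = first_success([lookup_lxc, lookup_qemu])
```

Catching one named exception type keeps unrelated failures (typos, network errors) visible instead of hiding them behind the fallback.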
## Cross-Cutting Concerns
**Logging:**
- Telegram bot uses Python stdlib logging (INFO level, writes to systemd journal)
- CLI helpers write directly to stdout/stderr
- PBS helper uses SSH error output for remote command failures
**Validation:**
- Telegram bot validates hostnames (alphanumeric + dots + hyphens only) before ping
- DNS helper validates that name and IP are provided before API call
- PVE helper validates VMID is integer before API call
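The hostname rule described above (alphanumerics, dots, and hyphens only) might look like the sketch below. `is_safe_hostname` is a hypothetical name; the bot's real check may be implemented differently.

```python
import re

# Letters, digits, dots and hyphens only, per the rule described above.
_HOSTNAME_RE = re.compile(r"[A-Za-z0-9.-]+")


def is_safe_hostname(host: str) -> bool:
    """Return True if host contains only alphanumerics, dots, and hyphens."""
    return bool(_HOSTNAME_RE.fullmatch(host))
```

Using `fullmatch` rejects the empty string and any input with shell metacharacters or whitespace. A value that starts with a hyphen still passes this rule, so a caller handing it to `ping` may want an additional check.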
**Authentication:**
- Credentials stored in `~/.config/{service}/credentials` as simple key=value files
- Sourced at runtime (Bash) or read at startup (Python)
- Token-based auth for Proxmox (no password in memory)
- Basic auth for DNS and other REST APIs (credentials URL-encoded if needed)
- Bearer token for Uptime Kuma (API key-based)
---
*Architecture analysis: 2026-02-04*


@ -0,0 +1,272 @@
# Codebase Concerns
**Analysis Date:** 2026-02-04
## Tech Debt
**IP Addressing Scheme Inconsistency:**
- Issue: Container IPs don't follow VMID convention. NPM (VMID 100) is at .1, Dockge (VMID 101) at .10, PBS (VMID 106) at .6, instead of matching .100, .101, .106
- Files: `homelab-documentation.md` (lines 139-159)
- Impact: Manual IP tracking required, DNS records must be maintained separately, new containers require manual IP assignment planning, documentation drift risk
- Fix approach: Execute TODO task to reorganize vmbr1 to VMID=IP scheme (.100-.253 range), update NPM proxy hosts, DNS records (lab.georgsen.dk), and documentation
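As an illustration of the proposed VMID=IP scheme, the mapping could be expressed as a small function. This is a sketch under one assumption: that vmbr1 is `10.5.0.0/24`, inferred from addresses such as `10.5.0.2` elsewhere in the docs. `vmid_to_ip` is a hypothetical helper.

```python
import ipaddress

# Assumption: vmbr1 is 10.5.0.0/24, inferred from addresses such as 10.5.0.2.
NETWORK = ipaddress.ip_network("10.5.0.0/24")


def vmid_to_ip(vmid: int) -> str:
    """Map a VMID to the host address with the same last octet (.100-.253)."""
    if not 100 <= vmid <= 253:
        raise ValueError(f"VMID {vmid} is outside the .100-.253 range")
    return str(NETWORK.network_address + vmid)
```

Under this scheme `vmid_to_ip(106)` gives `10.5.0.106`. VMIDs outside the range, such as 1000, would need a separate rule.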
**Manual DNS Record Maintenance:**
- Issue: Internal DNS (Technitium) and external DNS (dns.services) require manual updates when IPs/domains change
- Files: `homelab-documentation.md` (lines 432-449), `~/bin/dns` script
- Impact: Risk of records becoming stale after IP migrations, no automation for new containers
- Fix approach: Implement `dns-services` helper script (TODO.md line 27) with API integration for automatic updates
**Unimplemented Helper Scripts:**
- Issue: `dns-services` API integration promised in TODO but not implemented
- Files: `TODO.md` (line 27), `dns-services/credentials` exists but script doesn't
- Impact: Manual dns.services operations required, cannot automate domain setup
- Fix approach: Create `~/bin/dns-services` wrapper (endpoint documented in TODO)
**Ping Capability Missing on 14 Containers:**
- Issue: Unprivileged LXC containers drop cap_net_raw, breaking ping on VMIDs 100, 101, 102, 103, 104, 105, 107, 108, 110, 111, 112, 114, 115, 1000
- Files: `TODO.md` (lines 31-33), `CLAUDE.md` (lines 252-255)
- Impact: Health monitoring fails, network diagnostics broken, Telegram bot status checks incomplete (bot has no ping on home network itself), Uptime Kuma monitors may show false negatives
- Fix approach: Run `setcap cap_net_raw+ep /bin/ping` on each container (must be reapplied after iputils-ping updates)
**Version Pinning Warnings:**
- Issue: `CLAUDE.md` (lines 227-241) warns about hardcoded versions becoming stale
- Files: `homelab-documentation.md` (lines 217, 228, 239), `~/bin/updates` script shows version checking is implemented but some configs have `latest` tags
- Impact: Security patch delays, incompatibilities when manually deploying services
- Fix approach: Always query GitHub API for latest versions (updates script does this correctly for discovery phase)
## Known Bugs
**Telegram Bot Inbox Storage Race Condition:**
- Symptoms: Concurrent message writes could corrupt inbox file, messages may be lost
- Files: `telegram/bot.py` (lines 39, 200-220 message handling), `~/bin/telegram` (lines 73-79 clear command)
- Trigger: Multiple rapid messages from admin or concurrent bot operations
- Workaround: Clear inbox frequently and check for corruption; bot currently appends to file without locking
- Root cause: File-based inbox with no atomic writes or mutex protection
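One way to close the race described above is an advisory lock around each append. The sketch below uses `fcntl.flock` from the standard library (Unix only). `append_locked` is a hypothetical helper, not code from `telegram/bot.py`.

```python
import fcntl
from pathlib import Path


def append_locked(path: Path, line: str) -> None:
    """Append one line while holding an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other lock holders release
        try:
            f.write(line.rstrip("\n") + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Advisory locks only help if every writer cooperates, so the clear command in `~/bin/telegram` would need to take the same lock before truncating the file.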
**PBS Backup Mount Dependency Not Enforced:**
- Symptoms: PBS services may start before Synology CIFS mount is available, backup path unreachable
- Files: `homelab-documentation.md` (lines 372-384), container 106 config
- Trigger: System reboot when Tailscale connectivity is delayed
- Workaround: Manual restart of proxmox-backup-proxy and proxmox-backup services
- Root cause: `After=mnt-synology.mount` only orders startup after the mount unit; on its own it does not require the mount to have succeeded before the service starts
**DragonflyDB Password in Plain Text in Documentation:**
- Symptoms: Database password visible in compose file and documentation
- Files: `homelab-documentation.md` (lines 248-250)
- Trigger: Anyone reading docs or inspecting git history
- Workaround: Consider password non-critical if container only accessible on internal network
- Root cause: Password stored in version control and documentation rather than .env or secrets file
**NPM Proxy Host 18 (mh.datalos.dk) Not Configured:**
- Symptoms: Domain does not resolve; the DNS record is missing even though the NPM entry (ID 18) is mentioned in TODO
- Files: `TODO.md` (line 29), `homelab-documentation.md` (proxy hosts section)
- Trigger: Accessing mh.datalos.dk from browser
- Workaround: Must be configured manually via NPM web UI
- Root cause: Setup referenced in TODO but not completed
## Security Considerations
**Exposed Credentials in Git History:**
- Risk: Credential files committed (credentials, SSH keys, telegram token examples)
- Files: All credential files in `telegram/`, `pve/`, `forgejo/`, `dns/`, `dockge/`, `uptime-kuma/`, `beszel/`, `dns-services/` directories (8+ files)
- Current mitigation: Files are .gitignored in main repo but present in working directory
- Recommendations: Rotate all credentials listed, audit git log for historical commits, use HashiCorp Vault or pass for credential storage, document secret rotation procedure
**Public IP Hardcoded in Documentation:**
- Risk: Home IP 83.89.248.247 exposed in multiple locations
- Files: `homelab-documentation.md` (lines 98, 102), `CLAUDE.md` (line 256)
- Current mitigation: IP is already public/static, used for whitelist access
- Recommendations: Document that whitelisting this IP is intentional, and confirm that no other PII appears alongside it
**Telegram Bot Authorization Model Too Permissive:**
- Risk: First user to message bot becomes admin automatically with no verification
- Files: `telegram/bot.py` (lines 86-95)
- Current mitigation: Bot only responds to authorized user, requires bot discovery
- Recommendations: Require multi-factor authorization on first start (e.g., PIN from environment variable), implement audit logging of all bot commands
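The PIN recommendation above could be sketched as follows. `BOT_ADMIN_PIN` and `pin_matches` are hypothetical names. The bot does not read such a variable today, and this only illustrates the comparison step.

```python
import hmac
import os
from typing import Optional


def pin_matches(supplied: str, expected: Optional[str]) -> bool:
    """Compare a user-supplied PIN with the configured one in constant time."""
    if not expected:
        return False  # no PIN configured: refuse to enrol anyone
    return hmac.compare_digest(supplied.encode(), expected.encode())


# BOT_ADMIN_PIN is a hypothetical variable name, not one the bot reads today.
expected_pin = os.environ.get("BOT_ADMIN_PIN")
```

Failing closed when no PIN is configured avoids recreating the "first user becomes admin" behaviour through a missing environment variable.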
**Database Credentials in Environment Variables:**
- Risk: DragonflyDB password passed via Docker command line (visible in `docker ps`, logs, process listings)
- Files: `homelab-documentation.md` (line 248)
- Current mitigation: Container only accessible on internal vmbr1 network
- Recommendations: Use Docker secrets or mounted .env files instead of command-line arguments
**Synology CIFS Credentials in fstab:**
- Risk: SMB credentials stored in plaintext in fstab file with mode 0644 (world-readable)
- Files: `homelab-documentation.md` (line 369)
- Current mitigation: Mounted on container-only network, requires PBS container access
- Recommendations: Use credentials file with mode 0600, rotate credentials regularly, monitor file permissions
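Monitoring file permissions, as recommended above, can be done with the standard library. A minimal sketch, with `is_private` as a hypothetical helper name:

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file with mode 0600 passes this check, while the world-readable 0644 case described above does not.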
**SSH Keys Included in Documentation:**
- Risk: Public SSH keys hardcoded in CLAUDE.md setup examples
- Files: `CLAUDE.md` and `homelab-documentation.md` SSH key examples
- Current mitigation: Public keys only (not private), used for container access
- Recommendations: Rotate these keys if documentation is ever exposed, don't include in public repos
## Performance Bottlenecks
**Single NVMe Storage (RAID0) Without Local Redundancy:**
- Problem: Core server has 2x1TB NVMe in RAID0 (striped, no redundancy)
- Files: `homelab-documentation.md` (lines 17-24)
- Cause: Cost optimization for Hetzner dedicated server
- Impact: Single drive failure = total data loss; database corruption risk from RAID0 stripe inconsistency
- Improvement path: (1) Ensure PBS backups run successfully to Synology, (2) Test backup restore procedure monthly, (3) Plan upgrade path if budget allows (3-way mirror or RAID1)
**Backup Dependency on Single Tailscale Gateway:**
- Problem: All PBS backups to Synology go through Tailscale relay (10.5.0.134), single point of failure
- Files: `homelab-documentation.md` (lines 317-427)
- Cause: Synology only accessible via Tailscale network, relay container required
- Impact: Tailscale relay downtime = backup failure; no local backup option
- Improvement path: (1) Add second Tailscale relay for redundancy, (2) Explore PBS direct SSH backup mode, (3) Monitor relay container health
**DNS Queries All Route Through Single Technitium Container:**
- Problem: All internal DNS (lab.georgsen.dk) goes through container 115, DHCP defaults to this server
- Files: `homelab-documentation.md` (lines 309-315), container config
- Cause: Single container architecture
- Impact: DNS outage = network unreachable (containers can't resolve any hostnames)
- Improvement path: (1) Deploy DNS replica on another container, (2) Configure DHCP to use multiple DNS servers, (3) Set upstream DNS fallback
**Script Execution via Telegram Bot with Subprocess Timeout:**
- Problem: Bot runs helper scripts with 30-second timeout, commands like PBS backup query can exceed limit
- Files: `telegram/bot.py` (lines 60-78, 191)
- Cause: Helper scripts do remote SSH execution, network latency variable
- Impact: Commands truncated mid-execution, incomplete status reports, timeouts on slow networks
- Improvement path: Increase timeout selectively, implement command queuing, cache results for frequently-called commands
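The caching idea in the improvement path could be sketched as a small time-to-live cache in front of the helper calls. `TTLCache` is a hypothetical class. The injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache command output for a fixed number of seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_run(self, key: str, run: Callable[[], str]) -> str:
        """Return a fresh cached value for key, or call run() and store it."""
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]
        value = run()
        self._entries[key] = (now, value)
        return value
```

With a TTL of a minute or so, repeated `/pbs` requests would reuse the last result instead of opening a new SSH session each time, at the cost of slightly stale output.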
## Fragile Areas
**Installer Shell Script with Unimplemented Sections:**
- Files: `pve-homelab-kit/install.sh` (495+ lines with TODO comments)
- Why fragile: Multiple TODO placeholders indicate incomplete implementation; wizard UI done but ~30 implementation TODOs remain
- Safe modification: (1) Don't merge branches without running through full install, (2) Test each section independently, (3) Add shell `set -e` error handling
- Test coverage: Script has no tests, no dry-run mode, no rollback capability
**Container Configuration Maintained Manually in LXC Config Files:**
- Files: `/etc/pve/lxc/*.conf` across Proxmox host (not in repo, not version controlled)
- Why fragile: Critical settings (features, ulimits, AppArmor) outside version control, drift risk after manual fixes
- Safe modification: Keep backup copies in `homelab-documentation.md` (already done for PBS), automate via Terraform/Ansible if future containers added
- Test coverage: Config changes only tested on live container (no staging env)
**Helper Scripts with Hardcoded IPs and Paths:**
- Files: `~/bin/updates` (lines 16-17, 130), `~/bin/pbs`, `~/bin/pve`, `~/bin/dns`
- Why fragile: DOCKGE_HOST, PVE_HOST hardcoded; if IPs change during migration, all scripts must be updated manually
- Safe modification: Extract to config file (e.g., `/etc/homelab/config.sh` or environment variables)
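For the Python helpers, extracting hosts into configuration might look like the sketch below. The variable names `PVE_HOST` and `PBS_HOST` appear in the scripts, and the defaults are the addresses documented in the architecture notes. `HostConfig` and `load_hosts` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HostConfig:
    pve_host: str
    pbs_host: str


def load_hosts(env: Mapping[str, str] = os.environ) -> HostConfig:
    """Read host addresses from the environment, falling back to defaults."""
    return HostConfig(
        pve_host=env.get("PVE_HOST", "core.georgsen.dk"),
        pbs_host=env.get("PBS_HOST", "10.5.0.6"),
    )
```

After an IP migration, only the environment (or the single file that sets it) would need to change rather than every script.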
- Test coverage: Scripts tested against live infrastructure only
**SSH-Based Container Access Without Key Verification:**
- Files: `~/bin/updates` (lines 115-131); scripts pass `-q` to `ssh`, which silences warnings and diagnostics
- Why fragile: `ssh -q` does not disable host key checking, but it hides the warnings that would reveal a host key problem; scripts assume SSH keys are pre-installed
- Safe modification: Add `-o StrictHostKeyChecking=accept-new` so new host keys are recorded on first connection and changed keys are rejected; document key distribution procedure
- Test coverage: SSH connectivity assumed working
**Backup Monitoring Without Alerting on Failure:**
- Files: `~/bin/pbs`, `telegram/bot.py` (status command only, no automatic failure alerts)
- Why fragile: Failed backups only visible if manually checked; no monitoring of backup completion
- Safe modification: Add systemd timer to check PBS status hourly, send Telegram alert on failure
- Test coverage: Manual checks only
## Scaling Limits
**Container IP Space Exhaustion:**
- Current capacity: vmbr1 is /24 (256 IPs, .0-.255), DHCP range .100-.200 (101 IPs available for DHCP), static IPs scattered
- Limit: After ~150 containers, IP fragmentation becomes difficult to manage; DHCP range conflicts with static allocation
- Scaling path: (1) Implement TODO IP scheme (VMID=IP), (2) Expand to /23 (512 IPs) if more containers needed, (3) Use vmbr2 (vSwitch) for secondary network
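The address counts above can be checked with the `ipaddress` module. This sketch assumes vmbr1 is `10.5.0.0/24`. The `/23` network is the hypothetical expansion from the scaling path.

```python
import ipaddress

current = ipaddress.ip_network("10.5.0.0/24")   # assumed vmbr1 subnet
expanded = ipaddress.ip_network("10.5.0.0/23")  # hypothetical expansion

dhcp_size = 200 - 100 + 1                      # .100 through .200 inclusive
usable_now = current.num_addresses - 2         # minus network and broadcast
usable_expanded = expanded.num_addresses - 2
```

Note that the 256 and 512 figures count every address in the block. Two of them (network and broadcast) cannot be assigned to hosts.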
**Backup Datastore Single Synology Volume:**
- Current capacity: Synology `pbs-backup` share unknown size (not documented)
- Limit: Unknown when share becomes full; no warning system implemented
- Scaling path: (1) Document share capacity in homelab-documentation.md, (2) Add usage monitoring to `beszel` or Uptime Kuma, (3) Plan expansion to second NAS
**Dockge Stack Limit:**
- Current capacity: Dockge container 101 running ~8-10 stacks visible in documentation
- Limit: No documented resource constraints; may hit CPU/RAM limits on Hetzner AX52 with more containers
- Scaling path: (1) Monitor Dockge resource usage via Beszel, (2) Profile Dragonfly memory usage, (3) Plan VM migration for heavy workloads
**DNS Query Throughput:**
- Current capacity: Single Technitium container handling all internal DNS
- Limit: Container CPU/RAM limits unknown; no QPS monitoring
- Scaling path: (1) Add DNS replica, (2) Monitor query latency, (3) Profile Technitium logs for slow queries
## Dependencies at Risk
**Technitium DNS (Unmaintained Risk):**
- Risk: TechnitiumSoftware/DnsServer has irregular commit history; last significant release early 2024
- Impact: Security fixes may be delayed; compatibility with newer Linux kernels unknown
- Migration plan: (1) Profile current Technitium features used, (2) Evaluate CoreDNS or Dnsmasq alternatives, (3) Plan gradual migration with dual DNS
**DragonflyDB as Redis Replacement:**
- Risk: Dragonfly has a smaller ecosystem than Redis; breaking changes possible in minor updates
- Impact: Applications expecting Redis behavior may fail; less community support for issues
- Migration plan: (1) Pin Dragonfly version in compose file (currently `latest`), (2) Test upgrades in dev environment, (3) Document any API incompatibilities found
**Dockge (Single Maintainer Project):**
- Risk: Dockge is maintained by one developer (louislam); bus factor of one
- Impact: If maintainer loses interest, fixes and features stop; dependency on their release schedule
- Migration plan: (1) Use Dockge for UI only, don't depend on it for production orchestration, (2) Keep docker-compose expertise on team, (3) Consider Portainer as fallback alternative
**Forgejo (Younger than Gitea):**
- Risk: Forgejo is a recent fork of Gitea; database schema changes possible in patch versions
- Impact: Upgrades may require manual migrations; data loss risk if migration fails
- Migration plan: (1) Test Forgejo upgrades on backup copy first, (2) Document upgrade procedure, (3) Keep Gitea as fallback if Forgejo breaks
## Missing Critical Features
**No Automated Health Monitoring/Alerting:**
- Problem: Status checks exist (via Telegram bot, Uptime Kuma) but no automatic alerts when services fail
- Blocks: Cannot sleep soundly; must manually check status to detect outages
- Implementation path: (1) Add Uptime Kuma HTTP monitors for all public services, (2) Create Telegram alert webhook, (3) Monitor PBS backup success daily
**No Automated Certificate Renewal Verification:**
- Problem: NPM handles Let's Encrypt renewal, but no monitoring for renewal failures
- Blocks: Certificates could expire silently; discovered during service failures
- Implementation path: (1) Add Uptime Kuma alert for HTTP 200 on https://* services, (2) Add monthly certificate expiry check, (3) Set up renewal failure alerts
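A certificate expiry check could be built on the standard `ssl` module. The sketch below is one possible shape, not an existing helper: `days_left` and `fetch_not_after` are hypothetical names, and the network call is kept separate so the date arithmetic can be tested offline.

```python
import socket
import ssl


def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Fetch the notAfter field of the certificate served by host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A monthly job could call `days_left(fetch_not_after(host), time.time())` for each proxied domain and alert when the value drops below a chosen threshold.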
**No Disaster Recovery Runbook:**
- Problem: Procedures for rescuing locked-out server (Hetzner Rescue Mode) not documented
- Blocks: If SSH access lost, cannot recover without external procedures
- Implementation path: (1) Document Hetzner Rescue Mode recovery steps, (2) Create network reconfiguration backup procedures, (3) Test rescue mode monthly
**No Change Log / Audit Trail:**
- Problem: Infrastructure changes not logged; drift from documentation occurs silently
- Blocks: Unknown who made changes, when, and why; cannot track config evolution
- Implementation path: (1) Add git commit requirement for all manual changes, (2) Create change notification to Telegram, (3) Weekly drift detection report
**No Secrets Management System:**
- Problem: Credentials scattered across plaintext files, git history, and documentation
- Blocks: Cannot safely share access with team members; no credential rotation capability
- Implementation path: (1) Deploy HashiCorp Vault or Vaultwarden, (2) Migrate all secrets to vault, (3) Create credential rotation procedures
## Test Coverage Gaps
**PBS Backup Restore Not Tested:**
- What's not tested: Full restore procedures; assumed to work but never verified
- Files: `homelab-documentation.md` (lines 325-392), no restore test documented
- Risk: If restore needed, may discover issues during actual data loss emergency
- Priority: HIGH - Add monthly restore test procedure (restore single VM to temporary location, verify data integrity)
**Network Failover Scenarios:**
- What's not tested: What happens if Tailscale relay (1000) goes down, if NPM container restarts, if DNS returns SERVFAIL
- Files: No documented failure scenarios
- Risk: Unknown recovery time; applications may hang instead of failing gracefully
- Priority: HIGH - Document and test each service's failure mode
**Helper Script Error Handling:**
- What's not tested: Scripts with SSH timeouts, host unreachable, malformed responses
- Files: `~/bin/updates`, `~/bin/pbs`, `~/bin/pve` (error handling exists but not tested against failures)
- Risk: Silent failures could go unnoticed; incomplete output returned to caller
- Priority: MEDIUM - Add error injection tests (mock SSH failures)
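An error injection test for the timeout path might look like this sketch. `run_command` here is a simplified stand-in for the bot's command runner, and the test patches `subprocess.run` so no real SSH connection is attempted.

```python
import subprocess
from unittest import mock


def run_command(cmd: list, timeout: int = 30) -> str:
    """Stand-in for the bot's command runner (simplified assumption)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return result.stdout or result.stderr or "No output"
    except subprocess.TimeoutExpired:
        return "Command timed out"


def test_timeout_is_reported() -> None:
    """Inject a timeout instead of waiting on a real SSH connection."""
    boom = subprocess.TimeoutExpired(cmd="ssh", timeout=30)
    with mock.patch("subprocess.run", side_effect=boom):
        assert run_command(["ssh", "root@10.5.0.6", "true"]) == "Command timed out"


test_timeout_is_reported()
```

The same approach extends to unreachable hosts or malformed output by changing `side_effect` or the mocked return value.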
**Telegram Bot Commands Under Load:**
- What's not tested: Bot response when running concurrent commands, or when helper scripts timeout
- Files: `telegram/bot.py` (no load tests, concurrency behavior unknown)
- Risk: Bot may hang or lose messages under heavy load
- Priority: MEDIUM - Add load test with 10+ concurrent commands
**Container Migration (VMID IP Scheme Change):**
- What's not tested: Migration of 15+ containers to new IP scheme; full rollback procedures
- Files: `TODO.md` (lines 5-15, planned but not executed)
- Risk: Single IP misconfiguration could take multiple services offline
- Priority: HIGH - Create detailed migration runbook with rollback at each step before executing
---
*Concerns audit: 2026-02-04*


@ -0,0 +1,274 @@
# Coding Conventions
**Analysis Date:** 2026-02-04
## Naming Patterns
**Files:**
- Python files: lowercase with underscores (e.g., `bot.py`, `credentials`)
- Bash scripts: lowercase with hyphens (e.g., `npm-api`, `uptime-kuma`)
- Helper scripts in `~/bin/`: all lowercase, no extension (e.g., `pve`, `pbs`, `dns`)
**Functions:**
- Python: snake_case (e.g., `cmd_status()`, `get_authorized_users()`, `run_command()`)
- Bash: snake_case with `cmd_` prefix for command handlers (e.g., `cmd_status()`, `cmd_tasks()`)
- Bash: auxiliary functions also use snake_case (e.g., `ssh_pbs()`, `get_token()`)
**Variables:**
- Python: snake_case for local/module vars (e.g., `authorized_users`, `output_lines`)
- Python: UPPERCASE for constants (e.g., `TOKEN`, `INBOX_FILE`, `AUTHORIZED_FILE`, `NODE`, `PBS_HOST`)
- Bash: UPPERCASE for environment variables and constants (e.g., `PBS_HOST`, `TOKEN`, `BASE`, `DEFAULT_ZONE`)
- Bash: lowercase for local variables (e.g., `hours`, `cutoff`, `status_icon`)
**Types/Classes:**
- Python: PascalCase for imported classes (e.g., `ProxmoxAPI`, `Update`, `Application`)
- Dictionary/config keys: lowercase with hyphens or underscores (e.g., `token_name`, `max-mem`)
## Code Style
**Formatting:**
- No automated formatter detected in codebase
- Python: PEP 8 conventions followed informally
- 4-space indentation
- Max line length ~90-100 characters (observed in practice)
- Blank lines: 2 lines before module-level functions, 1 line before methods
- Bash: 4-space indentation (observed)
**Linting:**
- No linting configuration detected (no .pylintrc, .flake8, .eslintrc)
- Code style is manually maintained
**Docstrings:**
- Python: Triple-quoted strings at module level describing purpose
- Example from `telegram/bot.py`:
```python
"""
Homelab Telegram Bot
Two-way interactive bot for homelab management and notifications.
"""
```
- Python: Function docstrings used for major functions
- Single-line format for simple functions
- Example: `"""Handle /start command - first contact with bot."""`
- Example: `"""Load authorized user IDs."""`
## Import Organization
**Order:**
1. Standard library imports (e.g., `sys`, `os`, `json`, `subprocess`)
2. Third-party imports (e.g., `ProxmoxAPI`, `telegram`, `pocketbase`)
3. Local imports (rarely used in this codebase)
**Path Aliases:**
- No aliases detected
- Absolute imports used throughout
**Credential Loading Pattern:**
All scripts that need credentials follow the same pattern:
```python
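from pathlib import Path

# Placeholder service name; each script uses its own (e.g., "pve", "dns")
service = "pve"

# Load KEY=value credentials into a dict
creds_path = Path.home() / ".config" / service / "credentials"
creds = {}
with open(creds_path) as f:
    for line in f:
        if '=' in line:
            key, value = line.strip().split('=', 1)
            creds[key] = value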
```
Or in Bash:
```bash
source ~/.config/dns/credentials
```
## Error Handling
**Patterns:**
- Python: Try-except with broad exception catching (bare `except:` used in `pve` script lines 70, 82, 95, 101)
- Not ideal but pragmatic for CLI tools that need to try multiple approaches
- Example from `pve`:
```python
try:
    status = pve.nodes(NODE).lxc(vmid).status.current.get()
    # ...
    return
except:
    pass
```
- Python: Explicit exception handling in telegram bot
- Catches `subprocess.TimeoutExpired` specifically in `run_command()` function
- Example from `telegram/bot.py`:
```python
try:
    result = subprocess.run(...)
    output = result.stdout or result.stderr or "No output"
    if len(output) > 4000:
        output = output[:4000] + "\n... (truncated)"
    return output
except subprocess.TimeoutExpired:
    return "Command timed out"
except Exception as e:
    return f"Error: {e}"
```
- Bash: Set strict mode with `set -e` in some scripts (`dns` script line 12)
- Causes script to exit on first error
- Bash: No error handling in most scripts (`pbs`, `beszel`, `kuma`)
- Relies on exit codes implicitly
**Return Value Handling:**
- Python: Functions return data directly or None on failure
- Example from `pbs` helper: Returns JSON-parsed data or string output
- Example from `pve`: Returns nothing (prints output), but uses exceptions for flow control
- Python: Command runner returns error strings: `"Command timed out"`, `"Error: {e}"`
## Logging
**Framework:**
- Python: Standard `logging` module
- Configured in `telegram/bot.py` lines 18-22:
```python
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)
logger = logging.getLogger(__name__)
```
- Log level: INFO
- Format includes timestamp, logger name, level, message
**Patterns:**
- `logger.info()` for general informational messages
- Example: `logger.info("Starting Homelab Bot...")`
- Example: `logger.info(f"Inbox message from {user.first_name}: {message[:50]}...")`
- Example: `logger.info(f"Photo saved from {user.first_name}: {filepath}")`
- Bash: Uses `echo` for output, no structured logging
- Informational messages for user feedback
- Error messages sent to stdout (not stderr)
## Comments
**When to Comment:**
- Module-level docstrings at top of file (required for all scripts)
- Usage examples in module docstrings (e.g., `pve`, `pbs`, `kuma`)
- Inline comments for complex logic (e.g., in `pbs` script parsing hex timestamps)
- Comments on tricky regex patterns (e.g., `pbs` tasks parsing)
**Bash Comments:**
- Header comment with script name, purpose, and usage (lines 1-10)
- Inline comments before major sections (e.g., `# Datastore info`, `# Storage stats`)
- No comments in simple expressions
**Python Comments:**
- Header comment with purpose (module docstring)
- Sparse inline comments except for complex sections
- Example from `telegram/bot.py` line 71: `# Telegram has 4096 char limit per message`
- Example from `pve` line 70: `# Try as container first`
## Function Design
**Size:**
- Python: Functions are generally 10-50 lines
- Smaller functions for simple operations (e.g., `is_authorized()` is 2 lines)
- Larger functions for command handlers that do setup + API calls (e.g., `status()` is 40 lines)
- Bash: Functions are typically 20-80 lines
- Longer functions acceptable for self-contained operations like `cmd_status()` in `pbs`
**Parameters:**
- Python: Explicit parameters, typically 1-5 parameters per function
- Optional parameters with defaults (e.g., `timeout: int = 30`, `port=45876`)
- Type hints not used consistently (some functions have them, many don't)
- Bash: Parameters passed as positional arguments
- Some functions take zero parameters and rely on global variables
- Example: `ssh_pbs()` in `pbs` uses global `$PBS_HOST`
**Return Values:**
- Python: Functions return data (strings, dicts, lists) or None
- Command handlers often return nothing (implicitly None)
- Helper functions return computed values (e.g., `is_authorized()` returns bool)
- Bash: Functions print output directly, return exit codes
- No explicit return values beyond exit codes
- Output captured by caller with `$()`
## Module Design
**Exports:**
- Python: All functions are module-level, no explicit exports
- `if __name__ == "__main__":` pattern used in all Python scripts to guard main execution
- Example from `beszel` lines 101-152
- Bash: All functions are script-level, called via case statement
- Main dispatch logic at bottom of script
- Example from `dns` lines 29-106: `case "$1" in ... esac`
**Async/Await (Telegram Bot Only):**
- Python telegram bot uses `asyncio` and `async def` for all handlers
- All command handlers are async (e.g., `async def start()`)
- Use `await` for async operations (e.g., `await update.message.reply_text()`)
- Example from `telegram/bot.py` lines 81-94:
```python
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle /start command - first contact with bot."""
    user = update.effective_user
    chat_id = update.effective_chat.id
    # ... async operations with await
```
**File Structure:**
- Single-file modules: Most helpers are single files
- `telegram/bot.py`: Main bot implementation with all handlers
- `~/bin/` scripts: Each script is self-contained with helper functions + main dispatch
## Data Structures
**JSON/Config Files:**
- Credentials files: Simple `KEY=value` format (no JSON)
- PBS task logging: Uses hex-encoded UPID format, parsed with regex
- Telegram bot: Saves messages to text files with timestamp prefix
- JSON output: Parsed with `python3 -c "import sys, json; ..."` in Bash scripts
**Error Response Patterns:**
- API calls: Check for `.get('status') == 'ok'` or similar
- Command execution: Check `returncode == 0`, capture stdout/stderr
- API clients: Let exceptions bubble up, caught at command handler level
## Conditionals and Flow Control
**Python:**
- if/elif/else chains for command dispatch
- Simple truthiness checks: `if not user_id:`, `if not alerts:`
- Example from `telegram/bot.py` lines 86-100: Authorization check pattern
**Bash:**
- case/esac for command dispatch (preferred)
- if [[ ]] with regex matching for parsing
- Example from `pbs` lines 122-143: Complex regex with BASH_REMATCH array
## Security Patterns
**Credential Management:**
- Credentials stored in `~/.config/<service>/credentials` with restricted permissions (not enforced in code)
- Telegram token loaded from file, not environment
- Credentials never logged or printed
**Input Validation:**
- Python: Basic validation with an `isalnum()` check in the `ping_host()` function
- Example: `if not host.replace('.', '').replace('-', '').isalnum():`
- Bash: Command names are allowlisted by the `case` statement; anything else falls through to the usage message
- No SQL injection risk (no databases used directly)
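A stricter host check could look like the sketch below. It is not taken from `telegram/bot.py`; `is_valid_host` is a hypothetical name. The motivation is that the `isalnum()` expression quoted above accepts strings such as `-c`, which `ping` could read as an option rather than a host.

```python
import ipaddress
import re

# One DNS label: 1-63 chars, alphanumeric at both ends, hyphens only inside.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_host(host: str) -> bool:
    """Return True if host is an IP address or a syntactically plain hostname."""
    if not host or len(host) > 253:
        return False
    try:
        ipaddress.ip_address(host)  # accepts IPv4 and IPv6 literals
        return True
    except ValueError:
        pass
    # Every dot-separated label must match in full; empty labels fail.
    return all(_LABEL.fullmatch(label) for label in host.split("."))
```

Because a label may not start with a hyphen, option-like input is rejected before it reaches the `ping` argument list.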
**Shell Injection:**
- Bash scripts use quoted variables appropriately
- Some inline Python in Bash uses string interpolation (potential risk)
- Example from `dns` lines 31-37: `curl ... | python3 -c "..."` with variable interpolation
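One way to remove the interpolation risk is to keep the Python source constant and pass the variable part as an argument. The sketch below is an assumption about how that could be done, not code from the `dns` script; `extract_field` and `main` are hypothetical names.

```python
import json
import sys


def extract_field(raw_json: str, dotted_path: str, default: str = "") -> str:
    """Walk a dotted path such as 'response.zones' through parsed JSON."""
    node = json.loads(raw_json)
    for key in dotted_path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return default
    # Strings are returned as-is; other values are re-serialized as JSON.
    return node if isinstance(node, str) else json.dumps(node)


def main() -> int:
    # The path arrives via sys.argv, so it is data and never program text.
    if len(sys.argv) != 2:
        print("usage: json-field <dotted.path>", file=sys.stderr)
        return 2
    print(extract_field(sys.stdin.read(), sys.argv[1]))
    return 0
```

Saved as a small script that calls `main()` under the usual `__main__` guard, this could be used from Bash as `curl -s "$URL" | json-field "$FIELD"`, with the shell variable quoted as an argument instead of being spliced into `python3 -c "..."`.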
---
*Convention analysis: 2026-02-04*


@ -0,0 +1,261 @@
# External Integrations
**Analysis Date:** 2026-02-04
## APIs & External Services
**Hypervisor Management:**
- **Proxmox VE (PVE)** - Cluster/node management
- SDK/Client: `proxmoxer` v2.2.0 (Python)
- Auth: Token-based (`root@pam!mgmt` token)
- Config: `~/.config/pve/credentials`
- Helper: `~/bin/pve` (list, status, start, stop, create-ct)
- Endpoint: https://65.108.14.165:8006 (local host core.georgsen.dk)
**Backup Management:**
- **Proxmox Backup Server (PBS)** - Centralized backup infrastructure
- API: REST over HTTPS at 10.5.0.6:8007
- Auth: Token-based (`root@pam!pve` token)
- Helper: `~/bin/pbs` (status, backups, tasks, errors, gc, snapshots, storage)
- Targets: core.georgsen.dk, pve01.warradejendomme.dk, pve02.warradejendomme.dk namespaces
- Datastore: Synology NAS via CIFS at 100.105.26.130 (Tailscale)
**DNS Management:**
- **Technitium DNS** - Internal DNS with API
- API: REST at http://10.5.0.2:5380/api/
- Auth: Username/password based
- Config: `~/.config/dns/credentials`
- Helper: `~/bin/dns` (list, records, add, delete, lookup)
- Internal zone: `lab.georgsen.dk`
- Upstream: Cloudflare (1.1.1.1), Google (8.8.8.8), Quad9 (9.9.9.9)
**Monitoring APIs:**
- **Uptime Kuma** - Status page & endpoint monitoring
- API: HTTP at 10.5.0.10:3001
- SDK/Client: `uptime-kuma-api` v1.2.1 (Python)
- Auth: Username/password login
- Config: `~/.config/uptime-kuma/credentials`
- Helper: `~/bin/kuma` (list, info, add-http, add-port, add-ping, delete, pause, resume)
- URL: https://status.georgsen.dk
- **Beszel** - Server metrics dashboard
- Backend: PocketBase REST API at 10.5.0.10:8090
- SDK/Client: `pocketbase` v0.15.0 (Python)
- Auth: Admin email/password
- Config: `~/.config/beszel/credentials`
- Helper: `~/bin/beszel` (list, status, add, delete, alerts)
- URL: https://dashboard.georgsen.dk
- Agents: core (10.5.0.254), PBS (10.5.0.6), Dockge (10.5.0.10 + Docker stats)
- Data retention: 30 days (automatic)
**Reverse Proxy & SSL:**
- **Nginx Proxy Manager (NPM)** - Reverse proxy with SSL
- API: JSON-RPC style (internal Docker API)
- Helper: `~/bin/npm-api` (--host-list, --host-create, --host-delete, --cert-list)
- Config: `~/.config/npm/npm-api.conf` (custom API wrapper)
- UI: http://10.5.0.1:81 (admin panel)
- SSL Provider: Let's Encrypt (HTTP-01 challenge)
- Access Control: NPM Access Lists (ID 1: "home_only" whitelist 83.89.248.247)
**Git/Version Control:**
- **Forgejo** - Self-hosted Git server
- API: REST at 10.5.0.14:3000/api/v1/
- Auth: API token based
- Config: `~/.config/forgejo/credentials`
- URL: https://git.georgsen.dk
- Repo: `git@10.5.0.14:mikkel/homelab.git`
- Version: v10.0.1
**Data Stores:**
- **DragonflyDB** - Redis-compatible in-memory store
- Host: 10.5.0.10 (Docker in Dockge)
- Port: 6379
- Protocol: Redis protocol
- Auth: Password protected (`<password>`)
- Client: redis-cli or any Redis library
- Usage: Session/cache storage
- **PostgreSQL** - Relational database
- Host: 10.5.0.109 (VMID 103)
- Default port: 5432
- Managed by: Community (Proxmox LXC community images)
- Usage: Sentry system and other applications
## Data Storage
**Databases:**
- **PostgreSQL 13+** (VMID 103)
- Connection: `postgresql://user@10.5.0.109:5432/dbname`
- Client: psql (CLI) or any PostgreSQL driver
- Usage: Sentry defense intelligence system, application databases
- **DragonflyDB** (Redis-compatible)
- Connection: `redis://10.5.0.10:6379` (with auth)
- Client: redis-cli or Python redis library
- Backup: Enabled in Docker config, persists to `./data/`
- **Redis** (VMID 104, deprecated in favor of DragonflyDB)
- Host: 10.5.0.111
- Status: Still active but DragonflyDB preferred
**File Storage:**
- **Local Filesystem:** Each container has ZFS subvolume storage at /
- **Shared Storage (ZFS):** `/shared/mikkel/stuff` bind-mounted into containers
- PVE: `rpool/shared/mikkel` dataset
- mgmt (102): `~/stuff` with backup=1 (included in PBS backups)
- dev (111): `~/stuff` (shared access)
- general (113): `~/stuff` (shared access)
- SMB Access: `\\mgmt\stuff` via Tailscale MagicDNS
**Backup Target:**
- **Synology NAS** (home network)
- Tailscale IP: 100.105.26.130
- Mount: `/mnt/synology` on PBS
- Protocol: CIFS/SMB 3.0
- Share: `/volume1/pbs-backup`
- UID mapping: Mapped to admin (squash: map all)
## Authentication & Identity
**Auth Providers:**
- **Proxmox PAM** - System-based authentication for PVE/PBS
- Users: root@pam, other system users
- Token auth: `root@pam!mgmt` (PVE), `root@pam!pve` (PBS)
**SSH Key Authentication:**
- **Ed25519 keys** for user access
- Key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIOQrK06zVkfY6C1ec69kEZYjf8tC98icCcBju4V751i mikkel@georgsen.dk`
- Deployed to all containers at `~/.ssh/authorized_keys` and `/root/.ssh/authorized_keys`
**Telegram Bot Authentication:**
- **Telegram Bot Token** - Stored in `~/telegram/credentials`
- **Authorized Users:** Whitelist stored in `~/telegram/authorized_users` (chat IDs)
- **First user:** Auto-authorized on first `/start` command
- **Two-way messaging:** Text/photos/files saved to `~/telegram/inbox`
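The allowlist behaviour described above (one chat ID per line, first user auto-authorized) can be sketched as follows. The function names and signatures are illustrative assumptions; the real `is_authorized()` in `telegram/bot.py` is only known to be two lines long.

```python
from pathlib import Path


def load_authorized(path: Path) -> set[int]:
    """Read one chat ID per line; a missing file means nobody is authorized yet."""
    if not path.exists():
        return set()
    ids = set()
    for line in path.read_text().splitlines():
        try:
            ids.add(int(line.strip()))  # int() also accepts negative group IDs
        except ValueError:
            continue  # skip blank or malformed lines
    return ids


def is_authorized(chat_id: int, path: Path) -> bool:
    return chat_id in load_authorized(path)


def authorize_first_user(chat_id: int, path: Path) -> bool:
    """Add chat_id only while the allowlist is still empty. Returns True if added."""
    if load_authorized(path):
        return False
    path.write_text(f"{chat_id}\n")
    return True
```

Auto-authorizing the first `/start` is convenient for setup, but it means whoever reaches the bot first becomes the admin, so the allowlist file is worth checking right after deployment.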
## Monitoring & Observability
**Error Tracking:**
- **Sentry** (custom defense intelligence system, VMID 105)
- Purpose: Monitor military contracting opportunities
- Databases: PostgreSQL (103) + Redis (104)
- Not a traditional error tracker - custom business intelligence system
**Metrics & Monitoring:**
- **Beszel**: Server CPU, RAM, disk usage metrics
- **Uptime Kuma**: HTTP, TCP port, ICMP ping monitoring
- **PBS**: Backup task logs, storage metrics, dedup stats
**Logs:**
- **PBS logs:** SSH queries via `~/bin/pbs`, stored on PBS container
- **Forgejo logs:** `/var/lib/forgejo/log/forgejo.log` (for fail2ban)
- **Telegram bot logs:** stdout to systemd service `telegram-bot.service`
- **Helper scripts:** Output to stdout, can be piped/redirected
## CI/CD & Deployment
**Hosting:**
- **Hetzner** (public cloud) - Primary: core.georgsen.dk (AX52)
- **Home Infrastructure** - Synology NAS for backups, future NUC cluster
- **Docker/Dockge** - Application deployment via Docker Compose (10.5.0.10)
**CI Pipeline:**
- **None detected** - Manual deployment via Dockge or container management
- **Version control:** Forgejo (self-hosted Git server)
- **Update checks:** `~/bin/updates` script checks for updates across services
- Tracked: dragonfly, beszel, uptime-kuma, snappymail, dockge, npm, forgejo, dns, pbs
**Deployment Tools:**
- **Dockge** - Docker Compose UI for stack management
- **PVE API** - Proxmox VE for container/VM provisioning
- **Helper scripts** - `~/bin/pve create-ct` for automated container creation
## Environment Configuration
**Required Environment Variables (in credential files):**
DNS (`~/.config/dns/credentials`):
```
DNS_HOST=10.5.0.2
DNS_PORT=5380
DNS_USER=admin
DNS_PASS=<password>
```
Proxmox (`~/.config/pve/credentials`):
```
host=65.108.14.165:8006
user=root@pam
token_name=mgmt
token_value=<token>
```
Uptime Kuma (`~/.config/uptime-kuma/credentials`):
```
KUMA_HOST=10.5.0.10
KUMA_PORT=3001
KUMA_USER=admin
KUMA_PASS=<password>
```
Beszel (`~/.config/beszel/credentials`):
```
BESZEL_HOST=10.5.0.10
BESZEL_PORT=8090
BESZEL_USER=admin@example.com
BESZEL_PASS=<password>
```
Telegram (`~/telegram/credentials`):
```
TELEGRAM_BOT_TOKEN=<token>
```
## Webhooks & Callbacks
**Incoming Webhooks:**
- **Uptime Kuma** - No webhook ingestion detected
- **PBS** - Backup completion tasks (internal scheduling, no external webhooks)
- **Forgejo** - No webhook configuration documented
**Outgoing Notifications:**
- **Telegram Bot** - Two-way messaging for homelab status
- Commands: /status, /pbs, /backups, /beszel, /kuma, /ping
- File uploads: Photos saved to `~/telegram/images/`, documents to `~/telegram/files/`
- Text inbox: Messages saved to `~/telegram/inbox` for Claude review
**Event-Driven:**
- **PBS Scheduling** - Daily backup tasks at 01:00, 01:30, 02:00 (core, pve01, pve02)
- **Prune/GC** - Scheduled at 21:00 (prune) and 22:30 (garbage collection)
## VPN & Remote Access
**Tailscale Network:**
- **Primary relay:** 10.5.0.134 + 10.9.1.10 (VMID 1000, exit node capable)
- **Tailscale IPs:**
- PBS: 100.115.85.120
- Synology NAS: 100.105.26.130
- dev: 100.85.227.17
- sentry: 100.83.236.113
- Friends' nodes: pve01 (100.99.118.54), pve02 (100.82.87.108)
- Other devices: mge-t14, mikflix, xanderryzen, nvr01, tailscalemg
**SSH Access Pattern:**
- All containers/VMs accessible via SSH from mgmt (102)
- SSH keys pre-deployed to all systems
- Tailscale used for accessing from external networks
## External DNS
**DNS Provider:** dns.services (Danish free DNS with API)
- Domains managed:
- georgsen.dk
- dataloes.dk
- microsux.dk
- warradejendomme.dk
- Used for external domain registration only
- Internal zone lookups go to Technitium (10.5.0.2)
---
*Integration audit: 2026-02-04*

.planning/codebase/STACK.md Normal file

@ -0,0 +1,152 @@
# Technology Stack
**Analysis Date:** 2026-02-04
## Languages
**Primary:**
- **Bash** - Infrastructure automation, API wrappers, system integration
- Helper scripts at `~/bin/` for service APIs
- Installation and setup in `pve-homelab-kit/install.sh`
- **Python 3.12.3** - Management tools, monitoring, bot automation
- Virtual environment: `~/venv/` (activated with `source ~/venv/bin/activate`)
- Primary usage: API clients, Telegram bot, helper scripts
## Runtime
**Environment:**
- **Python 3.12.3** (system)
- **Bash 5+** (system shell)
**Package Manager:**
- **pip** v24.0 (Python package manager)
- Lockfile: None; packages are installed directly into the virtual environment at `~/venv/`
## Frameworks
**Core Infrastructure:**
- **Proxmox VE** (v8.x) - Hypervisor/container platform on core.georgsen.dk
- **Proxmox Backup Server (PBS)** v2.x - Backup infrastructure (10.5.0.6:8007)
- **LXC Containers** - Primary virtualization method
- **KVM VMs** - Full VMs when needed (mail server VM 200)
- **Docker/Docker Compose** - Application deployment via Dockge (10.5.0.10)
**Application Frameworks:**
- **Nginx Proxy Manager (NPM)** v2.x - Reverse proxy, SSL (10.5.0.1:80/443/81)
- **Dockge** - Docker Compose stack management UI (10.5.0.10:5001)
- **Forgejo** v10.0.1 - Self-hosted Git server (10.5.0.14:3000)
- **Technitium DNS** - DNS server with API (10.5.0.2:5380)
**Monitoring & Observability:**
- **Uptime Kuma** - Service/endpoint monitoring (10.5.0.10:3001)
- **Beszel** - Server metrics dashboard (10.5.0.10:8090)
**Messaging:**
- **Stalwart Mail Server** - Mail server (VM 200, IP 65.108.14.164)
- **Snappymail** - Webmail UI (djmaze/snappymail:latest, 10.5.0.10:8888)
**Data Storage:**
- **DragonflyDB** - Redis-compatible in-memory datastore (10.5.0.10:6379)
- Password protected, used for session/cache storage
- **PostgreSQL 13+** (VMID 103, 10.5.0.109) - Community managed database
- **Redis** (VMID 104, 10.5.0.111) - Legacy session/cache store, deprecated in favor of DragonflyDB
## Key Dependencies
**Python Packages (in ~/venv/):**
**Proxmox API:**
- `proxmoxer` v2.2.0 - Python API client for Proxmox VE
- File: `~/bin/pve` (list, status, start, stop, create-ct operations)
**Monitoring APIs:**
- `uptime-kuma-api` v1.2.1 - Uptime Kuma monitoring client
- File: `~/bin/kuma` (monitor management)
- `pocketbase` v0.15.0 - Beszel dashboard backend client
- File: `~/bin/beszel` (system monitoring)
**Communications:**
- `python-telegram-bot` v22.5 - Telegram Bot API
- File: `~/telegram/bot.py` (homelab management bot)
**HTTP Clients:**
- `requests` v2.32.5 - HTTP library for API calls
- `httpx` v0.28.1 - Async HTTP client
- `urllib3` v2.6.3 - Low-level HTTP client
**Networking & WebSockets:**
- `websocket-client` v1.9.0 - WebSocket client library
- `python-socketio` v5.16.0 - Socket.IO client
- `simple-websocket` v1.1.0 - WebSocket utilities
**Utilities:**
- `certifi` v2026.1.4 - SSL certificate verification
- `charset-normalizer` v3.4.4 - Character encoding detection
- `packaging` v25.0 - Version/requirement parsing
## Configuration
**Environment:**
- **Bash scripts:** Load credentials from `~/.config/{service}/credentials` files
- `~/.config/pve/credentials` - Proxmox API token
- `~/.config/dns/credentials` - Technitium DNS API
- `~/.config/beszel/credentials` - Beszel dashboard API
- `~/.config/uptime-kuma/credentials` - Uptime Kuma API
- `~/.config/forgejo/credentials` - Forgejo Git API
- **Python scripts:** Similar credential loading pattern
- **Telegram bot:** `~/telegram/credentials` file with `TELEGRAM_BOT_TOKEN`
**Build & Runtime Configuration:**
- Python venv activation: `source ~/venv/bin/activate`
- Helper scripts use shebang: `#!/home/mikkel/venv/bin/python3` or `#!/bin/bash`
- All scripts in `~/bin/` are executable and PATH-accessible
**Documentation:**
- `CLAUDE.md` - Development environment guidance
- `homelab-documentation.md` - Infrastructure reference (22KB, comprehensive)
- `README.md` - Quick container/service overview
- `TODO.md` - Pending maintenance tasks
## Platform Requirements
**Development/Management:**
- **Container:** LXC on Proxmox VE (VMID 102, "mgmt")
- **OS:** Debian-based Linux (venv requires Linux filesystem)
- **User:** mikkel (UID 1000, group georgsen GID 1000)
- **SSH:** Pre-installed keys for accessing other containers/VMs
- **Network:** Tailscale VPN for external access, internal vmbr1 (10.5.0.0/24)
**Production (Core Server):**
- **Provider:** Hetzner AX52 (Helsinki)
- **CPU:** AMD Ryzen 7 3700X
- **RAM:** 64GB ECC
- **Storage:** 2x 1TB NVMe (RAID0 via ZFS)
- **Public IP:** 65.108.14.165/26 (BGP routed)
- **Network bridges:** vmbr0 (public), vmbr1 (internal), vmbr2 (vSwitch)
**Backup Target:**
- **Synology NAS** (home network via Tailscale)
- **Protocol:** CIFS/SMB 3.0 over Tailscale
- **Mount point on PBS:** `/mnt/synology` (bind-mounted as datastore)
## Deployment & Access
**Service URLs:**
- **Proxmox Web UI:** https://65.108.14.165:8006 (public, home IP whitelisted)
- **NPM Admin:** http://10.5.0.1:81 (internal only)
- **DNS Admin:** https://dns.georgsen.dk (home IP whitelisted via access list)
- **PBS Web UI:** https://pbs.georgsen.dk:8007 (home IP whitelisted)
- **Dockge Admin:** https://dockge.georgsen.dk:5001 (home IP whitelisted)
- **Forgejo:** https://git.georgsen.dk (public)
- **Status Page:** https://status.georgsen.dk (Uptime Kuma)
- **Dashboard:** https://dashboard.georgsen.dk (Beszel metrics)
**SSL Certificates:**
- **Provider:** Let's Encrypt via NPM
- **Challenge method:** HTTP-01
- **Auto-renewal:** Handled by NPM
---
*Stack analysis: 2026-02-04*


@ -0,0 +1,228 @@
# Codebase Structure
**Analysis Date:** 2026-02-04
## Directory Layout
```
/home/mikkel/homelab/
├── .planning/ # Planning and analysis artifacts
│ └── codebase/ # Codebase documentation (ARCHITECTURE.md, STRUCTURE.md, etc.)
├── .git/ # Git repository metadata
├── telegram/ # Telegram bot and message storage
│ ├── bot.py # Main bot implementation
│ ├── credentials # Telegram bot token (env var: TELEGRAM_BOT_TOKEN)
│ ├── authorized_users # Allowlist of chat IDs (one per line)
│ ├── inbox # Messages from admin (appended on each message)
│ ├── images/ # Photos sent via Telegram (timestamped)
│ └── files/ # Files sent via Telegram (timestamped)
├── pve-homelab-kit/ # PVE installation kit (subproject)
│ ├── install.sh # Installation script
│ ├── PROMPT.md # Project context for Claude
│ ├── .planning/ # Subproject planning docs
│ └── README.md # Setup instructions
├── npm/ # Nginx Proxy Manager configuration
│ └── npm-api.conf # API credentials reference
├── dockge/ # Docker Compose Manager configuration
│ └── credentials # Dockge API access
├── dns/ # Technitium DNS configuration
│ └── credentials # DNS API credentials (env vars: DNS_HOST, DNS_PORT, DNS_USER, DNS_PASS)
├── dns-services/ # DNS services configuration
│ └── credentials # Alternative DNS credentials
├── pve/ # Proxmox VE configuration
│ └── credentials # PVE API credentials (env vars: host, user, token_name, token_value)
├── beszel/ # Beszel monitoring dashboard
│ ├── credentials # Beszel API credentials
│ └── README.md # API and agent setup guide
├── forgejo/ # Forgejo Git server configuration
│ └── credentials # Forgejo API access
├── uptime-kuma/ # Uptime Kuma monitoring
│ ├── credentials # Kuma API credentials (env vars: KUMA_HOST, KUMA_PORT, KUMA_API_KEY)
│ ├── README.md # REST API reference and Socket.IO documentation
│ └── kuma_api_doc.png # Full API documentation screenshot
├── README.md # Repository overview and service table
├── CLAUDE.md # Claude Code guidance and infrastructure quick reference
├── homelab-documentation.md # Authoritative infrastructure documentation
├── TODO.md # Pending maintenance tasks
└── .gitignore # Git ignore patterns (credentials, sensitive files)
```
## Directory Purposes
**telegram/:**
- Purpose: Two-way Telegram bot for management commands and admin notifications
- Contains: Python bot code, token credentials, authorized user allowlist, message inbox, uploaded media
- Key files: `bot.py` (407 lines), `credentials`, `authorized_users`, `inbox`
- Not committed: `credentials`, `inbox`, `images/*`, `files/*` (in `.gitignore`)
**pve-homelab-kit/:**
- Purpose: Standalone PVE installation and initial setup toolkit
- Contains: Installation script, configuration examples, planning documents
- Key files: `install.sh` (executable automation), `PROMPT.md` (context for Claude), subproject `.planning/`
- Notes: Separate git repository (submodule or independent), for initial PVE deployment
**npm/:**
- Purpose: Nginx Proxy Manager reverse proxy configuration
- Contains: API credentials reference
- Key files: `npm-api.conf`
**dns/ & dns-services/:**
- Purpose: Technitium DNS server configuration (dual credential sets)
- Contains: API authentication credentials
- Key files: `credentials` (host, port, user, password)
**pve/:**
- Purpose: Proxmox VE API access credentials
- Contains: Token-based authentication data
- Key files: `credentials` (host, user, token_name, token_value)
**dockge/, forgejo/, beszel/, uptime-kuma/:**
- Purpose: Service-specific API credentials and documentation
- Contains: Token/API key for each service
- Key files: `credentials`, service-specific `README.md` (beszel, uptime-kuma)
**homelab-documentation.md:**
- Purpose: Authoritative reference for all infrastructure details
- Contains: Network topology, VM/container registry, service mappings, security rules, firewall config
- Must be updated whenever: services added/removed, IPs changed, configurations modified
**CLAUDE.md:**
- Purpose: Claude Code (AI assistant) guidance and quick reference
- Contains: Environment setup, helper script signatures, API access patterns, security notes
- Auto-loaded by Claude when working in this repository
**.planning/codebase/:**
- Purpose: GSD codebase analysis artifacts
- Will contain: ARCHITECTURE.md, STRUCTURE.md, CONVENTIONS.md, TESTING.md, STACK.md, INTEGRATIONS.md, CONCERNS.md
- Generated by: GSD codebase mapper, consumed by GSD planner/executor
## Key File Locations
**Entry Points:**
- `telegram/bot.py`: Telegram bot entry point (asyncio-based)
- `pve-homelab-kit/install.sh`: Initial PVE setup entry point
**Configuration:**
- `homelab-documentation.md`: Infrastructure reference (IPs, ports, network topology, firewall rules)
- `CLAUDE.md`: Claude Code environment setup and quick reference
- `.planning/`: Planning and analysis artifacts
**Core Logic:**
- `~/bin/pve`: Proxmox VE API wrapper (Python, 200 lines)
- `~/bin/dns`: Technitium DNS API wrapper (Bash, 107 lines)
- `~/bin/pbs`: PBS backup status and management (Bash, 400+ lines)
- `~/bin/beszel`: Beszel monitoring dashboard API (Python, 137 lines)
- `~/bin/kuma`: Uptime Kuma monitor management (Python, 144 lines)
- `~/bin/updates`: Service version checking and updates (Bash, 450+ lines)
- `~/bin/telegram`: CLI helper for Telegram bot control (2-way messaging)
- `~/bin/npm-api`: NPM reverse proxy management (wrapper script)
- `telegram/bot.py`: Telegram bot with command handlers and media management
**Testing:**
- Not applicable (no automated tests in this repository)
## Naming Conventions
**Files:**
- Lowercase with hyphens for multi-word names: `npm-api`, `uptime-kuma`, `pve-homelab-kit`
- Markdown documentation: UPPERCASE.md for top-level files (`README.md`, `CLAUDE.md`, `TODO.md`); `homelab-documentation.md` is a lowercase exception
- Configuration/credential files: lowercase `credentials` with optional zone prefix
**Directories:**
- Service-specific: lowercase, match service name (`npm`, `dns`, `dockge`, `forgejo`, `beszel`, `telegram`)
- Functional: category name (`pve`, `pve-homelab-kit`)
- Hidden: `.planning`, `.git` for system metadata
**Variables & Parameters:**
- Environment variables: UPPERCASE_WITH_UNDERSCORES (e.g., `TELEGRAM_BOT_TOKEN`, `DNS_HOST`, `KUMA_API_KEY`)
- Bash functions: lowercase_with_underscores (e.g., `get_token()`, `run_command()`, `ssh_pbs()`)
- Python functions: lowercase_with_underscores (e.g., `is_authorized()`, `run_command()`, `get_status()`)
## Where to Add New Code
**New Helper Script (CLI tool):**
- Primary code: `~/bin/{service_name}` (no extension, executable)
- Credentials: `~/.config/{service_name}/credentials`
- Documentation: Top-of-file comment with usage examples
- Language: Bash for shell commands/APIs, Python for complex logic (use Python venv)
**New Service Configuration:**
- Directory: `/home/mikkel/homelab/{service_name}/`
- Credentials file: `{service_name}/credentials`
- Documentation: `{service_name}/README.md` (include API examples and setup)
- Git handling: All credentials in `.gitignore`, document as `credentials.example` if needed
**New Telegram Bot Command:**
- File: `telegram/bot.py` (add function to existing handlers section)
- Pattern: Async function named after the command (e.g., `async def status()`); check authorization first with `is_authorized()`
- Result: Send back via `update.message.reply_text()`
- Timeout: Default 30 seconds (configurable via `run_command()`)
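The steps above can be sketched as a handler for a hypothetical `/uptime` command. Everything here other than `update.effective_chat.id` and `update.message.reply_text()` is an assumption: the allowlist, the stubbed `run_command()`, and the choice to ignore unauthorized chats silently.

```python
import asyncio

AUTHORIZED = {12345}  # hypothetical allowlist, only for this sketch


def is_authorized(chat_id: int) -> bool:
    return chat_id in AUTHORIZED


def run_command(cmd: list, timeout: int = 30) -> str:
    """Placeholder for the bot's helper; returns canned text here."""
    return f"(output of {' '.join(cmd)})"


async def uptime(update, context):
    """Handle a hypothetical /uptime command."""
    # 1. Authorization first: unknown chats get no reply at all.
    if not is_authorized(update.effective_chat.id):
        return
    # 2. run_command() blocks, so run it in a worker thread to keep
    #    the event loop free for other updates.
    output = await asyncio.to_thread(run_command, ["uptime"], 10)
    # 3. Send the result back to the chat.
    await update.message.reply_text(output)
```

With python-telegram-bot, a handler like this would typically be registered through `application.add_handler(CommandHandler("uptime", uptime))`.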
**New Documentation:**
- Infrastructure changes: Update `homelab-documentation.md` (IPs, service registry, network config)
- Claude Code guidance: Update `CLAUDE.md` (new helper scripts, environment setup)
- Service-specific: Create `{service_name}/README.md` with API examples and access patterns
**Shared Utilities:**
- Location: Create in `~/lib/` or `~/venv/lib/` for Python packages
- Access: Import in other scripts or source in Bash
## Special Directories
**.planning/codebase/:**
- Purpose: GSD analysis artifacts
- Generated: Yes (by GSD codebase mapper)
- Committed: Yes (part of repository for reference)
**telegram/images/ & telegram/files/:**
- Purpose: Media uploaded via Telegram bot
- Generated: Yes (bot downloads on receipt)
- Committed: No (in `.gitignore`)
**telegram/inbox:**
- Purpose: Admin messages to Claude
- Generated: Yes (bot appends messages)
- Committed: No (in `.gitignore`)
**.git/**
- Purpose: Git repository metadata
- Generated: Yes (by git)
- Committed: No (system directory)
**pve-homelab-kit/.planning/**
- Purpose: Subproject planning documents
- Generated: Yes (by GSD mapper on subproject)
- Committed: Yes (tracked in subproject)
## Credential File Organization
All credentials stored in `~/.config/{service}/credentials` using key=value format (one per line):
```bash
# ~/.config/pve/credentials
host=core.georgsen.dk
user=root@pam
token_name=mgmt
token_value=<token-uuid>
# ~/.config/dns/credentials
DNS_HOST=10.5.0.2
DNS_PORT=5380
DNS_USER=admin
DNS_PASS=<password>
# ~/.config/beszel/credentials
BESZEL_HOST=10.5.0.10
BESZEL_PORT=8090
BESZEL_USER=<email>
BESZEL_PASS=<password>
```
**Loading Pattern:**
- Bash: `source ~/.config/{service}/credentials` or inline `$(grep '^KEY=' ~/.config/{service}/credentials | cut -d= -f2-)`
- Python: Read file, parse `key=value` lines into dict
- Never hardcode credentials in scripts
---
*Structure analysis: 2026-02-04*


@ -0,0 +1,324 @@
# Testing Patterns
**Analysis Date:** 2026-02-04
## Test Framework
**Current State:**
- **No automated testing detected** in this codebase
- No test files found (no `*.test.py`, `*_test.py`, `*.spec.py` files)
- No testing configuration files (no `pytest.ini`, `tox.ini`, `setup.cfg`)
- No test dependencies in requirements (no pytest, unittest, mock imports)
**Implications:**
This is a **scripts-only codebase** - all code consists of CLI helper scripts and one bot automation. Manual testing is the primary validation method.
## Script Testing Approach
Since this codebase consists entirely of helper scripts and automation, testing is manual and implicit:
**Command-Line Validation:**
- Each script has a usage/help message showing all commands
- Example from `pve`:
```python
if len(sys.argv) < 2:
    print(__doc__)
    sys.exit(1)
```
- Example from `telegram`:
```bash
case "${1:-}" in
    send) cmd_send "$2" ;;
    inbox) cmd_inbox ;;
    *) usage; exit 1 ;;
esac
```
**Entry Point Testing:**
Main execution guards are used throughout:
```python
if __name__ == "__main__":
    main()
```
This allows scripts to be imported (theoretically) without side effects, though in practice they are not used as modules.
## API Integration Testing
**Pattern: Try-Except Fallback:**
Many scripts handle multiple service types by trying different approaches:
From `pve` script (lines 55-85):
```python
def get_status(vmid):
    """Get detailed status of a VM/container."""
    vmid = int(vmid)
    # Try as container first
    try:
        status = pve.nodes(NODE).lxc(vmid).status.current.get()
        # ... container-specific logic
        return
    except:
        pass
    # Try as VM
    try:
        status = pve.nodes(NODE).qemu(vmid).status.current.get()
        # ... VM-specific logic
        return
    except:
        pass
    print(f"VMID {vmid} not found")
```
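This is a pragmatic fallback rather than a test: if the container lookup fails, the VM lookup is tried. The bare `except:` clauses are the fragile part, because they swallow every error (authentication failures, network timeouts, even `KeyboardInterrupt`), so an unreachable API is reported as "VMID not found".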
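A narrower version of the same fallback could catch only a "not found" error and let everything else propagate. The sketch below uses a stand-in exception and injected lookup functions so it runs without a Proxmox server; `NotFoundError` and `find_guest` are hypothetical names, not part of `pve` or `proxmoxer`.

```python
class NotFoundError(Exception):
    """Stand-in for the API client's 'no such guest' error (hypothetical)."""


def find_guest(vmid, lookups):
    """Try each (kind, fetch) pair in order; return (kind, status) for the first hit.

    Only NotFoundError means "try the next kind". Any other exception
    (auth failure, timeout, bug) propagates instead of being reported
    as a missing VMID.
    """
    vmid = int(vmid)
    for kind, fetch in lookups:
        try:
            return kind, fetch(vmid)
        except NotFoundError:
            continue
    kinds = " or ".join(kind for kind, _ in lookups)
    raise NotFoundError(f"VMID {vmid} not found as {kinds}")
```

In the `pve` script, the `fetch` callables would wrap the `lxc(vmid)` and `qemu(vmid)` status calls shown above; which exception the client actually raises for an unknown VMID should be checked against the library before relying on it.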
## Command Dispatch Testing
**Pattern: Argument Validation:**
All scripts validate argument count before executing commands:
From `beszel` script (lines 101-124):
```python
if __name__ == "__main__":
    if len(sys.argv) < 2:
        usage()
    cmd = sys.argv[1]
    try:
        if cmd == "list":
            cmd_list()
        elif cmd == "info" and len(sys.argv) == 3:
            cmd_info(sys.argv[2])
        elif cmd == "add" and len(sys.argv) >= 4:
            # ...
        else:
            usage()
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)
```
This catches typos in command names and wrong argument counts, showing usage help.
## Data Processing Testing
**Bash String Parsing:**
Complex regex patterns used in `pbs` script require careful testing:
From `pbs` (lines 122-143):
```bash
ssh_pbs 'tail -500 /var/log/proxmox-backup/tasks/archive 2>/dev/null' | while IFS= read -r line; do
    if [[ "$line" =~ UPID:pbs:[^:]+:[^:]+:[^:]+:([0-9A-Fa-f]+):([^:]+):([^:]+):.*\ [0-9A-Fa-f]+\ (OK|ERROR|WARNINGS[^$]*) ]]; then
        task_time=$((16#${BASH_REMATCH[1]}))
        task_type="${BASH_REMATCH[2]}"
        task_target="${BASH_REMATCH[3]}"
        status="${BASH_REMATCH[4]}"
        # ... process matched groups
    fi
done
```
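For automated checks, the same pattern can be exercised offline with a Python translation. This is a sketch, not code from `pbs`: `parse_task_line` is a hypothetical name, and the Bash `WARNINGS[^$]*` tail is approximated with `WARNINGS.*`.

```python
import re
from datetime import datetime, timezone

# Same capture groups as the Bash regex: hex start time, task type,
# task target, and final status.
TASK_RE = re.compile(
    r"UPID:pbs:[^:]+:[^:]+:[^:]+:([0-9A-Fa-f]+):([^:]+):([^:]+):"
    r".* [0-9A-Fa-f]+ (OK|ERROR|WARNINGS.*)"
)


def parse_task_line(line: str):
    """Return a dict describing one archive line, or None if it does not match."""
    m = TASK_RE.search(line)
    if m is None:
        return None
    # The start time is a hex-encoded Unix timestamp (16#... in Bash).
    started = datetime.fromtimestamp(int(m.group(1), 16), tz=timezone.utc)
    return {
        "started": started,
        "type": m.group(2),
        "target": m.group(3),
        "status": m.group(4),
    }
```

Feeding it hand-written lines shaped like the archive entries (and a few deliberately malformed ones) gives regression coverage for the hex conversion and status extraction without an SSH connection to PBS.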
**Manual Testing Approach:**
- Run command against live services
- Inspect output format visually
- Verify JSON parsing with inline Python:
```bash
echo "$gc_json" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('disk-bytes',0))"
```
## Mock Testing Pattern (Telegram Bot)
The Telegram bot does not mock anything itself, but it funnels external commands through `run_command()`, which gives tests a single place to patch:
From `telegram/bot.py` (lines 60-78):
```python
def run_command(cmd: list, timeout: int = 30) -> str:
    """Run a shell command and return output."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            env={**os.environ, 'PATH': f"/home/mikkel/bin:{os.environ.get('PATH', '')}"}
        )
        output = result.stdout or result.stderr or "No output"
        # Telegram has 4096 char limit per message
        if len(output) > 4000:
            output = output[:4000] + "\n... (truncated)"
        return output
    except subprocess.TimeoutExpired:
        return "Command timed out"
    except Exception as e:
        return f"Error: {e}"
```
This function:
- Runs external commands with timeout protection
- Handles both stdout and stderr
- Truncates output for Telegram's message size limits
- Returns error messages instead of raising exceptions
Handlers that go through this function can be tested by patching `run_command()` (or `subprocess.run()`) to return canned output instead of invoking the real helper scripts.
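A minimal sketch of such a test, using only the standard library, is shown below. The `run_command()` here is a trimmed copy included so the example is self-contained; the test function names are hypothetical, and no such tests exist in the repository.

```python
import subprocess
from unittest import mock


def run_command(cmd, timeout=30):
    """Trimmed copy of the bot helper (no PATH handling), for illustration only."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        output = result.stdout or result.stderr or "No output"
        if len(output) > 4000:
            output = output[:4000] + "\n... (truncated)"
        return output
    except subprocess.TimeoutExpired:
        return "Command timed out"


def test_truncates_long_output():
    # Pretend the helper script printed 5000 characters.
    fake = subprocess.CompletedProcess(args=["pbs"], returncode=0,
                                       stdout="x" * 5000, stderr="")
    with mock.patch("subprocess.run", return_value=fake):
        out = run_command(["pbs", "status"])
    assert out.endswith("... (truncated)")
    assert len(out) == 4000 + len("\n... (truncated)")


def test_reports_timeout():
    # Simulate subprocess.run() hitting its timeout.
    boom = subprocess.TimeoutExpired(cmd=["pbs"], timeout=1)
    with mock.patch("subprocess.run", side_effect=boom):
        assert run_command(["pbs", "status"], timeout=1) == "Command timed out"
```

Both functions follow the `test_*` naming that pytest collects by default, so they would run unchanged if a test suite were added later.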
## Timeout Testing
The telegram bot handles timeouts explicitly:
From `telegram/bot.py`:
```python
result = subprocess.run(
    ["ping", "-c", "3", "-W", "2", host],
    capture_output=True,
    text=True,
    timeout=10  # 10 second timeout
)
```
Different commands have different timeouts:
- `ping_host()`: 10 second timeout
- `run_command()`: 30 second default (configurable)
- `backups()`: 60 second timeout (passed to run_command)
This prevents the bot from hanging on slow/unresponsive services.
## Error Message Testing
Scripts validate successful API responses:
From `dns` script (lines 62-69):
```bash
curl -s "$BASE/zones/records/add?..." | python3 -c "
import sys, json
data = json.load(sys.stdin)
if data['status'] == 'ok':
    print(f\"Added: {data['response']['addedRecord']['name']} -> ...\")
else:
    print(f\"Error: {data.get('errorMessage', 'Unknown error')}\")
"
```
This pattern:
- Parses JSON response
- Checks status field
- Returns user-friendly error message on failure
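The same three steps can be pulled into a small function that is testable with canned responses. This is a sketch under assumptions: `summarize_add_response` is a hypothetical name, and the field names (`status`, `response.addedRecord.name`, `errorMessage`) are taken from the snippet above rather than from the Technitium API reference.

```python
import json


def summarize_add_response(raw: str) -> tuple[bool, str]:
    """Turn a raw API response body into (success, message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # An empty body (for example when curl -s could not connect) lands here.
        return False, f"Error: invalid JSON from DNS API ({exc.msg})"
    if not isinstance(data, dict):
        return False, "Error: unexpected response shape"
    if data.get("status") == "ok":
        name = data.get("response", {}).get("addedRecord", {}).get("name", "?")
        return True, f"Added: {name}"
    return False, f"Error: {data.get('errorMessage', 'Unknown error')}"
```

Returning a success flag alongside the message lets a caller set a non-zero exit code on failure, which the inline version above does not do.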
## Credential Testing
Scripts assume credentials exist and are properly formatted:
From `pve` (lines 17-34):
```python
creds_path = Path.home() / ".config" / "pve" / "credentials"
creds = {}
with open(creds_path) as f:
    for line in f:
        if "=" in line:
            key, value = line.strip().split("=", 1)
            creds[key] = value
pve = ProxmoxAPI(
    creds["host"],
    user=creds["user"],
    token_name=creds["token_name"],
    token_value=creds["token_value"],
    verify_ssl=False
)
```
**Missing Error Handling:**
- No check that credentials file exists
- No check that required keys are present
- No validation that API connection succeeds
- Crashes with `FileNotFoundError` if the file is missing, or with `KeyError` if a required key is absent
**Recommendation for Testing:**
Add pre-flight validation:
```python
import sys

# Fail fast with a readable message instead of a KeyError later on.
required_keys = ["host", "user", "token_name", "token_value"]
missing = [k for k in required_keys if k not in creds]
if missing:
    print(f"Error: Missing credentials: {', '.join(missing)}")
    sys.exit(1)
```
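To make this validation testable, the loading and checking could live in one function that raises instead of exiting. This is a sketch under assumptions: `load_credentials` and `CredentialsError` are hypothetical names, and only the `key=value` file format and the four required keys come from the `pve` excerpt.

```python
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("host", "user", "token_name", "token_value")


class CredentialsError(Exception):
    """Raised when the credentials file is missing or incomplete."""


def load_credentials(path):
    """Parse a key=value credentials file and verify the required keys."""
    path = Path(path)
    if not path.is_file():
        raise CredentialsError(f"Credentials file not found: {path}")
    creds = {}
    for line in path.read_text().splitlines():
        if "=" in line:
            key, value = line.strip().split("=", 1)
            creds[key] = value
    missing = [k for k in REQUIRED_KEYS if k not in creds]
    if missing:
        raise CredentialsError(f"Missing credentials: {', '.join(missing)}")
    return creds


def test_missing_key_is_reported():
    with tempfile.TemporaryDirectory() as tmp:
        partial = Path(tmp) / "credentials"
        partial.write_text("host=pve.example\nuser=root@pam\n")
        try:
            load_credentials(partial)
        except CredentialsError as e:
            assert str(e) == "Missing credentials: token_name, token_value"
        else:
            raise AssertionError("expected CredentialsError")
```

A script's entry point can then catch `CredentialsError`, print the message, and call `sys.exit(1)`, which keeps the exit behaviour out of the code under test.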
## File I/O Testing
Telegram bot handles file operations defensively:
From `telegram/bot.py` (lines 277-286):
```python
# Create images directory
images_dir = Path(__file__).parent / 'images'
images_dir.mkdir(exist_ok=True)
# Get the largest photo (best quality)
photo = update.message.photo[-1]
file = await context.bot.get_file(photo.file_id)
# Download the image
filename = f"{file_timestamp}.jpg"
filepath = images_dir / filename
await file.download_to_drive(filepath)
```
**Patterns:**
- `mkdir(exist_ok=True)`: Safely creates directory, doesn't error if exists
- Timestamp-based filenames to avoid collisions: `f"{file_timestamp}.jpg"` for photos in the excerpt above
- Pathlib for cross-platform path handling
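The directory and filename handling can be checked against a temporary directory without involving Telegram at all. In this sketch, `prepare_image_path` is a hypothetical helper that isolates the path logic from the excerpt, and the timestamp format in the test is an assumption.

```python
import tempfile
from pathlib import Path


def prepare_image_path(base_dir, file_timestamp):
    """Hypothetical helper: ensure images/ exists and build the target path."""
    images_dir = Path(base_dir) / "images"
    images_dir.mkdir(exist_ok=True)
    return images_dir / f"{file_timestamp}.jpg"


def test_prepare_image_path_is_idempotent():
    with tempfile.TemporaryDirectory() as tmp:
        first = prepare_image_path(tmp, "20260204_135003")
        # Second call must not fail even though images/ already exists.
        second = prepare_image_path(tmp, "20260204_135003")
        assert first == second
        assert first.parent.is_dir()
        assert first.name == "20260204_135003.jpg"
```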
## What to Test If Writing Tests
If converting to automated tests, prioritize:
**High Priority:**
1. **Telegram bot command dispatch** (`telegram/bot.py` lines 107-366)
- Each command handler should have unit tests
- Mock `subprocess.run()` to avoid calling actual commands
- Test authorization checks (`is_authorized()`)
- Test output truncation for large responses
2. **Credential loading** (all helper scripts)
- Test missing credentials file error
- Test malformed credentials
- Test missing required keys
3. **API response parsing** (`dns`, `pbs`, `beszel`, `kuma`)
- Test JSON parsing errors
- Test malformed responses
- Test status code handling
**Medium Priority:**
1. **Bash regex parsing** (`pbs` task/error log parsing)
- Test hex timestamp conversion
- Test status code extraction
- Test task target parsing with special characters
2. **Timeout handling** (all `run_command()` calls)
- Test command timeout
- Test output truncation
- Test error message formatting
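As an illustration of the hex timestamp item above, the conversion could be pinned down with a few fixed values. This sketch assumes the timestamps are hexadecimal Unix epoch seconds. The `pbs` script does its parsing in Bash and that code is not shown here, so `hex_to_datetime` is a hypothetical Python equivalent rather than the script's implementation.

```python
from datetime import datetime, timezone


def hex_to_datetime(hex_seconds):
    """Convert a hexadecimal Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(hex_seconds, 16), tz=timezone.utc)


def test_hex_to_datetime():
    assert hex_to_datetime("0") == datetime(1970, 1, 1, tzinfo=timezone.utc)
    # 0x65000000 is 1694498816 seconds after the epoch.
    assert hex_to_datetime("65000000").timestamp() == 1694498816
    # Hex digits may appear in either case.
    assert hex_to_datetime("65ABCDEF") == hex_to_datetime("65abcdef")


def test_invalid_hex_is_rejected():
    try:
        hex_to_datetime("not-hex")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```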
**Low Priority:**
1. Integration tests with real services (kept in separate test suite)
2. Performance tests for large data sets
## Current Test Coverage
**Implicit Testing:**
- Manual CLI testing during development
- Live service testing (commands run against real PVE, PBS, DNS, etc.)
- User/admin interaction testing (Telegram bot testing via /start, /status, etc.)
**Gap:**
- No regression testing
- No automated validation of API response formats
- No error case testing
- No refactoring safety net
---
*Testing analysis: 2026-02-04*