homelab/.planning/research/STACK.md
Mikkel Georgsen 1648a986bc docs: complete project research
Files:
- STACK.md
- FEATURES.md
- ARCHITECTURE.md
- PITFALLS.md
- SUMMARY.md

Key findings:
- Stack: Python 3.12+ with python-telegram-bot 22.6, asyncio subprocess management
- Architecture: Path-based session routing with state machine lifecycle management
- Critical pitfall: Asyncio PIPE deadlock requires concurrent stdout/stderr draining

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:37:24 +00:00


Stack Research

Domain: Telegram bot with Claude Code CLI subprocess management
Researched: 2026-02-04
Confidence: HIGH

Core Technologies

| Technology | Version | Purpose | Why Recommended |
|---|---|---|---|
| Python | 3.12+ | Runtime environment | Already deployed (3.12.3), excellent asyncio support, required by python-telegram-bot 22.6 (needs 3.10+) |
| python-telegram-bot | 22.6 | Telegram Bot API wrapper | Latest stable (Jan 2026), native async/await, httpx-based (modern), active maintenance, supports Bot API 9.3 |
| asyncio | stdlib | Async/await runtime | Native subprocess management with create_subprocess_exec, non-blocking I/O for multiple concurrent sessions |
| httpx | 0.27-0.28 | HTTP client | Required dependency of python-telegram-bot 22.6, modern async HTTP library |

Supporting Libraries

| Library | Version | Purpose | When to Use |
|---|---|---|---|
| aiofiles | 25.1.0 | Async file I/O | Reading/writing session files, inbox processing, file uploads without blocking event loop |
| APScheduler | 3.11.2 | Job scheduling | Idle timeout timers, periodic polling checks, session cleanup; AsyncIOScheduler supports native coroutines |
| ptyprocess | 0.7.0 | PTY management | If Claude Code requires interactive terminal (TTY detection); NOT needed if --resume works with pipes |

Development Tools

| Tool | Purpose | Notes |
|---|---|---|
| systemd | Service management | Existing telegram-bot.service, user service with proper delegation |
| Python venv | Dependency isolation | Already deployed at ~/venv, keeps system Python clean |

Installation

# Activate existing venv
source ~/venv/bin/activate

# Core dependencies (if not already installed)
pip install python-telegram-bot==22.6

# Supporting libraries
pip install aiofiles==25.1.0
pip install APScheduler==3.11.2

# Optional: PTY support (only if needed for Claude Code)
pip install ptyprocess==0.7.0

Alternatives Considered

| Recommended | Alternative | When to Use Alternative |
|---|---|---|
| asyncio subprocess | threading + subprocess.Popen | Never for this use case; asyncio is superior for I/O-bound operations with multiple sessions |
| python-telegram-bot | pyTelegramBotAPI (telebot) | If starting from scratch and wanting simpler API, but python-telegram-bot offers better async integration |
| APScheduler | asyncio.create_task + sleep loop | Simple timeout logic only; APScheduler overkill if just tracking last activity timestamp |
| aiofiles | asyncio thread executor + sync I/O | Small files only; for session logs and file handling, aiofiles cleaner |
| asyncio.create_subprocess_exec | ptyprocess | If Claude Code needs TTY/color output; start with pipes first, add PTY if needed |

What NOT to Use

| Avoid | Why | Use Instead |
|---|---|---|
| Batch API for polling | Polling needs instant response, batch has 24hr latency | Real-time API calls with Haiku |
| Synchronous subprocess.Popen | Blocks event loop, kills concurrency | asyncio.create_subprocess_exec |
| Global timeout on subprocess | Claude Code may take variable time per task | Per-session idle timeout tracking |
| telegram.Bot (sync) | python-telegram-bot 20+ is async-first | telegram.ext.Application (async) |
| flask/django for webhooks | Overkill for single-user bot | python-telegram-bot's built-in polling |

Stack Patterns by Variant

Session Management Pattern:

  • Use asyncio.create_subprocess_exec('claude', '--resume', cwd=session_path, stdout=PIPE, stderr=PIPE); the program and its arguments are passed as separate strings, not as a list
  • Set cwd to session directory: ~/telegram/sessions/<name>/
  • Claude Code creates .claude/ in working directory for session state
  • Each session isolated by filesystem path

Idle Timeout Pattern:

  • APScheduler's AsyncIOScheduler with IntervalTrigger checks every 30-60s
  • Track last_activity_time per session in memory (dict)
  • On timeout: call process.terminate(), wait for graceful exit, mark session as suspended
  • On new message: if suspended, spawn new process with --resume in same directory

Cost-Optimized Polling Pattern:

  • Main polling loop: python-telegram-bot's run_polling() with Haiku context
  • Haiku evaluates: "Does this need a response?" (simple commands vs conversation)
  • If yes: spawn/resume Opus session, pass message, capture output
  • If no: handle with built-in command handlers (/status, /pbs, etc.)

Output Streaming Pattern:

  • await process.stdout.readline() in async loop until EOF
  • Send incremental Telegram messages for tool-call notifications
  • Use asyncio.Queue to buffer output between read loop and Telegram send loop
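  • Avoid deadlock: use communicate() for simple request-response; when streaming with readline(), drain stdout and stderr concurrently so a full pipe buffer on one stream cannot stall the child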
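
A minimal sketch of the streaming pattern, assuming line-oriented output: two reader tasks drain stdout and stderr at the same time and feed one `asyncio.Queue`, while a consumer stands in for the Telegram send loop. The names `stream_process`, `_pump`, and `on_line` are hypothetical, and how Claude Code actually formats its output still needs to be observed.

```python
import asyncio
from typing import Awaitable, Callable

LineHandler = Callable[[str, str], Awaitable[None]]


async def _pump(stream: asyncio.StreamReader, tag: str, queue: asyncio.Queue) -> None:
    """Forward every line of one pipe into the shared queue until EOF."""
    while True:
        line = await stream.readline()
        if not line:  # an empty bytes object means EOF
            return
        await queue.put((tag, line.decode(errors="replace").rstrip("\r\n")))


async def stream_process(argv: list[str], on_line: LineHandler) -> int:
    """Run argv, pass each output line to on_line, and return the exit code."""
    process = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    queue: asyncio.Queue = asyncio.Queue()

    async def consume() -> None:
        # Stand-in for the Telegram send loop; None marks the end of output.
        while (item := await queue.get()) is not None:
            await on_line(*item)

    consumer = asyncio.create_task(consume())
    # Read both pipes at the same time so neither OS pipe buffer can fill up.
    await asyncio.gather(
        _pump(process.stdout, "stdout", queue),
        _pump(process.stderr, "stderr", queue),
    )
    await queue.put(None)
    await consumer
    return await process.wait()
```

The queue decouples reading from sending, so a slow Telegram call delays delivery but never stops the pipes from being drained.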

File Handling Pattern:

  • Telegram bot saves files to sessions/<name>/files/
  • Claude Code automatically sees files in working directory
  • Use aiofiles for async downloads: async with aiofiles.open(path, 'wb') as f: await f.write(data)

Version Compatibility

| Package | Compatible With | Notes |
|---|---|---|
| python-telegram-bot 22.6 | httpx 0.27-0.28 | Required dependency, auto-installed |
| python-telegram-bot 22.6 | Python 3.10-3.14 | Official support range, tested on 3.12 |
| APScheduler 3.11.2 | asyncio stdlib | AsyncIOScheduler native coroutine support |
| aiofiles 25.1.0 | Python 3.9-3.14 | Thread pool delegation, works with asyncio |
| ptyprocess 0.7.0 | Unix only | LXC container on Linux, no Windows needed |

Process Management Deep Dive

Why asyncio.create_subprocess_exec (not shell, not Popen)

Correct approach:

process = await asyncio.create_subprocess_exec(
    'claude', '--resume',
    cwd=session_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    stdin=asyncio.subprocess.PIPE
)

Why this over create_subprocess_shell:

  • Direct exec avoids shell injection risks (even with single user, good hygiene)
  • More control over arguments and environment
  • Slightly faster (no shell intermediary)

Why this over threading + subprocess.Popen:

  • Non-blocking: multiple Claude sessions can run concurrently
  • Event loop integration: natural with python-telegram-bot's async handlers
  • Resource efficient: no thread overhead per session

Claude Code CLI Integration Approach

Discovery needed:

  1. Test if claude --resume works with stdin/stdout pipes (likely yes)
  2. If Claude Code detects non-TTY and disables features, try ptyprocess
  3. Verify --resume preserves conversation history across process restarts

Stdin handling:

  • Write prompt to stdin: process.stdin.write(message.encode() + b'\n'), then await process.stdin.drain() so the data is actually flushed to the pipe
  • Close stdin to signal end of input: process.stdin.close()
  • Or use communicate() for simple request-response
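
For the simple request-response case, `communicate()` does the write, the stdin close, and the concurrent draining of both pipes in one call. The sketch below assumes the child reads its whole prompt from stdin and then exits; `ask_once` and its `timeout` default are hypothetical, and whether `claude` behaves this way over pipes is still one of the open discovery items.

```python
import asyncio


async def ask_once(argv: list[str], message: str, timeout: float = 60.0) -> tuple[int, str, str]:
    """Send one message on stdin and return (exit code, stdout, stderr)."""
    process = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() writes the input, closes stdin, and reads both pipes to EOF.
        stdout, stderr = await asyncio.wait_for(
            process.communicate(message.encode() + b"\n"),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Do not leave an orphaned child behind if the reply never arrives.
        process.kill()
        await process.wait()
        raise
    return process.returncode, stdout.decode(errors="replace"), stderr.decode(errors="replace")
```

Because `communicate()` buffers the full output in memory, this variant suits short replies; long-running sessions are better served by the line-by-line streaming approach.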

Stdout/stderr handling:

  • Tool calls likely go to stderr (or special markers in stdout)
  • Parse output for progress indicators vs final answer
  • Buffer partial lines, split on \n for structured output

Session Lifecycle

State machine:
IDLE → (message arrives) → SPAWNING → RUNNING → (response sent) → IDLE
                                    ↓
                               (timeout) → SUSPENDED
                                    ↓
                            (new message) → RESUMING → RUNNING

Implementation:

  • IDLE: No process running, session directory exists
  • SPAWNING: await create_subprocess_exec() in progress
  • RUNNING: Process alive, process.returncode is None
  • SUSPENDED: Process terminated, ready for --resume
  • RESUMING: Re-spawning with --resume flag
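
The lifecycle can be made explicit as an enum plus a table of allowed transitions. This is a sketch under one stated assumption: the diagram does not say unambiguously which state the timeout edge leaves, so it is modelled here as RUNNING to SUSPENDED. The names `SessionState`, `ALLOWED`, and `transition` are hypothetical.

```python
from enum import Enum, auto


class SessionState(Enum):
    IDLE = auto()
    SPAWNING = auto()
    RUNNING = auto()
    SUSPENDED = auto()
    RESUMING = auto()


# Allowed edges; RUNNING -> SUSPENDED (timeout) is an assumption, see the lead-in.
ALLOWED: dict[SessionState, set[SessionState]] = {
    SessionState.IDLE: {SessionState.SPAWNING},
    SessionState.SPAWNING: {SessionState.RUNNING},
    SessionState.RUNNING: {SessionState.IDLE, SessionState.SUSPENDED},
    SessionState.SUSPENDED: {SessionState.RESUMING},
    SessionState.RESUMING: {SessionState.RUNNING},
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the new state, or raise if the edge is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Routing every state change through one function makes illegal sequences, such as resuming a session that was never suspended, fail loudly instead of silently corrupting the session table.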

Graceful shutdown:

  • Send SIGTERM: process.terminate()
  • Wait with timeout: await asyncio.wait_for(process.wait(), timeout=10)
  • Force kill if needed: process.kill()
  • Claude Code should flush conversation state on SIGTERM
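
The shutdown steps above can be sketched as one helper. The function name `stop_process` and the `grace_seconds` parameter are hypothetical; the 10-second default mirrors the timeout quoted in the list.

```python
import asyncio


async def stop_process(process: asyncio.subprocess.Process, grace_seconds: float = 10.0) -> int:
    """Terminate a child, escalating to kill if it ignores SIGTERM."""
    if process.returncode is not None:
        return process.returncode  # already exited
    try:
        process.terminate()  # SIGTERM on Unix
    except ProcessLookupError:
        pass  # the child exited between the check and the signal
    try:
        await asyncio.wait_for(process.wait(), timeout=grace_seconds)
    except asyncio.TimeoutError:
        process.kill()  # SIGKILL cannot be ignored
        await process.wait()
    return process.returncode
```

Awaiting `process.wait()` after `kill()` reaps the child, so no zombie process is left behind.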

Haiku Polling Strategy

Architecture:

[Telegram Message] → [Haiku Triage] → Simple? → [Execute Command]
                                    ↓ Complex? ↓
                                    [Spawn Opus Session]

Haiku's role:

  • Read message content
  • Classify: command, question, or conversation
  • For commands: map to existing handlers (/status → status())
  • For conversation: trigger Opus session
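
A minimal sketch of the routing decision, with the model call abstracted away: `classify` is a hypothetical stand-in for the Haiku request and is assumed to return the name of a known handler or `None`. Slash commands are matched locally first, so they never cost an API call. The function name and the tuple return shape are illustrative only.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Classifier = Callable[[str], Awaitable[Optional[str]]]


async def route_message(
    text: str,
    known_commands: set[str],
    classify: Optional[Classifier] = None,
) -> tuple[str, Optional[str]]:
    """Return ("command", name) or ("opus", None) for one incoming message."""
    stripped = text.strip()
    if stripped.startswith("/"):
        # "/status@mybot arg" -> "status"
        name = stripped.split(maxsplit=1)[0][1:].split("@", 1)[0]
        if name in known_commands:
            return ("command", name)
    if classify is not None:
        # Hypothetical triage hook; the real version would call the Haiku model.
        guess = await classify(stripped)
        if guess in known_commands:
            return ("command", guess)
    # No classifier (the MVP case) or no match: hand the message to an Opus session.
    return ("opus", None)
```

Leaving `classify` unset means every non-command message goes straight to Opus, so triage can be added later without changing the callers.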

Implementation options:

Option A: Anthropic API directly

  • Separate Haiku API call per message
  • Lightweight prompt: "Classify this message: [message]. Output: COMMAND, QUESTION, or CHAT"
  • Pro: Fast, cheap ($1/MTok input, $5/MTok output)
  • Con: Extra API integration beyond Claude Code

Option B: Haiku via Claude Code CLI

  • claude --model haiku "Is this a command or conversation: [message]"
  • Pro: Reuses Claude Code setup, consistent interface
  • Con: Spawns extra process per triage

Recommendation: Option A for production, Option B for MVP

  • MVP: Skip Haiku triage, spawn Opus for all messages (simpler)
  • Production: Add Haiku API triage once Opus costs become noticeable

Batch API consideration:

  • NOT suitable for polling: batch results can take up to 24 hours, which is unacceptable for interactive messages
  • MAYBE suitable for session cleanup: "Summarize and compress old sessions" overnight

Resource Constraints (4GB RAM, 4 CPU)

Memory budget:

  • python-telegram-bot: ~50MB base
  • Each Claude Code subprocess: estimate 100-300MB
  • Safe concurrent sessions: 3-4 active, 10+ suspended
  • File uploads: stream to disk with aiofiles, don't buffer in RAM

CPU considerations:

  • I/O bound workload (Telegram API, Claude API, disk)
  • asyncio perfect fit: single-threaded event loop handles concurrency
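  • Claude Code subprocess CPU usage unknown: asyncio's Process object has no cpu_percent() method, so monitor by PID with an external tool such as psutil (psutil.Process(process.pid).cpu_percent())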

Disk constraints:

  • Session directories grow with conversation history
  • Periodic cleanup: delete sessions inactive >30 days
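  • File uploads: cap at 100MB per file; the hosted Telegram Bot API is stricter (bots can download files up to 20MB and send files up to 50MB), so the 100MB cap only takes effect with a self-hosted Bot API server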

Security Considerations

Single-user simplification:

  • No auth beyond existing Telegram bot authorization
  • Session isolation not security boundary (all same Unix user)
  • BUT: still isolate by path for organization, not security

Command injection prevention:

  • Use create_subprocess_exec() with argument list (not shell)
  • Validate session names: [a-z0-9_-]+ only
  • Don't pass user input directly to shell commands
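
The session-name rule can be enforced before any path is built. This is a minimal sketch; `session_dir` is a hypothetical helper. `re.fullmatch` is used so that the whole name, not just a prefix, must match `[a-z0-9_-]+`.

```python
import re
from pathlib import Path

_SESSION_NAME = re.compile(r"[a-z0-9_-]+")


def session_dir(base: Path, name: str) -> Path:
    """Return base/name, refusing names that could escape the sessions root."""
    if not _SESSION_NAME.fullmatch(name):
        raise ValueError(f"invalid session name: {name!r}")
    return base / name
```

Because dots and slashes are outside the allowed character set, inputs such as `../etc` are rejected before they reach the `cwd=` argument of the subprocess call.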

File handling:

  • Save files with sanitized names: timestamp_originalname
  • Check file extensions: allow common types, reject executables
  • Limit file size: 100MB hard cap

Sources

High Confidence (Official Documentation)

Medium Confidence (Verified Community Sources)

Low Confidence (Needs Validation)

  • Claude Code --resume behavior with pipes vs PTY — Not documented, needs testing
  • Claude Code output format for tool calls — Needs empirical observation
  • Claude Code resource usage per session — Unknown, monitor in practice

Stack research for: Telegram Claude Code Bridge
Researched: 2026-02-04