homelab/.planning/research/STACK.md
Mikkel Georgsen 1648a986bc docs: complete project research
Files:
- STACK.md
- FEATURES.md
- ARCHITECTURE.md
- PITFALLS.md
- SUMMARY.md

Key findings:
- Stack: Python 3.12+ with python-telegram-bot 22.6, asyncio subprocess management
- Architecture: Path-based session routing with state machine lifecycle management
- Critical pitfall: Asyncio PIPE deadlock requires concurrent stdout/stderr draining

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:37:24 +00:00


# Stack Research
**Domain:** Telegram bot with Claude Code CLI subprocess management
**Researched:** 2026-02-04
**Confidence:** HIGH
## Recommended Stack
### Core Technologies
| Technology | Version | Purpose | Why Recommended |
|------------|---------|---------|-----------------|
| Python | 3.12+ | Runtime environment | Already deployed (3.12.3), excellent asyncio support; comfortably above the 3.10+ minimum that python-telegram-bot 22.6 requires |
| python-telegram-bot | 22.6 | Telegram Bot API wrapper | Latest stable (Jan 2026), native async/await, httpx-based (modern), active maintenance, supports Bot API 9.3 |
| asyncio | stdlib | Async/await runtime | Native subprocess management with create_subprocess_exec, non-blocking I/O for multiple concurrent sessions |
| httpx | 0.27-0.28 | HTTP client | Required dependency of python-telegram-bot 22.6, modern async HTTP library |
### Supporting Libraries
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| aiofiles | 25.1.0 | Async file I/O | Reading/writing session files, inbox processing, file uploads without blocking event loop |
| APScheduler | 3.11.2 | Job scheduling | Idle timeout timers, periodic polling checks, session cleanup; AsyncIOScheduler supports native coroutines |
| ptyprocess | 0.7.0 | PTY management | If Claude Code requires interactive terminal (TTY detection); NOT needed if --resume works with pipes |
### Development Tools
| Tool | Purpose | Notes |
|------|---------|-------|
| systemd | Service management | Existing telegram-bot.service, user service with proper delegation |
| Python venv | Dependency isolation | Already deployed at ~/venv, keeps system Python clean |
## Installation
```bash
# Activate existing venv
source ~/venv/bin/activate
# Core dependencies (if not already installed)
pip install python-telegram-bot==22.6
# Supporting libraries
pip install aiofiles==25.1.0
pip install APScheduler==3.11.2
# Optional: PTY support (only if needed for Claude Code)
pip install ptyprocess==0.7.0
```
## Alternatives Considered
| Recommended | Alternative | When to Use Alternative |
|-------------|-------------|-------------------------|
| asyncio subprocess | threading + subprocess.Popen | Never for this use case; asyncio is superior for I/O-bound operations with multiple sessions |
| python-telegram-bot | pyTelegramBotAPI (telebot) | If starting from scratch and wanting simpler API, but python-telegram-bot offers better async integration |
| APScheduler | asyncio.create_task + sleep loop | Simple timeout logic only; APScheduler is overkill if you are just tracking a last-activity timestamp |
| aiofiles | asyncio thread executor + sync I/O | Small files only; for session logs and file handling, aiofiles is cleaner |
| asyncio.create_subprocess_exec | ptyprocess | If Claude Code needs TTY/color output; start with pipes first, add PTY if needed |
## What NOT to Use
| Avoid | Why | Use Instead |
|-------|-----|-------------|
| Batch API for polling | Polling needs an instant response; batch results can take up to 24 hours | Real-time API calls with Haiku |
| Synchronous subprocess.Popen | Blocks event loop, kills concurrency | asyncio.create_subprocess_exec |
| Global timeout on subprocess | Claude Code may take variable time per task | Per-session idle timeout tracking |
| Synchronous PTB code (v13-style `Updater`/`Dispatcher`) | python-telegram-bot 20+ is async-first; the synchronous API was removed | telegram.ext.Application (async) |
| flask/django for webhooks | Overkill for single-user bot | python-telegram-bot's built-in polling |
## Stack Patterns by Variant
**Session Management Pattern:**
- Use `asyncio.create_subprocess_exec('claude', '--resume', cwd=session_path, stdout=PIPE, stderr=PIPE)` (arguments are passed individually, not as a list)
- Set `cwd` to session directory: `~/telegram/sessions/<name>/`
- Claude Code associates conversation state with the working directory (exact storage location still to be confirmed)
- Each session isolated by filesystem path
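A minimal sketch of the per-session spawn, assuming sessions live under a root such as `~/telegram/sessions/`. `spawn_session` and `DEFAULT_ARGV` are hypothetical names; the `('claude', '--resume')` default mirrors the pattern above and still needs the pipe testing listed under "Discovery needed".

```python
import asyncio
from pathlib import Path

# Mirrors the pattern above; behaviour over pipes is still unverified.
DEFAULT_ARGV = ("claude", "--resume")


async def spawn_session(
    root: Path, name: str, argv: tuple[str, ...] = DEFAULT_ARGV
) -> asyncio.subprocess.Process:
    """Start one session process with its working directory set to root/name."""
    session_path = root / name
    # The directory is the isolation unit: one directory per session.
    session_path.mkdir(parents=True, exist_ok=True)
    return await asyncio.create_subprocess_exec(
        *argv,  # exec takes separate positional arguments, not a list
        cwd=session_path,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
```

Passing `argv` as a parameter keeps the helper testable with a stand-in command before the real CLI behaviour is confirmed.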
**Idle Timeout Pattern:**
- APScheduler's AsyncIOScheduler with IntervalTrigger checks every 30-60s
- Track `last_activity_time` per session in memory (dict)
- On timeout: call `process.terminate()`, wait for graceful exit, mark session as suspended
- On new message: if suspended, spawn new process with `--resume` in same directory
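The sketch below swaps APScheduler for the plain `asyncio.create_task` + sleep-loop alternative from the table, so it runs on the stdlib alone; an APScheduler `IntervalTrigger` job could call the same `sweep()` instead. `IdleReaper`, `touch`, `sweep`, and `run` are hypothetical names, and the 10-second grace period is an assumption.

```python
import asyncio
import time


class IdleReaper:
    """Tracks last activity per session and suspends idle ones."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_activity: dict[str, float] = {}
        self.processes: dict[str, asyncio.subprocess.Process] = {}
        self.suspended: set[str] = set()

    def touch(self, name: str, process: asyncio.subprocess.Process) -> None:
        # Call whenever the session sees activity (e.g. a new message).
        self.last_activity[name] = time.monotonic()
        self.processes[name] = process
        self.suspended.discard(name)

    async def sweep(self) -> None:
        now = time.monotonic()
        for name, last in list(self.last_activity.items()):
            if now - last < self.idle_timeout:
                continue
            process = self.processes.pop(name)
            del self.last_activity[name]
            if process.returncode is None:
                process.terminate()
                try:
                    await asyncio.wait_for(process.wait(), timeout=10)
                except asyncio.TimeoutError:
                    process.kill()
                    await process.wait()
            # Suspended: the next message respawns the process with --resume.
            self.suspended.add(name)

    async def run(self, interval: float = 30.0) -> None:
        # Periodic check, matching the 30-60s cadence above.
        while True:
            await asyncio.sleep(interval)
            await self.sweep()
```

Usage: start it once from the bot's startup code with `asyncio.create_task(reaper.run(30))`.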
**Cost-Optimized Polling Pattern:**
- Main polling loop: python-telegram-bot's `run_polling()`, with Haiku triaging each incoming message
- Haiku evaluates: "Does this need a response?" (simple commands vs conversation)
- If yes: spawn/resume Opus session, pass message, capture output
- If no: handle with built-in command handlers (/status, /pbs, etc.)
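A minimal routing sketch, assuming the triage step is injected as an async `needs_opus` callable (stubbed in tests; in production it would wrap the Haiku call) and that built-in commands are plain functions keyed by name. `route_message` and the `"opus"` marker are hypothetical.

```python
from collections.abc import Awaitable, Callable

# Hypothetical triage hook: True when a message needs a full Opus session.
NeedsOpus = Callable[[str], Awaitable[bool]]


async def route_message(
    text: str,
    builtins: dict[str, Callable[[], str]],
    needs_opus: NeedsOpus,
) -> str:
    """Return a built-in reply, or the marker 'opus' to hand off to a session."""
    if text.startswith("/"):
        # Explicit commands never reach the model: "/status now" -> "status".
        parts = text[1:].split(maxsplit=1)
        handler = builtins.get(parts[0]) if parts else None
        if handler is not None:
            return handler()
    if await needs_opus(text):
        return "opus"
    return "No action needed."
```

Short-circuiting explicit slash commands is a design choice: it skips a model call for messages whose intent is already unambiguous.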
**Output Streaming Pattern:**
- `await process.stdout.readline()` in async loop until EOF
- Send incremental Telegram messages for tool-call notifications
- Use `asyncio.Queue` to buffer output between read loop and Telegram send loop
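- Avoid the PIPE deadlock: `communicate()` is safe for simple request-response; when streaming with `readline()`, drain stdout and stderr concurrently so neither pipe buffer fills and stalls the child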
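The deadlock in question: with both streams piped, a loop blocked on `stdout.readline()` never empties the stderr pipe, so once that buffer fills the child blocks writing to stderr and stops producing stdout. A minimal sketch, assuming line-oriented output, drains both streams concurrently into one `asyncio.Queue`. `pump` and `stream_output` are hypothetical names.

```python
import asyncio


async def pump(stream: asyncio.StreamReader, tag: str, queue: asyncio.Queue) -> None:
    """Copy one pipe into the queue line by line until EOF."""
    try:
        while line := await stream.readline():
            await queue.put((tag, line.decode(errors="replace").rstrip("\n")))
    finally:
        queue.put_nowait(None)  # end-of-stream marker for this pipe


async def stream_output(process: asyncio.subprocess.Process):
    """Yield (tag, line) pairs from stdout and stderr as they arrive."""
    queue: asyncio.Queue = asyncio.Queue()  # unbounded, so readers never stall
    readers = [
        asyncio.create_task(pump(process.stdout, "stdout", queue)),
        asyncio.create_task(pump(process.stderr, "stderr", queue)),
    ]
    open_streams = len(readers)
    while open_streams:
        item = await queue.get()
        if item is None:
            open_streams -= 1
        else:
            yield item
    await asyncio.gather(*readers)  # re-raise any reader error
    await process.wait()
```

Usage: `async for tag, line in stream_output(proc): ...` feeds the Telegram send loop. Note that `readline()` raises `ValueError` on lines longer than the stream limit (64 KiB by default); raise it with the `limit=` argument to `create_subprocess_exec` if output lines can be long.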
**File Handling Pattern:**
- Telegram bot saves files to `sessions/<name>/files/`
- Claude Code automatically sees files in working directory
- Use aiofiles for async downloads: `async with aiofiles.open(path, 'wb') as f: await f.write(data)`
## Version Compatibility
| Package A | Compatible With | Notes |
|-----------|-----------------|-------|
| python-telegram-bot 22.6 | httpx 0.27-0.28 | Required dependency, auto-installed |
| python-telegram-bot 22.6 | Python 3.10-3.14 | Official support range, tested on 3.12 |
| APScheduler 3.11.2 | asyncio stdlib | AsyncIOScheduler native coroutine support |
| aiofiles 25.1.0 | Python 3.9-3.14 | Thread pool delegation, works with asyncio |
| ptyprocess 0.7.0 | Unix only | LXC container on Linux, no Windows needed |
## Process Management Deep Dive
### Why asyncio.create_subprocess_exec (not shell, not Popen)
**Correct approach:**
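```python
import asyncio

async def spawn(session_path: str) -> asyncio.subprocess.Process:
    # Argument list goes straight to exec; no shell is involved
    return await asyncio.create_subprocess_exec(
        'claude', '--resume',
        cwd=session_path,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        stdin=asyncio.subprocess.PIPE,
    )
```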
**Why this over create_subprocess_shell:**
- Direct exec avoids shell injection risks (good hygiene even for a single-user bot)
- More control over arguments and environment
- Slightly faster (no shell intermediary)
**Why this over threading + subprocess.Popen:**
- Non-blocking: multiple Claude sessions can run concurrently
- Event loop integration: natural with python-telegram-bot's async handlers
- Resource efficient: no thread overhead per session
### Claude Code CLI Integration Approach
**Discovery needed:**
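1. Test if `claude --resume` works with stdin/stdout pipes (likely yes). The CLI reference describes bare `--resume` as opening an interactive session picker, so `--continue` (most recent conversation in the working directory) or `--resume <session-id>`, combined with `-p` print mode, may suit piped use better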
2. If Claude Code detects non-TTY and disables features, try ptyprocess
3. Verify --resume preserves conversation history across process restarts
**Stdin handling:**
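- Write prompt to stdin: `process.stdin.write(message.encode() + b'\n')`, then `await process.stdin.drain()`
- Close stdin to signal end: `process.stdin.close()`, then `await process.stdin.wait_closed()`
- Or use `await process.communicate(input=message.encode())` for simple request-response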
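The stdin steps above, collected into one helper. This is a sketch assuming the process was spawned with `stdin=PIPE` and that the child treats EOF as "prompt complete"; `send_prompt` is a hypothetical name.

```python
import asyncio


async def send_prompt(
    process: asyncio.subprocess.Process, message: str, close: bool = True
) -> None:
    """Write one prompt line to the child's stdin, optionally followed by EOF."""
    if process.stdin is None:
        raise ValueError("process was not started with stdin=PIPE")
    process.stdin.write(message.encode() + b"\n")
    # write() only buffers; drain() applies flow control before continuing.
    await process.stdin.drain()
    if close:
        # EOF tells a child that reads to end-of-input that the prompt is done.
        process.stdin.close()
        await process.stdin.wait_closed()
```

Pass `close=False` if the child turns out to accept several prompts over one open stdin; whether Claude Code does is part of the discovery work above.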
**Stdout/stderr handling:**
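- Tool calls likely go to stderr (or special markers in stdout); the CLI reference also documents `--output-format stream-json` for print mode (`-p`), which may provide structured events instead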
- Parse output for progress indicators vs final answer
- Buffer partial lines, split on `\n` for structured output
### Session Lifecycle
```
State machine:
IDLE      → (message arrives) → SPAWNING → RUNNING → (response sent) → IDLE
IDLE      → (timeout)         → SUSPENDED
SUSPENDED → (new message)     → RESUMING → RUNNING
```
**Implementation:**
- IDLE: No process running, session directory exists
- SPAWNING: `await create_subprocess_exec()` in progress
- RUNNING: Process alive, `process.returncode is None`
- SUSPENDED: Process terminated, ready for --resume
- RESUMING: Re-spawning with --resume flag
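The lifecycle can be enforced with an explicit transition table. This sketch assumes the timeout branch leaves IDLE (one reading of the diagram) and omits error paths such as a failed spawn or a crash, which the diagram does not cover. `SessionState`, `TRANSITIONS`, and `Session` are hypothetical names.

```python
from enum import Enum, auto


class SessionState(Enum):
    IDLE = auto()
    SPAWNING = auto()
    RUNNING = auto()
    SUSPENDED = auto()
    RESUMING = auto()


# Allowed transitions, read off the state diagram.
TRANSITIONS: dict[SessionState, set[SessionState]] = {
    SessionState.IDLE: {SessionState.SPAWNING, SessionState.SUSPENDED},
    SessionState.SPAWNING: {SessionState.RUNNING},
    SessionState.RUNNING: {SessionState.IDLE},
    SessionState.SUSPENDED: {SessionState.RESUMING},
    SessionState.RESUMING: {SessionState.RUNNING},
}


class Session:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = SessionState.IDLE

    def transition(self, new: SessionState) -> None:
        # Reject moves the diagram does not allow instead of silently drifting.
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state.name} -> {new.name} not allowed")
        self.state = new
```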
**Graceful shutdown:**
- Send SIGTERM: `process.terminate()`
- Wait with timeout: `await asyncio.wait_for(process.wait(), timeout=10)`
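- Force kill if the wait times out: `process.kill()`, then `await process.wait()` to reap the child
- Claude Code is expected to flush conversation state on SIGTERM; verify this before relying on it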
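The shutdown steps above as one coroutine. A sketch assuming POSIX signals (the bot runs in a Linux LXC container); `stop_process` is a hypothetical name and the 10-second default mirrors the timeout above.

```python
import asyncio


async def stop_process(process: asyncio.subprocess.Process, grace: float = 10.0) -> int:
    """SIGTERM, wait up to `grace` seconds, then SIGKILL. Returns the exit code."""
    if process.returncode is not None:
        return process.returncode  # already exited
    try:
        process.terminate()  # SIGTERM: give the child a chance to save state
    except ProcessLookupError:
        return await process.wait()  # exited between the check and the signal
    try:
        return await asyncio.wait_for(process.wait(), timeout=grace)
    except asyncio.TimeoutError:
        process.kill()  # SIGKILL cannot be caught or ignored
        # Always reap the child so it does not linger as a zombie.
        return await process.wait()
```

On POSIX the returned code is negative for signal deaths: `-15` after SIGTERM, `-9` after SIGKILL, which makes it easy to log whether the graceful path worked.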
## Haiku Polling Strategy
**Architecture:**
```
[Telegram Message] → [Haiku Triage] ─┬─ simple?  → [Execute Command]
                                     └─ complex? → [Spawn Opus Session]
```
**Haiku's role:**
- Read message content
- Classify: command, question, or conversation
- For commands: map to existing handlers (/status → status())
- For conversation: trigger Opus session
**Implementation options:**
**Option A: Anthropic API directly**
- Separate Haiku API call per message
- Lightweight prompt: "Classify this message: [message]. Output: COMMAND, QUESTION, or CHAT"
- Pro: Fast, cheap ($1/MTok input, $5/MTok output)
- Con: Extra API integration beyond Claude Code
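A sketch of the classification side of Option A only, assuming the model answers with one of the three labels from the prompt above; the API call itself is left out (it would go through the Anthropic SDK). `Triage`, `build_triage_prompt`, and `parse_label` are hypothetical names.

```python
from enum import Enum


class Triage(Enum):
    COMMAND = "COMMAND"
    QUESTION = "QUESTION"
    CHAT = "CHAT"


def build_triage_prompt(message: str) -> str:
    return f"Classify this message: {message}\nOutput: COMMAND, QUESTION, or CHAT"


def parse_label(reply: str) -> Triage:
    """Map a free-text model reply to a label, defaulting to CHAT."""
    words = reply.strip().upper().split()
    if words:
        # Tolerate trailing punctuation or markdown around the label.
        token = words[0].strip(".:,*`\"'")
        try:
            return Triage(token)
        except ValueError:
            pass
    # Unparseable output goes to Opus: a wasted call beats a dropped message.
    return Triage.CHAT
```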
**Option B: Haiku via Claude Code CLI**
- `claude -p --model haiku "Is this a command or conversation: [message]"` (`-p` runs non-interactively and prints the response)
- Pro: Reuses Claude Code setup, consistent interface
- Con: Spawns extra process per triage
**Recommendation: Option A for production; for the MVP, skip triage (Option B is the fallback if early triage is needed)**
- MVP: Skip Haiku triage, spawn Opus for all messages (simpler)
- Production: Add Haiku API triage once Opus costs become noticeable
**Batch API consideration:**
- NOT suitable for polling: results can take up to 24 hours, which is unacceptable
- MAYBE suitable for session cleanup: "Summarize and compress old sessions" overnight
## Resource Constraints (4GB RAM, 4 CPU)
**Memory budget:**
- python-telegram-bot: ~50MB base
- Each Claude Code subprocess: estimate 100-300MB
- Safe concurrent sessions: 3-4 active, 10+ suspended
- File uploads: stream to disk with aiofiles, don't buffer in RAM
**CPU considerations:**
- I/O bound workload (Telegram API, Claude API, disk)
- asyncio perfect fit: single-threaded event loop handles concurrency
- Claude Code subprocess CPU usage unknown: monitor with `psutil.Process(process.pid).cpu_percent()` (asyncio's `Process` object exposes no CPU metrics)
**Disk constraints:**
- Session directories grow with conversation history
- Periodic cleanup: delete sessions inactive >30 days
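- File uploads: cap at 100MB per file, although the hosted Bot API only lets bots download files up to 20MB (and send up to 50MB); larger files require a self-hosted Bot API server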
## Security Considerations
**Single-user simplification:**
- No auth beyond existing Telegram bot authorization
- Session isolation is not a security boundary (all sessions run as the same Unix user)
- Still isolate by path, for organization rather than security
**Command injection prevention:**
- Use `create_subprocess_exec()` with argument list (not shell)
- Validate session names: `[a-z0-9_-]+` only
- Don't pass user input directly to shell commands
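A sketch of the name check plus a containment check on the resulting path. `SESSION_NAME` and `session_path` are hypothetical names; the pattern is the one from the list above.

```python
import re
from pathlib import Path

# Whitelist from the list above: lowercase letters, digits, '_' and '-'.
SESSION_NAME = re.compile(r"[a-z0-9_-]+")


def session_path(root: Path, name: str) -> Path:
    """Return root/name, rejecting anything outside the whitelist."""
    # fullmatch, not match + '$': '$' also matches before a trailing newline.
    if not SESSION_NAME.fullmatch(name):
        raise ValueError(f"invalid session name: {name!r}")
    path = (root / name).resolve()
    # The whitelist already excludes '/' and '..'; this also catches a
    # symlinked session directory that points outside the root.
    if path.parent != root.resolve():
        raise ValueError(f"session path escapes root: {path}")
    return path
```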
**File handling:**
- Save files with sanitized names: `timestamp_originalname`
- Check file extensions: allow common types, reject executables
- Limit file size: 100MB hard cap
## Sources
### High Confidence (Official Documentation)
- [python-telegram-bot PyPI](https://pypi.org/project/python-telegram-bot/) — Version 22.6, dependencies
- [python-telegram-bot Documentation](https://docs.python-telegram-bot.org/) — v22.6 API reference
- [Python asyncio Subprocess](https://docs.python.org/3/library/asyncio-subprocess.html) — Official stdlib docs (Feb 2026)
- [aiofiles PyPI](https://pypi.org/project/aiofiles/) — Version 25.1.0
- [APScheduler PyPI](https://pypi.org/project/APScheduler/) — Version 3.11.2
- [ptyprocess PyPI](https://pypi.org/project/ptyprocess/) — Version 0.7.0
- [Claude Code CLI Reference](https://code.claude.com/docs/en/cli-reference) — Official documentation
### Medium Confidence (Verified Community Sources)
- [Async IO in Python: Subprocesses (Medium)](https://medium.com/@kalmlake/async-io-in-python-subprocesses-af2171d1ff31) — Subprocess patterns
- [Better Stack: Timeouts in Python](https://betterstack.com/community/guides/scaling-python/python-timeouts/) — Timeout best practices
- [APScheduler Guide (Better Stack)](https://betterstack.com/community/guides/scaling-python/apscheduler-scheduled-tasks/) — Job scheduling patterns
- [Anthropic API Pricing (Multiple)](https://www.finout.io/blog/anthropic-api-pricing) — Haiku costs, batch API
### Low Confidence (Needs Validation)
- Claude Code --resume behavior with pipes vs PTY — Not documented, needs testing
- Claude Code output format for tool calls — Needs empirical observation
- Claude Code resource usage per session — Unknown, monitor in practice
---
*Stack research for: Telegram Claude Code Bridge*
*Researched: 2026-02-04*