Mikkel Georgsen 1648a986bc docs: complete project research

Files:
- STACK.md
- FEATURES.md
- ARCHITECTURE.md
- PITFALLS.md
- SUMMARY.md

Key findings:
- Stack: Python 3.12+ with python-telegram-bot 22.6, asyncio subprocess management
- Architecture: Path-based session routing with state machine lifecycle management
- Critical pitfall: Asyncio PIPE deadlock requires concurrent stdout/stderr draining

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 13:37:24 +00:00


Feature Research: Telegram-to-Claude Code Bridge

Domain: AI chatbot bridge / Remote code assistant interface
Researched: 2026-02-04
Confidence: HIGH

Feature Landscape

Table Stakes (Users Expect These)

Features users assume exist. Missing these = product feels incomplete.

Feature	Why Expected	Complexity	Notes
Basic message send/receive	Core functionality of any chat interface	LOW	Python-telegram-bot or grammY provide this out-of-box
Session persistence	Users expect conversations to continue where they left off	MEDIUM	Store session state to disk/DB; must survive bot restarts
Command interface	Standard way to control bot behavior (`/help`, `/new`, `/status`)	LOW	Built-in to telegram bot frameworks
Typing indicator	Shows bot is processing (expected for AI bots with 10-60s response times)	LOW	Use `sendChatAction` every 5s during processing
Error messages	Clear feedback when something goes wrong	LOW	Graceful error handling with user-friendly messages
File upload support	Send files/images to Claude for analysis	MEDIUM	With the standard Bot API, bots can download user-sent files up to 20MB (and send files up to 50MB); larger requires self-hosted Bot API
File download	Receive files Claude generates (scripts, configs, reports)	MEDIUM	Bot sends files back; organize in user-specific folders
Authentication	Only authorized users can access the bot	LOW	User ID whitelist in config (for single-user: just one ID)
Multi-message handling	Long responses split intelligently across multiple messages	MEDIUM	Telegram has 4096 char limit; need smart splitting at code block/paragraph boundaries
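
The message-splitting row can be illustrated with a minimal sketch. It assumes the 4096-character limit cited in the table and prefers paragraph breaks, then line breaks, then spaces. The function name `split_message` is hypothetical. Keeping fenced code blocks intact across chunks is deliberately left out here and would need extra handling.

```python
TELEGRAM_LIMIT = 4096  # per-message character limit cited in the table


def split_message(text: str, limit: int = TELEGRAM_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers paragraph breaks, then line breaks, then spaces, and falls back
    to a hard cut when the window contains no usable boundary.
    """
    chunks: list[str] = []
    remaining = text
    while len(remaining) > limit:
        window = remaining[:limit]
        cut = -1
        for sep in ("\n\n", "\n", " "):
            cut = window.rfind(sep)
            if cut > 0:
                break
        if cut <= 0:
            cut = limit  # no boundary found: hard cut at the limit
        piece = remaining[:cut].rstrip()
        if piece:
            chunks.append(piece)
        # Note: lstrip also drops leading indentation of the next chunk.
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be sent as its own Telegram message, in order.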

Differentiators (Competitive Advantage)

Features that set the product apart. Not required, but valuable.

Feature	Value Proposition	Complexity	Notes
Named session management	Switch between multiple projects/contexts (`/session work`, `/session personal`)	MEDIUM	Session key = user:session_name; list/switch/delete sessions
Idle timeout with graceful suspension	Auto-suspend idle sessions to save costs, easy resume with context preserved	MEDIUM	Timer-based monitoring; serialize session state; clear resume UX with `/resume <session>`
Smart output modes	Choose verbosity: final answer only / verbose with tool calls / auto-smart truncation	HIGH	Requires parsing Claude Code output stream and making intelligent display decisions
Tool call progress notifications	Real-time updates as Claude uses tools ("Reading file X", "Running command Y")	HIGH	Stream parsing + progressive message editing; balance info vs notification spam
Cost tracking per session	Show token usage and $ cost for each conversation	MEDIUM	Track input/output tokens; calculate using Anthropic pricing; display in `/stats`
Session-specific folders	Each session gets isolated file workspace (~/stuff/sessions//)	LOW	Create directory per session; pass as cwd to Claude Code
Inline keyboard menus	Button-based navigation (session list, quick commands) instead of typing	MEDIUM	Telegram InlineKeyboardMarkup for cleaner UX
Voice message support	Send voice, bot transcribes and processes	HIGH	Requires Whisper API or similar; adds complexity but strong UX boost
Photo/image analysis	Send photos, Claude analyzes with vision	MEDIUM	Claude supports vision natively; just pass image data
Proactive heartbeat	Bot checks in periodically ("Task done?", "Anything broken?")	HIGH	Cron-based with intelligent prompting; OpenClaw-style feature
Multi-model routing	Use Haiku for simple tasks, Sonnet for complex, Opus for critical	HIGH	Analyze message complexity; route intelligently; 80% cost savings potential
Session export	Export full conversation history as markdown/JSON	LOW	Serialize messages to file, send via Telegram
Undo/rollback	Revert to previous message in conversation	HIGH	Requires conversation tree management; complex but powerful

Anti-Features (Commonly Requested, Often Problematic)

Features that seem good but create problems.

Feature	Why Requested	Why Problematic	Alternative
Multi-user support (v1)	Seems like natural evolution	Adds auth complexity, resource contention, security surface, and user isolation requirements before core experience is validated	Build single-user first; prove value; then add multi-user with proper tenant isolation
Real-time streaming text	Shows AI thinking character-by-character	Telegram message editing has rate limits; causes flickering; annoying for code blocks	Use typing indicator + tool call progress updates + send complete responses
Inline bot mode (@mention in any chat)	Convenience of using bot anywhere	Security nightmare (exposes bot to all chats, leaks context); hard to maintain session isolation	Keep bot in dedicated chat; use `/share` to export results elsewhere
Voice response (TTS)	"Complete voice assistant" feel	Adds latency, quality issues, limited Telegram voice note support, user often reading anyway	Text-first; voice input OK but output stays text
Auto-response to all messages	Bot always active, no explicit commands needed	Burns tokens on noise; user loses control; hard to have side conversations	Require explicit command or @mention; clear when bot is listening
Unlimited session history	"Never forget anything"	Memory bloat, context window waste, cost explosion	Implement sliding window (last N messages) + summarization; store full history off-context
Advanced NLP for command parsing	"Natural language commands"	Adds unreliability; burns tokens; users prefer explicit commands for tools	Use standard `/command` syntax; save NLP tokens for actual Claude conversations
Rich formatting (bold, italic, links) in bot messages	Prettier output	Telegram markdown syntax fragile; breaks on code blocks; debugging nightmare	Use plain text with clear structure; minimal formatting for critical info only

Feature Dependencies

Authentication (whitelist)
    └──requires──> Session Management
                       ├──requires──> Message Handling
                       │                  └──requires──> Claude Code Integration
                       └──requires──> File Handling
                                          └──requires──> Session Folders

Smart Output Modes
    └──requires──> Output Stream Parsing
                       └──requires──> Message Splitting

Tool Call Progress
    └──requires──> Output Stream Parsing
                       └──requires──> Typing Indicator

Idle Timeout
    └──requires──> Session Persistence
                       └──requires──> Session Management

Cost Tracking
    └──requires──> Token Counting
                       └──requires──> Claude Code Integration

Multi-Model Routing
    └──requires──> Message Complexity Analysis
                       └──enhances──> Cost Tracking

Dependency Notes

Session Management is foundational: Nearly everything depends on solid session management. This must be robust before adding advanced features.
Output Stream Parsing enables differentiators: Many high-value features (smart output modes, tool progress, cost tracking) require parsing Claude Code's output stream. Build this infrastructure early.
File Handling is isolated: Can be built in parallel with core message flow; minimal dependencies.
Authentication gates everything: Single-user whitelist is simplest; must be in place before any other features.
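
One way to turn the dependency graph into a build order is a topological sort. The sketch below encodes a subset of the edges and makes one interpretive assumption: following the note that authentication gates everything, Session Management is treated as depending on Authentication, although the first tree draws the arrow the other way. The name `DEPENDS_ON` is hypothetical.

```python
from graphlib import TopologicalSorter

# feature -> features that must exist first (subset of the graph above;
# the Authentication edge follows the Dependency Notes, which is an assumption)
DEPENDS_ON = {
    "Session Management": {"Authentication"},
    "Session Persistence": {"Session Management"},
    "Idle Timeout": {"Session Persistence"},
    "Output Stream Parsing": {"Message Splitting", "Typing Indicator"},
    "Smart Output Modes": {"Output Stream Parsing"},
    "Tool Call Progress": {"Output Stream Parsing"},
    "Token Counting": {"Claude Code Integration"},
    "Cost Tracking": {"Token Counting"},
}


def build_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return features ordered so that every dependency comes before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())
```

The exact order among independent features is not fixed; only the before/after constraints are guaranteed.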

MVP Definition

Launch With (v0.1 - Prove Value)

Minimum viable product — what's needed to validate the concept.

User whitelist authentication — Only owner can use bot (security baseline)
Basic message send/receive — Chat with Claude Code via Telegram
Session persistence — Conversations survive bot restarts
Simple session management — /new, /continue, /list commands
Typing indicator — Shows bot is thinking during long AI responses
File upload — Send files to Claude (PDFs, screenshots, code)
File download — Receive files Claude creates
Error handling — Clear messages when things break
Message splitting — Long responses broken into readable chunks
Session folders — Each session has isolated file workspace

MVP Success Criteria: The owner can manage the homelab from a phone during a commute: send a screenshot of an error, have Claude analyze it and suggest a fix, then review and apply that fix.

Add After Validation (v0.2-0.5 - Polish Core Experience)

Features to add once core is working and usage patterns emerge.

Named sessions — Switch between projects (/session ansible, /session docker)
Idle timeout with suspend/resume — Save costs on unused sessions
Basic output modes — Toggle verbose (/verbose on) for debugging
Cost tracking — See token usage per session (/stats)
Inline keyboard menus — Button-based session picker
Session export — Download conversation as markdown (/export)
Image analysis — Send photos, Claude describes/debugs

Trigger for adding: Using bot daily, patterns clear, requesting these features organically.

Future Consideration (v1.0+ - Differentiating Power Features)

Features to defer until product-market fit is established.

Smart output modes — AI decides what to show based on context
Tool call progress notifications — Real-time updates on Claude's actions
Multi-model routing — Haiku for simple, Sonnet for complex (cost optimization)
Voice message support — Voice input with Whisper transcription
Proactive heartbeat — Bot checks in on long-running tasks
Undo/rollback — Revert conversation to previous state
Multi-user support — Share bot with team (requires tenant isolation)

Why defer: These are complex, require significant engineering, and value unclear until core experience proven. Some (like multi-model routing) need usage data to optimize.

Feature Prioritization Matrix

Feature	User Value	Implementation Cost	Priority	Phase
Message send/receive	HIGH	LOW	P1	MVP
Session persistence	HIGH	MEDIUM	P1	MVP
File upload/download	HIGH	MEDIUM	P1	MVP
Typing indicator	HIGH	LOW	P1	MVP
User authentication	HIGH	LOW	P1	MVP
Message splitting	HIGH	MEDIUM	P1	MVP
Error handling	HIGH	LOW	P1	MVP
Session folders	MEDIUM	LOW	P1	MVP
Basic commands	HIGH	LOW	P1	MVP
Named sessions	HIGH	MEDIUM	P2	Post-MVP
Idle timeout	MEDIUM	MEDIUM	P2	Post-MVP
Cost tracking	MEDIUM	MEDIUM	P2	Post-MVP
Inline keyboards	MEDIUM	MEDIUM	P2	Post-MVP
Session export	LOW	LOW	P2	Post-MVP
Image analysis	MEDIUM	MEDIUM	P2	Post-MVP
Smart output modes	HIGH	HIGH	P3	Future
Tool progress	MEDIUM	HIGH	P3	Future
Multi-model routing	HIGH	HIGH	P3	Future
Voice messages	LOW	HIGH	P3	Future
Proactive heartbeat	LOW	HIGH	P3	Future

Priority key:

P1: Must have for launch (MVP)
P2: Should have, add when core working (Post-MVP)
P3: Nice to have, future consideration (v1.0+)

Competitor Feature Analysis

Feature	OpenClaw	claude-code-telegram	Claude-Code-Remote	Our Approach
Session Management	Multi-agent sessions with isolation	Session persistence, project switching	Smart session detection (24h tokens)	Named sessions with manual switch
Authentication	Pairing allowlist, mention gating	User ID whitelist + optional token	User ID whitelist	Single-user whitelist (simplest)
File Handling	Full file operations	Directory navigation (cd/ls/pwd)	File transfers	Upload to session folders, download results
Progress Updates	Proactive heartbeat	Command output shown	Real-time notifications	Tool call progress (stretch goal)
Multi-Platform	Telegram, Discord, Slack, WhatsApp, iMessage	Telegram only	Telegram, Email, Discord, LINE	Telegram only (focused)
Output Management	Native streaming	Full responses	Smart content handling	Smart truncation + output modes
Cost Optimization	Not mentioned	Rate limiting	Cost tracking	Multi-model routing (future)
Voice Support	Not mentioned	Not mentioned	Not mentioned	Future consideration
Proactive Features	Heartbeat + cron jobs	Not mentioned	Not mentioned	Defer to v1+

Our Differentiation Strategy:

Simpler than OpenClaw: No multi-platform complexity, focus on Telegram-Claude Code excellence
Smarter than claude-code-telegram: Output modes, cost tracking, idle management (post-MVP)
More focused than Claude-Code-Remote: Single platform, deep integration, better UX
Unique angle: Cost-conscious design with multi-model routing and idle timeout (future)

Implementation Complexity Assessment

Low Complexity (1-2 days)

User whitelist authentication
Basic message send/receive
Typing indicator
Simple command interface
Error messages
Session folders
Session export

Medium Complexity (3-5 days)

Session persistence (state serialization)
File upload/download (Telegram file API)
Message splitting (intelligent chunking)
Named session management
Idle timeout implementation
Cost tracking
Inline keyboards
Image analysis (using Claude vision)

High Complexity (1-2 weeks)

Smart output modes (AI-driven truncation)
Tool call progress parsing
Multi-model routing (complexity analysis)
Voice message support (Whisper integration)
Proactive heartbeat (cron + intelligent prompting)
Undo/rollback (conversation tree)

Technical Considerations

Telegram Bot Framework Options

python-telegram-bot (Recommended)

Mature, well-documented (v22.6 is the version selected in the stack research)
ConversationHandler for state management
Built-in file handling
Already familiar to user (Python preference noted)

Alternative: grammY (TypeScript/Node)

Used by OpenClaw
Excellent session plugin
Not aligned with user's Python preference

Decision: Use python-telegram-bot for consistency with existing homelab Python scripts.

Session Storage Options

SQLite (Recommended once past prototyping)

Simple, file-based, no server needed
Built into Python
Easy to backup (single file)

Alternative: JSON files

Even simpler but no transaction safety
Good for prototyping, migrate to SQLite quickly

Decision: Start with JSON for rapid prototyping, migrate to SQLite by v0.2.

Claude Code Integration

Subprocess Approach (Recommended)

Spawn claude-code CLI as subprocess
Capture stdout/stderr
Parse output for tool calls, costs, errors
Clean isolation, no SDK dependency
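
When both stdout and stderr are captured through pipes, they need to be read concurrently; otherwise a child that fills one pipe while the parent waits on the other can stall. The sketch below shows that pattern with `asyncio`. The function names are hypothetical, and the command is passed in by the caller, so no particular Claude Code invocation or flags are assumed.

```python
import asyncio
import sys


async def _drain(stream: asyncio.StreamReader, sink: list[str]) -> None:
    """Read a stream line by line until EOF, collecting decoded lines."""
    async for raw in stream:
        sink.append(raw.decode(errors="replace").rstrip("\r\n"))


async def run_and_capture(*argv: str) -> tuple[int, list[str], list[str]]:
    """Run a command and return (exit code, stdout lines, stderr lines)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out: list[str] = []
    err: list[str] = []
    # Drain both pipes at the same time so neither can fill up and block the child.
    await asyncio.gather(_drain(proc.stdout, out), _drain(proc.stderr, err))
    code = await proc.wait()
    return code, out, err


if __name__ == "__main__":
    # Stand-in child process; a real bridge would pass the CLI command instead.
    demo = "import sys; print('hello'); print('oops', file=sys.stderr)"
    print(asyncio.run(run_and_capture(sys.executable, "-c", demo)))
```

Line-based reading uses the stream reader's default buffer limit, so very long single lines would need a larger `limit` argument or chunked reads.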

Challenge: claude-code CLI doesn't expose token counts in output yet. Will need to:

Parse prompts/responses to estimate tokens
Or wait for CLI feature addition
Or use Anthropic API directly (breaks "use Claude Code" requirement)

File Handling Architecture

~/stuff/telegram-sessions/
    ├── <session_name_1>/
    │   ├── uploads/          # User-sent files
    │   ├── downloads/        # Claude-generated files
    │   └── metadata.json     # Session info
    └── <session_name_2>/
        └── ...

Each session gets an isolated folder on shared ZFS storage (~/stuff). The session folder is passed as the working directory (cwd) to Claude Code.
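
Creating that layout safely might look like the following sketch. The helper names and the character whitelist are assumptions. The idea is to reduce any user-supplied session or file name to a single harmless path component before it touches the filesystem.

```python
import re
from pathlib import Path

# Anything outside this conservative set is replaced (assumed policy).
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def safe_name(name: str) -> str:
    """Reduce a user-supplied name to one path component without traversal."""
    cleaned = _UNSAFE.sub("_", Path(name).name).strip("._")
    return cleaned or "unnamed"


def ensure_session_dirs(root: Path, session_name: str) -> Path:
    """Create <root>/<session>/uploads and downloads; return the session folder."""
    session_dir = root / safe_name(session_name)
    for sub in ("uploads", "downloads"):
        (session_dir / sub).mkdir(parents=True, exist_ok=True)
    return session_dir
```

The returned path is what would be handed to the subprocess as its `cwd`.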

Cost Optimization Strategy

Haiku vs Sonnet Pricing (2026):

Haiku 4.5: $1 input / $5 output per MTok
Sonnet 4.5: $3 input / $15 output per MTok

Haiku costs 1/3 as much as Sonnet for both input and output tokens, and is estimated to perform within 5% of Sonnet on many tasks.
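
The arithmetic behind these figures can be sketched as below. The prices are copied from the list above; the model labels used as dictionary keys and the function names are hypothetical. `routed_savings` assumes the same token volume would otherwise have gone to Sonnet.

```python
# USD per million tokens (MTok), taken from the pricing list above.
PRICES = {
    "haiku-4.5": {"input": 1.0, "output": 5.0},
    "sonnet-4.5": {"input": 3.0, "output": 15.0},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one exchange at the listed per-MTok prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def routed_savings(haiku_share: float) -> float:
    """Fraction saved versus all-Sonnet when `haiku_share` of traffic uses Haiku."""
    return haiku_share * (1 - 1 / 3)
```

At these prices the saving from routing alone tops out at about two thirds, reached only when all traffic goes to Haiku.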

Polling Pattern (Future Optimization):

Use Haiku for idle checking: "Any new messages? Reply WAIT or process request"
If WAIT: sleep and poll again (cheap)
If action needed: Hand off to Sonnet for actual work
Potential cost reduction of up to about 67% for always-on bot (the ceiling at the prices above, since Haiku costs 1/3 of Sonnet)

Not MVP: Requires significant engineering, usage patterns unclear.

Security & Privacy Notes

Single-User Design Benefits:

No multi-tenant isolation complexity
No user data privacy concerns (owner = user)
Simple whitelist auth sufficient
Can run with full system access (owner trusts self)

Risks to Mitigate:

Telegram token leakage (store in config, never commit)
User ID spoofing (validate against hardcoded whitelist)
File upload exploits (validate file types, scan for malware if paranoid)
Command injection via filenames (sanitize all user input)

Session Security:

Sessions stored on local disk (~/stuff)
Accessed only by bot user (mikkel)
No encryption needed (single-user, trusted environment)

Performance Considerations

Telegram API Limits:

Bot messages: 30/sec across all chats
Message edits: 1/sec per chat
File transfers: bots can send files up to 50MB and download files up to 20MB by default; up to 2000MB with self-hosted Bot API

Implications:

Typing indicator: Resend roughly every 4-5 seconds, since Telegram clears the status after 5 seconds or when the bot sends a message (rate limit safe)
Tool progress: Batch updates, don't spam on every tool call
File handling: Default limits (20MB inbound, 50MB outbound) are sufficient for most use cases (PDFs, screenshots, scripts)
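
The typing-indicator implication can be sketched as a keep-alive task that runs alongside the slow work. The helper name `with_typing` and its parameters are assumptions. `send_typing` stands for whatever coroutine function issues the Bot API `sendChatAction` call in the chosen framework.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def with_typing(
    send_typing: Callable[[], Awaitable[None]],
    work: Awaitable[Any],
    interval_s: float = 4.5,
) -> Any:
    """Await `work` while re-sending the typing status every `interval_s` seconds."""

    async def keep_alive() -> None:
        while True:
            await send_typing()
            await asyncio.sleep(interval_s)

    ticker = asyncio.create_task(keep_alive())
    try:
        return await work
    finally:
        ticker.cancel()  # stop the indicator as soon as the work finishes or fails
```

An interval slightly under 5 seconds keeps the indicator visible without gaps. Errors raised inside `send_typing` are not propagated by this sketch and would need logging in practice.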

Claude Code Response Times:

Simple queries: 2-5 seconds
Complex with tools: 10-60 seconds
Very long responses: 60+ seconds