# Phase 3: Lifecycle Management - Research

**Researched:** 2026-02-04
**Domain:** Process lifecycle (suspend/resume), asyncio idle timeout detection, graceful shutdown patterns, Claude Code session resumption (`--continue`)
**Confidence:** HIGH

## Summary

Phase 3 implements automatic session suspension after a configurable idle timeout, and transparent resumption with full conversation history. There are three core technical challenges:

1. Detecting a true idle state, meaning no user messages AND no Claude activity.
2. Choosing between SIGSTOP/SIGCONT (pause in place) and SIGTERM + `--continue` (terminate and restart).
3. Cleaning up on bot restart so that no orphaned `claude` processes are left behind.

Research confirms that asyncio provides robust primitives for per-session idle timers (`asyncio.create_task`, `asyncio.sleep`, `Task.cancel()`, `asyncio.wait_for`). Claude Code's `--continue` flag resumes the most recent conversation in the current working directory. With one directory per session, no separate `--resume <session-id>` call is needed. The critical decision is the suspension method. SIGSTOP/SIGCONT avoids spawn overhead but keeps memory allocated, while SIGTERM + restart frees memory at the cost of a restart.

Key findings:

1. Idle detection must track both the user message time AND the Claude completion time, to avoid suspending mid-processing.
2. SIGSTOP/SIGCONT keeps process memory allocated but saves roughly 1s of restart overhead.
3. SIGTERM + `--continue` is safer for long idle periods, because it releases memory and prevents stale state.
4. Graceful shutdown requires signal handlers that cancel idle timer tasks and terminate subprocesses with a timeout + SIGKILL fallback.

**Primary recommendation:** Use the SIGTERM + restart approach for suspension. Track the last activity timestamp per session. After the idle timeout, terminate the subprocess gracefully (SIGTERM with a 5s timeout, SIGKILL fallback). On the next user message, spawn a fresh subprocess with `--continue` to restore context.
This balances memory efficiency (memory is released while idle) against a reasonable restart cost (~1s). Store the timeout value in session metadata for per-session configuration.

## Standard Stack

The established libraries/tools for this domain:

### Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| asyncio | stdlib (3.12+) | Timeout detection, task scheduling, signal handling | Native async primitives for idle timers and task-based cancellation |
| Claude Code CLI | 2.1.31+ | Session resumption via `--continue` | Built-in conversation persistence, resumed per working directory |
| signal (stdlib) | stdlib | SIGTERM/SIGKILL for graceful shutdown | Standard Unix signal handling for process termination |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| datetime (stdlib) | stdlib | Last activity timestamps | Track idle periods per session |
| json (stdlib) | stdlib | Session metadata updates | Store timeout configuration per session |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| SIGTERM + restart | SIGSTOP/SIGCONT | Pausing keeps memory allocated but saves the ~1s restart. Terminating releases memory but costs a restart. |
| Per-session timers | Global timeout for all sessions | Per-session timers allow custom timeouts (long for task sessions, short for chat) |
| Cancellable asyncio tasks | Thread-based timers | asyncio integrates cleanly with subprocess management; threads add complexity |

**Installation:**

```bash
# All components are stdlib or already installed
python3 --version  # 3.12+ required for modern asyncio
claude --version   # 2.1.31 (already installed)
```

## Architecture Patterns

### Recommended Lifecycle State Machine

```
Session States:
├── Created (no subprocess)             → User message → Active
├── Active (subprocess running, processing) → Completion → Idle
├── Idle (subprocess running, waiting)  → User message → Active
├── Idle (subprocess running, waiting)  → Timeout      → Suspended
├── Suspended (no subprocess)           → User message → Active (restart with --continue)
└── Any state                           → Bot restart  → Suspended (cleanup)

Idle Timer:
- Starts: After the Claude completion event (subprocess on_complete)
- Stops:  On user message (Claude is about to process; no countdown while busy)
- Fires:  After idle_timeout seconds with no activity
- Action: Terminate subprocess (SIGTERM, 5s timeout, SIGKILL fallback)
```

### Pattern 1: Per-Session Idle Timer with asyncio

**What:** Track the last activity timestamp, run a background task that fires after the timeout, and cancel that task on activity.
**When to use:** Start the timer after each message completion. Stop it when a new message arrives.
**Example:**

```python
# Source: https://docs.python.org/3/library/asyncio-task.html
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, Optional


class SessionIdleTimer:
    """Manages idle timeout for a session."""

    def __init__(self, session_name: str, timeout_seconds: int,
                 on_timeout: Callable[[str], Awaitable[None]]):
        self.session_name = session_name
        self.timeout_seconds = timeout_seconds
        self.on_timeout = on_timeout
        self._timer_task: Optional[asyncio.Task] = None
        self._last_activity = datetime.now(timezone.utc)

    def reset(self):
        """(Re)start the countdown; call when Claude completes a turn."""
        self._last_activity = datetime.now(timezone.utc)
        # Cancel existing timer
        if self._timer_task and not self._timer_task.done():
            self._timer_task.cancel()
        # Start new timer
        self._timer_task = asyncio.create_task(self._wait_for_timeout())

    async def _wait_for_timeout(self):
        """Wait for timeout duration, then fire callback."""
        try:
            await asyncio.sleep(self.timeout_seconds)
            # Timeout reached - fire callback. It runs inside this task, so a
            # reset()/cancel() while it is running cancels the callback too.
            await self.on_timeout(self.session_name)
        except asyncio.CancelledError:
            # Timer was reset or stopped by activity
            pass

    def cancel(self):
        """Stop the countdown (user message arrived, or session shutdown)."""
        if self._timer_task and not self._timer_task.done():
            self._timer_task.cancel()


# Usage in bot
idle_timers: dict[str, SessionIdleTimer] = {}


async def on_message_complete(session_name: str):
    """Called when Claude finishes processing."""
    # Start idle timer after completion
    if session_name not in idle_timers:
        timeout = get_session_timeout(session_name)  # From metadata
        idle_timers[session_name] = SessionIdleTimer(
            session_name, timeout, on_timeout=suspend_session
        )
    idle_timers[session_name].reset()


async def on_user_message(session_name: str, message: str):
    """Called when user sends message."""
    # Stop the countdown: Claude is about to be busy (see Pitfall 2)
    if session_name in idle_timers:
        idle_timers[session_name].cancel()
    # Send to Claude...
```

### Pattern 2: Graceful Subprocess Termination

**What:** Send SIGTERM, wait for a clean exit with a timeout, and send SIGKILL if needed.
**When to use:** Suspending a session, bot shutdown, session archival.
**Example:**

```python
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/
import asyncio
import logging

logger = logging.getLogger(__name__)


async def terminate_subprocess_gracefully(
    process: asyncio.subprocess.Process,
    timeout: int = 5
) -> None:
    """
    Terminate subprocess with graceful shutdown.

    1. Close stdin to signal end of input
    2. Send SIGTERM for graceful shutdown
    3. Wait up to timeout seconds
    4. SIGKILL if still running
    5. Always reap process to prevent zombie
    """
    if not process or process.returncode is not None:
        return  # Already terminated
    try:
        # Close stdin to signal no more input
        if process.stdin:
            process.stdin.close()
            await process.stdin.wait_closed()
        # Send SIGTERM for graceful shutdown
        process.terminate()
        # Wait for clean exit
        try:
            await asyncio.wait_for(process.wait(), timeout=timeout)
            logger.info(f"Process {process.pid} terminated gracefully")
        except asyncio.TimeoutError:
            # Timeout - force kill
            logger.warning(f"Process {process.pid} did not terminate, sending SIGKILL")
            process.kill()
            await process.wait()  # CRITICAL: Always reap to prevent zombie
            logger.info(f"Process {process.pid} killed")
    except Exception as e:
        logger.error(f"Error terminating process: {e}")
        # Force kill as last resort
        try:
            process.kill()
            await process.wait()
        except ProcessLookupError:
            pass  # Process already exited
```

### Pattern 3: Session Resume with --continue

**What:** Spawn the subprocess with the `--continue` flag to restore the most recent conversation in the session directory.
**When to use:** First message after suspension; bot restart resuming an active session.
**Example:**

```python
# Source: https://code.claude.com/docs/en/cli-reference
async def resume_session(session_name: str) -> ClaudeSubprocess:
    """
    Resume suspended session by spawning subprocess with --continue.

    --continue loads the most recent conversation for the working
    directory, so the subprocess must run with cwd=session_dir.
    """
    session_dir = get_session_dir(session_name)
    persona = load_persona_for_session(session_name)

    # ASSUMPTION: a .claude/ directory marks prior conversation. Claude Code may
    # store transcripts under ~/.claude/projects/ instead; if so, record a
    # has_history flag in session metadata after the first completed turn.
    has_history = (session_dir / ".claude").exists()

    cmd = [
        'claude', '-p',
        '--input-format', 'stream-json',
        '--output-format', 'stream-json',
        '--verbose',
        '--dangerously-skip-permissions',
    ]
    # Add --continue if session has history
    if has_history:
        cmd.append('--continue')
        logger.info(f"Resuming session '{session_name}' with --continue")
    else:
        logger.info(f"Starting fresh session '{session_name}'")

    # Add persona settings (model, system prompt, etc)
    if persona:
        settings = persona.get('settings', {})
        if 'model' in settings:
            cmd.extend(['--model', settings['model']])
        if 'system_prompt' in persona:
            cmd.extend(['--append-system-prompt', persona['system_prompt']])

    # Spawn subprocess. NOTE: `cmd` only matters if it reaches the process, so
    # ClaudeSubprocess must accept it or build the same arguments internally.
    subprocess = ClaudeSubprocess(
        session_dir=session_dir,
        persona=persona,
        on_output=...,
        on_error=...,
        # on_message_complete is async: the lambda returns a coroutine that
        # ClaudeSubprocess must await (or schedule with asyncio.create_task)
        on_complete=lambda: on_message_complete(session_name),
        on_status=...,
        on_tool_use=...,
    )
    await subprocess.start()
    return subprocess
```

### Pattern 4: Bot Shutdown with Subprocess Cleanup

**What:** A signal handler that cancels all idle timers and terminates all subprocesses on SIGTERM/SIGINT.
**When to use:** Bot stop, `systemctl stop`, Ctrl+C.
**Example:**

```python
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/ +
# https://github.com/wbenny/python-graceful-shutdown
import asyncio
import signal

from telegram.ext import Application


async def shutdown(sig: signal.Signals, loop: asyncio.AbstractEventLoop):
    """
    Graceful shutdown handler for bot.

    1. Log signal received
    2. Cancel all idle timers
    3. Terminate all subprocesses gracefully
    4. Cancel all outstanding tasks
    5. Stop event loop
    """
    logger.info(f"Received exit signal {sig.name}")
    # Cancel all idle timers
    for timer in idle_timers.values():
        timer.cancel()
    # Terminate all active subprocesses
    termination_tasks = []
    for session_name, subprocess in subprocesses.items():
        if subprocess.is_alive:
            logger.info(f"Terminating subprocess for session '{session_name}'")
            termination_tasks.append(
                terminate_subprocess_gracefully(subprocess._process, timeout=5)
            )
    # Wait for all terminations to complete
    if termination_tasks:
        await asyncio.gather(*termination_tasks, return_exceptions=True)
    # Cancel all other tasks
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in tasks:
        task.cancel()
    # Wait for cancellation, ignore exceptions
    await asyncio.gather(*tasks, return_exceptions=True)
    # Stop the loop
    loop.stop()


# Install signal handlers on startup
def main():
    app = Application.builder().token(TOKEN).build()
    # Add signal handlers. CAUTION: asyncio.get_event_loop() with no running
    # loop is deprecated in recent Python versions.
    loop = asyncio.get_event_loop()
    signals = (signal.SIGTERM, signal.SIGINT)
    for sig in signals:
        loop.add_signal_handler(
            sig, lambda s=sig: asyncio.create_task(shutdown(s, loop))
        )
    # Start bot
    app.run_polling()
```

**Caveat (python-telegram-bot):** `Application.run_polling()` registers its own handlers for its `stop_signals` (SIGINT, SIGTERM and SIGABRT by default). `loop.add_signal_handler()` replaces any earlier handler for the same signal, so the handlers above may never run. Moving the timer and subprocess cleanup into a `post_shutdown` callback (`Application.builder().post_shutdown(...)`) is likely more reliable. Verify this against the installed PTB version.

### Pattern 5: Session Metadata for Timeout Configuration

**What:** Store `idle_timeout` in session metadata, and allow per-session customization through a `/timeout` command.
**When to use:** Session creation, `/timeout` command handler.
**Example:**

```python
from telegram import Update
from telegram.ext import ContextTypes

# Session metadata structure (stored as JSON; None is written as null)
SESSION_METADATA_EXAMPLE = {
    "name": "task-session",
    "created": "2026-02-04T12:00:00+00:00",
    "last_active": "2026-02-04T12:30:00+00:00",
    "persona": "default",
    "pid": None,
    "status": "suspended",
    "idle_timeout": 600,  # seconds (10 minutes)
}


# /timeout command handler
async def timeout_cmd(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Set idle timeout for active session."""
    active = session_manager.get_active_session()
    if not active:
        await update.message.reply_text("No active session")
        return

    if not context.args:
        # Show current timeout
        metadata = session_manager.get_session(active)
        timeout = metadata.get('idle_timeout', 600)
        await update.message.reply_text(
            f"Current idle timeout: {timeout // 60} minutes\n\n"
            f"Usage: /timeout <minutes>"
        )
        return

    # Parse timeout value
    try:
        minutes = int(context.args[0])
    except ValueError:
        await update.message.reply_text("Invalid number. Usage: /timeout <minutes>")
        return
    if minutes < 1 or minutes > 120:
        await update.message.reply_text("Timeout must be between 1 and 120 minutes")
        return
    timeout_seconds = minutes * 60

    # Update session metadata
    session_manager.update_session(active, idle_timeout=timeout_seconds)
    # Restart idle timer with new timeout. NOTE: if Claude is mid-turn, this
    # starts a countdown during processing; consider skipping reset() when busy.
    if active in idle_timers:
        idle_timers[active].timeout_seconds = timeout_seconds
        idle_timers[active].reset()
    await update.message.reply_text(f"Idle timeout set to {minutes} minutes")
```

### Pattern 6: /sessions Command with Status Display

**What:** List all sessions with name, status, persona and last active time, sorted by activity.
**When to use:** The user wants a session overview.
**Example:**

```python
from datetime import datetime

from telegram import Update
from telegram.ext import ContextTypes


async def sessions_cmd(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """List all sessions sorted by last activity."""
    sessions = session_manager.list_sessions()
    if not sessions:
        await update.message.reply_text("No sessions found. Use /new to create one.")
        return
    active_session = session_manager.get_active_session()
    # Build formatted list
    lines = ["*Sessions:*\n"]
    for session in sessions:  # Already sorted by last_active
        name = session['name']
        status = session['status']
        persona = session.get('persona', 'default')
        last_active = session.get('last_active', 'unknown')
        # Format timestamp
        try:
            dt = datetime.fromisoformat(last_active)
            time_str = dt.strftime('%Y-%m-%d %H:%M')
        except (TypeError, ValueError):
            time_str = 'unknown'
        # Mark active session
        marker = "→ " if name == active_session else "  "
        # Status emoji
        emoji = "🟢" if status == "active" else "🔵" if status == "idle" else "⚪"
        lines.append(
            f"{marker}{emoji} `{name}` ({persona})\n"
            f"   {time_str}"
        )
    await update.message.reply_text("\n".join(lines), parse_mode='Markdown')
```

### Anti-Patterns to Avoid

- **Suspending during processing:** Never suspend while `subprocess.is_busy` is True. Doing so loses in-progress work.
- **Leaving the timer running after a user message:** If the countdown keeps running (or restarts) when a message arrives, it can fire while Claude is still processing that message. Stop the timer on user message and restart it on completion.
- **Orphaned processes on bot crash:** Without signal handlers, or after a crash that bypasses them, subprocesses outlive the bot as orphans reparented to init. These are orphans, not zombies: a zombie is an exited process whose parent has not yet reaped it with `wait()`.
- **SIGSTOP without resource consideration:** Paused processes hold memory, file handles and network sockets, which makes them unsuitable for long idle periods.
- **Shared idle timer for all sessions:** Different sessions have different needs (task vs chat), so a per-session timeout is more flexible.

## Don't Hand-Roll

Problems that look simple but have existing solutions:

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Idle timeout detection | Manual timestamp checks in a polling loop | `asyncio.sleep()` in a cancellable task | Task cancellation is cleaner and avoids polling overhead |
| Graceful shutdown | Just `process.terminate()` | SIGTERM + timeout + SIGKILL, then `process.wait()` | Handles hung processes; reaping prevents zombies |
| Per-object timers | Single global timeout thread | `asyncio.create_task` per session | Native async integration, automatic cleanup |
| Resume conversation | Manual state serialization | Claude Code `--continue` flag | Built in and maintained upstream |

**Key insight:** Process lifecycle management has subtle races: a subprocess dies mid-shutdown, a signal arrives during cleanup, or a timer fires after cancellation. Battle-tested patterns prevent these races: signal handlers, timeouts with a fallback, and task-based cancellation. Don't reinvent async subprocess management.

## Common Pitfalls

### Pitfall 1: Race Between Timer Fire and User Message

**What goes wrong:** The idle timer fires and starts terminating the subprocess. A user message arrives during termination and spawns a new subprocess while the old one is still dying, so two subprocesses run at once.
**Why it happens:** The timer callback and the message handler run concurrently. Nothing synchronizes the timer firing with the subprocess state change.
**How to avoid:** Use a per-session `asyncio.Lock` around subprocess state transitions (terminate, spawn). The timer callback acquires the lock before terminating, and the message handler acquires it before spawning.
**Warning signs:** Duplicate responses, sessions becoming unresponsive, "subprocess already running" errors

```python
import asyncio
from collections import defaultdict

# WRONG - No synchronization
async def on_timeout(session_name):
    await terminate_subprocess(session_name)

async def on_message(session_name, message):
    subprocess = await spawn_subprocess(session_name)
    await subprocess.send_message(message)


# RIGHT - Lock around transitions (defaultdict creates one lock per session)
subprocess_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def on_timeout(session_name):
    async with subprocess_locks[session_name]:
        await terminate_subprocess(session_name)

async def on_message(session_name, message):
    async with subprocess_locks[session_name]:
        if not subprocess_exists(session_name):
            await spawn_subprocess(session_name)
        subprocess = get_subprocess(session_name)  # hypothetical lookup helper
        await subprocess.send_message(message)
```

### Pitfall 2: Terminating Subprocess During Tool Execution

**What goes wrong:** Claude is running a long tool (git clone, npm install) when the idle timer fires. The subprocess is terminated mid-operation, leaving corrupted state.
**Why it happens:** The idle timer only checks elapsed time since the last message. It doesn't check whether the subprocess is actively executing tools.
**How to avoid:** Track subprocess busy state with an `is_busy` flag that is set during processing. Only start the idle timer after the `on_complete` callback fires, when the subprocess is truly idle.
**Warning signs:** Corrupted git repos, partial file writes, timeout errors from tools

```python
# WRONG - Timer starts immediately after message send
await subprocess.send_message(message)
idle_timers[session_name].reset()  # Bad: Claude still processing

# RIGHT - Stop the timer on send, restart it only on completion
idle_timers[session_name].cancel()
await subprocess.send_message(message)
# ... subprocess processes, calls tools, emits result event ...

async def on_complete():  # invoked by the subprocess wrapper on completion
    idle_timers[session_name].reset()  # Good: Claude is truly idle
```

### Pitfall 3: Idle Timer Handling on Session Switch

**What goes wrong:** The user switches from session A to session B. Five minutes later, A's timer fires and terminates A's subprocess, even if the user has switched back to A without sending a message. Canceling A's timer on switch avoids this, but then A's subprocess never suspends.
**Why it happens:** A session switch doesn't touch the old session's timer, which keeps running independently.
**How to avoid:** When switching sessions, don't cancel the old timer; let it run. The old subprocess suspends on its own timer, which allows multiple concurrent sessions with independent lifetimes. A suspension after switching back is harmless, because the next message resumes the session via `--continue`.
**Warning signs:** Sessions suspend after switching away and back (expected and harmless under this design). Conversely, if timers are canceled on switch, `claude` processes for inactive sessions never exit.

```python
# CORRECT - Don't cancel old timer on switch
async def switch_session(new_session_name):
    old_session = get_active_session()
    # Don't touch old session's timer - let it suspend naturally
    # if old_session in idle_timers:
    #     idle_timers[old_session].cancel()  # NO
    set_active_session(new_session_name)
    # Start new session's timer if needed
    if new_session_name not in idle_timers:
        # Create timer for new session
        pass
```

### Pitfall 4: Subprocess Outlives Bot on Crash

**What goes wrong:** The bot crashes or is killed with SIGKILL. Signal handlers never run, and subprocesses become orphans that keep consuming memory and CPU.
**Why it happens:** SIGKILL can't be caught (by design), so no cleanup code runs.
**How to avoid:** SIGKILL orphans can't be prevented, but three measures minimize them:

1. Store the PID in session metadata and check it on bot restart.
2. Run under systemd with `KillMode=control-group` (the default), which kills every process left in the service's cgroup when the unit stops.
3. Run a startup cleanup that scans for orphaned PIDs from metadata.

**Warning signs:** Multiple `claude` processes running after bot restart; memory usage grows over time

```python
import asyncio
import os
import signal

# Startup cleanup - kill orphaned subprocesses
async def cleanup_orphaned_subprocesses():
    """Kill any subprocesses that outlived previous bot run."""
    sessions = session_manager.list_sessions()
    for session in sessions:
        pid = session.get('pid')
        if pid:
            # Check if process still exists
            try:
                os.kill(pid, 0)  # Signal 0 = check existence
                # Process exists - kill it (verify it is claude first; see Pitfall 5)
                logger.warning(f"Killing orphaned subprocess: PID {pid}")
                os.kill(pid, signal.SIGTERM)
                await asyncio.sleep(2)
                try:
                    os.kill(pid, signal.SIGKILL)
                except ProcessLookupError:
                    pass  # Exited after SIGTERM
            except ProcessLookupError:
                pass  # Already dead
            except PermissionError:
                pass  # Not ours to signal (PID likely reused)
            # Clear PID from metadata
            session_manager.update_session(session['name'], pid=None, status='suspended')
```

### Pitfall 5: Storing Stale PIDs in Metadata

**What goes wrong:** Session metadata shows `pid=12345`, but that subprocess has already terminated. On bot restart, the bot tries to kill PID 12345, which may now belong to a different process.
**Why it happens:** The subprocess crashes or is killed manually, and the metadata is not updated.
**How to avoid:** Clear the PID from metadata when the subprocess terminates (exit code detected). Before killing a PID from metadata, verify that it is a claude process (check `/proc/{pid}/cmdline` on Linux).
**Warning signs:** Bot kills wrong processes on restart, random crashes

```python
import asyncio
import os
import signal

# Safe PID cleanup with verification
async def kill_subprocess_by_pid(pid: int):
    """Kill subprocess with PID verification."""
    try:
        # Verify it's a claude process (Linux-specific; args are NUL-separated)
        cmdline_path = f"/proc/{pid}/cmdline"
        if os.path.exists(cmdline_path):
            with open(cmdline_path, 'rb') as f:
                cmdline = f.read().replace(b'\0', b' ').decode(errors='replace')
            if 'claude' not in cmdline:
                logger.warning(f"PID {pid} is not a claude process: {cmdline}")
                return  # Don't kill
        # Kill the process
        os.kill(pid, signal.SIGTERM)
        await asyncio.sleep(2)
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # Exited after SIGTERM
    except ProcessLookupError:
        pass  # Already dead
    except Exception as e:
        logger.error(f"Error killing PID {pid}: {e}")
```

## Code Examples

Patterns adapted from the cited sources:

### Complete Idle Timer Implementation

```python
# Source: https://docs.python.org/3/library/asyncio-task.html
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, Optional


class SessionIdleTimer:
    """
    Per-session idle timeout manager.

    Tracks last activity and runs a background task that fires after the
    timeout. reset() cancels any pending countdown and starts a new one.
    """

    def __init__(
        self,
        session_name: str,
        timeout_seconds: int,
        on_timeout: Callable[[str], Awaitable[None]]
    ):
        """
        Args:
            session_name: Session identifier
            timeout_seconds: Idle seconds before firing
            on_timeout: Async callback(session_name) to invoke on timeout
        """
        self.session_name = session_name
        self.timeout_seconds = timeout_seconds
        self.on_timeout = on_timeout
        self._timer_task: Optional[asyncio.Task] = None
        self._last_activity = datetime.now(timezone.utc)

    def reset(self):
        """(Re)start the countdown; call when Claude completes a turn."""
        self._last_activity = datetime.now(timezone.utc)
        # Cancel existing timer
        if self._timer_task and not self._timer_task.done():
            self._timer_task.cancel()
        # Start fresh timer
        self._timer_task = asyncio.create_task(self._wait_for_timeout())

    async def _wait_for_timeout(self):
        """Background task that waits for timeout duration."""
        try:
            await asyncio.sleep(self.timeout_seconds)
            # Timeout reached - invoke callback. It runs inside this task, so
            # reset()/cancel() while it is running cancels the callback too.
            await self.on_timeout(self.session_name)
        except asyncio.CancelledError:
            # Timer was reset or stopped by activity
            pass

    def cancel(self):
        """Stop the countdown (user message arrived, or session shutdown)."""
        if self._timer_task and not self._timer_task.done():
            self._timer_task.cancel()

    @property
    def seconds_since_activity(self) -> float:
        """Get seconds elapsed since last activity."""
        delta = datetime.now(timezone.utc) - self._last_activity
        return delta.total_seconds()
```

### Graceful Subprocess Termination with Timeout

```python
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/
import asyncio
import logging

logger = logging.getLogger(__name__)


async def terminate_subprocess_gracefully(
    process: asyncio.subprocess.Process,
    timeout: int = 5
) -> None:
    """
    Terminate subprocess with graceful shutdown sequence.

    1. Close stdin (signal end of input)
    2. Send SIGTERM (request graceful shutdown)
    3. Wait up to timeout seconds
    4. Send SIGKILL if still running (force kill)
    5. Always reap process (prevent zombie)

    Args:
        process: asyncio subprocess to terminate
        timeout: Seconds to wait before SIGKILL
    """
    if not process or process.returncode is not None:
        logger.debug("Process already terminated")
        return
    pid = process.pid
    logger.info(f"Terminating subprocess PID {pid}")
    try:
        # Close stdin to signal no more input
        if process.stdin and not process.stdin.is_closing():
            process.stdin.close()
            await process.stdin.wait_closed()
        # Send SIGTERM for graceful exit
        process.terminate()
        # Wait for clean exit with timeout
        try:
            await asyncio.wait_for(process.wait(), timeout=timeout)
            logger.info(f"Process {pid} terminated gracefully")
        except asyncio.TimeoutError:
            # Timeout - force kill
            logger.warning(f"Process {pid} did not exit within {timeout}s, sending SIGKILL")
            process.kill()
            await process.wait()  # CRITICAL: Reap to prevent zombie
            logger.info(f"Process {pid} killed")
    except Exception as e:
        logger.error(f"Error terminating process {pid}: {e}")
        # Last resort force kill
        try:
            process.kill()
            await process.wait()
        except ProcessLookupError:
            pass  # Process already exited
```

### Bot Shutdown Signal Handler

```python
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/ +
# https://github.com/wbenny/python-graceful-shutdown
import asyncio
import logging
import signal

from telegram.ext import Application

logger = logging.getLogger(__name__)


async def shutdown_handler(
    sig: signal.Signals,
    loop: asyncio.AbstractEventLoop,
    idle_timers: dict,
    subprocesses: dict
):
    """
    Graceful shutdown handler for bot.

    Invoked on SIGTERM/SIGINT to clean up before exit.

    Steps:
    1. Log signal received
    2. Cancel all idle timers
    3. Terminate all subprocesses with timeout
    4. Cancel all other asyncio tasks
    5. Stop event loop

    Args:
        sig: Signal that triggered shutdown
        loop: Event loop to stop
        idle_timers: Dict of SessionIdleTimer objects
        subprocesses: Dict of ClaudeSubprocess objects
    """
    logger.info(f"Received exit signal {sig.name}, initiating graceful shutdown")
    # Step 1: Cancel all idle timers
    logger.info("Canceling idle timers...")
    for session_name, timer in idle_timers.items():
        timer.cancel()
    # Step 2: Terminate all active subprocesses
    logger.info("Terminating subprocesses...")
    termination_tasks = []
    for session_name, subprocess in subprocesses.items():
        if subprocess.is_alive:
            logger.info(f"Terminating subprocess for '{session_name}'")
            termination_tasks.append(
                terminate_subprocess_gracefully(subprocess._process, timeout=5)
            )
    # Wait for all terminations (with exceptions handled)
    if termination_tasks:
        await asyncio.gather(*termination_tasks, return_exceptions=True)
    # Step 3: Cancel all other asyncio tasks
    logger.info("Canceling remaining tasks...")
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in tasks:
        task.cancel()
    # Wait for cancellations, ignore exceptions
    await asyncio.gather(*tasks, return_exceptions=True)
    # Step 4: Stop event loop
    logger.info("Stopping event loop")
    loop.stop()


# Install signal handlers in main()
def main():
    """Bot entry point with signal handler installation."""
    app = Application.builder().token(TOKEN).build()
    # Get event loop (get_event_loop() without a running loop is deprecated
    # in recent Python versions)
    loop = asyncio.get_event_loop()
    # Install signal handlers for graceful shutdown. CAUTION: PTB's
    # run_polling() installs handlers for its own stop_signals, which replace
    # these; a post_shutdown callback may be the more reliable cleanup hook.
    signals_to_handle = (signal.SIGTERM, signal.SIGINT)
    for sig in signals_to_handle:
        loop.add_signal_handler(
            sig,
            lambda s=sig: asyncio.create_task(
                shutdown_handler(s, loop, idle_timers, subprocesses)
            )
        )
    logger.info("Signal handlers installed")
    # Start bot
    app.run_polling()
```

### Session Resume with Status Message

```python
# Source: https://code.claude.com/docs/en/cli-reference
from datetime import datetime, timezone


async def resume_suspended_session(
    bot,
    chat_id: int,
    session_name: str,
    message: str
) -> None:
    """
    Resume suspended session and send message.

    Sends brief status message to user, spawns subprocess with --continue,
    sends user's message to Claude.

    Args:
        bot: Telegram bot instance
        chat_id: Telegram chat ID
        session_name: Session to resume
        message: User message to send after resume
    """
    metadata = session_manager.get_session(session_name)
    # Calculate idle duration (last_active is stored timezone-aware, e.g. +00:00)
    last_active = datetime.fromisoformat(metadata['last_active'])
    now = datetime.now(timezone.utc)
    idle_minutes = (now - last_active).total_seconds() / 60
    # Send status message
    if idle_minutes > 1:
        status_text = f"Resuming session (idle for {int(idle_minutes)} min)..."
    else:
        status_text = "Resuming session..."
    await bot.send_message(chat_id=chat_id, text=status_text)
    # Spawn subprocess with --continue
    session_dir = session_manager.get_session_dir(session_name)
    persona = load_persona_for_session(session_name)
    callbacks = make_callbacks(bot, chat_id, session_name)
    subprocess = ClaudeSubprocess(
        session_dir=session_dir,
        persona=persona,
        on_output=callbacks['on_output'],
        on_error=callbacks['on_error'],
        on_complete=lambda: on_message_complete(session_name),
        on_status=callbacks['on_status'],
        on_tool_use=callbacks['on_tool_use'],
    )
    await subprocess.start()
    subprocesses[session_name] = subprocess
    # Update metadata
    session_manager.update_session(
        session_name,
        status='active',
        last_active=now.isoformat(),
        pid=subprocess._process.pid
    )
    # Register the idle timer before sending, replacing (and canceling) any
    # stale one. Don't reset() here: on_message_complete starts the countdown.
    if session_name in idle_timers:
        idle_timers[session_name].cancel()
    timeout = metadata.get('idle_timeout', 600)
    idle_timers[session_name] = SessionIdleTimer(
        session_name, timeout, on_timeout=suspend_session
    )
    # Send user's message
    await subprocess.send_message(message)
```

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual timestamp polling | `asyncio.sleep()` in cancellable tasks | asyncio maturity (2020+) | Cleaner cancellation, no polling overhead |
| SIGKILL only | SIGTERM + timeout + SIGKILL fallback | Best practice evolution (2018+) | Allows cleanup; reaping with `wait()` prevents zombies |
| Global timeout thread | Per-object asyncio tasks | Modern asyncio patterns (2022+) | Per-session configuration, native async integration |
| Manual state files | Claude Code `--continue` | Claude Code (2025) | Built in and maintained upstream |
| SIGSTOP/SIGCONT | SIGTERM + restart | Resource efficiency awareness (ongoing) | Releases memory during idle, safer for long periods |

**Deprecated/outdated:**

- **Thread-based timers for async code:** Mixing threading with asyncio adds complexity; use `asyncio.create_task`.
- **Blocking `time.sleep()` in async context:** Use `asyncio.sleep()` instead.
- **Not reaping terminated subprocesses:** Always call `process.wait()` to prevent zombies.

## Open Questions

Things that couldn't be fully resolved:

1. **Optimal default idle timeout**
   - What we know: Common ranges are 5-15 minutes for chat bots, and longer for task automation.
   - What's unclear: What is the best balance between memory usage and restart friction?
   - Recommendation: Start with a 10-minute default. Allow per-session override via `/timeout`. Monitor actual usage patterns and adjust.

2. **SIGSTOP/SIGCONT vs SIGTERM tradeoff**
   - What we know: SIGSTOP keeps memory but saves the restart cost (~1s). SIGTERM releases memory but costs a restart.
   - What's unclear: At what idle duration does the memory saving outweigh the restart cost?
   - Recommendation: Use the SIGTERM approach. Releasing memory matters more than a ~1s restart, because Claude processes can grow large with long conversations (estimated 100-500MB). SIGSTOP is likely only worthwhile for very short idle periods (under ~5 minutes).

3. **Resume status message verbosity**
   - What we know: The user decision calls for a "brief status message on resume".
   - What's unclear: Should it show idle duration? Session name? Model?
   - Recommendation: Show the idle duration if it exceeds 1 minute ("Resuming session (idle for 15 min)..."). Don't show the session name, since the user knows which session they messaged. Keep it brief.

4. **Multi-session concurrent subprocess limit**
   - What we know: Multiple sessions can have live subprocesses simultaneously.
   - What's unclear: Should there be a cap? What if a user has 20 sessions all active?
   - Recommendation: No hard cap initially. At an estimated 100-500MB per subprocess, an 8GB system comfortably fits about 10 concurrent sessions (1-5GB). At the upper estimate, 20 sessions (~10GB) would exceed available RAM. Add a warning in `/sessions` if more than 10 are active. Add a global concurrent limit in Phase 4 if needed (e.g., 15), sized to measured per-process memory.

5. **Session switch behavior for previous subprocess**
   - What we know: The user decision says "switching leaves previous subprocess running".
   - What's unclear: Should switching reset the previous session's idle timer?
   - Recommendation: Don't reset on switch. The previous session's timer continues from its last activity. With the 10-minute default, a session that had been idle for 8 minutes when you switched away suspends 2 minutes later. This is intuitive, because switching doesn't "touch" the old session.

## Sources

### Primary (HIGH confidence)

- [Coroutines and Tasks - Python 3.14.3 Documentation](https://docs.python.org/3/library/asyncio-task.html) - Official asyncio timeout and task management
- [CLI reference - Claude Code Docs](https://code.claude.com/docs/en/cli-reference) - Official Claude Code `--continue` flag documentation
- [Graceful Shutdowns with asyncio - roguelynn](https://roguelynn.com/words/asyncio-graceful-shutdowns/) - Signal handlers and shutdown orchestration
- [python-graceful-shutdown - GitHub](https://github.com/wbenny/python-graceful-shutdown) - Complete example of shutdown patterns
- [Stopping and Resuming Processes with SIGSTOP and SIGCONT - TheLinuxCode](https://thelinuxcode.com/stop-process-using-sigstop-signal-linux/) - SIGSTOP/SIGCONT behavior and resource tradeoffs

### Secondary (MEDIUM confidence)

- [Session Management - Claude API Docs](https://platform.claude.com/docs/en/agent-sdk/sessions) - Session persistence patterns
- [SIGTERM, SIGKILL & SIGSTOP Signals - Medium](https://medium.com/@4techusage/sigterm-sigkill-sigstop-signals-63cb919431e8) - Signal comparison
- [A Complete Guide to Timeouts in Python - Better Stack](https://betterstack.com/community/guides/scaling-python/python-timeouts/) - Timeout mechanisms in Python

### Tertiary (LOW confidence)

- WebSearch results on asyncio subprocess management and idle detection patterns - Multiple sources, cross-referenced

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH - All stdlib components, Claude Code CLI verified
- Architecture: HIGH - Patterns based on official asyncio docs and battle-tested libraries
- Pitfalls: MEDIUM-HIGH - Common races and edge cases documented, some based on general async patterns rather than lifecycle-specific sources

**Research date:** 2026-02-04
**Valid until:** 2026-03-04 (30 days - asyncio stdlib is stable, Claude Code `--continue` is established)