docs(03): research phase domain
Phase 03: lifecycle-management
- Process lifecycle patterns (suspend/resume)
- Asyncio idle timeout detection
- Graceful shutdown strategies
- SIGTERM vs SIGSTOP tradeoffs
- Claude Code --continue for resumption

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent
134124f04e
commit
8f7b67a91b
1 changed files with 951 additions and 0 deletions
951
.planning/phases/03-lifecycle-management/03-RESEARCH.md
Normal file
|
|
@ -0,0 +1,951 @@
|
||||||
|
# Phase 3: Lifecycle Management - Research
|
||||||
|
|
||||||
|
**Researched:** 2026-02-04
|
||||||
|
**Domain:** Process lifecycle (suspend/resume), asyncio idle timeout detection, graceful shutdown patterns, Claude Code --continue flag
|
||||||
|
**Confidence:** HIGH
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Phase 3 implements automatic session suspension after a configurable idle timeout and transparent resumption with full conversation history. The core technical challenges are: (1) detecting true idle state (no user messages AND no Claude activity), (2) choosing between SIGSTOP/SIGCONT (pause in place) and SIGTERM + `--continue` (terminate and restart), and (3) graceful cleanup on bot restart to prevent orphaned or zombie processes.
|
||||||
|
|
||||||
|
Research confirms that asyncio provides robust timeout primitives (`asyncio.Event`, `asyncio.wait_for`, `asyncio.create_task`) for per-session idle timers. Claude Code's `--continue` flag already handles session resumption from `.claude/` state in the session directory, so no separate `--resume` flag is needed when each session's subprocess always runs in its own directory. The critical decision is the suspension method: SIGSTOP/SIGCONT avoids the respawn cost but keeps the process's memory allocated while paused, whereas SIGTERM + restart releases memory during idle periods at the cost of a respawn (~1s) on the next message.
|
||||||
|
|
||||||
|
Key findings: (1) Idle detection requires tracking both user message time AND Claude completion time to avoid suspending mid-processing, (2) SIGSTOP/SIGCONT keeps process memory allocated but saves ~1s restart overhead, (3) SIGTERM + --continue is safer for long idle periods (releases memory, prevents stale state), (4) Graceful shutdown requires signal handlers to cancel idle timer tasks and terminate subprocesses with timeout + SIGKILL fallback.
|
||||||
|
|
||||||
|
**Primary recommendation:** Use SIGTERM + restart approach for suspension. Track last activity timestamp per session. After idle timeout, terminate subprocess gracefully (SIGTERM with 5s timeout, SIGKILL fallback). On next user message, spawn fresh subprocess with `--continue` to restore context. This balances memory efficiency (released during idle) with reasonable restart cost (~1s). Store timeout value in session metadata for per-session configuration.
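
The first key finding (a session is idle only when there has been neither a user message nor Claude activity for the whole timeout window) can be sketched as a pure predicate. This is a minimal illustration under stated assumptions, not the bot's actual code: `SessionActivity`, its field names, and `is_idle` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SessionActivity:
    """Hypothetical per-session activity record (names are illustrative)."""
    last_user_message: datetime   # when the user last sent a message
    last_completion: datetime     # when Claude last finished a turn
    is_busy: bool = False         # True while Claude is processing


def is_idle(activity: SessionActivity, timeout_seconds: float, now: datetime) -> bool:
    """Return True only if the session may be suspended.

    A busy session is never idle. Otherwise the most recent of the two
    timestamps must be at least timeout_seconds in the past.
    """
    if activity.is_busy:
        return False
    last_activity = max(activity.last_user_message, activity.last_completion)
    return (now - last_activity) >= timedelta(seconds=timeout_seconds)
```

Taking the maximum of the two timestamps means either kind of activity postpones suspension, and the `is_busy` check covers a turn that runs longer than the timeout itself.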
|
||||||
|
|
||||||
|
## Standard Stack
|
||||||
|
|
||||||
|
The established libraries/tools for this domain:
|
||||||
|
|
||||||
|
### Core
|
||||||
|
| Library | Version | Purpose | Why Standard |
|
||||||
|
|---------|---------|---------|--------------|
|
||||||
|
| asyncio | stdlib (3.12+) | Timeout detection, task scheduling, signal handling | Native async primitives for idle timers, event-based cancellation |
|
||||||
|
| Claude Code CLI | 2.1.31+ | Session resumption via --continue | Built-in session state persistence to `.claude/` directory |
|
||||||
|
| signal (stdlib) | stdlib | SIGTERM/SIGKILL for graceful shutdown | Standard Unix signal handling for process termination |
|
||||||
|
|
||||||
|
### Supporting
|
||||||
|
| Library | Version | Purpose | When to Use |
|
||||||
|
|---------|---------|---------|-------------|
|
||||||
|
| datetime (stdlib) | stdlib | Last activity timestamps | Track idle periods per session |
|
||||||
|
| json (stdlib) | stdlib | Session metadata updates | Store timeout configuration per session |
|
||||||
|
|
||||||
|
### Alternatives Considered
|
||||||
|
| Instead of | Could Use | Tradeoff |
|
||||||
|
|------------|-----------|----------|
|
||||||
|
| SIGTERM + restart | SIGSTOP/SIGCONT | Pause keeps memory but saves 1s restart; terminate releases memory but costs CPU |
|
||||||
|
| Per-session timers | Global timeout for all sessions | Per-session allows custom timeouts (long for task sessions, short for chat) |
|
||||||
|
| asyncio.Event cancellation | Thread-based timers | asyncio integrates cleanly with subprocess management, threads add complexity |
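
For comparison, the SIGSTOP/SIGCONT alternative from the table can be sketched as below. This is a POSIX-only illustration: the child is a stand-in `sleep` process rather than Claude Code, and `pause`, `resume`, and `demo` are hypothetical names. Only `Process.send_signal()`, `terminate()`, and `wait()` are real asyncio APIs.

```python
import asyncio
import signal
import sys


def pause(process: asyncio.subprocess.Process) -> None:
    """Freeze the child in place; its memory stays allocated while stopped."""
    process.send_signal(signal.SIGSTOP)


def resume(process: asyncio.subprocess.Process) -> None:
    """Let a stopped child continue where it left off."""
    process.send_signal(signal.SIGCONT)


async def demo() -> int:
    # Stand-in child; the real bot would hold a Claude Code subprocess here
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    pause(process)     # child is stopped, no CPU used, memory still held
    resume(process)    # child continues without any respawn cost
    process.terminate()            # clean up the stand-in child
    return await process.wait()    # negative signal number on POSIX
```

A stopped process cannot act on SIGTERM until it receives SIGCONT, which is one more reason the pause approach complicates shutdown handling.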
|
||||||
|
|
||||||
|
**Installation:**
|
||||||
|
```bash
|
||||||
|
# All components are stdlib or already installed
|
||||||
|
python3 --version # 3.12+ required for modern asyncio
|
||||||
|
claude --version # 2.1.31 (already installed)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Architecture Patterns
|
||||||
|
|
||||||
|
### Recommended Lifecycle State Machine
|
||||||
|
|
||||||
|
```
|
||||||
|
Session States:
|
||||||
|
├── Created (no subprocess) → User message → Active
|
||||||
|
├── Active (subprocess running, processing) → Completion → Idle
|
||||||
|
├── Idle (subprocess running, waiting) → Timeout → Suspended
|
||||||
|
├── Suspended (no subprocess) → User message → Active (restart)
|
||||||
|
└── Any state → Bot restart → Suspended (cleanup)
|
||||||
|
|
||||||
|
Idle Timer:
|
||||||
|
- Starts: After Claude completion event (subprocess.on_complete)
|
||||||
|
- Resets: On user message OR Claude starts processing
|
||||||
|
- Fires: After idle_timeout seconds of inactivity
|
||||||
|
- Action: Terminate subprocess (SIGTERM, 5s timeout, SIGKILL fallback)
|
||||||
|
```
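
The state diagram above can be encoded as a small transition table. This is a sketch, not the project's implementation: the `State` and `Event` enums and `next_state` are hypothetical names, and the Idle → Active transition on a user message is an assumption implied by the timer reset rule rather than drawn in the diagram.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    ACTIVE = "active"
    IDLE = "idle"
    SUSPENDED = "suspended"


class Event(Enum):
    USER_MESSAGE = "user_message"
    COMPLETION = "completion"
    TIMEOUT = "timeout"
    BOT_RESTART = "bot_restart"


# (current state, event) -> next state, mirroring the diagram
TRANSITIONS = {
    (State.CREATED, Event.USER_MESSAGE): State.ACTIVE,
    (State.ACTIVE, Event.COMPLETION): State.IDLE,
    (State.IDLE, Event.TIMEOUT): State.SUSPENDED,
    (State.SUSPENDED, Event.USER_MESSAGE): State.ACTIVE,
    # Assumption: not drawn in the diagram, implied by "Resets: On user message"
    (State.IDLE, Event.USER_MESSAGE): State.ACTIVE,
}


def next_state(state: State, event: Event) -> State:
    """Return the state after an event; unlisted pairs leave the state unchanged."""
    if event is Event.BOT_RESTART:
        return State.SUSPENDED  # "Any state -> Bot restart -> Suspended"
    return TRANSITIONS.get((state, event), state)
```

Leaving unlisted pairs unchanged means a stray timeout that arrives while a session is Active is ignored instead of suspending it mid-processing.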
|
||||||
|
|
||||||
|
### Pattern 1: Per-Session Idle Timer with asyncio
|
||||||
|
**What:** Track last activity timestamp, spawn background task to check timeout, cancel on activity
|
||||||
|
**When to use:** After each message completion, restart on new message
|
||||||
|
**Example:**
|
||||||
|
```python
|
||||||
|
# Source: https://docs.python.org/3/library/asyncio-task.html
|
||||||
|
import asyncio
|
||||||
|
from datetime import datetime, timezone
from typing import Awaitable, Callable, Optional

class SessionIdleTimer:
    """Manages idle timeout for a session."""

    def __init__(
        self,
        session_name: str,
        timeout_seconds: int,
        on_timeout: Callable[[str], Awaitable[None]],  # awaited when the timer fires
    ):
|
||||||
|
self.session_name = session_name
|
||||||
|
self.timeout_seconds = timeout_seconds
|
||||||
|
self.on_timeout = on_timeout
|
||||||
|
self._timer_task: Optional[asyncio.Task] = None
|
||||||
|
self._last_activity = datetime.now(timezone.utc)
|
||||||
|
|
||||||
|
def reset(self):
|
||||||
|
"""Reset idle timer on activity."""
|
||||||
|
self._last_activity = datetime.now(timezone.utc)
|
||||||
|
|
||||||
|
# Cancel existing timer
|
||||||
|
if self._timer_task and not self._timer_task.done():
|
||||||
|
self._timer_task.cancel()
|
||||||
|
|
||||||
|
# Start new timer
|
||||||
|
self._timer_task = asyncio.create_task(self._wait_for_timeout())
|
||||||
|
|
||||||
|
async def _wait_for_timeout(self):
|
||||||
|
"""Wait for timeout duration, then fire callback."""
|
||||||
|
try:
|
||||||
|
await asyncio.sleep(self.timeout_seconds)
|
||||||
|
|
||||||
|
# Timeout reached - fire callback
|
||||||
|
await self.on_timeout(self.session_name)
|
||||||
|
except asyncio.CancelledError:
|
||||||
|
# Timer was reset by activity
|
||||||
|
pass
|
||||||
|
|
||||||
|
def cancel(self):
|
||||||
|
"""Cancel idle timer on session shutdown."""
|
||||||
|
if self._timer_task and not self._timer_task.done():
|
||||||
|
self._timer_task.cancel()
|
||||||
|
|
||||||
|
# Usage in bot
|
||||||
|
idle_timers: dict[str, SessionIdleTimer] = {}
|
||||||
|
|
||||||
|
async def on_message_complete(session_name: str):
|
||||||
|
"""Called when Claude finishes processing."""
|
||||||
|
# Start idle timer after completion
|
||||||
|
if session_name not in idle_timers:
|
||||||
|
timeout = get_session_timeout(session_name) # From metadata
|
||||||
|
idle_timers[session_name] = SessionIdleTimer(
|
||||||
|
session_name,
|
||||||
|
timeout,
|
||||||
|
on_timeout=suspend_session
|
||||||
|
)
|
||||||
|
|
||||||
|
idle_timers[session_name].reset()
|
||||||
|
|
||||||
|
async def on_user_message(session_name: str, message: str):
|
||||||
|
"""Called when user sends message."""
|
||||||
|
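    # Stop the countdown while Claude works; on_message_complete() restarts it,
    # so a turn that runs longer than the timeout is never suspended mid-processing
    if session_name in idle_timers:
        idle_timers[session_name].cancel()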
|
||||||
|
|
||||||
|
# Send to Claude...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 2: Graceful Subprocess Termination
|
||||||
|
**What:** Send SIGTERM, wait for clean exit with timeout, SIGKILL if needed
|
||||||
|
**When to use:** Suspending session, bot shutdown, session archival
|
||||||
|
**Example:**
|
||||||
|
```python
|
||||||
|
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/
|
||||||
|
import asyncio
import logging

logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
async def terminate_subprocess_gracefully(
|
||||||
|
process: asyncio.subprocess.Process,
|
||||||
|
timeout: int = 5
|
||||||
|
) -> None:
|
||||||
|
"""
|
||||||
|
Terminate subprocess with graceful shutdown.
|
||||||
|
|
||||||
|
1. Close stdin to signal end of input
|
||||||
|
2. Send SIGTERM for graceful shutdown
|
||||||
|
3. Wait up to timeout seconds
|
||||||
|
4. SIGKILL if still running
|
||||||
|
5. Always reap process to prevent zombie
|
||||||
|
"""
|
||||||
|
if not process or process.returncode is not None:
|
||||||
|
return # Already terminated
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Close stdin to signal no more input
|
||||||
|
if process.stdin:
|
||||||
|
process.stdin.close()
|
||||||
|
await process.stdin.wait_closed()
|
||||||
|
|
||||||
|
# Send SIGTERM for graceful shutdown
|
||||||
|
process.terminate()
|
||||||
|
|
||||||
|
# Wait for clean exit
|
||||||
|
try:
|
||||||
|
await asyncio.wait_for(process.wait(), timeout=timeout)
|
||||||
|
logger.info(f"Process {process.pid} terminated gracefully")
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
# Timeout - force kill
|
||||||
|
logger.warning(f"Process {process.pid} did not terminate, sending SIGKILL")
|
||||||
|
process.kill()
|
||||||
|
await process.wait() # CRITICAL: Always reap to prevent zombie
|
||||||
|
logger.info(f"Process {process.pid} killed")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error terminating process: {e}")
|
||||||
|
# Force kill as last resort
|
||||||
|
try:
|
||||||
|
process.kill()
|
||||||
|
await process.wait()
|
||||||
|
        except ProcessLookupError:  # Process already exited
|
||||||
|
pass
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 3: Session Resume with --continue
|
||||||
|
**What:** Spawn subprocess with `--continue` flag to restore conversation from `.claude/` state
|
||||||
|
**When to use:** First message after suspension, bot restart resuming active session
|
||||||
|
**Example:**
|
||||||
|
```python
|
||||||
|
# Source: https://code.claude.com/docs/en/cli-reference
|
||||||
|
async def resume_session(session_name: str) -> ClaudeSubprocess:
|
||||||
|
"""
|
||||||
|
Resume suspended session by spawning subprocess with --continue.
|
||||||
|
|
||||||
|
Claude Code automatically loads conversation history from .claude/
|
||||||
|
directory in session folder.
|
||||||
|
"""
|
||||||
|
session_dir = get_session_dir(session_name)
|
||||||
|
persona = load_persona_for_session(session_name)
|
||||||
|
|
||||||
|
# Check if .claude directory exists (has prior conversation)
|
||||||
|
has_history = (session_dir / ".claude").exists()
|
||||||
|
|
||||||
|
cmd = [
|
||||||
|
'claude',
|
||||||
|
'-p',
|
||||||
|
'--input-format', 'stream-json',
|
||||||
|
'--output-format', 'stream-json',
|
||||||
|
'--verbose',
|
||||||
|
'--dangerously-skip-permissions',
|
||||||
|
]
|
||||||
|
|
||||||
|
# Add --continue if session has history
|
||||||
|
if has_history:
|
||||||
|
cmd.append('--continue')
|
||||||
|
logger.info(f"Resuming session '{session_name}' with --continue")
|
||||||
|
else:
|
||||||
|
logger.info(f"Starting fresh session '{session_name}'")
|
||||||
|
|
||||||
|
# Add persona settings (model, system prompt, etc)
|
||||||
|
if persona:
|
||||||
|
settings = persona.get('settings', {})
|
||||||
|
if 'model' in settings:
|
||||||
|
cmd.extend(['--model', settings['model']])
|
||||||
|
if 'system_prompt' in persona:
|
||||||
|
cmd.extend(['--append-system-prompt', persona['system_prompt']])
|
||||||
|
|
||||||
|
# Spawn subprocess
|
||||||
|
subprocess = ClaudeSubprocess(
|
||||||
|
session_dir=session_dir,
|
||||||
|
persona=persona,
|
||||||
|
on_output=...,
|
||||||
|
on_error=...,
|
||||||
|
on_complete=lambda: on_message_complete(session_name),
|
||||||
|
on_status=...,
|
||||||
|
on_tool_use=...,
|
||||||
|
)
|
||||||
|
await subprocess.start()
|
||||||
|
|
||||||
|
return subprocess
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 4: Bot Shutdown with Subprocess Cleanup
|
||||||
|
**What:** Signal handler to cancel all idle timers and terminate all subprocesses on SIGTERM/SIGINT
|
||||||
|
**When to use:** Bot stop, systemctl stop, Ctrl+C
|
||||||
|
**Example:**
|
||||||
|
```python
|
||||||
|
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/ +
|
||||||
|
# https://github.com/wbenny/python-graceful-shutdown
|
||||||
|
import signal
|
||||||
|
import asyncio
|
||||||
|
|
||||||
|
async def shutdown(sig: signal.Signals, loop: asyncio.AbstractEventLoop):
|
||||||
|
"""
|
||||||
|
Graceful shutdown handler for bot.
|
||||||
|
|
||||||
|
1. Log signal received
|
||||||
|
2. Cancel all idle timers
|
||||||
|
3. Terminate all subprocesses gracefully
|
||||||
|
4. Cancel all outstanding tasks
|
||||||
|
5. Stop event loop
|
||||||
|
"""
|
||||||
|
logger.info(f"Received exit signal {sig.name}")
|
||||||
|
|
||||||
|
# Cancel all idle timers
|
||||||
|
for timer in idle_timers.values():
|
||||||
|
timer.cancel()
|
||||||
|
|
||||||
|
# Terminate all active subprocesses
|
||||||
|
termination_tasks = []
|
||||||
|
for session_name, subprocess in subprocesses.items():
|
||||||
|
if subprocess.is_alive:
|
||||||
|
logger.info(f"Terminating subprocess for session '{session_name}'")
|
||||||
|
termination_tasks.append(
|
||||||
|
terminate_subprocess_gracefully(subprocess._process, timeout=5)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Wait for all terminations to complete
|
||||||
|
if termination_tasks:
|
||||||
|
await asyncio.gather(*termination_tasks, return_exceptions=True)
|
||||||
|
|
||||||
|
# Cancel all other tasks
|
||||||
|
tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
|
||||||
|
for task in tasks:
|
||||||
|
task.cancel()
|
||||||
|
|
||||||
|
# Wait for cancellation, ignore exceptions
|
||||||
|
await asyncio.gather(*tasks, return_exceptions=True)
|
||||||
|
|
||||||
|
# Stop the loop
|
||||||
|
loop.stop()
|
||||||
|
|
||||||
|
# Install signal handlers on startup
|
||||||
|
def main():
|
||||||
|
app = Application.builder().token(TOKEN).build()
|
||||||
|
|
||||||
|
# Add signal handlers
|
||||||
|
loop = asyncio.get_event_loop()
|
||||||
|
signals = (signal.SIGTERM, signal.SIGINT)
|
||||||
|
for sig in signals:
|
||||||
|
loop.add_signal_handler(
|
||||||
|
sig,
|
||||||
|
lambda s=sig: asyncio.create_task(shutdown(s, loop))
|
||||||
|
)
|
||||||
|
|
||||||
|
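    # Start bot. By default run_polling() installs its own handlers for
    # SIGINT/SIGTERM/SIGABRT, which would replace the ones registered above;
    # stop_signals=None disables that (verify against your python-telegram-bot version)
    app.run_polling(stop_signals=None)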
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 5: Session Metadata for Timeout Configuration
|
||||||
|
**What:** Store idle_timeout in session metadata, allow per-session customization via /timeout command
|
||||||
|
**When to use:** Session creation, /timeout command handler
|
||||||
|
**Example:**
|
||||||
|
```python
|
||||||
|
# Session metadata structure (persisted as JSON; shown here as a Python dict)
session_metadata = {
    "name": "task-session",
    "created": "2026-02-04T12:00:00+00:00",
    "last_active": "2026-02-04T12:30:00+00:00",
    "persona": "default",
    "pid": None,            # null in the JSON file
    "status": "suspended",
    "idle_timeout": 600,    # seconds (10 minutes)
}
|
||||||
|
|
||||||
|
# /timeout command handler
|
||||||
|
async def timeout_cmd(update: Update, context: ContextTypes.DEFAULT_TYPE):
|
||||||
|
"""Set idle timeout for active session."""
|
||||||
|
if not context.args:
|
||||||
|
# Show current timeout
|
||||||
|
active = session_manager.get_active_session()
|
||||||
|
if not active:
|
||||||
|
await update.message.reply_text("No active session")
|
||||||
|
return
|
||||||
|
|
||||||
|
metadata = session_manager.get_session(active)
|
||||||
|
timeout = metadata.get('idle_timeout', 600)
|
||||||
|
await update.message.reply_text(
|
||||||
|
f"Current idle timeout: {timeout // 60} minutes\n\n"
|
||||||
|
f"Usage: /timeout <minutes>"
|
||||||
|
)
|
||||||
|
return
|
||||||
|
|
||||||
|
# Parse timeout value
|
||||||
|
try:
|
||||||
|
minutes = int(context.args[0])
|
||||||
|
if minutes < 1 or minutes > 120:
|
||||||
|
await update.message.reply_text("Timeout must be between 1 and 120 minutes")
|
||||||
|
return
|
||||||
|
|
||||||
|
timeout_seconds = minutes * 60
|
||||||
|
except ValueError:
|
||||||
|
await update.message.reply_text("Invalid number. Usage: /timeout <minutes>")
|
||||||
|
return
|
||||||
|
|
||||||
|
    # Update session metadata
    active = session_manager.get_active_session()
    if not active:
        await update.message.reply_text("No active session")
        return
    session_manager.update_session(active, idle_timeout=timeout_seconds)
|
||||||
|
|
||||||
|
# Restart idle timer with new timeout
|
||||||
|
if active in idle_timers:
|
||||||
|
idle_timers[active].timeout_seconds = timeout_seconds
|
||||||
|
idle_timers[active].reset()
|
||||||
|
|
||||||
|
await update.message.reply_text(f"Idle timeout set to {minutes} minutes")
|
||||||
|
```
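
Pattern 1 calls `get_session_timeout(...)` without defining it. One possible shape is sketched below, assuming the metadata shown above is stored as a JSON file. The file location, the path-based signature, and the clamping constants are assumptions for illustration; the 600 second default and the 1 to 120 minute range are taken from `timeout_cmd`.

```python
import json
from pathlib import Path

DEFAULT_IDLE_TIMEOUT = 600     # seconds; same fallback as timeout_cmd
MIN_IDLE_TIMEOUT = 60          # 1 minute, lower bound accepted by /timeout
MAX_IDLE_TIMEOUT = 120 * 60    # 120 minutes, upper bound accepted by /timeout


def get_session_timeout(metadata_path: Path) -> int:
    """Read idle_timeout (seconds) from a session metadata JSON file.

    Falls back to the default when the file is missing, unreadable, or
    holds a non-integer value, and clamps hand-edited values into range.
    """
    try:
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return DEFAULT_IDLE_TIMEOUT
    if not isinstance(metadata, dict):
        return DEFAULT_IDLE_TIMEOUT
    value = metadata.get("idle_timeout", DEFAULT_IDLE_TIMEOUT)
    if isinstance(value, bool) or not isinstance(value, int):
        return DEFAULT_IDLE_TIMEOUT
    return max(MIN_IDLE_TIMEOUT, min(MAX_IDLE_TIMEOUT, value))
```

Falling back to the default rather than raising keeps a corrupted metadata file from preventing the idle timer from starting at all.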
|
||||||
|
|
||||||
|
### Pattern 6: /sessions Command with Status Display
|
||||||
|
**What:** List all sessions with name, status, persona, last active time, sorted by activity
|
||||||
|
**When to use:** User wants to see session overview
|
||||||
|
**Example:**
|
||||||
|
```python
|
||||||
|
async def sessions_cmd(update: Update, context: ContextTypes.DEFAULT_TYPE):
|
||||||
|
"""List all sessions sorted by last activity."""
|
||||||
|
sessions = session_manager.list_sessions()
|
||||||
|
|
||||||
|
if not sessions:
|
||||||
|
await update.message.reply_text("No sessions found. Use /new <name> to create one.")
|
||||||
|
return
|
||||||
|
|
||||||
|
active_session = session_manager.get_active_session()
|
||||||
|
|
||||||
|
# Build formatted list
|
||||||
|
lines = ["*Sessions:*\n"]
|
||||||
|
for session in sessions: # Already sorted by last_active
|
||||||
|
name = session['name']
|
||||||
|
status = session['status']
|
||||||
|
persona = session.get('persona', 'default')
|
||||||
|
last_active = session.get('last_active', 'unknown')
|
||||||
|
|
||||||
|
# Format timestamp
|
||||||
|
try:
|
||||||
|
dt = datetime.fromisoformat(last_active)
|
||||||
|
time_str = dt.strftime('%Y-%m-%d %H:%M')
|
||||||
|
        except (TypeError, ValueError):  # Missing or malformed timestamp
|
||||||
|
time_str = 'unknown'
|
||||||
|
|
||||||
|
# Mark active session
|
||||||
|
marker = "→ " if name == active_session else " "
|
||||||
|
|
||||||
|
# Status emoji
|
||||||
|
emoji = "🟢" if status == "active" else "🔵" if status == "idle" else "⚪"
|
||||||
|
|
||||||
|
lines.append(
|
||||||
|
f"{marker}{emoji} `{name}` ({persona})\n"
|
||||||
|
f" {time_str}"
|
||||||
|
)
|
||||||
|
|
||||||
|
await update.message.reply_text("\n".join(lines), parse_mode='Markdown')
|
||||||
|
```
|
||||||
|
|
||||||
|
### Anti-Patterns to Avoid
|
||||||
|
- **Suspending during processing:** Never suspend while `subprocess.is_busy` is True — will lose in-progress work
|
||||||
|
- **Not stopping the timer on user message:** If the countdown is only restarted on completion, a timer started before the user's new message can fire while Claude is still processing it
|
||||||
|
- **Orphaned processes on bot exit:** Without signal handlers, subprocesses outlive the bot as orphans; children that exit but are never reaped with `wait()` linger as zombies
|
||||||
|
- **SIGSTOP without resource consideration:** Paused processes hold memory, file handles, network sockets — unsafe for long idle periods
|
||||||
|
- **Shared idle timer for all sessions:** Different sessions have different needs (task vs chat), per-session timeout is more flexible
|
||||||
|
|
||||||
|
## Don't Hand-Roll
|
||||||
|
|
||||||
|
Problems that look simple but have existing solutions:
|
||||||
|
|
||||||
|
| Problem | Don't Build | Use Instead | Why |
|
||||||
|
|---------|-------------|-------------|-----|
|
||||||
|
| Idle timeout detection | Manual timestamp checks in loop | asyncio.create_task() + asyncio.sleep() with task cancellation | Cancellation-based reset is cleaner, no polling overhead |
|
||||||
|
| Graceful shutdown | Just process.terminate() | SIGTERM + timeout + SIGKILL pattern | Prevents zombie processes, handles hung processes |
|
||||||
|
| Per-object timers | Single global timeout thread | asyncio.create_task per session | Native async integration, automatic cleanup |
|
||||||
|
| Resume conversation | Manual state serialization | Claude Code --continue flag | Built-in, tested, handles all edge cases |
|
||||||
|
|
||||||
|
**Key insight:** Process lifecycle management has subtle races (subprocess dies mid-shutdown, signal arrives during cleanup, timer fires after cancellation). Using battle-tested patterns (signal handlers, timeout with fallback, event-based cancellation) prevents these races. Don't reinvent async subprocess management.
|
||||||
|
|
||||||
|
## Common Pitfalls
|
||||||
|
|
||||||
|
### Pitfall 1: Race Between Timer Fire and User Message
|
||||||
|
**What goes wrong:** Idle timer fires (subprocess terminated), user message arrives during termination, new subprocess spawns, old one still dying — two subprocesses running
|
||||||
|
**Why it happens:** Timer callback and message handler run concurrently. No synchronization between timer firing and subprocess state change.
|
||||||
|
**How to avoid:** Use asyncio.Lock around subprocess state transitions (terminate, spawn). Timer callback acquires lock before terminating, message handler acquires lock before spawning.
|
||||||
|
**Warning signs:** Duplicate responses, sessions becoming unresponsive, "subprocess already running" errors
|
||||||
|
|
||||||
|
```python
|
||||||
|
# WRONG - No synchronization
|
||||||
|
async def on_timeout(session_name):
|
||||||
|
await terminate_subprocess(session_name)
|
||||||
|
|
||||||
|
async def on_message(session_name, message):
|
||||||
|
subprocess = await spawn_subprocess(session_name)
|
||||||
|
await subprocess.send_message(message)
|
||||||
|
|
||||||
|
# RIGHT - Lock around transitions
from collections import defaultdict

# One lock per session, created on first use
subprocess_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def on_timeout(session_name):
    async with subprocess_locks[session_name]:
        await terminate_subprocess(session_name)

async def on_message(session_name, message):
    async with subprocess_locks[session_name]:
        # Spawn only if the session was suspended; otherwise reuse the live subprocess
        if not subprocess_exists(session_name):
            subprocesses[session_name] = await spawn_subprocess(session_name)
        await subprocesses[session_name].send_message(message)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pitfall 2: Terminating Subprocess During Tool Execution
|
||||||
|
**What goes wrong:** Claude is running a long tool (git clone, npm install), idle timer fires, subprocess terminated mid-operation, corrupted state
|
||||||
|
**Why it happens:** Idle timer only checks elapsed time since last message, doesn't check if subprocess is actively executing tools.
|
||||||
|
**How to avoid:** Track subprocess busy state (`is_busy` flag set during processing). Only start idle timer after `on_complete` callback fires (subprocess is truly idle).
|
||||||
|
**Warning signs:** Corrupted git repos, partial file writes, timeout errors from tools
|
||||||
|
|
||||||
|
```python
|
||||||
|
# WRONG - Timer starts immediately after message send
|
||||||
|
await subprocess.send_message(message)
|
||||||
|
idle_timers[session_name].reset() # Bad: Claude still processing
|
||||||
|
|
||||||
|
# RIGHT - Timer starts after completion
|
||||||
|
await subprocess.send_message(message)
|
||||||
|
# ... subprocess processes, calls tools, emits result event ...
|
||||||
|
# on_complete callback fires
|
||||||
|
async def on_complete():
|
||||||
|
idle_timers[session_name].reset() # Good: Claude is truly idle
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pitfall 3: Idle Timer Keeps Running After Session Switch
|
||||||
|
**What goes wrong:** Switch from session A to session B, session A's timer fires 5 minutes later, terminates session A subprocess (which might have been switched back to)
|
||||||
|
**Why it happens:** Session switch doesn't cancel old session's timer, timer continues running independently
|
||||||
|
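**How to avoid:** Treat this as intended behavior rather than cancelling on switch: each session keeps its own timer and suspends independently, which allows multiple concurrent sessions with independent lifetimes. A session that suspended while the user was away resumes with `--continue` on its next message.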
|
||||||
|
**Warning signs:** Sessions suspend unexpectedly after switching away and back
|
||||||
|
|
||||||
|
```python
|
||||||
|
# CORRECT - Don't cancel old timer on switch
|
||||||
|
async def switch_session(new_session_name):
|
||||||
|
old_session = get_active_session()
|
||||||
|
|
||||||
|
# Don't touch old session's timer - let it suspend naturally
|
||||||
|
# if old_session in idle_timers:
|
||||||
|
# idle_timers[old_session].cancel() # NO
|
||||||
|
|
||||||
|
set_active_session(new_session_name)
|
||||||
|
|
||||||
|
# Start new session's timer if needed
|
||||||
|
if new_session_name not in idle_timers:
|
||||||
|
# Create timer for new session
|
||||||
|
pass
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pitfall 4: Subprocess Outlives Bot on Crash
|
||||||
|
**What goes wrong:** Bot crashes or is killed with SIGKILL, signal handlers never run, subprocesses become orphans, eat memory/CPU
|
||||||
|
**Why it happens:** SIGKILL can't be caught (by design), no cleanup code runs
|
||||||
|
**How to avoid:** Orphans left behind by SIGKILL can't be prevented, but their impact can be limited: (1) store the PID in session metadata and check it on bot restart, (2) run under systemd with `KillMode=control-group` so stopping the service kills all child processes, (3) on bot startup, scan metadata for orphaned PIDs and terminate them
|
||||||
|
**Warning signs:** Multiple claude processes running after bot restart, memory usage grows over time
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Startup cleanup - kill orphaned subprocesses
import asyncio
import os
import signal
|
||||||
|
async def cleanup_orphaned_subprocesses():
|
||||||
|
"""Kill any subprocesses that outlived previous bot run."""
|
||||||
|
sessions = session_manager.list_sessions()
|
||||||
|
|
||||||
|
for session in sessions:
|
||||||
|
pid = session.get('pid')
|
||||||
|
if pid:
|
||||||
|
# Check if process still exists
|
||||||
|
try:
|
||||||
|
os.kill(pid, 0) # Signal 0 = check existence
|
||||||
|
# Process exists - kill it
|
||||||
|
logger.warning(f"Killing orphaned subprocess: PID {pid}")
|
||||||
|
os.kill(pid, signal.SIGTERM)
|
||||||
|
await asyncio.sleep(2)
|
||||||
|
try:
|
||||||
|
os.kill(pid, signal.SIGKILL)
|
||||||
|
except ProcessLookupError:
|
||||||
|
pass # Already dead
|
||||||
|
            except (ProcessLookupError, PermissionError):
                pass  # Already dead, or the PID now belongs to a process we cannot signal
|
||||||
|
|
||||||
|
# Clear PID from metadata
|
||||||
|
session_manager.update_session(session['name'], pid=None, status='suspended')
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pitfall 5: Storing Stale PIDs in Metadata
|
||||||
|
**What goes wrong:** Session metadata shows pid=12345, but subprocess already terminated. On bot restart, try to kill PID 12345 which is now a different process.
|
||||||
|
**Why it happens:** Subprocess crashes or is manually killed, metadata not updated
|
||||||
|
**How to avoid:** Clear PID from metadata when subprocess terminates (exit code detected). Before killing PID from metadata, verify it's a claude process (check /proc/{pid}/cmdline on Linux).
|
||||||
|
**Warning signs:** Bot kills wrong processes on restart, random crashes
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Safe PID cleanup with verification
|
||||||
|
async def kill_subprocess_by_pid(pid: int):
|
||||||
|
"""Kill subprocess with PID verification."""
|
||||||
|
try:
|
||||||
|
# Verify it's a claude process (Linux-specific)
|
||||||
|
cmdline_path = f"/proc/{pid}/cmdline"
|
||||||
|
if os.path.exists(cmdline_path):
|
||||||
|
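            with open(cmdline_path, "rb") as f:
                # Arguments in /proc/<pid>/cmdline are NUL-separated; make them readable
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()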
|
||||||
|
if 'claude' not in cmdline:
|
||||||
|
logger.warning(f"PID {pid} is not a claude process: {cmdline}")
|
||||||
|
return # Don't kill
|
||||||
|
|
||||||
|
# Kill the process
|
||||||
|
os.kill(pid, signal.SIGTERM)
|
||||||
|
await asyncio.sleep(2)
|
||||||
|
try:
|
||||||
|
os.kill(pid, signal.SIGKILL)
|
||||||
|
except ProcessLookupError:
|
||||||
|
pass
|
||||||
|
except ProcessLookupError:
|
||||||
|
pass # Already dead
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error killing PID {pid}: {e}")
|
||||||
|
```
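
The other half of the advice above, clearing the PID as soon as the subprocess exits, can be sketched with a small watcher task. This is an illustration under stated assumptions: `watch_exit` and `demo` are hypothetical names, the child is a stand-in Python process, and an in-memory dict stands in for the session metadata file.

```python
import asyncio
import sys
from typing import Awaitable, Callable


async def watch_exit(
    process: asyncio.subprocess.Process,
    on_exit: Callable[[int], Awaitable[None]],
) -> None:
    """Wait for the child to exit (reaping it), then report its return code."""
    returncode = await process.wait()
    await on_exit(returncode)


async def demo() -> dict:
    # In-memory stand-in for the session metadata file
    metadata = {"pid": None, "status": "active"}

    # Stand-in child that exits immediately
    process = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    metadata["pid"] = process.pid

    async def clear_pid(returncode: int) -> None:
        # Runs on every exit path: clean exit, crash, or manual kill
        metadata["pid"] = None
        metadata["status"] = "suspended"

    watcher = asyncio.create_task(watch_exit(process, clear_pid))
    await watcher
    return metadata
```

Because the watcher awaits `process.wait()`, it also reaps the child, so the stored PID is cleared at the moment it becomes eligible for reuse by another process.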
|
||||||
|
|
||||||
|
## Code Examples
|
||||||
|
|
||||||
|
Verified patterns from official sources:
|
||||||
|
|
||||||
|
### Complete Idle Timer Implementation
|
||||||
|
```python
|
||||||
|
# Source: https://docs.python.org/3/library/asyncio-task.html
|
||||||
|
import asyncio
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from typing import Awaitable, Callable, Optional

class SessionIdleTimer:
    """
    Per-session idle timeout manager.

    Tracks last activity, spawns background task to fire after timeout.
    Cancels and restarts timer on activity (reset).
    """

    def __init__(
        self,
        session_name: str,
        timeout_seconds: int,
        on_timeout: Callable[[str], Awaitable[None]]
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Args:
|
||||||
|
session_name: Session identifier
|
||||||
|
timeout_seconds: Idle seconds before firing
|
||||||
|
on_timeout: Async callback(session_name) to invoke on timeout
|
||||||
|
"""
|
||||||
|
self.session_name = session_name
|
||||||
|
self.timeout_seconds = timeout_seconds
|
||||||
|
self.on_timeout = on_timeout
|
||||||
|
self._timer_task: Optional[asyncio.Task] = None
|
||||||
|
self._last_activity = datetime.now(timezone.utc)
|
||||||
|
|
||||||
|
def reset(self):
|
||||||
|
"""Reset timer on activity (user message or completion)."""
|
||||||
|
self._last_activity = datetime.now(timezone.utc)
|
||||||
|
|
||||||
|
# Cancel existing timer
|
||||||
|
if self._timer_task and not self._timer_task.done():
|
||||||
|
self._timer_task.cancel()
|
||||||
|
|
||||||
|
# Start fresh timer
|
||||||
|
self._timer_task = asyncio.create_task(self._wait_for_timeout())
|
||||||
|
|
||||||
|
async def _wait_for_timeout(self):
|
||||||
|
"""Background task that waits for timeout duration."""
|
||||||
|
try:
|
||||||
|
await asyncio.sleep(self.timeout_seconds)
|
||||||
|
|
||||||
|
# Timeout reached - invoke callback
|
||||||
|
await self.on_timeout(self.session_name)
|
||||||
|
except asyncio.CancelledError:
|
||||||
|
# Timer was reset by activity
|
||||||
|
pass
|
||||||
|
|
||||||
|
def cancel(self):
|
||||||
|
"""Cancel timer on session shutdown."""
|
||||||
|
if self._timer_task and not self._timer_task.done():
|
||||||
|
self._timer_task.cancel()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def seconds_since_activity(self) -> float:
|
||||||
|
"""Get seconds elapsed since last activity."""
|
||||||
|
delta = datetime.now(timezone.utc) - self._last_activity
|
||||||
|
return delta.total_seconds()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Graceful Subprocess Termination with Timeout

```python
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/
import asyncio
import signal
import logging

logger = logging.getLogger(__name__)

async def terminate_subprocess_gracefully(
    process: asyncio.subprocess.Process,
    timeout: int = 5
) -> None:
    """
    Terminate subprocess with graceful shutdown sequence.

    1. Close stdin (signal end of input)
    2. Send SIGTERM (request graceful shutdown)
    3. Wait up to timeout seconds
    4. Send SIGKILL if still running (force kill)
    5. Always reap process (prevent zombie)

    Args:
        process: asyncio subprocess to terminate
        timeout: Seconds to wait before SIGKILL
    """
    if not process or process.returncode is not None:
        logger.debug("Process already terminated")
        return

    pid = process.pid
    logger.info(f"Terminating subprocess PID {pid}")

    try:
        # Close stdin to signal no more input
        if process.stdin and not process.stdin.is_closing():
            process.stdin.close()
            await process.stdin.wait_closed()

        # Send SIGTERM for graceful exit
        process.terminate()

        # Wait for clean exit with timeout
        try:
            await asyncio.wait_for(process.wait(), timeout=timeout)
            logger.info(f"Process {pid} terminated gracefully")
        except asyncio.TimeoutError:
            # Timeout - force kill
            logger.warning(f"Process {pid} did not exit within {timeout}s, sending SIGKILL")
            process.kill()
            await process.wait()  # CRITICAL: Reap to prevent zombie
            logger.info(f"Process {pid} killed")

    except Exception as e:
        logger.error(f"Error terminating process {pid}: {e}")
        # Last resort force kill
        try:
            process.kill()
            await process.wait()
        except Exception:
            # Process is already gone (e.g. ProcessLookupError); nothing left to reap.
            # A bare `except:` would also swallow asyncio.CancelledError.
            pass
```

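The SIGKILL fallback only matters when a child does not exit on SIGTERM. The sketch below exercises that path under stated assumptions: it is POSIX-only, the `CHILD` program, `stop`, and `demo` are hypothetical names, and the child deliberately ignores SIGTERM so that the timeout branch is taken.

```python
import asyncio
import signal
import sys

# Illustrative child: ignores SIGTERM, announces readiness, then sleeps.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


async def stop(process, timeout):
    """Send SIGTERM, wait up to `timeout` seconds, then SIGKILL and reap."""
    process.terminate()
    try:
        await asyncio.wait_for(process.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        process.kill()
        await process.wait()  # reap so no zombie is left behind
    return process.returncode


async def demo():
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, stdout=asyncio.subprocess.PIPE
    )
    # Wait until the child has installed its SIGTERM handler.
    await process.stdout.readline()
    return await stop(process, timeout=0.5)


returncode = asyncio.run(demo())
```

On POSIX a negative return code is the negated number of the signal that ended the process, so a forced kill can be told apart from a clean exit in logs.
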
### Bot Shutdown Signal Handler

```python
# Source: https://roguelynn.com/words/asyncio-graceful-shutdowns/ +
#         https://github.com/wbenny/python-graceful-shutdown
import signal
import asyncio
import logging

from telegram.ext import Application

logger = logging.getLogger(__name__)

# Assumed to be defined elsewhere at module level:
# TOKEN, idle_timers, subprocesses, terminate_subprocess_gracefully

async def shutdown_handler(
    sig: signal.Signals,
    loop: asyncio.AbstractEventLoop,
    idle_timers: dict,
    subprocesses: dict
):
    """
    Graceful shutdown handler for bot.

    Invoked on SIGTERM/SIGINT to clean up before exit.

    Steps:
    1. Log signal received
    2. Cancel all idle timers
    3. Terminate all subprocesses with timeout
    4. Cancel all other asyncio tasks
    5. Stop event loop

    Args:
        sig: Signal that triggered shutdown
        loop: Event loop to stop
        idle_timers: Dict of SessionIdleTimer objects
        subprocesses: Dict of ClaudeSubprocess objects
    """
    # Step 1: Log signal received
    logger.info(f"Received exit signal {sig.name}, initiating graceful shutdown")

    # Step 2: Cancel all idle timers
    logger.info("Canceling idle timers...")
    for session_name, timer in idle_timers.items():
        timer.cancel()

    # Step 3: Terminate all active subprocesses
    logger.info("Terminating subprocesses...")
    termination_tasks = []
    for session_name, subprocess in subprocesses.items():
        if subprocess.is_alive:
            logger.info(f"Terminating subprocess for '{session_name}'")
            termination_tasks.append(
                terminate_subprocess_gracefully(subprocess._process, timeout=5)
            )

    # Wait for all terminations (with exceptions handled)
    if termination_tasks:
        await asyncio.gather(*termination_tasks, return_exceptions=True)

    # Step 4: Cancel all other asyncio tasks
    logger.info("Canceling remaining tasks...")
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in tasks:
        task.cancel()

    # Wait for cancellations, ignore exceptions
    await asyncio.gather(*tasks, return_exceptions=True)

    # Step 5: Stop event loop
    logger.info("Stopping event loop")
    loop.stop()

# Install signal handlers in main()
def main():
    """Bot entry point with signal handler installation."""
    app = Application.builder().token(TOKEN).build()

    # Get event loop
    loop = asyncio.get_event_loop()

    # Install signal handlers for graceful shutdown
    signals_to_handle = (signal.SIGTERM, signal.SIGINT)
    for sig in signals_to_handle:
        loop.add_signal_handler(
            sig,
            lambda s=sig: asyncio.create_task(
                shutdown_handler(s, loop, idle_timers, subprocesses)
            )
        )

    logger.info("Signal handlers installed")

    # Start bot
    # Note: run_polling() registers its own stop-signal handlers by default.
    # Verify during implementation that the handlers above are not replaced
    # (python-telegram-bot accepts stop_signals=None to disable its own).
    app.run_polling()
```

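The handler registration can be tried without a bot. The sketch below is POSIX-only and must run in the main thread. It assumes nothing about the project beyond the standard library, and `on_signal`, `stop_requested`, and `demo` are hypothetical names. It sends SIGTERM to its own process to stand in for an external `kill`.

```python
import asyncio
import os
import signal


async def demo():
    loop = asyncio.get_running_loop()
    stop_requested = asyncio.Event()
    received = []

    def on_signal(sig):
        # Runs inside the event loop, so touching asyncio objects is safe here.
        received.append(sig.name)
        stop_requested.set()

    for sig in (signal.SIGTERM, signal.SIGINT):
        # Extra positional arguments are forwarded to the callback, which
        # avoids the late-binding pitfall of a plain lambda in a loop.
        loop.add_signal_handler(sig, on_signal, sig)

    os.kill(os.getpid(), signal.SIGTERM)  # simulate an external `kill <pid>`
    await asyncio.wait_for(stop_requested.wait(), timeout=5)

    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.remove_signal_handler(sig)
    return received


received = asyncio.run(demo())
```

Setting an `asyncio.Event` from the handler and doing the cleanup in a normal coroutine is an alternative to spawning the shutdown task directly from the signal callback. Which shape fits the bot better is a design choice left open here.
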
### Session Resume with Status Message

```python
# Source: https://code.claude.com/docs/en/cli-reference
from datetime import datetime, timezone

async def resume_suspended_session(
    bot,
    chat_id: int,
    session_name: str,
    message: str
) -> None:
    """
    Resume suspended session and send message.

    Sends brief status message to user, spawns subprocess with --continue,
    sends user's message to Claude.

    Args:
        bot: Telegram bot instance
        chat_id: Telegram chat ID
        session_name: Session to resume
        message: User message to send after resume
    """
    metadata = session_manager.get_session(session_name)

    # Calculate idle duration
    last_active = datetime.fromisoformat(metadata['last_active'])
    now = datetime.now(timezone.utc)
    idle_minutes = (now - last_active).total_seconds() / 60

    # Send status message
    if idle_minutes > 1:
        status_text = f"Resuming session (idle for {int(idle_minutes)} min)..."
    else:
        status_text = "Resuming session..."

    await bot.send_message(chat_id=chat_id, text=status_text)

    # Spawn subprocess with --continue
    session_dir = session_manager.get_session_dir(session_name)
    persona = load_persona_for_session(session_name)

    callbacks = make_callbacks(bot, chat_id, session_name)

    subprocess = ClaudeSubprocess(
        session_dir=session_dir,
        persona=persona,
        on_output=callbacks['on_output'],
        on_error=callbacks['on_error'],
        on_complete=lambda: on_completion(session_name),
        on_status=callbacks['on_status'],
        on_tool_use=callbacks['on_tool_use'],
    )

    await subprocess.start()
    subprocesses[session_name] = subprocess

    # Update metadata
    session_manager.update_session(
        session_name,
        status='active',
        last_active=now.isoformat(),
        pid=subprocess._process.pid
    )

    # Send user's message
    await subprocess.send_message(message)

    # Start idle timer
    timeout = metadata.get('idle_timeout', 600)
    idle_timers[session_name] = SessionIdleTimer(
        session_name,
        timeout,
        on_timeout=suspend_session
    )
```

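The idle-duration calculation subtracts a parsed timestamp from a timezone-aware `now`, which raises `TypeError` if the stored `last_active` value was written without a UTC offset. A small sketch of the same status-text logic with that case handled follows. `resume_status_text` is a hypothetical helper, and treating naive timestamps as UTC is an assumption about how the metadata was written.

```python
from datetime import datetime, timedelta, timezone


def resume_status_text(last_active_iso, now=None):
    """Build the brief resume message from an ISO 8601 timestamp."""
    now = now or datetime.now(timezone.utc)
    last_active = datetime.fromisoformat(last_active_iso)
    if last_active.tzinfo is None:
        # Assumption: naive timestamps were written in UTC.
        last_active = last_active.replace(tzinfo=timezone.utc)
    idle_minutes = (now - last_active).total_seconds() / 60
    if idle_minutes > 1:
        return f"Resuming session (idle for {int(idle_minutes)} min)..."
    return "Resuming session..."


now = datetime(2026, 2, 4, 12, 0, tzinfo=timezone.utc)
long_idle = resume_status_text((now - timedelta(minutes=15)).isoformat(), now)
short_idle = resume_status_text((now - timedelta(seconds=30)).isoformat(), now)
naive = resume_status_text("2026-02-04T11:45:00", now)
```
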
## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual timestamp polling | asyncio.Event + asyncio.sleep() | asyncio maturity (2020+) | Cleaner cancellation, no polling overhead |
| SIGKILL only | SIGTERM + timeout + SIGKILL fallback | Best practice evolution (2018+) | Prevents zombie processes, allows cleanup |
| Global timeout thread | Per-object asyncio tasks | Modern asyncio patterns (2022+) | Per-session configuration, native async integration |
| Manual state files | Claude Code --continue with .claude/ | Claude Code 2.0+ (2024) | Built-in, tested, handles edge cases |
| SIGSTOP/SIGCONT | SIGTERM + restart | Resource efficiency awareness (ongoing) | Releases memory during idle, safer for long periods |

**Deprecated/outdated:**
- **Thread-based timers for async code:** Mixing threading with asyncio adds complexity, use asyncio.create_task
- **Blocking time.sleep() in async context:** Use asyncio.sleep() instead
- **Not reaping terminated subprocesses:** Always call process.wait() to prevent zombies

## Open Questions

Things that couldn't be fully resolved:

1. **Optimal default idle timeout**
   - What we know: Common ranges are 5-15 minutes for chat bots, longer for task automation
   - What's unclear: What's the sweet spot for balancing memory usage vs restart friction?
   - Recommendation: Start with a 10-minute default. Allow per-session override via /timeout. Monitor actual usage patterns and adjust.

2. **SIGSTOP/SIGCONT vs SIGTERM tradeoff**
   - What we know: SIGSTOP keeps memory allocated but saves the restart cost (~1s); SIGTERM releases memory but costs CPU on restart
   - What's unclear: At what idle duration do the memory savings outweigh the restart cost?
   - Recommendation: Use the SIGTERM approach. Memory release is more important than a 1s restart cost. Claude processes can grow large (100-500MB) with long conversations. SIGSTOP is only beneficial for <5min idle periods.

3. **Resume status message verbosity**
   - What we know: User decision says "brief status message on resume"
   - What's unclear: Should it show idle duration? Session name? Model?
   - Recommendation: Show idle duration if >1 minute ("Resuming session (idle for 15 min)..."). Don't show session name (user knows what session they messaged). Keep it brief.

4. **Multi-session concurrent subprocess limit**
   - What we know: Multiple sessions can have live subprocesses simultaneously
   - What's unclear: Should there be a cap? What if user has 20 sessions all active?
   - Recommendation: No hard cap initially. Each subprocess uses ~100-500MB. On an 8GB system, 10-20 concurrent sessions is reasonable. Add a warning in /sessions if >10 are active. Add a global concurrent limit (e.g., 15) in Phase 4 if needed.

5. **Session switch behavior for previous subprocess**
   - What we know: User decision says "switching leaves previous subprocess running"
   - What's unclear: Should switching reset the previous session's idle timer?
   - Recommendation: Don't reset on switch. The previous session's timer continues from its last activity. If it was idle for 8 minutes when you switched away, it will suspend in 2 more minutes (with the 10-minute default). This is intuitive: switching doesn't "touch" the old session.

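The two suspension mechanisms compared in question 2 can be sketched side by side. This is a POSIX-only illustration using the standard library. The sleeping child stands in for a Claude subprocess and is an assumption, and the sketch does not measure memory or restart cost.

```python
import os
import signal
import subprocess
import sys

# Illustrative child: sleeps long enough to be paused and resumed.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Pause in place: the kernel stops scheduling the child,
# but its memory stays allocated.
os.kill(child.pid, signal.SIGSTOP)
_, status = os.waitpid(child.pid, os.WUNTRACED)  # returns once the child is stopped
was_stopped = os.WIFSTOPPED(status)

# Resume in place: no respawn, the process continues where it left off.
os.kill(child.pid, signal.SIGCONT)
still_running = child.poll() is None

# Terminate-and-restart alternative: SIGTERM ends the process and frees its
# memory; resumption would then require spawning a new process.
child.terminate()
exit_code = child.wait(timeout=5)
```

A stopped process cannot react to anything, including SIGTERM, until it receives SIGCONT. A shutdown path that has to clean up SIGSTOP-suspended sessions would need to continue them first or fall back to SIGKILL.
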
## Sources

### Primary (HIGH confidence)
- [Coroutines and Tasks - Python 3.14.3 Documentation](https://docs.python.org/3/library/asyncio-task.html) - Official asyncio timeout and task management
- [CLI reference - Claude Code Docs](https://code.claude.com/docs/en/cli-reference) - Official Claude Code --continue flag documentation
- [Graceful Shutdowns with asyncio - roguelynn](https://roguelynn.com/words/asyncio-graceful-shutdowns/) - Signal handlers and shutdown orchestration
- [python-graceful-shutdown - GitHub](https://github.com/wbenny/python-graceful-shutdown) - Complete example of shutdown patterns
- [Stopping and Resuming Processes with SIGSTOP and SIGCONT - TheLinuxCode](https://thelinuxcode.com/stop-process-using-sigstop-signal-linux/) - SIGSTOP/SIGCONT behavior and resource tradeoffs

### Secondary (MEDIUM confidence)
- [Session Management - Claude API Docs](https://platform.claude.com/docs/en/agent-sdk/sessions) - Session persistence patterns
- [SIGTERM, SIGKILL & SIGSTOP Signals - Medium](https://medium.com/@4techusage/sigterm-sigkill-sigstop-signals-63cb919431e8) - Signal comparison
- [A Complete Guide to Timeouts in Python - Better Stack](https://betterstack.com/community/guides/scaling-python/python-timeouts/) - Timeout mechanisms in Python

### Tertiary (LOW confidence)
- WebSearch results on asyncio subprocess management and idle detection patterns - Multiple sources, cross-referenced

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH - All stdlib components, Claude Code CLI verified
- Architecture: HIGH - Patterns based on official asyncio docs and battle-tested libraries
- Pitfalls: MEDIUM-HIGH - Common races and edge cases documented, some based on general async patterns rather than lifecycle-specific sources

**Research date:** 2026-02-04
**Valid until:** 2026-03-04 (30 days - asyncio stdlib is stable, Claude Code --continue is established)