Phase 03: Lifecycle Management - 2 plans in 2 waves - Plan 01 (wave 1): Idle timer module + session metadata + PID tracking - Plan 02 (wave 2): Suspend/resume wiring, /timeout, /sessions, startup cleanup, graceful shutdown - Ready for execution Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves |
|---|---|---|---|---|---|---|---|
| 03-lifecycle-management | 02 | execute | 2 | | | true | |
Purpose: This is the core integration plan. It makes sessions suspend automatically after the idle timeout, resume transparently on the next user message, and adds the /timeout and /sessions commands. It also adds startup orphan cleanup and graceful shutdown handling.
Output: Updated bot.py with full lifecycle management
<execution_context> @/home/mikkel/.claude/get-shit-done/workflows/execute-plan.md @/home/mikkel/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/03-lifecycle-management/03-CONTEXT.md @.planning/phases/03-lifecycle-management/03-RESEARCH.md @.planning/phases/03-lifecycle-management/03-01-SUMMARY.md @telegram/bot.py @telegram/idle_timer.py @telegram/session_manager.py @telegram/claude_subprocess.py

Task 1: Suspend/resume wiring with race locks, startup cleanup, and graceful shutdown

telegram/bot.py

This is the core lifecycle wiring in bot.py. Make these changes:

New imports and globals:
- `import signal, os` (for shutdown handlers and PID checks)
- `from idle_timer import SessionIdleTimer`
- Add global dict: `idle_timers: dict[str, SessionIdleTimer] = {}`
- Add global dict: `subprocess_locks: dict[str, asyncio.Lock] = {}` (one lock per session, prevents races between timeout-fire and user-message)

Helper: `get_subprocess_lock(session_name)`
- Returns the existing lock or creates a new one for the session. Pattern: `subprocess_locks.setdefault(session_name, asyncio.Lock())`
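
The per-session lock helper can be sketched as below. This is a minimal illustration, not the bot's actual code: the `demo` coroutine and its `worker` labels are hypothetical and only show that two coroutines contending for the same session name run one after the other.

```python
import asyncio

# One lock per session name; mirrors the subprocess_locks global described above.
subprocess_locks: dict[str, asyncio.Lock] = {}


def get_subprocess_lock(session_name: str) -> asyncio.Lock:
    """Return the session's lock, creating it on first use."""
    # setdefault builds a throwaway Lock on every call but only stores the first one.
    return subprocess_locks.setdefault(session_name, asyncio.Lock())


async def demo() -> list[str]:
    """Hypothetical demo: two coroutines on the same session are serialized."""
    events: list[str] = []

    async def worker(label: str) -> None:
        async with get_subprocess_lock("demo"):
            events.append(f"{label}-start")
            await asyncio.sleep(0)  # yield to the event loop while holding the lock
            events.append(f"{label}-end")

    await asyncio.gather(worker("suspend"), worker("resume"))
    return events
```

Because the second worker cannot enter the `async with` block until the first releases the lock, the start/end events never interleave, which is the property the suspend and resume paths rely on.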

Suspend function: `async def suspend_session(session_name: str)`
- This is the idle timer's on_timeout callback.
- Acquire the session's subprocess lock.
- Check if subprocess exists and is_alive. If not alive, just update metadata and return.
- Check `subprocesses[session_name].is_busy` -- if busy, DON'T suspend (Claude is mid-processing). Instead, reset the idle timer to try again later. Log this. Return.
- Store the subprocess PID for logging.
- Call `await subprocesses[session_name].terminate()` (existing method with SIGTERM + timeout + SIGKILL).
- Remove from `subprocesses` dict.
- Flush and remove batcher if exists: `if session_name in batchers: await batchers[session_name].flush_immediately(); del batchers[session_name]`
- Update session metadata: `session_manager.update_session(session_name, status='suspended', pid=None)`
- Cancel and remove idle timer: `if session_name in idle_timers: idle_timers[session_name].cancel(); del idle_timers[session_name]`
- Log: `logger.info(f"Session '{session_name}' suspended after idle timeout")`
- DECISION (from CONTEXT.md): Silent suspension -- do NOT send any Telegram message.
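
The suspend steps above could be put together roughly as follows. This is a sketch under stated assumptions: `FakeSubprocess`, `FakeTimer`, and the `session_meta` dict are hypothetical stand-ins for `ClaudeSubprocess`, `SessionIdleTimer`, and `session_manager.update_session`, and the batcher flush is left out for brevity.

```python
import asyncio
import logging

logger = logging.getLogger("lifecycle_sketch")


class FakeSubprocess:
    """Hypothetical stub exposing only the attributes the plan relies on."""

    def __init__(self, pid: int, is_busy: bool = False) -> None:
        self.pid, self.is_busy, self.is_alive = pid, is_busy, True

    async def terminate(self) -> None:
        self.is_alive = False


class FakeTimer:
    """Hypothetical stub for SessionIdleTimer; records reset() and cancel() calls."""

    def __init__(self) -> None:
        self.resets, self.cancelled = 0, False

    def reset(self) -> None:
        self.resets += 1

    def cancel(self) -> None:
        self.cancelled = True


# Stand-ins for the bot's globals (assumptions for this sketch).
subprocesses: dict[str, FakeSubprocess] = {}
idle_timers: dict[str, FakeTimer] = {}
subprocess_locks: dict[str, asyncio.Lock] = {}
session_meta: dict[str, dict] = {}  # replaces session_manager.update_session


async def suspend_session(session_name: str) -> None:
    lock = subprocess_locks.setdefault(session_name, asyncio.Lock())
    async with lock:
        proc = subprocesses.get(session_name)
        if proc is None or not proc.is_alive:
            # Nothing to terminate: just record the suspended state.
            session_meta[session_name] = {"status": "suspended", "pid": None}
            return
        if proc.is_busy:
            # Claude is mid-processing: try again after another idle period.
            logger.info("Session '%s' busy, deferring suspend", session_name)
            if session_name in idle_timers:
                idle_timers[session_name].reset()
            return
        pid = proc.pid  # kept only for the log line
        await proc.terminate()
        del subprocesses[session_name]
        session_meta[session_name] = {"status": "suspended", "pid": None}
        timer = idle_timers.pop(session_name, None)
        if timer is not None:
            timer.cancel()
        # Silent suspension: log only, no Telegram message.
        logger.info("Session '%s' (pid %s) suspended after idle timeout", session_name, pid)
```

The busy check comes before termination so that a long-running Claude turn is never killed by the timer; the session simply gets another full idle period.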

Modify make_callbacks() -- add on_complete idle timer integration:
- The `on_complete` callback already exists. Wrap it: after the existing logic (stop typing), add the idle timer reset:

        # Reset idle timer (only start counting AFTER Claude finishes)
        if session_name in idle_timers:
            idle_timers[session_name].reset()

- This ensures the timer only starts when Claude is truly idle, never during processing.
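
To make the `reset()` semantics concrete, here is one way a resettable timer could behave. The real `SessionIdleTimer` lives in idle_timer.py (Plan 01) and may differ; `SketchIdleTimer` and `demo` are hypothetical names, and only the constructor arguments, `reset()`, `cancel()`, and `timeout_seconds` are taken from this plan.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SketchIdleTimer:
    """Hypothetical stand-in for SessionIdleTimer, built on an asyncio task."""

    def __init__(self, session_name: str, timeout_seconds: float,
                 on_timeout: Callable[[str], Awaitable[None]]) -> None:
        self.session_name = session_name
        self.timeout_seconds = timeout_seconds
        self.on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def reset(self) -> None:
        """Restart the countdown from zero; must be called inside a running loop."""
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        """Stop the countdown if one is pending."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self.timeout_seconds)
        # Detach first so a cancel() made inside the callback cannot cancel this task.
        self._task = None
        await self.on_timeout(self.session_name)


async def demo() -> tuple[list[str], list[str]]:
    """Hypothetical demo: activity before expiry postpones the callback."""
    fired: list[str] = []

    async def on_timeout(name: str) -> None:
        fired.append(name)

    timer = SketchIdleTimer("demo", 0.2, on_timeout)
    timer.reset()
    await asyncio.sleep(0.1)
    timer.reset()  # simulated on_complete: countdown starts over
    await asyncio.sleep(0.1)
    early = list(fired)  # 0.2s since the first reset, only 0.1s since the second
    await asyncio.sleep(0.4)
    return early, fired
```

Calling `reset()` from `on_complete` therefore means the idle period is measured from the moment Claude finishes, not from the moment the user's message arrived.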

Modify handle_message() -- add resume logic:
- After checking for active session, BEFORE the subprocess check, add:

        # Acquire lock to prevent race with suspend_session
        lock = get_subprocess_lock(active_session)
        async with lock:

  Wrap the subprocess get-or-create and message send in this lock.
- Inside the lock, when subprocess is not alive:
  - Check if session has `.claude/` dir (has history). If yes, this is a resume.
  - If resuming: send status message to user: `"Resuming session..."` (include idle duration if >1 min from metadata last_active). Example: `"Resuming session (idle for 15 min)..."`
  - Spawn subprocess normally (the existing ClaudeSubprocess constructor + start() already handles --continue when .claude/ exists).
  - Store PID in metadata: `session_manager.update_session(active_session, status='active', last_active=now_iso, pid=subprocesses[active_session].pid)`
- After sending message (outside lock), create/reset idle timer for the session:

        timeout_secs = session_manager.get_session_timeout(active_session)
        if active_session not in idle_timers:
            idle_timers[active_session] = SessionIdleTimer(active_session, timeout_secs, on_timeout=suspend_session)
        # Don't reset here -- timer resets in on_complete when Claude finishes

- IMPORTANT: Also reset the idle timer when user sends a message (user activity should reset timer too, per CONTEXT.md):

        if active_session in idle_timers:
            idle_timers[active_session].reset()

  Put this BEFORE sending to subprocess (so timer is reset even if message queues).
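
The resume status text, including the optional idle duration, might be built with a small helper like the one below. `build_resume_message` is a hypothetical name, and treating naive timestamps as UTC is an assumption about how `last_active` is stored in the session metadata.

```python
from datetime import datetime, timezone
from typing import Optional


def build_resume_message(last_active_iso: Optional[str],
                         now: Optional[datetime] = None) -> str:
    """Return the status text shown while a suspended session is respawned."""
    if not last_active_iso:
        return "Resuming session..."
    now = now or datetime.now(timezone.utc)
    last_active = datetime.fromisoformat(last_active_iso)
    if last_active.tzinfo is None:
        # Assumption: naive timestamps in metadata are UTC.
        last_active = last_active.replace(tzinfo=timezone.utc)
    idle_seconds = (now - last_active).total_seconds()
    if idle_seconds > 60:
        # Only mention the duration when the session was idle for more than a minute.
        return f"Resuming session (idle for {int(idle_seconds // 60)} min)..."
    return "Resuming session..."
```

Passing `now` explicitly keeps the helper deterministic, which makes the 1-minute threshold easy to check in isolation from the bot.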
Similarly update handle_photo() and handle_document():
- Add the same lock acquisition, resume detection, and idle timer reset as handle_message().
- Keep the existing photo/document save and notification logic.

Modify new_session() -- initialize idle timer after creation:
- After subprocess creation, add:

        timeout_secs = session_manager.get_session_timeout(name)
        idle_timers[name] = SessionIdleTimer(name, timeout_secs, on_timeout=suspend_session)

- Store PID in metadata: after subprocess is created/started, `session_manager.update_session(name, pid=subprocesses[name].pid)` (only after start()). Note: The existing code creates ClaudeSubprocess but does NOT call start() -- start happens lazily on first send_message. So PID tracking happens in handle_message when subprocess auto-starts.
Modify switch_session_cmd():
- Per CONTEXT.md LOCKED decision: switching sessions leaves previous subprocess running (it suspends on its own timer). Do NOT cancel old session's idle timer.
- When auto-spawning subprocess for new session, set up idle timer as above.

Modify archive_session_cmd():
- Cancel idle timer if exists: `if name in idle_timers: idle_timers[name].cancel(); del idle_timers[name]`
- Remove subprocess lock if exists: `subprocess_locks.pop(name, None)`

Modify model_cmd():
- After terminating subprocess for model change, cancel idle timer: `if active_session in idle_timers: idle_timers[active_session].cancel(); del idle_timers[active_session]`

Startup cleanup function: `async def cleanup_orphaned_subprocesses()`
- Called once at bot startup (before polling starts).
- Iterate all sessions via `session_manager.list_sessions()`.
- For each session with a non-None `pid`:
  - Check if PID process exists: `os.kill(pid, 0)` wrapped in try/except ProcessLookupError.
  - If process exists, verify it's a claude process: read `/proc/{pid}/cmdline`, check if "claude" is in it. If not claude, skip killing.
  - If it IS a claude process: `os.kill(pid, signal.SIGTERM)`, sleep 2s, then try `os.kill(pid, signal.SIGKILL)` (catch ProcessLookupError if already dead).
  - Update metadata: `session_manager.update_session(session['name'], pid=None, status='suspended')`
- For sessions with status != 'suspended' and no pid, also set status to 'suspended'.
- Log summary: "Cleaned up N orphaned subprocesses"
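
The PID checks in the cleanup steps can be sketched with a few small helpers. The function names here are hypothetical, the `/proc` read is Linux-only, and the blocking `time.sleep` stands in for the `asyncio.sleep` an async implementation would use.

```python
import os
import signal
import time


def pid_exists(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def cmdline_is_claude(raw: bytes) -> bool:
    """Check raw /proc/<pid>/cmdline bytes (NUL-separated argv) for "claude"."""
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return any("claude" in arg for arg in args)


def read_cmdline(pid: int) -> bytes:
    """Read /proc/<pid>/cmdline; return empty bytes if the process is gone."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            return fh.read()
    except OSError:
        return b""


def kill_if_claude(pid: int, grace_seconds: float = 2.0) -> bool:
    """SIGTERM, wait, then SIGKILL a claude process. Returns True if it was signalled."""
    if not pid_exists(pid) or not cmdline_is_claude(read_cmdline(pid)):
        return False
    try:
        os.kill(pid, signal.SIGTERM)
        time.sleep(grace_seconds)
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # already exited, nothing left to kill
    return True
```

One caveat with a plain substring match: any process whose arguments contain "claude" passes the check, including one started from a directory such as `.claude/`. Matching only on the executable name would be stricter, at the cost of assuming how the subprocess is launched.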

Graceful shutdown:
- python-telegram-bot's `Application.run_polling()` handles signal installation internally. Instead of overriding signal handlers (which conflicts with the library), use the `post_shutdown` callback:

        async def post_shutdown(application):
            """Clean up subprocesses and timers on bot shutdown."""
            logger.info("Bot shutting down, cleaning up...")
            # Cancel all idle timers
            for name, timer in idle_timers.items():
                timer.cancel()
            # Terminate all subprocesses
            for name, proc in list(subprocesses.items()):
                if proc.is_alive:
                    logger.info(f"Terminating subprocess for '{name}'")
                    await proc.terminate()
            logger.info("Cleanup complete")

- Register in main(): `app.post_shutdown = post_shutdown`
- Also add a `post_init` callback for startup cleanup:

        async def post_init(application):
            """Run startup cleanup."""
            await cleanup_orphaned_subprocesses()

  Register: `app = Application.builder().token(TOKEN).post_init(post_init).build()`

Update help text:
- Add `/timeout <minutes>` and `/sessions` to the help_command text under "Claude Sessions" section.

`python3 -c "import bot"` from telegram/ directory should not error (syntax check). Look for: idle_timers dict, subprocess_locks dict, suspend_session function, cleanup_orphaned_subprocesses function, post_shutdown callback.

- suspend_session() terminates subprocess on idle timeout, updates metadata to suspended, silent (no Telegram notification)
- handle_message() detects suspended session, sends "Resuming session..." status, spawns with --continue
- Race lock prevents concurrent suspend + resume on same session
- Startup cleanup kills orphaned PIDs verified via /proc/cmdline
- Graceful shutdown terminates all subprocesses and cancels all timers
- handle_photo/handle_document also support resume from suspended state

/timeout command: `async def timeout_cmd(update, context)`
- Auth check (same pattern as other commands).
- If no active session: reply "No active session. Use /new to start one."
- If no args: show current timeout.

        timeout_secs = session_manager.get_session_timeout(active_session)
        minutes = timeout_secs // 60
        await update.message.reply_text(f"Idle timeout: {minutes} minutes\n\nUsage: /timeout <minutes> (1-120)")

- If args: parse first arg as int.
  - Validate range 1-120. If out of range: `"Timeout must be between 1 and 120 minutes"`
  - If not a valid int: `"Invalid number. Usage: /timeout <minutes>"`
  - Convert to seconds: `timeout_seconds = minutes * 60`
  - Update session metadata: `session_manager.update_session(active_session, idle_timeout=timeout_seconds)`
  - If idle timer exists for this session, update its timeout_seconds attribute and reset: `idle_timers[active_session].timeout_seconds = timeout_seconds; idle_timers[active_session].reset()`
  - Reply: `f"Idle timeout set to {minutes} minutes for session '{active_session}'."`
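
The argument handling for /timeout is easy to isolate from the Telegram handler. The helper below is a sketch: `parse_timeout_arg` and the `(value, error)` return shape are hypothetical, while the range and the error strings come from the steps above.

```python
from typing import Optional, Tuple

MIN_MINUTES, MAX_MINUTES = 1, 120


def parse_timeout_arg(arg: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (timeout_seconds, None) on success or (None, error_text) on failure."""
    try:
        minutes = int(arg)
    except ValueError:
        return None, "Invalid number. Usage: /timeout <minutes>"
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        return None, "Timeout must be between 1 and 120 minutes"
    return minutes * 60, None
```

In the handler, the error text (if any) would be sent with `reply_text`, and the returned seconds would go to both `session_manager.update_session` and the live timer's `timeout_seconds`.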

/sessions command: `async def sessions_cmd(update, context)`
- Auth check.
- Get all sessions: `session_manager.list_sessions()` (already sorted by last_active desc).
- If empty: reply "No sessions. Use /new to create one."
- Build formatted list. For each session:
  - Status indicator: check the real subprocess state first: `name in subprocesses and subprocesses[name].is_alive` -> "LIVE". Otherwise fall back to metadata: status == "active" -> "ACTIVE", status == "suspended" -> "IDLE", else -> status
  - Format last_active as relative time (e.g., "2m ago", "1h ago", "3d ago") using a small helper function:

            def format_relative_time(iso_str):
                dt = datetime.fromisoformat(iso_str)
                delta = datetime.now(timezone.utc) - dt
                secs = delta.total_seconds()
                if secs < 60: return "just now"
                if secs < 3600: return f"{int(secs/60)}m ago"
                if secs < 86400: return f"{int(secs/3600)}h ago"
                return f"{int(secs/86400)}d ago"

  - Mark current active session with arrow prefix.
  - Format line: `"{marker}{status_emoji} {name} ({persona}) - {relative_time}"`
  - Status emojis: LIVE -> green circle, IDLE/suspended -> white circle
- Join lines, reply with parse_mode='Markdown'. Use backticks around session names for monospace.
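
A single line of the /sessions listing could be assembled as follows. This is a sketch: `format_session_line`, the explicit `now` parameter, and the `persona` and `last_active` dict keys are assumptions, and timestamps are assumed to be timezone-aware ISO strings.

```python
from datetime import datetime, timedelta, timezone


def format_relative_time(iso_str: str, now: datetime) -> str:
    """Same thresholds as the helper above, with `now` injected for testability."""
    secs = (now - datetime.fromisoformat(iso_str)).total_seconds()
    if secs < 60:
        return "just now"
    if secs < 3600:
        return f"{int(secs / 60)}m ago"
    if secs < 86400:
        return f"{int(secs / 3600)}h ago"
    return f"{int(secs / 86400)}d ago"


def format_session_line(session: dict, is_live: bool, is_current: bool,
                        now: datetime) -> str:
    """Build one listing line; is_live should reflect the real subprocess state."""
    marker = "→ " if is_current else "  "
    status_emoji = "🟢" if is_live else "⚪"
    relative = format_relative_time(session["last_active"], now)
    # Backticks render the session name in monospace under parse_mode='Markdown'.
    return f"{marker}{status_emoji} `{session['name']}` ({session['persona']}) - {relative}"
```

The handler would call this once per session, passing `name in subprocesses and subprocesses[name].is_alive` as `is_live`, and join the results with newlines before replying.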

Register handlers in main():
- `app.add_handler(CommandHandler("timeout", timeout_cmd))` -- after the model handler
- `app.add_handler(CommandHandler("sessions", sessions_cmd))` -- after the session handler

Update help text in help_command():
- Under "Claude Sessions" section, add:
  - `/sessions` - List all sessions with status
  - `/timeout <minutes>` - Set idle timeout (1-120)

`python3 -c "import bot; print('OK')"` succeeds. Grep for "timeout_cmd" and "sessions_cmd" in bot.py to confirm both exist. Grep for "CommandHandler.*timeout" and "CommandHandler.*sessions" to confirm registration.

- /timeout shows current timeout when called without args, sets timeout (1-120 min range) when called with arg
- /sessions lists all sessions sorted by last active, showing live/idle status, persona, relative time
- Both commands registered as handlers in main()
- Help text updated with new commands
<success_criteria>
- Session auto-suspends after idle timeout (subprocess terminated, metadata status=suspended, no Telegram notification)
- Message to suspended session shows "Resuming session..." then Claude responds with full history
- If resume fails, error message sent (no auto-fresh-start)
- asyncio.Lock prevents race between timeout-fire and incoming message
- Bot startup kills orphaned subprocess PIDs (verified via /proc/cmdline)
- Bot shutdown terminates all subprocesses gracefully
- /timeout sets per-session idle timeout (1-120 range), shows current value without args
- /sessions lists all sessions with LIVE/IDLE status, persona, and relative last-active time
- Help text includes new commands
- Bot service restarts cleanly </success_criteria>