Phase 03: Lifecycle Management - 2 plans in 2 waves - Plan 01 (wave 1): Idle timer module + session metadata + PID tracking - Plan 02 (wave 2): Suspend/resume wiring, /timeout, /sessions, startup cleanup, graceful shutdown - Ready for execution Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
phase: 03-lifecycle-management
plan: 02
type: execute
wave: 2
depends_on: ["03-01"]
files_modified:
  - telegram/bot.py
autonomous: true

must_haves:
  truths:
    - "Session suspends automatically after idle timeout (subprocess terminated, status set to suspended)"
    - "User message to suspended session resumes it with --continue and shows 'Resuming session...' status"
    - "Resume failure sends error to user and does not auto-create fresh session"
    - "Race between timeout-fire and user-message is prevented by asyncio.Lock"
    - "Bot startup kills orphaned subprocess PIDs and sets all sessions to suspended"
    - "Bot shutdown terminates all subprocesses gracefully (SIGTERM + 5s timeout + SIGKILL)"
    - "/timeout <minutes> sets per-session idle timeout (1-120 range)"
    - "/sessions lists all sessions with status indicator, persona, and last active time"
  artifacts:
    - path: "telegram/bot.py"
      provides: "Suspend/resume wiring, idle timers, /timeout, /sessions, startup cleanup, graceful shutdown"
      contains: "idle_timers"
  key_links:
    - from: "telegram/bot.py"
      to: "telegram/idle_timer.py"
      via: "import and instantiate SessionIdleTimer per session"
      pattern: "from idle_timer import SessionIdleTimer"
    - from: "telegram/bot.py on_complete callback"
      to: "idle_timer.reset()"
      via: "Timer starts after Claude finishes processing"
      pattern: "idle_timers.*reset"
    - from: "telegram/bot.py handle_message"
      to: "resume logic"
      via: "Detect suspended session, spawn with --continue, send status"
      pattern: "Resuming session"
    - from: "telegram/bot.py suspend_session"
      to: "ClaudeSubprocess.terminate()"
      via: "Idle timer fires, terminates subprocess"
      pattern: "await.*terminate"
---

<objective>
Wire suspend/resume lifecycle, idle timers, new commands, and cleanup into the bot.

Purpose: This is the core integration plan that makes sessions automatically suspend after idle timeout, resume transparently on user message, and provides /timeout + /sessions commands. Also adds startup orphan cleanup and graceful shutdown signal handling.
Output: Updated `bot.py` with full lifecycle management
</objective>

<execution_context>
@/home/mikkel/.claude/get-shit-done/workflows/execute-plan.md
@/home/mikkel/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/03-lifecycle-management/03-CONTEXT.md
@.planning/phases/03-lifecycle-management/03-RESEARCH.md
@.planning/phases/03-lifecycle-management/03-01-SUMMARY.md
@telegram/bot.py
@telegram/idle_timer.py
@telegram/session_manager.py
@telegram/claude_subprocess.py
</context>

<tasks>

<task type="auto">
<name>Task 1: Suspend/resume wiring with race locks, startup cleanup, and graceful shutdown</name>
<files>telegram/bot.py</files>
<action>
This is the core lifecycle wiring in bot.py. Make these changes:

**New imports and globals:**
- `import signal, os` (for shutdown handlers and PID checks)
- `from idle_timer import SessionIdleTimer`
- Add global dict: `idle_timers: dict[str, SessionIdleTimer] = {}`
- Add global dict: `subprocess_locks: dict[str, asyncio.Lock] = {}` (one lock per session, prevents races between timeout-fire and user-message)

**Helper: get_subprocess_lock(session_name)**
- Returns existing lock or creates new one for session. Pattern: `subprocess_locks.setdefault(session_name, asyncio.Lock())`

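A minimal sketch of this helper, assuming the global `subprocess_locks` dict described above. It looks up the entry before creating one, because the `setdefault(..., asyncio.Lock())` form builds a throwaway lock on every call; both forms hand back the same lock for a given session.

```python
import asyncio

# Module-level registry as described in the plan: one lock per session name.
subprocess_locks: dict = {}


def get_subprocess_lock(session_name: str) -> asyncio.Lock:
    """Return the lock for a session, creating it on first use."""
    lock = subprocess_locks.get(session_name)
    if lock is None:
        lock = asyncio.Lock()
        subprocess_locks[session_name] = lock
    return lock
```

Callers would then use `async with get_subprocess_lock(name):` around any code that starts or stops that session's subprocess.
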
**Suspend function: `async def suspend_session(session_name: str)`**
- This is the idle timer's on_timeout callback.
- Acquire the session's subprocess lock.
- Check if subprocess exists and is_alive. If not alive, just update metadata and return.
- Check `subprocesses[session_name].is_busy` -- if busy, DON'T suspend (Claude is mid-processing). Instead, reset the idle timer to try again later. Log this. Return.
- Store the subprocess PID for logging.
- Call `await subprocesses[session_name].terminate()` (existing method with SIGTERM + timeout + SIGKILL).
- Remove from `subprocesses` dict.
- Flush and remove batcher if exists: `if session_name in batchers: await batchers[session_name].flush_immediately(); del batchers[session_name]`
- Update session metadata: `session_manager.update_session(session_name, status='suspended', pid=None)`
- Cancel and remove idle timer: `if session_name in idle_timers: idle_timers[session_name].cancel(); del idle_timers[session_name]`
- Log: `logger.info(f"Session '{session_name}' suspended after idle timeout")`
- DECISION (from CONTEXT.md): Silent suspension -- do NOT send any Telegram message.

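The control flow of the steps above could look roughly like the sketch below. `FakeSubprocess`, `FakeTimer`, and the `metadata` dict are hypothetical stand-ins for `ClaudeSubprocess`, `SessionIdleTimer`, and `session_manager.update_session`, used only so the example runs on its own; the batcher flush is omitted for brevity.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeSubprocess:
    """Hypothetical stand-in for ClaudeSubprocess (only the members used here)."""

    def __init__(self, pid: int, busy: bool = False):
        self.pid, self.is_busy, self.is_alive = pid, busy, True

    async def terminate(self) -> None:
        self.is_alive = False


class FakeTimer:
    """Hypothetical stand-in for SessionIdleTimer."""

    def __init__(self):
        self.resets, self.cancelled = 0, False

    def reset(self) -> None:
        self.resets += 1

    def cancel(self) -> None:
        self.cancelled = True


subprocesses: dict = {}
idle_timers: dict = {}
subprocess_locks: dict = {}
metadata: dict = {}  # stands in for session_manager.update_session(...)


async def suspend_session(session_name: str) -> None:
    lock = subprocess_locks.setdefault(session_name, asyncio.Lock())
    async with lock:
        proc = subprocesses.get(session_name)
        if proc is None or not proc.is_alive:
            # Nothing to terminate: just record the suspended state.
            metadata[session_name] = {"status": "suspended", "pid": None}
            return
        if proc.is_busy:
            # Claude is mid-processing: try again after another idle period.
            logger.info("Session '%s' busy, deferring suspend", session_name)
            if session_name in idle_timers:
                idle_timers[session_name].reset()
            return
        pid = proc.pid
        await proc.terminate()
        del subprocesses[session_name]
        metadata[session_name] = {"status": "suspended", "pid": None}
        timer = idle_timers.pop(session_name, None)
        if timer is not None:
            timer.cancel()
        logger.info("Session '%s' (pid %s) suspended after idle timeout",
                    session_name, pid)
```

Because the whole body runs under the per-session lock, a resume in `handle_message()` that takes the same lock cannot interleave with a half-finished suspend.
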
**Modify make_callbacks() -- add on_complete idle timer integration:**
- The `on_complete` callback already exists. Wrap it: after existing logic (stop typing), add idle timer reset:
  ```python
  # Reset idle timer (only start counting AFTER Claude finishes)
  if session_name in idle_timers:
      idle_timers[session_name].reset()
  ```
- This ensures the timer only starts when Claude is truly idle, never during processing.

**Modify handle_message() -- add resume logic:**
- After checking for active session, BEFORE the subprocess check, add:
  ```python
  # Acquire lock to prevent race with suspend_session
  lock = get_subprocess_lock(active_session)
  async with lock:
      ...  # subprocess get-or-create and message send go here
  ```
  Wrap the subprocess get-or-create and message send in this lock.
- Inside the lock, when subprocess is not alive:
  1. Check if session has `.claude/` dir (has history). If yes, this is a resume.
  2. If resuming: send status message to user: `"Resuming session..."` (include idle duration if >1 min from metadata last_active). Example: `"Resuming session (idle for 15 min)..."`
  3. Spawn subprocess normally (the existing ClaudeSubprocess constructor + start() already handles --continue when .claude/ exists).
  4. Store PID in metadata: `session_manager.update_session(active_session, status='active', last_active=now_iso, pid=subprocesses[active_session].pid)`
- After sending message (outside lock), create the idle timer for the session if it does not exist yet:
  ```python
  timeout_secs = session_manager.get_session_timeout(active_session)
  if active_session not in idle_timers:
      idle_timers[active_session] = SessionIdleTimer(active_session, timeout_secs, on_timeout=suspend_session)
  # Don't reset here -- timer resets in on_complete when Claude finishes
  ```
- IMPORTANT: Also reset the idle timer when user sends a message (user activity should reset timer too, per CONTEXT.md):
  ```python
  if active_session in idle_timers:
      idle_timers[active_session].reset()
  ```
  Put this BEFORE sending to subprocess (so timer is reset even if message queues).

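The resume status message from step 2 could be built with a small helper like the one below. This is a sketch: the name `build_resume_message` is hypothetical, and it assumes `last_active` is stored as an ISO 8601 string, with naive timestamps treated as UTC.

```python
from datetime import datetime, timezone
from typing import Optional


def build_resume_message(last_active_iso: str, now: Optional[datetime] = None) -> str:
    """Return the resume status text, adding the idle duration when it exceeds 1 minute."""
    now = now or datetime.now(timezone.utc)
    last_active = datetime.fromisoformat(last_active_iso)
    if last_active.tzinfo is None:
        # Assumption: naive timestamps in metadata were written in UTC.
        last_active = last_active.replace(tzinfo=timezone.utc)
    idle_secs = (now - last_active).total_seconds()
    if idle_secs > 60:
        return f"Resuming session (idle for {int(idle_secs // 60)} min)..."
    return "Resuming session..."
```

Passing `now` explicitly keeps the helper easy to test; the handler itself would call it with only the stored timestamp.
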
**Similarly update handle_photo() and handle_document():**
- Add the same lock acquisition, resume detection, and idle timer reset as handle_message().
- Keep the existing photo/document save and notification logic.

**Modify new_session() -- initialize idle timer after creation:**
- After subprocess creation, add:
  ```python
  timeout_secs = session_manager.get_session_timeout(name)
  idle_timers[name] = SessionIdleTimer(name, timeout_secs, on_timeout=suspend_session)
  ```
- Store PID in metadata: after subprocess is created/started, `session_manager.update_session(name, pid=subprocesses[name].pid)` (only after start()).
  Note: The existing code creates ClaudeSubprocess but does NOT call start() -- start happens lazily on first send_message. So PID tracking happens in handle_message when subprocess auto-starts.

**Modify switch_session_cmd():**
- Per CONTEXT.md LOCKED decision: switching sessions leaves previous subprocess running (it suspends on its own timer). Do NOT cancel old session's idle timer.
- When auto-spawning subprocess for new session, set up idle timer as above.

**Modify archive_session_cmd():**
- Cancel idle timer if exists: `if name in idle_timers: idle_timers[name].cancel(); del idle_timers[name]`
- Remove subprocess lock if exists: `subprocess_locks.pop(name, None)`

**Modify model_cmd():**
- After terminating subprocess for model change, cancel idle timer: `if active_session in idle_timers: idle_timers[active_session].cancel(); del idle_timers[active_session]`

**Startup cleanup function: `async def cleanup_orphaned_subprocesses()`**
- Called once at bot startup (before polling starts).
- Iterate all sessions via `session_manager.list_sessions()`.
- For each session with a non-None `pid`:
  1. Check if PID process exists: `os.kill(pid, 0)` wrapped in try/except ProcessLookupError.
  2. If process exists, verify it's a claude process: read `/proc/{pid}/cmdline`, check if "claude" is in it. If not claude, skip killing.
  3. If it IS a claude process: `os.kill(pid, signal.SIGTERM)`, wait 2s with `await asyncio.sleep(2)` (not a blocking `time.sleep`, since this runs on the event loop), then try `os.kill(pid, signal.SIGKILL)` (catch ProcessLookupError if already dead).
  4. Update metadata: `session_manager.update_session(session['name'], pid=None, status='suspended')`
- For sessions with status != 'suspended' and no pid, also set status to 'suspended'.
- Log summary: "Cleaned up N orphaned subprocesses"

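The existence and identity checks in steps 1 and 2 could be factored into small helpers, sketched below. The function names are hypothetical; the sketch assumes a Linux host where `/proc/<pid>/cmdline` holds the NUL-separated argument list, and it treats `PermissionError` from `os.kill(pid, 0)` as "process exists but belongs to another user".

```python
import os


def pid_exists(pid: int) -> bool:
    """Check whether a process with this PID exists, without signalling it."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def cmdline_is_claude(raw: bytes) -> bool:
    """Return True if any NUL-separated argument contains 'claude'."""
    args = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
    return any("claude" in arg for arg in args)


def is_claude_process(pid: int) -> bool:
    """Read /proc/<pid>/cmdline and test it; False if the process is gone or unreadable."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            return cmdline_is_claude(fh.read())
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return False
```

Checking the command line before sending SIGTERM guards against PID reuse: a stale PID in metadata may now belong to an unrelated process.
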
**Graceful shutdown:**
- python-telegram-bot's `Application.run_polling()` handles signal installation internally. Instead of overriding signal handlers (which conflicts with the library), use the `post_shutdown` callback:
  ```python
  async def post_shutdown(application):
      """Clean up subprocesses and timers on bot shutdown."""
      logger.info("Bot shutting down, cleaning up...")

      # Cancel all idle timers
      for name, timer in idle_timers.items():
          timer.cancel()

      # Terminate all subprocesses
      for name, proc in list(subprocesses.items()):
          if proc.is_alive:
              logger.info(f"Terminating subprocess for '{name}'")
              await proc.terminate()

      logger.info("Cleanup complete")
  ```
- Register in main(): `app.post_shutdown = post_shutdown`
- Also add a `post_init` callback for startup cleanup:
  ```python
  async def post_init(application):
      """Run startup cleanup."""
      await cleanup_orphaned_subprocesses()
  ```
  Register: `app = Application.builder().token(TOKEN).post_init(post_init).build()`

**Update help text:**
- Add `/timeout <minutes>` and `/sessions` to the help_command text under "Claude Sessions" section.
</action>
<verify>
`python3 -c "import bot"` from telegram/ directory should not error (syntax check). Look for: idle_timers dict, subprocess_locks dict, suspend_session function, cleanup_orphaned_subprocesses function, post_shutdown callback.
</verify>
<done>
- suspend_session() terminates subprocess on idle timeout, updates metadata to suspended, silent (no Telegram notification)
- handle_message() detects suspended session, sends "Resuming session..." status, spawns with --continue
- Race lock prevents concurrent suspend + resume on same session
- Startup cleanup kills orphaned PIDs verified via /proc/cmdline
- Graceful shutdown terminates all subprocesses and cancels all timers
- handle_photo/handle_document also support resume from suspended state
</done>
</task>

<task type="auto">
<name>Task 2: /timeout and /sessions commands</name>
<files>telegram/bot.py</files>
<action>
Add two new command handlers to bot.py:

**/timeout command: `async def timeout_cmd(update, context)`**
- Auth check (same pattern as other commands).
- If no active session: reply "No active session. Use /new <name> to start one."
- If no args: show current timeout.
  ```python
  timeout_secs = session_manager.get_session_timeout(active_session)
  minutes = timeout_secs // 60
  await update.message.reply_text(f"Idle timeout: {minutes} minutes\n\nUsage: /timeout <minutes> (1-120)")
  ```
- If args: parse first arg as int.
- Validate range 1-120. If out of range: `"Timeout must be between 1 and 120 minutes"`
- If not a valid int: `"Invalid number. Usage: /timeout <minutes>"`
- Convert to seconds: `timeout_seconds = minutes * 60`
- Update session metadata: `session_manager.update_session(active_session, idle_timeout=timeout_seconds)`
- If idle timer exists for this session, update its timeout_seconds attribute and reset: `idle_timers[active_session].timeout_seconds = timeout_seconds; idle_timers[active_session].reset()`
- Reply: `f"Idle timeout set to {minutes} minutes for session '{active_session}'."`

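The argument parsing and range validation above can be isolated from the Telegram handler, as in this sketch. The helper name `parse_timeout_arg` and its `(seconds, error)` return shape are assumptions for illustration; the error strings are the ones specified in the steps.

```python
from typing import List, Optional, Tuple

MIN_MINUTES, MAX_MINUTES = 1, 120


def parse_timeout_arg(args: List[str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (timeout_seconds, error_message); exactly one of the two is None."""
    try:
        minutes = int(args[0])
    except (ValueError, IndexError):
        return None, "Invalid number. Usage: /timeout <minutes>"
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        return None, "Timeout must be between 1 and 120 minutes"
    return minutes * 60, None
```

In `timeout_cmd`, the handler would reply with the error string when it is set, and otherwise persist the returned seconds and reset the timer.
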
**/sessions command: `async def sessions_cmd(update, context)`**
- Auth check.
- Get all sessions: `session_manager.list_sessions()` (already sorted by last_active desc).
- If empty: reply "No sessions. Use /new <name> to create one."
- Build formatted list. For each session:
  - Status indicator, checking the real subprocess state first: `name in subprocesses and subprocesses[name].is_alive` -> "LIVE". Otherwise fall back to metadata: status == "active" -> "ACTIVE", status == "suspended" -> "IDLE", else -> status
  - Format last_active as relative time (e.g., "2m ago", "1h ago", "3d ago") using a small helper function:
    ```python
    from datetime import datetime, timezone

    def format_relative_time(iso_str):
        dt = datetime.fromisoformat(iso_str)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assume naive timestamps are UTC
        delta = datetime.now(timezone.utc) - dt
        secs = delta.total_seconds()
        if secs < 60:
            return "just now"
        if secs < 3600:
            return f"{int(secs / 60)}m ago"
        if secs < 86400:
            return f"{int(secs / 3600)}h ago"
        return f"{int(secs / 86400)}d ago"
    ```
  - Mark current active session with arrow prefix.
  - Format line: `"{marker}{status_emoji} {name} ({persona}) - {relative_time}"`
  - Status emojis: LIVE -> green circle, IDLE/suspended -> white circle
- Join lines, reply with parse_mode='Markdown'. Use backticks around session names for monospace.

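One way to assemble a single list line from the rules above is sketched here. The helper name, the exact arrow character, the two-space filler for non-current sessions, and the white-circle fallback for any other status are all assumptions; the plan only fixes the line format and the LIVE/IDLE emoji mapping.

```python
STATUS_EMOJI = {
    "LIVE": "\U0001F7E2",  # green circle
    "IDLE": "\u26AA",      # white circle
}


def format_session_line(name: str, persona: str, relative_time: str, *,
                        is_live: bool, meta_status: str, is_current: bool) -> str:
    """Build one /sessions line: marker, status emoji, name, persona, age."""
    if is_live:
        label = "LIVE"  # real subprocess state wins over metadata
    elif meta_status == "suspended":
        label = "IDLE"
    else:
        label = meta_status
    # Assumption: statuses without a defined emoji fall back to the white circle.
    emoji = STATUS_EMOJI.get(label, "\u26AA")
    # Assumption: a right arrow marks the current session, two spaces otherwise.
    marker = "\u2192 " if is_current else "  "
    return f"{marker}{emoji} `{name}` ({persona}) - {relative_time}"
```

The handler would call this once per session, passing the result of `format_relative_time(last_active)` as `relative_time`, and join the lines with newlines before replying.
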
**Register handlers in main():**
- `app.add_handler(CommandHandler("timeout", timeout_cmd))` -- after the model handler
- `app.add_handler(CommandHandler("sessions", sessions_cmd))` -- after the session handler

**Update help text in help_command():**
- Under "Claude Sessions" section, add:
  - `/sessions` - List all sessions with status
  - `/timeout <minutes>` - Set idle timeout (1-120)
</action>
<verify>
`python3 -c "import bot; print('OK')"` succeeds. Grep for "timeout_cmd" and "sessions_cmd" in bot.py to confirm both exist. Grep for "CommandHandler.*timeout" and "CommandHandler.*sessions" to confirm registration.
</verify>
<done>
- /timeout shows current timeout when called without args, sets timeout (1-120 min range) when called with arg
- /sessions lists all sessions sorted by last active, showing live/idle status, persona, relative time
- Both commands registered as handlers in main()
- Help text updated with new commands
</done>
</task>

</tasks>

<verification>
1. `cd ~/homelab/telegram && python3 -c "import bot; print('All OK')"` -- no import errors
2. Grep for key integration points:
   - `grep -n "suspend_session" telegram/bot.py` -- suspend function exists
   - `grep -n "idle_timers" telegram/bot.py` -- idle timer dict used
   - `grep -n "subprocess_locks" telegram/bot.py` -- race locks exist
   - `grep -n "cleanup_orphaned" telegram/bot.py` -- startup cleanup exists
   - `grep -n "post_shutdown" telegram/bot.py` -- graceful shutdown exists
   - `grep -n "Resuming session" telegram/bot.py` -- resume status message exists
   - `grep -n "timeout_cmd\|sessions_cmd" telegram/bot.py` -- new commands exist
3. Restart bot service: `systemctl --user restart telegram-bot.service && sleep 2 && systemctl --user status telegram-bot.service` -- should show active
</verification>

<success_criteria>
- Session auto-suspends after idle timeout (subprocess terminated, metadata status=suspended, no Telegram notification)
- Message to suspended session shows "Resuming session..." then Claude responds with full history
- If resume fails, error message sent (no auto-fresh-start)
- asyncio.Lock prevents race between timeout-fire and incoming message
- Bot startup kills orphaned subprocess PIDs (verified via /proc/cmdline)
- Bot shutdown terminates all subprocesses gracefully
- /timeout <minutes> sets per-session idle timeout (1-120 range), shows current value without args
- /sessions lists all sessions with LIVE/IDLE status, persona, and relative last-active time
- Help text includes new commands
- Bot service restarts cleanly
</success_criteria>

<output>
After completion, create `.planning/phases/03-lifecycle-management/03-02-SUMMARY.md`
</output>