nexus/.planning/phases/38-telegram-bridge/38-02-SUMMARY.md

phase: 38-telegram-bridge
plan: 02
subsystem: api
tags: telegram, grammy, voice, whisper, piper, ogg, ffmpeg, tts, stt

requires:
  • phase 38-01, provides: telegramService factory with text relay, session map, bot lifecycle
  • phase 36-voice-pipeline, provides: voicePipelineService (transcribe, synthesize, formatForVoice, transcodeToWav16k)

provides:
  • Voice message handler: OGG download via ctx.getFile(), transcription via voicePipelineService
  • Shared relayToAgent() function used by both text and voice message handlers
  • transcodeToOggOpus() helper: raw PCM s16le (Piper 22050Hz) -> OGG Opus 48000Hz for Telegram
  • TTS voice reply: agent responses synthesized to OGG voice note via ctx.replyWithVoice()
  • Graceful TTS degradation: text reply always sent first; voice is a bonus that silently fails

affects:
  • 38-03 (onboarding step unchanged; already uses POST /telegram/token)

tech-stack:
  added: (none)
  patterns:
  • Immediate 'Transcribing...' reply prevents Telegram update resend (Pitfall 1)
  • Fire-and-forget async: processVoiceMessage() not awaited inside handler body
  • Shared relayToAgent(ctx, chatId, userText, db, voiceMode) eliminates duplicate relay logic
  • TTS reply wrapped in try/catch so that voice failure never blocks the text response
  • transcodeToOggOpus uses same ffmpeg spawn pattern as voice-pipeline.ts

key-files:
  created: (none)
  modified:
  • server/src/services/telegram.ts

key-decisions:
  • Both tasks implemented together in one atomic file write: Task 1 (voice handler + relay refactor) and Task 2 (TTS reply) both modify telegram.ts, so they are committed as one coherent change
  • processVoiceMessage() extracted as top-level async function; keeps bot handler clean and makes error handling explicit
  • voiceMode flag passed to relayToAgent() rather than checking ctx type; simpler and avoids grammy type gymnastics
  • botToken stored as module-level mutable ref (botToken = token) in start(); processVoiceMessage needs token for CDN URL construction
  • Piper hardcoded to 22050Hz in transcodeToOggOpus with comment; matches en_US-lessac-medium model spec

metrics:
  duration: 10min
  completed: 2026-04-04
  tasks_completed: 2
  tasks_total: 2
  files_modified: 1

Phase 38 Plan 02: Telegram Voice Handling Summary

OGG download + Whisper transcription + Piper TTS reply wired into existing telegramService, with shared relayToAgent() function and graceful voice degradation

Performance

  • Duration: ~10 min
  • Completed: 2026-04-04
  • Tasks: 2 of 2
  • Files modified: 1

Accomplishments

  • Refactored text relay into shared relayToAgent(ctx, chatId, userText, db, voiceMode) — eliminates duplicate logic between text and voice handlers
  • Added bot.on("message:voice", ...) handler that sends an immediate "Transcribing..." reply (prevents Telegram from resending the update) and then processes the message asynchronously
  • processVoiceMessage(): downloads OGG from Telegram CDN via ctx.getFile() + fetch, transcribes via voicePipelineService().transcribe(oggBuffer, "ogg"), sends "Heard: ..." confirmation, relays to agent
  • transcodeToOggOpus(): uses ffmpeg-static spawn pattern to convert raw PCM s16le (Piper 22050Hz) to OGG Opus 48000Hz for Telegram voice notes
  • TTS voice reply: after the text reply, calls voiceSvc.formatForVoice() + synthesize() + transcodeToOggOpus() + ctx.replyWithVoice(new InputFile(...)); the whole sequence is wrapped in try/catch so Piper unavailability degrades silently
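
The "text first, voice as a bonus" ordering from the last bullet can be sketched as follows. This is a minimal illustration, not the code in telegram.ts: the ReplyTarget, VoiceSynth and Transcoder types and the replyWithOptionalVoice name are hypothetical stand-ins for grammY's context, voicePipelineService and transcodeToOggOpus().

```typescript
// Hypothetical minimal shapes; the real code uses grammY's Context and voicePipelineService.
interface ReplyTarget {
  reply(text: string): Promise<void>;
  replyWithVoice(ogg: Uint8Array): Promise<void>;
}

interface VoiceSynth {
  formatForVoice(text: string): string;
  synthesize(text: string): Promise<Uint8Array>; // raw PCM, as produced by Piper
}

type Transcoder = (pcm: Uint8Array) => Promise<Uint8Array>;

/** Sends the text reply first, then attempts a voice note. Resolves to true if the voice note was sent. */
async function replyWithOptionalVoice(
  ctx: ReplyTarget,
  answer: string,
  voice: VoiceSynth,
  transcodeToOggOpus: Transcoder,
  warn: (msg: string, err: unknown) => void = console.warn,
): Promise<boolean> {
  // The text reply sits outside the try block, so a TTS failure cannot affect it.
  await ctx.reply(answer);
  try {
    const spoken = voice.formatForVoice(answer);
    const pcm = await voice.synthesize(spoken);
    const ogg = await transcodeToOggOpus(pcm);
    await ctx.replyWithVoice(ogg);
    return true;
  } catch (err) {
    // TTS problems are reported as a warning and otherwise swallowed.
    warn("TTS voice reply skipped", err);
    return false;
  }
}
```

Because the text reply is awaited before the try block starts, a missing Piper binary or a failed transcode only costs the voice note, never the answer itself.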

Task Commits

  1. Task 1 + Task 2: Voice handler + TTS reply - e7205724 (feat) — both tasks in single atomic commit (same file)

Files Created/Modified

  • server/src/services/telegram.ts (322 lines, was 187) — voice handler, relayToAgent(), transcodeToOggOpus(), TTS reply

Decisions Made

  • botToken stored as module-level mutable ref alongside bot — processVoiceMessage() needs token string to construct the Telegram CDN download URL
  • voiceMode = false default parameter on relayToAgent() — text handler calls without flag, voice handler passes true
  • TTS failure is a warning (not an error) — voice reply is bonus feature, text always delivered first
  • transcodeToOggOpus hardcodes 22050Hz input rate with explanatory comment — matches Piper en_US-lessac-medium output spec
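
To make the 22050Hz decision concrete, the ffmpeg argument list for such a conversion might look like the sketch below. The function and constant names are hypothetical, and the mono channel count is an assumption about Piper's output; only the rates and formats come from this summary.

```typescript
// Assumption: Piper's en_US-lessac-medium output is mono; pass a different `channels` value otherwise.
const PIPER_SAMPLE_RATE_HZ = 22050; // hardcoded to match the model's output rate
const TELEGRAM_OPUS_RATE_HZ = 48000;

/** Builds ffmpeg arguments that read raw PCM s16le from stdin and write OGG Opus to stdout. */
function buildOggOpusArgs(
  inputRateHz: number = PIPER_SAMPLE_RATE_HZ,
  channels: number = 1,
): string[] {
  return [
    "-hide_banner",
    "-loglevel", "error",
    // Raw PCM has no header, so format, rate and channel count must be declared before -i.
    "-f", "s16le",
    "-ar", String(inputRateHz),
    "-ac", String(channels),
    "-i", "pipe:0",
    // Encode with libopus, resample to 48 kHz, and wrap the stream in an OGG container.
    "-c:a", "libopus",
    "-ar", String(TELEGRAM_OPUS_RATE_HZ),
    "-f", "ogg",
    "pipe:1",
  ];
}
```

In a spawn-based helper, this array would be handed to the ffmpeg binary with the PCM buffer written to stdin and the OGG bytes collected from stdout. If a different Piper model with another sample rate is used, the input rate has to change with it, since headerless PCM played at the wrong rate comes out pitch-shifted.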

Deviations from Plan

Minor adjustments

1. [Rule 1 - Structural] Tasks 1 and 2 implemented and committed together

  • Found during: Task 1 planning
  • Issue: Both tasks modify the same file; splitting them into two commits would require an intermediate state where the voice handler exists without TTS, which is not a meaningful checkpoint
  • Fix: Single commit covers both tasks; commit message documents both additions
  • Files modified: server/src/services/telegram.ts
  • Commit: e7205724

Known Stubs

None — voice relay is fully wired:

  • OGG download: real Telegram CDN fetch via ctx.getFile()
  • Transcription: real voicePipelineService().transcribe() (Whisper)
  • TTS synthesis: real voicePipelineService().synthesize() (Piper)
  • Voice reply: real ctx.replyWithVoice(new InputFile(oggBuffer))
  • Text relay: real puterProxyService().chatStream() (same as Plan 01)

The only runtime dependency is Whisper/Piper availability — both degrade gracefully with informative error messages.
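
For the OGG download step listed above, the reason the bot token has to be available outside the bot instance is that Telegram's file download URL embeds it. A minimal sketch of the URL construction, with a hypothetical helper name, could be:

```typescript
/**
 * Builds the Telegram file download URL from the bot token and the
 * file_path returned by getFile. The helper name is illustrative only.
 */
function telegramFileUrl(botToken: string, filePath: string | undefined): string {
  // file_path is optional in the Bot API's File object, so guard against its absence.
  if (!filePath) {
    throw new Error("Telegram did not return a file_path for this file");
  }
  return `https://api.telegram.org/file/bot${botToken}/${filePath}`;
}
```

The resulting URL can be passed to fetch and the response read as an array buffer before transcription. Since the URL contains the token, it should be kept out of logs and error messages.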

Self-Check: PASSED

  • File exists: server/src/services/telegram.ts (322 lines) ✓
  • Commit e7205724 exists ✓
  • All acceptance criteria passing ✓