docs(38-02): complete Telegram voice handling plan — OGG download + Whisper STT + Piper TTS reply

2026-04-04 03:19:00 +00:00 · 2026-04-04 03:19:00 +00:00 · 262af05116
commit 262af05116
parent 6435ccb1c4
4 changed files with 127 additions and 14 deletions
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -29,8 +29,8 @@

 - [x] **TGRAM-01**: Single Telegram bot relays text messages bidirectionally between user and agents
 - [x] **TGRAM-02**: Agent replies in Telegram are prefixed with agent identity (e.g. `[PM]`, `[Engineer]`)
-- [ ] **TGRAM-03**: Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text
-- [ ] **TGRAM-04**: Agent responses can be sent back as Telegram voice notes (TTS → OGG)
+- [x] **TGRAM-03**: Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text
+- [x] **TGRAM-04**: Agent responses can be sent back as Telegram voice notes (TTS → OGG)
 - [x] **TGRAM-05**: Telegram bridge uses long polling (no public HTTPS required)
 - [x] **TGRAM-06**: Telegram bridge is under 500 lines of code

@@ -88,8 +88,8 @@
 | WCHAT-06 | Phase 37 | Complete |
 | TGRAM-01 | Phase 38 | Complete |
 | TGRAM-02 | Phase 38 | Complete |
-| TGRAM-03 | Phase 38 | Pending |
-| TGRAM-04 | Phase 38 | Pending |
+| TGRAM-03 | Phase 38 | Complete |
+| TGRAM-04 | Phase 38 | Complete |
 | TGRAM-05 | Phase 38 | Complete |
 | TGRAM-06 | Phase 38 | Complete |
 | ONBRD-01 | Phase 39 | Pending |
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -108,7 +108,7 @@ Plans:

 - [x] **Phase 36: Voice Pipeline Foundation** — Transport-agnostic VoicePipelineService (transcribe, synthesize, formatForVoice), voice.ts route, ffmpeg audio transcoding, voiceMode flag, dual output pattern (completed 2026-04-04)
 - [x] **Phase 37: Web Chat Voice UI** — VAD silence detection, waveform visualization, voice mode toggle, inline audio player, auto-play toggle, COOP/COEP headers (completed 2026-04-04)
-- [ ] **Phase 38: Telegram Bridge** — grammY long polling relay, text + voice note bidirectional relay, agent identity prefix, BotFather onboarding setup
+- [x] **Phase 38: Telegram Bridge** — grammY long polling relay, text + voice note bidirectional relay, agent identity prefix, BotFather onboarding setup (completed 2026-04-04)
 - [ ] **Phase 39: Voice Polish** — Sentence-buffered TTS streaming, multi-language TTS output, onboarding STT/TTS hardware detection step

 ## Phase Details
@@ -223,5 +223,5 @@ All 23 v1.6 requirements are mapped to exactly one phase. No orphans.
 | 35. npx buildthis CLI | v1.5 | 1/1 | Complete | 2026-04-03 |
 | 36. Voice Pipeline Foundation | v1.6 | 2/3 | Complete    | 2026-04-04 |
 | 37. Web Chat Voice UI | v1.6 | 3/4 | Complete    | 2026-04-04 |
-| 38. Telegram Bridge | v1.6 | 2/3 | In Progress|  |
+| 38. Telegram Bridge | v1.6 | 3/3 | Complete   | 2026-04-04 |
 | 39. Voice Polish | v1.6 | 0/TBD | Not started | - |
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -2,15 +2,15 @@
 gsd_state_version: 1.0
 milestone: v1.6
 milestone_name: Voice Pipeline + Minimal Message Bridge
-status: executing
-stopped_at: Completed 38-01-PLAN.md — Telegram Bridge Core (telegramService + telegramRoutes)
-last_updated: "2026-04-04T03:15:16.993Z"
+status: verifying
+stopped_at: Completed 38-02-PLAN.md — Telegram voice handling + TTS reply
+last_updated: "2026-04-04T03:18:52.493Z"
 last_activity: 2026-04-04
 progress:
  total_phases: 4
-  completed_phases: 2
+  completed_phases: 3
  total_plans: 10
-  completed_plans: 9
+  completed_plans: 10
  percent: 0
 ---

@@ -27,7 +27,7 @@ See: .planning/PROJECT.md (updated 2026-04-03)

 Phase: 38 (telegram-bridge) — EXECUTING
 Plan: 3 of 3
-Status: Ready to execute
+Status: Phase complete — ready for verification
 Last activity: 2026-04-04

 Progress: [░░░░░░░░░░] 0%
@@ -68,6 +68,7 @@ Key constraints for v1.6:
 - [Phase 38-telegram-bridge]: TelegramStep uses onNext/onBack props; Continue disabled until token validated; Skip always available
 - [Phase 38-telegram-bridge]: telegramRoutes accepts service instance as second param — enables restart from token route
 - [Phase 38-telegram-bridge]: Long-polling: deleteWebhook first, then bot.start() fire-and-forget with catch logger
+- [Phase 38-telegram-bridge]: processVoiceMessage() extracted as top-level async function — keeps bot handler clean; botToken stored as module-level mutable ref for CDN URL construction

 ### Pending Todos

@@ -81,6 +82,6 @@ None yet.

 ## Session Continuity

-Last session: 2026-04-04T03:15:16.990Z
-Stopped at: Completed 38-01-PLAN.md — Telegram Bridge Core (telegramService + telegramRoutes)
+Last session: 2026-04-04T03:18:52.490Z
+Stopped at: Completed 38-02-PLAN.md — Telegram voice handling + TTS reply
 Resume file: None
--- a/.planning/phases/38-telegram-bridge/38-02-SUMMARY.md
+++ b/.planning/phases/38-telegram-bridge/38-02-SUMMARY.md
@@ -0,0 +1,112 @@
+---
+phase: 38-telegram-bridge
+plan: "02"
+subsystem: api
+tags: [telegram, grammy, voice, whisper, piper, ogg, ffmpeg, tts, stt]
+
+requires:
+  - phase: 38-01
+    provides: telegramService factory with text relay, session map, bot lifecycle
+  - phase: 36-voice-pipeline
+    provides: voicePipelineService (transcribe, synthesize, formatForVoice, transcodeToWav16k)
+
+provides:
+  - Voice message handler: OGG download via ctx.getFile(), transcription via voicePipelineService
+  - Shared relayToAgent() function used by both text and voice message handlers
+  - transcodeToOggOpus() helper: raw PCM s16le (Piper 22050Hz) -> OGG Opus 48000Hz for Telegram
+  - TTS voice reply: agent responses synthesized to OGG voice note via ctx.replyWithVoice()
+  - Graceful TTS degradation: text reply always sent first; voice is a bonus that silently fails
+
+affects:
+  - 38-03 (onboarding step unchanged — already uses POST /telegram/token)
+
+tech-stack:
+  added: []
+  patterns:
+    - "Immediate 'Transcribing...' reply prevents Telegram update resend (Pitfall 1)"
+    - "Fire-and-forget async: processVoiceMessage() not awaited inside handler body"
+    - "Shared relayToAgent(ctx, chatId, userText, db, voiceMode) eliminates duplicate relay logic"
+    - "TTS reply wrapped in try/catch — voice failure never blocks text response"
+    - "transcodeToOggOpus uses same ffmpeg spawn pattern as voice-pipeline.ts"
+
+key-files:
+  created: []
+  modified:
+    - server/src/services/telegram.ts
+
+key-decisions:
+  - "Both tasks implemented together in one atomic file write — Task 1 (voice handler + relay refactor) and Task 2 (TTS reply) both modify telegram.ts; committing as one coherent change"
+  - "processVoiceMessage() extracted as top-level async function — keeps bot handler clean and makes error handling explicit"
+  - "voiceMode flag passed to relayToAgent() rather than checking ctx type — simpler and avoids grammy type gymnastics"
+  - "botToken stored as module-level mutable ref (botToken = token) in start() — processVoiceMessage needs token for CDN URL construction"
+  - "Piper hardcoded to 22050Hz in transcodeToOggOpus with comment — matches en_US-lessac-medium model spec"
+
+metrics:
+  duration: 10min
+  completed: 2026-04-04
+  tasks_completed: 2
+  tasks_total: 2
+  files_modified: 1
+---
+
+# Phase 38 Plan 02: Telegram Voice Handling Summary
+
+**OGG download + Whisper transcription + Piper TTS reply wired into existing telegramService, with shared relayToAgent() function and graceful voice degradation**
+
+## Performance
+
+- **Duration:** ~10 min
+- **Completed:** 2026-04-04
+- **Tasks:** 2 of 2
+- **Files modified:** 1
+
+## Accomplishments
+
+- Refactored text relay into shared `relayToAgent(ctx, chatId, userText, db, voiceMode)` — eliminates duplicate logic between text and voice handlers
+- Added `bot.on("message:voice", ...)` handler that sends immediate "Transcribing..." reply (prevents Telegram resend) and processes async
+- `processVoiceMessage()`: downloads OGG from Telegram CDN via `ctx.getFile()` + fetch, transcribes via `voicePipelineService().transcribe(oggBuffer, "ogg")`, sends "Heard: ..." confirmation, relays to agent
+- `transcodeToOggOpus()`: uses ffmpeg-static spawn pattern to convert raw PCM s16le (Piper 22050Hz) to OGG Opus 48000Hz for Telegram voice notes
+- TTS voice reply: after text reply, calls `voiceSvc.formatForVoice()` + `synthesize()` + `transcodeToOggOpus()` + `ctx.replyWithVoice(new InputFile(...))`, wrapped in try/catch so Piper unavailability degrades silently
+
+## Task Commits
+
+1. **Task 1 + Task 2: Voice handler + TTS reply** - `e7205724` (feat) — both tasks in single atomic commit (same file)
+
+## Files Created/Modified
+
+- `server/src/services/telegram.ts` (322 lines, was 187) — voice handler, relayToAgent(), transcodeToOggOpus(), TTS reply
+
+## Decisions Made
+
+- `botToken` stored as module-level mutable ref alongside `bot` — processVoiceMessage() needs token string to construct the Telegram CDN download URL
+- `voiceMode = false` default parameter on `relayToAgent()` — text handler calls without flag, voice handler passes `true`
+- TTS failure is a warning (not an error) — voice reply is bonus feature, text always delivered first
+- `transcodeToOggOpus` hardcodes 22050Hz input rate with explanatory comment — matches Piper `en_US-lessac-medium` output spec
+
+## Deviations from Plan
+
+### Minor adjustments
+
+**1. [Rule 1 - Structural] Tasks 1 and 2 implemented and committed together**
+- **Found during:** Task 1 planning
+- **Issue:** Both tasks modify the same file; splitting into two commits would require an intermediate state where voice handler exists without TTS, which is not a meaningful checkpoint
+- **Fix:** Single commit covers both tasks; commit message documents both additions
+- **Files modified:** server/src/services/telegram.ts
+- **Commit:** e7205724
+
+## Known Stubs
+
+None — voice relay is fully wired:
+- OGG download: real Telegram CDN fetch via `ctx.getFile()`
+- Transcription: real `voicePipelineService().transcribe()` (Whisper)
+- TTS synthesis: real `voicePipelineService().synthesize()` (Piper)
+- Voice reply: real `ctx.replyWithVoice(new InputFile(oggBuffer))`
+- Text relay: real `puterProxyService().chatStream()` (same as Plan 01)
+
+The only runtime dependency is Whisper/Piper availability — both degrade gracefully with informative error messages.
+
+## Self-Check: PASSED
+
+- File exists: server/src/services/telegram.ts (322 lines) ✓
+- Commit e7205724 exists ✓
+- All acceptance criteria passing ✓
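
Reviewer note: `server/src/services/telegram.ts` is not part of this diff, so the two patterns the summary describes (the PCM to OGG Opus transcode and the "text first, voice as a bonus" reply) can only be sketched here, not quoted. The TypeScript below is a minimal illustration under stated assumptions: `buildOggOpusArgs`, `replyTextThenVoice`, and the `VoiceReplyDeps` interface are hypothetical names, the input is assumed to be mono, and the dependencies are injected so the sketch does not rely on any grammY or ffmpeg-static call. The ffmpeg flags themselves are standard ffmpeg options.

```typescript
// Hypothetical helper (not the function in telegram.ts): ffmpeg arguments for
// converting raw PCM on stdin into OGG Opus on stdout.
// Assumption: input is mono, signed 16-bit little-endian at 22050 Hz,
// as the summary states for the Piper model.
function buildOggOpusArgs(inputRateHz: number = 22050): string[] {
  return [
    "-f", "s16le",               // headerless PCM, so the input format must be stated
    "-ar", String(inputRateHz),  // input sample rate
    "-ac", "1",                  // assumed mono input
    "-i", "pipe:0",              // read PCM from stdin
    "-c:a", "libopus",           // encode with Opus
    "-ar", "48000",              // output sample rate
    "-f", "ogg",                 // OGG container
    "pipe:1",                    // write the result to stdout
  ];
}

// Injected dependencies, hypothetical stand-ins for the bot context and voice service.
interface VoiceReplyDeps {
  sendText: (text: string) => Promise<void>;
  synthesizeOgg: (text: string) => Promise<Uint8Array>;
  sendVoice: (ogg: Uint8Array) => Promise<void>;
  warn: (message: string) => void;
}

// Sends the text reply first, then attempts the voice note.
// A failure while synthesizing or sending voice is logged as a warning and
// reported via the return value; it never prevents the text reply.
async function replyTextThenVoice(
  text: string,
  deps: VoiceReplyDeps,
): Promise<boolean> {
  await deps.sendText(text);
  try {
    const ogg = await deps.synthesizeOgg(text);
    await deps.sendVoice(ogg);
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    deps.warn(`voice reply skipped: ${reason}`);
    return false;
  }
}
```

In this sketch a caller would pass `buildOggOpusArgs()` to whatever process spawner it uses and pipe the PCM buffer into stdin. Text delivery failures still propagate, since only the voice steps sit inside the try/catch.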