---
phase: 39-voice-polish
plan: "01"
subsystem: voice
tags: [tts, streaming, sse, multi-lang, sentence-buffering]
dependency_graph:
  requires: [36-voice-pipeline, 37-ui-voice]
  provides: [sentence-streaming-sse, multi-lang-synthesis, streaming-playback]
  affects: [ChatVoicePlayer, voice-pipeline, voice-routes]
tech_stack:
  added: []
  patterns: [AsyncGenerator, SSE/EventSource, ReadableStream, base64-audio]
key_files:
  created:
    - server/src/__tests__/39-sentence-streaming.test.ts
  modified:
    - server/src/services/voice-pipeline.ts
    - server/src/routes/voice.ts
    - ui/src/components/ChatVoicePlayer.tsx
key_decisions:
  - "splitSentences protects only title abbreviations (Dr., Mr., etc.) - acronyms like D.C. can still end sentences"
  - "synthesizeSentenceStream uses AsyncGenerator for lazy per-sentence audio production"
  - "ChatVoicePlayer uses fetch POST + ReadableStream to parse SSE manually (EventSource only supports GET)"
  - "Object URL cleanup on unmount and on new text arrival prevents blob memory leaks"
metrics:
  duration: "~5 minutes"
  completed_date: "2026-04-04"
  tasks_completed: 2
  files_modified: 4
---

# Phase 39 Plan 01: Sentence-Buffered TTS Streaming + Multi-Language Synthesis Summary

Sentence-buffered SSE TTS streaming with a `synthesizeSentenceStream` AsyncGenerator + `synthesizeMultiLang` parallel synthesis via `Promise.all` + ChatVoicePlayer progressive playback with a sentence queue and progress indicator.
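The minimal TypeScript sketch below illustrates the two server-side patterns named in the summary. It is not the code from `voice-pipeline.ts`. The names `fakePiper`, `naiveSplit`, `sentenceStream`, `synthesizeAll`, and `multiLang` are hypothetical. The fake synthesizer returns labeled bytes instead of piper audio, and the splitter does not handle abbreviations.

```typescript
// Sketch only: hypothetical names, not the voice-pipeline.ts API.

interface SentenceChunk {
  index: number; // 0-based position of the sentence
  total: number; // number of sentences in the input
  audio: Uint8Array; // synthesized bytes for this sentence
}

// Hypothetical stand-in for one piper call. Real output would be audio;
// here the "audio" is just the voice/sentence label encoded as bytes.
async function fakePiper(sentence: string, voiceId: string): Promise<Uint8Array> {
  return Uint8Array.from(`${voiceId}:${sentence}`, (ch) => ch.charCodeAt(0) & 0xff);
}

// Naive splitter for this sketch (no abbreviation protection).
function naiveSplit(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Lazy per-sentence synthesis: work happens only when the consumer pulls,
// and each chunk is yielded as soon as its sentence finishes.
async function* sentenceStream(text: string, voiceId = "default"): AsyncGenerator<SentenceChunk> {
  const sentences = naiveSplit(text);
  for (let index = 0; index < sentences.length; index++) {
    const audio = await fakePiper(sentences[index], voiceId);
    yield { index, total: sentences.length, audio };
  }
}

// Join byte arrays into one contiguous array.
function concatBytes(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Whole-text synthesis for one voice, built by draining the stream.
async function synthesizeAll(text: string, voiceId: string): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  for await (const chunk of sentenceStream(text, voiceId)) parts.push(chunk.audio);
  return concatBytes(parts);
}

// Multi-voice synthesis: every voice starts at once, Promise.all waits for
// all of them, and the results are keyed by voiceId.
async function multiLang(text: string, voiceIds: string[]): Promise<Map<string, Uint8Array>> {
  const results = await Promise.all(voiceIds.map((id) => synthesizeAll(text, id)));
  return new Map(voiceIds.map((id, i): [string, Uint8Array] => [id, results[i]]));
}
```

Because `sentenceStream` is an async generator, a consumer such as an SSE route can `for await` over it and write each chunk as it arrives. By contrast, `multiLang` resolves only after every voice has finished.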
## Tasks Completed

| Task | Name | Commit | Files |
|------|------|--------|-------|
| 1 (TDD RED) | Failing tests for sentence streaming | 2efe6f30 | server/src/__tests__/39-sentence-streaming.test.ts |
| 1 (TDD GREEN) | Sentence streaming + multi-lang in pipeline + routes | 5c888c1a | server/src/services/voice-pipeline.ts, server/src/routes/voice.ts |
| 2 | ChatVoicePlayer sentence-buffered streaming playback | c4b05399 | ui/src/components/ChatVoicePlayer.tsx |

## What Was Built

### voice-pipeline.ts additions

- `splitSentences(text)`: exported function that uses a placeholder technique to keep title abbreviations (Dr., Mr., Mrs., etc.) from causing false sentence splits; acronyms such as D.C. at the end of a sentence still trigger a split
- `synthesizeSentenceStream(text, voiceId?)`: `AsyncGenerator<{ index, total, audio }>` that calls piper once per sentence and yields each audio buffer as soon as that sentence completes
- `synthesizeMultiLang(text, voiceIds[])`: runs `synthesize()` for each voiceId in parallel via `Promise.all` and returns a `Map` with one entry per voice
- Internal `synthesizeSentence()` helper, extracted to avoid duplicating code across the three synthesis methods
- Existing `synthesize()` now uses `splitSentences()` internally (backward compatible)

### voice.ts additions

- `POST /api/synthesize/stream`: SSE endpoint (`Content-Type: text/event-stream`) that iterates `synthesizeSentenceStream`, writes one `data: { index, total, audio: base64 }\n\n` event per sentence, and finishes with `data: { done: true }\n\n`
- `POST /api/synthesize/multi-lang`: validates the `voiceIds` array (1 to 5 entries), calls `synthesizeMultiLang`, and returns `{ results: [{ voiceId, audio: base64 }] }`
- Existing `POST /api/synthesize` unchanged

### ChatVoicePlayer.tsx

- Added `streaming` prop (default `true`)
- When `streaming` is `true`: uses a `fetch` POST plus `response.body.getReader()` to parse SSE lines as they arrive (not `EventSource`, which only supports GET)
- First chunk received → decode base64, create a Blob URL, set the audio `src`, and call `.play()` immediately
- Subsequent chunks → pushed to the `audioQueue` ref
- `onEnded` handler pops the next URL from the queue and plays it
- Progress indicator: "Sentence N of M" text plus a dot progress bar (one filled or unfilled dot per sentence)
- Fallback: on a stream error, or when `streaming=false`, falls back to the existing full fetch via `POST /api/synthesize`
- All Blob URLs are cleaned up on unmount or when a new `text` prop arrives

## Test Results

```
Tests 8 passed (8)

- splitSentences: splits basic sentences ✓
- splitSentences: abbreviation-aware (Dr., D.C.) ✓
- splitSentences: single sentence ✓
- splitSentences: filters empty strings ✓
- synthesizeSentenceStream: yields correct chunk count + metadata ✓
- synthesizeSentenceStream: single sentence ✓
- synthesizeMultiLang: returns Map with all voices ✓
- synthesizeMultiLang: calls piper in parallel ✓
```

## Deviations from Plan

### Auto-fixed Issues

None.

### Design Notes

**Abbreviation handling:** The plan gave "Dr. Smith went to D.C. He liked it." as an example that must produce two sentences. The key insight was that title abbreviations (Dr., Mr., etc.) need protection, while acronyms like D.C. at the end of a sentence must still split. The implementation uses a placeholder technique. First, the space after each title abbreviation is replaced with `\x00`. Next, the text is split on `(?<=[.!?])\s+`. Finally, each placeholder is restored to a space.

## Known Stubs

None. All endpoints are wired to real piper TTS synthesis.

## Self-Check: PASSED

- server/src/services/voice-pipeline.ts exists with splitSentences + synthesizeSentenceStream + synthesizeMultiLang ✓
- server/src/routes/voice.ts has synthesize/stream + synthesize/multi-lang + text/event-stream ✓
- server/src/__tests__/39-sentence-streaming.test.ts exists and all 8 tests pass ✓
- ui/src/components/ChatVoicePlayer.tsx has synthesize/stream + ReadableStream + audioQueue + sentence progress ✓
- Commits: 2efe6f30, 5c888c1a, c4b05399 ✓
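The sketch below shows the placeholder technique from the abbreviation-handling design note. It illustrates the idea and is not the shipped `splitSentences`. The `TITLE_ABBREVIATIONS` list is a hypothetical assumption beyond the Dr., Mr., and Mrs. named in this summary. Whitespace after a protected title is normalized to a single space.

```typescript
// Hypothetical list of protected titles; the real list may differ.
const TITLE_ABBREVIATIONS = ["Dr", "Mr", "Mrs", "Ms", "Prof"];

// NUL is not matched by \s, so the splitter below ignores it.
const PLACEHOLDER = "\x00";

// A whole-word title, its period, and the whitespace after it ("Dr. ").
const titlePattern = new RegExp(`\\b(${TITLE_ABBREVIATIONS.join("|")})\\.\\s+`, "g");

function splitSentences(text: string): string[] {
  // 1. Swap the whitespace after each title for the placeholder, so
  //    "Dr. Smith" no longer looks like "period + whitespace".
  const shielded = text.replace(titlePattern, `$1.${PLACEHOLDER}`);
  return (
    shielded
      // 2. Split after sentence-ending punctuation followed by whitespace.
      //    Acronyms such as "D.C." are unprotected, so "D.C. He" still splits.
      .split(/(?<=[.!?])\s+/)
      // 3. Restore placeholders to spaces, then drop empty fragments.
      .map((s) => s.replace(/\x00/g, " ").trim())
      .filter((s) => s.length > 0)
  );
}
```

With this approach, "Dr. Smith went to D.C. He liked it." yields `["Dr. Smith went to D.C.", "He liked it."]`. Protecting titles only is the trade-off recorded in the key decisions: a sentence that ends with a title such as "Dr." would not split, but acronyms keep ending sentences normally.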
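The sketch below shows the manual SSE parsing that `ChatVoicePlayer.tsx` performs over a `fetch` POST body. `readSseEvents`, `ByteReader`, and `SseEvent` are hypothetical names. The sketch assumes `\n\n`-terminated events, each carrying one JSON `data:` line, which matches the payloads listed for `POST /api/synthesize/stream`. It leaves out the base64-to-Blob playback step.

```typescript
// Payload shapes as listed for POST /api/synthesize/stream.
type SseEvent =
  | { index: number; total: number; audio: string } // audio is base64
  | { done: true };

// Structural stand-in for response.body.getReader(); a real
// ReadableStreamDefaultReader<Uint8Array> fits this shape.
interface ByteReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Yields one parsed event per "data:" line. Assumptions: LF line endings
// (no CRLF) and a single data line per event.
async function* readSseEvents(reader: ByteReader): AsyncGenerator<SseEvent> {
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true carries partial multi-byte characters into the next chunk.
    buffer += decoder.decode(value, { stream: true });
    // One network chunk may hold zero, one, or several complete events.
    let end = buffer.indexOf("\n\n");
    while (end !== -1) {
      const rawEvent = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      for (const line of rawEvent.split("\n")) {
        if (line.startsWith("data:")) {
          yield JSON.parse(line.slice(5).trim()) as SseEvent;
        }
      }
      end = buffer.indexOf("\n\n");
    }
  }
}
```

In the component, the reader would come from `response.body.getReader()`. Each `{ index, total, audio }` event would then be decoded into a Blob URL, which is played immediately for the first chunk and queued otherwise. The `{ done: true }` event ends the loop.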