| Field | Value |
|---|---|
| phase | 39-voice-polish |
| plan | 01 |
| subsystem | voice |
| tags | tts, streaming, sse, multi-lang, sentence-buffering |
| dependency_graph: requires | 36-voice-pipeline, 37-ui-voice |
| dependency_graph: provides | sentence-streaming-sse, multi-lang-synthesis, streaming-playback |
| dependency_graph: affects | ChatVoicePlayer, voice-pipeline, voice-routes |
| tech_stack: added | (none) |
| tech_stack: patterns | AsyncGenerator, SSE/EventSource, ReadableStream, base64-audio |
| key_files: created | `server/src/__tests__/39-sentence-streaming.test.ts` |
| key_files: modified | `server/src/services/voice-pipeline.ts`, `server/src/routes/voice.ts`, `ui/src/components/ChatVoicePlayer.tsx` |
| key_decisions | splitSentences protects only title abbreviations (Dr., Mr., etc.); acronyms like D.C. can still end sentences |
| | synthesizeSentenceStream uses AsyncGenerator for lazy per-sentence audio production |
| | ChatVoicePlayer uses fetch POST + ReadableStream to parse SSE manually (EventSource only supports GET) |
| | Object URL cleanup on unmount and on new text arrival prevents blob memory leaks |
| metrics: duration | ~5 minutes |
| metrics: completed_date | 2026-04-04 |
| metrics: tasks_completed | 2 |
| metrics: files_modified | 4 |

Phase 39 Plan 01: Sentence-Buffered TTS Streaming + Multi-Language Synthesis Summary
Sentence-buffered TTS streaming over SSE, built on the `synthesizeSentenceStream` AsyncGenerator; parallel multi-voice synthesis in `synthesizeMultiLang` via `Promise.all`; and progressive playback in ChatVoicePlayer with a sentence queue and a progress indicator.
Tasks Completed
| Task | Name | Commit | Files |
|---|---|---|---|
| 1 (TDD RED) | Failing tests for sentence streaming | 2efe6f30 | `server/src/__tests__/39-sentence-streaming.test.ts` |
| 1 (TDD GREEN) | Sentence streaming + multi-lang in pipeline + routes | 5c888c1a | `server/src/services/voice-pipeline.ts`, `server/src/routes/voice.ts` |
| 2 | ChatVoicePlayer sentence-buffered streaming playback | c4b05399 | `ui/src/components/ChatVoicePlayer.tsx` |

What Was Built
voice-pipeline.ts additions
- `splitSentences(text)`: exported function; protects title abbreviations (Dr., Mr., Mrs., etc.) from false sentence splits using a placeholder technique; acronyms like D.C. at the end of a sentence still trigger splits
- `synthesizeSentenceStream(text, voiceId?)`: `AsyncGenerator<{ index, total, audio }>` that calls piper per sentence and yields each audio buffer as soon as it completes
- `synthesizeMultiLang(text, voiceIds[])`: runs `synthesize()` for each voiceId in parallel via `Promise.all`, returns `Map<voiceId, Buffer>`
- Internal `synthesizeSentence()` helper extracted to avoid code duplication between the three synthesis methods
- Existing `synthesize()` updated to use `splitSentences()` internally (backward compatible)
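
The two new synthesis paths can be sketched as follows. This is a minimal illustration rather than the code in `voice-pipeline.ts`: the `Synthesize` type is a hypothetical stand-in for the piper-backed call, `Uint8Array` is used in place of `Buffer` so the sketch has no Node-specific types, the function names are shortened, and a 0-based `index` is an assumption.

```typescript
// Hypothetical stand-in for the piper-backed synthesis call (not the real signature).
type Synthesize = (text: string, voiceId?: string) => Promise<Uint8Array>;

interface SentenceChunk {
  index: number; // assumed 0-based
  total: number;
  audio: Uint8Array;
}

// Lazily synthesizes one sentence at a time. The consumer receives each chunk
// as soon as its audio is ready, before later sentences are synthesized.
async function* sentenceStream(
  sentences: string[],
  synthesize: Synthesize,
  voiceId?: string,
): AsyncGenerator<SentenceChunk> {
  const total = sentences.length;
  for (const [index, sentence] of sentences.entries()) {
    const audio = await synthesize(sentence, voiceId);
    yield { index, total, audio };
  }
}

// Starts synthesis for every voice at once and waits for all of them to finish.
async function multiLang(
  text: string,
  voiceIds: string[],
  synthesize: Synthesize,
): Promise<Map<string, Uint8Array>> {
  const entries = await Promise.all(
    voiceIds.map(async (voiceId): Promise<[string, Uint8Array]> => [
      voiceId,
      await synthesize(text, voiceId),
    ]),
  );
  return new Map(entries);
}
```

Because the generator awaits inside the loop, a slow consumer naturally throttles synthesis, whereas `Promise.all` trades that backpressure for lower total latency across voices.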
voice.ts additions
- `POST /api/synthesize/stream`: SSE endpoint with `Content-Type: text/event-stream`; iterates `synthesizeSentenceStream`, writes `data: { index, total, audio: base64 }\n\n` per sentence, and finishes with `data: { done: true }\n\n`
- `POST /api/synthesize/multi-lang`: validates the `voiceIds` array (1-5 entries), calls `synthesizeMultiLang`, returns `{ results: [{ voiceId, audio: base64 }] }`
- Existing `POST /api/synthesize` unchanged
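
The SSE framing described for the streaming endpoint could look roughly like this. It is a sketch only: `formatSseEvent` and `toSseFrames` are hypothetical names, and the route handler itself is omitted so the example stays independent of any HTTP framework.

```typescript
// Shape of one per-sentence event, following the payload described above.
interface SentenceEvent {
  index: number;
  total: number;
  audio: string; // base64-encoded audio
}

// An SSE message is a "data:" line terminated by a blank line.
function formatSseEvent(payload: SentenceEvent | { done: true }): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Turns a stream of sentence events into SSE frames and appends the done marker.
async function* toSseFrames(
  events: AsyncIterable<SentenceEvent>,
): AsyncGenerator<string> {
  for await (const event of events) {
    yield formatSseEvent(event);
  }
  yield formatSseEvent({ done: true });
}
```

In a Node HTTP handler, each frame would presumably be passed to `res.write()` after the `Content-Type: text/event-stream` header is set, with `res.end()` called once the generator finishes.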
ChatVoicePlayer.tsx
- Added `streaming` prop (default `true`)
- When `streaming=true`: uses `fetch` POST + `response.body.getReader()` to parse SSE lines as they arrive (not EventSource, which cannot send POST requests)
- First chunk received → decode base64, create Blob URL, set audio `src`, call `.play()` immediately
- Subsequent chunks → pushed to `audioQueue` ref
- `onEnded` handler pops next URL from queue and plays it
- Progress indicator: "Sentence N of M" text + dot progress bar (filled/unfilled per sentence)
- Fallback: stream error or `streaming=false` → falls back to existing full-fetch via `POST /api/synthesize`
- All Blob URLs cleaned up on unmount or new `text` prop
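
The client-side parsing and queueing behaviour listed above can be illustrated with two small framework-free pieces. Both `createSseParser` and `PlaybackQueue` are hypothetical names, not identifiers from `ChatVoicePlayer.tsx`; the sketch assumes each network chunk has already been decoded to text (for example with `TextDecoder`) and leaves out Blob URL creation and revocation.

```typescript
// Incremental parser for "data: <json>\n\n" frames. Network chunks can end
// mid-frame, so unparsed text stays in a buffer until its blank line arrives.
function createSseParser<T>(onEvent: (event: T) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const frame = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      for (const line of frame.split("\n")) {
        if (line.startsWith("data:")) {
          onEvent(JSON.parse(line.slice(5).trim()) as T);
        }
      }
      boundary = buffer.indexOf("\n\n");
    }
  };
}

// Plays URLs one after another: the first starts immediately, later ones wait
// in a FIFO queue until the previous one reports that it has ended.
class PlaybackQueue {
  private queue: string[] = [];
  private playing = false;
  private readonly play: (url: string) => void;

  constructor(play: (url: string) => void) {
    this.play = play;
  }

  enqueue(url: string): void {
    if (this.playing) {
      this.queue.push(url);
    } else {
      this.playing = true;
      this.play(url);
    }
  }

  // Intended to be called from the audio element's "ended" handler.
  onEnded(): void {
    const next = this.queue.shift();
    if (next === undefined) {
      this.playing = false;
    } else {
      this.play(next);
    }
  }
}
```

Keeping the parser separate from the playback queue means the chunk-boundary handling can be unit-tested without an audio element or a live stream.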
Test Results
Tests 8 passed (8)
- splitSentences: splits basic sentences ✓
- splitSentences: abbreviation-aware (Dr., D.C.) ✓
- splitSentences: single sentence ✓
- splitSentences: filters empty strings ✓
- synthesizeSentenceStream: yields correct chunk count + metadata ✓
- synthesizeSentenceStream: single sentence ✓
- synthesizeMultiLang: returns Map with all voices ✓
- synthesizeMultiLang: calls piper in parallel ✓
Deviations from Plan
Auto-fixed Issues
None.
Design Notes
Abbreviation handling: the plan's example, "Dr. Smith went to D.C. He liked it.", must yield two sentences. Title abbreviations (Dr., Mr., etc.) therefore need protection from splitting, while an acronym such as D.C. at the end of a sentence may still end it. The implementation uses a placeholder technique: the space following each title abbreviation is replaced with `\x00`, the text is split on `(?<=[.!?])\s+`, and the placeholders are then restored to spaces.
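
A minimal sketch of that placeholder technique is shown below. The title list is an assumption, since the summary only names Dr., Mr., and Mrs.; the actual set and details in `voice-pipeline.ts` may differ.

```typescript
// Hypothetical title list; the real set is not shown in this summary.
const TITLE_ABBREVIATIONS = ["Dr", "Mr", "Mrs", "Ms", "Prof"];

// NUL never occurs in normal text and is not matched by \s.
const PLACEHOLDER = "\x00";

function splitSentences(text: string): string[] {
  // Step 1: replace the whitespace after a title abbreviation with the placeholder,
  // so the split in step 2 cannot break between "Dr." and the following name.
  const titlePattern = new RegExp(
    `\\b(${TITLE_ABBREVIATIONS.join("|")})\\.\\s+`,
    "g",
  );
  const protectedText = text.replace(titlePattern, `$1.${PLACEHOLDER}`);

  // Step 2: split on whitespace that follows sentence-ending punctuation.
  // Step 3: restore placeholders to spaces and drop empty fragments.
  return protectedText
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.split(PLACEHOLDER).join(" ").trim())
    .filter((sentence) => sentence.length > 0);
}

const parts = splitSentences("Dr. Smith went to D.C. He liked it.");
// parts is ["Dr. Smith went to D.C.", "He liked it."]
```

One consequence of this design is that an acronym in the middle of a sentence followed by a space (for example "the D.C. area") would also be split, which matches the stated decision to protect only titles.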
Known Stubs
None — all endpoints are wired to real piper TTS synthesis.
Self-Check: PASSED
- server/src/services/voice-pipeline.ts exists with splitSentences + synthesizeSentenceStream + synthesizeMultiLang ✓
- server/src/routes/voice.ts has synthesize/stream + synthesize/multi-lang + text/event-stream ✓
- `server/src/__tests__/39-sentence-streaming.test.ts` exists and all 8 tests pass ✓
- ui/src/components/ChatVoicePlayer.tsx has synthesize/stream + ReadableStream + audioQueue + sentence progress ✓
- Commits: 2efe6f30, 5c888c1a, c4b05399 ✓