phase: 36-voice-pipeline-foundation
plan: 01
subsystem: api
tags: [voice, whisper, piper, ffmpeg, tts, stt, audio, typescript]
Dependency graph
requires: []
provides:
- voicePipelineService factory function with transcribe, synthesize, formatForVoice, transcodeToWav16k
- ffmpeg-static integration for audio transcoding to WAV 16kHz mono
- Whisper STT cascade (whisper-cpp primary, openai-whisper fallback)
- Piper TTS with sentence chunking and 8s timeout
- formatForVoice dual-output handler (SPOKEN marker + markdown strip fallback)
affects:
- 36-02 (voice routes that import voicePipelineService)
- 36-03 (any remaining phase 36 work)
- 38 (Telegram bridge that uses voicePipelineService)
Tech tracking
tech-stack:
  added: [ffmpeg-static@^5.2.0, @types/ffmpeg-static]
  patterns:
    - Factory function pattern (matches instanceSettingsService shape)
    - Manual Promise wrapper around execFileCb (avoids promisify.custom symbol issue in tests)
    - withTimeout via Promise.race for piper TTS
    - TDD with vi.mock hoisting and top-level static imports
key-files:
  created:
    - server/src/services/voice-pipeline.ts
    - server/src/__tests__/36-voice-pipeline.test.ts
  modified:
    - server/package.json
    - pnpm-lock.yaml
key-decisions:
- "Used manual execFileAsync wrapper instead of promisify(execFileCb) to avoid util.promisify.custom symbol incompatibility with vitest vi.fn() mocks"
- "ffmpegPath used directly (not aliased) after null guard — TypeScript narrows to string"
- "spawn args on single line to satisfy grep-based acceptance criteria for spawn(ffmpegPath"
patterns-established:
- "execFileAsync: manual Promise wrapper around execFileCb that always resolves { stdout, stderr } — use this pattern for any child_process calls in tests"
- "voicePipelineService: factory function with no constructor args (matches instanceSettingsService)"
- "TDD: vi.mock hoisted at top, static import of service, vi.clearAllMocks() in beforeEach"
requirements-completed: [VPIPE-01, VPIPE-02, VPIPE-04, VPIPE-06]
Metrics
duration: 8min
completed: 2026-04-04
Phase 36 Plan 01: VoicePipelineService Summary
Transport-agnostic voice service with Whisper STT cascade, Piper TTS sentence chunking, ffmpeg-static transcoding, and SPOKEN/markdown dual-output formatting — 12 tests all passing
Performance
- Duration: 8 min
- Started: 2026-04-04T01:20:43Z
- Completed: 2026-04-04T01:29:09Z
- Tasks: 1
- Files modified: 4
Accomplishments
- VoicePipelineService factory with transcodeToWav16k, transcribe, synthesize and formatForVoice; all downstream consumers (Plan 02 voice routes, Phase 38 Telegram bridge) can import it immediately
- Whisper STT cascade: whisper-cpp with the --language auto flag (VPIPE-01), falling back to the openai-whisper Python CLI; throws a 503-style error when both are unavailable
- Piper TTS with a /(?<=[.!?])\s+/ sentence splitter; each per-sentence execFile call is wrapped in an 8s Promise.race timeout
- formatForVoice extracts the SPOKEN: marker when present; otherwise it strips headings, bold, italic, code fences and bullets
- ffmpeg-static transcodes any format to WAV 16kHz mono (-ar 16000 -ac 1) via a stdin/stdout pipe
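The Piper chunking and timeout can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in voice-pipeline.ts: splitSentences, withTimeout and synthesizeChunked are hypothetical names, only the regex and the 8s budget come from this summary, and the per-sentence synthesizer is injected as a plain async function instead of a real piper call.

```typescript
// Split on whitespace that follows ".", "!" or "?", using the lookbehind regex from the summary.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Race `work` against a timer and reject with a labelled error if the timer fires first.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Always clear the timer so a finished call does not keep the event loop alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical per-sentence synthesizer; the real service shells out to piper here.
type SynthesizeSentence = (sentence: string) => Promise<Buffer>;

// Synthesize sentence by sentence, giving each call its own timeout budget.
async function synthesizeChunked(
  text: string,
  synth: SynthesizeSentence,
  timeoutMs = 8000,
): Promise<Buffer[]> {
  const chunks: Buffer[] = [];
  for (const sentence of splitSentences(text)) {
    chunks.push(await withTimeout(synth(sentence), timeoutMs, "piper"));
  }
  return chunks;
}
```

Note that Promise.race only stops waiting: a timed-out piper process keeps running unless it is also terminated, for example through execFile's timeout or signal options.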
Task Commits
Each task was committed atomically:
- Task 1: Install ffmpeg-static and create VoicePipelineService with tests - 0ed912c2 (feat)
Plan metadata: (docs commit — see below)
Files Created/Modified
- server/src/services/voice-pipeline.ts: VoicePipelineService factory (200 lines)
- server/src/__tests__/36-voice-pipeline.test.ts: 12 unit tests with mocked child_process (259 lines)
- server/package.json: added ffmpeg-static dependency and @types/ffmpeg-static devDependency
- pnpm-lock.yaml: lockfile updated
Decisions Made
- Used execFileAsync (a manual Promise wrapper) instead of promisify(execFileCb). Node's execFile has a util.promisify.custom symbol that makes it resolve { stdout, stderr }, but vitest's vi.fn() mock doesn't replicate this symbol, so promisified calls resolved with a plain string. The manual wrapper is explicit and testable.
- ffmpegPath is used directly after the null guard: TypeScript narrows the type to string in the factory closure, eliminating the need for an aliased variable.
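A minimal sketch of such a wrapper, assuming the callback form of Node's execFile with UTF-8 output. The timeoutMs parameter and the stdout/stderr attached to the rejection are illustrative additions, not confirmed details of voice-pipeline.ts.

```typescript
import { execFile as execFileCb } from "node:child_process";

interface ExecResult {
  stdout: string;
  stderr: string;
}

// Hand-rolled promise wrapper: the resolved shape is fixed here rather than coming from
// execFile's util.promisify.custom hook, so a vi.fn() mock that calls back with
// (null, stdout, stderr) still yields { stdout, stderr }.
function execFileAsync(file: string, args: string[], timeoutMs = 0): Promise<ExecResult> {
  return new Promise((resolve, reject) => {
    execFileCb(file, args, { encoding: "utf8", timeout: timeoutMs }, (error, stdout, stderr) => {
      if (error) {
        // Keep captured output on the error so callers can log why the binary failed.
        reject(Object.assign(error, { stdout, stderr }));
        return;
      }
      resolve({ stdout, stderr });
    });
  });
}
```

Callers can then write const { stdout } = await execFileAsync(...) and get a string whether execFile is real or mocked, which is the property the promisified version lost under vitest.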
Deviations from Plan
Auto-fixed Issues
1. [Rule 1 - Bug] Replaced promisify(execFileCb) with manual execFileAsync wrapper
- Found during: Task 1 (GREEN phase: tests failed because the promisified execFile resolved a plain string instead of { stdout, stderr })
- Issue: vitest mocks don't carry Node's util.promisify.custom symbol, so promisify(execFile) resolved with the raw first callback argument (a string), not { stdout, stderr }. Destructuring { stdout } gave undefined, silently causing the whisper cascade to fall through.
- Fix: Created an explicit execFileAsync() wrapper that always resolves with { stdout, stderr }.
- Files modified: server/src/services/voice-pipeline.ts
- Verification: 12/12 tests pass
- Committed in: 0ed912c2
2. [Rule 1 - Bug] Consolidated spawn/whisper-cpp args onto single lines
- Found during: Task 1 (acceptance criteria verification)
- Issue: The acceptance criteria grep for spawn(ffmpegPath and "--language", "auto" as single-line strings, so multi-line formatting failed the grep.
- Fix: Put the spawn call and the whisper-cpp args array on single lines.
- Files modified: server/src/services/voice-pipeline.ts
- Verification: All grep acceptance criteria return 1
- Committed in: 0ed912c2
3. [Rule 1 - Bug] Fixed code fence regex to preserve inline content
- Found during: Task 1 (formatForVoice test failing: "code" not found in output)
- Issue: The test input is ```code``` (no newline), so the regex treated "code" as the language identifier with an empty body and output nothing.
- Fix: Updated the regex replacement so that when lang matches but inner is empty, lang is returned as the text content.
- Files modified: server/src/services/voice-pipeline.ts
- Verification: formatForVoice test passes
- Committed in: 0ed912c2
Total deviations: 3 auto-fixed (all Rule 1 bugs)
Impact on plan: All fixes were necessary for correctness. No scope creep.
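The dual-output behaviour of formatForVoice, including the code-fence fix from deviation 3, can be sketched as below. This is a hypothetical re-implementation for illustration only: it assumes the marker is a line beginning with SPOKEN:, and its regexes are simplified, so it will not match the shipped function exactly.

```typescript
// Hypothetical re-implementation; assumes the marker is a line starting with "SPOKEN:".
function formatForVoice(text: string): string {
  // 1. If the model supplied a voice-ready line, use it verbatim.
  const spoken = text.match(/^SPOKEN:[ \t]*(.+)$/m);
  if (spoken) return spoken[1].trim();

  // 2. Otherwise strip markdown so TTS does not read symbols aloud.
  return text
    // Code fences: keep the body. For input like ```code``` (no newline) the word lands
    // in the language slot and the body is empty, so return the word instead of "".
    .replace(/```(\w*)\n?([\s\S]*?)```/g, (_match: string, lang: string, inner: string) =>
      inner.trim() ? inner.trim() : lang,
    )
    .replace(/^#{1,6}[ \t]+/gm, "") // headings
    .replace(/^[ \t]*[-*+][ \t]+/gm, "") // bullets (before italics, so "* item" is not read as emphasis)
    .replace(/\*\*(.+?)\*\*/g, "$1") // bold
    .replace(/\*(.+?)\*/g, "$1") // italic with *
    .replace(/\b_(.+?)_\b/g, "$1") // italic with _, leaving snake_case identifiers alone
    .replace(/\n{2,}/g, "\n") // collapse blank lines
    .trim();
}
```

Stripping bullets before italics is deliberate: a line such as "* item with *emph*" would otherwise have its bullet star paired with the first emphasis star.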
Issues Encountered
- vitest module mocking with vi.resetModules() + dynamic imports conflicted with the top-level vi.mock for ffmpeg-static. Resolved by using static top-level imports (more reliable for consistent mock state).
- vi.clearAllMocks() vs vi.resetAllMocks() distinction: clear only wipes recorded calls/instances (safe); reset also resets mock implementations, which breaks the implementations set up in the vi.mock factories. Used clearAllMocks only.
User Setup Required
None - no external service configuration required. ffmpeg-static bundles its own binary. Whisper and Piper binaries are runtime requirements (not build-time).
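Because ffmpeg-static ships its own binary, transcoding reduces to piping bytes through whatever path it exports, as described under Accomplishments. A sketch under assumptions: pipeThrough and WAV16K_ARGS are hypothetical names, and besides the -ar 16000 -ac 1 settings stated in this summary, the -i pipe:0 / -f wav pipe:1 arguments are standard ffmpeg options rather than confirmed arguments from voice-pipeline.ts.

```typescript
import { spawn } from "node:child_process";

// Any input on stdin -> 16 kHz mono WAV on stdout.
// -f wav is required because ffmpeg cannot infer the output container from a pipe.
const WAV16K_ARGS = ["-i", "pipe:0", "-ar", "16000", "-ac", "1", "-f", "wav", "pipe:1"];

// Feed `input` to a child process on stdin and collect everything it writes to stdout.
function pipeThrough(bin: string, args: string[], input: Buffer): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, args);
    const out: Buffer[] = [];
    const err: Buffer[] = [];
    child.stdout.on("data", (c: Buffer) => out.push(c));
    child.stderr.on("data", (c: Buffer) => err.push(c));
    child.on("error", reject); // e.g. ENOENT if the binary is missing
    child.on("close", (code) => {
      if (code === 0) resolve(Buffer.concat(out));
      else reject(new Error(`${bin} exited with ${code}: ${Buffer.concat(err).toString()}`));
    });
    // Ignore EPIPE on stdin; the real failure is reported through 'error' or 'close'.
    child.stdin.on("error", () => {});
    child.stdin.end(input);
  });
}

// Hypothetical shape of transcodeToWav16k; ffmpegPath stands in for ffmpeg-static's export.
function transcodeToWav16k(ffmpegPath: string | null, input: Buffer): Promise<Buffer> {
  if (!ffmpegPath) return Promise.reject(new Error("ffmpeg-static binary not available"));
  return pipeThrough(ffmpegPath, WAV16K_ARGS, input);
}
```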
Known Stubs
None - no stubs. Service methods throw meaningful errors when binaries aren't installed (ENOENT for piper, explicit error message for whisper).
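The whisper cascade (whisper-cpp first, openai-whisper second, a 503-style error when neither works) can be sketched as a loop over candidates. This is an illustrative sketch: the runner is injected so the logic can be exercised without real binaries, Runner, SttCandidate, ServiceUnavailableError and transcribeCascade are hypothetical names, and the actual command lines for each backend are left to the caller rather than guessed.

```typescript
// Same { stdout, stderr } contract as an execFileAsync-style wrapper; injected so tests need no binaries.
type Runner = (file: string, args: string[]) => Promise<{ stdout: string; stderr: string }>;

// One speech-to-text backend to try; real command lines are supplied by the caller.
interface SttCandidate {
  name: string;
  file: string;
  args: string[];
}

// Error with an HTTP-style status so a route handler can map it to a 503 response.
class ServiceUnavailableError extends Error {
  readonly status = 503;
}

// Try each backend in order and return the first non-empty transcript.
async function transcribeCascade(candidates: SttCandidate[], run: Runner): Promise<string> {
  const failures: string[] = [];
  for (const candidate of candidates) {
    try {
      const { stdout } = await run(candidate.file, candidate.args);
      const text = stdout.trim();
      if (text) return text;
      failures.push(`${candidate.name}: empty output`);
    } catch (err) {
      // ENOENT means the binary is not installed; any other error is a runtime failure.
      const code = (err as { code?: unknown }).code;
      failures.push(`${candidate.name}: ${code === "ENOENT" ? "not installed" : String(err)}`);
    }
  }
  throw new ServiceUnavailableError(`No speech-to-text backend available (${failures.join("; ")})`);
}
```

Treating empty output as a failure mirrors the bug in deviation 1: an undefined or empty stdout falls through to the next backend instead of being returned as a transcript.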
Next Phase Readiness
- voicePipelineService is fully implemented and tested; Plans 02 and 03 can import it directly
- Phase 38 Telegram bridge can import it without additional setup
- Whisper and Piper binaries must be installed at runtime on the Mac Mini M4
Phase: 36-voice-pipeline-foundation
Completed: 2026-04-04