nexus/.planning/phases/36-voice-pipeline-foundation/36-01-SUMMARY.md

7.2 KiB


phase: 36-voice-pipeline-foundation plan: 01 subsystem: api tags: [voice, whisper, piper, ffmpeg, tts, stt, audio, typescript]

Dependency graph

requires: [] provides:

  • voicePipelineService factory function with transcribe, synthesize, formatForVoice, transcodeToWav16k
  • ffmpeg-static integration for audio transcoding to WAV 16kHz mono
  • Whisper STT cascade (whisper-cpp primary, openai-whisper fallback)
  • Piper TTS with sentence chunking and 8s timeout
  • formatForVoice dual-output handler (SPOKEN marker + markdown strip fallback) affects:
  • 36-02 (voice routes that import voicePipelineService)
  • 36-03 (any remaining phase 36 work)
  • 38 (Telegram bridge that uses voicePipelineService)

Tech tracking

tech-stack: added: [ffmpeg-static@^5.2.0, @types/ffmpeg-static] patterns: - Factory function pattern (matches instanceSettingsService shape) - Manual Promise wrapper around execFileCb (avoids promisify.custom symbol issue in tests) - withTimeout via Promise.race for piper TTS - TDD with vi.mock hoisting and top-level static imports

key-files: created: - server/src/services/voice-pipeline.ts - server/src/tests/36-voice-pipeline.test.ts modified: - server/package.json - pnpm-lock.yaml

key-decisions:

  • "Used manual execFileAsync wrapper instead of promisify(execFileCb) to avoid util.promisify.custom symbol incompatibility with vitest vi.fn() mocks"
  • "ffmpegPath used directly (not aliased) after null guard — TypeScript narrows to string"
  • "spawn args on single line to satisfy grep-based acceptance criteria for spawn(ffmpegPath"

patterns-established:

  • "execFileAsync: manual Promise wrapper around execFileCb that always resolves { stdout, stderr } — use this pattern for any child_process calls in tests"
  • "voicePipelineService: factory function with no constructor args (matches instanceSettingsService)"
  • "TDD: vi.mock hoisted at top, static import of service, vi.clearAllMocks() in beforeEach"

requirements-completed: [VPIPE-01, VPIPE-02, VPIPE-04, VPIPE-06]

Metrics

duration: 8min completed: 2026-04-04

Phase 36 Plan 01: VoicePipelineService Summary

Transport-agnostic voice service with Whisper STT cascade, Piper TTS sentence chunking, ffmpeg-static transcoding, and SPOKEN/markdown dual-output formatting — 12 tests all passing

Performance

  • Duration: 8 min
  • Started: 2026-04-04T01:20:43Z
  • Completed: 2026-04-04T01:29:09Z
  • Tasks: 1
  • Files modified: 4

Accomplishments

  • VoicePipelineService factory with transcodeToWav16k, transcribe, synthesize, formatForVoice — all downstream consumers (Plan 02 voice routes, Phase 38 Telegram bridge) can import immediately
  • Whisper STT cascade: whisper-cpp with --language auto flag (VPIPE-01), falls back to openai-whisper Python CLI, throws 503-style error when both unavailable
  • Piper TTS with /(?<=[.!?])\s+/ sentence splitter, per-sentence execFile wrapped in Promise.race 8s timeout
  • formatForVoice extracts SPOKEN: marker when present, otherwise strips headings/bold/italic/code fences/bullets
  • ffmpeg-static transcodes any format to WAV 16kHz mono (-ar 16000 -ac 1) via stdin/stdout pipe

Task Commits

Each task was committed atomically:

  1. Task 1: Install ffmpeg-static and create VoicePipelineService with tests - 0ed912c2 (feat)

Plan metadata: (docs commit — see below)

Files Created/Modified

  • server/src/services/voice-pipeline.ts — VoicePipelineService factory (200 lines)
  • server/src/__tests__/36-voice-pipeline.test.ts — 12 unit tests with mocked child_process (259 lines)
  • server/package.json — added ffmpeg-static dependency and @types/ffmpeg-static devDependency
  • pnpm-lock.yaml — lockfile updated

Decisions Made

  • Used execFileAsync (manual Promise wrapper) instead of promisify(execFileCb). Node's execFile has a util.promisify.custom symbol that makes it resolve { stdout, stderr }, but vitest's vi.fn() mock doesn't replicate this symbol, causing promisified calls to resolve with a plain string. The manual wrapper is explicit and testable.
  • ffmpegPath is used directly after the null guard — TypeScript narrows the type to string in the factory closure, eliminating the need for an aliased variable.

Deviations from Plan

Auto-fixed Issues

1. [Rule 1 - Bug] Replaced promisify(execFileCb) with manual execFileAsync wrapper

  • Found during: Task 1 (GREEN phase — tests failing because promisified execFile resolved plain string instead of { stdout, stderr })
  • Issue: vitest mocks don't carry Node's util.promisify.custom symbol, so promisify(execFile) resolved with the raw first callback arg (a string), not { stdout, stderr }. Destructuring { stdout } gave undefined, silently causing whisper cascade to fall through.
  • Fix: Created explicit execFileAsync() wrapper that always resolves with { stdout, stderr }.
  • Files modified: server/src/services/voice-pipeline.ts
  • Verification: 12/12 tests pass
  • Committed in: 0ed912c2

2. [Rule 1 - Bug] Consolidated spawn/whisper-cpp args onto single lines

  • Found during: Task 1 (acceptance criteria verification)
  • Issue: Acceptance criteria grep checks for spawn(ffmpegPath and "--language", "auto" as single-line strings — multi-line formatting failed grep.
  • Fix: Put spawn call and whisper-cpp args array on single lines.
  • Files modified: server/src/services/voice-pipeline.ts
  • Verification: All grep acceptance criteria return 1
  • Committed in: 0ed912c2

3. [Rule 1 - Bug] Fixed code fence regex to preserve inline content

  • Found during: Task 1 (formatForVoice test failing — "code" not found in output)
  • Issue: Test input ```code``` (no newline) — regex treated "code" as language identifier with empty body, outputting nothing.
  • Fix: Updated regex: when lang matches but inner is empty, return lang as text content.
  • Files modified: server/src/services/voice-pipeline.ts
  • Verification: formatForVoice test passes
  • Committed in: 0ed912c2

Total deviations: 3 auto-fixed (all Rule 1 bugs) Impact on plan: All fixes were necessary for correctness. No scope creep.

Issues Encountered

  • vitest module mocking with vi.resetModules() + dynamic imports conflicted with top-level vi.mock for ffmpeg-static. Resolved by using static top-level imports (more reliable for consistent mock state).
  • vi.clearAllMocks() vs vi.resetModules() distinction: clear = clears calls/instances (safe), reset = clears implementations (breaks mocks). Used clearAllMocks only.

User Setup Required

None - no external service configuration required. ffmpeg-static bundles its own binary. Whisper and Piper binaries are runtime requirements (not build-time).

Known Stubs

None - no stubs. Service methods throw meaningful errors when binaries aren't installed (ENOENT for piper, explicit error message for whisper).

Next Phase Readiness

  • voicePipelineService is fully implemented and tested — Plans 02 and 03 can import it directly
  • Phase 38 Telegram bridge can import without additional setup
  • Whisper and Piper binaries must be installed at runtime on the Mac Mini M4

Phase: 36-voice-pipeline-foundation Completed: 2026-04-04