---
phase: 36-voice-pipeline-foundation
verified: 2026-04-03T01:45:00Z
status: gaps_found
score: 11/12 must-haves verified
re_verification: false
gaps:
  - truth: "TypeScript compilation passes with no errors"
    status: failed
    reason: "voice-pipeline.ts has 10 type errors from an ffmpeg-static import type mismatch: ffmpegPath resolves to the module namespace rather than the string default export, so spawn() receives the wrong type. SUMMARY acknowledged this but framed it as 'pre-existing', despite it originating in Plan 01."
    artifacts:
      - path: "server/src/services/voice-pipeline.ts"
        issue: "TS2769: spawn(ffmpegPath, ...) fails because @types/ffmpeg-static is not installed in the main repo's pnpm store (only in the worktree). ffmpegPath resolves to the module namespace type instead of string | null. The spawn stdout/stderr/stdin properties all become never due to overload resolution failure."
    missing:
      - "Install @types/ffmpeg-static in the main repo (pnpm add -D @types/ffmpeg-static in server/), or cast ffmpegPath to string after the null guard (ffmpegPath as string), or use a ts-ignore annotation with a comment explaining the type mismatch"
human_verification:
  - test: "Voice pipeline runtime with real whisper-cpp binary"
    expected: "POST /api/transcribe with a real WebM audio file transcribes speech and returns { text, language }"
    why_human: "whisper-cpp is a runtime dependency not present in CI; the audio-to-text pipeline cannot be verified without the binary installed on the Mac Mini M4"
  - test: "Voice pipeline runtime with real Piper binary"
    expected: "POST /api/synthesize with { text: 'Hello world' } returns a valid WAV file that plays audio"
    why_human: "piper binary is a runtime dependency not present in CI; TTS audio quality and WAV output validity cannot be verified without the binary"
  - test: "Dual-output voiceMode=full_voice in full AI stream"
    expected: "Stream response contains both SPOKEN: and DETAILED: sections; formatForVoice() extracts only the SPOKEN section for TTS delivery"
    why_human: "Requires a live Puter AI token and microphone input to test end-to-end dual output formatting in real conditions"
---

Phase 36: Voice Pipeline Foundation Verification Report

Phase Goal: The transport-agnostic voice pipeline is live and callable from any consumer (web chat, Telegram, or future integrations), with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start.
Verified: 2026-04-03T01:45:00Z
Status: gaps_found (1 automated gap, 3 human verification items)
Re-verification: No (initial verification)

Goal Achievement

Observable Truths

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | `transcribe()` accepts a Buffer and format string, returns `{ text, language? }` | VERIFIED | voice-pipeline.ts:75-124: full whisper-cpp cascade + fallback |
| 2 | `synthesize()` accepts text and optional voiceId, returns a WAV Buffer | VERIFIED | voice-pipeline.ts:126-166: sentence split + piper execFile concat |
| 3 | `transcodeToWav16k()` converts any input format to WAV 16kHz mono via ffmpeg-static | VERIFIED | voice-pipeline.ts:44: spawn with `-ar 16000 -ac 1 -f wav pipe:1` |
| 4 | `formatForVoice()` strips markdown and extracts SPOKEN section when present | VERIFIED | voice-pipeline.ts:168-212: SPOKEN: regex + markdown stripping |
| 5 | `formatForVoice()` falls back to markdown stripping when SPOKEN marker absent | VERIFIED | voice-pipeline.ts:178-211: fallback branch present |
| 6 | POST /api/transcribe accepts audio file upload and returns `{ text, language? }` | VERIFIED | voice.ts:16-31: multer `single("audio")` + `svc.transcribe()` |
| 7 | POST /api/synthesize accepts `{ text }` body and returns audio/wav buffer | VERIFIED | voice.ts:34-44: `svc.synthesize()` + Content-Type audio/wav |
| 8 | voiceMode from request body is injected as dual-output system prompt in stream endpoint | VERIFIED | chat.ts:145-156: `if (voiceMode === "full_voice")` inserts SPOKEN:/DETAILED: message |
| 9 | voiceMode is persisted to messageType column when message is saved | VERIFIED | chat.ts:187-189: `messageType: voiceMode === "full_voice" ? "voice_full" : voiceMode === "voice_input" ? "voice_input" : undefined` |
| 10 | Old /transcribe endpoint removed from chat-files.ts | VERIFIED | grep finds zero `router.post("/transcribe")` in chat-files.ts |
| 11 | Voice routes are mounted in app.ts | VERIFIED | app.ts:35 import + app.ts:167 `api.use(voiceRoutes())` |
| 12 | TypeScript compilation passes with no errors | FAILED | `tsc --noEmit` exits with code 2; all 10 errors are in voice-pipeline.ts (ffmpeg-static type mismatch) |

Score: 11/12 truths verified
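
Truths 4 and 5 describe a two-branch formatter: extract the SPOKEN section when the marker is present, otherwise strip markdown from the whole response. The following is a minimal sketch of that behavior, not the code in voice-pipeline.ts. The names `stripMarkdown` and `formatForVoiceSketch` and the specific regular expressions are assumptions made for illustration.

```typescript
// Hypothetical sketch; the real formatForVoice() may use different patterns.
function stripMarkdown(input: string): string {
  return input
    .replace(/`{3}[\s\S]*?`{3}/g, " ")         // drop fenced code blocks
    .replace(/`([^`]*)`/g, "$1")               // unwrap inline code
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")  // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")   // links -> link text
    .replace(/^\s{0,3}#{1,6}\s+/gm, "")        // heading markers
    .replace(/^\s*[-*+]\s+/gm, "")             // bullet markers
    .replace(/(\*\*|\*)(.+?)\1/g, "$2")        // bold / italic asterisks
    .replace(/\s+/g, " ")                      // collapse whitespace
    .trim();
}

function formatForVoiceSketch(response: string): string {
  // Capture everything after "SPOKEN:" up to "DETAILED:" or end of text.
  const match = /SPOKEN:\s*([\s\S]*?)(?=\s*DETAILED:|$)/.exec(response);
  // Fallback branch: no marker, so the whole response is cleaned.
  const spoken = match ? match[1] : response;
  return stripMarkdown(spoken);
}
```

Under these assumptions, a response containing both sections yields only the spoken prose, and a plain markdown response is flattened to a single line of text suitable for TTS.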

Required Artifacts

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `server/src/services/voice-pipeline.ts` | VoicePipelineService factory with transcribe, synthesize, formatForVoice, transcodeToWav16k | VERIFIED | 215 lines, exports `voicePipelineService()` factory |
| `server/src/__tests__/36-voice-pipeline.test.ts` | Unit tests (min 80 lines) | VERIFIED | 259 lines, 12 tests all passing |
| `packages/shared/src/validators/chat.ts` | voiceMode field on createMessageSchema | VERIFIED | `z.enum(VOICE_MODES).optional()` at line 23 |
| `packages/shared/src/types/chat.ts` | voiceMode on ChatMessage interface | VERIFIED | `voiceMode?: "text"` |
| `server/src/services/nexus-settings.ts` | voiceMode and telegramToken in settings schema | VERIFIED | voiceMode default "text", telegramToken optional, piperBinaryPath, whisperBinaryPath |
| `server/src/__tests__/36-voice-schema.test.ts` | Schema validation tests (min 40 lines) | VERIFIED | 92 lines, 11 tests all passing |
| `server/src/routes/voice.ts` | POST /api/transcribe and POST /api/synthesize | VERIFIED | 47 lines, both routes implemented with assertBoard auth |
| `server/src/__tests__/36-voice-routes.test.ts` | Integration tests for voice routes (min 60 lines) | VERIFIED | 103 lines, 5 tests all passing |

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| `server/src/services/voice-pipeline.ts` | ffmpeg-static | `import ffmpegPath from "ffmpeg-static"` | WIRED | Line 1: import present, null guard at line 28 |
| `server/src/services/voice-pipeline.ts` | node:child_process | execFile and spawn | WIRED | Line 2: both imported, spawn at line 44, execFileCb at line 135 |
| `server/src/routes/voice.ts` | `server/src/services/voice-pipeline.ts` | `voicePipelineService()` import | WIRED | Line 4: import + instantiation at line 9 |
| `server/src/routes/chat.ts` | `packages/shared/src/validators/chat.ts` | createMessageSchema preserves voiceMode | WIRED | createMessageSchema imported at line 14, parsed at line 85; voiceMode also destructured from req.body at line 93 for stream endpoint |
| `server/src/app.ts` | `server/src/routes/voice.ts` | `api.use(voiceRoutes())` | WIRED | Import at line 35, mount at line 167 |
| `server/src/services/nexus-settings.ts` | nexus-settings.json | Zod `.default()` handles missing voiceMode | WIRED | `voiceMode: z.enum(VOICE_MODES).default("text")` at line 15 |
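
The voiceMode wiring above can be illustrated without Zod. The sketch below hand-rolls the same two ideas: a missing value defaults to "text" (as `.default("text")` does), and the mode maps to the messageType value described in the evidence for chat.ts:187-189. The contents of `VOICE_MODES` are inferred from the values quoted in this report, and the function names are hypothetical.

```typescript
// Assumed enum contents, inferred from the modes quoted in this report.
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

// Hand-rolled stand-in for z.enum(VOICE_MODES).default("text"):
// undefined falls back to "text", anything else must be a known mode.
function parseVoiceMode(raw: unknown): VoiceMode {
  if (raw === undefined) return "text";
  const found = VOICE_MODES.find((mode) => mode === raw);
  if (found === undefined) {
    throw new Error(`Invalid voiceMode: ${String(raw)}`);
  }
  return found;
}

// Mirrors the ternary quoted for chat.ts:187-189.
function toMessageType(
  mode: VoiceMode | undefined,
): "voice_full" | "voice_input" | undefined {
  if (mode === "full_voice") return "voice_full";
  if (mode === "voice_input") return "voice_input";
  return undefined;
}
```

Keeping the mapping in one small function makes the "text means no messageType" case explicit, which is easy to miss in a nested ternary.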

Data-Flow Trace (Level 4)

This phase produces service and route infrastructure rather than UI components that render data to users. Data-flow trace is not applicable for API endpoints and service factories. The key data flows (Buffer through transcodeToWav16k, text through formatForVoice, voiceMode through stream endpoint) are verified by test coverage (28/28 tests passing).
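
For the Buffer-through-transcodeToWav16k flow, the ffmpeg argument list is the part that fixes the output format. A minimal sketch of building those arguments follows. The output flags (`-ar 16000 -ac 1 -f wav pipe:1`) come from this report; reading the input from stdin via `-i pipe:0` and the function name `buildTranscodeArgs` are assumptions.

```typescript
// Hypothetical helper; only the output flags are confirmed by the report.
function buildTranscodeArgs(): string[] {
  return [
    "-i", "pipe:0",  // assumed: audio bytes arrive on stdin
    "-ar", "16000",  // resample to 16 kHz
    "-ac", "1",      // downmix to mono
    "-f", "wav",     // force WAV container on output
    "pipe:1",        // write the result to stdout
  ];
}

// Usage idea (not executed here): pass the array as the second argument to
// child_process.spawn, write the input Buffer to stdin, and collect stdout.
```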

Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| All 12 voice-pipeline unit tests pass | `pnpm vitest run src/__tests__/36-voice-pipeline.test.ts` | 12/12 pass (27ms) | PASS |
| All 11 schema tests pass | `pnpm vitest run src/__tests__/36-voice-schema.test.ts` | 11/11 pass (6ms) | PASS |
| All 5 route integration tests pass | `pnpm vitest run src/__tests__/36-voice-routes.test.ts` | 5/5 pass (39ms) | PASS |
| TypeScript compilation | `tsc --noEmit` | 10 errors in voice-pipeline.ts (ffmpeg-static type) | FAIL |
| Old transcribe endpoint removed | `grep router.post("/transcribe" chat-files.ts` | 0 matches | PASS |
| voiceRoutes mounted in app.ts | `grep voiceRoutes app.ts` | import at line 35, use at line 167 | PASS |

Requirements Coverage

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| VPIPE-01 | 36-01 | Whisper STT with automatic language detection | SATISFIED | voice-pipeline.ts:90: `"--language", "auto"` flag; langMatch regex at line 96 |
| VPIPE-02 | 36-01 | Piper TTS synthesis | SATISFIED | voice-pipeline.ts:126-166: `synthesize()` with 8s timeout per sentence via Promise.race |
| VPIPE-03 | 36-03 | Transport-agnostic VoicePipelineService | SATISFIED | voice.ts:16-44: HTTP endpoints callable from any transport; voice-pipeline.ts factory reusable by Phase 38 Telegram bridge |
| VPIPE-04 | 36-01 | WAV 16kHz mono transcoding via ffmpeg | SATISFIED | voice-pipeline.ts:44: `-ar 16000 -ac 1` flags passed to ffmpeg |
| VPIPE-05 | 36-02 | voiceMode flag propagation through message pipeline | SATISFIED | validators/chat.ts voiceMode field; types/chat.ts interface; stream endpoint injects dual-output prompt |
| VPIPE-06 | 36-01, 36-03 | Dual output: spoken prose + full markdown | SATISFIED | `formatForVoice()` extracts SPOKEN section; chat.ts:145-156 injects SPOKEN:/DETAILED: system prompt for full_voice mode |

All 6 requirements in scope (VPIPE-01 through VPIPE-06) are accounted for. VPIPE-07 and VPIPE-08 are mapped to Phase 39 (pending) and are out of scope for this phase.
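
The VPIPE-02 evidence mentions a per-sentence timeout built on Promise.race. A minimal sketch of that pattern is shown here, assuming a caller-supplied `synthesizeSentence` function that stands in for the Piper invocation. All names are hypothetical, the sentence splitter is deliberately naive, and joining the resulting chunks into one WAV file is left out.

```typescript
// Naive splitter: a sentence is a run of text ending in ., !, or ?.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Race the work against a timer; the timer is always cleared afterwards.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// synthesizeSentence is an assumed stand-in for the real Piper call.
async function synthesizeSketch(
  text: string,
  synthesizeSentence: (sentence: string) => Promise<Uint8Array>,
  timeoutMs = 8000,
): Promise<Uint8Array[]> {
  const chunks: Uint8Array[] = [];
  for (const sentence of splitSentences(text)) {
    chunks.push(await withTimeout(synthesizeSentence(sentence), timeoutMs, "piper"));
  }
  return chunks;
}
```

Note that Promise.race only stops waiting; it does not terminate the underlying child process, so a real implementation would also need to kill the process when the timer fires.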

Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| `server/src/services/voice-pipeline.ts` | 44 | `spawn()` receives ffmpegPath, which TypeScript resolves as the module namespace type rather than string | WARNING | TypeScript compilation fails (10 errors); runtime behavior is unaffected because the module default is a string at runtime, so the type mismatch is a declaration issue, not a runtime bug |

No TODO/FIXME/placeholder comments found. No empty return stubs found. The catch(() => {}) at line 122 is legitimate cleanup (unlink temp file on error, intentionally silent).

Human Verification Required

1. Voice transcription with real Whisper binary

Test: On the Mac Mini M4, POST a real WebM audio recording (captured from the browser) to /api/transcribe with a board authentication token.
Expected: Response contains { text: "...", language: "en" } with the spoken words correctly transcribed.
Why human: The whisper-cpp binary is a runtime dependency not available in CI; unit tests mock child_process.

2. TTS synthesis with real Piper binary

Test: On the Mac Mini M4, POST { text: "Hello world. How are you today?" } to /api/synthesize.
Expected: Response has Content-Type: audio/wav, the audio file is valid WAV format, and it plays the sentence when opened.
Why human: The piper binary is a runtime dependency not available in CI; unit tests mock child_process.
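
Part of the "valid WAV format" check can be done programmatically before listening to the audio. The sketch below inspects the RIFF container header, which places the ASCII tag "RIFF" at byte 0 and "WAVE" at byte 8. The helper name `looksLikeWav` is hypothetical, and the check says nothing about sample rate or audio quality.

```typescript
// Returns true when the bytes start with a RIFF/WAVE container header.
function looksLikeWav(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  const tag = (offset: number): string => {
    let out = "";
    for (let i = 0; i < 4; i++) {
      out += String.fromCharCode(bytes[offset + i]);
    }
    return out;
  };
  return tag(0) === "RIFF" && tag(8) === "WAVE";
}
```

A tester could run this against the response body of /api/synthesize and then still open the file to confirm it plays.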

3. End-to-end dual-output voice interaction

Test: From the chat UI (Phase 37), send a message with voiceMode: "full_voice" in the stream request body.
Expected: AI response contains both SPOKEN: and DETAILED: sections; formatForVoice() extracts only the SPOKEN section for TTS; TTS audio plays the short prose summary.
Why human: Requires a live Puter AI token, a real browser, and audio hardware.

Gaps Summary

One automated gap blocks the Plan 03 acceptance criterion "TypeScript compilation passes with no errors":

server/src/services/voice-pipeline.ts has 10 TypeScript errors caused by an ffmpeg-static type declaration mismatch. The @types/ffmpeg-static devDependency was installed in the worktree's pnpm store but did not propagate to the main repo's pnpm store. As a result, TypeScript resolves ffmpegPath from the package's own types/index.d.ts (which exports typeof import(...)) instead of a simple string | null. This causes spawn(ffmpegPath, ...) to fail overload resolution, cascading to never types on stdout/stderr/stdin.

Runtime impact: none. At runtime, ffmpegPath is a string path to the ffmpeg binary, and the null guard at line 28 ensures it is non-null before use. All 28 tests pass. However, the TypeScript build is broken, which blocks CI and downstream phases that rely on tsc --noEmit as a gate.

Fix: Run cd server && pnpm add -D @types/ffmpeg-static in the main repo (not the worktree) to ensure the type declaration is available in the shared pnpm store.
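
If installing the type package is not possible right away, a runtime narrowing helper is one alternative to a bare cast. The sketch below is an assumption, not the code in voice-pipeline.ts: `requireBinaryPath` is a hypothetical name, and it accepts `unknown` so that it compiles regardless of how TypeScript resolves the import's type.

```typescript
// Narrows an arbitrary value to a non-empty string, or throws.
function requireBinaryPath(candidate: unknown, name: string): string {
  if (typeof candidate !== "string" || candidate.length === 0) {
    throw new Error(`${name} binary path is not available`);
  }
  return candidate;
}

// Usage idea (assumes ffmpegPath is the default import from "ffmpeg-static"):
//   const child = spawn(requireBinaryPath(ffmpegPath, "ffmpeg"), args);
```

Compared with `ffmpegPath as string`, this keeps the null guard and the type narrowing in one place, so the compiler and the runtime check cannot drift apart.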


Verified: 2026-04-03T01:45:00Z
Verifier: Claude (gsd-verifier)