| phase | verified | status | score | re_verification | gaps | human_verification |
|---|---|---|---|---|---|---|
| 36-voice-pipeline-foundation | 2026-04-03T01:45:00Z | gaps_found | 11/12 must-haves verified | false | | |
Phase 36: Voice Pipeline Foundation Verification Report
Phase Goal: The transport-agnostic voice pipeline is live and callable from any consumer (web chat, Telegram, or future integrations), with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start.
Verified: 2026-04-03T01:45:00Z
Status: gaps_found (1 automated gap, 3 human verification items)
Re-verification: No (initial verification)
Goal Achievement
Observable Truths
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | transcribe() accepts a Buffer and format string, returns { text, language? } | VERIFIED | voice-pipeline.ts:75-124 — full whisper-cpp cascade + fallback |
| 2 | synthesize() accepts text and optional voiceId, returns a WAV Buffer | VERIFIED | voice-pipeline.ts:126-166 — sentence split + piper execFile concat |
| 3 | transcodeToWav16k() converts any input format to WAV 16kHz mono via ffmpeg-static | VERIFIED | voice-pipeline.ts:44 — spawn with -ar 16000 -ac 1 -f wav pipe:1 |
| 4 | formatForVoice() strips markdown and extracts SPOKEN section when present | VERIFIED | voice-pipeline.ts:168-212 — SPOKEN: regex + markdown stripping |
| 5 | formatForVoice() falls back to markdown stripping when SPOKEN marker absent | VERIFIED | voice-pipeline.ts:178-211 — fallback branch present |
| 6 | POST /api/transcribe accepts audio file upload and returns { text, language? } | VERIFIED | voice.ts:16-31 — multer single("audio") + svc.transcribe() |
| 7 | POST /api/synthesize accepts { text } body and returns audio/wav buffer | VERIFIED | voice.ts:34-44 — svc.synthesize() + Content-Type audio/wav |
| 8 | voiceMode from request body is injected as dual-output system prompt in stream endpoint | VERIFIED | chat.ts:145-156 — if (voiceMode === "full_voice") inserts SPOKEN:/DETAILED: message |
| 9 | voiceMode is persisted to messageType column when message is saved | VERIFIED | chat.ts:187-189 — messageType: voiceMode === "full_voice" ? "voice_full" : voiceMode === "voice_input" ? "voice_input" : undefined |
| 10 | Old /transcribe endpoint removed from chat-files.ts | VERIFIED | grep finds zero router.post("/transcribe") in chat-files.ts |
| 11 | Voice routes are mounted in app.ts | VERIFIED | app.ts:35 import + app.ts:167 api.use(voiceRoutes()) |
| 12 | TypeScript compilation passes with no errors | FAILED | tsc --noEmit exits code 2 — 10 errors all in voice-pipeline.ts (ffmpeg-static type mismatch) |
Score: 11/12 truths verified
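Truths 4 and 5 describe a two-branch formatter: extract the SPOKEN section when it exists, otherwise strip markdown from the whole reply. The following is a minimal sketch of that behavior, not the code in voice-pipeline.ts. Only the SPOKEN: and DETAILED: marker names come from this report; the regular expressions and the stripMarkdown helper are illustrative assumptions.

```typescript
// Hypothetical sketch: the regexes are assumptions, not the verified implementation.
function stripMarkdown(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, " ")          // drop fenced code blocks entirely
    .replace(/`([^`]*)`/g, "$1")                // inline code -> plain text
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")  // links and images -> label text
    .replace(/^\s{0,3}#{1,6}\s+/gm, "")         // heading markers
    .replace(/^\s*[-*+]\s+/gm, "")              // bullet markers
    .replace(/(\*\*|__|\*|_)(.+?)\1/g, "$2")    // bold and italic wrappers
    .replace(/\s+/g, " ")                       // collapse whitespace for TTS
    .trim();
}

function formatForVoice(reply: string): string {
  // Prefer the SPOKEN section; it ends at a DETAILED: marker or at end of text.
  const spoken = reply.match(/SPOKEN:\s*([\s\S]*?)(?=\n\s*DETAILED:|$)/);
  // Fallback branch: no marker, so strip markdown from the whole reply.
  return stripMarkdown(spoken ? spoken[1] : reply);
}
```

With these assumed patterns, a reply containing both markers yields only the spoken prose, and a reply without markers is flattened to plain text. The emphasis pattern also removes underscores inside identifiers such as snake_case names, which a production formatter would likely need to guard against.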
Required Artifacts
| Artifact | Expected | Status | Details |
|---|---|---|---|
| server/src/services/voice-pipeline.ts | VoicePipelineService factory with transcribe, synthesize, formatForVoice, transcodeToWav16k | VERIFIED | 215 lines, exports voicePipelineService() factory |
| server/src/__tests__/36-voice-pipeline.test.ts | Unit tests (min 80 lines) | VERIFIED | 259 lines, 12 tests all passing |
| packages/shared/src/validators/chat.ts | voiceMode field on createMessageSchema | VERIFIED | z.enum(VOICE_MODES).optional() at line 23 |
| packages/shared/src/types/chat.ts | voiceMode on ChatMessage interface | VERIFIED | voiceMode?: "text" |
| server/src/services/nexus-settings.ts | voiceMode and telegramToken in settings schema | VERIFIED | voiceMode default "text", telegramToken optional, piperBinaryPath, whisperBinaryPath |
| server/src/__tests__/36-voice-schema.test.ts | Schema validation tests (min 40 lines) | VERIFIED | 92 lines, 11 tests all passing |
| server/src/routes/voice.ts | POST /api/transcribe and POST /api/synthesize | VERIFIED | 47 lines, both routes implemented with assertBoard auth |
| server/src/__tests__/36-voice-routes.test.ts | Integration tests for voice routes (min 60 lines) | VERIFIED | 103 lines, 5 tests all passing |
Key Link Verification
| From | To | Via | Status | Details |
|---|---|---|---|---|
| server/src/services/voice-pipeline.ts | ffmpeg-static | import ffmpegPath from "ffmpeg-static" | WIRED | Line 1 — import present, null guard at line 28 |
| server/src/services/voice-pipeline.ts | node:child_process | execFile and spawn | WIRED | Line 2 — both imported, spawn at line 44, execFileCb at line 135 |
| server/src/routes/voice.ts | server/src/services/voice-pipeline.ts | voicePipelineService() import | WIRED | Line 4 — import + instantiation at line 9 |
| server/src/routes/chat.ts | packages/shared/src/validators/chat.ts | createMessageSchema preserves voiceMode | WIRED | createMessageSchema imported at line 14, parsed at line 85; voiceMode also destructured from req.body at line 93 for stream endpoint |
| server/src/app.ts | server/src/routes/voice.ts | api.use(voiceRoutes()) | WIRED | Import at line 35, mount at line 167 |
| server/src/services/nexus-settings.ts | nexus-settings.json | Zod .default() handles missing voiceMode | WIRED | voiceMode: z.enum(VOICE_MODES).default("text") at line 15 |
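The first two links (null guard on the ffmpeg path, then spawn with the 16 kHz mono flags) can be sketched as follows. This is an illustration under stated assumptions, not the verified source: the function name wav16kArgs, the "-i pipe:0" stdin input, and passing the binary path as a parameter (instead of importing ffmpeg-static) are choices made here to keep the example self-contained. The -ar, -ac, -f wav and pipe:1 arguments are the ones quoted in this report.

```typescript
import { spawn } from "node:child_process";

// Flags quoted in the report: resample to 16 kHz, downmix to mono, write WAV to stdout.
// "-i pipe:0" (read the input from stdin) is an assumption of this sketch.
function wav16kArgs(): string[] {
  return ["-i", "pipe:0", "-ar", "16000", "-ac", "1", "-f", "wav", "pipe:1"];
}

function transcodeToWav16k(ffmpegBin: string | null, input: Buffer): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    // Null guard: a missing binary is reported before any process is spawned.
    if (!ffmpegBin) {
      reject(new Error("ffmpeg binary not available"));
      return;
    }
    const proc = spawn(ffmpegBin, wav16kArgs());
    const chunks: Buffer[] = [];
    proc.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    proc.stderr.resume(); // discard diagnostics so the stderr pipe cannot fill up
    proc.on("error", reject);
    proc.on("close", (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
    proc.stdin.on("error", () => {}); // ignore EPIPE if ffmpeg exits early
    proc.stdin.end(input);
  });
}
```

Typing the path parameter as string | null makes the guard visible to the compiler, so the spawn call only ever receives a string.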
Data-Flow Trace (Level 4)
This phase produces service and route infrastructure rather than UI components that render data to users. Data-flow trace is not applicable for API endpoints and service factories. The key data flows (Buffer through transcodeToWav16k, text through formatForVoice, voiceMode through stream endpoint) are verified by test coverage (28/28 tests passing).
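The voiceMode flow through the stream endpoint can be pictured with the sketch below. The messageTypeFor ternary mirrors the expression quoted from chat.ts:187-189. Everything else is an assumption for illustration: the three-value VoiceMode union, the PromptMessage shape, the prompt wording, and the decision to prepend the system message (the report only states that a SPOKEN:/DETAILED: message is inserted).

```typescript
// Assumed value set: "text" is the documented default; the other two appear in chat.ts.
type VoiceMode = "text" | "voice_input" | "full_voice";
type MessageType = "voice_full" | "voice_input" | undefined;

// Hypothetical message shape for this sketch.
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Mirrors the ternary quoted in the Observable Truths table (truth 9).
function messageTypeFor(voiceMode?: VoiceMode): MessageType {
  return voiceMode === "full_voice"
    ? "voice_full"
    : voiceMode === "voice_input"
      ? "voice_input"
      : undefined;
}

// Hypothetical wording; only the SPOKEN:/DETAILED: markers come from the report.
const DUAL_OUTPUT_PROMPT =
  "Answer in two sections. Start with SPOKEN: followed by a short plain-prose " +
  "summary suitable for text-to-speech, then DETAILED: followed by the full markdown answer.";

// Only full_voice requests get the dual-output system prompt (truth 8).
function withVoicePrompt(messages: PromptMessage[], voiceMode?: VoiceMode): PromptMessage[] {
  if (voiceMode !== "full_voice") return messages;
  return [{ role: "system", content: DUAL_OUTPUT_PROMPT }, ...messages];
}
```

Under these assumptions, a "text" or missing voiceMode leaves both the prompt list and the persisted messageType untouched, which matches the optional schema field.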
Behavioral Spot-Checks
| Behavior | Command | Result | Status |
|---|---|---|---|
| All 12 voice-pipeline unit tests pass | pnpm vitest run src/__tests__/36-voice-pipeline.test.ts | 12/12 pass (27ms) | PASS |
| All 11 schema tests pass | pnpm vitest run src/__tests__/36-voice-schema.test.ts | 11/11 pass (6ms) | PASS |
| All 5 route integration tests pass | pnpm vitest run src/__tests__/36-voice-routes.test.ts | 5/5 pass (39ms) | PASS |
| TypeScript compilation | tsc --noEmit | 10 errors in voice-pipeline.ts (ffmpeg-static type) | FAIL |
| Old transcribe endpoint removed | grep 'router.post("/transcribe"' chat-files.ts | 0 matches | PASS |
| voiceRoutes mounted in app.ts | grep voiceRoutes app.ts | import at line 35, use at line 167 | PASS |
Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| VPIPE-01 | 36-01 | Whisper STT with automatic language detection | SATISFIED | voice-pipeline.ts:90 -- "--language", "auto" flag; langMatch regex at line 96 |
| VPIPE-02 | 36-01 | Piper TTS synthesis | SATISFIED | voice-pipeline.ts:126-166 — synthesize() with 8s timeout per sentence via Promise.race |
| VPIPE-03 | 36-03 | Transport-agnostic VoicePipelineService | SATISFIED | voice.ts:16-44 — HTTP endpoints callable from any transport; voice-pipeline.ts factory reusable by Phase 38 Telegram bridge |
| VPIPE-04 | 36-01 | WAV 16kHz mono transcoding via ffmpeg | SATISFIED | voice-pipeline.ts:44 — -ar 16000 -ac 1 flags passed to ffmpeg |
| VPIPE-05 | 36-02 | voiceMode flag propagation through message pipeline | SATISFIED | validators/chat.ts voiceMode field; types/chat.ts interface; stream endpoint injects dual-output prompt |
| VPIPE-06 | 36-01, 36-03 | Dual output: spoken prose + full markdown | SATISFIED | formatForVoice() extracts SPOKEN section; chat.ts:145-156 injects SPOKEN:/DETAILED: system prompt for full_voice mode |
All 6 in-scope requirements (VPIPE-01 through VPIPE-06) are accounted for. VPIPE-07 and VPIPE-08 are mapped to Phase 39 (pending) and are out of scope for this phase.
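The VPIPE-02 evidence mentions sentence splitting and an 8 s per-sentence timeout enforced with Promise.race. A minimal sketch of that pattern follows. The splitting rule, the helper names, and the synthesizeOne callback (standing in for the per-sentence piper call) are hypothetical; joining the per-sentence audio into one WAV buffer is left out.

```typescript
// Split after sentence-ending punctuation (an assumed rule; it also splits "3.14").
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; the timer is cleared on both paths.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// `synthesizeOne` is a hypothetical stand-in for the per-sentence TTS call.
async function synthesizeAll(
  text: string,
  synthesizeOne: (sentence: string) => Promise<Uint8Array>,
  perSentenceMs = 8000,
): Promise<Uint8Array[]> {
  const parts: Uint8Array[] = [];
  for (const sentence of splitSentences(text)) {
    parts.push(await withTimeout(synthesizeOne(sentence), perSentenceMs));
  }
  return parts;
}
```

Promise.race only stops waiting; it does not terminate the underlying work. If the per-sentence call wraps a child process, that process would need to be killed separately when the timeout fires.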
Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
| server/src/services/voice-pipeline.ts | 44 | spawn() receives ffmpegPath which TypeScript resolves as module namespace type rather than string | WARNING | TypeScript compilation fails (10 errors); runtime behavior is unaffected because the module default is a string at runtime — the type mismatch is a declaration issue, not a runtime bug |
No TODO/FIXME/placeholder comments found. No empty return stubs found. The catch(() => {}) at line 122 is legitimate cleanup (unlink temp file on error, intentionally silent).
Human Verification Required
1. Voice transcription with real Whisper binary
Test: On the Mac Mini M4, POST a real WebM audio recording (captured from browser) to /api/transcribe with a board authentication token
Expected: Response contains { text: "...", language: "en" } with the spoken words correctly transcribed
Why human: whisper-cpp binary is a runtime dependency not available in CI — unit tests mock child_process
2. TTS synthesis with real Piper binary
Test: On the Mac Mini M4, POST { text: "Hello world. How are you today?" } to /api/synthesize
Expected: Response has Content-Type: audio/wav, the audio file is valid WAV format, and plays the sentence when opened
Why human: piper binary is a runtime dependency not available in CI — unit tests mock child_process
3. End-to-end dual-output voice interaction
Test: From chat UI (Phase 37), send a message with voiceMode: "full_voice" in the stream request body
Expected: AI response contains both SPOKEN: and DETAILED: sections; formatForVoice() extracts only the SPOKEN section for TTS; TTS audio plays the short prose summary
Why human: Requires live Puter AI token, real browser, and audio hardware
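For item 2, the "valid WAV format" expectation can be partly checked in code before listening to the file. The helper below is a hypothetical aid for the manual tester, not part of the verified code: it only confirms that the response bytes begin with a RIFF/WAVE header.

```typescript
// Read four bytes starting at `offset` as an ASCII tag.
function readTag(bytes: Uint8Array, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) tag += String.fromCharCode(bytes[offset + i]);
  return tag;
}

// True when the bytes start with a RIFF container holding WAVE data.
// This says nothing about sample rate, channel count, or audibility.
function looksLikeWav(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  return readTag(bytes, 0) === "RIFF" && readTag(bytes, 8) === "WAVE";
}
```

A tester could wrap the /api/synthesize response body in a Uint8Array and pass it to looksLikeWav; playback is still needed to confirm the audio matches the sentence.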
Gaps Summary
One automated gap blocks the Plan 03 acceptance criterion "TypeScript compilation passes with no errors":
server/src/services/voice-pipeline.ts has 10 TypeScript errors caused by an ffmpeg-static type declaration mismatch. The @types/ffmpeg-static devDependency was installed in the worktree's pnpm store but did not propagate to the main repo's pnpm store. As a result, TypeScript resolves ffmpegPath from the package's own types/index.d.ts (which exports typeof import(...)) instead of a simple string | null. This causes spawn(ffmpegPath, ...) to fail overload resolution, cascading to never types on stdout/stderr/stdin.
Runtime impact: none. The actual runtime value of ffmpegPath is a string path to the ffmpeg binary bundled with ffmpeg-static, and the null guard at line 28 ensures it is non-null before use. All 28 tests pass. However, the TypeScript build is broken, which blocks CI and downstream phases that rely on tsc --noEmit as a gate.
Fix: Run cd server && pnpm add -D @types/ffmpeg-static in the main repo (not the worktree) to ensure the type declaration is available in the shared pnpm store.
Verified: 2026-04-03T01:45:00Z
Verifier: Claude (gsd-verifier)