fix(36): resolve TypeScript errors in voice-pipeline.ts (ffmpegPath cast, callback types)

Nexus Dev 2026-04-04 01:49:08 +00:00
parent d4db7ffffc
commit 9813903270
2 changed files with 152 additions and 2 deletions


@@ -0,0 +1,149 @@
---
phase: 36-voice-pipeline-foundation
verified: 2026-04-03T01:45:00Z
status: gaps_found
score: 11/12 must-haves verified
re_verification: false
gaps:
- truth: "TypeScript compilation passes with no errors"
status: failed
reason: "voice-pipeline.ts has 10 type errors from ffmpeg-static import type mismatch — ffmpegPath resolves to the module namespace rather than the string default export. spawn() receives wrong type. SUMMARY acknowledged this but framed it as 'pre-existing', despite it originating in Plan 01."
artifacts:
- path: "server/src/services/voice-pipeline.ts"
issue: "TS2769 — spawn(ffmpegPath, ...) fails because @types/ffmpeg-static is not installed in the main repo's pnpm store (only in the worktree). ffmpegPath resolves to module namespace type instead of string | null. spawn stdout/stderr/stdin properties all become never due to overload resolution failure."
missing:
- "Install @types/ffmpeg-static in the main repo (pnpm add -D @types/ffmpeg-static in server/), or cast ffmpegPath to string after the null guard (ffmpegPath as string), or use a ts-ignore annotation with a comment explaining the type mismatch"
human_verification:
- test: "Voice pipeline runtime with real whisper-cpp binary"
expected: "POST /api/transcribe with a real WebM audio file transcribes speech and returns { text, language }"
why_human: "whisper-cpp is a runtime dependency not present in CI — cannot verify audio-to-text pipeline without the binary installed on the Mac Mini M4"
- test: "Voice pipeline runtime with real Piper binary"
expected: "POST /api/synthesize with { text: 'Hello world' } returns a valid WAV file that plays audio"
why_human: "piper binary is a runtime dependency not present in CI — cannot verify TTS audio quality or WAV output validity without the binary"
- test: "Dual-output voiceMode=full_voice in full AI stream"
expected: "Stream response contains both SPOKEN: and DETAILED: sections; formatForVoice() extracts only the SPOKEN section for TTS delivery"
why_human: "Requires a live Puter AI token and microphone input to test end-to-end dual output formatting in real conditions"
---
# Phase 36: Voice Pipeline Foundation Verification Report

**Phase Goal:** The transport-agnostic voice pipeline is live and callable from any consumer — web chat, Telegram, or future integrations — with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start

**Verified:** 2026-04-03T01:45:00Z

**Status:** gaps_found (1 automated gap, 3 human verification items)

**Re-verification:** No — initial verification

## Goal Achievement

### Observable Truths

| # | Truth | Status | Evidence |
|---|-------|--------|---------|
| 1 | transcribe() accepts a Buffer and format string, returns { text, language? } | VERIFIED | voice-pipeline.ts:75-124 — full whisper-cpp cascade + fallback |
| 2 | synthesize() accepts text and optional voiceId, returns a WAV Buffer | VERIFIED | voice-pipeline.ts:126-166 — sentence split + piper execFile concat |
| 3 | transcodeToWav16k() converts any input format to WAV 16kHz mono via ffmpeg-static | VERIFIED | voice-pipeline.ts:44 — spawn with -ar 16000 -ac 1 -f wav pipe:1 |
| 4 | formatForVoice() strips markdown and extracts SPOKEN section when present | VERIFIED | voice-pipeline.ts:168-212 — SPOKEN: regex + markdown stripping |
| 5 | formatForVoice() falls back to markdown stripping when SPOKEN marker absent | VERIFIED | voice-pipeline.ts:178-211 — fallback branch present |
| 6 | POST /api/transcribe accepts audio file upload and returns { text, language? } | VERIFIED | voice.ts:16-31 — multer single("audio") + svc.transcribe() |
| 7 | POST /api/synthesize accepts { text } body and returns audio/wav buffer | VERIFIED | voice.ts:34-44 — svc.synthesize() + Content-Type audio/wav |
| 8 | voiceMode from request body is injected as dual-output system prompt in stream endpoint | VERIFIED | chat.ts:145-156 — if (voiceMode === "full_voice") inserts SPOKEN:/DETAILED: message |
| 9 | voiceMode is persisted to messageType column when message is saved | VERIFIED | chat.ts:187-189 — messageType: voiceMode === "full_voice" ? "voice_full" : voiceMode === "voice_input" ? "voice_input" : undefined |
| 10 | Old /transcribe endpoint removed from chat-files.ts | VERIFIED | grep finds zero router.post("/transcribe") in chat-files.ts |
| 11 | Voice routes are mounted in app.ts | VERIFIED | app.ts:35 import + app.ts:167 api.use(voiceRoutes()) |
| 12 | TypeScript compilation passes with no errors | FAILED | tsc --noEmit exits code 2 — 10 errors all in voice-pipeline.ts (ffmpeg-static type mismatch) |

**Score:** 11/12 truths verified
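
Truths 4 and 5 describe a two-branch behavior: extract the SPOKEN section when the marker is present, otherwise strip markdown from the whole response. The following is a minimal sketch of that behavior, not the code in voice-pipeline.ts. The function name `formatForVoiceSketch`, the specific regular expressions, and the assumption that `DETAILED:` starts on its own line are all illustrative.

```typescript
// Hypothetical sketch of the SPOKEN-extraction and markdown-stripping logic.
// The regexes below are assumptions, not the ones used in voice-pipeline.ts.
function formatForVoiceSketch(response: string): string {
  // Capture text after "SPOKEN:" up to a line starting with "DETAILED:" or end of input.
  const spoken = response.match(/SPOKEN:\s*([\s\S]*?)(?=\n\s*DETAILED:|$)/);
  // Fall back to the whole response when the marker is absent.
  const source = spoken ? spoken[1] : response;

  return source
    .replace(/`{3}[\s\S]*?`{3}/g, " ")        // drop fenced code blocks entirely
    .replace(/`([^`]*)`/g, "$1")              // unwrap inline code
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")  // keep link labels, drop URLs
    .replace(/^#{1,6}\s+/gm, "")              // remove heading markers
    .replace(/^\s*[-+*]\s+/gm, "")            // remove bullet markers
    .replace(/\*\*|__|\*/g, "")               // remove emphasis markers
    .replace(/\s+/g, " ")                     // collapse whitespace for TTS
    .trim();
}
```

Single underscores are left alone in this sketch so that identifiers such as `voice_input` survive; a production version may want different rules for what a TTS engine should read aloud.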
### Required Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `server/src/services/voice-pipeline.ts` | VoicePipelineService factory with transcribe, synthesize, formatForVoice, transcodeToWav16k | VERIFIED | 215 lines, exports voicePipelineService() factory |
| `server/src/__tests__/36-voice-pipeline.test.ts` | Unit tests (min 80 lines) | VERIFIED | 259 lines, 12 tests all passing |
| `packages/shared/src/validators/chat.ts` | voiceMode field on createMessageSchema | VERIFIED | z.enum(VOICE_MODES).optional() at line 23 |
| `packages/shared/src/types/chat.ts` | voiceMode on ChatMessage interface | VERIFIED | `voiceMode?: "text" \| "voice_input" \| "full_voice" \| null` at line 70 |
| `server/src/services/nexus-settings.ts` | voiceMode and telegramToken in settings schema | VERIFIED | voiceMode default "text", telegramToken optional, piperBinaryPath, whisperBinaryPath |
| `server/src/__tests__/36-voice-schema.test.ts` | Schema validation tests (min 40 lines) | VERIFIED | 92 lines, 11 tests all passing |
| `server/src/routes/voice.ts` | POST /api/transcribe and POST /api/synthesize | VERIFIED | 47 lines, both routes implemented with assertBoard auth |
| `server/src/__tests__/36-voice-routes.test.ts` | Integration tests for voice routes (min 60 lines) | VERIFIED | 103 lines, 5 tests all passing |
### Key Link Verification
| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| server/src/services/voice-pipeline.ts | ffmpeg-static | import ffmpegPath from "ffmpeg-static" | WIRED | Line 1 — import present, null guard at line 28 |
| server/src/services/voice-pipeline.ts | node:child_process | execFile and spawn | WIRED | Line 2 — both imported, spawn at line 44, execFileCb at line 135 |
| server/src/routes/voice.ts | server/src/services/voice-pipeline.ts | voicePipelineService() import | WIRED | Line 4 — import + instantiation at line 9 |
| server/src/routes/chat.ts | packages/shared/src/validators/chat.ts | createMessageSchema preserves voiceMode | WIRED | createMessageSchema imported at line 14, parsed at line 85; voiceMode also destructured from req.body at line 93 for stream endpoint |
| server/src/app.ts | server/src/routes/voice.ts | api.use(voiceRoutes()) | WIRED | Import at line 35, mount at line 167 |
| server/src/services/nexus-settings.ts | nexus-settings.json | Zod .default() handles missing voiceMode | WIRED | voiceMode: z.enum(VOICE_MODES).default("text") at line 15 |
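
The spawn link above hands ffmpeg a fixed argument list. As a reading aid, the sketch below builds the same list that appears in the diff for voice-pipeline.ts and annotates each flag. The helper name `buildTranscodeArgs` is hypothetical; the service passes the array inline.

```typescript
// Hypothetical helper: the argument order mirrors the spawn() call in the diff.
function buildTranscodeArgs(inputFormat: string): string[] {
  return [
    "-f", inputFormat, // declare the container format of the piped input
    "-i", "pipe:0",    // read input from stdin
    "-ar", "16000",    // resample audio to 16 kHz
    "-ac", "1",        // downmix to a single (mono) channel
    "-f", "wav",       // write WAV output
    "pipe:1",          // send output to stdout
  ];
}
```

Declaring the input format explicitly matters here because ffmpeg cannot infer it from a file extension when the data arrives on stdin.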
### Data-Flow Trace (Level 4)
This phase produces service and route infrastructure rather than UI components that render data to users. Data-flow trace is not applicable for API endpoints and service factories. The key data flows (Buffer through transcodeToWav16k, text through formatForVoice, voiceMode through stream endpoint) are verified by test coverage (28/28 tests passing).
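
The voiceMode flow through the stream endpoint can be sketched as two small pure functions. The mapping to `messageType` follows the ternary quoted in truth 9. The prompt wording, the `ChatTurn` shape, and the choice to prepend the system message are assumptions made for illustration only.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
type MessageType = "voice_full" | "voice_input" | undefined;

// Mirrors the mapping quoted from chat.ts:187-189.
function toMessageType(voiceMode?: VoiceMode | null): MessageType {
  if (voiceMode === "full_voice") return "voice_full";
  if (voiceMode === "voice_input") return "voice_input";
  return undefined;
}

// Hypothetical message shape and prompt text; the real ones live in chat.ts.
interface ChatTurn {
  role: "system" | "user" | "assistant";
  content: string;
}

const DUAL_OUTPUT_PROMPT =
  "Reply in two sections. SPOKEN: a short plain-prose summary. DETAILED: the full markdown answer.";

// Adds the dual-output instruction only for full_voice requests.
// Prepending is an assumption; the report only says the message is inserted.
function withVoicePrompt(turns: ChatTurn[], voiceMode?: VoiceMode | null): ChatTurn[] {
  if (voiceMode !== "full_voice") return turns;
  return [{ role: "system", content: DUAL_OUTPUT_PROMPT }, ...turns];
}
```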
### Behavioral Spot-Checks
| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| All 12 voice-pipeline unit tests pass | pnpm vitest run src/__tests__/36-voice-pipeline.test.ts | 12/12 pass (27ms) | PASS |
| All 11 schema tests pass | pnpm vitest run src/__tests__/36-voice-schema.test.ts | 11/11 pass (6ms) | PASS |
| All 5 route integration tests pass | pnpm vitest run src/__tests__/36-voice-routes.test.ts | 5/5 pass (39ms) | PASS |
| TypeScript compilation | tsc --noEmit | 10 errors in voice-pipeline.ts (ffmpeg-static type) | FAIL |
| Old transcribe endpoint removed | grep 'router.post("/transcribe"' chat-files.ts | 0 matches | PASS |
| voiceRoutes mounted in app.ts | grep voiceRoutes app.ts | import at line 35, use at line 167 | PASS |
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|------------|-------------|--------|---------|
| VPIPE-01 | 36-01 | Whisper STT with automatic language detection | SATISFIED | voice-pipeline.ts:90 -- `"--language", "auto"` flag; langMatch regex at line 96 |
| VPIPE-02 | 36-01 | Piper TTS synthesis | SATISFIED | voice-pipeline.ts:126-166 — synthesize() with 8s timeout per sentence via Promise.race |
| VPIPE-03 | 36-03 | Transport-agnostic VoicePipelineService | SATISFIED | voice.ts:16-44 — HTTP endpoints callable from any transport; voice-pipeline.ts factory reusable by Phase 38 Telegram bridge |
| VPIPE-04 | 36-01 | WAV 16kHz mono transcoding via ffmpeg | SATISFIED | voice-pipeline.ts:44 — -ar 16000 -ac 1 flags passed to ffmpeg |
| VPIPE-05 | 36-02 | voiceMode flag propagation through message pipeline | SATISFIED | validators/chat.ts voiceMode field; types/chat.ts interface; stream endpoint injects dual-output prompt |
| VPIPE-06 | 36-01, 36-03 | Dual output: spoken prose + full markdown | SATISFIED | formatForVoice() extracts SPOKEN section; chat.ts:145-156 injects SPOKEN:/DETAILED: system prompt for full_voice mode |

All six requirements in scope for this phase (VPIPE-01 through VPIPE-06) are accounted for, each with status SATISFIED. VPIPE-07 and VPIPE-08 are mapped to Phase 39 (pending) and are out of scope here.
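
VPIPE-02 cites an 8s per-sentence timeout built on Promise.race, and the diff shows a `withTimeout<T>(promise, ms)` helper with that signature. The body below is one common way to write such a helper and is offered as a sketch only; the actual body in voice-pipeline.ts is not shown in this commit and may differ.

```typescript
// Sketch of a Promise.race timeout wrapper. Only the signature comes from the diff.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // A promise that never resolves and rejects once the deadline passes.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });

  // Whichever settles first wins; the timer is always cleared afterwards
  // so a fast result does not keep the event loop alive.
  return Promise.race([promise, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Note that losing the race does not cancel the underlying work: a child process started for the slow sentence keeps running unless it is killed separately.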
### Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| server/src/services/voice-pipeline.ts | 44 | spawn() receives ffmpegPath which TypeScript resolves as module namespace type rather than string | WARNING | TypeScript compilation fails (10 errors); runtime behavior is unaffected because the module default is a string at runtime — the type mismatch is a declaration issue, not a runtime bug |

No TODO/FIXME/placeholder comments found. No empty return stubs found. The `catch(() => {})` at line 122 is legitimate cleanup (unlink temp file on error, intentionally silent).
### Human Verification Required
**1. Voice transcription with real Whisper binary**

**Test:** On the Mac Mini M4, POST a real WebM audio recording (captured from browser) to `/api/transcribe` with a board authentication token

**Expected:** Response contains `{ text: "...", language: "en" }` with the spoken words correctly transcribed

**Why human:** whisper-cpp binary is a runtime dependency not available in CI — unit tests mock child_process

**2. TTS synthesis with real Piper binary**

**Test:** On the Mac Mini M4, POST `{ text: "Hello world. How are you today?" }` to `/api/synthesize`

**Expected:** Response has `Content-Type: audio/wav`, the audio file is valid WAV format, and plays the sentence when opened

**Why human:** piper binary is a runtime dependency not available in CI — unit tests mock child_process

**3. End-to-end dual-output voice interaction**

**Test:** From chat UI (Phase 37), send a message with `voiceMode: "full_voice"` in the stream request body

**Expected:** AI response contains both `SPOKEN:` and `DETAILED:` sections; `formatForVoice()` extracts only the SPOKEN section for TTS; TTS audio plays the short prose summary

**Why human:** Requires live Puter AI token, real browser, and audio hardware
### Gaps Summary
One automated gap blocks the Plan 03 acceptance criterion "TypeScript compilation passes with no errors":

`server/src/services/voice-pipeline.ts` has 10 TypeScript errors caused by an ffmpeg-static type declaration mismatch. The `@types/ffmpeg-static` devDependency was installed in the worktree's pnpm store but did not propagate to the main repo's pnpm store. As a result, TypeScript resolves `ffmpegPath` from the package's own `types/index.d.ts` (which exports `typeof import(...)`) instead of a simple `string | null`. This causes `spawn(ffmpegPath, ...)` to fail overload resolution, cascading to `never` types on stdout/stderr/stdin.

**Runtime impact: none.** The actual runtime value of `ffmpegPath` is a string path to the bundled binary (e.g. a file inside `node_modules/ffmpeg-static/`), and the null guard at line 28 ensures it is non-null before use. All 28 tests pass. However, the TypeScript build is broken, which blocks CI and downstream phases that rely on `tsc --noEmit` as a gate.

**Fix:** Run `cd server && pnpm add -D @types/ffmpeg-static` in the main repo (not the worktree) to ensure the type declaration is available in the shared pnpm store.
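
If a code-level workaround is preferred over the dependency fix, the cast option listed in the frontmatter can also be written as a runtime check. The sketch below narrows an `unknown` value to `string` with `typeof`, so the compiler accepts it regardless of which declaration file it picked up. The helper name `requireBinaryPath` is hypothetical and does not exist in the repo.

```typescript
// Hypothetical helper: combines the null guard and the type narrowing in one place.
function requireBinaryPath(candidate: unknown, name: string): string {
  if (typeof candidate !== "string" || candidate.length === 0) {
    throw new Error(`${name} binary not found on this platform`);
  }
  // TypeScript has narrowed `candidate` to string here, so no cast is needed.
  return candidate;
}

// Assumed usage inside the service factory:
//   const ffmpegBin = requireBinaryPath(ffmpegPath, "ffmpeg-static");
```

Compared with `as unknown as string`, this keeps a real runtime check behind the type assertion, at the cost of one extra function.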
---
_Verified: 2026-04-03T01:45:00Z_
_Verifier: Claude (gsd-verifier)_


@@ -28,6 +28,7 @@ export function voicePipelineService() {
 if (!ffmpegPath) {
 throw new Error("ffmpeg-static binary not found on this platform");
 }
+const ffmpegBin = ffmpegPath as unknown as string;
 function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
@@ -41,7 +42,7 @@ export function voicePipelineService() {
 async function transcodeToWav16k(inputBuffer: Buffer, inputFormat: string): Promise<Buffer> {
 return new Promise<Buffer>((resolve, reject) => {
-const ffmpeg = spawn(ffmpegPath, ["-f", inputFormat, "-i", "pipe:0", "-ar", "16000", "-ac", "1", "-f", "wav", "pipe:1"], {
+const ffmpeg = spawn(ffmpegBin, ["-f", inputFormat, "-i", "pipe:0", "-ar", "16000", "-ac", "1", "-f", "wav", "pipe:1"], {
 stdio: ["pipe", "pipe", "pipe"],
 });
@@ -141,7 +142,7 @@ export function voicePipelineService() {
 // @ts-ignore - input option is valid for execFile
 input: sentence,
 },
-(err, stdout) => {
+(err: Error | null, stdout: string | Buffer) => {
 if (err) {
 reject(err);
 } else {