mikkel/nexus

Nexus Dev b204d6318e fix(36): resolve TypeScript errors in voice-pipeline.ts (ffmpegPath cast, callback types)

2026-04-04 03:55:50 +00:00

phase: 36-voice-pipeline-foundation
verified: 2026-04-03T01:45:00Z
status: gaps_found
score: 11/12 must-haves verified
re_verification: false

gaps:

truth	status	reason
TypeScript compilation passes with no errors	failed	voice-pipeline.ts has 10 type errors from an ffmpeg-static import type mismatch: ffmpegPath resolves to the module namespace rather than the string default export, so spawn() receives the wrong type. SUMMARY acknowledged this but framed it as 'pre-existing', despite it originating in Plan 01.

artifacts:

path	issue
server/src/services/voice-pipeline.ts	TS2769: spawn(ffmpegPath, ...) fails because @types/ffmpeg-static is not installed in the main repo's pnpm store (only in the worktree). ffmpegPath resolves to the module namespace type instead of string | null. The spawn stdout/stderr/stdin properties all become never due to the overload resolution failure.

missing: Install @types/ffmpeg-static in the main repo (pnpm add -D @types/ffmpeg-static in server/), or cast ffmpegPath to string after the null guard (ffmpegPath as string), or use a ts-ignore annotation with a comment explaining the type mismatch.

human_verification:

test	expected	why_human
Voice pipeline runtime with real whisper-cpp binary	POST /api/transcribe with a real WebM audio file transcribes speech and returns { text, language }	whisper-cpp is a runtime dependency not present in CI, so the audio-to-text pipeline cannot be verified without the binary installed on the Mac Mini M4
Voice pipeline runtime with real Piper binary	POST /api/synthesize with { text: 'Hello world' } returns a valid WAV file that plays audio	piper binary is a runtime dependency not present in CI, so TTS audio quality and WAV output validity cannot be verified without the binary
Dual-output voiceMode=full_voice in full AI stream	Stream response contains both SPOKEN: and DETAILED: sections; formatForVoice() extracts only the SPOKEN section for TTS delivery	Requires a live Puter AI token and microphone input to test end-to-end dual output formatting in real conditions

Phase 36: Voice Pipeline Foundation Verification Report

Phase Goal: The transport-agnostic voice pipeline is live and callable from any consumer (web chat, Telegram, or future integrations) with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start

Verified: 2026-04-03T01:45:00Z

Status: gaps_found (1 automated gap, 3 human verification items)

Re-verification: No (initial verification)

Goal Achievement

Observable Truths

#	Truth	Status	Evidence
1	transcribe() accepts a Buffer and format string, returns { text, language? }	VERIFIED	voice-pipeline.ts:75-124 — full whisper-cpp cascade + fallback
2	synthesize() accepts text and optional voiceId, returns a WAV Buffer	VERIFIED	voice-pipeline.ts:126-166 — sentence split + piper execFile concat
3	transcodeToWav16k() converts any input format to WAV 16kHz mono via ffmpeg-static	VERIFIED	voice-pipeline.ts:44 — spawn with -ar 16000 -ac 1 -f wav pipe:1
4	formatForVoice() strips markdown and extracts SPOKEN section when present	VERIFIED	voice-pipeline.ts:168-212 — SPOKEN: regex + markdown stripping
5	formatForVoice() falls back to markdown stripping when SPOKEN marker absent	VERIFIED	voice-pipeline.ts:178-211 — fallback branch present
6	POST /api/transcribe accepts audio file upload and returns { text, language? }	VERIFIED	voice.ts:16-31 — multer single("audio") + svc.transcribe()
7	POST /api/synthesize accepts { text } body and returns audio/wav buffer	VERIFIED	voice.ts:34-44 — svc.synthesize() + Content-Type audio/wav
8	voiceMode from request body is injected as dual-output system prompt in stream endpoint	VERIFIED	chat.ts:145-156 — if (voiceMode === "full_voice") inserts SPOKEN:/DETAILED: message
9	voiceMode is persisted to messageType column when message is saved	VERIFIED	chat.ts:187-189 — messageType: voiceMode === "full_voice" ? "voice_full" : voiceMode === "voice_input" ? "voice_input" : undefined
10	Old /transcribe endpoint removed from chat-files.ts	VERIFIED	grep finds zero router.post("/transcribe") in chat-files.ts
11	Voice routes are mounted in app.ts	VERIFIED	app.ts:35 import + app.ts:167 api.use(voiceRoutes())
12	TypeScript compilation passes with no errors	FAILED	tsc --noEmit exits code 2 — 10 errors all in voice-pipeline.ts (ffmpeg-static type mismatch)

Score: 11/12 truths verified
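
Truths 4 and 5 describe two branches of formatForVoice(): extract the SPOKEN section when the marker is present, otherwise strip markdown from the whole response. A minimal sketch of that behavior follows. It is not the code at voice-pipeline.ts:168-212; the regular expressions and the stripMarkdown helper are assumptions made for illustration.

```typescript
// Hypothetical helper: reduces common markdown syntax to plain prose.
// The exact rules used by the real service are not shown in this report.
function stripMarkdown(input: string): string {
  return input
    .replace(/```[\s\S]*?```/g, " ")            // drop fenced code blocks
    .replace(/`([^`]*)`/g, "$1")                // unwrap inline code
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")  // keep link or image text only
    .replace(/^\s{0,3}#{1,6}\s+/gm, "")         // remove heading markers
    .replace(/^\s*(?:[-*+]|\d+\.)\s+/gm, "")    // remove list bullets
    .replace(/(\*\*|__|\*|_|~~)/g, "")          // remove emphasis markers
    .replace(/\s+/g, " ")                       // collapse whitespace
    .trim();
}

// Sketch of the two branches: SPOKEN extraction, then markdown stripping.
function formatForVoice(response: string): string {
  // Capture the text between "SPOKEN:" and "DETAILED:" (or end of input).
  const match = /SPOKEN:\s*([\s\S]*?)(?=\s*DETAILED:|$)/.exec(response);
  const spoken = match ? match[1] : response; // fallback: whole response
  return stripMarkdown(spoken);
}
```

Running the stripping step on both branches means a SPOKEN section that still contains stray emphasis or links is cleaned before it reaches TTS.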

Required Artifacts

Artifact	Expected	Status	Details
`server/src/services/voice-pipeline.ts`	VoicePipelineService factory with transcribe, synthesize, formatForVoice, transcodeToWav16k	VERIFIED	215 lines, exports voicePipelineService() factory
`server/src/__tests__/36-voice-pipeline.test.ts`	Unit tests (min 80 lines)	VERIFIED	259 lines, 12 tests all passing
`packages/shared/src/validators/chat.ts`	voiceMode field on createMessageSchema	VERIFIED	z.enum(VOICE_MODES).optional() at line 23
`packages/shared/src/types/chat.ts`	voiceMode on ChatMessage interface	VERIFIED	voiceMode?: "text"
`server/src/services/nexus-settings.ts`	voiceMode and telegramToken in settings schema	VERIFIED	voiceMode default "text", telegramToken optional, piperBinaryPath, whisperBinaryPath
`server/src/__tests__/36-voice-schema.test.ts`	Schema validation tests (min 40 lines)	VERIFIED	92 lines, 11 tests all passing
`server/src/routes/voice.ts`	POST /api/transcribe and POST /api/synthesize	VERIFIED	47 lines, both routes implemented with assertBoard auth
`server/src/__tests__/36-voice-routes.test.ts`	Integration tests for voice routes (min 60 lines)	VERIFIED	103 lines, 5 tests all passing

Key Link Verification

From	To	Via	Status	Details
server/src/services/voice-pipeline.ts	ffmpeg-static	import ffmpegPath from "ffmpeg-static"	WIRED	Line 1 — import present, null guard at line 28
server/src/services/voice-pipeline.ts	node:child_process	execFile and spawn	WIRED	Line 2 — both imported, spawn at line 44, execFileCb at line 135
server/src/routes/voice.ts	server/src/services/voice-pipeline.ts	voicePipelineService() import	WIRED	Line 4 — import + instantiation at line 9
server/src/routes/chat.ts	packages/shared/src/validators/chat.ts	createMessageSchema preserves voiceMode	WIRED	createMessageSchema imported at line 14, parsed at line 85; voiceMode also destructured from req.body at line 93 for stream endpoint
server/src/app.ts	server/src/routes/voice.ts	api.use(voiceRoutes())	WIRED	Import at line 35, mount at line 167
server/src/services/nexus-settings.ts	nexus-settings.json	Zod .default() handles missing voiceMode	WIRED	voiceMode: z.enum(VOICE_MODES).default("text") at line 15

Data-Flow Trace (Level 4)

This phase produces service and route infrastructure rather than UI components that render data to users. Data-flow trace is not applicable for API endpoints and service factories. The key data flows (Buffer through transcodeToWav16k, text through formatForVoice, voiceMode through stream endpoint) are verified by test coverage (28/28 tests passing).
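
The voiceMode flow through the stream endpoint can be sketched as two pure functions. The messageType mapping mirrors the expression quoted for chat.ts:187-189. The VOICE_MODES member list, the ChatTurn type, the withVoicePrompt name, and the prompt wording are assumptions for illustration, not the project's code.

```typescript
// Assumed member list, inferred from the values this report mentions.
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

// Hypothetical message shape for the AI stream request.
type ChatTurn = { role: "system" | "user"; content: string };

// Same mapping as the ternary reported at chat.ts:187-189.
function messageTypeFor(voiceMode?: VoiceMode): "voice_full" | "voice_input" | undefined {
  if (voiceMode === "full_voice") return "voice_full";
  if (voiceMode === "voice_input") return "voice_input";
  return undefined;
}

// Hypothetical wording; the real prompt text is not shown in this report.
const DUAL_OUTPUT_PROMPT =
  "Answer in two sections. Start with SPOKEN: followed by a short plain-prose " +
  "summary, then DETAILED: followed by the full markdown answer.";

// Prepends the dual-output system prompt only for full_voice requests.
function withVoicePrompt(turns: ChatTurn[], voiceMode?: VoiceMode): ChatTurn[] {
  if (voiceMode !== "full_voice") return turns;
  return [{ role: "system", content: DUAL_OUTPUT_PROMPT }, ...turns];
}
```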

Behavioral Spot-Checks

Behavior	Command	Result	Status
All 12 voice-pipeline unit tests pass	pnpm vitest run src/__tests__/36-voice-pipeline.test.ts	12/12 pass (27ms)	PASS
All 11 schema tests pass	pnpm vitest run src/__tests__/36-voice-schema.test.ts	11/11 pass (6ms)	PASS
All 5 route integration tests pass	pnpm vitest run src/__tests__/36-voice-routes.test.ts	5/5 pass (39ms)	PASS
TypeScript compilation	tsc --noEmit	10 errors in voice-pipeline.ts (ffmpeg-static type)	FAIL
Old transcribe endpoint removed	grep router.post("/transcribe" chat-files.ts	0 matches	PASS
voiceRoutes mounted in app.ts	grep voiceRoutes app.ts	import at line 35, use at line 167	PASS

Requirements Coverage

Requirement	Source Plan	Description	Status	Evidence
VPIPE-01	36-01	Whisper STT with automatic language detection	SATISFIED	voice-pipeline.ts:90 -- `"--language", "auto"` flag; langMatch regex at line 96
VPIPE-02	36-01	Piper TTS synthesis	SATISFIED	voice-pipeline.ts:126-166 — synthesize() with 8s timeout per sentence via Promise.race
VPIPE-03	36-03	Transport-agnostic VoicePipelineService	SATISFIED	voice.ts:16-44 — HTTP endpoints callable from any transport; voice-pipeline.ts factory reusable by Phase 38 Telegram bridge
VPIPE-04	36-01	WAV 16kHz mono transcoding via ffmpeg	SATISFIED	voice-pipeline.ts:44 — -ar 16000 -ac 1 flags passed to ffmpeg
VPIPE-05	36-02	voiceMode flag propagation through message pipeline	SATISFIED	validators/chat.ts voiceMode field; types/chat.ts interface; stream endpoint injects dual-output prompt
VPIPE-06	36-01, 36-03	Dual output: spoken prose + full markdown	SATISFIED	formatForVoice() extracts SPOKEN section; chat.ts:145-156 injects SPOKEN:/DETAILED: system prompt for full_voice mode

All six requirements in scope (VPIPE-01 through VPIPE-06) are accounted for. VPIPE-07 and VPIPE-08 are mapped to Phase 39 (pending) and are out of scope for this phase.
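
For VPIPE-04 and VPIPE-02, the evidence cites the ffmpeg flags -ar 16000 -ac 1 -f wav pipe:1 and a per-sentence synthesis loop. The sketch below shows one way those two pieces could be built. Only the flags quoted above come from this report; reading from pipe:0, the -hide_banner and -loglevel options, the optional input format hint, and the sentence-splitting regex are assumptions.

```typescript
// Builds an ffmpeg argument list that reads audio from stdin and writes
// WAV 16 kHz mono to stdout. The input side (pipe:0, format hint) is assumed.
function buildTranscodeArgs(inputFormat?: string): string[] {
  const inputHint = inputFormat ? ["-f", inputFormat] : [];
  return [
    "-hide_banner", "-loglevel", "error",
    ...inputHint,
    "-i", "pipe:0",  // read the source audio from stdin
    "-ar", "16000",  // resample to 16 kHz
    "-ac", "1",      // downmix to mono
    "-f", "wav",     // force WAV container on output
    "pipe:1",        // write the result to stdout
  ];
}

// Splits text into sentences so each one can be synthesized separately.
// Hypothetical rule: a sentence ends at one or more of . ! ?
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}
```

The argument list would be passed to spawn() alongside the ffmpeg binary path, with the input Buffer written to the child's stdin.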

Anti-Patterns Found

File	Line	Pattern	Severity	Impact
server/src/services/voice-pipeline.ts	44	spawn() receives ffmpegPath which TypeScript resolves as module namespace type rather than string	WARNING	TypeScript compilation fails (10 errors); runtime behavior is unaffected because the module default is a string at runtime — the type mismatch is a declaration issue, not a runtime bug

No TODO/FIXME/placeholder comments found. No empty return stubs found. The catch(() => {}) at line 122 is legitimate cleanup (unlink temp file on error, intentionally silent).

Human Verification Required

1. Voice transcription with real Whisper binary

Test: On the Mac Mini M4, POST a real WebM audio recording (captured from browser) to /api/transcribe with a board authentication token

Expected: Response contains { text: "...", language: "en" } with the spoken words correctly transcribed

Why human: whisper-cpp binary is a runtime dependency not available in CI; unit tests mock child_process

2. TTS synthesis with real Piper binary

Test: On the Mac Mini M4, POST { text: "Hello world. How are you today?" } to /api/synthesize

Expected: Response has Content-Type: audio/wav, the audio file is valid WAV format, and plays the sentence when opened

Why human: piper binary is a runtime dependency not available in CI; unit tests mock child_process
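
The "valid WAV format" part of this check can be partly automated before listening to the file. The sketch below inspects the RIFF/WAVE header of the response bytes. Both function names are hypothetical, and readCanonicalHeader assumes the canonical 44-byte PCM layout with the fmt chunk immediately after the RIFF header, which not every WAV file follows. It does not judge audio quality.

```typescript
// Reads `length` ASCII characters starting at `offset`.
function asciiAt(bytes: Uint8Array, offset: number, length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) out += String.fromCharCode(bytes[offset + i]);
  return out;
}

// True when the buffer starts with a RIFF container tagged as WAVE.
function looksLikeWav(bytes: Uint8Array): boolean {
  return bytes.length >= 12 && asciiAt(bytes, 0, 4) === "RIFF" && asciiAt(bytes, 8, 4) === "WAVE";
}

// Assumes the canonical layout: "fmt " at byte 12, channel count at byte 22,
// sample rate at byte 24, both little-endian. Returns null otherwise.
function readCanonicalHeader(bytes: Uint8Array): { channels: number; sampleRate: number } | null {
  if (!looksLikeWav(bytes) || bytes.length < 44 || asciiAt(bytes, 12, 4) !== "fmt ") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { channels: view.getUint16(22, true), sampleRate: view.getUint32(24, true) };
}
```

A Node Buffer is a Uint8Array, so the body of the /api/synthesize response can be passed in directly.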

3. End-to-end dual-output voice interaction

Test: From chat UI (Phase 37), send a message with voiceMode: "full_voice" in the stream request body

Expected: AI response contains both SPOKEN: and DETAILED: sections; formatForVoice() extracts only the SPOKEN section for TTS; TTS audio plays the short prose summary

Why human: Requires live Puter AI token, real browser, and audio hardware

Gaps Summary

One automated gap blocks the Plan 03 acceptance criterion "TypeScript compilation passes with no errors":

server/src/services/voice-pipeline.ts has 10 TypeScript errors caused by an ffmpeg-static type declaration mismatch. The @types/ffmpeg-static devDependency was installed in the worktree's pnpm store but did not propagate to the main repo's pnpm store. As a result, TypeScript resolves ffmpegPath from the package's own types/index.d.ts (which exports typeof import(...)) instead of a simple string | null. This causes spawn(ffmpegPath, ...) to fail overload resolution, cascading to never types on stdout/stderr/stdin.

Runtime impact: none. The actual runtime value of ffmpegPath is a string path (e.g. /usr/bin/ffmpeg), and the null guard at line 28 ensures it is non-null before use. All 28 tests pass. However, the TypeScript build is broken, which blocks CI and downstream phases that rely on tsc --noEmit as a gate.

Fix: Run cd server && pnpm add -D @types/ffmpeg-static in the main repo (not the worktree) to ensure the type declaration is available in the shared pnpm store.
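
If installing the type package is not practical, the gaps entry in the frontmatter also lists a cast after the null guard. A runtime type guard is one possible variant of that idea: it narrows the value to string without an assertion, so it type-checks regardless of how the import is declared. The sketch below is an assumption, not the fix applied in the repo; requireFfmpegPath is a hypothetical name and `candidate` stands in for the default import of ffmpeg-static.

```typescript
// Narrows an import of unknown shape to a usable binary path.
// Throws instead of casting, so a null or non-string value fails loudly.
function requireFfmpegPath(candidate: unknown): string {
  if (typeof candidate !== "string" || candidate.length === 0) {
    throw new Error("ffmpeg-static did not resolve to a binary path");
  }
  return candidate;
}

// Intended call site (not executed here):
//   const child = spawn(requireFfmpegPath(ffmpegPath), args);
```

Because the parameter is typed as unknown, spawn() receives a plain string and overload resolution no longer depends on the package's declaration file.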

Verified: 2026-04-03T01:45:00Z

Verifier: Claude (gsd-verifier)
