mikkel/nexus

Nexus Dev 35be67d019 fix(37): pass voiceMode in ChatPanel handleEdit path + add verification

2026-04-04 03:55:50 +00:00

phase	verified	status	score	re_verification
37-web-chat-voice-ui	2026-04-03T12:00:00Z	gaps_found	7/8 must-haves verified	false

gaps

truth	status	reason	artifacts	missing
Full voice flow works end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio	partial	One of five startStream call sites (in-place edit path in handleEdit) is missing the voiceMode argument. When a user edits a message in place (no subsequent messages), the stream is initiated without voiceMode, so voice mode is not sent to the server for that specific interaction path.	path: ui/src/components/ChatPanel.tsx; issue: Line 231: startStream(newContent, activeAgentId ?? undefined) is missing the third voiceMode argument in the else branch of handleEdit (in-place edit, no branching)	Pass voiceMode as third argument: startStream(newContent, activeAgentId ?? undefined, voiceMode) in the in-place edit branch of handleEdit

human_verification

test	expected	why_human
Verify waveform renders during recording	After clicking the mic button, an animated canvas with vertical bars appears inside the button ring during recording	VoiceWaveform reads from Web Audio API AnalyserNode in a requestAnimationFrame loop; animated canvas output cannot be verified programmatically
Verify VAD auto-stop triggers transcription	After speaking and then being silent for ~1 second, recording stops automatically, the mic button shows a spinner, and the transcribed text appears in the input field	Requires actual microphone input and silence detection from Silero VAD ONNX model running in an AudioWorklet; not testable without a browser
Verify voice_full response plays audio automatically	In Full Voice mode, after receiving an assistant response, the audio player auto-plays and the spoken text is shown above a collapsible 'Show full response' section	Requires TTS (synthesize endpoint via the piper binary), browser audio playback, and localStorage autoplay toggle; not testable statically
Verify VoiceModeToggle persists across page refresh	Selecting 'Full Voice' pill, refreshing the page, and re-opening chat shows 'Full Voice' still selected	Requires server-side nexus-settings PATCH round-trip and page reload; not testable statically

Phase 37: Web Chat Voice UI Verification Report

Phase Goal: Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting

Verified: 2026-04-03T12:00:00Z
Status: gaps_found (1 minor gap + 4 human verification items)
Re-verification: No (initial verification)

Note on worktrees: All phase 37 code was committed to the gsd/phase-37-web-chat-voice-ui branch, not the current worktree branch. All verification was performed against gsd/phase-37-web-chat-voice-ui via git show and git cat-file. The branch exists and contains all phase commits up through c294277b (docs(37-04)).

Goal Achievement

Observable Truths

#	Truth	Status	Evidence
1	POST /api/transcribe accepts audio upload and returns { text }	VERIFIED	`server/src/routes/voice.ts` uses multer memoryStorage, calls `voicePipelineService().transcribe()`, returns `res.json(result)` where result is `{ text: string; language?: string }`
2	POST /api/synthesize accepts { text } and returns audio/wav	VERIFIED	`server/src/routes/voice.ts` calls `voicePipelineService().synthesize(text, voiceId)`, sends buffer with `Content-Type: audio/wav`
3	Recording auto-stops on silence via VAD onSpeechEnd callback	VERIFIED	`useVadRecorder.ts` uses `useMicVAD` with `startOnLoad: false`, `onSpeechEnd: handleSpeechEnd` — handler calls `vad.pause()`, POSTs WAV to `/api/transcribe`, calls `opts.onTranscript`
4	Live waveform canvas renders animated bars during recording	VERIFIED	`VoiceWaveform.tsx` uses `canvas`, `createAnalyser()` with `fftSize=64`, `getByteFrequencyData()`, `requestAnimationFrame` loop — wired into `VoiceMicButton` recording state
5	Voice response audio plays inline with play/pause and auto-play	VERIFIED	`ChatVoicePlayer.tsx` POSTs to `/api/synthesize`, creates object URL, renders `<audio>` element with play/pause controls; auto-play triggers when `autoPlay=true` and `audioUrl` is set
6	VoiceModeToggle presents Text/Voice In/Full Voice and persists via nexus-settings	VERIFIED	`VoiceModeToggle.tsx` renders three pills via `useVoiceMode()` which GETs/PATCHes `/api/nexus/settings`; `nexus-settings.ts` service/route implement GET+PATCH
7	Auto-play preference stored in localStorage under nexus:voice:autoplay	VERIFIED	`VoiceModeToggle.tsx` reads/writes `localStorage.getItem("nexus:voice:autoplay")`; `ChatMessage.tsx` reads it at render time
8	Full voice flow end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio	PARTIAL	4 of 5 `startStream` call sites pass `voiceMode`; the in-place edit branch of `handleEdit` (line 231, `ChatPanel.tsx`) calls `startStream(newContent, activeAgentId ?? undefined)` — missing `voiceMode` argument

Score: 7/8 truths verified (1 partial)

Required Artifacts

Artifact	Status	Evidence
`server/src/routes/voice.ts`	VERIFIED	Exports `voiceRoutes()` with POST /transcribe and POST /synthesize wired to `voicePipelineService`
`server/src/routes/nexus-settings.ts`	VERIFIED	Exports `nexusSettingsRoutes()` with GET/PATCH /nexus/settings
`server/src/services/nexus-settings.ts`	VERIFIED	Exports `VOICE_MODES`, `VoiceMode`, `nexusSettingsService`; schema includes `voiceMode: z.enum(VOICE_MODES).default("text")`
`ui/public/vad.worklet.bundle.min.js`	VERIFIED	2480 bytes — matches installed `@ricky0123/vad-web@0.0.30` dist file exactly (correct, not a placeholder)
`ui/public/silero_vad_legacy.onnx`	VERIFIED	1,807,522 bytes — real ONNX model
`ui/public/silero_vad_v5.onnx`	VERIFIED	2,327,524 bytes — real ONNX model
`ui/src/lib/encodeWav.ts`	VERIFIED	Exports `encodeWav(samples: Float32Array, sampleRate?)` with standard RIFF/WAVE 44-byte header, PCM 16-bit mono
`ui/src/hooks/useVadRecorder.ts`	VERIFIED	Uses `useMicVAD` with `startOnLoad: false`, `baseAssetPath: "/"`, `onnxWASMBasePath: "/"`, POSTs to `/api/transcribe`
`ui/src/hooks/useVoiceMode.ts`	VERIFIED	GETs and PATCHes `/api/nexus/settings` for voiceMode
`ui/src/components/VoiceWaveform.tsx`	VERIFIED	Canvas + AnalyserNode + requestAnimationFrame + getByteFrequencyData
`ui/src/components/VoiceMicButton.tsx`	VERIFIED	Three visual states (idle/recording/processing) with correct aria-labels, ring-2 ring-primary on recording, Loader2 animate-spin on processing
`ui/src/components/ChatVoicePlayer.tsx`	VERIFIED	POSTs to /api/synthesize, creates object URL, native `<audio>` element, play/pause controls, auto-play logic, blob URL cleanup
`ui/src/components/ChatVoiceBadge.tsx`	VERIFIED	Parses SPOKEN/DETAILED format, renders Badge "Voice", ChatVoicePlayer for voice_full, Collapsible "Show/Hide full response"
`ui/src/components/VoiceModeToggle.tsx`	VERIFIED	Three pills (Text/Voice In/Full Voice), `role="group"`, `aria-label="Voice mode"`, bg-primary active / bg-muted inactive, nexus:voice:autoplay localStorage
`ui/src/components/ChatInput.tsx`	VERIFIED	VoiceMicButton replaces VoiceRecordButton (zero VoiceRecordButton references remaining); VoiceModeToggle rendered above form
`ui/src/components/ChatMessage.tsx`	VERIFIED	ChatVoiceBadge dispatched for voice_input and voice_full messageTypes; reads nexus:voice:autoplay from localStorage
`ui/src/components/ChatPanel.tsx`	PARTIAL	useVoiceMode imported and called; voiceMode passed to 4 of 5 startStream call sites — missing in in-place edit path
`ui/src/hooks/useStreamingChat.ts`	VERIFIED	startStream signature: `(userMessage, agentId?, voiceMode?)` — voiceMode forwarded to chatApi
`ui/src/api/chat.ts`	VERIFIED	postMessageAndStream data parameter typed as `{ content; agentId?; voiceMode? }`
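
The `encodeWav.ts` row describes a standard 44-byte RIFF/WAVE header followed by 16-bit mono PCM. For readers unfamiliar with that layout, here is a minimal sketch of such an encoder. It is not the project's code: the name `encodeWavSketch` is hypothetical, and the 16000 Hz default sample rate is an assumption chosen because Silero VAD operates on 16 kHz audio.

```typescript
// Hypothetical re-implementation for illustration; the project's encodeWav may differ.
function encodeWavSketch(samples: Float32Array, sampleRate: number = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  // Writes an ASCII tag such as "RIFF" byte by byte.
  const writeString = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each float to [-1, 1] and scale to the signed 16-bit range.
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  });

  return buffer;
}
```

All multi-byte fields are little-endian, which is why every `DataView` setter passes `true` as its last argument.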

Key Link Verification

From	To	Via	Status	Details
`server/src/app.ts`	`server/src/routes/voice.ts`	`api.use(voiceRoutes())`	WIRED	`voiceRoutes` appears twice in app.ts (import + mount)
`server/src/app.ts`	`server/src/routes/nexus-settings.ts`	`api.use(nexusSettingsRoutes())`	WIRED	`nexusSettingsRoutes` appears twice in app.ts
`server/src/app.ts`	COOP/COEP middleware	`res.setHeader` before routes	WIRED	`Cross-Origin-Opener-Policy: same-origin` + `Cross-Origin-Embedder-Policy: require-corp`
`server/src/routes/chat.ts`	voiceMode parameter	destructured from req.body	WIRED	`const { content, agentId, voiceMode } = req.body` with union type
`ui/vite.config.ts`	COOP/COEP dev headers	`server.headers` config	WIRED	Both headers set in dev server config
`ui/src/components/VoiceMicButton.tsx`	`useVadRecorder.ts`	`useVadRecorder()` hook	WIRED	Imports and calls the hook, destructures state/start/stop/mediaStream
`ui/src/hooks/useVadRecorder.ts`	`ui/src/lib/encodeWav.ts`	`encodeWav(audio)` in onSpeechEnd	WIRED	Import verified, called in handleSpeechEnd
`ui/src/hooks/useVadRecorder.ts`	`/api/transcribe`	`fetch POST with FormData`	WIRED	`fetch("/api/transcribe", { method: "POST", credentials: "include", body: formData })`
`ui/src/components/VoiceMicButton.tsx`	`VoiceWaveform.tsx`	`<VoiceWaveform stream={mediaStream} active={true} />`	WIRED	Rendered inside recording state conditional
`ui/src/components/ChatVoicePlayer.tsx`	`/api/synthesize`	`fetch POST`	WIRED	`fetch("/api/synthesize", { method: "POST" ... })` in useEffect
`ui/src/components/ChatVoiceBadge.tsx`	shadcn Collapsible	`Collapsible/CollapsibleContent/CollapsibleTrigger`	WIRED	All three imported and used
`ui/src/components/VoiceModeToggle.tsx`	`useVoiceMode.ts`	`useVoiceMode()`	WIRED	Imported from `@/hooks/useVoiceMode`, destructures mode/setMode/isLoading
`ui/src/components/ChatPanel.tsx`	`useVoiceMode.ts`	`useVoiceMode()`	WIRED	`const { mode: voiceMode } = useVoiceMode()`
`ui/src/components/ChatPanel.tsx`	`useStreamingChat.ts`	`startStream(content, agentId, voiceMode)`	PARTIAL	4/5 call sites pass voiceMode; in-place edit path at line 231 does not
`ui/src/hooks/useStreamingChat.ts`	`ui/src/api/chat.ts`	`chatApi.postMessageAndStream with voiceMode`	WIRED	`{ content: userMessage, agentId, voiceMode }` passed as data
`ui/src/components/ChatInput.tsx`	`VoiceMicButton.tsx`	replaces VoiceRecordButton	WIRED	VoiceRecordButton has 0 occurrences; VoiceMicButton imported and rendered
`ui/src/components/ChatMessage.tsx`	`ChatVoiceBadge.tsx`	voice messageType dispatch	WIRED	`if (messageType === "voice_input" \|\| messageType === "voice_full")` dispatches to ChatVoiceBadge
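
The COOP/COEP row above refers to the two response headers that make a page cross-origin isolated, which browsers require before exposing `SharedArrayBuffer` (presumably needed here by the WASM runtime behind the VAD model). The sketch below shows the shape of such a middleware without depending on Express. `HeaderSink`, `applyCrossOriginIsolation`, and `crossOriginIsolation` are hypothetical names, not identifiers taken from `app.ts`.

```typescript
// Minimal structural type so the sketch does not depend on Express typings.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// Sets the two headers the report lists for app.ts and vite.config.ts.
function applyCrossOriginIsolation(res: HeaderSink): void {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
}

// Express-style middleware shape: (req, res, next). It must be registered
// before the routes so that every response carries both headers.
function crossOriginIsolation(_req: unknown, res: HeaderSink, next: () => void): void {
  applyCrossOriginIsolation(res);
  next();
}
```

With `require-corp` in effect, cross-origin subresources need CORS or a `Cross-Origin-Resource-Policy` header to load, which is consistent with the worklet and ONNX files being served from `ui/public`.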

Data-Flow Trace (Level 4)

Artifact	Data Variable	Source	Produces Real Data	Status
`ChatVoicePlayer.tsx`	`audioUrl` (blob URL)	POST /api/synthesize -> voicePipelineService().synthesize() -> piper/TTS binary	Binary audio buffer from real TTS service	FLOWING
`useVadRecorder.ts`	transcript text	POST /api/transcribe -> voicePipelineService().transcribe() -> whisper-cpp	Real transcription from audio buffer	FLOWING
`useVoiceMode.ts`	`mode`	GET /api/nexus/settings -> nexusSettingsService().get() -> file-backed settings	Real persisted settings value	FLOWING
`ChatMessage.tsx`	`autoPlay`	`localStorage.getItem("nexus:voice:autoplay")`	Real localStorage value	FLOWING
`VoiceWaveform.tsx`	`dataArray` (Uint8Array)	Web Audio API AnalyserNode from microphone MediaStream	Real frequency data from mic	FLOWING (browser only)
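
To make the `VoiceWaveform.tsx` data flow concrete, here is a sketch of the pure step between `getByteFrequencyData()` and drawing: mapping frequency bins to bar heights. With `fftSize=64` an `AnalyserNode` exposes 32 bins, and the human verification section expects 20 bars. The function name `barHeights`, the nearest-bin sampling, and the 2-pixel minimum height are assumptions for illustration, not details confirmed by the report.

```typescript
// Maps AnalyserNode byte frequency data (values 0..255) to pixel heights.
// Hypothetical helper; the real component may bucket or scale differently.
function barHeights(
  data: Uint8Array,
  barCount: number,
  canvasHeight: number,
  minHeight: number = 2
): number[] {
  const heights: number[] = [];
  for (let i = 0; i < barCount; i++) {
    // Pick the bin at the start of this bar's share of the spectrum.
    const bin = Math.floor((i * data.length) / barCount);
    const value = data[bin] ?? 0;
    const scaled = Math.round((value / 255) * canvasHeight);
    // Keep a small visible bar even when the input is silent.
    heights.push(Math.max(minHeight, scaled));
  }
  return heights;
}
```

In a browser, a `requestAnimationFrame` callback would call `analyser.getByteFrequencyData(data)` on each frame and then draw one rectangle per returned height.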

Behavioral Spot-Checks

Behavior	Command	Result	Status
voice.ts exports voiceRoutes	`git show branch:server/src/routes/voice.ts \| grep "export function voiceRoutes"`	`export function voiceRoutes(): Router`	PASS
nexus-settings exports nexusSettingsRoutes	`git show branch:server/src/routes/nexus-settings.ts \| grep "export function"`	`export function nexusSettingsRoutes(): Router`	PASS
encodeWav exports function with RIFF header	`git show branch:ui/src/lib/encodeWav.ts \| grep "RIFF"`	`writeString(view, 0, "RIFF")`	PASS
ChatInput has zero VoiceRecordButton references	`git show branch:ui/src/components/ChatInput.tsx \| grep VoiceRecordButton \| wc -l`	`0`	PASS
VAD worklet matches installed package	size check: 2480 bytes in branch == 2480 bytes in node_modules	Exact match	PASS
COOP/COEP on Express	`git show branch:server/src/app.ts \| grep Cross-Origin`	both headers present	PASS
COOP/COEP on Vite	`git show branch:ui/vite.config.ts \| grep Cross-Origin`	both headers present	PASS
Browser-dependent audio flows	cannot test without browser runtime	N/A	SKIP

Requirements Coverage

Requirement	Source Plans	Description	Status	Evidence
WCHAT-01	37-01, 37-02, 37-04	Mic button with idle/recording/processing states	SATISFIED	`VoiceMicButton.tsx` three conditional renders with correct icons and aria-labels
WCHAT-02	37-01, 37-02, 37-04	Recording auto-stops on silence via VAD	SATISFIED	`useMicVAD` onSpeechEnd callback in `useVadRecorder.ts`
WCHAT-03	37-02	Real-time waveform/amplitude visualization	SATISFIED	`VoiceWaveform.tsx` canvas with AnalyserNode and requestAnimationFrame loop
WCHAT-04	37-01, 37-03, 37-04	Voice response audio plays inline with player controls	SATISFIED	`ChatVoicePlayer.tsx` with play/pause buttons and hidden `<audio>` element
WCHAT-05	37-02, 37-03, 37-04	User can toggle voice mode (text/voice_input/full_voice)	SATISFIED	`VoiceModeToggle.tsx` three pills wired to `useVoiceMode` which PATCHes nexus-settings
WCHAT-06	37-03, 37-04	Auto-play of voice responses is configurable	SATISFIED	`nexus:voice:autoplay` localStorage key in VoiceModeToggle checkbox; read in ChatMessage

All 6 WCHAT requirements are satisfied. No orphaned requirements found.
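
WCHAT-06 relies on a single `localStorage` key that `VoiceModeToggle` writes and `ChatMessage` reads. A sketch of that read/write pair follows. The report confirms only the key name; the `"true"`/`"false"` string encoding, the default of `false`, and the helper names are assumptions. A small `KeyValueStore` interface stands in for `window.localStorage` so the helpers can run outside a browser.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const AUTOPLAY_KEY = "nexus:voice:autoplay";

// Assumed encoding: the literal strings "true" and "false".
function readAutoPlay(store: KeyValueStore, fallback: boolean = false): boolean {
  const raw = store.getItem(AUTOPLAY_KEY);
  if (raw === null) {
    return fallback; // nothing stored yet
  }
  return raw === "true";
}

function writeAutoPlay(store: KeyValueStore, enabled: boolean): void {
  store.setItem(AUTOPLAY_KEY, String(enabled));
}
```

Because this preference lives in `localStorage` rather than in nexus-settings, it is per browser, whereas the voice mode itself is persisted server-side.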

Anti-Patterns Found

File	Line	Pattern	Severity	Impact
`ui/src/components/ChatPanel.tsx`	231	`startStream(newContent, activeAgentId ?? undefined)` — missing voiceMode argument	Warning	In-place message edit (no subsequent messages branch) will not send voiceMode to server; voice formatting won't apply for that edit path
`ui/src/hooks/useVadRecorder.ts`	96	`mediaStream: mediaStreamRef.current` returned from hook (ref, not state)	Info	Not a stub — state change on `setState("recording")` triggers re-render which reads the ref value, which was set before setState in start(). Works in practice due to synchronous ref update before async setState. No runtime issue expected.

Human Verification Required

1. Waveform Animation During Recording

Test: Open web chat, click the mic button (should show Mic icon), observe the recording state.
Expected: The mic button shows a canvas with 20 animated vertical bars that pulse with voice amplitude, surrounded by a blue/primary ring.
Why human: Web Audio API AnalyserNode + canvas animation requires browser runtime with a real microphone.

2. VAD Auto-Stop Triggers Transcription

Test: Click mic, speak a complete sentence, then be silent for ~1 second.
Expected: Recording stops automatically (no manual stop needed), the spinner appears briefly, then the transcribed text populates the chat input field.
Why human: Requires Silero VAD ONNX model running in an AudioWorklet with real microphone input; not testable statically.

3. Voice Full Response Auto-Play and Collapsible

Test: Enable "Full Voice" mode via VoiceModeToggle, enable "Auto-play voice responses" checkbox, send a message to an agent, wait for response.
Expected: The agent response shows a "Voice" badge, the spoken text, an audio player that starts playing automatically, and a "Show full response" link that expands to show full markdown.
Why human: Requires the TTS binary (piper) running, browser audio playback, and synthesize endpoint returning real audio; not testable statically.

4. VoiceModeToggle Persistence Across Refresh

Test: Open chat, click "Full Voice" pill, refresh the page, reopen the same chat.
Expected: The VoiceModeToggle shows "Full Voice" still selected.
Why human: Requires a running server with nexus-settings PATCH persisting to disk and a page reload to confirm GET retrieves the saved value.

Gaps Summary

One minor gap was found: the in-place message edit path (handleEdit else branch, ChatPanel.tsx line 231) calls startStream without passing voiceMode. The four other call sites (handleSend online, handleSend offline, handleEdit with branching, handleRetry) all pass voiceMode correctly. As a result, when a user edits a message in place while a voice mode is selected, the mode is not forwarded to the server for that re-stream, and the response is produced in text mode. The impact is minor: only in-place edits are affected, not the primary send flow, and the server defaults gracefully to text mode.

The fix is one line: change startStream(newContent, activeAgentId ?? undefined) to startStream(newContent, activeAgentId ?? undefined, voiceMode) at ChatPanel.tsx line 231.
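
The one-line fix can be sketched in isolation as follows. This is not the actual `ChatPanel.tsx` code: `makeInPlaceEditHandler` is a hypothetical wrapper, and `VoiceMode` is reduced to a `string` placeholder because the exact union members are defined in the project's nexus-settings service. Only the `startStream(userMessage, agentId?, voiceMode?)` signature and the call expression come from the report.

```typescript
// Placeholder; the project exports a VoiceMode union from nexus-settings.
type VoiceMode = string;

// Signature as reported for useStreamingChat's startStream.
type StartStream = (userMessage: string, agentId?: string, voiceMode?: VoiceMode) => void;

// Hypothetical stand-in for the else branch of handleEdit (in-place edit).
function makeInPlaceEditHandler(
  startStream: StartStream,
  activeAgentId: string | null,
  voiceMode: VoiceMode
): (newContent: string) => void {
  return (newContent: string): void => {
    // Before the fix: startStream(newContent, activeAgentId ?? undefined)
    // After the fix, the third argument forwards the current voice mode.
    startStream(newContent, activeAgentId ?? undefined, voiceMode);
  };
}
```

Because `voiceMode` is an optional parameter, the original two-argument call type-checks without complaint, which may be why the omission was not caught at compile time.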

Verified: 2026-04-03T12:00:00Z
Verifier: Claude (gsd-verifier)
