nexus/.planning/phases/37-web-chat-voice-ui/37-VERIFICATION.md

---
phase: 37-web-chat-voice-ui
verified: 2026-04-03T12:00:00Z
status: gaps_found
score: 7/8 must-haves verified
re_verification: false
gaps:
  - truth: "Full voice flow works end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio"
    status: partial
    reason: "One of five startStream call sites (in-place edit path in handleEdit) is missing the voiceMode argument. When a user edits a message in place (no subsequent messages), the stream is initiated without voiceMode, so voice mode is not sent to the server for that specific interaction path."
    artifacts:
      - path: ui/src/components/ChatPanel.tsx
        issue: "Line 231: startStream(newContent, activeAgentId ?? undefined) is missing the third voiceMode argument in the else branch of handleEdit (in-place edit, no branching)"
    missing:
      - "Pass voiceMode as third argument: startStream(newContent, activeAgentId ?? undefined, voiceMode) in the in-place edit branch of handleEdit"
human_verification:
  - test: "Verify waveform renders during recording"
    expected: "After clicking the mic button, an animated canvas with vertical bars appears inside the button ring during recording"
    why_human: "VoiceWaveform reads from Web Audio API AnalyserNode in a requestAnimationFrame loop; animated canvas output cannot be verified programmatically"
  - test: "Verify VAD auto-stop triggers transcription"
    expected: "After speaking and then being silent for ~1 second, recording stops automatically, the mic button shows a spinner, and the transcribed text appears in the input field"
    why_human: "Requires actual microphone input and silence detection from the Silero VAD ONNX model running in an AudioWorklet; not testable without a browser"
  - test: "Verify voice_full response plays audio automatically"
    expected: "In Full Voice mode, after receiving an assistant response, the audio player auto-plays and the spoken text is shown above a collapsible 'Show full response' section"
    why_human: "Requires TTS (synthesize endpoint via the piper binary), browser audio playback, and the localStorage autoplay toggle; not testable statically"
  - test: "Verify VoiceModeToggle persists across page refresh"
    expected: "Selecting the 'Full Voice' pill, refreshing the page, and re-opening chat shows 'Full Voice' still selected"
    why_human: "Requires a server-side nexus-settings PATCH round-trip and a page reload; not testable statically"
---

Phase 37: Web Chat Voice UI Verification Report

Phase Goal: Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting

Verified: 2026-04-03T12:00:00Z
Status: gaps_found (1 minor gap + 4 human verification items)
Re-verification: No (initial verification)

Note on worktrees: All phase 37 code was committed to the gsd/phase-37-web-chat-voice-ui branch, not the current worktree branch. All verification was performed against gsd/phase-37-web-chat-voice-ui via git show and git cat-file. The branch exists and contains all phase commits up through c294277b (docs(37-04)).


Goal Achievement

Observable Truths

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | POST /api/transcribe accepts audio upload and returns { text } | VERIFIED | server/src/routes/voice.ts uses multer memoryStorage, calls voicePipelineService().transcribe(), returns res.json(result) where result is { text: string; language?: string } |
| 2 | POST /api/synthesize accepts { text } and returns audio/wav | VERIFIED | server/src/routes/voice.ts calls voicePipelineService().synthesize(text, voiceId), sends buffer with Content-Type: audio/wav |
| 3 | Recording auto-stops on silence via VAD onSpeechEnd callback | VERIFIED | useVadRecorder.ts uses useMicVAD with startOnLoad: false, onSpeechEnd: handleSpeechEnd; the handler calls vad.pause(), POSTs WAV to /api/transcribe, calls opts.onTranscript |
| 4 | Live waveform canvas renders animated bars during recording | VERIFIED | VoiceWaveform.tsx uses canvas, createAnalyser() with fftSize=64, getByteFrequencyData(), requestAnimationFrame loop; wired into VoiceMicButton recording state |
| 5 | Voice response audio plays inline with play/pause and auto-play | VERIFIED | ChatVoicePlayer.tsx POSTs to /api/synthesize, creates object URL, renders `<audio>` element with play/pause controls; auto-play triggers when autoPlay=true and audioUrl is set |
| 6 | VoiceModeToggle presents Text/Voice In/Full Voice and persists via nexus-settings | VERIFIED | VoiceModeToggle.tsx renders three pills via useVoiceMode() which GETs/PATCHes /api/nexus/settings; nexus-settings.ts service/route implement GET+PATCH |
| 7 | Auto-play preference stored in localStorage under nexus:voice:autoplay | VERIFIED | VoiceModeToggle.tsx reads/writes localStorage.getItem("nexus:voice:autoplay"); ChatMessage.tsx reads it at render time |
| 8 | Full voice flow end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio | PARTIAL | 4 of 5 startStream call sites pass voiceMode; the in-place edit branch of handleEdit (line 231, ChatPanel.tsx) calls startStream(newContent, activeAgentId ?? undefined), which omits the voiceMode argument |

Score: 7/8 truths verified (1 partial)


Required Artifacts

| Artifact | Status | Evidence |
|----------|--------|----------|
| server/src/routes/voice.ts | VERIFIED | Exports voiceRoutes() with POST /transcribe and POST /synthesize wired to voicePipelineService |
| server/src/routes/nexus-settings.ts | VERIFIED | Exports nexusSettingsRoutes() with GET/PATCH /nexus/settings |
| server/src/services/nexus-settings.ts | VERIFIED | Exports VOICE_MODES, VoiceMode, nexusSettingsService; schema includes voiceMode: z.enum(VOICE_MODES).default("text") |
| ui/public/vad.worklet.bundle.min.js | VERIFIED | 2480 bytes; matches the installed @ricky0123/vad-web@0.0.30 dist file exactly (not a placeholder) |
| ui/public/silero_vad_legacy.onnx | VERIFIED | 1,807,522 bytes; real ONNX model |
| ui/public/silero_vad_v5.onnx | VERIFIED | 2,327,524 bytes; real ONNX model |
| ui/src/lib/encodeWav.ts | VERIFIED | Exports encodeWav(samples: Float32Array, sampleRate?) with standard RIFF/WAVE 44-byte header, PCM 16-bit mono |
| ui/src/hooks/useVadRecorder.ts | VERIFIED | Uses useMicVAD with startOnLoad: false, baseAssetPath: "/", onnxWASMBasePath: "/", POSTs to /api/transcribe |
| ui/src/hooks/useVoiceMode.ts | VERIFIED | GETs and PATCHes /api/nexus/settings for voiceMode |
| ui/src/components/VoiceWaveform.tsx | VERIFIED | Canvas + AnalyserNode + requestAnimationFrame + getByteFrequencyData |
| ui/src/components/VoiceMicButton.tsx | VERIFIED | Three visual states (idle/recording/processing) with correct aria-labels, ring-2 ring-primary on recording, Loader2 animate-spin on processing |
| ui/src/components/ChatVoicePlayer.tsx | VERIFIED | POSTs to /api/synthesize, creates object URL, native `<audio>` element, play/pause controls, auto-play logic, blob URL cleanup |
| ui/src/components/ChatVoiceBadge.tsx | VERIFIED | Parses SPOKEN/DETAILED format, renders Badge "Voice", ChatVoicePlayer for voice_full, Collapsible "Show/Hide full response" |
| ui/src/components/VoiceModeToggle.tsx | VERIFIED | Three pills (Text/Voice In/Full Voice), role="group", aria-label="Voice mode", bg-primary active / bg-muted inactive, nexus:voice:autoplay localStorage |
| ui/src/components/ChatInput.tsx | VERIFIED | VoiceMicButton replaces VoiceRecordButton (zero VoiceRecordButton references remaining); VoiceModeToggle rendered above form |
| ui/src/components/ChatMessage.tsx | VERIFIED | ChatVoiceBadge dispatched for voice_input and voice_full messageTypes; reads nexus:voice:autoplay from localStorage |
| ui/src/components/ChatPanel.tsx | PARTIAL | useVoiceMode imported and called; voiceMode passed to 4 of 5 startStream call sites, missing in the in-place edit path |
| ui/src/hooks/useStreamingChat.ts | VERIFIED | startStream signature: (userMessage, agentId?, voiceMode?); voiceMode forwarded to chatApi |
| ui/src/api/chat.ts | VERIFIED | postMessageAndStream data parameter typed as { content; agentId?; voiceMode? } |
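
For readers unfamiliar with the layout the encodeWav.ts row refers to, here is a minimal sketch of a 44-byte RIFF/WAVE header followed by PCM 16-bit mono samples. It is an illustration, not the contents of ui/src/lib/encodeWav.ts: the 16 kHz default and the clamping and scaling choices are assumptions, and only the `encodeWav(samples, sampleRate?)` shape and the `writeString(view, 0, "RIFF")` call are taken from this report.

```typescript
// Illustrative sketch only; the real ui/src/lib/encodeWav.ts may differ.
function writeString(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// ASSUMPTION: 16 kHz default sample rate (not confirmed by the report).
function encodeWav(samples: Float32Array, sampleRate: number = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each float sample to [-1, 1] and scale to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return buffer;
}
```

A buffer produced this way can be wrapped in a Blob with type audio/wav and appended to the FormData that is POSTed to /api/transcribe.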

Key Link Verification

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| server/src/app.ts | server/src/routes/voice.ts | api.use(voiceRoutes()) | WIRED | voiceRoutes appears twice in app.ts (import + mount) |
| server/src/app.ts | server/src/routes/nexus-settings.ts | api.use(nexusSettingsRoutes()) | WIRED | nexusSettingsRoutes appears twice in app.ts |
| server/src/app.ts | COOP/COEP middleware | res.setHeader before routes | WIRED | Cross-Origin-Opener-Policy: same-origin + Cross-Origin-Embedder-Policy: require-corp |
| server/src/routes/chat.ts | voiceMode parameter | destructured from req.body | WIRED | const { content, agentId, voiceMode } = req.body with union type |
| ui/vite.config.ts | COOP/COEP dev headers | server.headers config | WIRED | Both headers set in dev server config |
| ui/src/components/VoiceMicButton.tsx | useVadRecorder.ts | useVadRecorder() hook | WIRED | Imports and calls the hook, destructures state/start/stop/mediaStream |
| ui/src/hooks/useVadRecorder.ts | ui/src/lib/encodeWav.ts | encodeWav(audio) in onSpeechEnd | WIRED | Import verified, called in handleSpeechEnd |
| ui/src/hooks/useVadRecorder.ts | /api/transcribe | fetch POST with FormData | WIRED | fetch("/api/transcribe", { method: "POST", credentials: "include", body: formData }) |
| ui/src/components/VoiceMicButton.tsx | VoiceWaveform.tsx | `<VoiceWaveform stream={mediaStream} active={true} />` | WIRED | Rendered inside recording state conditional |
| ui/src/components/ChatVoicePlayer.tsx | /api/synthesize | fetch POST | WIRED | fetch("/api/synthesize", { method: "POST" ... }) in useEffect |
| ui/src/components/ChatVoiceBadge.tsx | shadcn Collapsible | Collapsible/CollapsibleContent/CollapsibleTrigger | WIRED | All three imported and used |
| ui/src/components/VoiceModeToggle.tsx | useVoiceMode.ts | useVoiceMode() | WIRED | Imported from @/hooks/useVoiceMode, destructures mode/setMode/isLoading |
| ui/src/components/ChatPanel.tsx | useVoiceMode.ts | useVoiceMode() | WIRED | const { mode: voiceMode } = useVoiceMode() |
| ui/src/components/ChatPanel.tsx | useStreamingChat.ts | startStream(content, agentId, voiceMode) | PARTIAL | 4/5 call sites pass voiceMode; in-place edit path at line 231 does not |
| ui/src/hooks/useStreamingChat.ts | ui/src/api/chat.ts | chatApi.postMessageAndStream with voiceMode | WIRED | { content: userMessage, agentId, voiceMode } passed as data |
| ui/src/components/ChatInput.tsx | VoiceMicButton.tsx | replaces VoiceRecordButton | WIRED | VoiceRecordButton has 0 occurrences; VoiceMicButton imported and rendered |
| ui/src/components/ChatMessage.tsx | ChatVoiceBadge.tsx | voice messageType dispatch | WIRED | `if (messageType === "voice_input" \|\| messageType === "voice_full")` dispatches to ChatVoiceBadge |
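
The COOP/COEP row can be pictured with the small sketch below. These two response headers together make a page cross-origin isolated, which browsers require before exposing SharedArrayBuffer; that is presumably why the phase sets them for the WASM-based VAD, though the report does not state the motivation. The helper name `applyCrossOriginIsolation` and the `HeaderSink` type are hypothetical and stand in for whatever server/src/app.ts actually does.

```typescript
// Minimal structural type so the sketch does not depend on Express typings.
// HYPOTHETICAL: not taken from server/src/app.ts.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// Sets the two headers the report lists for cross-origin isolation.
function applyCrossOriginIsolation(res: HeaderSink): void {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
}

// In an Express app this would typically be mounted before any routes, e.g.:
//   app.use((req, res, next) => { applyCrossOriginIsolation(res); next(); });
```

Registering the middleware before the routes matters because headers set after a route has already sent its response have no effect.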

Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| ChatVoicePlayer.tsx | audioUrl (blob URL) | POST /api/synthesize -> voicePipelineService().synthesize() -> piper/TTS binary | Binary audio buffer from real TTS service | FLOWING |
| useVadRecorder.ts | transcript text | POST /api/transcribe -> voicePipelineService().transcribe() -> whisper-cpp | Real transcription from audio buffer | FLOWING |
| useVoiceMode.ts | mode | GET /api/nexus/settings -> nexusSettingsService().get() -> file-backed settings | Real persisted settings value | FLOWING |
| ChatMessage.tsx | autoPlay | localStorage.getItem("nexus:voice:autoplay") | Real localStorage value | FLOWING |
| VoiceWaveform.tsx | dataArray (Uint8Array) | Web Audio API AnalyserNode from microphone MediaStream | Real frequency data from mic | FLOWING (browser only) |
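
To make the VoiceWaveform.tsx row more concrete: with fftSize=64 an AnalyserNode exposes 32 frequency bins, and getByteFrequencyData() fills a Uint8Array with values from 0 to 255. The sketch below shows one plausible way to turn those bins into bar heights for a canvas. The function name `barHeights`, the averaging strategy, and the parameters are assumptions for illustration; the report does not describe how VoiceWaveform.tsx maps bins to bars.

```typescript
// HYPOTHETICAL helper: averages neighbouring frequency bins into `barCount`
// bars and scales each average (0..255) to a height in 0..maxHeight.
function barHeights(data: Uint8Array, barCount: number, maxHeight: number): number[] {
  if (data.length === 0 || barCount <= 0) {
    return new Array<number>(Math.max(0, barCount)).fill(0);
  }
  const heights: number[] = [];
  const binsPerBar = data.length / barCount;
  for (let bar = 0; bar < barCount; bar++) {
    const start = Math.floor(bar * binsPerBar);
    // Every bar covers at least one bin, even when there are more bars than bins.
    const end = Math.min(data.length, Math.max(start + 1, Math.floor((bar + 1) * binsPerBar)));
    let sum = 0;
    for (let i = start; i < end; i++) {
      sum += data[i] ?? 0;
    }
    heights.push((sum / (end - start) / 255) * maxHeight);
  }
  return heights;
}
```

In a browser, a requestAnimationFrame callback would call analyser.getByteFrequencyData(dataArray) each frame and pass dataArray to a function like this before drawing the bars.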

Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| voice.ts exports voiceRoutes | `git show branch:server/src/routes/voice.ts \| grep "export function voiceRoutes"` | export function voiceRoutes(): Router | PASS |
| nexus-settings exports nexusSettingsRoutes | `git show branch:server/src/routes/nexus-settings.ts \| grep "export function"` | export function nexusSettingsRoutes(): Router | PASS |
| encodeWav exports function with RIFF header | `git show branch:ui/src/lib/encodeWav.ts \| grep "RIFF"` | writeString(view, 0, "RIFF") | PASS |
| ChatInput has zero VoiceRecordButton references | `git show branch:ui/src/components/ChatInput.tsx \| grep VoiceRecordButton \| wc -l` | 0 | PASS |
| VAD worklet matches installed package | size check: 2480 bytes in branch == 2480 bytes in node_modules | Exact match | PASS |
| COOP/COEP on Express | `git show branch:server/src/app.ts \| grep Cross-Origin` | both headers present | PASS |
| COOP/COEP on Vite | `git show branch:ui/vite.config.ts \| grep Cross-Origin` | both headers present | PASS |
| Browser-dependent audio flows | cannot test without browser runtime | N/A | SKIP |

Requirements Coverage

| Requirement | Source Plans | Description | Status | Evidence |
|-------------|--------------|-------------|--------|----------|
| WCHAT-01 | 37-01, 37-02, 37-04 | Mic button with idle/recording/processing states | SATISFIED | VoiceMicButton.tsx three conditional renders with correct icons and aria-labels |
| WCHAT-02 | 37-01, 37-02, 37-04 | Recording auto-stops on silence via VAD | SATISFIED | useMicVAD onSpeechEnd callback in useVadRecorder.ts |
| WCHAT-03 | 37-02 | Real-time waveform/amplitude visualization | SATISFIED | VoiceWaveform.tsx canvas with AnalyserNode and requestAnimationFrame loop |
| WCHAT-04 | 37-01, 37-03, 37-04 | Voice response audio plays inline with player controls | SATISFIED | ChatVoicePlayer.tsx with play/pause buttons and hidden `<audio>` element |
| WCHAT-05 | 37-02, 37-03, 37-04 | User can toggle voice mode (text/voice_input/full_voice) | SATISFIED | VoiceModeToggle.tsx three pills wired to useVoiceMode which PATCHes nexus-settings |
| WCHAT-06 | 37-03, 37-04 | Auto-play of voice responses is configurable | SATISFIED | nexus:voice:autoplay localStorage key in VoiceModeToggle checkbox; read in ChatMessage |

All 6 WCHAT requirements are satisfied. No orphaned requirements found.
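
As an illustration of WCHAT-06, the sketch below reads and writes the nexus:voice:autoplay key through a storage-like interface. Only the key name comes from this report. The stored representation (the strings "true" and "false"), the default of false when the key is absent, and the helper names are assumptions; the actual VoiceModeToggle.tsx and ChatMessage.tsx code may encode the value differently.

```typescript
// Structural subset of the browser Storage interface, so the sketch can be
// exercised without a DOM. window.localStorage satisfies this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const AUTOPLAY_KEY = "nexus:voice:autoplay"; // key name taken from the report

// ASSUMPTION: the preference is stored as "true"/"false" and defaults to off.
function readAutoPlay(store: KeyValueStore): boolean {
  return store.getItem(AUTOPLAY_KEY) === "true";
}

function writeAutoPlay(store: KeyValueStore, enabled: boolean): void {
  store.setItem(AUTOPLAY_KEY, String(enabled));
}
```

In the browser, `readAutoPlay(window.localStorage)` would give the value a message component could pass down as an autoPlay prop.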


Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| ui/src/components/ChatPanel.tsx | 231 | startStream(newContent, activeAgentId ?? undefined) is missing the voiceMode argument | Warning | In-place message edit (no subsequent messages branch) will not send voiceMode to server; voice formatting won't apply for that edit path |
| ui/src/hooks/useVadRecorder.ts | 96 | mediaStream: mediaStreamRef.current returned from hook (ref, not state) | Info | Not a stub. The state change on setState("recording") triggers a re-render which reads the ref value, which was set before setState in start(). Works in practice due to synchronous ref update before async setState. No runtime issue expected. |

Human Verification Required

1. Waveform Animation During Recording

Test: Open web chat, click the mic button (should show Mic icon), observe the recording state.
Expected: The mic button shows a canvas with 20 animated vertical bars that pulse with voice amplitude, surrounded by a blue/primary ring.
Why human: Web Audio API AnalyserNode + canvas animation requires browser runtime with a real microphone.

2. VAD Auto-Stop Triggers Transcription

Test: Click mic, speak a complete sentence, then be silent for ~1 second.
Expected: Recording stops automatically (no manual stop needed), the spinner appears briefly, then the transcribed text populates the chat input field.
Why human: Requires the Silero VAD ONNX model running in an AudioWorklet with real microphone input; not testable statically.

3. Voice Full Response Auto-Play and Collapsible

Test: Enable "Full Voice" mode via VoiceModeToggle, enable the "Auto-play voice responses" checkbox, send a message to an agent, wait for the response.
Expected: The agent response shows a "Voice" badge, the spoken text, an audio player that starts playing automatically, and a "Show full response" link that expands to show full markdown.
Why human: Requires the TTS binary (piper) running, browser audio playback, and the synthesize endpoint returning real audio; not testable statically.

4. VoiceModeToggle Persistence Across Refresh

Test: Open chat, click "Full Voice" pill, refresh the page, reopen the same chat.
Expected: The VoiceModeToggle shows "Full Voice" still selected.
Why human: Requires a running server with nexus-settings PATCH persisting to disk and a page reload to confirm GET retrieves the saved value.


Gaps Summary

One minor gap was found: the in-place message edit path (handleEdit else branch, ChatPanel.tsx line 231) calls startStream without passing voiceMode. The four other call sites (handleSend online, handleSend offline, handleEdit with branching, handleRetry) all correctly pass voiceMode. This means a user who edits a message in-place while in Voice mode will not have their voice mode forwarded to the server for that specific re-stream, so the server will default to text mode for that response. The impact is minor — it only affects the edge case of in-place edits (not the primary send flow), and the server defaults gracefully to text mode.

The fix is one line: change startStream(newContent, activeAgentId ?? undefined) to startStream(newContent, activeAgentId ?? undefined, voiceMode) at ChatPanel.tsx line 231.
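
The one-line fix can be sketched in isolation as follows. The `startStream(userMessage, agentId?, voiceMode?)` signature and the call expression come from this report; the wrapper function `restreamInPlaceEdit`, the `StartStream` type alias, and typing `VoiceMode` as a plain string are placeholders for illustration, since the surrounding handleEdit code is not reproduced here.

```typescript
// PLACEHOLDER: the real VoiceMode is a union derived from VOICE_MODES.
type VoiceMode = string;

// Signature as reported for useStreamingChat.ts.
type StartStream = (userMessage: string, agentId?: string, voiceMode?: VoiceMode) => void;

// HYPOTHETICAL stand-in for the else branch of handleEdit (in-place edit).
function restreamInPlaceEdit(
  startStream: StartStream,
  newContent: string,
  activeAgentId: string | null,
  voiceMode: VoiceMode,
): void {
  // Before: startStream(newContent, activeAgentId ?? undefined)
  // After: forward voiceMode like the other four call sites do.
  startStream(newContent, activeAgentId ?? undefined, voiceMode);
}
```

Because the third parameter is optional, the omission type-checks cleanly, which is likely how it slipped through. A unit test that asserts the third argument on every call path would catch a regression.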


Verified: 2026-04-03T12:00:00Z
Verifier: Claude (gsd-verifier)