| phase | verified | status | score | re_verification | gaps | human_verification |
|---|---|---|---|---|---|---|
| 37-web-chat-voice-ui | 2026-04-03T12:00:00Z | gaps_found | 7/8 must-haves verified | false | | |
Phase 37: Web Chat Voice UI Verification Report
Phase Goal: Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting
Verified: 2026-04-03T12:00:00Z Status: gaps_found (1 minor gap + 4 human verification items) Re-verification: No — initial verification
Note on worktrees: All phase 37 code was committed to the gsd/phase-37-web-chat-voice-ui branch, not the current worktree branch. All verification was performed against gsd/phase-37-web-chat-voice-ui via git show and git cat-file. The branch exists and contains all phase commits up through c294277b (docs(37-04)).
Goal Achievement
Observable Truths
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | POST /api/transcribe accepts audio upload and returns { text } | VERIFIED | server/src/routes/voice.ts uses multer memoryStorage, calls voicePipelineService().transcribe(), returns res.json(result) where result is { text: string; language?: string } |
| 2 | POST /api/synthesize accepts { text } and returns audio/wav | VERIFIED | server/src/routes/voice.ts calls voicePipelineService().synthesize(text, voiceId), sends buffer with Content-Type: audio/wav |
| 3 | Recording auto-stops on silence via VAD onSpeechEnd callback | VERIFIED | useVadRecorder.ts uses useMicVAD with startOnLoad: false, onSpeechEnd: handleSpeechEnd — handler calls vad.pause(), POSTs WAV to /api/transcribe, calls opts.onTranscript |
| 4 | Live waveform canvas renders animated bars during recording | VERIFIED | VoiceWaveform.tsx uses canvas, createAnalyser() with fftSize=64, getByteFrequencyData(), requestAnimationFrame loop — wired into VoiceMicButton recording state |
| 5 | Voice response audio plays inline with play/pause and auto-play | VERIFIED | ChatVoicePlayer.tsx POSTs to /api/synthesize, creates object URL, renders <audio> element with play/pause controls; auto-play triggers when autoPlay=true and audioUrl is set |
| 6 | VoiceModeToggle presents Text/Voice In/Full Voice and persists via nexus-settings | VERIFIED | VoiceModeToggle.tsx renders three pills via useVoiceMode() which GETs/PATCHes /api/nexus/settings; nexus-settings.ts service/route implement GET+PATCH |
| 7 | Auto-play preference stored in localStorage under nexus:voice:autoplay | VERIFIED | VoiceModeToggle.tsx reads/writes localStorage.getItem("nexus:voice:autoplay"); ChatMessage.tsx reads it at render time |
| 8 | Full voice flow end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio | PARTIAL | 4 of 5 startStream call sites pass voiceMode; the in-place edit branch of handleEdit (line 231, ChatPanel.tsx) calls startStream(newContent, activeAgentId ?? undefined) — missing voiceMode argument |
Score: 7/8 truths verified (1 partial)
Required Artifacts
| Artifact | Status | Evidence |
|---|---|---|
| server/src/routes/voice.ts | VERIFIED | Exports voiceRoutes() with POST /transcribe and POST /synthesize wired to voicePipelineService |
| server/src/routes/nexus-settings.ts | VERIFIED | Exports nexusSettingsRoutes() with GET/PATCH /nexus/settings |
| server/src/services/nexus-settings.ts | VERIFIED | Exports VOICE_MODES, VoiceMode, nexusSettingsService; schema includes voiceMode: z.enum(VOICE_MODES).default("text") |
| ui/public/vad.worklet.bundle.min.js | VERIFIED | 2480 bytes; matches the installed @ricky0123/vad-web@0.0.30 dist file exactly (correct, not a placeholder) |
| ui/public/silero_vad_legacy.onnx | VERIFIED | 1,807,522 bytes, real ONNX model |
| ui/public/silero_vad_v5.onnx | VERIFIED | 2,327,524 bytes, real ONNX model |
| ui/src/lib/encodeWav.ts | VERIFIED | Exports encodeWav(samples: Float32Array, sampleRate?) with standard RIFF/WAVE 44-byte header, PCM 16-bit mono |
| ui/src/hooks/useVadRecorder.ts | VERIFIED | Uses useMicVAD with startOnLoad: false, baseAssetPath: "/", onnxWASMBasePath: "/", POSTs to /api/transcribe |
| ui/src/hooks/useVoiceMode.ts | VERIFIED | GETs and PATCHes /api/nexus/settings for voiceMode |
| ui/src/components/VoiceWaveform.tsx | VERIFIED | Canvas + AnalyserNode + requestAnimationFrame + getByteFrequencyData |
| ui/src/components/VoiceMicButton.tsx | VERIFIED | Three visual states (idle/recording/processing) with correct aria-labels, ring-2 ring-primary on recording, Loader2 animate-spin on processing |
| ui/src/components/ChatVoicePlayer.tsx | VERIFIED | POSTs to /api/synthesize, creates object URL, native <audio> element, play/pause controls, auto-play logic, blob URL cleanup |
| ui/src/components/ChatVoiceBadge.tsx | VERIFIED | Parses SPOKEN/DETAILED format, renders Badge "Voice", ChatVoicePlayer for voice_full, Collapsible "Show/Hide full response" |
| ui/src/components/VoiceModeToggle.tsx | VERIFIED | Three pills (Text/Voice In/Full Voice), role="group", aria-label="Voice mode", bg-primary active / bg-muted inactive, nexus:voice:autoplay localStorage |
| ui/src/components/ChatInput.tsx | VERIFIED | VoiceMicButton replaces VoiceRecordButton (zero VoiceRecordButton references remaining); VoiceModeToggle rendered above form |
| ui/src/components/ChatMessage.tsx | VERIFIED | ChatVoiceBadge dispatched for voice_input and voice_full messageTypes; reads nexus:voice:autoplay from localStorage |
| ui/src/components/ChatPanel.tsx | PARTIAL | useVoiceMode imported and called; voiceMode passed to 4 of 5 startStream call sites; missing in the in-place edit path |
| ui/src/hooks/useStreamingChat.ts | VERIFIED | startStream signature: (userMessage, agentId?, voiceMode?); voiceMode forwarded to chatApi |
| ui/src/api/chat.ts | VERIFIED | postMessageAndStream data parameter typed as { content; agentId?; voiceMode? } |
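
For readers unfamiliar with the format noted for encodeWav.ts, the following is a minimal sketch of a standard 44-byte RIFF/WAVE header followed by 16-bit mono PCM samples. It is not the code on the branch: the name encodeWavSketch, the 16000 Hz default sample rate, and the clamping strategy are assumptions made for illustration. Only the writeString(view, 0, "RIFF") call shape is taken from the spot-check output in this report.

```typescript
// Illustrative sketch only; not the contents of ui/src/lib/encodeWav.ts.

// Write an ASCII tag such as "RIFF" byte by byte at the given offset.
function writeString(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// The 16000 Hz default is an assumption, not a value confirmed by the report.
function encodeWavSketch(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each float sample to [-1, 1] and scale it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

A buffer produced this way could be wrapped in a Blob with type audio/wav and appended to the FormData that is POSTed to /api/transcribe.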
Key Link Verification
| From | To | Via | Status | Details |
|---|---|---|---|---|
| server/src/app.ts | server/src/routes/voice.ts | api.use(voiceRoutes()) | WIRED | voiceRoutes appears twice in app.ts (import + mount) |
| server/src/app.ts | server/src/routes/nexus-settings.ts | api.use(nexusSettingsRoutes()) | WIRED | nexusSettingsRoutes appears twice in app.ts |
| server/src/app.ts | COOP/COEP middleware | res.setHeader before routes | WIRED | Cross-Origin-Opener-Policy: same-origin + Cross-Origin-Embedder-Policy: require-corp |
| server/src/routes/chat.ts | voiceMode parameter | destructured from req.body | WIRED | const { content, agentId, voiceMode } = req.body with union type |
| ui/vite.config.ts | COOP/COEP dev headers | server.headers config | WIRED | Both headers set in dev server config |
| ui/src/components/VoiceMicButton.tsx | useVadRecorder.ts | useVadRecorder() hook | WIRED | Imports and calls the hook, destructures state/start/stop/mediaStream |
| ui/src/hooks/useVadRecorder.ts | ui/src/lib/encodeWav.ts | encodeWav(audio) in onSpeechEnd | WIRED | Import verified, called in handleSpeechEnd |
| ui/src/hooks/useVadRecorder.ts | /api/transcribe | fetch POST with FormData | WIRED | fetch("/api/transcribe", { method: "POST", credentials: "include", body: formData }) |
| ui/src/components/VoiceMicButton.tsx | VoiceWaveform.tsx | <VoiceWaveform stream={mediaStream} active={true} /> | WIRED | Rendered inside recording state conditional |
| ui/src/components/ChatVoicePlayer.tsx | /api/synthesize | fetch POST | WIRED | fetch("/api/synthesize", { method: "POST" ... }) in useEffect |
| ui/src/components/ChatVoiceBadge.tsx | shadcn Collapsible | Collapsible/CollapsibleContent/CollapsibleTrigger | WIRED | All three imported and used |
| ui/src/components/VoiceModeToggle.tsx | useVoiceMode.ts | useVoiceMode() | WIRED | Imported from @/hooks/useVoiceMode, destructures mode/setMode/isLoading |
| ui/src/components/ChatPanel.tsx | useVoiceMode.ts | useVoiceMode() | WIRED | const { mode: voiceMode } = useVoiceMode() |
| ui/src/components/ChatPanel.tsx | useStreamingChat.ts | startStream(content, agentId, voiceMode) | PARTIAL | 4/5 call sites pass voiceMode; in-place edit path at line 231 does not |
| ui/src/hooks/useStreamingChat.ts | ui/src/api/chat.ts | chatApi.postMessageAndStream with voiceMode | WIRED | { content: userMessage, agentId, voiceMode } passed as data |
| ui/src/components/ChatInput.tsx | VoiceMicButton.tsx | replaces VoiceRecordButton | WIRED | VoiceRecordButton has 0 occurrences; VoiceMicButton imported and rendered |
| ui/src/components/ChatMessage.tsx | ChatVoiceBadge.tsx | voice messageType dispatch | WIRED | if (messageType === "voice_input" \|\| messageType === "voice_full") dispatches to ChatVoiceBadge |
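
The COOP/COEP link in the table above can be sketched as a small Express-style middleware. This is an illustration under assumptions, not the code in server/src/app.ts: the names HeaderWritable, applyCrossOriginIsolation, and crossOriginIsolation are hypothetical, and only the two header names and values come from the verified evidence. The report does not state why the headers are needed. A likely reason is that browsers expose SharedArrayBuffer only to cross-origin isolated pages, which threaded WASM runtimes rely on.

```typescript
// Illustrative sketch only; names here are hypothetical, not taken from app.ts.

// Minimal structural type so the sketch does not depend on Express typings.
interface HeaderWritable {
  setHeader(name: string, value: string): unknown;
}

// Set the two headers that opt a response into cross-origin isolation.
function applyCrossOriginIsolation(res: HeaderWritable): void {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
}

// Express-style middleware shape: (req, res, next).
function crossOriginIsolation(
  _req: unknown,
  res: HeaderWritable,
  next: () => void,
): void {
  applyCrossOriginIsolation(res);
  next();
}
```

In an Express app this would be mounted with app.use(crossOriginIsolation) before any route registration, matching the "res.setHeader before routes" ordering recorded above.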
Data-Flow Trace (Level 4)
| Artifact | Data Variable | Source | Produces Real Data | Status |
|---|---|---|---|---|
| ChatVoicePlayer.tsx | audioUrl (blob URL) | POST /api/synthesize -> voicePipelineService().synthesize() -> piper/TTS binary | Binary audio buffer from real TTS service | FLOWING |
| useVadRecorder.ts | transcript text | POST /api/transcribe -> voicePipelineService().transcribe() -> whisper-cpp | Real transcription from audio buffer | FLOWING |
| useVoiceMode.ts | mode | GET /api/nexus/settings -> nexusSettingsService().get() -> file-backed settings | Real persisted settings value | FLOWING |
| ChatMessage.tsx | autoPlay | localStorage.getItem("nexus:voice:autoplay") | Real localStorage value | FLOWING |
| VoiceWaveform.tsx | dataArray (Uint8Array) | Web Audio API AnalyserNode from microphone MediaStream | Real frequency data from mic | FLOWING (browser only) |
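
The last row, the waveform data flow, can be illustrated with the pure part of the drawing step: turning one frame of byte frequency data into bar heights. This is a sketch under assumptions and not the code in VoiceWaveform.tsx. The function name barHeights and the bin-to-bar mapping are hypothetical. The only facts taken from this report are that the data is a Uint8Array filled by getByteFrequencyData() and that fftSize is 64, which gives 32 frequency bins.

```typescript
// Illustrative sketch only; the mapping below is an assumption, not the branch code.

// Convert one frame of byte frequency data (values 0..255) into bar heights
// in pixels for a canvas of the given height.
function barHeights(data: Uint8Array, barCount: number, canvasHeight: number): number[] {
  const heights: number[] = [];
  if (data.length === 0) {
    // No analyser data yet: draw flat bars.
    for (let i = 0; i < barCount; i++) heights.push(0);
    return heights;
  }
  for (let i = 0; i < barCount; i++) {
    // Spread the bars evenly over the available bins (32 bins when fftSize = 64).
    const bin = Math.min(data.length - 1, Math.floor((i * data.length) / barCount));
    heights.push((data[bin] / 255) * canvasHeight);
  }
  return heights;
}
```

In the browser, a requestAnimationFrame callback would call analyser.getByteFrequencyData(data) on each frame, pass the array to a function like this, and draw one rectangle per returned height.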
Behavioral Spot-Checks
| Behavior | Command | Result | Status |
|---|---|---|---|
| voice.ts exports voiceRoutes | git show branch:server/src/routes/voice.ts \| grep "export function voiceRoutes" | export function voiceRoutes(): Router | PASS |
| nexus-settings exports nexusSettingsRoutes | git show branch:server/src/routes/nexus-settings.ts \| grep "export function" | export function nexusSettingsRoutes(): Router | PASS |
| encodeWav exports function with RIFF header | git show branch:ui/src/lib/encodeWav.ts \| grep "RIFF" | writeString(view, 0, "RIFF") | PASS |
| ChatInput has zero VoiceRecordButton references | git show branch:ui/src/components/ChatInput.tsx \| grep VoiceRecordButton \| wc -l | 0 | PASS |
| VAD worklet matches installed package | size check: 2480 bytes in branch == 2480 bytes in node_modules | Exact match | PASS |
| COOP/COEP on Express | git show branch:server/src/app.ts \| grep Cross-Origin | both headers present | PASS |
| COOP/COEP on Vite | git show branch:ui/vite.config.ts \| grep Cross-Origin | both headers present | PASS |
| Browser-dependent audio flows | cannot test without browser runtime | N/A | SKIP |
Requirements Coverage
| Requirement | Source Plans | Description | Status | Evidence |
|---|---|---|---|---|
| WCHAT-01 | 37-01, 37-02, 37-04 | Mic button with idle/recording/processing states | SATISFIED | VoiceMicButton.tsx three conditional renders with correct icons and aria-labels |
| WCHAT-02 | 37-01, 37-02, 37-04 | Recording auto-stops on silence via VAD | SATISFIED | useMicVAD onSpeechEnd callback in useVadRecorder.ts |
| WCHAT-03 | 37-02 | Real-time waveform/amplitude visualization | SATISFIED | VoiceWaveform.tsx canvas with AnalyserNode and requestAnimationFrame loop |
| WCHAT-04 | 37-01, 37-03, 37-04 | Voice response audio plays inline with player controls | SATISFIED | ChatVoicePlayer.tsx with play/pause buttons and hidden <audio> element |
| WCHAT-05 | 37-02, 37-03, 37-04 | User can toggle voice mode (text/voice_input/full_voice) | SATISFIED | VoiceModeToggle.tsx three pills wired to useVoiceMode which PATCHes nexus-settings |
| WCHAT-06 | 37-03, 37-04 | Auto-play of voice responses is configurable | SATISFIED | nexus:voice:autoplay localStorage key in VoiceModeToggle checkbox; read in ChatMessage |
All 6 WCHAT requirements are satisfied. No orphaned requirements found.
Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
| ui/src/components/ChatPanel.tsx | 231 | startStream(newContent, activeAgentId ?? undefined) (missing voiceMode argument) | Warning | In-place message edit (no subsequent messages branch) will not send voiceMode to server; voice formatting won't apply for that edit path |
| ui/src/hooks/useVadRecorder.ts | 96 | mediaStream: mediaStreamRef.current returned from hook (ref, not state) | Info | Not a stub. The state change on setState("recording") triggers a re-render that reads the ref value, which start() sets before calling setState. This works in practice because the ref update is synchronous and precedes the asynchronous setState. No runtime issue expected. |
Human Verification Required
1. Waveform Animation During Recording
Test: Open web chat, click the mic button (should show Mic icon), observe the recording state. Expected: The mic button shows a canvas with 20 animated vertical bars that pulse with voice amplitude, surrounded by a blue/primary ring. Why human: Web Audio API AnalyserNode + canvas animation requires browser runtime with a real microphone.
2. VAD Auto-Stop Triggers Transcription
Test: Click mic, speak a complete sentence, then be silent for ~1 second. Expected: Recording stops automatically (no manual stop needed), the spinner appears briefly, then the transcribed text populates the chat input field. Why human: Requires Silero VAD ONNX model running in an AudioWorklet with real microphone input — not testable statically.
3. Voice Full Response Auto-Play and Collapsible
Test: Enable "Full Voice" mode via VoiceModeToggle, enable "Auto-play voice responses" checkbox, send a message to an agent, wait for response. Expected: The agent response shows a "Voice" badge, the spoken text, an audio player that starts playing automatically, and a "Show full response" link that expands to show full markdown. Why human: Requires the TTS binary (piper) running, browser audio playback, and the synthesize endpoint returning real audio; not testable statically.
4. VoiceModeToggle Persistence Across Refresh
Test: Open chat, click "Full Voice" pill, refresh the page, reopen the same chat. Expected: The VoiceModeToggle shows "Full Voice" still selected. Why human: Requires a running server with nexus-settings PATCH persisting to disk and a page reload to confirm GET retrieves the saved value.
Gaps Summary
One minor gap was found: the in-place message edit path (handleEdit else branch, ChatPanel.tsx line 231) calls startStream without passing voiceMode. The four other call sites (handleSend online, handleSend offline, handleEdit with branching, handleRetry) all pass voiceMode correctly. As a result, when a user edits a message in place while a voice mode is active, that re-stream does not forward the voice mode, and the server falls back to text mode for the response. The impact is minor: it affects only the in-place edit edge case, not the primary send flow, and the server's fallback to text mode is graceful.
The fix is one line: change startStream(newContent, activeAgentId ?? undefined) to startStream(newContent, activeAgentId ?? undefined, voiceMode) at ChatPanel.tsx line 231.
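
The one-line fix can be shown in isolation as follows. This is a sketch under assumptions: the helper restreamEditedMessage and the VoiceModeValue alias are hypothetical stand-ins, not code from ChatPanel.tsx. Only the startStream argument order (userMessage, agentId?, voiceMode?) and the before/after call come from this report.

```typescript
// Illustrative sketch only; not the contents of ChatPanel.tsx.

// Stand-in for the VoiceMode type exported by the nexus-settings service.
type VoiceModeValue = string;

// Mirrors the verified startStream signature: (userMessage, agentId?, voiceMode?).
type StartStream = (
  userMessage: string,
  agentId?: string,
  voiceMode?: VoiceModeValue,
) => void;

// Hypothetical helper representing the in-place edit branch of handleEdit.
function restreamEditedMessage(
  startStream: StartStream,
  newContent: string,
  activeAgentId: string | null,
  voiceMode: VoiceModeValue,
): void {
  // Before: startStream(newContent, activeAgentId ?? undefined)
  // After: forward voiceMode, as the other four call sites do.
  startStream(newContent, activeAgentId ?? undefined, voiceMode);
}
```

Because startStream is passed in as a parameter, a unit test could supply a recording stub and assert on the third argument, which would catch this kind of omission without a browser runtime.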
Verified: 2026-04-03T12:00:00Z Verifier: Claude (gsd-verifier)