fix(37): pass voiceMode in ChatPanel handleEdit path + add verification

2026-04-04 02:52:55 +00:00 · 2026-04-04 02:52:55 +00:00 · b32e8029c0
commit b32e8029c0
parent c294277b84
2 changed files with 201 additions and 1 deletion
--- a/.planning/phases/37-web-chat-voice-ui/37-VERIFICATION.md
+++ b/.planning/phases/37-web-chat-voice-ui/37-VERIFICATION.md
@@ -0,0 +1,200 @@
+---
+phase: 37-web-chat-voice-ui
+verified: 2026-04-03T12:00:00Z
+status: gaps_found
+score: 7/8 must-haves verified
+re_verification: false
+gaps:
+  - truth: "Full voice flow works end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio"
+    status: partial
+    reason: "One of five startStream call sites (in-place edit path in handleEdit) is missing the voiceMode argument. When a user edits a message in place (no subsequent messages), the stream is initiated without voiceMode, so voice mode is not sent to the server for that specific interaction path."
+    artifacts:
+      - path: "ui/src/components/ChatPanel.tsx"
+        issue: "Line 231: startStream(newContent, activeAgentId ?? undefined) — missing third voiceMode argument in the else branch of handleEdit (in-place edit, no branching)"
+    missing:
+      - "Pass voiceMode as third argument: startStream(newContent, activeAgentId ?? undefined, voiceMode) in the in-place edit branch of handleEdit"
+human_verification:
+  - test: "Verify waveform renders during recording"
+    expected: "After clicking the mic button, an animated canvas with vertical bars appears inside the button ring during recording"
+    why_human: "VoiceWaveform reads from Web Audio API AnalyserNode in a requestAnimationFrame loop — cannot verify animated canvas output programmatically"
+  - test: "Verify VAD auto-stop triggers transcription"
+    expected: "After speaking and then being silent for ~1 second, recording stops automatically, the mic button shows a spinner, and the transcribed text appears in the input field"
+    why_human: "Requires actual microphone input and silence detection from Silero VAD ONNX model running in an AudioWorklet — not testable without a browser"
+  - test: "Verify voice_full response plays audio automatically"
+    expected: "In Full Voice mode, after receiving an assistant response, the audio player auto-plays and the spoken text is shown above a collapsible 'Show full response' section"
+    why_human: "Requires TTS (synthesize endpoint backed by the piper binary), browser audio playback, and the localStorage autoplay toggle; not testable statically"
+  - test: "Verify VoiceModeToggle persists across page refresh"
+    expected: "Selecting 'Full Voice' pill, refreshing the page, and re-opening chat shows 'Full Voice' still selected"
+    why_human: "Requires server-side nexus-settings PATCH round-trip and page reload — not testable statically"
+---
+
+# Phase 37: Web Chat Voice UI Verification Report
+
+**Phase Goal:** Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting
+
+**Verified:** 2026-04-03T12:00:00Z
+**Status:** gaps_found (1 minor gap + 4 human verification items)
+**Re-verification:** No — initial verification
+
+**Note on worktrees:** All phase 37 code was committed to the `gsd/phase-37-web-chat-voice-ui` branch, not the current worktree branch. All verification was performed against `gsd/phase-37-web-chat-voice-ui` via `git show` and `git cat-file`. The branch exists and contains all phase commits up through `c294277b` (docs(37-04)).
+
+---
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | POST /api/transcribe accepts audio upload and returns { text } | VERIFIED | `server/src/routes/voice.ts` uses multer memoryStorage, calls `voicePipelineService().transcribe()`, returns `res.json(result)` where result is `{ text: string; language?: string }` |
+| 2 | POST /api/synthesize accepts { text } and returns audio/wav | VERIFIED | `server/src/routes/voice.ts` calls `voicePipelineService().synthesize(text, voiceId)`, sends buffer with `Content-Type: audio/wav` |
+| 3 | Recording auto-stops on silence via VAD onSpeechEnd callback | VERIFIED | `useVadRecorder.ts` uses `useMicVAD` with `startOnLoad: false`, `onSpeechEnd: handleSpeechEnd` — handler calls `vad.pause()`, POSTs WAV to `/api/transcribe`, calls `opts.onTranscript` |
+| 4 | Live waveform canvas renders animated bars during recording | VERIFIED | `VoiceWaveform.tsx` uses `canvas`, `createAnalyser()` with `fftSize=64`, `getByteFrequencyData()`, `requestAnimationFrame` loop — wired into `VoiceMicButton` recording state |
+| 5 | Voice response audio plays inline with play/pause and auto-play | VERIFIED | `ChatVoicePlayer.tsx` POSTs to `/api/synthesize`, creates object URL, renders `<audio>` element with play/pause controls; auto-play triggers when `autoPlay=true` and `audioUrl` is set |
+| 6 | VoiceModeToggle presents Text/Voice In/Full Voice and persists via nexus-settings | VERIFIED | `VoiceModeToggle.tsx` renders three pills via `useVoiceMode()` which GETs/PATCHes `/api/nexus/settings`; `nexus-settings.ts` service/route implement GET+PATCH |
+| 7 | Auto-play preference stored in localStorage under nexus:voice:autoplay | VERIFIED | `VoiceModeToggle.tsx` reads/writes `localStorage.getItem("nexus:voice:autoplay")`; `ChatMessage.tsx` reads it at render time |
+| 8 | Full voice flow end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio | PARTIAL | 4 of 5 `startStream` call sites pass `voiceMode`; the in-place edit branch of `handleEdit` (line 231, `ChatPanel.tsx`) calls `startStream(newContent, activeAgentId ?? undefined)` — missing `voiceMode` argument |
+
+**Score:** 7/8 truths verified (1 partial)
+
+---
+
+### Required Artifacts
+
+| Artifact | Status | Evidence |
+|----------|--------|----------|
+| `server/src/routes/voice.ts` | VERIFIED | Exports `voiceRoutes()` with POST /transcribe and POST /synthesize wired to `voicePipelineService` |
+| `server/src/routes/nexus-settings.ts` | VERIFIED | Exports `nexusSettingsRoutes()` with GET/PATCH /nexus/settings |
+| `server/src/services/nexus-settings.ts` | VERIFIED | Exports `VOICE_MODES`, `VoiceMode`, `nexusSettingsService`; schema includes `voiceMode: z.enum(VOICE_MODES).default("text")` |
+| `ui/public/vad.worklet.bundle.min.js` | VERIFIED | 2480 bytes, identical in size to the installed `@ricky0123/vad-web@0.0.30` dist file (not a placeholder) |
+| `ui/public/silero_vad_legacy.onnx` | VERIFIED | 1,807,522 bytes — real ONNX model |
+| `ui/public/silero_vad_v5.onnx` | VERIFIED | 2,327,524 bytes — real ONNX model |
+| `ui/src/lib/encodeWav.ts` | VERIFIED | Exports `encodeWav(samples: Float32Array, sampleRate?)` with standard RIFF/WAVE 44-byte header, PCM 16-bit mono |
+| `ui/src/hooks/useVadRecorder.ts` | VERIFIED | Uses `useMicVAD` with `startOnLoad: false`, `baseAssetPath: "/"`, `onnxWASMBasePath: "/"`, POSTs to `/api/transcribe` |
+| `ui/src/hooks/useVoiceMode.ts` | VERIFIED | GETs and PATCHes `/api/nexus/settings` for voiceMode |
+| `ui/src/components/VoiceWaveform.tsx` | VERIFIED | Canvas + AnalyserNode + requestAnimationFrame + getByteFrequencyData |
+| `ui/src/components/VoiceMicButton.tsx` | VERIFIED | Three visual states (idle/recording/processing) with correct aria-labels, ring-2 ring-primary on recording, Loader2 animate-spin on processing |
+| `ui/src/components/ChatVoicePlayer.tsx` | VERIFIED | POSTs to /api/synthesize, creates object URL, native `<audio>` element, play/pause controls, auto-play logic, blob URL cleanup |
+| `ui/src/components/ChatVoiceBadge.tsx` | VERIFIED | Parses SPOKEN/DETAILED format, renders Badge "Voice", ChatVoicePlayer for voice_full, Collapsible "Show/Hide full response" |
+| `ui/src/components/VoiceModeToggle.tsx` | VERIFIED | Three pills (Text/Voice In/Full Voice), `role="group"`, `aria-label="Voice mode"`, bg-primary active / bg-muted inactive, nexus:voice:autoplay localStorage |
+| `ui/src/components/ChatInput.tsx` | VERIFIED | VoiceMicButton replaces VoiceRecordButton (zero VoiceRecordButton references remaining); VoiceModeToggle rendered above form |
+| `ui/src/components/ChatMessage.tsx` | VERIFIED | ChatVoiceBadge dispatched for voice_input and voice_full messageTypes; reads nexus:voice:autoplay from localStorage |
+| `ui/src/components/ChatPanel.tsx` | PARTIAL | useVoiceMode imported and called; voiceMode passed to 4 of 5 startStream call sites — missing in in-place edit path |
+| `ui/src/hooks/useStreamingChat.ts` | VERIFIED | startStream signature: `(userMessage, agentId?, voiceMode?)` — voiceMode forwarded to chatApi |
+| `ui/src/api/chat.ts` | VERIFIED | postMessageAndStream data parameter typed as `{ content; agentId?; voiceMode? }` |
+
+---
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `server/src/app.ts` | `server/src/routes/voice.ts` | `api.use(voiceRoutes())` | WIRED | `voiceRoutes` appears twice in app.ts (import + mount) |
+| `server/src/app.ts` | `server/src/routes/nexus-settings.ts` | `api.use(nexusSettingsRoutes())` | WIRED | `nexusSettingsRoutes` appears twice in app.ts |
+| `server/src/app.ts` | COOP/COEP middleware | `res.setHeader` before routes | WIRED | `Cross-Origin-Opener-Policy: same-origin` + `Cross-Origin-Embedder-Policy: require-corp` |
+| `server/src/routes/chat.ts` | voiceMode parameter | destructured from req.body | WIRED | `const { content, agentId, voiceMode } = req.body` with union type |
+| `ui/vite.config.ts` | COOP/COEP dev headers | `server.headers` config | WIRED | Both headers set in dev server config |
+| `ui/src/components/VoiceMicButton.tsx` | `useVadRecorder.ts` | `useVadRecorder()` hook | WIRED | Imports and calls the hook, destructures state/start/stop/mediaStream |
+| `ui/src/hooks/useVadRecorder.ts` | `ui/src/lib/encodeWav.ts` | `encodeWav(audio)` in onSpeechEnd | WIRED | Import verified, called in handleSpeechEnd |
+| `ui/src/hooks/useVadRecorder.ts` | `/api/transcribe` | `fetch POST with FormData` | WIRED | `fetch("/api/transcribe", { method: "POST", credentials: "include", body: formData })` |
+| `ui/src/components/VoiceMicButton.tsx` | `VoiceWaveform.tsx` | `<VoiceWaveform stream={mediaStream} active={true} />` | WIRED | Rendered inside recording state conditional |
+| `ui/src/components/ChatVoicePlayer.tsx` | `/api/synthesize` | `fetch POST` | WIRED | `fetch("/api/synthesize", { method: "POST" ... })` in useEffect |
+| `ui/src/components/ChatVoiceBadge.tsx` | shadcn Collapsible | `Collapsible/CollapsibleContent/CollapsibleTrigger` | WIRED | All three imported and used |
+| `ui/src/components/VoiceModeToggle.tsx` | `useVoiceMode.ts` | `useVoiceMode()` | WIRED | Imported from `@/hooks/useVoiceMode`, destructures mode/setMode/isLoading |
+| `ui/src/components/ChatPanel.tsx` | `useVoiceMode.ts` | `useVoiceMode()` | WIRED | `const { mode: voiceMode } = useVoiceMode()` |
+| `ui/src/components/ChatPanel.tsx` | `useStreamingChat.ts` | `startStream(content, agentId, voiceMode)` | PARTIAL | 4/5 call sites pass voiceMode; in-place edit path at line 231 does not |
+| `ui/src/hooks/useStreamingChat.ts` | `ui/src/api/chat.ts` | `chatApi.postMessageAndStream with voiceMode` | WIRED | `{ content: userMessage, agentId, voiceMode }` passed as data |
+| `ui/src/components/ChatInput.tsx` | `VoiceMicButton.tsx` | replaces VoiceRecordButton | WIRED | VoiceRecordButton has 0 occurrences; VoiceMicButton imported and rendered |
+| `ui/src/components/ChatMessage.tsx` | `ChatVoiceBadge.tsx` | voice messageType dispatch | WIRED | `if (messageType === "voice_input" \|\| messageType === "voice_full")` dispatches to ChatVoiceBadge |
+
+---
+
+### Data-Flow Trace (Level 4)
+
+| Artifact | Data Variable | Source | Produces Real Data | Status |
+|----------|---------------|--------|--------------------|--------|
+| `ChatVoicePlayer.tsx` | `audioUrl` (blob URL) | POST /api/synthesize -> voicePipelineService().synthesize() -> piper/TTS binary | Binary audio buffer from real TTS service | FLOWING |
+| `useVadRecorder.ts` | transcript text | POST /api/transcribe -> voicePipelineService().transcribe() -> whisper-cpp | Real transcription from audio buffer | FLOWING |
+| `useVoiceMode.ts` | `mode` | GET /api/nexus/settings -> nexusSettingsService().get() -> file-backed settings | Real persisted settings value | FLOWING |
+| `ChatMessage.tsx` | `autoPlay` | `localStorage.getItem("nexus:voice:autoplay")` | Real localStorage value | FLOWING |
+| `VoiceWaveform.tsx` | `dataArray` (Uint8Array) | Web Audio API AnalyserNode from microphone MediaStream | Real frequency data from mic | FLOWING (browser only) |
+
+---
+
+### Behavioral Spot-Checks
+
+| Behavior | Command | Result | Status |
+|----------|---------|--------|--------|
+| voice.ts exports voiceRoutes | `git show branch:server/src/routes/voice.ts \| grep "export function voiceRoutes"` | `export function voiceRoutes(): Router` | PASS |
+| nexus-settings exports nexusSettingsRoutes | `git show branch:server/src/routes/nexus-settings.ts \| grep "export function"` | `export function nexusSettingsRoutes(): Router` | PASS |
+| encodeWav exports function with RIFF header | `git show branch:ui/src/lib/encodeWav.ts \| grep "RIFF"` | `writeString(view, 0, "RIFF")` | PASS |
+| ChatInput has zero VoiceRecordButton references | `git show branch:ui/src/components/ChatInput.tsx \| grep VoiceRecordButton \| wc -l` | `0` | PASS |
+| VAD worklet matches installed package | size check: 2480 bytes in branch == 2480 bytes in node_modules | Exact size match | PASS |
+| COOP/COEP on Express | `git show branch:server/src/app.ts \| grep Cross-Origin` | both headers present | PASS |
+| COOP/COEP on Vite | `git show branch:ui/vite.config.ts \| grep Cross-Origin` | both headers present | PASS |
+| Browser-dependent audio flows | cannot test without browser runtime | N/A | SKIP |
+
+---
+
+### Requirements Coverage
+
+| Requirement | Source Plans | Description | Status | Evidence |
+|-------------|-------------|-------------|--------|----------|
+| WCHAT-01 | 37-01, 37-02, 37-04 | Mic button with idle/recording/processing states | SATISFIED | `VoiceMicButton.tsx` three conditional renders with correct icons and aria-labels |
+| WCHAT-02 | 37-01, 37-02, 37-04 | Recording auto-stops on silence via VAD | SATISFIED | `useMicVAD` onSpeechEnd callback in `useVadRecorder.ts` |
+| WCHAT-03 | 37-02 | Real-time waveform/amplitude visualization | SATISFIED | `VoiceWaveform.tsx` canvas with AnalyserNode and requestAnimationFrame loop |
+| WCHAT-04 | 37-01, 37-03, 37-04 | Voice response audio plays inline with player controls | SATISFIED | `ChatVoicePlayer.tsx` with play/pause buttons and hidden `<audio>` element |
+| WCHAT-05 | 37-02, 37-03, 37-04 | User can toggle voice mode (text/voice_input/full_voice) | SATISFIED | `VoiceModeToggle.tsx` three pills wired to `useVoiceMode` which PATCHes nexus-settings |
+| WCHAT-06 | 37-03, 37-04 | Auto-play of voice responses is configurable | SATISFIED | `nexus:voice:autoplay` localStorage key in VoiceModeToggle checkbox; read in ChatMessage |
+
+All 6 WCHAT requirements are satisfied. No orphaned requirements found.
+
+---
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| `ui/src/components/ChatPanel.tsx` | 231 | `startStream(newContent, activeAgentId ?? undefined)` — missing voiceMode argument | Warning | In-place message edit (no subsequent messages branch) will not send voiceMode to server; voice formatting won't apply for that edit path |
+| `ui/src/hooks/useVadRecorder.ts` | 96 | `mediaStream: mediaStreamRef.current` returned from hook (ref, not state) | Info | Not a stub: `start()` assigns the ref before calling `setState("recording")`, so the re-render triggered by that state update reads the already-populated ref. No runtime issue expected. |
+
+---
+
+### Human Verification Required
+
+#### 1. Waveform Animation During Recording
+
+**Test:** Open web chat, click the mic button (should show Mic icon), observe the recording state.
+**Expected:** The mic button shows a canvas with 20 animated vertical bars that pulse with voice amplitude, surrounded by a blue/primary ring.
+**Why human:** Web Audio API AnalyserNode + canvas animation requires browser runtime with a real microphone.
+
+#### 2. VAD Auto-Stop Triggers Transcription
+
+**Test:** Click mic, speak a complete sentence, then be silent for ~1 second.
+**Expected:** Recording stops automatically (no manual stop needed), the spinner appears briefly, then the transcribed text populates the chat input field.
+**Why human:** Requires Silero VAD ONNX model running in an AudioWorklet with real microphone input — not testable statically.
+
+#### 3. Voice Full Response Auto-Play and Collapsible
+
+**Test:** Enable "Full Voice" mode via VoiceModeToggle, enable "Auto-play voice responses" checkbox, send a message to an agent, wait for response.
+**Expected:** The agent response shows a "Voice" badge, the spoken text, an audio player that starts playing automatically, and a "Show full response" link that expands to show full markdown.
+**Why human:** Requires the piper TTS binary running, browser audio playback, and the synthesize endpoint returning real audio; not testable statically.
+
+#### 4. VoiceModeToggle Persistence Across Refresh
+
+**Test:** Open chat, click "Full Voice" pill, refresh the page, reopen the same chat.
+**Expected:** The VoiceModeToggle shows "Full Voice" still selected.
+**Why human:** Requires a running server with nexus-settings PATCH persisting to disk and a page reload to confirm GET retrieves the saved value.
+
+---
+
+### Gaps Summary
+
+One minor gap was found: the in-place message edit path (`handleEdit` else branch, `ChatPanel.tsx` line 231) calls `startStream` without passing `voiceMode`. The four other call sites (handleSend online, handleSend offline, handleEdit with branching, handleRetry) all pass `voiceMode` correctly. As a result, when a user in Voice In or Full Voice mode edits a message in place, the re-stream does not forward the voice mode, and the server falls back to text mode for that response. The impact is minor: only the in-place edit edge case is affected, not the primary send flow, and the text-mode fallback is graceful.
+
+The fix is one line: change `startStream(newContent, activeAgentId ?? undefined)` to `startStream(newContent, activeAgentId ?? undefined, voiceMode)` at `ChatPanel.tsx` line 231.
+
+---
+
+_Verified: 2026-04-03T12:00:00Z_
+_Verifier: Claude (gsd-verifier)_
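
The report describes `encodeWav` (`ui/src/lib/encodeWav.ts`) as writing a standard 44-byte RIFF/WAVE header followed by 16-bit mono PCM. The following is a minimal TypeScript sketch of that layout, not the file's actual source. The 16000 Hz default is an assumption, chosen to match the 16 kHz `Float32Array` that `@ricky0123/vad-web` passes to `onSpeechEnd`; the clamping and asymmetric scaling are illustrative choices.

```typescript
// Minimal sketch of a 16-bit mono PCM WAV encoder (not the project's encodeWav source).
// Assumptions: input samples are Float32 in [-1, 1]; default sample rate is 16 kHz.
function encodeWav(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const numChannels = 1; // mono
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + PCM payload
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  // RIFF chunk descriptor
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total file size minus the first 8 bytes
  writeString(8, "WAVE");

  // "fmt " sub-chunk (16 bytes for PCM)
  writeString(12, "fmt ");
  view.setUint32(16, 16, true); // sub-chunk size
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample

  // "data" sub-chunk header
  writeString(36, "data");
  view.setUint32(40, dataSize, true);

  // Convert each float sample to a little-endian signed 16-bit integer.
  let offset = 44;
  for (let i = 0; i < samples.length; i++, offset += 2) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

A caller would wrap the result in `new Blob([buffer], { type: "audio/wav" })` and append it to a `FormData` before POSTing to `/api/transcribe`; the multipart field name that the route's multer configuration expects is not stated in the report.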
--- a/ui/src/components/ChatPanel.tsx
+++ b/ui/src/components/ChatPanel.tsx
@@ -228,7 +228,7 @@ export function ChatPanel() {
      await chatApi.truncateMessagesAfter(activeConversationId, messageId);
      queryClient.invalidateQueries({ queryKey: ["chat", "messages", activeConversationId] });
      queryClient.invalidateQueries({ queryKey: ["chat", "search"] });
-      startStream(newContent, activeAgentId ?? undefined);
+      startStream(newContent, activeAgentId ?? undefined, voiceMode);
    }
  };
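
The gap survived type-checking because, per the verification report, `startStream` is declared as `(userMessage, agentId?, voiceMode?)`: omitting a trailing optional argument is legal TypeScript, so the in-place edit call compiled while silently sending no voice mode. One hypothetical hardening is to pass an options object whose `voiceMode` key is required but may be `undefined`, so every call site must state its intent. The sketch uses stand-in types and a hypothetical `buildStreamRequest` helper; it is not the real `useStreamingChat` code.

```typescript
// Stand-in for the VoiceMode type exported by server/src/services/nexus-settings.ts.
type VoiceMode = string;

interface StreamOptions {
  agentId?: string;
  // Required key: callers must pass a mode or an explicit `undefined`,
  // so forgetting it is a compile-time error instead of a silent text-mode fallback.
  voiceMode: VoiceMode | undefined;
}

// Shape of the data handed to chatApi.postMessageAndStream, per the report.
interface StreamRequest {
  content: string;
  agentId?: string;
  voiceMode?: VoiceMode;
}

// Hypothetical helper: builds the request body, dropping keys that are undefined.
function buildStreamRequest(userMessage: string, opts: StreamOptions): StreamRequest {
  const body: StreamRequest = { content: userMessage };
  if (opts.agentId !== undefined) body.agentId = opts.agentId;
  if (opts.voiceMode !== undefined) body.voiceMode = opts.voiceMode;
  return body;
}

// The in-place edit path would have to spell out the mode:
//   buildStreamRequest(newContent, { agentId: activeAgentId ?? undefined, voiceMode });
// Leaving `voiceMode` out of that object literal fails to type-check.
```

An options object is used because TypeScript does not allow a required parameter to follow the optional `agentId?`; the trade-off is that all five `startStream` call sites in `ChatPanel.tsx` would need updating.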