fix(37): pass voiceMode in ChatPanel handleEdit path + add verification
parent
c294277b84
commit
b32e8029c0
2 changed files with 201 additions and 1 deletions
200
.planning/phases/37-web-chat-voice-ui/37-VERIFICATION.md
Normal file
@@ -0,0 +1,200 @@
---
phase: 37-web-chat-voice-ui
verified: 2026-04-03T12:00:00Z
status: gaps_found
score: 7/8 must-haves verified
re_verification: false
gaps:
  - truth: "Full voice flow works end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio"
    status: partial
    reason: "One of five startStream call sites (in-place edit path in handleEdit) is missing the voiceMode argument. When a user edits a message in place (no subsequent messages), the stream is initiated without voiceMode, so voice mode is not sent to the server for that specific interaction path."
    artifacts:
      - path: "ui/src/components/ChatPanel.tsx"
        issue: "Line 231: startStream(newContent, activeAgentId ?? undefined) is missing the third voiceMode argument in the else branch of handleEdit (in-place edit, no branching)"
    missing:
      - "Pass voiceMode as third argument: startStream(newContent, activeAgentId ?? undefined, voiceMode) in the in-place edit branch of handleEdit"
human_verification:
  - test: "Verify waveform renders during recording"
    expected: "After clicking the mic button, an animated canvas with vertical bars appears inside the button ring during recording"
    why_human: "VoiceWaveform reads from Web Audio API AnalyserNode in a requestAnimationFrame loop; cannot verify animated canvas output programmatically"
  - test: "Verify VAD auto-stop triggers transcription"
    expected: "After speaking and then being silent for ~1 second, recording stops automatically, the mic button shows a spinner, and the transcribed text appears in the input field"
    why_human: "Requires actual microphone input and silence detection from Silero VAD ONNX model running in an AudioWorklet; not testable without a browser"
  - test: "Verify voice_full response plays audio automatically"
    expected: "In Full Voice mode, after receiving an assistant response, the audio player auto-plays and the spoken text is shown above a collapsible 'Show full response' section"
    why_human: "Requires TTS (synthesize endpoint via the piper binary), browser audio playback, and the localStorage autoplay toggle; not testable statically"
  - test: "Verify VoiceModeToggle persists across page refresh"
    expected: "Selecting 'Full Voice' pill, refreshing the page, and re-opening chat shows 'Full Voice' still selected"
    why_human: "Requires server-side nexus-settings PATCH round-trip and page reload; not testable statically"
---

# Phase 37: Web Chat Voice UI Verification Report

**Phase Goal:** Users can speak to any agent in web chat: recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting

**Verified:** 2026-04-03T12:00:00Z
**Status:** gaps_found (1 minor gap + 4 human verification items)
**Re-verification:** No (initial verification)

**Note on worktrees:** All phase 37 code was committed to the `gsd/phase-37-web-chat-voice-ui` branch, not the current worktree branch. All verification was performed against `gsd/phase-37-web-chat-voice-ui` via `git show` and `git cat-file`. The branch exists and contains all phase commits up through `c294277b` (docs(37-04)).

---

## Goal Achievement

### Observable Truths

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | POST /api/transcribe accepts audio upload and returns { text } | VERIFIED | `server/src/routes/voice.ts` uses multer memoryStorage, calls `voicePipelineService().transcribe()`, returns `res.json(result)` where result is `{ text: string; language?: string }` |
| 2 | POST /api/synthesize accepts { text } and returns audio/wav | VERIFIED | `server/src/routes/voice.ts` calls `voicePipelineService().synthesize(text, voiceId)`, sends buffer with `Content-Type: audio/wav` |
| 3 | Recording auto-stops on silence via VAD onSpeechEnd callback | VERIFIED | `useVadRecorder.ts` uses `useMicVAD` with `startOnLoad: false`, `onSpeechEnd: handleSpeechEnd`; the handler calls `vad.pause()`, POSTs WAV to `/api/transcribe`, calls `opts.onTranscript` |
| 4 | Live waveform canvas renders animated bars during recording | VERIFIED | `VoiceWaveform.tsx` uses `canvas`, `createAnalyser()` with `fftSize=64`, `getByteFrequencyData()`, `requestAnimationFrame` loop; wired into `VoiceMicButton` recording state |
| 5 | Voice response audio plays inline with play/pause and auto-play | VERIFIED | `ChatVoicePlayer.tsx` POSTs to `/api/synthesize`, creates object URL, renders `<audio>` element with play/pause controls; auto-play triggers when `autoPlay=true` and `audioUrl` is set |
| 6 | VoiceModeToggle presents Text/Voice In/Full Voice and persists via nexus-settings | VERIFIED | `VoiceModeToggle.tsx` renders three pills via `useVoiceMode()` which GETs/PATCHes `/api/nexus/settings`; `nexus-settings.ts` service/route implement GET+PATCH |
| 7 | Auto-play preference stored in localStorage under nexus:voice:autoplay | VERIFIED | `VoiceModeToggle.tsx` reads/writes `localStorage.getItem("nexus:voice:autoplay")`; `ChatMessage.tsx` reads it at render time |
| 8 | Full voice flow end-to-end: mic -> VAD -> transcribe -> stream -> voice badge + audio | PARTIAL | 4 of 5 `startStream` call sites pass `voiceMode`; the in-place edit branch of `handleEdit` (line 231, `ChatPanel.tsx`) calls `startStream(newContent, activeAgentId ?? undefined)` without the `voiceMode` argument |

**Score:** 7/8 truths verified (1 partial)

---

### Required Artifacts

| Artifact | Status | Evidence |
|----------|--------|----------|
| `server/src/routes/voice.ts` | VERIFIED | Exports `voiceRoutes()` with POST /transcribe and POST /synthesize wired to `voicePipelineService` |
| `server/src/routes/nexus-settings.ts` | VERIFIED | Exports `nexusSettingsRoutes()` with GET/PATCH /nexus/settings |
| `server/src/services/nexus-settings.ts` | VERIFIED | Exports `VOICE_MODES`, `VoiceMode`, `nexusSettingsService`; schema includes `voiceMode: z.enum(VOICE_MODES).default("text")` |
| `ui/public/vad.worklet.bundle.min.js` | VERIFIED | 2480 bytes; matches installed `@ricky0123/vad-web@0.0.30` dist file exactly (correct, not a placeholder) |
| `ui/public/silero_vad_legacy.onnx` | VERIFIED | 1,807,522 bytes; real ONNX model |
| `ui/public/silero_vad_v5.onnx` | VERIFIED | 2,327,524 bytes; real ONNX model |
| `ui/src/lib/encodeWav.ts` | VERIFIED | Exports `encodeWav(samples: Float32Array, sampleRate?)` with standard RIFF/WAVE 44-byte header, PCM 16-bit mono |
| `ui/src/hooks/useVadRecorder.ts` | VERIFIED | Uses `useMicVAD` with `startOnLoad: false`, `baseAssetPath: "/"`, `onnxWASMBasePath: "/"`, POSTs to `/api/transcribe` |
| `ui/src/hooks/useVoiceMode.ts` | VERIFIED | GETs and PATCHes `/api/nexus/settings` for voiceMode |
| `ui/src/components/VoiceWaveform.tsx` | VERIFIED | Canvas + AnalyserNode + requestAnimationFrame + getByteFrequencyData |
| `ui/src/components/VoiceMicButton.tsx` | VERIFIED | Three visual states (idle/recording/processing) with correct aria-labels, ring-2 ring-primary on recording, Loader2 animate-spin on processing |
| `ui/src/components/ChatVoicePlayer.tsx` | VERIFIED | POSTs to /api/synthesize, creates object URL, native `<audio>` element, play/pause controls, auto-play logic, blob URL cleanup |
| `ui/src/components/ChatVoiceBadge.tsx` | VERIFIED | Parses SPOKEN/DETAILED format, renders Badge "Voice", ChatVoicePlayer for voice_full, Collapsible "Show/Hide full response" |
| `ui/src/components/VoiceModeToggle.tsx` | VERIFIED | Three pills (Text/Voice In/Full Voice), `role="group"`, `aria-label="Voice mode"`, bg-primary active / bg-muted inactive, nexus:voice:autoplay localStorage |
| `ui/src/components/ChatInput.tsx` | VERIFIED | VoiceMicButton replaces VoiceRecordButton (zero VoiceRecordButton references remaining); VoiceModeToggle rendered above form |
| `ui/src/components/ChatMessage.tsx` | VERIFIED | ChatVoiceBadge dispatched for voice_input and voice_full messageTypes; reads nexus:voice:autoplay from localStorage |
| `ui/src/components/ChatPanel.tsx` | PARTIAL | useVoiceMode imported and called; voiceMode passed to 4 of 5 startStream call sites; missing in the in-place edit path |
| `ui/src/hooks/useStreamingChat.ts` | VERIFIED | startStream signature: `(userMessage, agentId?, voiceMode?)`; voiceMode forwarded to chatApi |
| `ui/src/api/chat.ts` | VERIFIED | postMessageAndStream data parameter typed as `{ content; agentId?; voiceMode? }` |

---
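
The `encodeWav.ts` row in Required Artifacts describes a standard 44-byte RIFF/WAVE header with 16-bit mono PCM. As a rough illustration only, and not the project's actual source, such an encoder might look like the sketch below. The `writeString` helper name appears in this report's spot-check output; the 16 kHz default sample rate and the clamping and scaling details are assumptions.

```typescript
// Hypothetical sketch of a 16-bit mono PCM WAV encoder (not the project's file).
function writeString(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// sampleRate default of 16000 is an assumption for illustration.
function encodeWav(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each float to [-1, 1] and scale to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return buffer;
}
```

The resulting buffer could then be wrapped in a `Blob` and appended to `FormData` for the POST to `/api/transcribe` that the report describes.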

### Key Link Verification

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| `server/src/app.ts` | `server/src/routes/voice.ts` | `api.use(voiceRoutes())` | WIRED | `voiceRoutes` appears twice in app.ts (import + mount) |
| `server/src/app.ts` | `server/src/routes/nexus-settings.ts` | `api.use(nexusSettingsRoutes())` | WIRED | `nexusSettingsRoutes` appears twice in app.ts |
| `server/src/app.ts` | COOP/COEP middleware | `res.setHeader` before routes | WIRED | `Cross-Origin-Opener-Policy: same-origin` + `Cross-Origin-Embedder-Policy: require-corp` |
| `server/src/routes/chat.ts` | voiceMode parameter | destructured from req.body | WIRED | `const { content, agentId, voiceMode } = req.body` with union type |
| `ui/vite.config.ts` | COOP/COEP dev headers | `server.headers` config | WIRED | Both headers set in dev server config |
| `ui/src/components/VoiceMicButton.tsx` | `useVadRecorder.ts` | `useVadRecorder()` hook | WIRED | Imports and calls the hook, destructures state/start/stop/mediaStream |
| `ui/src/hooks/useVadRecorder.ts` | `ui/src/lib/encodeWav.ts` | `encodeWav(audio)` in onSpeechEnd | WIRED | Import verified, called in handleSpeechEnd |
| `ui/src/hooks/useVadRecorder.ts` | `/api/transcribe` | `fetch POST with FormData` | WIRED | `fetch("/api/transcribe", { method: "POST", credentials: "include", body: formData })` |
| `ui/src/components/VoiceMicButton.tsx` | `VoiceWaveform.tsx` | `<VoiceWaveform stream={mediaStream} active={true} />` | WIRED | Rendered inside recording state conditional |
| `ui/src/components/ChatVoicePlayer.tsx` | `/api/synthesize` | `fetch POST` | WIRED | `fetch("/api/synthesize", { method: "POST" ... })` in useEffect |
| `ui/src/components/ChatVoiceBadge.tsx` | shadcn Collapsible | `Collapsible/CollapsibleContent/CollapsibleTrigger` | WIRED | All three imported and used |
| `ui/src/components/VoiceModeToggle.tsx` | `useVoiceMode.ts` | `useVoiceMode()` | WIRED | Imported from `@/hooks/useVoiceMode`, destructures mode/setMode/isLoading |
| `ui/src/components/ChatPanel.tsx` | `useVoiceMode.ts` | `useVoiceMode()` | WIRED | `const { mode: voiceMode } = useVoiceMode()` |
| `ui/src/components/ChatPanel.tsx` | `useStreamingChat.ts` | `startStream(content, agentId, voiceMode)` | PARTIAL | 4/5 call sites pass voiceMode; in-place edit path at line 231 does not |
| `ui/src/hooks/useStreamingChat.ts` | `ui/src/api/chat.ts` | `chatApi.postMessageAndStream with voiceMode` | WIRED | `{ content: userMessage, agentId, voiceMode }` passed as data |
| `ui/src/components/ChatInput.tsx` | `VoiceMicButton.tsx` | replaces VoiceRecordButton | WIRED | VoiceRecordButton has 0 occurrences; VoiceMicButton imported and rendered |
| `ui/src/components/ChatMessage.tsx` | `ChatVoiceBadge.tsx` | voice messageType dispatch | WIRED | `if (messageType === "voice_input" \|\| messageType === "voice_full")` dispatches to ChatVoiceBadge |

---
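
The Key Link table records that `app.ts` sets the COOP/COEP headers via `res.setHeader` before any routes are mounted. These two headers together make a page cross-origin isolated, which browsers require before exposing `SharedArrayBuffer`; that is presumably why they matter for the ONNX/WASM-based VAD, though the report does not state the reason. A minimal sketch of such a middleware follows. The function name `crossOriginIsolation` and the structural `HeaderWritable` type are hypothetical stand-ins so the snippet runs without Express installed.

```typescript
// Minimal structural type so the sketch compiles without Express installed.
// In a real Express app, `res` would be the framework's Response object.
interface HeaderWritable {
  setHeader(name: string, value: string): void;
}

// Hypothetical middleware name; the report only confirms the two header values.
function crossOriginIsolation(
  _req: unknown,
  res: HeaderWritable,
  next: () => void,
): void {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next(); // hand off to whatever is mounted after this middleware
}
```

With Express, a function of this shape would be registered with `app.use(...)` ahead of the route mounts so that every response carries both headers.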

### Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| `ChatVoicePlayer.tsx` | `audioUrl` (blob URL) | POST /api/synthesize -> voicePipelineService().synthesize() -> piper/TTS binary | Binary audio buffer from real TTS service | FLOWING |
| `useVadRecorder.ts` | transcript text | POST /api/transcribe -> voicePipelineService().transcribe() -> whisper-cpp | Real transcription from audio buffer | FLOWING |
| `useVoiceMode.ts` | `mode` | GET /api/nexus/settings -> nexusSettingsService().get() -> file-backed settings | Real persisted settings value | FLOWING |
| `ChatMessage.tsx` | `autoPlay` | `localStorage.getItem("nexus:voice:autoplay")` | Real localStorage value | FLOWING |
| `VoiceWaveform.tsx` | `dataArray` (Uint8Array) | Web Audio API AnalyserNode from microphone MediaStream | Real frequency data from mic | FLOWING (browser only) |

---

### Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| voice.ts exports voiceRoutes | `git show branch:server/src/routes/voice.ts \| grep "export function voiceRoutes"` | `export function voiceRoutes(): Router` | PASS |
| nexus-settings exports nexusSettingsRoutes | `git show branch:server/src/routes/nexus-settings.ts \| grep "export function"` | `export function nexusSettingsRoutes(): Router` | PASS |
| encodeWav exports function with RIFF header | `git show branch:ui/src/lib/encodeWav.ts \| grep "RIFF"` | `writeString(view, 0, "RIFF")` | PASS |
| ChatInput has zero VoiceRecordButton references | `git show branch:ui/src/components/ChatInput.tsx \| grep VoiceRecordButton \| wc -l` | `0` | PASS |
| VAD worklet matches installed package | size check: 2480 bytes in branch == 2480 bytes in node_modules | Exact match | PASS |
| COOP/COEP on Express | `git show branch:server/src/app.ts \| grep Cross-Origin` | both headers present | PASS |
| COOP/COEP on Vite | `git show branch:ui/vite.config.ts \| grep Cross-Origin` | both headers present | PASS |
| Browser-dependent audio flows | cannot test without browser runtime | N/A | SKIP |

---

### Requirements Coverage

| Requirement | Source Plans | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| WCHAT-01 | 37-01, 37-02, 37-04 | Mic button with idle/recording/processing states | SATISFIED | `VoiceMicButton.tsx` three conditional renders with correct icons and aria-labels |
| WCHAT-02 | 37-01, 37-02, 37-04 | Recording auto-stops on silence via VAD | SATISFIED | `useMicVAD` onSpeechEnd callback in `useVadRecorder.ts` |
| WCHAT-03 | 37-02 | Real-time waveform/amplitude visualization | SATISFIED | `VoiceWaveform.tsx` canvas with AnalyserNode and requestAnimationFrame loop |
| WCHAT-04 | 37-01, 37-03, 37-04 | Voice response audio plays inline with player controls | SATISFIED | `ChatVoicePlayer.tsx` with play/pause buttons and hidden `<audio>` element |
| WCHAT-05 | 37-02, 37-03, 37-04 | User can toggle voice mode (text/voice_input/full_voice) | SATISFIED | `VoiceModeToggle.tsx` three pills wired to `useVoiceMode` which PATCHes nexus-settings |
| WCHAT-06 | 37-03, 37-04 | Auto-play of voice responses is configurable | SATISFIED | `nexus:voice:autoplay` localStorage key in VoiceModeToggle checkbox; read in ChatMessage |

All 6 WCHAT requirements are satisfied. No orphaned requirements found.

---

### Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| `ui/src/components/ChatPanel.tsx` | 231 | `startStream(newContent, activeAgentId ?? undefined)` with no voiceMode argument | Warning | In-place message edit (no subsequent messages branch) will not send voiceMode to server; voice formatting won't apply for that edit path |
| `ui/src/hooks/useVadRecorder.ts` | 96 | `mediaStream: mediaStreamRef.current` returned from hook (ref, not state) | Info | Not a stub: the state change on `setState("recording")` triggers a re-render which reads the ref value, which was set before setState in start(). Works in practice due to synchronous ref update before async setState. No runtime issue expected. |

---

### Human Verification Required

#### 1. Waveform Animation During Recording

**Test:** Open web chat, click the mic button (should show Mic icon), observe the recording state.
**Expected:** The mic button shows a canvas with 20 animated vertical bars that pulse with voice amplitude, surrounded by a blue/primary ring.
**Why human:** Web Audio API AnalyserNode + canvas animation requires browser runtime with a real microphone.

#### 2. VAD Auto-Stop Triggers Transcription

**Test:** Click mic, speak a complete sentence, then be silent for ~1 second.
**Expected:** Recording stops automatically (no manual stop needed), the spinner appears briefly, then the transcribed text populates the chat input field.
**Why human:** Requires Silero VAD ONNX model running in an AudioWorklet with real microphone input; not testable statically.

#### 3. Voice Full Response Auto-Play and Collapsible

**Test:** Enable "Full Voice" mode via VoiceModeToggle, enable "Auto-play voice responses" checkbox, send a message to an agent, wait for response.
**Expected:** The agent response shows a "Voice" badge, the spoken text, an audio player that starts playing automatically, and a "Show full response" link that expands to show full markdown.
**Why human:** Requires the TTS binary (piper) running, browser audio playback, and the synthesize endpoint returning real audio; not testable statically.

#### 4. VoiceModeToggle Persistence Across Refresh

**Test:** Open chat, click "Full Voice" pill, refresh the page, reopen the same chat.
**Expected:** The VoiceModeToggle shows "Full Voice" still selected.
**Why human:** Requires a running server with nexus-settings PATCH persisting to disk and a page reload to confirm GET retrieves the saved value.

---

### Gaps Summary

One minor gap was found: the in-place message edit path (`handleEdit` else branch, `ChatPanel.tsx` line 231) calls `startStream` without passing `voiceMode`. The four other call sites (handleSend online, handleSend offline, handleEdit with branching, handleRetry) all correctly pass `voiceMode`. As a result, when a user edits a message in place while a voice mode is active, the mode is not forwarded to the server for that re-stream, and the server falls back to text mode for that response. The impact is minor: it affects only the edge case of in-place edits (not the primary send flow), and the server defaults gracefully to text mode.

The fix is one line: change `startStream(newContent, activeAgentId ?? undefined)` to `startStream(newContent, activeAgentId ?? undefined, voiceMode)` at `ChatPanel.tsx` line 231.

---
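
To make the gap described in the Gaps Summary concrete, here is a minimal sketch of how an omitted optional argument falls through to the server-side default. It mirrors the `startStream(userMessage, agentId?, voiceMode?)` signature and the schema default of `"text"` reported earlier, but `buildStreamRequest`, `resolveVoiceMode`, and the exact `VOICE_MODES` values are hypothetical names and assumptions for illustration, not code from the repository.

```typescript
// Assumed mode names; the report only confirms "text" as the schema default.
const VOICE_MODES = ["text", "voice_input", "voice_full"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

interface StreamRequest {
  content: string;
  agentId?: string;
  voiceMode?: VoiceMode;
}

// Hypothetical client helper with the same parameter shape as startStream.
function buildStreamRequest(
  userMessage: string,
  agentId?: string,
  voiceMode?: VoiceMode,
): StreamRequest {
  return { content: userMessage, agentId, voiceMode };
}

// Hypothetical server-side resolution: a missing voiceMode becomes "text".
function resolveVoiceMode(body: StreamRequest): VoiceMode {
  return body.voiceMode ?? "text";
}

// Before the fix: the in-place edit path omits the third argument.
function editBeforeFix(newContent: string, activeAgentId: string | null): StreamRequest {
  return buildStreamRequest(newContent, activeAgentId ?? undefined);
}

// After the fix: the current voice mode is forwarded.
function editAfterFix(
  newContent: string,
  activeAgentId: string | null,
  voiceMode: VoiceMode,
): StreamRequest {
  return buildStreamRequest(newContent, activeAgentId ?? undefined, voiceMode);
}
```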

_Verified: 2026-04-03T12:00:00Z_
_Verifier: Claude (gsd-verifier)_
@@ -228,7 +228,7 @@ export function ChatPanel() {
       await chatApi.truncateMessagesAfter(activeConversationId, messageId);
       queryClient.invalidateQueries({ queryKey: ["chat", "messages", activeConversationId] });
       queryClient.invalidateQueries({ queryKey: ["chat", "search"] });
-      startStream(newContent, activeAgentId ?? undefined);
+      startStream(newContent, activeAgentId ?? undefined, voiceMode);
     }
   };