| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 37-web-chat-voice-ui | 02 | execute | 2 | | | true | | |
Purpose: These are the foundational building blocks that replace VoiceRecordButton with VAD-powered auto-stop recording and real-time waveform visualization.
Output: 5 new files — encodeWav utility, useVadRecorder hook, useVoiceMode hook, VoiceWaveform component, VoiceMicButton component
<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md
@.planning/phases/37-web-chat-voice-ui/37-01-SUMMARY.md

```typescript
// @ricky0123/vad-react useMicVAD hook
import { useMicVAD } from "@ricky0123/vad-react";

const vad = useMicVAD({
  startOnLoad: false,
  baseAssetPath: "/",
  onnxWASMBasePath: "/",
  positiveSpeechThreshold: 0.8,
  negativeSpeechThreshold: 0.65,
  redemptionFrames: 8,
  minSpeechFrames: 5,
  // Signature: () => void
  onSpeechStart: () => {},
  // Signature: (audio: Float32Array) => void
  onSpeechEnd: (audio: Float32Array) => {},
});
// Returns: { listening, loading, errored, userSpeaking, start, pause }
```

    interface VoiceRecordButtonProps {
      onTranscription: (text: string) => void;
      disabled?: boolean;
    }
GET /api/nexus/settings → { mode, voiceEnabled, voiceMode, ... }
PATCH /api/nexus/settings → accepts partial, returns updated
Task 1: Create encodeWav utility and useVadRecorder + useVoiceMode hooks

Files to create:
- ui/src/lib/encodeWav.ts
- ui/src/hooks/useVadRecorder.ts
- ui/src/hooks/useVoiceMode.ts

Existing files for reference:
- ui/src/hooks/useStreamingChat.ts
- ui/src/api/chat.ts
- ui/src/components/VoiceRecordButton.tsx
1. **ui/src/lib/encodeWav.ts** — Create WAV encoder function:
```typescript
export function encodeWav(samples: Float32Array, sampleRate = 16000): Blob
```
- Standard 44-byte WAV header (RIFF/WAVE/fmt/data chunks)
- PCM format (1), mono (1 channel), 16-bit depth
- Clamp samples to [-1, 1] range before int16 conversion
- Return Blob with type "audio/wav"
- Helper: `function writeString(view: DataView, offset: number, str: string)`
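The encoder described in the bullets above could look roughly like the following. This is a minimal sketch, not the plan's final implementation: `encodeWavBuffer` is a hypothetical helper (not named in the plan) that exposes the raw bytes so the header can be checked synchronously, and the asymmetric int16 scaling (0x8000 for negatives, 0x7fff for positives) is one common convention that the plan does not prescribe.

```typescript
// Writes an ASCII tag such as "RIFF" byte by byte at the given offset.
function writeString(view: DataView, offset: number, str: string): void {
  for (let i = 0; i < str.length; i++) {
    view.setUint8(offset + i, str.charCodeAt(i));
  }
}

// Hypothetical helper: builds the 44-byte header plus 16-bit PCM payload.
export function encodeWavBuffer(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit depth
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before converting to int16.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

export function encodeWav(samples: Float32Array, sampleRate = 16000): Blob {
  return new Blob([encodeWavBuffer(samples, sampleRate)], { type: "audio/wav" });
}
```

Splitting the byte construction from the Blob wrapper keeps the header logic testable without awaiting `blob.arrayBuffer()`.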
2. **ui/src/hooks/useVadRecorder.ts** (VAD recording hook):

       interface UseVadRecorderOptions {
         onTranscript: (text: string) => void;
       }
       interface UseVadRecorderReturn {
         state: "idle" | "recording" | "processing";
         start: () => void;
         stop: () => void;
         mediaStream: MediaStream | null; // exposed for VoiceWaveform AnalyserNode
       }
       export function useVadRecorder(opts: UseVadRecorderOptions): UseVadRecorderReturn

   Implementation:
   - Use `useMicVAD` from `@ricky0123/vad-react` with `startOnLoad: false`
   - Set `baseAssetPath: "/"` and `onnxWASMBasePath: "/"` (serve from ui/public/)
   - Set `positiveSpeechThreshold: 0.8`, `minSpeechFrames: 5` (300ms minimum to filter noise)
   - In `onSpeechEnd(audio: Float32Array)`:
     a. Call `vad.pause()` to stop listening
     b. Set state to "processing"
     c. Call `encodeWav(audio)` to get WAV blob
     d. Create FormData, append blob as "audio" field with filename "recording.wav"
     e. POST to `/api/transcribe` with `credentials: "include"`
     f. Parse response as `{ text: string }`
     g. If text is non-empty (length >= 2), call `opts.onTranscript(text.trim())`
     h. Set state back to "idle"
   - `start()`: calls `vad.start()`, sets state to "recording"
   - `stop()`: calls `vad.pause()`, sets state to "idle"
   - Expose `mediaStream` from `navigator.mediaDevices.getUserMedia({ audio: true })` and store it in a ref. This is needed for VoiceWaveform AnalyserNode.
   - NOTE: useMicVAD manages its own media stream internally, but VoiceWaveform needs a separate reference to the stream for the AnalyserNode. Request the stream in the `start()` function and store in a ref. Stop tracks in `stop()`.
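Steps d through g of `onSpeechEnd` can be sketched as framework-free helpers. The function names here (`buildTranscribeRequest`, `extractTranscript`, `transcribeWav`) are hypothetical. Only the endpoint, the "audio" field, the "recording.wav" filename, `credentials: "include"`, and the `{ text: string }` response shape come from the plan. The sketch also assumes the length check runs on the trimmed text and that a non-2xx response yields no transcript; the plan does not say either way.

```typescript
interface TranscribeRequest {
  url: string;
  init: { method: "POST"; body: FormData; credentials: "include" };
}

// Step d/e: wrap the WAV blob in multipart form data for /api/transcribe.
export function buildTranscribeRequest(wav: Blob): TranscribeRequest {
  const form = new FormData();
  form.append("audio", wav, "recording.wav");
  return {
    url: "/api/transcribe",
    init: { method: "POST", body: form, credentials: "include" },
  };
}

// Step f/g: return the trimmed transcript, or null when there is no usable text.
export function extractTranscript(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) return null;
  const text = (payload as { text?: unknown }).text;
  if (typeof text !== "string") return null;
  const trimmed = text.trim();
  // Assumption: the length >= 2 check is applied after trimming.
  return trimmed.length >= 2 ? trimmed : null;
}

// Ties the two helpers together; fetchImpl is injectable for testing.
export async function transcribeWav(
  wav: Blob,
  fetchImpl: typeof fetch = fetch,
): Promise<string | null> {
  const { url, init } = buildTranscribeRequest(wav);
  const res = await fetchImpl(url, init);
  if (!res.ok) return null; // assumption: treat HTTP errors as "no transcript"
  return extractTranscript(await res.json());
}
```

Inside the hook, `onSpeechEnd` would call `transcribeWav(encodeWav(audio))`, pass a non-null result to `opts.onTranscript`, and set the state back to "idle" whether or not a transcript came back.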
3. **ui/src/hooks/useVoiceMode.ts** (voice mode hook):

       type VoiceMode = "text" | "voice_input" | "full_voice";
       interface UseVoiceModeReturn {
         mode: VoiceMode;
         setMode: (next: VoiceMode) => Promise<void>;
         isLoading: boolean;
       }
       export function useVoiceMode(): UseVoiceModeReturn

   Implementation:
   - On mount, GET /api/nexus/settings with credentials: "include"
   - Extract `voiceMode` from response, default to "text"
   - `setMode(next)`: optimistically update local state, then PATCH /api/nexus/settings with `{ voiceMode: next }`
   - Use useState for mode and isLoading
   - Wrap fetch in try/catch; on error, revert to previous mode

cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/src/lib/encodeWav.ts && test -f ui/src/hooks/useVadRecorder.ts && test -f ui/src/hooks/useVoiceMode.ts && grep -q "encodeWav" ui/src/lib/encodeWav.ts && grep -q "useVadRecorder" ui/src/hooks/useVadRecorder.ts && grep -q "useVoiceMode" ui/src/hooks/useVoiceMode.ts && grep -q "useMicVAD" ui/src/hooks/useVadRecorder.ts && grep -q "api/transcribe" ui/src/hooks/useVadRecorder.ts && grep -q "api/nexus/settings" ui/src/hooks/useVoiceMode.ts && echo "PASS" || echo "FAIL"

<acceptance_criteria>
- grep "export function encodeWav" ui/src/lib/encodeWav.ts returns match
- grep "export function useVadRecorder" ui/src/hooks/useVadRecorder.ts returns match
- grep "export function useVoiceMode" ui/src/hooks/useVoiceMode.ts returns match
- grep "useMicVAD" ui/src/hooks/useVadRecorder.ts returns match
- grep "startOnLoad.*false" ui/src/hooks/useVadRecorder.ts returns match
- grep "baseAssetPath" ui/src/hooks/useVadRecorder.ts returns match with "/"
- grep "api/transcribe" ui/src/hooks/useVadRecorder.ts returns match
- grep "api/nexus/settings" ui/src/hooks/useVoiceMode.ts returns match
- grep "encodeWav" ui/src/hooks/useVadRecorder.ts returns match (imports it)
- grep "RIFF" ui/src/lib/encodeWav.ts returns match (WAV header)
</acceptance_criteria>

encodeWav utility produces valid WAV blobs. useVadRecorder wraps useMicVAD with auto-stop + transcription. useVoiceMode reads/writes voiceMode from nexus-settings API.
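The read-with-default and optimistic-update behavior described for useVoiceMode can be sketched without React as follows. `parseVoiceMode` and `setModeOptimistic` are hypothetical names, and the JSON `Content-Type` header and the choice to treat a non-2xx response as a failure are assumptions, since the plan only specifies the endpoint, the `{ voiceMode: next }` body, and the revert on error.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

const VOICE_MODES: readonly VoiceMode[] = ["text", "voice_input", "full_voice"];

// Reads voiceMode from a settings payload, defaulting to "text" when it is
// missing or not one of the known modes.
export function parseVoiceMode(settings: unknown): VoiceMode {
  if (typeof settings !== "object" || settings === null) return "text";
  const raw = (settings as { voiceMode?: unknown }).voiceMode;
  return VOICE_MODES.includes(raw as VoiceMode) ? (raw as VoiceMode) : "text";
}

// Applies the new mode immediately, then PATCHes the server; on any failure
// the previous mode is re-applied.
export async function setModeOptimistic(
  previous: VoiceMode,
  next: VoiceMode,
  apply: (mode: VoiceMode) => void,
  fetchImpl: typeof fetch = fetch,
): Promise<void> {
  apply(next); // optimistic local update
  try {
    const res = await fetchImpl("/api/nexus/settings", {
      method: "PATCH",
      credentials: "include",
      headers: { "Content-Type": "application/json" }, // assumption: JSON body
      body: JSON.stringify({ voiceMode: next }),
    });
    // Assumption: a non-2xx status counts as an error.
    if (!res.ok) throw new Error(`PATCH failed with status ${res.status}`);
  } catch {
    apply(previous); // revert to the previous mode
  }
}
```

In the hook itself, `apply` would be the `useState` setter for `mode`, and `previous` would be captured from the current state before `setMode` runs.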
- **ui/src/components/VoiceMicButton.tsx** (VAD-powered mic button):

      interface VoiceMicButtonProps {
        onTranscript: (text: string) => void;
        disabled?: boolean;
      }
      export function VoiceMicButton({ onTranscript, disabled }: VoiceMicButtonProps)

  Implementation:
  - Call `useVadRecorder({ onTranscript })` to get `{ state, start, stop, mediaStream }`
  - Three visual states per UI spec:
    a. idle (state === "idle"): Render Button with ghost variant, size="icon", h-8 w-8. Content: `<Mic className="h-4 w-4" />`. aria-label="Start voice input". onClick calls start().
    b. recording (state === "recording"): Render Button with ghost variant, size="icon", h-8 w-8, with `ring-2 ring-primary` classes. Content: `<VoiceWaveform stream={mediaStream} active={true} />`. aria-label="Recording — speak now". onClick calls stop().
    c. processing (state === "processing"): Render Button disabled, ghost variant, size="icon", h-8 w-8. Content: `<Loader2 className="h-4 w-4 animate-spin" />`. aria-label="Transcribing...".
  - Import Mic, Loader2 from lucide-react
  - Import Button from @/components/ui/button
  - Import VoiceWaveform from ./VoiceWaveform
  - Import useVadRecorder from ../hooks/useVadRecorder
  - When disabled prop is true, render idle state with disabled attribute

cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/src/components/VoiceWaveform.tsx && test -f ui/src/components/VoiceMicButton.tsx && grep -q "VoiceWaveform" ui/src/components/VoiceWaveform.tsx && grep -q "VoiceMicButton" ui/src/components/VoiceMicButton.tsx && grep -q "canvas" ui/src/components/VoiceWaveform.tsx && grep -q "useVadRecorder" ui/src/components/VoiceMicButton.tsx && grep -q "Mic" ui/src/components/VoiceMicButton.tsx && grep -q "Loader2" ui/src/components/VoiceMicButton.tsx && grep -q "ring-2 ring-primary" ui/src/components/VoiceMicButton.tsx && echo "PASS" || echo "FAIL"

<acceptance_criteria>
- grep "export function VoiceWaveform" ui/src/components/VoiceWaveform.tsx returns match
- grep "export function VoiceMicButton" ui/src/components/VoiceMicButton.tsx returns match
- grep "canvas" ui/src/components/VoiceWaveform.tsx returns match
- grep "AnalyserNode|createAnalyser|analyser" ui/src/components/VoiceWaveform.tsx returns match
- grep "requestAnimationFrame" ui/src/components/VoiceWaveform.tsx returns match
- grep "getByteFrequencyData" ui/src/components/VoiceWaveform.tsx returns match
- grep "useVadRecorder" ui/src/components/VoiceMicButton.tsx returns match
- grep 'aria-label="Start voice input"' ui/src/components/VoiceMicButton.tsx returns match
- grep 'aria-label="Recording' ui/src/components/VoiceMicButton.tsx returns match
- grep 'aria-label="Transcribing' ui/src/components/VoiceMicButton.tsx returns match
- grep "ring-2 ring-primary" ui/src/components/VoiceMicButton.tsx returns match
- grep "Loader2.*animate-spin" ui/src/components/VoiceMicButton.tsx returns match
</acceptance_criteria>

VoiceWaveform renders 20 animated bars from Web Audio API AnalyserNode on an 80x32 canvas. VoiceMicButton shows idle/recording/processing states with correct icons, aria-labels, and ring styling.
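One way to turn AnalyserNode output into the 20 bars on the 80x32 canvas is sketched below. Everything here is illustrative: `computeBarHeights`, `drawBars`, the `BarCanvas` interface, the bin-averaging strategy, and the 2px minimum bar height are assumptions, not details taken from the plan. In the component, the `Uint8Array` would be filled by `analyser.getByteFrequencyData(...)` inside a `requestAnimationFrame` loop.

```typescript
// Minimal subset of CanvasRenderingContext2D used by this sketch.
interface BarCanvas {
  clearRect(x: number, y: number, w: number, h: number): void;
  fillRect(x: number, y: number, w: number, h: number): void;
}

// Averages frequency bins into barCount buckets and scales each bucket
// (0..255) to a pixel height within the canvas.
export function computeBarHeights(
  freqData: Uint8Array,
  barCount = 20,
  canvasHeight = 32,
  minHeight = 2,
): number[] {
  const binsPerBar = Math.max(1, Math.floor(freqData.length / barCount));
  const heights: number[] = [];
  for (let bar = 0; bar < barCount; bar++) {
    const start = bar * binsPerBar;
    let sum = 0;
    let count = 0;
    for (let i = start; i < start + binsPerBar && i < freqData.length; i++) {
      sum += freqData[i];
      count++;
    }
    const level = count > 0 ? sum / count / 255 : 0; // normalized 0..1
    heights.push(Math.max(minHeight, Math.round(level * canvasHeight)));
  }
  return heights;
}

// Draws vertically centered bars, leaving a 1px gutter on each side of a bar.
export function drawBars(ctx: BarCanvas, heights: number[], width = 80, height = 32): void {
  const slot = width / heights.length;
  ctx.clearRect(0, 0, width, height);
  heights.forEach((h, i) => {
    ctx.fillRect(i * slot + 1, (height - h) / 2, slot - 2, h);
  });
}
```

Keeping the height calculation separate from drawing means the animation loop only has to call `getByteFrequencyData`, `computeBarHeights`, and `drawBars` once per frame.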
<success_criteria> Core voice recording pipeline complete: user clicks mic -> VAD listens -> waveform animates -> silence detected -> audio encoded to WAV -> POSTed to /api/transcribe -> transcript returned. Voice mode readable/writable from nexus-settings. </success_criteria>
After completion, create `.planning/phases/37-web-chat-voice-ui/37-02-SUMMARY.md`