nexus/.planning/phases/37-web-chat-voice-ui/37-02-PLAN.md at 1c8a26dbb422b9457864c09809cd28815268598c

Nexus Dev 1eaa6c4b3e docs(37): create 4 plans in 3 waves for web chat voice UI

2026-04-04 03:55:50 +00:00

---
phase: 37-web-chat-voice-ui
plan:
type: execute
wave:
depends_on:
  - 37-01
files_modified:
  - ui/src/lib/encodeWav.ts
  - ui/src/hooks/useVadRecorder.ts
  - ui/src/hooks/useVoiceMode.ts
  - ui/src/components/VoiceWaveform.tsx
  - ui/src/components/VoiceMicButton.tsx
autonomous: true
requirements:
  - WCHAT-01
  - WCHAT-02
  - WCHAT-03
  - WCHAT-05
must_haves:
  truths:
    - "VoiceMicButton renders three visual states: idle (Mic icon), recording (waveform + ring), processing (Loader2 spinner)"
    - "Recording auto-stops on silence via VAD onSpeechEnd callback"
    - "VoiceWaveform renders animated canvas bars during recording"
    - "useVadRecorder converts Float32Array to WAV and POSTs to /api/transcribe"
    - "useVoiceMode reads voiceMode from GET /api/nexus/settings and writes via PATCH"
  artifacts:
    - path: "ui/src/lib/encodeWav.ts"
      provides: "Float32Array to WAV blob encoder"
      exports: "encodeWav"
    - path: "ui/src/hooks/useVadRecorder.ts"
      provides: "VAD recording hook with auto-stop"
      exports: "useVadRecorder"
    - path: "ui/src/hooks/useVoiceMode.ts"
      provides: "Voice mode state from nexus-settings"
      exports: "useVoiceMode"
    - path: "ui/src/components/VoiceWaveform.tsx"
      provides: "Canvas amplitude visualization"
      exports: "VoiceWaveform"
    - path: "ui/src/components/VoiceMicButton.tsx"
      provides: "VAD-powered mic button with three states"
      exports: "VoiceMicButton"
  key_links:
    - from: "ui/src/components/VoiceMicButton.tsx"
      to: "ui/src/hooks/useVadRecorder.ts"
      via: "useVadRecorder() hook call"
      pattern: "useVadRecorder"
    - from: "ui/src/hooks/useVadRecorder.ts"
      to: "ui/src/lib/encodeWav.ts"
      via: "encodeWav(audio) in onSpeechEnd"
      pattern: "encodeWav"
    - from: "ui/src/hooks/useVadRecorder.ts"
      to: "/api/transcribe"
      via: "fetch POST with FormData"
      pattern: "fetch.*api/transcribe"
    - from: "ui/src/components/VoiceMicButton.tsx"
      to: "ui/src/components/VoiceWaveform.tsx"
      via: "VoiceWaveform rendered inside recording state"
      pattern: "<VoiceWaveform"
---

Build the core voice recording components: WAV encoder, VAD recorder hook, voice mode hook, waveform visualization, and the VoiceMicButton that ties them together.

Purpose: These are the foundational building blocks that replace VoiceRecordButton with VAD-powered auto-stop recording and real-time waveform visualization.

Output: 5 new files — encodeWav utility, useVadRecorder hook, useVoiceMode hook, VoiceWaveform component, VoiceMicButton component

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md
@.planning/phases/37-web-chat-voice-ui/37-01-SUMMARY.md

```typescript
// @ricky0123/vad-react useMicVAD hook
import { useMicVAD } from "@ricky0123/vad-react";

const vad = useMicVAD({
  startOnLoad: false,
  baseAssetPath: "/",
  onnxWASMBasePath: "/",
  positiveSpeechThreshold: 0.8,
  negativeSpeechThreshold: 0.65,
  redemptionFrames: 8,
  minSpeechFrames: 5,
  onSpeechStart: () => {},                    // signature: () => void
  onSpeechEnd: (audio: Float32Array) => {},   // signature: (audio: Float32Array) => void
});
// Returns: { listening, loading, errored, userSpeaking, start, pause }
```

interface VoiceRecordButtonProps {
  onTranscription: (text: string) => void;
  disabled?: boolean;
}

GET  /api/nexus/settings → { mode, voiceEnabled, voiceMode, ... }
PATCH /api/nexus/settings → accepts partial, returns updated

Task 1: Create encodeWav utility and useVadRecorder + useVoiceMode hooks

Files: ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts

Read first: ui/src/hooks/useStreamingChat.ts, ui/src/api/chat.ts, ui/src/components/VoiceRecordButton.tsx

1. **ui/src/lib/encodeWav.ts** - Create WAV encoder function:
```typescript
export function encodeWav(samples: Float32Array, sampleRate = 16000): Blob
```
- Standard 44-byte WAV header (RIFF/WAVE/fmt/data chunks)
- PCM format (1), mono (1 channel), 16-bit depth
- Clamp samples to [-1, 1] range before int16 conversion
- Return Blob with type "audio/wav"
- Helper: `function writeString(view: DataView, offset: number, str: string)`
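The encoder requirements above can be sketched as follows. This is a minimal illustration, not the final implementation: `encodeWavBuffer` is a hypothetical helper (not named in this plan) that exposes the raw bytes so the header can be inspected synchronously, and the asymmetric int16 scaling (0x8000 for negatives, 0x7fff for positives) is one common convention rather than something the plan prescribes.

```typescript
// Writes an ASCII tag such as "RIFF" byte by byte at the given offset.
function writeString(view: DataView, offset: number, str: string): void {
  for (let i = 0; i < str.length; i++) {
    view.setUint8(offset + i, str.charCodeAt(i));
  }
}

// Hypothetical helper: builds the 44-byte header plus 16-bit PCM mono samples.
export function encodeWavBuffer(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total size minus the 8-byte RIFF preamble
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk length for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate (mono)
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling to the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

export function encodeWav(samples: Float32Array, sampleRate = 16000): Blob {
  return new Blob([encodeWavBuffer(samples, sampleRate)], { type: "audio/wav" });
}
```

Splitting the byte layout from the Blob wrapper keeps the header logic unit-testable without awaiting `Blob.arrayBuffer()`.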

2. **ui/src/hooks/useVadRecorder.ts** - Create VAD recording hook:
```typescript
interface UseVadRecorderOptions {
  onTranscript: (text: string) => void;
}
interface UseVadRecorderReturn {
  state: "idle" | "recording" | "processing";
  start: () => void;
  stop: () => void;
  mediaStream: MediaStream | null; // exposed for VoiceWaveform AnalyserNode
}
export function useVadRecorder(opts: UseVadRecorderOptions): UseVadRecorderReturn
```
Implementation:
- Use useMicVAD from @ricky0123/vad-react with startOnLoad: false
- Set baseAssetPath: "/" and onnxWASMBasePath: "/" (serve from ui/public/)
- Set positiveSpeechThreshold: 0.8, minSpeechFrames: 5 (300ms minimum to filter noise)
- In onSpeechEnd(audio: Float32Array):
  a. Call vad.pause() to stop listening
  b. Set state to "processing"
  c. Call encodeWav(audio) to get WAV blob
  d. Create FormData, append blob as "audio" field with filename "recording.wav"
  e. POST to /api/transcribe with credentials: "include"
  f. Parse response as { text: string }
  g. If text is non-empty (length >= 2), call opts.onTranscript(text.trim())
  h. Set state back to "idle"
- start(): calls vad.start(), sets state to "recording"
- stop(): calls vad.pause(), sets state to "idle"
- Expose mediaStream from navigator.mediaDevices.getUserMedia({ audio: true }) — store in a ref. This is needed for VoiceWaveform AnalyserNode.
- NOTE: useMicVAD manages its own media stream internally, but VoiceWaveform needs a separate reference to the stream for the AnalyserNode. Request the stream in the start() function and store in a ref. Stop tracks in stop().
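
The upload and response-handling steps of onSpeechEnd can be sketched outside React as plain functions. The names `buildTranscribeRequest`, `extractTranscript`, `transcribe`, and the `FetchLike` type are hypothetical and exist only for illustration. This sketch also trims before applying the two-character minimum, a small deviation from the step order above that keeps whitespace-only responses from passing the check.

```typescript
// Hypothetical request shape; mirrors the fields the plan specifies.
interface TranscribeRequest {
  url: string;
  init: { method: "POST"; body: FormData; credentials: "include" };
}

// Minimal fetch signature so a fake can be injected in tests (assumption).
type FetchLike = (
  url: string,
  init: TranscribeRequest["init"],
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Builds the multipart POST: blob in the "audio" field, filename "recording.wav".
export function buildTranscribeRequest(wav: Blob): TranscribeRequest {
  const form = new FormData();
  form.append("audio", wav, "recording.wav");
  return {
    url: "/api/transcribe",
    init: { method: "POST", body: form, credentials: "include" },
  };
}

// Returns the trimmed transcript, or null when it is missing or shorter than 2 characters.
export function extractTranscript(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) return null;
  const text = (payload as { text?: unknown }).text;
  if (typeof text !== "string") return null;
  const trimmed = text.trim();
  return trimmed.length >= 2 ? trimmed : null;
}

// Sends the recording and resolves to a usable transcript or null.
export async function transcribe(wav: Blob, fetchImpl: FetchLike): Promise<string | null> {
  const { url, init } = buildTranscribeRequest(wav);
  const res = await fetchImpl(url, init);
  if (!res.ok) return null; // treating non-2xx as "no transcript" is an assumption
  return extractTranscript(await res.json());
}
```

Inside the hook, the caller would pass the global `fetch` as `fetchImpl`, invoke `opts.onTranscript` only for a non-null result, and reset state to "idle" in a `finally` block so a failed request does not leave the button stuck in "processing".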

3. **ui/src/hooks/useVoiceMode.ts** - Create voice mode hook:
```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
interface UseVoiceModeReturn {
  mode: VoiceMode;
  setMode: (next: VoiceMode) => Promise<void>;
  isLoading: boolean;
}
export function useVoiceMode(): UseVoiceModeReturn
```
Implementation:
- On mount, GET /api/nexus/settings with credentials: "include"
- Extract voiceMode from response, default to "text"
- setMode(next): optimistically update local state, then PATCH /api/nexus/settings with { voiceMode: next }
- Use useState for mode and isLoading
- Wrap fetch in try/catch; on error, revert to previous mode

Verify:
cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/src/lib/encodeWav.ts && test -f ui/src/hooks/useVadRecorder.ts && test -f ui/src/hooks/useVoiceMode.ts && grep -q "encodeWav" ui/src/lib/encodeWav.ts && grep -q "useVadRecorder" ui/src/hooks/useVadRecorder.ts && grep -q "useVoiceMode" ui/src/hooks/useVoiceMode.ts && grep -q "useMicVAD" ui/src/hooks/useVadRecorder.ts && grep -q "api/transcribe" ui/src/hooks/useVadRecorder.ts && grep -q "api/nexus/settings" ui/src/hooks/useVoiceMode.ts && echo "PASS" || echo "FAIL"

<acceptance_criteria>
- grep "export function encodeWav" ui/src/lib/encodeWav.ts returns match
- grep "export function useVadRecorder" ui/src/hooks/useVadRecorder.ts returns match
- grep "export function useVoiceMode" ui/src/hooks/useVoiceMode.ts returns match
- grep "useMicVAD" ui/src/hooks/useVadRecorder.ts returns match
- grep "startOnLoad.*false" ui/src/hooks/useVadRecorder.ts returns match
- grep "baseAssetPath" ui/src/hooks/useVadRecorder.ts returns match with "/"
- grep "api/transcribe" ui/src/hooks/useVadRecorder.ts returns match
- grep "api/nexus/settings" ui/src/hooks/useVoiceMode.ts returns match
- grep "encodeWav" ui/src/hooks/useVadRecorder.ts returns match (imports it)
- grep "RIFF" ui/src/lib/encodeWav.ts returns match (WAV header)
</acceptance_criteria>

Done: encodeWav utility produces valid WAV blobs. useVadRecorder wraps useMicVAD with auto-stop + transcription. useVoiceMode reads/writes voiceMode from nexus-settings API.
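
The useVoiceMode behavior in Task 1 (default to "text", optimistic update, revert on error) can be sketched as framework-free logic. `parseVoiceMode` and `setModeOptimistic` are hypothetical names, and treating a non-OK PATCH response as a failure is an assumption the plan does not state.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

const VOICE_MODES: readonly VoiceMode[] = ["text", "voice_input", "full_voice"];

// Extracts voiceMode from a settings payload, falling back to "text"
// when the field is missing or holds an unknown value.
export function parseVoiceMode(settings: unknown): VoiceMode {
  if (typeof settings !== "object" || settings === null) return "text";
  const raw = (settings as { voiceMode?: unknown }).voiceMode;
  return VOICE_MODES.includes(raw as VoiceMode) ? (raw as VoiceMode) : "text";
}

// Applies the new mode immediately, then reverts if the PATCH fails.
// `apply` stands in for the React state setter; `patch` wraps the network call.
export async function setModeOptimistic(
  previous: VoiceMode,
  next: VoiceMode,
  apply: (mode: VoiceMode) => void,
  patch: (body: { voiceMode: VoiceMode }) => Promise<{ ok: boolean }>,
): Promise<void> {
  apply(next); // optimistic update
  try {
    const res = await patch({ voiceMode: next });
    if (!res.ok) throw new Error("settings PATCH failed");
  } catch {
    apply(previous); // roll back on network or server error
  }
}
```

In the hook, `patch` would wrap a `fetch` call to `/api/nexus/settings` with `method: "PATCH"`, `credentials: "include"`, and a JSON body. Sending a `Content-Type: application/json` header is an assumption about what the endpoint expects.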

Task 2: Create VoiceWaveform canvas component and VoiceMicButton

Files: ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx

Read first: ui/src/hooks/useVadRecorder.ts, ui/src/lib/encodeWav.ts, ui/src/components/VoiceRecordButton.tsx

1. **ui/src/components/VoiceWaveform.tsx** - Canvas-based amplitude visualization:
```typescript
interface VoiceWaveformProps {
  stream: MediaStream | null;
  active: boolean; // controls animation loop
}
export function VoiceWaveform({ stream, active }: VoiceWaveformProps)
```
Implementation:
- Use a `<canvas>` element, width=80, height=32 (h-8 per UI spec), className="inline-block"
- On mount (when stream is truthy and active is true):
  a. Create AudioContext (lazily: only create once, store in ref)
  b. If AudioContext is suspended, call `audioCtx.resume()`
  c. Create MediaStreamSource from stream
  d. Create AnalyserNode with fftSize=128 (gives 64 frequency bins; 20 bars at every other bin read indices 0 through 38)
  e. Connect source -> analyser
  f. Start requestAnimationFrame loop:
     - Call `analyser.getByteFrequencyData(dataArray)` into Uint8Array(64)
     - Clear canvas
     - Draw 20 bars (skip every other bin for cleaner look): each bar width=2px, gap=2px
     - Bar height = (dataArray[i*2] / 255) * canvasHeight, minimum 2px
     - Bar color: use CSS variable --primary via getComputedStyle
  g. Store animationFrame id in ref for cleanup
- On cleanup or when active becomes false: cancelAnimationFrame, disconnect source
- Do NOT close AudioContext on cleanup (reuse across start/stop cycles)
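
The per-frame bar math can be isolated from the canvas and Web Audio plumbing. The sketch below uses hypothetical helper names (`computeBarHeights`, `barX`). Because 20 bars at a stride of 2 read indices up to 38, the analyser needs at least 39 frequency bins, so this version treats any bin past the end of the array as silence instead of producing NaN heights.

```typescript
// Maps analyser byte data (0..255 per bin) to bar heights in pixels.
// Defaults follow the plan: 20 bars, every other bin, 2px minimum height.
export function computeBarHeights(
  data: Uint8Array,
  canvasHeight: number,
  barCount = 20,
  stride = 2,
  minHeight = 2,
): number[] {
  const heights: number[] = [];
  for (let i = 0; i < barCount; i++) {
    // Bins beyond the array length are treated as zero amplitude.
    const value = data[i * stride] ?? 0;
    heights.push(Math.max(minHeight, (value / 255) * canvasHeight));
  }
  return heights;
}

// Left edge of bar i for 2px bars separated by 2px gaps.
export function barX(i: number, barWidth = 2, gap = 2): number {
  return i * (barWidth + gap);
}
```

Inside the animation loop, each bar could then be drawn with `ctx.fillRect(barX(i), canvasHeight - h, 2, h)`. Anchoring bars to the bottom edge is an assumption, since the plan does not say where they are aligned. With these defaults the last bar starts at x=76 and ends at x=78, which fits the 80px canvas.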

2. **ui/src/components/VoiceMicButton.tsx** - VAD-powered mic button:
```typescript
interface VoiceMicButtonProps {
  onTranscript: (text: string) => void;
  disabled?: boolean;
}
export function VoiceMicButton({ onTranscript, disabled }: VoiceMicButtonProps)
```
Implementation:
- Call useVadRecorder({ onTranscript }) to get { state, start, stop, mediaStream }
- Three visual states per UI spec:
  a. idle (state === "idle"): Render Button with ghost variant, size="icon", h-8 w-8. Content: <Mic className="h-4 w-4" />. aria-label="Start voice input". onClick calls start().
  b. recording (state === "recording"): Render Button with ghost variant, size="icon", h-8 w-8, with ring-2 ring-primary classes. Content: <VoiceWaveform stream={mediaStream} active={true} />. aria-label="Recording — speak now". onClick calls stop().
  c. processing (state === "processing"): Render Button disabled, ghost variant, size="icon", h-8 w-8. Content: <Loader2 className="h-4 w-4 animate-spin" />. aria-label="Transcribing...".
- Import Mic, Loader2 from lucide-react
- Import Button from @/components/ui/button
- Import VoiceWaveform from ./VoiceWaveform
- Import useVadRecorder from ../hooks/useVadRecorder
- When disabled prop is true, render idle state with disabled attribute

Verify:
cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/src/components/VoiceWaveform.tsx && test -f ui/src/components/VoiceMicButton.tsx && grep -q "VoiceWaveform" ui/src/components/VoiceWaveform.tsx && grep -q "VoiceMicButton" ui/src/components/VoiceMicButton.tsx && grep -q "canvas" ui/src/components/VoiceWaveform.tsx && grep -q "useVadRecorder" ui/src/components/VoiceMicButton.tsx && grep -q "Mic" ui/src/components/VoiceMicButton.tsx && grep -q "Loader2" ui/src/components/VoiceMicButton.tsx && grep -q "ring-2 ring-primary" ui/src/components/VoiceMicButton.tsx && echo "PASS" || echo "FAIL"

<acceptance_criteria>
- grep "export function VoiceWaveform" ui/src/components/VoiceWaveform.tsx returns match
- grep "export function VoiceMicButton" ui/src/components/VoiceMicButton.tsx returns match
- grep "canvas" ui/src/components/VoiceWaveform.tsx returns match
- grep "AnalyserNode|createAnalyser|analyser" ui/src/components/VoiceWaveform.tsx returns match
- grep "requestAnimationFrame" ui/src/components/VoiceWaveform.tsx returns match
- grep "getByteFrequencyData" ui/src/components/VoiceWaveform.tsx returns match
- grep "useVadRecorder" ui/src/components/VoiceMicButton.tsx returns match
- grep 'aria-label="Start voice input"' ui/src/components/VoiceMicButton.tsx returns match
- grep 'aria-label="Recording' ui/src/components/VoiceMicButton.tsx returns match
- grep 'aria-label="Transcribing' ui/src/components/VoiceMicButton.tsx returns match
- grep "ring-2 ring-primary" ui/src/components/VoiceMicButton.tsx returns match
- grep "Loader2.*animate-spin" ui/src/components/VoiceMicButton.tsx returns match
</acceptance_criteria>

Done: VoiceWaveform renders 20 animated bars from Web Audio API AnalyserNode on an 80x32 canvas. VoiceMicButton shows idle/recording/processing states with correct icons, aria-labels, and ring styling.
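
The three VoiceMicButton states in Task 2 amount to a lookup from recorder state to view properties. The sketch below expresses that mapping without React. `describeMicButton` and the `MicButtonView` shape are hypothetical, and letting the `disabled` prop override any recorder state is one reading of the last implementation bullet.

```typescript
type RecorderState = "idle" | "recording" | "processing";

// Hypothetical view model: what the button should show and do in each state.
interface MicButtonView {
  content: "mic" | "waveform" | "spinner";
  ariaLabel: string;
  className: string;
  disabled: boolean;
  action: "start" | "stop" | null;
}

export function describeMicButton(state: RecorderState, disabledProp = false): MicButtonView {
  const base = "h-8 w-8";
  // A disabled button always renders the idle look, with no click action.
  if (disabledProp || state === "idle") {
    return {
      content: "mic",
      ariaLabel: "Start voice input",
      className: base,
      disabled: disabledProp,
      action: disabledProp ? null : "start",
    };
  }
  if (state === "recording") {
    return {
      content: "waveform",
      ariaLabel: "Recording — speak now",
      className: `${base} ring-2 ring-primary`,
      disabled: false,
      action: "stop",
    };
  }
  // Remaining case: processing.
  return {
    content: "spinner",
    ariaLabel: "Transcribing...",
    className: base,
    disabled: true,
    action: null,
  };
}
```

The real component would render JSX directly with literal `aria-label` attributes, which the grep-based acceptance criteria above expect to find in the file. A table like this is mainly useful for unit-testing the state logic.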

- All 5 files exist and export their named functions
- useVadRecorder uses useMicVAD with startOnLoad: false and baseAssetPath: "/"
- VoiceMicButton has three distinct visual states with correct aria-labels
- VoiceWaveform uses canvas + AnalyserNode pattern
- encodeWav produces Blob with type audio/wav
- useVoiceMode reads/writes via /api/nexus/settings

<success_criteria> Core voice recording pipeline complete: user clicks mic -> VAD listens -> waveform animates -> silence detected -> audio encoded to WAV -> POSTed to /api/transcribe -> transcript returned. Voice mode readable/writable from nexus-settings. </success_criteria>

After completion, create `.planning/phases/37-web-chat-voice-ui/37-02-SUMMARY.md`
