nexus/.planning/phases/37-web-chat-voice-ui/37-02-SUMMARY.md
Nexus Dev fc422a2364 docs(37-02): complete voice recording components plan
- SUMMARY.md: encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton
- STATE.md: advanced to plan 3, 71% progress, added decisions
- ROADMAP.md: updated phase-37 progress (2/4 plans done)
- REQUIREMENTS.md: marked WCHAT-01..03, WCHAT-05 complete
2026-04-04 02:37:31 +00:00

phase: 37-web-chat-voice-ui
plan: 02
subsystem: ui/voice-recording
tags: voice, vad, waveform, hooks, components
dependency_graph:
  requires: 37-01
  provides: VoiceMicButton, VoiceWaveform, useVadRecorder, useVoiceMode, encodeWav
  affects: ChatInput.tsx
tech_stack:
  added / patterns: @ricky0123/vad-react (useMicVAD), Web Audio API (AnalyserNode), WAV PCM encoding, VAD auto-stop recording, canvas animation loop, optimistic settings update
key_files:
  created: ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts, ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx
  modified: none
decisions:
  • useVadRecorder requests a separate MediaStream, held in a ref, for the VoiceWaveform AnalyserNode. useMicVAD manages its own stream internally, but VoiceWaveform needs an explicit stream reference.
  • The AudioContext is not closed on cleanup; it is reused across recording cycles to avoid repeated unlock prompts.
  • The WAV encoder clamps Float32Array samples to [-1, 1] and then converts them to int16, matching Whisper's 16 kHz mono PCM input format.
  • VoiceMicButton uses three separate JSX returns (one per state) rather than conditional rendering inside a single return. This keeps the state machine explicit and avoids aria-label switching bugs.
metrics:
  duration_seconds: 102
  completed_date: 2026-04-04
  tasks_completed: 2
  tasks_total: 2
  files_created: 5
  files_modified: 0

Phase 37 Plan 02: Voice Recording Components Summary

One-liner: A VAD-powered voice recording pipeline made of a WAV encoder, a useMicVAD-based recorder hook with auto-stop and transcription, a voice mode settings hook, a canvas waveform, and a three-state VoiceMicButton.

Tasks Completed

| Task | Name | Commit | Files |
|------|------|--------|-------|
| 1 | encodeWav utility and useVadRecorder + useVoiceMode hooks | 3676c9c3 | ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts |
| 2 | VoiceWaveform canvas component and VoiceMicButton | bdb2f770 | ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx |

What Was Built

encodeWav (ui/src/lib/encodeWav.ts)

Writes the standard 44-byte WAV header (RIFF/WAVE descriptor plus fmt and data chunks) and converts a Float32Array into a 16-bit mono PCM WAV Blob at 16 kHz. Samples are clamped to [-1, 1] before int16 conversion, producing a Whisper-compatible format.
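A minimal sketch of such an encoder, assuming little-endian PCM with a 16 kHz default rate. The split into encodeWavBuffer (raw bytes) and encodeWav (Blob wrapper) is a hypothetical structure chosen so the header can be inspected directly; the project's ui/src/lib/encodeWav.ts may be organized differently.

```typescript
// Sketch of a 16-bit mono PCM WAV encoder with a 44-byte header.
// Assumption: names and signatures are illustrative, not the project's actual API.

function encodeWavBuffer(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const numChannels = 1;
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF descriptor: size field counts everything after its own 8 bytes
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);
  writeAscii(8, "WAVE");

  // fmt chunk (16 bytes of format data for PCM)
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, bytesPerSample * 8, true); // bits per sample

  // data chunk header
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1], then scale into the signed 16-bit range
  samples.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });

  return buffer;
}

// Wrap the bytes in a Blob suitable for a FormData upload
function encodeWav(samples: Float32Array, sampleRate = 16000): Blob {
  return new Blob([encodeWavBuffer(samples, sampleRate)], { type: "audio/wav" });
}
```

Scaling negatives by 0x8000 and positives by 0x7fff lets -1 map to -32768 and 1 map to 32767 without overflow.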

useVadRecorder (ui/src/hooks/useVadRecorder.ts)

Wraps useMicVAD from @ricky0123/vad-react with this configuration: startOnLoad: false, baseAssetPath: "/", onnxWASMBasePath: "/", positiveSpeechThreshold: 0.8, minSpeechFrames: 5. In onSpeechEnd it pauses the VAD, encodes the captured audio to WAV, POSTs it as FormData to /api/transcribe, and calls onTranscript only if the returned text is at least 2 characters long. It also exposes a mediaStream ref that VoiceWaveform uses for its AnalyserNode.
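The onSpeechEnd flow can be sketched as a plain async function with its side effects injected, which makes the pause, upload, and length-filter steps testable outside React. Everything here beyond the steps listed above is an assumption: the handleSpeechEnd and SpeechEndDeps names, the "audio" form field, the { text } response shape, and trimming whitespace before the length check.

```typescript
// Hypothetical extraction of useVadRecorder's onSpeechEnd logic.
// Assumed (not from the source): field name "audio", { text } response, trimming.

type TranscribeResponse = { ok: boolean; json: () => Promise<unknown> };

interface SpeechEndDeps {
  pause: () => void; // e.g. the pause() exposed by useMicVAD
  encode: (samples: Float32Array) => Blob; // e.g. encodeWav
  post: (url: string, body: FormData) => Promise<TranscribeResponse>;
  onTranscript: (text: string) => void;
}

async function handleSpeechEnd(audio: Float32Array, deps: SpeechEndDeps): Promise<string | null> {
  // Pause the VAD while this utterance is being transcribed
  deps.pause();

  const form = new FormData();
  form.append("audio", deps.encode(audio), "speech.wav");

  const res = await deps.post("/api/transcribe", form);
  if (!res.ok) return null;

  const data = (await res.json()) as { text?: unknown };
  const text = typeof data.text === "string" ? data.text.trim() : "";

  // Treat empty or one-character results as noise and skip them
  if (text.length < 2) return null;

  deps.onTranscript(text);
  return text;
}
```

In the browser, post could be `(url, body) => fetch(url, { method: "POST", body })`, since a fetch Response already provides ok and json().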

useVoiceMode (ui/src/hooks/useVoiceMode.ts)

Loads voiceMode from GET /api/nexus/settings on mount. setMode() applies the new mode optimistically, PATCHes /api/nexus/settings, and reverts to the previous mode if the request fails. Mode values: text | voice_input | full_voice.
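A minimal sketch of the optimistic-update pattern behind setMode(), with the state setter and network call injected. The setModeOptimistic name, the callback parameters, and the { voiceMode } PATCH body shape are assumptions for illustration.

```typescript
// Sketch of an optimistic settings update with rollback.
// Assumptions: helper name, injected callbacks, and the { voiceMode } body shape.

type VoiceMode = "text" | "voice_input" | "full_voice";

async function setModeOptimistic(
  previous: VoiceMode,
  next: VoiceMode,
  applyLocal: (mode: VoiceMode) => void, // e.g. a React state setter
  patchSettings: (body: { voiceMode: VoiceMode }) => Promise<void>, // e.g. PATCH /api/nexus/settings
): Promise<boolean> {
  applyLocal(next); // update the UI immediately, before the server answers
  try {
    await patchSettings({ voiceMode: next });
    return true;
  } catch {
    applyLocal(previous); // the change was rejected: roll back
    return false;
  }
}
```

Because fetch only rejects on network failures, a fetch-based patchSettings would need to throw when `!res.ok` for HTTP errors to trigger the rollback.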

VoiceWaveform (ui/src/components/VoiceWaveform.tsx)

Renders an 80x32 px canvas. When both stream and active are truthy, it creates an AudioContext, a MediaStreamSource, and an AnalyserNode (fftSize=64, giving 32 frequency bins). Each frame it draws 20 bars (skipping every other bin), each 2 px wide with a 2 px gap, with height proportional to frequency magnitude. Bar color comes from the CSS --primary variable. On cleanup it cancels the animation frame and disconnects the source.
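The per-frame bar geometry can be sketched as a pure function. Reading every other one of the 32 bins yields only 16 values, so filling 20 bars needs some mapping; this sketch assumes a proportional one (bar i reads bin floor(i * 32 / 20)), and the selection in VoiceWaveform.tsx may differ. The layoutBars and Bar names are hypothetical. In the component, each animation frame would call analyser.getByteFrequencyData(bins), clear the canvas, fillRect each bar up from the bottom edge, and schedule the next frame with requestAnimationFrame.

```typescript
// Sketch of the bar layout for the 80x32 waveform canvas.
// Assumption: bins are mapped to bars proportionally, not by skipping every other bin.

interface Bar {
  x: number; // left edge in px
  height: number; // px, measured up from the bottom of the canvas
}

function layoutBars(
  bins: Uint8Array, // e.g. filled by analyser.getByteFrequencyData(bins)
  canvasWidth = 80,
  canvasHeight = 32,
  barWidth = 2,
  gap = 2,
): Bar[] {
  const barCount = Math.floor(canvasWidth / (barWidth + gap)); // 80 / 4 = 20 bars
  const bars: Bar[] = [];
  for (let i = 0; i < barCount; i++) {
    const binIndex = Math.floor((i * bins.length) / barCount);
    const magnitude = bins[binIndex] ?? 0; // byte frequency data ranges 0..255
    bars.push({ x: i * (barWidth + gap), height: (magnitude / 255) * canvasHeight });
  }
  return bars;
}
```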

VoiceMicButton (ui/src/components/VoiceMicButton.tsx)

Three visual states:

  • idle: Mic icon, ghost variant, h-8 w-8, aria-label="Start voice input"
  • recording: VoiceWaveform inside button, ring-2 ring-primary class, aria-label="Recording — speak now"
  • processing: Loader2 animate-spin, disabled, aria-label="Transcribing..."
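The three states above can be expressed as a small exhaustive mapping, which is what separate per-state JSX returns amount to. This is a sketch that keeps only the data so it runs without React: the MicState type and micButtonAttrs name are hypothetical, while the icons, labels, and the disabled processing state come from the list above. Treating idle and recording as enabled is an assumption.

```typescript
// Sketch of VoiceMicButton's state-to-attributes mapping, without JSX.
// Assumptions: MicState and micButtonAttrs are illustrative names.

type MicState = "idle" | "recording" | "processing";

interface MicButtonAttrs {
  icon: "Mic" | "VoiceWaveform" | "Loader2";
  ariaLabel: string;
  disabled: boolean;
}

function micButtonAttrs(state: MicState): MicButtonAttrs {
  switch (state) {
    case "idle":
      return { icon: "Mic", ariaLabel: "Start voice input", disabled: false };
    case "recording":
      return { icon: "VoiceWaveform", ariaLabel: "Recording — speak now", disabled: false };
    case "processing":
      return { icon: "Loader2", ariaLabel: "Transcribing...", disabled: true };
    default: {
      // Exhaustiveness check: a new state without a case fails to compile
      const unreachable: never = state;
      throw new Error(`Unknown mic state: ${String(unreachable)}`);
    }
  }
}
```

Each aria-label is fixed per branch, so no label can leak from one state into another during a transition.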

Deviations from Plan

None — plan executed exactly as written.

Known Stubs

None. All 5 files are fully implemented with real logic and no placeholder values.

Self-Check: PASSED

All 5 files confirmed on disk. Both task commits (3676c9c3, bdb2f770) confirmed in git history.