- SUMMARY.md: encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton
- STATE.md: advanced to plan 3, 71% progress, added decisions
- ROADMAP.md: updated phase-37 progress (2/4 plans done)
- REQUIREMENTS.md: marked WCHAT-01..03, WCHAT-05 complete
| phase | plan | subsystem | tags | dependency_graph | tech_stack | key_files | decisions | metrics |
|---|---|---|---|---|---|---|---|---|
| 37-web-chat-voice-ui | 02 | ui/voice-recording | | | | | | |
Phase 37 Plan 02: Voice Recording Components Summary
One-liner: VAD-powered voice recording pipeline — WAV encoder, useMicVAD hook with auto-stop + transcription, voice mode settings hook, canvas waveform, and three-state VoiceMicButton.
Tasks Completed
| Task | Name | Commit | Files |
|---|---|---|---|
| 1 | encodeWav utility and useVadRecorder + useVoiceMode hooks | 3676c9c3 | ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts |
| 2 | VoiceWaveform canvas component and VoiceMicButton | bdb2f770 | ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx |
What Was Built
encodeWav (ui/src/lib/encodeWav.ts)
Standard 44-byte WAV header encoder (RIFF/WAVE/fmt/data chunks). Converts Float32Array to PCM mono 16-bit WAV Blob at 16kHz. Clamps samples to [-1, 1] before int16 conversion. Whisper-compatible format.
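A minimal sketch of such an encoder, assuming the conventional RIFF layout for 16-bit mono PCM. The helper names (`writeAscii`, `encodeWavBuffer`) and the asymmetric int16 scaling are illustrative choices, not taken from the committed file.

```typescript
// Sketch only: a 44-byte RIFF/WAVE header followed by 16-bit mono PCM samples.
const WAV_HEADER_BYTES = 44;

// Hypothetical helper: writes an ASCII chunk tag such as "RIFF" byte by byte.
function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

function encodeWavBuffer(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataBytes = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(WAV_HEADER_BYTES + dataBytes);
  const view = new DataView(buffer);

  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(view, 8, "WAVE");
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, 1, true); // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(view, 36, "data");
  view.setUint32(40, dataBytes, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling into the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(WAV_HEADER_BYTES + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Thin wrapper that yields the Blob form described above.
function encodeWav(samples: Float32Array, sampleRate = 16000): Blob {
  return new Blob([encodeWavBuffer(samples, sampleRate)], { type: "audio/wav" });
}
```

Keeping the byte layout in a buffer-returning function makes the header easy to inspect with a DataView; the Blob wrapper is what would be attached to an upload.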
useVadRecorder (ui/src/hooks/useVadRecorder.ts)
Wraps useMicVAD from @ricky0123/vad-react. Configuration: startOnLoad: false, baseAssetPath: "/", onnxWASMBasePath: "/", positiveSpeechThreshold: 0.8, minSpeechFrames: 5. On onSpeechEnd: pauses VAD, encodes to WAV, POSTs FormData to /api/transcribe, calls onTranscript if text length >= 2. Exposes mediaStream ref for VoiceWaveform AnalyserNode.
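The onSpeechEnd flow can be sketched without React, with every collaborator injected. The `{ text }` response shape, the `"file"` form field, the `speech.wav` filename, and the names `SpeechEndDeps`, `acceptTranscript`, and `handleSpeechEnd` are all assumptions for illustration; the summary does not show the real payload or signatures.

```typescript
// Assumed response shape; the actual /api/transcribe payload is not shown here.
interface TranscribeResponse {
  text?: string;
}

const MIN_TRANSCRIPT_LENGTH = 2;

// Returns the transcript to forward, or null when it is shorter than 2 characters.
function acceptTranscript(text: string | undefined): string | null {
  return text !== undefined && text.length >= MIN_TRANSCRIPT_LENGTH ? text : null;
}

// Hypothetical dependency bundle so the flow can run outside a component.
interface SpeechEndDeps {
  pause: () => void; // e.g. a pause function obtained from the VAD hook
  encodeWav: (audio: Float32Array) => Blob;
  onTranscript: (text: string) => void;
  fetchImpl?: typeof fetch; // injectable for tests
}

async function handleSpeechEnd(audio: Float32Array, deps: SpeechEndDeps): Promise<void> {
  deps.pause(); // stop listening while the request is in flight

  const form = new FormData();
  form.append("file", deps.encodeWav(audio), "speech.wav"); // field name is assumed

  const doFetch = deps.fetchImpl ?? fetch;
  const res = await doFetch("/api/transcribe", { method: "POST", body: form });
  if (!res.ok) return; // error handling is left out of this sketch

  const body = (await res.json()) as TranscribeResponse;
  const text = acceptTranscript(body.text);
  if (text !== null) deps.onTranscript(text);
}
```

In the hook itself, a function like this would be passed as the onSpeechEnd option alongside the configuration values listed above.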
useVoiceMode (ui/src/hooks/useVoiceMode.ts)
Loads voiceMode from GET /api/nexus/settings on mount. Provides setMode() with optimistic update that PATCHes /api/nexus/settings and reverts on error. Three mode values: text | voice_input | full_voice.
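The optimistic update with revert can be illustrated with a framework-free sketch. `VoiceModeStore`, `isVoiceMode`, and `setModeOptimistic` are hypothetical names, and the `patch` callback stands in for the PATCH to /api/nexus/settings, whose request body is not described in this summary.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

// Guard for values loaded from the settings endpoint.
function isVoiceMode(value: unknown): value is VoiceMode {
  return value === "text" || value === "voice_input" || value === "full_voice";
}

// Hypothetical stand-in for the hook's state.
interface VoiceModeStore {
  mode: VoiceMode;
}

// Applies the new mode immediately, then reverts if persisting it fails.
async function setModeOptimistic(
  store: VoiceModeStore,
  next: VoiceMode,
  patch: (mode: VoiceMode) => Promise<void>,
): Promise<boolean> {
  const previous = store.mode;
  store.mode = next; // optimistic: visible before the request resolves
  try {
    await patch(next);
    return true;
  } catch {
    store.mode = previous; // revert on error
    return false;
  }
}
```

Because the assignment happens before the first await, callers see the new mode synchronously; only a rejected `patch` restores the earlier value.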
VoiceWaveform (ui/src/components/VoiceWaveform.tsx)
Canvas element (80x32px). When stream and active are truthy: creates AudioContext, MediaStreamSource, AnalyserNode (fftSize=64, 32 bins). Draws 20 bars per frame (skipping every other bin), bar width 2px + gap 2px, height proportional to frequency magnitude. Color from CSS --primary variable. Cancels animation and disconnects source on cleanup.
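The bar geometry can be sketched as a pure function over the analyser's byte frequency data. `layoutBars` and the `Bar` type are hypothetical. Note that stepping by two through 32 bins covers only 16 of them, so this sketch treats out-of-range reads as silence; how the component itself fills the remaining bars is not stated here.

```typescript
interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

const CANVAS_WIDTH = 80;
const CANVAS_HEIGHT = 32;
const BAR_COUNT = 20;
const BAR_WIDTH = 2;
const BAR_GAP = 2; // 20 * (2 + 2) = 80, matching CANVAS_WIDTH

// Maps byte magnitudes (0..255) to bottom-aligned rectangles.
function layoutBars(bins: Uint8Array): Bar[] {
  const bars: Bar[] = [];
  for (let i = 0; i < BAR_COUNT; i++) {
    const binIndex = i * 2; // skip every other bin
    // Assumption: bars whose bin falls outside the data draw at zero height.
    const magnitude = binIndex < bins.length ? bins[binIndex] : 0;
    const height = (magnitude / 255) * CANVAS_HEIGHT;
    bars.push({
      x: i * (BAR_WIDTH + BAR_GAP),
      y: CANVAS_HEIGHT - height,
      width: BAR_WIDTH,
      height,
    });
  }
  return bars;
}
```

In an animation frame, the bins would typically be filled with `analyser.getByteFrequencyData(bins)` and each bar drawn with `ctx.fillRect(bar.x, bar.y, bar.width, bar.height)`.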
VoiceMicButton (ui/src/components/VoiceMicButton.tsx)
Three visual states:
- idle: Mic icon, ghost variant, h-8 w-8, aria-label="Start voice input"
- recording: VoiceWaveform inside button, ring-2 ring-primary class, aria-label="Recording — speak now"
- processing: Loader2 animate-spin, disabled, aria-label="Transcribing..."
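The three states lend themselves to a lookup table, sketched below. `MicState`, `MicButtonView`, and `MIC_BUTTON_VIEWS` are hypothetical names. Applying h-8 w-8 to every state and leaving the idle and recording states enabled are assumptions, since only the idle size and the processing disabled flag are stated above.

```typescript
type MicState = "idle" | "recording" | "processing";

// Hypothetical view model: what the button renders in each state.
interface MicButtonView {
  ariaLabel: string;
  disabled: boolean;
  className: string;
  content: "Mic" | "VoiceWaveform" | "Loader2";
}

const MIC_BUTTON_VIEWS: Record<MicState, MicButtonView> = {
  idle: {
    ariaLabel: "Start voice input",
    disabled: false,
    className: "h-8 w-8",
    content: "Mic",
  },
  recording: {
    ariaLabel: "Recording — speak now",
    disabled: false,
    className: "h-8 w-8 ring-2 ring-primary", // size carried over by assumption
    content: "VoiceWaveform",
  },
  processing: {
    ariaLabel: "Transcribing...",
    disabled: true,
    className: "h-8 w-8", // size carried over by assumption
    content: "Loader2",
  },
};
```

Typing the table as `Record<MicState, MicButtonView>` makes the compiler flag any state that lacks an entry.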
Deviations from Plan
None — plan executed exactly as written.
Known Stubs
None. All 5 files are fully implemented with real logic, no placeholder values.
Self-Check: PASSED
All 5 files confirmed on disk. Both task commits (3676c9c3, bdb2f770) confirmed in git history.