Nexus Dev c85a5016ac docs(37-02): complete voice recording components plan

- SUMMARY.md: encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton
- STATE.md: advanced to plan 3, 71% progress, added decisions
- ROADMAP.md: updated phase-37 progress (2/4 plans done)
- REQUIREMENTS.md: marked WCHAT-01..03, WCHAT-05 complete

2026-04-04 03:55:50 +00:00

phase: 37-web-chat-voice-ui
plan: 02
subsystem: ui/voice-recording
tags: voice, vad, waveform, hooks, components
dependency_graph:
  requires: 37-01
  provides: VoiceMicButton, VoiceWaveform, useVadRecorder, useVoiceMode, encodeWav
  affects: ChatInput.tsx
tech_stack:
  added:
    - @ricky0123/vad-react (useMicVAD)
    - Web Audio API (AnalyserNode)
    - WAV PCM encoding
  patterns:
    - VAD auto-stop recording
    - canvas animation loop
    - optimistic settings update
key_files:
  created:
    - ui/src/lib/encodeWav.ts
    - ui/src/hooks/useVadRecorder.ts
    - ui/src/hooks/useVoiceMode.ts
    - ui/src/components/VoiceWaveform.tsx
    - ui/src/components/VoiceMicButton.tsx
  modified: (none)
decisions:
  - useVadRecorder requests a separate MediaStream ref for the VoiceWaveform AnalyserNode: useMicVAD manages its own stream internally, but VoiceWaveform needs an explicit stream reference
  - AudioContext is not closed on cleanup; it is reused across recording cycles to avoid repeated unlock prompts
  - WAV encoder clamps Float32Array samples to [-1, 1], then converts to int16; this matches Whisper's 16kHz mono PCM input format
  - VoiceMicButton uses three separate JSX returns rather than conditional rendering within one: a cleaner state machine that avoids aria-label switching bugs
metrics:
  duration_seconds: 102
  completed_date: 2026-04-04
  tasks_completed: 2
  tasks_total: 2
  files_created: 5
  files_modified: 0

Phase 37 Plan 02: Voice Recording Components Summary

One-liner: VAD-powered voice recording pipeline comprising a WAV encoder, a useVadRecorder hook (wrapping useMicVAD) with auto-stop and transcription, a voice mode settings hook, a canvas waveform, and a three-state VoiceMicButton.

Tasks Completed

Task	Name	Commit	Files
1	encodeWav utility and useVadRecorder + useVoiceMode hooks	`3676c9c3`	ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts
2	VoiceWaveform canvas component and VoiceMicButton	`bdb2f770`	ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx

What Was Built

encodeWav (ui/src/lib/encodeWav.ts)

Standard 44-byte WAV header encoder (RIFF/WAVE/fmt/data chunks). Converts Float32Array to PCM mono 16-bit WAV Blob at 16kHz. Clamps samples to [-1, 1] before int16 conversion. Whisper-compatible format.
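As a rough illustration of the encoder described above, the following is a minimal sketch and not the contents of ui/src/lib/encodeWav.ts. The helper names writeAscii and encodeWavBuffer, the default sample rate argument, and the rounding and scaling choices are assumptions made for the example.

```typescript
// Sketch of a 16-bit mono PCM WAV encoder. Helper names are illustrative.
const HEADER_BYTES = 44;
const BYTES_PER_SAMPLE = 2;

function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

function encodeWavBuffer(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const dataBytes = samples.length * BYTES_PER_SAMPLE;
  const buffer = new ArrayBuffer(HEADER_BYTES + dataBytes);
  const view = new DataView(buffer);

  // RIFF container header.
  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(view, 8, "WAVE");

  // fmt chunk: uncompressed PCM, one channel, 16 bits per sample.
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk body size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * BYTES_PER_SAMPLE, true); // byte rate
  view.setUint16(32, BYTES_PER_SAMPLE, true); // block align
  view.setUint16(34, 16, true); // bits per sample

  // data chunk: clamp each float to [-1, 1], then scale to int16.
  writeAscii(view, 36, "data");
  view.setUint32(40, dataBytes, true);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(HEADER_BYTES + i * BYTES_PER_SAMPLE, Math.round(scaled), true);
  });
  return buffer;
}

function encodeWav(samples: Float32Array, sampleRate = 16000): Blob {
  return new Blob([encodeWavBuffer(samples, sampleRate)], { type: "audio/wav" });
}
```

Splitting the byte-level work from the Blob wrapper keeps the header layout easy to inspect with a DataView, independent of any browser API.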

useVadRecorder (ui/src/hooks/useVadRecorder.ts)

Wraps useMicVAD from @ricky0123/vad-react. Configuration: startOnLoad: false, baseAssetPath: "/", onnxWASMBasePath: "/", positiveSpeechThreshold: 0.8, minSpeechFrames: 5. On onSpeechEnd: pauses VAD, encodes to WAV, POSTs FormData to /api/transcribe, calls onTranscript if text length >= 2. Exposes mediaStream ref for VoiceWaveform AnalyserNode.
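The onSpeechEnd flow described above could look roughly like the sketch below. It leaves out the useMicVAD wiring and injects its collaborators so the logic can be read on its own. The function name handleSpeechEnd, the form field name "audio", the filename "speech.wav", and the `{ text }` response shape are all assumptions; the summary does not specify them.

```typescript
// Sketch of the speech-end handler. All names and the payload shape are assumed.
type PostFn = (
  url: string,
  init: { method: string; body: FormData },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface SpeechEndDeps {
  pause: () => void; // pauses the VAD so it stops listening while we transcribe
  encode: (samples: Float32Array) => Blob; // e.g. a WAV encoder
  post: PostFn; // the global fetch satisfies this shape
  onTranscript: (text: string) => void;
}

const MIN_TRANSCRIPT_LENGTH = 2;

async function handleSpeechEnd(audio: Float32Array, deps: SpeechEndDeps): Promise<void> {
  deps.pause();

  const form = new FormData();
  form.append("audio", deps.encode(audio), "speech.wav"); // assumed field and file name

  const response = await deps.post("/api/transcribe", { method: "POST", body: form });
  if (!response.ok) return;

  const payload = (await response.json()) as { text?: unknown } | null;
  const text = typeof payload?.text === "string" ? payload.text : "";

  // Skip empty or single-character results, per the length >= 2 rule.
  if (text.length >= MIN_TRANSCRIPT_LENGTH) deps.onTranscript(text);
}
```

In a hook, `pause` would come from the object returned by useMicVAD and `post` would be `fetch`; injecting them here is a choice made for the example so the flow can be exercised without a browser.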

useVoiceMode (ui/src/hooks/useVoiceMode.ts)

Loads voiceMode from GET /api/nexus/settings on mount. Provides setMode() with optimistic update that PATCHes /api/nexus/settings and reverts on error. Three mode values: text | voice_input | full_voice.
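The optimistic update with revert can be sketched without React as follows. The ModeStore interface, the function names, and the `{ voiceMode }` request body are assumptions for illustration; only the endpoint, the PATCH method, and the three mode values come from the summary.

```typescript
// Sketch of an optimistic settings update that rolls back on failure.
type VoiceMode = "text" | "voice_input" | "full_voice";

// Hypothetical stand-in for React state (a useState getter/setter pair).
interface ModeStore {
  get(): VoiceMode;
  set(mode: VoiceMode): void;
}

async function setModeOptimistic(
  next: VoiceMode,
  store: ModeStore,
  patch: (mode: VoiceMode) => Promise<void>,
): Promise<boolean> {
  const previous = store.get();
  store.set(next); // show the new mode immediately
  try {
    await patch(next);
    return true;
  } catch {
    store.set(previous); // server rejected the change, so restore the old value
    return false;
  }
}

// One possible patch function. The JSON body shape is an assumption.
async function patchVoiceMode(mode: VoiceMode): Promise<void> {
  const response = await fetch("/api/nexus/settings", {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ voiceMode: mode }),
  });
  if (!response.ok) throw new Error(`PATCH failed with status ${response.status}`);
}
```

Capturing `previous` before the write is what makes the rollback safe: the revert restores whatever the user saw before this call rather than a hard-coded default.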

VoiceWaveform (ui/src/components/VoiceWaveform.tsx)

Canvas element (80x32px). When stream and active are truthy: creates AudioContext, MediaStreamSource, AnalyserNode (fftSize=64, 32 bins). Draws 20 bars per frame (skipping every other bin), bar width 2px + gap 2px, height proportional to frequency magnitude. Color from CSS --primary variable. Cancels animation and disconnects source on cleanup.
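The per-frame bar geometry can be expressed as a pure function of the analyser's byte frequency data, as in this sketch. The names Bar, BarLayout, and computeBars are hypothetical. The mapping from bins to bars is also an assumption: this version reads every other bin and stops at whichever limit comes first, the bar cap or the end of the data.

```typescript
// Sketch of the bar layout for one animation frame. Names are illustrative.
interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface BarLayout {
  canvasHeight: number;
  barWidth: number;
  gap: number;
  stride: number; // 2 means "skip every other bin"
  maxBars: number;
}

const DEFAULT_LAYOUT: BarLayout = {
  canvasHeight: 32,
  barWidth: 2,
  gap: 2,
  stride: 2,
  maxBars: 20,
};

// `bins` is the Uint8Array filled by AnalyserNode.getByteFrequencyData (0..255).
function computeBars(bins: Uint8Array, layout: BarLayout = DEFAULT_LAYOUT): Bar[] {
  const bars: Bar[] = [];
  for (let i = 0; i < layout.maxBars; i++) {
    const binIndex = i * layout.stride;
    if (binIndex >= bins.length) break;
    // Height is proportional to the bin magnitude.
    const height = (bins[binIndex] / 255) * layout.canvasHeight;
    bars.push({
      x: i * (layout.barWidth + layout.gap),
      y: layout.canvasHeight - height, // anchor bars to the bottom edge
      width: layout.barWidth,
      height,
    });
  }
  return bars;
}
```

A draw loop would then call `ctx.fillRect(bar.x, bar.y, bar.width, bar.height)` for each bar inside a requestAnimationFrame callback, which keeps the canvas code free of layout arithmetic.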

VoiceMicButton (ui/src/components/VoiceMicButton.tsx)

Three visual states:

idle: Mic icon, ghost variant, h-8 w-8, aria-label="Start voice input"
recording: VoiceWaveform inside button, ring-2 ring-primary class, aria-label="Recording — speak now"
processing: Loader2 animate-spin, disabled, aria-label="Transcribing..."

Deviations from Plan

None — plan executed exactly as written.

Known Stubs

None. All 5 files are fully implemented with real logic, no placeholder values.

Self-Check: PASSED

All 5 files confirmed on disk. Both task commits (3676c9c3, bdb2f770) confirmed in git history.
