nexus/.planning/phases/37-web-chat-voice-ui/37-02-SUMMARY.md
Nexus Dev c85a5016ac docs(37-02): complete voice recording components plan
- SUMMARY.md: encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton
- STATE.md: advanced to plan 3, 71% progress, added decisions
- ROADMAP.md: updated phase-37 progress (2/4 plans done)
- REQUIREMENTS.md: marked WCHAT-01..03, WCHAT-05 complete
2026-04-04 03:55:50 +00:00

---
phase: 37-web-chat-voice-ui
plan: "02"
subsystem: ui/voice-recording
tags: [voice, vad, waveform, hooks, components]
dependency_graph:
  requires: ["37-01"]
  provides: ["VoiceMicButton", "VoiceWaveform", "useVadRecorder", "useVoiceMode", "encodeWav"]
  affects: ["ChatInput.tsx"]
tech_stack:
  added: ["@ricky0123/vad-react (useMicVAD)", "Web Audio API (AnalyserNode)", "WAV PCM encoding"]
  patterns: ["VAD auto-stop recording", "canvas animation loop", "optimistic settings update"]
key_files:
  created:
    - ui/src/lib/encodeWav.ts
    - ui/src/hooks/useVadRecorder.ts
    - ui/src/hooks/useVoiceMode.ts
    - ui/src/components/VoiceWaveform.tsx
    - ui/src/components/VoiceMicButton.tsx
  modified: []
decisions:
- "useVadRecorder requests a separate MediaStream ref for VoiceWaveform AnalyserNode — useMicVAD manages its own stream internally but VoiceWaveform needs an explicit stream reference"
- "AudioContext not closed on cleanup — reused across recording cycles to avoid repeated unlock prompts"
- "WAV encoder clamps Float32Array samples to [-1, 1] then converts to int16 — matches Whisper 16kHz mono PCM input format"
- "VoiceMicButton uses three separate JSX returns rather than conditional rendering within one — cleaner state machines, avoids aria-label switching bugs"
metrics:
  duration_seconds: 102
  completed_date: "2026-04-04"
  tasks_completed: 2
  tasks_total: 2
  files_created: 5
  files_modified: 0
---
# Phase 37 Plan 02: Voice Recording Components Summary
**One-liner:** VAD-powered voice recording pipeline: WAV encoder, `useVadRecorder` hook (wrapping `useMicVAD`) with auto-stop and transcription, voice mode settings hook, canvas waveform, and three-state VoiceMicButton.
## Tasks Completed
| Task | Name | Commit | Files |
|------|------|--------|-------|
| 1 | encodeWav utility and useVadRecorder + useVoiceMode hooks | 3676c9c3 | ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts |
| 2 | VoiceWaveform canvas component and VoiceMicButton | bdb2f770 | ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx |
## What Was Built
### encodeWav (ui/src/lib/encodeWav.ts)
Writes a standard 44-byte WAV header (RIFF/WAVE/fmt/data chunks) and converts a Float32Array into a mono 16-bit PCM WAV Blob at 16 kHz. Samples are clamped to [-1, 1] before int16 conversion. The output is Whisper-compatible.
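
For orientation, the header layout and clamping described above can be sketched roughly as follows. This is a minimal illustration and not the contents of `encodeWav.ts`: the names `encodeWavBuffer` and `writeAscii` are hypothetical, and the asymmetric int16 scaling is one common convention that the project may or may not use.

```typescript
// Sketch only: function names and the scaling convention are assumptions,
// not taken from ui/src/lib/encodeWav.ts.
const WAV_HEADER_BYTES = 44;

function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

function encodeWavBuffer(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const numChannels = 1; // mono
  const dataBytes = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(WAV_HEADER_BYTES + dataBytes);
  const view = new DataView(buffer);

  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // total size minus the 8-byte RIFF preamble
  writeAscii(view, 8, "WAVE");
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(view, 36, "data");
  view.setUint32(40, dataBytes, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale so -1 maps to -32768 and +1 to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(WAV_HEADER_BYTES + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser the result could then be wrapped as `new Blob([buffer], { type: "audio/wav" })` to obtain the Blob the summary mentions.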
### useVadRecorder (ui/src/hooks/useVadRecorder.ts)
Wraps `useMicVAD` from `@ricky0123/vad-react`. Configuration: `startOnLoad: false`, `baseAssetPath: "/"`, `onnxWASMBasePath: "/"`, `positiveSpeechThreshold: 0.8`, `minSpeechFrames: 5`. On `onSpeechEnd`: pauses VAD, encodes to WAV, POSTs FormData to `/api/transcribe`, calls `onTranscript` if text length >= 2. Exposes `mediaStream` ref for VoiceWaveform AnalyserNode.
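
The `onSpeechEnd` sequence above (pause, encode, POST, gate on length) might look something like the framework-free sketch below. Everything here is illustrative: `SpeechEndDeps`, `handleSpeechEnd`, and the injected functions are hypothetical names standing in for the hook's internals, and the form field name and response shape in the trailing comment are assumptions.

```typescript
// Hypothetical dependency bundle; none of these names come from useVadRecorder.ts.
interface SpeechEndDeps {
  pauseVad: () => void; // e.g. the pause() function returned by useMicVAD
  encode: (audio: Float32Array) => Uint8Array; // stands in for encodeWav
  postWav: (wav: Uint8Array) => Promise<string>; // stands in for the POST to /api/transcribe
  onTranscript: (text: string) => void;
}

const MIN_TRANSCRIPT_LENGTH = 2;

// Returns true when a transcript was forwarded, false when it was too short.
async function handleSpeechEnd(audio: Float32Array, deps: SpeechEndDeps): Promise<boolean> {
  deps.pauseVad(); // stop listening before the network round trip
  const wav = deps.encode(audio);
  const text = await deps.postWav(wav);
  if (text.length < MIN_TRANSCRIPT_LENGTH) return false;
  deps.onTranscript(text);
  return true;
}

// In the browser, postWav could be written along these lines. The "audio" field
// name and the { text } response shape are assumptions, not confirmed by the source:
//   const form = new FormData();
//   form.append("audio", new Blob([wav], { type: "audio/wav" }), "speech.wav");
//   const res = await fetch("/api/transcribe", { method: "POST", body: form });
//   return (await res.json()).text;
```

Injecting the network call keeps the length gate and the pause-first ordering easy to exercise without a microphone or a server.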
### useVoiceMode (ui/src/hooks/useVoiceMode.ts)
Loads `voiceMode` from `GET /api/nexus/settings` on mount. Provides `setMode()` with optimistic update that PATCHes `/api/nexus/settings` and reverts on error. Three mode values: `text | voice_input | full_voice`.
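
The optimistic update with revert can be illustrated without React as below. This is a sketch under stated assumptions: `createVoiceModeStore` and `save` are hypothetical names, and `save` stands in for the PATCH to `/api/nexus/settings`, assumed to reject on failure.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

// Hypothetical stand-in for the hook's state handling; not the code in useVoiceMode.ts.
function createVoiceModeStore(
  initial: VoiceMode,
  save: (mode: VoiceMode) => Promise<void>, // assumed to reject when the PATCH fails
) {
  let mode = initial;
  return {
    getMode: (): VoiceMode => mode,
    async setMode(next: VoiceMode): Promise<void> {
      const previous = mode;
      mode = next; // optimistic: the new mode is visible before the request settles
      try {
        await save(next);
      } catch {
        mode = previous; // revert on error
      }
    },
  };
}
```

Because the assignment happens before the first `await`, callers observe the new mode synchronously, which is what makes the toggle feel instant.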
### VoiceWaveform (ui/src/components/VoiceWaveform.tsx)
Canvas element (80x32px). When `stream` and `active` are truthy: creates AudioContext, MediaStreamSource, AnalyserNode (fftSize=64, 32 bins). Draws 20 bars per frame (skipping every other bin), bar width 2px + gap 2px, height proportional to frequency magnitude. Color from CSS `--primary` variable. Cancels animation and disconnects source on cleanup.
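
The bar geometry works out evenly: 20 bars at 2px width plus 2px gap span 80px, the full canvas width. A rough sketch of the per-frame layout follows. It assumes magnitudes are byte values (0 to 255), as `AnalyserNode.getByteFrequencyData` produces, and that bars grow upward from the bottom edge; `layoutBars` and `Bar` are hypothetical names, and the selection of which bins feed the bars is left to the caller.

```typescript
interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Sketch only: bottom-anchored bars and the 0-255 magnitude range are assumptions.
function layoutBars(
  magnitudes: ArrayLike<number>,
  canvasHeight = 32,
  barWidth = 2,
  gap = 2,
): Bar[] {
  const bars: Bar[] = [];
  for (let i = 0; i < magnitudes.length; i++) {
    // Height is proportional to the frequency magnitude.
    const height = (magnitudes[i] / 255) * canvasHeight;
    bars.push({
      x: i * (barWidth + gap),
      y: canvasHeight - height, // canvas y grows downward
      width: barWidth,
      height,
    });
  }
  return bars;
}
```

Each bar could then be painted with `ctx.fillRect(bar.x, bar.y, bar.width, bar.height)` inside the animation loop.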
### VoiceMicButton (ui/src/components/VoiceMicButton.tsx)
Three visual states:
- **idle**: `Mic` icon, ghost variant, h-8 w-8, `aria-label="Start voice input"`
- **recording**: `VoiceWaveform` inside button, `ring-2 ring-primary` class, `aria-label="Recording — speak now"`
- **processing**: `Loader2 animate-spin`, disabled, `aria-label="Transcribing..."`
## Deviations from Plan
None — plan executed exactly as written.
## Known Stubs
None. All 5 files are fully implemented with real logic, no placeholder values.
## Self-Check: PASSED
All 5 files confirmed on disk. Both task commits (3676c9c3, bdb2f770) confirmed in git history.