docs(37-02): complete voice recording components plan

- SUMMARY.md: encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton
- STATE.md: advanced to plan 3, 71% progress, added decisions
- ROADMAP.md: updated phase-37 progress (2/4 plans done)
- REQUIREMENTS.md: marked WCHAT-01..03, WCHAT-05 complete
2026-04-04 02:37:31 +00:00 · 2026-04-04 02:37:31 +00:00 · fc422a2364
commit fc422a2364
parent bdb2f77075
4 changed files with 89 additions and 11 deletions
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -20,9 +20,9 @@

 - [x] **WCHAT-01**: Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing)
 - [x] **WCHAT-02**: Recording auto-stops on silence detection via VAD (voice activity detection)
-- [ ] **WCHAT-03**: Real-time waveform/amplitude visualization displays while recording
+- [x] **WCHAT-03**: Real-time waveform/amplitude visualization displays while recording
 - [x] **WCHAT-04**: Voice response audio plays inline in chat message with audio player controls
-- [ ] **WCHAT-05**: User can toggle voice mode: text only / voice input only / full voice (input + output)
+- [x] **WCHAT-05**: User can toggle voice mode: text only / voice input only / full voice (input + output)
 - [ ] **WCHAT-06**: Auto-play of voice responses is configurable (on/off in settings)

 ### Telegram Bridge
@@ -82,9 +82,9 @@
 | VPIPE-08 | Phase 39 | Pending |
 | WCHAT-01 | Phase 37 | Complete |
 | WCHAT-02 | Phase 37 | Complete |
-| WCHAT-03 | Phase 37 | Pending |
+| WCHAT-03 | Phase 37 | Complete |
 | WCHAT-04 | Phase 37 | Complete |
-| WCHAT-05 | Phase 37 | Pending |
+| WCHAT-05 | Phase 37 | Complete |
 | WCHAT-06 | Phase 37 | Pending |
 | TGRAM-01 | Phase 38 | Pending |
 | TGRAM-02 | Phase 38 | Pending |
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -222,6 +222,6 @@ All 23 v1.6 requirements are mapped to exactly one phase. No orphans.
 | 34. Voice | v1.5 | 2/2 | Complete | 2026-04-03 |
 | 35. npx buildthis CLI | v1.5 | 1/1 | Complete | 2026-04-03 |
 | 36. Voice Pipeline Foundation | v1.6 | 2/3 | Complete    | 2026-04-04 |
-| 37. Web Chat Voice UI | v1.6 | 1/4 | In Progress|  |
+| 37. Web Chat Voice UI | v1.6 | 2/4 | In Progress|  |
 | 38. Telegram Bridge | v1.6 | 0/TBD | Not started | - |
 | 39. Voice Polish | v1.6 | 0/TBD | Not started | - |
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.6
 milestone_name: Voice Pipeline + Minimal Message Bridge
 status: executing
-stopped_at: Completed 37-01-PLAN.md — Server Prerequisites + VAD Browser Infrastructure
-last_updated: "2026-04-04T02:26:30.188Z"
+stopped_at: Completed 37-02-PLAN.md — Voice Recording Components (encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton)
+last_updated: "2026-04-04T02:37:14.447Z"
 last_activity: 2026-04-04
 progress:
  total_phases: 4
  completed_phases: 1
  total_plans: 7
-  completed_plans: 4
+  completed_plans: 5
  percent: 0
 ---

@@ -26,7 +26,7 @@ See: .planning/PROJECT.md (updated 2026-04-03)
 ## Current Position

 Phase: 37 (web-chat-voice-ui) — EXECUTING
-Plan: 2 of 4
+Plan: 3 of 4
 Status: Ready to execute
 Last activity: 2026-04-04

@@ -61,6 +61,8 @@ Key constraints for v1.6:
 - [Phase 37]: Cherry-picked Phase 36 commits to bring voice pipeline, nexus-settings, and voiceMode wiring to phase-37 branch
 - [Phase 37]: COOP/COEP headers placed as first Express middleware — applies to all responses including API, static, and Vite dev
 - [Phase 37]: VAD ONNX assets served from ui/public/ same-origin to avoid COEP blocking CDN-served binary files
+- [Phase 37]: useVadRecorder requests separate MediaStream ref for VoiceWaveform AnalyserNode — useMicVAD manages its own stream internally
+- [Phase 37]: AudioContext not closed on cleanup in VoiceWaveform — reused across recording cycles to avoid repeated autoplay unlock prompts

 ### Pending Todos

@@ -74,6 +76,6 @@ None yet.

 ## Session Continuity

-Last session: 2026-04-04T02:26:30.185Z
-Stopped at: Completed 37-01-PLAN.md — Server Prerequisites + VAD Browser Infrastructure
+Last session: 2026-04-04T02:37:14.444Z
+Stopped at: Completed 37-02-PLAN.md — Voice Recording Components (encodeWav, useVadRecorder, useVoiceMode, VoiceWaveform, VoiceMicButton)
 Resume file: None
--- a/.planning/phases/37-web-chat-voice-ui/37-02-SUMMARY.md
+++ b/.planning/phases/37-web-chat-voice-ui/37-02-SUMMARY.md
@@ -0,0 +1,76 @@
+---
+phase: 37-web-chat-voice-ui
+plan: "02"
+subsystem: ui/voice-recording
+tags: [voice, vad, waveform, hooks, components]
+dependency_graph:
+  requires: ["37-01"]
+  provides: ["VoiceMicButton", "VoiceWaveform", "useVadRecorder", "useVoiceMode", "encodeWav"]
+  affects: ["ChatInput.tsx"]
+tech_stack:
+  added: ["@ricky0123/vad-react (useMicVAD)", "Web Audio API (AnalyserNode)", "WAV PCM encoding"]
+  patterns: ["VAD auto-stop recording", "canvas animation loop", "optimistic settings update"]
+key_files:
+  created:
+    - ui/src/lib/encodeWav.ts
+    - ui/src/hooks/useVadRecorder.ts
+    - ui/src/hooks/useVoiceMode.ts
+    - ui/src/components/VoiceWaveform.tsx
+    - ui/src/components/VoiceMicButton.tsx
+  modified: []
+decisions:
+  - "useVadRecorder requests a separate MediaStream ref for VoiceWaveform AnalyserNode — useMicVAD manages its own stream internally but VoiceWaveform needs an explicit stream reference"
+  - "AudioContext not closed on cleanup — reused across recording cycles to avoid repeated unlock prompts"
+  - "WAV encoder clamps Float32Array samples to [-1, 1] then converts to int16 — matches Whisper 16kHz mono PCM input format"
+  - "VoiceMicButton uses three separate JSX returns rather than conditional rendering within one — cleaner state machines, avoids aria-label switching bugs"
+metrics:
+  duration_seconds: 102
+  completed_date: "2026-04-04"
+  tasks_completed: 2
+  tasks_total: 2
+  files_created: 5
+  files_modified: 0
+---
+
+# Phase 37 Plan 02: Voice Recording Components Summary
+
+**One-liner:** VAD-powered voice recording pipeline — WAV encoder, useMicVAD hook with auto-stop + transcription, voice mode settings hook, canvas waveform, and three-state VoiceMicButton.
+
+## Tasks Completed
+
+| Task | Name | Commit | Files |
+|------|------|--------|-------|
+| 1 | encodeWav utility and useVadRecorder + useVoiceMode hooks | 3676c9c3 | ui/src/lib/encodeWav.ts, ui/src/hooks/useVadRecorder.ts, ui/src/hooks/useVoiceMode.ts |
+| 2 | VoiceWaveform canvas component and VoiceMicButton | bdb2f770 | ui/src/components/VoiceWaveform.tsx, ui/src/components/VoiceMicButton.tsx |
+
+## What Was Built
+
+### encodeWav (ui/src/lib/encodeWav.ts)
+Standard 44-byte WAV header encoder (RIFF/WAVE/fmt/data chunks). Converts Float32Array to PCM mono 16-bit WAV Blob at 16kHz. Clamps samples to [-1, 1] before int16 conversion. Whisper-compatible format.
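The encoder described above can be sketched roughly as follows. This is an illustration, not the contents of `ui/src/lib/encodeWav.ts`: the name `encodeWavBytes` and the helper `writeAscii` are hypothetical, and the sketch returns a raw `ArrayBuffer` rather than a `Blob` so it runs outside the browser. Wrapping the result as `new Blob([buffer], { type: "audio/wav" })` would give the Blob form the summary mentions.

```typescript
// Hypothetical sketch of a 44-byte-header PCM WAV encoder (mono, 16-bit).
const SAMPLE_RATE = 16000; // 16 kHz, as stated in the summary
const NUM_CHANNELS = 1; // mono
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

// Writes an ASCII tag such as "RIFF" byte by byte at the given offset.
function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

function encodeWavBytes(samples: Float32Array, sampleRate: number = SAMPLE_RATE): ArrayBuffer {
  const dataSize = samples.length * BYTES_PER_SAMPLE;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // remaining file size after this field
  writeAscii(view, 8, "WAVE");
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, NUM_CHANNELS, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * NUM_CHANNELS * BYTES_PER_SAMPLE, true); // byte rate
  view.setUint16(32, NUM_CHANNELS * BYTES_PER_SAMPLE, true); // block align
  view.setUint16(34, 8 * BYTES_PER_SAMPLE, true); // bits per sample
  writeAscii(view, 36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] first so out-of-range floats cannot wrap around in int16.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * BYTES_PER_SAMPLE, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

All multi-byte fields are little-endian, which is why every `DataView` setter passes `true` as its last argument.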
+
+### useVadRecorder (ui/src/hooks/useVadRecorder.ts)
+Wraps `useMicVAD` from `@ricky0123/vad-react`. Configuration: `startOnLoad: false`, `baseAssetPath: "/"`, `onnxWASMBasePath: "/"`, `positiveSpeechThreshold: 0.8`, `minSpeechFrames: 5`. On `onSpeechEnd`: pauses VAD, encodes to WAV, POSTs FormData to `/api/transcribe`, calls `onTranscript` if text length >= 2. Exposes `mediaStream` ref for VoiceWaveform AnalyserNode.
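The `onSpeechEnd` sequence above (pause, encode, transcribe, then report only transcripts of at least two characters) might look something like the sketch below. It is deliberately decoupled from React and from `@ricky0123/vad-react`: `SpeechEndDeps` and `handleSpeechEnd` are hypothetical names, and the `transcribe` dependency stands in for the FormData POST to `/api/transcribe`, whose request and response shapes are not given in the summary.

```typescript
// Hypothetical dependency bundle; in the real hook these would come from
// useMicVAD, encodeWav, and a fetch call to /api/transcribe.
interface SpeechEndDeps {
  pause: () => void;
  encode: (audio: Float32Array) => Uint8Array;
  transcribe: (wav: Uint8Array) => Promise<string>;
  onTranscript: (text: string) => void;
}

const MIN_TRANSCRIPT_LENGTH = 2; // threshold stated in the summary

// Returns true when a transcript was long enough to be delivered.
async function handleSpeechEnd(audio: Float32Array, deps: SpeechEndDeps): Promise<boolean> {
  deps.pause(); // stop listening before the network round trip
  const wav = deps.encode(audio);
  const text = await deps.transcribe(wav);
  if (text.length >= MIN_TRANSCRIPT_LENGTH) {
    deps.onTranscript(text);
    return true;
  }
  return false; // very short results are dropped
}
```

Injecting the dependencies keeps the ordering logic testable without a microphone or a server.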
+
+### useVoiceMode (ui/src/hooks/useVoiceMode.ts)
+Loads `voiceMode` from `GET /api/nexus/settings` on mount. Provides `setMode()` with optimistic update that PATCHes `/api/nexus/settings` and reverts on error. Three mode values: `text | voice_input | full_voice`.
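The optimistic update with revert can be illustrated outside React as below. This is a sketch under assumptions: `createVoiceModeController` is a hypothetical name, and `persist` stands in for the PATCH to `/api/nexus/settings`, whose payload is not shown in the summary.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

// Hypothetical controller; the real hook presumably keeps this in React state.
function createVoiceModeController(
  initial: VoiceMode,
  persist: (mode: VoiceMode) => Promise<void>, // stand-in for the settings PATCH
) {
  let mode: VoiceMode = initial;
  return {
    getMode: (): VoiceMode => mode,
    // Resolves to true if the server accepted the change, false if it was reverted.
    async setMode(next: VoiceMode): Promise<boolean> {
      const previous = mode;
      mode = next; // optimistic: the UI sees the new mode immediately
      try {
        await persist(next);
        return true;
      } catch {
        mode = previous; // server rejected the change, so roll back
        return false;
      }
    },
  };
}
```

One caveat of this simple form: if two `setMode` calls overlap, a late failure of the first can overwrite the second's value, so a production version may want to ignore stale reverts.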
+
+### VoiceWaveform (ui/src/components/VoiceWaveform.tsx)
+Canvas element (80x32px). When `stream` and `active` are truthy: creates AudioContext, MediaStreamSource, AnalyserNode (fftSize=64, 32 bins). Draws 20 bars per frame (skipping every other bin), bar width 2px + gap 2px, height proportional to frequency magnitude. Color from CSS `--primary` variable. Cancels animation and disconnects source on cleanup.
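The bar geometry above can be checked with a small pure function: 20 bars at 2px width plus 2px gap span exactly the 80px canvas. The sketch below is an assumption-laden illustration, not the component's drawing code. `layoutBars` and `Bar` are hypothetical names, and because the summary does not say how 20 bars map onto 32 bins when every other bin is skipped, bins past the end of the data are treated here as zero magnitude.

```typescript
interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Maps byte frequency data (0..255 per bin) to bottom-aligned bar rectangles.
function layoutBars(
  bins: Uint8Array,
  canvasHeight: number = 32,
  barCount: number = 20,
  barWidth: number = 2,
  gap: number = 2,
  binStep: number = 2, // read every other bin
): Bar[] {
  const bars: Bar[] = [];
  for (let i = 0; i < barCount; i++) {
    const binIndex = i * binStep;
    // Assumption: indices beyond the available bins draw as silent bars.
    const magnitude = binIndex < bins.length ? bins[binIndex] : 0;
    const height = (magnitude / 255) * canvasHeight;
    bars.push({
      x: i * (barWidth + gap),
      y: canvasHeight - height, // canvas y grows downward
      width: barWidth,
      height,
    });
  }
  return bars;
}
```

In a component, each frame would fill `bins` via `AnalyserNode.getByteFrequencyData` and then call `fillRect` once per returned bar.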
+
+### VoiceMicButton (ui/src/components/VoiceMicButton.tsx)
+Three visual states:
+- **idle**: `Mic` icon, ghost variant, h-8 w-8, `aria-label="Start voice input"`
+- **recording**: `VoiceWaveform` inside button, `ring-2 ring-primary` class, `aria-label="Recording — speak now"`
+- **processing**: `Loader2 animate-spin`, disabled, `aria-label="Transcribing..."`
+
+## Deviations from Plan
+
+None — plan executed exactly as written.
+
+## Known Stubs
+
+None. All 5 files are fully implemented with real logic, no placeholder values.
+
+## Self-Check: PASSED
+
+All 5 files confirmed on disk. Both task commits (3676c9c3, bdb2f770) confirmed in git history.