From fdc956c6a63d313a91be8d0a116dea9496f5a132 Mon Sep 17 00:00:00 2001
From: Nexus Dev
Date: Sat, 4 Apr 2026 02:07:19 +0000
Subject: [PATCH] =?UTF-8?q?docs(37):=20phase=20research=20=E2=80=94=20VAD,?=
 =?UTF-8?q?=20COOP/COEP,=20component=20architecture?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../37-web-chat-voice-ui/37-RESEARCH.md | 567 ++++++++++++++++++
 1 file changed, 567 insertions(+)
 create mode 100644 .planning/phases/37-web-chat-voice-ui/37-RESEARCH.md

diff --git a/.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md b/.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md
new file mode 100644
index 00000000..b3642681
--- /dev/null
+++ b/.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md
@@ -0,0 +1,567 @@
+# Phase 37: Web Chat Voice UI - Research
+
+**Researched:** 2026-04-03
+**Domain:** Browser voice I/O — VAD, MediaRecorder, Web Audio API, waveform visualization, audio playback, COOP/COEP headers
+**Confidence:** HIGH
+
+---
+
+
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.
+
+### Claude's Discretion
+All implementation details. Use ROADMAP phase goal, success criteria, and codebase conventions.
+
+Key research findings baked into context:
+- `@ricky0123/vad-react ^0.0.36` for browser-side silence detection (VAD)
+- COOP/COEP headers required on Express server for SharedArrayBuffer
+- Waveform via Web Audio API AnalyserNode (Canvas or SVG, 30-50 data points)
+- Native `
+
+
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|------------------|
+| WCHAT-01 | Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing) | VoiceMicButton replaces VoiceRecordButton; three-state via recording/userSpeaking/loading from useMicVAD |
+| WCHAT-02 | Recording auto-stops on silence detection via VAD | useMicVAD onSpeechEnd callback fires automatically after 1.5s silence; no manual stop needed |
+| WCHAT-03 | Real-time waveform/amplitude visualization displays while recording | VoiceWaveform canvas component using Web Audio API AnalyserNode + requestAnimationFrame |
+| WCHAT-04 | Voice response audio plays inline in chat message with audio player controls | ChatVoicePlayer with native `
+
+---
+
+## Summary
+
+Phase 37 adds browser-based voice I/O to the existing web chat. Phase 36 delivered the server-side pipeline (VoicePipelineService, POST /api/transcribe, POST /api/synthesize, voiceMode wiring in chat.ts) and the nexus-settings schema extension. Phase 37 is entirely a frontend phase with one server-side addition: COOP/COEP response headers on the Express static middleware.
+
+The central library is `@ricky0123/vad-react ^0.0.36`, which wraps Silero VAD running in an AudioWorklet. It requires the page to be cross-origin isolated (COOP + COEP headers) to use SharedArrayBuffer. The package ships ONNX model files and a worklet bundle that must either be served locally from `public/` or loaded from its default CDN URLs. The CDN default is simpler and acceptable for development; production should serve them locally.
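The cross-origin isolation requirement described above could be met with a small middleware along these lines. This is a minimal sketch, not the project's actual code: the name `crossOriginIsolation` and the `HeaderWritable` type are hypothetical, and the type is declared structurally only so the snippet compiles without Express installed. The two header values are the standard ones browsers require before exposing SharedArrayBuffer.

```typescript
// Minimal structural type (an assumption for this sketch): anything with a
// setHeader method, which Express's Response object provides.
interface HeaderWritable {
  setHeader(name: string, value: string): unknown;
}

// Header values that opt a page into cross-origin isolation.
const CROSS_ORIGIN_ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical middleware: sets both headers on every response, then
// hands control to the next handler.
function crossOriginIsolation(
  _req: unknown,
  res: HeaderWritable,
  next: () => void,
): void {
  for (const [name, value] of Object.entries(CROSS_ORIGIN_ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  next();
}

// Assumed wiring in the Express server, registered before the static middleware:
//   app.use(crossOriginIsolation);
//   app.use(express.static("public"));
```

Once the headers are in place, `window.crossOriginIsolated` should report `true` in the browser console. One caveat to keep in mind: under `require-corp`, cross-origin assets load only if the remote server sends suitable CORS or CORP headers, which is one more reason to serve the model and worklet files locally in production.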
+
+Waveform visualization uses a standard Web Audio API AnalyserNode pattern: connect the microphone stream → AnalyserNode → read Uint8Array in requestAnimationFrame loop → render bars on a `<canvas>`. This is entirely in-browser with no extra library. Audio playback for synthesized responses uses the native `