# Phase 37: Web Chat Voice UI - Context

**Gathered:** 2026-04-04
**Status:** Ready for planning
**Mode:** Auto-generated (discuss skipped via workflow.skip_discuss)

## Phase Boundary

Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting.

Requirements: WCHAT-01, WCHAT-02, WCHAT-03, WCHAT-04, WCHAT-05, WCHAT-06

## Implementation Decisions

### Claude's Discretion

All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

Key research findings to incorporate:

  • @ricky0123/vad-react ^0.0.36 for browser-side voice activity detection (VAD), used here to detect silence; delivers a Float32Array of 16 kHz samples when speech ends
  • COOP/COEP headers required on Express server for SharedArrayBuffer (Cross-Origin-Opener-Policy: same-origin, Cross-Origin-Embedder-Policy: require-corp)
  • Waveform via Web Audio API AnalyserNode (Canvas or SVG, 30-50 data points)
  • Native <audio> element + URL.createObjectURL() for playback — no extra library needed
  • Three-state voice mode: "text" | "voice_input" | "full_voice" (already in nexus-settings schema from Phase 36)
  • VoiceMicButton replaces/enhances existing VoiceRecordButton from v1.5
  • Voice badge + expandable markdown section in ChatMessage for voice interactions
  • AudioContext must be unlocked on user's "start voice mode" gesture to avoid autoplay policy blocks
  • Existing hooks and components: usePiperTts (client-side TTS hook), VoiceRecordButton (MediaRecorder-based component), TtsButton (TTS playback component)
  • POST /api/transcribe and POST /api/synthesize endpoints available from Phase 36
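
The waveform finding above (30-50 data points from an AnalyserNode) can be sketched as a small pure function. This is a minimal sketch, not the planned implementation: the name `toWaveformPoints`, the default of 40 points, and the peak-per-bucket reduction are all assumptions made for illustration. In the browser, the input would come from `analyser.getByteTimeDomainData(buffer)`, which fills a `Uint8Array` with samples centered on 128.

```typescript
// Hypothetical helper (not from the codebase): reduce raw time-domain bytes
// to a fixed number of normalized bar heights for a Canvas or SVG waveform.
function toWaveformPoints(samples: Uint8Array, pointCount = 40): number[] {
  if (pointCount <= 0 || samples.length === 0) return [];

  const points: number[] = [];
  const bucketSize = samples.length / pointCount;

  for (let i = 0; i < pointCount; i++) {
    const start = Math.floor(i * bucketSize);
    // Every bucket covers at least one sample, even for short inputs.
    const end = Math.max(start + 1, Math.floor((i + 1) * bucketSize));

    let peak = 0;
    for (let j = start; j < end && j < samples.length; j++) {
      // Byte samples are unsigned; 128 represents silence.
      const amplitude = Math.abs((samples[j] ?? 128) - 128) / 128;
      if (amplitude > peak) peak = amplitude;
    }
    // Clamp to [0, 1] so callers can scale directly to pixel heights.
    points.push(Math.min(1, peak));
  }
  return points;
}
```

Taking the peak of each bucket, rather than the mean, keeps short bursts of speech visible, which suits a "mic is active" indicator; an RMS reduction would be a reasonable alternative if a smoother display is preferred.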

<code_context>

## Existing Code Insights

### Reusable Assets

  • ui/src/components/VoiceRecordButton.tsx — existing MediaRecorder-based recording button
  • ui/src/components/TtsButton.tsx — existing TTS playback button
  • ui/src/hooks/usePiperTts.ts — client-side Piper WASM TTS hook
  • server/src/routes/voice.ts — POST /api/transcribe, POST /api/synthesize (Phase 36)
  • server/src/services/nexus-settings.ts — voiceMode setting already in schema
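
Since the `voiceMode` setting is already in the schema, UI code mainly needs a typed view of its three values. The sketch below shows one way to do that. Only the three string literals come from this document; the helper names, the `VoiceCapabilities` shape, and the mapping from each mode to mic and autoplay behavior are assumptions about what the modes mean, not confirmed semantics.

```typescript
// The three values listed for the voiceMode setting.
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

// Runtime guard for values read from settings or API responses.
function isVoiceMode(value: unknown): value is VoiceMode {
  return typeof value === "string" && VOICE_MODES.some((mode) => mode === value);
}

// Hypothetical shape: what the chat UI would switch on.
interface VoiceCapabilities {
  micEnabled: boolean;
  autoPlayResponses: boolean;
}

// ASSUMPTION: "voice_input" means speak-in/text-out and "full_voice" adds
// automatic playback. The document does not spell these meanings out.
function capabilitiesFor(mode: VoiceMode): VoiceCapabilities {
  switch (mode) {
    case "text":
      return { micEnabled: false, autoPlayResponses: false };
    case "voice_input":
      return { micEnabled: true, autoPlayResponses: false };
    case "full_voice":
      return { micEnabled: true, autoPlayResponses: true };
  }
}
```

Deriving capabilities in one place keeps components such as ChatInput and ChatMessage from comparing mode strings directly, so a later change to the mode semantics touches a single function.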

### Established Patterns

  • React hooks in ui/src/hooks/
  • Components in ui/src/components/
  • Settings via nexus-settings service + React hooks

### Integration Points

  • ui/src/components/ChatInput.tsx — mic button integration point
  • ui/src/components/ChatMessage.tsx — voice badge + audio player
  • Express static middleware — COOP/COEP headers
  • server/src/app.ts — header middleware
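
The header middleware integration point can be sketched as follows. The two header names and values are the ones listed in the research findings; the function name `crossOriginIsolation` and the `HeaderWritable` interface are hypothetical, introduced so the sketch runs without Express installed. The `(req, res, next)` signature is intended to match the shape Express middleware uses.

```typescript
// Hypothetical minimal type: only the part of the response object this
// middleware touches. Express responses expose setHeader via Node's http module.
interface HeaderWritable {
  setHeader(name: string, value: string): unknown;
}

// Sets the COOP/COEP pair that browsers require before exposing
// SharedArrayBuffer to a page.
function crossOriginIsolation(
  _req: unknown,
  res: HeaderWritable,
  next: () => void,
): void {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next();
}
```

In `server/src/app.ts` this would presumably be registered with `app.use(...)` ahead of the static middleware, since Express runs middleware in registration order and the headers must be present on the HTML and script responses. Note that `require-corp` also blocks cross-origin subresources that do not opt in, so any third-party assets loaded by the UI would need checking.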

</code_context>

## Specific Ideas

No specific requirements — discuss phase skipped. Refer to ROADMAP phase description and success criteria.

## Deferred Ideas

None — discuss phase skipped.