Phase 37: Web Chat Voice UI - Research

Researched: 2026-04-03
Domain: Browser voice I/O (VAD, MediaRecorder, Web Audio API, waveform visualization, audio playback, COOP/COEP headers)
Confidence: HIGH


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.

Claude's Discretion

All implementation details. Use ROADMAP phase goal, success criteria, and codebase conventions.

Key research findings baked into context:

  • @ricky0123/vad-react ^0.0.36 for browser-side silence detection (VAD)
  • COOP/COEP headers required on Express server for SharedArrayBuffer
  • Waveform via Web Audio API AnalyserNode (Canvas or SVG, 30-50 data points)
  • Native <audio> element + URL.createObjectURL() for playback
  • Three-state voice mode: "text" | "voice_input" | "full_voice"
  • VoiceMicButton replaces/enhances existing VoiceRecordButton
  • Voice badge + expandable markdown section in ChatMessage

Deferred Ideas (OUT OF SCOPE)

None; the discuss phase was skipped.
</user_constraints>

<phase_requirements>

Phase Requirements

ID | Description | Research Support
WCHAT-01 | Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing) | VoiceMicButton replaces VoiceRecordButton; the three states come from useMicVAD's listening/userSpeaking/loading plus a local flag while transcription is in flight
WCHAT-02 | Recording auto-stops on silence detection via VAD | useMicVAD's onSpeechEnd callback fires automatically once the configured silence window elapses (1.5s target, set via redemptionFrames); no manual stop needed
WCHAT-03 | Real-time waveform/amplitude visualization displays while recording | VoiceWaveform canvas component using Web Audio API AnalyserNode + requestAnimationFrame
WCHAT-04 | Voice response audio plays inline in chat message with audio player controls | ChatVoicePlayer with native <audio> + URL.createObjectURL(); POST /api/synthesize → blob
WCHAT-05 | User can toggle voice mode: text only / voice input only / full voice (input + output) | VoiceModeToggle three-pill component; persists to nexus-settings voiceMode field
WCHAT-06 | Auto-play of voice responses is configurable (on/off in settings) | autoPlay flag in nexus-settings or localStorage; ChatVoicePlayer reads it on mount
</phase_requirements>

Summary

Phase 37 adds browser-based voice I/O to the existing web chat. Phase 36 delivered the server-side pipeline (VoicePipelineService, POST /api/transcribe, POST /api/synthesize, voiceMode wiring in chat.ts) and the nexus-settings schema extension, although only VoicePipelineService is present on the current branch (see Branch Context). Phase 37 is entirely a frontend phase with one server-side addition: COOP/COEP response headers on the Express static middleware.

The central library is @ricky0123/vad-react ^0.0.36, which wraps Silero VAD running in an AudioWorklet, with inference done by onnxruntime-web. The page is made cross-origin isolated (COOP + COEP headers) so that SharedArrayBuffer, which onnxruntime-web's multi-threaded WASM backend relies on, is available. The package ships ONNX model files and a worklet bundle that must either be served locally from public/ or loaded from its default CDN URLs. The CDN default is simpler before COEP is enabled; once require-corp is active, serve them locally (see Pitfall 3).

Waveform visualization uses a standard Web Audio API AnalyserNode pattern: connect the microphone stream → AnalyserNode → read Uint8Array in requestAnimationFrame loop → render bars on a <canvas>. This is entirely in-browser with no extra library. Audio playback for synthesized responses uses the native <audio> HTML element with URL.createObjectURL() from a Blob received from POST /api/synthesize.

Primary recommendation: Install @ricky0123/vad-react, add COOP/COEP headers to Express static/vite-dev middleware, serve VAD assets from ui/public/, build five new components + two hooks as specified in 37-UI-SPEC.md, extend ChatInput + ChatMessage, wire voiceMode through useStreamingChat.


Branch Context (Critical)

The current worktree branch (gsd/phase-35-npx-buildthis-cli) has only Phase 36 Task 1 committed (VoicePipelineService). The remaining Phase 36 deliverables live on a separate branch not yet merged:

Phase 36 Deliverable | Git Commit | Status in Current Branch
VoicePipelineService | 0ed912c2 | PRESENT
nexus-settings voiceMode schema | d0d7a23a | ABSENT; must be built in 37 Wave 0 or assumed present
voiceMode in createMessageSchema | b964c0e4 | ABSENT
POST /api/transcribe, POST /api/synthesize routes | 11508547 | ABSENT
voiceMode wiring in chat.ts stream route | fd372eaf | ABSENT

Implication for planning: Wave 0 of Phase 37 must either (a) merge/cherry-pick the Phase 36 remainder, or (b) re-implement the four ABSENT deliverables above before building the Phase 37 UI. The plan should treat Phase 36 Tasks 2-3 as Wave 0 prerequisites and verify them before proceeding.

The ChatInput.tsx, ChatMessage.tsx, VoiceRecordButton.tsx, and related UI components exist on the parent branch PAP-878-create-a-mine-tab-in-inbox but NOT in the current worktree. The plan must account for these being the integration targets.


Standard Stack

Core

Library | Version | Purpose | Why Standard
@ricky0123/vad-react | ^0.0.36 | Browser VAD with 1.5s silence auto-stop | Specified in 37-UI-SPEC.md; mature, widely used browser-side Silero VAD wrapper
@ricky0123/vad-web | 0.0.30 (peer) | VAD engine (AudioWorklet + Silero ONNX) | Peer dep of vad-react
onnxruntime-web | ^1.17.0 (peer) | ONNX runtime for Silero model | Required by vad-web
Web Audio API | browser built-in | AnalyserNode for waveform bars | Zero bundle cost; already in browser
Native <audio> | browser built-in | Playback of synthesized WAV | No extra library needed

Supporting

Library | Version | Purpose | When to Use
lucide-react | ^0.574.0 (already in ui/) | Mic, Square, Loader2, Volume2, Play, Pause icons | Voice button states + audio player
shadcn/ui Badge | already installed | Voice badge on agent messages | ChatVoiceBadge component
shadcn/ui Collapsible | already installed | Expand/collapse full markdown in voice_full messages | ChatVoiceBadge expand section

Alternatives Considered

Instead of | Could Use | Tradeoff
@ricky0123/vad-react | Manual silence detection with AudioWorklet | Much more complex; vad-react is the de facto standard
Canvas waveform | SVG bars | Canvas performs better for 30fps animation
Native <audio> + blob URL | Howler.js | No extra dependency; native handles WAV fine

Installation:

pnpm add @ricky0123/vad-react --filter @paperclipai/ui

Version verification (confirmed against npm registry 2026-04-03):

  • @ricky0123/vad-react: 0.0.36 (latest)
  • @ricky0123/vad-web: 0.0.30 (peer dependency, installed automatically)
  • onnxruntime-web: 1.24.3 (latest; ^1.17.0 from vad-web is satisfied)

Architecture Patterns

ui/src/
├── components/
│   ├── VoiceMicButton.tsx       # Replaces VoiceRecordButton; VAD + waveform + three states
│   ├── VoiceWaveform.tsx        # Canvas amplitude bars (30-50 points, 32px tall)
│   ├── VoiceModeToggle.tsx      # Three-pill: Text / Voice In / Full Voice
│   ├── ChatVoicePlayer.tsx      # Inline audio player with play/pause/progress
│   └── ChatVoiceBadge.tsx       # "Voice" badge + collapsible full markdown
├── hooks/
│   ├── useVadRecorder.ts        # Wraps useMicVAD; exposes Float32Array on speech end
│   └── useVoiceMode.ts          # Reads/writes voiceMode from nexus-settings
└── lib/
    └── encodeWav.ts             # Float32Array → WAV Blob (Pattern 6)
ui/public/
├── vad.worklet.bundle.min.js    # From @ricky0123/vad-web/dist/
├── silero_vad_legacy.onnx       # From @ricky0123/vad-web/dist/
├── silero_vad_v5.onnx           # From @ricky0123/vad-web/dist/
└── *.wasm                       # onnxruntime-web runtime from onnxruntime-web/dist/ (onnxWASMBasePath: "/")
server/src/
└── app.ts                       # Add COOP/COEP headers middleware

Pattern 1: useMicVAD from @ricky0123/vad-react

What: Hook that runs Silero VAD in an AudioWorklet; fires onSpeechEnd(audio: Float32Array) after silence
When to use: VoiceMicButton and useVadRecorder hook

// Source: https://docs.vad.ricky0123.com/user-guide/api/
import { useMicVAD } from "@ricky0123/vad-react";

const vad = useMicVAD({
  startOnLoad: false,            // user must click mic button first
  onSpeechEnd: (audio: Float32Array) => {
    // audio is Float32Array at 16kHz
    // Convert to WAV blob and POST to /api/transcribe
  },
  onSpeechStart: () => { /* update waveform active state */ },
  positiveSpeechThreshold: 0.8,
  negativeSpeechThreshold: 0.8 - 0.15,
  redemptionFrames: 8,           // silence before speech_end = redemptionFrames × frame length; tune toward the 1.5s target (WCHAT-02)
  baseAssetPath: "/",            // serve from ui/public/
  onnxWASMBasePath: "/",
});

// Returned: { listening, loading, errored, userSpeaking, start, pause }
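redemptionFrames and minSpeechFrames count VAD frames, not milliseconds, so the wall-clock silence window depends on the frame length of the model in use. A small hypothetical helper (frameMs and framesForMs are not part of vad-react) makes that conversion explicit; it assumes 16 kHz audio and takes the frame size in samples as a parameter. The 1536-sample (legacy) and 512-sample (v5) frame sizes in the usage note are commonly cited defaults that have not been verified against vad-web 0.0.30, so confirm them in the installed typings.

```typescript
// Hypothetical helper, not part of vad-react: converts a millisecond target into a frame
// count for frame-based options such as redemptionFrames and minSpeechFrames.
// Assumes 16 kHz audio and a caller-supplied frame size in samples.
export const VAD_SAMPLE_RATE = 16_000;

/** Length of one VAD frame in ms; multiplying first keeps common sizes exact (1536 → 96). */
export function frameMs(frameSamples: number, sampleRate: number = VAD_SAMPLE_RATE): number {
  return (frameSamples * 1000) / sampleRate;
}

/** Smallest whole number of frames whose total duration is at least targetMs. */
export function framesForMs(
  targetMs: number,
  frameSamples: number,
  sampleRate: number = VAD_SAMPLE_RATE,
): number {
  return Math.ceil(targetMs / frameMs(frameSamples, sampleRate));
}
```

With 96 ms frames, framesForMs(1500, 1536) returns 16 (about 1.54 s of silence); with 32 ms frames, framesForMs(1500, 512) returns 47 (about 1.5 s). Rounding up keeps the window at or above the WCHAT-02 target.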

Audio conversion for upload: onSpeechEnd delivers a 16kHz Float32Array, which must be encoded as a mono 16-bit PCM WAV Blob before POSTing to /api/transcribe. The complete encoder is encodeWav in Pattern 6.

Pattern 2: Web Audio API AnalyserNode for waveform

What: Connect the MediaStream to an AnalyserNode; poll getByteFrequencyData in a rAF loop
When to use: VoiceWaveform component (only while recording)

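// Source: MDN Web Audio API docs
// Call from the mic-button click handler (user gesture) with the microphone MediaStream.
function startWaveform(stream: MediaStream, onFrame: (bins: Uint8Array) => void): () => void {
  const audioCtx = new AudioContext();
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 64;          // frequencyBinCount = fftSize / 2 = 32 bars
  const source = audioCtx.createMediaStreamSource(stream);
  source.connect(analyser);       // not connected to destination, so the mic is not played back
  const dataArray = new Uint8Array(analyser.frequencyBinCount);
  let rafId = 0;

  function draw() {
    rafId = requestAnimationFrame(draw);
    analyser.getByteFrequencyData(dataArray);
    onFrame(dataArray);           // render bars to canvas
  }
  draw();

  // Cleanup for useEffect / recording stop: stop the loop and release the context.
  return () => {
    cancelAnimationFrame(rafId);
    source.disconnect();
    void audioCtx.close();
  };
}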
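To turn the analyser's 32 bins into the 30-50 bars VoiceWaveform draws, a sketch like the following averages neighbouring bins and paints vertically centred bars on the 32px canvas. toBarHeights, drawBars, and BarCanvas are hypothetical names; BarCanvas is structural, so a real CanvasRenderingContext2D satisfies it. One integration detail to verify: useMicVAD opens its own microphone stream, so the analyser may need a separate getUserMedia stream unless the installed vad-web version lets the app supply or read that stream.

```typescript
// Hypothetical VoiceWaveform helpers; names are illustrative, not from 37-UI-SPEC.md.
export interface BarCanvas {
  clearRect(x: number, y: number, w: number, h: number): void;
  fillRect(x: number, y: number, w: number, h: number): void;
}

/** Averages neighbouring analyser bins (0-255) into barCount pixel heights. */
export function toBarHeights(bins: ArrayLike<number>, barCount: number, maxHeight: number): number[] {
  if (bins.length === 0) return Array.from({ length: barCount }, () => 1);
  const heights: number[] = [];
  for (let b = 0; b < barCount; b++) {
    const start = Math.floor((b * bins.length) / barCount);
    // At least one bin per bar, so 32 bins can still drive 40 bars.
    const end = Math.max(start + 1, Math.floor(((b + 1) * bins.length) / barCount));
    let sum = 0;
    for (let i = start; i < end; i++) sum += bins[i];
    const avg = sum / (end - start);
    // 1px floor keeps silent bars visible instead of disappearing.
    heights.push(Math.max(1, Math.round((avg / 255) * maxHeight)));
  }
  return heights;
}

/** Paints vertically centred bars; call once per requestAnimationFrame tick. */
export function drawBars(ctx: BarCanvas, heights: number[], width: number, height: number): void {
  ctx.clearRect(0, 0, width, height);
  const slot = width / heights.length;
  const barWidth = Math.max(1, slot * 0.6);
  heights.forEach((h, i) => {
    ctx.fillRect(i * slot + (slot - barWidth) / 2, (height - h) / 2, barWidth, h);
  });
}
```

Inside the rAF loop this becomes drawBars(ctx, toBarHeights(dataArray, 40, 32), canvas.width, canvas.height).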

Pattern 3: COOP/COEP headers in Express

What: Cross-origin isolation, which makes SharedArrayBuffer available (used by onnxruntime-web's multi-threaded WASM backend)
When to use: All static responses and the Vite dev server

// Source: MDN - Cross-Origin Isolation
// In server/src/app.ts, before static/vite middleware:
app.use((_req, res, next) => {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next();
});

// For Vite dev in vite.config.ts:
server: {
  headers: {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
  },
},

Critical: COEP require-corp means every cross-origin sub-resource must opt in with CORP (or be fetched via CORS with matching headers). CDN-hosted VAD assets are cross-origin, so whether they load depends on the CDN's response headers; existing user-loaded images are the other exposure. Serve VAD assets from ui/public/ (same-origin) to avoid CORP issues entirely.
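Once the headers are in place, the page can confirm at runtime that isolation actually took effect, turning a silent misconfiguration into a clear console warning. window.crossOriginIsolated is the standard browser flag; isolationStatus and IsolationScope are hypothetical names, and the scope parameter exists only so tests can pass a stub.

```typescript
// Hypothetical runtime check; crossOriginIsolated is the standard browser flag.
export interface IsolationScope {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

export interface IsolationStatus {
  isolated: boolean;          // COOP/COEP took effect
  sharedArrayBuffer: boolean; // constructor is exposed to page scripts
}

export function isolationStatus(
  scope: IsolationScope = globalThis as unknown as IsolationScope,
): IsolationStatus {
  return {
    isolated: scope.crossOriginIsolated === true,
    sharedArrayBuffer: typeof scope.SharedArrayBuffer === "function",
  };
}
```

A startup guard could then log a warning such as "Page is not cross-origin isolated; check COOP/COEP headers in server/src/app.ts and vite.config.ts" whenever isolationStatus().isolated is false.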

Pattern 4: VAD asset setup (Vite)

What: Copy ONNX + worklet files to public/ so they are served at root
When to use: Build setup task

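# After pnpm install, copy from ui/node_modules. With pnpm's strict layout these paths exist only for
# direct dependencies, so add @ricky0123/vad-web and onnxruntime-web to ui's package.json as well:
cp ui/node_modules/@ricky0123/vad-web/dist/vad.worklet.bundle.min.js ui/public/
cp ui/node_modules/@ricky0123/vad-web/dist/silero_vad_legacy.onnx ui/public/
cp ui/node_modules/@ricky0123/vad-web/dist/silero_vad_v5.onnx ui/public/
# onnxWASMBasePath: "/" also expects onnxruntime-web's WASM runtime at the root
# (recent onnxruntime-web releases may also request companion .mjs files from the same path):
cp ui/node_modules/onnxruntime-web/dist/*.wasm ui/public/

Alternatively, use vite-plugin-static-copy, or add a script to ui/package.json and call it from prepare (or postinstall):

"scripts": {
  "copy-vad-assets": "cp node_modules/@ricky0123/vad-web/dist/vad.worklet.bundle.min.js public/ && cp node_modules/@ricky0123/vad-web/dist/*.onnx public/ && cp node_modules/onnxruntime-web/dist/*.wasm public/"
}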
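The cp one-liners do not run in Windows shells. A hypothetical Node script (ui/scripts/copy-vad-assets.ts, run with a TypeScript runner such as tsx) does the same job portably. The ort-wasm* file-name pattern for onnxruntime-web's runtime files is an assumption to check against the installed version's dist/ folder.

```typescript
// Hypothetical ui/scripts/copy-vad-assets.ts: a portable alternative to the cp commands.
import { copyFileSync, mkdirSync, readdirSync } from "node:fs";
import { join } from "node:path";

/** Copies regular files from srcDir whose names pass keep() into destDir; returns sorted names. */
export function copyMatching(srcDir: string, destDir: string, keep: (name: string) => boolean): string[] {
  mkdirSync(destDir, { recursive: true });
  const copied: string[] = [];
  for (const entry of readdirSync(srcDir, { withFileTypes: true })) {
    if (!entry.isFile() || !keep(entry.name)) continue; // skip folders and unrelated files
    copyFileSync(join(srcDir, entry.name), join(destDir, entry.name));
    copied.push(entry.name);
  }
  return copied.sort();
}

export const isVadAsset = (name: string): boolean =>
  name === "vad.worklet.bundle.min.js" || name.endsWith(".onnx");

// Assumption: onnxruntime-web ships its runtime as ort-wasm*.wasm (plus .mjs loaders in newer releases).
export const isOrtRuntimeAsset = (name: string): boolean => /^ort-wasm.*\.(wasm|mjs)$/.test(name);

// Example, from ui/ (assumes both packages resolve from ui/node_modules):
// copyMatching("node_modules/@ricky0123/vad-web/dist", "public", isVadAsset);
// copyMatching("node_modules/onnxruntime-web/dist", "public", isOrtRuntimeAsset);
```

Filtering by name rather than copying whole dist/ folders keeps ui/public/ limited to the files the VAD actually requests.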

Pattern 5: useVoiceMode hook

What: Reads voiceMode from GET /api/nexus-settings, writes via PATCH
When to use: VoiceModeToggle component; ChatPanel to pass voiceMode to the stream call

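// Source: existing nexus-settings pattern in codebase (GET/PATCH body shapes assumed)
import { useEffect, useState } from "react";
type VoiceMode = "text" | "voice_input" | "full_voice";

export function useVoiceMode() {
  const [mode, setModeState] = useState<VoiceMode>("text");
  useEffect(() => { // load on mount
    fetch("/api/nexus-settings", { credentials: "include" })
      .then((r) => (r.ok ? r.json() : null))
      .then((s: { voiceMode?: VoiceMode } | null) => { if (s?.voiceMode) setModeState(s.voiceMode); })
      .catch(() => { /* keep the "text" default */ });
  }, []);
  const setMode = async (next: VoiceMode) => {
    setModeState(next); // optimistic update; PATCH persists it
    await fetch("/api/nexus-settings", { method: "PATCH", credentials: "include",
      headers: { "Content-Type": "application/json" }, body: JSON.stringify({ voiceMode: next }) });
  };
  return { mode, setMode };
}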

Pattern 6: Float32Array → WAV Blob

What: Convert the vad-react onSpeechEnd Float32Array (16kHz) to WAV for upload
When to use: useVadRecorder.ts, before POSTing to /api/transcribe

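// Source: standard WAV encoding algorithm (verified against multiple sources)
// Writes an ASCII chunk tag ("RIFF", "WAVE", "fmt ", "data") byte by byte.
function writeString(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
}

export function encodeWav(samples: Float32Array, sampleRate = 16000): Blob {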
  const numSamples = samples.length;
  const buffer = new ArrayBuffer(44 + numSamples * 2);
  const view = new DataView(buffer);
  // RIFF chunk
  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + numSamples * 2, true);
  writeString(view, 8, "WAVE");
  // fmt sub-chunk
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true);  // fmt chunk size (16 for PCM)
  view.setUint16(20, 1, true);   // PCM = 1
  view.setUint16(22, 1, true);   // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true);  // byte rate
  view.setUint16(32, 2, true);   // block align
  view.setUint16(34, 16, true);  // bits per sample
  // data sub-chunk
  writeString(view, 36, "data");
  view.setUint32(40, numSamples * 2, true);
  let offset = 44;
  for (let i = 0; i < numSamples; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    offset += 2;
  }
  return new Blob([buffer], { type: "audio/wav" });
}
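A test for the "valid 44-byte WAV header" row in the test map needs to read the header back. A hypothetical parseWavHeader helper mirrors every field encodeWav writes, so assertions can target them by name.

```typescript
// Hypothetical test helper: reads back the header fields encodeWav writes.
export interface WavHeader {
  riff: string; riffSize: number; wave: string;
  fmt: string; fmtSize: number; audioFormat: number; channels: number;
  sampleRate: number; byteRate: number; blockAlign: number; bitsPerSample: number;
  data: string; dataSize: number;
}

/** Reads a 4-character ASCII chunk tag such as "RIFF" or "data". */
function readTag(view: DataView, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) tag += String.fromCharCode(view.getUint8(offset + i));
  return tag;
}

export function parseWavHeader(buffer: ArrayBuffer): WavHeader {
  if (buffer.byteLength < 44) throw new Error("buffer is shorter than a 44-byte WAV header");
  const v = new DataView(buffer);
  return {
    riff: readTag(v, 0), riffSize: v.getUint32(4, true), wave: readTag(v, 8),
    fmt: readTag(v, 12), fmtSize: v.getUint32(16, true), audioFormat: v.getUint16(20, true),
    channels: v.getUint16(22, true), sampleRate: v.getUint32(24, true),
    byteRate: v.getUint32(28, true), blockAlign: v.getUint16(32, true),
    bitsPerSample: v.getUint16(34, true), data: readTag(v, 36), dataSize: v.getUint32(40, true),
  };
}
```

In encodeWav.test.ts, parseWavHeader(await encodeWav(new Float32Array(160)).arrayBuffer()) should report dataSize 320 (160 samples × 2 bytes) and riffSize 356. If the jsdom Blob in use lacks arrayBuffer(), running that one file with // @vitest-environment node (Node 20's Blob has it) is one option.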

Pattern 7: POST /api/synthesize + playback

What: Send text to the synthesis endpoint, receive a WAV buffer, play with native audio
When to use: ChatVoicePlayer when messageType is voice_full

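// Returns the audio element plus a dispose() for ChatVoicePlayer to call on unmount.
async function playVoiceResponse(text: string, autoPlay: boolean) {
  const res = await fetch("/api/synthesize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    credentials: "include",
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`POST /api/synthesize failed: ${res.status}`);
  const blob = await res.blob();
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  if (autoPlay) {
    // play() rejects (e.g. NotAllowedError) when autoplay is blocked; the Play button still works.
    audio.play().catch(() => {});
  }
  // Expose play/pause/progress via `audio`; revoke on unmount rather than on "ended" so replay keeps working.
  const dispose = () => {
    audio.pause();
    URL.revokeObjectURL(url);
  };
  return { audio, dispose };
}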
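ChatVoicePlayer needs the WCHAT-06 preference before it decides whether to call play(). Following Open Question 1's recommendation (localStorage key nexus:voice:autoplay), a hypothetical pair of helpers can read and write it defensively. The store is injected so tests can use an in-memory fake, and defaulting to false when nothing is stored is an assumption, not a spec decision.

```typescript
// Hypothetical WCHAT-06 helpers; in the browser pass window.localStorage as the store.
export const AUTOPLAY_KEY = "nexus:voice:autoplay";

export interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

export function readAutoPlay(store: KeyValueStore | undefined, fallback = false): boolean {
  try {
    const raw = store?.getItem(AUTOPLAY_KEY);
    return raw == null ? fallback : raw === "true";
  } catch {
    return fallback; // localStorage access can throw when storage is blocked
  }
}

export function writeAutoPlay(store: KeyValueStore | undefined, enabled: boolean): void {
  try {
    store?.setItem(AUTOPLAY_KEY, String(enabled));
  } catch {
    /* persistence is best-effort; the in-memory setting still applies */
  }
}
```

In the player this reduces to const autoPlay = readAutoPlay(window.localStorage), evaluated on mount.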

Anti-Patterns to Avoid

  • Calling useMicVAD with startOnLoad: true: Triggers immediate mic permission prompt on page load, not on user gesture. Always use startOnLoad: false and call vad.start() on mic button click.
  • Using AudioContext before user gesture: Browsers require AudioContext creation/resume inside a user interaction. Create it lazily in the click handler, not on component mount.
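  • Serving VAD assets from CDN with COEP require-corp: Whether cross-origin CDN assets load then depends on the CDN's CORS/CORP headers, which the project does not control, and a blocked fetch surfaces as a COEP error. Always copy to ui/public/ and use baseAssetPath: "/".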
  • Not revoking blob URLs: URL.createObjectURL() leaks memory if URLs are not revoked after use.
  • POSTing Float32Array directly to /api/transcribe: The transcribe endpoint expects audio/webm or audio/wav multipart upload. Must encode Float32Array to WAV first.

Don't Hand-Roll

Problem | Don't Build | Use Instead | Why
Silence detection | Custom silence timer with AudioWorklet | @ricky0123/vad-react | Silero VAD model; handles background noise, breath, plosives; 37 published versions
WAV encoding | Custom encoder | ~30-line standard WAV encoder (see Pattern 6) | Not complex enough for a library; standard algorithm with a fixed 44-byte header
Audio playback | Custom audio element abstraction | Native <audio> + URL.createObjectURL() | Browser handles all codec/format negotiation
ONNX inference | Build ONNX runner | onnxruntime-web (peer dep of vad-web) | Already bundled

Common Pitfalls

Pitfall 1: COEP blocks CDN asset loading

What goes wrong: After adding Cross-Origin-Embedder-Policy: require-corp, cross-origin resources loaded without CORS (Google Fonts stylesheets, avatars from external URLs, CDN assets) are blocked unless they send Cross-Origin-Resource-Policy: cross-origin. Existing chat images from /api/assets/ (same-origin) are fine, but any externally hosted content breaks.
Why it happens: COEP require-corp requires every cross-origin sub-resource to opt in via CORP (or CORS).
How to avoid: Serve all VAD ONNX/worklet assets from ui/public/ (same-origin). Audit existing chat components for cross-origin resource loads before adding the headers.
Warning signs: Console or network-tab errors for blocked cross-origin sub-resources (non-audio assets such as images) that appear only after the headers are added.

Pitfall 2: AudioContext suspended due to autoplay policy

What goes wrong: When AudioContext.state === "suspended", the AnalyserNode produces no data and the waveform is all zeros.
Why it happens: Browsers require an AudioContext to be created or resumed inside a user gesture (click/tap).
How to avoid: Create new AudioContext() lazily inside the mic button click handler. If the context exists but is suspended, call context.resume() before starting recording.
Warning signs: The waveform canvas renders, but all bars are flat (zero amplitude).
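The lazy-create-then-resume rule can live in one small helper that VoiceWaveform and any other audio code share. getRunningContext and ResumableContext are hypothetical names; the interface is structural, so a real AudioContext fits and a fake works in tests. Replacing a closed context is an extra safeguard added here, not something the spec asks for.

```typescript
// Hypothetical helper for Pitfall 2. A real AudioContext satisfies ResumableContext.
export interface ResumableContext {
  readonly state: string;
  resume(): Promise<void>;
}

/**
 * Returns a running context: reuses `current` unless it is missing or closed, otherwise
 * calls create(). Call it from the click handler so creation happens in a user gesture.
 */
export async function getRunningContext<T extends ResumableContext>(
  current: T | null,
  create: () => T,
): Promise<T> {
  const ctx = current !== null && current.state !== "closed" ? current : create();
  if (ctx.state === "suspended") await ctx.resume(); // autoplay policy can leave it suspended
  return ctx;
}
```

In the mic button's click handler this reads as audioCtxRef.current = await getRunningContext(audioCtxRef.current, () => new AudioContext()).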

Pitfall 3: VAD model files not found

What goes wrong: useMicVAD throws or hangs with loading: true indefinitely; the console shows 404s for .onnx, .wasm, or .worklet.bundle.min.js files.
Why it happens: The default baseAssetPath may point to a CDN, which COEP can block, or the files were never copied to ui/public/.
How to avoid: Explicitly set baseAssetPath: "/" and onnxWASMBasePath: "/" in useMicVAD options. After install, verify that ui/public/ contains vad.worklet.bundle.min.js, silero_vad_legacy.onnx, silero_vad_v5.onnx, and the onnxruntime-web .wasm runtime files that onnxWASMBasePath points to.
Warning signs: vad.loading === true for more than 3 seconds; 404s in the network tab.
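The post-install verification can be scripted so that a missing file fails fast instead of leaving vad.loading stuck. missingVadAssets is a hypothetical helper; the three model/worklet names come from this document, while the ort-wasm*.wasm check reflects onnxWASMBasePath: "/" and its file-name pattern is an assumption to confirm against the installed onnxruntime-web.

```typescript
// Hypothetical post-install check for ui/public/ (run from ui/ with a TS runner such as tsx).
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Model and worklet names listed in this document.
export const REQUIRED_VAD_FILES = [
  "vad.worklet.bundle.min.js",
  "silero_vad_legacy.onnx",
  "silero_vad_v5.onnx",
];

// Assumption: onnxruntime-web's runtime binaries match ort-wasm*.wasm in the installed version.
const ORT_WASM = /^ort-wasm.*\.wasm$/;

/** Returns the names (or patterns) of assets missing from publicDir; empty means all present. */
export function missingVadAssets(publicDir: string): string[] {
  const missing = REQUIRED_VAD_FILES.filter((f) => !existsSync(join(publicDir, f)));
  const hasOrtWasm = existsSync(publicDir) && readdirSync(publicDir).some((f) => ORT_WASM.test(f));
  if (!hasOrtWasm) missing.push("ort-wasm*.wasm");
  return missing;
}
```

Wired into a script, a non-empty result would print the missing names and set a non-zero exit code, so CI catches the gap before the browser does.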

Pitfall 4: voiceMode not passed through useStreamingChat

What goes wrong: When a voice_input message is sent, the server does not set messageType: "voice_input" on the stored message, so ChatVoiceBadge never renders.
Why it happens: useStreamingChat.startStream() currently has the signature (userMessage: string, agentId?: string), with no voiceMode parameter, and chat.ts only sets messageType when voiceMode is in the request body.
How to avoid: Extend useStreamingChat.startStream() to accept voiceMode?: string and pass it in the fetch body to /api/conversations/${id}/stream.
Warning signs: Voice messages render as plain user text without the voice badge.
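The Pitfall 4 fix amounts to threading one optional field into the request body. A hypothetical builder keeps that logic testable; only voiceMode comes from this document, while content and agentId are placeholder field names that must be matched to what useStreamingChat already sends to /api/conversations/${id}/stream.

```typescript
// Hypothetical request-body builder; field names other than voiceMode are placeholders.
export type VoiceMode = "text" | "voice_input" | "full_voice";

export interface StreamRequestBody {
  content: string;       // placeholder for the existing user-message field
  agentId?: string;
  voiceMode?: VoiceMode; // read by chat.ts to set messageType
}

export function buildStreamBody(
  userMessage: string,
  agentId?: string,
  voiceMode?: VoiceMode,
): StreamRequestBody {
  const body: StreamRequestBody = { content: userMessage };
  if (agentId) body.agentId = agentId;
  if (voiceMode) body.voiceMode = voiceMode; // omitted entirely when the caller passes nothing
  return body;
}
```

startStream would then send JSON.stringify(buildStreamBody(text, agentId, mode)) as the fetch body.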

Pitfall 5: onSpeechEnd fires on very short utterances

What goes wrong: Background noise triggers onSpeechEnd with very short audio that produces garbage transcription.
Why it happens: The VAD fires even for brief sounds if positiveSpeechThreshold is too low or minSpeechFrames too small.
How to avoid: Set minSpeechFrames: 3 and positiveSpeechThreshold: 0.8 to filter noise. The minimum duration this enforces is minSpeechFrames × frame length; the ~180ms figure assumes 60ms frames, so recompute it for the model in use. Display a "Too short" toast if the returned text is empty or < 2 chars.
Warning signs: Empty transcriptions appearing in chat; fast repeated submissions.
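The "Too short" rule can be a pure check that the recorder runs before submitting a transcript. checkTranscript and MIN_TRANSCRIPT_CHARS are hypothetical names; the two-character threshold comes from the text above. Separately, vad-web documents an onVADMisfire callback for speech segments shorter than minSpeechFrames, which may be a better place for the same toast; confirm it exists in the installed version.

```typescript
// Hypothetical pre-submit guard for transcripts returned by /api/transcribe.
export const MIN_TRANSCRIPT_CHARS = 2; // mirrors the "< 2 chars" rule in Pitfall 5

export type TranscriptCheck =
  | { ok: true; text: string }
  | { ok: false; reason: "empty" | "too_short" };

export function checkTranscript(raw: string | null | undefined): TranscriptCheck {
  const text = (raw ?? "").trim();
  if (text.length === 0) return { ok: false, reason: "empty" };
  if (text.length < MIN_TRANSCRIPT_CHARS) return { ok: false, reason: "too_short" };
  return { ok: true, text };
}
```

When the check fails, the recorder shows the toast instead of calling onTranscript; the discriminated union forces callers to handle both outcomes.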

Pitfall 6: Phase 36 deliverables not present in working branch

What goes wrong: Building the ChatInput voice integration before server/src/routes/voice.ts, the voiceMode schema in server/src/services/nexus-settings.ts, and voiceMode in createMessageSchema are present causes compile errors and missing endpoints.
Why it happens: Only Phase 36 Task 1 (VoicePipelineService) is on the current branch.
How to avoid: Wave 0 must cherry-pick or re-implement the Phase 36 Task 2-3 commits before any Phase 37 implementation work. Verify that POST /api/transcribe and POST /api/synthesize respond (for example via the voice-routes server tests) before proceeding.


Code Examples

VoiceMicButton state machine

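// Source: 37-UI-SPEC.md + useMicVAD API docs (import paths below assumed from shadcn/ui conventions)
import { useRef, useState } from "react";
import { useMicVAD } from "@ricky0123/vad-react";
import { Loader2, Mic } from "lucide-react";
import { Button } from "@/components/ui/button";
import { VoiceWaveform } from "@/components/VoiceWaveform";
import { encodeWav } from "@/lib/encodeWav";

type RecordState = "idle" | "recording" | "processing";

export function VoiceMicButton({ onTranscript }: { onTranscript: (text: string) => void }) {
  const [state, setState] = useState<RecordState>("idle");
  // Refs guard against stale closures in case useMicVAD keeps the first onSpeechEnd it was given.
  const vadRef = useRef<ReturnType<typeof useMicVAD> | null>(null);
  const onTranscriptRef = useRef(onTranscript);
  onTranscriptRef.current = onTranscript;

  const vad = useMicVAD({
    startOnLoad: false,
    baseAssetPath: "/",
    onnxWASMBasePath: "/",
    onSpeechEnd: async (audio: Float32Array) => {
      vadRef.current?.pause();
      setState("processing");
      try {
        const form = new FormData();
        form.append("audio", encodeWav(audio), "recording.wav");
        const res = await fetch("/api/transcribe", {
          method: "POST", credentials: "include", body: form,
        });
        if (!res.ok) throw new Error(`POST /api/transcribe failed: ${res.status}`);
        const { text } = (await res.json()) as { text?: string };
        if (text?.trim()) onTranscriptRef.current(text.trim());
      } catch (err) {
        console.error(err); // surface via the UI spec's error toast
      } finally {
        setState("idle"); // never leave the button stuck in "processing"
      }
    },
  });
  vadRef.current = vad;

  const handleClick = () => {
    if (state === "idle") { vad.start(); setState("recording"); }
    else if (state === "recording") { vad.pause(); setState("idle"); }
  };

  if (state === "processing") return <Button disabled><Loader2 className="h-4 w-4 animate-spin" /></Button>;
  if (state === "recording") return (
    <Button className="ring-2 ring-primary" onClick={handleClick} aria-label="Recording — speak now">
      <VoiceWaveform listening={vad.listening} />
    </Button>
  );
  return <Button onClick={handleClick} aria-label="Start voice input"><Mic className="h-4 w-4" /></Button>;
}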

ChatVoiceBadge (voice_full expand/collapse)

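// Source: 37-UI-SPEC.md; uses shadcn Collapsible (already installed)
import { useState } from "react";
import { Collapsible, CollapsibleContent, CollapsibleTrigger } from "@/components/ui/collapsible";
import { Badge } from "@/components/ui/badge";
import { MarkdownBody } from "@/components/MarkdownBody"; // assumed path of the existing markdown renderer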

function ChatVoiceBadge({ content, messageType }: { content: string; messageType: string }) {
  const [open, setOpen] = useState(false);
  const spokenMatch = content.match(/SPOKEN:\s*([\s\S]*?)(?=\nDETAILED:|$)/);
  const spokenText = spokenMatch?.[1]?.trim() ?? content;
  const detailedMatch = content.match(/DETAILED:\s*([\s\S]*)/);

  return (
    <div>
      <Badge variant="outline" className="text-xs mb-2">Voice</Badge>
      <p className="text-sm">{spokenText}</p>
      {messageType === "voice_full" && detailedMatch && (
        <Collapsible open={open} onOpenChange={setOpen}>
          <CollapsibleTrigger className="text-xs text-muted-foreground hover:text-foreground mt-1">
            {open ? "Hide full response" : "Show full response"}
          </CollapsibleTrigger>
          <CollapsibleContent>
            <MarkdownBody className="text-sm mt-2">{detailedMatch[1].trim()}</MarkdownBody>
          </CollapsibleContent>
        </Collapsible>
      )}
    </div>
  );
}
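The SPOKEN:/DETAILED: parsing in ChatVoiceBadge can be pulled into a pure function and unit-tested without rendering. splitVoiceResponse is a hypothetical name that reuses the component's regexes. One behavioural difference to note: when a message has DETAILED: but no SPOKEN: marker, the spoken line falls back to the text before DETAILED: instead of the whole message, so the detailed section is not shown twice.

```typescript
// Hypothetical pure parser for voice_full message content (reuses ChatVoiceBadge's regexes).
export interface VoiceResponseParts {
  spoken: string;          // short text shown next to the Voice badge
  detailed: string | null; // markdown for the collapsible section, if any
}

export function splitVoiceResponse(content: string): VoiceResponseParts {
  const spokenMatch = content.match(/SPOKEN:\s*([\s\S]*?)(?=\nDETAILED:|$)/);
  const detailedMatch = content.match(/DETAILED:\s*([\s\S]*)/);
  // Without a SPOKEN: marker, use the text before DETAILED: (or everything if absent).
  const fallback = detailedMatch ? content.slice(0, detailedMatch.index).trim() : content.trim();
  return {
    spoken: spokenMatch?.[1]?.trim() ?? fallback,
    detailed: detailedMatch?.[1]?.trim() || null,
  };
}
```

The component then renders parts.spoken and, for voice_full messages with parts.detailed, the collapsible MarkdownBody.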

State of the Art

Old Approach | Current Approach | When Changed | Impact
WebRTC VAD polyfill | Silero VAD via ONNX + AudioWorklet | 2023-2024 | Dramatically better accuracy; handles noisy environments
MediaRecorder → manual silence timer | @ricky0123/vad-react onSpeechEnd | 2023 | Eliminates timer tuning; model-based accuracy
Flash/plugin audio playback | Native <audio> + Web Audio API | 2015+ | Universal; no plugin required
Custom waveform libraries | Web Audio API AnalyserNode | Always | Zero dependency; 30fps canvas

Deprecated/outdated:

  • annyang, artyom.js: Web Speech API wrappers; recognition support varies by browser, audio may be sent to a vendor cloud service (privacy concerns), and there is no offline support
  • Manual silence detection with onaudioprocess: Deprecated ScriptProcessor API; replaced by AudioWorklet
  • MediaRecorder direct upload (VoiceRecordButton v1): Manual stop only; no auto-silence — replaced by useVadRecorder

Open Questions

  1. autoPlay persistence: nexus-settings vs localStorage

    • What we know: nexus-settings already has voiceMode field. autoPlay (WCHAT-06) is a separate user preference.
    • What's unclear: Should autoPlay live in nexus-settings (persisted server-side, works across devices) or localStorage (client-only, simpler)?
    • Recommendation: Use localStorage key nexus:voice:autoplay for autoPlay — it is a per-device UX preference that doesn't need server-side persistence. Keeps nexus-settings lean.
  2. COEP impact on existing cross-origin resources

    • What we know: COEP require-corp blocks cross-origin resources without CORP header.
    • What's unclear: Do existing Chat UI components load any cross-origin images (avatar CDN, external URLs in messages)?
    • Recommendation: Audit ui/src/components/ChatMessage.tsx and Identity.tsx for external image src. If any exist, use credentialless instead of require-corp for COEP; this relaxes the restriction while still enabling SharedArrayBuffer in Chromium 96+. Browser support beyond Chromium (Firefox, Safari) needs verification before relying on it (see the LOW-confidence note in Sources).
  3. VAD false-positive rate in quiet environments

    • What we know: Silero VAD default thresholds are tuned for speech.
    • What's unclear: In near-silent environments, keyboard noise or mouse clicks may trigger onSpeechEnd.
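    • Recommendation: Start from Pitfall 5's settings (minSpeechFrames: 3, positiveSpeechThreshold: 0.8) and raise minSpeechFrames to 5 if clicks still get through; the enforced minimum is minSpeechFrames × frame length, so confirm the frame length for the chosen model. Show a "Too short, try again" toast if the transcript is empty.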

Environment Availability

Dependency | Required By | Available | Version | Fallback
Node.js | build + tests | ✓ | v20.20.2 | -
pnpm | package install | ✓ | 9.15.4 | -
@ricky0123/vad-react | WCHAT-02 | ✗ (not installed) | - | Must install via pnpm
@ricky0123/vad-web | peer of vad-react | ✗ (not installed) | - | Installed automatically
onnxruntime-web | peer of vad-web | ✗ (not installed) | - | Installed automatically
Phase 36 Task 2-3 deliverables | All voice routes | ✗ (not on branch) | - | Wave 0 must cherry-pick or re-implement

Missing dependencies with no fallback:

  • @ricky0123/vad-react — must be installed (pnpm add @ricky0123/vad-react --filter @paperclipai/ui)
  • Phase 36 server-side deliverables — POST /api/transcribe, POST /api/synthesize, nexus-settings voiceMode

Missing dependencies with fallback:

  • None

Validation Architecture

Test Framework

Property | Value
Framework | vitest ^3.0.5
Config file | ui/vitest.config.ts
Quick run command | pnpm --filter @paperclipai/ui test --run
Full suite command | pnpm test --run

Phase Requirements → Test Map

Req ID | Behavior | Test Type | Automated Command | File Exists?
WCHAT-01 | VoiceMicButton renders idle/recording/processing states | unit | pnpm --filter @paperclipai/ui test --run -- VoiceMicButton | Wave 0
WCHAT-02 | useVadRecorder calls onTranscript after onSpeechEnd fires | unit | pnpm --filter @paperclipai/ui test --run -- useVadRecorder | Wave 0
WCHAT-03 | VoiceWaveform renders canvas with correct dimensions | unit | pnpm --filter @paperclipai/ui test --run -- VoiceWaveform | Wave 0
WCHAT-04, 06 | ChatVoicePlayer renders play button; auto-plays when autoPlay=true | unit | pnpm --filter @paperclipai/ui test --run -- ChatVoicePlayer | Wave 0
WCHAT-05 | VoiceModeToggle renders three pills; click updates mode | unit | pnpm --filter @paperclipai/ui test --run -- VoiceModeToggle | Wave 0
WCHAT-05 | useVoiceMode persists mode to nexus-settings; loads on mount | unit | pnpm --filter @paperclipai/ui test --run -- useVoiceMode | Wave 0
WCHAT-01, 02 | POST /api/transcribe returns { text } for WAV upload | unit (server) | pnpm --filter @paperclipai/server test --run -- voice-routes | Phase 36 Task 3 (verify present)
WCHAT-04 | POST /api/synthesize returns audio/wav for text input | unit (server) | pnpm --filter @paperclipai/server test --run -- voice-routes | Phase 36 Task 3 (verify present)
WCHAT-02 | encodeWav produces valid 44-byte WAV header | unit | pnpm --filter @paperclipai/ui test --run -- encodeWav | Wave 0

Note: UI tests use // @vitest-environment jsdom at the top of test files (see ChatInput.test.tsx pattern). All voice component tests must include this directive.

Sampling Rate

  • Per task commit: pnpm --filter @paperclipai/ui test --run
  • Per wave merge: pnpm test --run
  • Phase gate: Full suite green before /gsd:verify-work

Wave 0 Gaps

  • ui/src/components/VoiceMicButton.test.tsx — covers WCHAT-01
  • ui/src/hooks/useVadRecorder.test.ts — covers WCHAT-02
  • ui/src/components/VoiceWaveform.test.tsx — covers WCHAT-03
  • ui/src/components/ChatVoicePlayer.test.tsx — covers WCHAT-04
  • ui/src/components/VoiceModeToggle.test.tsx — covers WCHAT-05
  • ui/src/hooks/useVoiceMode.test.ts: covers WCHAT-05 (voiceMode persistence)
  • ui/src/lib/encodeWav.test.ts — covers WAV encoding utility
  • Verify server/src/routes/voice.ts present (Phase 36 Task 3)
  • Verify server/src/services/nexus-settings.ts has voiceMode (Phase 36 Task 2)

Sources

Primary (HIGH confidence)

  • npm registry — @ricky0123/vad-react@0.0.36, @ricky0123/vad-web@0.0.30, onnxruntime-web@1.24.3 versions verified 2026-04-03
  • https://docs.vad.ricky0123.com/user-guide/api/ — useMicVAD API properties (listening, loading, errored, userSpeaking, start, pause, onSpeechEnd, baseAssetPath, onnxWASMBasePath)
  • MDN Web Audio API AnalyserNode documentation — waveform pattern
  • 37-UI-SPEC.md (committed in a0103337) — component inventory, interaction states, copywriting contract
  • 37-CONTEXT.md (committed in 30708d38) — implementation decisions

Secondary (MEDIUM confidence)

  • Git history analysis (fd372eaf, d0d7a23a, 11508547) — Phase 36 deliverable status
  • https://web.dev/articles/coop-coep — COOP/COEP header semantics
  • Vite docs — server.headers for dev server COOP/COEP

Tertiary (LOW confidence)

  • COEP credentialless alternative (open question #2) — browser support needs verification

Metadata

Confidence breakdown:

  • Standard stack: HIGH — npm registry confirmed versions; vad-react API verified from official docs
  • Architecture: HIGH — derived from 37-UI-SPEC.md (committed) + existing codebase patterns
  • Pitfalls: HIGH, based on verified browser behaviour (autoplay policy, COEP); LOW for pitfall #5 (threshold tuning is empirical)
  • Branch status: HIGH — verified via git log --all --oneline + git show of specific commits

Research date: 2026-04-03
Valid until: 2026-05-03. The APIs have been stable, but vad-react is still on 0.0.x versioning, so re-verify option names whenever the pinned version changes.