---
phase: 37-web-chat-voice-ui
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - server/src/services/nexus-settings.ts
  - server/src/routes/nexus-settings.ts
  - server/src/routes/voice.ts
  - server/src/routes/chat.ts
  - server/src/app.ts
  - packages/shared/src/types/chat.ts
  - packages/shared/src/validators/chat.ts
  - ui/vite.config.ts
  - ui/package.json
  - ui/public/vad.worklet.bundle.min.js
  - ui/public/silero_vad_legacy.onnx
  - ui/public/silero_vad_v5.onnx
autonomous: true
requirements:
  - WCHAT-01
  - WCHAT-02
  - WCHAT-04
must_haves:
  truths:
    - "POST /api/transcribe accepts audio upload and returns { text }"
    - "POST /api/synthesize accepts { text } and returns audio/wav"
    - "GET /api/nexus/settings returns voiceMode field"
    - "PATCH /api/nexus/settings accepts voiceMode update"
    - "Chat stream endpoint accepts voiceMode in request body"
    - "SharedArrayBuffer is available in browser (COOP/COEP headers set)"
    - "VAD ONNX model files are served from /vad.worklet.bundle.min.js, /silero_vad_legacy.onnx, /silero_vad_v5.onnx"
  artifacts:
    - path: "server/src/routes/voice.ts"
      provides: "POST /api/transcribe and POST /api/synthesize"
      exports: ["voiceRoutes"]
    - path: "server/src/routes/nexus-settings.ts"
      provides: "GET/PATCH /api/nexus/settings"
      exports: ["nexusSettingsRoutes"]
    - path: "server/src/services/nexus-settings.ts"
      provides: "nexusSettingsService with voiceMode field"
      exports: ["nexusSettingsService", "VoiceMode", "VOICE_MODES"]
    - path: "ui/public/vad.worklet.bundle.min.js"
      provides: "VAD AudioWorklet bundle"
    - path: "ui/public/silero_vad_legacy.onnx"
      provides: "Silero VAD legacy ONNX model"
  key_links:
    - from: "server/src/app.ts"
      to: "server/src/routes/voice.ts"
      via: "api.use(voiceRoutes())"
      pattern: "voiceRoutes"
    - from: "server/src/app.ts"
      to: "server/src/routes/nexus-settings.ts"
      via: "api.use(nexusSettingsRoutes())"
      pattern: "nexusSettingsRoutes"
    - from: "server/src/routes/chat.ts"
      to: "voiceMode parameter"
      via: "req.body.voiceMode in stream handler"
      pattern: "voiceMode.*voice_input|voice_full"
---

Establish all server-side prerequisites and browser infrastructure for voice I/O.

Purpose: Phase 36 Tasks 2-3 (nexus-settings voiceMode schema, voice HTTP routes, voiceMode wiring in chat.ts) are not present on this branch. This plan cherry-picks or re-implements those deliverables, adds COOP/COEP headers for SharedArrayBuffer, installs @ricky0123/vad-react, copies VAD ONNX assets to ui/public/, and configures Vite dev server headers.

Output: Working server endpoints (transcribe, synthesize, nexus-settings), COOP/COEP isolation, VAD assets ready in ui/public/

@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md

From server/src/services/voice-pipeline.ts (ALREADY on this branch):

```typescript
// voicePipelineService() exposes transcribe(buffer, format) and synthesize(text, voiceId?)
export function voicePipelineService(): { transcribe, synthesize, formatForVoice, transcodeToWav16k }
```

From server/src/app.ts (parent branch, route mounting pattern):

```typescript
// Routes are mounted on an `api` Router via api.use(...)
// Pattern: import { xyzRoutes } from "./routes/xyz.js"; then api.use(xyzRoutes());
import { chatRoutes } from "./routes/chat.js";
api.use(chatRoutes(db, storageService, config));
```

From packages/shared/src/types/chat.ts (parent branch):

```typescript
export interface ChatMessage {
  id: string;
  conversationId: string;
  role: "user" | "assistant" | "system";
  content: string;
  messageType?: string | null;
  // ... other fields
}
```

From packages/shared/src/validators/chat.ts (parent branch):

```typescript
export const createMessageSchema = z.object({
  content: z.string().min(1),
  role: z.enum(["user", "assistant", "system"]).default("user"),
  agentId: z.string().uuid().optional(),
  // voiceMode NOT present on parent branch; must add
});
```

Task 1: Cherry-pick Phase 36 server deliverables and add COOP/COEP headers

server/src/services/nexus-settings.ts, server/src/routes/nexus-settings.ts, server/src/routes/voice.ts, server/src/routes/chat.ts, server/src/app.ts, packages/shared/src/types/chat.ts, packages/shared/src/validators/chat.ts

server/src/services/nexus-settings.ts, server/src/services/voice-pipeline.ts, server/src/app.ts, server/src/routes/chat.ts, packages/shared/src/types/chat.ts, packages/shared/src/validators/chat.ts

Cherry-pick or re-implement Phase 36 Tasks 2-3 deliverables. The commits on gsd/phase-36-voice-pipeline-foundation are:

- d0d7a23a (nexus-settings voiceMode schema extension)
- b964c0e4 (voiceMode in createMessageSchema + ChatMessage interface)
- 11508547 (voice HTTP routes)
- fd372eaf (voiceMode wiring in chat.ts + route mounting)

Try cherry-picking these 4 commits in order:

```bash
git cherry-pick d0d7a23a b964c0e4 11508547 fd372eaf
```

If cherry-pick conflicts, re-implement manually:

1. **server/src/services/nexus-settings.ts**: Add VOICE_MODES and VoiceMode type:

   ```typescript
   export const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
   export type VoiceMode = (typeof VOICE_MODES)[number];
   ```

   Add `voiceMode: z.enum(VOICE_MODES).default("text")` to nexusSettingsSchema. Add `telegramToken: z.string().optional()`, `piperBinaryPath: z.string().optional()`, `whisperBinaryPath: z.string().optional()`.

2. **server/src/routes/nexus-settings.ts**: Create new file:
   - GET /nexus/settings: returns nexusSettingsService().get()
   - PATCH /nexus/settings: calls nexusSettingsService().set(req.body), returns updated
   - Both routes call assertBoard(req) first
   - Import Router from express, assertBoard from ./authz.js, nexusSettingsService from ../services/nexus-settings.js

3. **server/src/routes/voice.ts**: Create new file:
   - POST /transcribe: accepts multipart audio upload via multer memoryStorage, calls voicePipelineService().transcribe(buffer, format), returns { text }
   - POST /synthesize: accepts JSON { text, voiceId? }, calls voicePipelineService().synthesize(text, voiceId), returns audio/wav buffer
   - Both routes call assertBoard(req)
   - Import multer, Router, assertBoard, voicePipelineService, MAX_ATTACHMENT_BYTES

4. **packages/shared/src/types/chat.ts**: Add `voiceMode?: string | null;` to ChatMessage interface if not present.

5. **packages/shared/src/validators/chat.ts**: Add `voiceMode: z.enum(["text", "voice_input", "full_voice"]).optional()` to createMessageSchema.

6. **server/src/routes/chat.ts**: In the stream POST handler, destructure `voiceMode` from req.body alongside content and agentId. When voiceMode is "full_voice", call voicePipelineService().formatForVoice(aiContent) to produce SPOKEN/DETAILED format. Set messageType on stored message: "voice_full" if voiceMode==="full_voice", "voice_input" if voiceMode==="voice_input", else null.

7. **server/src/app.ts**: Import and mount voiceRoutes and nexusSettingsRoutes:

   ```typescript
   import { nexusSettingsRoutes } from "./routes/nexus-settings.js";
   import { voiceRoutes } from "./routes/voice.js";
   // In the api router setup:
   api.use(nexusSettingsRoutes());
   api.use(voiceRoutes());
   ```

8. **COOP/COEP headers**: In server/src/app.ts, add middleware BEFORE static file serving and vite dev middleware:

   ```typescript
   app.use((_req, res, next) => {
     res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
     res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
     next();
   });
   ```

   Place this before any `app.use(express.static(...))` or vite middleware attachment.

cd /opt/nexus/.claude/worktrees/agent-a009558f && grep -q "voiceRoutes" server/src/app.ts && grep -q "nexusSettingsRoutes" server/src/app.ts && grep -q "Cross-Origin-Opener-Policy" server/src/app.ts && grep -q "voiceMode" server/src/routes/chat.ts && grep -q "voice_full" server/src/routes/chat.ts && test -f server/src/routes/voice.ts && test -f server/src/routes/nexus-settings.ts && echo "PASS" || echo "FAIL"

- grep "voiceRoutes" server/src/app.ts returns match
- grep "nexusSettingsRoutes" server/src/app.ts returns match
- grep "Cross-Origin-Opener-Policy" server/src/app.ts returns "same-origin"
- grep "Cross-Origin-Embedder-Policy" server/src/app.ts returns "require-corp"
- grep "voiceMode" server/src/routes/chat.ts returns match
- grep "voice_full" server/src/routes/chat.ts returns match
- server/src/routes/voice.ts exists with POST /transcribe and POST /synthesize
- server/src/routes/nexus-settings.ts exists with GET and PATCH /nexus/settings
- grep "VOICE_MODES" server/src/services/nexus-settings.ts returns match

Phase 36 server deliverables present on branch. COOP/COEP headers added. Voice routes mounted. Chat stream accepts voiceMode.

Task 2: Install VAD library, copy ONNX assets, configure Vite COOP/COEP headers

ui/package.json, ui/public/vad.worklet.bundle.min.js, ui/public/silero_vad_legacy.onnx, ui/public/silero_vad_v5.onnx, ui/vite.config.ts

ui/package.json, ui/vite.config.ts

1. Install @ricky0123/vad-react in the ui package:

   ```bash
   pnpm add @ricky0123/vad-react --filter @paperclipai/ui
   ```

2. Copy VAD assets from node_modules to ui/public/ for same-origin serving (avoids COEP blocking CDN):

   ```bash
   cp node_modules/@ricky0123/vad-web/dist/vad.worklet.bundle.min.js ui/public/
   cp node_modules/@ricky0123/vad-web/dist/silero_vad_legacy.onnx ui/public/
   cp node_modules/@ricky0123/vad-web/dist/silero_vad_v5.onnx ui/public/
   ```

   If vad-web is in ui/node_modules/@ricky0123/vad-web/dist/, use that path instead. Verify all three files exist after copy.

3. Add a "copy-vad-assets" script to ui/package.json:

   ```json
   "copy-vad-assets": "cp node_modules/@ricky0123/vad-web/dist/vad.worklet.bundle.min.js public/ && cp node_modules/@ricky0123/vad-web/dist/silero_vad_legacy.onnx public/ && cp node_modules/@ricky0123/vad-web/dist/silero_vad_v5.onnx public/"
   ```

4. Update ui/vite.config.ts: add COOP/COEP headers to dev server config:

   ```typescript
   server: {
     port: 5173,
     headers: {
       "Cross-Origin-Opener-Policy": "same-origin",
       "Cross-Origin-Embedder-Policy": "require-corp",
     },
     proxy: { ... }, // keep existing proxy config
   },
   ```

   This ensures SharedArrayBuffer works in Vite dev mode too.

cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/public/vad.worklet.bundle.min.js && test -f ui/public/silero_vad_legacy.onnx && test -f ui/public/silero_vad_v5.onnx && grep -q "vad-react" ui/package.json && grep -q "Cross-Origin-Opener-Policy" ui/vite.config.ts && echo "PASS" || echo "FAIL"

- ui/public/vad.worklet.bundle.min.js exists (non-zero size)
- ui/public/silero_vad_legacy.onnx exists (non-zero size)
- ui/public/silero_vad_v5.onnx exists (non-zero size)
- grep "vad-react" ui/package.json returns match
- grep "Cross-Origin-Opener-Policy" ui/vite.config.ts returns "same-origin"
- grep "Cross-Origin-Embedder-Policy" ui/vite.config.ts returns "require-corp"
- grep "copy-vad-assets" ui/package.json returns match

VAD library installed. ONNX model files and worklet bundle served from ui/public/. Vite dev server sends COOP/COEP headers. SharedArrayBuffer available in dev.
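
Both tasks set the same two header values, once in Express and once in the Vite dev server config. A minimal sketch of how those values could be kept in one place and checked against a response is shown here. The constant name `CROSS_ORIGIN_ISOLATION_HEADERS` and the helper `missingIsolationHeaders` are hypothetical and not part of the codebase; only the header names and values come from this plan.

```typescript
// Header names and values are taken from this plan; the identifiers are hypothetical.
const CROSS_ORIGIN_ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
} as const;

// Returns the names of the plan's isolation headers that are absent or carry a
// different value in the given header map (for example, headers read from a response).
function missingIsolationHeaders(
  headers: Record<string, string | undefined>,
): string[] {
  // HTTP header names are case-insensitive, so normalise before comparing.
  const normalised = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    if (value !== undefined) {
      normalised.set(name.toLowerCase(), value.trim().toLowerCase());
    }
  }
  return Object.entries(CROSS_ORIGIN_ISOLATION_HEADERS)
    .filter(([name, expected]) => normalised.get(name.toLowerCase()) !== expected)
    .map(([name]) => name);
}
```

An empty result means both headers match the values this plan expects. In the browser itself, the standard `crossOriginIsolated` global reports whether isolation actually took effect, which is the condition SharedArrayBuffer depends on.
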
- server/src/routes/voice.ts exists with transcribe and synthesize endpoints
- server/src/routes/nexus-settings.ts exists with GET/PATCH
- server/src/app.ts mounts both route sets and has COOP/COEP middleware
- server/src/routes/chat.ts handles voiceMode in stream handler
- ui/public/ has all 3 VAD asset files
- ui/vite.config.ts has COOP/COEP headers
- @ricky0123/vad-react in ui/package.json dependencies

All Phase 36 server deliverables present. COOP/COEP headers set on both Express and Vite dev server. VAD assets served from same-origin. Foundation ready for frontend voice components.

After completion, create `.planning/phases/37-web-chat-voice-ui/37-01-SUMMARY.md`
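
For reference, the voiceMode handling that Task 1 describes for the chat stream handler can be sketched as two pure functions. This is an illustration under assumptions, not the code from commit fd372eaf: the helper names `parseVoiceMode` and `messageTypeFor` are hypothetical, and validation is done by hand here rather than with the zod schema the plan specifies. The `VOICE_MODES` values and the messageType mapping are taken from the plan.

```typescript
// Mirrors the constants the plan adds to server/src/services/nexus-settings.ts.
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

// Hypothetical helper: narrow an untrusted request-body value to a VoiceMode.
// Returns undefined for anything that is not one of the known modes.
function parseVoiceMode(value: unknown): VoiceMode | undefined {
  return typeof value === "string" &&
    (VOICE_MODES as readonly string[]).includes(value)
    ? (value as VoiceMode)
    : undefined;
}

// Hypothetical helper: map voiceMode to the messageType stored on the message.
// Note the plan's naming: voiceMode "full_voice" is stored as messageType "voice_full".
function messageTypeFor(
  voiceMode: VoiceMode | undefined,
): "voice_full" | "voice_input" | null {
  if (voiceMode === "full_voice") return "voice_full";
  if (voiceMode === "voice_input") return "voice_input";
  return null; // "text" or absent
}
```

Keeping the mapping in one function makes the `full_voice` / `voice_full` asymmetry explicit, which is easy to get wrong when the two strings are written inline in the handler.
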