---
phase: 37-web-chat-voice-ui
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- server/src/services/nexus-settings.ts
- server/src/routes/nexus-settings.ts
- server/src/routes/voice.ts
- server/src/routes/chat.ts
- server/src/app.ts
- packages/shared/src/types/chat.ts
- packages/shared/src/validators/chat.ts
- ui/vite.config.ts
- ui/package.json
- ui/public/vad.worklet.bundle.min.js
- ui/public/silero_vad_legacy.onnx
- ui/public/silero_vad_v5.onnx
autonomous: true
requirements:
- WCHAT-01
- WCHAT-02
- WCHAT-04
must_haves:
  truths:
    - "POST /api/transcribe accepts audio upload and returns { text }"
    - "POST /api/synthesize accepts { text } and returns audio/wav"
    - "GET /api/nexus/settings returns voiceMode field"
    - "PATCH /api/nexus/settings accepts voiceMode update"
    - "Chat stream endpoint accepts voiceMode in request body"
    - "SharedArrayBuffer is available in browser (COOP/COEP headers set)"
    - "VAD worklet bundle and ONNX model files are served from /vad.worklet.bundle.min.js, /silero_vad_legacy.onnx, /silero_vad_v5.onnx"
  artifacts:
    - path: "server/src/routes/voice.ts"
      provides: "POST /api/transcribe and POST /api/synthesize"
      exports: ["voiceRoutes"]
    - path: "server/src/routes/nexus-settings.ts"
      provides: "GET/PATCH /api/nexus/settings"
      exports: ["nexusSettingsRoutes"]
    - path: "server/src/services/nexus-settings.ts"
      provides: "nexusSettingsService with voiceMode field"
      exports: ["nexusSettingsService", "VoiceMode", "VOICE_MODES"]
    - path: "ui/public/vad.worklet.bundle.min.js"
      provides: "VAD AudioWorklet bundle"
    - path: "ui/public/silero_vad_legacy.onnx"
      provides: "Silero VAD legacy ONNX model"
  key_links:
    - from: "server/src/app.ts"
      to: "server/src/routes/voice.ts"
      via: "api.use(voiceRoutes())"
      pattern: "voiceRoutes"
    - from: "server/src/app.ts"
      to: "server/src/routes/nexus-settings.ts"
      via: "api.use(nexusSettingsRoutes())"
      pattern: "nexusSettingsRoutes"
    - from: "server/src/routes/chat.ts"
      to: "voiceMode parameter"
      via: "req.body.voiceMode in stream handler"
      pattern: "voiceMode.*voice_input|voice_full"
---
<objective>
Establish all server-side prerequisites and browser infrastructure for voice I/O.
Purpose: Phase 36 Tasks 2-3 (nexus-settings voiceMode schema, voice HTTP routes, voiceMode wiring in chat.ts) are not present on this branch. This plan cherry-picks or re-implements those deliverables, adds COOP/COEP headers so SharedArrayBuffer is available, installs @ricky0123/vad-react, copies the VAD worklet bundle and ONNX models to ui/public/, and adds the same headers to the Vite dev server.
Output: Working server endpoints (transcribe, synthesize, nexus-settings), COOP/COEP isolation on both Express and Vite, and VAD assets ready in ui/public/.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md
<interfaces>
<!-- Phase 36 branch deliverables that must be present before Phase 37 UI work -->
From server/src/services/voice-pipeline.ts (ALREADY on this branch):
```typescript
// voicePipelineService() exposes transcribe(buffer, format) and synthesize(text, voiceId?)
export function voicePipelineService(): { transcribe, synthesize, formatForVoice, transcodeToWav16k }
```
From server/src/app.ts (parent branch — route mounting pattern):
```typescript
// Routes are mounted on an `api` Router via api.use(...)
// Pattern: import { xyzRoutes } from "./routes/xyz.js"; then api.use(xyzRoutes());
import { chatRoutes } from "./routes/chat.js";
api.use(chatRoutes(db, storageService, config));
```
From packages/shared/src/types/chat.ts (parent branch):
```typescript
export interface ChatMessage {
  id: string;
  conversationId: string;
  role: "user" | "assistant" | "system";
  content: string;
  messageType?: string | null;
  // ... other fields
}
```
From packages/shared/src/validators/chat.ts (parent branch):
```typescript
export const createMessageSchema = z.object({
  content: z.string().min(1),
  role: z.enum(["user", "assistant", "system"]).default("user"),
  agentId: z.string().uuid().optional(),
  // voiceMode NOT present on parent branch — must add
});
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Cherry-pick Phase 36 server deliverables and add COOP/COEP headers</name>
<files>
server/src/services/nexus-settings.ts,
server/src/routes/nexus-settings.ts,
server/src/routes/voice.ts,
server/src/routes/chat.ts,
server/src/app.ts,
packages/shared/src/types/chat.ts,
packages/shared/src/validators/chat.ts
</files>
<read_first>
server/src/services/nexus-settings.ts,
server/src/services/voice-pipeline.ts,
server/src/app.ts,
server/src/routes/chat.ts,
packages/shared/src/types/chat.ts,
packages/shared/src/validators/chat.ts
</read_first>
<action>
Cherry-pick or re-implement Phase 36 Tasks 2-3 deliverables. The commits on gsd/phase-36-voice-pipeline-foundation are:
- d0d7a23a (nexus-settings voiceMode schema extension)
- b964c0e4 (voiceMode in createMessageSchema + ChatMessage interface)
- 11508547 (voice HTTP routes)
- fd372eaf (voiceMode wiring in chat.ts + route mounting)
Try cherry-picking these 4 commits in order:
```bash
git cherry-pick d0d7a23a b964c0e4 11508547 fd372eaf
```
If the cherry-pick hits conflicts, run `git cherry-pick --abort` to return to the pre-cherry-pick state, then re-implement manually:
1. **server/src/services/nexus-settings.ts** — Add VOICE_MODES and VoiceMode type:
```typescript
export const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
export type VoiceMode = (typeof VOICE_MODES)[number];
```
Add `voiceMode: z.enum(VOICE_MODES).default("text")` to nexusSettingsSchema.
Add `telegramToken: z.string().optional()`, `piperBinaryPath: z.string().optional()`, `whisperBinaryPath: z.string().optional()`.
2. **server/src/routes/nexus-settings.ts** — Create new file:
- GET /nexus/settings — returns nexusSettingsService().get()
- PATCH /nexus/settings — calls nexusSettingsService().set(req.body), returns updated
- Both routes call assertBoard(req) first
- Import Router from express, assertBoard from ./authz.js, nexusSettingsService from ../services/nexus-settings.js
3. **server/src/routes/voice.ts** — Create new file:
- POST /transcribe — accepts multipart audio upload via multer memoryStorage, calls voicePipelineService().transcribe(buffer, format), returns { text }
- POST /synthesize — accepts JSON { text, voiceId? }, calls voicePipelineService().synthesize(text, voiceId), returns audio/wav buffer
- Both routes call assertBoard(req)
- Import multer, Router, assertBoard, voicePipelineService, MAX_ATTACHMENT_BYTES
4. **packages/shared/src/types/chat.ts** — Add `voiceMode?: string | null;` to ChatMessage interface if not present.
5. **packages/shared/src/validators/chat.ts** — Add `voiceMode: z.enum(["text", "voice_input", "full_voice"]).optional()` to createMessageSchema.
6. **server/src/routes/chat.ts** — In the stream POST handler, destructure `voiceMode` from `req.body` alongside `content` and `agentId`. When `voiceMode` is `"full_voice"`, pass the assistant reply through `voicePipelineService().formatForVoice(aiContent)` to produce the SPOKEN/DETAILED format. Set `messageType` on the stored message to `"voice_full"` when `voiceMode === "full_voice"`, `"voice_input"` when `voiceMode === "voice_input"`, and `null` otherwise. Note the naming asymmetry: the request value is `"full_voice"`, the stored `messageType` is `"voice_full"`.
7. **server/src/app.ts** — Import and mount voiceRoutes and nexusSettingsRoutes:
```typescript
import { nexusSettingsRoutes } from "./routes/nexus-settings.js";
import { voiceRoutes } from "./routes/voice.js";
// In the api router setup:
api.use(nexusSettingsRoutes());
api.use(voiceRoutes());
```
8. **COOP/COEP headers** — In server/src/app.ts, add middleware BEFORE static file serving and vite dev middleware:
```typescript
app.use((_req, res, next) => {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next();
});
```
Place this before any `app.use(express.static(...))` or vite middleware attachment.
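For reference while re-implementing steps 1, 5, and 6, here is a dependency-free sketch of the voiceMode logic. The helper names `isVoiceMode`, `messageTypeFor`, and `needsSpokenFormat` are hypothetical, not existing exports; the real code would validate through the zod schemas described above.

```typescript
// Hypothetical helpers illustrating steps 1, 5 and 6; not existing exports.
export const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
export type VoiceMode = (typeof VOICE_MODES)[number];

// Values the stream handler writes to the stored message's messageType.
export type VoiceMessageType = "voice_full" | "voice_input" | null;

/** Narrow an untrusted request-body value to a VoiceMode. */
export function isVoiceMode(value: unknown): value is VoiceMode {
  return typeof value === "string" && (VOICE_MODES as readonly string[]).includes(value);
}

/** Map the request's voiceMode to the messageType stored on the message. */
export function messageTypeFor(voiceMode: VoiceMode | undefined): VoiceMessageType {
  switch (voiceMode) {
    case "full_voice":
      return "voice_full"; // note: request "full_voice" -> stored "voice_full"
    case "voice_input":
      return "voice_input";
    default:
      // "text" or absent: a plain text message with no voice tag.
      return null;
  }
}

/** Only full_voice replies are passed through formatForVoice(). */
export function needsSpokenFormat(voiceMode: VoiceMode | undefined): boolean {
  return voiceMode === "full_voice";
}
```

In the stream handler this would read roughly as `const mode = isVoiceMode(req.body.voiceMode) ? req.body.voiceMode : undefined;`, with `messageTypeFor(mode)` used when storing the message. Keeping the mapping in one function makes the `full_voice` vs `voice_full` naming difference hard to get wrong.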
</action>
<verify>
<automated>cd /opt/nexus/.claude/worktrees/agent-a009558f && grep -q "voiceRoutes" server/src/app.ts && grep -q "nexusSettingsRoutes" server/src/app.ts && grep -q "Cross-Origin-Opener-Policy" server/src/app.ts && grep -q "voiceMode" server/src/routes/chat.ts && grep -q "voice_full" server/src/routes/chat.ts && test -f server/src/routes/voice.ts && test -f server/src/routes/nexus-settings.ts && echo "PASS" || echo "FAIL"</automated>
</verify>
<acceptance_criteria>
- grep "voiceRoutes" server/src/app.ts returns match
- grep "nexusSettingsRoutes" server/src/app.ts returns match
- grep "Cross-Origin-Opener-Policy" server/src/app.ts matches a line that also contains "same-origin"
- grep "Cross-Origin-Embedder-Policy" server/src/app.ts matches a line that also contains "require-corp"
- grep "voiceMode" server/src/routes/chat.ts returns match
- grep "voice_full" server/src/routes/chat.ts returns match
- server/src/routes/voice.ts exists with POST /transcribe and POST /synthesize
- server/src/routes/nexus-settings.ts exists with GET and PATCH /nexus/settings
- grep "VOICE_MODES" server/src/services/nexus-settings.ts returns match
</acceptance_criteria>
<done>Phase 36 server deliverables present on branch. COOP/COEP headers added. Voice routes mounted. Chat stream accepts voiceMode.</done>
</task>
<task type="auto">
<name>Task 2: Install VAD library, copy ONNX assets, configure Vite COOP/COEP headers</name>
<files>
ui/package.json,
ui/public/vad.worklet.bundle.min.js,
ui/public/silero_vad_legacy.onnx,
ui/public/silero_vad_v5.onnx,
ui/vite.config.ts
</files>
<read_first>
ui/package.json,
ui/vite.config.ts
</read_first>
<action>
1. Install @ricky0123/vad-react in the ui package:
```bash
pnpm add @ricky0123/vad-react --filter @paperclipai/ui
```
2. Copy the VAD assets from node_modules to ui/public/ so they are served same-origin (once COEP `require-corp` is enabled, cross-origin CDN fetches can be blocked):
```bash
cp node_modules/@ricky0123/vad-web/dist/vad.worklet.bundle.min.js ui/public/
cp node_modules/@ricky0123/vad-web/dist/silero_vad_legacy.onnx ui/public/
cp node_modules/@ricky0123/vad-web/dist/silero_vad_v5.onnx ui/public/
```
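If vad-web is in ui/node_modules/@ricky0123/vad-web/dist/, use that path instead. With pnpm's default isolated node_modules layout, vad-web is only a transitive dependency of vad-react and may not be linked at either path; in that case add it directly with `pnpm add @ricky0123/vad-web --filter @paperclipai/ui`, which also lets the `copy-vad-assets` script in step 3 resolve it.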
Verify all three files exist after copy.
3. Add a "copy-vad-assets" script to ui/package.json:
```json
"copy-vad-assets": "cp node_modules/@ricky0123/vad-web/dist/vad.worklet.bundle.min.js public/ && cp node_modules/@ricky0123/vad-web/dist/silero_vad_legacy.onnx public/ && cp node_modules/@ricky0123/vad-web/dist/silero_vad_v5.onnx public/"
```
4. Update ui/vite.config.ts — add COOP/COEP headers to dev server config:
```typescript
server: {
  port: 5173,
  headers: {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
  },
  proxy: { ... }, // keep existing proxy config
},
```
This makes the page cross-origin isolated in Vite dev mode too, so SharedArrayBuffer is available when the UI is opened on localhost or over HTTPS.
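As a sketch only, a Node/TypeScript equivalent of the `copy-vad-assets` chain that also enforces the non-zero-size acceptance check. `copyVadAssets` and `VAD_ASSETS` are hypothetical names, and the source directory is assumed to be vad-web's `dist/` as in step 2.

```typescript
// Hypothetical alternative to the chained `cp` in copy-vad-assets.
import * as fs from "node:fs";
import * as path from "node:path";

// The three files this plan expects in ui/public/.
export const VAD_ASSETS = [
  "vad.worklet.bundle.min.js",
  "silero_vad_legacy.onnx",
  "silero_vad_v5.onnx",
] as const;

/**
 * Copy each VAD asset from srcDir to destDir, failing loudly if a source
 * file is missing or the copy is empty. Returns the destination paths.
 */
export function copyVadAssets(srcDir: string, destDir: string): string[] {
  fs.mkdirSync(destDir, { recursive: true });
  const written: string[] = [];
  for (const name of VAD_ASSETS) {
    const src = path.join(srcDir, name);
    const dest = path.join(destDir, name);
    if (!fs.existsSync(src)) {
      throw new Error(`VAD asset not found: ${src}`);
    }
    fs.copyFileSync(src, dest);
    if (fs.statSync(dest).size === 0) {
      throw new Error(`VAD asset is empty after copy: ${dest}`);
    }
    written.push(dest);
  }
  return written;
}
```

Run from ui/ as `copyVadAssets("node_modules/@ricky0123/vad-web/dist", "public")`. Unlike the `cp` chain it does not depend on a POSIX shell, and it names the specific file that is missing.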
</action>
<verify>
<automated>cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/public/vad.worklet.bundle.min.js && test -f ui/public/silero_vad_legacy.onnx && test -f ui/public/silero_vad_v5.onnx && grep -q "vad-react" ui/package.json && grep -q "Cross-Origin-Opener-Policy" ui/vite.config.ts && echo "PASS" || echo "FAIL"</automated>
</verify>
<acceptance_criteria>
- ui/public/vad.worklet.bundle.min.js exists (non-zero size)
- ui/public/silero_vad_legacy.onnx exists (non-zero size)
- ui/public/silero_vad_v5.onnx exists (non-zero size)
- grep "vad-react" ui/package.json returns match
- grep "Cross-Origin-Opener-Policy" ui/vite.config.ts matches a line that also contains "same-origin"
- grep "Cross-Origin-Embedder-Policy" ui/vite.config.ts matches a line that also contains "require-corp"
- grep "copy-vad-assets" ui/package.json returns match
</acceptance_criteria>
<done>VAD library installed. ONNX model files and worklet bundle served from ui/public/. Vite dev server sends COOP/COEP headers. SharedArrayBuffer available in dev.</done>
</task>
</tasks>
<verification>
- server/src/routes/voice.ts exists with transcribe and synthesize endpoints
- server/src/routes/nexus-settings.ts exists with GET/PATCH
- server/src/app.ts mounts both route sets and has COOP/COEP middleware
- server/src/routes/chat.ts handles voiceMode in stream handler
- ui/public/ has all 3 VAD asset files
- ui/vite.config.ts has COOP/COEP headers
- @ricky0123/vad-react in ui/package.json dependencies
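One way to spot-check the COOP/COEP items without opening a browser is to inspect response headers. This is a hypothetical helper (`missingIsolationHeaders` is not part of the codebase); it accepts anything with a Fetch-style `get(name)` method, such as the `headers` of a `fetch()` response.

```typescript
/** Minimal view of a Fetch API Headers object (hypothetical helper type). */
interface HeaderLookup {
  get(name: string): string | null;
}

/** Header name -> value this plan requires for cross-origin isolation. */
const EXPECTED_ISOLATION_HEADERS: Record<string, string> = {
  "cross-origin-opener-policy": "same-origin",
  "cross-origin-embedder-policy": "require-corp",
};

/**
 * Return the names of isolation headers that are absent or carry a different
 * value. An empty array means COOP and COEP are both set as required.
 */
export function missingIsolationHeaders(headers: HeaderLookup): string[] {
  return Object.entries(EXPECTED_ISOLATION_HEADERS)
    .filter(([name, expected]) => headers.get(name)?.trim().toLowerCase() !== expected)
    .map(([name]) => name);
}
```

For example, `missingIsolationHeaders((await fetch("http://localhost:5173/")).headers)` should return `[]` against the Vite dev server (port from vite.config.ts). In the browser, `self.crossOriginIsolated === true` confirms the headers took effect, which is the condition browsers use to expose SharedArrayBuffer.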
</verification>
<success_criteria>
All Phase 36 server deliverables present. COOP/COEP headers set on both Express and Vite dev server. VAD assets served from same-origin. Foundation ready for frontend voice components.
</success_criteria>
<output>
After completion, create `.planning/phases/37-web-chat-voice-ui/37-01-SUMMARY.md`
</output>