---
phase: 36-voice-pipeline-foundation
plan: 03
type: execute
wave: 2
depends_on: ["36-01", "36-02"]
files_modified:
  - server/src/routes/voice.ts
  - server/src/routes/chat-files.ts
  - server/src/routes/chat.ts
  - server/src/app.ts
  - server/src/__tests__/36-voice-routes.test.ts
autonomous: true
requirements:
  - VPIPE-03
  - VPIPE-06
must_haves:
  truths:
    - "POST /api/transcribe accepts audio file upload and returns { text, language? }"
    - "POST /api/synthesize accepts { text } body and returns audio/wav buffer"
    - "voiceMode from request body is injected as dual-output system prompt in stream endpoint"
    - "voiceMode is persisted to messageType column when message is saved"
    - "Old /transcribe endpoint is removed from chat-files.ts"
    - "Voice routes are mounted in app.ts"
  artifacts:
    - path: "server/src/routes/voice.ts"
      provides: "POST /api/transcribe and POST /api/synthesize endpoints"
      exports: ["voiceRoutes"]
    - path: "server/src/__tests__/36-voice-routes.test.ts"
      provides: "Integration tests for voice routes and voiceMode wiring"
      min_lines: 60
  key_links:
    - from: "server/src/routes/voice.ts"
      to: "server/src/services/voice-pipeline.ts"
      via: "voicePipelineService() import"
      pattern: "voicePipelineService"
    - from: "server/src/routes/chat.ts"
      to: "packages/shared/src/validators/chat.ts"
      via: "createMessageSchema preserves voiceMode on parse"
      pattern: "voiceMode"
    - from: "server/src/app.ts"
      to: "server/src/routes/voice.ts"
      via: "api.use(voiceRoutes())"
      pattern: "voiceRoutes"
---

Create voice HTTP routes (transcribe + synthesize), wire voiceMode through the chat stream endpoint with dual-output prompt injection, mount the routes in app.ts, and remove the old transcribe endpoint from chat-files.ts.

Purpose: VPIPE-03 requires the voice pipeline to be callable from any transport via HTTP. VPIPE-06 requires dual output (spoken prose + full markdown), triggered by `voiceMode=full_voice` in the stream endpoint.
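Dual output is only useful if something downstream separates the two sections again. This plan does not specify that consumer. The sketch below is a minimal example of how a client or TTS step could split a `full_voice` reply. It assumes the model follows the `SPOKEN:` / `DETAILED:` labels requested by the system prompt injected in Task 2. `splitDualOutput` and `DualOutput` are hypothetical names and are not part of the files this plan touches.

```typescript
// Hypothetical helper (not part of this plan's files): split a full_voice reply
// into the SPOKEN and DETAILED sections requested by the injected system prompt.
export interface DualOutput {
  spoken: string; // prose intended for speech synthesis
  detailed: string; // full markdown answer for display
}

export function splitDualOutput(reply: string): DualOutput {
  // Match each label only at the start of a line, so a mention inside prose is less likely to collide
  const spokenMatch = /^SPOKEN:/m.exec(reply);
  const detailedMatch = /^DETAILED:/m.exec(reply);

  // If either label is missing or out of order, fall back to the whole reply for both parts
  if (!spokenMatch || !detailedMatch || detailedMatch.index < spokenMatch.index) {
    const text = reply.trim();
    return { spoken: text, detailed: text };
  }

  // Text between the two labels is spoken; everything after DETAILED: is the full answer
  const spoken = reply.slice(spokenMatch.index + "SPOKEN:".length, detailedMatch.index).trim();
  const detailed = reply.slice(detailedMatch.index + "DETAILED:".length).trim();
  return { spoken, detailed };
}
```

Falling back to the whole reply keeps a `full_voice` turn audible even when the model ignores the format. The trade-off is that markdown may then be read aloud.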
Output: `server/src/routes/voice.ts` with two endpoints, an updated `chat.ts` with voiceMode wiring, a cleaned-up `chat-files.ts`, and an updated `app.ts` mount.

@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/36-voice-pipeline-foundation/36-RESEARCH.md
@.planning/phases/36-voice-pipeline-foundation/36-01-SUMMARY.md
@.planning/phases/36-voice-pipeline-foundation/36-02-SUMMARY.md

From server/src/services/voice-pipeline.ts:

```typescript
export function voicePipelineService(): {
  transcribe(buffer: Buffer, format: "webm" | "ogg" | "wav"): Promise<{ text: string; language?: string }>;
  synthesize(text: string, voiceId?: string): Promise<Buffer>;
  formatForVoice(text: string): string;
};
```

From packages/shared/src/validators/chat.ts:

```typescript
import { z } from "zod";

export const createMessageSchema = z.object({
  role: z.enum(["user", "assistant", "system"]),
  content: z.string().min(1).max(100_000),
  agentId: z.string().uuid().optional(),
  messageType: z.string().optional(),
  voiceMode: z.enum(["text", "voice_input", "full_voice"]).optional(),
});
```

From server/src/routes/authz.ts:

```typescript
export function assertBoard(req: Request) {
  if (req.actor.type !== "board") throw forbidden("Board access required");
}
```

From server/src/attachment-types.ts:

```typescript
export const MAX_ATTACHMENT_BYTES = ...
```

From server/src/app.ts (mount pattern, line ~164):

```typescript
api.use(chatFileRoutes(db, opts.storageService));
api.use(nexusSettingsRoutes());
```

From server/src/routes/chat.ts (stream endpoint, lines 91-194):

```typescript
router.post("/conversations/:id/stream", async (req, res) => {
  assertBoard(req);
  const { content, agentId } = req.body;
  // ... builds messagesWithMemory array ...
  // ... streams tokens ...
  const message = await svc.addMessage(req.params.id!, {
    role: "assistant",
    content: fullContent.trim(),
    agentId: agentId || undefined,
  });
});
```

From server/src/routes/chat-files.ts (lines 297-386 to remove):

```typescript
// POST /transcribe: the old endpoint with inline audioUpload multer, runAudioUpload helper,
// and whisper-cpp/openai-whisper cascade. This entire block (lines 297-386) is replaced by voice.ts.
```

Task 1: Create voice.ts routes and tests

Files: server/src/routes/voice.ts, server/src/__tests__/36-voice-routes.test.ts

Read first: server/src/routes/chat-files.ts, server/src/routes/authz.ts, server/src/attachment-types.ts, server/src/services/voice-pipeline.ts

Behavior:
- POST /transcribe with a valid audio file returns 200 with { text: "...", language: "..." }
- POST /transcribe without an audio field returns 400 with { error: "Missing audio field" }
- POST /synthesize with { text: "Hello" } returns 200 with Content-Type audio/wav
- POST /synthesize without text returns 400 with { error: "text is required" }
- Both endpoints call assertBoard(req) for auth

Action:

1. Create `server/src/__tests__/36-voice-routes.test.ts` (RED):
   - Mock `../services/voice-pipeline.js` to return a mock service object
   - Mock `../routes/authz.js` so that assertBoard is a no-op (mock paths resolve relative to the test file in `__tests__/`)
   - Test POST /transcribe with a multipart `audio` field (for example supertest's `.attach("audio", buffer, "clip.webm")`, if the suite uses supertest) returns { text, language }; multer only parses multipart bodies, so a raw Buffer body would hit the 400 path
   - Test POST /transcribe without a file returns 400
   - Test POST /synthesize with { text: "Hello" } returns an audio/wav content type
   - Test POST /synthesize without text returns 400
2. Create `server/src/routes/voice.ts` (GREEN):

   ```typescript
   import { Router } from "express";
   import multer from "multer";
   import { assertBoard } from "./authz.js";
   import { voicePipelineService } from "../services/voice-pipeline.js";
   import { MAX_ATTACHMENT_BYTES } from "../attachment-types.js";

   export function voiceRoutes(): Router {
     const router = Router();
     const svc = voicePipelineService();
     // In-memory uploads: at most one file, capped at the shared attachment size limit
     const audioUpload = multer({
       storage: multer.memoryStorage(),
       limits: { fileSize: MAX_ATTACHMENT_BYTES, files: 1 },
     });

     // POST /api/transcribe: transcribe uploaded audio via VoicePipelineService
     router.post("/transcribe", async (req, res) => {
       assertBoard(req);
       // Run multer inline so upload errors (e.g. file too large) reject inside this handler
       await new Promise<void>((resolve, reject) =>
         audioUpload.single("audio")(req, res, (err) => (err ? reject(err) : resolve())),
       );
       const file = (req as any).file as { buffer: Buffer; mimetype: string } | undefined;
       if (!file) {
         res.status(400).json({ error: "Missing audio field" });
         return;
       }
       // Map the MIME type to a pipeline format; anything that is not ogg or wav is treated as webm
       const fmt = file.mimetype.includes("ogg") ? "ogg" : file.mimetype.includes("wav") ? "wav" : "webm";
       const result = await svc.transcribe(file.buffer, fmt);
       res.json(result);
     });

     // POST /api/synthesize: synthesize text to speech via VoicePipelineService
     router.post("/synthesize", async (req, res) => {
       assertBoard(req);
       const { text, voiceId } = req.body as { text?: string; voiceId?: string };
       if (!text || typeof text !== "string") {
         res.status(400).json({ error: "text is required" });
         return;
       }
       const audioBuffer = await svc.synthesize(text, voiceId);
       res.setHeader("Content-Type", "audio/wav");
       res.send(audioBuffer);
     });

     return router;
   }
   ```

Verify: `cd /opt/nexus && pnpm --filter @paperclipai/server test --run src/__tests__/36-voice-routes.test.ts`

Acceptance criteria:
- server/src/routes/voice.ts contains `export function voiceRoutes()`
- server/src/routes/voice.ts contains `router.post("/transcribe"`
- server/src/routes/voice.ts contains `router.post("/synthesize"`
- server/src/routes/voice.ts contains `import { voicePipelineService }` from the voice-pipeline service
- server/src/routes/voice.ts contains `import { MAX_ATTACHMENT_BYTES }` from attachment-types
- server/src/routes/voice.ts contains `assertBoard(req)` on both routes
- server/src/routes/voice.ts contains `res.setHeader("Content-Type", "audio/wav")`
- server/src/__tests__/36-voice-routes.test.ts passes (the verify command exits 0)

Done: Voice routes exist with POST /transcribe (audio upload -> VoicePipelineService.transcribe) and POST /synthesize (text body -> VoicePipelineService.synthesize -> WAV response). Both are authenticated via assertBoard.

Task 2: Wire voiceMode in the chat.ts stream, mount voice routes, remove the old transcribe endpoint

Files: server/src/routes/chat.ts, server/src/routes/chat-files.ts, server/src/app.ts

Read first: server/src/routes/chat.ts, server/src/routes/chat-files.ts, server/src/app.ts, server/src/routes/voice.ts

Action:

1. Modify `server/src/routes/chat.ts` to inject voiceMode into the stream endpoint:
   - At line 93, change `const { content, agentId } = req.body;` to:

     ```typescript
     const { content, agentId, voiceMode } = req.body as {
       content: string;
       agentId?: string;
       voiceMode?: "text" | "voice_input" | "full_voice";
     };
     ```

   - After the `messagesWithMemory` array is built (after line 140, where the user message is pushed) and before the SSE headers block (before line 142), add:

     ```typescript
     // Inject dual-output formatting prompt when voice mode is full_voice (VPIPE-06)
     if (voiceMode === "full_voice") {
       messagesWithMemory.push({
         role: "system",
         content: [
           "Format your response with EXACTLY these two labeled sections:",
           "",
           "SPOKEN: [Natural speech prose only. No markdown. No bullet points. No code blocks. Max 2-3 sentences for spoken delivery.]",
           "",
           "DETAILED: [Your full response with all detail, code blocks, and markdown formatting.]",
         ].join("\n"),
       });
     }
     ```

   - At lines 167-171, update the `svc.addMessage()` call to persist voiceMode in messageType:

     ```typescript
     const message = await svc.addMessage(req.params.id!, {
       role: "assistant",
       content: fullContent.trim(),
       agentId: agentId || undefined,
       // full_voice -> "voice_full", voice_input -> "voice_input", text or absent -> unset
       messageType: voiceMode === "full_voice" ? "voice_full" : voiceMode === "voice_input" ? "voice_input" : undefined,
     });
     ```

2. Modify `server/src/routes/chat-files.ts` to remove the old /transcribe endpoint:
   - Delete lines 297-386: the `audioUpload` multer instance, the `runAudioUpload` helper function, and the entire `router.post("/transcribe", ...)` handler
   - The `return router;` line at line 388 stays (it becomes the new end of the function)
   - Remove the `multer` import ONLY if no other route in the file uses multer (check: the file upload routes at the top of the file also use multer, so keep the import)
3. Modify `server/src/app.ts` to mount the voice routes:
   - Add the import at the top, alongside the other route imports:

     ```typescript
     import { voiceRoutes } from "./routes/voice.js";
     ```

   - Add the mount line after `api.use(nexusSettingsRoutes());` (after line 165):

     ```typescript
     api.use(voiceRoutes());
     ```

Verify: `cd /opt/nexus && pnpm --filter @paperclipai/server exec tsc --noEmit 2>&1 | head -30`

Acceptance criteria:
- server/src/routes/chat.ts contains `voiceMode` destructured from req.body
- server/src/routes/chat.ts contains `if (voiceMode === "full_voice")`
- server/src/routes/chat.ts contains `SPOKEN:` in the system prompt string
- server/src/routes/chat.ts contains `DETAILED:` in the system prompt string
- server/src/routes/chat.ts contains `messageType: voiceMode === "full_voice" ? "voice_full"`
- server/src/routes/chat-files.ts does NOT contain `router.post("/transcribe"` (old endpoint removed)
- server/src/app.ts contains `import { voiceRoutes }` from the voice routes
- server/src/app.ts contains `api.use(voiceRoutes())`
- TypeScript compilation passes with no errors (`tsc --noEmit` exits 0)

Done: voiceMode flows from the client request body through the stream endpoint: (1) the dual-output system prompt is injected when voiceMode is full_voice, and (2) voiceMode is persisted to the messageType column when the assistant message is saved. The old /transcribe endpoint is removed from chat-files.ts. Voice routes are mounted in app.ts. TypeScript compiles clean.

Verification:
- `pnpm --filter @paperclipai/server exec tsc --noEmit` exits 0
- `pnpm --filter @paperclipai/server test --run src/__tests__/36-voice-routes.test.ts` exits 0
- `grep -c "router.post(\"/transcribe\"" server/src/routes/chat-files.ts` prints 0 (old endpoint removed)
- `grep "voiceRoutes" server/src/app.ts` shows the mount is present
- `grep "voiceMode" server/src/routes/chat.ts` shows the flag wired through the stream endpoint

Success criteria: The voice pipeline is fully callable via HTTP (POST /api/transcribe, POST /api/synthesize). The voiceMode flag propagates from the client request through the stream endpoint to message persistence.
The dual-output prompt is injected for full_voice mode, the old transcribe endpoint is removed from chat-files.ts, all routes are mounted, and TypeScript compiles clean.

After completion, create `.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md`.
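This plan inlines two small mappings as ternaries: upload MIME type to transcription format in Task 1, and voiceMode to messageType in Task 2. Pulling them out as pure functions would make them easy to unit-test next to the route tests. The sketch below is a minimal example of that refactor and is optional. `audioFormatFromMime` and `messageTypeForVoiceMode` are hypothetical names that do not appear in the plan's files. The logic mirrors the ternaries exactly.

```typescript
// Hypothetical pure helpers mirroring the inline ternaries in voice.ts (Task 1)
// and chat.ts (Task 2); extracting them is optional and not required by this plan.
type AudioFormat = "webm" | "ogg" | "wav";
type VoiceMode = "text" | "voice_input" | "full_voice";

// Same precedence as voice.ts: "ogg" first, then "wav", otherwise fall back to webm
export function audioFormatFromMime(mimetype: string): AudioFormat {
  if (mimetype.includes("ogg")) return "ogg";
  if (mimetype.includes("wav")) return "wav";
  return "webm";
}

// Same mapping as the svc.addMessage() call: text or no mode leaves messageType unset
export function messageTypeForVoiceMode(mode?: VoiceMode): string | undefined {
  if (mode === "full_voice") return "voice_full";
  if (mode === "voice_input") return "voice_input";
  return undefined;
}
```

Substring matching accepts variants such as `audio/x-wav` and `audio/ogg; codecs=opus`. However, the webm fallback also means an `audio/mpeg` upload reaches the pipeline labelled as webm. Rejecting unrecognised types with a 400 would be a stricter alternative.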