---
phase: 36-voice-pipeline-foundation
plan: 03
type: execute
wave: 2
depends_on: ["36-01", "36-02"]
files_modified:
  - server/src/routes/voice.ts
  - server/src/routes/chat-files.ts
  - server/src/routes/chat.ts
  - server/src/app.ts
  - server/src/__tests__/36-voice-routes.test.ts
autonomous: true
requirements:
  - VPIPE-03
  - VPIPE-06

must_haves:
  truths:
    - "POST /api/transcribe accepts audio file upload and returns { text, language? }"
    - "POST /api/synthesize accepts { text } body and returns audio/wav buffer"
    - "voiceMode from request body is injected as dual-output system prompt in stream endpoint"
    - "voiceMode is persisted to messageType column when message is saved"
    - "Old /transcribe endpoint is removed from chat-files.ts"
    - "Voice routes are mounted in app.ts"
  artifacts:
    - path: "server/src/routes/voice.ts"
      provides: "POST /api/transcribe and POST /api/synthesize endpoints"
      exports: ["voiceRoutes"]
    - path: "server/src/__tests__/36-voice-routes.test.ts"
      provides: "Integration tests for voice routes and voiceMode wiring"
      min_lines: 60
  key_links:
    - from: "server/src/routes/voice.ts"
      to: "server/src/services/voice-pipeline.ts"
      via: "voicePipelineService() import"
      pattern: "voicePipelineService"
    - from: "server/src/routes/chat.ts"
      to: "packages/shared/src/validators/chat.ts"
      via: "createMessageSchema preserves voiceMode on parse"
      pattern: "voiceMode"
    - from: "server/src/app.ts"
      to: "server/src/routes/voice.ts"
      via: "api.use(voiceRoutes())"
      pattern: "voiceRoutes"
---

<objective>
Create voice HTTP routes (transcribe + synthesize), wire voiceMode through the chat stream endpoint with dual-output prompt injection, mount in app.ts, and remove the old transcribe endpoint from chat-files.ts.

Purpose: VPIPE-03 requires the voice pipeline to be callable from any transport via HTTP. VPIPE-06 requires dual output (spoken prose + full markdown) triggered by voiceMode=full_voice in the stream endpoint.

Output: `server/src/routes/voice.ts` with two endpoints, updated `chat.ts` with voiceMode wiring, cleaned `chat-files.ts`, updated `app.ts` mount.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/36-voice-pipeline-foundation/36-RESEARCH.md
@.planning/phases/36-voice-pipeline-foundation/36-01-SUMMARY.md
@.planning/phases/36-voice-pipeline-foundation/36-02-SUMMARY.md

<interfaces>
<!-- From Plan 01 output -->
From server/src/services/voice-pipeline.ts:
```typescript
export function voicePipelineService(): {
  transcribe(buffer: Buffer, format: "webm" | "ogg" | "wav"): Promise<{ text: string; language?: string }>;
  synthesize(text: string, voiceId?: string): Promise<Buffer>;
  formatForVoice(text: string): string;
};
```

<!-- From Plan 02 output -->
From packages/shared/src/validators/chat.ts:
```typescript
export const createMessageSchema = z.object({
  role: z.enum(["user", "assistant", "system"]),
  content: z.string().min(1).max(100_000),
  agentId: z.string().uuid().optional(),
  messageType: z.string().optional(),
  voiceMode: z.enum(["text", "voice_input", "full_voice"]).optional(),
});
```

<!-- Existing code being modified -->
From server/src/routes/authz.ts:
```typescript
export function assertBoard(req: Request) {
  if (req.actor.type !== "board") throw forbidden("Board access required");
}
```

From server/src/attachment-types.ts:
```typescript
export const MAX_ATTACHMENT_BYTES = ...
```

From server/src/app.ts (mount pattern, line ~164):
```typescript
api.use(chatFileRoutes(db, opts.storageService));
api.use(nexusSettingsRoutes());
```

From server/src/routes/chat.ts (stream endpoint, lines 91-194):
```typescript
router.post("/conversations/:id/stream", async (req, res) => {
  assertBoard(req);
  const { content, agentId } = req.body;
  // ... builds messagesWithMemory array ...
  // ... streams tokens ...
  const message = await svc.addMessage(req.params.id!, {
    role: "assistant",
    content: fullContent.trim(),
    agentId: agentId || undefined,
  });
});
```

From server/src/routes/chat-files.ts (lines 297-386 to remove):
```typescript
// POST /transcribe: the old endpoint with inline audioUpload multer, runAudioUpload helper,
// and whisper-cpp/openai-whisper cascade. This entire block (lines 297-386) is replaced by voice.ts.
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Create voice.ts routes and tests</name>
<files>
server/src/routes/voice.ts
server/src/__tests__/36-voice-routes.test.ts
</files>
<read_first>
server/src/routes/chat-files.ts
server/src/routes/authz.ts
server/src/attachment-types.ts
server/src/services/voice-pipeline.ts
</read_first>
<behavior>
- POST /transcribe with valid audio file returns 200 with { text: "...", language: "..." }
- POST /transcribe without audio field returns 400 with { error: "Missing audio field" }
- POST /synthesize with { text: "Hello" } returns 200 with Content-Type audio/wav
- POST /synthesize without text returns 400 with { error: "text is required" }
- Both endpoints call assertBoard(req) for auth
</behavior>
<action>
1. Create `server/src/__tests__/36-voice-routes.test.ts` (RED):
   - Mock `../services/voice-pipeline.js` to return a mock service object
   - Mock `../routes/authz.js` so that assertBoard is a no-op (the path is relative to the test file in `__tests__/`)
   - Test POST /transcribe with a Buffer attached as the multipart `audio` field returns { text, language }
   - Test POST /transcribe without file returns 400
   - Test POST /synthesize with { text: "Hello" } returns audio/wav content-type
   - Test POST /synthesize without text returns 400

2. Create `server/src/routes/voice.ts` (GREEN):
   ```typescript
   import { Router } from "express";
   import multer from "multer";
   import { assertBoard } from "./authz.js";
   import { voicePipelineService } from "../services/voice-pipeline.js";
   import { MAX_ATTACHMENT_BYTES } from "../attachment-types.js";

   export function voiceRoutes(): Router {
     const router = Router();
     const svc = voicePipelineService();
     // Keep uploads in memory; the service consumes the raw Buffer directly.
     const audioUpload = multer({
       storage: multer.memoryStorage(),
       limits: { fileSize: MAX_ATTACHMENT_BYTES, files: 1 },
     });

     // POST /api/transcribe: transcribe uploaded audio via VoicePipelineService
     router.post("/transcribe", async (req, res) => {
       assertBoard(req);
       // Run multer inline so upload errors reject this handler's promise.
       await new Promise<void>((resolve, reject) =>
         audioUpload.single("audio")(req, res, (err) => (err ? reject(err) : resolve()))
       );
       const file = (req as any).file as { buffer: Buffer; mimetype: string } | undefined;
       if (!file) {
         res.status(400).json({ error: "Missing audio field" });
         return;
       }
       // Derive the container format from the MIME type; default to webm.
       const fmt = file.mimetype.includes("ogg") ? "ogg"
         : file.mimetype.includes("wav") ? "wav"
         : "webm";
       const result = await svc.transcribe(file.buffer, fmt);
       res.json(result);
     });

     // POST /api/synthesize: synthesize text to speech via VoicePipelineService
     router.post("/synthesize", async (req, res) => {
       assertBoard(req);
       const { text, voiceId } = req.body as { text?: string; voiceId?: string };
       if (!text || typeof text !== "string") {
         res.status(400).json({ error: "text is required" });
         return;
       }
       const audioBuffer = await svc.synthesize(text, voiceId);
       res.setHeader("Content-Type", "audio/wav");
       res.send(audioBuffer);
     });

     return router;
   }
   ```
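The two decisions the handlers above make inline (picking an audio format from the MIME type, and rejecting a synthesize body without usable text) can be sketched as pure helpers, which makes them easy to unit-test without multer or Express. This is only an illustrative sketch: `audioFormatFromMime` and `parseSynthesizeBody` are hypothetical names, not part of the plan or the codebase, and the logic simply mirrors the checks in the route code.

```typescript
// Hypothetical helpers mirroring the inline checks in voice.ts (names are assumptions).
type AudioFormat = "webm" | "ogg" | "wav";

// Same substring cascade as the transcribe handler: ogg, then wav, else webm.
function audioFormatFromMime(mimetype: string): AudioFormat {
  if (mimetype.includes("ogg")) return "ogg";
  if (mimetype.includes("wav")) return "wav";
  return "webm";
}

type SynthesizeBody = { text: string; voiceId?: string };

// Returns null where the synthesize handler would answer 400 "text is required".
function parseSynthesizeBody(body: unknown): SynthesizeBody | null {
  if (typeof body !== "object" || body === null) return null;
  const { text, voiceId } = body as { text?: unknown; voiceId?: unknown };
  if (typeof text !== "string" || text.length === 0) return null;
  return typeof voiceId === "string" ? { text, voiceId } : { text };
}
```

Note that any MIME type that mentions neither "ogg" nor "wav" falls through to "webm", including unrelated types such as `application/octet-stream`; whether that default is acceptable depends on what the transcription service tolerates.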
</action>
<verify>
<automated>cd /opt/nexus && pnpm --filter @paperclipai/server test --run src/__tests__/36-voice-routes.test.ts</automated>
</verify>
<acceptance_criteria>
- server/src/routes/voice.ts contains `export function voiceRoutes()`
- server/src/routes/voice.ts contains `router.post("/transcribe"`
- server/src/routes/voice.ts contains `router.post("/synthesize"`
- server/src/routes/voice.ts contains `import { voicePipelineService }` from voice-pipeline service
- server/src/routes/voice.ts contains `import { MAX_ATTACHMENT_BYTES }` from attachment-types
- server/src/routes/voice.ts contains `assertBoard(req)` on both routes
- server/src/routes/voice.ts contains `res.setHeader("Content-Type", "audio/wav")`
- server/src/__tests__/36-voice-routes.test.ts exits 0
</acceptance_criteria>
<done>Voice routes exist with POST /transcribe (audio upload -> VoicePipelineService.transcribe) and POST /synthesize (text body -> VoicePipelineService.synthesize -> WAV response). Both authenticated via assertBoard.</done>
</task>

<task type="auto">
<name>Task 2: Wire voiceMode in chat.ts stream, mount voice routes, remove old transcribe</name>
<files>
server/src/routes/chat.ts
server/src/routes/chat-files.ts
server/src/app.ts
</files>
<read_first>
server/src/routes/chat.ts
server/src/routes/chat-files.ts
server/src/app.ts
server/src/routes/voice.ts
</read_first>
<action>
1. Modify `server/src/routes/chat.ts` to inject voiceMode into the stream endpoint:
   - At line 93, change `const { content, agentId } = req.body;` to:
     ```typescript
     const { content, agentId, voiceMode } = req.body as {
       content: string; agentId?: string; voiceMode?: "text" | "voice_input" | "full_voice";
     };
     ```
   - After building the `messagesWithMemory` array (after line 140, where the user message is pushed) and before the SSE headers block (before line 142), add:
     ```typescript
     // Inject dual-output formatting prompt when voice mode is full_voice (VPIPE-06)
     if (voiceMode === "full_voice") {
       messagesWithMemory.push({
         role: "system",
         content: [
           "Format your response with EXACTLY these two labeled sections:",
           "",
           "SPOKEN: [Natural speech prose only. No markdown. No bullet points. No code blocks. Max 2-3 sentences for spoken delivery.]",
           "",
           "DETAILED: [Your full response with all detail, code blocks, and markdown formatting.]",
         ].join("\n"),
       });
     }
     ```
   - At lines 167-171, update the `svc.addMessage()` call to record voiceMode in messageType:
     ```typescript
     const message = await svc.addMessage(req.params.id!, {
       role: "assistant",
       content: fullContent.trim(),
       agentId: agentId || undefined,
       messageType: voiceMode === "full_voice" ? "voice_full"
         : voiceMode === "voice_input" ? "voice_input"
         : undefined,
     });
     ```

2. Modify `server/src/routes/chat-files.ts` to remove the old /transcribe endpoint:
   - Delete lines 297-386: the `audioUpload` multer instance, the `runAudioUpload` helper function, and the entire `router.post("/transcribe", ...)` handler
   - Keep the `return router;` line at line 388; it becomes the new end of the function
   - Keep the `multer` import: the file upload routes at the top of the file still use it. Remove it only if no other route in the file references multer

3. Modify `server/src/app.ts` to mount the voice routes:
   - Add the import at the top with the other route imports:
     ```typescript
     import { voiceRoutes } from "./routes/voice.js";
     ```
   - Add the mount line after `api.use(nexusSettingsRoutes());` (after line 165):
     ```typescript
     api.use(voiceRoutes());
     ```
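Two details of the chat.ts wiring above are easy to get wrong: the stored messageType for `full_voice` is spelled `voice_full` (the words are swapped), and the dual-output prompt only pays off if something later separates the two labeled sections. The sketch below illustrates both under stated assumptions: `messageTypeForVoiceMode` and `splitDualOutput` are hypothetical helper names, and splitting the response is not part of this plan; it only shows one way a consumer might read the `SPOKEN:` / `DETAILED:` format the prompt asks for.

```typescript
// Hypothetical helpers; names and the splitting step are assumptions, not plan output.
type VoiceMode = "text" | "voice_input" | "full_voice";

// Mirrors the ternary in the addMessage call: note "full_voice" is stored as "voice_full".
function messageTypeForVoiceMode(voiceMode?: VoiceMode): string | undefined {
  switch (voiceMode) {
    case "full_voice":
      return "voice_full";
    case "voice_input":
      return "voice_input";
    default:
      return undefined; // "text" or absent: leave messageType unset
  }
}

interface DualOutput {
  spoken: string;
  detailed: string;
}

// Splits a response that follows the SPOKEN:/DETAILED: layout requested by the prompt.
// Returns null when the model did not follow the layout, so callers can fall back.
function splitDualOutput(content: string): DualOutput | null {
  const match = /SPOKEN:\s*([\s\S]*?)\s*DETAILED:\s*([\s\S]*)/.exec(content);
  if (!match) return null;
  const [, spoken = "", detailed = ""] = match;
  return { spoken: spoken.trim(), detailed: detailed.trim() };
}
```

Because a model may ignore the requested layout, treating a `null` result as "use the whole response as the detailed text" is a reasonable fallback to consider.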
</action>
<verify>
<automated>cd /opt/nexus && pnpm --filter @paperclipai/server exec tsc --noEmit 2>&1 | head -30</automated>
</verify>
<acceptance_criteria>
- server/src/routes/chat.ts contains `voiceMode` destructured from req.body
- server/src/routes/chat.ts contains `if (voiceMode === "full_voice")`
- server/src/routes/chat.ts contains `SPOKEN:` in the system prompt string
- server/src/routes/chat.ts contains `DETAILED:` in the system prompt string
- server/src/routes/chat.ts contains `messageType: voiceMode === "full_voice" ? "voice_full"`
- server/src/routes/chat-files.ts does NOT contain `router.post("/transcribe"` (old endpoint removed)
- server/src/app.ts contains `import { voiceRoutes }` from voice routes
- server/src/app.ts contains `api.use(voiceRoutes())`
- TypeScript compilation passes with no errors (`tsc --noEmit` exits 0)
</acceptance_criteria>
<done>
voiceMode flows from client request body through the stream endpoint: (1) dual-output system prompt injected when full_voice, (2) voiceMode persisted to messageType column on assistant message save. Old /transcribe endpoint removed from chat-files.ts. Voice routes mounted in app.ts. TypeScript compiles clean.
</done>
</task>

</tasks>

<verification>
- `pnpm --filter @paperclipai/server exec tsc --noEmit` exits 0
- `pnpm --filter @paperclipai/server test --run src/__tests__/36-voice-routes.test.ts` exits 0
- `grep -c "router.post(\"/transcribe\"" server/src/routes/chat-files.ts` prints 0 (old endpoint removed)
- `grep "voiceRoutes" server/src/app.ts` shows the import and the mount
- `grep "voiceMode" server/src/routes/chat.ts` shows the flag wired through the stream endpoint
</verification>

<success_criteria>
The voice pipeline is fully callable via HTTP (POST /api/transcribe, POST /api/synthesize). The voiceMode flag propagates from the client request through the stream endpoint to message persistence. The dual-output prompt is injected for full_voice mode. The old transcribe endpoint is removed from chat-files.ts. All routes are mounted and TypeScript compiles clean.
</success_criteria>

<output>
After completion, create `.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md`
</output>