nexus/.planning/phases/34-voice/34-01-PLAN.md
2026-04-04 03:55:49 +00:00

---
phase: 34-voice
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- server/src/app.ts
- server/src/services/nexus-settings.ts
- server/src/routes/nexus-settings.ts
- ui/src/api/hardware.ts
- ui/src/hooks/usePiperTts.ts
- ui/src/components/TtsButton.tsx
autonomous: true
requirements:
- VOICE-01
- VOICE-02
must_haves:
  truths:
    - "POST /api/transcribe is reachable and returns 503 with descriptive error when no Whisper CLI is installed"
    - "usePiperTts hook exposes prewarm/speak/stop/status/progress and transitions idle->downloading->ready->speaking"
    - "TtsButton renders a speaker icon that calls speak() and shows download progress during prewarm"
    - "voiceEnabled boolean is persisted in nexus-settings.json and exposed via GET/PATCH /nexus/settings"
  artifacts:
    - path: "ui/src/hooks/usePiperTts.ts"
      provides: "Piper TTS hook with prewarm, speak, stop, status, progress"
      exports: ["usePiperTts"]
    - path: "ui/src/components/TtsButton.tsx"
      provides: "Speaker button component for TTS playback"
      exports: ["TtsButton"]
  key_links:
    - from: "server/src/app.ts"
      to: "server/src/routes/chat-files.ts"
      via: "api.use(chatFileRoutes(db, opts.storageService))"
      pattern: "chatFileRoutes"
    - from: "ui/src/hooks/usePiperTts.ts"
      to: "@mintplex-labs/piper-tts-web"
      via: "import * as tts"
      pattern: "tts\\.download|tts\\.predict"
---
<objective>
Fix the broken /transcribe route registration, create the Piper TTS browser hook and button component, and add voiceEnabled to nexus-settings persistence.
Purpose: VOICE-01 requires TTS on CPU-only hardware (browser WASM satisfies this). VOICE-02 requires visible download progress before first synthesis. The /transcribe route exists but is never mounted; the fix is one import plus one `api.use(...)` call. voiceEnabled persistence is needed so the onboarding voice opt-in survives sessions.
Output: Working /api/transcribe endpoint, usePiperTts hook, TtsButton component, voiceEnabled in nexus-settings.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/34-voice/34-RESEARCH.md
@server/src/app.ts
@server/src/routes/chat-files.ts
@server/src/services/nexus-settings.ts
@ui/src/api/hardware.ts
@ui/src/components/VoiceRecordButton.tsx
<interfaces>
<!-- Existing interfaces the executor needs -->
From server/src/routes/chat-files.ts:
```typescript
export function chatFileRoutes(db: Db, storage: StorageService) { ... }
// POST /transcribe — accepts multipart audio, returns { text: string } or 503
```
From server/src/app.ts (line 147 pattern):
```typescript
api.use(assetRoutes(db, opts.storageService));
// chatFileRoutes uses the same (db, opts.storageService) signature
```
From server/src/services/nexus-settings.ts:
```typescript
export const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
export type NexusMode = (typeof NEXUS_MODES)[number];
const nexusSettingsSchema = z.object({
  mode: z.enum(NEXUS_MODES).default("both"),
});
export function nexusSettingsService() { get(), set(patch) }
```
From ui/src/api/hardware.ts:
```typescript
export type NexusMode = "personal_ai" | "project_builder" | "both";
export interface NexusSettings { mode: NexusMode; }
export function fetchNexusSettings(): Promise<NexusSettings>;
export function updateNexusSettings(settings: Partial<NexusSettings>): Promise<NexusSettings>;
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Register chatFileRoutes in app.ts and add voiceEnabled to nexus-settings</name>
<files>server/src/app.ts, server/src/services/nexus-settings.ts, server/src/routes/nexus-settings.ts, ui/src/api/hardware.ts</files>
<read_first>
- server/src/app.ts (full file — find insertion point after assistantHandoffRoutes)
- server/src/services/nexus-settings.ts (full file — understand schema)
- server/src/routes/nexus-settings.ts (full file — understand PATCH handler)
- ui/src/api/hardware.ts (full file — understand client types)
</read_first>
<action>
**1. Register chatFileRoutes in app.ts:**
- Add import at top with other route imports: `import { chatFileRoutes } from "./routes/chat-files.js";`
- Add `api.use(chatFileRoutes(db, opts.storageService));` after the `api.use(assistantHandoffRoutes(db));` line (around line 161). Mirror the `assetRoutes(db, opts.storageService)` pattern exactly.
- Do NOT place it before boardMutationGuard — the /transcribe route calls assertBoard(req) and needs to be inside the guarded api sub-router.
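The mounted-versus-missing distinction can be read from the response status alone. The sketch below is illustrative only: `classifyTranscribeStatus` and `RouteState` are hypothetical names, and treating every status other than 404 and 503 as "mounted" is an assumption (an auth failure upstream of the route would also land there).

```typescript
// Hypothetical helper: interprets the HTTP status of POST /api/transcribe.
type RouteState = "not_mounted" | "mounted_no_whisper" | "mounted";

function classifyTranscribeStatus(status: number): RouteState {
  // 404: chatFileRoutes was never registered on the api router.
  if (status === 404) return "not_mounted";
  // 503: the handler ran but found no Whisper CLI on the host.
  if (status === 503) return "mounted_no_whisper";
  // Assumption: any other status means some handler answered the request.
  return "mounted";
}

const beforeFix = classifyTranscribeStatus(404);
const afterFix = classifyTranscribeStatus(503);
```

After this task, a host without Whisper should land in the `mounted_no_whisper` case rather than `not_mounted`.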
**2. Add voiceEnabled to nexusSettingsSchema (server/src/services/nexus-settings.ts):**
- Add `voiceEnabled: z.boolean().default(false)` to the nexusSettingsSchema z.object.
- This is a file-backed JSON field, NOT a DB migration — acceptable under the "no DB schema changes" constraint.
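To illustrate why the default matters, here is a minimal plain-TypeScript sketch of the same defaulting behavior, written without Zod so it runs on its own. `applyDefaults`, `StoredSettings`, and `DEFAULTS` are hypothetical names, not part of the codebase; the real service relies on `z.boolean().default(false)`.

```typescript
// Hypothetical stand-in for the Zod schema: fills in fields that an older
// nexus-settings.json (written before voiceEnabled existed) does not contain.
type NexusMode = "personal_ai" | "project_builder" | "both";

interface StoredSettings {
  mode: NexusMode;
  voiceEnabled: boolean;
}

const DEFAULTS: StoredSettings = { mode: "both", voiceEnabled: false };

// Use the value read from disk when present, otherwise fall back to the default.
function applyDefaults(raw: Partial<StoredSettings>): StoredSettings {
  return {
    mode: raw.mode ?? DEFAULTS.mode,
    voiceEnabled: raw.voiceEnabled ?? DEFAULTS.voiceEnabled,
  };
}

// A file saved before this change has no voiceEnabled key.
const legacy = applyDefaults({ mode: "personal_ai" });
// A file saved after the user opted in during onboarding.
const optedIn = applyDefaults({ mode: "both", voiceEnabled: true });
```

Under this sketch, existing installations read back `voiceEnabled: false` without any migration step, which is the property the schema default is meant to provide.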
**3. Update NexusSettings type on client (ui/src/api/hardware.ts):**
- Add `voiceEnabled?: boolean` to the `NexusSettings` interface.
- No changes to API functions needed — they already handle Partial<NexusSettings>.
**4. Check nexus-settings route handler (server/src/routes/nexus-settings.ts):**
- Read the file. The PATCH handler should already forward arbitrary fields to `nexusSettingsService().set(patch)` since it uses the Zod schema. If it manually picks fields, add voiceEnabled to the pick list. If it passes req.body through, no change needed.
</action>
<verify>
<automated>cd /opt/nexus && npx vitest run server/src/__tests__/chat-file-routes.test.ts 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- grep -q "chatFileRoutes" server/src/app.ts returns 0
- grep -q "voiceEnabled" server/src/services/nexus-settings.ts returns 0
- grep -q "voiceEnabled" ui/src/api/hardware.ts returns 0
</acceptance_criteria>
<done>POST /api/transcribe is reachable (returns 503 when no Whisper CLI installed, not 404). voiceEnabled persists in nexus-settings.json via the existing settings route.</done>
</task>
<task type="auto">
<name>Task 2: Create usePiperTts hook and TtsButton component</name>
<files>ui/src/hooks/usePiperTts.ts, ui/src/components/TtsButton.tsx</files>
<read_first>
- ui/src/components/VoiceRecordButton.tsx (reference for button style patterns)
- ui/src/components/ui/button.tsx (Button component API)
</read_first>
<action>
**0. Install piper-tts-web:**
```bash
pnpm --filter @paperclipai/ui add @mintplex-labs/piper-tts-web
```
**1. Create ui/src/hooks/usePiperTts.ts:**
```typescript
import { useState, useCallback, useRef } from "react";
import * as tts from "@mintplex-labs/piper-tts-web";

const DEFAULT_VOICE = "en_US-hfc_female-medium";

export type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

export function usePiperTts() {
  const [status, setStatus] = useState<TtsStatus>("idle");
  const [progress, setProgress] = useState(0);
  const audioRef = useRef<HTMLAudioElement | null>(null);
  const urlRef = useRef<string | null>(null);

  // Pause playback and free the object URL backing the current clip.
  const releaseAudio = useCallback(() => {
    audioRef.current?.pause();
    audioRef.current = null;
    if (urlRef.current) {
      URL.revokeObjectURL(urlRef.current);
      urlRef.current = null;
    }
  }, []);

  const prewarm = useCallback(async () => {
    if (status === "ready" || status === "downloading") return;
    setStatus("downloading");
    setProgress(0);
    try {
      // Skip the download when the voice model is already cached locally.
      const stored = await tts.stored();
      if (!stored.includes(DEFAULT_VOICE)) {
        await tts.download(DEFAULT_VOICE, (p: { loaded: number; total: number }) => {
          // Guard against total === 0 so the percentage never becomes NaN.
          if (p.total > 0) setProgress(Math.round((p.loaded / p.total) * 100));
        });
      }
      setStatus("ready");
      setProgress(100);
    } catch {
      setStatus("error");
    }
  }, [status]);

  const speak = useCallback(async (text: string) => {
    if (status !== "ready") return;
    releaseAudio(); // Stop any currently playing audio
    setStatus("speaking");
    try {
      // predict() resolves to a WAV Blob; Audio needs a URL, not the Blob itself.
      const wav = await tts.predict({ text, voiceId: DEFAULT_VOICE });
      const url = URL.createObjectURL(wav);
      urlRef.current = url;
      const audio = new Audio(url);
      audioRef.current = audio;
      const finish = () => {
        releaseAudio();
        setStatus("ready");
      };
      audio.onended = finish;
      audio.onerror = finish;
      await audio.play();
    } catch {
      releaseAudio();
      setStatus("ready");
    }
  }, [status, releaseAudio]);

  const stop = useCallback(() => {
    releaseAudio();
    if (status === "speaking") setStatus("ready");
  }, [status, releaseAudio]);

  return { status, progress, prewarm, speak, stop };
}
```
Key points:
- `tts.stored()` lists the voice models already cached in the browser, so the download is skipped when the model is present (VOICE-02).
- `tts.download()` with a progress callback provides visible download progress (VOICE-02).
- `tts.predict()` resolves to a WAV Blob; wrap it with `URL.createObjectURL()`, play it with `new Audio(url).play()`, and revoke the URL when playback ends (VOICE-01, CPU-safe WASM).
- `stop()` allows interrupting playback.
- Do NOT import this in any server-side or test file running in Node; the hook is browser-only.
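The status transitions the hook is expected to follow can be sketched as a pure function, which is easier to reason about (and to test in Node) than the hook itself. This is a sketch under stated assumptions: the event names and `nextStatus` are hypothetical and do not exist in the hook; only the `TtsStatus` values come from the plan.

```typescript
// Status values mirror the TtsStatus union used by the hook.
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

// Hypothetical event names, introduced only for this sketch.
type TtsEvent =
  | "prewarm"        // prewarm() called
  | "download_ok"    // model was cached or the download finished
  | "download_fail"  // the cache check or the download threw
  | "speak"          // speak() called
  | "playback_done"; // audio ended or errored, or stop() was called

function nextStatus(status: TtsStatus, event: TtsEvent): TtsStatus {
  switch (event) {
    case "prewarm":
      // prewarm() is a no-op while downloading or once ready.
      return status === "ready" || status === "downloading" ? status : "downloading";
    case "download_ok":
      return status === "downloading" ? "ready" : status;
    case "download_fail":
      return status === "downloading" ? "error" : status;
    case "speak":
      // speak() returns early unless the model is ready.
      return status === "ready" ? "speaking" : status;
    case "playback_done":
      return status === "speaking" ? "ready" : status;
  }
}

// Walk the happy path named in the must_haves truths.
const events: TtsEvent[] = ["prewarm", "download_ok", "speak", "playback_done"];
let current: TtsStatus = "idle";
const trace: TtsStatus[] = [current];
for (const event of events) {
  current = nextStatus(current, event);
  trace.push(current);
}
```

Note that `speak` from `idle` leaves the status unchanged, which matches the hook's early return when the model has not been prewarmed.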
**2. Create ui/src/components/TtsButton.tsx:**
```typescript
import { Volume2, VolumeX, Loader2 } from "lucide-react";
import { Button } from "./ui/button";
import type { TtsStatus } from "../hooks/usePiperTts";

interface TtsButtonProps {
  status: TtsStatus;
  progress: number;
  onSpeak: () => void;
  onStop: () => void;
  onPrewarm: () => void;
  disabled?: boolean;
}

export function TtsButton({ status, progress, onSpeak, onStop, onPrewarm, disabled }: TtsButtonProps) {
  if (status === "downloading") {
    return (
      <Button variant="ghost" size="icon" className="h-8 w-8 relative" disabled title={`Downloading voice model: ${progress}%`}>
        <Loader2 className="h-4 w-4 animate-spin" />
        <span className="absolute -bottom-1 text-[10px] text-muted-foreground">{progress}%</span>
      </Button>
    );
  }
  if (status === "speaking") {
    return (
      <Button
        variant="ghost"
        size="icon"
        className="h-8 w-8 text-primary"
        onClick={onStop}
        aria-label="Stop speaking"
        title="Stop speaking"
      >
        <VolumeX className="h-4 w-4" />
      </Button>
    );
  }
  // idle: clicking only starts prewarm (onPrewarm); this component does not chain into onSpeak
  // ready: clicking triggers speak directly
  // error: the button is disabled, so no click is handled
  const handleClick = () => {
    if (status === "ready") {
      onSpeak();
    } else {
      onPrewarm();
    }
  };
  return (
    <Button
      variant="ghost"
      size="icon"
      className="h-8 w-8"
      onClick={handleClick}
      disabled={disabled || status === "error"}
      aria-label="Read aloud"
      title={status === "error" ? "TTS unavailable" : status === "idle" ? "Download voice model and read aloud" : "Read aloud"}
    >
      <Volume2 className="h-4 w-4" />
    </Button>
  );
}
```
The TtsButton receives status/progress from the hook and delegates actions. It does NOT import piper-tts-web directly — all TTS logic stays in the hook. The button is reusable: PersonalAssistant (Plan 02) will place it next to assistant messages.
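The button's click behavior per status can be summarized as a small pure mapping. This is a sketch for reasoning about the component, not code to add to it: `clickAction` and `ClickAction` are hypothetical names.

```typescript
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

// Hypothetical action names for this sketch; they correspond to the
// onPrewarm, onSpeak, and onStop props, or to no handler firing at all.
type ClickAction = "prewarm" | "speak" | "stop" | "none";

function clickAction(status: TtsStatus, disabled = false): ClickAction {
  // The downloading button is always rendered disabled.
  if (status === "downloading") return "none";
  // The stop button ignores the disabled prop in the component.
  if (status === "speaking") return "stop";
  // The default button is disabled by the prop or by the error status.
  if (disabled || status === "error") return "none";
  return status === "ready" ? "speak" : "prewarm";
}

const firstClick = clickAction("idle");
const secondClick = clickAction("ready");
```

One consequence worth noting for Plan 02: from `idle`, the first click only starts the model download, and playback needs a further click once the status reaches `ready`, unless the parent chooses to chain the two.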
</action>
<verify>
<automated>cd /opt/nexus && grep -q "usePiperTts" ui/src/hooks/usePiperTts.ts && grep -q "TtsButton" ui/src/components/TtsButton.tsx && { grep -q "piper-tts-web" ui/package.json 2>/dev/null || grep -q "piper-tts-web" pnpm-lock.yaml; } && echo "PASS" || echo "FAIL"</automated>
</verify>
<acceptance_criteria>
- grep -q "tts.download" ui/src/hooks/usePiperTts.ts returns 0
- grep -q "tts.predict" ui/src/hooks/usePiperTts.ts returns 0
- grep -q "tts.stored" ui/src/hooks/usePiperTts.ts returns 0
- grep -q "TtsButton" ui/src/components/TtsButton.tsx returns 0
- grep -q "piper-tts-web" pnpm-lock.yaml returns 0
- grep -q "Volume2" ui/src/components/TtsButton.tsx returns 0
</acceptance_criteria>
<done>usePiperTts hook handles download progress (VOICE-02) and CPU-safe WASM synthesis (VOICE-01). TtsButton shows download progress during prewarm and speaker icon for playback. piper-tts-web is installed as a UI dependency.</done>
</task>
</tasks>
<verification>
- `grep -q "chatFileRoutes" server/src/app.ts` — route is registered
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts` — settings schema extended
- `ls ui/src/hooks/usePiperTts.ts ui/src/components/TtsButton.tsx` — both files exist
- `npx vitest run server/src/__tests__/chat-file-routes.test.ts` — existing route tests pass
</verification>
<success_criteria>
1. POST /api/transcribe returns 503 (not 404) when no Whisper CLI is installed — route is mounted
2. usePiperTts hook exports prewarm(), speak(), stop(), status, progress
3. TtsButton renders download progress during prewarm and speaker icon for playback
4. voiceEnabled persists in nexus-settings.json
</success_criteria>
<output>
After completion, create `.planning/phases/34-voice/34-01-SUMMARY.md`
</output>