---
phase: 34-voice
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - server/src/app.ts
  - server/src/services/nexus-settings.ts
  - server/src/routes/nexus-settings.ts
  - ui/src/api/hardware.ts
  - ui/src/hooks/usePiperTts.ts
  - ui/src/components/TtsButton.tsx
autonomous: true
requirements:
  - VOICE-01
  - VOICE-02

must_haves:
  truths:
    - "POST /api/transcribe is reachable and returns 503 with a descriptive error when no Whisper CLI is installed"
    - "usePiperTts hook exposes prewarm/speak/stop/status/progress and transitions idle->downloading->ready->speaking"
    - "TtsButton renders a speaker icon that calls speak() and shows download progress during prewarm"
    - "voiceEnabled boolean is persisted in nexus-settings.json and exposed via GET/PATCH /nexus/settings"
  artifacts:
    - path: "ui/src/hooks/usePiperTts.ts"
      provides: "Piper TTS hook with prewarm, speak, stop, status, progress"
      exports: ["usePiperTts"]
    - path: "ui/src/components/TtsButton.tsx"
      provides: "Speaker button component for TTS playback"
      exports: ["TtsButton"]
  key_links:
    - from: "server/src/app.ts"
      to: "server/src/routes/chat-files.ts"
      via: "api.use(chatFileRoutes(db, opts.storageService))"
      pattern: "chatFileRoutes"
    - from: "ui/src/hooks/usePiperTts.ts"
      to: "@mintplex-labs/piper-tts-web"
      via: "import * as tts"
      pattern: "tts\\.download|tts\\.predict"
---

<objective>
Fix the broken /transcribe route registration, create the Piper TTS browser hook and button component, and add voiceEnabled to nexus-settings persistence.

Purpose: VOICE-01 requires TTS on CPU-only hardware, which in-browser WASM synthesis satisfies. VOICE-02 requires visible download progress before the first synthesis. The /transcribe route already exists in chat-files.ts but is never mounted. Fixing it takes two lines: an import and an `api.use` call. voiceEnabled must be persisted so that the onboarding voice opt-in survives across sessions.

Output: a working /api/transcribe endpoint, the usePiperTts hook, the TtsButton component, and voiceEnabled in nexus-settings.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/34-voice/34-RESEARCH.md

@server/src/app.ts
@server/src/routes/chat-files.ts
@server/src/services/nexus-settings.ts
@ui/src/api/hardware.ts
@ui/src/components/VoiceRecordButton.tsx

<interfaces>
<!-- Existing interfaces the executor needs -->

From server/src/routes/chat-files.ts:
```typescript
export function chatFileRoutes(db: Db, storage: StorageService) { ... }
// POST /transcribe: accepts multipart audio, returns { text: string } or 503
```

From server/src/app.ts (line 147 pattern):
```typescript
api.use(assetRoutes(db, opts.storageService));
// chatFileRoutes uses the same (db, opts.storageService) signature
```

From server/src/services/nexus-settings.ts:
```typescript
export const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
export type NexusMode = (typeof NEXUS_MODES)[number];
const nexusSettingsSchema = z.object({
  mode: z.enum(NEXUS_MODES).default("both"),
});
export function nexusSettingsService() { get(), set(patch) }
```

From ui/src/api/hardware.ts:
```typescript
export type NexusMode = "personal_ai" | "project_builder" | "both";
export interface NexusSettings { mode: NexusMode; }
export function fetchNexusSettings(): Promise<NexusSettings>;
export function updateNexusSettings(settings: Partial<NexusSettings>): Promise<NexusSettings>;
```
</interfaces>
</context>

<tasks>

<task type="auto">
<name>Task 1: Register chatFileRoutes in app.ts and add voiceEnabled to nexus-settings</name>
<files>server/src/app.ts, server/src/services/nexus-settings.ts, server/src/routes/nexus-settings.ts, ui/src/api/hardware.ts</files>
<read_first>
- server/src/app.ts (full file; find the insertion point after assistantHandoffRoutes)
- server/src/services/nexus-settings.ts (full file; understand the schema)
- server/src/routes/nexus-settings.ts (full file; understand the PATCH handler)
- ui/src/api/hardware.ts (full file; understand the client types)
</read_first>
<action>
**1. Register chatFileRoutes in app.ts:**
- Add the import at the top, alongside the other route imports: `import { chatFileRoutes } from "./routes/chat-files.js";`
- Add `api.use(chatFileRoutes(db, opts.storageService));` directly after the `api.use(assistantHandoffRoutes(db));` line (around line 161). Mirror the `assetRoutes(db, opts.storageService)` call exactly.
- Do NOT place it before boardMutationGuard. The /transcribe route calls assertBoard(req), so it must be mounted inside the guarded api sub-router.
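
The route sits behind assertBoard, so an unauthenticated probe may be rejected before the Whisper check ever runs. For that reason, any status other than 404 already shows that the path now reaches a handler. The sketch below is a hypothetical smoke check and is not one of the plan's files. It assumes Node 18+ (global `fetch`, `FormData`, `Blob`), a running dev server at `baseUrl`, and an upload field named `audio`. None of these assumptions are confirmed by chat-files.ts.

```typescript
// Hypothetical smoke check for step 1 (not part of the plan's files).
// Assumptions: Node 18+ globals, a running server at baseUrl, and an "audio" upload field.
type MountCheck = "not-mounted" | "mounted" | "mounted-no-whisper";

function interpretTranscribeStatus(status: number): MountCheck {
  if (status === 404) return "not-mounted"; // no handler matched the path
  if (status === 503) return "mounted-no-whisper"; // handler ran, but no Whisper CLI
  // Anything else (400, an auth rejection from assertBoard, 200, ...) means the path matched.
  return "mounted";
}

async function checkTranscribeMounted(baseUrl: string): Promise<MountCheck> {
  const form = new FormData();
  form.append("audio", new Blob([new Uint8Array(0)], { type: "audio/webm" }), "probe.webm");
  const res = await fetch(`${baseUrl}/api/transcribe`, { method: "POST", body: form });
  return interpretTranscribeStatus(res.status);
}
```

Once step 1 is applied, `await checkTranscribeMounted(baseUrl)` should stop returning `"not-mounted"`.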

**2. Add voiceEnabled to nexusSettingsSchema (server/src/services/nexus-settings.ts):**
- Add `voiceEnabled: z.boolean().default(false)` to the nexusSettingsSchema `z.object`.
- This is a field in the file-backed JSON settings, NOT a database migration, so it is acceptable under the "no DB schema changes" constraint. Because of the default, existing nexus-settings.json files that lack the key parse with `voiceEnabled: false`.
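
The sketch below is a dependency-free illustration of what the schema change does on read and on patch. Zod is replaced by hand-written checks so that the example runs on its own. The names `parseNexusSettings` and `applyPatch` are hypothetical. The merge-then-revalidate behaviour of `set(patch)` is an assumption, not something the interface excerpt confirms.

```typescript
// Hand-written stand-in for nexusSettingsSchema (the real service uses Zod).
const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
type NexusMode = (typeof NEXUS_MODES)[number];

type NexusSettings = {
  mode: NexusMode;
  voiceEnabled: boolean;
};

// Like .default(): a missing (undefined) key gets its default,
// while a present key with the wrong type is rejected.
function parseNexusSettings(raw: Record<string, unknown>): NexusSettings {
  const mode = raw.mode === undefined ? "both" : raw.mode;
  if (!NEXUS_MODES.includes(mode as NexusMode)) {
    throw new Error(`invalid mode: ${String(mode)}`);
  }
  const voiceEnabled = raw.voiceEnabled === undefined ? false : raw.voiceEnabled;
  if (typeof voiceEnabled !== "boolean") {
    throw new Error("voiceEnabled must be a boolean");
  }
  return { mode: mode as NexusMode, voiceEnabled };
}

// Assumed behaviour of set(patch): merge over the current settings, then re-validate.
function applyPatch(current: NexusSettings, patch: Record<string, unknown>): NexusSettings {
  return parseNexusSettings({ ...current, ...patch });
}

// A settings file written before this change has no voiceEnabled key.
const legacy = parseNexusSettings({ mode: "personal_ai" });
const optedIn = applyPatch(legacy, { voiceEnabled: true });
```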

**3. Update the NexusSettings type on the client (ui/src/api/hardware.ts):**
- Add `voiceEnabled?: boolean` to the `NexusSettings` interface.
- The API functions need no changes, because `updateNexusSettings` already accepts `Partial<NexusSettings>`.
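
The field is declared optional on the client, so callers should treat a missing value the same way the server default does. Here is a minimal sketch; `isVoiceEnabled` is a hypothetical helper, not an existing export of hardware.ts.

```typescript
// Client types as they look after step 3.
type NexusMode = "personal_ai" | "project_builder" | "both";
interface NexusSettings {
  mode: NexusMode;
  voiceEnabled?: boolean;
}

// Hypothetical helper: a missing flag means "off", matching the server's .default(false).
function isVoiceEnabled(settings: NexusSettings | undefined): boolean {
  return settings?.voiceEnabled ?? false;
}

// Opt-in payload for updateNexusSettings(); only the changed key is sent.
const optIn: Partial<NexusSettings> = { voiceEnabled: true };
```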

**4. Check the nexus-settings route handler (server/src/routes/nexus-settings.ts):**
- Read the file. If the PATCH handler passes req.body straight through to `nexusSettingsService().set(patch)`, where the Zod schema validates it, no change is needed. If the handler picks fields manually, add voiceEnabled to the pick list.
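
If the handler does pick fields, its whitelist is the one place where the new key can be silently dropped. Below is a hypothetical sketch of such a pick step. The names `PATCHABLE_KEYS` and `pickSettingsPatch` are illustrative and are not taken from the route file.

```typescript
// Keys the PATCH handler forwards to nexusSettingsService().set(patch).
// Leaving "voiceEnabled" out here would drop it before it ever reaches set().
const PATCHABLE_KEYS = ["mode", "voiceEnabled"] as const;

function pickSettingsPatch(body: unknown): Record<string, unknown> {
  if (typeof body !== "object" || body === null) return {};
  const source = body as Record<string, unknown>;
  const patch: Record<string, unknown> = {};
  for (const key of PATCHABLE_KEYS) {
    if (key in source) patch[key] = source[key];
  }
  return patch;
}
```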
</action>
<verify>
<automated>cd /opt/nexus && npx vitest run server/src/__tests__/chat-file-routes.test.ts 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- grep -q "chatFileRoutes" server/src/app.ts exits 0
- grep -q "voiceEnabled" server/src/services/nexus-settings.ts exits 0
- grep -q "voiceEnabled" ui/src/api/hardware.ts exits 0
</acceptance_criteria>
<done>POST /api/transcribe is reachable: when no Whisper CLI is installed, a request that passes assertBoard gets 503, not 404. voiceEnabled persists in nexus-settings.json via the existing settings route.</done>
</task>

<task type="auto">
<name>Task 2: Create usePiperTts hook and TtsButton component</name>
<files>ui/src/hooks/usePiperTts.ts, ui/src/components/TtsButton.tsx</files>
<read_first>
- ui/src/components/VoiceRecordButton.tsx (reference for button style patterns)
- ui/src/components/ui/button.tsx (Button component API)
</read_first>
<action>
**0. Install piper-tts-web:**
```bash
pnpm --filter @paperclipai/ui add @mintplex-labs/piper-tts-web
```

**1. Create ui/src/hooks/usePiperTts.ts:**
```typescript
import { useState, useCallback, useRef } from "react";
// The package exports plain functions (stored, download, predict, ...), so import the namespace.
import * as tts from "@mintplex-labs/piper-tts-web";

const DEFAULT_VOICE = "en_US-hfc_female-medium";

export type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

export function usePiperTts() {
  const [status, setStatus] = useState<TtsStatus>("idle");
  const [progress, setProgress] = useState(0);
  const audioRef = useRef<HTMLAudioElement | null>(null);
  const urlRef = useRef<string | null>(null);

  // Pause the current <audio> element and revoke its Blob URL so the WAV data can be freed.
  const releaseAudio = useCallback(() => {
    audioRef.current?.pause();
    audioRef.current = null;
    if (urlRef.current) {
      URL.revokeObjectURL(urlRef.current);
      urlRef.current = null;
    }
  }, []);

  const prewarm = useCallback(async () => {
    // Start a download only from idle, or from error to allow a retry.
    if (status !== "idle" && status !== "error") return;
    setStatus("downloading");
    setProgress(0);
    try {
      // stored() lists the voice IDs already cached in the browser.
      const stored = await tts.stored();
      if (!stored.includes(DEFAULT_VOICE)) {
        await tts.download(DEFAULT_VOICE, (p: { loaded: number; total: number }) => {
          // Skip updates without a usable total so progress never becomes NaN.
          if (p.total > 0) setProgress(Math.round((p.loaded / p.total) * 100));
        });
      }
      setProgress(100);
      setStatus("ready");
    } catch {
      setStatus("error");
    }
  }, [status]);

  const speak = useCallback(async (text: string) => {
    if (status !== "ready") return;
    releaseAudio();
    setStatus("speaking");
    try {
      // predict() resolves to a WAV Blob; <audio> needs a URL, so wrap it.
      const wav = await tts.predict({ text, voiceId: DEFAULT_VOICE });
      const url = URL.createObjectURL(wav);
      const audio = new Audio(url);
      audioRef.current = audio;
      urlRef.current = url;
      const finish = () => {
        releaseAudio();
        setStatus("ready");
      };
      audio.onended = finish;
      audio.onerror = finish;
      await audio.play();
    } catch {
      // Synthesis failed or play() was rejected (for example by an autoplay policy).
      releaseAudio();
      setStatus("ready");
    }
  }, [status, releaseAudio]);

  const stop = useCallback(() => {
    releaseAudio();
    if (status === "speaking") setStatus("ready");
  }, [status, releaseAudio]);

  return { status, progress, prewarm, speak, stop };
}
```

Key points:
- `tts.stored()` lists the voice models already cached in the browser, so prewarm skips the download when the model is present (VOICE-02).
- `tts.download()` reports `{ loaded, total }` through its progress callback, which drives the visible download progress (VOICE-02).
- `tts.predict()` resolves to a WAV `Blob`, not a URL. Wrap it with `URL.createObjectURL()` before calling `new Audio(url).play()`, and revoke the URL when playback ends (VOICE-01, CPU-safe WASM).
- `stop()` interrupts playback.
- Do NOT import this hook in any server-side file or in any test that runs in Node. It is browser-only.
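
The transitions listed in the must-haves (idle->downloading->ready->speaking) can be written as a pure function, which is easier to unit-test than the hook itself. The sketch below is a hypothetical model and is not one of the plan's files. The event names are invented for illustration. `toPercent` mirrors the progress arithmetic in the download callback.

```typescript
// Hypothetical pure model of the hook's status transitions.
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";
type TtsEvent = "prewarm" | "downloaded" | "downloadFailed" | "speak" | "ended" | "stop";

function nextStatus(status: TtsStatus, event: TtsEvent): TtsStatus {
  switch (event) {
    case "prewarm": // ignored unless idle or retrying after an error
      return status === "idle" || status === "error" ? "downloading" : status;
    case "downloaded":
      return status === "downloading" ? "ready" : status;
    case "downloadFailed":
      return status === "downloading" ? "error" : status;
    case "speak": // only a ready hook can start speaking
      return status === "ready" ? "speaking" : status;
    case "ended": // playback finished or failed
    case "stop":
      return status === "speaking" ? "ready" : status;
  }
}

// Percentage as computed in the download callback, guarded against a zero or unknown total.
function toPercent(loaded: number, total: number): number {
  if (!(total > 0)) return 0;
  return Math.min(100, Math.round((loaded / total) * 100));
}
```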

**2. Create ui/src/components/TtsButton.tsx:**
```typescript
import { Volume2, VolumeX, Loader2 } from "lucide-react";
import { Button } from "./ui/button";
import type { TtsStatus } from "../hooks/usePiperTts";

interface TtsButtonProps {
  status: TtsStatus;
  progress: number;
  onSpeak: () => void;
  onStop: () => void;
  onPrewarm: () => void;
  disabled?: boolean;
}

export function TtsButton({ status, progress, onSpeak, onStop, onPrewarm, disabled }: TtsButtonProps) {
  if (status === "downloading") {
    const label = `Downloading voice model: ${progress}%`;
    return (
      <Button variant="ghost" size="icon" className="h-8 w-8 relative" disabled aria-label={label} title={label}>
        <Loader2 className="h-4 w-4 animate-spin" />
        <span className="absolute -bottom-1 text-[10px] text-muted-foreground">{progress}%</span>
      </Button>
    );
  }

  if (status === "speaking") {
    return (
      <Button
        variant="ghost"
        size="icon"
        className="h-8 w-8 text-primary"
        onClick={onStop}
        aria-label="Stop speaking"
        title="Stop speaking"
      >
        <VolumeX className="h-4 w-4" />
      </Button>
    );
  }

  // idle: the first click downloads the voice model (prewarm); a second click reads aloud.
  // ready: clicking speaks directly.
  // error: the button is disabled and the title explains why.
  const handleClick = () => {
    if (status === "ready") {
      onSpeak();
    } else {
      onPrewarm();
    }
  };

  return (
    <Button
      variant="ghost"
      size="icon"
      className="h-8 w-8"
      onClick={handleClick}
      disabled={disabled || status === "error"}
      aria-label="Read aloud"
      title={status === "error" ? "TTS unavailable" : status === "idle" ? "Download voice model (click again to read aloud)" : "Read aloud"}
    >
      <Volume2 className="h-4 w-4" />
    </Button>
  );
}
```

TtsButton receives status and progress from the hook and delegates actions through its callback props. It does NOT import piper-tts-web directly; all TTS logic stays in the hook. The button is reusable: PersonalAssistant (Plan 02) will place it next to assistant messages.
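
The component branches only on `status`, so that branching can be mirrored as a plain function and unit-tested without a DOM or React renderer. Here is a hypothetical sketch that covers the icon, click action, and disabled state of each branch. `ttsButtonView` and the `ButtonView` shape are illustrative and are not part of the plan.

```typescript
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

interface ButtonView {
  icon: "Loader2" | "VolumeX" | "Volume2";
  action: "none" | "stop" | "speak" | "prewarm";
  disabled: boolean;
}

// Mirrors TtsButton's render branches; `disabled` is the component's optional prop.
function ttsButtonView(status: TtsStatus, disabled = false): ButtonView {
  switch (status) {
    case "downloading": // always disabled while the model downloads
      return { icon: "Loader2", action: "none", disabled: true };
    case "speaking": // the stop button ignores the disabled prop
      return { icon: "VolumeX", action: "stop", disabled: false };
    case "ready":
      return { icon: "Volume2", action: "speak", disabled };
    case "idle":
      return { icon: "Volume2", action: "prewarm", disabled };
    case "error": // handler would prewarm, but the button is disabled
      return { icon: "Volume2", action: "prewarm", disabled: true };
  }
}
```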
</action>
<verify>
<automated>cd /opt/nexus && grep -q "usePiperTts" ui/src/hooks/usePiperTts.ts && grep -q "TtsButton" ui/src/components/TtsButton.tsx && { grep -q "piper-tts-web" ui/package.json 2>/dev/null || grep -q "piper-tts-web" pnpm-lock.yaml; } && echo "PASS" || echo "FAIL"</automated>
</verify>
<acceptance_criteria>
- grep -q "tts.download" ui/src/hooks/usePiperTts.ts exits 0
- grep -q "tts.predict" ui/src/hooks/usePiperTts.ts exits 0
- grep -q "tts.stored" ui/src/hooks/usePiperTts.ts exits 0
- grep -q "TtsButton" ui/src/components/TtsButton.tsx exits 0
- grep -q "piper-tts-web" pnpm-lock.yaml exits 0
- grep -q "Volume2" ui/src/components/TtsButton.tsx exits 0
</acceptance_criteria>
<done>The usePiperTts hook handles download progress (VOICE-02) and CPU-safe WASM synthesis (VOICE-01). TtsButton shows download progress during prewarm and a speaker icon for playback. piper-tts-web is installed as a UI dependency.</done>
</task>

</tasks>

<verification>
- `grep -q "chatFileRoutes" server/src/app.ts`: the route is registered
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts`: the settings schema is extended
- `ls ui/src/hooks/usePiperTts.ts ui/src/components/TtsButton.tsx`: both files exist
- `npx vitest run server/src/__tests__/chat-file-routes.test.ts`: the existing route tests pass
</verification>

<success_criteria>
1. POST /api/transcribe returns 503 (not 404) when no Whisper CLI is installed, which shows the route is mounted
2. The usePiperTts hook exports prewarm(), speak(), stop(), status, and progress
3. TtsButton renders download progress during prewarm and a speaker icon for playback
4. voiceEnabled persists in nexus-settings.json
</success_criteria>

<output>
After completion, create `.planning/phases/34-voice/34-01-SUMMARY.md`
</output>