| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 34-voice | 01 | execute | 1 | | | true | | |
Purpose: VOICE-01 requires TTS on CPU-only hardware (browser WASM satisfies this). VOICE-02 requires visible download progress before first synthesis. The /transcribe route exists but is never mounted — a 1-line fix. voiceEnabled persistence is needed so onboarding voice opt-in survives sessions.
Output: Working /api/transcribe endpoint, usePiperTts hook, TtsButton component, voiceEnabled in nexus-settings.
<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/34-voice/34-RESEARCH.md @server/src/app.ts @server/src/routes/chat-files.ts @server/src/services/nexus-settings.ts @ui/src/api/hardware.ts @ui/src/components/VoiceRecordButton.tsx
From server/src/routes/chat-files.ts:
export function chatFileRoutes(db: Db, storage: StorageService) { ... }
// POST /transcribe — accepts multipart audio, returns { text: string } or 503
From server/src/app.ts (line 147 pattern):
api.use(assetRoutes(db, opts.storageService));
// chatFileRoutes uses the same (db, opts.storageService) signature
From server/src/services/nexus-settings.ts:
export const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
export type NexusMode = (typeof NEXUS_MODES)[number];
const nexusSettingsSchema = z.object({
  mode: z.enum(NEXUS_MODES).default("both"),
});
export function nexusSettingsService() { /* returns { get(), set(patch) } */ }
From ui/src/api/hardware.ts:
export type NexusMode = "personal_ai" | "project_builder" | "both";
export interface NexusSettings { mode: NexusMode; }
export function fetchNexusSettings(): Promise<NexusSettings>;
export function updateNexusSettings(settings: Partial<NexusSettings>): Promise<NexusSettings>;
Task 1: Register chatFileRoutes in app.ts and add voiceEnabled to nexus-settings
server/src/app.ts, server/src/services/nexus-settings.ts, server/src/routes/nexus-settings.ts, ui/src/api/hardware.ts
- server/src/app.ts (full file — find insertion point after assistantHandoffRoutes)
- server/src/services/nexus-settings.ts (full file — understand schema)
- server/src/routes/nexus-settings.ts (full file — understand PATCH handler)
- ui/src/api/hardware.ts (full file — understand client types)
**1. Register chatFileRoutes in app.ts:**
- Add import at top with other route imports: `import { chatFileRoutes } from "./routes/chat-files.js";`
- Add `api.use(chatFileRoutes(db, opts.storageService));` after the `api.use(assistantHandoffRoutes(db));` line (around line 161). Mirror the `assetRoutes(db, opts.storageService)` pattern exactly.
- Do NOT place it before boardMutationGuard — the /transcribe route calls assertBoard(req) and needs to be inside the guarded api sub-router.
2. Add voiceEnabled to nexusSettingsSchema (server/src/services/nexus-settings.ts):
- Add `voiceEnabled: z.boolean().default(false)` to the `nexusSettingsSchema` z.object.
- This is a file-backed JSON field, NOT a DB migration, so it is acceptable under the "no DB schema changes" constraint.
3. Update NexusSettings type on client (ui/src/api/hardware.ts):
- Add `voiceEnabled?: boolean` to the `NexusSettings` interface.
- No changes to the API functions are needed; they already accept `Partial<NexusSettings>`.
4. Check nexus-settings route handler (server/src/routes/nexus-settings.ts):
- Read the file. The PATCH handler should already forward arbitrary fields to `nexusSettingsService().set(patch)` since it uses the Zod schema. If it manually picks fields, add voiceEnabled to the pick list. If it passes req.body through, no change is needed.

`cd /opt/nexus && npx vitest run server/src/__tests__/chat-file-routes.test.ts 2>&1 | tail -5`
<acceptance_criteria>
- `grep -q "chatFileRoutes" server/src/app.ts` exits 0
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts` exits 0
- `grep -q "voiceEnabled" ui/src/api/hardware.ts` exits 0
</acceptance_criteria>
POST /api/transcribe is reachable (returns 503 when no Whisper CLI is installed, not 404). voiceEnabled persists in nexus-settings.json via the existing settings route.
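If the PATCH handler turns out to pick fields manually, the change in step 4 might look roughly like the sketch below. `pickSettingsPatch`, `applyPatch`, and `DEFAULTS` are hypothetical names used for illustration only; the real handler may validate through the Zod schema instead, in which case none of this is needed.

```typescript
// Hypothetical sketch: these names are illustrative and not taken from the codebase.
type NexusMode = "personal_ai" | "project_builder" | "both";

interface NexusSettings {
  mode: NexusMode;
  voiceEnabled: boolean;
}

const NEXUS_MODES: readonly NexusMode[] = ["personal_ai", "project_builder", "both"];
const DEFAULTS: NexusSettings = { mode: "both", voiceEnabled: false };

// Keep only known, well-typed fields from an untrusted request body.
function pickSettingsPatch(body: unknown): Partial<NexusSettings> {
  const patch: Partial<NexusSettings> = {};
  if (typeof body !== "object" || body === null) return patch;
  const raw = body as Record<string, unknown>;
  const mode = NEXUS_MODES.find((m) => m === raw.mode);
  if (mode !== undefined) patch.mode = mode;
  // The new field must be added to the pick list, or it is silently dropped.
  if (typeof raw.voiceEnabled === "boolean") patch.voiceEnabled = raw.voiceEnabled;
  return patch;
}

// Merge a patch over the stored settings, filling defaults for missing fields.
function applyPatch(stored: Partial<NexusSettings>, patch: Partial<NexusSettings>): NexusSettings {
  return { ...DEFAULTS, ...stored, ...patch };
}
```

With a pick list like this, a settings file written before the change still loads with `voiceEnabled` set to `false`, which mirrors the `.default(false)` behavior of the schema field.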
1. Create ui/src/hooks/usePiperTts.ts:
import { useState, useCallback, useRef } from "react";
// Namespace import: assumes the package exposes stored/download/predict as named
// exports, as in its README. Adjust if it exports a `tts` object instead.
import * as tts from "@mintplex-labs/piper-tts-web";

const DEFAULT_VOICE = "en_US-hfc_female-medium";

export type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

export function usePiperTts() {
  const [status, setStatus] = useState<TtsStatus>("idle");
  const [progress, setProgress] = useState(0);
  const audioRef = useRef<HTMLAudioElement | null>(null);
  const urlRef = useRef<string | null>(null);

  // Pause playback and release the object URL backing it.
  const releaseAudio = useCallback(() => {
    audioRef.current?.pause();
    audioRef.current = null;
    if (urlRef.current) URL.revokeObjectURL(urlRef.current);
    urlRef.current = null;
  }, []);

  const prewarm = useCallback(async () => {
    // Only start a download from idle or error (error allows a retry).
    if (status !== "idle" && status !== "error") return;
    setStatus("downloading");
    setProgress(0);
    try {
      const stored = await tts.stored();
      if (!stored.includes(DEFAULT_VOICE)) {
        await tts.download(DEFAULT_VOICE, (p: { loaded: number; total: number }) => {
          // Guard against total === 0, which would otherwise produce NaN.
          if (p.total > 0) setProgress(Math.round((p.loaded / p.total) * 100));
        });
      }
      setStatus("ready");
      setProgress(100);
    } catch {
      setStatus("error");
    }
  }, [status]);

  const speak = useCallback(async (text: string) => {
    if (status !== "ready") return;
    releaseAudio();
    setStatus("speaking");
    try {
      // predict() resolves to a WAV Blob; Audio needs a URL, so wrap it.
      const wav = await tts.predict({ text, voiceId: DEFAULT_VOICE });
      const url = URL.createObjectURL(wav);
      const audio = new Audio(url);
      audioRef.current = audio;
      urlRef.current = url;
      const finish = () => {
        releaseAudio();
        setStatus("ready");
      };
      audio.onended = finish;
      audio.onerror = finish;
      await audio.play();
    } catch {
      releaseAudio();
      setStatus("ready");
    }
  }, [status, releaseAudio]);

  const stop = useCallback(() => {
    releaseAudio();
    if (status === "speaking") setStatus("ready");
  }, [status, releaseAudio]);

  return { status, progress, prewarm, speak, stop };
}
Key points:
- `tts.stored()` lists the voices already in the library's browser-side cache, so the download is skipped when the model is present (VOICE-02).
- `tts.download()` with a progress callback provides visible download progress (VOICE-02).
- `tts.predict()` resolves to a WAV Blob; wrap it with `URL.createObjectURL()` and play it via `new Audio(url).play()` (VOICE-01, CPU-safe WASM).
- `stop()` allows interrupting playback.
- Do NOT import this in any server-side or test file running in Node; it is browser-only.
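The status transitions the hook performs can be read as a small state machine. The sketch below is a simplified, dependency-free model of those transitions plus the progress arithmetic; `TtsEvent`, `nextStatus`, and `progressPercent` are hypothetical names that do not exist in the plan, and the hook itself may handle edge cases differently.

```typescript
// Hypothetical model of the hook's status transitions; not part of the hook itself.
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

type TtsEvent =
  | "prewarm"
  | "download_done"
  | "download_failed"
  | "speak"
  | "playback_ended"
  | "stop";

// Returns the next status; event/status pairs that make no sense leave it unchanged.
function nextStatus(status: TtsStatus, event: TtsEvent): TtsStatus {
  if (event === "prewarm") {
    return status === "idle" || status === "error" ? "downloading" : status;
  }
  if (event === "download_done") return status === "downloading" ? "ready" : status;
  if (event === "download_failed") return status === "downloading" ? "error" : status;
  if (event === "speak") return status === "ready" ? "speaking" : status;
  // playback_ended and stop both return a speaking hook to ready.
  return status === "speaking" ? "ready" : status;
}

// Convert a loaded/total pair into a clamped integer percentage.
// A total of 0 (or NaN) yields 0 instead of NaN or Infinity.
function progressPercent(loaded: number, total: number): number {
  if (!(total > 0)) return 0;
  return Math.min(100, Math.max(0, Math.round((loaded / total) * 100)));
}
```

Keeping the transitions in a pure function like this is one way to unit-test the voice flow in Node without importing the browser-only TTS package.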
2. Create ui/src/components/TtsButton.tsx:
import { Volume2, VolumeX, Loader2 } from "lucide-react";
import { Button } from "./ui/button";
import type { TtsStatus } from "../hooks/usePiperTts";

interface TtsButtonProps {
  status: TtsStatus;
  progress: number;
  onSpeak: () => void;
  onStop: () => void;
  onPrewarm: () => void;
  disabled?: boolean;
}

export function TtsButton({ status, progress, onSpeak, onStop, onPrewarm, disabled }: TtsButtonProps) {
  // Downloading: show a spinner with the percentage; the button is inert.
  if (status === "downloading") {
    return (
      <Button
        variant="ghost"
        size="icon"
        className="h-8 w-8 relative"
        disabled
        aria-label="Downloading voice model"
        title={`Downloading voice model: ${progress}%`}
      >
        <Loader2 className="h-4 w-4 animate-spin" />
        <span className="absolute -bottom-1 text-[10px] text-muted-foreground">{progress}%</span>
      </Button>
    );
  }

  // Speaking: the button becomes a stop control.
  if (status === "speaking") {
    return (
      <Button
        variant="ghost"
        size="icon"
        className="h-8 w-8 text-primary"
        onClick={onStop}
        aria-label="Stop speaking"
        title="Stop speaking"
      >
        <VolumeX className="h-4 w-4" />
      </Button>
    );
  }

  // idle: the first click starts the model download (prewarm); the user clicks again to speak.
  // ready: clicking speaks directly.
  // error: the button is disabled below, so this handler never runs.
  const handleClick = () => {
    if (status === "ready") {
      onSpeak();
    } else {
      onPrewarm();
    }
  };

  return (
    <Button
      variant="ghost"
      size="icon"
      className="h-8 w-8"
      onClick={handleClick}
      disabled={disabled || status === "error"}
      aria-label="Read aloud"
      title={status === "error" ? "TTS unavailable" : status === "idle" ? "Download voice model and read aloud" : "Read aloud"}
    >
      <Volume2 className="h-4 w-4" />
    </Button>
  );
}
The TtsButton receives status/progress from the hook and delegates actions. It does NOT import piper-tts-web directly; all TTS logic stays in the hook. The button is reusable: PersonalAssistant (Plan 02) will place it next to assistant messages.

`cd /opt/nexus && grep -q "usePiperTts" ui/src/hooks/usePiperTts.ts && grep -q "TtsButton" ui/src/components/TtsButton.tsx && { grep -q "piper-tts-web" ui/package.json 2>/dev/null || grep -q "piper-tts-web" pnpm-lock.yaml; } && echo "PASS" || echo "FAIL"`
<acceptance_criteria>
- `grep -q "tts.download" ui/src/hooks/usePiperTts.ts` exits 0
- `grep -q "tts.predict" ui/src/hooks/usePiperTts.ts` exits 0
- `grep -q "tts.stored" ui/src/hooks/usePiperTts.ts` exits 0
- `grep -q "TtsButton" ui/src/components/TtsButton.tsx` exits 0
- `grep -q "piper-tts-web" pnpm-lock.yaml` exits 0
- `grep -q "Volume2" ui/src/components/TtsButton.tsx` exits 0
</acceptance_criteria>
usePiperTts hook handles download progress (VOICE-02) and CPU-safe WASM synthesis (VOICE-01). TtsButton shows download progress during prewarm and a speaker icon for playback. piper-tts-web is installed as a UI dependency.
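The button's behavior per status can be summarized as a pure mapping from state to what the user sees and what a click does. The following sketch restates that mapping without React; `describeTtsButton`, `TtsAction`, and `TtsButtonView` are hypothetical names introduced here for illustration, not part of the component.

```typescript
// Hypothetical view-model for the button; names are illustrative only.
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";
type TtsAction = "prewarm" | "speak" | "stop" | "none";

interface TtsButtonView {
  title: string;
  disabled: boolean;
  action: TtsAction; // what a click would trigger
}

function describeTtsButton(status: TtsStatus, progress: number, disabled = false): TtsButtonView {
  switch (status) {
    case "downloading":
      // Inert while the voice model downloads; the title carries the percentage.
      return { title: `Downloading voice model: ${progress}%`, disabled: true, action: "none" };
    case "speaking":
      // The stop control stays clickable regardless of the disabled prop.
      return { title: "Stop speaking", disabled: false, action: "stop" };
    case "error":
      return { title: "TTS unavailable", disabled: true, action: "none" };
    case "idle":
      return {
        title: "Download voice model and read aloud",
        disabled,
        action: disabled ? "none" : "prewarm",
      };
    default:
      // Remaining case: "ready".
      return { title: "Read aloud", disabled, action: disabled ? "none" : "speak" };
  }
}
```

One consequence this makes visible: from `idle`, a click only starts the download, so the user needs a second click once the status reaches `ready` before anything is spoken.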
- `grep -q "chatFileRoutes" server/src/app.ts`: route is registered
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts`: settings schema extended
- `ls ui/src/hooks/usePiperTts.ts ui/src/components/TtsButton.tsx`: both files exist
- `npx vitest run server/src/__tests__/chat-file-routes.test.ts`: existing route tests pass
<success_criteria>
- POST /api/transcribe returns 503 (not 404) when no Whisper CLI is installed, which shows the route is mounted
- usePiperTts hook exports prewarm(), speak(), stop(), status, progress
- TtsButton renders download progress during prewarm and a speaker icon for playback
- voiceEnabled persists in nexus-settings.json
</success_criteria>
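The 503-versus-404 criterion can also be checked programmatically. The sketch below is one possible way to do that, assuming the server answers 404 for unmounted paths and 503 when the Whisper CLI is missing, as described in the criteria. `interpretTranscribeStatus`, `probeTranscribe`, and the `Fetcher` type are hypothetical helpers, not part of the codebase; the fetcher is injected so the logic stays testable without a running server.

```typescript
// Hypothetical helpers for checking whether /api/transcribe is mounted.
type MountCheck = "mounted" | "not_mounted" | "inconclusive";

// 404 means no route matched; 503 (no Whisper CLI) or any 2xx means the handler ran.
// Other codes (for example auth failures from a guard) do not settle the question.
function interpretTranscribeStatus(status: number): MountCheck {
  if (status === 404) return "not_mounted";
  if (status === 503 || (status >= 200 && status < 300)) return "mounted";
  return "inconclusive";
}

// Minimal fetch-like signature so a stub can stand in for the real network call.
type Fetcher = (url: string, init: { method: string }) => Promise<{ status: number }>;

async function probeTranscribe(fetcher: Fetcher, baseUrl: string): Promise<MountCheck> {
  const res = await fetcher(`${baseUrl}/api/transcribe`, { method: "POST" });
  return interpretTranscribeStatus(res.status);
}
```

In a browser or a recent Node runtime, the global `fetch` could be passed as `(url, init) => fetch(url, init)`; the base URL depends on where the server is listening and is not specified by this plan.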