nexus/.planning/milestones/v1.5-phases/34-voice/34-01-PLAN.md
Nexus Dev 285bf585be chore: complete v1.5 Smart Onboarding + Personal AI Assistant milestone
6 phases, 13 plans, 21 requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 03:55:49 +00:00

---
phase: 34-voice
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - server/src/app.ts
  - server/src/services/nexus-settings.ts
  - server/src/routes/nexus-settings.ts
  - ui/src/api/hardware.ts
  - ui/src/hooks/usePiperTts.ts
  - ui/src/components/TtsButton.tsx
autonomous: true
requirements:
  - VOICE-01
  - VOICE-02
must_haves:
  truths:
    - "POST /api/transcribe is reachable and returns 503 with descriptive error when no Whisper CLI is installed"
    - "usePiperTts hook exposes prewarm/speak/status/progress and transitions idle->downloading->ready->speaking"
    - "TtsButton renders a speaker icon that calls speak() and shows download progress during prewarm"
    - "voiceEnabled boolean is persisted in nexus-settings.json and exposed via GET/PATCH /nexus/settings"
  artifacts:
    - path: ui/src/hooks/usePiperTts.ts
      provides: "Piper TTS hook with prewarm, speak, status, progress"
      exports:
        - usePiperTts
    - path: ui/src/components/TtsButton.tsx
      provides: "Speaker button component for TTS playback"
      exports:
        - TtsButton
  key_links:
    - from: server/src/app.ts
      to: server/src/routes/chat-files.ts
      via: "api.use(chatFileRoutes(db, opts.storageService))"
      pattern: "chatFileRoutes"
    - from: ui/src/hooks/usePiperTts.ts
      to: "@mintplex-labs/piper-tts-web"
      via: "import { tts }"
      pattern: "tts.download|tts.predict"
---

Fix the broken /transcribe route registration, create the Piper TTS browser hook and button component, and add voiceEnabled to nexus-settings persistence.

Purpose: VOICE-01 requires TTS on CPU-only hardware (browser WASM satisfies this). VOICE-02 requires visible download progress before first synthesis. The /transcribe route exists but is never mounted — a 1-line fix. voiceEnabled persistence is needed so onboarding voice opt-in survives sessions.

Output: Working /api/transcribe endpoint, usePiperTts hook, TtsButton component, voiceEnabled in nexus-settings.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/34-voice/34-RESEARCH.md

@server/src/app.ts @server/src/routes/chat-files.ts @server/src/services/nexus-settings.ts @ui/src/api/hardware.ts @ui/src/components/VoiceRecordButton.tsx

From server/src/routes/chat-files.ts:

export function chatFileRoutes(db: Db, storage: StorageService) { ... }
// POST /transcribe — accepts multipart audio, returns { text: string } or 503

From server/src/app.ts (line 147 pattern):

api.use(assetRoutes(db, opts.storageService));
// chatFileRoutes uses the same (db, opts.storageService) signature

From server/src/services/nexus-settings.ts:

export const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
export type NexusMode = (typeof NEXUS_MODES)[number];
const nexusSettingsSchema = z.object({
  mode: z.enum(NEXUS_MODES).default("both"),
});
export function nexusSettingsService() { get(), set(patch) }

From ui/src/api/hardware.ts:

export type NexusMode = "personal_ai" | "project_builder" | "both";
export interface NexusSettings { mode: NexusMode; }
export function fetchNexusSettings(): Promise<NexusSettings>;
export function updateNexusSettings(settings: Partial<NexusSettings>): Promise<NexusSettings>;
Task 1: Register chatFileRoutes in app.ts and add voiceEnabled to nexus-settings

Files: server/src/app.ts, server/src/services/nexus-settings.ts, server/src/routes/nexus-settings.ts, ui/src/api/hardware.ts

Read first:

- server/src/app.ts (full file: find insertion point after assistantHandoffRoutes)
- server/src/services/nexus-settings.ts (full file: understand schema)
- server/src/routes/nexus-settings.ts (full file: understand PATCH handler)
- ui/src/api/hardware.ts (full file: understand client types)

**1. Register chatFileRoutes in app.ts:**

- Add import at top with other route imports: `import { chatFileRoutes } from "./routes/chat-files.js";`
- Add `api.use(chatFileRoutes(db, opts.storageService));` after the `api.use(assistantHandoffRoutes(db));` line (around line 161). Mirror the `assetRoutes(db, opts.storageService)` pattern exactly.
- Do NOT place it before boardMutationGuard: the /transcribe route calls assertBoard(req) and needs to be inside the guarded api sub-router.

2. Add voiceEnabled to nexusSettingsSchema (server/src/services/nexus-settings.ts):

  • Add voiceEnabled: z.boolean().default(false) to the nexusSettingsSchema z.object.
  • This is a file-backed JSON field, NOT a DB migration — acceptable under the "no DB schema changes" constraint.
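
The effect of the new default on settings files written before this change can be sketched without Zod. This is a minimal stand-in under stated assumptions: `parseSettings` and `isNexusMode` are hypothetical helpers, not part of nexus-settings.ts, and they only mimic how a `.default(...)` fills in a missing key while still rejecting a present but invalid value.

```typescript
// Hypothetical stand-in for nexusSettingsSchema; the real service uses Zod.
const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
type NexusMode = (typeof NEXUS_MODES)[number];

interface NexusSettings {
  mode: NexusMode;
  voiceEnabled: boolean;
}

function isNexusMode(value: unknown): value is NexusMode {
  return typeof value === "string" && (NEXUS_MODES as readonly string[]).includes(value);
}

// Missing keys receive their defaults; present-but-invalid values are rejected.
function parseSettings(raw: Record<string, unknown>): NexusSettings {
  const mode = raw.mode === undefined ? "both" : raw.mode;
  const voiceEnabled = raw.voiceEnabled === undefined ? false : raw.voiceEnabled;
  if (!isNexusMode(mode)) throw new Error("invalid mode");
  if (typeof voiceEnabled !== "boolean") throw new Error("voiceEnabled must be a boolean");
  return { mode, voiceEnabled };
}

// A settings file written before this change has no voiceEnabled key.
const legacy = parseSettings(JSON.parse('{"mode":"personal_ai"}'));
const optedIn = parseSettings(JSON.parse('{"mode":"both","voiceEnabled":true}'));
```

Because the default is applied at parse time, an existing nexus-settings.json needs no rewrite: it reads as voice disabled until the user opts in.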

3. Update NexusSettings type on client (ui/src/api/hardware.ts):

  • Add voiceEnabled?: boolean to the NexusSettings interface.
  • No changes to API functions needed: updateNexusSettings already accepts Partial<NexusSettings>.
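
As a rough illustration of why the optional field is enough on the client, the sketch below builds a voice opt-in patch and merges it the way a PATCH is assumed to be applied. `voicePatch` and `applyPatch` are hypothetical names introduced here, and the shallow-merge semantics are an assumption about the server, not something this plan confirms.

```typescript
type NexusMode = "personal_ai" | "project_builder" | "both";

interface NexusSettings {
  mode: NexusMode;
  voiceEnabled?: boolean; // optional so responses without the key still type-check
}

// Hypothetical helper: the PATCH body for toggling the voice opt-in.
function voicePatch(enabled: boolean): Partial<NexusSettings> {
  return { voiceEnabled: enabled };
}

// Hypothetical helper: assumes the server shallow-merges the patch into stored settings.
function applyPatch(current: NexusSettings, patch: Partial<NexusSettings>): NexusSettings {
  return { ...current, ...patch };
}

const before: NexusSettings = { mode: "both" };
const after = applyPatch(before, voicePatch(true));
```

In the UI the same patch object would be passed to `updateNexusSettings`, which already takes a `Partial<NexusSettings>`.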

4. Check nexus-settings route handler (server/src/routes/nexus-settings.ts):

  • Read the file. The PATCH handler should already forward arbitrary fields to nexusSettingsService().set(patch) since it uses the Zod schema. If it manually picks fields, add voiceEnabled to the pick list. If it passes req.body through, no change needed.

Verify: cd /opt/nexus && npx vitest run server/src/__tests__/chat-file-routes.test.ts 2>&1 | tail -5

<acceptance_criteria>
  • grep -q "chatFileRoutes" server/src/app.ts exits with status 0
  • grep -q "voiceEnabled" server/src/services/nexus-settings.ts exits with status 0
  • grep -q "voiceEnabled" ui/src/api/hardware.ts exits with status 0
</acceptance_criteria>

Done when: POST /api/transcribe is reachable (returns 503 when no Whisper CLI is installed, not 404). voiceEnabled persists in nexus-settings.json via the existing settings route.
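
The 503-versus-404 distinction is what separates "route mounted but Whisper missing" from "route never registered". A minimal sketch of that reading follows; `classifyTranscribeStatus` is a hypothetical helper for a manual smoke check, not part of the Nexus codebase, and it treats any status other than 200, 404, and 503 as inconclusive because auth or guard middleware may answer before the route does.

```typescript
type MountState =
  | "not-mounted"
  | "mounted-whisper-missing"
  | "mounted-transcribing"
  | "inconclusive";

// Hypothetical helper: interprets the HTTP status of a POST to /api/transcribe.
function classifyTranscribeStatus(httpStatus: number): MountState {
  if (httpStatus === 404) return "not-mounted"; // chatFileRoutes was never registered
  if (httpStatus === 503) return "mounted-whisper-missing"; // route ran, no Whisper CLI found
  if (httpStatus === 200) return "mounted-transcribing"; // route ran and returned { text }
  return "inconclusive"; // e.g. a guard rejected the request before the route handled it
}

const beforeFix = classifyTranscribeStatus(404);
const afterFix = classifyTranscribeStatus(503);
```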
Task 2: Create usePiperTts hook and TtsButton component

Files: ui/src/hooks/usePiperTts.ts, ui/src/components/TtsButton.tsx

Read first:

- ui/src/components/VoiceRecordButton.tsx (reference for button style patterns)
- ui/src/components/ui/button.tsx (Button component API)

**0. Install piper-tts-web:**

```bash
pnpm --filter @paperclipai/ui add @mintplex-labs/piper-tts-web
```

1. Create ui/src/hooks/usePiperTts.ts:

import { useState, useCallback, useRef } from "react";
import { tts } from "@mintplex-labs/piper-tts-web";

const DEFAULT_VOICE = "en_US-hfc_female-medium";

export type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

export function usePiperTts() {
  const [status, setStatus] = useState<TtsStatus>("idle");
  const [progress, setProgress] = useState(0);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  const prewarm = useCallback(async () => {
    if (status === "ready" || status === "downloading") return;
    setStatus("downloading");
    setProgress(0);
    try {
      const stored = await tts.stored();
      if (!stored.includes(DEFAULT_VOICE)) {
        await tts.download(DEFAULT_VOICE, (p: { loaded: number; total: number }) => {
          // Skip updates with an unknown total so the percentage is never NaN or Infinity.
          if (p.total > 0) setProgress(Math.round((p.loaded / p.total) * 100));
        });
      }
      setStatus("ready");
      setProgress(100);
    } catch {
      setStatus("error");
    }
  }, [status]);

  const speak = useCallback(async (text: string) => {
    if (status !== "ready") return;
    // Stop any currently playing audio
    if (audioRef.current) {
      audioRef.current.pause();
      audioRef.current = null;
    }
    setStatus("speaking");
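    try {
      // tts.predict() resolves to a WAV Blob; Audio needs a URL, so wrap it first.
      const wav = await tts.predict({ text, voiceId: DEFAULT_VOICE });
      const url = URL.createObjectURL(wav);
      const audio = new Audio(url);
      audioRef.current = audio;
      // Shared cleanup for normal end and playback errors.
      const finish = () => {
        URL.revokeObjectURL(url);
        audioRef.current = null;
        setStatus("ready");
      };
      audio.onended = finish;
      audio.onerror = finish;
      await audio.play();
    } catch {
      // Synthesis or playback failed; the model is still cached, so stay ready.
      setStatus("ready");
    }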
  }, [status]);

  const stop = useCallback(() => {
    if (audioRef.current) {
      audioRef.current.pause();
      audioRef.current = null;
    }
    if (status === "speaking") setStatus("ready");
  }, [status]);

  return { status, progress, prewarm, speak, stop };
}

Key points:

  • tts.stored() checks IndexedDB cache — skips download if model already present (VOICE-02).
  • tts.download() with progress callback provides visible download progress (VOICE-02).
  • tts.predict() resolves to a WAV Blob; wrap it with URL.createObjectURL() and play it via new Audio(url).play() (VOICE-01, CPU-safe WASM).
  • stop() allows interrupting playback.
  • Do NOT import this in any server-side or test file running in Node — browser-only.
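
The status lifecycle described above can be modeled as a small pure function, which makes the idle->downloading->ready->speaking path easy to unit-test in Node without loading the WASM package. This is a sketch, not the hook itself: `nextStatus`, `TtsEvent`, and `percent` are hypothetical names, and the model simplifies the hook slightly (for example, it ignores prewarm while speaking).

```typescript
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";
type TtsEvent = "prewarm" | "downloaded" | "failed" | "speak" | "ended";

// Hypothetical pure model of the hook's status transitions.
function nextStatus(status: TtsStatus, event: TtsEvent): TtsStatus {
  switch (event) {
    case "prewarm":
      return status === "idle" || status === "error" ? "downloading" : status;
    case "downloaded":
      return status === "downloading" ? "ready" : status;
    case "failed":
      return status === "downloading" ? "error" : status;
    case "speak":
      return status === "ready" ? "speaking" : status;
    case "ended":
      return status === "speaking" ? "ready" : status;
  }
}

// Download progress as a whole percentage, safe when total is 0 or unknown.
function percent(loaded: number, total: number): number {
  if (!Number.isFinite(total) || total <= 0) return 0;
  return Math.min(100, Math.max(0, Math.round((loaded / total) * 100)));
}

const happyPath: TtsEvent[] = ["prewarm", "downloaded", "speak", "ended"];
const finalStatus = happyPath.reduce(nextStatus, "idle" as TtsStatus);
```

Keeping the transitions in one function also documents which events are ignored in which state, such as a speak request arriving before the model is ready.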

2. Create ui/src/components/TtsButton.tsx:

import { Volume2, VolumeX, Loader2 } from "lucide-react";
import { Button } from "./ui/button";
import type { TtsStatus } from "../hooks/usePiperTts";

interface TtsButtonProps {
  status: TtsStatus;
  progress: number;
  onSpeak: () => void;
  onStop: () => void;
  onPrewarm: () => void;
  disabled?: boolean;
}

export function TtsButton({ status, progress, onSpeak, onStop, onPrewarm, disabled }: TtsButtonProps) {
  if (status === "downloading") {
    return (
      <Button variant="ghost" size="icon" className="h-8 w-8 relative" disabled title={`Downloading voice model: ${progress}%`}>
        <Loader2 className="h-4 w-4 animate-spin" />
        <span className="absolute -bottom-1 text-[10px] text-muted-foreground">{progress}%</span>
      </Button>
    );
  }

  if (status === "speaking") {
    return (
      <Button
        variant="ghost"
        size="icon"
        className="h-8 w-8 text-primary"
        onClick={onStop}
        aria-label="Stop speaking"
        title="Stop speaking"
      >
        <VolumeX className="h-4 w-4" />
      </Button>
    );
  }

  // idle: clicking starts prewarm (model download); a second click speaks once ready
  // error: the button is disabled below, so no click is handled
  // ready: clicking triggers speak directly
  const handleClick = () => {
    if (status === "ready") {
      onSpeak();
    } else {
      onPrewarm();
    }
  };

  return (
    <Button
      variant="ghost"
      size="icon"
      className="h-8 w-8"
      onClick={handleClick}
      disabled={disabled || status === "error"}
      aria-label="Read aloud"
      title={status === "error" ? "TTS unavailable" : status === "idle" ? "Download voice model and read aloud" : "Read aloud"}
    >
      <Volume2 className="h-4 w-4" />
    </Button>
  );
}

The TtsButton receives status/progress from the hook and delegates actions. It does NOT import piper-tts-web directly; all TTS logic stays in the hook. The button is reusable: PersonalAssistant (Plan 02) will place it next to assistant messages.

Verify: cd /opt/nexus && grep -q "usePiperTts" ui/src/hooks/usePiperTts.ts && grep -q "TtsButton" ui/src/components/TtsButton.tsx && { grep -q "piper-tts-web" ui/package.json 2>/dev/null || grep -q "piper-tts-web" pnpm-lock.yaml; } && echo "PASS" || echo "FAIL"

<acceptance_criteria>
- grep -q "tts.download" ui/src/hooks/usePiperTts.ts exits with status 0
- grep -q "tts.predict" ui/src/hooks/usePiperTts.ts exits with status 0
- grep -q "tts.stored" ui/src/hooks/usePiperTts.ts exits with status 0
- grep -q "TtsButton" ui/src/components/TtsButton.tsx exits with status 0
- grep -q "piper-tts-web" pnpm-lock.yaml exits with status 0
- grep -q "Volume2" ui/src/components/TtsButton.tsx exits with status 0
</acceptance_criteria>

Done when: usePiperTts hook handles download progress (VOICE-02) and CPU-safe WASM synthesis (VOICE-01). TtsButton shows download progress during prewarm and speaker icon for playback. piper-tts-web is installed as a UI dependency.
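
Since TtsButton is a pure function of its props, its per-status behavior can be summarized as a lookup. The sketch below mirrors the branches of the component as written in this plan; `buttonView` and `ButtonView` are hypothetical names introduced for illustration, and the component itself does not need to be structured this way.

```typescript
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

interface ButtonView {
  icon: "spinner" | "stop" | "speaker";
  title: string;
  disabled: boolean;
  action: "none" | "stop" | "speak" | "prewarm";
}

// Hypothetical summary of what TtsButton renders and does for each status.
function buttonView(status: TtsStatus, progress: number, disabled = false): ButtonView {
  switch (status) {
    case "downloading":
      return { icon: "spinner", title: `Downloading voice model: ${progress}%`, disabled: true, action: "none" };
    case "speaking":
      return { icon: "stop", title: "Stop speaking", disabled: false, action: "stop" };
    case "error":
      return { icon: "speaker", title: "TTS unavailable", disabled: true, action: "none" };
    case "idle":
      return { icon: "speaker", title: "Download voice model and read aloud", disabled, action: "prewarm" };
    case "ready":
      return { icon: "speaker", title: "Read aloud", disabled, action: "speak" };
  }
}

const duringDownload = buttonView("downloading", 42);
const whenReady = buttonView("ready", 100);
```

A table like this can back a plain unit test for the button's states without rendering React or touching the TTS package.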

- `grep -q "chatFileRoutes" server/src/app.ts`: route is registered
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts`: settings schema extended
- `ls ui/src/hooks/usePiperTts.ts ui/src/components/TtsButton.tsx`: both files exist
- `npx vitest run server/src/__tests__/chat-file-routes.test.ts`: existing route tests pass

<success_criteria>

  1. POST /api/transcribe returns 503 (not 404) when no Whisper CLI is installed — route is mounted
  2. usePiperTts hook exports prewarm(), speak(), stop(), status, progress
  3. TtsButton renders download progress during prewarm and speaker icon for playback
  4. voiceEnabled persists in nexus-settings.json </success_criteria>
After completion, create `.planning/phases/34-voice/34-01-SUMMARY.md`