nexus/.planning/milestones/v1.5-phases/34-voice/34-01-PLAN.md at 3d2117ee9ff455a797c57cde3ffefaa11e8a3f9c

Nexus Dev 285bf585be chore: complete v1.5 Smart Onboarding + Personal AI Assistant milestone

6 phases, 13 plans, 21 requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 03:55:49 +00:00


---
phase: 34-voice
plan:
type: execute
wave:
depends_on:
files_modified:
  - server/src/app.ts
  - server/src/services/nexus-settings.ts
  - server/src/routes/nexus-settings.ts
  - ui/src/api/hardware.ts
  - ui/src/hooks/usePiperTts.ts
  - ui/src/components/TtsButton.tsx
autonomous: true
requirements:
  - VOICE-01
  - VOICE-02
must_haves:
  truths:
    - "POST /api/transcribe is reachable and returns 503 with descriptive error when no Whisper CLI is installed"
    - "usePiperTts hook exposes prewarm/speak/status/progress and transitions idle->downloading->ready->speaking"
    - "TtsButton renders a speaker icon that calls speak() and shows download progress during prewarm"
    - "voiceEnabled boolean is persisted in nexus-settings.json and exposed via GET/PATCH /nexus/settings"
  artifacts:
    - path: ui/src/hooks/usePiperTts.ts
      provides: "Piper TTS hook with prewarm, speak, status, progress"
      exports:
        - usePiperTts
    - path: ui/src/components/TtsButton.tsx
      provides: "Speaker button component for TTS playback"
      exports:
        - TtsButton
  key_links:
    - from: server/src/app.ts
      to: server/src/routes/chat-files.ts
      via: "api.use(chatFileRoutes(db, opts.storageService))"
      pattern: chatFileRoutes
    - from: ui/src/hooks/usePiperTts.ts
      to: "@mintplex-labs/piper-tts-web"
      via: "import { tts }"
      pattern: 'tts.download\|tts.predict'
---

Fix the broken /transcribe route registration, create the Piper TTS browser hook and button component, and add voiceEnabled to nexus-settings persistence.

Purpose: VOICE-01 requires TTS on CPU-only hardware (browser WASM satisfies this). VOICE-02 requires visible download progress before first synthesis. The /transcribe route exists but is never mounted — a 1-line fix. voiceEnabled persistence is needed so onboarding voice opt-in survives sessions.

Output: Working /api/transcribe endpoint, usePiperTts hook, TtsButton component, voiceEnabled in nexus-settings.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/34-voice/34-RESEARCH.md

@server/src/app.ts @server/src/routes/chat-files.ts @server/src/services/nexus-settings.ts @ui/src/api/hardware.ts @ui/src/components/VoiceRecordButton.tsx

From server/src/routes/chat-files.ts:

```typescript
export function chatFileRoutes(db: Db, storage: StorageService) { ... }
// POST /transcribe: accepts multipart audio, returns { text: string } or 503
```

From server/src/app.ts (line 147 pattern):

```typescript
api.use(assetRoutes(db, opts.storageService));
// chatFileRoutes uses the same (db, opts.storageService) signature
```

From server/src/services/nexus-settings.ts:

```typescript
export const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
export type NexusMode = (typeof NEXUS_MODES)[number];
const nexusSettingsSchema = z.object({
  mode: z.enum(NEXUS_MODES).default("both"),
});
export function nexusSettingsService() { get(), set(patch) }
```

From ui/src/api/hardware.ts:

```typescript
export type NexusMode = "personal_ai" | "project_builder" | "both";
export interface NexusSettings { mode: NexusMode; }
export function fetchNexusSettings(): Promise<NexusSettings>;
export function updateNexusSettings(settings: Partial<NexusSettings>): Promise<NexusSettings>;
```

Task 1: Register chatFileRoutes in app.ts and add voiceEnabled to nexus-settings

Files: server/src/app.ts, server/src/services/nexus-settings.ts, server/src/routes/nexus-settings.ts, ui/src/api/hardware.ts

Read first:

- server/src/app.ts (full file; find the insertion point after assistantHandoffRoutes)
- server/src/services/nexus-settings.ts (full file; understand the schema)
- server/src/routes/nexus-settings.ts (full file; understand the PATCH handler)
- ui/src/api/hardware.ts (full file; understand the client types)

**1. Register chatFileRoutes in app.ts:**

- Add the import at the top with the other route imports: `import { chatFileRoutes } from "./routes/chat-files.js";`
- Add `api.use(chatFileRoutes(db, opts.storageService));` after the `api.use(assistantHandoffRoutes(db));` line (around line 161). Mirror the `assetRoutes(db, opts.storageService)` pattern exactly.
- Do NOT place it before boardMutationGuard: the /transcribe route calls assertBoard(req) and needs to be inside the guarded api sub-router.

**2. Add voiceEnabled to nexusSettingsSchema (server/src/services/nexus-settings.ts):**

- Add `voiceEnabled: z.boolean().default(false)` to the `nexusSettingsSchema` `z.object`.
- This is a file-backed JSON field, NOT a DB migration, so it is acceptable under the "no DB schema changes" constraint.

**3. Update NexusSettings type on client (ui/src/api/hardware.ts):**

- Add `voiceEnabled?: boolean` to the `NexusSettings` interface.
- No changes to the API functions are needed; they already handle `Partial<NexusSettings>`.
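The effect of the schema default on an existing settings file can be illustrated without Zod. The sketch below is a dependency-free stand-in, not the project's code: `normalizeSettings` and `RawSettings` are hypothetical names, and the function only mimics what `z.boolean().default(false)` is expected to do for a nexus-settings.json written before this change (a missing key becomes `false`, an explicit value is kept, a non-boolean is rejected).

```typescript
// Hypothetical, dependency-free stand-in for the Zod schema described above.
const NEXUS_MODES = ["personal_ai", "project_builder", "both"] as const;
type NexusMode = (typeof NEXUS_MODES)[number];

interface NexusSettings {
  mode: NexusMode;
  voiceEnabled: boolean;
}

// Shape of whatever was parsed from nexus-settings.json; every field may be absent.
type RawSettings = { mode?: unknown; voiceEnabled?: unknown };

function normalizeSettings(raw: RawSettings): NexusSettings {
  // Missing mode falls back to "both", matching the existing schema default.
  const mode = raw.mode === undefined ? "both" : raw.mode;
  const knownMode = NEXUS_MODES.find((m) => m === mode);
  if (knownMode === undefined) {
    throw new Error(`invalid mode: ${String(mode)}`);
  }
  // Missing voiceEnabled falls back to false, the default this plan adds.
  const voiceEnabled = raw.voiceEnabled === undefined ? false : raw.voiceEnabled;
  if (typeof voiceEnabled !== "boolean") {
    throw new Error("voiceEnabled must be a boolean");
  }
  return { mode: knownMode, voiceEnabled };
}

// A file written before this change has no voiceEnabled key.
const legacy = normalizeSettings({ mode: "personal_ai" });
// A file written after the user opted in keeps the stored value.
const optedIn = normalizeSettings({ mode: "both", voiceEnabled: true });
console.log(legacy, optedIn);
```

Because the default lives in the schema, older files would not need to be rewritten; the field only appears on disk once a client sends it.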

**4. Check nexus-settings route handler (server/src/routes/nexus-settings.ts):**

- Read the file. The PATCH handler should already forward arbitrary fields to `nexusSettingsService().set(patch)` since it uses the Zod schema.
- If it manually picks fields, add `voiceEnabled` to the pick list. If it passes `req.body` through, no change is needed.

Verify:

`cd /opt/nexus && npx vitest run server/src/__tests__/chat-file-routes.test.ts 2>&1 | tail -5`

<acceptance_criteria>
- `grep -q "chatFileRoutes" server/src/app.ts` returns 0
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts` returns 0
- `grep -q "voiceEnabled" ui/src/api/hardware.ts` returns 0
</acceptance_criteria>

Done when: POST /api/transcribe is reachable (returns 503 when no Whisper CLI is installed, not 404). voiceEnabled persists in nexus-settings.json via the existing settings route.
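Step 4 distinguishes a handler that picks fields by hand from one that passes the body through. If the handler turns out to pick fields, the change amounts to adding one key to that list. The sketch below is an assumption about what such a pick list might look like; `PATCHABLE_KEYS` and `pickSettingsPatch` are hypothetical names, not identifiers taken from server/src/routes/nexus-settings.ts.

```typescript
// Hypothetical sketch of a manual pick list in a PATCH handler.
interface NexusSettings {
  mode: "personal_ai" | "project_builder" | "both";
  voiceEnabled: boolean;
}

// Adding "voiceEnabled" here is the kind of one-line change step 4 refers to.
const PATCHABLE_KEYS = ["mode", "voiceEnabled"] as const;

function pickSettingsPatch(body: Record<string, unknown>): Partial<NexusSettings> {
  const patch: Record<string, unknown> = {};
  for (const key of PATCHABLE_KEYS) {
    // Copy only known keys that the client actually sent.
    if (Object.prototype.hasOwnProperty.call(body, key)) {
      patch[key] = body[key];
    }
  }
  // Values are still unvalidated here; the settings schema is assumed to check them.
  return patch as Partial<NexusSettings>;
}

const patch = pickSettingsPatch({ voiceEnabled: true, unknownField: 1 });
console.log(patch); // unknownField is dropped, voiceEnabled is forwarded
```

A pass-through handler needs no such list, which is why the plan treats this step as a check rather than a guaranteed edit.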

Task 2: Create usePiperTts hook and TtsButton component

Files: ui/src/hooks/usePiperTts.ts, ui/src/components/TtsButton.tsx

Read first:

- ui/src/components/VoiceRecordButton.tsx (reference for button style patterns)
- ui/src/components/ui/button.tsx (Button component API)

**0. Install piper-tts-web:**

```bash
pnpm --filter @paperclipai/ui add @mintplex-labs/piper-tts-web
```

**1. Create ui/src/hooks/usePiperTts.ts:**

```typescript
import { useState, useCallback, useRef } from "react";
import { tts } from "@mintplex-labs/piper-tts-web";

const DEFAULT_VOICE = "en_US-hfc_female-medium";

export type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";

export function usePiperTts() {
  const [status, setStatus] = useState<TtsStatus>("idle");
  const [progress, setProgress] = useState(0);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  // Pause the current clip, if any, and release the object URL backing it.
  const releaseAudio = useCallback(() => {
    const audio = audioRef.current;
    if (!audio) return;
    audio.onended = null;
    audio.onerror = null;
    audio.pause();
    URL.revokeObjectURL(audio.src);
    audioRef.current = null;
  }, []);

  const prewarm = useCallback(async () => {
    if (status === "ready" || status === "downloading") return;
    setStatus("downloading");
    setProgress(0);
    try {
      // Skip the download when the voice model is already cached.
      const stored = await tts.stored();
      if (!stored.includes(DEFAULT_VOICE)) {
        await tts.download(DEFAULT_VOICE, (p: { loaded: number; total: number }) => {
          setProgress(Math.round((p.loaded / p.total) * 100));
        });
      }
      setStatus("ready");
      setProgress(100);
    } catch {
      setStatus("error");
    }
  }, [status]);

  const speak = useCallback(async (text: string) => {
    if (status !== "ready") return;
    // Stop any currently playing audio
    releaseAudio();
    setStatus("speaking");
    try {
      // predict() resolves to a WAV Blob, so wrap it in an object URL for playback.
      const wav = await tts.predict({ text, voiceId: DEFAULT_VOICE });
      const audio = new Audio(URL.createObjectURL(wav));
      audioRef.current = audio;
      // Return to "ready" whether playback finishes or fails.
      const finish = () => {
        releaseAudio();
        setStatus("ready");
      };
      audio.onended = finish;
      audio.onerror = finish;
      await audio.play();
    } catch {
      releaseAudio();
      setStatus("ready");
    }
  }, [status, releaseAudio]);

  const stop = useCallback(() => {
    releaseAudio();
    if (status === "speaking") setStatus("ready");
  }, [status, releaseAudio]);

  return { status, progress, prewarm, speak, stop };
}
```

Key points:

- `tts.stored()` checks the IndexedDB cache and skips the download if the model is already present (VOICE-02).
- `tts.download()` with a progress callback provides visible download progress (VOICE-02).
- `tts.predict()` resolves to a WAV Blob, not a URL: wrap it with `URL.createObjectURL()` before passing it to `new Audio(url)`, and revoke the URL once playback ends (VOICE-01, CPU-safe WASM).
- `stop()` allows interrupting playback.
- Do NOT import this in any server-side or test file running in Node; it is browser-only.
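The progress callback reports raw byte counts, so the percentage shown to the user has to be derived from them. A minimal sketch of that arithmetic follows, assuming the callback payload carries `loaded` and `total` as in the hook above. `toPercent` is a hypothetical helper name, and the guard for a zero or non-finite total is an added precaution rather than something the plan specifies.

```typescript
// Hypothetical helper: convert byte counts from a download progress event
// into a whole-number percentage that is always within 0..100.
function toPercent(loaded: number, total: number): number {
  // A zero or unknown total would otherwise produce NaN or Infinity.
  if (!isFinite(loaded) || !isFinite(total) || total <= 0) return 0;
  const pct = Math.round((loaded / total) * 100);
  return Math.min(100, Math.max(0, pct));
}

console.log(toPercent(0, 0));               // 0
console.log(toPercent(31457280, 62914560)); // 50
console.log(toPercent(70, 60));             // 100 (clamped)
```

Clamping keeps the label in TtsButton from briefly showing values such as "NaN%" if the first event arrives before the total size is known.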

**2. Create ui/src/components/TtsButton.tsx:**

```tsx
import { Volume2, VolumeX, Loader2 } from "lucide-react";
import { Button } from "./ui/button";
import type { TtsStatus } from "../hooks/usePiperTts";

interface TtsButtonProps {
  status: TtsStatus;
  progress: number;
  onSpeak: () => void;
  onStop: () => void;
  onPrewarm: () => void;
  disabled?: boolean;
}

export function TtsButton({ status, progress, onSpeak, onStop, onPrewarm, disabled }: TtsButtonProps) {
  if (status === "downloading") {
    return (
      <Button variant="ghost" size="icon" className="h-8 w-8 relative" disabled title={`Downloading voice model: ${progress}%`}>
        <Loader2 className="h-4 w-4 animate-spin" />
        <span className="absolute -bottom-1 text-[10px] text-muted-foreground">{progress}%</span>
      </Button>
    );
  }

  if (status === "speaking") {
    return (
      <Button
        variant="ghost"
        size="icon"
        className="h-8 w-8 text-primary"
        onClick={onStop}
        aria-label="Stop speaking"
        title="Stop speaking"
      >
        <VolumeX className="h-4 w-4" />
      </Button>
    );
  }

  // idle: clicking calls onPrewarm; this component does not call onSpeak itself afterwards
  // ready: clicking triggers speak directly
  // error: the button is disabled below, so no click is handled
  const handleClick = () => {
    if (status === "ready") {
      onSpeak();
    } else {
      onPrewarm();
    }
  };

  return (
    <Button
      variant="ghost"
      size="icon"
      className="h-8 w-8"
      onClick={handleClick}
      disabled={disabled || status === "error"}
      aria-label="Read aloud"
      title={status === "error" ? "TTS unavailable" : status === "idle" ? "Download voice model and read aloud" : "Read aloud"}
    >
      <Volume2 className="h-4 w-4" />
    </Button>
  );
}
```

The TtsButton receives status/progress from the hook and delegates actions. It does NOT import piper-tts-web directly; all TTS logic stays in the hook. The button is reusable: PersonalAssistant (Plan 02) will place it next to assistant messages.

Verify (the braces group the two piper-tts-web checks so that a failure of the earlier greps cannot be masked by the lockfile fallback):

`cd /opt/nexus && grep -q "usePiperTts" ui/src/hooks/usePiperTts.ts && grep -q "TtsButton" ui/src/components/TtsButton.tsx && { grep -q "piper-tts-web" ui/package.json 2>/dev/null || grep -q "piper-tts-web" pnpm-lock.yaml; } && echo "PASS" || echo "FAIL"`

<acceptance_criteria>
- `grep -q "tts.download" ui/src/hooks/usePiperTts.ts` returns 0
- `grep -q "tts.predict" ui/src/hooks/usePiperTts.ts` returns 0
- `grep -q "tts.stored" ui/src/hooks/usePiperTts.ts` returns 0
- `grep -q "TtsButton" ui/src/components/TtsButton.tsx` returns 0
- `grep -q "piper-tts-web" pnpm-lock.yaml` returns 0
- `grep -q "Volume2" ui/src/components/TtsButton.tsx` returns 0
</acceptance_criteria>

Done when: the usePiperTts hook handles download progress (VOICE-02) and CPU-safe WASM synthesis (VOICE-01), TtsButton shows download progress during prewarm and a speaker icon for playback, and piper-tts-web is installed as a UI dependency.
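The button's behavior reduces to a mapping from hook status to the callback a click invokes. The sketch below restates that mapping as a pure function so it can be checked in isolation. `clickActionFor` and the `ClickAction` type are hypothetical names introduced for illustration, and the `disabled` parameter mirrors the component prop of the same name.

```typescript
type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";
// Hypothetical type: which callback a click would invoke, if any.
type ClickAction = "prewarm" | "speak" | "stop" | "none";

function clickActionFor(status: TtsStatus, disabled = false): ClickAction {
  switch (status) {
    case "downloading":
      return "none"; // the progress button is rendered disabled
    case "speaking":
      return "stop"; // the stop button does not consult the disabled prop
    case "error":
      return "none"; // the speaker button is disabled on error
    case "ready":
      return disabled ? "none" : "speak";
    case "idle":
      return disabled ? "none" : "prewarm";
  }
}

const statuses: TtsStatus[] = ["idle", "downloading", "ready", "speaking", "error"];
for (const s of statuses) {
  console.log(s, "->", clickActionFor(s));
}
```

One consequence visible in this mapping is that a first click from `idle` only starts the download; whether speech follows automatically is left to whoever wires `onPrewarm` in the parent component.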

- `grep -q "chatFileRoutes" server/src/app.ts`: route is registered
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts`: settings schema extended
- `ls ui/src/hooks/usePiperTts.ts ui/src/components/TtsButton.tsx`: both files exist
- `npx vitest run server/src/__tests__/chat-file-routes.test.ts`: existing route tests pass

<success_criteria>

- POST /api/transcribe returns 503 (not 404) when no Whisper CLI is installed, which confirms the route is mounted
- usePiperTts hook exports prewarm(), speak(), stop(), status, progress
- TtsButton renders download progress during prewarm and speaker icon for playback
- voiceEnabled persists in nexus-settings.json

</success_criteria>

After completion, create `.planning/phases/34-voice/34-01-SUMMARY.md`
