nexus/.planning/phases/34-voice/34-02-PLAN.md at e7090f63c3e9032cb039def3a2923ab15c025ccc

mikkel/nexus

Fork 0

Nexus Dev af77ef6da8 docs(34-voice): create phase plan

2026-04-03 22:29:45 +00:00

15 KiB

Raw Blame History

phase

plan

type

wave

depends_on

files_modified

autonomous

requirements

must_haves

34-voice

execute

34-01

ui/src/components/onboarding/VoiceStep.tsx

ui/src/components/NexusOnboardingWizard.tsx

ui/src/pages/PersonalAssistant.tsx

true

VOICE-03

truths

artifacts

key_links

Onboarding wizard shows a voice opt-in step (step 4) with enable/skip buttons

Step numbering shifts correctly: rootDir is step 5, summary is step 6

VoiceRecordButton appears in PersonalAssistant input bar for STT

TtsButton appears next to assistant messages in PersonalAssistant for TTS playback

Voice step detects microphone availability and shows appropriate messaging

path

provides

exports

ui/src/components/onboarding/VoiceStep.tsx

Voice opt-in onboarding step

VoiceStep

path	provides
ui/src/components/NexusOnboardingWizard.tsx	6-step wizard with voice at step 4

path	provides
ui/src/pages/PersonalAssistant.tsx	Full-page assistant chat with voice input and TTS

from	to	via	pattern
ui/src/components/NexusOnboardingWizard.tsx	ui/src/components/onboarding/VoiceStep.tsx	import and render at step 4	VoiceStep

from	to	via	pattern
ui/src/pages/PersonalAssistant.tsx	ui/src/components/VoiceRecordButton.tsx	import and render in input bar	VoiceRecordButton

from	to	via	pattern
ui/src/pages/PersonalAssistant.tsx	ui/src/hooks/usePiperTts.ts	usePiperTts hook for TTS playback	usePiperTts

Add the onboarding voice step and wire VoiceRecordButton + TtsButton into PersonalAssistant.

Purpose: VOICE-03 requires voice features offered during onboarding based on hardware capability. The PersonalAssistant is the primary chat surface for v1.5 and must have both STT (VoiceRecordButton) and TTS (TtsButton) controls.

Output: VoiceStep component, updated 6-step wizard, PersonalAssistant with voice I/O.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/34-voice/34-RESEARCH.md @.planning/phases/34-voice/34-01-SUMMARY.md

@ui/src/components/NexusOnboardingWizard.tsx @ui/src/pages/PersonalAssistant.tsx @ui/src/components/VoiceRecordButton.tsx

From ui/src/hooks/usePiperTts.ts (created in Plan 01):

export type TtsStatus = "idle" | "downloading" | "ready" | "speaking" | "error";
export function usePiperTts(): {
  status: TtsStatus;
  progress: number;
  prewarm: () => Promise<void>;
  speak: (text: string) => Promise<void>;
  stop: () => void;
};

From ui/src/components/TtsButton.tsx (created in Plan 01):

interface TtsButtonProps {
  status: TtsStatus;
  progress: number;
  onSpeak: () => void;
  onStop: () => void;
  onPrewarm: () => void;
  disabled?: boolean;
}
export function TtsButton(props: TtsButtonProps): JSX.Element;

From ui/src/api/hardware.ts (updated in Plan 01):

export interface NexusSettings {
  mode: NexusMode;
  voiceEnabled?: boolean;
}
export function updateNexusSettings(settings: Partial<NexusSettings>): Promise<NexusSettings>;

From ui/src/components/VoiceRecordButton.tsx (existing):

interface VoiceRecordButtonProps {
  onTranscription: (text: string) => void;
  disabled?: boolean;
}
export function VoiceRecordButton(props: VoiceRecordButtonProps): JSX.Element;

From ui/src/components/NexusOnboardingWizard.tsx (existing step structure):

Step 1: hardware detection
Step 2: mode selection
Step 3: provider selection
Step 4: root directory (will become step 5)
Step 5: summary (will become step 6)
Step state: const [step, setStep] = useState(1);
Label: {step === 5 ? "Summary" : \Step ${step} of 4`}`

Task 1: Create VoiceStep component and insert into NexusOnboardingWizard as step 4 ui/src/components/onboarding/VoiceStep.tsx, ui/src/components/NexusOnboardingWizard.tsx - ui/src/components/NexusOnboardingWizard.tsx (full file — understand all setStep calls and step rendering) - ui/src/components/onboarding/ModeSelector.tsx (reference for onboarding step component patterns) - ui/src/components/onboarding/HardwareSummaryStep.tsx (reference for step component patterns) - ui/src/api/hardware.ts (NexusSettings type with voiceEnabled) **1. Create ui/src/components/onboarding/VoiceStep.tsx:**

import { useEffect, useState } from "react";
import { Mic, Volume2 } from "lucide-react";
import { Button } from "@/components/ui/button";

interface VoiceStepProps {
  onEnable: () => void;
  onSkip: () => void;
}

export function VoiceStep({ onEnable, onSkip }: VoiceStepProps) {
  const [micAvailable, setMicAvailable] = useState<boolean | null>(null);

  useEffect(() => {
    navigator.mediaDevices?.enumerateDevices()
      .then(devices => setMicAvailable(devices.some(d => d.kind === "audioinput")))
      .catch(() => setMicAvailable(false));
  }, []);

  return (
    <div className="flex flex-col gap-4">
      <div className="flex flex-col gap-3">
        <div className="flex items-center gap-3 rounded-lg border p-3">
          <Mic className="h-5 w-5 text-primary shrink-0" />
          <div>
            <p className="text-sm font-medium">Speech-to-Text (Whisper)</p>
            <p className="text-xs text-muted-foreground">
              {micAvailable === false
                ? "No microphone detected — unavailable"
                : micAvailable === true
                  ? "Microphone detected — speak to your assistant"
                  : "Checking microphone..."}
            </p>
          </div>
        </div>

        <div className="flex items-center gap-3 rounded-lg border p-3">
          <Volume2 className="h-5 w-5 text-primary shrink-0" />
          <div>
            <p className="text-sm font-medium">Text-to-Speech (Piper)</p>
            <p className="text-xs text-muted-foreground">
              Hear responses read aloud. Runs entirely on your device — no server needed.
            </p>
          </div>
        </div>
      </div>

      <div className="flex flex-col gap-2">
        <Button onClick={onEnable} className="w-full">
          Enable voice
        </Button>
        <Button variant="ghost" onClick={onSkip} className="w-full">
          Skip
        </Button>
      </div>
    </div>
  );
}

2. Update NexusOnboardingWizard.tsx — insert step 4 (voice), shift steps:

This is a precise step-number shift. Do a full audit of all setStep(N) calls and update:

a. Add imports at top:

import { VoiceStep } from "./onboarding/VoiceStep";
import { updateNexusSettings } from "../api/hardware"; (already imported)

b. Add voiceEnabled state:

const [voiceEnabled, setVoiceEnabled] = useState(false);

c. Step number shift — ALL occurrences:

Old step 4 (rootDir) becomes step 5
Old step 5 (summary) becomes step 6
Every setStep(4) that meant "go to rootDir" becomes setStep(5)
Every setStep(5) that meant "go to summary" becomes setStep(6)
The Back button on old step 4 (rootDir) that said setStep(3) becomes setStep(4) (back to voice)
Old step 3 (provider) onSkip/onContinue setStep(4) becomes setStep(4) (now goes to voice, not rootDir) — no change needed here since 4 IS the voice step

d. Step indicator label:

Change {step === 5 ? "Summary" : \Step ${step} of 4`}to{step === 6 ? "Summary" : `Step ${step} of 5`}`

e. Reset voiceEnabled in the cleanup useEffect:

Add setVoiceEnabled(false); alongside other resets

f. Add step 4 rendering block (voice) — insert between step 3 and the rootDir step:

{/* Step 4 — Voice */}
{step === 4 && (
  <>
    <div className="flex flex-col gap-2 text-center">
      <h1 className="text-2xl font-semibold tracking-tight">
        Voice features
      </h1>
      <p className="text-sm text-muted-foreground">
        Speak to your assistant and hear responses read aloud. Runs entirely on your device.
      </p>
    </div>

    <VoiceStep
      onEnable={() => {
        setVoiceEnabled(true);
        setStep(5);
      }}
      onSkip={() => setStep(5)}
    />

    <Button
      type="button"
      variant="ghost"
      onClick={() => setStep(3)}
      className="w-full"
    >
      Back
    </Button>
  </>
)}

g. Persist voiceEnabled in createWorkspace() — add after the existing mode save:

// Persist voice preference — non-blocking
if (voiceEnabled) {
  try {
    await updateNexusSettings({ voiceEnabled: true });
  } catch {
    // Non-blocking
  }
}

h. Update old step 4 comment to say "Step 5 — Root Directory (was step 4)" Update old step 5 comment to say "Step 6 — Summary (was step 5)"

Step number audit checklist (verify each):

Step 3 provider: onSkip → setStep(4) (voice) -- was already 4, now means voice
Step 3 provider: onContinue → setStep(4) (voice) -- same
Step 4 (NEW voice): Enable → setStep(5), Skip → setStep(5), Back → setStep(3)
Step 5 (was 4, rootDir): "Review & finish" → setStep(6), Back → setStep(4) (voice), "Skip to summary" → setStep(6)
Step 6 (was 5, summary): onBack → setStep(5) (rootDir)
Step rendering: {step === 4 && ...} for rootDir becomes {step === 5 && ...}
Step rendering: {step === 5 && ...} for summary becomes {step === 6 && ...} cd /opt/nexus && grep -c "setStep" ui/src/components/NexusOnboardingWizard.tsx && grep -q "VoiceStep" ui/src/components/NexusOnboardingWizard.tsx && grep -q "step === 4" ui/src/components/NexusOnboardingWizard.tsx && grep -q "Step 4" ui/src/components/onboarding/VoiceStep.tsx 2>/dev/null; grep -q "VoiceStep" ui/src/components/onboarding/VoiceStep.tsx && echo "PASS" || echo "FAIL" <acceptance_criteria>
- grep -q "VoiceStep" ui/src/components/onboarding/VoiceStep.tsx returns 0
- grep -q "VoiceStep" ui/src/components/NexusOnboardingWizard.tsx returns 0
- grep -q "step === 6" ui/src/components/NexusOnboardingWizard.tsx returns 0 (summary is now step 6)
- grep -q "Step.*of 5" ui/src/components/NexusOnboardingWizard.tsx returns 0 (label updated from "of 4")
- grep -q "voiceEnabled" ui/src/components/NexusOnboardingWizard.tsx returns 0
- grep -q "enumerateDevices" ui/src/components/onboarding/VoiceStep.tsx returns 0 </acceptance_criteria> Onboarding wizard has 6 steps with voice at step 4. VoiceStep probes mic availability and offers enable/skip. voiceEnabled is persisted on workspace creation. All setStep() calls use correct updated numbers.

Task 2: Wire VoiceRecordButton and TtsButton into PersonalAssistant ui/src/pages/PersonalAssistant.tsx - ui/src/pages/PersonalAssistant.tsx (full file — understand input bar and message rendering) - ui/src/components/VoiceRecordButton.tsx (props interface) - ui/src/components/TtsButton.tsx (props interface, from Plan 01) - ui/src/hooks/usePiperTts.ts (hook API, from Plan 01) **1. Add imports to PersonalAssistant.tsx:** ```typescript import { VoiceRecordButton } from "@/components/VoiceRecordButton"; import { TtsButton } from "@/components/TtsButton"; import { usePiperTts } from "../hooks/usePiperTts"; import { Volume2 } from "lucide-react"; ```

2. Add usePiperTts hook in PersonalAssistant component body:

const { status: ttsStatus, progress: ttsProgress, prewarm, speak, stop } = usePiperTts();

3. Add VoiceRecordButton to the input bar: In the input bar section ({selectedConvId && ( <div className="px-6 py-4 ..."> ... </div> )}), add VoiceRecordButton inside the <div className="flex gap-3 items-end"> container, between the textarea and Send button:

<VoiceRecordButton
  onTranscription={(text) => setInputValue((prev) => prev ? prev + " " + text : text)}
  disabled={isSending}
/>

The onTranscription callback appends transcribed text to the input field (does not auto-send). This lets users review before sending.

4. Add TtsButton next to assistant messages in MessageBubble: Modify the MessageBubble component to accept an optional onSpeak callback and show a TtsButton for assistant messages:

Actually, a cleaner approach: add the TtsButton inline where messages are rendered, not inside MessageBubble (to avoid prop drilling the hook through). In the messages.map section, render a small TTS button after each assistant message:

{messages.map((msg) => (
  <div key={msg.id}>
    <MessageBubble message={msg} />
    {msg.role === "assistant" && msg.content && (
      <div className="flex justify-start pl-10 -mt-1 mb-1">
        <TtsButton
          status={ttsStatus}
          progress={ttsProgress}
          onSpeak={() => speak(msg.content)}
          onStop={stop}
          onPrewarm={prewarm}
        />
      </div>
    )}
  </div>
))}

The pl-10 aligns the button under the message bubble (past the avatar). The -mt-1 mb-1 tucks it close.

5. Auto-prewarm TTS when PersonalAssistant mounts (optional optimization): Do NOT auto-prewarm. Let the user trigger it on first click of any TtsButton. This avoids unexpected downloads. cd /opt/nexus && grep -q "VoiceRecordButton" ui/src/pages/PersonalAssistant.tsx && grep -q "TtsButton" ui/src/pages/PersonalAssistant.tsx && grep -q "usePiperTts" ui/src/pages/PersonalAssistant.tsx && echo "PASS" || echo "FAIL" <acceptance_criteria> - grep -q "VoiceRecordButton" ui/src/pages/PersonalAssistant.tsx returns 0 - grep -q "onTranscription" ui/src/pages/PersonalAssistant.tsx returns 0 - grep -q "TtsButton" ui/src/pages/PersonalAssistant.tsx returns 0 - grep -q "usePiperTts" ui/src/pages/PersonalAssistant.tsx returns 0 - grep -q "speak" ui/src/pages/PersonalAssistant.tsx returns 0 </acceptance_criteria> PersonalAssistant has VoiceRecordButton in the input bar (STT via /api/transcribe) and TtsButton next to each assistant message (TTS via Piper WASM). Voice input appends to textarea for review before sending.

- `grep -q "VoiceStep" ui/src/components/NexusOnboardingWizard.tsx` — voice step integrated - `grep -q "step === 6" ui/src/components/NexusOnboardingWizard.tsx` — summary correctly at step 6 - `grep -q "VoiceRecordButton" ui/src/pages/PersonalAssistant.tsx` — STT wired - `grep -q "TtsButton" ui/src/pages/PersonalAssistant.tsx` — TTS wired - `grep -q "enumerateDevices" ui/src/components/onboarding/VoiceStep.tsx` — mic detection

<success_criteria>

Onboarding wizard has voice at step 4 with mic detection and enable/skip (VOICE-03)
Steps 5 (rootDir) and 6 (summary) work with correct Back/Continue navigation
PersonalAssistant has VoiceRecordButton for STT input
PersonalAssistant has TtsButton for TTS playback on assistant messages
voiceEnabled preference is persisted when user enables voice during onboarding </success_criteria>

After completion, create `.planning/phases/34-voice/34-02-SUMMARY.md`

15 KiB Raw Blame History

15 KiB

Raw Blame History