nexus/.planning/phases/37-web-chat-voice-ui/37-03-PLAN.md

---
phase: 37-web-chat-voice-ui
plan: 03
type: execute
wave: 2
depends_on: ["37-01"]
files_modified:
- ui/src/components/ChatVoicePlayer.tsx
- ui/src/components/ChatVoiceBadge.tsx
- ui/src/components/VoiceModeToggle.tsx
autonomous: true
requirements:
- WCHAT-04
- WCHAT-05
- WCHAT-06
must_haves:
truths:
- "ChatVoicePlayer renders inline audio player with play/pause controls"
- "ChatVoicePlayer auto-plays when autoPlay setting is true"
- "ChatVoiceBadge shows 'Voice' badge on voice messages"
- "ChatVoiceBadge has collapsible full markdown section for voice_full messages"
- "VoiceModeToggle renders three pills: Text / Voice In / Full Voice"
- "VoiceModeToggle persists selection via useVoiceMode hook"
- "Auto-play preference stored in localStorage under nexus:voice:autoplay"
artifacts:
- path: "ui/src/components/ChatVoicePlayer.tsx"
provides: "Inline audio player for synthesized voice responses"
exports: ["ChatVoicePlayer"]
- path: "ui/src/components/ChatVoiceBadge.tsx"
provides: "Voice badge + collapsible markdown on agent messages"
exports: ["ChatVoiceBadge"]
- path: "ui/src/components/VoiceModeToggle.tsx"
provides: "Three-state pill toggle for voice mode"
exports: ["VoiceModeToggle"]
key_links:
- from: "ui/src/components/ChatVoicePlayer.tsx"
to: "/api/synthesize"
via: "fetch POST to get audio blob"
pattern: "fetch.*api/synthesize"
- from: "ui/src/components/ChatVoiceBadge.tsx"
to: "shadcn Collapsible"
via: "Collapsible/CollapsibleContent/CollapsibleTrigger"
pattern: "Collapsible"
- from: "ui/src/components/VoiceModeToggle.tsx"
to: "ui/src/hooks/useVoiceMode.ts"
via: "useVoiceMode() hook"
pattern: "useVoiceMode"
---
<objective>
Build the voice output and mode selection components: ChatVoicePlayer for inline audio playback, ChatVoiceBadge for voice message display, and VoiceModeToggle for switching between text/voice_input/full_voice modes.
Purpose: These components handle the output side of voice I/O (playing synthesized responses, showing voice badges on messages) and the mode selector that controls the entire voice behavior.
Output: 3 new component files — ChatVoicePlayer, ChatVoiceBadge, VoiceModeToggle
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/37-web-chat-voice-ui/37-RESEARCH.md
<interfaces>
<!-- From Plan 01 — synthesize endpoint -->
```
POST /api/synthesize
Body: { text: string, voiceId?: string }
Response: audio/wav binary buffer
```
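A minimal sketch of how the request for this endpoint could be assembled. `buildSynthesizeRequest` and its types are hypothetical names used for illustration only, and the JSON `Content-Type` header is an assumption, since the interface above states only the body fields:

```typescript
// Hypothetical helper (not part of Plan 01): builds the arguments for fetch().
interface SynthesizeBody {
  text: string;
  voiceId?: string;
}

interface SynthesizeRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    credentials: "include";
    body: string;
  };
}

function buildSynthesizeRequest(text: string, voiceId?: string): SynthesizeRequest {
  // Only include voiceId when the caller provides one.
  const body: SynthesizeBody = voiceId ? { text, voiceId } : { text };
  return {
    url: "/api/synthesize",
    init: {
      method: "POST",
      // Assumption: the endpoint accepts a JSON-encoded body.
      headers: { "Content-Type": "application/json" },
      // Send session cookies with the request.
      credentials: "include",
      body: JSON.stringify(body),
    },
  };
}
```

In a component this might be used as `const { url, init } = buildSynthesizeRequest(text); const res = await fetch(url, init);`, followed by `res.blob()` to obtain the audio/wav payload.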
<!-- From useVoiceMode hook (Plan 02) -->
```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
export function useVoiceMode(): {
mode: VoiceMode;
setMode: (next: VoiceMode) => Promise<void>;
isLoading: boolean;
}
```
<!-- ChatMessage messageType values for voice -->
```
messageType: "voice_input" → user sent via voice, agent replied with text
messageType: "voice_full" → user sent via voice, agent replied with SPOKEN + DETAILED format
```
<!-- SPOKEN/DETAILED content format from formatForVoice -->
```
SPOKEN: <concise spoken version of the response>
DETAILED: <full markdown response with code blocks etc>
```
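A sketch of how content in this format could be split into its two sections. The regular expressions are the ones this plan specifies for ChatVoiceBadge; the names `parseVoiceContent` and `ParsedVoiceContent` are hypothetical:

```typescript
// Hypothetical helper: splits SPOKEN/DETAILED content into its two parts.
interface ParsedVoiceContent {
  spoken: string;          // text to display and synthesize
  detailed: string | null; // full markdown, or null when no DETAILED section exists
}

function parseVoiceContent(content: string): ParsedVoiceContent {
  // Capture everything after "SPOKEN:" up to a line starting with "DETAILED:" or the end.
  const spokenMatch = content.match(/SPOKEN:\s*([\s\S]*?)(?=\nDETAILED:|$)/);
  // Capture everything after "DETAILED:" to the end of the content.
  const detailedMatch = content.match(/DETAILED:\s*([\s\S]*)/);
  return {
    // Fall back to the whole content when there is no SPOKEN marker.
    spoken: spokenMatch?.[1]?.trim() ?? content,
    detailed: detailedMatch?.[1]?.trim() ?? null,
  };
}
```

Content without either marker falls through unchanged as `spoken`, which keeps plain-text replies renderable.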
<!-- shadcn components already available -->
```typescript
import { Badge } from "@/components/ui/badge";
import { Collapsible, CollapsibleContent, CollapsibleTrigger } from "@/components/ui/collapsible";
import { Button } from "@/components/ui/button";
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Create ChatVoicePlayer and ChatVoiceBadge components</name>
<files>
ui/src/components/ChatVoicePlayer.tsx,
ui/src/components/ChatVoiceBadge.tsx
</files>
<read_first>
ui/src/components/ChatMessage.tsx,
ui/src/components/ChatMarkdownMessage.tsx
</read_first>
<action>
1. **ui/src/components/ChatVoicePlayer.tsx** — Inline audio player for voice responses:
```typescript
interface ChatVoicePlayerProps {
text: string; // The spoken text to synthesize
autoPlay?: boolean; // Whether to auto-play on mount
}
export function ChatVoicePlayer({ text, autoPlay = false }: ChatVoicePlayerProps)
```
Implementation:
- State: `status: "idle" | "loading" | "playing" | "paused"`, `audioUrl: string | null`
- On mount (or when text changes):
- Set status to "loading"
- POST /api/synthesize with `{ text }`, credentials: "include"
- If the request throws or `res.ok` is false, set status to "idle" and leave audioUrl as null
- Get response as blob: `const blob = await res.blob()`
- Create object URL: `const url = URL.createObjectURL(blob)`
- Store url in state, set status to "idle"
- Create `<audio>` element ref. Set src to audioUrl when available.
- If autoPlay is true AND audioUrl is set, call `audioRef.current.play()` and let `onPlay` set status to "playing". `play()` returns a promise that rejects when the browser blocks autoplay; catch it and leave status at "idle".
- Audio event listeners:
- `onEnded`: set status to "idle". Do not revoke the blob URL here; revoking it can break replay from the Play button.
- `onPause`: set status to "paused"
- `onPlay`: set status to "playing"
- Render:
- loading: `<Loader2 className="h-3 w-3 animate-spin" />` with "Loading audio..." text
- idle/paused: `<Button variant="ghost" size="sm">` with `<Play className="h-3 w-3" />` icon. onClick: `audioRef.current.play()`. Disabled while audioUrl is null.
- playing: `<Button variant="ghost" size="sm">` with `<Pause className="h-3 w-3" />` icon. onClick: `audioRef.current.pause()`
- Hidden `<audio ref={audioRef} />` element with aria-label="Voice response"
- Import Play, Pause, Loader2 from lucide-react
- Cleanup: revoke the blob URL via `URL.revokeObjectURL(audioUrl)` on unmount and before replacing it when text changes
2. **ui/src/components/ChatVoiceBadge.tsx** — Voice badge + collapsible markdown:
```typescript
interface ChatVoiceBadgeProps {
content: string;
messageType: string; // "voice_input" | "voice_full"
autoPlayVoice?: boolean;
}
export function ChatVoiceBadge({ content, messageType, autoPlayVoice = false }: ChatVoiceBadgeProps)
```
Implementation:
- Parse content for SPOKEN/DETAILED sections:
```typescript
const spokenMatch = content.match(/SPOKEN:\s*([\s\S]*?)(?=\nDETAILED:|$)/);
const spokenText = spokenMatch?.[1]?.trim() ?? content;
const detailedMatch = content.match(/DETAILED:\s*([\s\S]*)/);
```
- Render:
a. `<Badge variant="outline" className="text-xs mb-2">Voice</Badge>`
b. `<p className="text-sm">{spokenText}</p>`
c. If messageType === "voice_full":
- `<ChatVoicePlayer text={spokenText} autoPlay={autoPlayVoice} />`
- If detailedMatch exists, render shadcn Collapsible as a controlled component. Declare `const [open, setOpen] = useState(false)` at the top of the component so the trigger label can read `open`:
```
<Collapsible open={open} onOpenChange={setOpen}>
<CollapsibleTrigger className="text-xs text-muted-foreground hover:text-foreground mt-1">
{open ? "Hide full response" : "Show full response"}
</CollapsibleTrigger>
<CollapsibleContent>
<ChatMarkdownMessage content={detailedMatch[1].trim()} />
</CollapsibleContent>
</Collapsible>
```
- For voice_input messageType: just show badge + spoken text, no player, no collapsible
- Import ChatVoicePlayer from ./ChatVoicePlayer
- Import ChatMarkdownMessage from ./ChatMarkdownMessage (already exists in codebase)
- Import Badge from @/components/ui/badge
- Import Collapsible, CollapsibleContent, CollapsibleTrigger from @/components/ui/collapsible
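The ChatVoicePlayer status transitions described in item 1 can be sketched as a small pure function. This is an illustrative model only: `PlayerEvent` and `nextStatus` are hypothetical names, and the component may equally set state directly inside its event handlers:

```typescript
type PlayerStatus = "idle" | "loading" | "playing" | "paused";

// Hypothetical event names covering the synthesize fetch and the <audio> callbacks.
type PlayerEvent = "fetch_start" | "fetch_done" | "play" | "pause" | "ended";

function nextStatus(current: PlayerStatus, event: PlayerEvent): PlayerStatus {
  switch (event) {
    case "fetch_start":
      return "loading"; // request to /api/synthesize is in flight
    case "fetch_done":
      return "idle";    // object URL is ready (or the fetch failed)
    case "play":
      return "playing"; // onPlay
    case "pause":
      return "paused";  // onPause
    case "ended":
      return "idle";    // onEnded: ready to be replayed
    default:
      return current;   // unknown events leave the status unchanged
  }
}
```

Because `ended` always maps to "idle", the player returns to a replayable state even if a `pause` event arrives immediately before it.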
</action>
<verify>
<automated>cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/src/components/ChatVoicePlayer.tsx && test -f ui/src/components/ChatVoiceBadge.tsx && grep -q "export function ChatVoicePlayer" ui/src/components/ChatVoicePlayer.tsx && grep -q "export function ChatVoiceBadge" ui/src/components/ChatVoiceBadge.tsx && grep -q "api/synthesize" ui/src/components/ChatVoicePlayer.tsx && grep -q "URL.createObjectURL" ui/src/components/ChatVoicePlayer.tsx && grep -q "URL.revokeObjectURL" ui/src/components/ChatVoicePlayer.tsx && grep -q "Collapsible" ui/src/components/ChatVoiceBadge.tsx && grep -q "Show full response" ui/src/components/ChatVoiceBadge.tsx && grep -q "Badge" ui/src/components/ChatVoiceBadge.tsx && grep -q "SPOKEN:" ui/src/components/ChatVoiceBadge.tsx && echo "PASS" || echo "FAIL"</automated>
</verify>
<acceptance_criteria>
- grep "export function ChatVoicePlayer" ui/src/components/ChatVoicePlayer.tsx returns match
- grep "export function ChatVoiceBadge" ui/src/components/ChatVoiceBadge.tsx returns match
- grep "api/synthesize" ui/src/components/ChatVoicePlayer.tsx returns match
- grep "URL.createObjectURL" ui/src/components/ChatVoicePlayer.tsx returns match
- grep "URL.revokeObjectURL" ui/src/components/ChatVoicePlayer.tsx returns match
- grep "<audio" ui/src/components/ChatVoicePlayer.tsx returns match (native audio element)
- grep "aria-label.*Voice response" ui/src/components/ChatVoicePlayer.tsx returns match
- grep "Collapsible" ui/src/components/ChatVoiceBadge.tsx returns match
- grep "Show full response" ui/src/components/ChatVoiceBadge.tsx returns match
- grep "Hide full response" ui/src/components/ChatVoiceBadge.tsx returns match
- grep "Badge.*Voice" ui/src/components/ChatVoiceBadge.tsx returns match
- grep "SPOKEN:" ui/src/components/ChatVoiceBadge.tsx returns match
- grep "ChatVoicePlayer" ui/src/components/ChatVoiceBadge.tsx returns match (imports it)
</acceptance_criteria>
<done>ChatVoicePlayer synthesizes and plays audio with play/pause controls, auto-play support, and proper blob URL cleanup. ChatVoiceBadge shows Voice badge, spoken text, optional audio player, and collapsible full markdown for voice_full messages.</done>
</task>
<task type="auto">
<name>Task 2: Create VoiceModeToggle three-pill component</name>
<files>
ui/src/components/VoiceModeToggle.tsx
</files>
<read_first>
ui/src/hooks/useVoiceMode.ts
</read_first>
<action>
**ui/src/components/VoiceModeToggle.tsx** — Three-state pill toggle:
```typescript
export function VoiceModeToggle()
```
Implementation:
- Call `useVoiceMode()` to get `{ mode, setMode, isLoading }`
- Read auto-play preference from localStorage: `localStorage.getItem("nexus:voice:autoplay") === "true"`
- Provide `autoPlay` state + toggle in the component for WCHAT-06 (auto-play configurable)
- Render a `<div role="group" aria-label="Voice mode" className="flex items-center gap-1">`:
- Three pill buttons, each a `<button>`:
- "Text" → `setMode("text")`
- "Voice In" → `setMode("voice_input")`
- "Full Voice" → `setMode("full_voice")`
- Active pill: `bg-primary text-primary-foreground` classes
- Inactive pills: `bg-muted text-muted-foreground` classes
- All pills: `rounded-full px-3 py-1 text-xs font-medium transition-colors`
- Disabled when isLoading
- Below the pills (only when mode is "full_voice"), render auto-play toggle:
```
<label className="flex items-center gap-2 text-xs text-muted-foreground mt-1">
<input
type="checkbox"
checked={autoPlay}
onChange={(e) => {
setAutoPlay(e.target.checked);
localStorage.setItem("nexus:voice:autoplay", String(e.target.checked));
}}
/>
Auto-play voice responses
</label>
```
- Do not export autoPlay state from VoiceModeToggle. Code that needs the preference, such as the caller supplying `autoPlayVoice` to ChatVoiceBadge, reads `nexus:voice:autoplay` from localStorage directly. Keep it simple.
- The auto-play checkbox label text per UI spec: "Auto-play voice responses"
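The pill styling and auto-play persistence described above can be sketched as plain helpers. `PILLS`, `pillClassName`, `readAutoPlay`, `writeAutoPlay`, and the `KeyValueStore` interface are hypothetical names; the interface stands in for `localStorage` so the logic can be exercised outside a browser:

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

const AUTOPLAY_KEY = "nexus:voice:autoplay";

// Minimal subset of the Web Storage API (window.localStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Pill labels in display order, mapped to the mode each one selects.
const PILLS: ReadonlyArray<{ label: string; mode: VoiceMode }> = [
  { label: "Text", mode: "text" },
  { label: "Voice In", mode: "voice_input" },
  { label: "Full Voice", mode: "full_voice" },
];

// Shared pill classes plus the active or inactive variant.
function pillClassName(pillMode: VoiceMode, activeMode: VoiceMode): string {
  const base = "rounded-full px-3 py-1 text-xs font-medium transition-colors";
  const state =
    pillMode === activeMode
      ? "bg-primary text-primary-foreground"
      : "bg-muted text-muted-foreground";
  return `${base} ${state}`;
}

// A missing key or any value other than "true" reads as disabled.
function readAutoPlay(store: KeyValueStore): boolean {
  return store.getItem(AUTOPLAY_KEY) === "true";
}

function writeAutoPlay(store: KeyValueStore, enabled: boolean): void {
  store.setItem(AUTOPLAY_KEY, String(enabled));
}
```

In the component, `readAutoPlay(window.localStorage)` could seed the `autoPlay` state and `writeAutoPlay` could run inside the checkbox `onChange` handler. Keeping the class strings as whole literals also keeps them matchable by the grep checks in this task.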
</action>
<verify>
<automated>cd /opt/nexus/.claude/worktrees/agent-a009558f && test -f ui/src/components/VoiceModeToggle.tsx && grep -q "export function VoiceModeToggle" ui/src/components/VoiceModeToggle.tsx && grep -q "useVoiceMode" ui/src/components/VoiceModeToggle.tsx && grep -q "Voice In" ui/src/components/VoiceModeToggle.tsx && grep -q "Full Voice" ui/src/components/VoiceModeToggle.tsx && grep -q "Text" ui/src/components/VoiceModeToggle.tsx && grep -q "bg-primary" ui/src/components/VoiceModeToggle.tsx && grep -q 'role="group"' ui/src/components/VoiceModeToggle.tsx && grep -q "nexus:voice:autoplay" ui/src/components/VoiceModeToggle.tsx && grep -q "Auto-play voice responses" ui/src/components/VoiceModeToggle.tsx && echo "PASS" || echo "FAIL"</automated>
</verify>
<acceptance_criteria>
- grep "export function VoiceModeToggle" ui/src/components/VoiceModeToggle.tsx returns match
- grep "useVoiceMode" ui/src/components/VoiceModeToggle.tsx returns match
- grep "Text" ui/src/components/VoiceModeToggle.tsx returns match (first pill)
- grep "Voice In" ui/src/components/VoiceModeToggle.tsx returns match (second pill)
- grep "Full Voice" ui/src/components/VoiceModeToggle.tsx returns match (third pill)
- grep "bg-primary text-primary-foreground" ui/src/components/VoiceModeToggle.tsx returns match (active state)
- grep "bg-muted text-muted-foreground" ui/src/components/VoiceModeToggle.tsx returns match (inactive state)
- grep 'role="group"' ui/src/components/VoiceModeToggle.tsx returns match
- grep 'aria-label="Voice mode"' ui/src/components/VoiceModeToggle.tsx returns match
- grep "nexus:voice:autoplay" ui/src/components/VoiceModeToggle.tsx returns match (localStorage key)
- grep "Auto-play voice responses" ui/src/components/VoiceModeToggle.tsx returns match
</acceptance_criteria>
<done>VoiceModeToggle renders three pills with active/inactive styling. Clicking a pill persists voiceMode to nexus-settings. Auto-play checkbox appears in full_voice mode and persists to localStorage.</done>
</task>
</tasks>
<verification>
- ChatVoicePlayer POSTs to /api/synthesize and plays via native audio element
- ChatVoicePlayer revokes blob URLs on cleanup (no memory leaks)
- ChatVoiceBadge parses SPOKEN/DETAILED content format
- ChatVoiceBadge shows collapsible section only for voice_full
- VoiceModeToggle has three pills with correct labels and accessibility
- Auto-play preference persisted in localStorage under nexus:voice:autoplay
</verification>
<success_criteria>
All three output-side voice components complete: ChatVoicePlayer plays synthesized audio with controls, ChatVoiceBadge renders voice badges with collapsible detail, VoiceModeToggle switches between text/voice_input/full_voice with persistence.
</success_criteria>
<output>
After completion, create `.planning/phases/37-web-chat-voice-ui/37-03-SUMMARY.md`
</output>