docs(34): research phase voice domain
parent 784d09d929
commit 1d8f1c5912
1 changed file with 470 additions and 0 deletions
.planning/phases/34-voice/34-RESEARCH.md (normal file, +470)
@@ -0,0 +1,470 @@

# Phase 34: Voice - Research

**Researched:** 2026-04-01
**Domain:** Browser STT (Whisper via smart-whisper), Browser TTS (Piper via @mintplex-labs/piper-tts-web WASM), Onboarding voice step
**Confidence:** MEDIUM

---

<user_constraints>
## User Constraints (from CONTEXT.md)

### Locked Decisions
None; all implementation choices are at Claude's discretion.

### Claude's Discretion
All implementation choices are at Claude's discretion.

### Deferred Ideas (OUT OF SCOPE)
None.
</user_constraints>

---

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| VOICE-01 | User gets Piper TTS speech output that works on CPU-only hardware | @mintplex-labs/piper-tts-web runs entirely in browser WASM via ONNX Runtime; no GPU needed |
| VOICE-02 | Piper TTS pre-warms on first use with visible download progress (no silent 15-30s hang) | `tts.download(voiceId, progressCallback)` API provides loaded/total bytes; render a progress bar before calling `predict()` |
| VOICE-03 | Voice features (Whisper STT + Piper TTS) offered during onboarding based on hardware capability | NexusOnboardingWizard currently has 5 steps; add a step 4 (voice) gated on `hardwareInfo.hardwareTier !== undefined`; all tiers can run voice since it is purely CPU-bound WASM |
</phase_requirements>

---

## Summary

Phase 34 adds two voice capabilities: speech-to-text (STT) via Whisper and text-to-speech (TTS) via Piper, plus an onboarding step where users can opt into voice features.

The STT side already has a server route (`POST /api/transcribe` in `chat-files.ts`) and a `VoiceRecordButton` component that calls it. The route is implemented correctly but has a critical gap: it is **exported from `routes/index.ts` but never registered in `app.ts`**, so `POST /api/transcribe` returns 404 at runtime. Fixing this registration is the primary STT task.

For TTS, the project currently has no Piper integration. The recommended approach is browser-side WASM via `@mintplex-labs/piper-tts-web` (v1.0.4, MIT). This library wraps the Piper ONNX models in WebAssembly so synthesis runs on-device without a server round-trip, satisfying VOICE-01 (CPU-only hardware). The key UX concern (VOICE-02) is a 10-50 MB model download that blocks first synthesis. The library provides a `download()` method with a progress callback, which must be wired to a visible UI element before calling `predict()`.

The onboarding voice step (VOICE-03) should be inserted into `NexusOnboardingWizard.tsx` as step 4 (shifting the existing "root directory" step to 5 and "summary" to 6). The step should check microphone availability and `MediaRecorder` support, report the result to the user, and then offer a "yes, enable voice" / "skip" choice. Since all hardware tiers can run browser-WASM TTS, the gate is browser-capability-based rather than tier-based.

**Primary recommendation:** Register `chatFileRoutes` in `app.ts` to fix STT; add `@mintplex-labs/piper-tts-web` for browser-side TTS with a progress-bar pre-warm flow; add a voice opt-in step in `NexusOnboardingWizard`.

---

## Standard Stack

### Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `@mintplex-labs/piper-tts-web` | 1.0.4 | Browser-side Piper TTS via WASM/ONNX | Browser-only, no server infra, CPU-safe, actively maintained fork used in AnythingLLM |
| `smart-whisper` | 0.8.1 | Native Node.js Whisper.cpp binding for STT | Auto-downloads models, Metal on Apple Silicon, used as drop-in replacement for CLI approach |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| `node-wav` | 0.0.2 | Decode WAV buffer to Float32Array for smart-whisper | Required: smart-whisper only accepts 16kHz Float32Array PCM, not raw webm |

**Note on audio conversion:** The browser sends `audio/webm;codecs=opus`, while smart-whisper requires 16kHz mono Float32Array PCM. The existing `/transcribe` route writes a temp `.webm` file and calls the `whisper` or `whisper-cpp` CLI, which works when those CLIs are installed. Upgrading to `smart-whisper` would require a conversion step, and `ffmpeg` is not present on this machine. The options are therefore: (a) require `ffmpeg` as an install-time dep via `fluent-ffmpeg` + system `ffmpeg`, or (b) keep the CLI-fallback pattern in the existing route and just **fix the route registration** rather than rewriting the transcription logic. Option (b) is lower risk.
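
To make the "16kHz mono Float32Array PCM" requirement concrete, here is a minimal sketch of the last conversion stage only, assuming the audio has already been decoded to per-channel float samples (the webm/opus decode itself still needs ffmpeg or similar). The helper name `toMono16k` is hypothetical, and the input shape (one `Float32Array` per channel plus a sample rate) is an assumption about what a WAV decoder such as `node-wav` hands back. It is not part of this phase's scope.

```typescript
// Hypothetical helper, not part of smart-whisper or node-wav.
// Input: one Float32Array per channel (assumed decoder output) and the source sample rate.
export function toMono16k(channelData: Float32Array[], sampleRate: number): Float32Array {
  const TARGET_RATE = 16000;
  const frames = channelData.length > 0 ? channelData[0].length : 0;

  // Downmix: average all channels into a single mono track.
  const mono = new Float32Array(frames);
  for (const channel of channelData) {
    for (let i = 0; i < frames; i++) {
      mono[i] += channel[i] / channelData.length;
    }
  }
  if (sampleRate === TARGET_RATE || frames === 0) return mono;

  // Resample with linear interpolation. There is no low-pass filter here,
  // so downsampling can alias; a production path would filter first.
  const outLength = Math.floor((frames * TARGET_RATE) / sampleRate);
  const out = new Float32Array(outLength);
  const step = sampleRate / TARGET_RATE;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, frames - 1);
    const frac = pos - left;
    out[i] = mono[left] * (1 - frac) + mono[right] * frac;
  }
  return out;
}
```

Linear interpolation keeps the sketch short; it is a stand-in for a proper resampler, which is one more reason option (b) stays the lower-risk choice for this phase.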

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| `@mintplex-labs/piper-tts-web` (browser) | `piper` Python CLI via server route | CLI requires Python + model install; adds server complexity; VOICE-F01 deferred to future |
| Fix route registration | Rewrite transcription with `smart-whisper` | smart-whisper requires PCM conversion (no ffmpeg on host); high risk; the existing CLI fallback is simpler |

**Installation (UI):**
```bash
pnpm --filter @paperclipai/ui add @mintplex-labs/piper-tts-web
```

**No new server deps needed** for the minimal fix (just registering existing route). If upgrading to smart-whisper in a future phase:
```bash
pnpm --filter @paperclipai/server add smart-whisper node-wav
```

**Version verification (confirmed against npm registry 2026-04-01):**
- `@mintplex-labs/piper-tts-web`: 1.0.4 (latest)
- `smart-whisper`: 0.8.1 (latest)
- `node-wav`: 0.0.2 (latest)

---

## Architecture Patterns

### Recommended Project Structure
```
server/src/
├── app.ts                    # ADD: chatFileRoutes registration (1-line fix)
├── routes/
│   ├── chat-files.ts         # Existing /transcribe route, no changes needed
│   └── voice.ts              # Optional: extract a dedicated voice route if /synthesize added

ui/src/
├── components/
│   ├── VoiceRecordButton.tsx # Existing, no changes needed once server route is fixed
│   ├── TtsButton.tsx         # NEW: speaker icon button that calls piper-tts-web predict()
│   └── onboarding/
│       └── VoiceStep.tsx     # NEW: opt-in step for voice features
├── hooks/
│   └── usePiperTts.ts        # NEW: singleton TtsSession, download(), predict(), status
├── NexusOnboardingWizard.tsx # MODIFY: insert step 4 (voice), shift steps 4→5, 5→6
```

### Pattern 1: Route Registration Fix (STT)

**What:** `chatFileRoutes` is defined and exported but never registered in `app.ts`. Add one import and one `api.use()` call.

**When to use:** This is the only required change for STT to function.

**Example:**
```typescript
// server/src/app.ts: add after line ~31 (other imports)
import { chatFileRoutes } from "./routes/chat-files.js";

// ...inside createApp, after api.use(assistantHandoffRoutes(db)):
api.use(chatFileRoutes(db, opts.storageService));
```

The `chatFileRoutes` function signature: `chatFileRoutes(db: Db, storage: StorageService)`.
In `app.ts`, `opts.storageService` is the storage argument.

### Pattern 2: Piper TTS Hook (Browser-Side WASM)

**What:** A React hook wrapping `@mintplex-labs/piper-tts-web` that manages model download state and synthesis. The model download is the pre-warm step that prevents the silent 15-30s hang on first synthesis.

**When to use:** Any component that needs to read assistant responses aloud.

**Example:**
```typescript
// ui/src/hooks/usePiperTts.ts
import { useState } from "react";
import { tts } from "@mintplex-labs/piper-tts-web";

const DEFAULT_VOICE = "en_US-hfc_female-medium";

export function usePiperTts() {
  const [status, setStatus] = useState<"idle" | "downloading" | "ready" | "speaking">("idle");
  const [progress, setProgress] = useState(0); // 0–100

  // Pre-warm: download the voice model (unless cached) while reporting progress.
  async function prewarm() {
    setStatus("downloading");
    const stored = await tts.stored();
    if (!stored.includes(DEFAULT_VOICE)) {
      await tts.download(DEFAULT_VOICE, (p) => {
        // Skip updates while the total size is unknown to avoid dividing by zero.
        if (p.total > 0) setProgress(Math.round((p.loaded / p.total) * 100));
      });
    }
    setProgress(100);
    setStatus("ready");
  }

  async function speak(text: string) {
    if (status !== "ready") return;
    setStatus("speaking");
    try {
      const wav = await tts.predict({ text, voiceId: DEFAULT_VOICE });
      // These notes treat predict() as returning a Blob URL. If the installed
      // version returns a Blob instead, create an object URL from it.
      const url = typeof wav === "string" ? wav : URL.createObjectURL(wav);
      const audio = new Audio(url);
      audio.onended = () => setStatus("ready");
      await audio.play();
    } catch {
      // Synthesis or playback failed (e.g. autoplay blocked): return to "ready".
      setStatus("ready");
    }
  }

  return { status, progress, prewarm, speak };
}
```

**Key points:**
- `tts.predict()` is treated here as returning a Blob URL (WAV format), so `new Audio(blobUrl).play()` is the simplest approach and no Web Audio API is needed. Verify the return type against the installed version; a `Blob` return value needs `URL.createObjectURL()` first.
- `tts.stored()` checks IndexedDB cache; download is skipped if model already present.
- The library is browser-only. Do not import in server code.
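
The status transitions the hook relies on (`idle` → `downloading` → `ready` → `speaking`) can also be written as a pure reducer, which is easier to unit-test than the hook itself because it needs no mock of piper-tts-web. This is a sketch only: `TtsState`, `TtsEvent`, `initialTtsState`, and `ttsReducer` are hypothetical names, not existing project code.

```typescript
// Hypothetical names throughout; this mirrors the hook's status values only.
export type TtsStatus = "idle" | "downloading" | "ready" | "speaking";

export interface TtsState {
  status: TtsStatus;
  progress: number; // 0-100
}

export type TtsEvent =
  | { type: "download-start" }
  | { type: "download-progress"; loaded: number; total: number }
  | { type: "download-done" }
  | { type: "speak-start" }
  | { type: "speak-end" };

export const initialTtsState: TtsState = { status: "idle", progress: 0 };

export function ttsReducer(state: TtsState, event: TtsEvent): TtsState {
  switch (event.type) {
    case "download-start":
      return { status: "downloading", progress: 0 };
    case "download-progress": {
      // Ignore stray progress events and unknown totals (avoids NaN/Infinity).
      if (state.status !== "downloading" || event.total <= 0) return state;
      const pct = Math.round((event.loaded / event.total) * 100);
      return { ...state, progress: Math.min(100, Math.max(0, pct)) };
    }
    case "download-done":
      return { status: "ready", progress: 100 };
    case "speak-start":
      // Speaking is only allowed once the model is ready.
      return state.status === "ready" ? { ...state, status: "speaking" } : state;
    case "speak-end":
      return state.status === "speaking" ? { ...state, status: "ready" } : state;
    default:
      return state;
  }
}
```

A hook could feed this to `useReducer` and dispatch `download-progress` from the `tts.download()` callback; the guard on `speak-start` makes the "never predict before the model is ready" rule explicit.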

### Pattern 3: Onboarding Voice Step

**What:** Add a step 4 in `NexusOnboardingWizard.tsx` that shows STT+TTS capability, checks mic permission, and lets users opt in. Because piper-tts-web is CPU-safe WASM, the gate is browser capability (`navigator.mediaDevices`), not hardware tier.

**When to use:** VOICE-03 requirement: offer voice during onboarding.

**Step numbering shift:**
- Current: 1=hardware, 2=mode, 3=provider, 4=rootDir, 5=summary
- New: 1=hardware, 2=mode, 3=provider, **4=voice**, 5=rootDir, 6=summary
- Update `Step` type from `1 | 2 | 3 | 4 | 5` to `1 | 2 | 3 | 4 | 5 | 6`
- Update the "Step X of 5" label to "Step X of 6", or keep it at 5 if the summary is treated as a bonus step that does not count toward the total
- Update all `setStep()` calls to use new numbers
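
The renumbering rule above is small enough to state as code, which may help when auditing the hard-coded `setStep(N)` calls. A minimal sketch, assuming the step order listed above; `OldStep`, `migrateStep`, and `STEP_NAMES` are hypothetical names and do not exist in `NexusOnboardingWizard.tsx`.

```typescript
// Hypothetical audit helper; not part of NexusOnboardingWizard.tsx.
type OldStep = 1 | 2 | 3 | 4 | 5;
type Step = 1 | 2 | 3 | 4 | 5 | 6;

const VOICE_STEP: Step = 4;

// Steps before the inserted voice step keep their number; later steps shift by one.
export function migrateStep(old: OldStep): Step {
  return (old < VOICE_STEP ? old : old + 1) as Step;
}

// New step order after the voice step is inserted.
export const STEP_NAMES: Record<Step, string> = {
  1: "hardware",
  2: "mode",
  3: "provider",
  4: "voice",
  5: "rootDir",
  6: "summary",
};
```

For example, an old `setStep(4)` that navigated to the root-directory step becomes `setStep(migrateStep(4))`, i.e. `setStep(5)`.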

**Voice opt-in state to track:**
```typescript
const [voiceEnabled, setVoiceEnabled] = useState(false);
```
To persist the choice across sessions, store it in `nexus-settings.json` via a new field (e.g., `voiceEnabled: boolean`). The no-DB-schema constraint applies, so the setting must live in file-backed JSON rather than a database column; localStorage is an alternative if the choice only needs to persist per browser.
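
As an illustration of the file-backed option, the sketch below parses settings JSON and defaults the new field to `false` when it is absent, so settings files written before this phase keep working. It is deliberately dependency-free and is not the project's actual settings schema; `NexusSettings` and `parseNexusSettings` are hypothetical names, and typing `mode` as a plain string is an assumption.

```typescript
// Hypothetical shape: `mode` is the only field the file is known to hold today;
// `voiceEnabled` is the proposed addition. `mode` is typed loosely as a string.
export interface NexusSettings {
  mode: string;
  voiceEnabled: boolean;
}

// Parse the file contents, defaulting voiceEnabled to false when missing or malformed.
export function parseNexusSettings(json: string): NexusSettings {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("nexus-settings.json must contain a JSON object");
  }
  const record = raw as Record<string, unknown>;
  if (typeof record.mode !== "string") {
    throw new Error("nexus-settings.json is missing a string `mode`");
  }
  return {
    mode: record.mode,
    voiceEnabled: typeof record.voiceEnabled === "boolean" ? record.voiceEnabled : false,
  };
}
```

The important design point is the default: an older file containing only `{ "mode": ... }` must still load, with voice treated as disabled.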

**Example VoiceStep component structure:**
```tsx
// ui/src/components/onboarding/VoiceStep.tsx
import { useEffect, useState } from "react";
// Button is the project's existing UI button; import it from wherever the wizard does.

interface VoiceStepProps {
  onEnable: () => void;
  onSkip: () => void;
}

export function VoiceStep({ onEnable, onSkip }: VoiceStepProps) {
  const [micAvailable, setMicAvailable] = useState<boolean | null>(null);

  useEffect(() => {
    // Non-blocking probe: does the browser expose an audio input device?
    if (!navigator.mediaDevices?.enumerateDevices) {
      // No mediaDevices API (e.g. insecure context): report the mic as unavailable.
      setMicAvailable(false);
      return;
    }
    navigator.mediaDevices
      .enumerateDevices()
      .then((devices) => setMicAvailable(devices.some((d) => d.kind === "audioinput")))
      .catch(() => setMicAvailable(false));
  }, []);

  return (
    <>
      <h1>Voice features</h1>
      <p>Speak to your assistant (Whisper STT) and hear responses read aloud (Piper TTS). Runs entirely on your device.</p>
      {micAvailable === false && (
        <p className="text-muted-foreground text-sm">No microphone detected. STT is unavailable, but TTS still works.</p>
      )}
      <Button onClick={onEnable}>Enable voice</Button>
      <Button variant="ghost" onClick={onSkip}>Skip</Button>
    </>
  );
}
```

### Anti-Patterns to Avoid

- **Importing piper-tts-web in Node.js:** The library explicitly does not support Node.js. It must only be imported in browser code (UI package). Vite will not include it in the server bundle.
- **Calling `tts.predict()` before downloading the model:** Results in a 15-30s silent hang. Always call `tts.download()` first (or check `tts.stored()`), show progress, then call `predict()`.
- **Registering `/transcribe` before auth middleware:** The existing `/transcribe` route calls `assertBoard(req)`, so it must sit inside the `api` sub-router (after `boardMutationGuard`), not before it. The `chatFileRoutes` call belongs at line ~161 of `app.ts` alongside other `api.use()` calls.
- **Using `new Audio()` with a raw Buffer:** `new Audio()` needs a URL string. These notes treat `tts.predict()` as returning a Blob URL, which can be passed directly to `new Audio(url)`; never pass a `Buffer` or other binary object.

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| TTS synthesis in browser | Custom ONNX loader + Piper WASM integration | `@mintplex-labs/piper-tts-web` | Already bundles ort-wasm, phonemizer, and model management; the 492KB package handles all of it |
| Model download progress | Manual fetch with XHR progress | `tts.download(voiceId, progressCb)` | Built-in progress callback, automatic IndexedDB caching |
| PCM audio decoding for Whisper | Custom webm→PCM decoder | Keep CLI fallback in existing `/transcribe` route | No ffmpeg on host; smart-whisper requires PCM, and adding conversion is out of scope for this phase |
| Mic permission detection | Custom navigator probe | `navigator.mediaDevices.enumerateDevices()` | Native browser API, no library needed |

**Key insight:** The browser handles TTS completely, so no server-side Piper install is needed for VOICE-01/02. Server-side Piper (VOICE-F01) is explicitly deferred.

---

## Common Pitfalls

### Pitfall 1: `/transcribe` Returns 404

**What goes wrong:** VoiceRecordButton sends audio to `/api/transcribe`, gets a 404, and swallows the error silently. Voice input appears broken with no feedback to the user.

**Why it happens:** `chatFileRoutes` is exported in `routes/index.ts` but not imported or registered in `app.ts`. The route exists in code but is never mounted.

**How to avoid:** Add `import { chatFileRoutes } from "./routes/chat-files.js"` and `api.use(chatFileRoutes(db, opts.storageService))` in `app.ts`.

**Warning signs:** `POST /api/transcribe` returns 404; no logs from the route handler; VoiceRecordButton spinner appears and disappears with no text inserted.

### Pitfall 2: Piper TTS Silent Hang on First Use

**What goes wrong:** User clicks the "speak" button. Nothing happens for 15-30 seconds, then audio plays. User thinks it's broken.

**Why it happens:** `tts.predict()` internally downloads the ONNX model (~10-50MB) on first call with no progress feedback.

**How to avoid:** Call `tts.download(voiceId, progressCb)` explicitly before the first `predict()`. Show a progress bar or spinner with percentage. The `prewarm()` pattern in the hook above is the canonical fix for VOICE-02.

**Warning signs:** The first TTS invocation is slow (15-30s) while subsequent calls are fast. The model appears in IndexedDB (visible in browser DevTools) after the first successful call.

### Pitfall 3: `chatFileRoutes` Argument Mismatch

**What goes wrong:** Passing incorrect arguments to `chatFileRoutes(db, storage)`, e.g. the wrong storage interface type.

**Why it happens:** `app.ts` uses `opts.storageService` which is a `StorageService`. The function signature is `chatFileRoutes(db: Db, storage: StorageService)`.

**How to avoid:** Verify the StorageService import path and type. In `app.ts`, `opts.storageService` is already typed as `StorageService` and is used by other routes (e.g., `assetRoutes(db, opts.storageService)`). Mirror that pattern exactly.

### Pitfall 4: Onboarding Step Counter Mismatch

**What goes wrong:** Adding step 4 (voice) but forgetting to update Back/Continue `setStep()` calls in steps 5 and 6, causing step transitions to skip or loop.

**Why it happens:** `NexusOnboardingWizard.tsx` has hard-coded step numbers throughout (`setStep(4)`, `setStep(5)`, etc.) and a `Step` type union (`1 | 2 | 3 | 4 | 5`).

**How to avoid:** When inserting step 4, do a full audit of all `setStep(N)` calls and the `Step` type. The `type Step = 1 | 2 | 3 | 4 | 5` must become `1 | 2 | 3 | 4 | 5 | 6`. All old `setStep(4)` → `setStep(5)`, `setStep(5)` → `setStep(6)`.

### Pitfall 5: piper-tts-web In a Web Worker Context

**What goes wrong:** Importing `@mintplex-labs/piper-tts-web` fails in a Web Worker because of missing `window` or `document` globals.

**Why it happens:** The library expects browser globals. Its documentation mentions Web Worker usage, but that requires careful WASM path configuration.

**How to avoid:** Use the library from a regular React component/hook (main thread). Do not import it in server-side code, Node.js workers, or Vitest Node environment tests. Mark test files importing it with `@vitest-environment jsdom` if needed, or mock the module in tests.

---

## Code Examples

### Fix: Register chatFileRoutes in app.ts

```typescript
// Source: server/src/app.ts (existing pattern; mirror assetRoutes)

// Near top of file with other route imports:
import { chatFileRoutes } from "./routes/chat-files.js";

// Inside createApp(), after api.use(assistantHandoffRoutes(db)):
api.use(chatFileRoutes(db, opts.storageService));
```

Confirmed pattern from `app.ts` line 147:
```typescript
api.use(assetRoutes(db, opts.storageService));
```

### TTS: Minimal predict() call

```typescript
// Source: @mintplex-labs/piper-tts-web README

import { tts } from "@mintplex-labs/piper-tts-web";

// Download model with progress (pre-warm):
await tts.download("en_US-hfc_female-medium", (progress) => {
  const pct = Math.round((progress.loaded / progress.total) * 100);
  console.log(`Downloading voice model: ${pct}%`);
});

// Synthesize:
const wav = await tts.predict({
  text: "Hello, I am your assistant.",
  voiceId: "en_US-hfc_female-medium",
});

// Play:
const audio = new Audio(wav);
audio.play();
```

### TTS: Check if already downloaded (skip re-download)

```typescript
// Source: @mintplex-labs/piper-tts-web README

const stored = await tts.stored(); // string[] of cached voiceIds
if (!stored.includes("en_US-hfc_female-medium")) {
  await tts.download("en_US-hfc_female-medium", progressCb);
}
```

### Mic availability probe (no library required)

```typescript
// Source: MDN Web API (browser standard)

async function hasMicrophone(): Promise<boolean> {
  try {
    const devices = await navigator.mediaDevices.enumerateDevices();
    return devices.some((d) => d.kind === "audioinput");
  } catch {
    return false;
  }
}
```

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| whisper CLI (openai-whisper Python) | smart-whisper Node.js native binding | 2023-2024 | No Python runtime needed; better perf |
| Piper CLI binary | @mintplex-labs/piper-tts-web WASM | 2024 | Runs in browser, no server setup |
| Server-rendered TTS audio | Client-side WASM synthesis | 2024 | Eliminates network round-trip; offline-safe |

**Deprecated/outdated:**
- `whisper-cpp` CLI: still works but requires system-level install; the existing `/transcribe` route already has this fallback, which is adequate for now
- `rhasspy/piper` repository: archived Oct 2025, development moved to `OHF-Voice/piper1-gpl`; the `@mintplex-labs/piper-tts-web` npm package uses the original archived models (MIT) and still works

---

## Open Questions

1. **nexus-settings voice persistence**
   - What we know: `nexus-settings.json` currently only stores `{ mode }`. The `nexusSettingsSchema` is a Zod schema.
   - What's unclear: Should `voiceEnabled: boolean` be added to the schema? The constraint says "no DB schema changes" but this is a file-backed JSON, not a DB table.
   - Recommendation: Add `voiceEnabled: z.boolean().default(false)` to `nexusSettingsSchema`. This is a file field, not a DB migration. The planner should confirm this is acceptable under the "no DB schema changes" constraint.

2. **smart-whisper Apple Silicon unverified claim (from STATE.md blockers)**
   - What we know: STATE.md notes "smart-whisper Apple Silicon acceleration claim unverified on Mac Mini M4; fall back to `tiny.en` if `base.en` acceleration not confirmed on device."
   - What's unclear: Whether Metal acceleration actually works for `base.en` on M4.
   - Recommendation: The current `/transcribe` route uses CLI fallback anyway. Since this phase is NOT rewriting STT with smart-whisper (just fixing route registration), this blocker does not apply to Phase 34.

3. **VoiceRecordButton in PersonalAssistant**
   - What we know: `ChatPanel` sets `enableVoiceInput={true}`. `PersonalAssistant.tsx` does not use `ChatInput` and has its own send form that does NOT include a `VoiceRecordButton`. Voice input only works in the project-mode `ChatPanel`, not in the personal assistant chat.
   - What's unclear: Whether VOICE-01/02/03 require voice in personal assistant chat specifically.
   - Recommendation: Planner should add `VoiceRecordButton` to `PersonalAssistant.tsx`'s input area as part of this phase, since personal assistant is the primary chat surface for v1.5.

---

## Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| Node.js | Server runtime | Yes | v20.20.2 | - |
| piper CLI | VOICE-01 (server-side) | No | - | Browser WASM via piper-tts-web (preferred) |
| whisper CLI | /transcribe route | No | - | Route returns 503 with user-visible error |
| whisper-cpp CLI | /transcribe route | No | - | Falls through to openai-whisper, then 503 |
| ffmpeg | WebM→PCM conversion | No | - | Keep CLI-fallback STT; no smart-whisper upgrade this phase |
| Browser MediaRecorder | VoiceRecordButton | N/A (browser) | - | Degrades gracefully (mic unavailable state) |

**Missing dependencies with no fallback:**
- None that block this phase. The `/transcribe` route already handles missing Whisper CLIs gracefully with a 503 + descriptive error. Piper TTS runs entirely in browser WASM, with no server dep.

**Missing dependencies with fallback:**
- `whisper` / `whisper-cpp`: Not installed. Route returns `{ error: "Whisper not available. Install whisper-cpp or openai-whisper for voice input." }` with 503. This is existing behavior. STT will fail until the user installs Whisper, which is acceptable given that the 503 message tells them what to install.
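
Because the 404 (route not mounted) and 503 (Whisper CLI missing) cases call for different remedies, the client may want to tell them apart instead of failing without feedback. A minimal sketch under that assumption; `describeTranscribeFailure` and the message strings are hypothetical and do not reflect current `VoiceRecordButton` behavior.

```typescript
// Hypothetical helper: the function name and user-facing strings are assumptions.
// Maps an HTTP status from POST /api/transcribe to a message, or null on success.
export function describeTranscribeFailure(status: number): string | null {
  if (status >= 200 && status < 300) return null; // success: nothing to report
  switch (status) {
    case 404:
      // The route exists in code but was never mounted in app.ts.
      return "Voice input is unavailable: the transcription route is not mounted on the server.";
    case 503:
      // The route is mounted but no Whisper CLI is installed on the host.
      return "Whisper is not installed on the server. Install whisper-cpp or openai-whisper to enable voice input.";
    default:
      return `Transcription failed (HTTP ${status}).`;
  }
}
```

A caller would pass `response.status` after the fetch and show the returned string next to the record button when it is not `null`.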

---

## Validation Architecture

### Test Framework

| Property | Value |
|----------|-------|
| Framework | Vitest 3.0.5 |
| Config file | `server/vitest.config.ts` |
| Quick run command | `npx vitest run server/src/__tests__/34-voice-routes.test.ts` |
| Full suite command | `npx vitest run` (from `/opt/nexus`) |

### Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| VOICE-01 | `chatFileRoutes` is mounted in `app.ts`; GET/POST routes are reachable | unit (route) | `npx vitest run server/src/__tests__/34-voice-routes.test.ts` | No (Wave 0) |
| VOICE-02 | `usePiperTts` hook exposes `prewarm()`, `status`, `progress` | unit (hook) | `npx vitest run ui/src/hooks/usePiperTts.test.ts` | No (Wave 0) |
| VOICE-03 | `NexusOnboardingWizard` renders voice step at step 4 | unit (component) | Manual / `npx vitest run ui/src/components/NexusOnboardingWizard.test.ts` | No (Wave 0) |

### Sampling Rate
- **Per task commit:** `npx vitest run server/src/__tests__/34-voice-routes.test.ts`
- **Per wave merge:** `npx vitest run`
- **Phase gate:** Full suite green before `/gsd:verify-work`

### Wave 0 Gaps
- [ ] `server/src/__tests__/34-voice-routes.test.ts`: covers VOICE-01 (route registration, 503 when no whisper CLI)
- [ ] `ui/src/hooks/usePiperTts.test.ts`: covers VOICE-02 hook state machine (mock piper-tts-web)
- [ ] `ui/src/components/onboarding/VoiceStep.test.tsx`: covers VOICE-03 step rendering

---

## Sources

### Primary (HIGH confidence)
- Codebase inspection: `server/src/routes/chat-files.ts`, `server/src/app.ts`, `ui/src/components/VoiceRecordButton.tsx`, `ui/src/components/NexusOnboardingWizard.tsx`
- npm registry: `@mintplex-labs/piper-tts-web@1.0.4`, `smart-whisper@0.8.1` (verified 2026-04-01)

### Secondary (MEDIUM confidence)
- [Mintplex-Labs/piper-tts-web README](https://github.com/Mintplex-Labs/piper-tts-web/blob/main/README.md): `tts.download()`, `tts.predict()`, `tts.stored()` API
- [JacobLinCool/smart-whisper GitHub](https://github.com/JacobLinCool/smart-whisper): Whisper class, PCM Float32Array requirement, Metal on Apple Silicon
- [smart-whisper documentation](https://jacoblincool.github.io/smart-whisper/): transcribe API, model manager

### Tertiary (LOW confidence)
- WebSearch results for Piper TTS Node.js integration: browser-only WASM pattern confirmed by multiple sources

---

## Metadata

**Confidence breakdown:**
- Standard stack: MEDIUM. npm package versions verified; API verified via README; no local test environment to run the library
- Architecture: HIGH. Based on direct codebase inspection (route registration gap confirmed, wizard step structure confirmed)
- Pitfalls: HIGH. Route registration gap is a confirmed code-level fact, not speculation

**Research date:** 2026-04-01
**Valid until:** 2026-05-01 (stable libraries; piper-tts-web and smart-whisper are low-churn)