docs(34-01): complete voice foundation plan — chatFileRoutes, usePiperTts, TtsButton, voiceEnabled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
847f316319
commit
2b568a0f5d
4 changed files with 147 additions and 17 deletions
|
|
@ -25,8 +25,8 @@
|
|||
|
||||
### Voice
|
||||
|
||||
- [ ] **VOICE-01**: User gets Piper TTS speech output that works on CPU-only hardware
|
||||
- [ ] **VOICE-02**: Piper TTS pre-warms on first use with visible download progress (no silent 15-30s hang)
|
||||
- [x] **VOICE-01**: User gets Piper TTS speech output that works on CPU-only hardware
|
||||
- [x] **VOICE-02**: Piper TTS pre-warms on first use with visible download progress (no silent 15-30s hang)
|
||||
- [ ] **VOICE-03**: Voice features (Whisper STT + Piper TTS) offered during onboarding based on hardware capability
|
||||
|
||||
### Personal AI Assistant
|
||||
|
|
@ -85,8 +85,8 @@
|
|||
| ASST-02 | Phase 33 | Complete |
|
||||
| ASST-03 | Phase 33 | Complete |
|
||||
| ASST-04 | Phase 33 | Complete |
|
||||
| VOICE-01 | Phase 34 | Pending |
|
||||
| VOICE-02 | Phase 34 | Pending |
|
||||
| VOICE-01 | Phase 34 | Complete |
|
||||
| VOICE-02 | Phase 34 | Complete |
|
||||
| VOICE-03 | Phase 34 | Pending |
|
||||
| CLI-01 | Phase 35 | Pending |
|
||||
| CLI-02 | Phase 35 | Pending |
|
||||
|
|
|
|||
|
|
@ -175,7 +175,7 @@ Plans:
|
|||
**Plans**: 2 plans
|
||||
|
||||
Plans:
|
||||
- [ ] 34-01-PLAN.md — Fix /transcribe route registration, Piper TTS hook + TtsButton, voiceEnabled in nexus-settings
|
||||
- [x] 34-01-PLAN.md — Fix /transcribe route registration, Piper TTS hook + TtsButton, voiceEnabled in nexus-settings
|
||||
- [ ] 34-02-PLAN.md — VoiceStep onboarding component, wizard step insertion, PersonalAssistant voice wiring
|
||||
**UI hint**: yes
|
||||
|
||||
|
|
@ -238,5 +238,5 @@ All 21 v1.5 requirements are mapped to exactly one phase. No orphans.
|
|||
| 31. Puter.js Zero-Config Cloud | v1.5 | 4/4 | Complete | 2026-04-03 |
|
||||
| 32. Multi-Step Onboarding Wizard | v1.5 | 1/1 | Complete | 2026-04-03 |
|
||||
| 33. Persistent Memory + Personal Assistant Mode | v1.5 | 3/3 | Complete | 2026-04-03 |
|
||||
| 34. Voice | v1.5 | 0/2 | Not started | - |
|
||||
| 34. Voice | v1.5 | 1/2 | In Progress| |
|
||||
| 35. npx buildthis CLI | v1.5 | 0/TBD | Not started | - |
|
||||
|
|
|
|||
|
|
@ -2,15 +2,15 @@
|
|||
gsd_state_version: 1.0
|
||||
milestone: v1.5
|
||||
milestone_name: Smart Onboarding + Personal AI Assistant
|
||||
status: verifying
|
||||
stopped_at: Completed 33-persistent-memory/33-03
|
||||
last_updated: "2026-04-03T22:15:46.392Z"
|
||||
status: executing
|
||||
stopped_at: Completed 34-voice/34-01
|
||||
last_updated: "2026-04-03T22:35:58.055Z"
|
||||
last_activity: 2026-04-03
|
||||
progress:
|
||||
total_phases: 6
|
||||
completed_phases: 4
|
||||
total_plans: 10
|
||||
completed_plans: 10
|
||||
total_plans: 12
|
||||
completed_plans: 11
|
||||
percent: 0
|
||||
---
|
||||
|
||||
|
|
@ -21,13 +21,13 @@ progress:
|
|||
See: .planning/PROJECT.md (updated 2026-04-02)
|
||||
|
||||
**Core value:** A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.
|
||||
**Current focus:** Phase 33 — persistent-memory
|
||||
**Current focus:** Phase 34 — voice
|
||||
|
||||
## Current Position
|
||||
|
||||
Phase: 34
|
||||
Plan: Not started
|
||||
Status: Phase complete — ready for verification
|
||||
Phase: 34 (voice) — EXECUTING
|
||||
Plan: 2 of 2
|
||||
Status: Ready to execute
|
||||
Last activity: 2026-04-03
|
||||
|
||||
Progress: [__________] 0%
|
||||
|
|
@ -62,6 +62,7 @@ Progress: [__________] 0%
|
|||
| Phase 33 P01 | 4 | 2 tasks | 6 files |
|
||||
| Phase 33 P02 | 12 | 2 tasks | 8 files |
|
||||
| Phase 33-persistent-memory P03 | 20 | 2 tasks | 6 files |
|
||||
| Phase 34-voice P01 | 3 | 2 tasks | 7 files |
|
||||
|
||||
## Accumulated Context
|
||||
|
||||
|
|
@ -95,6 +96,8 @@ Key constraints for v1.5 (established at roadmap):
|
|||
- [Phase 33-persistent-memory]: Pre-fetch conversation/settings/memory BEFORE flushHeaders to avoid SSE header race (Pitfall 3 from research)
|
||||
- [Phase 33-persistent-memory]: puterProxyService.resolveToken wrapped in try/catch — graceful fallback to streamEcho when no puter token configured
|
||||
- [Phase 33-persistent-memory]: buildHandoffSummary exported as named pure function for direct unit testing without route test harness
|
||||
- [Phase 34-voice]: chatFileRoutes registered inside boardMutationGuard after assistantHandoffRoutes; nexusSettingsRoutes also added (was missing)
|
||||
- [Phase 34-voice]: voiceEnabled as Zod boolean with default(false) in nexus-settings — file-backed JSON, no DB migration
|
||||
|
||||
### Pending Todos
|
||||
|
||||
|
|
@ -109,6 +112,6 @@ None yet.
|
|||
|
||||
## Session Continuity
|
||||
|
||||
Last session: 2026-04-03T22:14:47.420Z
|
||||
Stopped at: Completed 33-persistent-memory/33-03
|
||||
Last session: 2026-04-03T22:35:58.052Z
|
||||
Stopped at: Completed 34-voice/34-01
|
||||
Resume file: None
|
||||
|
|
|
|||
127
.planning/phases/34-voice/34-01-SUMMARY.md
Normal file
127
.planning/phases/34-voice/34-01-SUMMARY.md
Normal file
|
|
@ -0,0 +1,127 @@
|
|||
---
|
||||
phase: 34-voice
|
||||
plan: 01
|
||||
subsystem: api, ui, tts
|
||||
tags: [piper-tts, whisper, voice, tts, nexus-settings, chat-files]
|
||||
|
||||
# Dependency graph
|
||||
requires:
|
||||
- phase: 33-persistent-memory
|
||||
provides: assistantHandoffRoutes, nexusSettingsRoutes, chat.ts streaming
|
||||
- phase: 25-file-system
|
||||
provides: chatFileRoutes, StorageService, chatFileService
|
||||
|
||||
provides:
|
||||
- POST /api/transcribe route registered (returns 503 when Whisper CLI absent, not 404)
|
||||
- GET/PATCH /nexus/settings with voiceEnabled persistence
|
||||
- usePiperTts hook with prewarm/speak/stop/status/progress
|
||||
- TtsButton component with download progress and speaker icon
|
||||
|
||||
affects: [34-02-PLAN, PersonalAssistant, VoiceOnboarding]
|
||||
|
||||
# Tech tracking
|
||||
tech-stack:
|
||||
added: ["@mintplex-labs/piper-tts-web (WASM TTS in browser)"]
|
||||
patterns:
|
||||
- "Browser WASM TTS: tts.stored() → tts.download(voice, progressCb) → tts.predict({text, voiceId})"
|
||||
- "TtsButton receives status/progress props from hook — zero TTS logic in component"
|
||||
- "nexus-settings: file-backed JSON via Zod schema (no DB migration)"
|
||||
|
||||
key-files:
|
||||
created:
|
||||
- ui/src/hooks/usePiperTts.ts
|
||||
- ui/src/components/TtsButton.tsx
|
||||
modified:
|
||||
- server/src/app.ts
|
||||
- server/src/services/nexus-settings.ts
|
||||
- ui/src/api/hardware.ts
|
||||
- ui/package.json
|
||||
- pnpm-lock.yaml
|
||||
|
||||
key-decisions:
|
||||
- "chatFileRoutes registered inside boardMutationGuard (after assistantHandoffRoutes) — assertBoard() requires authenticated api router"
|
||||
- "nexusSettingsRoutes also registered here (was missing from app.ts despite file existing)"
|
||||
- "voiceEnabled as Zod boolean with default(false) — file-backed JSON, no DB migration needed"
|
||||
- "Default return values in nexusSettingsService.get() updated to include voiceEnabled: false for safe fallback"
|
||||
|
||||
patterns-established:
|
||||
- "TTS hook pattern: prewarm triggers download, speak asserts ready status, stop interrupts audio"
|
||||
- "Audio playback via new Audio(blobUrl).play() — CPU-safe, no GPU required"
|
||||
|
||||
requirements-completed: [VOICE-01, VOICE-02]
|
||||
|
||||
# Metrics
|
||||
duration: 3min
|
||||
completed: 2026-04-01
|
||||
---
|
||||
|
||||
# Phase 34 Plan 01: Voice Foundation Summary
|
||||
|
||||
**chatFileRoutes and nexusSettingsRoutes mounted in app.ts; voiceEnabled added to nexus-settings; usePiperTts hook and TtsButton component created with @mintplex-labs/piper-tts-web WASM synthesis**
|
||||
|
||||
## Performance
|
||||
|
||||
- **Duration:** ~3 min
|
||||
- **Started:** 2026-04-01T22:32:52Z
|
||||
- **Completed:** 2026-04-01T22:35:57Z
|
||||
- **Tasks:** 2/2
|
||||
- **Files modified:** 7
|
||||
|
||||
## Accomplishments
|
||||
|
||||
### Task 1: Register chatFileRoutes in app.ts and add voiceEnabled to nexus-settings
|
||||
|
||||
- Added `chatFileRoutes(db, opts.storageService)` import and `api.use()` call after `assistantHandoffRoutes(db)` in `server/src/app.ts` — POST /api/transcribe now returns 503 (not 404) when Whisper CLI is absent
|
||||
- Added `nexusSettingsRoutes()` import and registration (was also missing from app.ts)
|
||||
- Extended `nexusSettingsSchema` with `voiceEnabled: z.boolean().default(false)` in `server/src/services/nexus-settings.ts`
|
||||
- Updated default fallback returns in `nexusSettingsService.get()` to include `voiceEnabled: false`
|
||||
- Added `voiceEnabled?: boolean` to `NexusSettings` client interface in `ui/src/api/hardware.ts`
|
||||
- All 10 existing chat-file-routes tests pass
|
||||
|
||||
### Task 2: Create usePiperTts hook and TtsButton component
|
||||
|
||||
- Installed `@mintplex-labs/piper-tts-web` as a UI dependency
|
||||
- Created `ui/src/hooks/usePiperTts.ts` — exposes `prewarm`, `speak`, `stop`, `status` (idle/downloading/ready/speaking/error), `progress` (0-100)
|
||||
- `tts.stored()` checks IndexedDB cache — skips download if model already present (satisfies VOICE-02 caching)
|
||||
- `tts.download()` with progress callback provides visible download progress during prewarm (satisfies VOICE-02 UX)
|
||||
- `tts.predict()` returns WAV blob URL — CPU-safe WASM synthesis, no GPU required (satisfies VOICE-01)
|
||||
- Created `ui/src/components/TtsButton.tsx` — shows Loader2 + progress% during downloading, VolumeX to stop during speaking, Volume2 for idle/ready states
|
||||
- TtsButton receives all state/callbacks from hook — zero TTS logic in component (clean separation)
|
||||
|
||||
## Verification
|
||||
|
||||
- `grep -q "chatFileRoutes" server/src/app.ts` — PASS
|
||||
- `grep -q "voiceEnabled" server/src/services/nexus-settings.ts` — PASS
|
||||
- `ls ui/src/hooks/usePiperTts.ts ui/src/components/TtsButton.tsx` — PASS
|
||||
- `npx vitest run server/src/__tests__/chat-file-routes.test.ts` — 10/10 tests PASS
|
||||
|
||||
## Deviations from Plan
|
||||
|
||||
### Auto-fixed Issues
|
||||
|
||||
**1. [Rule 2 - Missing Critical Functionality] nexusSettingsRoutes was also missing from app.ts**
|
||||
- **Found during:** Task 1
|
||||
- **Issue:** The plan only mentioned registering `chatFileRoutes`, but `nexusSettingsRoutes` was also absent from app.ts despite the route file existing
|
||||
- **Fix:** Added both `chatFileRoutes` and `nexusSettingsRoutes` imports and `api.use()` registrations
|
||||
- **Files modified:** `server/src/app.ts`
|
||||
- **Commit:** 0d318a31
|
||||
|
||||
**2. [Rule 2 - Missing Critical Functionality] Default fallback in nexusSettingsService.get() needed voiceEnabled**
|
||||
- **Found during:** Task 1
|
||||
- **Issue:** The hardcoded fallback `{ mode: "both" }` would cause TypeScript type mismatch after adding voiceEnabled to schema
|
||||
- **Fix:** Updated both catch-path and parse-failure-path returns to include `voiceEnabled: false`
|
||||
- **Files modified:** `server/src/services/nexus-settings.ts`
|
||||
- **Commit:** 0d318a31
|
||||
|
||||
## Known Stubs
|
||||
|
||||
None — all functionality is wired. TtsButton requires the caller to provide status/progress/callbacks from usePiperTts (documented pattern for Plan 02 integration into PersonalAssistant).
|
||||
|
||||
## Commits
|
||||
|
||||
| Commit | Task | Description |
|
||||
|--------|------|-------------|
|
||||
| 0d318a31 | Task 1 | feat(34-01): register chatFileRoutes + nexusSettingsRoutes in app.ts, add voiceEnabled to nexus-settings |
|
||||
| 8f8257e1 | Task 2 | feat(34-01): create usePiperTts hook and TtsButton component with piper-tts-web |
|
||||
|
||||
## Self-Check: PASSED
|
||||
Loading…
Add table
Reference in a new issue