docs(25-08): complete voice input plan — VoiceRecordButton, transcription endpoint, INPUT-02/03/04 complete

2026-04-02 00:00:52 +00:00 · 2026-04-02 00:00:52 +00:00 · bdbefde2f4
commit bdbefde2f4
parent 86f9421026
3 changed files with 150 additions and 13 deletions
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@ -116,7 +116,7 @@ Plans:
  5. When an agent generates a placeholder asset, `PLACEHOLDERS.md` is updated in the project directory; when the placeholder is replaced, the DB records the replacement chain and the manifest reflects the change
  6. A file uploaded in a conversation linked to a project lives in `files/projects/<slug>/`; a file from an unlinked conversation lives in `files/chat/<conversation-id>/`; the user can promote a chat file to project scope
  7. Voice input is available when local AI is enabled: user can hold the record button, speak, see a transcription preview, and confirm to send
-**Plans:** 9 plans (4 complete + 5 gap closure)
+**Plans:** 5/9 plans executed

 Plans:
 - [x] 25-00-PLAN.md — DB schema (chat_files + chat_file_references), shared types/validators, test stubs
@ -127,7 +127,7 @@ Plans:
 - [ ] 25-05-PLAN.md — Gap: File scope promotion API + UI (FILE-12)
 - [ ] 25-06-PLAN.md — Gap: Git integration for file operations + version history (FILE-09, FILE-10)
 - [ ] 25-07-PLAN.md — Gap: Agent-generated files + placeholder tracking (FILE-08, FILE-11)
-- [ ] 25-08-PLAN.md — Gap: Voice input via Whisper (INPUT-04) + admin claims (INPUT-02, INPUT-03)
+- [x] 25-08-PLAN.md — Gap: Voice input via Whisper (INPUT-04) + admin claims (INPUT-02, INPUT-03)

 **UI hint**: yes

@ -227,5 +227,5 @@ All 65 v1 requirements are mapped to exactly one phase. No orphans.
 | 22. Agent Streaming | v1.3 | 6/6 | Complete    | 2026-04-01 |
 | 23. Brainstormer Flow | v1.3 | 4/4 | Complete    | 2026-04-01 |
 | 24. Search, History & Branching | v1.3 | 4/4 | Complete    | 2026-04-01 |
-| 25. File System | v1.3 | 4/9 | Gap closure | - |
+| 25. File System | v1.3 | 5/9 | In Progress|  |
 | 26. PWA & Performance | v1.3 | 0/? | Not started | - |
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.3
 milestone_name: milestone
 status: executing
-stopped_at: Completed 25-file-system-25-03-PLAN.md
-last_updated: "2026-04-01T23:33:56.751Z"
-last_activity: 2026-04-01
+stopped_at: Completed 25-file-system-25-08-PLAN.md
+last_updated: "2026-04-02T00:00:43.784Z"
+last_activity: 2026-04-02
 progress:
  total_phases: 6
-  completed_phases: 5
-  total_plans: 25
-  completed_plans: 25
+  completed_phases: 4
+  total_plans: 30
+  completed_plans: 26
  percent: 100
 ---

@ -26,9 +26,9 @@ See: .planning/PROJECT.md (updated 2026-03-30)
 ## Current Position

 Phase: 25 (file-system) — EXECUTING
-Plan: 4 of 4
+Plan: 2 of 9
 Status: Ready to execute
-Last activity: 2026-04-01
+Last activity: 2026-04-02

 Progress: [██████████] 100%

@ -82,6 +82,7 @@ Progress: [██████████] 100%
 | Phase 25-file-system P02 | 15 | 2 tasks | 5 files |
 | Phase 25-file-system P01 | 15 | 2 tasks | 17 files |
 | Phase 25-file-system P03 | 3 | 2 tasks | 7 files |
+| Phase 25-file-system P08 | 8 | 2 tasks | 5 files |

 ## Accumulated Context

@ -146,6 +147,9 @@ Recent decisions affecting current work:
 - [Phase 25-file-system]: ChatFilePreview shows inline image with max-h-[300px] + ChatFileCard below; non-image types use ChatFileCard only
 - [Phase 25-file-system]: listMessages fetches chatFiles with inArray(messageId) as second query, merged in-memory
 - [Phase 25-file-system]: completedFileIds captured before clearCompleted in handleSend to avoid race condition
+- [Phase 25-file-system]: Use functional setState for transcription append in VoiceRecordButton — avoids stale closure vs native DOM event approach
+- [Phase 25-file-system]: enableVoiceInput defaults to false for backward-compat; ChatPanel passes true unconditionally — server returns 503 gracefully if whisper absent
+- [Phase 25-file-system]: execFileAsync over exec for whisper CLI invocation — no shell injection risk with system-generated tmpPath

 ### Pending Todos

@ -158,6 +162,6 @@ None yet.

 ## Session Continuity

-Last session: 2026-04-01T23:33:56.747Z
-Stopped at: Completed 25-file-system-25-03-PLAN.md
+Last session: 2026-04-02T00:00:43.781Z
+Stopped at: Completed 25-file-system-25-08-PLAN.md
 Resume file: None
--- a/.planning/phases/25-file-system/25-08-SUMMARY.md
+++ b/.planning/phases/25-file-system/25-08-SUMMARY.md
@ -0,0 +1,133 @@
+---
+phase: 25-file-system
+plan: 08
+subsystem: ui
+tags: [voice, whisper, mediarecorder, transcription, react, express, multer]
+
+# Dependency graph
+requires:
+  - phase: 25-file-system-02
+    provides: ChatInput with file upload props, chat-files route with multer pattern
+provides:
+  - VoiceRecordButton component with MediaRecorder API, idle/recording/transcribing states
+  - POST /transcribe server endpoint using execFileAsync for whisper-cpp or openai-whisper
+  - ChatInput enableVoiceInput prop that renders VoiceRecordButton conditionally
+affects: [25-file-system, chat-input, chat-panel]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "MediaRecorder API with 250ms chunk collection and onstop blob assembly"
+    - "execFileAsync (not exec) for shell commands to avoid injection risk"
+    - "Whisper CLI cascade: whisper-cpp first, openai-whisper Python fallback, 503 if neither"
+
+key-files:
+  created:
+    - ui/src/components/VoiceRecordButton.tsx
+  modified:
+    - ui/src/components/ChatInput.tsx
+    - ui/src/components/ChatPanel.tsx
+    - server/src/routes/chat-files.ts
+    - .planning/REQUIREMENTS.md
+
+key-decisions:
+  - "Use setValue state updater (functional form) for transcription append — avoids stale closure vs native DOM event approach"
+  - "enableVoiceInput defaults to false for backward-compat; ChatPanel passes true unconditionally — server returns 503 gracefully if whisper absent"
+  - "execFileAsync over exec for whisper CLI invocation — no shell injection risk with system-generated tmpPath"
+
+patterns-established:
+  - "POST /transcribe uses separate multer instance with field name 'audio' to avoid conflict with 'file' field used by upload routes"
+
+requirements-completed: [INPUT-02, INPUT-03, INPUT-04]
+
+# Metrics
+duration: 8min
+completed: 2026-04-01
+---
+
+# Phase 25 Plan 08: Voice Input Summary
+
+**VoiceRecordButton with MediaRecorder API wired into ChatInput; POST /transcribe endpoint with whisper-cpp/openai-whisper cascade and graceful 503 fallback**
+
+## Performance
+
+- **Duration:** ~8 min
+- **Started:** 2026-04-01T23:58:00Z
+- **Completed:** 2026-04-01T23:59:00Z
+- **Tasks:** 2
+- **Files modified:** 5
+
+## Accomplishments
+- VoiceRecordButton component: idle (Mic), recording (Square/red), transcribing (Loader2 spinner) states using MediaRecorder API with 250ms chunks
+- POST /transcribe endpoint: writes audio to temp file, tries whisper-cpp CLI first, falls back to openai-whisper Python CLI, returns 503 with helpful install message if neither is present
+- ChatInput: new `enableVoiceInput` prop renders VoiceRecordButton; handleTranscription appends text to existing textarea value via functional setState
+- ChatPanel passes `enableVoiceInput={true}` unconditionally (server returns 503 if whisper unavailable)
+- INPUT-02, INPUT-03, INPUT-04 marked Complete in REQUIREMENTS.md
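Reviewer note: the three button states listed above (idle, recording, transcribing) can be read as a small state machine. The following is a minimal sketch, not the component's actual code: the event names and the `nextVoiceState` function are hypothetical, and only the state names and icon names come from the summary.

```typescript
// Hypothetical names throughout; the real VoiceRecordButton may be organized differently.
type VoiceState = "idle" | "recording" | "transcribing";
type VoiceEvent = "press" | "stop" | "transcribed" | "failed";

// Pure transition function: returns the next state, or the unchanged state
// when an event does not apply (for example "stop" while idle).
function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (state) {
    case "idle":
      return event === "press" ? "recording" : state;
    case "recording":
      return event === "stop" ? "transcribing" : state;
    case "transcribing":
      return event === "transcribed" || event === "failed" ? "idle" : state;
  }
}

// Icon shown for each state, using the icon names from the summary.
const iconFor: Record<VoiceState, string> = {
  idle: "Mic",
  recording: "Square",
  transcribing: "Loader2",
};
```

Keeping the transitions in a pure function like this makes it possible to test the button's behavior without a browser or a MediaRecorder instance.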
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create VoiceRecordButton and server transcription endpoint** - `c7c46a02` (feat)
+2. **Task 2: Wire VoiceRecordButton into ChatInput and update REQUIREMENTS.md** - `a1e1b11b` (feat)
+
+## Files Created/Modified
+- `ui/src/components/VoiceRecordButton.tsx` - New voice recording button component with MediaRecorder API
+- `ui/src/components/ChatInput.tsx` - Added enableVoiceInput prop, handleTranscription callback, VoiceRecordButton render
+- `ui/src/components/ChatPanel.tsx` - Passes enableVoiceInput={true} to ChatInput
+- `server/src/routes/chat-files.ts` - Added POST /transcribe endpoint with whisper CLI cascade
+- `.planning/REQUIREMENTS.md` - Marked INPUT-02, INPUT-03, INPUT-04 as Complete
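Reviewer note: the "whisper CLI cascade" in `chat-files.ts` can be pictured roughly as below. This is a sketch under stated assumptions rather than the committed handler: the `Runner` type stands in for the promisified `execFile` the summary mentions, `transcribeWithCascade` and `Candidate` are hypothetical names, and detecting a missing binary through an `ENOENT` error code is an assumption about how absence is recognized.

```typescript
// Injected command runner. In the real endpoint this role would be played by a
// promisified execFile; it is a parameter here so the sketch has no dependencies.
type Runner = (command: string, args: string[]) => Promise<string>;

interface Candidate {
  command: string; // executable name (hypothetical values in any usage)
  args: string[];  // argument vector, passed without going through a shell
}

type TranscribeResult =
  | { status: 200; text: string }
  | { status: 503; error: string };

// True when the error looks like "executable not found" (assumed signal).
function isMissingBinary(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "ENOENT"
  );
}

// Try each candidate in order; fall back only when the binary is absent.
async function transcribeWithCascade(
  run: Runner,
  candidates: Candidate[],
): Promise<TranscribeResult> {
  for (const { command, args } of candidates) {
    try {
      const stdout = await run(command, args);
      return { status: 200, text: stdout.trim() };
    } catch (err) {
      if (!isMissingBinary(err)) throw err; // a real failure: surface it
      // Binary absent: continue with the next candidate.
    }
  }
  return {
    status: 503,
    error: "No whisper CLI found; install whisper-cpp or openai-whisper.",
  };
}
```

Passing arguments as an array, as `execFile` does, means the temp file path is never interpolated into a shell string, which is the injection concern the summary cites.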
+
+## Decisions Made
+- Used functional form of setValue (`setValue((current) => ...)`) for transcription append to avoid stale closure issues — simpler than the native DOM event approach suggested in the plan
+- enableVoiceInput defaults to false in ChatInput props for backward compatibility; ChatPanel passes true unconditionally since the server returns a friendly 503 if whisper is not installed
+- Used a separate `audioUpload` multer instance with `.single("audio")` inside the transcribe handler to avoid field name collision with the existing `fileUpload` instance that uses `.single("file")`
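Reviewer note: to illustrate why the functional updater avoids the stale closure, here is a small sketch. `createState` is a hypothetical stand-in for React's `useState` setter, written only so the example runs outside React; `appendTranscription` mirrors the append expression quoted in the summary.

```typescript
// Joins new transcription text onto whatever the textarea already holds.
function appendTranscription(current: string, text: string): string {
  return current ? `${current} ${text}` : text;
}

// Minimal stand-in for a useState setter (hypothetical, for illustration):
// it accepts either a value or an updater that receives the latest state.
function createState(initial: string) {
  let value = initial;
  const set = (next: string | ((current: string) => string)): void => {
    value = typeof next === "function" ? next(value) : next;
  };
  return { get: () => value, set };
}

const draft = createState("");

// What a callback created on the first render would have captured.
const staleSnapshot = draft.get();

// The user types while the transcription request is still in flight.
draft.set("Hello");

// Functional form: reads the latest state, so the typed text survives.
draft.set((current) => appendTranscription(current, "world"));

// By contrast, appending to the stale snapshot would drop the typed text.
const staleResult = appendTranscription(staleSnapshot, "world");
```

After these calls `draft.get()` holds both the typed text and the transcription, while `staleResult` holds only the transcription.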
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 1 - Bug] Added `path` import at top of chat-files.ts**
+- **Found during:** Task 1 (server transcription endpoint)
+- **Issue:** The plan's code used `path.join(tmpdir(), ...)` but `path` was not imported in the file
+- **Fix:** Added `import path from "node:path";` at the top of chat-files.ts
+- **Files modified:** server/src/routes/chat-files.ts
+- **Verification:** TypeScript compiles without errors
+- **Committed in:** c7c46a02 (Task 1 commit)
+
+**2. [Rule 2 - Missing Critical] Separate multer instance for audio upload**
+- **Found during:** Task 1 (server transcription endpoint)
+- **Issue:** The plan's code called `runSingleFileUpload(fileUpload, req, res)` which uses `.single("file")` — but the audio field is named "audio", so no file would be found
+- **Fix:** Created separate `audioUpload` multer instance and `runAudioUpload` helper using `.single("audio")`
+- **Files modified:** server/src/routes/chat-files.ts
+- **Verification:** TypeScript compiles without errors; logic matches field name used by VoiceRecordButton
+- **Committed in:** c7c46a02 (Task 1 commit)
+
+**3. [Rule 1 - Bug] Functional setState for transcription append**
+- **Found during:** Task 2 (ChatInput integration)
+- **Issue:** Plan suggested using native DOM event dispatch to update the textarea — unnecessarily complex since ChatInput uses controlled `value` state directly
+- **Fix:** Used ``setValue((current) => current ? `${current} ${text}` : text)``, which correctly appends without stale closure risk
+- **Files modified:** ui/src/components/ChatInput.tsx
+- **Verification:** TypeScript compiles without errors
+- **Committed in:** a1e1b11b (Task 2 commit)
+
+---
+
+**Total deviations:** 3 auto-fixed (1 missing import, 1 field name mismatch, 1 simpler state approach)
+**Impact on plan:** The first two fixes were necessary for correctness: the missing `path` import and the multer field name mismatch would each have broken the transcribe endpoint. The setState change is a simplification that is more idiomatic React.
+
+## Issues Encountered
+None beyond the auto-fixed deviations above.
+
+## User Setup Required
+None — server returns 503 with install instructions if whisper is not present. No configuration required by default.
+
+## Next Phase Readiness
+- Voice input complete; INPUT-02/03/04 all marked Complete
+- Remaining Phase 25 plans can proceed independently
+- To enable transcription: install `whisper-cpp` (`brew install whisper-cpp`) or `openai-whisper` (`pip install openai-whisper`)
+
+---
+*Phase: 25-file-system*
+*Completed: 2026-04-01*