---
phase: 25-file-system
plan: 08
subsystem: ui
tags: [voice, whisper, mediarecorder, transcription, react, express, multer]

# Dependency graph
requires:
  - phase: 25-file-system-02
    provides: ChatInput with file upload props, chat-files route with multer pattern
provides:
  - VoiceRecordButton component with MediaRecorder API, idle/recording/transcribing states
  - POST /transcribe server endpoint using execFileAsync for whisper-cpp or openai-whisper
  - ChatInput enableVoiceInput prop that renders VoiceRecordButton conditionally
affects: [25-file-system, chat-input, chat-panel]

# Tech tracking
tech-stack:
  added: []
  patterns:
    - "MediaRecorder API with 250ms chunk collection and onstop blob assembly"
    - "execFileAsync (not exec) for external CLI calls to avoid shell injection risk"
    - "Whisper CLI cascade: whisper-cpp first, openai-whisper Python fallback, 503 if neither"

key-files:
  created:
    - ui/src/components/VoiceRecordButton.tsx
  modified:
    - ui/src/components/ChatInput.tsx
    - ui/src/components/ChatPanel.tsx
    - server/src/routes/chat-files.ts
    - .planning/REQUIREMENTS.md

key-decisions:
  - "Use the functional setValue updater for transcription append; avoids stale closures and is simpler than the native DOM event approach"
  - "enableVoiceInput defaults to false for backward compatibility; ChatPanel passes true unconditionally because the server returns 503 gracefully if whisper is absent"
  - "execFileAsync over exec for whisper CLI invocation; no shell injection risk with a system-generated tmpPath"

patterns-established:
  - "POST /transcribe uses a separate multer instance with field name 'audio' to avoid conflict with the 'file' field used by upload routes"

requirements-completed: [INPUT-02, INPUT-03, INPUT-04]

# Metrics
duration: 8min
completed: 2026-04-01
---

# Phase 25 Plan 08: Voice Input Summary

**VoiceRecordButton with MediaRecorder API wired into ChatInput; POST /transcribe endpoint with whisper-cpp/openai-whisper cascade and graceful 503 fallback**

## Performance

- **Duration:** ~8 min
- **Started:** 2026-04-01T23:58:00Z
- **Completed:** 2026-04-01T23:59:00Z
- **Tasks:** 2
- **Files modified:** 5

## Accomplishments

- VoiceRecordButton component: idle (Mic), recording (Square/red), and transcribing (Loader2 spinner) states, built on the MediaRecorder API with 250ms chunks
- POST /transcribe endpoint: writes the audio to a temp file, tries the whisper-cpp CLI first, falls back to the openai-whisper Python CLI, and returns 503 with a helpful install message if neither is present
- ChatInput: new `enableVoiceInput` prop renders VoiceRecordButton; `handleTranscription` appends the text to the existing textarea value via functional setState
- ChatPanel passes `enableVoiceInput={true}` unconditionally (the server returns 503 if whisper is unavailable)
- INPUT-02, INPUT-03, INPUT-04 marked Complete in REQUIREMENTS.md
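
The recording flow described above (250ms chunks, one Blob assembled in `onstop`) can be sketched as follows. This is a minimal sketch, not the component's actual code: `recordUntilStopped` and the `RecorderLike` interface are hypothetical names introduced so the logic can run outside a browser, and the `audio/webm` fallback type is an assumption. In the component, the recorder would be a `MediaRecorder` created from a `navigator.mediaDevices.getUserMedia({ audio: true })` stream.

```typescript
// Hypothetical stand-in for the parts of MediaRecorder this sketch uses,
// so the chunk/blob logic can be exercised with a fake recorder.
interface RecorderLike {
  mimeType: string;
  start(timeslice?: number): void;
  stop(): void;
  ondataavailable: ((event: { data: Blob }) => void) | null;
  onstop: (() => void) | null;
}

const CHUNK_INTERVAL_MS = 250;

// Starts recording and returns a handle whose stop() resolves to the
// assembled audio Blob once the recorder fires onstop.
export function recordUntilStopped(recorder: RecorderLike): { stop: () => Promise<Blob> } {
  const chunks: Blob[] = [];

  // With a timeslice, dataavailable fires roughly every 250ms; skip empty chunks.
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };

  const finished = new Promise<Blob>((resolve) => {
    // MediaRecorder flushes a final dataavailable event before onstop,
    // so all chunks are present when the Blob is assembled here.
    recorder.onstop = () => {
      // "audio/webm" is an assumed fallback when the recorder reports no type.
      resolve(new Blob(chunks, { type: recorder.mimeType || "audio/webm" }));
    };
  });

  recorder.start(CHUNK_INTERVAL_MS);

  return {
    stop: () => {
      recorder.stop();
      return finished;
    },
  };
}
```

Assembling the Blob inside `onstop` rather than immediately after calling `stop()` matters because the last chunk arrives asynchronously, just before the stop event.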

## Task Commits

Each task was committed atomically:

1. **Task 1: Create VoiceRecordButton and server transcription endpoint** - `c7c46a02` (feat)
2. **Task 2: Wire VoiceRecordButton into ChatInput and update REQUIREMENTS.md** - `a1e1b11b` (feat)

## Files Created/Modified

- `ui/src/components/VoiceRecordButton.tsx` - New voice recording button component with MediaRecorder API
- `ui/src/components/ChatInput.tsx` - Added `enableVoiceInput` prop, `handleTranscription` callback, VoiceRecordButton render
- `ui/src/components/ChatPanel.tsx` - Passes `enableVoiceInput={true}` to ChatInput
- `server/src/routes/chat-files.ts` - Added POST /transcribe endpoint with whisper CLI cascade
- `.planning/REQUIREMENTS.md` - Marked INPUT-02, INPUT-03, INPUT-04 as Complete

## Decisions Made

- Used the functional form of setValue (`setValue((current) => ...)`) for the transcription append to avoid stale-closure issues; this is simpler than the native DOM event approach suggested in the plan
- `enableVoiceInput` defaults to false in ChatInput props for backward compatibility; ChatPanel passes true unconditionally since the server returns a friendly 503 if whisper is not installed
- Used a separate `audioUpload` multer instance with `.single("audio")` inside the transcribe handler to avoid a field-name collision with the existing `fileUpload` instance, which uses `.single("file")`

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Added `path` import at top of chat-files.ts**
- **Found during:** Task 1 (server transcription endpoint)
- **Issue:** The plan's code used `path.join(tmpdir(), ...)` but `path` was not imported in the file
- **Fix:** Added `import path from "node:path";` at the top of chat-files.ts
- **Files modified:** server/src/routes/chat-files.ts
- **Verification:** TypeScript compiles without errors
- **Committed in:** c7c46a02 (Task 1 commit)
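
The temp-file step that needed the `path` import can be sketched as below. This is a sketch under assumptions, not the code in chat-files.ts: the helper name `withTempAudioFile`, the `voice-` filename prefix, and the use of `randomUUID` for uniqueness are illustrative choices. The `finally` block removes the file whether or not transcription succeeds.

```typescript
import path from "node:path";
import { tmpdir } from "node:os";
import { randomUUID } from "node:crypto";
import { writeFile, rm } from "node:fs/promises";

// Hypothetical helper: write the uploaded audio to a uniquely named temp
// file, pass its path to `fn`, and always delete the file afterwards.
export async function withTempAudioFile<T>(
  audio: Buffer,
  extension: string,
  fn: (tmpPath: string) => Promise<T>,
): Promise<T> {
  // Built only from tmpdir(), a random UUID, and a server-chosen extension,
  // so no client-supplied text reaches the whisper CLI arguments.
  const tmpPath = path.join(tmpdir(), `voice-${randomUUID()}${extension}`);
  await writeFile(tmpPath, audio);
  try {
    return await fn(tmpPath);
  } finally {
    // force: true makes cleanup a no-op if the file is already gone.
    await rm(tmpPath, { force: true });
  }
}
```

Keeping cleanup in `finally` means a failed or missing whisper CLI does not leave audio files accumulating in the temp directory.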

**2. [Rule 2 - Missing Critical] Separate multer instance for audio upload**
- **Found during:** Task 1 (server transcription endpoint)
- **Issue:** The plan's code called `runSingleFileUpload(fileUpload, req, res)`, which uses `.single("file")`, but the client sends the recording under the field name `audio`, so multer would reject it as an unexpected field (`LIMIT_UNEXPECTED_FILE`) instead of populating `req.file`
- **Fix:** Created a separate `audioUpload` multer instance and a `runAudioUpload` helper using `.single("audio")`
- **Files modified:** server/src/routes/chat-files.ts
- **Verification:** TypeScript compiles without errors; the field name matches the one used by VoiceRecordButton
- **Committed in:** c7c46a02 (Task 1 commit)
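
On the client side, the recording has to be posted under the `audio` field for `.single("audio")` to accept it. A minimal sketch, assuming the endpoint answers with JSON of the form `{ text }` and replies 503 when no whisper CLI is installed; the response shape, the `recording.webm` filename, and the helper names `buildTranscribeForm` and `requestTranscription` are hypothetical. The URL is a parameter because the router's mount path is not stated here, and `fetchImpl` exists only so the sketch can be tested without a server.

```typescript
// Must match the server's audioUpload.single("audio").
export const AUDIO_FIELD = "audio";

// Wrap the recorded Blob in multipart form data under the expected field.
export function buildTranscribeForm(audio: Blob): FormData {
  const form = new FormData();
  // The third argument sets the filename multer reports as originalname.
  form.append(AUDIO_FIELD, audio, "recording.webm");
  return form;
}

type FetchLike = (
  url: string,
  init: { method: string; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Posts the audio and returns the transcript, or null when the server
// reports that transcription is unavailable (HTTP 503).
export async function requestTranscription(
  audio: Blob,
  url: string,
  fetchImpl: FetchLike = fetch,
): Promise<string | null> {
  // No explicit Content-Type: fetch adds the multipart boundary itself.
  const res = await fetchImpl(url, { method: "POST", body: buildTranscribeForm(audio) });
  if (res.status === 503) return null;
  if (!res.ok) throw new Error(`Transcription failed with HTTP ${res.status}`);
  const body = (await res.json()) as { text?: unknown };
  if (typeof body.text !== "string") throw new Error("Unexpected response shape");
  return body.text;
}
```

Returning `null` for 503 lets the button drop back to idle quietly, which fits ChatPanel enabling voice input unconditionally.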

**3. [Rule 1 - Bug] Functional setState for transcription append**
- **Found during:** Task 2 (ChatInput integration)
- **Issue:** The plan suggested dispatching a native DOM event to update the textarea, which is unnecessarily complex because ChatInput already controls the textarea through its `value` state
- **Fix:** Used ``setValue((current) => (current ? `${current} ${text}` : text))``, which appends correctly without stale-closure risk
- **Files modified:** ui/src/components/ChatInput.tsx
- **Verification:** TypeScript compiles without errors
- **Committed in:** a1e1b11b (Task 2 commit)
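
The append rule can be pulled out into a pure function, which makes both the separator behaviour and the stale-closure argument easy to check. A minimal sketch: `appendTranscription` is a hypothetical name (in ChatInput it would be used as `setValue((current) => appendTranscription(current, text))`), and `applyQueuedUpdates` is a simplified model of how queued functional updates chain, not React's implementation.

```typescript
// Join with a single space when the textarea already has text,
// otherwise use the transcript as-is (same rule as the fix above).
export function appendTranscription(current: string, text: string): string {
  return current ? `${current} ${text}` : text;
}

// Simplified model of queued functional updates: each updater receives
// the value produced by the previous one.
export function applyQueuedUpdates(
  initial: string,
  updates: ReadonlyArray<(previous: string) => string>,
): string {
  return updates.reduce((value, update) => update(value), initial);
}

// Two transcripts arriving before a re-render.
const draft = "draft";

// Functional form: each update sees the latest value.
const functional = applyQueuedUpdates(draft, [
  (current) => appendTranscription(current, "one"),
  (current) => appendTranscription(current, "two"),
]);

// Stale-closure form: both updates captured `draft`, so the first is lost.
const stale = applyQueuedUpdates(draft, [
  () => appendTranscription(draft, "one"),
  () => appendTranscription(draft, "two"),
]);

console.log(functional); // draft one two
console.log(stale); // draft two
```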

---

**Total deviations:** 3 auto-fixed (1 missing import, 1 field-name mismatch, 1 simpler state approach)

**Impact on plan:** The first two fixes were necessary for correctness: the missing `path` import and the multer field-name mismatch would have caused runtime errors. The third replaces the plan's DOM-event approach with functional setState, which is simpler and more idiomatic React.

## Issues Encountered

None beyond the auto-fixed deviations above.

## User Setup Required

None. The server returns 503 with install instructions if whisper is not present, and no configuration is required by default.
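
The 503 fallback comes from trying each whisper CLI in turn. A minimal sketch of that cascade, relying on the fact that Node's `execFile` rejects with an `ENOENT` error when a command is not on `PATH`. The binary names `whisper-cpp` and `whisper` are assumptions based on the package names, the argument lists are placeholders rather than the flags chat-files.ts actually passes, treating stdout as the transcript is an assumption, and the install-message wording is illustrative. The `run` parameter defaults to a promisified `execFile` and is injectable only so the sketch can be tested.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// execFile passes arguments straight to the binary without a shell,
// so the temp path cannot be interpreted as shell syntax.
const execFileAsync = promisify(execFile);

type Runner = (command: string, args: string[]) => Promise<{ stdout: string }>;

interface WhisperCandidate {
  command: string;
  // Placeholder argument builders (assumptions; real flags may differ).
  args: (audioPath: string) => string[];
}

// Assumed binary names, tried in order: whisper.cpp first, then openai-whisper.
export const CANDIDATES: WhisperCandidate[] = [
  { command: "whisper-cpp", args: (p) => ["-f", p] },
  { command: "whisper", args: (p) => [p] },
];

export type TranscribeResult =
  | { status: 200; text: string }
  | { status: 503; error: string };

const defaultRunner: Runner = async (command, args) => {
  const { stdout } = await execFileAsync(command, args);
  return { stdout };
};

export async function transcribeWithCascade(
  audioPath: string,
  run: Runner = defaultRunner,
  candidates: WhisperCandidate[] = CANDIDATES,
): Promise<TranscribeResult> {
  for (const candidate of candidates) {
    try {
      const { stdout } = await run(candidate.command, candidate.args(audioPath));
      return { status: 200, text: stdout.trim() };
    } catch (err) {
      // ENOENT: binary not installed, so try the next candidate.
      // Any other failure (bad audio, crash) is a real error and propagates.
      if ((err as { code?: string } | null)?.code === "ENOENT") continue;
      throw err;
    }
  }
  return {
    status: 503,
    error:
      "No whisper CLI found. Install whisper-cpp (brew install whisper-cpp) or openai-whisper (pip install openai-whisper).",
  };
}
```

In an Express handler the result would map onto `res.status(result.status).json(...)`. Falling through only on `ENOENT` keeps a genuinely broken transcription from being misreported as "not installed".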

## Next Phase Readiness

- Voice input complete; INPUT-02/03/04 all marked Complete
- Remaining Phase 25 plans can proceed independently
- To enable transcription: install `whisper-cpp` (`brew install whisper-cpp`) or `openai-whisper` (`pip install openai-whisper`)

---

*Phase: 25-file-system*
*Completed: 2026-04-01*