docs(36-03): complete voice HTTP routes plan — POST /transcribe + POST /synthesize + voiceMode wiring
parent 9fcf27fed9
commit e76c52f693
1 changed file with 128 additions and 0 deletions
128
.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md
Normal file
@@ -0,0 +1,128 @@
---
phase: 36-voice-pipeline-foundation
plan: 03
subsystem: api
tags: [express, multer, voice, whisper, piper, sse, tts, stt]

# Dependency graph
requires:
  - phase: 36-01
    provides: voicePipelineService with transcribe/synthesize/formatForVoice
  - phase: 36-02
    provides: voiceMode field on createMessageSchema and ChatMessage interface

provides:
  - POST /api/transcribe HTTP endpoint (audio upload → VoicePipelineService.transcribe)
  - POST /api/synthesize HTTP endpoint (text body → VoicePipelineService.synthesize → audio/wav)
  - voiceMode flag wired through chat stream endpoint with dual-output prompt injection (VPIPE-06)
  - voiceMode persisted to messageType column on assistant message save
  - Old inline /transcribe endpoint removed from chat-files.ts

affects:
  - phase-37-voice-web-ui (uses POST /api/transcribe and POST /api/synthesize from browser)
  - phase-38-telegram-bridge (uses POST /api/transcribe and POST /api/synthesize from Telegram)

# Tech tracking
tech-stack:
  added: []
  patterns:
    - "voiceRoutes() factory function pattern matching chatFileRoutes()"
    - "multer.memoryStorage() for audio upload in dedicated voice route"
    - "dual-output SSE prompt injection (full_voice mode appends SPOKEN:/DETAILED: system message)"
    - "voiceMode-to-messageType mapping: full_voice→voice_full, voice_input→voice_input"

key-files:
  created:
    - server/src/routes/voice.ts
    - server/src/__tests__/36-voice-routes.test.ts
  modified:
    - server/src/routes/chat.ts
    - server/src/routes/chat-files.ts
    - server/src/app.ts

key-decisions:
  - "Voice routes are a dedicated module (voice.ts) rather than added to chat-files.ts for clean separation of concerns"
  - "assertBoard(req) on both voice endpoints, the same auth pattern as all other board-facing API routes"
  - "voiceMode destructured as typed union in stream endpoint to enable compile-time safety"

patterns-established:
  - "Dual-output prompt: SPOKEN: section for TTS delivery, DETAILED: section for full markdown"
  - "messageType column stores voice_full/voice_input for downstream rendering decisions"

requirements-completed: [VPIPE-03, VPIPE-06]

# Metrics
duration: 25min
completed: 2026-04-04
---

# Phase 36 Plan 03: Voice HTTP Routes + voiceMode Wiring Summary

**Voice pipeline HTTP-accessible via POST /api/transcribe and POST /api/synthesize, with full_voice dual-output prompt injection and messageType persistence in the SSE stream endpoint**

## Performance

- **Duration:** 25 min
- **Started:** 2026-04-04T01:33:00Z
- **Completed:** 2026-04-04T01:58:00Z
- **Tasks:** 2
- **Files modified:** 5

## Accomplishments

- Created `server/src/routes/voice.ts` with POST /transcribe (multer audio upload → VoicePipelineService) and POST /synthesize (text → audio/wav buffer), both protected by `assertBoard(req)`
- Wired `voiceMode` through the chat stream endpoint: full_voice triggers dual-output SPOKEN:/DETAILED: system prompt injection; voice_input/full_voice persist to `messageType` column
- Removed 90-line inline `/transcribe` implementation from `chat-files.ts` and mounted `voiceRoutes()` in `app.ts`
- 5 integration tests pass covering both success and error cases for both endpoints
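
The dual-output injection described above can be sketched roughly as follows. This is a minimal illustration, not the code in `chat.ts`: the `PromptMessage` shape, the `withVoicePrompt` helper, and the wording of `DUAL_OUTPUT_INSTRUCTION` are all assumptions. Only the three `voiceMode` values and the SPOKEN:/DETAILED: markers come from this summary.

```typescript
// The three voiceMode values are taken from the summary; everything else is hypothetical.
type VoiceMode = "text" | "voice_input" | "full_voice";

interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Placeholder wording: the real instruction text is not quoted in this summary.
const DUAL_OUTPUT_INSTRUCTION =
  "Respond in two sections. Start the first with 'SPOKEN:' (short plain speech for TTS). " +
  "Start the second with 'DETAILED:' (the full markdown answer).";

// Append the dual-output system message only when the client asked for full_voice.
// The input array is never mutated; other modes get the original array back.
function withVoicePrompt(
  messages: PromptMessage[],
  voiceMode: VoiceMode = "text",
): PromptMessage[] {
  if (voiceMode !== "full_voice") {
    return messages;
  }
  return [...messages, { role: "system", content: DUAL_OUTPUT_INSTRUCTION }];
}

const chatHistory: PromptMessage[] = [
  { role: "user", content: "What changed in the last deploy?" },
];
const prompted = withVoicePrompt(chatHistory, "full_voice");
console.log(prompted.length); // 2
```

Keeping the injection in a pure function like this would make the full_voice branch easy to unit-test without starting the SSE stream.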

## Task Commits

1. **Task 1: Create voice.ts routes and tests** - `b49b0aa5` (feat + TDD)
2. **Task 2: Wire voiceMode in chat.ts, mount voice routes, remove old transcribe** - `3fc1ac10` (feat)

## Files Created/Modified

- `server/src/routes/voice.ts` — New voice routes factory: POST /transcribe and POST /synthesize
- `server/src/__tests__/36-voice-routes.test.ts` — 5 integration tests for both endpoints
- `server/src/routes/chat.ts` — voiceMode destructured, dual-output prompt injected, messageType persisted
- `server/src/routes/chat-files.ts` — Removed old inline /transcribe endpoint (90 lines removed)
- `server/src/app.ts` — Imported voiceRoutes and mounted with api.use(voiceRoutes())

## Decisions Made

- Voice routes are a dedicated `voice.ts` module rather than added to `chat-files.ts` for clean separation — voice pipeline is its own subsystem
- Both voice endpoints call `assertBoard(req)` for consistent auth with all other board-facing routes
- `voiceMode` typed as `"text" | "voice_input" | "full_voice"` union in stream endpoint for compile-time correctness
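
As a rough illustration of the typed union and the voiceMode-to-messageType mapping, the sketch below narrows an untrusted value and maps it to the stored type. `parseVoiceMode` and `toMessageType` are hypothetical names, the fallback to `"text"` is an assumption, and this summary does not say what `"text"` maps to, so the sketch leaves it unset.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
type VoiceMessageType = "voice_full" | "voice_input";

const VOICE_MODES: readonly VoiceMode[] = ["text", "voice_input", "full_voice"];

// Narrow an untrusted request-body value to the union.
// Assumption: anything unrecognised is treated as plain "text".
function parseVoiceMode(value: unknown): VoiceMode {
  return VOICE_MODES.find((mode) => mode === value) ?? "text";
}

// Mapping stated in the summary: full_voice → voice_full, voice_input → voice_input.
// Assumption: "text" leaves messageType unset.
function toMessageType(voiceMode: VoiceMode): VoiceMessageType | undefined {
  switch (voiceMode) {
    case "full_voice":
      return "voice_full";
    case "voice_input":
      return "voice_input";
    case "text":
      return undefined;
  }
}

console.log(toMessageType(parseVoiceMode("full_voice"))); // "voice_full"
```

Because the `switch` covers every member of the union, adding a fourth mode later would surface as a compile-time gap rather than a silent runtime default.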

## Deviations from Plan

None — plan executed exactly as written.

Note: Pre-existing TypeScript errors in `voice-pipeline.ts` (from Plan 36-01, ffmpeg-static type mismatch) were not caused by this plan's changes. Files modified in this plan compile cleanly.

## Issues Encountered

- Worktree node_modules resolution: vitest from the worktree couldn't resolve `express`/`supertest` because the worktree has an empty local `node_modules`. Fixed by creating symlinks from the worktree's `node_modules` and `server/node_modules` to the main repo's equivalents. This is a one-time setup for the parallel worktree environment.

## User Setup Required

None — no external service configuration required.

## Next Phase Readiness

- Voice pipeline is fully HTTP-accessible from any transport (web, Telegram, CLI)
- Phase 37 (voice web UI) can use POST /api/transcribe and POST /api/synthesize directly
- Phase 38 (Telegram bridge) can use the same endpoints for voice message relay
- voiceMode flag flows end-to-end: client request body → dual-output prompt → messageType persistence
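
A consumer of a full_voice reply will presumably need to separate the SPOKEN: part (for TTS) from the DETAILED: part (for rendering). This summary does not say where that split happens, so the following is only a sketch under that assumption. `splitDualOutput` and its fallback behaviour are hypothetical.

```typescript
interface DualOutput {
  spoken: string;
  detailed: string;
}

// Split a reply shaped like "SPOKEN: ... DETAILED: ..." into its two parts.
// Assumption: if either marker is missing or they are out of order,
// the whole reply is used for both parts.
function splitDualOutput(reply: string): DualOutput {
  const spokenMarker = "SPOKEN:";
  const detailedMarker = "DETAILED:";
  const spokenAt = reply.indexOf(spokenMarker);
  const detailedAt = reply.indexOf(detailedMarker);

  if (spokenAt === -1 || detailedAt === -1 || detailedAt < spokenAt) {
    const whole = reply.trim();
    return { spoken: whole, detailed: whole };
  }

  return {
    spoken: reply.slice(spokenAt + spokenMarker.length, detailedAt).trim(),
    detailed: reply.slice(detailedAt + detailedMarker.length).trim(),
  };
}

const parts = splitDualOutput(
  "SPOKEN: Two tests failed.\nDETAILED: ## Failures\n- a\n- b",
);
console.log(parts.spoken); // "Two tests failed."
```

The spoken part could then go to POST /api/synthesize while the detailed part is rendered as markdown.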

## Self-Check: PASSED

- voice.ts: FOUND
- 36-voice-routes.test.ts: FOUND
- 36-03-SUMMARY.md: FOUND
- commit b49b0aa5: FOUND
- commit 3fc1ac10: FOUND

---
*Phase: 36-voice-pipeline-foundation*
*Completed: 2026-04-04*