diff --git a/.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md b/.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md
new file mode 100644
index 00000000..70131106
--- /dev/null
+++ b/.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md
@@ -0,0 +1,128 @@
+---
+phase: 36-voice-pipeline-foundation
+plan: 03
+subsystem: api
+tags: [express, multer, voice, whisper, piper, sse, tts, stt]
+
+# Dependency graph
+requires:
+  - phase: 36-01
+    provides: voicePipelineService with transcribe/synthesize/formatForVoice
+  - phase: 36-02
+    provides: voiceMode field on createMessageSchema and ChatMessage interface
+
+provides:
+  - POST /api/transcribe HTTP endpoint (audio upload → VoicePipelineService.transcribe)
+  - POST /api/synthesize HTTP endpoint (text body → VoicePipelineService.synthesize → audio/wav)
+  - voiceMode flag wired through chat stream endpoint with dual-output prompt injection (VPIPE-06)
+  - voiceMode persisted to messageType column on assistant message save
+  - Old inline /transcribe endpoint removed from chat-files.ts
+
+affects:
+  - phase-37-voice-web-ui (uses POST /api/transcribe and POST /api/synthesize from browser)
+  - phase-38-telegram-bridge (uses POST /api/transcribe and POST /api/synthesize from Telegram)
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "voiceRoutes() factory function pattern matching chatFileRoutes()"
+    - "multer.memoryStorage() for audio upload in dedicated voice route"
+    - "dual-output SSE prompt injection — full_voice mode appends SPOKEN:/DETAILED: system message"
+    - "voiceMode-to-messageType mapping: full_voice→voice_full, voice_input→voice_input"
+
+key-files:
+  created:
+    - server/src/routes/voice.ts
+    - server/src/__tests__/36-voice-routes.test.ts
+  modified:
+    - server/src/routes/chat.ts
+    - server/src/routes/chat-files.ts
+    - server/src/app.ts
+
+key-decisions:
+  - "Voice routes are a dedicated module (voice.ts) rather than added to chat-files.ts for clean separation of concerns"
+  - "assertBoard(req) on both voice endpoints — same auth pattern as all other board-facing API routes"
+  - "voiceMode destructured as typed union in stream endpoint to enable compile-time safety"
+
+patterns-established:
+  - "Dual-output prompt: SPOKEN: section for TTS delivery, DETAILED: section for full markdown"
+  - "messageType column stores voice_full/voice_input for downstream rendering decisions"
+
+requirements-completed: [VPIPE-03, VPIPE-06]
+
+# Metrics
+duration: 25min
+completed: 2026-04-04
+---
+
+# Phase 36 Plan 03: Voice HTTP Routes + voiceMode Wiring Summary
+
+**Voice pipeline HTTP-accessible via POST /api/transcribe and POST /api/synthesize, with full_voice dual-output prompt injection and messageType persistence in the SSE stream endpoint**
+
+## Performance
+
+- **Duration:** 25 min
+- **Started:** 2026-04-04T01:33:00Z
+- **Completed:** 2026-04-04T01:58:00Z
+- **Tasks:** 2
+- **Files modified:** 5
+
+## Accomplishments
+
+- Created `server/src/routes/voice.ts` with POST /transcribe (multer audio upload → VoicePipelineService) and POST /synthesize (text → audio/wav buffer), both protected by `assertBoard(req)`
+- Wired `voiceMode` through the chat stream endpoint: full_voice triggers dual-output SPOKEN:/DETAILED: system prompt injection; voice_input/full_voice persist to `messageType` column
+- Removed 90-line inline `/transcribe` implementation from `chat-files.ts` and mounted `voiceRoutes()` in `app.ts`
+- 5 integration tests pass covering both success and error cases for both endpoints
+
+## Task Commits
+
+1. **Task 1: Create voice.ts routes and tests** - `b49b0aa5` (feat + TDD)
+2. **Task 2: Wire voiceMode in chat.ts, mount voice routes, remove old transcribe** - `3fc1ac10` (feat)
+
+## Files Created/Modified
+
+- `server/src/routes/voice.ts` — New voice routes factory: POST /transcribe and POST /synthesize
+- `server/src/__tests__/36-voice-routes.test.ts` — 5 integration tests for both endpoints
+- `server/src/routes/chat.ts` — voiceMode destructured, dual-output prompt injected, messageType persisted
+- `server/src/routes/chat-files.ts` — Removed old inline /transcribe endpoint (90 lines removed)
+- `server/src/app.ts` — Imported voiceRoutes and mounted with api.use(voiceRoutes())
+
+## Decisions Made
+
+- Voice routes are a dedicated `voice.ts` module rather than added to `chat-files.ts` for clean separation — voice pipeline is its own subsystem
+- Both voice endpoints call `assertBoard(req)` for consistent auth with all other board-facing routes
+- `voiceMode` typed as `"text" | "voice_input" | "full_voice"` union in stream endpoint for compile-time correctness
+
+## Deviations from Plan
+
+None — plan executed exactly as written.
+
+Note: Pre-existing TypeScript errors in `voice-pipeline.ts` (from Plan 36-01, ffmpeg-static type mismatch) were not caused by this plan's changes. Files modified in this plan compile clean.
+
+## Issues Encountered
+
+- Worktree node_modules resolution: vitest from the worktree couldn't resolve `express`/`supertest` because the worktree has an empty local `node_modules`. Fixed by creating symlinks from the worktree's `node_modules` and `server/node_modules` to the main repo's equivalents. This is a one-time setup for the parallel worktree environment.
+
+## User Setup Required
+
+None — no external service configuration required.
+
+## Next Phase Readiness
+
+- Voice pipeline is fully HTTP-accessible from any transport (web, Telegram, CLI)
+- Phase 37 (voice web UI) can use POST /api/transcribe and POST /api/synthesize directly
+- Phase 38 (Telegram bridge) can use the same endpoints for voice message relay
+- voiceMode flag flows end-to-end: client request body → dual-output prompt → messageType persistence
+
+## Self-Check: PASSED
+
+- voice.ts: FOUND
+- 36-voice-routes.test.ts: FOUND
+- 36-03-SUMMARY.md: FOUND
+- commit b49b0aa5: FOUND
+- commit 3fc1ac10: FOUND
+
+---
+*Phase: 36-voice-pipeline-foundation*
+*Completed: 2026-04-04*
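
The voiceMode wiring this summary describes (full_voice appends a SPOKEN:/DETAILED: system message, and the voice modes map onto a messageType value) could look roughly like the TypeScript sketch below. It is a minimal illustration under assumptions, not the code in `server/src/routes/chat.ts`: the names `messageTypeFor`, `withVoicePrompt`, `PromptMessage`, and `DUAL_OUTPUT_INSTRUCTION`, the wording of the instruction text, and the use of `null` for plain text messages are all hypothetical. Only the three voiceMode values and the two messageType values come from the summary.

```typescript
// The three voiceMode values named in the summary.
type VoiceMode = "text" | "voice_input" | "full_voice";

// messageType values named in the summary; `null` for plain text is an assumption.
type MessageType = "voice_full" | "voice_input" | null;

// Hypothetical shape of a prompt message passed to the model.
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical instruction text; the real prompt wording is not given in the summary.
const DUAL_OUTPUT_INSTRUCTION =
  "Answer in two sections. SPOKEN: a short reply suitable for text-to-speech. " +
  "DETAILED: the full markdown answer.";

// Mapping from the summary: full_voice -> voice_full, voice_input -> voice_input.
function messageTypeFor(mode: VoiceMode): MessageType {
  switch (mode) {
    case "full_voice":
      return "voice_full";
    case "voice_input":
      return "voice_input";
    case "text":
      return null; // assumption: text messages carry no voice messageType
  }
}

// Only full_voice appends the dual-output system message; other modes pass through.
function withVoicePrompt(messages: PromptMessage[], mode: VoiceMode): PromptMessage[] {
  if (mode !== "full_voice") {
    return messages;
  }
  return [...messages, { role: "system", content: DUAL_OUTPUT_INSTRUCTION }];
}
```

Keeping both helpers as pure functions would let the mapping and the prompt injection be unit-tested without standing up the SSE stream endpoint; whether the actual route factors them out this way is not stated in the summary.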