nexus/.planning/phases/36-voice-pipeline-foundation/36-03-SUMMARY.md

---
phase: 36-voice-pipeline-foundation
plan: 03
subsystem: api
tags: [express, multer, voice, whisper, piper, sse, tts, stt]
requires:
  - phase: 36-01
    provides: voicePipelineService with transcribe/synthesize/formatForVoice
  - phase: 36-02
    provides: voiceMode field on createMessageSchema and ChatMessage interface
provides:
  - "POST /api/transcribe HTTP endpoint (audio upload → VoicePipelineService.transcribe)"
  - "POST /api/synthesize HTTP endpoint (text body → VoicePipelineService.synthesize → audio/wav)"
  - "voiceMode flag wired through chat stream endpoint with dual-output prompt injection (VPIPE-06)"
  - "voiceMode persisted to messageType column on assistant message save"
  - "Old inline /transcribe endpoint removed from chat-files.ts"
affects:
  - "phase-37-voice-web-ui (uses POST /api/transcribe and POST /api/synthesize from browser)"
  - "phase-38-telegram-bridge (uses POST /api/transcribe and POST /api/synthesize from Telegram)"
tech-stack:
  added: []
  patterns:
    - "voiceRoutes() factory function pattern matching chatFileRoutes()"
    - "multer.memoryStorage() for audio upload in dedicated voice route"
    - "dual-output SSE prompt injection: full_voice mode appends SPOKEN:/DETAILED: system message"
    - "voiceMode-to-messageType mapping: full_voice→voice_full, voice_input→voice_input"
key-files:
  created:
    - server/src/routes/voice.ts
    - server/src/__tests__/36-voice-routes.test.ts
  modified:
    - server/src/routes/chat.ts
    - server/src/routes/chat-files.ts
    - server/src/app.ts
key-decisions:
  - "Voice routes are a dedicated module (voice.ts) rather than added to chat-files.ts for clean separation of concerns"
  - "assertBoard(req) on both voice endpoints, same auth pattern as all other board-facing API routes"
  - "voiceMode destructured as typed union in stream endpoint to enable compile-time safety"
patterns-established:
  - "Dual-output prompt: SPOKEN: section for TTS delivery, DETAILED: section for full markdown"
  - "messageType column stores voice_full/voice_input for downstream rendering decisions"
requirements-completed: [VPIPE-03, VPIPE-06]
duration: 25min
completed: 2026-04-04
---

Phase 36 Plan 03: Voice HTTP Routes + voiceMode Wiring Summary

The voice pipeline is now HTTP-accessible via POST /api/transcribe and POST /api/synthesize. The SSE stream endpoint injects a dual-output prompt in full_voice mode and persists the voice mode to the messageType column.

Performance

  • Duration: 25 min
  • Started: 2026-04-04T01:33:00Z
  • Completed: 2026-04-04T01:58:00Z
  • Tasks: 2
  • Files modified: 5

Accomplishments

  • Created server/src/routes/voice.ts with POST /transcribe (multer audio upload → VoicePipelineService) and POST /synthesize (text → audio/wav buffer), both protected by assertBoard(req)
  • Wired voiceMode through the chat stream endpoint: full_voice triggers injection of a dual-output SPOKEN:/DETAILED: system prompt, and the assistant message's messageType column is set from voiceMode (full_voice → voice_full, voice_input → voice_input)
  • Removed the 90-line inline /transcribe implementation from chat-files.ts and mounted voiceRoutes() in app.ts
  • Five integration tests pass, covering success and error cases for both endpoints
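
The dual-output prompt injection described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in chat.ts: the PromptMessage shape, the injectVoicePrompt name, and the prompt wording are all hypothetical. Only the three voiceMode values and the SPOKEN:/DETAILED: labels come from this summary.

```typescript
// Hypothetical sketch; only the mode values and the SPOKEN:/DETAILED:
// labels are taken from the summary. Everything else is assumed.
type VoiceMode = "text" | "voice_input" | "full_voice";

interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumed wording: the real prompt text is not shown in this summary.
const DUAL_OUTPUT_PROMPT: string = [
  "Answer in two labelled sections.",
  "SPOKEN: a short, plain-language version suitable for text-to-speech.",
  "DETAILED: the full answer in markdown.",
].join("\n");

// Returns a new message list. The dual-output system message is appended
// only for full_voice; other modes get an unchanged copy.
function injectVoicePrompt(
  messages: readonly PromptMessage[],
  voiceMode: VoiceMode,
): PromptMessage[] {
  if (voiceMode !== "full_voice") {
    return [...messages];
  }
  return [...messages, { role: "system", content: DUAL_OUTPUT_PROMPT }];
}
```

Returning a copy instead of mutating the input keeps the injection easy to test in isolation from the SSE handler.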

Task Commits

  1. Task 1: Create voice.ts routes and tests - b49b0aa5 (feat + TDD)
  2. Task 2: Wire voiceMode in chat.ts, mount voice routes, remove old transcribe - 3fc1ac10 (feat)

Files Created/Modified

  • server/src/routes/voice.ts — New voice routes factory: POST /transcribe and POST /synthesize
  • server/src/__tests__/36-voice-routes.test.ts — 5 integration tests for both endpoints
  • server/src/routes/chat.ts — voiceMode destructured, dual-output prompt injected, messageType persisted
  • server/src/routes/chat-files.ts — Removed old inline /transcribe endpoint (90 lines removed)
  • server/src/app.ts — Imported voiceRoutes and mounted with api.use(voiceRoutes())

Decisions Made

  • Voice routes are a dedicated voice.ts module rather than added to chat-files.ts for clean separation — voice pipeline is its own subsystem
  • Both voice endpoints call assertBoard(req) for consistent auth with all other board-facing routes
  • voiceMode typed as "text" | "voice_input" | "full_voice" union in stream endpoint for compile-time correctness
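
As a minimal sketch of what the typed union buys, the following shows a type guard for untrusted request-body values and an exhaustive mapping to messageType. The names isVoiceMode and messageTypeFor are hypothetical, and returning null for "text" is an assumption: the summary only states the full_voice → voice_full and voice_input → voice_input mappings.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
type VoiceMessageType = "voice_full" | "voice_input";

const VOICE_MODES: readonly string[] = ["text", "voice_input", "full_voice"];

// Narrows an untrusted value (for example, a request-body field) to the union.
function isVoiceMode(value: unknown): value is VoiceMode {
  return typeof value === "string" && VOICE_MODES.indexOf(value) !== -1;
}

// Maps voiceMode to the value stored in the messageType column.
// Assumption: plain text messages store no voice-specific type (null).
function messageTypeFor(mode: VoiceMode): VoiceMessageType | null {
  switch (mode) {
    case "full_voice":
      return "voice_full";
    case "voice_input":
      return "voice_input";
    case "text":
      return null;
    default: {
      // If a new member is added to VoiceMode and not handled above,
      // this assignment fails to compile.
      const unreachable: never = mode;
      return unreachable;
    }
  }
}
```

The never assignment in the default branch is what turns a forgotten mode into a compile error instead of a silent runtime fallthrough.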

Deviations from Plan

None — plan executed exactly as written.

Note: Pre-existing TypeScript errors in voice-pipeline.ts (from Plan 36-01, an ffmpeg-static type mismatch) were not caused by this plan's changes. The files modified in this plan compile cleanly.

Issues Encountered

  • Worktree node_modules resolution: vitest from the worktree couldn't resolve express/supertest because the worktree has an empty local node_modules. Fixed by creating symlinks from the worktree's node_modules and server/node_modules to the main repo's equivalents. This is a one-time setup for the parallel worktree environment.

User Setup Required

None — no external service configuration required.

Next Phase Readiness

  • Voice pipeline is fully HTTP-accessible from any transport (web, Telegram, CLI)
  • Phase 37 (voice web UI) can use POST /api/transcribe and POST /api/synthesize directly
  • Phase 38 (Telegram bridge) can use the same endpoints for voice message relay
  • voiceMode flag flows end-to-end: client request body → dual-output prompt → messageType persistence
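
For a later phase's client, a request to the synthesize endpoint might be assembled along these lines. This is a sketch under assumptions: the summary says only that the endpoint takes a text body and returns audio/wav, so the JSON shape with a text field, the RequestSpec type, and the buildSynthesizeRequest helper are hypothetical, as is the base URL in the usage note.

```typescript
// Plain description of an HTTP request, so the sketch stays free of
// any particular HTTP client.
interface RequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: the endpoint accepts JSON of the form { "text": "..." }.
// Check voice.ts for the actual body shape before relying on this.
function buildSynthesizeRequest(baseUrl: string, text: string): RequestSpec {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: root + "/api/synthesize",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}
```

A caller would pass the result to its HTTP client of choice, for example fetch(spec.url, spec) with a placeholder base URL such as http://localhost:3000, and read the response as binary audio/wav data.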

Self-Check: PASSED

  • voice.ts: FOUND
  • 36-voice-routes.test.ts: FOUND
  • 36-03-SUMMARY.md: FOUND
  • commit b49b0aa5: FOUND
  • commit 3fc1ac10: FOUND

Phase: 36-voice-pipeline-foundation
Completed: 2026-04-04