Nexus Dev e76c52f693 docs(36-03): complete voice HTTP routes plan — POST /transcribe + POST /synthesize + voiceMode wiring

2026-04-04 03:55:50 +00:00


phase: 36-voice-pipeline-foundation
plan: 03
subsystem: api
tags: express, multer, voice, whisper, piper, sse, tts, stt

requires:
  - phase: 36-01
    provides: voicePipelineService with transcribe/synthesize/formatForVoice
  - phase: 36-02
    provides: voiceMode field on createMessageSchema and ChatMessage interface

provides:
  - POST /api/transcribe HTTP endpoint (audio upload → VoicePipelineService.transcribe)
  - POST /api/synthesize HTTP endpoint (text body → VoicePipelineService.synthesize → audio/wav)
  - voiceMode flag wired through chat stream endpoint with dual-output prompt injection (VPIPE-06)
  - voiceMode persisted to messageType column on assistant message save
  - Old inline /transcribe endpoint removed from chat-files.ts

affects:
  - phase-37-voice-web-ui (uses POST /api/transcribe and POST /api/synthesize from browser)
  - phase-38-telegram-bridge (uses POST /api/transcribe and POST /api/synthesize from Telegram)

tech-stack:
  added: []
  patterns:
    - voiceRoutes() factory function pattern matching chatFileRoutes()
    - multer.memoryStorage() for audio upload in dedicated voice route
    - dual-output SSE prompt injection: full_voice mode appends SPOKEN:/DETAILED: system message
    - voiceMode-to-messageType mapping: full_voice→voice_full, voice_input→voice_input

key-files:
  created:
    - server/src/routes/voice.ts
    - server/src/__tests__/36-voice-routes.test.ts
  modified:
    - server/src/routes/chat.ts
    - server/src/routes/chat-files.ts
    - server/src/app.ts

key-decisions:
  - Voice routes are a dedicated module (voice.ts) rather than added to chat-files.ts for clean separation of concerns
  - assertBoard(req) on both voice endpoints, the same auth pattern as all other board-facing API routes
  - voiceMode destructured as typed union in stream endpoint to enable compile-time safety

patterns-established:
  - Dual-output prompt: SPOKEN: section for TTS delivery, DETAILED: section for full markdown
  - messageType column stores voice_full/voice_input for downstream rendering decisions

requirements-completed: VPIPE-03, VPIPE-06
duration: 25min
completed: 2026-04-04

Phase 36 Plan 03: Voice HTTP Routes + voiceMode Wiring Summary

The voice pipeline is now HTTP-accessible via POST /api/transcribe and POST /api/synthesize. The SSE stream endpoint injects the dual-output prompt in full_voice mode and persists messageType on the saved assistant message.

Performance

Duration: 25 min
Started: 2026-04-04T01:33:00Z
Completed: 2026-04-04T01:58:00Z
Tasks: 2
Files modified: 5

Accomplishments

Created server/src/routes/voice.ts with POST /transcribe (multer audio upload → VoicePipelineService) and POST /synthesize (text → audio/wav buffer), both protected by assertBoard(req)
Wired voiceMode through the chat stream endpoint: full_voice triggers injection of the dual-output SPOKEN:/DETAILED: system prompt, and the voice_input and full_voice modes are persisted to the messageType column as voice_input and voice_full respectively
Removed 90-line inline /transcribe implementation from chat-files.ts and mounted voiceRoutes() in app.ts
Five integration tests pass, covering success and error cases for both endpoints
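
The dual-output injection described above can be sketched as follows. This is a minimal illustration, not the code in chat.ts: the instruction wording, the PromptMessage shape, and both function names (withVoicePrompt, splitDualOutput) are assumptions, and the summary does not say how consumers split the reply, so the parser is purely hypothetical.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";

interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical wording: the real instruction text in chat.ts is not shown in this summary.
const DUAL_OUTPUT_INSTRUCTION =
  "Respond in two sections. Begin with 'SPOKEN:' followed by a short, " +
  "speech-friendly answer, then 'DETAILED:' followed by the full markdown answer.";

// Returns a new message list; only full_voice gets the extra system message appended.
function withVoicePrompt(
  messages: readonly PromptMessage[],
  voiceMode: VoiceMode,
): PromptMessage[] {
  if (voiceMode !== "full_voice") {
    return [...messages];
  }
  return [...messages, { role: "system", content: DUAL_OUTPUT_INSTRUCTION }];
}

// Splits a completed reply into its two sections. If the markers are missing,
// the whole reply is treated as the detailed section (an assumed fallback).
function splitDualOutput(reply: string): { spoken: string; detailed: string } {
  const match = /SPOKEN:\s*([\s\S]*?)\s*DETAILED:\s*([\s\S]*)/.exec(reply);
  if (!match) {
    return { spoken: "", detailed: reply.trim() };
  }
  return {
    spoken: (match[1] ?? "").trim(),
    detailed: (match[2] ?? "").trim(),
  };
}
```

Under these assumptions, the spoken section would be the text sent to POST /api/synthesize, while the detailed section is what a chat client renders as markdown.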

Task Commits

Task 1: Create voice.ts routes and tests - b49b0aa5 (feat + TDD)
Task 2: Wire voiceMode in chat.ts, mount voice routes, remove old transcribe - 3fc1ac10 (feat)

Files Created/Modified

server/src/routes/voice.ts — New voice routes factory: POST /transcribe and POST /synthesize
server/src/__tests__/36-voice-routes.test.ts — 5 integration tests for both endpoints
server/src/routes/chat.ts — voiceMode destructured, dual-output prompt injected, messageType persisted
server/src/routes/chat-files.ts — Removed old inline /transcribe endpoint (90 lines removed)
server/src/app.ts — Imported voiceRoutes and mounted with api.use(voiceRoutes())

Decisions Made

Voice routes are a dedicated voice.ts module rather than added to chat-files.ts for clean separation — voice pipeline is its own subsystem
Both voice endpoints call assertBoard(req) for consistent auth with all other board-facing routes
voiceMode typed as "text" | "voice_input" | "full_voice" union in stream endpoint for compile-time correctness
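
To illustrate the typed-union decision, here is a small sketch of the voiceMode-to-messageType mapping. The two voice mappings come from this summary. The function names, the fallback to "text" for unrecognised input, and returning null for "text" (meaning the column is left at its default) are assumptions made for the example.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
type VoiceMessageType = "voice_full" | "voice_input";

const VOICE_MODES: readonly VoiceMode[] = ["text", "voice_input", "full_voice"];

// Narrows an untrusted request-body value to the union.
// Falling back to "text" for anything unrecognised is an assumption.
function parseVoiceMode(value: unknown): VoiceMode {
  return VOICE_MODES.find((mode) => mode === value) ?? "text";
}

// full_voice → voice_full and voice_input → voice_input are from the summary;
// null for "text" is an assumption.
function messageTypeFor(voiceMode: VoiceMode): VoiceMessageType | null {
  switch (voiceMode) {
    case "full_voice":
      return "voice_full";
    case "voice_input":
      return "voice_input";
    case "text":
      return null;
  }
}
```

Because the switch is exhaustive over the union, adding a fourth mode to VoiceMode makes the compiler reject messageTypeFor until the new case is handled, which is the compile-time safety the decision refers to.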

Deviations from Plan

None — plan executed exactly as written.

Note: Pre-existing TypeScript errors in voice-pipeline.ts (an ffmpeg-static type mismatch introduced in Plan 36-01) were not caused by this plan's changes. The files modified in this plan compile cleanly.

Issues Encountered

Worktree node_modules resolution: vitest from the worktree couldn't resolve express/supertest because the worktree has an empty local node_modules. Fixed by creating symlinks from the worktree's node_modules and server/node_modules to the main repo's equivalents. This is a one-time setup for the parallel worktree environment.

User Setup Required

None — no external service configuration required.

Next Phase Readiness

Voice pipeline is fully HTTP-accessible from any transport (web, Telegram, CLI)
Phase 37 (voice web UI) can use POST /api/transcribe and POST /api/synthesize directly
Phase 38 (Telegram bridge) can use the same endpoints for voice message relay
voiceMode flag flows end-to-end: client request body → dual-output prompt → messageType persistence
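
As a rough sketch of how a Phase 37 or Phase 38 client might call the synthesize endpoint, the helper below builds the request without sending it. The JSON field name text, the SynthesizeRequest shape, and the function name are assumptions. The summary only states that the endpoint takes a text body and returns audio/wav.

```typescript
interface SynthesizeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a request description for POST /api/synthesize.
// The { text } body shape is an assumption, not confirmed by the summary.
function buildSynthesizeRequest(baseUrl: string, text: string): SynthesizeRequest {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${trimmedBase}/api/synthesize`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}
```

A caller could pass the result to fetch(request.url, request) and read the response with arrayBuffer() to obtain the WAV bytes. The transcribe endpoint takes a multipart upload instead, and its field name is not stated in this summary, so it is not sketched here.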

Self-Check: PASSED

voice.ts: FOUND
36-voice-routes.test.ts: FOUND
36-03-SUMMARY.md: FOUND
commit b49b0aa5: FOUND
commit 3fc1ac10: FOUND

Phase: 36-voice-pipeline-foundation
Completed: 2026-04-04
