# Phase 36: Voice Pipeline Foundation - Context
Gathered: 2026-04-04
Status: Ready for planning
Mode: Auto-generated (discuss skipped via `workflow.skip_discuss`)
## Phase Boundary

The transport-agnostic voice pipeline is live and callable from any consumer (web chat, Telegram, or future integrations), with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start.
Requirements: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06
## Implementation Decisions

### Claude's Discretion
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
Key research findings to incorporate:
- VoicePipelineService as server-side service: `transcribe(buffer, format)`, `synthesize(text, voiceId?)`, `formatForVoice(text)`
- Move `/transcribe` from `chat-files.ts` to new `voice.ts` route to reduce rebase conflict surface
- Use `ffmpeg-static ^5.2.0` (NOT archived fluent-ffmpeg) for WebM→WAV and OGG→WAV transcoding
- Use `execFile` (not `exec`) for CLI subprocess calls; this prevents shell injection
- Wrap CLI calls (`piper`, `ffmpeg`) in `Promise.race([call, timeout(8000)])` for graceful degradation
- Voice mode flag must survive: client → Express → message persistence → agent session codec
- Dual output: prompt engineering requests `SPOKEN: [prose]` + `DETAILED: [markdown]` with post-processing strip as fallback
- nexus-settings schema extension: `voiceMode: "text" | "voice_input" | "full_voice"`, optional `telegramToken`
- No DB migrations: all state in existing JSONB fields and file-backed JSON
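
A minimal sketch of three of the findings above: shell-free CLI calls via `execFile`, the `Promise.race` timeout wrapper, and splitting a `SPOKEN:` / `DETAILED:` reply. The helper names (`withTimeout`, `runCli`, `splitDualOutput`) are hypothetical and not taken from the codebase. The sketch also assumes that "post-processing strip" means removing markdown syntax to produce the spoken text when the model omits the markers, which may not be what is intended.

```typescript
import { execFile } from "node:child_process";

/** Reject if `work` has not settled within `ms` milliseconds (hypothetical helper). */
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Promise.race settles with whichever promise settles first;
  // the timer is cleared afterwards so it does not keep the process alive.
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

/** Run a binary without a shell: arguments are passed as an array, never interpolated. */
function runCli(binary: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(binary, args, (error, stdout) => {
      if (error) reject(error);
      else resolve(stdout);
    });
  });
}

interface DualOutput {
  spoken: string;   // plain prose intended for TTS
  detailed: string; // markdown intended for the chat UI
}

/** Split a model reply into spoken and detailed parts (hypothetical helper). */
function splitDualOutput(reply: string): DualOutput {
  const match = /SPOKEN:\s*([\s\S]*?)\s*DETAILED:\s*([\s\S]*)/.exec(reply);
  if (match) {
    const [, spoken = "", detailed = ""] = match;
    return { spoken: spoken.trim(), detailed: detailed.trim() };
  }
  // Assumed fallback: drop fenced code and common markdown symbols for speech.
  const stripped = reply
    .replace(/```[\s\S]*?```/g, " ")
    .replace(/[*_#`>]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return { spoken: stripped, detailed: reply };
}
```

A call such as `withTimeout(runCli(binaryPath, args), 8000)` would then reject after eight seconds, and the caller can catch that and fall back. Note that the race alone does not stop the child process; `execFile` also accepts a `timeout` option that kills it, which may be worth combining with the wrapper.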
<code_context>
## Existing Code Insights

### Reusable Assets

- `server/src/routes/chat-files.ts`: existing `/transcribe` endpoint with whisper-cpp/openai-whisper cascade
- `server/src/services/nexus-settings.ts`: file-backed JSON with Zod validation
- `packages/shared/src/validators/chat.ts`: Zod chat message validators
- `packages/shared/src/types/chat.ts`: ChatMessage type definitions
- `ui/src/components/VoiceRecordButton.tsx`: MediaRecorder API (client-side)
- `ui/src/components/TtsButton.tsx`: @mintplex-labs/piper-tts-web WASM
- `ui/src/hooks/usePiperTts.ts`: browser TTS hook
### Established Patterns

- Express route files in `server/src/routes/`
- Service files in `server/src/services/`
- Zod validators in `packages/shared/src/validators/`
- Routes mounted in `server/src/app.ts`
- File-backed settings via nexus-settings service
### Integration Points

- `server/src/app.ts`: mount new voice routes
- `packages/shared`: extend chat message types with voiceMode field
- `server/src/services/nexus-settings.ts`: extend schema for voiceMode and telegramToken
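
The `voiceMode` and `telegramToken` extension could look roughly like the sketch below. The codebase validates with Zod; this version uses a plain type guard only so that it runs without dependencies. The names `VoiceSettings`, `isVoiceMode`, and `parseVoiceSettings` are hypothetical, and the default of `"text"` for a missing `voiceMode` is an assumption rather than a stated requirement.

```typescript
// The three modes listed in the research findings.
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

// Hypothetical shape: only the two field names come from the source document.
interface VoiceSettings {
  voiceMode: VoiceMode;
  telegramToken?: string;
}

/** Narrow an unknown value to one of the three known voice modes. */
function isVoiceMode(value: unknown): value is VoiceMode {
  return typeof value === "string" && (VOICE_MODES as readonly string[]).includes(value);
}

/** Validate raw settings JSON; throws on an unknown mode or a non-string token. */
function parseVoiceSettings(raw: unknown): VoiceSettings {
  const input: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};

  // Assumption: settings files written before this phase have no voiceMode,
  // so a missing value is treated as "text".
  const mode = input["voiceMode"] === undefined ? "text" : input["voiceMode"];
  if (!isVoiceMode(mode)) {
    throw new Error(`invalid voiceMode: ${String(mode)}`);
  }

  const token = input["telegramToken"];
  if (token !== undefined && typeof token !== "string") {
    throw new Error("telegramToken must be a string when present");
  }

  return token === undefined ? { voiceMode: mode } : { voiceMode: mode, telegramToken: token };
}
```

Keeping the mode list in one `as const` array means the runtime check and the `VoiceMode` type cannot drift apart, which is the same property a Zod enum would give in the shared validators.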
</code_context>
## Specific Ideas

No specific requirements (discuss phase skipped). Refer to ROADMAP phase description and success criteria.
## Deferred Ideas

None (discuss phase skipped).