nexus/.planning/phases/36-voice-pipeline-foundation/36-CONTEXT.md

# Phase 36: Voice Pipeline Foundation - Context

Gathered: 2026-04-04
Status: Ready for planning
Mode: Auto-generated (discuss skipped via workflow.skip_discuss)

## Phase Boundary

The transport-agnostic voice pipeline is live and callable from any consumer — web chat, Telegram, or future integrations — with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start.

Requirements: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06

## Implementation Decisions

### Claude's Discretion

All implementation choices are at Claude's discretion because the discuss phase was skipped per user setting. Use the ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

Key research findings to incorporate:

  • VoicePipelineService as server-side service: transcribe(buffer, format), synthesize(text, voiceId?), formatForVoice(text)
  • Move /transcribe from chat-files.ts to new voice.ts route to reduce rebase conflict surface
  • Use ffmpeg-static ^5.2.0 (NOT archived fluent-ffmpeg) for WebM→WAV and OGG→WAV transcoding
  • Use execFile (not exec) for CLI subprocess calls — prevents shell injection
  • Wrap CLI calls (piper, ffmpeg) in Promise.race([call, timeout(8000)]) for graceful degradation
  • Voice mode flag must survive: client → Express → message persistence → agent session codec
  • Dual output: prompt engineering requests SPOKEN: [prose] + DETAILED: [markdown] with post-processing strip as fallback
  • nexus-settings schema extension: voiceMode: "text" | "voice_input" | "full_voice", optional telegramToken
  • No DB migrations — all state in existing JSONB fields and file-backed JSON
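
The `execFile` and `Promise.race` findings above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `withTimeout` and `runCli` are hypothetical helper names, and killing the child process after a timeout is an added assumption that the findings do not state.

```typescript
import { execFile, ChildProcess } from "node:child_process";

// Hypothetical helper: rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the event loop alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: runs a CLI binary without a shell and resolves with its stdout.
// Arguments are passed as an array, so they are never interpreted by a shell.
function runCli(file: string, args: string[], ms = 8000): Promise<string> {
  let child: ChildProcess | undefined;
  const call = new Promise<string>((resolve, reject) => {
    child = execFile(file, args, (error, stdout) =>
      error ? reject(error) : resolve(stdout),
    );
  });
  return withTimeout(call, ms).catch((error: unknown) => {
    child?.kill(); // assumption: stop a stuck process; no-op if it already exited
    throw error;
  });
}

// Possible usage (file names are placeholders), assuming ffmpegPath holds the
// binary path exported by ffmpeg-static:
//   await runCli(ffmpegPath, ["-i", "in.webm", "-ar", "16000", "-ac", "1", "out.wav"]);
```

A caller can catch the timeout error and fall back to a text-only response, which is one way to read "graceful degradation" here.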
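
The dual output finding above could be handled along these lines. This is a sketch under assumptions: `splitDualOutput` and `stripMarkdown` are hypothetical names, the "post-processing strip" is read here as stripping markdown from the reply, and the regular expressions are deliberately crude.

```typescript
interface DualOutput {
  spoken: string;   // plain prose intended for TTS
  detailed: string; // markdown intended for on-screen display
}

// Crude markdown removal for speech; an assumption about what "strip" means.
function stripMarkdown(text: string): string {
  return text
    .replace(/```[\s\S]*?```/g, " ")               // drop fenced code blocks
    .replace(/`([^`]*)`/g, "$1")                   // unwrap inline code
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")     // keep link or image text only
    .replace(/^\s{0,3}(#{1,6}|[-*+]|>)\s+/gm, "")  // heading, bullet, quote markers
    .replace(/\*+|__/g, "")                        // emphasis markers
    .replace(/\s+/g, " ")                          // collapse whitespace
    .trim();
}

// Splits a model reply into spoken and detailed parts.
// Falls back to stripping markdown when the model ignored the requested format.
function splitDualOutput(reply: string): DualOutput {
  const match = reply.match(/SPOKEN:\s*([\s\S]*?)\s*DETAILED:\s*([\s\S]*)/);
  if (match) {
    return { spoken: match[1].trim(), detailed: match[2].trim() };
  }
  return { spoken: stripMarkdown(reply), detailed: reply.trim() };
}
```

With this shape, the `spoken` part can go to synthesis and the `detailed` part to the chat transcript, regardless of whether the model followed the prompt.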

<code_context>

## Existing Code Insights

### Reusable Assets

  • server/src/routes/chat-files.ts — existing /transcribe endpoint with whisper-cpp/openai-whisper cascade
  • server/src/services/nexus-settings.ts — file-backed JSON with Zod validation
  • packages/shared/src/validators/chat.ts — Zod chat message validators
  • packages/shared/src/types/chat.ts — ChatMessage type definitions
  • ui/src/components/VoiceRecordButton.tsx — MediaRecorder API (client-side)
  • ui/src/components/TtsButton.tsx — @mintplex-labs/piper-tts-web WASM
  • ui/src/hooks/usePiperTts.ts — browser TTS hook

### Established Patterns

  • Express route files in server/src/routes/
  • Service files in server/src/services/
  • Zod validators in packages/shared/src/validators/
  • Routes mounted in server/src/app.ts
  • File-backed settings via nexus-settings service

### Integration Points

  • server/src/app.ts — mount new voice routes
  • packages/shared — extend chat message types with voiceMode field
  • server/src/services/nexus-settings.ts — extend schema for voiceMode and telegramToken

</code_context>
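
The settings extension named in the integration points (a `voiceMode` union plus an optional `telegramToken`) might look like the sketch below. It is dependency-free so it runs on its own; the real service validates with Zod, and `parseVoiceSettings`, `isVoiceMode`, and the default of `"text"` for a missing `voiceMode` are assumptions made for illustration.

```typescript
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

interface VoiceSettings {
  voiceMode: VoiceMode;
  telegramToken?: string;
}

// Type guard for the three voice modes listed in the research findings.
function isVoiceMode(value: unknown): value is VoiceMode {
  return (
    typeof value === "string" &&
    (VOICE_MODES as readonly string[]).includes(value)
  );
}

// Hypothetical parser for the voice-related part of the settings JSON.
function parseVoiceSettings(raw: unknown): VoiceSettings {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("settings must be an object");
  }
  const { voiceMode, telegramToken } = raw as Record<string, unknown>;

  // Assumption: settings files written before this phase have no voiceMode,
  // so a missing value is treated as "text".
  const mode = voiceMode === undefined ? "text" : voiceMode;
  if (!isVoiceMode(mode)) {
    throw new TypeError(`invalid voiceMode: ${String(voiceMode)}`);
  }
  if (telegramToken !== undefined && typeof telegramToken !== "string") {
    throw new TypeError("telegramToken must be a string when present");
  }
  return telegramToken === undefined
    ? { voiceMode: mode }
    : { voiceMode: mode, telegramToken };
}
```

Because the fields live in the existing file-backed JSON, a default for older files avoids any migration step, in line with the "no DB migrations" finding.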

## Specific Ideas

No specific requirements — discuss phase skipped. Refer to ROADMAP phase description and success criteria.

## Deferred Ideas

None — discuss phase skipped.