From 0dfd0cbac5d77006312463ae90276e647fdbdcda Mon Sep 17 00:00:00 2001
From: Nexus Dev
Date: Sat, 4 Apr 2026 00:55:08 +0000
Subject: [PATCH] docs(36): auto-generated context (discuss skipped)

---
 .../36-CONTEXT.md | 73 +++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 .planning/phases/36-voice-pipeline-foundation/36-CONTEXT.md

diff --git a/.planning/phases/36-voice-pipeline-foundation/36-CONTEXT.md b/.planning/phases/36-voice-pipeline-foundation/36-CONTEXT.md
new file mode 100644
index 00000000..4e92b6ac
--- /dev/null
+++ b/.planning/phases/36-voice-pipeline-foundation/36-CONTEXT.md
@@ -0,0 +1,73 @@
+# Phase 36: Voice Pipeline Foundation - Context
+
+**Gathered:** 2026-04-04
+**Status:** Ready for planning
+**Mode:** Auto-generated (discuss skipped via workflow.skip_discuss)
+
+
+## Phase Boundary
+
+The transport-agnostic voice pipeline is live and callable from any consumer — web chat, Telegram, or future integrations — with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start.
+
+Requirements: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06
+
+
+
+
+## Implementation Decisions
+
+### Claude's Discretion
+All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
+
+Key research findings to incorporate:
+- VoicePipelineService as server-side service: `transcribe(buffer, format)`, `synthesize(text, voiceId?)`, `formatForVoice(text)`
+- Move `/transcribe` from `chat-files.ts` to new `voice.ts` route to reduce rebase conflict surface
+- Use `ffmpeg-static ^5.2.0` (NOT archived fluent-ffmpeg) for WebM→WAV and OGG→WAV transcoding
+- Use `execFile` (not `exec`) for CLI subprocess calls — prevents shell injection
+- Wrap CLI calls (`piper`, `ffmpeg`) in `Promise.race([call, timeout(8000)])` for graceful degradation
+- Voice mode flag must survive: client → Express → message persistence → agent session codec
+- Dual output: prompt engineering requests `SPOKEN: [prose]` + `DETAILED: [markdown]` with post-processing strip as fallback
+- nexus-settings schema extension: `voiceMode: "text" | "voice_input" | "full_voice"`, optional `telegramToken`
+- No DB migrations — all state in existing JSONB fields and file-backed JSON
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- `server/src/routes/chat-files.ts` — existing `/transcribe` endpoint with whisper-cpp/openai-whisper cascade
+- `server/src/services/nexus-settings.ts` — file-backed JSON with Zod validation
+- `packages/shared/src/validators/chat.ts` — Zod chat message validators
+- `packages/shared/src/types/chat.ts` — ChatMessage type definitions
+- `ui/src/components/VoiceRecordButton.tsx` — MediaRecorder API (client-side)
+- `ui/src/components/TtsButton.tsx` — @mintplex-labs/piper-tts-web WASM
+- `ui/src/hooks/usePiperTts.ts` — browser TTS hook
+
+### Established Patterns
+- Express route files in `server/src/routes/`
+- Service files in `server/src/services/`
+- Zod validators in `packages/shared/src/validators/`
+- Routes mounted in `server/src/app.ts`
+- File-backed settings via nexus-settings service
+
+### Integration Points
+- `server/src/app.ts` — mount new voice routes
+- `packages/shared` — extend chat message types with voiceMode field
+- `server/src/services/nexus-settings.ts` — extend schema for voiceMode and telegramToken
+
+
+
+
+## Specific Ideas
+
+No specific requirements — discuss phase skipped. Refer to ROADMAP phase description and success criteria.
+
+
+
+
+## Deferred Ideas
+
+None — discuss phase skipped.
+
+
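
The dual-output finding in the context file above (`SPOKEN: [prose]` plus `DETAILED: [markdown]`, with a post-processing strip as fallback) could be handled roughly as follows. This is a minimal sketch, not code from the repository: the names `DualOutput`, `splitDualOutput`, and `stripMarkdown` are hypothetical, the regular expressions are a deliberately crude Markdown strip, and the exact marker layout the prompt produces is an assumption.

```typescript
// Hypothetical shape for the two renderings of one agent reply.
export interface DualOutput {
  spoken: string;   // plain prose, suitable for TTS
  detailed: string; // original markdown, suitable for the chat UI
}

// Fallback path: crude markdown removal for replies that ignore the markers.
// Assumption: good enough for speech, not a full markdown parser.
export function stripMarkdown(text: string): string {
  return text
    .replace(/```[\s\S]*?```/g, " ")                            // drop fenced code blocks
    .replace(/`([^`]*)`/g, "$1")                                // unwrap inline code
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")                  // links/images -> label text
    .replace(/^[ \t]{0,3}(#{1,6}|>|[-*+]|\d+\.)[ \t]+/gm, "")   // heading, quote, list markers
    .replace(/(\*\*|__|\*|~~)/g, "")                            // emphasis markers
    .replace(/\s+/g, " ")                                       // collapse whitespace
    .trim();
}

// Primary path: honor the SPOKEN:/DETAILED: markers when both are present,
// otherwise fall back to stripping markdown from the whole reply.
export function splitDualOutput(raw: string): DualOutput {
  const match = /SPOKEN:\s*([\s\S]*?)\s*DETAILED:\s*([\s\S]*)/.exec(raw);
  if (match) {
    return { spoken: match[1].trim(), detailed: match[2].trim() };
  }
  return { spoken: stripMarkdown(raw), detailed: raw.trim() };
}
```

Under these assumptions, a `formatForVoice(text)` method on the planned service could delegate to `splitDualOutput(text).spoken`, so a reply that omits the markers still yields speakable text instead of failing.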