# Project Research Summary

**Project:** Nexus v1.6: Voice Pipeline + Telegram Bridge
**Domain:** Server-side STT/TTS voice pipeline with transport-agnostic service abstraction and a minimal Telegram relay bridge
**Researched:** 2026-04-03
**Confidence:** MEDIUM-HIGH

---

## Executive Summary

Nexus v1.6 adds two parallel capability tracks onto an existing React/Express/Paperclip monorepo: a transport-agnostic voice pipeline (Whisper STT + Piper TTS) and a minimal Telegram bridge that reuses those pipeline primitives for phone access. The established expert pattern for this class of system is a shared service abstraction (`voicePipelineService`) that both the web HTTP layer and the Telegram bot call directly, so STT/TTS logic is never duplicated across transports. The Telegram bridge must be a thin relay only, forwarding messages to the existing `chatService` and returning the response, with no separate bot personality, no rich UI elements, and no per-user conversation branching beyond the existing single-workspace model.

The recommended approach is to build `voicePipelineService` first as the keystone service (`transcribe`, `synthesize`, `formatForVoice`), then wire the web voice UI improvements on top of it, then attach the Telegram bridge as a consumer of the same service. Audio format conversion via `ffmpeg-static` (not the archived `fluent-ffmpeg`) handles the two required transcoding paths: browser WebM/Opus to WAV 16kHz for Whisper, and Telegram OGG/Opus to WAV 16kHz for Whisper. The `@ricky0123/vad-react` library handles browser-side voice activity detection. `grammy ^1.41.1` handles the Telegram bot layer with long polling (correct for a local Mac Mini deployment without a public HTTPS endpoint).

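
The service boundary and the shared transcoding step described above could be sketched roughly as follows. The three method names come from this summary; their signatures, the `buildWhisperWavArgs` helper, and the choice of mono 16-bit PCM output are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical shape of the shared service. Method names are taken from the
// summary; parameter and return types are assumed for this sketch.
export interface VoicePipelineService {
  /** Takes WAV audio already transcoded to 16 kHz and returns the transcript. */
  transcribe(wav16k: Uint8Array): Promise<string>;
  /** Takes plain text and returns synthesized audio bytes. */
  synthesize(text: string): Promise<Uint8Array>;
  /** Rewrites an agent reply so it reads well when spoken aloud. */
  formatForVoice(markdown: string): string;
}

// Hypothetical helper: builds the ffmpeg argument list used by BOTH entry
// points (browser WebM/Opus and Telegram OGG/Opus). The input container is
// left for ffmpeg to probe, so one argument list serves both transports.
export function buildWhisperWavArgs(sampleRate: number = 16000): string[] {
  return [
    "-hide_banner",
    "-loglevel", "error",      // keep stderr quiet unless something fails
    "-i", "pipe:0",            // read the uploaded audio from stdin
    "-ac", "1",                // downmix to mono (assumed Whisper input layout)
    "-ar", String(sampleRate), // resample to the target rate
    "-c:a", "pcm_s16le",       // 16-bit little-endian PCM
    "-f", "wav",
    "pipe:1",                  // write the WAV to stdout
  ];
}
```

The resulting array can be handed to `child_process.spawn` together with the binary path that `ffmpeg-static` exports, writing the upload to stdin and collecting stdout. One caveat worth verifying during implementation: when ffmpeg writes WAV to a pipe it cannot seek back to fill in the header length fields, so a temporary file may be the safer target if the downstream decoder is strict about headers.
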
The key risks are: (1) audio format mismatches causing silent transcription failures across browsers and the Telegram path, which require ffmpeg transcoding at every entry point; (2) the voice mode flag being stripped as it traverses the message pipeline layers, causing agents to respond with full markdown that TTS then renders as "asterisk asterisk important asterisk asterisk"; (3) Piper being invoked as a new process per request, causing 200–800ms model reload latency on every TTS response and silent truncation on responses over ~400 characters; and (4) browser autoplay policy blocking audio playback unless the `AudioContext` is unlocked during the user's initial "start voice mode" gesture.

---

## Key Findings

### Recommended Stack

v1.6 is additive to the v1.5 stack. The existing `smart-whisper`, `@mintplex-labs/piper-tts-web`, `multer`, and Express foundations remain unchanged. Three new libraries are required.

**Core technologies:**
- `@ricky0123/vad-react ^0.0.36` (ui/): Browser-side Silero VAD via ONNX Runtime Web; delivers `Float32Array` at 16kHz on speech end; React 19 peer dep confirmed fixed August 2025; requires COOP/COEP headers for `SharedArrayBuffer`
- `ffmpeg-static ^5.2.0` (server/): Ships FFmpeg 6.1.1 binaries including macOS arm64; invoked via `child_process.spawn`; do NOT use the archived `fluent-ffmpeg` (archived May 2025) or stale `@ffmpeg-installer/ffmpeg` (FFmpeg 4.x)
- `grammy ^1.41.1` (server/): TypeScript-native Telegram bot framework (1.4M weekly downloads, higher than Telegraf); long polling for local deployment; clean file handling API via `ctx.getFile()`; Bot API 9.6 support confirmed

No new library is required for server-side Piper TTS (existing `child_process.spawn` pattern from v1.5) or audio playback (native `