# Feature Research

**Domain:** Voice Pipeline (Whisper STT + Piper TTS) + Telegram Bridge (Nexus v1.6)
**Researched:** 2026-04-03
**Confidence:** MEDIUM-HIGH: STT/TTS pipeline patterns are well-documented; Telegram bot API is stable; dual-output formatting and voice mode UX patterns inferred from ChatGPT/Meta AI voice implementations and community patterns

---

## Milestone Scope

This document covers only the NEW features in v1.6. The following are already built and are dependencies, not deliverables:

- VoiceRecordButton with MediaRecorder API in ChatInput (v1.3)
- TtsButton with @mintplex-labs/piper-tts-web WASM synthesis (v1.3/v1.5)
- POST /transcribe endpoint with whisper-cpp/openai-whisper cascade (v1.3)
- VoiceStep in onboarding wizard (v1.5)
- voiceEnabled in nexus-settings (v1.5)
- Full chat system with streaming SSE (v1.3)

**New features being researched:**

- Transport-agnostic voice pipeline (server-side, not just browser WASM)
- Voice mode flag on messages (affects response formatting)
- Dual output pattern: voice-optimized prose + full markdown text
- Web chat voice UI improvements: silence detection, waveform, auto-submit
- Web chat audio playback: inline player, auto-play toggle
- Voice mode toggle setting (text only / voice input / full voice)
- Minimal Telegram bridge: single bot, text + voice relay, agent prefixing

---

## Feature Landscape

### Table Stakes (Users Expect These)

Features users assume exist when voice or Telegram is mentioned. Missing these makes the feature feel broken or incomplete.

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Silence-based auto-submit | Every voice input UI (Siri, Google, Whisper demos) stops recording on silence; holding a button feels archaic | MEDIUM | WebRTC VAD or AudioWorklet amplitude monitoring; 1.5s silence threshold typical; must show countdown so user knows what's happening |
| Waveform/amplitude visualization while recording | Users expect visual feedback that the mic is active; a static "recording..." text feels broken | LOW | Canvas or SVG with 30-50 data points; AnalyserNode from Web Audio API; real-time amplitude bars, not pre-rendered waveform |
| Voice response auto-play toggle | If the AI responded with audio, playing it automatically is expected unless the user disabled it; manual play-only feels incomplete | LOW | Boolean setting in nexus-settings (voiceAutoPlay); inline HTML5 `