# Requirements: Nexus v1.6 - Voice Pipeline + Minimal Message Bridge

**Defined:** 2026-04-04
**Core Value:** A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.

## v1.6 Requirements

### Voice Pipeline

- [x] **VPIPE-01**: User's voice input is transcribed via local Whisper STT with automatic language detection
- [x] **VPIPE-02**: Agent text responses are synthesized to speech via local Piper TTS in under 3 seconds
- [x] **VPIPE-03**: Voice pipeline accepts audio from any transport (web chat, Telegram) via a shared VoicePipelineService
- [x] **VPIPE-04**: Audio from any source is transcoded to WAV 16kHz mono via ffmpeg before Whisper processing
- [x] **VPIPE-05**: Voice mode flag on messages triggers voice-optimized response formatting (no markdown, natural prose)
- [x] **VPIPE-06**: Every voice interaction produces dual output: spoken prose response + full text with code blocks
- [x] **VPIPE-07**: TTS plays first sentence while subsequent sentences are still synthesizing (sentence-buffered streaming)
- [x] **VPIPE-08**: User can synthesize a single text response into multiple language audio outputs (multi-language TTS)

### Web Chat Voice

- [x] **WCHAT-01**: Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing)
- [x] **WCHAT-02**: Recording auto-stops on silence detection via VAD (voice activity detection)
- [x] **WCHAT-03**: Real-time waveform/amplitude visualization displays while recording
- [x] **WCHAT-04**: Voice response audio plays inline in chat message with audio player controls
- [x] **WCHAT-05**: User can toggle voice mode: text only / voice input only / full voice (input + output)
- [x] **WCHAT-06**: Auto-play of voice responses is configurable (on/off in settings)

### Telegram Bridge

- [x] **TGRAM-01**: Single Telegram bot relays text messages bidirectionally between user and agents
- [x] **TGRAM-02**: Agent replies in Telegram are prefixed with agent identity (e.g. `[PM]`, `[Engineer]`)
- [x] **TGRAM-03**: Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text
- [x] **TGRAM-04**: Agent responses can be sent back as Telegram voice notes (TTS → OGG)
- [x] **TGRAM-05**: Telegram bridge uses long polling (no public HTTPS required)
- [x] **TGRAM-06**: Telegram bridge is under 500 lines of code

### Onboarding

- [ ] **ONBRD-01**: Onboarding hardware probe detects Whisper STT and Piper TTS capability
- [ ] **ONBRD-02**: Onboarding presents voice enable/skip step based on hardware detection results
- [x] **ONBRD-03**: Guided BotFather setup flow for Telegram bot token during onboarding

## Future Requirements

### Voice Enhancements

- **VFUT-01**: Wake word detection ("Hey Nexus") for hands-free activation
- **VFUT-02**: Real-time speech-to-speech streaming (full-duplex WebSocket)
- **VFUT-03**: Streaming TTS word-by-word playback

### Telegram Enhancements

- **TFUT-01**: Deep Telegram ↔ web chat session sync via Postgres event bus
- **TFUT-02**: Rich Telegram elements (inline keyboards, threaded replies)
- **TFUT-03**: Per-agent Telegram bots

## Out of Scope

| Feature | Reason |
|---------|--------|
| Real-time speech-to-speech | Entirely different architecture (LiveKit/Pipecat); future milestone |
| Per-agent Telegram bots | Maintenance nightmare; single bot + agent prefix is correct |
| Deep Telegram ↔ web chat sync | Requires Postgres event bus; deferred to v2.2 Command Center |
| Telegram inline keyboards/threads | Thin bridge only; rich elements deferred to Command Center |
| Wake word detection | Always-on mic; hardware device concern; future |
| Streaming TTS word-by-word | Audio clicks/gaps; sentence-buffered gives 95% of the benefit |
| Inline code execution over Telegram | Security risk; bridge is relay only |
| GSD formatting in Telegram | Stateful session tracking; plain text + Markdown v1 only |
| Transcription editing before sending | Breaks hands-free flow; show transcript in chat bubble after |

## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| VPIPE-01 | Phase 36 | Complete |
| VPIPE-02 | Phase 36 | Complete |
| VPIPE-03 | Phase 36 | Complete |
| VPIPE-04 | Phase 36 | Complete |
| VPIPE-05 | Phase 36 | Complete |
| VPIPE-06 | Phase 36 | Complete |
| VPIPE-07 | Phase 39 | Complete |
| VPIPE-08 | Phase 39 | Complete |
| WCHAT-01 | Phase 37 | Complete |
| WCHAT-02 | Phase 37 | Complete |
| WCHAT-03 | Phase 37 | Complete |
| WCHAT-04 | Phase 37 | Complete |
| WCHAT-05 | Phase 37 | Complete |
| WCHAT-06 | Phase 37 | Complete |
| TGRAM-01 | Phase 38 | Complete |
| TGRAM-02 | Phase 38 | Complete |
| TGRAM-03 | Phase 38 | Complete |
| TGRAM-04 | Phase 38 | Complete |
| TGRAM-05 | Phase 38 | Complete |
| TGRAM-06 | Phase 38 | Complete |
| ONBRD-01 | Phase 39 | Pending |
| ONBRD-02 | Phase 39 | Pending |
| ONBRD-03 | Phase 38 | Complete |

**Coverage:**

- v1.6 requirements: 23 total
- Mapped to phases: 23
- Unmapped: 0 ✓

---

*Requirements defined: 2026-04-04*
*Last updated: 2026-04-03 - traceability populated after roadmap creation*
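
As an illustration of VPIPE-04 and VPIPE-07, the sketch below shows one way the transcode step and the sentence buffering could look. It is not taken from the Nexus codebase: the function names `ffmpeg_wav_args` and `iter_sentences` are hypothetical, the 16-bit PCM sample format is an assumption (the requirement only specifies WAV, 16kHz, mono), and the sentence splitter is a simple punctuation heuristic rather than whatever the real pipeline uses.

```python
import re
from typing import Iterable, Iterator, List


def ffmpeg_wav_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command line that transcodes src to 16 kHz mono WAV.

    Hypothetical helper for VPIPE-04; 16-bit PCM is an assumed sample format.
    """
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # any input ffmpeg can decode, e.g. Telegram OGG
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (assumption)
        dst,
    ]


# A sentence ends at '.', '!' or '?' followed by whitespace (simple heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in streamed text chunks.

    Hypothetical helper for VPIPE-07: each yielded sentence could be handed
    to TTS while later chunks are still arriving.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is known to be a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

The argument list from `ffmpeg_wav_args` can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed. `iter_sentences` is a generator, so a caller can start synthesizing the first sentence before the agent has finished producing the rest of its response.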