Requirements Archive: v1.6 Voice Pipeline + Minimal Message Bridge
Archived: 2026-04-04
Status: SHIPPED
For current requirements, see .planning/REQUIREMENTS.md.
Requirements: Nexus v1.6 — Voice Pipeline + Minimal Message Bridge
Defined: 2026-04-04
Core Value: A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.
v1.6 Requirements
Voice Pipeline
- VPIPE-01: User's voice input is transcribed via local Whisper STT with automatic language detection
- VPIPE-02: Agent text responses are synthesized to speech via local Piper TTS in under 3 seconds
- VPIPE-03: Voice pipeline accepts audio from any transport (web chat, Telegram) via a shared VoicePipelineService
- VPIPE-04: Audio from any source is transcoded to WAV 16kHz mono via ffmpeg before Whisper processing
- VPIPE-05: Voice mode flag on messages triggers voice-optimized response formatting (no markdown, natural prose)
- VPIPE-06: Every voice interaction produces dual output: spoken prose response + full text with code blocks
- VPIPE-07: TTS plays first sentence while subsequent sentences are still synthesizing (sentence-buffered streaming)
- VPIPE-08: User can synthesize a single text response into audio in multiple languages (multi-language TTS)
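VPIPE-04 and VPIPE-07 can be illustrated with a minimal Python sketch. It is not the shipped `VoicePipelineService` code: the function names `ffmpeg_transcode_args`, `pop_sentences`, and `stream_sentences` are hypothetical, and the sentence splitter is a deliberately naive regex that will also split after abbreviations such as "e.g.". The ffmpeg flags are standard options (`-ar` sample rate, `-ac` channel count, `-c:a` audio codec).

```python
import re


def ffmpeg_transcode_args(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts any input to 16 kHz mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # any container/codec ffmpeg can decode
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


# A sentence ends at '.', '!' or '?' followed by whitespace (naive assumption).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def pop_sentences(buffer: str) -> tuple[list[str], str]:
    """Split off complete sentences; return them plus the unfinished remainder."""
    parts = _SENTENCE_END.split(buffer)
    # The last part is not yet known to be complete, so it stays buffered.
    return [p for p in parts[:-1] if p], parts[-1]


def stream_sentences(chunks):
    """Yield each sentence as soon as it completes while text keeps arriving."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        done, buffer = pop_sentences(buffer)
        yield from done
    # Flush whatever is left once the text stream ends.
    if buffer.strip():
        yield buffer.strip()
```

In a pipeline shaped like this, each yielded sentence would be handed to the TTS engine immediately, so the first sentence can play while later ones are still being synthesized. The argument list from `ffmpeg_transcode_args` could be passed to `subprocess.run(..., check=True)`.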
Web Chat Voice
- WCHAT-01: Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing)
- WCHAT-02: Recording auto-stops on silence detection via VAD (voice activity detection)
- WCHAT-03: Real-time waveform/amplitude visualization displays while recording
- WCHAT-04: Voice response audio plays inline in chat message with audio player controls
- WCHAT-05: User can toggle voice mode: text only / voice input only / full voice (input + output)
- WCHAT-06: Auto-play of voice responses is configurable (on/off in settings)
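The silence-based auto-stop in WCHAT-02 can be sketched with a simple energy threshold. This is an assumption for illustration only: the archive does not say which VAD the web chat uses, the real detector would run in the browser, and the `threshold` and `max_silent_frames` values below are hypothetical defaults. Python is used here only to show the logic.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def auto_stop_index(frames, threshold=0.02, max_silent_frames=3):
    """Return the index of the frame at which recording should stop, or None.

    Recording stops once speech has been heard and is then followed by
    max_silent_frames consecutive frames quieter than threshold.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return i
    return None  # never heard speech followed by enough silence
```

Requiring speech before counting silence keeps the recorder from stopping during the pause before the user starts talking. The same per-frame RMS value could also drive the amplitude visualization in WCHAT-03.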
Telegram Bridge
- TGRAM-01: Single Telegram bot relays text messages bidirectionally between user and agents
- TGRAM-02: Agent replies in Telegram are prefixed with agent identity (e.g. [PM], [Engineer])
- TGRAM-03: Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text
- TGRAM-04: Agent responses can be sent back as Telegram voice notes (TTS → OGG)
- TGRAM-05: Telegram bridge uses long polling (no public HTTPS required)
- TGRAM-06: Telegram bridge is under 500 lines of code
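A rough sketch of the long-polling bookkeeping behind TGRAM-05 and the agent prefix in TGRAM-02, under stated assumptions: the helper names are hypothetical and this is not the bridge's actual code. The `getUpdates` method with its `offset` and `timeout` parameters, and the `update_id` field on each update, come from the Telegram Bot API.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.telegram.org"


def get_updates_url(token: str, offset: int, timeout: int = 30) -> str:
    """Build a long-polling getUpdates URL; timeout is how long Telegram holds the request."""
    query = urlencode({"offset": offset, "timeout": timeout})
    return f"{API_ROOT}/bot{token}/getUpdates?{query}"


def next_offset(updates: list[dict], current: int) -> int:
    """Confirm processed updates by asking for highest update_id + 1 next time."""
    if not updates:
        return current
    return max(u["update_id"] for u in updates) + 1


def format_agent_reply(agent: str, text: str) -> str:
    """Prefix a reply with the agent identity, e.g. '[PM] ...'."""
    return f"[{agent}] {text}"
```

Because the bot initiates every request, this approach needs no webhook and therefore no public HTTPS endpoint. A polling loop would fetch the URL, process each update, then call `next_offset` before the next request.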
Onboarding
- ONBRD-01: Onboarding hardware probe detects Whisper STT and Piper TTS capability
- ONBRD-02: Onboarding presents voice enable/skip step based on hardware detection results
- ONBRD-03: Guided BotFather setup flow for Telegram bot token during onboarding
Future Requirements
Voice Enhancements
- VFUT-01: Wake word detection ("Hey Nexus") for hands-free activation
- VFUT-02: Real-time speech-to-speech streaming (full-duplex WebSocket)
- VFUT-03: Streaming TTS word-by-word playback
Telegram Enhancements
- TFUT-01: Deep Telegram ↔ web chat session sync via Postgres event bus
- TFUT-02: Rich Telegram elements (inline keyboards, threaded replies)
- TFUT-03: Per-agent Telegram bots
Out of Scope
| Feature | Reason |
|---|---|
| Real-time speech-to-speech | Entirely different architecture (LiveKit/Pipecat); future milestone |
| Per-agent Telegram bots | Maintenance nightmare; single bot + agent prefix is correct |
| Deep Telegram ↔ web chat sync | Requires Postgres event bus; deferred to v2.2 Command Center |
| Telegram inline keyboards/threads | Thin bridge only; rich elements deferred to Command Center |
| Wake word detection | Always-on mic; hardware device concern; future |
| Streaming TTS word-by-word | Audio clicks/gaps; sentence-buffered gives 95% of the benefit |
| Inline code execution over Telegram | Security risk; bridge is relay only |
| GSD formatting in Telegram | Stateful session tracking; plain text + Markdown v1 only |
| Transcription editing before sending | Breaks hands-free flow; show transcript in chat bubble after |
Traceability
| Requirement | Phase | Status |
|---|---|---|
| VPIPE-01 | Phase 36 | Complete |
| VPIPE-02 | Phase 36 | Complete |
| VPIPE-03 | Phase 36 | Complete |
| VPIPE-04 | Phase 36 | Complete |
| VPIPE-05 | Phase 36 | Complete |
| VPIPE-06 | Phase 36 | Complete |
| VPIPE-07 | Phase 39 | Complete |
| VPIPE-08 | Phase 39 | Complete |
| WCHAT-01 | Phase 37 | Complete |
| WCHAT-02 | Phase 37 | Complete |
| WCHAT-03 | Phase 37 | Complete |
| WCHAT-04 | Phase 37 | Complete |
| WCHAT-05 | Phase 37 | Complete |
| WCHAT-06 | Phase 37 | Complete |
| TGRAM-01 | Phase 38 | Complete |
| TGRAM-02 | Phase 38 | Complete |
| TGRAM-03 | Phase 38 | Complete |
| TGRAM-04 | Phase 38 | Complete |
| TGRAM-05 | Phase 38 | Complete |
| TGRAM-06 | Phase 38 | Complete |
| ONBRD-01 | Phase 39 | Complete |
| ONBRD-02 | Phase 39 | Complete |
| ONBRD-03 | Phase 38 | Complete |
Coverage:
- v1.6 requirements: 23 total
- Mapped to phases: 23
- Unmapped: 0 ✓
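The coverage figures can be re-derived from the traceability table mechanically. The following is a minimal sketch, assuming the table keeps its three-column pipe layout; the function name `coverage` is hypothetical, and the usage below runs on a three-row excerpt rather than the full table.

```python
def coverage(table: str, requirement_ids: list[str]) -> dict[str, int]:
    """Count how many requirement IDs have a row with a phase in a pipe table."""
    wanted = set(requirement_ids)
    mapped = set()
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Header and separator rows are skipped because their first cell is not an ID.
        if len(cells) == 3 and cells[0] in wanted and cells[1].startswith("Phase"):
            mapped.add(cells[0])
    return {
        "total": len(wanted),
        "mapped": len(mapped),
        "unmapped": len(wanted) - len(mapped),
    }


excerpt = """\
| Requirement | Phase | Status |
|---|---|---|
| VPIPE-07 | Phase 39 | Complete |
| WCHAT-01 | Phase 37 | Complete |
| TGRAM-05 | Phase 38 | Complete |
"""

print(coverage(excerpt, ["VPIPE-07", "WCHAT-01", "TGRAM-05"]))
```

Run against the full table with all 23 IDs, a check like this would flag any requirement that lost its phase mapping.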
Requirements defined: 2026-04-04
Last updated: 2026-04-03 (traceability populated after roadmap creation)