nexus/.planning/milestones/v1.6-REQUIREMENTS.md
Nexus Dev 8883cb6ecb
chore: complete v1.6 Voice Pipeline + Minimal Message Bridge milestone
2026-04-04 03:52:25 +00:00

Requirements Archive: v1.6 Voice Pipeline + Minimal Message Bridge

Archived: 2026-04-04
Status: SHIPPED

For current requirements, see .planning/REQUIREMENTS.md.


Requirements: Nexus v1.6 — Voice Pipeline + Minimal Message Bridge

Defined: 2026-04-04
Core Value: A fresh onboarding asks for ONE thing (the root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.

v1.6 Requirements

Voice Pipeline

  • VPIPE-01: User's voice input is transcribed via local Whisper STT with automatic language detection
  • VPIPE-02: Agent text responses are synthesized to speech via local Piper TTS in under 3 seconds
  • VPIPE-03: Voice pipeline accepts audio from any transport (web chat, Telegram) via a shared VoicePipelineService
  • VPIPE-04: Audio from any source is transcoded to WAV 16kHz mono via ffmpeg before Whisper processing
  • VPIPE-05: Voice mode flag on messages triggers voice-optimized response formatting (no markdown, natural prose)
  • VPIPE-06: Every voice interaction produces dual output: spoken prose response + full text with code blocks
  • VPIPE-07: TTS plays first sentence while subsequent sentences are still synthesizing (sentence-buffered streaming)
  • VPIPE-08: User can synthesize a single text response into audio outputs in multiple languages (multi-language TTS)
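
As an illustration of VPIPE-04 and VPIPE-07, the following is a minimal Python sketch, not the shipped implementation. The function names `build_transcode_command` and `iter_sentences` are hypothetical, and the sentence-splitting rule (break after `.`, `!`, or `?` followed by whitespace) is an assumption; the requirements do not say how sentences are detected. The ffmpeg flags used (`-ar`, `-ac`, `-c:a pcm_s16le`) are standard options for producing 16 kHz mono 16-bit PCM WAV.

```python
import re
from typing import Iterable, Iterator, List


def build_transcode_command(src: str, dst: str) -> List[str]:
    """Build ffmpeg arguments that convert any input to WAV 16 kHz mono."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", src,            # input in whatever format the transport delivered
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


# Assumed sentence boundary: terminal punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as streamed text makes it available."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing; keep it buffered.
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()
```

A caller could pass the command list to `subprocess.run(..., check=True)` and hand each sentence from `iter_sentences` to the TTS engine as it arrives, so playback of the first sentence can begin while later ones are still being synthesized.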

Web Chat Voice

  • WCHAT-01: Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing)
  • WCHAT-02: Recording auto-stops on silence detection via VAD (voice activity detection)
  • WCHAT-03: Real-time waveform/amplitude visualization displays while recording
  • WCHAT-04: Voice response audio plays inline in chat message with audio player controls
  • WCHAT-05: User can toggle voice mode: text only / voice input only / full voice (input + output)
  • WCHAT-06: Auto-play of voice responses is configurable (on/off in settings)
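
The auto-stop behavior in WCHAT-02 can be sketched with a simple energy-based detector. This is an assumption for illustration only: the requirements do not name the VAD algorithm, and a browser client would run equivalent logic on microphone frames rather than Python lists. The names `rms` and `frame_index_of_auto_stop`, the threshold, and the silent-frame count are all hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frame_index_of_auto_stop(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,     # assumed amplitude below which a frame is "silent"
    max_silent_frames: int = 3,  # assumed length of silence that ends the recording
) -> Optional[int]:
    """Return the index of the frame at which recording would stop, or None."""
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence after speech, so leading silence never stops recording.
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None
```

The same per-frame `rms` value could also drive the amplitude visualization described in WCHAT-03.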

Telegram Bridge

  • TGRAM-01: Single Telegram bot relays text messages bidirectionally between user and agents
  • TGRAM-02: Agent replies in Telegram are prefixed with agent identity (e.g. [PM], [Engineer])
  • TGRAM-03: Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text
  • TGRAM-04: Agent responses can be sent back as Telegram voice notes (TTS → OGG)
  • TGRAM-05: Telegram bridge uses long polling (no public HTTPS required)
  • TGRAM-06: Telegram bridge is under 500 lines of code
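
A rough sketch of TGRAM-01, TGRAM-02, and TGRAM-05 follows. It assumes the bridge calls the Telegram Bot API `getUpdates` method, which takes `offset` and `timeout` parameters and returns updates carrying an `update_id`. The `get_updates` callable is a hypothetical wrapper around that HTTP request, injected so the logic stays testable without network access; `format_agent_reply`, `next_offset`, and `poll_once` are illustrative names, not identifiers from the Nexus codebase.

```python
from typing import Any, Callable, Dict, List, Tuple

Update = Dict[str, Any]


def format_agent_reply(agent: str, text: str) -> str:
    """Prefix a reply with the agent identity, e.g. "[PM] ..."."""
    return f"[{agent}] {text}"


def next_offset(updates: List[Update], current: int) -> int:
    """Telegram confirms updates once offset exceeds the highest update_id seen."""
    if not updates:
        return current
    return max(u["update_id"] for u in updates) + 1


def poll_once(
    get_updates: Callable[..., List[Update]],  # hypothetical wrapper around getUpdates
    offset: int,
    timeout: int = 30,  # long-poll duration in seconds (assumed value)
) -> Tuple[List[str], int]:
    """Fetch one batch of updates and return (text messages, new offset)."""
    updates = get_updates(offset=offset, timeout=timeout)
    texts = [
        u["message"]["text"]
        for u in updates
        if "text" in u.get("message", {})
    ]
    return texts, next_offset(updates, offset)
```

Because the bot pulls updates itself, no inbound webhook or public HTTPS endpoint is needed, which matches TGRAM-05. Voice messages (updates whose `message` contains a `voice` field instead of `text`) are skipped here and would go through the voice pipeline instead.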

Onboarding

  • ONBRD-01: Onboarding hardware probe detects Whisper STT and Piper TTS capability
  • ONBRD-02: Onboarding presents voice enable/skip step based on hardware detection results
  • ONBRD-03: Guided BotFather setup flow for Telegram bot token during onboarding
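
ONBRD-01 and ONBRD-02 could be approximated as below. The requirements do not say how the probe detects capability, so this sketch makes several assumptions: it only checks whether executables are on `PATH`, the binary names `whisper` and `piper` are guesses, and the rule for offering the voice step (both STT and TTS available) is hypothetical.

```python
import shutil
from typing import Callable, Dict, Optional


def probe_voice_capability(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Report which voice features appear usable on this machine."""
    has_ffmpeg = which("ffmpeg") is not None
    has_whisper = which("whisper") is not None  # assumed binary name
    has_piper = which("piper") is not None      # assumed binary name

    stt = has_ffmpeg and has_whisper  # STT needs ffmpeg for transcoding first
    tts = has_piper
    return {
        "stt": stt,
        "tts": tts,
        # Assumed rule: show "enable voice" only when both directions work;
        # otherwise onboarding would present the skip path.
        "offer_voice_step": stt and tts,
    }
```

Passing a custom `which` function makes the probe easy to exercise in tests without installing any of the tools.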

Future Requirements

Voice Enhancements

  • VFUT-01: Wake word detection ("Hey Nexus") for hands-free activation
  • VFUT-02: Real-time speech-to-speech streaming (full-duplex WebSocket)
  • VFUT-03: Streaming TTS word-by-word playback

Telegram Enhancements

  • TFUT-01: Deep Telegram ↔ web chat session sync via Postgres event bus
  • TFUT-02: Rich Telegram elements (inline keyboards, threaded replies)
  • TFUT-03: Per-agent Telegram bots

Out of Scope

| Feature | Reason |
|---|---|
| Real-time speech-to-speech | Entirely different architecture (LiveKit/Pipecat); future milestone |
| Per-agent Telegram bots | Maintenance nightmare; single bot + agent prefix is correct |
| Deep Telegram ↔ web chat sync | Requires Postgres event bus; deferred to v2.2 Command Center |
| Telegram inline keyboards/threads | Thin bridge only; rich elements deferred to Command Center |
| Wake word detection | Always-on mic; hardware device concern; future |
| Streaming TTS word-by-word | Audio clicks/gaps; sentence-buffered gives 95% of the benefit |
| Inline code execution over Telegram | Security risk; bridge is relay only |
| GSD formatting in Telegram | Stateful session tracking; plain text + Markdown v1 only |
| Transcription editing before sending | Breaks hands-free flow; show transcript in chat bubble after |

Traceability

| Requirement | Phase | Status |
|---|---|---|
| VPIPE-01 | Phase 36 | Complete |
| VPIPE-02 | Phase 36 | Complete |
| VPIPE-03 | Phase 36 | Complete |
| VPIPE-04 | Phase 36 | Complete |
| VPIPE-05 | Phase 36 | Complete |
| VPIPE-06 | Phase 36 | Complete |
| VPIPE-07 | Phase 39 | Complete |
| VPIPE-08 | Phase 39 | Complete |
| WCHAT-01 | Phase 37 | Complete |
| WCHAT-02 | Phase 37 | Complete |
| WCHAT-03 | Phase 37 | Complete |
| WCHAT-04 | Phase 37 | Complete |
| WCHAT-05 | Phase 37 | Complete |
| WCHAT-06 | Phase 37 | Complete |
| TGRAM-01 | Phase 38 | Complete |
| TGRAM-02 | Phase 38 | Complete |
| TGRAM-03 | Phase 38 | Complete |
| TGRAM-04 | Phase 38 | Complete |
| TGRAM-05 | Phase 38 | Complete |
| TGRAM-06 | Phase 38 | Complete |
| ONBRD-01 | Phase 39 | Complete |
| ONBRD-02 | Phase 39 | Complete |
| ONBRD-03 | Phase 38 | Complete |

Coverage:

  • v1.6 requirements: 23 total
  • Mapped to phases: 23
  • Unmapped: 0 ✓
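
The coverage figures can be re-derived from the traceability data with a few lines of Python. This is only a cross-check sketch; the helper names `ids` and `coverage` are hypothetical, while the requirement IDs and phase numbers are copied from the traceability table.

```python
from typing import Dict, List


def ids(prefix: str, count: int) -> List[str]:
    """Generate IDs such as VPIPE-01 .. VPIPE-08."""
    return [f"{prefix}-{n:02d}" for n in range(1, count + 1)]


# All v1.6 requirement IDs, grouped as in the sections above.
requirements = ids("VPIPE", 8) + ids("WCHAT", 6) + ids("TGRAM", 6) + ids("ONBRD", 3)

# Requirement -> phase, as listed in the traceability table.
phase_of: Dict[str, int] = {}
phase_of.update({r: 36 for r in ids("VPIPE", 6)})
phase_of.update({"VPIPE-07": 39, "VPIPE-08": 39})
phase_of.update({r: 37 for r in ids("WCHAT", 6)})
phase_of.update({r: 38 for r in ids("TGRAM", 6)})
phase_of.update({"ONBRD-01": 39, "ONBRD-02": 39, "ONBRD-03": 38})


def coverage(reqs: List[str], mapping: Dict[str, int]) -> Dict[str, object]:
    """Count total, mapped, and unmapped requirements."""
    unmapped = [r for r in reqs if r not in mapping]
    return {
        "total": len(reqs),
        "mapped": len(reqs) - len(unmapped),
        "unmapped": unmapped,
    }


print(coverage(requirements, phase_of))
```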

Requirements defined: 2026-04-04
Last updated: 2026-04-03 (traceability populated after roadmap creation)