From 1cdcfac10b157e73d4faf064da54e158324a6392 Mon Sep 17 00:00:00 2001
From: Nexus Dev <nexus@local>
Date: Sat, 4 Apr 2026 00:25:20 +0000
Subject: [PATCH] docs: define milestone v1.6 requirements

---
 .planning/REQUIREMENTS.md | 155 ++++++++++++++++++++------------------
 1 file changed, 80 insertions(+), 75 deletions(-)

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 4c71671f..e6d2692f 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -1,101 +1,106 @@
-# Requirements: Nexus v1.5 — Smart Onboarding + Personal AI Assistant
+# Requirements: Nexus v1.6 — Voice Pipeline + Minimal Message Bridge
 
-**Defined:** 2026-04-02
+**Defined:** 2026-04-04
 **Core Value:** A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.
 
-## v1.5 Requirements
+## v1.6 Requirements
+
+### Voice Pipeline
+
+- [ ] **VPIPE-01**: User's voice input is transcribed via local Whisper STT with automatic language detection
+- [ ] **VPIPE-02**: Agent text responses are synthesized to speech via local Piper TTS in under 3 seconds
+- [ ] **VPIPE-03**: Voice pipeline accepts audio from any transport (web chat, Telegram) via a shared VoicePipelineService
+- [ ] **VPIPE-04**: Audio from any source is transcoded to WAV 16kHz mono via ffmpeg before Whisper processing
+- [ ] **VPIPE-05**: Voice mode flag on messages triggers voice-optimized response formatting (no markdown, natural prose)
+- [ ] **VPIPE-06**: Every voice interaction produces dual output: spoken prose response + full text with code blocks
+- [ ] **VPIPE-07**: TTS plays first sentence while subsequent sentences are still synthesizing (sentence-buffered streaming)
+- [ ] **VPIPE-08**: User can synthesize a single text response into audio outputs in multiple languages (multi-language TTS)
+
+### Web Chat Voice
+
+- [ ] **WCHAT-01**: Mic button in chat input starts/stops voice recording with visual state (idle/recording/processing)
+- [ ] **WCHAT-02**: Recording auto-stops on silence detection via VAD (voice activity detection)
+- [ ] **WCHAT-03**: Real-time waveform/amplitude visualization displays while recording
+- [ ] **WCHAT-04**: Voice response audio plays inline in chat message with audio player controls
+- [ ] **WCHAT-05**: User can switch voice mode between text only / voice input only / full voice (input + output)
+- [ ] **WCHAT-06**: Auto-play of voice responses is configurable (on/off in settings)
+
+### Telegram Bridge
+
+- [ ] **TGRAM-01**: Single Telegram bot relays text messages bidirectionally between user and agents
+- [ ] **TGRAM-02**: Agent replies in Telegram are prefixed with agent identity (e.g. `[PM]`, `[Engineer]`)
+- [ ] **TGRAM-03**: Telegram voice messages are transcribed (OGG → ffmpeg WAV → Whisper) and forwarded to the agent as text
+- [ ] **TGRAM-04**: Agent responses can be sent back as Telegram voice notes (TTS → OGG/Opus)
+- [ ] **TGRAM-05**: Telegram bridge uses long polling (no public HTTPS required)
+- [ ] **TGRAM-06**: Telegram bridge is under 500 lines of code
 
 ### Onboarding
 
-- [x] **ONBD-01**: User can select mode (Personal AI Assistant / Project Builder / Both) during onboarding
-- [x] **ONBD-02**: System auto-detects GPU, RAM, and Apple Silicon unified memory within 5 seconds
-- [x] **ONBD-03**: System recommends best local model from pre-built JSON database based on detected hardware
-- [x] **ONBD-04**: User can skip any onboarding step without blocking subsequent steps
-- [x] **ONBD-05**: User sees summary screen showing configured providers and agent-model pairings
-- [x] **ONBD-06**: User can go from summary screen directly into chat with one click
-- [x] **ONBD-07**: Local AI framed as privacy premium ("runs entirely on your machine, no accounts, works offline")
-
-### Cloud Providers
-
-- [x] **CLOUD-01**: User gets working AI via Puter.js with zero API keys and no sign-up required
-- [x] **CLOUD-02**: Puter.js integrated as server-proxied adapter (not browser-direct) with full cost tracking
-- [x] **CLOUD-03**: User can sign in via Google OAuth to access Gemini free tier
-- [x] **CLOUD-04**: System auto-detects installed tools (Hermes, Claude Code, OpenClaw) and pre-fills configuration
-- [x] **CLOUD-05**: User can enter API keys for subscription providers during onboarding
-
-### Voice
-
-- [x] **VOICE-01**: User gets Piper TTS speech output that works on CPU-only hardware
-- [x] **VOICE-02**: Piper TTS pre-warms on first use with visible download progress (no silent 15-30s hang)
-- [x] **VOICE-03**: Voice features (Whisper STT + Piper TTS) offered during onboarding based on hardware capability
-
-### Personal AI Assistant
-
-- [x] **ASST-01**: User has persistent memory across chat sessions (summary-based, injected into system prompts)
-- [x] **ASST-02**: Memory content sanitized at write time to prevent prompt injection
-- [x] **ASST-03**: User can hand off an assistant conversation to a PM agent with one click, transferring context
-- [x] **ASST-04**: Assistant and Project Builder modes work standalone or together
-
-### CLI
-
-- [x] **CLI-01**: User can run `npx buildthis` to bootstrap Nexus from scratch
-- [x] **CLI-02**: CLI bootstrapper detects hardware and walks through the same provider tiering as web onboarding
+- [ ] **ONBRD-01**: Onboarding hardware probe detects Whisper STT and Piper TTS capability
+- [ ] **ONBRD-02**: Onboarding presents voice enable/skip step based on hardware detection results
+- [ ] **ONBRD-03**: Guided BotFather setup flow for Telegram bot token during onboarding
 
 ## Future Requirements
 
-### Cloud
+### Voice Enhancements
 
-- **CLOUD-F01**: OpenAI OAuth free tier (unstable API, defer to v2+)
+- **VFUT-01**: Wake word detection ("Hey Nexus") for hands-free activation
+- **VFUT-02**: Real-time speech-to-speech streaming (full-duplex WebSocket)
+- **VFUT-03**: Streaming TTS word-by-word playback
 
-### Assistant
+### Telegram Enhancements
 
-- **ASST-F01**: MCP connections for assistant mode
-
-### Voice
-
-- **VOICE-F01**: Server-side TTS fallback for headless mode
+- **TFUT-01**: Deep Telegram ↔ web chat session sync via Postgres event bus
+- **TFUT-02**: Rich Telegram elements (inline keyboards, threaded replies)
+- **TFUT-03**: Per-agent Telegram bots
 
 ## Out of Scope
 
 | Feature | Reason |
 |---------|--------|
-| OpenAI OAuth | Endpoint specifics unstable, low confidence on free tier details |
-| MCP tool connections | Complexity too high for v1.5; assistant works without it |
-| Server-side Piper TTS | Browser WASM sufficient; headless is edge case |
-| DB schema changes | Upstream sync constraint — all state in existing JSONB/files |
-| Vector database for memory | Summary-based approach sufficient; no infra overhead |
+| Real-time speech-to-speech | Entirely different architecture (LiveKit/Pipecat); future milestone |
+| Per-agent Telegram bots | Maintenance nightmare; single bot + agent prefix is correct |
+| Deep Telegram ↔ web chat sync | Requires Postgres event bus; deferred to v2.2 Command Center |
+| Telegram inline keyboards/threads | Thin bridge only; rich elements deferred to Command Center |
+| Wake word detection | Requires an always-on mic; hardware device concern; future milestone |
+| Streaming TTS word-by-word | Audio clicks/gaps; sentence-buffered playback captures most of the latency benefit |
+| Inline code execution over Telegram | Security risk; bridge is relay only |
+| GSD formatting in Telegram | Stateful session tracking; plain text + Markdown v1 only |
+| Transcription editing before sending | Breaks hands-free flow; show the transcript in the chat bubble afterward |
 
 ## Traceability
 
 | Requirement | Phase | Status |
 |-------------|-------|--------|
-| ONBD-01 | Phase 30 | Complete |
-| ONBD-02 | Phase 30 | Complete |
-| ONBD-03 | Phase 30 | Complete |
-| ONBD-07 | Phase 30 | Complete |
-| CLOUD-01 | Phase 31 | Complete |
-| CLOUD-02 | Phase 31 | Complete |
-| CLOUD-03 | Phase 31 | Complete |
-| CLOUD-04 | Phase 31 | Complete |
-| CLOUD-05 | Phase 31 | Complete |
-| ONBD-04 | Phase 32 | Complete |
-| ONBD-05 | Phase 32 | Complete |
-| ONBD-06 | Phase 32 | Complete |
-| ASST-01 | Phase 33 | Complete |
-| ASST-02 | Phase 33 | Complete |
-| ASST-03 | Phase 33 | Complete |
-| ASST-04 | Phase 33 | Complete |
-| VOICE-01 | Phase 34 | Complete |
-| VOICE-02 | Phase 34 | Complete |
-| VOICE-03 | Phase 34 | Complete |
-| CLI-01 | Phase 35 | Complete |
-| CLI-02 | Phase 35 | Complete |
+| VPIPE-01 | — | Pending |
+| VPIPE-02 | — | Pending |
+| VPIPE-03 | — | Pending |
+| VPIPE-04 | — | Pending |
+| VPIPE-05 | — | Pending |
+| VPIPE-06 | — | Pending |
+| VPIPE-07 | — | Pending |
+| VPIPE-08 | — | Pending |
+| WCHAT-01 | — | Pending |
+| WCHAT-02 | — | Pending |
+| WCHAT-03 | — | Pending |
+| WCHAT-04 | — | Pending |
+| WCHAT-05 | — | Pending |
+| WCHAT-06 | — | Pending |
+| TGRAM-01 | — | Pending |
+| TGRAM-02 | — | Pending |
+| TGRAM-03 | — | Pending |
+| TGRAM-04 | — | Pending |
+| TGRAM-05 | — | Pending |
+| TGRAM-06 | — | Pending |
+| ONBRD-01 | — | Pending |
+| ONBRD-02 | — | Pending |
+| ONBRD-03 | — | Pending |
 
 **Coverage:**
-- v1.5 requirements: 21 total
-- Mapped to phases: 21
-- Unmapped: 0 ✓
+- v1.6 requirements: 23 total
+- Mapped to phases: 0
+- Unmapped: 23 ⚠️
 
 ---
-*Requirements defined: 2026-04-02*
-*Last updated: 2026-04-02 after roadmap created (phases 30-35)*
+*Requirements defined: 2026-04-04*
+*Last updated: 2026-04-04 after initial definition*