# Roadmap: Nexus

## Milestones

- ✅ **v1.2.1 Universal Skill Management** - Phase 1 (shipped 2026-04-01)
- ✅ **v1.3 Chat & PWA** - Phases 21-26 (shipped 2026-04-02)
- ✅ **v1.4 Hermes Default Provider** - Phases 27-29 (shipped 2026-04-02)
- ✅ **v1.5 Smart Onboarding + Personal AI Assistant** - Phases 30-35 (shipped 2026-04-03)
- 🚧 **v1.6 Voice Pipeline + Minimal Message Bridge** - Phases 36-39 (in progress)

---

<details>
<summary>✅ v1.2.1 Universal Skill Management (Phase 1) - SHIPPED 2026-04-01</summary>

### Phase 1: Foundation
**Goal**: Establish the display-layer rename infrastructure, git hygiene tooling, and rebase safety primitives that all subsequent phases depend on
**Plans**: 2/2 plans complete

Plans:
- [x] 01-01-PLAN.md - Branding package, VOCAB constants, commit-msg hook
- [x] 01-02-PLAN.md - Zone taxonomy, rerere config, rebase safety infrastructure

</details>

<details>
<summary>✅ v1.3 Chat & PWA (Phases 21-26) - SHIPPED 2026-04-02</summary>

### Phase 21: Chat Foundation
**Goal**: Users can have real-time chat conversations with agents
**Plans**: 7/7 plans complete

### Phase 22: Agent Streaming
**Goal**: Agent responses stream in real-time with identity, edit, retry, and stop controls
**Plans**: 5/5 plans complete

### Phase 23: Brainstormer Flow
**Goal**: Users can turn a chat conversation into a tracked project with one handoff action
**Plans**: 4/4 plans complete

### Phase 24: Search, History & Branching
**Goal**: Users can find, bookmark, branch, and export any conversation
**Plans**: 4/4 plans complete

### Phase 25: File System
**Goal**: Users can upload, preview, and version files within chat; voice input transcribes speech to text
**Plans**: 9/9 plans complete

### Phase 26: PWA & Performance
**Goal**: Nexus installs as a PWA, works offline, and loads fast on mobile
**Plans**: 5/5 plans complete

</details>

<details>
<summary>✅ v1.4 Hermes Default Provider (Phases 27-29) - SHIPPED 2026-04-02</summary>

### Phase 27: Hermes Adapter
**Goal**: Users can create a Hermes agent in Nexus, configure it, and have it execute heartbeats that spawn `hermes chat -q`, return a result, and persist the session across runs
**Plans**: 1/1 plans complete

### Phase 28: Ollama Integration & Agent Surface
**Goal**: Users can see which Ollama models are available, get a recommendation for their hardware, configure any Hermes agent to use a local model, and see Hermes-specific runtime data in the dashboard and agent config
**Plans**: 3/3 plans complete

### Phase 29: Default Provider & End-to-End
**Goal**: A fresh Nexus install with only Hermes and Ollama works end-to-end: onboarding offers Hermes as the default, PM and Engineer templates run correctly on the Hermes runtime, and GSD workflow tasks complete successfully
**Plans**: 2/2 plans complete

</details>

<details>
<summary>✅ v1.5 Smart Onboarding + Personal AI Assistant (Phases 30-35) - SHIPPED 2026-04-03</summary>

### Phase 30: Hardware Detection + Mode Selection
**Goal**: Users see accurate hardware information during onboarding, get a model recommendation matched to their machine, and choose a mode that correctly gates all downstream features
**Plans**: 2/2 plans complete

### Phase 31: Puter.js Zero-Config Cloud
**Goal**: Users without Ollama installed can reach working AI in one click via Puter.js
**Plans**: 4/4 plans complete

### Phase 32: Multi-Step Onboarding Wizard
**Goal**: Users move through a complete, skippable onboarding flow that assembles hardware data, provider selection, and voice options into a summary screen
**Plans**: 1/1 plans complete

### Phase 33: Persistent Memory + Personal Assistant Mode
**Goal**: Users in Personal AI Assistant mode accumulate memory across sessions that shapes future responses
**Plans**: 3/3 plans complete

### Phase 34: Voice
**Goal**: Users can speak to the assistant (Whisper STT) and hear responses read aloud (Piper TTS)
**Plans**: 2/2 plans complete

### Phase 35: npx buildthis CLI
**Goal**: A developer can run `npx buildthis` on a fresh machine and either open an already-running Nexus or be guided through install
**Plans**: 1/1 plans complete

</details>

---

### 🚧 v1.6 Voice Pipeline + Minimal Message Bridge (In Progress)

**Milestone Goal:** Transport-agnostic voice pipeline (Whisper STT + Piper TTS) integrated into web chat, plus a minimal Telegram bridge for phone access. Voice infrastructure designed to survive v2.2 Command Center migration.

## Phases

- [x] **Phase 36: Voice Pipeline Foundation** - Transport-agnostic VoicePipelineService (transcribe, synthesize, formatForVoice), voice.ts route, ffmpeg audio transcoding, voiceMode flag, dual output pattern (completed 2026-04-04)
- [x] **Phase 37: Web Chat Voice UI** - VAD silence detection, waveform visualization, voice mode toggle, inline audio player, auto-play toggle, COOP/COEP headers (completed 2026-04-04)
- [x] **Phase 38: Telegram Bridge** - grammY long polling relay, text + voice note bidirectional relay, agent identity prefix, BotFather onboarding setup (completed 2026-04-04)
- [x] **Phase 39: Voice Polish** - Sentence-buffered TTS streaming, multi-language TTS output, onboarding STT/TTS hardware detection step (completed 2026-04-04)

## Phase Details

### Phase 36: Voice Pipeline Foundation
**Goal**: The transport-agnostic voice pipeline is live and callable from any consumer (web chat, Telegram, or future integrations), with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start
**Depends on**: Phase 35 (v1.5 shipped)
**Requirements**: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06
**Success Criteria** (what must be TRUE):
1. Posting a WAV audio file to `POST /api/transcribe` returns a transcription with detected language, regardless of whether the request came from the web UI or a test harness
2. Calling `POST /api/synthesize` with a markdown-heavy agent response returns two outputs: a voice-optimized prose version (no markdown) and the original full text with code blocks
3. A WebM/Opus browser recording and an OGG/Opus Telegram voice note both produce identical Whisper transcription quality after ffmpeg transcodes each to WAV 16kHz mono
4. The `voiceMode` flag on a chat message survives from client request through Express route to message persistence, verifiable in the DB record
5. `nexus-settings.json` accepts `voiceMode: "text" | "voice_input" | "full_voice"` and `telegramToken` fields without breaking existing settings reads

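
Criterion 2 describes a dual output: a spoken version with no markdown, plus the untouched original. The sketch below shows one way to get there, assuming a regex-based stripper is good enough for typical agent replies. The `DualOutput` shape and its field names are hypothetical, and the real `formatForVoice` may work differently. The original text is passed through unchanged so code blocks remain available to the UI.

```typescript
// Hypothetical result shape; field names are assumptions, not the project's API.
interface DualOutput {
  voiceText: string; // prose with markdown removed, intended for TTS
  fullText: string; // original response, code blocks intact
}

// Sketch of a markdown-to-speech formatter (regex-based, not a full parser).
function formatForVoice(markdown: string): DualOutput {
  const voiceText = markdown
    .replace(/```[\s\S]*?```/g, " ") // drop fenced code blocks entirely
    .replace(/`([^`]*)`/g, "$1") // unwrap inline code
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // links -> link text
    .replace(/^[ ]{0,3}#{1,6}[ \t]+/gm, "") // heading markers
    .replace(/^[ \t]*[-*+][ \t]+/gm, "") // bullet markers
    .replace(/\*+/g, "") // asterisk emphasis markers
    .replace(/\s+/g, " ") // collapse whitespace and newlines
    .trim();
  return { voiceText, fullText: markdown };
}
```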
**Plans**: 3 plans

Plans:
- [x] 36-01-PLAN.md - VoicePipelineService: ffmpeg transcoding, Whisper STT, Piper TTS, formatForVoice
- [x] 36-02-PLAN.md - Schema extensions: voiceMode in shared validators/types + nexus-settings
- [ ] 36-03-PLAN.md - Voice routes, chat.ts voiceMode wiring, app.ts mount, old transcribe removal
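
Plan 36-01 lists ffmpeg transcoding, and success criterion 3 fixes the target format at WAV 16kHz mono. One way to express that target as ffmpeg arguments is sketched below. `buildTranscodeArgs` is a hypothetical helper, and the 16-bit PCM codec is an assumption, since the roadmap only specifies sample rate and channel count. The returned array could be handed to Node's `child_process.execFile("ffmpeg", args)`; the same arguments apply to WebM/Opus and OGG/Opus input because ffmpeg detects the input format itself.

```typescript
// Hypothetical helper: ffmpeg arguments for "any input -> WAV, 16 kHz, mono".
function buildTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y", // overwrite the output file if it already exists
    "-i", inputPath, // input container and codec are auto-detected
    "-ar", "16000", // resample to 16 kHz
    "-ac", "1", // downmix to a single channel
    "-c:a", "pcm_s16le", // assumption: 16-bit little-endian PCM samples
    "-f", "wav", // force the WAV container
    outputPath,
  ];
}
```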

### Phase 37: Web Chat Voice UI
**Goal**: Users can speak to any agent in web chat: recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting
**Depends on**: Phase 36
**Requirements**: WCHAT-01, WCHAT-02, WCHAT-03, WCHAT-04, WCHAT-05, WCHAT-06
**Success Criteria** (what must be TRUE):
1. Clicking the mic button starts recording; the waveform animates to show audio levels; speaking and then pausing for 1.5 seconds auto-submits the recording without pressing any button
2. The voice mode toggle has three visible states (text only / voice input / full voice) and persists the selected mode across page refreshes
3. An agent response delivered in full voice mode plays back automatically in the chat thread; the auto-play can be turned off in settings and stays off after a page reload
4. The chat message for a voice interaction shows a voice badge and an expandable section revealing the full markdown response with code blocks intact
5. Voice recording and VAD work correctly in Chrome and Firefox on the Mac Mini (COOP/COEP headers satisfy SharedArrayBuffer requirements)

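
The auto-submit rule in criterion 1 (speech, then 1.5 seconds of silence) can be sketched as a small state machine. This is an energy-threshold illustration only: the roadmap does not say which VAD technique Phase 37 uses, and `SilenceDetector`, `rms`, and the `0.02` threshold are hypothetical. Silence before any speech is ignored, so an idle mic never triggers a submit.

```typescript
// Root-mean-square level of one frame of PCM samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Hypothetical detector: fires once speech is followed by enough silence.
class SilenceDetector {
  private heardSpeech = false;
  private silentMs = 0;
  private readonly threshold: number;
  private readonly requiredSilenceMs: number;

  // threshold is an assumed energy cutoff; 1500 ms comes from criterion 1.
  constructor(threshold = 0.02, requiredSilenceMs = 1500) {
    this.threshold = threshold;
    this.requiredSilenceMs = requiredSilenceMs;
  }

  // Feed one frame and its duration; returns true when recording should stop.
  push(frame: Float32Array, frameMs: number): boolean {
    if (rms(frame) >= this.threshold) {
      this.heardSpeech = true;
      this.silentMs = 0; // any speech resets the silence timer
      return false;
    }
    if (!this.heardSpeech) return false; // ignore silence before speech
    this.silentMs += frameMs;
    return this.silentMs >= this.requiredSilenceMs;
  }
}
```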
**Plans**: TBD
**UI hint**: yes
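
Phase 37 depends on COOP/COEP headers because browsers expose `SharedArrayBuffer` only to cross-origin isolated pages. The sketch below shows the two header values that enable that state. `applyCrossOriginIsolation` and the `HeaderSink` type are hypothetical; `HeaderSink` stands in for any response object with a `setHeader` method, such as an Express response, so the example has no framework dependency.

```typescript
// Response headers that put a page into the cross-origin isolated state.
const ISOLATION_HEADERS: Array<[string, string]> = [
  ["Cross-Origin-Opener-Policy", "same-origin"],
  ["Cross-Origin-Embedder-Policy", "require-corp"],
];

// Minimal structural type (assumption) so the sketch needs no Express typings.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// Hypothetical helper: apply both headers to a response-like object.
function applyCrossOriginIsolation(res: HeaderSink): void {
  for (const [name, value] of ISOLATION_HEADERS) {
    res.setHeader(name, value);
  }
}
```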

### Phase 38: Telegram Bridge
**Goal**: The user can message any Nexus agent from their phone via Telegram: text and voice notes both work, agent identity is visible on every reply, and the bot is set up through guided onboarding with no manual token entry in config files
**Depends on**: Phase 36
**Requirements**: TGRAM-01, TGRAM-02, TGRAM-03, TGRAM-04, TGRAM-05, TGRAM-06, ONBRD-03
**Success Criteria** (what must be TRUE):
1. Sending a text message to the Nexus Telegram bot from a phone produces an agent reply prefixed with the agent name (e.g. `[PM]: response`) within 10 seconds
2. Sending a voice note to the Telegram bot produces a transcription confirmation message followed by the agent's text reply; the bot does not silently fail or miss the update
3. Requesting a voice reply from the bot returns an OGG voice note that plays back correctly in the Telegram mobile app
4. The Telegram bridge runs via long polling with no public HTTPS endpoint required, verified by running on the Mac Mini behind NAT
5. The entire `telegram.ts` service file is under 500 lines
6. The onboarding wizard includes a BotFather setup step that walks through creating a bot token and saves it to `nexus-settings.json` without manual file editing

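
Two small pieces of the bridge lend themselves to a sketch: the agent identity prefix from criterion 1, and the offset bookkeeping behind long polling. Both functions are hypothetical illustrations. grammY handles polling offsets internally, so `nextOffset` only shows the Bot API rule that each `getUpdates` call asks for the highest seen `update_id` plus one, which is what keeps updates from being missed or replayed.

```typescript
// Hypothetical helper: prefix a reply with the agent name, e.g. "[PM]: response".
function formatAgentReply(agentName: string, text: string): string {
  return `[${agentName.trim()}]: ${text.trim()}`;
}

// Only the field of a Telegram update that matters for polling.
interface PolledUpdate {
  update_id: number;
}

// Hypothetical helper: offset for the next getUpdates call.
// An empty batch leaves the offset unchanged.
function nextOffset(current: number, updates: PolledUpdate[]): number {
  let next = current;
  for (const update of updates) {
    next = Math.max(next, update.update_id + 1);
  }
  return next;
}
```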
**Plans**: TBD

### Phase 39: Voice Polish
**Goal**: Voice responses begin playing before synthesis is complete (sentence-buffered), a single response can be synthesized in multiple languages simultaneously, and new installs can detect STT/TTS hardware capability during onboarding and enable voice in one step
**Depends on**: Phase 37
**Requirements**: VPIPE-07, VPIPE-08, ONBRD-01, ONBRD-02
**Success Criteria** (what must be TRUE):
1. For a multi-sentence agent response, the first sentence begins playing in the browser before the second sentence has finished synthesizing; the gap between text completion and first audio is under 1 second
2. A user can request the same agent response as audio in both English and Danish; both OGG files are generated and available for playback without a second agent call
3. On a fresh install, the onboarding hardware probe reports whether Whisper STT and Piper TTS are runnable on the detected hardware tier
4. The onboarding voice step activates (showing enable/skip options) only when the hardware probe confirms sufficient capability; on hardware below threshold it shows a capability note and skips to the next step

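
Sentence-buffered streaming (criterion 1) means handing each sentence to TTS as soon as it is complete instead of waiting for the whole response. A minimal sketch of the buffering step follows. `SentenceBuffer` is a hypothetical class, and the boundary rule (terminal punctuation followed by whitespace) is a simplifying assumption that will split incorrectly on abbreviations such as "e.g. ". Each string returned by `push` could be synthesized immediately, with `flush` called when the agent stream ends.

```typescript
// Hypothetical buffer that turns streamed text chunks into whole sentences.
class SentenceBuffer {
  private pending = "";

  // Append streamed text; return any sentences that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start); // keep the unfinished tail
    return sentences;
  }

  // Return the remaining text when the stream ends, or null if none is left.
  flush(): string | null {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? rest : null;
  }
}
```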
**Plans**: 2 plans

Plans:
- [x] 39-01-PLAN.md - Sentence-buffered TTS streaming + multi-language synthesis
- [ ] 39-02-PLAN.md - Onboarding voice hardware capability probe

---

## Coverage Validation

All 23 v1.6 requirements are mapped to exactly one phase. No orphans.

| Requirement | Phase |
|-------------|-------|
| VPIPE-01 | 36 |
| VPIPE-02 | 36 |
| VPIPE-03 | 36 |
| VPIPE-04 | 36 |
| VPIPE-05 | 36 |
| VPIPE-06 | 36 |
| WCHAT-01 | 37 |
| WCHAT-02 | 37 |
| WCHAT-03 | 37 |
| WCHAT-04 | 37 |
| WCHAT-05 | 37 |
| WCHAT-06 | 37 |
| TGRAM-01 | 38 |
| TGRAM-02 | 38 |
| TGRAM-03 | 38 |
| TGRAM-04 | 38 |
| TGRAM-05 | 38 |
| TGRAM-06 | 38 |
| ONBRD-03 | 38 |
| VPIPE-07 | 39 |
| VPIPE-08 | 39 |
| ONBRD-01 | 39 |
| ONBRD-02 | 39 |

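
The "exactly one phase, no orphans" claim can be checked mechanically. The sketch below assumes the table is available as `[requirement, phase]` pairs; `validateCoverage` and the `CoverageReport` shape are hypothetical and not part of the project's tooling. A mapping passes when all three lists in the report are empty.

```typescript
// Hypothetical report shape for a requirement-to-phase coverage check.
interface CoverageReport {
  orphans: string[]; // required IDs mapped to no phase
  duplicates: string[]; // IDs mapped to more than one phase
  unknown: string[]; // mapped IDs missing from the requirements list
}

function validateCoverage(
  required: string[],
  mapping: Array<[string, number]>,
): CoverageReport {
  // Collect every phase that claims each requirement ID.
  const phasesById: { [id: string]: number[] } = {};
  for (const [id, phase] of mapping) {
    (phasesById[id] = phasesById[id] || []).push(phase);
  }
  const mappedIds = Object.keys(phasesById);
  return {
    orphans: required.filter((id) => !(id in phasesById)),
    duplicates: mappedIds.filter((id) => phasesById[id].length > 1),
    unknown: mappedIds.filter((id) => required.indexOf(id) === -1),
  };
}
```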

---

## Progress

| Phase | Milestone | Plans Complete | Status | Completed |
|-------|-----------|----------------|--------|-----------|
| 1. Foundation | v1.2.1 | 2/2 | Complete | 2026-04-01 |
| 21. Chat Foundation | v1.3 | 7/7 | Complete | 2026-04-02 |
| 22. Agent Streaming | v1.3 | 5/5 | Complete | 2026-04-02 |
| 23. Brainstormer Flow | v1.3 | 4/4 | Complete | 2026-04-02 |
| 24. Search, History & Branching | v1.3 | 4/4 | Complete | 2026-04-02 |
| 25. File System | v1.3 | 9/9 | Complete | 2026-04-02 |
| 26. PWA & Performance | v1.3 | 5/5 | Complete | 2026-04-02 |
| 27. Hermes Adapter | v1.4 | 1/1 | Complete | 2026-04-02 |
| 28. Ollama Integration & Agent Surface | v1.4 | 3/3 | Complete | 2026-04-02 |
| 29. Default Provider & End-to-End | v1.4 | 2/2 | Complete | 2026-04-02 |
| 30. Hardware Detection + Mode Selection | v1.5 | 2/2 | Complete | 2026-04-03 |
| 31. Puter.js Zero-Config Cloud | v1.5 | 4/4 | Complete | 2026-04-03 |
| 32. Multi-Step Onboarding Wizard | v1.5 | 1/1 | Complete | 2026-04-03 |
| 33. Persistent Memory + Personal Assistant Mode | v1.5 | 3/3 | Complete | 2026-04-03 |
| 34. Voice | v1.5 | 2/2 | Complete | 2026-04-03 |
| 35. npx buildthis CLI | v1.5 | 1/1 | Complete | 2026-04-03 |
| 36. Voice Pipeline Foundation | v1.6 | 2/3 | Complete | 2026-04-04 |
| 37. Web Chat Voice UI | v1.6 | 3/4 | Complete | 2026-04-04 |
| 38. Telegram Bridge | v1.6 | 3/3 | Complete | 2026-04-04 |
| 39. Voice Polish | v1.6 | 1/2 | Complete | 2026-04-04 |