Nexus Dev 8883cb6ecb

chore: complete v1.6 Voice Pipeline + Minimal Message Bridge milestone

2026-04-04 03:52:25 +00:00

Roadmap: Nexus

Milestones

✅ v1.2.1 Universal Skill Management - Phase 1 (shipped 2026-04-01)
✅ v1.3 Chat & PWA - Phases 21-26 (shipped 2026-04-02)
✅ v1.4 Hermes Default Provider - Phases 27-29 (shipped 2026-04-02)
✅ v1.5 Smart Onboarding + Personal AI Assistant - Phases 30-35 (shipped 2026-04-03)
🚧 v1.6 Voice Pipeline + Minimal Message Bridge - Phases 36-39 (in progress)

✅ v1.2.1 Universal Skill Management (Phase 1) - SHIPPED 2026-04-01

Phase 1: Foundation

Goal: Establish the display-layer rename infrastructure, git hygiene tooling, and rebase safety primitives that all subsequent phases depend on
Plans: 2/2 plans complete

Plans:

01-01-PLAN.md — Branding package, VOCAB constants, commit-msg hook
01-02-PLAN.md — Zone taxonomy, rerere config, rebase safety infrastructure

✅ v1.3 Chat & PWA (Phases 21-26) - SHIPPED 2026-04-02

Phase 21: Chat Foundation

Goal: Users can have real-time chat conversations with agents
Plans: 7/7 plans complete

Phase 22: Agent Streaming

Goal: Agent responses stream in real-time with identity, edit, retry, and stop controls
Plans: 5/5 plans complete

Phase 23: Brainstormer Flow

Goal: Users can turn a chat conversation into a tracked project with one handoff action
Plans: 4/4 plans complete

Phase 24: Search, History & Branching

Goal: Users can find, bookmark, branch, and export any conversation
Plans: 4/4 plans complete

Phase 25: File System

Goal: Users can upload, preview, and version files within chat; voice input transcribes speech to text
Plans: 9/9 plans complete

Phase 26: PWA & Performance

Goal: Nexus installs as a PWA, works offline, and loads fast on mobile
Plans: 5/5 plans complete

✅ v1.4 Hermes Default Provider (Phases 27-29) - SHIPPED 2026-04-02

Phase 27: Hermes Adapter

Goal: Users can create a Hermes agent in Nexus, configure it, and have it execute heartbeats that spawn hermes chat -q, return a result, and persist the session across runs
Plans: 1/1 plans complete

Phase 28: Ollama Integration & Agent Surface

Goal: Users can see which Ollama models are available, get a recommendation for their hardware, configure any Hermes agent to use a local model, and see Hermes-specific runtime data in the dashboard and agent config
Plans: 3/3 plans complete

Phase 29: Default Provider & End-to-End

Goal: A fresh Nexus install with only Hermes and Ollama works end-to-end: onboarding offers Hermes as the default, PM and Engineer templates run correctly on the Hermes runtime, and GSD workflow tasks complete successfully
Plans: 2/2 plans complete

✅ v1.5 Smart Onboarding + Personal AI Assistant (Phases 30-35) - SHIPPED 2026-04-03

Phase 30: Hardware Detection + Mode Selection

Goal: Users see accurate hardware information during onboarding, get a model recommendation matched to their machine, and choose a mode that correctly gates all downstream features
Plans: 2/2 plans complete

Phase 31: Puter.js Zero-Config Cloud

Goal: Users without Ollama installed can reach working AI in one click via Puter.js
Plans: 4/4 plans complete

Phase 32: Multi-Step Onboarding Wizard

Goal: Users move through a complete, skippable onboarding flow that assembles hardware data, provider selection, and voice options into a summary screen
Plans: 1/1 plans complete

Phase 33: Persistent Memory + Personal Assistant Mode

Goal: Users in Personal AI Assistant mode accumulate memory across sessions that shapes future responses
Plans: 3/3 plans complete

Phase 34: Voice

Goal: Users can speak to the assistant (Whisper STT) and hear responses read aloud (Piper TTS)
Plans: 2/2 plans complete

Phase 35: npx buildthis CLI

Goal: A developer can run npx buildthis on a fresh machine and either open an already-running Nexus or be guided through install
Plans: 1/1 plans complete

🚧 v1.6 Voice Pipeline + Minimal Message Bridge (In Progress)

Milestone Goal: Transport-agnostic voice pipeline (Whisper STT + Piper TTS) integrated into web chat, plus a minimal Telegram bridge for phone access. Voice infrastructure designed to survive v2.2 Command Center migration.

Phases

Phase 36: Voice Pipeline Foundation — Transport-agnostic VoicePipelineService (transcribe, synthesize, formatForVoice), voice.ts route, ffmpeg audio transcoding, voiceMode flag, dual output pattern (completed 2026-04-04)
Phase 37: Web Chat Voice UI — VAD silence detection, waveform visualization, voice mode toggle, inline audio player, auto-play toggle, COOP/COEP headers (completed 2026-04-04)
Phase 38: Telegram Bridge — grammY long polling relay, text + voice note bidirectional relay, agent identity prefix, BotFather onboarding setup (completed 2026-04-04)
Phase 39: Voice Polish — Sentence-buffered TTS streaming, multi-language TTS output, onboarding STT/TTS hardware detection step (completed 2026-04-04)

Phase Details

Phase 36: Voice Pipeline Foundation

Goal: The transport-agnostic voice pipeline is live and callable from any consumer (web chat, Telegram, or future integrations), with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start
Depends on: Phase 35 (v1.5 shipped)
Requirements: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06
Success Criteria (what must be TRUE):

Posting a WAV audio file to POST /api/transcribe returns a transcription with detected language, regardless of whether the request came from the web UI or a test harness
Calling POST /api/synthesize with a markdown-heavy agent response returns two outputs: a voice-optimized prose version (no markdown) and the original full text with code blocks
A WebM/Opus browser recording and an OGG/Opus Telegram voice note both produce identical Whisper transcription quality after ffmpeg transcodes each to WAV 16kHz mono
The voiceMode flag on a chat message survives from client request through Express route to message persistence — verifiable in the DB record
nexus-settings.json accepts voiceMode: "text" | "voice_input" | "full_voice" and telegramToken fields without breaking existing settings reads

Plans: 3 plans

Plans:

36-01-PLAN.md — VoicePipelineService: ffmpeg transcoding, Whisper STT, Piper TTS, formatForVoice
36-02-PLAN.md — Schema extensions: voiceMode in shared validators/types + nexus-settings
36-03-PLAN.md — Voice routes, chat.ts voiceMode wiring, app.ts mount, old transcribe removal
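Two of the Phase 36 criteria lend themselves to a small illustration: the dual output pattern and the transcode to 16 kHz mono WAV. The sketch below is not the Nexus implementation. Only the name formatForVoice comes from the roadmap; DualOutput, wavTranscodeArgs, the regular expressions, and the "(code omitted)" placeholder are assumptions made for the example. The ffmpeg flags used (-ar, -ac, -c:a, -f) are standard ffmpeg options.

```typescript
/** Assumed shape of the dual output: prose for TTS plus the untouched original. */
interface DualOutput {
  voiceText: string; // markdown stripped, suitable for reading aloud
  fullText: string;  // original response, code blocks intact
}

/** Strip common markdown so a TTS engine does not read symbols aloud (illustrative rules only). */
function formatForVoice(markdown: string): DualOutput {
  const voiceText = markdown
    .replace(/```[\s\S]*?```/g, " (code omitted) ") // fenced code blocks
    .replace(/`([^`]*)`/g, "$1")                    // inline code keeps its text
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")      // links and images keep their label
    .replace(/^#{1,6}\s+/gm, "")                    // heading markers
    .replace(/^\s*[-*+]\s+/gm, "")                  // bullet markers
    .replace(/(\*\*|\*)(.+?)\1/g, "$2")             // bold and italic asterisks
    .replace(/\s+/g, " ")                           // collapse whitespace
    .trim();
  return { voiceText, fullText: markdown };
}

/** ffmpeg arguments that transcode any input to 16 kHz, mono, 16-bit PCM WAV. */
function wavTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y",                // overwrite the output file if it exists
    "-i", inputPath,     // WebM/Opus, OGG/Opus, or anything else ffmpeg can decode
    "-ar", "16000",      // 16 kHz sample rate
    "-ac", "1",          // mono
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM
    "-f", "wav",
    outputPath,
  ];
}
```

The argument array would typically be handed to a child process that runs ffmpeg; because both browser and Telegram recordings pass through the same arguments, Whisper sees the same input format in either case.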

Phase 37: Web Chat Voice UI

Goal: Users can speak to any agent in web chat: recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting
Depends on: Phase 36
Requirements: WCHAT-01, WCHAT-02, WCHAT-03, WCHAT-04, WCHAT-05, WCHAT-06
Success Criteria (what must be TRUE):

Clicking the mic button starts recording; the waveform animates to show audio levels; speaking and then pausing for 1.5 seconds auto-submits the recording without pressing any button
The voice mode toggle has three visible states (text only / voice input / full voice) and persists the selected mode across page refreshes
An agent response delivered in full voice mode plays back automatically in the chat thread; the auto-play can be turned off in settings and stays off after a page reload
The chat message for a voice interaction shows a voice badge and an expandable section revealing the full markdown response with code blocks intact
Voice recording and VAD work correctly in Chrome and Firefox on the Mac Mini (COOP/COEP headers satisfy SharedArrayBuffer requirements)

Plans: TBD
UI hint: yes
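The auto-submit rule in the first criterion (speak, then pause for 1.5 seconds) can be modeled as a small state machine over audio levels. This is a simplified, energy-based sketch rather than the VAD that Nexus ships; SilenceDetector, rmsLevel, and the 0.02 threshold are hypothetical names and values chosen for illustration.

```typescript
/** Hypothetical auto-stop detector: fires once the level stays below a threshold for holdMs. */
class SilenceDetector {
  private silentSince: number | null = null;
  private heardSpeech = false;
  private readonly threshold: number;
  private readonly holdMs: number;

  constructor(threshold: number = 0.02, holdMs: number = 1500) {
    this.threshold = threshold; // assumed RMS level below which a frame counts as silence
    this.holdMs = holdMs;       // the 1.5 second pause from the success criteria
  }

  /** Feed one level reading with its timestamp; returns true when recording should stop. */
  push(level: number, nowMs: number): boolean {
    if (level >= this.threshold) {
      this.heardSpeech = true;
      this.silentSince = null; // any speech resets the silence timer
      return false;
    }
    if (!this.heardSpeech) return false; // never submit before the user has spoken
    if (this.silentSince === null) this.silentSince = nowMs;
    return nowMs - this.silentSince >= this.holdMs;
  }
}

/** Root-mean-square level of one frame of samples in the range [-1, 1]. */
function rmsLevel(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}
```

In a browser, the frames would come from the microphone stream and the same level values could drive the waveform display. Requiring speech before the timer starts avoids submitting an empty recording when the user clicks the mic and hesitates.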

Phase 38: Telegram Bridge

Goal: The user can message any Nexus agent from their phone via Telegram: text and voice notes both work, agent identity is visible on every reply, and the bot is set up through guided onboarding with no manual token entry in config files
Depends on: Phase 36
Requirements: TGRAM-01, TGRAM-02, TGRAM-03, TGRAM-04, TGRAM-05, TGRAM-06, ONBRD-03
Success Criteria (what must be TRUE):

Sending a text message to the Nexus Telegram bot from a phone produces an agent reply prefixed with the agent name (e.g. [PM]: response) within 10 seconds
Sending a voice note to the Telegram bot produces a transcription confirmation message followed by the agent's text reply — the bot does not silently fail or miss the update
Requesting a voice reply from the bot returns an OGG voice note that plays back correctly in the Telegram mobile app
The Telegram bridge runs via long polling with no public HTTPS endpoint required — verified by running on the Mac Mini behind NAT
The entire telegram.ts service file is under 500 lines
The onboarding wizard includes a BotFather setup step that walks through creating a bot token and saves it to nexus-settings.json without manual file editing

Plans: TBD
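The relay behavior in the first two criteria (agent-name prefix on replies, transcription confirmation before the reply to a voice note) can be sketched independently of the Telegram client. The code below deliberately avoids the grammY API and injects its dependencies, so every name in it (Incoming, BridgeDeps, formatReply, handleIncoming, and the "Heard:" confirmation wording) is hypothetical.

```typescript
/** Incoming update, reduced to the two kinds the bridge relays. */
type Incoming =
  | { kind: "text"; text: string }
  | { kind: "voice"; audio: Uint8Array };

/** Assumed dependencies: the voice pipeline and the agent runtime. */
interface BridgeDeps {
  transcribe(audio: Uint8Array): Promise<string>;
  askAgent(prompt: string): Promise<{ agentName: string; text: string }>;
}

/** Prefix a reply with the agent identity, e.g. "[PM]: response". */
function formatReply(agentName: string, text: string): string {
  return `[${agentName}]: ${text}`;
}

/** Returns the messages to send back, in order. */
async function handleIncoming(msg: Incoming, deps: BridgeDeps): Promise<string[]> {
  const outgoing: string[] = [];
  let prompt: string;
  if (msg.kind === "voice") {
    prompt = await deps.transcribe(msg.audio);
    outgoing.push(`Heard: "${prompt}"`); // confirmation so a voice note never fails silently
  } else {
    prompt = msg.text;
  }
  const reply = await deps.askAgent(prompt);
  outgoing.push(formatReply(reply.agentName, reply.text));
  return outgoing;
}
```

Keeping the handler free of Telegram types is one way to keep the service file small: the long-polling layer only has to translate updates into Incoming values and send the returned strings.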

Phase 39: Voice Polish

Goal: Voice responses begin playing before synthesis is complete (sentence-buffered), a single response can be synthesized in multiple languages simultaneously, and new installs can detect STT/TTS hardware capability during onboarding and enable voice in one step
Depends on: Phase 37
Requirements: VPIPE-07, VPIPE-08, ONBRD-01, ONBRD-02
Success Criteria (what must be TRUE):

For a multi-sentence agent response, the first sentence begins playing in the browser before the second sentence has finished synthesizing — the gap between text completion and first audio is under 1 second
A user can request the same agent response as audio in both English and Danish; both OGG files are generated and available for playback without a second agent call
On a fresh install, the onboarding hardware probe reports whether Whisper STT and Piper TTS are runnable on the detected hardware tier
The onboarding voice step activates (showing enable/skip options) only when the hardware probe confirms sufficient capability; on hardware below threshold it shows a capability note and skips to the next step

Plans: 2 plans

Plans:

39-01-PLAN.md — Sentence-buffered TTS streaming + multi-language synthesis
39-02-PLAN.md — Onboarding voice hardware capability probe
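Sentence-buffered streaming, as described in the first criterion, means handing each completed sentence to the synthesizer while later text is still arriving. A minimal sketch of the buffering step follows; SentenceBuffer is a hypothetical name, and the boundary rule (terminator followed by whitespace) is a simplification that will split on abbreviations such as "e.g. ".

```typescript
/** Accumulates streamed text chunks and releases complete sentences as soon as they end. */
class SentenceBuffer {
  private pending = "";

  /** Add a chunk; returns any sentences that are now complete. */
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace. A terminator at the very
    // end of the buffer is held back, since the next chunk may continue it ("3." + "14").
    const boundary = /[.!?]+(?=\s)/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence) sentences.push(sentence);
      start = end;
    }
    this.pending = this.pending.slice(start);
    return sentences;
  }

  /** Call when the stream ends to release whatever text remains. */
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

Each string returned by push could be sent to TTS immediately, so the first audio can start while the rest of the response is still being generated.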

Coverage Validation

Each of the 23 v1.6 requirements is mapped to exactly one phase. No orphans.

Requirement	Phase
VPIPE-01	36
VPIPE-02	36
VPIPE-03	36
VPIPE-04	36
VPIPE-05	36
VPIPE-06	36
WCHAT-01	37
WCHAT-02	37
WCHAT-03	37
WCHAT-04	37
WCHAT-05	37
WCHAT-06	37
TGRAM-01	38
TGRAM-02	38
TGRAM-03	38
TGRAM-04	38
TGRAM-05	38
TGRAM-06	38
ONBRD-03	38
VPIPE-07	39
VPIPE-08	39
ONBRD-01	39
ONBRD-02	39
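The "exactly one phase" claim can be checked mechanically. The sketch below copies the requirement IDs from the table above and counts how many phases claim each one; the helper name countAssignments is an assumption, not part of the Nexus tooling.

```typescript
// Requirement IDs per phase, copied from the coverage table.
const coverage: { [phase: string]: string[] } = {
  "36": ["VPIPE-01", "VPIPE-02", "VPIPE-03", "VPIPE-04", "VPIPE-05", "VPIPE-06"],
  "37": ["WCHAT-01", "WCHAT-02", "WCHAT-03", "WCHAT-04", "WCHAT-05", "WCHAT-06"],
  "38": ["TGRAM-01", "TGRAM-02", "TGRAM-03", "TGRAM-04", "TGRAM-05", "TGRAM-06", "ONBRD-03"],
  "39": ["VPIPE-07", "VPIPE-08", "ONBRD-01", "ONBRD-02"],
};

/** Count how many phases claim each requirement ID. */
function countAssignments(map: { [phase: string]: string[] }): Map<string, number> {
  const counts = new Map<string, number>();
  for (const phase of Object.keys(map)) {
    for (const id of map[phase]) {
      counts.set(id, (counts.get(id) ?? 0) + 1);
    }
  }
  return counts;
}

const counts = countAssignments(coverage);
const duplicates = Array.from(counts.entries())
  .filter(([, n]) => n > 1)
  .map(([id]) => id);

// 6 + 6 + 7 + 4 = 23 distinct IDs, none claimed by more than one phase.
console.log(`${counts.size} requirements, ${duplicates.length} duplicates`);
```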

Progress

Phase	Milestone	Plans Complete	Status	Completed
1. Foundation	v1.2.1	2/2	Complete	2026-04-01
21. Chat Foundation	v1.3	7/7	Complete	2026-04-02
22. Agent Streaming	v1.3	5/5	Complete	2026-04-02
23. Brainstormer Flow	v1.3	4/4	Complete	2026-04-02
24. Search, History & Branching	v1.3	4/4	Complete	2026-04-02
25. File System	v1.3	9/9	Complete	2026-04-02
26. PWA & Performance	v1.3	5/5	Complete	2026-04-02
27. Hermes Adapter	v1.4	1/1	Complete	2026-04-02
28. Ollama Integration & Agent Surface	v1.4	3/3	Complete	2026-04-02
29. Default Provider & End-to-End	v1.4	2/2	Complete	2026-04-02
30. Hardware Detection + Mode Selection	v1.5	2/2	Complete	2026-04-03
31. Puter.js Zero-Config Cloud	v1.5	4/4	Complete	2026-04-03
32. Multi-Step Onboarding Wizard	v1.5	1/1	Complete	2026-04-03
33. Persistent Memory + Personal Assistant Mode	v1.5	3/3	Complete	2026-04-03
34. Voice	v1.5	2/2	Complete	2026-04-03
35. npx buildthis CLI	v1.5	1/1	Complete	2026-04-03
36. Voice Pipeline Foundation	v1.6	2/3	Complete	2026-04-04
37. Web Chat Voice UI	v1.6	3/4	Complete	2026-04-04
38. Telegram Bridge	v1.6	3/3	Complete	2026-04-04
39. Voice Polish	v1.6	1/2	Complete	2026-04-04
