Roadmap: Nexus
Milestones
- ✅ v1.2.1 Universal Skill Management - Phase 1 (shipped 2026-04-01)
- ✅ v1.3 Chat & PWA - Phases 21-26 (shipped 2026-04-02)
- ✅ v1.4 Hermes Default Provider - Phases 27-29 (shipped 2026-04-02)
- ✅ v1.5 Smart Onboarding + Personal AI Assistant - Phases 30-35 (shipped 2026-04-03)
- ✅ v1.6 Voice Pipeline + Minimal Message Bridge - Phases 36-39 (shipped 2026-04-04)
- 🚧 v1.7 Content Generation - Phases 40-45 (in progress)
✅ v1.2.1 Universal Skill Management (Phase 1) - SHIPPED 2026-04-01
Phase 1: Foundation
Goal: Establish the display-layer rename infrastructure, git hygiene tooling, and rebase safety primitives that all subsequent phases depend on
Plans: 2/2 plans complete
Plans:
- 01-01-PLAN.md — Branding package, VOCAB constants, commit-msg hook
- 01-02-PLAN.md — Zone taxonomy, rerere config, rebase safety infrastructure
✅ v1.3 Chat & PWA (Phases 21-26) - SHIPPED 2026-04-02
Phase 21: Chat Foundation
Goal: Users can have real-time chat conversations with agents
Plans: 7/7 plans complete
Phase 22: Agent Streaming
Goal: Agent responses stream in real-time with identity, edit, retry, and stop controls
Plans: 5/5 plans complete
Phase 23: Brainstormer Flow
Goal: Users can turn a chat conversation into a tracked project with one handoff action
Plans: 4/4 plans complete
Phase 24: Search, History & Branching
Goal: Users can find, bookmark, branch, and export any conversation
Plans: 4/4 plans complete
Phase 25: File System
Goal: Users can upload, preview, and version files within chat; voice input transcribes speech to text
Plans: 9/9 plans complete
Phase 26: PWA & Performance
Goal: Nexus installs as a PWA, works offline, and loads fast on mobile
Plans: 5/5 plans complete
✅ v1.4 Hermes Default Provider (Phases 27-29) - SHIPPED 2026-04-02
Phase 27: Hermes Adapter
Goal: Users can create a Hermes agent in Nexus, configure it, and have it execute heartbeats that spawn hermes chat -q, return a result, and persist the session across runs
Plans: 1/1 plans complete
Phase 28: Ollama Integration & Agent Surface
Goal: Users can see which Ollama models are available, get a recommendation for their hardware, configure any Hermes agent to use a local model, and see Hermes-specific runtime data in the dashboard and agent config
Plans: 3/3 plans complete
Phase 29: Default Provider & End-to-End
Goal: A fresh Nexus install with only Hermes and Ollama works end-to-end — onboarding offers Hermes as the default, PM and Engineer templates run correctly on the Hermes runtime, and GSD workflow tasks complete successfully
Plans: 2/2 plans complete
✅ v1.5 Smart Onboarding + Personal AI Assistant (Phases 30-35) - SHIPPED 2026-04-03
Phase 30: Hardware Detection + Mode Selection
Goal: Users see accurate hardware information during onboarding, get a model recommendation matched to their machine, and choose a mode that correctly gates all downstream features
Plans: 2/2 plans complete
Phase 31: Puter.js Zero-Config Cloud
Goal: Users without Ollama installed can reach working AI in one click via Puter.js
Plans: 4/4 plans complete
Phase 32: Multi-Step Onboarding Wizard
Goal: Users move through a complete, skippable onboarding flow that assembles hardware data, provider selection, and voice options into a summary screen
Plans: 1/1 plans complete
Phase 33: Persistent Memory + Personal Assistant Mode
Goal: Users in Personal AI Assistant mode accumulate memory across sessions that shapes future responses
Plans: 3/3 plans complete
Phase 34: Voice
Goal: Users can speak to the assistant (Whisper STT) and hear responses read aloud (Piper TTS)
Plans: 2/2 plans complete
Phase 35: npx buildthis CLI
Goal: A developer can run npx buildthis on a fresh machine and either open an already-running Nexus or be guided through install
Plans: 1/1 plans complete
✅ v1.6 Voice Pipeline + Minimal Message Bridge (Phases 36-39) - SHIPPED 2026-04-04
Phase 36: Voice Pipeline Foundation
Goal: The transport-agnostic voice pipeline is live and callable from any consumer — web chat, Telegram, or future integrations — with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start
Depends on: Phase 35 (v1.5 shipped)
Requirements: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06
Success Criteria (what must be TRUE):
- Posting a WAV audio file to `POST /api/transcribe` returns a transcription with detected language, regardless of whether the request came from the web UI or a test harness
- Calling `POST /api/synthesize` with a markdown-heavy agent response returns two outputs: a voice-optimized prose version (no markdown) and the original full text with code blocks
- A WebM/Opus browser recording and an OGG/Opus Telegram voice note both produce identical Whisper transcription quality after ffmpeg transcodes each to WAV 16kHz mono
- The `voiceMode` flag on a chat message survives from client request through Express route to message persistence — verifiable in the DB record
- `nexus-settings.json` accepts `voiceMode: "text" | "voice_input" | "full_voice"` and `telegramToken` fields without breaking existing settings reads
Plans: 3 plans
Plans:
- 36-01-PLAN.md — VoicePipelineService: ffmpeg transcoding, Whisper STT, Piper TTS, formatForVoice
- 36-02-PLAN.md — Schema extensions: voiceMode in shared validators/types + nexus-settings
- 36-03-PLAN.md — Voice routes, chat.ts voiceMode wiring, app.ts mount, old transcribe removal
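The three `voiceMode` values named in the success criteria can be modelled as a string-literal union with a runtime guard. The following is a minimal sketch, not the project's actual validator: `VOICE_MODES`, `isVoiceMode`, and `parseVoiceMode` are hypothetical names, and falling back to `"text"` is an assumption made so that settings files written before the field existed would still load.

```typescript
// Hypothetical names; only the three mode strings come from the roadmap.
const VOICE_MODES = ["text", "voice_input", "full_voice"] as const;
type VoiceMode = (typeof VOICE_MODES)[number];

// Runtime guard for values read from a settings file or a request body.
function isVoiceMode(value: unknown): value is VoiceMode {
  return typeof value === "string" && VOICE_MODES.some((mode) => mode === value);
}

// Assumption: a missing or invalid value falls back to "text" so that
// older settings files without the field keep working.
function parseVoiceMode(value: unknown): VoiceMode {
  return isVoiceMode(value) ? value : "text";
}
```

Keeping the list of modes in one constant means the type and the runtime check cannot drift apart when a mode is added.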
Phase 37: Web Chat Voice UI
Goal: Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting
Depends on: Phase 36
Requirements: WCHAT-01, WCHAT-02, WCHAT-03, WCHAT-04, WCHAT-05, WCHAT-06
Success Criteria (what must be TRUE):
- Clicking the mic button starts recording; the waveform animates to show audio levels; speaking and then pausing for 1.5 seconds auto-submits the recording without pressing any button
- The voice mode toggle has three visible states (text only / voice input / full voice) and persists the selected mode across page refreshes
- An agent response delivered in full voice mode plays back automatically in the chat thread; the auto-play can be turned off in settings and stays off after a page reload
- The chat message for a voice interaction shows a voice badge and an expandable section revealing the full markdown response with code blocks intact
- Voice recording and VAD work correctly in Chrome and Firefox on the Mac Mini (COOP/COEP headers satisfy SharedArrayBuffer requirements)
Plans: 3 plans
UI hint: yes
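The "pause for 1.5 seconds, then auto-submit" behaviour for Phase 37 can be illustrated with a simple energy-threshold check over audio frames. This is a deliberately simplified sketch and not the VAD the phase actually uses: `findAutoStopFrame`, the per-frame level input, and the `0.02` threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical helper: returns the index of the frame at which recording
// would auto-submit, or -1 if the silence window is never reached.
function findAutoStopFrame(
  levels: number[], // one loudness level per frame, in the range 0..1
  frameMs: number, // duration of each frame in milliseconds
  silenceMs: number = 1500, // pause length from the success criteria
  threshold: number = 0.02, // assumed level below which a frame is silent
): number {
  let heardSpeech = false;
  let silentMs = 0;
  for (let i = 0; i < levels.length; i++) {
    const level = levels[i] ?? 0;
    if (level >= threshold) {
      // Speech resets the silence timer.
      heardSpeech = true;
      silentMs = 0;
    } else if (heardSpeech) {
      // Only count silence that follows speech, so a quiet start
      // does not submit an empty recording.
      silentMs += frameMs;
      if (silentMs >= silenceMs) return i;
    }
  }
  return -1;
}
```

Requiring speech before the timer starts is a design choice of this sketch; it avoids submitting a recording when the user has not said anything yet.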
Phase 38: Telegram Bridge
Goal: The user can message any Nexus agent from their phone via Telegram — text and voice notes both work, agent identity is visible on every reply, and the bot is set up through guided onboarding with no manual token entry in config files
Depends on: Phase 36
Requirements: TGRAM-01, TGRAM-02, TGRAM-03, TGRAM-04, TGRAM-05, TGRAM-06, ONBRD-03
Success Criteria (what must be TRUE):
- Sending a text message to the Nexus Telegram bot from a phone produces an agent reply prefixed with the agent name (e.g. `[PM]: response`) within 10 seconds
- Sending a voice note to the Telegram bot produces a transcription confirmation message followed by the agent's text reply — the bot does not silently fail or miss the update
- Requesting a voice reply from the bot returns an OGG voice note that plays back correctly in the Telegram mobile app
- The Telegram bridge runs via long polling with no public HTTPS endpoint required — verified by running on the Mac Mini behind NAT
- The entire `telegram.ts` service file is under 500 lines
- The onboarding wizard includes a BotFather setup step that walks through creating a bot token and saves it to `nexus-settings.json` without manual file editing
Plans: 3 plans
UI hint: yes
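Two details of the Telegram bridge lend themselves to a short illustration: the agent-name prefix on replies, and the long-polling offset that keeps updates from being missed or re-read. In the Telegram Bot API, `getUpdates` takes an `offset` that should be one greater than the highest `update_id` already processed. The sketch below assumes hypothetical helper names (`formatAgentReply`, `nextOffset`) and is not taken from `telegram.ts`.

```typescript
// Minimal shape of a Telegram update; only update_id is needed here.
interface TelegramUpdate {
  update_id: number;
}

// Prefix a reply with the agent name, e.g. "[PM]: response".
function formatAgentReply(agentName: string, text: string): string {
  return `[${agentName}]: ${text}`;
}

// Compute the offset for the next getUpdates call: one past the highest
// update_id seen so far, or the current offset if the batch was empty.
function nextOffset(current: number, updates: TelegramUpdate[]): number {
  return updates.reduce((max, update) => Math.max(max, update.update_id + 1), current);
}
```

Advancing the offset only after a batch has been handled is what lets a restarted poller pick up where it left off.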
Phase 39: Voice Polish
Goal: Voice responses begin playing before synthesis is complete (sentence-buffered), a single response can be synthesized in multiple languages simultaneously, and new installs can detect STT/TTS hardware capability during onboarding and enable voice in one step
Depends on: Phase 37
Requirements: VPIPE-07, VPIPE-08, ONBRD-01, ONBRD-02
Success Criteria (what must be TRUE):
- For a multi-sentence agent response, the first sentence begins playing in the browser before the second sentence has finished synthesizing — the gap between text completion and first audio is under 1 second
- A user can request the same agent response as audio in both English and Danish; both OGG files are generated and available for playback without a second agent call
- On a fresh install, the onboarding hardware probe reports whether Whisper STT and Piper TTS are runnable on the detected hardware tier
- The onboarding voice step activates (showing enable/skip options) only when the hardware probe confirms sufficient capability; on hardware below threshold it shows a capability note and skips to the next step
Plans: 2 plans
Plans:
- 39-01-PLAN.md — Sentence-buffered TTS streaming + multi-language synthesis
- 39-02-PLAN.md — Onboarding voice hardware capability probe
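Sentence-buffered synthesis depends on cutting a streamed response into complete sentences as soon as each one is available, so the first can be sent to TTS while later text is still arriving. The class below is a minimal sketch of that idea under stated assumptions: `SentenceBuffer` is a hypothetical name, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is naive and would split on abbreviations such as "e.g.".

```typescript
// Hypothetical helper that accumulates streamed text and releases
// complete sentences as soon as a boundary is seen.
class SentenceBuffer {
  private pending = "";

  // Add a chunk and return any sentences it completed.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    const boundary = /[.!?]\s+/; // naive rule: punctuation, then whitespace
    let match = boundary.exec(this.pending);
    while (match !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.pending.slice(0, end).trim());
      this.pending = this.pending.slice(match.index + match[0].length);
      match = boundary.exec(this.pending);
    }
    return sentences;
  }

  // Return whatever is left once the stream has ended.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}
```

A caller would pass each returned sentence to the synthesizer immediately and call `flush()` when the agent response ends, since the last sentence usually has no trailing whitespace.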
🚧 v1.7 Content Generation (In Progress)
Milestone Goal: Agents produce real deliverables — diagrams, themes, PDFs, wallpapers, social assets, icons, and video — entirely on-device. Every content type is an installable skill. Long-running renders are async with SSE progress from the first request.
Phases
- Phase 40: Job Infrastructure — content_jobs table, async render lifecycle, SSE progress events, namespaced storage without size limit (INFRA-01..04) (completed 2026-04-04)
- Phase 41: Diagrams, Icons & Theme Engine — Mermaid diagrams, SVG icon generation, OKLCH theme palette with WCAG AA and live preview (DIAG-01..05, ICON-01..03, THEME-01..07) (completed 2026-04-04)
- Phase 42: Wallpapers, Social, Format Conversion & Voice — LLM SVG + sharp wallpapers, social content, format conversion registry with AI fallback, Whisper web chat mic (WALL-01..04, SOCIAL-01..03, CONV-01..09, VOICE-01..03) (completed 2026-04-04)
- Phase 43: Documents & Branding — Playwright PDF reports and invoices, full brand identity kit with zip export (DOC-01..03, BRAND-01..06) (completed 2026-04-04)
- Phase 44: Video & Presentations — Remotion workspace package, pitch decks and demo videos, SSE render progress (PRES-01..04) (completed 2026-04-04)
- Phase 45: Content as Skills — Markdown skill files for all content types, Creative skill group on generalist agent (SKILL-01..03) (completed 2026-04-04)
Phase Details
Phase 40: Job Infrastructure
Goal: Every content generation request returns a job ID immediately, progresses through a tracked lifecycle, and stores its output in namespaced storage — so nothing blocks and nothing is orphaned
Depends on: Phase 39 (v1.6 shipped)
Requirements: INFRA-01, INFRA-02, INFRA-03, INFRA-04
Success Criteria (what must be TRUE):
- Submitting a content generation request returns HTTP 202 with a job ID within 200ms, regardless of how long the render takes
- A connected browser receives SSE events as a job progresses through queued → generating → ready (or error), with no polling required
- A generated video file larger than 10MB can be stored and retrieved without a size-limit error — the generated/ storage namespace bypasses the upload route limit
- Every generated asset in the database has a sourceTaskId linking it to the originating conversation task, visible via the asset list API
Plans: 2 plans
Plans:
- 40-01-PLAN.md — Schema, constants, migrations, contentJobStore + contentJobRunner services
- 40-02-PLAN.md — HTTP routes (POST 202, GET, SSE), app.ts wiring, integration tests
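The job lifecycle (queued → generating → ready, or error) and its SSE notifications can be sketched as a small transition table plus a frame formatter. The names `ALLOWED_TRANSITIONS`, `canTransition`, and `toSseFrame`, the `status` event name, and the choice to let a queued job move straight to `error` are assumptions for illustration; only the four states come from the roadmap. The frame layout (`event:` and `data:` lines ended by a blank line) follows the standard server-sent events format.

```typescript
type JobStatus = "queued" | "generating" | "ready" | "error";

// Assumed transition table: terminal states have no outgoing edges.
const ALLOWED_TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  queued: ["generating", "error"],
  generating: ["ready", "error"],
  ready: [],
  error: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].some((status) => status === to);
}

// Format one server-sent event; the blank line terminates the event.
function toSseFrame(jobId: string, status: JobStatus): string {
  return `event: status\ndata: ${JSON.stringify({ jobId, status })}\n\n`;
}
```

Rejecting transitions out of `ready` and `error` is one way to guarantee that a client never sees a finished job go back to an in-progress state.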
Phase 41: Diagrams, Icons & Theme Engine
Goal: Users can generate diagrams from natural language, produce SVG icon sets from descriptions, and create a complete OKLCH color theme from a single seed color — all without binary dependencies beyond what is already installed
Depends on: Phase 40
Requirements: DIAG-01, DIAG-02, DIAG-03, DIAG-04, DIAG-05, ICON-01, ICON-02, ICON-03, THEME-01, THEME-02, THEME-03, THEME-04, THEME-05, THEME-06, THEME-07
Success Criteria (what must be TRUE):
- Describing an architecture in chat produces a rendered Mermaid diagram (SVG and PNG) attached to the conversation, with the editable Mermaid source visible in a collapsible panel
- Mermaid rendering uses strict security level — a diagram with a `click` directive or `%%{init}%%` override is stripped before render, and SVG output passes DOMPurify before reaching the DOM
- Requesting an icon set from a description returns a cohesive set of SVG icons downloadable in SVG and PNG formats at multiple sizes
- Picking a seed color produces a full palette (background, surface, overlay, text, accents) in OKLCH with separate dark and light variants, all passing WCAG AA contrast checks
- The generated theme can be previewed live in the Nexus UI via CSS custom property injection and applied permanently in one click; export works for CSS variables, Tailwind config, VS Code theme, and JSON
Plans: 6 plans
Plans:
- 41-01-PLAN.md — Dependencies, shared types, content-job-runner switch, useContentJob hook
- 41-02-PLAN.md — Diagram renderer (Playwright Mermaid + DOMPurify) and icon renderer (LLM SVG + SVGO)
- 41-03-PLAN.md — OKLCH theme palette engine, WCAG validation, export formatters, nexus-settings extension
- 41-04-PLAN.md — ContentStudio page, Diagram UI (generate, preview, source editor), Icon UI (grid, download)
- 41-05-PLAN.md — Theme UI (seed input, palette grid, live preview, export tabs, apply flow)
- 41-06-PLAN.md — Full test suite + visual checkpoint verification
UI hint: yes
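The WCAG AA contrast check required of the Phase 41 theme palette comes down to the WCAG 2.x contrast-ratio formula applied to each text/background pair. The sketch below works on sRGB hex colors and assumes the OKLCH values have already been converted to sRGB elsewhere; `relativeLuminance`, `contrastRatio`, and `passesAA` are hypothetical names, and the 4.5:1 threshold is the AA requirement for normal-size text.

```typescript
// Relative luminance of an sRGB color given as "#rrggbb" (WCAG 2.x definition).
function relativeLuminance(hex: string): number {
  const channel = (offset: number): number => {
    const c = parseInt(hex.slice(offset, offset + 2), 16) / 255;
    // Undo the sRGB transfer curve to get a linear-light value.
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(1) + 0.7152 * channel(3) + 0.0722 * channel(5);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// AA for normal-size text requires a ratio of at least 4.5:1.
function passesAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A palette generator could run `passesAA` over every text/surface pair in both the dark and light variants and adjust lightness until all pairs pass.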
Phase 42: Wallpapers, Social, Format Conversion & Voice
Goal: Users can generate platform-ready images (wallpapers, OG images, social banners) via LLM SVG + sharp rasterization, convert between any file format pair, and record voice directly in web chat via the Whisper mic button
Depends on: Phase 40
Requirements: WALL-01, WALL-02, WALL-03, WALL-04, SOCIAL-01, SOCIAL-02, SOCIAL-03, CONV-01, CONV-02, CONV-03, CONV-04, CONV-05, CONV-06, CONV-07, CONV-08, CONV-09, VOICE-01, VOICE-02, VOICE-03
Success Criteria (what must be TRUE):
- Requesting a desktop wallpaper returns a 2560x1440 PNG; requesting an Instagram banner returns a correctly-dimensioned image — platform dimensions are constants, not magic numbers
- The format conversion UI allows drag-drop of a source file, selection of a target format, and download of the converted file; direct conversion pairs (image, audio/video, document, data) use native tools; any unsupported pair falls through to AI-bridged conversion rather than showing as unavailable
- Navigating to `/convert/png/svg` deep-links directly to the PNG->SVG conversion flow with source and target pre-selected
- An uploaded file is validated against its magic bytes before processing — a JPEG renamed to `.png` is rejected with a clear error, not silently misprocessed
- Clicking the mic button in web chat records audio, transcribes it via local Whisper, and populates the chat input — works offline with the locally cached model
Plans: 6 plans
Plans:
- 42-01-PLAN.md — Dependencies, bundle types, job-runner switch, converter capabilities probe
- 42-02-PLAN.md — Wallpaper renderer (LLM SVG + sharp) and social post renderer (LLM JSON + hashtags)
- 42-03-PLAN.md — Convert renderer (sharp/ffmpeg/xlsx/AI-bridge) and multipart upload route with MIME validation
- 42-04-PLAN.md — Voice offline badge wiring (useSystemProviders hook + ChatInput badge)
- 42-05-PLAN.md — Wallpaper/Social UI panels + ContentStudio tab extensions
- 42-06-PLAN.md — Format conversion UI page with drag-drop, format chips, deep-link routing
UI hint: yes
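The magic-byte validation criterion for Phase 42 (a JPEG renamed to `.png` must be rejected) can be illustrated by comparing a file's leading bytes with the type its extension claims. This sketch covers only PNG and JPEG, whose signatures are well known; `SIGNATURES`, `detectType`, and `validateUpload` are hypothetical names, and the error wording is an assumption.

```typescript
// Leading-byte signatures for the two formats covered by this sketch.
const SIGNATURES: { type: string; bytes: number[] }[] = [
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] },
];

// Return the detected type, or null if no known signature matches.
function detectType(data: Uint8Array): string | null {
  for (const signature of SIGNATURES) {
    const matches =
      data.length >= signature.bytes.length &&
      signature.bytes.every((byte, index) => data[index] === byte);
    if (matches) return signature.type;
  }
  return null;
}

// Compare the extension's claim with the detected content type.
function validateUpload(filename: string, data: Uint8Array): { ok: boolean; error?: string } {
  const extension = (filename.split(".").pop() ?? "").toLowerCase();
  const claimed = extension === "jpg" ? "jpeg" : extension;
  const actual = detectType(data);
  if (actual === null) return { ok: false, error: "Unrecognized file content" };
  if (actual !== claimed) {
    return { ok: false, error: `File content is ${actual} but the extension is .${extension}` };
  }
  return { ok: true };
}
```

Checking content before dispatching to a converter keeps a mislabeled file from reaching a tool that would misprocess it.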
Phase 43: Documents & Branding
Goal: Users can generate polished PDF reports and invoices via Playwright, and create a complete brand identity (logo, avatars, social profiles, letterhead, guidelines PDF, zip package) from a single conversation
Depends on: Phase 41
Requirements: DOC-01, DOC-02, DOC-03, BRAND-01, BRAND-02, BRAND-03, BRAND-04, BRAND-05, BRAND-06
Success Criteria (what must be TRUE):
- Generating a PDF report from a conversation produces a downloadable PDF with correct layout; generating an invoice from a template produces a filled invoice PDF with correct line items
- Generating a one-pager or API reference document produces a styled PDF with navigable headings
- Starting a brand identity conversation produces a logo mark (SVG), avatar at multiple sizes, platform-specific social images, an email signature, and a brand guidelines PDF — all in a single brand kit
- The complete brand kit can be downloaded as a single zip file with assets organized by type
Plans: 3 plans
Plans:
- 43-01-PLAN.md — Types, archiver install, PDF renderer (Playwright HTML-to-PDF), job-runner wiring
- 43-02-PLAN.md — Brand kit renderer (logo, avatars, social images, templates, guidelines PDF, ZIP packaging)
- 43-03-PLAN.md — Document and Brand UI panels, ContentStudio tab extensions
UI hint: yes
Phase 44: Video & Presentations
Goal: Agents can produce pitch deck presentations and demo videos rendered by Remotion from a conversation, with SSE progress updates throughout the render — which may take several minutes on the M4
Depends on: Phase 40
Requirements: PRES-01, PRES-02, PRES-03, PRES-04
Success Criteria (what must be TRUE):
- Requesting a pitch deck from a conversation description produces a Remotion-rendered interactive web presentation or MP4; the render runs in a separate workspace package and does not block the main server process
- The Remotion bundle is compiled once at server startup and reused for all renders — submitting a second render request does not trigger a second webpack compilation
- A browser connected during a video render receives SSE progress events (percentage complete) throughout the render; the final event delivers the download URL
- Concurrent LLM inference and video rendering do not cause the server to become unresponsive — render concurrency is capped and serialized with LLM workloads
Plans: 3 plans
Plans:
- 44-01-PLAN.md — Remotion workspace package, compositions, shared constants, types, job-runner wiring
- 44-02-PLAN.md — Presentation renderer with LLM slide generation, Remotion render, SSE progress
- 44-03-PLAN.md — PresentationPanel UI, useContentJob progress extension, ContentStudio tab
UI hint: yes
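The "compiled once and reused" requirement for the Remotion bundle is commonly met by memoizing the promise returned by the build step, so concurrent and later callers all share one compilation. The helper below is a generic sketch under that assumption: `once` is a hypothetical name, and the builder passed to it stands in for whatever bundling call the workspace package actually makes.

```typescript
// Wrap an async builder so it runs at most once; every caller receives
// the same promise, including callers that arrive while it is pending.
function once<T>(build: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = build();
    }
    return cached;
  };
}
```

Caching the promise rather than the resolved value matters here: two render requests that arrive during startup would otherwise both see an empty cache and trigger two compilations. One trade-off of this sketch is that a failed build stays cached, so a real service may want to clear the cache on rejection.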
Phase 45: Content as Skills
Goal: Every content type built in Phases 41-44 is accessible to agents as an installable Markdown skill, and the generalist agent ships pre-loaded with the Creative skill group
Depends on: Phase 44
Requirements: SKILL-01, SKILL-02, SKILL-03
Success Criteria (what must be TRUE):
- Each content type (diagram, theme, icon, wallpaper, social post, PDF, brand kit, video) has a corresponding skill file that an agent can load and use to call the correct content job API
- A freshly created generalist agent has the Creative skill group pre-loaded — it can generate diagrams and themes without any manual skill configuration
- A user can add or remove individual content type skills through the Skill Aggregator UI without touching configuration files
Plans: 1 plan
Plans:
- 45-01-PLAN.md — 9 SKILL.md files, local-nexus-content source type, Creative group seeding, startup wiring
UI hint: yes
Coverage Validation
All 54 v1.7 requirements are mapped to exactly one phase. No orphans.
| Requirement | Phase |
|---|---|
| INFRA-01 | 40 |
| INFRA-02 | 40 |
| INFRA-03 | 40 |
| INFRA-04 | 40 |
| DIAG-01 | 41 |
| DIAG-02 | 41 |
| DIAG-03 | 41 |
| DIAG-04 | 41 |
| DIAG-05 | 41 |
| THEME-01 | 41 |
| THEME-02 | 41 |
| THEME-03 | 41 |
| THEME-04 | 41 |
| THEME-05 | 41 |
| THEME-06 | 41 |
| THEME-07 | 41 |
| ICON-01 | 41 |
| ICON-02 | 41 |
| ICON-03 | 41 |
| WALL-01 | 42 |
| WALL-02 | 42 |
| WALL-03 | 42 |
| WALL-04 | 42 |
| SOCIAL-01 | 42 |
| SOCIAL-02 | 42 |
| SOCIAL-03 | 42 |
| CONV-01 | 42 |
| CONV-02 | 42 |
| CONV-03 | 42 |
| CONV-04 | 42 |
| CONV-05 | 42 |
| CONV-06 | 42 |
| CONV-07 | 42 |
| CONV-08 | 42 |
| CONV-09 | 42 |
| VOICE-01 | 42 |
| VOICE-02 | 42 |
| VOICE-03 | 42 |
| DOC-01 | 43 |
| DOC-02 | 43 |
| DOC-03 | 43 |
| BRAND-01 | 43 |
| BRAND-02 | 43 |
| BRAND-03 | 43 |
| BRAND-04 | 43 |
| BRAND-05 | 43 |
| BRAND-06 | 43 |
| PRES-01 | 44 |
| PRES-02 | 44 |
| PRES-03 | 44 |
| PRES-04 | 44 |
| SKILL-01 | 45 |
| SKILL-02 | 45 |
| SKILL-03 | 45 |
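The "exactly one phase, no orphans" rule can be checked mechanically against the rows of the table above. The function below is a small sketch of such a check; `checkCoverage` and its input shapes are hypothetical and do not describe any existing tooling in the project.

```typescript
// Report requirement IDs that are unmapped (orphans) or mapped more than once.
function checkCoverage(
  required: string[],
  rows: Array<[string, number]>, // [requirement ID, phase number]
): { orphans: string[]; duplicates: string[] } {
  const counts = new Map<string, number>();
  rows.forEach(([id]) => {
    counts.set(id, (counts.get(id) ?? 0) + 1);
  });
  const duplicates: string[] = [];
  counts.forEach((count, id) => {
    if (count > 1) duplicates.push(id);
  });
  return {
    orphans: required.filter((id) => !counts.has(id)),
    duplicates,
  };
}
```

Coverage is valid when both returned lists are empty.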
Progress
| Phase | Milestone | Plans Complete | Status | Completed |
|---|---|---|---|---|
| 1. Foundation | v1.2.1 | 2/2 | Complete | 2026-04-01 |
| 21. Chat Foundation | v1.3 | 7/7 | Complete | 2026-04-02 |
| 22. Agent Streaming | v1.3 | 5/5 | Complete | 2026-04-02 |
| 23. Brainstormer Flow | v1.3 | 4/4 | Complete | 2026-04-02 |
| 24. Search, History & Branching | v1.3 | 4/4 | Complete | 2026-04-02 |
| 25. File System | v1.3 | 9/9 | Complete | 2026-04-02 |
| 26. PWA & Performance | v1.3 | 5/5 | Complete | 2026-04-02 |
| 27. Hermes Adapter | v1.4 | 1/1 | Complete | 2026-04-02 |
| 28. Ollama Integration & Agent Surface | v1.4 | 3/3 | Complete | 2026-04-02 |
| 29. Default Provider & End-to-End | v1.4 | 2/2 | Complete | 2026-04-02 |
| 30. Hardware Detection + Mode Selection | v1.5 | 2/2 | Complete | 2026-04-03 |
| 31. Puter.js Zero-Config Cloud | v1.5 | 4/4 | Complete | 2026-04-03 |
| 32. Multi-Step Onboarding Wizard | v1.5 | 1/1 | Complete | 2026-04-03 |
| 33. Persistent Memory + Personal Assistant Mode | v1.5 | 3/3 | Complete | 2026-04-03 |
| 34. Voice | v1.5 | 2/2 | Complete | 2026-04-03 |
| 35. npx buildthis CLI | v1.5 | 1/1 | Complete | 2026-04-03 |
| 36. Voice Pipeline Foundation | v1.6 | 2/3 | Complete | 2026-04-04 |
| 37. Web Chat Voice UI | v1.6 | 3/4 | Complete | 2026-04-04 |
| 38. Telegram Bridge | v1.6 | 3/3 | Complete | 2026-04-04 |
| 39. Voice Polish | v1.6 | 1/2 | Complete | 2026-04-04 |
| 40. Job Infrastructure | v1.7 | 2/2 | Complete | 2026-04-04 |
| 41. Diagrams, Icons & Theme Engine | v1.7 | 6/6 | Complete | 2026-04-04 |
| 42. Wallpapers, Social, Format Conversion & Voice | v1.7 | 6/6 | Complete | 2026-04-04 |
| 43. Documents & Branding | v1.7 | 3/3 | Complete | 2026-04-04 |
| 44. Video & Presentations | v1.7 | 3/3 | Complete | 2026-04-04 |
| 45. Content as Skills | v1.7 | 1/1 | Complete | 2026-04-04 |