From 5ea3f2d6b52c0f01ab732c326a0a466673108a69 Mon Sep 17 00:00:00 2001
From: Nexus Dev
Date: Fri, 3 Apr 2026 23:31:55 +0000
Subject: [PATCH] docs: start milestone v1.6 Voice Pipeline + Minimal Message Bridge

---
 .planning/PROJECT.md |  61 ++++++++++++++-----------
 .planning/STATE.md   | 106 ++++++++++---------------------------------
 2 files changed, 59 insertions(+), 108 deletions(-)

diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
index 306ef262..dddabd45 100644
--- a/.planning/PROJECT.md
+++ b/.planning/PROJECT.md
@@ -4,7 +4,7 @@

 Nexus is a personal fork of Paperclip (MIT, v2026.318.0) that reframes the "companies with CEOs" corporate metaphor as "workspaces with agents." It's a project orchestration tool for a solo developer (Mikkel) managing AI agents across personal and professional projects. The fork stays mergeable with upstream by limiting changes to the display layer (UI strings, CLI output, agent templates, documentation) while leaving DB schema, API routes, code identifiers, and token formats unchanged.

-v1.3 added a full chat interface with streaming, brainstormer workflow, file system, search/branching, and PWA support. v1.4 made Hermes the default local provider with Ollama integration. v1.5 adds smart onboarding with hardware detection, tiered AI providers, and a Personal AI Assistant mode.
+v1.3 added a full chat interface with streaming, brainstormer workflow, file system, search/branching, and PWA support. v1.4 made Hermes the default local provider with Ollama integration. v1.5 adds smart onboarding with hardware detection, tiered AI providers, and a Personal AI Assistant mode. v1.6 adds a transport-agnostic voice pipeline (Whisper STT + Piper TTS) and a minimal Telegram bridge for phone access.

 ## Core Value

@@ -37,21 +37,25 @@ A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer
 - ✓ Default provider logic (fallback to Hermes when no cloud provider) — v1.4
 - ✓ Agent templates working with Hermes runtime — v1.4
 - ✓ Dashboard Hermes-specific info (model, VRAM, native skills) — v1.4
+- ✓ Mode selection (Personal AI Assistant / Project Builder / Both) — v1.5
+- ✓ Hardware detection with pre-built model database (GPU, Apple Silicon, CPU-only) — v1.5
+- ✓ Local AI setup via Ollama with RAM-aware model recommendations — v1.5
+- ✓ Zero-config cloud via Puter.js (no API keys, no sign-up) — v1.5
+- ✓ Multi-step onboarding wizard with skip buttons and summary screen — v1.5
+- ✓ Personal AI Assistant with persistent memory, voice, project handoff — v1.5
+- ✓ `npx buildthis` CLI entry point with hardware detection — v1.5

 ### Active

-- [ ] Mode selection: Personal AI Assistant / Project Builder / Both
-- [ ] Hardware detection with pre-built model database (GPU, Apple Silicon, CPU-only)
-- [ ] Local AI setup via Ollama with RAM-aware model recommendations
-- [ ] Voice features (Whisper + Piper) working on CPU-only hardware
-- [ ] Zero-config cloud via Puter.js (no API keys, no sign-up)
-- [ ] OAuth cloud tier (Google Gemini, OpenAI free tiers)
-- [ ] Subscription/API key auto-detection (Hermes, Claude Code, OpenClaw)
-- [ ] Personal AI Assistant with persistent memory, MCP, voice
-- [ ] Project handoff: assistant conversation → PM with context transfer
-- [ ] Summary screen → straight into chat
-- [ ] `npx buildthis` CLI entry point
-- [ ] Every onboarding step skippable
+- [ ] Whisper STT pipeline (local, transport-agnostic, language auto-detection, CPU fallback)
+- [ ] Piper TTS pipeline (local, multiple voices, <3s response, CPU-only)
+- [ ] Voice mode flag on messages (text mode vs voice mode response formatting)
+- [ ] Dual output pattern (voice-optimized response + full text with code blocks)
+- [ ] Web chat mic button (record, silence detection, waveform UI, auto-send)
+- [ ] Web chat audio playback (inline player, auto-play toggle)
+- [ ] Voice mode toggle setting (text only / voice input / full voice)
+- [ ] Telegram bridge — single bot, text + voice relay, agent prefixing
+- [ ] Onboarding STT/TTS hardware detection and voice enable step

 ### Out of Scope

@@ -65,7 +69,12 @@ A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer
 - .paperclip.yaml export format rename — breaks upstream import compatibility
 - Recipe Registry plugin — separate project
 - Catppuccin Mocha full theme — stretch goal, not v1
-- Telegram Channels integration — future
+- Per-agent Telegram bots — replaced by Command Center agent visualization
+- GSD question formatting for Telegram — replaced by Command Center rich elements
+- Deep Telegram ↔ web chat sync — replaced by Postgres bus
+- Telegram threads/topics/inline keyboards — thin bridge only
+- Voice call / real-time audio streaming — future consideration
+- Wake word detection ("Hey Nexus") — future
 - NPM reverse proxy — future
 - Danish business integrations — future
 - Multi-workspace support — works via existing multi-company feature, just renamed
@@ -142,21 +151,19 @@ After every `/gsd:complete-milestone`, perform an upstream rebase before startin

 **Autonomous mode:** The autonomous workflow MUST check for this section and run the rebase after `complete-milestone` returns, before starting the next milestone.

-## Current Milestone: v1.5 Smart Onboarding + Personal AI Assistant
+## Current Milestone: v1.6 Voice Pipeline + Minimal Message Bridge

-**Goal:** The definitive onboarding experience — hardware detection, tiered provider setup (local/free cloud/paid), and a Personal AI Assistant mode that coexists with the Project Builder.
+**Goal:** Transport-agnostic voice pipeline (Whisper STT + Piper TTS) integrated into web chat, plus a minimal Telegram bridge for phone access. Voice infrastructure designed to survive v2.2 Command Center migration.

 **Target features:**
-- Mode selection (Personal AI Assistant / Project Builder / Both)
-- Hardware detection with pre-built model database (GPU, Apple Silicon, CPU-only)
-- Local AI setup via Ollama with RAM-aware model recommendations + voice (Whisper + Piper)
-- Zero-config cloud via Puter.js (500+ models, no API keys)
-- OAuth cloud tier (Google Gemini, OpenAI free tiers)
-- Subscription/API key auto-detection (Hermes, Claude Code, OpenClaw)
-- Personal AI Assistant with persistent memory, MCP, voice, project handoff
-- Summary screen → straight into chat
-- `npx buildthis` CLI entry point
-- Every step skippable, local-first framed as privacy premium
+- Whisper STT pipeline (local, transport-agnostic, language auto-detection, CPU fallback)
+- Piper TTS pipeline (local, multiple voices, <3s response, CPU-only)
+- Voice mode flag + dual output pattern (voice-optimized + full text)
+- Web chat mic button with recording, silence detection, waveform UI
+- Web chat audio playback (inline player, auto-play toggle)
+- Voice mode toggle (text only / voice input / full voice)
+- Minimal Telegram bridge — single bot, text + voice relay, agent prefixing
+- Onboarding STT/TTS hardware detection

 ---
-*Last updated: 2026-04-03 after v1.5 milestone*
+*Last updated: 2026-04-03 after v1.6 milestone start*
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 0ef24bf3..0f496b72 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -1,16 +1,16 @@
 ---
 gsd_state_version: 1.0
-milestone: v1.5
-milestone_name: Smart Onboarding + Personal AI Assistant
-status: verifying
-stopped_at: Completed 35-npx-buildthis-cli/35-01
-last_updated: "2026-04-03T23:03:36.034Z"
+milestone: v1.6
+milestone_name: Voice Pipeline + Minimal Message Bridge
+status: planning
+stopped_at: null
+last_updated: "2026-04-03"
 last_activity: 2026-04-03
 progress:
-  total_phases: 6
-  completed_phases: 6
-  total_plans: 13
-  completed_plans: 13
+  total_phases: 0
+  completed_phases: 0
+  total_plans: 0
+  completed_plans: 0
   percent: 0
 ---

@@ -18,91 +18,38 @@ progress:

 ## Project Reference

-See: .planning/PROJECT.md (updated 2026-04-02)
+See: .planning/PROJECT.md (updated 2026-04-03)

 **Core value:** A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.
-**Current focus:** Phase 35 — npx-buildthis-cli
+**Current focus:** Defining requirements for v1.6

 ## Current Position

-Phase: 35
-Plan: Not started
-Status: Phase complete — ready for verification
-Last activity: 2026-04-03
-
-Progress: [__________] 0%
+Phase: Not started (defining requirements)
+Plan: —
+Status: Defining requirements
+Last activity: 2026-04-03 — Milestone v1.6 started

 ## Performance Metrics

 **Velocity:**

-- Total plans completed: 0 (v1.5)
+- Total plans completed: 0 (v1.6)
 - Average duration: -
 - Total execution time: 0 hours

-**By Phase:**
-
-| Phase | Plans | Total | Avg/Plan |
-|-------|-------|-------|----------|
-| - | - | - | - |
-
-**Recent Trend:**
-
-- Last 5 plans: none yet (v1.5)
-- Trend: -
-
-*Updated after each plan completion*
-| Phase 30-hardware-detection-mode-selection P01 | 15 | 2 tasks | 8 files |
-| Phase 30-hardware-detection-mode-selection P02 | 15 | 2 tasks | 6 files |
-| Phase 31-puter.js-zero-config-cloud P01 | 4 | 2 tasks | 4 files |
-| Phase 31-puter.js-zero-config-cloud P02 | 202 | 3 tasks | 4 files |
-| Phase 31-puter.js-zero-config-cloud P03 | 5 | 2 tasks | 6 files |
-| Phase 31-puter.js-zero-config-cloud P04 | 1 | 1 tasks | 0 files |
-| Phase 32-multi-step-onboarding-wizard P01 | 4 | 2 tasks | 3 files |
-| Phase 33 P01 | 4 | 2 tasks | 6 files |
-| Phase 33 P02 | 12 | 2 tasks | 8 files |
-| Phase 33-persistent-memory P03 | 20 | 2 tasks | 6 files |
-| Phase 34-voice P01 | 3 | 2 tasks | 7 files |
-| Phase 34-voice P02 | 4 | 2 tasks | 3 files |
-| Phase 35-npx-buildthis-cli P01 | 263 | 2 tasks | 11 files |
-
 ## Accumulated Context

 ### Decisions

 Decisions are logged in PROJECT.md Key Decisions table.
-Key constraints for v1.5 (established at roadmap):
+Key constraints for v1.6:

-- No DB schema changes — all state in existing JSONB fields (`instance_settings.general`) and file-backed JSON (`data/memory/.json`)
-- Puter.js is server-proxied adapter only — `@heyputer/puter.js` browser import is for auth popup only; all AI calls via `POST /api/puter-proxy/chat`
-- OAuth tokens (Google, Puter) stored server-side via `secretService` — never in localStorage
-- Memory sanitization blocklist applied at write time, not retrieval time
-- Apple Silicon: use `os.freemem()` × 0.75 for VRAM estimate; label as "unified memory" not "VRAM"; use `systeminformation` v5 (not v6)
-- Unauthenticated `GET /system/providers` endpoint required for pre-auth hardware probe
-- Google OAuth cloud tier: include but flag policy risk (Gemini CLI abuse detection issue #21866)
-- Skip-all minimum valid state: one working agent with a valid provider must be created when user skips all steps
-- [Phase 30-hardware-detection-mode-selection]: Hardware routes mounted before api Router to bypass boardMutationGuard; Apple Silicon detection via process.platform + cpuModel.startsWith('Apple') without calling si.graphics(); Promise.race 3s timeout on GPU probe for cpu_only fallback
-- [Phase 30-hardware-detection-mode-selection]: Hardware probe is non-blocking — wizard step 1 always has an enabled Continue button regardless of probe outcome
-- [Phase 30-hardware-detection-mode-selection]: Mode save on wizard completion is non-blocking — wrapped in try/catch, defaults to 'both' on failure
-- [Phase 31-puter.js-zero-config-cloud]: agentId is optional in puterProxyService.chatStream — cost recording skipped when null/undefined to avoid FK violation in cost_events
-- [Phase 31-puter.js-zero-config-cloud]: pendingPkce stores only verifier (no companyId) — company does not exist at authorize time during onboarding
-- [Phase 31-puter.js-zero-config-cloud]: pendingTokens pattern: callback parks tokens by stateId, claim endpoint links to real companyId post-company-creation
-- [Phase 31-puter.js-zero-config-cloud]: Provider heading in wizard wrapper (not ProviderSelectionStep) for consistency with ModeSelector pattern; credentials captured in React state and posted after company creation
-- [Phase 31-puter.js-zero-config-cloud]: Plan 04 is verification-only — auto-approved under workflow.auto_advance=true; full UAT deferred to manual QA session
-- [Phase 32-multi-step-onboarding-wizard]: createWorkspace() helper extracted so both handleSubmit and handleStartChat share workspace creation without duplication
-- [Phase 32-multi-step-onboarding-wizard]: Step 4 form submit removed — replaced with button advancing to step 5; actual workspace creation deferred to summary CTA
-- [Phase 33]: Removed zod dependency from assistant-memory.ts — replaced with manual type guard due to worktree node_modules not having zod symlink
-- [Phase 33]: GitHub PAT regex changed from {36} to {36,} to handle tokens longer than the minimum expected length
-- [Phase 33]: PersonalAssistant uses chatApi directly for standalone full-page chat (no ChatPanel dependency) — maintains worktree isolation for parallel execution
-- [Phase 33]: useNexusMode defaults to 'both' while loading — prevents flash-redirect to dashboard on initial mount before settings resolve
-- [Phase 33-persistent-memory]: Pre-fetch conversation/settings/memory BEFORE flushHeaders to avoid SSE header race (Pitfall 3 from research)
-- [Phase 33-persistent-memory]: puterProxyService.resolveToken wrapped in try/catch — graceful fallback to streamEcho when no puter token configured
-- [Phase 33-persistent-memory]: buildHandoffSummary exported as named pure function for direct unit testing without route test harness
-- [Phase 34-voice]: chatFileRoutes registered inside boardMutationGuard after assistantHandoffRoutes; nexusSettingsRoutes also added (was missing)
-- [Phase 34-voice]: voiceEnabled as Zod boolean with default(false) in nexus-settings — file-backed JSON, no DB migration
-- [Phase 34-voice]: VoiceStep inserted at step 4; rootDir shifts to step 5, summary to step 6 — clean sequential numbering
-- [Phase 34-voice]: TtsButton rendered inline in messages.map rather than inside MessageBubble — avoids prop drilling usePiperTts
-- [Phase 35-npx-buildthis-cli]: detectHardware() accepts optional platform param for testability; getProviderOptions() extracted as pure function; controller.vram uses nullish coalescing for TypeScript strict mode
+- Voice pipeline is transport-agnostic — no Telegram-specific code in core voice components
+- Telegram bridge is intentionally disposable (<500 lines) — will be replaced by v2.2 Command Center
+- Dual output always: voice response + full technical details in text
+- Voice mode is a per-message flag, not a per-agent setting
+- v1.5 already has VoiceRecordButton, TtsButton, usePiperTts hooks in place — build on these

 ### Pending Todos

@@ -110,13 +57,10 @@ None yet.

 ### Blockers/Concerns

-- [Phase 31] Puter.js Node.js server-side streaming API surface unverified — confirm `stream: true` works server-side before designing `puterProxyService`; plan-phase should include a research spike
-- [Phase 31] Puter.js ToS on server-side request relaying unverified — attribute costs to user's Puter account in all UI copy as mitigation
-- [Phase 33] Chat route injection point needs codebase inspection — confirm correct hook location in `server/src/services/chat.ts` during plan-phase
-- [Phase 34] smart-whisper Apple Silicon acceleration claim unverified on Mac Mini M4 — fall back to `tiny.en` if `base.en` acceleration not confirmed on device
+- [v1.5 carryover] smart-whisper Apple Silicon acceleration claim unverified on Mac Mini M4 — fall back to `tiny.en` if `base.en` acceleration not confirmed on device

 ## Session Continuity

-Last session: 2026-04-03T23:00:01.397Z
-Stopped at: Completed 35-npx-buildthis-cli/35-01
+Last session: 2026-04-03
+Stopped at: Milestone v1.6 initialized
 Resume file: None
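
Reviewer note: the v1.6 constraints in this patch treat voice mode as a per-message flag and require a dual output (a voice-optimized response plus the full text with code blocks). The sketch below is one possible reading of that, not code from the repository. Every name in it (`VoiceMode`, `OutgoingMessage`, `toSpoken`, `buildReply`) is hypothetical, and the rule for what gets spoken is an assumption made only for illustration.

```typescript
// Hypothetical sketch: none of these names come from the Nexus codebase.
type VoiceMode = "text" | "voice";

interface OutgoingMessage {
  agent: string;
  voiceMode: VoiceMode; // carried on each message, not stored per agent
  text: string;         // full response, code blocks included
  spoken?: string;      // voice-optimized variant, present only in voice mode
}

// Built at runtime so this example never contains a literal code fence.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

// Assumed rule: swap fenced code for a short pointer, drop inline markup,
// and collapse whitespace so a TTS engine gets one clean sentence stream.
function toSpoken(text: string): string {
  return text
    .replace(FENCED_BLOCK, "See the code in the text reply.")
    .replace(/[`*#]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// The full text is always kept; the spoken variant is added only on request.
function buildReply(agent: string, text: string, voiceMode: VoiceMode): OutgoingMessage {
  const reply: OutgoingMessage = { agent, voiceMode, text };
  if (voiceMode === "voice") {
    reply.spoken = toSpoken(text);
  }
  return reply;
}
```

Because the flag travels with the message, the same agent could answer one request in text mode and the next in voice mode. A transport such as the web chat or the Telegram bridge would then only need to decide whether to synthesize `spoken`.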