diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index e6d2692f..dee33376 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -72,35 +72,35 @@
 | Requirement | Phase | Status |
 |-------------|-------|--------|
-| VPIPE-01 | — | Pending |
-| VPIPE-02 | — | Pending |
-| VPIPE-03 | — | Pending |
-| VPIPE-04 | — | Pending |
-| VPIPE-05 | — | Pending |
-| VPIPE-06 | — | Pending |
-| VPIPE-07 | — | Pending |
-| VPIPE-08 | — | Pending |
-| WCHAT-01 | — | Pending |
-| WCHAT-02 | — | Pending |
-| WCHAT-03 | — | Pending |
-| WCHAT-04 | — | Pending |
-| WCHAT-05 | — | Pending |
-| WCHAT-06 | — | Pending |
-| TGRAM-01 | — | Pending |
-| TGRAM-02 | — | Pending |
-| TGRAM-03 | — | Pending |
-| TGRAM-04 | — | Pending |
-| TGRAM-05 | — | Pending |
-| TGRAM-06 | — | Pending |
-| ONBRD-01 | — | Pending |
-| ONBRD-02 | — | Pending |
-| ONBRD-03 | — | Pending |
+| VPIPE-01 | Phase 36 | Pending |
+| VPIPE-02 | Phase 36 | Pending |
+| VPIPE-03 | Phase 36 | Pending |
+| VPIPE-04 | Phase 36 | Pending |
+| VPIPE-05 | Phase 36 | Pending |
+| VPIPE-06 | Phase 36 | Pending |
+| VPIPE-07 | Phase 39 | Pending |
+| VPIPE-08 | Phase 39 | Pending |
+| WCHAT-01 | Phase 37 | Pending |
+| WCHAT-02 | Phase 37 | Pending |
+| WCHAT-03 | Phase 37 | Pending |
+| WCHAT-04 | Phase 37 | Pending |
+| WCHAT-05 | Phase 37 | Pending |
+| WCHAT-06 | Phase 37 | Pending |
+| TGRAM-01 | Phase 38 | Pending |
+| TGRAM-02 | Phase 38 | Pending |
+| TGRAM-03 | Phase 38 | Pending |
+| TGRAM-04 | Phase 38 | Pending |
+| TGRAM-05 | Phase 38 | Pending |
+| TGRAM-06 | Phase 38 | Pending |
+| ONBRD-01 | Phase 39 | Pending |
+| ONBRD-02 | Phase 39 | Pending |
+| ONBRD-03 | Phase 38 | Pending |
 **Coverage:**
 - v1.6 requirements: 23 total
-- Mapped to phases: 0
-- Unmapped: 23 ⚠️
+- Mapped to phases: 23
+- Unmapped: 0 ✓
 ---
 *Requirements defined: 2026-04-04*
-*Last updated: 2026-04-04 after initial definition*
+*Last updated: 2026-04-03 — traceability populated after roadmap creation*
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 1c32e6c0..24d37f8e 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -5,7 +5,8 @@
 - ✅ **v1.2.1 Universal Skill Management** - Phase 1 (shipped 2026-04-01)
 - ✅ **v1.3 Chat & PWA** - Phases 21-26 (shipped 2026-04-02)
 - ✅ **v1.4 Hermes Default Provider** - Phases 27-29 (shipped 2026-04-02)
-- 🚧 **v1.5 Smart Onboarding + Personal AI Assistant** - Phases 30-35 (in progress)
+- ✅ **v1.5 Smart Onboarding + Personal AI Assistant** - Phases 30-35 (shipped 2026-04-03)
+- 🚧 **v1.6 Voice Pipeline + Minimal Message Bridge** - Phases 36-39 (in progress)
 ---
@@ -58,168 +59,140 @@ Plans:
 **Goal**: Users can create a Hermes agent in Nexus, configure it, and have it execute heartbeats that spawn `hermes chat -q`, return a result, and persist the session across runs
 **Plans**: 1/1 plans complete
-Plans:
-- [x] 27-01-PLAN.md — Close four integration gaps: SESSIONED_LOCAL_ADAPTERS, create-mode toolsets bug, duplicate constant, session codec test
-
 ### Phase 28: Ollama Integration & Agent Surface
 **Goal**: Users can see which Ollama models are available, get a recommendation for their hardware, configure any Hermes agent to use a local model, and see Hermes-specific runtime data in the dashboard and agent config
 **Plans**: 3/3 plans complete
-Plans:
-- [x] 28-01-PLAN.md — Ollama service, routes, model catalog, and unit tests
-- [x] 28-02-PLAN.md — UI model selector dropdown, install callout, Hermes skill badge
-- [x] 28-03-PLAN.md — Hermes stateJson runtime data and dashboard HermesRuntimeCard
-
 ### Phase 29: Default Provider & End-to-End
 **Goal**: A fresh Nexus install with only Hermes and Ollama works end-to-end — onboarding offers Hermes as the default, PM and Engineer templates run correctly on the Hermes runtime, and GSD workflow tasks complete successfully
 **Plans**: 2/2 plans complete
-Plans:
-- [x] 29-01-PLAN.md — Adapter probe route, onboarding wizard Hermes fallback, adapter-neutral templates
-- [x] 29-02-PLAN.md — Hermes skill injection via promptTemplate, integration tests
+
+
+
+✅ v1.5 Smart Onboarding + Personal AI Assistant (Phases 30-35) - SHIPPED 2026-04-03
+
+### Phase 30: Hardware Detection + Mode Selection
+**Goal**: Users see accurate hardware information during onboarding, get a model recommendation matched to their machine, and choose a mode that correctly gates all downstream features
+**Plans**: 2/2 plans complete
+
+### Phase 31: Puter.js Zero-Config Cloud
+**Goal**: Users without Ollama installed can reach working AI in one click via Puter.js
+**Plans**: 4/4 plans complete
+
+### Phase 32: Multi-Step Onboarding Wizard
+**Goal**: Users move through a complete, skippable onboarding flow that assembles hardware data, provider selection, and voice options into a summary screen
+**Plans**: 1/1 plans complete
+
+### Phase 33: Persistent Memory + Personal Assistant Mode
+**Goal**: Users in Personal AI Assistant mode accumulate memory across sessions that shapes future responses
+**Plans**: 3/3 plans complete
+
+### Phase 34: Voice
+**Goal**: Users can speak to the assistant (Whisper STT) and hear responses read aloud (Piper TTS)
+**Plans**: 2/2 plans complete
+
+### Phase 35: npx buildthis CLI
+**Goal**: A developer can run `npx buildthis` on a fresh machine and either open an already-running Nexus or be guided through install
+**Plans**: 1/1 plans complete
 ---
-### 🚧 v1.5 Smart Onboarding + Personal AI Assistant (In Progress)
+### 🚧 v1.6 Voice Pipeline + Minimal Message Bridge (In Progress)
-**Milestone Goal:** The definitive onboarding experience — hardware detection, tiered provider setup (local/free cloud/paid), and a Personal AI Assistant mode that coexists with the Project Builder.
+**Milestone Goal:** Transport-agnostic voice pipeline (Whisper STT + Piper TTS) integrated into web chat, plus a minimal Telegram bridge for phone access. Voice infrastructure designed to survive v2.2 Command Center migration.
 ## Phases
-- [x] **Phase 30: Hardware Detection + Mode Selection** — Unauthenticated hardware probe, Apple Silicon unified memory handling, model recommendation database, and mode selector that gates all assistant-specific features (completed 2026-04-02)
-- [x] **Phase 31: Puter.js Zero-Config Cloud** — Server-proxied Puter.js adapter with full cost tracking, Google OAuth PKCE tier, and subscription auto-detection; no API keys required for zero-config path (completed 2026-04-03)
-- [x] **Phase 32: Multi-Step Onboarding Wizard** — Assemble all provider tiers and hardware data into a skippable multi-step wizard; summary screen routes directly into chat (completed 2026-04-03)
-- [x] **Phase 33: Persistent Memory + Personal Assistant Mode** — File-backed memory with write-time sanitization, PersonalAssistantPage, conversation handoff to PM agent (completed 2026-04-03)
-- [x] **Phase 34: Voice** — Piper TTS with pre-warm progress, Whisper STT wired into voice service, onboarding voice step activated (completed 2026-04-03)
-- [x] **Phase 35: npx buildthis CLI** — Standalone bootstrapper package with hardware detection and provider tiering parity with web onboarding (completed 2026-04-03)
-
 ---
+- [ ] **Phase 36: Voice Pipeline Foundation** — Transport-agnostic VoicePipelineService (transcribe, synthesize, formatForVoice), voice.ts route, ffmpeg audio transcoding, voiceMode flag, dual output pattern
+- [ ] **Phase 37: Web Chat Voice UI** — VAD silence detection, waveform visualization, voice mode toggle, inline audio player, auto-play toggle, COOP/COEP headers
+- [ ] **Phase 38: Telegram Bridge** — grammY long polling relay, text + voice note bidirectional relay, agent identity prefix, BotFather onboarding setup
+- [ ] **Phase 39: Voice Polish** — Sentence-buffered TTS streaming, multi-language TTS output, onboarding STT/TTS hardware detection step
 ## Phase Details
-### Phase 30: Hardware Detection + Mode Selection
-**Goal**: Users see accurate hardware information during onboarding, get a model recommendation matched to their machine, and choose a mode that correctly gates all downstream features — with the probe working before board auth exists
-**Depends on**: Phase 29 (v1.4 shipped)
-**Requirements**: ONBD-01, ONBD-02, ONBD-03, ONBD-07
+### Phase 36: Voice Pipeline Foundation
+**Goal**: The transport-agnostic voice pipeline is live and callable from any consumer — web chat, Telegram, or future integrations — with correct audio transcoding, voice mode flag propagation, and dual output formatting baked in from the start
+**Depends on**: Phase 35 (v1.5 shipped)
+**Requirements**: VPIPE-01, VPIPE-02, VPIPE-03, VPIPE-04, VPIPE-05, VPIPE-06
 **Success Criteria** (what must be TRUE):
- 1. On a fresh install (before any board auth token exists), the hardware probe returns GPU, RAM, and Apple Silicon unified memory data within 5 seconds
- 2. A Mac Mini M4 reports "unified memory" (not VRAM) with the 0.75 multiplier applied and copy that says "runs entirely on your machine"
- 3. The mode selector (Personal AI Assistant / Project Builder / Both) is visible during onboarding and the selected mode is persisted; assistant-specific UI is hidden when Project Builder-only is chosen
- 4. The model recommendation shown to the user matches an entry in the pre-built JSON catalog for the detected hardware tier (GPU / Apple Silicon / CPU-only)
-**Plans**: 2 plans
+ 1. Posting a WAV audio file to `POST /api/transcribe` returns a transcription with detected language, regardless of whether the request came from the web UI or a test harness
+ 2. Calling `POST /api/synthesize` with a markdown-heavy agent response returns two outputs: a voice-optimized prose version (no markdown) and the original full text with code blocks
+ 3. A WebM/Opus browser recording and an OGG/Opus Telegram voice note both produce identical Whisper transcription quality after ffmpeg transcodes each to WAV 16kHz mono
+ 4. The `voiceMode` flag on a chat message survives from client request through Express route to message persistence — verifiable in the DB record
+ 5. `nexus-settings.json` accepts `voiceMode: "text" | "voice_input" | "full_voice"` and `telegramToken` fields without breaking existing settings reads
+**Plans**: TBD
-Plans:
-- [x] 30-01-PLAN.md — Hardware service, nexus-settings service, model catalog extension, routes, and tests
-- [x] 30-02-PLAN.md — ModeSelector, HardwareSummaryStep, useHardwareInfo hook, multi-step wizard wiring
-
-### Phase 31: Puter.js Zero-Config Cloud
-**Goal**: Users without Ollama installed can reach working AI in one click via Puter.js — all calls server-proxied, tokens server-stored, cost tracked; Google OAuth and subscription auto-detection round out the provider tier
-**Depends on**: Phase 30
-**Requirements**: CLOUD-01, CLOUD-02, CLOUD-03, CLOUD-04, CLOUD-05
+### Phase 37: Web Chat Voice UI
+**Goal**: Users can speak to any agent in web chat — recording auto-stops on silence, a live waveform confirms the mic is active, responses play back automatically (toggleable), and voice mode is a first-class setting
+**Depends on**: Phase 36
+**Requirements**: WCHAT-01, WCHAT-02, WCHAT-03, WCHAT-04, WCHAT-05, WCHAT-06
 **Success Criteria** (what must be TRUE):
- 1. A user with no Ollama and no API keys clicks "Continue with Puter" in onboarding, completes the Puter auth popup, and immediately gets a working chat response — no API key entry required
- 2. All Puter AI calls flow through `POST /api/puter-proxy/chat` (verifiable in server logs); the Puter auth token is stored server-side via secretService, not in localStorage
- 3. Token cost for Puter responses appears in the cost tracking view, attributed correctly per conversation
- 4. A user with Hermes, Claude Code, or OpenClaw already installed sees those tools pre-filled in the provider configuration step with no manual entry
- 5. A user clicking "Sign in with Google" for Gemini completes PKCE OAuth and gets a Gemini-backed chat response; the UI displays a policy-risk note that Google OAuth may trigger abuse detection
-**Plans**: 4 plans
-
-Plans:
-- [x] 31-01-PLAN.md — Puter proxy service, routes, unit tests, and app.ts wiring
-- [x] 31-02-PLAN.md — Google OAuth PKCE service, routes, API key storage route
-- [x] 31-03-PLAN.md — Provider Selection UI step, PuterAuthButton, GoogleOAuthButton, ApiKeyEntryForm, 4-step wizard wiring
-- [x] 31-04-PLAN.md — Google OAuth claim endpoint, human verification of full onboarding flow
+ 1. Clicking the mic button starts recording; the waveform animates to show audio levels; speaking and then pausing for 1.5 seconds auto-submits the recording without pressing any button
+ 2. The voice mode toggle has three visible states (text only / voice input / full voice) and persists the selected mode across page refreshes
+ 3. An agent response delivered in full voice mode plays back automatically in the chat thread; the auto-play can be turned off in settings and stays off after a page reload
+ 4. The chat message for a voice interaction shows a voice badge and an expandable section revealing the full markdown response with code blocks intact
+ 5. Voice recording and VAD work correctly in Chrome and Firefox on the Mac Mini (COOP/COEP headers satisfy SharedArrayBuffer requirements)
+**Plans**: TBD
 **UI hint**: yes
-### Phase 32: Multi-Step Onboarding Wizard
-**Goal**: Users move through a complete, skippable onboarding flow that assembles hardware data, provider selection, and voice options into a summary screen — and can jump straight into chat from there
-**Depends on**: Phase 31
-**Requirements**: ONBD-04, ONBD-05, ONBD-06
+### Phase 38: Telegram Bridge
+**Goal**: The user can message any Nexus agent from their phone via Telegram — text and voice notes both work, agent identity is visible on every reply, and the bot is set up through guided onboarding with no manual token entry in config files
+**Depends on**: Phase 36
+**Requirements**: TGRAM-01, TGRAM-02, TGRAM-03, TGRAM-04, TGRAM-05, TGRAM-06, ONBRD-03
 **Success Criteria** (what must be TRUE):
- 1. A user can click "Skip" on every onboarding step (hardware, provider, voice) and reach the summary screen; the resulting workspace has at least one working agent with a valid provider
- 2. The summary screen shows the configured providers and agent-model pairings for the selected mode; no corporate language ("company", "CEO", "mission") appears anywhere in the flow
- 3. From the summary screen, one click navigates directly to the Personal Assistant chat or the project dashboard (depending on chosen mode) with no additional prompts
-**Plans**: 1 plan
+ 1. Sending a text message to the Nexus Telegram bot from a phone produces an agent reply prefixed with the agent name (e.g. `[PM]: response`) within 10 seconds
+ 2. Sending a voice note to the Telegram bot produces a transcription confirmation message followed by the agent's text reply — the bot does not silently fail or miss the update
+ 3. Requesting a voice reply from the bot returns an OGG voice note that plays back correctly in the Telegram mobile app
+ 4. The Telegram bridge runs via long polling with no public HTTPS endpoint required — verified by running on the Mac Mini behind NAT
+ 5. The entire `telegram.ts` service file is under 500 lines
+ 6. The onboarding wizard includes a BotFather setup step that walks through creating a bot token and saves it to `nexus-settings.json` without manual file editing
+**Plans**: TBD
-Plans:
-- [x] 32-01-PLAN.md — Summary step, skip buttons, chat handoff
-**UI hint**: yes
-
-### Phase 33: Persistent Memory + Personal Assistant Mode
-**Goal**: Users in Personal AI Assistant mode accumulate memory across sessions that shapes future responses — with no risk of credentials leaking into prompts — and can hand off any conversation to a PM agent with context intact
-**Depends on**: Phase 32
-**Requirements**: ASST-01, ASST-02, ASST-03, ASST-04
+### Phase 39: Voice Polish
+**Goal**: Voice responses begin playing before synthesis is complete (sentence-buffered), a single response can be synthesized in multiple languages simultaneously, and new installs can detect STT/TTS hardware capability during onboarding and enable voice in one step
+**Depends on**: Phase 37
+**Requirements**: VPIPE-07, VPIPE-08, ONBRD-01, ONBRD-02
 **Success Criteria** (what must be TRUE):
- 1. A fact stated in one chat session ("I prefer TypeScript") is referenced correctly by the assistant in a new session started after closing and reopening the browser
- 2. Pasting an API key or token into chat and then starting a new session results in the assistant having no knowledge of that credential — the sanitization blocklist prevented it from being stored
- 3. A user clicks "Turn this into a project" in an assistant conversation; a PM agent is created with a system message containing the conversation summary and they land on the project dashboard
- 4. A user with mode set to "Both" can switch between Personal Assistant chat and the project dashboard without losing context or cross-contaminating assistant memory with project agent messages
-**Plans**: 3 plans
-
-Plans:
-- [x] 33-01-PLAN.md — Memory sanitizer, assistant memory service, REST routes, and unit tests
-- [x] 33-02-PLAN.md — PersonalAssistantPage, useNexusMode hook, sidebar navigation, route wiring
-- [x] 33-03-PLAN.md — Real AI streaming with memory injection, assistant-to-PM handoff route and UI
-**UI hint**: yes
-
-### Phase 34: Voice
-**Goal**: Users can speak to the assistant (Whisper STT) and hear responses read aloud (Piper TTS) — Piper pre-warms visibly so the first synthesis call does not appear broken, and voice is offered during onboarding based on hardware capability
-**Depends on**: Phase 32
-**Requirements**: VOICE-01, VOICE-02, VOICE-03
-**Success Criteria** (what must be TRUE):
- 1. On a CPU-only machine (no GPU), enabling Piper TTS in the assistant produces audible speech output within a reasonable time after the first synthesis (not a silent hang)
- 2. When Piper's WASM voice model is downloading for the first time, a visible progress indicator is shown before the TTS toggle is enabled; the download completes and TTS works without a page reload
- 3. The onboarding voice step offers Whisper STT and Piper TTS toggles only when the hardware detection step has confirmed sufficient capability; on hardware below the threshold, the step is skipped or shows a capability warning
-**Plans**: 2 plans
-
-Plans:
-- [x] 34-01-PLAN.md — Fix /transcribe route registration, Piper TTS hook + TtsButton, voiceEnabled in nexus-settings
-- [x] 34-02-PLAN.md — VoiceStep onboarding component, wizard step insertion, PersonalAssistant voice wiring
-**UI hint**: yes
-
-### Phase 35: npx buildthis CLI
-**Goal**: A developer can run `npx buildthis` on a fresh machine and either open an already-running Nexus or be guided through install — with the same hardware detection and provider tiering as the web onboarding
-**Depends on**: Phase 30 (hardware detection service must exist)
-**Requirements**: CLI-01, CLI-02
-**Success Criteria** (what must be TRUE):
- 1. Running `npx buildthis` on a machine where Nexus is already running opens the Nexus UI in the default browser; running it on a machine with no Nexus guides the user through installation steps
- 2. The CLI bootstrapper detects the same hardware tier (GPU / Apple Silicon / CPU-only) as the web onboarding and presents the matching provider tier recommendations in the terminal prompt
-**Plans**: 1 plan
-
-Plans:
-- [x] 35-01-PLAN.md — Package scaffold, hardware detection, two-path bootstrap (probe running vs guide install), provider selection, tests
+ 1. For a multi-sentence agent response, the first sentence begins playing in the browser before the second sentence has finished synthesizing — the gap between text completion and first audio is under 1 second
+ 2. A user can request the same agent response as audio in both English and Danish; both OGG files are generated and available for playback without a second agent call
+ 3. On a fresh install, the onboarding hardware probe reports whether Whisper STT and Piper TTS are runnable on the detected hardware tier
+ 4. The onboarding voice step activates (showing enable/skip options) only when the hardware probe confirms sufficient capability; on hardware below threshold it shows a capability note and skips to the next step
+**Plans**: TBD
 ---
 ## Coverage Validation
-All 21 v1.5 requirements are mapped to exactly one phase. No orphans.
+All 23 v1.6 requirements are mapped to exactly one phase. No orphans.
 | Requirement | Phase |
 |-------------|-------|
-| ONBD-01 | 30 |
-| ONBD-02 | 30 |
-| ONBD-03 | 30 |
-| ONBD-07 | 30 |
-| CLOUD-01 | 31 |
-| CLOUD-02 | 31 |
-| CLOUD-03 | 31 |
-| CLOUD-04 | 31 |
-| CLOUD-05 | 31 |
-| ONBD-04 | 32 |
-| ONBD-05 | 32 |
-| ONBD-06 | 32 |
-| ASST-01 | 33 |
-| ASST-02 | 33 |
-| ASST-03 | 33 |
-| ASST-04 | 33 |
-| VOICE-01 | 34 |
-| VOICE-02 | 34 |
-| VOICE-03 | 34 |
-| CLI-01 | 35 |
-| CLI-02 | 35 |
+| VPIPE-01 | 36 |
+| VPIPE-02 | 36 |
+| VPIPE-03 | 36 |
+| VPIPE-04 | 36 |
+| VPIPE-05 | 36 |
+| VPIPE-06 | 36 |
+| WCHAT-01 | 37 |
+| WCHAT-02 | 37 |
+| WCHAT-03 | 37 |
+| WCHAT-04 | 37 |
+| WCHAT-05 | 37 |
+| WCHAT-06 | 37 |
+| TGRAM-01 | 38 |
+| TGRAM-02 | 38 |
+| TGRAM-03 | 38 |
+| TGRAM-04 | 38 |
+| TGRAM-05 | 38 |
+| TGRAM-06 | 38 |
+| ONBRD-03 | 38 |
+| VPIPE-07 | 39 |
+| VPIPE-08 | 39 |
+| ONBRD-01 | 39 |
+| ONBRD-02 | 39 |
 ---
@@ -237,9 +210,13 @@ All 21 v1.5 requirements are mapped to exactly one phase. No orphans.
 | 27. Hermes Adapter | v1.4 | 1/1 | Complete | 2026-04-02 |
 | 28. Ollama Integration & Agent Surface | v1.4 | 3/3 | Complete | 2026-04-02 |
 | 29. Default Provider & End-to-End | v1.4 | 2/2 | Complete | 2026-04-02 |
-| 30. Hardware Detection + Mode Selection | v1.5 | 2/2 | Complete | 2026-04-03 |
-| 31. Puter.js Zero-Config Cloud | v1.5 | 4/4 | Complete | 2026-04-03 |
-| 32. Multi-Step Onboarding Wizard | v1.5 | 1/1 | Complete | 2026-04-03 |
-| 33. Persistent Memory + Personal Assistant Mode | v1.5 | 3/3 | Complete | 2026-04-03 |
-| 34. Voice | v1.5 | 2/2 | Complete | 2026-04-03 |
-| 35. npx buildthis CLI | v1.5 | 1/1 | Complete | 2026-04-03 |
+| 30. Hardware Detection + Mode Selection | v1.5 | 2/2 | Complete | 2026-04-03 |
+| 31. Puter.js Zero-Config Cloud | v1.5 | 4/4 | Complete | 2026-04-03 |
+| 32. Multi-Step Onboarding Wizard | v1.5 | 1/1 | Complete | 2026-04-03 |
+| 33. Persistent Memory + Personal Assistant Mode | v1.5 | 3/3 | Complete | 2026-04-03 |
+| 34. Voice | v1.5 | 2/2 | Complete | 2026-04-03 |
+| 35. npx buildthis CLI | v1.5 | 1/1 | Complete | 2026-04-03 |
+| 36. Voice Pipeline Foundation | v1.6 | 0/TBD | Not started | - |
+| 37. Web Chat Voice UI | v1.6 | 0/TBD | Not started | - |
+| 38. Telegram Bridge | v1.6 | 0/TBD | Not started | - |
+| 39. Voice Polish | v1.6 | 0/TBD | Not started | - |
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 0f496b72..908766fd 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -7,7 +7,7 @@ stopped_at: null
 last_updated: "2026-04-03"
 last_activity: 2026-04-03
 progress:
-  total_phases: 0
+  total_phases: 4
   completed_phases: 0
   total_plans: 0
   completed_plans: 0
@@ -21,14 +21,16 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-03)
 **Core value:** A fresh onboard asks for ONE thing (root directory), auto-creates PM + Engineer agents, and drops you in the dashboard.
-**Current focus:** Defining requirements for v1.6
+**Current focus:** Phase 36 — Voice Pipeline Foundation (ready to plan)
 ## Current Position
-Phase: Not started (defining requirements)
-Plan: —
-Status: Defining requirements
-Last activity: 2026-04-03 — Milestone v1.6 started
+Phase: 36 of 39 (Voice Pipeline Foundation)
+Plan: — (not started)
+Status: Ready to plan
+Last activity: 2026-04-03 — v1.6 roadmap created (4 phases, 23 requirements mapped)
+
+Progress: [░░░░░░░░░░] 0%
 ## Performance Metrics
@@ -45,11 +47,13 @@ Last activity: 2026-04-03 — Milestone v1.6 started
 Decisions are logged in PROJECT.md Key Decisions table.
 Key constraints for v1.6:
-- Voice pipeline is transport-agnostic — no Telegram-specific code in core voice components
-- Telegram bridge is intentionally disposable (<500 lines) — will be replaced by v2.2 Command Center
-- Dual output always: voice response + full technical details in text
-- Voice mode is a per-message flag, not a per-agent setting
-- v1.5 already has VoiceRecordButton, TtsButton, usePiperTts hooks in place — build on these
+- voicePipelineService is the keystone — Phase 37 and Phase 38 both depend on it; build Phase 36 first
+- Telegram bridge uses long polling (grammY `bot.start()`) — no public HTTPS required on Mac Mini
+- Audio transcoding via ffmpeg-static ^5.2.0 — NOT archived fluent-ffmpeg (archived May 2025)
+- Voice mode flag must survive every pipeline layer: client → Express → message persistence → agent codec
+- COOP/COEP headers required for @ricky0123/vad-react SharedArrayBuffer (add to Express static middleware)
+- Phase 37 and Phase 38 are independent once Phase 36 ships; sequential ordering for single-developer delivery
+- Telegram bridge must stay under 500 lines (TGRAM-06 is a hard constraint)
 ### Pending Todos
@@ -57,10 +61,12 @@ None yet.
 ### Blockers/Concerns
-- [v1.5 carryover] smart-whisper Apple Silicon acceleration claim unverified on Mac Mini M4 — fall back to `tiny.en` if `base.en` acceleration not confirmed on device
+- [v1.5 carryover] smart-whisper Apple Silicon acceleration unverified on Mac Mini M4 — fall back to `tiny.en` if `base.en` acceleration not confirmed
+- [v1.6] grammY session management approach not yet chosen: lightweight `Map` vs. grammY conversation plugin — decide at Phase 38 planning
+- [v1.6] Dual output prompt reliability on 7B models is ~90% — Approach B fallback (post-process markdown strip) must be implemented as safety net, not optional
 ## Session Continuity
 Last session: 2026-04-03
-Stopped at: Milestone v1.6 initialized
+Stopped at: Roadmap created — 4 phases defined, 23/23 requirements mapped
 Resume file: None
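
The Approach B fallback listed under Blockers/Concerns (a post-process markdown strip for when a 7B model ignores the dual-output prompt) could look roughly like the sketch below. It assumes a regex-based strip is good enough for TTS input and that the original full text, code blocks included, is kept separately for the text half of the dual output. `stripMarkdownForVoice` is a hypothetical name, not an existing voicePipelineService method.

```typescript
// Hypothetical "Approach B" fallback: turn a markdown agent reply into plain
// prose for TTS. Not an existing voicePipelineService API.
function stripMarkdownForVoice(markdown: string): string {
  return (
    markdown
      // Fenced code blocks are dropped; the full text output keeps them.
      .replace(/`{3}[\s\S]*?`{3}/g, " ")
      // Inline code: keep the text, drop the backticks.
      .replace(/`([^`\n]+)`/g, "$1")
      // Images before links, so "![alt](url)" does not leave a stray "!".
      .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
      .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
      // Line-start markers: headings, blockquotes, bullet and numbered lists.
      .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, "")
      .replace(/^[ \t]*>[ \t]?/gm, "")
      .replace(/^[ \t]*(?:[-*+]|\d+\.)[ \t]+/gm, "")
      // Bold and italic written with asterisks only.
      .replace(/\*\*([^*\n]+)\*\*/g, "$1")
      .replace(/\*([^*\n]+)\*/g, "$1")
      // Collapse all whitespace (including paragraph breaks) into single spaces.
      .replace(/\s+/g, " ")
      .trim()
  );
}

// Usage: the spoken form of a short reply.
const spoken = stripMarkdownForVoice("## Done\n\nRun `npm test` to **verify**.");
console.log(spoken); // Done Run npm test to verify.
```

Underscore emphasis (`_x_`) is deliberately left alone so snake_case identifiers such as `voice_input` survive intact.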
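
Phase 39's sentence-buffered TTS (the first sentence starts playing before the rest is synthesized) needs a step that turns a streamed reply into whole sentences. A minimal sketch, assuming agent output arrives as arbitrary text chunks and each released sentence is handed to Piper on its own; `SentenceBuffer` is a hypothetical name, and the "punctuation followed by whitespace" boundary rule is a simplifying assumption.

```typescript
// Hypothetical helper for sentence-buffered TTS: accumulates streamed text and
// releases complete sentences so each can be synthesized as soon as it is done.
class SentenceBuffer {
  private pending = "";

  /** Append a streamed chunk and return any sentences it completed. */
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // Assumed boundary rule: ".", "!" or "?" followed by whitespace.
    const boundary = /[.!?]+(?=\s)/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  /** Call when the stream ends; returns the trailing text, if any. */
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

// Usage: feed chunks as they stream in; each released sentence could go to TTS.
const buffer = new SentenceBuffer();
const ready: string[] = [];
for (const chunk of ["Transcoding done. Sending", " to Whisper now! Ready", " for the next clip? Almost"]) {
  ready.push(...buffer.push(chunk));
}
ready.push(...buffer.flush());
// ready: ["Transcoding done.", "Sending to Whisper now!", "Ready for the next clip?", "Almost"]
```

Because a boundary only fires once whitespace follows the punctuation, a chunk ending in "3." waits for the next chunk instead of splitting "3.5". Abbreviations such as "e.g." will still split early; a production segmenter would need to handle them.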
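
Phase 36 success criterion 5 requires `nexus-settings.json` to accept `voiceMode` and `telegramToken` without breaking existing settings reads. One way to keep older files readable is to validate the new fields on read and fall back to a default. In this sketch `VoiceSettings` and `readVoiceSettings` are hypothetical names, and defaulting to `"text"` when the field is missing or invalid is an assumption rather than documented behavior.

```typescript
// Hypothetical reader for the new nexus-settings.json fields. Only the field
// names and the three voiceMode values come from the roadmap.
type VoiceMode = "text" | "voice_input" | "full_voice";

const VOICE_MODES: readonly VoiceMode[] = ["text", "voice_input", "full_voice"];

interface VoiceSettings {
  voiceMode: VoiceMode;
  telegramToken?: string;
}

function isVoiceMode(value: unknown): value is VoiceMode {
  return typeof value === "string" && (VOICE_MODES as readonly string[]).includes(value);
}

/** Pull the voice fields out of an already-parsed settings object. */
function readVoiceSettings(settings: Record<string, unknown>): VoiceSettings {
  const mode = settings["voiceMode"];
  const token = settings["telegramToken"];
  // Assumed default: files written before v1.6 behave as text-only.
  const result: VoiceSettings = { voiceMode: isVoiceMode(mode) ? mode : "text" };
  if (typeof token === "string" && token.length > 0) {
    result.telegramToken = token;
  }
  return result;
}

// Usage: a v1.5-era settings object (only voiceEnabled) still reads cleanly.
const legacy = JSON.parse('{"voiceEnabled": true}') as Record<string, unknown>;
console.log(readVoiceSettings(legacy)); // { voiceMode: 'text' }
```

Reading the voice fields through a narrow function like this leaves every other key in the file untouched, which is the property the success criterion cares about.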