nexus/.planning/research/FEATURES.md

# Feature Research

**Domain:** Smart Onboarding + Personal AI Assistant (Nexus v1.5)
**Researched:** 2026-04-02
**Confidence:** MEDIUM overall — Puter.js confirmed current, hardware detection patterns confirmed, personal AI assistant patterns from active ecosystem; UX recommendations inferred from patterns

---

## Milestone Scope

This document covers only the NEW features in v1.5. Existing features (NexusOnboardingWizard, Hermes adapter, Ollama integration, chat interface, PWA, voice input via Whisper) are already built and are dependencies, not deliverables.

**New features being researched:**
- Hardware detection with pre-built model database
- Tiered provider setup: local (Ollama) → zero-config cloud (Puter.js) → OAuth cloud (Gemini, OpenAI) → API key / subscription (Hermes, Claude Code, OpenClaw)
- Personal AI Assistant mode with persistent memory, MCP connections, voice (Whisper + Piper)
- Project handoff: assistant conversation → PM agent with context transfer
- `npx buildthis` CLI entry point
- Every step skippable

---

## Feature Landscape

### Table Stakes (Users Expect These)

Features users assume exist in a modern AI onboarding flow. Missing these makes onboarding feel broken or untrustworthy.

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Hardware auto-detection on first run | Any local AI tool probes GPU/RAM; users expect "it just knows" | MEDIUM | Node.js can read `/proc/meminfo` (Linux), spawn `nvidia-smi`, and detect Apple Silicon via `os.platform() === 'darwin'` plus `os.arch() === 'arm64'`; Ollama's `/api/tags` endpoint lists the models already installed locally (`/api/ps` lists the ones currently loaded) |
| RAM-aware model recommendations | Ollama and LM Studio both do this; users have been trained to expect it | LOW | Pre-built lookup table keyed on memory tier: <8GB RAM → 1B-4B, 8-16GB → 3B-8B, 16-24GB → 7B-14B, 24GB+ → 30B+ (quantized); VRAM takes priority over system RAM |
| Step-skippable onboarding | Any wizard that forces completion feels hostile; Clerk, Vercel, and Postman all allow skip | LOW | Each step needs a "skip" or "set up later" affordance; final summary shows what was skipped |
| Progress indicator | Multi-step wizards without progress indicators cause anxiety ("how many more steps?") | LOW | Step counter or progress bar; 5-7 max steps total |
| Summary screen before entering app | Users need to understand what was set up before being dropped into the app | LOW | Show: mode selected, provider configured, models available; "Start chatting" CTA |
| "Test connection" before saving | Every API key entry form should validate before proceeding | LOW | Quick `/health` or echo call to configured provider; show latency |
| Persisted onboarding state | Refreshing mid-wizard should not restart from step 1 | LOW | `localStorage` or DB; existing NexusOnboardingWizard already handles this pattern |
| Voice input/output toggle | Users who selected voice features expect them to work immediately | MEDIUM | Whisper already exists (v1.3); Piper TTS is the new addition; toggle in assistant settings |
| Persistent conversation memory | Any "personal AI assistant" product ships some form of memory (ChatGPT, Claude Projects, Gemini) | HIGH | Users compare against ChatGPT memory; table stakes for the mode to feel meaningful |
| MCP-style external connections | Power users expect the assistant to connect to their tools (files, git, search) | MEDIUM | MCP is now a widely adopted open standard (Anthropic, OpenAI, and Google all support it); both stdio and HTTP transports are needed |
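
The skip and persistence rows above can be sketched as a small state model. This is a minimal illustration, not the existing NexusOnboardingWizard implementation: the step ids, the `OnboardingState` shape, and every function name are hypothetical.

```typescript
type StepStatus = "pending" | "done" | "skipped";

// Hypothetical step ids; the real wizard may use different ones.
const STEP_IDS = ["mode", "hardware", "provider", "voice", "mcp"] as const;
type StepId = (typeof STEP_IDS)[number];
type OnboardingState = Record<StepId, StepStatus>;

function createState(): OnboardingState {
  return { mode: "pending", hardware: "pending", provider: "pending", voice: "pending", mcp: "pending" };
}

// Returns a new state; "skipped" is a normal outcome, not an error.
function markStep(state: OnboardingState, id: StepId, status: StepStatus): OnboardingState {
  return { ...state, [id]: status };
}

// First step that still needs attention, or null when the wizard is finished.
function nextPendingStep(state: OnboardingState): StepId | null {
  return STEP_IDS.find((id) => state[id] === "pending") ?? null;
}

// Feeds the "what was skipped" part of the summary screen.
function skippedSteps(state: OnboardingState): StepId[] {
  return STEP_IDS.filter((id) => state[id] === "skipped");
}

function serialize(state: OnboardingState): string {
  return JSON.stringify(state);
}

// Restores persisted state; unknown or corrupt data falls back to a fresh wizard.
function restore(raw: string | null): OnboardingState {
  const state = createState();
  if (raw === null) return state;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return state;
    const record = parsed as Record<string, unknown>;
    for (const id of STEP_IDS) {
      const value = record[id];
      if (value === "pending" || value === "done" || value === "skipped") state[id] = value;
    }
  } catch {
    // Corrupt JSON: keep the fresh state rather than crashing the wizard.
  }
  return state;
}
```

In a browser, `serialize` and `restore` would wrap `localStorage.setItem` and `localStorage.getItem`; a refresh mid-wizard then resumes at `nextPendingStep(restore(raw))`.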

### Differentiators (Competitive Advantage)

Features that make Nexus v1.5 worth using over ChatGPT, Claude Projects, or bare Ollama.

| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Puter.js as zero-config cloud tier | No API key and no developer-side sign-up; 500+ models including GPT-4.1, Claude Sonnet 4, Gemini 2.5; each user pays through their own Puter account | MEDIUM | Puter uses a "user-pays" model: each user authenticates against Puter and consumes their own credits. Developer (Mikkel) pays nothing. Implementation: include the `puter.js` script and call `puter.ai.chat()`; the user must have or create a Puter account (free tier exists) |
| Local-first framed as privacy premium | Most tools push cloud. Nexus frames local Ollama as the privacy-respecting choice, not the budget option | LOW | Copy/UX decision: "Your data never leaves your machine" for local tier. No code change needed |
| Hardware detection → instant model recommendation | Instead of listing 100 models and asking the user to pick, Nexus says "Given your M4 Mac Mini with 16GB unified memory, we recommend llama3.1:8b for assistant tasks" | MEDIUM | Pre-built model database (JSON lookup): Apple Silicon tiers, NVIDIA VRAM tiers, AMD VRAM tiers, CPU-only tier. Cross-reference with Ollama model library metadata |
| Project handoff from assistant to PM agent | "Turn this conversation into a project" — one button to create a Paperclip Project with issues extracted from the conversation, with full chat context transferred to PM agent | HIGH | Novel UX pattern; no off-the-shelf solution; requires: summary extraction from conversation (LLM call), Project entity creation via existing API, agent prompt injection with context summary |
| `npx buildthis` CLI entry point | Zero-install UX: `npx buildthis` downloads and runs the Nexus server + opens browser. Same pattern as `create-react-app`, `shadcn`, etc. | MEDIUM | Commander.js CLI already exists; `npx` entry requires: `bin` field in package.json, published to npm (or private registry), auto-open browser after server starts |
| Voice + local LLM = fully offline assistant | Whisper (STT) + Piper TTS + Ollama (LLM) = zero cloud dependency for voice interaction. Rare in consumer tools | HIGH | Piper is CPU-capable and fast enough on Apple Silicon. Integration complexity: audio pipeline (mic → Whisper → Ollama → Piper → speaker); sentence-buffered TTS for lower latency |
| Mode selection: Personal AI / Project Builder / Both | Most tools are either a chat assistant or a project manager. Nexus surfaces both modes with explicit switching | LOW | UI mode toggle stored in workspace settings; affects which features are surfaced in sidebar/dashboard |
| Google OAuth cloud tier (no API key) | Users with Google accounts can use Gemini without managing API keys — mirrors how Opencode handles Gemini OAuth | MEDIUM | Google OAuth flow → exchange for short-lived AI Studio token; already proven pattern in Opencode |

### Anti-Features (Commonly Requested, Often Problematic)

Features that seem like good additions but create maintenance debt, scope creep, or user confusion.

| Feature | Why Requested | Why Problematic | Alternative |
|---------|---------------|-----------------|-------------|
| "Sync memory to cloud" for personal assistant | Users want memory accessible across devices | Requires auth system, cloud storage, privacy policy, GDPR compliance — enormous scope for a personal tool | Local SQLite memory is sufficient for Mac Mini single-user; defer cloud sync to a future milestone |
| Automatic MCP server discovery | Users want zero-config MCP like Bluetooth discovery | MCP servers expose arbitrary capabilities; auto-discovery without user approval is a security risk | Curated list of common MCP servers (filesystem, git, web search) with one-click add; user approves each |
| Real-time provider cost display during chat | Visible per-message token cost feels responsive | Puter.js explicitly does not expose cost to developer (user-pays model); cost calculation would require hardcoding token prices that drift | Show estimated costs for API-key providers only; for Puter.js, show "costs charged to your Puter account" |
| Streaming TTS (word-by-word) | Reduces perceived latency of voice responses | Browser audio API makes true word-by-word streaming complex; sentence-by-sentence is the practical optimum | Buffer by sentence (split on `.!?`); start playing first sentence while next is synthesizing |
| Multi-user onboarding / team setup | Looks natural to "extend" to teams | Nexus is intentionally single-user (Mac Mini, local_trusted mode); team features require auth overhaul | Explicitly document single-user scope; defer team features until upstream Paperclip ships them |
| AI provider auto-negotiation (pick best available) | Transparent provider switching sounds smart | Silent model switches confuse users ("why did my assistant suddenly get dumber?"); debugging becomes impossible | Show active provider in UI always; let user set preferred priority order; never switch silently |

---

## Feature Dependencies

```
Hardware Detection
    └──feeds──> Model Recommendation DB
                    └──feeds──> Local AI Setup (Ollama tier)

Puter.js Integration
    └──requires──> Puter account (user-side; not a Nexus dependency)
    └──requires──> Client-side script inclusion (no server-side secrets)

Personal AI Assistant Mode
    └──requires──> Mode Selection (Personal / Project Builder / Both)
    └──requires──> Persistent Memory Store (SQLite via existing DB)
    └──requires──> Existing Chat Interface (v1.3 ChatPanel) [already built]

MCP Connections
    └──requires──> Personal AI Assistant Mode (MCP is an assistant-mode feature)
    └──requires──> STDIO transport (Node.js child_process, already available in CLI)

Voice (Piper TTS)
    └──requires──> Existing Whisper STT (v1.3) [already built]
    └──enhances──> Personal AI Assistant Mode

Project Handoff
    └──requires──> Personal AI Assistant Mode (conversation context exists there)
    └──requires──> Existing PM Agent Template (v1.4) [already built]
    └──requires──> Existing Project entity (upstream Paperclip) [already built]
    └──requires──> LLM summarization call (any configured provider)

npx buildthis
    └──requires──> Existing CLI (Commander.js) [already built]
    └──requires──> npm publish or private registry setup

Google OAuth Cloud Tier
    └──requires──> OAuth flow (Google Sign-In)
    └──independent──> other provider tiers (each tier is additive)
```

### Dependency Notes

- **Persistent memory requires existing DB:** Paperclip already uses SQLite/Postgres; a new `assistant_memories` table (key/value rows, or embeddings if vector search is added later) can be added. No ORM change needed if using raw SQL in a new file.
- **MCP requires assistant mode to be active:** MCP connections are scoped to the Personal AI Assistant mode, not the Project Builder. They should not be surfaced during project management workflows.
- **Hardware detection is a one-time onboarding concern:** Results should be cached; re-detection should be available in Settings but not re-run on every launch.
- **Puter.js has no server-side dependency:** The entire integration is client-side JavaScript. This is both a strength (zero backend changes) and a constraint (Puter auth happens in the browser, not on the Nexus server).

---

## MVP Definition

### Launch With (v1.5 Milestone)

Minimum viable set to validate the milestone goals.

- [ ] **Mode selection UI** — Personal AI / Project Builder / Both selector in onboarding + settings. Why essential: gates all assistant-specific features.
- [ ] **Hardware detection + model recommendation** — Detect RAM/VRAM, recommend Ollama model. Why essential: the primary UX claim of "smart onboarding."
- [ ] **Puter.js cloud tier** — Zero-config provider for users without local AI. Why essential: removes the "I have to install Ollama" barrier.
- [ ] **Personal AI Assistant chat with persistent memory** — Conversations that remember previous sessions. Why essential: defines the Personal AI Assistant mode as meaningfully different from existing chat.
- [ ] **Summary screen → straight into chat** — After onboarding completes, land in chat not dashboard. Why essential: closes the onboarding funnel.
- [ ] **Every step skippable** — Including hardware detection, cloud setup, MCP config. Why essential: PROJECT.md explicitly requires this.
- [ ] **Piper TTS** — Text-to-speech for assistant responses. Why essential: completes the voice loop that Whisper STT already started.

### Add After Validation (v1.5.x)

Features to add once core assistant mode is working.

- [ ] **Project handoff** — "Turn this conversation into a project" button. Trigger: assistant mode is stable and used regularly.
- [ ] **MCP server connections** — Curated list with one-click add. Trigger: users request specific tool integrations.
- [ ] **Google OAuth cloud tier** — Gemini without API key. Trigger: Puter.js limitations surface (rate limits, cost surprises for users).
- [ ] **`npx buildthis` CLI entry point** — Zero-install UX. Trigger: sharing Nexus with others becomes a use case.

### Future Consideration (v2+)

Features to defer until post-v1.5.

- [ ] **OpenAI OAuth tier** — OpenAI free tier via OAuth; rate limits are aggressive and UX is complex.
- [ ] **Subscription/API key auto-detection** — Scan environment for `ANTHROPIC_API_KEY`, etc. Low user value vs. complexity.
- [ ] **Memory export/import** — Portable memory across reinstalls. Needs file format design.
- [ ] **Multi-MCP orchestration** — Parallel MCP server calls, result merging. Enterprise complexity for personal tool.

---

## Feature Prioritization Matrix

| Feature | User Value | Implementation Cost | Priority |
|---------|------------|---------------------|----------|
| Mode selection UI | HIGH | LOW | P1 |
| Hardware detection + model recommendation | HIGH | MEDIUM | P1 |
| Puter.js zero-config cloud | HIGH | MEDIUM | P1 |
| Persistent memory (SQLite) | HIGH | MEDIUM | P1 |
| Summary screen → chat | HIGH | LOW | P1 |
| Every step skippable | HIGH | LOW | P1 |
| Piper TTS | MEDIUM | MEDIUM | P1 |
| Project handoff | HIGH | HIGH | P2 |
| MCP connections (curated) | MEDIUM | MEDIUM | P2 |
| Google OAuth cloud tier | MEDIUM | MEDIUM | P2 |
| `npx buildthis` | LOW | MEDIUM | P2 |
| OpenAI free tier OAuth | LOW | HIGH | P3 |
| API key auto-detection | LOW | MEDIUM | P3 |

**Priority key:**
- P1: Must have for v1.5 launch
- P2: Should have, add in v1.5.x
- P3: Nice to have, v2+

---

## Competitor Feature Analysis

| Feature | ChatGPT | Claude Projects | Bare Ollama | Nexus v1.5 Approach |
|---------|---------|-----------------|-------------|---------------------|
| Persistent memory | Yes (cloud) | Yes (project instructions) | No | SQLite local; no cloud required |
| Hardware-aware setup | No | No | No | Pre-built model database; auto-recommend |
| Zero-config cloud | No (API key) | No (API key) | N/A | Puter.js user-pays model |
| Local/offline operation | No | No | Yes (manual) | Ollama + Piper + Whisper; fully offline |
| Voice I/O | Yes (cloud) | No | No | Whisper STT (existing) + Piper TTS (new) |
| Tool connections | Yes (GPTs, connectors) | Yes (Projects) | No | MCP servers (curated list) |
| Project handoff | No | Partial (copy-paste) | No | One-button conversation → PM agent |
| Mode switching | No | No | No | Personal AI / Project Builder / Both |

---

## Provider Tier Architecture

The onboarding should present providers as a tiered funnel, not a flat list. Users land in the highest-comfort tier:

```
Tier 0: Already have Hermes / Claude Code / OpenClaw running
  └──detect via env vars or local port scan──> skip straight to summary

Tier 1: Local AI (most private, no cost)
  └──Ollama installed?──> detect models, recommend based on hardware
  └──Ollama not installed?──> show install prompt with one-liner

Tier 2: Zero-config cloud (easiest, user-pays)
  └──Puter.js──> "Sign in with Puter" → 500+ models, no API key
  └──User creates/logs into free Puter account

Tier 3: OAuth cloud (Google account required, free quota)
  └──Google Gemini──> OAuth flow → Gemini 2.0 Flash free tier
  └──Free tier as of 2026: reduced but functional (Gemini 2.0 Flash)

Tier 4: API key / subscription
  └──Hermes (existing)
  └──Claude Code (ANTHROPIC_API_KEY)
  └──OpenClaw (custom)
  └──OpenAI (OPENAI_API_KEY)
```
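
The funnel above can be read as an ordered decision: offer the most private tier first and move down only when the user declines. A minimal sketch under that reading; `ProviderProbe`, `TierOffer`, and `nextTierOffer` are hypothetical names, and the probe fields stand in for whatever detection the onboarding actually performs.

```typescript
// Probe results gathered during onboarding; field names are hypothetical.
interface ProviderProbe {
  agentRuntimeDetected: boolean; // Tier 0: Hermes / Claude Code / OpenClaw already running
  ollamaInstalled: boolean;      // Tier 1: decides between model detection and install prompt
}

type TierOffer =
  | { tier: 0; action: "skip-to-summary" }
  | { tier: 1; action: "recommend-models" | "show-install-prompt" }
  | { tier: 2; action: "puter-sign-in" }
  | { tier: 3; action: "google-oauth" }
  | { tier: 4; action: "enter-api-key" };

// Returns the next tier to present, given the tiers the user has already declined.
function nextTierOffer(probe: ProviderProbe, declined: ReadonlySet<number>): TierOffer | null {
  if (probe.agentRuntimeDetected && !declined.has(0)) {
    return { tier: 0, action: "skip-to-summary" };
  }
  if (!declined.has(1)) {
    return { tier: 1, action: probe.ollamaInstalled ? "recommend-models" : "show-install-prompt" };
  }
  if (!declined.has(2)) return { tier: 2, action: "puter-sign-in" };
  if (!declined.has(3)) return { tier: 3, action: "google-oauth" };
  if (!declined.has(4)) return { tier: 4, action: "enter-api-key" };
  // Every tier was skipped: continue to the summary with no provider configured.
  return null;
}
```

Because every step is skippable, `null` is a valid result rather than an error; the summary screen can then report that no provider was set up.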

**Key insight:** Users should be steered toward Tier 0 or 1 first (most private, most robust for single-user Mac Mini). Puter.js (Tier 2) is the escape hatch for users who won't install Ollama, not the default recommendation.

---

## Puter.js Integration Notes

**Confidence:** MEDIUM — confirmed working from official docs, but production reliability and rate limit specifics are not publicly documented.

- Integration is entirely client-side: `<script src="https://js.puter.com/v2/"></script>` then `puter.ai.chat(prompt, { model })`
- Supports 500+ models including GPT-4.1, Claude Sonnet 4, Gemini 2.5 Flash, Llama 3.x
- User authenticates against Puter (free account); developer incurs zero cost
- Rate limits: not publicly documented; Puter says "no restrictions" but this is unverified at scale
- Limitation: requires user to create/have a Puter account — this is friction vs. "truly zero-config"
- Risk: Puter's pricing model is described as "still being worked out" — future cost surprises for users possible
- Mitigation: Show clear messaging that Puter costs are the user's own account costs, not Nexus costs
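
A thin wrapper makes the integration and the cost messaging concrete. This is a sketch under assumptions: `PuterLike`, `ProviderReply`, and `askPuter` are hypothetical names, the wrapper assumes `puter.ai.chat(prompt, { model })` resolves to an object that stringifies to the reply text, and the response shape should be checked against the current Puter docs before relying on it.

```typescript
// Structural type for the single call used here. In the browser, the global
// `puter` object loaded from https://js.puter.com/v2/ would be passed in.
interface PuterLike {
  ai: { chat(prompt: string, options?: { model?: string }): Promise<unknown> };
}

// Hypothetical reply shape for the Nexus chat UI.
interface ProviderReply {
  provider: "puter";
  model: string;
  text: string;
  costNotice: string;
}

// Assumption: the resolved response stringifies to the reply text.
async function askPuter(puter: PuterLike, prompt: string, model: string): Promise<ProviderReply> {
  const response = await puter.ai.chat(prompt, { model });
  return {
    provider: "puter",
    model,
    text: String(response),
    // Mirrors the mitigation above: make clear who is paying.
    costNotice: "Usage is charged to your Puter account, not to Nexus.",
  };
}
```

Passing the client in as a parameter keeps the wrapper testable without a browser and keeps the active provider explicit in every reply.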

---

## Hardware Detection Implementation Notes

**Confidence:** HIGH — patterns well-established across Ollama, LM Studio, llm-checker.

Detection sources (Node.js server-side, run once at onboarding):
1. `os.totalmem()` — system RAM (always available)
2. Spawn `nvidia-smi --query-gpu=memory.total --format=csv,noheader` — NVIDIA VRAM
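3. `system_profiler SPHardwareDataType` (macOS) — chip name and total memory (unified memory on Apple Silicon is the same pool `os.totalmem()` reports); `SPDisplaysDataType` adds GPU details but no separate VRAM figure on Apple Silicon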
4. Ollama `/api/tags` endpoint — detect already-installed models (`/api/ps` lists the models currently loaded in memory)
5. `/proc/driver/nvidia/gpus/` (Linux) — alternative NVIDIA detection
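
The first two detection sources, plus the Apple Silicon check, can be combined as follows. This is a minimal sketch, not the Nexus implementation: `HardwareProfile`, `parseNvidiaSmiMiB`, and `detectHardware` are hypothetical names, and the 3-second timeout is an arbitrary choice.

```typescript
import * as os from "node:os";
import { execFile } from "node:child_process";

// Hypothetical result shape; not an existing Nexus type.
interface HardwareProfile {
  totalRamGiB: number;
  nvidiaVramGiB: number | null; // null when nvidia-smi is absent or fails
  appleSilicon: boolean;
}

// Parses `nvidia-smi --query-gpu=memory.total --format=csv,noheader` output,
// which prints one line per GPU such as "24576 MiB".
function parseNvidiaSmiMiB(stdout: string): number[] {
  return stdout
    .split("\n")
    .map((line) => /^\s*(\d+)\s*MiB\s*$/.exec(line))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => Number(m[1]));
}

// Resolves to an empty list instead of rejecting, so a missing binary is not an error.
function queryNvidiaVramMiB(): Promise<number[]> {
  return new Promise((resolve) => {
    execFile(
      "nvidia-smi",
      ["--query-gpu=memory.total", "--format=csv,noheader"],
      { timeout: 3000 },
      (error, stdout) => resolve(error ? [] : parseNvidiaSmiMiB(String(stdout))),
    );
  });
}

async function detectHardware(): Promise<HardwareProfile> {
  const vramMiB = await queryNvidiaVramMiB();
  return {
    totalRamGiB: os.totalmem() / 1024 ** 3,
    // With several GPUs, report the largest single card.
    nvidiaVramGiB: vramMiB.length > 0 ? Math.max(...vramMiB) / 1024 : null,
    appleSilicon: os.platform() === "darwin" && os.arch() === "arm64",
  };
}
```

Since detection is a one-time onboarding concern, the resolved `HardwareProfile` is what would be cached and shown again in Settings.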

Model recommendation lookup table (simplified):
```
CPU-only / <8GB RAM:  phi3:mini (3.8B), llama3.2:1b
8-16GB RAM:           llama3.2:3b, mistral:7b, phi3:medium
16-24GB unified:      llama3.1:8b, qwen2.5:7b
24GB+ unified / GPU:  qwen2.5:32b (quantized); llama3.1:70b (quantized) needs roughly 40GB+
```
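
Expressed as data, the lookup might look like the sketch below. The model names come from the simplified table; the tier boundaries (16GB falls into the 16-24GB tier), the `ModelTier` type, and `recommendModels` are assumptions for illustration, and none of the pairings have been benchmarked here.

```typescript
interface ModelTier {
  minGiB: number; // inclusive lower bound of usable memory
  label: string;
  models: string[];
}

const CPU_TIER: ModelTier = {
  minGiB: 0,
  label: "CPU-only / <8GB RAM",
  models: ["phi3:mini", "llama3.2:1b"],
};

// Ordered from largest to smallest so the first match is the best fit.
const MODEL_TIERS: ModelTier[] = [
  // llama3.1:70b appears in the table but typically needs far more than 24GB, even quantized.
  { minGiB: 24, label: "24GB+ unified / GPU", models: ["qwen2.5:32b", "llama3.1:70b"] },
  { minGiB: 16, label: "16-24GB unified", models: ["llama3.1:8b", "qwen2.5:7b"] },
  { minGiB: 8, label: "8-16GB RAM", models: ["llama3.2:3b", "mistral:7b", "phi3:medium"] },
  CPU_TIER,
];

function recommendModels(totalRamGiB: number, vramGiB: number | null): ModelTier {
  // VRAM takes priority over system RAM when a discrete GPU was detected.
  const usableGiB = vramGiB ?? totalRamGiB;
  return MODEL_TIERS.find((tier) => usableGiB >= tier.minGiB) ?? CPU_TIER;
}
```

For example, `recommendModels(64, 8)` returns the 8-16GB tier because the 8GB of VRAM, not the 64GB of system RAM, bounds what the GPU can hold.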

---

## Persistent Memory Implementation Notes

**Confidence:** MEDIUM — standard pattern, but the specific storage mechanism in Paperclip's DB needs verification.

Standard patterns in production personal AI assistants:
1. **Summary-based memory:** After each conversation, run an LLM call to extract key facts → store as `memory` rows. On next conversation, inject relevant memories into system prompt.
2. **Verbatim storage:** Store full conversation history, retrieve last N messages or vector-search for relevant passages.
3. **Hybrid:** Store both summaries (for long-term preferences) and recent verbatim context (for continuity).

Recommended for Nexus: Summary-based for long-term memory (preferences, ongoing projects, user facts) + last 10 messages as verbatim context. Avoids needing a vector database. Uses existing SQLite schema with a new `assistant_memories` table.
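
The recommended hybrid can be sketched as a pure function that assembles the prompt context. `MemoryRow`, `ChatMessage`, and `buildAssistantContext` are hypothetical names; `MemoryRow` stands in for one row of the proposed `assistant_memories` table, whose real columns are still to be designed.

```typescript
// Hypothetical row of the proposed assistant_memories table.
interface MemoryRow {
  fact: string;
  updatedAt: number; // epoch milliseconds
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the message list sent to the LLM: one system message carrying the
// summarized long-term facts, followed by the last `recentCount` verbatim messages.
function buildAssistantContext(
  memories: readonly MemoryRow[],
  history: readonly ChatMessage[],
  recentCount = 10,
): ChatMessage[] {
  const facts = [...memories]
    .sort((a, b) => b.updatedAt - a.updatedAt) // newest facts first
    .map((m) => `- ${m.fact}`);
  const system: ChatMessage = {
    role: "system",
    content:
      facts.length > 0
        ? `Known facts about the user:\n${facts.join("\n")}`
        : "No stored memories yet.",
  };
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const recent = recentCount > 0 ? history.slice(-recentCount) : [];
  return [system, ...recent];
}
```

The summarization step that produces `MemoryRow` entries is a separate LLM call after each conversation and is not shown here.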

**MCP-compatible storage:** The MCP memory pattern (used by Penfield, mcp-memory-service) stores memories as MCP tool call results — same summary pattern, just with MCP as the transport. Nexus does not need to implement MCP just for memory; MCP is for external tool connections.

---

## Voice Architecture Notes

**Confidence:** MEDIUM — Piper confirmed CPU-capable and fast on Apple Silicon; full pipeline integration complexity is estimated, not measured.

Pipeline for full voice I/O:
```
Microphone → MediaRecorder (browser) → Whisper (existing, v1.3) → LLM (any provider)
                                                                         ↓
Speaker ← Web Audio API ← Piper TTS (new) ← Text response ←────────────┘
```

Piper TTS:
- Open-source (rhasspy/piper), MIT license
- Runs on CPU; Apple Silicon M4 handles it in real-time
- Node.js integration: spawn `piper` binary with text via stdin, read WAV from stdout
- Voice models: compact (few MB) per language/voice; ship one English voice as default
- Streaming: buffer by sentence for lower perceived latency (start playing sentence 1 while sentence 2 synthesizes)
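
Sentence buffering can be sketched independently of Piper itself. In the sketch below, `splitSentences` and `speakBuffered` are hypothetical names, `synthesize` stands in for whatever spawns the `piper` binary, and `play` stands in for the browser playback path; the splitter is deliberately naive and will also split on decimals such as "3.14".

```typescript
// Splits on runs of . ! ? and keeps any trailing text without punctuation.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Plays sentence N while sentence N+1 is already being synthesized.
async function speakBuffered(
  text: string,
  synthesize: (sentence: string) => Promise<Uint8Array>,
  play: (audio: Uint8Array) => Promise<void>,
): Promise<void> {
  const [first, ...rest] = splitSentences(text);
  if (first === undefined) return; // nothing to say
  let pending = synthesize(first);
  for (const sentence of rest) {
    const audio = await pending;
    pending = synthesize(sentence); // start the next synthesis before playback
    await play(audio);
  }
  await play(await pending);
}
```

Only one sentence is synthesized ahead at a time, which bounds memory use and keeps the first audio starting as soon as the first sentence is ready.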

Whisper is already integrated (v1.3). Piper adds the TTS half to complete the loop.

---

## Sources

- [Puter.js Free AI API (developer.puter.com)](https://developer.puter.com/tutorials/free-unlimited-ai-api/)
- [Puter.js Free LLM API (developer.puter.com)](https://developer.puter.com/tutorials/free-llm-api/)
- [Puter User-Pays Model (docs.puter.com)](https://docs.puter.com/user-pays-model/)
- [Ollama Hardware Detection and GPU Support (deepwiki.com)](https://deepwiki.com/ollama/ollama/6-gpu-and-hardware-support)
- [Ollama VRAM Requirements 2026 (localllm.in)](https://localllm.in/blog/ollama-vram-requirements-for-local-llms)
- [AI Hardware Guide 2026 (localaimaster.com)](https://localaimaster.com/blog/ai-hardware-requirements-2025-complete-guide)
- [Model Context Protocol (en.wikipedia.org)](https://en.wikipedia.org/wiki/Model_Context_Protocol)
- [MCP for Persistent Memory (medium.com)](https://medium.com/mynextdeveloper/how-to-set-up-model-context-protocol-mcp-for-persistent-memory-in-your-ai-app-9c2f819f5c21)
- [Piper TTS GitHub (rhasspy/piper)](https://github.com/rhasspy/piper)
- [Voice Chat with Local LLMs: Whisper + TTS (insiderllm.com)](https://www.insiderllm.com/guides/voice-chat-local-llms-whisper-tts/)
- [Google Gemini API Free Tier 2026 (aifreeapi.com)](https://www.aifreeapi.com/en/posts/google-gemini-api-free-tier)
- [Google Gemini OAuth via Opencode (syntackle.com)](https://syntackle.com/blog/google-gemini-ai-subscription-with-opencode/)
- [AI Handoff Patterns in Multi-Agent Systems (towardsdatascience.com)](https://towardsdatascience.com/how-agent-handoffs-work-in-multi-agent-systems/)
- [Building an NPX CLI Tool (johnsedlak.com)](https://johnsedlak.com/blog/2025/03/building-an-npx-cli-tool)
- [Postman Onboarding UX Lessons (candu.ai)](https://www.candu.ai/blog/postman-onboarding-ux-lessons)

---
*Feature research for: Nexus v1.5 Smart Onboarding + Personal AI Assistant*
*Researched: 2026-04-02*