nexus/.planning/research/STACK.md

# Technology Stack: v1.5 Smart Onboarding + Personal AI Assistant

**Project:** Nexus v1.5 — additive to existing fork maintenance stack (see prior milestone research for branding/fork strategy)
**Researched:** 2026-04-02
**Scope:** NEW libraries only — Puter.js, hardware detection, Whisper STT + Piper TTS, OAuth, `npx buildthis` CLI, persistent memory
**Confidence:** MEDIUM-HIGH (most verified via official docs; a few version numbers from npm search only)

---

## Existing Stack (Do Not Change)

The following are already installed and working. Zero changes needed:

| Area | What's There | Location |
|------|-------------|----------|
| CLI framework | Commander.js `^13.1.0` + `@clack/prompts ^0.10.0` | `cli/package.json` |
| Hardware/Ollama | Custom detection (`v1.4`) + `systeminformation` likely via existing adapter | `packages/adapters/hermes` |
| Server auth | `better-auth 1.4.18` | `server/package.json` |
| UI | React 19, Vite 6, Tailwind v4, TanStack Query v5 | `ui/package.json` |
| DB | LibSQL/Drizzle ORM | `server/package.json` |

---

## New Libraries by Feature Area

### 1. Puter.js — Zero-Config Cloud AI

**Package:** `@heyputer/puter.js`
**Version:** latest (no stable semver pinned on npm — use `@latest` and lock in pnpm-lock)
**Where it lives:** `ui/` only — Puter.js is a frontend-first browser SDK

**Why:** 500+ models (GPT-4o, Claude, Gemini, Grok, DeepSeek) with zero API keys and zero developer billing. Users authenticate with their own Puter account; usage cost falls on the user, not the developer. This is the project's "zero-config cloud" tier — the entire value prop depends on this library.

**How the API works:**

```typescript
// Browser only: import via script tag or bundler
import { puter } from "@heyputer/puter.js";

// Chat (streaming)
const stream = await puter.ai.chat("Hello", {
  model: "gpt-4o",
  stream: true,
});
let reply = "";
for await (const part of stream) {
  // Append each streamed chunk; `process.stdout` does not exist in the browser
  reply += part?.text ?? "";
}

// Image generation, TTS, STT also available under puter.ai.*
```

**Integration point:** New `PuterAdapter` following the existing adapter pattern in `packages/adapters/`. The adapter wraps `puter.ai.chat()` and maps results to the shared `AdapterMessage` type. It runs in the browser layer only; make no server-side Puter calls.

**Constraint:** Puter.js runs in browser context only. Do NOT add it to `server/` or `cli/`. The adapter must be a frontend-only workspace package or inlined into the UI.
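
The adapter described above could be sketched roughly as follows. This is a minimal illustration, not the project's adapter contract: the `AdapterMessage` shape, `ChatFn`, and `chatViaPuter` are hypothetical names, and the chat function is injected so the sketch never touches the Puter SDK outside the browser.

```typescript
// Hypothetical shapes: the real AdapterMessage type lives in the shared adapter package.
type AdapterMessage = { role: "user" | "assistant"; content: string };
type StreamPart = { text?: string };
type ChatFn = (
  prompt: string,
  options: { model: string; stream: true },
) => Promise<AsyncIterable<StreamPart>>;

// Streams a reply through the injected chat function and folds the
// chunks into a single assistant message.
async function chatViaPuter(
  chat: ChatFn,
  prompt: string,
  model: string,
  onChunk: (text: string) => void = () => {},
): Promise<AdapterMessage> {
  const stream = await chat(prompt, { model, stream: true });
  let content = "";
  for await (const part of stream) {
    const text = part?.text ?? "";
    content += text;
    onChunk(text); // lets the UI render partial output as it arrives
  }
  return { role: "assistant", content };
}
```

In the UI, the injected function would be something like `(p, o) => puter.ai.chat(p, o)`; injecting it also makes the adapter testable with a fake stream.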

**Confidence: HIGH** — Official docs verified at developer.puter.com. User-pays model confirmed.

---

### 2. Hardware Detection — GPU, RAM, Apple Silicon

**Package:** `systeminformation`
**Version:** `^5.31.5` (latest stable; v6 TypeScript rewrite is in progress but not released)
**Where it lives:** `server/` (runs on the Mac Mini; browser APIs cannot access hardware)

**Why:** The only comprehensive cross-platform system info library for Node.js with 20M+ monthly downloads. Covers CPU, total RAM, GPU model/VRAM, and Apple Silicon GPU core count — exactly what's needed for model recommendation. Alternatives (`detect-gpu`, `gpu-info`) are browser-only or Windows-only.

**Key functions for v1.5:**

```typescript
import si from "systeminformation";

// Total system RAM
const mem = await si.mem(); // mem.total in bytes

// GPU info — works on macOS, Windows, Linux
const graphics = await si.graphics();
// graphics.controllers[0].vram — VRAM in MB (dedicated GPU)
// graphics.controllers[0].cores — GPU cores (Apple Silicon only)
// graphics.controllers[0].model — e.g. "Apple M4 Pro"
```

**Apple Silicon nuance:** Apple Silicon has unified memory — there is no separate VRAM. `si.graphics()` returns `vram: 0` and populates `cores` with GPU core count instead. The model recommendation logic must handle this: use `mem.total` as effective VRAM for Apple Silicon, scaled by a configurable fraction (typically 0.75 since OS+apps compete for the same pool).
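
A minimal sketch of that rule, assuming the detection service has already reduced the `systeminformation` output to a small probe object. The `GpuProbe` type and `effectiveVramBytes` name are hypothetical, and the 0.75 default is the configurable fraction mentioned above.

```typescript
// Hypothetical probe shape produced by the detection service.
type GpuProbe = {
  vramMb: number | null; // dedicated VRAM in MB, if any
  totalMemBytes: number; // total system RAM in bytes
  unifiedMemory: boolean; // true on Apple Silicon
};

// Returns the memory budget (in bytes) to use when recommending a model.
function effectiveVramBytes(probe: GpuProbe, unifiedFraction = 0.75): number {
  if (probe.unifiedMemory) {
    // Unified memory: GPU shares the RAM pool with the OS and apps.
    return Math.floor(probe.totalMemBytes * unifiedFraction);
  }
  return (probe.vramMb ?? 0) * 1024 * 1024;
}
```

One way to set `unifiedMemory` is `process.platform === "darwin" && process.arch === "arm64"`; that check is an assumption of this sketch, not something the existing detection code is known to do.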

**Existing usage in v1.4:** Ollama detection and RAM/VRAM recommendations are already implemented. This is an additive enhancement — if `systeminformation` is not yet imported in the server, add it. If it is, extend the existing detection service.

**Confidence: HIGH** — Verified via systeminformation.io official docs. Apple Silicon behavior confirmed via GPU core detection doc.

---

### 3. Whisper STT — Speech to Text (CPU-capable)

**Recommendation:** `smart-whisper`
**Version:** `^0.8.1` (latest as of October 2025)
**Where it lives:** `server/` as an optional service (graceful degradation if model not downloaded)

**Why over alternatives:**
- `smart-whisper`: Native Node.js addon wrapping whisper.cpp directly. Supports loading one model for parallel inferences. Auto-enables Apple Neural Engine acceleration on macOS. Pre-built binaries for macOS arm64 (Mac Mini M4).
- `nodejs-whisper` (v0.2.9, 10 months old): Older, CPU-focused, spawns a subprocess. Works but slower and less maintained.
- `whisper-node` (v1.1.1, 2 years old): Abandoned.

**Model recommendation for Mac Mini M4:**
- `base.en` model (~140MB) — good balance of speed/accuracy for English voice input
- `small.en` model (~460MB) — better accuracy if user has RAM to spare
- Models download lazily on first voice use; onboarding should gate voice on model availability

**Integration pattern:**

```typescript
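import { Whisper, manager } from "smart-whisper";

// Per the smart-whisper README (verify for your version): load by file path, PCM in, task out
await manager.download("base.en"); // fetch the model before first use
const whisper = new Whisper(manager.resolve("base.en"));
// pcm: mono 16 kHz Float32Array decoded from the uploaded audio
const task = await whisper.transcribe(pcm, { language: "en" });
const transcript = (await task.result).map((s) => s.text).join(" ");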
```

**Server endpoint:** Add `POST /api/voice/transcribe` that accepts audio blob (WAV/WebM from browser MediaRecorder), returns transcript JSON. The existing v1.3 voice input uses browser-side Web Speech API as a fallback — this is the local/offline upgrade path.
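
whisper.cpp-based libraries generally expect mono 16 kHz floating-point samples rather than a container format, so the endpoint needs a decode step. The sketch below covers only the last part of that step and assumes the audio has already been decoded to 16-bit little-endian PCM (for example from a WAV body); WebM input would need a separate decoder. The `pcm16ToFloat32` helper is a hypothetical name.

```typescript
// Convert 16-bit little-endian PCM bytes to Float32 samples in [-1, 1).
function pcm16ToFloat32(pcm: Uint8Array): Float32Array {
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const sampleCount = Math.floor(pcm.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Int16 range is [-32768, 32767]; dividing by 32768 normalizes it.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```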

**Confidence: MEDIUM** — Package verified on npm and GitHub. Version from GitHub releases page. Apple Silicon acceleration confirmed in README. No production deployment data for this specific version.

---

### 4. Piper TTS — Text to Speech (CPU-capable)

**Recommendation:** Spawn `piper` binary via `child_process`, do NOT use a Node.js wrapper library
**Why:** No mature, production-ready Node.js binding for Piper TTS exists as of April 2026. The `@mintplex-labs/piper-tts-web` package is browser-only. ONNX-based implementations exist in Python (`piper-onnx`) and partially in JavaScript for Bun, but none are packaged for Node.js production use.

**Approach:**

```typescript
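import { spawn } from "child_process";
import os from "os";
import path from "path";

// piper binary downloaded to ~/.paperclip/voice/piper
// voice model downloaded to ~/.paperclip/voice/models/
const PIPER_BIN = path.join(os.homedir(), ".paperclip", "voice", "piper");

async function synthesize(text: string, modelPath: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const proc = spawn(PIPER_BIN, ["--model", modelPath, "--output-raw"]);
    const chunks: Buffer[] = [];
    proc.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    // Reject on spawn failure (e.g. binary missing) or a non-zero exit code
    proc.on("error", reject);
    proc.on("close", (code) =>
      code === 0
        ? resolve(Buffer.concat(chunks)) // raw PCM audio
        : reject(new Error(`piper exited with code ${code}`)),
    );
    proc.stdin.write(text);
    proc.stdin.end();
  });
}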
```

**Alternative for pure-JS TTS (fallback/cloud):** The browser's `window.speechSynthesis` API covers the cloud and basic local cases without any server dependency. Use Web Speech API as the default TTS tier; offer Piper as an optional "high-quality offline voice" that the user must enable explicitly.

**Piper binary distribution:** During onboarding, detect if piper binary exists at `~/.paperclip/voice/piper`. If not, show download prompt. Use `https://github.com/rhasspy/piper/releases` to fetch the macOS arm64 binary. Store in `~/.paperclip/` (Nexus never renames this dir per PROJECT.md constraints).

**Recommended voice model for Mac Mini M4:** `en_US-lessac-medium` (~63MB) — good quality, fast on Apple Silicon.
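
Because `--output-raw` produces headerless PCM, the browser cannot play the response through a plain `<audio>` element as-is. One option is to prepend a WAV header on the server, sketched below. The `pcmToWav` helper is hypothetical, and the 22050 Hz mono 16-bit defaults are an assumption; the actual sample rate should be read from the voice model's accompanying JSON config.

```typescript
// Wrap raw 16-bit PCM in a minimal 44-byte WAV (RIFF) header.
function pcmToWav(pcm: Uint8Array, sampleRate = 22050, channels = 1): Uint8Array {
  const bytesPerSample = 2;
  const header = new DataView(new ArrayBuffer(44));
  const writeAscii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) header.setUint8(offset + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  header.setUint32(4, 36 + pcm.byteLength, true); // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  header.setUint32(16, 16, true); // fmt chunk size
  header.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  header.setUint16(22, channels, true);
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  header.setUint16(32, channels * bytesPerSample, true); // block align
  header.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  header.setUint32(40, pcm.byteLength, true); // data chunk size
  const wav = new Uint8Array(44 + pcm.byteLength);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}
```

The alternative is to send the raw samples and play them through the Web Audio API on the client; the header approach keeps the browser side simpler.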

**Confidence: MEDIUM** — Based on official Piper GitHub + community blog posts (Bun runtime example). Subprocess approach is the proven path. ONNX-native Node.js path is theoretically possible but no maintained package exists.

---

### 5. OAuth Flows — Google Gemini + OpenAI Free Tiers

**Recommendation:** `openid-client` v6
**Version:** `^6.8.2` (latest stable, complete v6 API rewrite)
**Where it lives:** `server/` — OAuth flows run server-side with PKCE

**Why openid-client over passport.js:**
- Passport.js adds middleware abstraction that conflicts with Nexus's existing `better-auth` setup (already in `server/package.json`)
- `openid-client` v6 is a certified OAuth 2/OIDC client that handles PKCE natively without middleware
- Works alongside `better-auth` — openid-client handles the provider OAuth dance; better-auth handles the Nexus session

**What it provides:**
- Authorization Code Flow with PKCE (required by OAuth 2.1)
- Discovery via `.well-known/openid-configuration`, which works for Google and for any other provider that publishes OIDC discovery metadata
- Token refresh, revocation, introspection

**Integration pattern:**

```typescript
import * as client from "openid-client";

// Google discovery
const googleConfig = await client.discovery(
  new URL("https://accounts.google.com"),
  process.env.GOOGLE_CLIENT_ID!,
  process.env.GOOGLE_CLIENT_SECRET!
);

// Generate PKCE challenge
const codeVerifier = client.randomPKCECodeVerifier();
const codeChallenge = await client.calculatePKCECodeChallenge(codeVerifier);
```
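
For reference, the S256 transformation behind a PKCE challenge is defined in RFC 7636 as the base64url-encoded SHA-256 hash of the verifier. The sketch below reproduces it with Node built-ins only, which can be handy for unit-testing the flow without the library. The helper names are hypothetical and this is not `openid-client`'s internal code.

```typescript
import { createHash, randomBytes } from "node:crypto";

// code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), per RFC 7636 section 4.2
function pkceChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// 32 random bytes encode to a 43-character verifier, the minimum length RFC 7636 allows.
function randomVerifier(): string {
  return randomBytes(32).toString("base64url");
}
```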

**Note on "zero sign-up":** Puter.js handles the zero-API-key tier. OAuth is the tier above that — where users already have Google/OpenAI accounts and want to connect them. Keep these separate in the onboarding UI: Puter tier requires zero setup; OAuth tier shows "Connect your Google account" CTA.

**Server routes to add:**
- `GET /api/oauth/google/start` — initiate flow, return redirect URL
- `GET /api/oauth/google/callback` — exchange code for tokens, store encrypted
- Same pattern for OpenAI when their OAuth flow is stable
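
Between the `start` and `callback` routes, the server has to remember the PKCE code verifier for each in-flight login, keyed by the `state` value it sent to the provider. A minimal in-memory sketch follows; `PendingFlowStore` and its 10-minute TTL are assumptions for illustration, and a multi-process deployment would need shared storage instead.

```typescript
type PendingFlow = { codeVerifier: string; expiresAt: number };

class PendingFlowStore {
  private flows = new Map<string, PendingFlow>();
  private ttlMs: number;

  constructor(ttlMs = 10 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Called by the start route before redirecting to the provider.
  put(state: string, codeVerifier: string, now = Date.now()): void {
    this.flows.set(state, { codeVerifier, expiresAt: now + this.ttlMs });
  }

  // Called by the callback route. Single use: the entry is removed on read,
  // and unknown or expired states yield undefined.
  take(state: string, now = Date.now()): string | undefined {
    const flow = this.flows.get(state);
    this.flows.delete(state);
    if (!flow || flow.expiresAt < now) return undefined;
    return flow.codeVerifier;
  }
}
```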

**Confidence: MEDIUM** — openid-client v6 verified via GitHub and npm. Google OIDC integration confirmed. OpenAI's free tier OAuth specifics are LOW confidence (their free tier structure changes frequently).

---

### 6. `npx buildthis` — CLI Bootstrapper

**No new library needed.** The package structure is a standard npm pattern.

**What to build:** A new npm package `buildthis` (or scoped `@nexus/buildthis`) published to npm. When run via `npx buildthis`, it:
1. Detects if Nexus server is running locally (`localhost:4000` or configured port)
2. If yes: opens browser to onboarding URL
3. If no: guides user through one-command install (Docker or native)
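
Step 1 can be done with Node built-ins alone, which fits the minimal-dependency goal for this package. The sketch below only checks that something accepts TCP connections on the port; it does not verify that the listener is actually Nexus, so a real implementation would likely follow up with an HTTP health request. `isPortOpen` is a hypothetical helper name.

```typescript
import * as net from "node:net";

// Resolves true if a TCP connection to host:port succeeds within timeoutMs.
function isPortOpen(port: number, host = "127.0.0.1", timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open: boolean) => {
      socket.destroy(); // always release the socket
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}
```

Usage would look like `if (await isPortOpen(4000)) { /* open the onboarding URL */ }`, with the port taken from configuration when one is set.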

**Package structure:**

```
cli-bootstrapper/          # New top-level directory in the Nexus monorepo
  package.json             # name: "buildthis", bin: { "buildthis": "./dist/index.js" }
  src/
    index.ts               # #!/usr/bin/env node shebang entry
  dist/                    # bundled by esbuild (same config as existing CLI)
```

**`package.json` bin field:**

```json
{
  "name": "buildthis",
  "version": "0.1.0",
  "bin": {
    "buildthis": "./dist/index.js"
  },
  "files": ["dist"]
}
```

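**Key constraint:** Keep `buildthis` dependencies minimal. On a cold cache, `npx` downloads and installs the package and its whole dependency tree before running anything, so every dependency adds to first-run latency. Heavy dependencies (e.g. Commander.js, Inquirer) add an estimated 200-500ms to startup. Use Node.js built-ins (`readline`, `https`, `child_process`) wherever possible. Acceptable: `@clack/prompts` (already a project dependency, ~20KB).

**Existing CLI packages already use:** Commander.js `^13.1.0`, `@clack/prompts ^0.10.0`, `picocolors`. If `buildthis` needs a dependency, pick from the small ones (`@clack/prompts`, `picocolors`) and match the versions already in the project's lockfile. Leave Commander.js out of the bootstrapper; a single-command tool does not need an argument parser.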

**Confidence: HIGH** — npx bin-field pattern is official Node.js documentation. No novel library choices required.

---

### 7. Persistent Memory — Personal AI Assistant

**Recommendation:** Two-layer approach — SQLite for structured memory + local vector search for semantic recall

**Layer 1 — Structured facts:** Use the existing LibSQL/Drizzle ORM stack. Add a `memories` table with columns: `id`, `user_id`, `content` (text), `embedding` (blob), `created_at`, `source` (`conversation` | `explicit`). No new DB library needed — LibSQL supports this schema.

**Layer 2 — Semantic search:** `vectra`
**Version:** `^0.12.3` (last published ~1 month ago)
**Where it lives:** `server/` as an optional memory service

**Why vectra:**
- Zero infrastructure — index is a folder of JSON files on disk. Fits `~/.paperclip/memory/` perfectly.
- Sub-millisecond lookup for small corpora (<10K items, typical personal assistant use)
- TypeScript-native, MIT licensed
- No cloud dependency, no server process
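
To illustrate why a corpus of this size needs no index server: semantic recall over a few thousand vectors can be a brute-force cosine-similarity scan. The sketch below shows the idea in plain TypeScript. It is not vectra's implementation, and `Memory`, `cosineSimilarity`, and `recall` are hypothetical names.

```typescript
type Memory = { content: string; vector: number[] };

// Cosine similarity of two equal-length vectors; 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every memory against the query and keep the topK best matches.
function recall(memories: Memory[], query: number[], topK: number): Memory[] {
  return memories
    .map((memory) => ({ memory, score: cosineSimilarity(memory.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.memory);
}
```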

**Embeddings for vectra:** Use Ollama's `nomic-embed-text` model (already in the Ollama ecosystem from v1.4). This avoids any OpenAI API key dependency for the memory layer.

```typescript
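import path from "path";
import { LocalIndex } from "vectra";
import ollama from "ollama"; // already installed via hermes adapter

const index = new LocalIndex(path.join(process.env.PAPERCLIP_HOME!, "memory"));
if (!(await index.isIndexCreated())) await index.createIndex();

// Embed text with Ollama; embed() returns one vector per input
async function embedText(text: string): Promise<number[]> {
  const { embeddings } = await ollama.embed({ model: "nomic-embed-text", input: text });
  return embeddings[0];
}

// Store memory
await index.insertItem({ vector: await embedText(text), metadata: { content: text, date: Date.now() } });

// Recall memories (top 5). The queryItems signature differs across vectra releases
// (some take a text query as the second argument); check the installed version.
const results = await index.queryItems(await embedText(query), 5);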
```

**Why NOT mem0ai:** `mem0ai` npm package defaults to OpenAI for both the LLM and embedder. Local/offline configuration is not documented in the Node SDK (only the Python SDK supports local providers). Using it would introduce an OpenAI API key hard dependency that conflicts with the "zero-config local-first" goal.

**Why NOT LangChain MemoryVectorStore:** LangChain JS is 40MB+ of dependencies and would be the largest single addition to the project. For a personal assistant's memory layer, vectra + Ollama embeddings is 1/20th the footprint.

**Confidence: MEDIUM** — vectra verified on npm/GitHub. Ollama embeddings confirmed via ollama.com docs. mem0ai limitation confirmed via their Node SDK docs (no local LLM option documented).

---

## Installation Summary

```bash
# server/ — add these dependencies
pnpm --filter @paperclipai/server add systeminformation openid-client vectra

# server/ — smart-whisper (optional, for local STT)
pnpm --filter @paperclipai/server add smart-whisper

# ui/ — Puter.js frontend SDK
pnpm --filter @paperclipai/ui add @heyputer/puter.js

# New package for npx bootstrapper (separate publish)
# cli-bootstrapper/package.json — no new external deps beyond @clack/prompts
```

---

## Alternatives Considered

| Feature | Recommended | Alternative | Why Not |
|---------|-------------|-------------|---------|
| Hardware detection | `systeminformation ^5.31.5` | `detect-gpu` | Browser-only; Node.js usage not supported |
| Hardware detection | `systeminformation ^5.31.5` | `gpu-info` | Windows-only; no macOS/Linux support |
| STT | `smart-whisper ^0.8.1` | `nodejs-whisper ^0.2.9` | Subprocess-based, 10 months stale, slower on Apple Silicon |
| STT | `smart-whisper ^0.8.1` | Cloud Whisper API | Requires API key; breaks offline/local-first promise |
| TTS | Piper binary via `child_process` | `@mintplex-labs/piper-tts-web` | Browser-only npm package, cannot run in Node.js server |
| TTS | Piper binary | `sherpa-onnx ^1.12.34` | Supports both STT+TTS but adds 80MB binary; overkill if using smart-whisper for STT |
| OAuth | `openid-client ^6.8.2` | `passport-oauth2` | Adds middleware layer that conflicts with existing `better-auth` session handling |
| Memory | `vectra ^0.12.3` + Ollama embeddings | `mem0ai` | Node SDK requires OpenAI; no local embedding option documented |
| Memory | `vectra ^0.12.3` + Ollama embeddings | LangChain MemoryVectorStore | 40MB+ transitive dependency footprint; overkill for personal use scale |
| Zero-config cloud | `@heyputer/puter.js` | Direct provider SDKs | Would require managing API keys per user; Puter eliminates this entirely |

---

## What NOT to Add

| Avoid | Why | Use Instead |
|-------|-----|-------------|
| `passport.js` | Conflicts with existing `better-auth`; adds middleware overhead | `openid-client v6` (certified, no middleware) |
| `langchain` or `llamaindex` | 40-80MB dep footprint; overkill for single-user personal assistant | `vectra` + direct Ollama calls |
| `mem0ai` Node SDK | OpenAI hard dependency in Node SDK; no local embedding option | Custom memory layer: `vectra` + Ollama `nomic-embed-text` |
| `@mintplex-labs/piper-tts-web` | Browser-only, cannot be used in Node.js server | Piper binary subprocess |
| Any browser extension for auth | Security risk; not applicable to local app | Standard PKCE via `openid-client` |
| `electron` or `tauri` | PROJECT.md target is web app on Mac Mini, not desktop app | Existing Vite/Express architecture |

---

## Version Compatibility Notes

| Package | Compatible With | Notes |
|---------|-----------------|-------|
| `systeminformation ^5.31.5` | Node.js >=18 | v6 is being rewritten in TS but not released; stick with v5 |
| `smart-whisper ^0.8.1` | Node.js >=18, macOS arm64 | Prebuilt binaries for Apple Silicon — no compilation needed |
| `openid-client ^6.8.2` | Node.js >=20 | v6 is a full rewrite; do not use v5 patterns (completely different API) |
| `vectra ^0.12.3` | Node.js >=16 | File-based; no native addons, no compilation |
| `@heyputer/puter.js` | Browser (Vite/ESM) | Not for Node.js server use |

---

## Integration Architecture

```
Browser (UI)                    Server (Express)
─────────────────               ────────────────────────────────
@heyputer/puter.js  ──────────→ No server proxy needed
                                (Puter calls go direct to puter.com)

React voice input ──────────→  POST /api/voice/transcribe
                                  └── smart-whisper (local STT)
                                      └── ~140MB model file in ~/.paperclip/voice/

GET /api/system/hardware  ←────  systeminformation
                                  └── GPU cores, total RAM, GPU model

React onboarding OAuth ────────→ GET /api/oauth/google/start
                                  └── openid-client PKCE flow
                                  └── GET /api/oauth/google/callback

Personal assistant chat ───────→ POST /api/assistant/chat
                                  └── vectra recall (nomic-embed-text via Ollama)
                                  └── context injection → selected AI provider

TTS response ──────────────────→ POST /api/voice/synthesize
                                  └── piper binary subprocess
                                  └── returns raw PCM → browser Web Audio API
```

---

## Sources

- [Puter.js developer docs](https://developer.puter.com/) — API structure, user-pays model confirmed
- [Puter.js npm install](https://docs.puter.com/) — package name `@heyputer/puter.js` verified
- [systeminformation npm](https://www.npmjs.com/package/systeminformation) — v5.31.5 latest, v6 in progress
- [systeminformation GPU docs](https://systeminformation.io/graphics.html) — Apple Silicon GPU cores confirmed
- [smart-whisper GitHub releases](https://github.com/JacobLinCool/smart-whisper/releases) — v0.8.1, October 2025
- [openid-client npm](https://www.npmjs.com/package/openid-client) — v6.8.2, PKCE confirmed
- [openid-client v6 migration discussion](https://github.com/panva/openid-client/discussions/702) — API changes documented
- [vectra npm](https://www.npmjs.com/package/vectra) — v0.12.3, file-backed vector index
- [Ollama embedding models](https://ollama.com/blog/embedding-models) — nomic-embed-text capability confirmed
- [Piper TTS GitHub](https://github.com/rhasspy/piper) — macOS arm64 binary available
- [Running Piper TTS with JS (Bun)](https://n4ze3m.com/blog/running-piper-tts-with-javascript-in-the-bun-runtime) — ONNX approach documented
- [mem0 Node SDK docs](https://docs.mem0.ai/open-source/node-quickstart) — OpenAI default confirmed, no local option documented
- [clack/prompts npm](https://www.npmjs.com/package/@clack/prompts) — v1.2.0 latest (CLI already uses ^0.10.0)
- [npx bin field pattern](https://docs.npmjs.com/cli/v11/commands/npx/) — official npm docs

---

*Stack research for: Nexus v1.5 Smart Onboarding + Personal AI Assistant*
*Researched: 2026-04-02*
*Prior milestone stack research (fork maintenance): see the STACK.md revision dated 2026-03-30. This file was overwritten; the fork maintenance content is preserved in git history.*