nexus/.planning/research/STACK.md
2026-04-04 03:55:49 +00:00

Technology Stack: v1.5 Smart Onboarding + Personal AI Assistant

Project: Nexus v1.5, additive to the existing fork maintenance stack (see prior milestone research for branding/fork strategy)
Researched: 2026-04-02
Scope: NEW libraries only (Puter.js, hardware detection, Whisper STT + Piper TTS, OAuth, npx buildthis CLI, persistent memory)
Confidence: MEDIUM-HIGH (most verified via official docs; a few version numbers from npm search only)


Existing Stack (Do Not Change)

The following are already installed and working. Zero changes needed:

| Area | What's There | Location |
|---|---|---|
| CLI framework | Commander.js ^13.1.0 + @clack/prompts ^0.10.0 | cli/package.json |
| Hardware/Ollama | Custom detection (v1.4) + systeminformation likely via existing adapter | packages/adapters/hermes |
| Server auth | better-auth 1.4.18 | server/package.json |
| UI | React 19, Vite 6, Tailwind v4, TanStack Query v5 | ui/package.json |
| DB | LibSQL/Drizzle ORM | server/package.json |

New Libraries by Feature Area

1. Puter.js — Zero-Config Cloud AI

Package: @heyputer/puter.js
Version: latest (no stable semver pinned on npm; use @latest and lock in pnpm-lock)
Where it lives: ui/ only (Puter.js is a frontend-first browser SDK)

Why: 500+ models (GPT-4o, Claude, Gemini, Grok, DeepSeek) with zero API keys and zero developer billing. Users authenticate with their own Puter account; usage cost falls on the user, not the developer. This is the project's "zero-config cloud" tier — the entire value prop depends on this library.

How the API works:

// Browser only: load via script tag or a bundler import
import puter from "@heyputer/puter.js";

// Chat (streaming)
const stream = await puter.ai.chat("Hello", {
  model: "gpt-4o",
  stream: true,
});
let reply = "";
for await (const part of stream) {
  // Accumulate each streamed chunk; process.stdout does not exist in the browser
  reply += part?.text ?? "";
}

// Image generation, TTS, STT also available under puter.ai.*

Integration point: New PuterAdapter in packages/adapters/ following the existing adapter pattern. The adapter wraps puter.ai.chat() and maps to the shared AdapterMessage type. Keep it display-layer only — no server-side Puter calls.
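As a minimal sketch of what the adapter's core could look like, the helper below drains a streamed chat response into one message. The `AdapterMessage` shape shown here is hypothetical (the shared type is not reproduced in this document), and `collectStream` and `StreamPart` are illustrative names, not part of Puter.js. The stream is passed in as a plain `AsyncIterable`, so the helper can be unit-tested without a browser or a Puter account.

```typescript
// Hypothetical shape: the real AdapterMessage lives in the shared adapter package.
interface AdapterMessage {
  role: "assistant";
  content: string;
}

// Minimal view of a streamed chunk: only the `text` field used in the example above.
interface StreamPart {
  text?: string;
}

// Drains a streamed chat response into a single AdapterMessage.
// `onDelta` lets the UI render partial text while the stream is still open.
async function collectStream(
  stream: AsyncIterable<StreamPart | undefined>,
  onDelta?: (delta: string) => void,
): Promise<AdapterMessage> {
  let content = "";
  for await (const part of stream) {
    const delta = part?.text ?? "";
    if (delta === "") continue; // skip empty or metadata-only chunks
    content += delta;
    onDelta?.(delta);
  }
  return { role: "assistant", content };
}
```

In the UI, the value returned by `puter.ai.chat(..., { stream: true })` would be passed as `stream`, with `onDelta` wired to a state setter.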

Constraint: Puter.js runs in browser context only. Do NOT add it to server/ or cli/. The adapter must be a frontend-only workspace package or inlined into the UI.

Confidence: HIGH — Official docs verified at developer.puter.com. User-pays model confirmed.


2. Hardware Detection — GPU, RAM, Apple Silicon

Package: systeminformation
Version: ^5.31.5 (latest stable; v6 TypeScript rewrite is in progress but not released)
Where it lives: server/ (runs on the Mac Mini; browser APIs cannot access hardware)

Why: The only comprehensive cross-platform system info library for Node.js with 20M+ monthly downloads. Covers CPU, total RAM, GPU model/VRAM, and Apple Silicon GPU core count — exactly what's needed for model recommendation. Alternatives (detect-gpu, gpu-info) are browser-only or Windows-only.

Key functions for v1.5:

import si from "systeminformation";

// Total system RAM
const mem = await si.mem(); // mem.total in bytes

// GPU info — works on macOS, Windows, Linux
const graphics = await si.graphics();
// graphics.controllers[0].vram — VRAM in MB (dedicated GPU)
// graphics.controllers[0].cores — GPU cores (Apple Silicon only)
// graphics.controllers[0].model — e.g. "Apple M4 Pro"

Apple Silicon nuance: Apple Silicon has unified memory — there is no separate VRAM. si.graphics() returns vram: 0 and populates cores with GPU core count instead. The model recommendation logic must handle this: use mem.total as effective VRAM for Apple Silicon, scaled by a configurable fraction (typically 0.75 since OS+apps compete for the same pool).
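The unified-memory rule above can be sketched as a small pure function. `HardwareSnapshot` and `effectiveVramMb` are illustrative names, and detecting Apple Silicon by matching the GPU model string against "Apple M<digit>" is an assumption that should be checked against what `si.graphics()` actually reports on the target machines.

```typescript
// Subset of the values read from si.mem() and si.graphics() that this sketch needs.
interface HardwareSnapshot {
  totalRamBytes: number; // mem.total
  vramMb: number;        // graphics.controllers[0].vram (0 on Apple Silicon)
  gpuModel: string;      // graphics.controllers[0].model
}

const BYTES_PER_MB = 1024 * 1024;

// Returns the memory budget in MB that model recommendation should treat as VRAM.
function effectiveVramMb(hw: HardwareSnapshot, unifiedFraction = 0.75): number {
  if (unifiedFraction <= 0 || unifiedFraction > 1) {
    throw new RangeError("unifiedFraction must be in (0, 1]");
  }
  // Assumption: Apple Silicon GPUs report a model such as "Apple M4 Pro".
  const isAppleSilicon = /^Apple M\d/.test(hw.gpuModel);
  if (isAppleSilicon) {
    // Unified memory: the GPU shares system RAM, so budget a fraction of total RAM.
    return Math.floor((hw.totalRamBytes / BYTES_PER_MB) * unifiedFraction);
  }
  // Discrete GPU: use the dedicated VRAM as reported.
  return hw.vramMb;
}
```

For a 16 GiB Apple Silicon machine with the default fraction, this yields a budget of 12288 MB.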

Existing usage in v1.4: Ollama detection and RAM/VRAM recommendations are already implemented. This is an additive enhancement — if systeminformation is not yet imported in the server, add it. If it is, extend the existing detection service.

Confidence: HIGH — Verified via systeminformation.io official docs. Apple Silicon behavior confirmed via GPU core detection doc.


3. Whisper STT — Speech to Text (CPU-capable)

Recommendation: smart-whisper
Version: ^0.8.1 (latest as of October 2025)
Where it lives: server/ as an optional service (graceful degradation if model not downloaded)

Why over alternatives:

  • smart-whisper: Native Node.js addon wrapping whisper.cpp directly. Supports loading one model for parallel inferences. Auto-enables Apple Neural Engine acceleration on macOS. Pre-built binaries for macOS arm64 (Mac Mini M4).
  • nodejs-whisper (v0.2.9, 10 months old): Older, CPU-focused, spawns a subprocess. Works but slower and less maintained.
  • whisper-node (v1.1.1, 2 years old): Abandoned.

Model recommendation for Mac Mini M4:

  • base.en model (~140MB) — good balance of speed/accuracy for English voice input
  • small.en model (~460MB) — better accuracy if user has RAM to spare
  • Models download lazily on first voice use; onboarding should gate voice on model availability

Integration pattern:

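import { Whisper, manager } from "smart-whisper";

// Per the smart-whisper README (verify against ^0.8.1): the constructor takes a model file path
await manager.download("base.en"); // fetch the model through the bundled model manager
const whisper = new Whisper(manager.resolve("base.en"), { gpu: true });
// audioBuffer must be 16 kHz mono PCM as a Float32Array; transcribe() yields a task
const task = await whisper.transcribe(audioBuffer, { language: "en" });
const transcript = (await task.result).map((segment) => segment.text).join(" ");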

Server endpoint: Add POST /api/voice/transcribe that accepts audio blob (WAV/WebM from browser MediaRecorder), returns transcript JSON. The existing v1.3 voice input uses browser-side Web Speech API as a fallback — this is the local/offline upgrade path.
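One step the endpoint will likely need is sample-format conversion: whisper.cpp operates on 16 kHz mono 32-bit float samples, while decoded WAV data is usually signed 16-bit PCM. The sketch below covers only that conversion and assumes the audio has already been decoded and resampled to 16 kHz mono (container decoding for WAV/WebM is out of scope here). `pcm16ToFloat32` is an illustrative name.

```typescript
// Converts little-endian signed 16-bit PCM bytes into floats in the range [-1, 1).
function pcm16ToFloat32(pcm: Uint8Array): Float32Array {
  // DataView reads correctly even if the bytes start at an odd offset in a larger buffer.
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const sampleCount = Math.floor(pcm.byteLength / 2); // ignore a trailing odd byte
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // 32768 is the magnitude of the most negative int16 value, so -32768 maps to -1.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```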

Confidence: MEDIUM — Package verified on npm and GitHub. Version from GitHub releases page. Apple Silicon acceleration confirmed in README. No production deployment data for this specific version.


4. Piper TTS — Text to Speech (CPU-capable)

Recommendation: Spawn the piper binary via child_process; do NOT use a Node.js wrapper library
Why: No mature, production-ready Node.js binding for Piper TTS exists as of April 2026. The @mintplex-labs/piper-tts-web package is browser-only. ONNX-based implementations exist in Python (piper-onnx) and partially in JavaScript for Bun, but none are packaged for Node.js production use.

Approach:

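import { spawn } from "node:child_process";
import os from "node:os";
import path from "node:path";

// piper binary downloaded to ~/.paperclip/voice/piper
// voice model downloaded to ~/.paperclip/voice/models/
const PIPER_BIN = path.join(os.homedir(), ".paperclip", "voice", "piper");

// Resolves with the raw PCM audio that piper writes to stdout.
function synthesize(text: string, modelPath: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const proc = spawn(PIPER_BIN, ["--model", modelPath, "--output-raw"]);
    const chunks: Buffer[] = [];
    proc.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    // Reject if the binary is missing or not executable.
    proc.on("error", reject);
    // Settle only after the process exits so a non-zero exit code is surfaced.
    proc.on("close", (code) =>
      code === 0
        ? resolve(Buffer.concat(chunks))
        : reject(new Error(`piper exited with code ${code}`)),
    );
    proc.stdin.write(text);
    proc.stdin.end();
  });
}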
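Raw PCM has no header, so a browser `<audio>` element cannot play it directly. One option, sketched below, is to prepend a 44-byte WAV header on the server before responding. The 22050 Hz default matches the sample rate used in Piper's own README example for raw playback; it is an assumption here, and the authoritative value is the sample rate in the voice model's accompanying JSON config. `pcmToWav` is an illustrative name.

```typescript
import { Buffer } from "node:buffer";

// Wraps signed 16-bit little-endian PCM in a minimal RIFF/WAVE container.
function pcmToWav(pcm: Buffer, sampleRate = 22050, channels = 1): Buffer {
  const bytesPerSample = 2; // 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4);          // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);                      // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);                       // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bytesPerSample * 8, 34);      // bits per sample
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);              // size of the sample data
  return Buffer.concat([header, pcm]);
}
```

The alternative is to send raw PCM and decode it client-side with the Web Audio API, which avoids the header but requires the client to know the sample rate out of band.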

Alternative for pure-JS TTS (fallback/cloud): The browser's window.speechSynthesis API covers the cloud and basic local cases without any server dependency. Use Web Speech API as the default TTS tier; offer Piper as an optional "high-quality offline voice" that the user must enable explicitly.

Piper binary distribution: During onboarding, detect if piper binary exists at ~/.paperclip/voice/piper. If not, show download prompt. Use https://github.com/rhasspy/piper/releases to fetch the macOS arm64 binary. Store in ~/.paperclip/ (Nexus never renames this dir per PROJECT.md constraints).
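The detection step could be sketched as follows. `piperBinaryPath` and `isPiperInstalled` are illustrative names; the path mirrors the one stated above. Checking for execute permission (rather than mere existence) catches the case where a download succeeded but the file was never marked executable.

```typescript
import { accessSync, constants } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Location of the piper binary as described above: ~/.paperclip/voice/piper
function piperBinaryPath(home: string = homedir()): string {
  return join(home, ".paperclip", "voice", "piper");
}

// True only if the file exists and the current user may execute it.
function isPiperInstalled(binPath: string = piperBinaryPath()): boolean {
  try {
    accessSync(binPath, constants.X_OK);
    return true;
  } catch {
    return false; // missing file or missing execute permission
  }
}
```

Onboarding would call `isPiperInstalled()` and show the download prompt when it returns false.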

Recommended voice model for Mac Mini M4: en_US-lessac-medium (~63MB) — good quality, fast on Apple Silicon.

Confidence: MEDIUM — Based on official Piper GitHub + community blog posts (Bun runtime example). Subprocess approach is the proven path. ONNX-native Node.js path is theoretically possible but no maintained package exists.


5. OAuth Flows — Google Gemini + OpenAI Free Tiers

Recommendation: openid-client v6
Version: ^6.8.2 (latest stable, complete v6 API rewrite)
Where it lives: server/ (OAuth flows run server-side with PKCE)

Why openid-client over passport.js:

  • Passport.js adds middleware abstraction that conflicts with Nexus's existing better-auth setup (already in server/package.json)
  • openid-client v6 is a certified OAuth 2/OIDC client that handles PKCE natively without middleware
  • Works alongside better-auth — openid-client handles the provider OAuth dance; better-auth handles the Nexus session

What it provides:

  • Authorization Code Flow with PKCE (required by OAuth 2.1)
  • Discovery via .well-known/openid-configuration — works for both Google and any OpenAI-compatible provider
  • Token refresh, revocation, introspection

Integration pattern:

import * as client from "openid-client";

// Google discovery
const googleConfig = await client.discovery(
  new URL("https://accounts.google.com"),
  process.env.GOOGLE_CLIENT_ID!,
  process.env.GOOGLE_CLIENT_SECRET!
);

// Generate PKCE challenge
const codeVerifier = client.randomPKCECodeVerifier();
const codeChallenge = await client.calculatePKCECodeChallenge(codeVerifier);
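For reviewers unfamiliar with PKCE, the challenge is defined in RFC 7636 as the base64url-encoded SHA-256 digest of the verifier (the S256 method). The sketch below reproduces that computation with node:crypto purely for illustration; the function names are hypothetical, and production code should keep using the openid-client helpers shown above.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes encode to a 43-character base64url string,
// the minimum verifier length allowed by RFC 7636.
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method: BASE64URL(SHA256(ASCII(code_verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

The verifier stays on the server (tied to the user's pending login), the challenge goes into the authorization URL, and the verifier is sent only in the token exchange at the callback.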

Note on "zero sign-up": Puter.js handles the zero-API-key tier. OAuth is the tier above that — where users already have Google/OpenAI accounts and want to connect them. Keep these separate in the onboarding UI: Puter tier requires zero setup; OAuth tier shows "Connect your Google account" CTA.

Server routes to add:

  • GET /api/oauth/google/start — initiate flow, return redirect URL
  • GET /api/oauth/google/callback — exchange code for tokens, store encrypted
  • Same pattern for OpenAI when their OAuth flow is stable
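The callback route stores tokens encrypted, but this document does not specify a scheme. One possible approach, sketched here as an assumption, is AES-256-GCM from node:crypto with a random 12-byte IV per token. `encryptToken` and `decryptToken` are illustrative names, and key management (where the 32-byte key comes from) is deliberately left out.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // recommended IV size for GCM
const TAG_BYTES = 16; // default GCM authentication tag size

// Returns base64(iv || tag || ciphertext). `key` must be exactly 32 bytes.
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Throws if the key is wrong or the stored value was modified.
function decryptToken(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM is chosen in this sketch because it authenticates as well as encrypts, so a corrupted or tampered row fails loudly at decrypt time instead of yielding a garbage token.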

Confidence: MEDIUM — openid-client v6 verified via GitHub and npm. Google OIDC integration confirmed. OpenAI's free tier OAuth specifics are LOW confidence (their free tier structure changes frequently).


6. npx buildthis — CLI Bootstrapper

No new library needed. The package structure is a standard npm pattern.

What to build: A new npm package buildthis (or scoped @nexus/buildthis) published to npm. When run via npx buildthis, it:

  1. Detects if Nexus server is running locally (localhost:4000 or configured port)
  2. If yes: opens browser to onboarding URL
  3. If no: guides user through one-command install (Docker or native)
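The three steps above can be sketched with Node.js built-ins only. `isPortOpen`, `decideNextStep`, and the `/onboarding` URL path are hypothetical names chosen for illustration. Note that an open port only shows that something is listening; a real implementation would likely follow up with a request to a Nexus-specific endpoint before opening the browser.

```typescript
import * as net from "node:net";

// Resolves true if a TCP connection to host:port succeeds within the timeout.
function isPortOpen(port: number, host = "127.0.0.1", timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open: boolean): void => {
      socket.destroy(); // always release the socket, whatever the outcome
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED when nothing listens
  });
}

type NextStep =
  | { kind: "open-browser"; url: string } // step 2: server found
  | { kind: "install" };                  // step 3: guide the user through install

// Step 1: probe the configured port, then pick the branch to follow.
async function decideNextStep(port = 4000): Promise<NextStep> {
  return (await isPortOpen(port))
    ? { kind: "open-browser", url: `http://localhost:${port}/onboarding` }
    : { kind: "install" };
}
```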

Package structure:

cli-bootstrapper/          # New top-level directory in the Nexus monorepo
  package.json             # name: "buildthis", bin: { "buildthis": "./dist/index.js" }
  src/
    index.ts               # #!/usr/bin/env node shebang entry
  dist/                    # bundled by esbuild (same config as existing CLI)

package.json bin field:

{
  "name": "buildthis",
  "version": "0.1.0",
  "bin": {
    "buildthis": "./dist/index.js"
  },
  "files": ["dist"]
}

Key constraint: Keep buildthis dependencies minimal. On a cold cache, npx downloads and installs the package together with its full dependency tree before running it, so every dependency adds to first-run latency. Heavy dependencies (e.g. Commander.js, Inquirer) add 200-500ms to startup. Use Node.js built-ins (readline, https, child_process) wherever possible. Acceptable: @clack/prompts (already a project dependency, ~20KB).

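Existing CLI packages already use Commander.js ^13.1.0, @clack/prompts ^0.10.0, and picocolors. If buildthis does need a dependency, prefer one of these, since they are already in the project's lockfile; Commander.js remains subject to the startup-cost constraint above.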

Confidence: HIGH — npx bin-field pattern is official Node.js documentation. No novel library choices required.


7. Persistent Memory — Personal AI Assistant

Recommendation: Two-layer approach — SQLite for structured memory + local vector search for semantic recall

Layer 1 — Structured facts: Use the existing LibSQL/Drizzle ORM stack. Add a memories table with columns: id, user_id, content (text), embedding (blob), created_at, source (conversation | explicit). No new DB library needed — LibSQL supports this schema.
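Storing an embedding in a blob column requires a byte encoding, which this document does not specify. A compact option, sketched here as an assumption, is to pack the vector as 32-bit floats (4 bytes per dimension). `embeddingToBlob` and `blobToEmbedding` are illustrative names. The encoding uses the platform's native byte order, which is little-endian on the targets discussed in this document.

```typescript
import { Buffer } from "node:buffer";

// Packs a vector as 32-bit floats: 4 bytes per dimension.
function embeddingToBlob(vector: number[]): Buffer {
  const floats = Float32Array.from(vector);
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Unpacks a blob written by embeddingToBlob back into a number array.
function blobToEmbedding(blob: Buffer): number[] {
  if (blob.length % 4 !== 0) {
    throw new Error("blob length must be a multiple of 4 bytes");
  }
  // Copy first: a Buffer may start at an offset that is not 4-byte aligned,
  // which Float32Array views do not allow.
  const copy = new Uint8Array(blob);
  return Array.from(new Float32Array(copy.buffer, 0, copy.byteLength / 4));
}
```

Float32 halves storage compared with float64 at a precision loss that is generally negligible for similarity search.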

Layer 2 (semantic search): vectra
Version: ^0.12.3 (last published ~1 month ago)
Where it lives: server/ as an optional memory service

Why vectra:

  • Zero infrastructure — index is a folder of JSON files on disk. Fits ~/.paperclip/memory/ perfectly.
  • Sub-millisecond lookup for small corpora (<10K items, typical personal assistant use)
  • TypeScript-native, MIT licensed
  • No cloud dependency, no server process

Embeddings for vectra: Use Ollama's nomic-embed-text model (already in the Ollama ecosystem from v1.4). This avoids any OpenAI API key dependency for the memory layer.

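import path from "node:path";
import { LocalIndex } from "vectra";
import ollama from "ollama"; // already installed via hermes adapter

const index = new LocalIndex(path.join(process.env.PAPERCLIP_HOME!, "memory"));
if (!(await index.isIndexCreated())) await index.createIndex();

// Embed text with Ollama; embed() returns one vector per input
async function embed(text: string): Promise<number[]> {
  const { embeddings } = await ollama.embed({ model: "nomic-embed-text", input: text });
  return embeddings[0];
}

// Store memory
await index.insertItem({ vector: await embed(text), metadata: { content: text, date: Date.now() } });

// Recall memories: top 5 matches (queryItems arguments differ across vectra releases; check the pinned version)
const results = await index.queryItems(await embed(query), 5);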
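Conceptually, recall ranks stored vectors by similarity to the query vector and keeps the best few. The sketch below shows that idea with cosine similarity over an in-memory array. It is an illustration of the technique, not vectra's implementation, and `MemoryItem`, `cosineSimilarity`, and `topK` are hypothetical names. A helper like this can also serve as a reference implementation in tests for the memory service.

```typescript
interface MemoryItem {
  content: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every item against the query and returns the k best, highest score first.
function topK(query: number[], items: MemoryItem[], k: number): Array<{ item: MemoryItem; score: number }> {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is adequate at the corpus sizes mentioned above (under 10K items), which is part of why a file-backed index is sufficient for this use case.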

Why NOT mem0ai: mem0ai npm package defaults to OpenAI for both the LLM and embedder. Local/offline configuration is not documented in the Node SDK (only the Python SDK supports local providers). Using it would introduce an OpenAI API key hard dependency that conflicts with the "zero-config local-first" goal.

Why NOT LangChain MemoryVectorStore: LangChain JS is 40MB+ of dependencies and would be the largest single addition to the project. For a personal assistant's memory layer, vectra + Ollama embeddings is 1/20th the footprint.

Confidence: MEDIUM — vectra verified on npm/GitHub. Ollama embeddings confirmed via ollama.com docs. mem0ai limitation confirmed via their Node SDK docs (no local LLM option documented).


Installation Summary

# server/ — add these dependencies
pnpm --filter @paperclipai/server add systeminformation openid-client vectra

# server/ — smart-whisper (optional, for local STT)
pnpm --filter @paperclipai/server add smart-whisper

# ui/ — Puter.js frontend SDK
pnpm --filter @paperclipai/ui add @heyputer/puter.js

# New package for npx bootstrapper (separate publish)
# cli-bootstrapper/package.json — no new external deps beyond @clack/prompts

Alternatives Considered

| Feature | Recommended | Alternative | Why Not |
|---|---|---|---|
| Hardware detection | systeminformation ^5.31.5 | detect-gpu | Browser-only; Node.js usage not supported |
| Hardware detection | systeminformation ^5.31.5 | gpu-info | Windows-only; no macOS/Linux support |
| STT | smart-whisper ^0.8.1 | nodejs-whisper ^0.2.9 | Subprocess-based, 10 months stale, slower on Apple Silicon |
| STT | smart-whisper ^0.8.1 | Cloud Whisper API | Requires API key; breaks offline/local-first promise |
| TTS | Piper binary via child_process | @mintplex-labs/piper-tts-web | Browser-only npm package, cannot run in Node.js server |
| TTS | Piper binary | sherpa-onnx ^1.12.34 | Supports both STT+TTS but adds 80MB binary; overkill if using smart-whisper for STT |
| OAuth | openid-client ^6.8.2 | passport-oauth2 | Adds middleware layer that conflicts with existing better-auth session handling |
| Memory | vectra ^0.12.3 + Ollama embeddings | mem0ai | Node SDK requires OpenAI; no local embedding option documented |
| Memory | vectra ^0.12.3 + Ollama embeddings | LangChain MemoryVectorStore | 40MB+ transitive dependency footprint; overkill for personal use scale |
| Zero-config cloud | @heyputer/puter.js | Direct provider SDKs | Would require managing API keys per user; Puter eliminates this entirely |

What NOT to Add

| Avoid | Why | Use Instead |
|---|---|---|
| passport.js | Conflicts with existing better-auth; adds middleware overhead | openid-client v6 (certified, no middleware) |
| langchain or llamaindex | 40-80MB dep footprint; overkill for single-user personal assistant | vectra + direct Ollama calls |
| mem0ai Node SDK | OpenAI hard dependency in Node SDK; no local embedding option | Custom memory layer: vectra + Ollama nomic-embed-text |
| @mintplex-labs/piper-tts-web | Browser-only, cannot be used in Node.js server | Piper binary subprocess |
| Any browser extension for auth | Security risk; not applicable to local app | Standard PKCE via openid-client |
| electron or tauri | PROJECT.md target is web app on Mac Mini, not desktop app | Existing Vite/Express architecture |

Version Compatibility Notes

| Package | Compatible With | Notes |
|---|---|---|
| systeminformation ^5.31.5 | Node.js >=18 | v6 is being rewritten in TS but not released; stick with v5 |
| smart-whisper ^0.8.1 | Node.js >=18, macOS arm64 | Prebuilt binaries for Apple Silicon; no compilation needed |
| openid-client ^6.8.2 | Node.js >=20 | v6 is a full rewrite; do not use v5 patterns (completely different API) |
| vectra ^0.12.3 | Node.js >=16 | File-based; no native addons, no compilation |
| @heyputer/puter.js | Browser (Vite/ESM) | Not for Node.js server use |

Integration Architecture

Browser (UI)                    Server (Express)
─────────────────               ────────────────────────────────
@heyputer/puter.js  ──────────→ No server proxy needed
                                (Puter calls go direct to puter.com)

React voice input ──────────→  POST /api/voice/transcribe
                                  └── smart-whisper (local STT)
                                      └── ~140MB model file in ~/.paperclip/voice/

GET /api/system/hardware  ←────  systeminformation
                                  └── GPU cores, total RAM, GPU model

React onboarding OAuth ────────→ GET /api/oauth/google/start
                                  └── openid-client PKCE flow
                                  └── GET /api/oauth/google/callback

Personal assistant chat ───────→ POST /api/assistant/chat
                                  └── vectra recall (nomic-embed-text via Ollama)
                                  └── context injection → selected AI provider

TTS response ──────────────────→ POST /api/voice/synthesize
                                  └── piper binary subprocess
                                  └── returns raw PCM → browser Audio API

Stack research for: Nexus v1.5 Smart Onboarding + Personal AI Assistant
Researched: 2026-04-02
Prior milestone stack research (fork maintenance): see the STACK.md entry dated 2026-03-30. This file was overwritten, so the fork maintenance content is preserved only in git history.