2026-04-04 03:55:49 +00:00

# Domain Pitfalls — Nexus Fork of Paperclip
**Domain:** Forked open-source project with display-layer renames, no i18n layer
**Researched:** 2026-04-02 (updated for v1.5 milestone: smart onboarding, multi-provider, voice TTS, persistent memory, assistant mode, `npx buildthis`)
**Confidence:** HIGH — based on direct codebase analysis of `/opt/nexus/` plus targeted research on each new integration domain
---
## About This Document
This file covers pitfalls for the **v1.5 milestone additions**. The original pitfalls (Pitfalls 1–11) covering fork hygiene, display-layer rename discipline, and upstream sync remain valid and are preserved below. Pitfalls 12–26 are new for v1.5.
---
## Critical Pitfalls (Fork Hygiene, v1.0–1.4, still active)
---
### Pitfall 1: Renaming a Code Identifier That Is Also a Stored DB Value
**What goes wrong:** You rename a TypeScript constant, CLI command, or function to use the new Nexus vocabulary, not realising the same string is also stored as a literal value in database rows. The app breaks for every existing installation. The server now compares `approval.type` against the renamed value, but existing rows still store `"hire_agent"`, so those approvals never match.
**Why it happens:** In Paperclip the same string serves double duty: it is both a TypeScript constant/enum and a persisted DB value. The CONCERNS.md audit identifies these dual-purpose strings explicitly: `"ceo"`, `"hire_agent"`, `"approve_ceo_strategy"`, `"bootstrap_ceo"`, `"company"` in goal levels, `"board"` in auth challenges.
**How to avoid:**
1. Treat every string in the Summary Risk Table (CONCERNS.md) marked "Critical" as immutable.
2. For display renaming only: change label maps (`AGENT_ROLE_LABELS`, `ApprovalPayload` display maps) without touching the underlying constant value.
3. Before touching any string, grep for it in `packages/db/src/schema/` and migration files.
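
The sketch below shows step 2, assuming a label map keyed by the stored role value. The `AgentRole` union and the label text are illustrative and are not copied from the Paperclip source. The point is that the persisted value `"ceo"` never changes. Only the string shown to the user does.

```typescript
// Stored values stay exactly as upstream persists them (Zone C).
// Illustrative subset; the real union lives in the Paperclip source.
type AgentRole = "ceo" | "engineer";

// Display-only mapping: safe to change without touching DB rows.
// The label text here is hypothetical Nexus vocabulary.
const AGENT_ROLE_LABELS: Record<AgentRole, string> = {
  ceo: "Project Manager",
  engineer: "Engineer",
};

// Comparisons and API payloads keep using the stored value...
function isLeadAgent(role: AgentRole): boolean {
  return role === "ceo";
}

// ...and only the render path goes through the label map.
function roleLabel(role: AgentRole): string {
  return AGENT_ROLE_LABELS[role];
}
```
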
**Warning signs:**
- Any string appearing in `packages/db/src/schema/` or migration files
- Approval, invite, and goal lists empty on existing install but work on fresh install
**Phase to address:** Phase 1 (Display Rename)
---
### Pitfall 2: Treating "Display-Only Rename" as a Simple Find-Replace
**What goes wrong:** Bulk `sed` or IDE find-replace on "company" → "workspace" across the entire codebase. Touches service files, route files, schema files, and test files indiscriminately. The next `git rebase upstream/master` has conflicts on hundreds of files.
**Why it happens:** "Display-only" is a policy decision, not a property the codebase enforces. Nothing in the TypeScript source distinguishes a user-facing label string from an internal identifier.
**How to avoid:**
1. Establish a strict three-zone taxonomy: Zone A (display strings, safe), Zone B (code identifiers, do not rename), Zone C (dual-purpose stored values, label map only).
2. Never run a global find-replace. Work file-by-file.
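
A small review-time check can enforce the taxonomy instead of leaving it as a convention. The sketch below flags changed files under the directories named in the warning signs. The prefix list comes from this document. How you obtain the changed-file list (for example, from `git diff --name-only upstream/master`) is left to the caller, and the file names in the usage note are hypothetical.

```typescript
// Directories where a rename diff almost certainly touches Zone B/C identifiers.
const PROTECTED_PREFIXES = [
  "server/src/services/",
  "server/src/routes/",
  "packages/db/",
];

// Returns the changed paths that fall inside a protected directory, in input order.
function flagProtectedPaths(changedFiles: string[]): string[] {
  return changedFiles.filter((file) =>
    PROTECTED_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
}
```

For example, `flagProtectedPaths(["ui/src/pages/Dashboard.tsx", "packages/db/src/schema/agents.ts"])` returns only the schema path, which a reviewer should then inspect by hand.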
**Warning signs:**
- PR diff touching `server/src/services/`, `server/src/routes/`, or `packages/db/` with rename changes
- Diff showing TypeScript identifier name changes (not JSX string literals)
**Phase to address:** Phase 1 (Display Rename)
---
### Pitfall 3: Diverging the Onboarding Assets Directory Name From Upstream
**What goes wrong:** Renaming `server/src/onboarding-assets/ceo/` to `pm/`. Upstream changes a file inside `ceo/` in a future commit. Git cannot reconcile rename-on-one-side with content-edit-on-other.
**How to avoid:** Do not rename the `ceo/` directory. Change file *content* only. The directory path is Zone B.
**Warning signs:** Rebase conflict shows a file as "deleted" that you expected to be "modified."
**Phase to address:** Phase 1 (Onboarding Redesign)
---
### Pitfall 4: Changing the `localStorage` Key or `~/.paperclip` Config Path Without a Migration
**What goes wrong:** Renaming `"paperclip.selectedCompanyId"` localStorage key or `~/.paperclip` config path drops all existing state.
**How to avoid:** Keep key names unchanged OR implement a read-both-paths fallback that migrates existing values on boot before deleting the old key.
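
A read-both-paths migration for the `localStorage` case could look like this sketch. `NEW_KEY` is a hypothetical name and is only relevant if you decide to rename at all. In the browser, `window.localStorage` satisfies the small interface used here. The new key is written before the old one is deleted, so an interrupted migration never loses the value.

```typescript
// Minimal subset of the Web Storage API so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const LEGACY_KEY = "paperclip.selectedCompanyId";
const NEW_KEY = "nexus.selectedCompanyId"; // hypothetical

// Call once on boot. Returns the selected company id, or null on a fresh install.
function migrateSelectedCompanyId(store: KeyValueStore): string | null {
  const current = store.getItem(NEW_KEY);
  if (current !== null) {
    store.removeItem(LEGACY_KEY); // already migrated; clean up any leftover legacy key
    return current;
  }

  const legacy = store.getItem(LEGACY_KEY);
  if (legacy === null) return null; // nothing to migrate

  store.setItem(NEW_KEY, legacy); // write the new key first...
  store.removeItem(LEGACY_KEY); // ...then delete the old one
  return legacy;
}
```
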
**Warning signs:** Server logs "no config found, starting fresh" on a machine with existing data.
**Phase to address:** Phase 2 (Directory Restructure)
---
### Pitfall 5: Upstream Rebase Cadence Slipping Below Weekly
**What goes wrong:** Fork drift. Upstream has 120+ commits since fork. Waiting accumulates compound conflicts. A 10-minute weekly rebase becomes 4 hours after a month gap.
**How to avoid:** Rebase at minimum weekly. `[nexus]` commit prefix strictly enforced. CI alert on `git rebase upstream/master` failures in a test branch.
**Warning signs:** Last rebase more than 2 weeks ago; `git log --oneline HEAD..upstream/master` shows more than 20 upstream commits not yet in the fork.
**Phase to address:** Ongoing from Phase 1
---
### Pitfall 6: Renaming the CLI Binary Name Without a Shim
**What goes wrong:** Renaming to `nexus` without updating all four locations where `paperclipai` appears as an instructional string.
**How to avoid:** Add `nexus` as an alias; keep `paperclipai` binary working. If renaming, atomic commit covering all instructional copy.
**Phase to address:** Phase 1 (CLI String Updates)
---
### Pitfall 7: Partial Rename — Changing Some Occurrences But Not All
**What goes wrong:** "CEO" renamed in 8 of 12 files. Users see mixed vocabulary.
**How to avoid:** Post-rename `grep -ri "CEO" ui/src cli/src server/src` and verify every remaining occurrence is Zone B/C or non-user-visible.
**Phase to address:** Phase 1 (Display Rename)
---
### Pitfall 8: The `[nexus]` Commit Prefix Not Applied Consistently From the Start
**What goes wrong:** Without consistent prefixing, rebase archaeology becomes necessary to identify which commits are Nexus vs. upstream.
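**How to avoid:** Install a `commit-msg` hook before the first commit that rejects messages not starting with `[nexus]`. A `pre-commit` hook cannot enforce this because it runs before the commit message exists.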
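
Here is a minimal sketch of such a hook. It assumes the hook is invoked from `.git/hooks/commit-msg` (or a hook manager) through a TypeScript runner such as `tsx`. Git passes the path of the message file as the first argument. The pass-through patterns for merge and fixup commits are an assumption about your workflow.

```typescript
import { readFileSync } from "node:fs";

const PREFIX = "[nexus]";

// Git-generated subjects that should pass untouched (assumed workflow).
const PASSTHROUGH = [/^Merge /, /^fixup! /, /^squash! /];

function isValidCommitMessage(message: string): boolean {
  // First non-empty line, skipping "#" comment lines in case they are still present.
  const subject = message
    .split("\n")
    .find((line) => line.trim() !== "" && !line.startsWith("#"));
  if (subject === undefined) return false;
  return subject.startsWith(PREFIX) || PASSTHROUGH.some((re) => re.test(subject));
}

// commit-msg receives the path of the file holding the proposed message.
const messageFile = process.argv[2];
if (messageFile !== undefined) {
  if (!isValidCommitMessage(readFileSync(messageFile, "utf8"))) {
    console.error(`commit-msg: subject must start with "${PREFIX}"`);
    process.exit(1);
  }
}
```
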
**Phase to address:** Phase 1 (First commit)
---
### Pitfall 9: Onboarding Redesign Coupled to the Corporate Metaphor in Data Layer
**What goes wrong:** New wizard does not pass a company name; `POST /api/companies` requires it. Company created with undefined name.
**How to avoid:** Document API contract before redesigning wizard. Derive workspace name from directory basename (or VOCAB.appName as fallback — which `NexusOnboardingWizard.tsx` already does correctly).
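
The sketch below shows the naming rule, so the wizard can never send an empty or undefined name. The function name and parameters are hypothetical. `fallbackName` stands in for `VOCAB.appName` and is passed in to keep the sketch standalone.

```typescript
import { basename, resolve } from "node:path";

// Always returns a non-empty workspace name for POST /api/companies.
function deriveWorkspaceName(projectDir: string | undefined, fallbackName: string): string {
  if (projectDir) {
    // resolve() normalizes trailing slashes; basename("/") is "", so fall through then.
    const name = basename(resolve(projectDir)).trim();
    if (name.length > 0) return name;
  }
  return fallbackName;
}
```
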
**Phase to address:** Phase 2 (Onboarding Redesign)
---
### Pitfall 10: Forgetting to Update Tests That Assert on Display Strings
**What goes wrong:** `invite-onboarding-text.test.ts` asserts invite text contains "CEO." After rename, tests fail.
**How to avoid:** Before any rename commit, grep all `*.test.ts` files for old vocabulary terms and update in the same commit.
**Phase to address:** Phase 1 (Display Rename)
---
### Pitfall 11: Exporting a `.nexus.yaml` File While Upstream Exports `.paperclip.yaml`
**What goes wrong:** Breaking import compatibility with upstream Paperclip instances.
**How to avoid:** Keep emitting `.paperclip.yaml`. The filename and schema header are Zone B/C.
**Phase to address:** Phase 1 (Display Rename)
---
## Critical Pitfalls (v1.5 New Features)
---
### Pitfall 12: Vite Alias Swap Breaking Upstream Rebase on OnboardingWizard
**What goes wrong:** The current pattern aliases `src/components/OnboardingWizard` → `NexusOnboardingWizard` at build time via `vite.config.ts`. If upstream renames, moves, or splits `OnboardingWizard.tsx` into multiple files, the alias silently stops matching. The build still succeeds because the alias target `NexusOnboardingWizard.tsx` still exists. However, any code path that imports the wizard by its new name or path bypasses the alias and loads the upstream component instead of the Nexus one.
More critically: when v1.5 replaces the simple wizard with a multi-step hardware-detection wizard, the alias target `NexusOnboardingWizard.tsx` grows significantly. Upstream may add new features to `OnboardingWizard.tsx` (new props, context dependencies) that `NexusOnboardingWizard.tsx` silently misses, since it fully replaces rather than extends the upstream file.
**Why it happens:** Full file replacement via Vite alias means no inheritance from upstream. Every upstream improvement to the wizard is silently discarded.
**How to avoid:**
1. After each upstream rebase, diff `OnboardingWizard.tsx` against the previous upstream version: `git diff upstream-prev..upstream-new -- ui/src/components/OnboardingWizard.tsx`. If upstream adds new props or context hooks, integrate them into `NexusOnboardingWizard.tsx`.
2. Keep `NexusOnboardingWizard.tsx` surface API identical to `OnboardingWizard.tsx` (same component name export, same props interface as far as upstream is concerned).
3. Add a CI check that the aliased-away file still exists *and* still has its expected export. `test -f ui/src/components/OnboardingWizard.tsx` covers only existence; the export needs a content check.
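
The sketch below covers both halves of that check. The file path comes from this document. The export pattern assumes upstream uses a named export `OnboardingWizard` declared with `function` or `const`, so adjust it to whatever the upstream file actually exports. `runCheck()` is meant to be called from a CI script.

```typescript
import { existsSync, readFileSync } from "node:fs";

const UPSTREAM_WIZARD = "ui/src/components/OnboardingWizard.tsx";
// Assumed export shape; update if upstream changes how the component is exported.
const EXPECTED_EXPORT = /export\s+(?:function|const)\s+OnboardingWizard\b/;

// Pure check: returns a list of problems; an empty list means the alias source looks intact.
function checkAliasSource(path: string, source: string | null): string[] {
  if (source === null) return [`${path} no longer exists; the Vite alias may be stale`];
  if (!EXPECTED_EXPORT.test(source)) return [`${path} no longer exports OnboardingWizard`];
  return [];
}

// CI entry point: exits non-zero so the job fails when the alias source drifts.
function runCheck(): void {
  const source = existsSync(UPSTREAM_WIZARD) ? readFileSync(UPSTREAM_WIZARD, "utf8") : null;
  const problems = checkAliasSource(UPSTREAM_WIZARD, source);
  for (const problem of problems) console.error(problem);
  if (problems.length > 0) process.exit(1);
}
```
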
**Warning signs:**
- `NexusOnboardingWizard.tsx` not using a `DialogContext` or `CompanyContext` hook that upstream's version uses
- After rebase, `pnpm dev` fails with "cannot find module" for the alias source path
- The multi-step wizard is missing features that upstream added (e.g., invite-based onboarding, workspace templates)
**Phase to address:** Phase 1 (Hardware Detection Wizard) — before building the multi-step v1.5 wizard, establish a diff-and-integrate protocol for this alias.
---
### Pitfall 13: Hardware Detection Returning Inaccurate or Platform-Specific Values
**What goes wrong:** The v1.5 hardware detection step must surface GPU/RAM to recommend Ollama models. Two platform-specific traps exist on the Mac Mini M4 deploy target:
1. **VRAM is not VRAM on Apple Silicon.** The M4 uses unified memory — the same physical RAM serves both CPU and GPU. `os.totalmem()` in Node.js returns total unified memory. Reporting this as "VRAM available for Ollama" misleads: Ollama on Apple Silicon uses a portion of unified memory, but the OS, browser, and other processes also consume it. Treating `totalmem × 0.75` as GPU-available VRAM overestimates for models that also need system RAM headroom.
2. **`os.totalmem()` reads total installed RAM, not available RAM.** The existing `getRecommendedModel()` in `server/src/services/ollama.ts` already applies a 0.75 multiplier to account for OS overhead, but it uses total RAM, not free RAM. If the system is under load (Paperclip server + Ollama already running), available RAM is far lower than 75% of total.
**Why it happens:** The Node.js `os` module has `totalmem()` and `freemem()` but no VRAM API. In the browser, WebGL's `WEBGL_debug_renderer_info` extension (`UNMASKED_RENDERER_WEBGL`) gives the GPU name but not its VRAM size, and browsers do not expose VRAM size to pages. Developers reach for the most accessible number.
**How to avoid:**
1. Use `os.freemem()` (not `totalmem()`) as the baseline for available-RAM recommendations when Ollama is already running.
2. On Apple Silicon, explicitly document in UI copy that "available memory" is unified memory shared with OS, not dedicated GPU VRAM.
3. Treat hardware detection values as hints, not guarantees. Add a message: "Recommendation based on system RAM. Actual performance may vary."
4. The pre-built model catalog (`ollama-model-catalog.json`) is the right layer for model-to-RAM requirements; use it as the authoritative source rather than computing from raw hardware numbers.
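
The sketch below combines steps 1 and 4. It picks from a catalog using current free memory minus a headroom margin, rather than computing a model size from raw hardware. The `CatalogModel` shape is hypothetical; the real data would come from `ollama-model-catalog.json`. The 2 GiB headroom is an arbitrary placeholder. `freemem()` is a point-in-time reading and can be conservative on macOS, which is another reason to present the result as a hint.

```typescript
import { freemem, totalmem } from "node:os";

// Hypothetical catalog entry shape.
interface CatalogModel {
  name: string;
  minRamBytes: number; // memory the model needs to load and run, per the catalog
}

const GiB = 1024 ** 3;

// Largest catalog model that fits in free memory after reserving headroom, or null.
function recommendModel(
  catalog: CatalogModel[],
  freeBytes: number = freemem(),
  headroomBytes: number = 2 * GiB, // placeholder margin for OS, browser, and server
): CatalogModel | null {
  const budget = freeBytes - headroomBytes;
  const fitting = catalog
    .filter((m) => m.minRamBytes <= budget)
    .sort((a, b) => b.minRamBytes - a.minRamBytes);
  return fitting[0] ?? null;
}

// Both numbers for the UI, which on Apple Silicon should call this shared unified memory.
function memorySummary(): { totalGiB: number; freeGiB: number } {
  return { totalGiB: totalmem() / GiB, freeGiB: freemem() / GiB };
}
```
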
**Warning signs:**
- Model recommendation shows "fits in memory" but Ollama OOM-kills it at load time
- M4 Mac Mini reports 16GB available for models but the system has 16GB total (OS needs 4–6GB)
- AMD GPU users see wildly incorrect VRAM numbers (confirmed bug in Ollama's VRAM detection for AMD/Vulkan as of 2025)
**Phase to address:** Phase 1 (Hardware Detection) — define detection methodology before building the UI layer.
---
### Pitfall 14: The Onboarding Probe Running at the Wrong Authentication Level
**What goes wrong:** The existing adapter probe endpoint (`GET /adapters/:type/probe`) requires board authentication: it rejects any request where `req.actor.type !== "board"`. The v1.5 onboarding wizard runs *during* first-time setup, before the user has authenticated. If the probe is called before board auth is established, every probe returns 403, the wizard always falls back to `claude_local`, and the user never gets the Hermes auto-detection benefit.
This is the exact scenario the current `NexusOnboardingWizard.tsx` is vulnerable to: it calls `agentsApi.probeAdapter("hermes_local")` on wizard open, but if the user arrives at the onboarding page without board auth (fresh install, incognito session), the probe silently fails and `defaultAdapter` stays `"claude_local"`.
**Why it happens:** Board auth is the right guard for post-setup adapter operations. But hardware detection and provider probing are legitimately pre-auth operations — you want to present the right setup path before any credentials exist.
**How to avoid:**
1. Create a separate `GET /system/providers` endpoint that does not require board auth. It returns available local providers (Ollama status, Hermes status) based purely on server-side detection (no user credentials needed).
2. Alternatively, make the probe endpoint check auth level: if no board auth exists (fresh install), allow the probe to run unauthenticated for a whitelist of safe probe types (`hermes_local`, `ollama`).
3. Never gate hardware detection on user credentials — hardware is a property of the machine, not the user session.
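
Option 2 can be written as a pure gate function that the probe route calls before doing any work. `ActorType` and `isFreshInstall` are hypothetical names. How Paperclip determines a fresh install (for example, that no board user exists yet) is an assumption that depends on its auth setup.

```typescript
// Hypothetical actor shape; the real type lives in Paperclip's auth middleware.
type ActorType = "board" | "agent" | "none";

// Probe types that only inspect the local machine and need no user credentials.
const PRE_AUTH_PROBES = new Set(["hermes_local", "ollama"]);

function canRunProbe(probeType: string, actorType: ActorType, isFreshInstall: boolean): boolean {
  if (actorType === "board") return true; // unchanged behavior for authenticated users
  // Unauthenticated callers get only whitelisted probes, and only before setup completes.
  return isFreshInstall && PRE_AUTH_PROBES.has(probeType);
}
```

Tying the exemption to the fresh-install state means the whitelist closes once a board user exists, so the endpoint does not stay open after setup.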
**Warning signs:**
- Browser network tab shows 403 on the probe call during onboarding
- `defaultAdapter` in the wizard is always `"claude_local"` even when Ollama/Hermes are running
- Probe works in the settings page (user is auth'd) but not during initial onboarding
**Phase to address:** Phase 1 (Hardware Detection) — the probe auth story must be designed before the multi-step wizard is built.
---
### Pitfall 15: Puter.js "Zero-Config" Promise Breaking on Paperclip's Server-Side Architecture
**What goes wrong:** Puter.js is designed for purely browser-side use: load the CDN script, call `puter.ai.chat()`, Puter handles auth via its own popup login flow. Nexus/Paperclip proxies AI calls through the server (`/api/chat`, `/api/agents`). If Puter.js is loaded browser-side and calls Puter's servers directly, it bypasses Paperclip's cost tracking, budget enforcement, session codec, and skill sync entirely.
This creates a split-brain: the Puter adapter sends messages to Puter's cloud while Paperclip's adapter system thinks the agent is using a different provider. Cost tracking shows $0 for Puter sessions. Heartbeat and session management are not wired up.
**Why it happens:** Puter.js is documented as a CDN-loaded browser library with client-side auth. The natural integration is to `<script src="https://js.puter.com/v2/">` and call the API directly. But Paperclip's architecture requires all AI calls to go through server-side adapter machinery.
**How to avoid:**
1. Implement Puter as a server-side adapter that calls Puter's API from Node.js using HTTP (not the browser SDK). The Puter API is callable via standard HTTP — use `fetch()` on the server, not the browser SDK.
2. The server-side Puter adapter must implement the full adapter contract: `spawn`, `heartbeat`, `sessionCodec`, `configFields` (see `packages/adapters/` pattern).
3. If browser-side Puter SDK is needed for auth popup (Puter uses its own account system), implement auth as a UI-only step that retrieves a Puter token, then stores that token in Paperclip's adapter config for server-side use.
4. Confirm Puter's rate limiting behavior for server-side calls. Puter's "free unlimited" claim applies to personal/hobby use; verify terms before treating it as production-grade.
**Warning signs:**
- Puter.js loaded via `<script>` CDN tag in the app shell
- Cost tracking shows $0 for all Puter-backed agent sessions
- `puter.ai.chat()` calls appearing in browser network tab (not proxied through `/api/`)
**Phase to address:** Phase 2 (Zero-Config Cloud / Puter.js)
---
### Pitfall 16: OAuth Token Storage in `localStorage` Creating Security and Rebase Risk
**What goes wrong:** The natural place to store OAuth access tokens in an SPA is `localStorage`. But:
1. `localStorage` is accessible to any JS on the page — XSS vulnerabilities can steal tokens.
2. Paperclip already uses `localStorage` with `"paperclip.*"` prefixed keys. Any Nexus key added with `"nexus.*"` prefix will need a migration if the key name is ever changed, per Pitfall 4.
3. OAuth refresh token rotation (required for Google/OpenAI free tiers) must clear-and-rewrite the stored token on every refresh. If this fails mid-write (e.g., browser close), the user is logged out and must re-authenticate.
**Why it happens:** `localStorage` is the default that most OAuth tutorials reach for in an SPA. PKCE guidance recommends `sessionStorage` for the code verifier, but developers often put the access token itself in `localStorage`.
**How to avoid:**
1. Store OAuth tokens server-side in Paperclip's existing config/secrets mechanism (`server/src/secrets/`). The server does the OAuth exchange and stores the token; the browser never sees the raw token.
2. Use Paperclip's existing board auth cookie mechanism to gate whether the OAuth integration is enabled — do not create a separate browser-side auth session for each OAuth provider.
3. If browser-side token storage is unavoidable, use `sessionStorage` (not `localStorage`) for OAuth code verifiers; store refresh tokens server-side only.
4. For the state parameter in PKCE flow: generate a cryptographically random state with `crypto.getRandomValues()`, store in `sessionStorage`, verify on redirect.
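
The sketch below covers steps 3 and 4 on the browser side, assuming the standard PKCE S256 method from RFC 7636 and the Web Crypto API (available in browsers and as `globalThis.crypto` in Node 19+). The storage key names are hypothetical. `completePkce` returns the verifier so it can be sent to the Paperclip server, which performs the token exchange and keeps the tokens (step 1).

```typescript
// Base64url without padding, as RFC 7636 requires for the code challenge.
function base64url(bytes: Uint8Array): string {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Cryptographically random token, used for both `state` and the code verifier.
function randomToken(byteLength = 32): string {
  return base64url(crypto.getRandomValues(new Uint8Array(byteLength)));
}

// S256 code challenge: base64url(SHA-256(verifier)).
async function codeChallengeS256(verifier: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(verifier));
  return base64url(new Uint8Array(digest));
}

// Minimal subset of sessionStorage so the sketch also runs outside a browser.
interface SessionStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STATE_KEY = "nexus.oauth.state"; // hypothetical
const VERIFIER_KEY = "nexus.oauth.verifier"; // hypothetical

async function beginPkce(store: SessionStore): Promise<{ state: string; challenge: string }> {
  const state = randomToken();
  const verifier = randomToken(); // 32 bytes -> 43 chars, within RFC 7636's 43-128 range
  store.setItem(STATE_KEY, state);
  store.setItem(VERIFIER_KEY, verifier);
  return { state, challenge: await codeChallengeS256(verifier) };
}

// On redirect: check state, clear both values, and return the verifier for the server.
function completePkce(store: SessionStore, returnedState: string): string {
  const expected = store.getItem(STATE_KEY);
  const verifier = store.getItem(VERIFIER_KEY);
  store.removeItem(STATE_KEY);
  store.removeItem(VERIFIER_KEY);
  if (expected === null || verifier === null || expected !== returnedState) {
    throw new Error("OAuth state mismatch");
  }
  return verifier;
}
```
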
**Warning signs:**
- `window.localStorage.getItem("nexus.oauth.google.accessToken")` or similar in browser DevTools
- OAuth token visible in network requests from browser to Google/OpenAI APIs (not proxied through Paperclip server)
- Re-authentication required after browser restart (session not persisting correctly)
**Phase to address:** Phase 3 (OAuth Cloud Tier)
---
### Pitfall 17: Multi-Provider Onboarding Creating Multiple Competing Default Adapters
**What goes wrong:** v1.5 adds multiple provider tiers: local Ollama/Hermes, free cloud Puter.js, OAuth Google Gemini/OpenAI, and subscription detection (Claude Code, OpenClaw). If a user configures more than one provider during onboarding, the resulting agents get created with the adapter config from the onboarding summary step. But Paperclip's agent model is one-adapter-per-agent. If the wizard creates agents without being explicit about which provider wins, agents may be created with inconsistent adapter types (one with `hermes_local`, another with `puter_cloud`), creating a confusing mixed-provider workspace.
The deeper trap: the onboarding wizard currently creates exactly 2 agents (PM + Engineer) with identical adapter config. v1.5 may want different agents on different providers (e.g., assistant on Puter, PM on Hermes). This is a valid architecture but requires explicit per-agent provider selection, which the current wizard doesn't support.
**Why it happens:** Multi-provider selection UX tends to present all providers as equally valid, then requires a tie-breaking decision the wizard may not have asked the user to make.
**How to avoid:**
1. Make the onboarding wizard select ONE primary provider and create all initial agents on that provider. Secondary provider credentials can be stored for later use (configuring individual agents from the settings page).
2. If the mode selection is "Personal AI Assistant," create the assistant agent on the highest-quality available provider (subscription > OAuth > Puter > local).
3. If the mode selection is "Project Builder," create PM + Engineer on the local/privacy-first provider since these agents run autonomously and should not require cloud API credits per task.
4. Document the provider selection logic explicitly in code comments.
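
Rules 1–3 can be captured in a single function whose result every initial agent uses. The tier and mode names are hypothetical. The document does not say what Project Builder should do when no local provider exists. Returning `null` so the wizard asks, instead of silently falling back to a cloud provider, is an assumption.

```typescript
type ProviderTier = "subscription" | "oauth" | "puter" | "local";
type OnboardingMode = "assistant" | "project_builder";

// Assistant quality order, highest first (rule 2).
const ASSISTANT_PRIORITY: ProviderTier[] = ["subscription", "oauth", "puter", "local"];

// The ONE provider tier all initial agents are created on (rule 1), or null if none fits.
function selectPrimaryProvider(
  mode: OnboardingMode,
  available: ReadonlySet<ProviderTier>,
): ProviderTier | null {
  if (mode === "project_builder") {
    // Rule 3: autonomous agents stay on the local, privacy-first provider.
    // Assumption: with no local provider, return null and let the wizard ask the user.
    return available.has("local") ? "local" : null;
  }
  return ASSISTANT_PRIORITY.find((tier) => available.has(tier)) ?? null;
}
```
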
**Warning signs:**
- PM agent created with `hermes_local`, Engineer created with `puter_cloud` after the same onboarding flow
- "Recommended provider" badge in wizard applied to multiple providers simultaneously
- Users confused about which API credits are being used for which agents
**Phase to address:** Phase 1 (Mode Selection) — define the provider-per-mode rule before building the selection UI.
---
### Pitfall 18: Voice TTS (Piper) Cold Start Blocking the First Spoken Response
**What goes wrong:** Piper TTS (browser WASM implementation) downloads the voice model on the first synthesis call. This means the first time a user activates TTS, they wait 5–30 seconds for the model to download before hearing anything. Without user feedback, this appears as a hang or broken feature.
A secondary trap: the WASM Piper phonemizer does not always match the phoneme mapping expected by every Piper voice model. Using a voice model that was compiled for a different language variant (e.g., an `en_GB` model on a browser Piper instance expecting `en_US` phoneme tables) produces garbled or silent output.
**Why it happens:** Browser-based Piper TTS stores models in the Origin Private File System (OPFS). The first call triggers the download. Developers who test Piper locally after the first call never encounter the cold start because the model is already cached.
**How to avoid:**
1. Pre-warm Piper on a background thread (e.g. a Web Worker) during onboarding, after the voice step is confirmed rather than on the first message. Use a silent warmup synthesis ("...") to trigger the model download before the user expects to hear anything.
2. Show a download progress indicator on the TTS toggle. Avoid an indeterminate spinner, which suggests a short wait; use a "preparing voice model" state that shows the estimated download size.
3. Limit initial voice model choices to stable Piper models with confirmed browser WASM compatibility. Avoid offering non-English models unless specifically verified.
4. Store pre-downloaded voice models in OPFS; on subsequent loads, check `navigator.storage.getDirectory()` before re-downloading.
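
The OPFS check in step 4 can be a small async helper that asks for the file without creating it. In the browser, pass the result of `await navigator.storage.getDirectory()` (the OPFS root). Which file name the Piper library writes, and whether it uses a subdirectory, are assumptions to verify against the library you use. The name `en_US-lessac-medium.onnx` in the test is only an example.

```typescript
// Minimal slice of FileSystemDirectoryHandle used by the check.
interface DirectoryLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<unknown>;
}

// True if the voice model file already exists in OPFS, so no download is needed.
async function isVoiceModelCached(dir: DirectoryLike, fileName: string): Promise<boolean> {
  try {
    await dir.getFileHandle(fileName, { create: false });
    return true;
  } catch (err) {
    // getFileHandle rejects with a "NotFoundError" DOMException when the file is missing.
    const name = typeof err === "object" && err !== null ? (err as { name?: unknown }).name : undefined;
    if (name === "NotFoundError") return false;
    throw err; // anything else (e.g. a TypeError for an invalid name) is a real error
  }
}
```
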
**Warning signs:**
- TTS button appears responsive (toggles on) but no audio plays for 15+ seconds
- Voice model download appears in DevTools network tab on the first "speak" action
- Users reporting "the voice feature is broken" on first use but "works fine" on subsequent uses
**Phase to address:** Phase 4 (Voice TTS) — warmup strategy must be designed before the TTS toggle is wired up.
---
### Pitfall 19: Persistent Memory Injecting Sensitive Data Into System Prompts
**What goes wrong:** The Personal AI Assistant stores memories (user preferences, past conversation summaries, project context) to inject into future system prompts. Two failure modes:
1. **Prompt injection via stored memory.** If memory content is retrieved from external sources (web fetch, document import, MCP tools) and stored verbatim, malicious content in those sources gets injected into future system prompts with elevated priority. Palo Alto Unit 42 documented this attack vector in 2025: memory-poisoning allows persistent malicious instructions affecting agent behavior across sessions.
2. **Sensitive data leaking between sessions.** If the assistant stores a memory like "user's Stripe API key is sk_live_..." (from a pasted credential) and that memory surfaces in a future session with a different context (e.g., a Puter.js provider that logs requests), the credential leaks.
**Why it happens:** Memory systems treat all content as equal. The distinction between "safe user preference" and "sensitive credential that should never be persisted" is not obvious at write time.
**How to avoid:**
1. Apply rule-based filters at write time: never store content matching secret patterns (API key regexes, tokens, passwords). Use a blocklist of patterns before persisting any memory fragment.
2. Sanitize memory content before injecting into system prompts — strip any content between `<` `>` tags, backtick blocks, or content that looks like instruction syntax.
3. For MCP tool results that become memory, apply the same sanitization as user-pasted content.
4. Implement memory scoping: memories should only surface in sessions with the same mode (assistant memories should not surface in project builder sessions).
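
Steps 1 and 2 could start from something like this sketch. The patterns are illustrative and deliberately incomplete. A real blocklist must cover the key formats of every provider Nexus talks to. The sanitizer strips fenced blocks and tag-like spans only and does not attempt to detect free-form instruction text.

```typescript
// Illustrative secret patterns; extend for every provider you actually use.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk_(?:live|test)_[A-Za-z0-9]{10,}/, // Stripe-style secret keys
  /\bsk-[A-Za-z0-9_-]{20,}/, // OpenAI-style keys
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key IDs
  /\b(?:api[_-]?key|token|password|secret)\s*[:=]\s*\S+/i, // key=value credentials
];

// Write-time gate (step 1): refuse to persist a fragment that looks like it carries a secret.
function isSafeToPersist(fragment: string): boolean {
  return !SECRET_PATTERNS.some((re) => re.test(fragment));
}

// Prompt-time sanitizer (step 2): drop triple-backtick blocks and anything inside < >.
function sanitizeForPrompt(memory: string): string {
  return memory
    .replace(/`{3}[\s\S]*?`{3}/g, "") // fenced code blocks
    .replace(/<[^>]*>/g, "") // tag-like spans
    .replace(/\s+/g, " ")
    .trim();
}
```
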
**Warning signs:**
- Memory fragments containing "api_key", "token", "password", "secret" stored in the memory DB
- A stored memory from a previous session altering agent behavior in unexpected ways
- MCP tool output (e.g., fetched web page content) appearing verbatim in system prompts
**Phase to address:** Phase 5 (Persistent Memory) — memory schema must include sanitization at write time before any memory is persisted.
---
### Pitfall 20: MCP Integration Conflicting With Paperclip's Existing Tool/Skill System
**What goes wrong:** Paperclip has its own skill/tool system (`AdapterSkillSnapshot`, `AdapterSkillEntry`, `company-skills.ts`). MCP also defines tools. If an MCP server exposes a tool named `"terminal"` or `"file_read"` and Paperclip's skill system also has these (used in Hermes heartbeat prompt templates), the agent receives duplicate or conflicting tool definitions. The LLM may call the MCP version when the Paperclip version was intended, bypassing Paperclip's permission and cost tracking.
Additionally, MCP's original HTTP transport (HTTP+SSE) was deprecated in the 2025-03-26 spec revision in favor of Streamable HTTP, and the 2025-06-18 revision keeps Streamable HTTP as the standard HTTP transport. An MCP server implemented with SSE transport will need migration as MCP clients drop SSE support.
**Why it happens:** MCP tool names are unscoped: any tool named `"terminal"` is `"terminal"`. The collision with Paperclip's native tools is invisible until an agent calls the wrong one. Developers add MCP without auditing for name collisions.
**How to avoid:**
1. Use Streamable HTTP transport for the MCP server, not the legacy HTTP+SSE transport (deprecated since MCP spec 2025-03-26).
2. Prefix all Nexus-registered MCP tools with a namespace: `nexus_memory_read`, `nexus_memory_write`, `nexus_context_set`, etc.
3. Before exposing any MCP tool, check it against the list of tool names in `TOOLS.md` (Hermes skill bundle). If there is a collision, rename the MCP tool.
4. TypeScript interface pitfall: when defining `structuredContent` types for MCP tool responses, use `type` aliases not `interface` declarations — interfaces lack implicit index signatures and cause TypeScript assignment errors with `{ [key: string]: unknown }`.
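
Steps 2–4 fit in a few lines. The native tool names below are a hypothetical subset; the real list comes from `TOOLS.md` in the Hermes skill bundle. `MemoryReadResult` is an illustrative result type. It shows that a `type` alias, unlike an `interface` with the same shape, is assignable to `{ [key: string]: unknown }`.

```typescript
// Hypothetical subset of native tool names (the real list lives in TOOLS.md).
const NATIVE_TOOL_NAMES = new Set(["terminal", "file_read"]);

const NAMESPACE = "nexus_";

// Step 2: every Nexus-registered MCP tool gets the namespace prefix exactly once.
function mcpToolName(base: string): string {
  return base.startsWith(NAMESPACE) ? base : `${NAMESPACE}${base}`;
}

// Step 3: fail at registration time instead of discovering a collision at call time.
function assertNoCollision(name: string): void {
  if (NATIVE_TOOL_NAMES.has(name)) {
    throw new Error(`MCP tool "${name}" collides with a native tool`);
  }
}

// Step 4: a type alias gets an implicit index signature; an interface does not.
type MemoryReadResult = { key: string; value: string | null };

function toStructuredContent(result: MemoryReadResult): { [key: string]: unknown } {
  return result; // compiles; declaring MemoryReadResult as an interface makes this a type error
}
```
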
**Warning signs:**
- Agent calling `terminal` tool but the call is going to MCP server, not Paperclip's exec sandbox
- TypeScript compile errors: "Type 'XInterface' is not assignable to type '{ [key: string]: unknown }'"
- MCP server implemented with `sse` transport (use `streamable-http` instead)
**Phase to address:** Phase 5 (MCP Integration)
---
### Pitfall 21: `npx buildthis` Conflicting With an Existing Paperclip CLI Entry Point
**What goes wrong:** The `npx buildthis` entry point must add a new `bin` entry to the Nexus package. Paperclip's CLI already has `bin.paperclipai`. If `buildthis` is added to a package that is not published on npm under that name, `npx buildthis` will either (a) fetch an unrelated package, if the name `buildthis` is already taken on npm, or (b) fail with "package not found" because the Nexus fork is not on npm.
A secondary trap: `npx` keeps fetched packages in the npm cache. If `npx buildthis` has been run before on the same machine, it may reuse the cached older version, which lacks the latest onboarding flow.
**Why it happens:** Outside a project that already depends on the package, `npx` resolves the name against the configured registry (the public npm registry by default). If the name collides with an existing npm package, users get the wrong thing. If the package is private (Forgejo only), `npx` cannot find it by default.
**How to avoid:**
1. Before naming the CLI entry `buildthis`, search npm: `npm search buildthis` — verify there is no collision. If there is, choose `nexus-buildthis` or `@yourusername/buildthis` (scoped package).
2. Since Nexus is deployed on a Mac Mini for single-user use, `npx buildthis` likely resolves to a local package reference rather than npm. Document this explicitly, e.g. `npx --package /path/to/nexus/packages/cli buildthis`, or publish to a private registry.
3. For first-run detection: check for `~/.paperclip` (or `~/.nexus`) existence before running full onboarding; if config exists, route to the "already configured" path.
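
First-run detection can be a small function that chooses between the two flows. `StartupPath` and the function name are hypothetical. `~/.nexus` only matters if Nexus ever writes config there. The `home` and `exists` parameters are injectable so the check can be tested without touching the real home directory.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Config directories named in step 3.
const CONFIG_DIRS = [".paperclip", ".nexus"];

type StartupPath = "already_configured" | "full_onboarding";

function detectStartupPath(
  home: string = homedir(),
  exists: (path: string) => boolean = existsSync,
): StartupPath {
  const configured = CONFIG_DIRS.some((dir) => exists(join(home, dir)));
  return configured ? "already_configured" : "full_onboarding";
}
```
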
**Warning signs:**
- `npx buildthis` prints output from an unrelated npm package
- CLI help text shows incorrect version (cached from npm, not local build)
- `npm info buildthis` returns a package that is not Nexus
**Phase to address:** Phase 6 (`npx buildthis` CLI)
---
## Moderate Pitfalls (v1.5)
---
### Pitfall 22: Multi-Step Onboarding Wizard Breaking the "Every Step Skippable" Requirement
**What goes wrong:** The v1.5 onboarding has many steps: mode selection, hardware detection, local AI setup, voice, Puter.js, OAuth, subscription detection, summary, and straight-into-chat. As the wizard grows, "every step skippable" becomes hard to maintain because steps develop implicit dependencies:
- The summary step shows "selected providers" — if you skip all provider steps, the summary is empty and the wizard has no actionable result.
- The voice step configures Piper — if it's skipped, the voice feature is silently disabled without telling the user.
- OAuth setup creates credentials — if skipped after starting the OAuth popup, the popup tab is orphaned.
**Why it happens:** Step dependencies are added incrementally as each step is built. By the time all steps exist, the skip logic has edge cases that weren't anticipated.
**How to avoid:**
1. Define the "skip all" state explicitly before building any step: what does a fully-skipped onboarding produce? Answer: one workspace, one agent, Hermes or claude_local as default, no voice, no OAuth, no memory. Make this the minimum valid state.
2. Code the summary step to present a useful state even when every step is skipped.
3. Treat OAuth flows specially: if a user starts an OAuth popup (opens Google auth window) and then closes the wizard, cancel the OAuth state cleanly. Never leave orphaned OAuth state.
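
Writing the skip-all state down as data makes step 1 testable and gives step 2 something to render. In this sketch, the `OnboardingResult` shape, the agent name, and the summary copy are all hypothetical. The point is that every field has an explicit skipped value and the summary is never empty.

```typescript
// Hypothetical wizard result; every field has a defined "skipped" value.
interface OnboardingResult {
  workspaceName: string;
  agents: { name: string; adapterType: string }[];
  voiceEnabled: boolean;
  oauthProviders: string[];
  memoryEnabled: boolean;
}

// Step 1: the minimum valid state produced when every step is skipped.
function skipAllResult(
  workspaceName: string,
  defaultAdapter: "hermes_local" | "claude_local",
): OnboardingResult {
  return {
    workspaceName,
    agents: [{ name: "Assistant", adapterType: defaultAdapter }],
    voiceEnabled: false,
    oauthProviders: [],
    memoryEnabled: false,
  };
}

// Step 2: skipped features are stated explicitly instead of rendering an empty list.
function summarize(result: OnboardingResult): string[] {
  const agents = result.agents.map((a) => `${a.name} (${a.adapterType})`).join(", ");
  const oauth = result.oauthProviders.length > 0 ? result.oauthProviders.join(", ") : "none connected";
  return [
    `Workspace: ${result.workspaceName}`,
    `Agents: ${agents}`,
    `Voice: ${result.voiceEnabled ? "on" : "off (can be enabled later)"}`,
    `Cloud accounts: ${oauth}`,
    `Memory: ${result.memoryEnabled ? "on" : "off"}`,
  ];
}
```
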
**Warning signs:**
- Summary step shows empty provider list when all steps are skipped
- "Skip" button disabled on certain steps
- Closing the wizard mid-OAuth leaves the OAuth callback URL still active
**Phase to address:** Phase 1 (Mode Selection) — define the skip-all state as a test case before building any step.
---
### Pitfall 23: Assistant Mode and Project Builder Mode Sharing Conversation History
**What goes wrong:** The Personal AI Assistant has its own conversation context: user preferences, daily notes, personal projects. The Project Builder has PM + Engineer agents working on specific code issues. If both modes share the same `conversations` table without a mode discriminator, the assistant's personal context bleeds into project sessions and vice versa.
A user asking the assistant "remind me what I was working on yesterday" should not surface issues from the Project Builder's agent task queue. An agent executing a coding task should not have the user's personal assistant context injected into its system prompt.
**Why it happens:** The `conversations` table is generic. Adding a `mode` column or `agent_type` discriminator requires a DB schema change, which is out of scope for Nexus (no migrations). Without a schema change, mode separation must be achieved through metadata conventions.
**How to avoid:**
1. Since DB schema changes are out of scope, use the existing conversation metadata/tagging system (if available) to tag conversations as `assistant` vs. `agent`. Filter on this tag when fetching conversation history.
2. If no tagging system exists, use the agent's `role` field as a discriminator: conversations involving a `role: "ceo"` or `role: "engineer"` agent are project builder context; conversations with a dedicated assistant agent are personal assistant context.
3. The personal assistant agent should have a distinct `adapterType` or `name` pattern that makes it queryable as a filter.
**Warning signs:**
- Assistant surfacing agent task IDs or issue numbers when answering personal questions
- Project Builder agents including personal notes in their task context
- `conversations` table query returns mixed results from both modes
**Phase to address:** Phase 2 (Mode Selection / Assistant Mode) — define the conversation isolation strategy before creating the assistant agent.
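
A sketch of the isolation filter, assuming conversations carry an optional metadata bag and the assistant agent uses a reserved name prefix (`nexus-assistant` is a hypothetical convention, as are the `Conversation` and `AgentRef` shapes). An explicit tag wins; otherwise the owning agent decides, and anything not positively identified as the assistant is treated as project builder context.

```typescript
// Hypothetical shapes; the real Paperclip conversation/agent rows may differ.
interface AgentRef {
  id: string;
  role: string; // e.g. "ceo", "engineer", or a dedicated assistant role
  name: string;
}

interface Conversation {
  id: string;
  agentId: string;
  metadata?: Record<string, unknown>;
}

type ConversationMode = "assistant" | "agent";

// Assumed naming convention for the personal assistant agent.
const ASSISTANT_NAME_PREFIX = "nexus-assistant";

function classifyConversation(conv: Conversation, agents: Map<string, AgentRef>): ConversationMode {
  // 1. Prefer an explicit metadata tag when one exists.
  const tag = conv.metadata?.["nexusMode"];
  if (tag === "assistant") return "assistant";
  if (tag === "agent") return "agent";
  // 2. Otherwise fall back to the owning agent's identity.
  const agent = agents.get(conv.agentId);
  if (agent && agent.name.startsWith(ASSISTANT_NAME_PREFIX)) return "assistant";
  // Default: project builder context (never leak it into assistant answers).
  return "agent";
}

function conversationsForMode(
  convs: Conversation[],
  agents: AgentRef[],
  mode: ConversationMode,
): Conversation[] {
  const byId = new Map(agents.map((a) => [a.id, a] as const));
  return convs.filter((c) => classifyConversation(c, byId) === mode);
}
```

Defaulting unknown conversations to `"agent"` means a misclassification hides project history from the assistant rather than exposing it.
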
---
### Pitfall 24: Subscription/API Key Auto-Detection Creating False Positives
**What goes wrong:** The onboarding tries to auto-detect existing Hermes, Claude Code, and OpenClaw subscriptions. Each of these works differently:
- Hermes: probe the local adapter (existing `probeAdapter` endpoint)
- Claude Code: check for `~/.claude/` directory or `claude` binary in PATH
- OpenClaw: check for an OpenClaw-specific config file or env var
False positives occur when: a Claude Code config exists but the API key is expired; an OpenClaw config file exists but the subscription is cancelled; a `claude` binary exists but is the wrong version for the adapter.
Showing "Claude Code detected — ready to use" when the subscription is inactive is worse than not detecting it, because the user proceeds with a broken setup.
**Why it happens:** Presence of config files or binaries does not guarantee valid credentials or active subscriptions. The only reliable detection is making an actual API call, which has latency implications for onboarding.
**How to avoid:**
1. Distinguish between "binary/config present" (detected) and "API call succeeded" (verified). Show "detected" state immediately but show "verified" state only after a lightweight API validation call.
2. For expensive verification calls, do them in parallel with a timeout. If verification times out, show "detected but unverified" rather than "ready to use."
3. Never block onboarding progress on subscription verification. Mark unverified detections prominently and let the user proceed, then verify asynchronously.
**Warning signs:**
- Onboarding step shows "Claude Code ready" but first agent run fails with auth error
- Detection step takes more than 3 seconds (verification calls blocking UI)
- Config file present but API key revoked 6 months ago
**Phase to address:** Phase 3 (Subscription/API Key Auto-Detection)
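
A sketch of the detected/verified split, assuming each provider supplies a cheap `detect` (config file or binary present) and a lightweight `verify` API call. Both callbacks, the status names, and the 3-second default are illustrative. `Promise.allSettled` keeps one failing probe from rejecting the whole step, and a timed-out verification degrades to "detected but unverified" instead of "ready".

```typescript
type DetectionStatus = "not_found" | "detected_unverified" | "verified" | "invalid";

interface ProviderCheck {
  provider: string;
  detect: () => Promise<boolean>; // cheap local check: config file or binary present
  verify: () => Promise<boolean>; // lightweight API call: true only if credentials work
}

interface ProviderStatus {
  provider: string;
  status: DetectionStatus;
}

// Reject after `ms` so a slow verification cannot block the wizard.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("verification timed out")), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

async function checkProvider(check: ProviderCheck, timeoutMs: number): Promise<DetectionStatus> {
  if (!(await check.detect())) return "not_found";
  try {
    // An explicit `false` means the provider answered and rejected the credentials.
    return (await withTimeout(check.verify(), timeoutMs)) ? "verified" : "invalid";
  } catch {
    // Timeout or network error: present, but not confirmed to work.
    return "detected_unverified";
  }
}

// Probe every provider in parallel; a throwing detect() degrades to "not_found".
async function checkAll(checks: ProviderCheck[], timeoutMs = 3000): Promise<ProviderStatus[]> {
  const results = await Promise.allSettled(checks.map((c) => checkProvider(c, timeoutMs)));
  return results.map((r, i): ProviderStatus => ({
    provider: checks[i].provider,
    status: r.status === "fulfilled" ? r.value : "not_found",
  }));
}
```

Only `verified` should ever render as "ready to use"; `detected_unverified` lets the user proceed while verification continues in the background.
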
---
## Minor Pitfalls (v1.5)
---
### Pitfall 25: Project Handoff from Assistant Conversation Losing Context
**What goes wrong:** "Project handoff: assistant conversation → PM with context transfer" is a v1.5 requirement. The naive implementation creates a new issue in the project from the assistant conversation summary. But the handoff loses: branching context (which assistant conversation branch), attachment references (files uploaded in the assistant chat), and the interim decisions the user made during the assistant conversation.
**How to avoid:**
1. Handoff should carry: (a) conversation ID or branch ID as a reference, (b) a structured summary (not just free text), and (c) attachment IDs from the assistant conversation.
2. The PM agent receiving the handoff should be able to `GET /api/chat/conversations/{id}` to retrieve the full context if needed.
3. Do not flatten the handoff context into the issue title/description alone — preserve the conversation reference.
**Phase to address:** Phase 5 (Persistent Memory + Assistant Mode)
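
A sketch of a structured handoff payload. `ProjectHandoff`, its fields, and `renderIssueDescription` are hypothetical, mirroring the three things the handoff must carry. The issue description is rendered from the payload, but the conversation reference and attachment IDs remain first-class fields instead of being flattened into prose.

```typescript
// Hypothetical payload; field names are illustrative, not an existing Nexus type.
interface HandoffDecision {
  question: string;
  answer: string;
}

interface ProjectHandoff {
  sourceConversationId: string;
  sourceBranchId?: string;
  summary: {
    goal: string;
    decisions: HandoffDecision[];
    openQuestions: string[];
  };
  attachmentIds: string[];
}

// Human-readable issue body; the structured payload is stored alongside it, not replaced by it.
function renderIssueDescription(h: ProjectHandoff): string {
  const lines: string[] = [
    `Goal: ${h.summary.goal}`,
    "",
    "Decisions:",
    ...h.summary.decisions.map((d) => `- ${d.question}: ${d.answer}`),
  ];
  if (h.summary.openQuestions.length > 0) {
    lines.push("", "Open questions:", ...h.summary.openQuestions.map((q) => `- ${q}`));
  }
  // Pointer the PM agent can follow to retrieve the full context.
  lines.push("", `Source conversation: /api/chat/conversations/${h.sourceConversationId}`);
  if (h.sourceBranchId) lines.push(`Branch: ${h.sourceBranchId}`);
  if (h.attachmentIds.length > 0) lines.push(`Attachments: ${h.attachmentIds.join(", ")}`);
  return lines.join("\n");
}
```
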
---
### Pitfall 26: `ollama-model-catalog.json` Becoming Stale as New Models Are Released
**What goes wrong:** The pre-built model catalog (`server/src/data/ollama-model-catalog.json`) hard-codes RAM/VRAM requirements per model name. Ollama releases new model versions and new model families frequently. A user who installs a new model after the catalog was last updated gets no recommendation reason — the model is silently marked `recommended: false` with `recommendationReason: null` because it is not in the catalog.
The existing code in `getRecommendedModel()` silently skips models not in the catalog (`const entry = catalogMap.get(model.name); if (!entry) continue;`). A model installed as `llama3.3:latest` may not match a catalog entry for `llama3.3:70b-instruct-q4_K_M`.
**How to avoid:**
1. Implement a fallback heuristic: if a model is not in the catalog, estimate its RAM requirement from the `parameterSize` and `quantization` fields that Ollama already returns. For example, a 7B Q4_K_M model typically needs about 5 GB once runtime overhead is included.
2. Normalize model name matching — strip version tags and match on family+quantization pattern, not exact name string.
3. Document the catalog update process: when to update it, who owns it, and how to add new families.
**Phase to address:** Phase 1 (Hardware Detection / Model Recommendations)
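
A sketch of the fallback heuristic and name normalization, assuming the service exposes Ollama's parameter size and quantization strings (e.g. `"7.6B"`, `"Q4_K_M"`). The bits-per-weight table holds approximate values for common GGUF quantizations, and the 1 GB allowance for KV cache and runtime is an assumed constant. For a 7B Q4_K_M model: 7 × 4.85 / 8 ≈ 4.2 GB of weights, about 5.2 GB with overhead, consistent with the ~5 GB estimate in step 1.

```typescript
// Approximate bits per weight for common GGUF quantizations (assumed, rounded values).
const BITS_PER_WEIGHT: Record<string, number> = {
  Q4_0: 4.5,
  Q4_K_M: 4.85,
  Q5_K_M: 5.7,
  Q6_K: 6.6,
  Q8_0: 8.5,
  F16: 16,
};

// Parse sizes such as "7B", "7.6B" or "500M" into billions of parameters.
function parseParamsBillions(parameterSize: string): number | null {
  const m = /^([\d.]+)\s*([BM])$/i.exec(parameterSize.trim());
  if (!m) return null;
  const n = Number(m[1]);
  if (!Number.isFinite(n)) return null;
  return m[2].toUpperCase() === "M" ? n / 1000 : n;
}

// Estimated resident memory in GB: weights plus a fixed runtime allowance.
function estimateRamGb(parameterSize: string, quantization: string, overheadGb = 1): number | null {
  const params = parseParamsBillions(parameterSize);
  if (params === null) return null;
  // Unknown quantization: assume F16 so the estimate errs on the high side.
  const bpw = BITS_PER_WEIGHT[quantization.toUpperCase()] ?? 16;
  return (params * bpw) / 8 + overheadGb;
}

// "llama3.3:latest" and "llama3.3:70b-instruct-q4_K_M" both map to family "llama3.3".
function modelFamily(name: string): string {
  return name.split(":")[0].toLowerCase();
}
```

Erring high for unknown quantizations keeps the fallback from recommending a model that will not fit.
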
---
## Technical Debt Patterns
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Browser-side Puter.js SDK instead of server adapter | Faster to ship | Bypasses cost tracking, skill sync, and the session codec; splits state between browser and server | Never for production use |
| `localStorage` for OAuth tokens | Easy to implement | XSS exposure; migration required if key renamed; conflicts with upstream Paperclip keys | Never; use server-side secrets storage |
| `os.totalmem()` for RAM recommendations | One-line implementation | Overestimates available RAM on loaded systems; misleads model recommendations | Only as an upper bound checked alongside `freemem()`, which Node provides on every platform |
| Polling for hardware detection status | Avoids SSE complexity | Hammers server during onboarding; creates race conditions with slow detection | Only if SSE is unavailable |
| Inline Piper model download on first TTS call | Zero extra onboarding step | Silent hang on first use; poor UX; perceived as broken feature | Never; always pre-warm |
| Flat memory injection (all memories into every prompt) | Simple implementation | Context window overflow; irrelevant memories degrade response quality | Only for prototyping |
| No mode discriminator on conversations table | No schema change needed | Mode cross-contamination; hard to query assistant vs. agent conversations | Acceptable with explicit agent-based filtering |
---
## Integration Gotchas
| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| Puter.js | Load browser SDK, call `puter.ai.chat()` directly | Implement as server-side HTTP adapter; Puter token stored in Paperclip config |
| Piper TTS (WASM) | Call synthesis on first user message | Pre-warm on background thread during onboarding step; show download progress |
| Ollama probe | Probe at onboarding time without board auth | Use a dedicated unauthenticated `/system/providers` endpoint for pre-auth hardware detection |
| MCP tools | Add tools with generic names (`terminal`, `search`) | Namespace all MCP tools: `nexus_memory_*`, `nexus_context_*` |
| Google OAuth | Store access token in `localStorage` | Exchange code server-side; store token in Paperclip secrets; never expose to browser |
| Upstream rebase after v1.5 | Forget to diff `OnboardingWizard.tsx` against upstream | Post-rebase protocol: diff the aliased-away file, integrate any new upstream props |
| Apple Silicon VRAM | Report `os.totalmem()` as available GPU memory | Use `os.freemem()` with explicit copy: "unified memory, shared with OS" |
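
A sketch of the MCP naming audit, assuming the reserved names have already been extracted from the Hermes `TOOLS.md` bundle (that parsing is not shown). `auditToolNames` is a hypothetical helper suitable for a pre-ship CI check: it flags tools outside the `nexus_memory_*` / `nexus_context_*` namespaces, collisions with reserved names, and duplicates.

```typescript
const NEXUS_PREFIXES = ["nexus_memory_", "nexus_context_"];

interface ToolNameIssue {
  tool: string;
  problem: "missing_namespace" | "collides_with_reserved" | "duplicate";
}

// `reserved` is assumed to be the tool names already registered by other skill bundles.
function auditToolNames(nexusTools: string[], reserved: Iterable<string>): ToolNameIssue[] {
  const reservedSet = new Set(reserved);
  const seen = new Set<string>();
  const issues: ToolNameIssue[] = [];
  for (const tool of nexusTools) {
    if (!NEXUS_PREFIXES.some((p) => tool.startsWith(p))) {
      issues.push({ tool, problem: "missing_namespace" });
    }
    if (reservedSet.has(tool)) issues.push({ tool, problem: "collides_with_reserved" });
    if (seen.has(tool)) issues.push({ tool, problem: "duplicate" });
    seen.add(tool);
  }
  return issues;
}
```
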
---
## Performance Traps
| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| Sequential provider probes in onboarding | Each probe adds 3s+ to wizard load time | Probe all providers in parallel with `Promise.allSettled()` | Any multi-provider step with 3+ probes |
| Memory retrieval on every chat message | 200-500ms added to every response | Cache last N memories; only re-fetch if conversation context changes | Systems with >100 stored memory fragments |
| Piper TTS blocking main thread | UI freezes during synthesis | Run Piper WASM in a Web Worker; stream audio chunks as they generate | Voice models above `medium` quality (e.g. `high`) |
| Ollama model catalog loaded from disk on every request | File I/O on every recommendation call | Load and cache catalog at server startup, not per-request | High-frequency polling during onboarding |
| MCP tool calls in the critical path of assistant response | Latency spikes when memory server is slow | Make MCP tool calls non-blocking where possible; set aggressive timeouts | MCP server under load or starting up |
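
A sketch of the startup-cached catalog lookup. `CatalogEntry` is an assumed shape, not the actual schema of `ollama-model-catalog.json`, and `createCatalogLookup` is hypothetical. The loader runs once when the lookup is created, so per-request recommendation calls never touch the disk.

```typescript
interface CatalogEntry {
  name: string;
  minRamGb: number;
}

// Call once during server startup; the returned lookup only reads from memory.
function createCatalogLookup(load: () => CatalogEntry[]): (name: string) => CatalogEntry | undefined {
  const byName = new Map<string, CatalogEntry>();
  for (const entry of load()) byName.set(entry.name, entry);
  return (name) => byName.get(name);
}
```

At startup this might be wired as `createCatalogLookup(() => JSON.parse(readFileSync(catalogPath, "utf8")))`, with `readFileSync` from `node:fs` and `catalogPath` a placeholder.
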
---
## Security Mistakes
| Mistake | Risk | Prevention |
|---------|------|------------|
| Storing OAuth tokens in `localStorage` | XSS can steal tokens; Paperclip key collision | Server-side token storage in existing secrets mechanism |
| Persisting raw user input in memory without sanitization | Credential leakage; prompt injection across sessions | Regex-based blocklist at write time; strip instruction-like syntax |
| Unauthenticated MCP endpoint exposure | External callers invoking memory read/write | MCP server bound to `localhost` only; board auth required for all tool calls |
| Puter.js API key in browser bundle | Key exposure in DevTools | Server-side Puter adapter; no Puter credentials in browser |
| Recording audio without explicit per-session consent indicator | Privacy violation perception | Show persistent recording indicator; stop all audio tracks immediately on stop |
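
A sketch of a write-time screen for memory entries. The patterns are illustrative (common secret-key prefixes, PEM headers, and a couple of instruction-like phrases), not a complete blocklist. `screenMemoryWrite` is hypothetical, and this variant rejects the write outright rather than stripping the offending text.

```typescript
// Illustrative patterns only; a production blocklist needs broader coverage and review.
const CREDENTIAL_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{20,}/, // "sk-" style secret keys
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key IDs
  /\bgh[pousr]_[A-Za-z0-9]{36,}/, // GitHub tokens
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private keys
];

const INSTRUCTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /^\s*(system|assistant)\s*:/im, // role-prefixed lines that mimic prompt structure
];

type MemoryVerdict = { store: true } | { store: false; reason: "credential" | "instruction" };

function screenMemoryWrite(text: string): MemoryVerdict {
  if (CREDENTIAL_PATTERNS.some((re) => re.test(text))) return { store: false, reason: "credential" };
  if (INSTRUCTION_PATTERNS.some((re) => re.test(text))) return { store: false, reason: "instruction" };
  return { store: true };
}
```
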
---
## UX Pitfalls
| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| Multi-step wizard with no skip-all option | Users with existing tools feel trapped | "Skip setup" at top of wizard; minimum valid state if skipped |
| Showing all providers as equally valid | Decision paralysis; wrong choice for hardware | Pre-select the best option; others are secondary alternatives |
| TTS toggle with no download state | Appears broken; silent 15-30s wait | Pre-warm voice model; show download progress before toggle is active |
| Hardware detection with false confidence | User loads model that OOMs | Label recommendations as "estimated" not "guaranteed"; add safety margin |
| Mode selection before hardware detection | User picks "Personal AI Assistant" but their hardware can't run local models | Show hardware detection first; mode recommendation follows hardware capability |
| Summary screen with no way to change a step | User made wrong choice earlier; stuck | Every summary item links back to the relevant step |
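
A sketch of an "estimated" fit label with a safety margin, assuming free memory has already been measured and Apple Silicon is flagged as unified memory (for example via `process.platform === "darwin" && process.arch === "arm64"`). `describeFit`, the 20% margin, and the wording are illustrative.

```typescript
interface HardwareSnapshot {
  freeMemGb: number;
  unifiedMemory: boolean; // true on Apple Silicon (M-series)
}

interface FitResult {
  fits: boolean;
  label: string;
}

// Keep headroom so an estimate that is slightly low does not lead to an OOM.
function describeFit(requiredGb: number, hw: HardwareSnapshot, safetyMargin = 0.2): FitResult {
  const usableGb = hw.freeMemGb * (1 - safetyMargin);
  const memoryKind = hw.unifiedMemory ? "unified memory, shared with the OS" : "RAM";
  const fits = requiredGb <= usableGb;
  // Always "Estimated", never "guaranteed".
  const verdict = fits ? "Estimated to fit" : "Estimated not to fit";
  return {
    fits,
    label: `${verdict}: needs ~${requiredGb.toFixed(1)} GB of ${usableGb.toFixed(1)} GB usable ${memoryKind}`,
  };
}
```
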
---
## "Looks Done But Isn't" Checklist
- [ ] **Puter.js adapter:** Is it going through the server-side adapter machinery (cost tracking, heartbeat, session codec) or calling Puter's API directly from the browser?
- [ ] **Adapter probe during onboarding:** Does it work before board auth is established (fresh install) or does it silently return 403?
- [ ] **Piper TTS first use:** Has the warmup been tested on a clean browser profile with no OPFS cache?
- [ ] **Persistent memory:** Are there sanitization filters at write time preventing credential storage?
- [ ] **MCP tool names:** Have all Nexus MCP tools been checked against the Hermes `TOOLS.md` skill bundle for name collisions?
- [ ] **OAuth token storage:** Is the refresh token stored server-side? Is the browser holding only a session indicator, not the raw token?
- [ ] **Mode isolation:** Can assistant conversation history be queried without surfacing project builder agent conversations?
- [ ] **Onboarding skip:** Does skipping every step produce a usable workspace with at least one agent?
- [ ] **Apple Silicon VRAM copy:** Does the hardware detection screen say "unified memory" not "VRAM" for M-series chips?
- [ ] **`npx buildthis` package name:** Has `npm search buildthis` been run to verify no collision?
- [ ] **Upstream OnboardingWizard diff:** After the v1.5 wizard is built, has `OnboardingWizard.tsx` been diffed against upstream to check for new props that `NexusOnboardingWizard.tsx` needs to handle?
---
## Recovery Strategies
| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Puter.js browser-side integration shipped | HIGH | Rewrite as server-side adapter; migrate conversation history to route through server |
| OAuth tokens in `localStorage` shipped | HIGH | Server-side migration: on next load, detect browser-stored tokens, exchange for server-stored ones, clear localStorage |
| Persistent memory storing credentials | HIGH | Purge memory store; add retroactive scan-and-delete for credential patterns; add blocklist |
| Piper TTS no warmup (silent hang) | LOW | Add warmup call in background; show download progress indicator |
| Model catalog stale | LOW | Add fallback heuristic; document update process |
| Onboarding probe auth-gated on board auth | MEDIUM | Add unauthenticated system/providers endpoint; update wizard to use new endpoint |
| Mode contamination in conversations table | MEDIUM | Add agent-based filter to conversation queries; document the filtering convention |
---
## Pitfall-to-Phase Mapping
| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| Vite alias swap breaking upstream rebase (12) | Phase 1 — Hardware Wizard | Post-rebase diff protocol in place and documented |
| Hardware detection inaccuracy on Apple Silicon (13) | Phase 1 — Hardware Detection | Unit test: compare `totalmem()` vs `freemem()` recommendations; verify M4 copy says "unified" |
| Probe endpoint requires board auth (14) | Phase 1 — Hardware Detection | Test: call probe endpoint with no board auth cookie; should succeed |
| Puter.js bypassing adapter system (15) | Phase 2 — Zero-Config Cloud | Verify: Puter sessions appear in cost tracking with correct provider label |
| OAuth tokens in localStorage (16) | Phase 3 — OAuth | Verify: no OAuth tokens visible in browser DevTools localStorage |
| Multi-provider creating competing defaults (17) | Phase 1 — Mode Selection | Test: skip-all onboarding produces exactly one adapter type per agent |
| Piper TTS cold start hang (18) | Phase 4 — Voice TTS | Test: fresh browser profile, enable TTS, measure time-to-first-audio |
| Memory prompt injection (19) | Phase 5 — Persistent Memory | Test: paste a credential into chat; verify it is NOT stored in memory DB |
| MCP tool name collision (20) | Phase 5 — MCP Integration | Audit: compare MCP tool names against TOOLS.md before shipping |
| `npx buildthis` package name collision (21) | Phase 6 — CLI | Run `npm search buildthis` before publishing |
| Skip-all onboarding broken (22) | Phase 1 — Mode Selection | Test: skip every step; verify workspace + one agent created |
| Assistant/project builder context bleed (23) | Phase 2 — Mode Selection | Test: assistant query does not surface issue IDs from project builder |
| Subscription detection false positives (24) | Phase 3 — Subscription Detection | Test: revoke an API key; verify wizard shows "unverified" not "ready" |
| Project handoff losing context (25) | Phase 5 — Persistent Memory | Test: handoff includes conversation ID, not just flat text summary |
| Model catalog staleness (26) | Phase 1 — Hardware Detection | Test: install an uncatalogued Ollama model; verify fallback heuristic fires |
---
## Sources
**Codebase analysis (HIGH confidence):**
- `/opt/nexus/server/src/services/ollama.ts` — RAM detection using `totalmem()`, catalog lookup
- `/opt/nexus/ui/src/components/NexusOnboardingWizard.tsx` — probe auth requirement, adapter detection
- `/opt/nexus/server/src/routes/agents.ts` — board-auth gate on probe endpoint
- `/opt/nexus/ui/vite.config.ts` — OnboardingWizard Vite alias pattern
- `/opt/nexus/ui/src/components/VoiceRecordButton.tsx` — existing Whisper STT implementation
- `/opt/nexus/ui/src/adapters/registry.ts` — adapter registration pattern
**Research (MEDIUM confidence unless noted):**
- [Puter.js Free Unlimited AI API](https://developer.puter.com/tutorials/free-unlimited-ai-api/) — Puter is browser-SDK-first; server-side HTTP integration requires manual HTTP calls
- [WebGPU/WebGL VRAM Limitations](https://dl.acm.org/doi/10.1145/3730567.3764504) — VRAM not queryable from browser; integrated vs. dedicated GPU reporting issues (HIGH confidence — peer-reviewed)
- [Ollama AMD VRAM Detection Bug](https://github.com/ollama/ollama/issues/13677) — confirmed VRAM misreport on AMD/Vulkan
- [MCP Tips, Tricks and Pitfalls — Nearform](https://nearform.com/digital-community/implementing-model-context-protocol-mcp-tips-tricks-and-pitfalls/) — TypeScript interface vs. type alias; SSE deprecated
- [MCP Specification 2025-06-18](https://modelcontextprotocol.io/specification/2025-06-18) — SSE deprecated, Streamable HTTP preferred
- [Memory Poison Attack — Palo Alto Unit 42](https://unit42.paloaltonetworks.com/indirect-prompt-injection-poisons-ai-longterm-memory/) — persistent memory prompt injection attack vector (HIGH confidence)
- [Piper TTS WASM cold start](https://github.com/rhasspy/piper/issues/352) — first-run download, OPFS caching, warmup pattern
- [OAuth PKCE SPA Best Practices — Curity](https://curity.io/resources/learn/spa-best-practices/) — sessionStorage for verifiers, server-side token storage
- [AI Agent Memory — Redis](https://redis.io/blog/ai-agent-memory-stateful-systems/) — context window overflow, hybrid vector+graph architecture
---
*Pitfalls research for: Nexus v1.5 — Smart Onboarding + Personal AI Assistant*
*Researched: 2026-04-02*