Domain Pitfalls — Nexus Fork of Paperclip
Domain: Forked open-source project with display-layer renames, no i18n layer
Researched: 2026-04-02 (updated for v1.5 milestone: smart onboarding, multi-provider, voice TTS, persistent memory, assistant mode, npx buildthis)
Updated: 2026-04-04 (v1.7 milestone: content generation — Remotion, image gen, Mermaid, PDF, theme gen, social media, content skills, large file storage)
Confidence: HIGH — based on direct codebase analysis of /opt/nexus/ plus targeted research on each new integration domain
About This Document
This file covers pitfalls for the v1.5, v1.6, and v1.7 milestone additions. The original pitfalls (Pitfalls 1–11) covering fork hygiene, display-layer rename discipline, and upstream sync remain valid and are preserved below. Pitfalls 12–26 are new for v1.5. Pitfalls 27–44 are new for v1.6 (voice pipeline + Telegram bridge). Pitfalls 45–66 are new for v1.7 (content generation layer).
Critical Pitfalls (Fork Hygiene — v1.0–1.4, still active)
Pitfall 1: Renaming a Code Identifier That Is Also a Stored DB Value
What goes wrong: You rename a TypeScript constant, CLI command, or function to use the new Nexus vocabulary, not realising the same string is also stored as a literal value in database rows. The app breaks for any existing installation because the server now compares approval.type against the renamed value while the DB still holds "hire_agent" rows.
Why it happens: In Paperclip the same string serves double duty: it is both a TypeScript constant/enum and a persisted DB value. The CONCERNS.md audit identifies these dual-purpose strings explicitly: "ceo", "hire_agent", "approve_ceo_strategy", "bootstrap_ceo", "company" in goal levels, "board" in auth challenges.
How to avoid:
- Treat every string in the Summary Risk Table (CONCERNS.md) marked "Critical" as immutable.
- For display renaming only: change label maps (AGENT_ROLE_LABELS, ApprovalPayload display maps) without touching the underlying constant value.
- Before touching any string, grep for it in packages/db/src/schema/ and migration files.
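A minimal TypeScript sketch of the label-map approach. APPROVAL_TYPE_LABELS and the label wording are hypothetical (only AGENT_ROLE_LABELS and the ApprovalPayload display maps are named above); the point is that stored values stay byte-for-byte identical to upstream and only the map carries Nexus vocabulary.

```typescript
// Stored values: identical to upstream and to existing DB rows (Zone C). Never rename these.
const APPROVAL_TYPES = ["hire_agent", "approve_ceo_strategy"] as const;
type ApprovalType = (typeof APPROVAL_TYPES)[number];

// Hypothetical display-only map: the single place Nexus vocabulary is introduced.
const APPROVAL_TYPE_LABELS: Record<ApprovalType, string> = {
  hire_agent: "Add agent",
  approve_ceo_strategy: "Approve PM strategy",
};

function isApprovalType(value: string): value is ApprovalType {
  return (APPROVAL_TYPES as readonly string[]).includes(value);
}

// Rendering code looks up a label; logic elsewhere keeps comparing the raw stored value.
function approvalLabel(storedType: string): string {
  return isApprovalType(storedType) ? APPROVAL_TYPE_LABELS[storedType] : storedType;
}
```

Comparisons such as approval.type === "hire_agent" keep working against existing rows because nothing stored has changed.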
Warning signs:
- A string you plan to rename also appears in packages/db/src/schema/ or migration files
- Approval, invite, and goal lists are empty on an existing install but work on a fresh install
Phase to address: Phase 1 (Display Rename)
Pitfall 2: Treating "Display-Only Rename" as a Simple Find-Replace
What goes wrong: Bulk sed or IDE find-replace on "company" → "workspace" across the entire codebase. Touches service files, route files, schema files, and test files indiscriminately. The next git rebase upstream/master has conflicts on hundreds of files.
Why it happens: "Display-only" is a policy decision, not a property the codebase enforces. Nothing in the TypeScript source distinguishes a user-facing label string from an internal identifier.
How to avoid:
- Establish a strict three-zone taxonomy: Zone A (display strings, safe), Zone B (code identifiers, do not rename), Zone C (dual-purpose stored values, label map only).
- Never run a global find-replace. Work file-by-file.
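The file-by-file rule can be backed by a cheap CI guard on rename branches. This sketch assumes Zone B is approximated by the directories named in this pitfall's warning signs and that upstream is reachable as upstream/master; it flags paths, not identifier changes, so it supplements review rather than replacing it.

```typescript
import { execFileSync } from "node:child_process";

// Approximation of Zone B: directories holding code identifiers and schema that rename commits should not touch.
const ZONE_B_PREFIXES = ["server/src/services/", "server/src/routes/", "packages/db/"];

// Returns the changed paths that fall inside a Zone B directory.
function zoneBViolations(changedPaths: string[]): string[] {
  return changedPaths.filter((p) => ZONE_B_PREFIXES.some((prefix) => p.startsWith(prefix)));
}

// Lists files changed on the current branch relative to its merge base with `base`.
function changedFilesSince(base: string): string[] {
  const out = execFileSync("git", ["diff", "--name-only", `${base}...HEAD`], { encoding: "utf8" });
  return out.split("\n").filter((line) => line.length > 0);
}
```

For example, zoneBViolations(changedFilesSince("upstream/master")) returning a non-empty list would fail the check or require an explicit override in the PR.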
Warning signs:
- PR diff touching server/src/services/, server/src/routes/, or packages/db/ with rename changes
- Diff showing TypeScript identifier name changes (not JSX string literals)
Phase to address: Phase 1 (Display Rename)
Pitfall 3: Diverging the Onboarding Assets Directory Name From Upstream
What goes wrong: Renaming server/src/onboarding-assets/ceo/ to pm/. Upstream changes a file inside ceo/ in a future commit. Git cannot reconcile rename-on-one-side with content-edit-on-other.
How to avoid: Do not rename the ceo/ directory. Change file content only. The directory path is Zone B.
Warning signs: Rebase conflict shows a file as "deleted" that you expected to be "modified."
Phase to address: Phase 1 (Onboarding Redesign)
Pitfall 4: Changing the localStorage Key or ~/.paperclip Config Path Without a Migration
What goes wrong: Renaming "paperclip.selectedCompanyId" localStorage key or ~/.paperclip config path drops all existing state.
How to avoid: Keep key names unchanged OR implement a read-both-paths fallback that migrates existing values on boot before deleting the old key.
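If the key ever does change, the read-both-paths fallback can be as small as this sketch. The KeyValueStore interface mirrors the subset of the Web Storage API used here, and "nexus.selectedCompanyId" in the usage note is a hypothetical new key name.

```typescript
// Minimal subset of the Web Storage API, so the same code works with window.localStorage or a test double.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Read-both-paths fallback: prefer the new key, otherwise copy the legacy value forward.
// The legacy key is deleted only after the new key reads back with the same value.
function migrateKey(store: KeyValueStore, legacyKey: string, newKey: string): string | null {
  const current = store.getItem(newKey);
  if (current !== null) return current;

  const legacy = store.getItem(legacyKey);
  if (legacy === null) return null;

  store.setItem(newKey, legacy);
  if (store.getItem(newKey) === legacy) {
    store.removeItem(legacyKey);
  }
  return legacy;
}
```

On boot, a call like migrateKey(window.localStorage, "paperclip.selectedCompanyId", "nexus.selectedCompanyId") would run once; the same pattern applies to a renamed config directory. Keeping the original key name remains the cheaper option.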
Warning signs: Server logs "no config found, starting fresh" on a machine with existing data.
Phase to address: Phase 2 (Directory Restructure)
Pitfall 5: Upstream Rebase Cadence Slipping Below Weekly
What goes wrong: Fork drift. Upstream has 120+ commits since fork. Waiting accumulates compound conflicts. A 10-minute weekly rebase becomes 4 hours after a month gap.
How to avoid: Rebase at minimum weekly. [nexus] commit prefix strictly enforced. CI alert on git rebase upstream/master failures in a test branch.
Warning signs: Last rebase more than 2 weeks ago; git log HEAD..upstream/master shows more than 20 upstream commits not yet merged.
Phase to address: Ongoing from Phase 1
Pitfall 6: Renaming the CLI Binary Name Without a Shim
What goes wrong: Renaming to nexus without updating all four locations where paperclipai appears as an instructional string.
How to avoid: Add nexus as an alias; keep paperclipai binary working. If renaming, atomic commit covering all instructional copy.
Phase to address: Phase 1 (CLI String Updates)
Pitfall 7: Partial Rename — Changing Some Occurrences But Not All
What goes wrong: "CEO" renamed in 8 of 12 files. Users see mixed vocabulary.
How to avoid: After renaming, run grep -ri "CEO" ui/src cli/src server/src and verify every remaining occurrence is Zone B/C or non-user-visible.
Phase to address: Phase 1 (Display Rename)
Pitfall 8: The [nexus] Commit Prefix Not Applied Consistently From the Start
What goes wrong: Without consistent prefixing, rebase archaeology becomes necessary to identify which commits are Nexus vs. upstream.
How to avoid: Install a commit-msg hook (a pre-commit hook runs before the message exists and cannot inspect it) that rejects messages not starting with [nexus], from the first commit.
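Git passes the draft message to the commit-msg hook as a file path argument, so that is the stage where a prefix check can see it. A minimal TypeScript sketch; how the hook file invokes a TypeScript runner is left as an assumption.

```typescript
import { readFileSync } from "node:fs";

const REQUIRED_PREFIX = "[nexus]";

// The subject is the first non-empty line; comment lines are skipped defensively.
function subjectLine(message: string): string {
  const line = message
    .split("\n")
    .find((l) => l.trim().length > 0 && !l.startsWith("#"));
  return line ?? "";
}

function hasNexusPrefix(message: string): boolean {
  return subjectLine(message).startsWith(REQUIRED_PREFIX);
}

// Entry point for .git/hooks/commit-msg (or a hook manager); returns the process exit code.
function runCommitMsgHook(messageFile: string): number {
  const message = readFileSync(messageFile, "utf8");
  if (hasNexusPrefix(message)) return 0;
  console.error(`Commit subject must start with ${REQUIRED_PREFIX}`);
  return 1;
}
```

The hook script would end with process.exit(runCommitMsgHook(process.argv[2])), since the message file path is the hook's only argument.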
Phase to address: Phase 1 (First commit)
Pitfall 9: Onboarding Redesign Coupled to the Corporate Metaphor in Data Layer
What goes wrong: New wizard does not pass a company name; POST /api/companies requires it. Company created with undefined name.
How to avoid: Document API contract before redesigning wizard. Derive workspace name from directory basename (or VOCAB.appName as fallback — which NexusOnboardingWizard.tsx already does correctly).
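A sketch of the name derivation plus a guard before POST /api/companies. It assumes the endpoint needs at least a name field; how the wizard obtains the directory path (for example, from the server) is outside this snippet, and companyCreatePayload is a hypothetical helper name.

```typescript
// Derives a workspace name from a directory path, falling back to the app name.
// Handles POSIX and Windows separators and ignores trailing slashes.
function deriveWorkspaceName(dirPath: string | undefined, appNameFallback: string): string {
  const segments = (dirPath ?? "")
    .split(/[\\/]/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const base = segments[segments.length - 1];
  return base ?? appNameFallback;
}

// Guard so a missing or empty name never reaches the server.
function companyCreatePayload(dirPath: string | undefined, appName: string): { name: string } {
  const name = deriveWorkspaceName(dirPath, appName).trim();
  if (name.length === 0) throw new Error("Workspace name must not be empty");
  return { name };
}
```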
Phase to address: Phase 2 (Onboarding Redesign)
Pitfall 10: Forgetting to Update Tests That Assert on Display Strings
What goes wrong: invite-onboarding-text.test.ts asserts that the invite text contains "CEO". After the rename, these tests fail.
How to avoid: Before any rename commit, grep all *.test.ts files for old vocabulary terms and update in the same commit.
Phase to address: Phase 1 (Display Rename)
Pitfall 11: Exporting a .nexus.yaml File While Upstream Exports .paperclip.yaml
What goes wrong: Breaking import compatibility with upstream Paperclip instances.
How to avoid: Keep emitting .paperclip.yaml. The filename and schema header are Zone B/C.
Phase to address: Phase 1 (Display Rename)
Critical Pitfalls (v1.5 New Features)
Pitfall 12: Vite Alias Swap Breaking Upstream Rebase on OnboardingWizard
What goes wrong: The current pattern aliases src/components/OnboardingWizard → NexusOnboardingWizard at build time via vite.config.ts. If upstream renames, moves, or splits OnboardingWizard.tsx into multiple files, the alias key silently stops matching: the build still succeeds (the alias target NexusOnboardingWizard.tsx exists), but any code path that imports the upstream wizard under its new name bypasses the alias and gets upstream's component instead of the Nexus one.
More critically: when v1.5 replaces the simple wizard with a multi-step hardware-detection wizard, the alias target NexusOnboardingWizard.tsx grows significantly. Upstream may add new features to OnboardingWizard.tsx (new props, context dependencies) that NexusOnboardingWizard.tsx silently misses, since it fully replaces rather than extends the upstream file.
Why it happens: Full file replacement via Vite alias means no inheritance from upstream. Every upstream improvement to the wizard is silently discarded.
How to avoid:
- After each upstream rebase, diff OnboardingWizard.tsx against the previous upstream version: git diff upstream-prev..upstream-new -- ui/src/components/OnboardingWizard.tsx. If upstream adds new props or context hooks, integrate them into NexusOnboardingWizard.tsx.
- Keep the NexusOnboardingWizard.tsx surface API identical to OnboardingWizard.tsx (same component name export, same props interface as far as upstream is concerned).
- Add a CI check that the aliased-away file still exists (test -f ui/src/components/OnboardingWizard.tsx) and still exports the expected component; test -f alone only proves the file exists.
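A hedged sketch of the stronger CI check: it reads the aliased-away file and looks for a named export with a regex. It assumes the component is exported as OnboardingWizard; as a text heuristic it can miss unusual export forms, which then fail the check and need a human look.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Pure check: `source` is the file text, or null if the file is gone.
function aliasSourceProblems(source: string | null, exportName: string): string[] {
  if (source === null) {
    return ["aliased file no longer exists: the Vite alias may no longer match any upstream import"];
  }
  // Matches `export function X`, `export default function X`, `export const X`, `export class X`.
  const declared = new RegExp(`export\\s+(?:default\\s+)?(?:function|const|class)\\s+${exportName}\\b`);
  // Matches `export { X }` style export lists.
  const listed = new RegExp(`export\\s*\\{[^}]*\\b${exportName}\\b[^}]*\\}`);
  return declared.test(source) || listed.test(source)
    ? []
    : [`no export named ${exportName} found: check whether upstream renamed or split the component`];
}

function checkAliasSourceFile(path: string, exportName: string): string[] {
  return aliasSourceProblems(existsSync(path) ? readFileSync(path, "utf8") : null, exportName);
}
```

In CI this would run as checkAliasSourceFile("ui/src/components/OnboardingWizard.tsx", "OnboardingWizard"), failing the job on any returned problem.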
Warning signs:
- NexusOnboardingWizard.tsx not using a DialogContext or CompanyContext hook that upstream's version uses
- After rebase, pnpm dev fails with "cannot find module" for the alias source path
- The multi-step wizard is missing features that upstream added (e.g., invite-based onboarding, workspace templates)
Phase to address: Phase 1 (Hardware Detection Wizard) — before building the multi-step v1.5 wizard, establish a diff-and-integrate protocol for this alias.
Pitfall 13: Hardware Detection Returning Inaccurate or Platform-Specific Values
What goes wrong: The v1.5 hardware detection step must surface GPU/RAM to recommend Ollama models. Two platform-specific traps exist on the Mac Mini M4 deploy target:
- There is no dedicated VRAM on Apple Silicon. The M4 uses unified memory: the same physical RAM serves both CPU and GPU. os.totalmem() in Node.js returns total unified memory. Reporting this as "VRAM available for Ollama" misleads: Ollama on Apple Silicon uses a portion of unified memory, but the OS, browser, and other processes also consume it. Treating totalmem × 0.75 as GPU-available VRAM overestimates for models that also need system RAM headroom.
- os.totalmem() reads total installed RAM, not available RAM. The existing getRecommendedModel() in server/src/services/ollama.ts already applies a 0.75 multiplier to account for OS overhead, but it applies it to total RAM, not free RAM. If the system is under load (Paperclip server + Ollama already running), available RAM is far lower than 75% of total.
Why it happens: Node.js os module has totalmem() and freemem() but no VRAM API. Browser WebGL UNMASKED_RENDERER gives GPU name but not VRAM size; actual VRAM queries are blocked by browser security sandboxing. Developers reach for the most accessible number.
How to avoid:
- Use os.freemem() (not totalmem()) as the baseline for available-RAM recommendations when Ollama is already running.
- On Apple Silicon, explicitly document in UI copy that "available memory" is unified memory shared with the OS, not dedicated GPU VRAM.
- Treat hardware detection values as hints, not guarantees. Add a message: "Recommendation based on system RAM. Actual performance may vary."
- The pre-built model catalog (ollama-model-catalog.json) is the right layer for model-to-RAM requirements; use it as the authoritative source rather than computing from raw hardware numbers.
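A sketch of the catalog-first recommendation. The CatalogEntry shape is a hypothetical reading of ollama-model-catalog.json, and the double cap (75% of total, and never more than currently free) combines the existing 0.75 multiplier with the os.freemem() advice above. On macOS, os.freemem() may undercount memory the OS could reclaim, so this errs on the conservative side.

```typescript
import { freemem, totalmem } from "node:os";

// Hypothetical shape of an entry in ollama-model-catalog.json; the real file may differ.
interface CatalogEntry {
  name: string;
  minRamGb: number;
}

interface MemorySnapshot {
  totalBytes: number;
  freeBytes: number;
}

const GIB = 1024 ** 3;

// Usable memory is capped twice: 75% of total (OS headroom on unified memory)
// and whatever is actually free right now (other processes, an already-loaded model).
function usableGb(mem: MemorySnapshot): number {
  return Math.min(mem.totalBytes * 0.75, mem.freeBytes) / GIB;
}

// Picks the largest catalog model whose stated requirement fits; the catalog stays the source of truth.
function recommendModel(catalog: CatalogEntry[], mem: MemorySnapshot): { model: CatalogEntry | null; note: string } {
  const budget = usableGb(mem);
  const fitting = catalog
    .filter((entry) => entry.minRamGb <= budget)
    .sort((a, b) => b.minRamGb - a.minRamGb);
  return {
    model: fitting[0] ?? null,
    note: "Recommendation based on system RAM. Actual performance may vary.",
  };
}

function currentMemory(): MemorySnapshot {
  return { totalBytes: totalmem(), freeBytes: freemem() };
}
```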
Warning signs:
- Model recommendation shows "fits in memory" but Ollama OOM-kills it at load time
- M4 Mac Mini reports 16GB available for models but the system has 16GB total (OS needs 4–6GB)
- AMD GPU users see wildly incorrect VRAM numbers (confirmed bug in Ollama's VRAM detection for AMD/Vulkan as of 2025)
Phase to address: Phase 1 (Hardware Detection) — define detection methodology before building the UI layer.
Pitfall 14: The Onboarding Probe Running at the Wrong Authentication Level
What goes wrong: The existing adapter probe endpoint (GET /adapters/:type/probe) requires board authentication (req.actor.type !== "board"). The v1.5 onboarding wizard runs during first-time setup — before the user has authenticated. If the probe is called before board auth is established, every probe returns 403, the wizard always falls back to claude_local, and the user never gets the Hermes auto-detection benefit.
This is the exact scenario the current NexusOnboardingWizard.tsx is vulnerable to: it calls agentsApi.probeAdapter("hermes_local") on wizard open, but if the user arrives at the onboarding page without board auth (fresh install, incognito session), the probe silently fails and defaultAdapter stays "claude_local".
Why it happens: Board auth is the right guard for post-setup adapter operations. But hardware detection and provider probing are legitimately pre-auth operations — you want to present the right setup path before any credentials exist.
How to avoid:
- Create a separate GET /system/providers endpoint that does not require board auth. It returns available local providers (Ollama status, Hermes status) based purely on server-side detection (no user credentials needed).
- Alternatively, make the probe endpoint check auth level: if no board auth exists yet (fresh install), allow the probe to run unauthenticated for a whitelist of safe probe types (hermes_local, ollama).
- Never gate hardware detection on user credentials: hardware is a property of the machine, not the user session.
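A sketch of the whitelist option as a pure decision function that could sit in front of the existing probe route. boardConfigured is a hypothetical flag meaning "first-time setup has created board credentials"; how Paperclip actually records that would need checking.

```typescript
// Probe types that only inspect the local machine and need no user credentials.
const PRE_AUTH_PROBE_TYPES: ReadonlySet<string> = new Set(["hermes_local", "ollama"]);

interface ProbeRequest {
  probeType: string;
  actorType: string | undefined; // e.g. req.actor?.type
  boardConfigured: boolean; // hypothetical: has first-time setup created board credentials yet?
}

// Board actors may probe anything; before setup, anonymous callers may run only whitelisted local probes.
function mayRunProbe(req: ProbeRequest): boolean {
  if (req.actorType === "board") return true;
  return !req.boardConfigured && PRE_AUTH_PROBE_TYPES.has(req.probeType);
}
```

Once setup completes, anonymous probes are refused again, so the pre-auth window closes by itself.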
Warning signs:
- Browser network tab shows 403 on the probe call during onboarding
- defaultAdapter in the wizard is always "claude_local" even when Ollama/Hermes are running
- Probe works in the settings page (user is auth'd) but not during initial onboarding
Phase to address: Phase 1 (Hardware Detection) — the probe auth story must be designed before the multi-step wizard is built.
Pitfall 15: Puter.js "Zero-Config" Promise Breaking on Paperclip's Server-Side Architecture
What goes wrong: Puter.js is designed for purely browser-side use: load the CDN script, call puter.ai.chat(), Puter handles auth via its own popup login flow. Nexus/Paperclip proxies AI calls through the server (/api/chat, /api/agents). If Puter.js is loaded browser-side and calls Puter's servers directly, it bypasses Paperclip's cost tracking, budget enforcement, session codec, and skill sync entirely.
This creates a split-brain: the Puter adapter sends messages to Puter's cloud while Paperclip's adapter system thinks the agent is using a different provider. Cost tracking shows $0 for Puter sessions. Heartbeat and session management are not wired up.
Why it happens: Puter.js is documented as a CDN-loaded browser library with client-side auth. The natural integration is to <script src="https://js.puter.com/v2/"> and call the API directly. But Paperclip's architecture requires all AI calls to go through server-side adapter machinery.
How to avoid:
- Implement Puter as a server-side adapter that calls Puter's API over standard HTTP with fetch() from Node.js, not through the browser SDK.
- The server-side Puter adapter must implement the full adapter contract: spawn, heartbeat, sessionCodec, configFields (see the packages/adapters/ pattern).
- If the browser-side Puter SDK is needed for the auth popup (Puter uses its own account system), implement auth as a UI-only step that retrieves a Puter token, then stores that token in Paperclip's adapter config for server-side use.
- Confirm Puter's rate limiting behavior for server-side calls. Puter's "free unlimited" claim applies to personal/hobby use; verify terms before treating it as production-grade.
Warning signs:
- Puter.js loaded via a <script> CDN tag in the app shell
- Cost tracking shows $0 for all Puter-backed agent sessions
- Requests made by puter.ai.chat() appearing in the browser network tab as direct calls to Puter (not proxied through /api/)
Phase to address: Phase 2 (Zero-Config Cloud / Puter.js)
Pitfall 16: OAuth Token Storage in localStorage Creating Security and Rebase Risk
What goes wrong: The natural place to store OAuth access tokens in an SPA is localStorage. But:
- localStorage is accessible to any JS on the page, so XSS vulnerabilities can steal tokens.
- Paperclip already uses localStorage with "paperclip.*"-prefixed keys. Any Nexus key added with a "nexus.*" prefix will need a migration if the key name is ever changed, per Pitfall 4.
- OAuth refresh token rotation (required for Google/OpenAI free tiers) must clear and rewrite the stored token on every refresh. If this fails mid-write (e.g., browser close), the user is logged out and must re-authenticate.
Why it happens: localStorage is the default that every OAuth tutorial reaches for in an SPA context. PKCE security guidance says to keep the code verifier in sessionStorage, but developers often put the actual access token in localStorage anyway.
How to avoid:
- Store OAuth tokens server-side in Paperclip's existing config/secrets mechanism (server/src/secrets/). The server does the OAuth exchange and stores the token; the browser never sees the raw token.
- Use Paperclip's existing board auth cookie mechanism to gate whether the OAuth integration is enabled. Do not create a separate browser-side auth session for each OAuth provider.
- If browser-side token storage is unavoidable, use sessionStorage (not localStorage) for OAuth code verifiers; store refresh tokens server-side only.
- For the state parameter in the PKCE flow: generate a cryptographically random state with crypto.getRandomValues(), store it in sessionStorage, and verify it on redirect.
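For the fallback case where the browser must hold the PKCE state and code verifier, a minimal sketch using the Web Crypto API and sessionStorage. The storage key names are hypothetical; the S256 challenge follows RFC 7636. The verifier returned on redirect would be posted to the Paperclip server, which performs the token exchange and keeps the tokens.

```typescript
// Minimal Web Storage subset; pass window.sessionStorage in the browser.
interface SessionStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key names.
const STATE_KEY = "nexus.oauth.state";
const VERIFIER_KEY = "nexus.oauth.codeVerifier";

function base64Url(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function randomToken(byteLength = 32): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  return base64Url(bytes);
}

// RFC 7636 S256: BASE64URL(SHA-256(ASCII(code_verifier))).
async function codeChallengeFor(verifier: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(verifier));
  return base64Url(new Uint8Array(digest));
}

// Before redirecting: keep state and verifier in sessionStorage, never localStorage.
async function beginAuthorization(store: SessionStore): Promise<{ state: string; codeChallenge: string }> {
  const state = randomToken();
  const verifier = randomToken(); // 32 random bytes -> 43 base64url chars (RFC 7636 allows 43-128)
  store.setItem(STATE_KEY, state);
  store.setItem(VERIFIER_KEY, verifier);
  return { state, codeChallenge: await codeChallengeFor(verifier) };
}

// On redirect: single-use state check; returns the verifier to send to the server, or null.
function completeAuthorization(store: SessionStore, returnedState: string | null): string | null {
  const expected = store.getItem(STATE_KEY);
  const verifier = store.getItem(VERIFIER_KEY);
  store.removeItem(STATE_KEY);
  store.removeItem(VERIFIER_KEY);
  if (expected === null || verifier === null || returnedState !== expected) return null;
  return verifier;
}
```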
Warning signs:
- window.localStorage.getItem("nexus.oauth.google.accessToken") or a similar key present in browser DevTools
- OAuth token visible in network requests from browser to Google/OpenAI APIs (not proxied through Paperclip server)
- Re-authentication required after browser restart (session not persisting correctly)
Phase to address: Phase 3 (OAuth Cloud Tier)
Pitfall 17: Multi-Provider Onboarding Creating Multiple Competing Default Adapters
What goes wrong: v1.5 adds multiple provider tiers: local Ollama/Hermes, free cloud Puter.js, OAuth Google Gemini/OpenAI, and subscription detection (Claude Code, OpenClaw). If a user configures more than one provider during onboarding, the resulting agents get created with the adapter config from the onboarding summary step. But Paperclip's agent model is one-adapter-per-agent. If the wizard creates agents without being explicit about which provider wins, agents may be created with inconsistent adapter types (one with hermes_local, another with puter_cloud), creating a confusing mixed-provider workspace.
The deeper trap: the onboarding wizard currently creates exactly 2 agents (PM + Engineer) with identical adapter config. v1.5 may want different agents on different providers (e.g., assistant on Puter, PM on Hermes). This is a valid architecture but requires explicit per-agent provider selection, which the current wizard doesn't support.
Why it happens: Multi-provider selection UX tends to present all providers as equally valid, then requires a tie-breaking decision the wizard may not have asked the user to make.
How to avoid:
- Make the onboarding wizard select ONE primary provider and create all initial agents on that provider. Secondary provider credentials can be stored for later use (configuring individual agents from the settings page).
- If the mode selection is "Personal AI Assistant," create the assistant agent on the highest-quality available provider (subscription > OAuth > Puter > local).
- If the mode selection is "Project Builder," create PM + Engineer on the local/privacy-first provider since these agents run autonomously and should not require cloud API credits per task.
- Document the provider selection logic explicitly in code comments.
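A sketch of the one-primary-provider rule. The assistant ordering comes straight from this pitfall; for Project Builder only "local first" is stated, so the fallback order after local is an assumption, as are the tier and mode names.

```typescript
type ProviderTier = "subscription" | "oauth" | "puter" | "local";
type OnboardingMode = "assistant" | "project_builder";

// Quality order for the assistant, as stated in this pitfall.
const ASSISTANT_ORDER: ProviderTier[] = ["subscription", "oauth", "puter", "local"];
// Project Builder agents run autonomously, so local/privacy-first wins; the order after "local" is assumed.
const PROJECT_BUILDER_ORDER: ProviderTier[] = ["local", "subscription", "oauth", "puter"];

// Returns the single primary provider for every agent created during onboarding.
// Other configured providers stay stored for later per-agent changes in settings.
function selectPrimaryProvider(mode: OnboardingMode, available: ReadonlySet<ProviderTier>): ProviderTier | null {
  const order = mode === "assistant" ? ASSISTANT_ORDER : PROJECT_BUILDER_ORDER;
  return order.find((tier) => available.has(tier)) ?? null;
}
```

Because the function returns exactly one tier, the wizard cannot end up creating PM and Engineer agents on different adapters.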
Warning signs:
- PM agent created with hermes_local, Engineer created with puter_cloud after the same onboarding flow
- "Recommended provider" badge in wizard applied to multiple providers simultaneously
- Users confused about which API credits are being used for which agents
Phase to address: Phase 1 (Mode Selection) — define the provider-per-mode rule before building the selection UI.
Pitfall 18: Voice TTS (Piper) Cold Start Blocking the First Spoken Response
What goes wrong: Piper TTS (browser WASM implementation) downloads the voice model on the first synthesis call. This means the first time a user activates TTS, they wait 5–30 seconds for the model to download before hearing anything. Without user feedback, this appears as a hang or broken feature.
A secondary trap: the WASM Piper phonemizer does not always match the phoneme mapping expected by every Piper voice model. Using a voice model that was compiled for a different language variant (e.g., an en_GB model on a browser Piper instance expecting en_US phoneme tables) produces garbled or silent output.
Why it happens: Browser-based Piper TTS stores models in the Origin Private File System (OPFS). The first call triggers the download. Developers who test Piper locally after the first call never encounter the cold start because the model is already cached.
How to avoid:
- Pre-warm Piper on a background thread during onboarding (after the voice step is confirmed, not on the first message). Use a silent warmup synthesis ("...") to trigger the model download before the user expects to hear anything.
- Show a download progress indicator on the TTS toggle — not a spinner (implies in-progress work) but a "preparing voice model" state with estimated download size.
- Limit initial voice model choices to stable Piper models with confirmed browser WASM compatibility. Avoid offering non-English models unless specifically verified.
- Store pre-downloaded voice models in OPFS; on subsequent loads, check the directory returned by navigator.storage.getDirectory() before re-downloading.
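A sketch of the cache check, assuming the voice model is stored at the OPFS root under a known file name (the Piper WASM library in use may lay out OPFS differently). The directory handle is passed in so the function is testable; in the browser it would be the result of await navigator.storage.getDirectory().

```typescript
// Minimal slice of FileSystemDirectoryHandle used here.
interface DirectoryLike {
  getFileHandle(name: string): Promise<unknown>;
}

type VoiceModelState = "ready" | "needs-download";

// getFileHandle rejects with a NotFoundError when the file is absent (and `create` is not requested).
async function voiceModelState(opfsRoot: DirectoryLike, modelFileName: string): Promise<VoiceModelState> {
  try {
    await opfsRoot.getFileHandle(modelFileName);
    return "ready";
  } catch (err) {
    const name = typeof err === "object" && err !== null ? (err as { name?: unknown }).name : undefined;
    if (name === "NotFoundError") return "needs-download";
    throw err; // permission or quota problems should surface, not be mistaken for "not cached"
  }
}
```

A "needs-download" result is the cue to show the "preparing voice model" state and start the silent warmup.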
Warning signs:
- TTS button appears responsive (toggles on) but no audio plays for 15+ seconds
- Voice model download appears in DevTools network tab on the first "speak" action
- Users reporting "the voice feature is broken" on first use but "works fine" on subsequent uses
Phase to address: Phase 4 (Voice TTS) — warmup strategy must be designed before the TTS toggle is wired up.
Pitfall 19: Persistent Memory Injecting Sensitive Data Into System Prompts
What goes wrong: The Personal AI Assistant stores memories (user preferences, past conversation summaries, project context) to inject into future system prompts. Two failure modes:
- Prompt injection via stored memory. If memory content is retrieved from external sources (web fetch, document import, MCP tools) and stored verbatim, malicious content in those sources gets injected into future system prompts with elevated priority. Palo Alto Unit 42 documented this attack vector in 2025: memory-poisoning allows persistent malicious instructions affecting agent behavior across sessions.
- Sensitive data leaking between sessions. If the assistant stores a memory like "user's Stripe API key is sk_live_..." (from a pasted credential) and that memory surfaces in a future session with a different context (e.g., a Puter.js provider that logs requests), the credential leaks.
Why it happens: Memory systems treat all content as equal. The distinction between "safe user preference" and "sensitive credential that should never be persisted" is not obvious at write time.
How to avoid:
- Apply rule-based filters at write time: never store content matching secret patterns (API key regexes, tokens, passwords). Use a blocklist of patterns before persisting any memory fragment.
- Sanitize memory content before injecting it into system prompts: strip any content between <> tags, backtick blocks, or content that looks like instruction syntax.
- For MCP tool results that become memory, apply the same sanitization as user-pasted content.
- Implement memory scoping: memories should only surface in sessions with the same mode (assistant memories should not surface in project builder sessions).
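A sketch of the write-time filter and read-time sanitizer. The secret patterns cover a few well-known key formats plus a generic key = value rule and are illustrative rather than exhaustive; the sanitizer implements the tag and backtick stripping described above but does not attempt to detect instruction-like prose.

```typescript
// Patterns for secrets that must never be persisted as memory. Extend per provider you integrate.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk_(?:live|test)_[A-Za-z0-9]{10,}/, // Stripe secret keys
  /\bsk-[A-Za-z0-9_-]{20,}/, // OpenAI-style API keys
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key IDs
  /\bgh[pousr]_[A-Za-z0-9]{36,}/, // GitHub tokens
  /\b(?:api[_-]?key|token|password|secret)\s*[:=]\s*\S+/i, // generic "key = value" credentials
];

// Write-time gate: reject the fragment outright rather than trying to redact it.
function shouldPersistMemory(fragment: string): boolean {
  return !SECRET_PATTERNS.some((pattern) => pattern.test(fragment));
}

// Read-time sanitizer applied before a memory is placed into a system prompt.
function sanitizeForPrompt(memory: string): string {
  return memory
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced code blocks
    .replace(/`[^`]*`/g, " ") // inline code spans
    .replace(/<([A-Za-z][\w-]*)[^>]*>[\s\S]*?<\/\1\s*>/g, " ") // paired tags and their content, e.g. <system>...</system>
    .replace(/<[^>]*>/g, " ") // any remaining stray tags
    .replace(/\s+/g, " ")
    .trim();
}
```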
Warning signs:
- Memory fragments containing "api_key", "token", "password", "secret" stored in the memory DB
- A stored memory from a previous session altering agent behavior in unexpected ways
- MCP tool output (e.g., fetched web page content) appearing verbatim in system prompts
Phase to address: Phase 5 (Persistent Memory) — memory schema must include sanitization at write time before any memory is persisted.
Pitfall 20: MCP Integration Conflicting With Paperclip's Existing Tool/Skill System
What goes wrong: Paperclip has its own skill/tool system (AdapterSkillSnapshot, AdapterSkillEntry, company-skills.ts). MCP also defines tools. If an MCP server exposes a tool named "terminal" or "file_read" and Paperclip's skill system also has these (used in Hermes heartbeat prompt templates), the agent receives duplicate or conflicting tool definitions. The LLM may call the MCP version when the Paperclip version was intended, bypassing Paperclip's permission and cost tracking.
Additionally, older MCP servers use the HTTP+SSE transport, which the MCP spec has deprecated in favor of Streamable HTTP (introduced in spec revision 2025-03-26 and the standard HTTP transport in 2025-06-18). A server built on SSE will need migration as MCP clients drop SSE support.
Why it happens: MCP tool names are unscoped — any tool named "terminal" is "terminal". The collision with Paperclip's native tools is invisible until an agent calls the wrong one. Developers add MCP without auditing for name collisions.
How to avoid:
- Use the Streamable HTTP transport for the MCP server (not HTTP+SSE, which is deprecated in current MCP spec revisions, including 2025-06-18).
- Prefix all Nexus-registered MCP tools with a namespace: nexus_memory_read, nexus_memory_write, nexus_context_set, etc.
- Before exposing any MCP tool, check it against the list of tool names in TOOLS.md (Hermes skill bundle). If there is a collision, rename the MCP tool.
- TypeScript interface pitfall: when defining structuredContent types for MCP tool responses, use type aliases, not interface declarations. Interfaces lack implicit index signatures and cause TypeScript assignment errors with { [key: string]: unknown }.
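A sketch of the namespacing and collision check, plus the type-alias rule. nativeTools stands in for the names parsed from TOOLS.md (the parsing is not shown), and MemoryReadResult is a hypothetical result type.

```typescript
const NEXUS_TOOL_PREFIX = "nexus_";

interface ToolNamePlan {
  exposed: string[]; // names the MCP server will register
  rawCollisions: string[]; // unprefixed names that would have shadowed a native tool
  remainingCollisions: string[]; // prefixed names that still clash and must be renamed by hand
}

function planMcpToolNames(mcpTools: string[], nativeTools: string[]): ToolNamePlan {
  const native = new Set(nativeTools);
  const exposed = mcpTools.map((name) => (name.startsWith(NEXUS_TOOL_PREFIX) ? name : NEXUS_TOOL_PREFIX + name));
  return {
    exposed,
    rawCollisions: mcpTools.filter((name) => native.has(name)),
    remainingCollisions: exposed.filter((name) => native.has(name)),
  };
}

// Declared with `type`, not `interface`, so it is assignable to an index-signature type.
type MemoryReadResult = { key: string; value: string };

function toStructuredContent(result: MemoryReadResult): { [key: string]: unknown } {
  return result; // with `interface MemoryReadResult`, this assignment fails to compile
}
```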
Warning signs:
- Agent calling the terminal tool but the call going to the MCP server, not Paperclip's exec sandbox
- TypeScript compile errors: "Type 'XInterface' is not assignable to type '{ [key: string]: unknown }'"
- MCP server implemented with sse transport (use streamable-http instead)
Phase to address: Phase 5 (MCP Integration)
Pitfall 21: npx buildthis Conflicting With an Existing Paperclip CLI Entry Point
What goes wrong: The npx buildthis entry point must add a new bin entry to the Nexus package. Paperclip's CLI already has bin.paperclipai. If buildthis is added to a package that does not yet exist on npm (or is published under a different name), npx buildthis will either: (a) fetch the wrong package from npm (there are existing npm packages named buildthis), or (b) fail with "package not found" because the Nexus fork is not on npm.
A secondary trap: npx installs packages temporarily in the user's npm cache. If npx buildthis is run on a machine where an earlier version is already cached, npx may reuse that old version without the latest onboarding flow.
Why it happens: npx resolves package names from the public npm registry first. If the package name collides with an existing npm package, users get the wrong thing. If the package is private (Forgejo only), npx cannot find it by default.
How to avoid:
- Before naming the CLI entry buildthis, search npm (npm search buildthis) and verify there is no collision. If there is, choose nexus-buildthis or @yourusername/buildthis (a scoped package).
- Since Nexus is deployed on a Mac Mini for single-user use, npx buildthis likely resolves to a local package reference rather than npm. Document this explicitly (for example, npx --package /path/to/nexus/packages/cli buildthis) or publish to a private registry.
- For first-run detection: check whether ~/.paperclip (or ~/.nexus) exists before running full onboarding; if config exists, route to the "already configured" path.
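A sketch of the first-run routing. The two directory names come from this pitfall; the route labels are hypothetical, and the exists function is injectable so the routing can be tested without touching the real home directory.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Config directories that indicate a previous install (~/.nexus is the optional future name).
const CONFIG_DIRS = [".paperclip", ".nexus"];

type FirstRunRoute = "full-onboarding" | "already-configured";

function firstRunRoute(home: string = homedir(), exists: (path: string) => boolean = existsSync): FirstRunRoute {
  return CONFIG_DIRS.some((dir) => exists(join(home, dir))) ? "already-configured" : "full-onboarding";
}
```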
Warning signs:
- npx buildthis prints output from an unrelated npm package
- CLI help text shows incorrect version (cached from npm, not local build)
- npm info buildthis returns a package that is not Nexus
Phase to address: Phase 6 (npx buildthis CLI)
Moderate Pitfalls (v1.5)
Pitfall 22: Multi-Step Onboarding Wizard Breaking the "Every Step Skippable" Requirement
What goes wrong: The v1.5 onboarding has many steps: mode selection, hardware detection, local AI setup, voice, Puter.js, OAuth, subscription detection, summary, and straight-into-chat. As the wizard grows, "every step skippable" becomes hard to maintain because steps develop implicit dependencies:
- The summary step shows "selected providers" — if you skip all provider steps, the summary is empty and the wizard has no actionable result.
- The voice step configures Piper — if it's skipped, the voice feature is silently disabled without telling the user.
- OAuth setup creates credentials — if skipped after starting the OAuth popup, the popup tab is orphaned.
Why it happens: Step dependencies are added incrementally as each step is built. By the time all steps exist, the skip logic has edge cases that weren't anticipated.
How to avoid:
- Define the "skip all" state explicitly before building any step: what does a fully-skipped onboarding produce? Answer: one workspace, one agent, Hermes or claude_local as default, no voice, no OAuth, no memory. Make this the minimum valid state.
- Code the summary step to present a useful state even when every step is skipped.
- Treat OAuth flows specially: if a user starts an OAuth popup (opens Google auth window) and then closes the wizard, cancel the OAuth state cleanly. Never leave orphaned OAuth state.
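One way to make the skip-all state the baseline rather than an edge case: every step returns a partial result merged over a fixed minimum. The field names and the "default" agent role are hypothetical; the minimum itself (one workspace, one agent, Hermes or claude_local, no voice, no OAuth, no memory) is the one stated above.

```typescript
type DefaultAdapter = "hermes_local" | "claude_local";

// Hypothetical result shape for the onboarding flow.
interface OnboardingResult {
  workspaces: number;
  agents: { role: string; adapterType: string }[];
  voiceEnabled: boolean;
  oauthProviders: string[];
  memoryEnabled: boolean;
}

// The minimum valid state a fully skipped onboarding must still produce.
function skipAllResult(hermesDetected: boolean): OnboardingResult {
  const adapter: DefaultAdapter = hermesDetected ? "hermes_local" : "claude_local";
  return {
    workspaces: 1,
    agents: [{ role: "default", adapterType: adapter }],
    voiceEnabled: false,
    oauthProviders: [],
    memoryEnabled: false,
  };
}

// Each completed step contributes a partial result; skipped steps contribute nothing, so the baseline survives.
function resolveOnboarding(hermesDetected: boolean, stepResults: Partial<OnboardingResult>[]): OnboardingResult {
  return stepResults.reduce<OnboardingResult>((acc, step) => ({ ...acc, ...step }), skipAllResult(hermesDetected));
}
```

The summary step can then render resolveOnboarding(...) directly, so "every step skipped" still shows one workspace and one agent instead of an empty list.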
Warning signs:
- Summary step shows empty provider list when all steps are skipped
- "Skip" button disabled on certain steps
- Closing the wizard mid-OAuth leaves the OAuth callback URL still active
Phase to address: Phase 1 (Mode Selection) — define the skip-all state as a test case before building any step.
Pitfall 23: Assistant Mode and Project Builder Mode Sharing Conversation History
What goes wrong: The Personal AI Assistant has its own conversation context: user preferences, daily notes, personal projects. The Project Builder has PM + Engineer agents working on specific code issues. If both modes share the same conversations table without a mode discriminator, the assistant's personal context bleeds into project sessions and vice versa.
A user asking the assistant "remind me what I was working on yesterday" should not surface issues from the Project Builder's agent task queue. An agent executing a coding task should not have the user's personal assistant context injected into its system prompt.
Why it happens: The conversations table is generic. Adding a mode column or agent_type discriminator requires a DB schema change, which is out of scope for Nexus (no migrations). Without a schema change, mode separation must be achieved through metadata conventions.
How to avoid:
- Since DB schema changes are out of scope, use the existing conversation metadata/tagging system (if available) to tag conversations as assistant vs. agent. Filter on this tag when fetching conversation history.
- If no tagging system exists, use the agent's role field as a discriminator: conversations involving a role: "ceo" or role: "engineer" agent are project builder context; conversations with a dedicated assistant agent are personal assistant context.
- The personal assistant agent should have a distinct adapterType or name pattern that makes it queryable as a filter.
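A sketch of tag-first, role-fallback scoping. The conversation shape and the "assistant" role value are hypothetical; "ceo" and "engineer" are the stored role values named above. Conversations that cannot be classified fall into an "unknown" scope that neither mode sees, so the filter fails closed.

```typescript
type ConversationScope = "assistant" | "agent";

// Hypothetical shape; real conversation records may carry tags and participants differently.
interface ConversationLike {
  tags?: string[];
  participantRoles: string[];
}

const PROJECT_BUILDER_ROLES = new Set(["ceo", "engineer"]);
const ASSISTANT_ROLE = "assistant"; // hypothetical role value for the dedicated assistant agent

// Prefer an explicit tag; fall back to participant roles when no tagging system exists.
function conversationScope(conv: ConversationLike): ConversationScope | "unknown" {
  if (conv.tags?.includes("assistant")) return "assistant";
  if (conv.tags?.includes("agent")) return "agent";
  if (conv.participantRoles.some((role) => PROJECT_BUILDER_ROLES.has(role))) return "agent";
  if (conv.participantRoles.includes(ASSISTANT_ROLE)) return "assistant";
  return "unknown";
}

function historyFor<T extends ConversationLike>(scope: ConversationScope, conversations: T[]): T[] {
  return conversations.filter((conv) => conversationScope(conv) === scope);
}
```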
Warning signs:
- Assistant surfacing agent task IDs or issue numbers when answering personal questions
- Project Builder agents including personal notes in their task context
- conversations table query returns mixed results from both modes
Phase to address: Phase 2 (Mode Selection / Assistant Mode) — define the conversation isolation strategy before creating the assistant agent.
Pitfall 24: Subscription/API Key Auto-Detection Creating False Positives
What goes wrong: The onboarding tries to auto-detect existing Hermes, Claude Code, and OpenClaw subscriptions. Each of these works differently:
- Hermes: probe the local adapter (existing probeAdapter endpoint)
- Claude Code: check for the ~/.claude/ directory or a claude binary in PATH
- OpenClaw: check for an OpenClaw-specific config file or env var
False positives occur when: a Claude Code config exists but the API key is expired; an OpenClaw config file exists but the subscription is cancelled; a claude binary exists but is the wrong version for the adapter.
Showing "Claude Code detected — ready to use" when the subscription is inactive is worse than not detecting it, because the user proceeds with a broken setup.
Why it happens: Presence of config files or binaries does not guarantee valid credentials or active subscriptions. The only reliable detection is making an actual API call, which has latency implications for onboarding.
How to avoid:
- Distinguish between "binary/config present" (detected) and "API call succeeded" (verified). Show "detected" state immediately but show "verified" state only after a lightweight API validation call.
- For expensive verification calls, do them in parallel with a timeout. If verification times out, show "detected but unverified" rather than "ready to use."
- Never block onboarding progress on subscription verification. Mark unverified detections prominently and let the user proceed, then verify asynchronously.
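The split between "detected" and "verified" can be sketched as follows. ProviderProbe, its isPresent/verify callbacks, and the 3-second default are assumptions made for illustration. Each provider's real check (a file stat, a probeAdapter call, a cheap authenticated request) would sit behind those callbacks. Probes run in parallel, "detected" is reported as soon as presence is known, and a verification that times out is reported as "detected_unverified", never as ready.

```typescript
type DetectionState = "not_found" | "detected" | "verified" | "detected_unverified" | "verification_failed";

interface ProviderProbe {
  name: string;
  isPresent: () => Promise<boolean>; // cheap: config file or binary on disk
  verify: () => Promise<boolean>; // expensive: lightweight authenticated API call
}

// Resolves to "timeout" if p has not settled within ms milliseconds.
function raceTimeout<T>(p: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  return Promise.race([p, timeout]).then(
    (v) => { clearTimeout(timer); return v; },
    (e) => { clearTimeout(timer); throw e; },
  );
}

async function detectProvider(
  probe: ProviderProbe,
  timeoutMs: number,
  onUpdate?: (state: DetectionState) => void,
): Promise<DetectionState> {
  let present = false;
  try {
    present = await probe.isPresent();
  } catch {
    present = false;
  }
  if (!present) return "not_found";
  onUpdate?.("detected"); // the UI can show "detected" immediately
  try {
    const result = await raceTimeout(probe.verify(), timeoutMs);
    if (result === "timeout") return "detected_unverified"; // never "ready to use"
    return result ? "verified" : "verification_failed";
  } catch {
    return "verification_failed";
  }
}

// All probes run in parallel, so one slow provider does not hold up the others.
async function detectAll(
  probes: ProviderProbe[],
  timeoutMs = 3000,
  onUpdate?: (provider: string, state: DetectionState) => void,
): Promise<Record<string, DetectionState>> {
  const states = await Promise.all(
    probes.map((p) => detectProvider(p, timeoutMs, (s) => onUpdate?.(p.name, s))),
  );
  const out: Record<string, DetectionState> = {};
  probes.forEach((p, i) => { out[p.name] = states[i]; });
  return out;
}
```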
Warning signs:
- Onboarding step shows "Claude Code ready" but first agent run fails with auth error
- Detection step takes more than 3 seconds (verification calls blocking UI)
- Config file present but API key revoked 6 months ago
Phase to address: Phase 3 (Subscription/API Key Auto-Detection)
Minor Pitfalls (v1.5)
Pitfall 25: Project Handoff from Assistant Conversation Losing Context
What goes wrong: "Project handoff: assistant conversation → PM with context transfer" is a v1.5 requirement. The naive implementation creates a new issue in the project from the assistant conversation summary. But the handoff loses: branching context (which assistant conversation branch), attachment references (files uploaded in the assistant chat), and the interim decisions the user made during the assistant conversation.
How to avoid:
- Handoff should carry: (a) conversation ID or branch ID as a reference, (b) a structured summary (not just free text), and (c) attachment IDs from the assistant conversation.
- The PM agent receiving the handoff should be able to GET /api/chat/conversations/{id} to retrieve the full context if needed.
- Do not flatten the handoff context into the issue title/description alone; preserve the conversation reference.
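Here is one possible shape for the handoff payload, as a sketch. ProjectHandoff, buildHandoff, and renderIssueDescription are hypothetical names. Only the GET /api/chat/conversations/{id} route comes from this document. The issue description is treated as a readable rendering of the handoff. The structured object, which carries the conversation reference and attachment IDs, travels with it.

```typescript
// Hypothetical handoff contract; not an existing Paperclip type.
interface ProjectHandoff {
  sourceConversationId: string;
  sourceBranchId?: string;
  summary: { goal: string; decisions: string[]; openQuestions: string[] };
  attachmentIds: string[];
  contextUrl: string; // where the PM agent can fetch the full conversation
}

function buildHandoff(input: {
  conversationId: string;
  branchId?: string;
  goal: string;
  decisions: string[];
  openQuestions?: string[];
  attachmentIds?: string[];
}): ProjectHandoff {
  return {
    sourceConversationId: input.conversationId,
    sourceBranchId: input.branchId,
    summary: {
      goal: input.goal,
      decisions: [...input.decisions],
      openQuestions: [...(input.openQuestions ?? [])],
    },
    attachmentIds: [...(input.attachmentIds ?? [])],
    contextUrl: `/api/chat/conversations/${encodeURIComponent(input.conversationId)}`,
  };
}

// Human-readable issue body: a rendering of the handoff, not a replacement for it.
function renderIssueDescription(h: ProjectHandoff): string {
  const lines = [`Goal: ${h.summary.goal}`, "", "Decisions made in the assistant conversation:"];
  lines.push(...h.summary.decisions.map((d) => `- ${d}`));
  if (h.summary.openQuestions.length > 0) {
    lines.push("", "Open questions:", ...h.summary.openQuestions.map((q) => `- ${q}`));
  }
  const branch = h.sourceBranchId ? ` (branch ${h.sourceBranchId})` : "";
  lines.push("", `Full context: ${h.contextUrl}${branch}`);
  if (h.attachmentIds.length > 0) lines.push(`Attachments: ${h.attachmentIds.join(", ")}`);
  return lines.join("\n");
}
```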
Phase to address: Phase 5 (Persistent Memory + Assistant Mode)
Pitfall 26: ollama-model-catalog.json Becoming Stale as New Models Are Released
What goes wrong: The pre-built model catalog (server/src/data/ollama-model-catalog.json) hard-codes RAM/VRAM requirements per model name. Ollama releases new model versions and new model families frequently. A user who installs a new model after the catalog was last updated gets no recommendation reason — the model is silently marked recommended: false with recommendationReason: null because it is not in the catalog.
The existing code in getRecommendedModel() silently skips models not in the catalog (const entry = catalogMap.get(model.name); if (!entry) continue;). A model installed as llama3.3:latest may not match a catalog entry for llama3.3:70b-instruct-q4_K_M.
How to avoid:
- Implement a fallback heuristic: if a model is not in the catalog, estimate RAM requirements from the model's parameterSize and quantization fields that Ollama already returns. A 7B Q4_K_M model reliably fits in ~5GB.
- Normalize model name matching: strip version tags and match on the family+quantization pattern, not the exact name string.
- Document the catalog update process: when to update it, who owns it, and how to add new families.
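The fallback heuristic and name normalization can be sketched as follows. The bits-per-weight table and the 1.2 runtime overhead factor are rough assumptions, not measured values. Calibrate them against real Ollama memory usage before relying on the output. The parameterSize and quantization inputs are assumed to be strings like "7B" and "Q4_K_M". The catalog is assumed to be re-keyed by family and quantization. Under these assumptions a 7B Q4_K_M model comes out at roughly 5.1 GB, which is in line with the figure above.

```typescript
// Rough bits per weight for common GGUF quantizations (approximate; assumption).
const BITS_PER_WEIGHT: Record<string, number> = {
  Q4_0: 4.5,
  Q4_K_M: 4.85,
  Q5_K_M: 5.7,
  Q6_K: 6.6,
  Q8_0: 8.5,
  F16: 16,
};
const UNKNOWN_QUANT_BITS = 16; // unknown quantization: assume the worst case
const RUNTIME_OVERHEAD = 1.2; // KV cache and runtime buffers (assumed; calibrate on real hardware)
const UNIT_SCALE: Record<string, number> = { K: 1e3, M: 1e6, B: 1e9, T: 1e12 };

// "7B" -> 7e9, "7.6B" -> 7.6e9; returns null for anything unparseable.
function parseParameterCount(parameterSize: string): number | null {
  const m = /^\s*([\d.]+)\s*([KMBT])\s*$/i.exec(parameterSize);
  if (!m) return null;
  const value = parseFloat(m[1]);
  return Number.isFinite(value) ? value * UNIT_SCALE[m[2].toUpperCase()] : null;
}

function estimateRamGb(parameterSize: string, quantization: string): number | null {
  const params = parseParameterCount(parameterSize);
  if (params === null) return null;
  const bits = BITS_PER_WEIGHT[quantization.toUpperCase()] ?? UNKNOWN_QUANT_BITS;
  return ((params * bits) / 8 / 1e9) * RUNTIME_OVERHEAD;
}

// "llama3.3:latest" and "llama3.3:70b-instruct-q4_K_M" both normalize to family "llama3.3".
function normalizeModelName(name: string): { family: string; tag: string } {
  const [family, tag = "latest"] = name.trim().toLowerCase().split(":", 2);
  return { family, tag };
}

// Catalog assumed to be keyed by "family|QUANT"; fall back to the heuristic when there is no entry.
function requiredRamGb(
  model: { name: string; parameterSize: string; quantization: string },
  catalog: Map<string, number>,
): { gb: number | null; source: "catalog" | "heuristic" } {
  const key = `${normalizeModelName(model.name).family}|${model.quantization.toUpperCase()}`;
  const fromCatalog = catalog.get(key);
  if (fromCatalog !== undefined) return { gb: fromCatalog, source: "catalog" };
  return { gb: estimateRamGb(model.parameterSize, model.quantization), source: "heuristic" };
}
```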
Phase to address: Phase 1 (Hardware Detection / Model Recommendations)
Critical Pitfalls (v1.6 — Voice Pipeline + Telegram Bridge)
Pitfall 27: Audio Format Mismatch Between Browser Recording and Whisper Input
What goes wrong: The browser's MediaRecorder produces audio in formats that vary by browser. Chrome records audio/webm;codecs=opus. Firefox records audio/ogg;codecs=opus. Safari (since 18.4) can record audio/webm;codecs=opus but used to produce audio/mp4. Whisper (and faster-whisper) requires 16 kHz mono PCM WAV — none of these formats match directly.
The trap is assuming a single format pipeline will work everywhere. Sending a WebM blob directly to Whisper either causes a silent transcription failure (empty string returned) or an error that is swallowed by the error handler, making the feature appear to work while returning nothing.
Why it happens: Browser audio format support has historically been inconsistent. MediaRecorder.isTypeSupported('audio/webm;codecs=opus') returns true in Chrome, Firefox, and Safari 18.4+, but the produced bitrates and frame durations differ in ways that affect downstream processing. Developers test on Chrome and never encounter Firefox/Safari failures.
How to avoid:
- Always transcode to 16 kHz mono WAV on the server before passing audio to Whisper. Use ffmpeg: ffmpeg -i input -ar 16000 -ac 1 -f wav output.wav. This handles any valid audio format the browser might send.
- On the client, use MediaRecorder.isTypeSupported() to detect the actual format being used, and send the MIME type in the upload request header so the server knows what it is receiving.
- Do not infer the codec from the Content-Type header or the file extension: WebM containers can hold different codecs, so always transcode rather than assume.
- ffmpeg must be installed and in PATH on the server. Make this a hard dependency checked at server startup (which ffmpeg || exit 1), not a silent fallback.
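Here is a minimal Node.js sketch of the server-side transcode step. It assumes the upload has already been written to a temporary file. It also assumes ffmpegPath is an absolute path resolved at startup (see Pitfall 38). The -ar 16000 -ac 1 -f wav flags are the ones given above. The -c:a pcm_s16le flag just makes the 16-bit PCM codec explicit.

```typescript
import { spawn } from "node:child_process";

// Arguments that turn any input ffmpeg can decode into 16 kHz mono 16-bit PCM WAV.
function whisperTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-hide_banner", "-loglevel", "error",
    "-y", // overwrite the output file if it already exists
    "-i", inputPath,
    "-ar", "16000", // resample to 16 kHz
    "-ac", "1", // downmix to mono
    "-c:a", "pcm_s16le", // 16-bit PCM samples
    "-f", "wav",
    outputPath,
  ];
}

// ffmpegPath should be an absolute path resolved once at startup, not the bare name "ffmpeg".
function transcodeForWhisper(ffmpegPath: string, inputPath: string, outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn(ffmpegPath, whisperTranscodeArgs(inputPath, outputPath), {
      stdio: ["ignore", "ignore", "pipe"],
    });
    let stderr = "";
    proc.stderr?.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    proc.on("error", reject); // e.g. ENOENT when the binary is missing
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr.trim()}`));
    });
  });
}
```

A non-zero exit is surfaced together with ffmpeg's own error output. A decode failure therefore becomes an explicit error rather than an empty transcription.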
Warning signs:
- Transcription returns empty string on Safari or Firefox but works on Chrome
- Whisper logs show "unsupported format" or "decode error"
- ffmpeg not in PATH on production server (check server startup log)
- Audio upload succeeds (HTTP 200) but transcribed text is empty
Phase to address: Phase 1 (Whisper STT pipeline) — transcode pipeline must be in place before any browser testing.
Pitfall 28: Telegram Voice Messages Arriving as OGG/Opus at 48 kHz, Not 16 kHz
What goes wrong: Telegram voice messages use the OGG container with Opus codec at 48 kHz mono, stored as audio_[id].ogg. This is a different container format from what the browser sends (WebM), and a different sample rate from what Whisper expects (16 kHz). Treating the two pipelines identically breaks silently: ffmpeg will convert the format but the sample rate mismatch causes Whisper to either produce garbage transcriptions or fail.
A documented second trap: some Telegram voice messages arrive with the MIME type flagged as audio/ogg but the file extension .oga. Not all MIME type parsers recognize .oga, so the media pipeline may classify the file as "unrecognized audio" and skip transcription entirely.
Why it happens: Telegram's wire format is documented but developers building voice-to-text pipelines often copy the browser audio pipeline without adjusting for Telegram's specific encoding. The OGG container with Opus codec at 48 kHz is valid audio that plays fine in media players, so local testing succeeds but transcription quality degrades.
How to avoid:
- Use a dedicated Telegram audio conversion step: ffmpeg -i input.ogg -ar 16000 -ac 1 -f wav output.wav. This is identical to the browser pipeline but sourced from a downloaded Telegram file, not a browser blob.
- Download the Telegram voice file using getFile plus the CDN URL before transcribing. Do not attempt to stream or pipe Telegram file downloads directly to Whisper.
- Treat .oga and .ogg as the same format; normalize file handling to check codec metadata rather than relying on the extension.
- Log the input audio duration before transcribing: if Telegram sends a 0-byte or corrupted file, ffmpeg will fail loudly rather than silently returning empty text.
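Two small helpers for the Telegram side can be sketched as follows. One treats .oga and .ogg (or an audio/ogg MIME type) as the same OGG/Opus input. The other builds the Bot API URLs for the getFile call and for the file download that uses the file_path getFile returns. The helper names are hypothetical. The URL patterns follow the Bot API's file endpoints. The downloaded bytes should then go through the same 16 kHz mono ffmpeg step as browser audio.

```typescript
// Telegram voice notes are OGG/Opus whether the file ends in .ogg or .oga.
const OGG_EXTENSIONS = new Set([".ogg", ".oga"]);

function isOggVoiceFile(filePath: string, mimeType?: string): boolean {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : "";
  return OGG_EXTENSIONS.has(ext) || (mimeType ?? "").toLowerCase().startsWith("audio/ogg");
}

// Step 1: GET this URL and read result.file_path from the JSON response.
function telegramGetFileUrl(botToken: string, fileId: string): string {
  return `https://api.telegram.org/bot${botToken}/getFile?file_id=${encodeURIComponent(fileId)}`;
}

// Step 2: download the complete file from here before transcoding.
// Both URLs embed the bot token, so never log them.
function telegramFileUrl(botToken: string, filePath: string): string {
  return `https://api.telegram.org/file/bot${botToken}/${filePath}`;
}
```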
Warning signs:
- Telegram voice messages return empty transcription while browser voice works correctly
- ffmpeg logs showing "48000 Hz" input (correct, but it needs the explicit -ar 16000 flag)
- Files downloaded from Telegram with the .oga extension not recognized by the MIME type check
Phase to address: Phase 3 (Telegram bridge audio handling) — the OGG/Opus download-and-transcode path must be tested with real Telegram voice messages before the bridge ships.
Pitfall 29: Spawning a New Piper Process Per TTS Request (Process-Per-Request Anti-Pattern)
What goes wrong: The Piper binary is a CLI tool: piper --model voice.onnx --output_file out.wav < text.txt. The naive Node.js integration spawns a new process for each TTS request. Two problems emerge:
- Model reload latency. Piper loads the ONNX voice model into memory on startup. On CPU-only inference (the M4 Mac Mini has no CUDA GPU), this takes 200–800ms per request. For a voice reply to a short message, this means 1–2 seconds of silence before audio starts.
- Long text truncation. A documented Piper bug: when processing text longer than ~500 characters via stdin pipe, Piper silently truncates the output or exits early. The generated audio file exists but is shorter than expected. The calling code sees a successful exit code and plays the truncated audio without knowing content was lost.
Why it happens: CLI tools feel simple to integrate. The first working implementation spawns a process, gets output, done. The model-reload cost and the long-text bug only surface in production use with real message lengths.
How to avoid:
- Run Piper as a persistent HTTP service on a local port (there is a community piper-http wrapper, or implement one). The process stays alive between requests, keeping the model in memory.
- For long responses (>400 characters), split text into sentence-level chunks before sending them to Piper. Synthesize each chunk and concatenate the WAV files. This avoids the truncation bug; combined with the persistent service, it also avoids the per-request reload cost.
- Implement a warmup call at server startup: send a short dummy text to Piper to force model loading before the first real request.
- Cap TTS at a reasonable character limit for voice output (e.g., 1500 chars) — this is a UX constraint anyway; wall-of-text responses should not be read aloud verbatim.
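Sentence-level chunking can be sketched as follows. The 400-character default mirrors the threshold above. The sentence splitter is deliberately naive: decimals and abbreviations split early, which is harmless for speech. An overlong sentence falls back to splitting on word boundaries. The sketch does not cover concatenating the per-chunk WAV output.

```typescript
// Split one overlong sentence on spaces so that no piece exceeds maxChars
// (a single word longer than maxChars is kept whole).
function splitOnWords(sentence: string, maxChars: number): string[] {
  const out: string[] = [];
  let current = "";
  for (const word of sentence.split(" ")) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      out.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) out.push(current);
  return out;
}

// Group whole sentences into chunks of at most maxChars characters for Piper.
function chunkForTTS(text: string, maxChars = 400): string[] {
  const sentences = text.replace(/\s+/g, " ").trim().match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(...splitOnWords(sentence, maxChars));
      continue;
    }
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```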
Warning signs:
- First TTS response takes 2+ seconds after server restart
- Audio playback cuts off mid-sentence on responses longer than ~30 words
- Process table shows a new piper process appearing and dying for each TTS request
- TTS works in unit tests (short strings) but fails in integration tests (real agent responses)
Phase to address: Phase 2 (Piper TTS pipeline) — persistent process architecture must be designed before the first response endpoint is implemented.
Pitfall 30: Whisper Model Loading on Every Request (Memory Spike Anti-Pattern)
What goes wrong: Whisper and faster-whisper load a large model into memory (tiny: ~150MB, small: ~500MB, medium: ~1.5GB). If the STT endpoint loads the model fresh for each HTTP request — or if the Python process exits and restarts — every concurrent transcription request duplicates the model in memory. On an M4 Mac Mini with 16GB unified memory running Paperclip + Ollama, this can cause the system to swap and degrade all services.
A secondary issue: faster-whisper has a documented memory leak where RAM from a transcription session is not fully released. On a long-running server, this causes gradual memory growth over hours.
Why it happens: Python subprocesses spawned from Node.js are short-lived by default. The "simplest integration" is spawn('python3', ['transcribe.py', audioPath]) — this reloads the model every time. Developers test with a handful of requests and don't observe the memory pattern.
How to avoid:
- Run Whisper/faster-whisper as a persistent sidecar process (e.g., a FastAPI service on localhost:8001). Node.js calls POST /transcribe via HTTP. The model stays loaded in the Python process between requests.
- On the Mac Mini M4, use whisper-mlx or mlx-whisper, which use Apple's MLX framework for 2–3x faster transcription on Apple Silicon with lower memory overhead compared to PyTorch.
- Implement a request queue in the sidecar: accept one transcription at a time, queue the rest. This prevents concurrent requests from doubling memory usage.
- Add a health check endpoint to the sidecar: /health returns model load status. The main server waits for this to be healthy before routing traffic.
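The queue described above belongs in the Python sidecar. The same one-at-a-time discipline can also be enforced on the Node.js side as a second guard. Here is a minimal sketch. postToSidecar in the usage comment is a hypothetical function that wraps POST /transcribe.

```typescript
// Runs async tasks strictly one at a time, in submission order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();
  private pending = 0;

  get size(): number {
    return this.pending;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    this.pending++;
    const result = this.tail.then(task);
    const settle = (): void => {
      this.pending--;
    };
    // The chain continues whether the task succeeds or fails, so one error does not block the rest.
    this.tail = result.then(settle, settle);
    return result;
  }
}

// Hypothetical usage:
//   const sttQueue = new SerialQueue();
//   const text = await sttQueue.run(() => postToSidecar(wavPath));
```

However many chat requests arrive at once, the sidecar then sees them one at a time. The size getter can be logged to spot a growing backlog.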
Warning signs:
- Memory usage spikes by 500MB+ on each transcription request
- python3 processes appearing in ps aux that don't match the count of active requests
- Transcription latency increasing linearly with server uptime (memory leak indicator)
- System starts swapping after 20–30 transcription requests
Phase to address: Phase 1 (Whisper STT pipeline) — sidecar architecture must be the starting design, not a later refactor.
Pitfall 31: Browser Silence Detection Triggering Too Early or Too Late
What goes wrong: The web chat mic button uses client-side silence detection to auto-stop recording and send the audio. Threshold-based silence detection (RMS below X for N milliseconds) has two failure modes:
- Too eager: Fires after a natural pause mid-sentence ("I want to... create a new task"). The user is still speaking, but the detector interprets the inter-clause pause as end-of-speech. Audio is sent and transcribed as incomplete input.
- Too late: In a quiet room with a good microphone, even breathing or HVAC noise keeps the RMS above the silence threshold. Recording never auto-stops. The user waits, unsure if anything is working.
Simple RMS-based detection (the first approach most developers reach for) achieves only ~50% true positive rate for end-of-speech detection at a 5% false positive rate. Production-quality VAD (Silero, Picovoice Cobra) achieves 87–99%.
Why it happens: RMS threshold detection is two lines of code. It works in demo conditions (quiet room, clear speech, no pauses). It fails noticeably in real use. Developers ship the demo implementation.
How to avoid:
- Use @ricky0123/vad-web (browser-native VAD using the Silero model via ONNX Runtime Web). It runs off the main thread, handles natural pauses, and achieves significantly better accuracy than threshold detection.
- Set a maximum recording duration (e.g., 60 seconds) as a fallback: always auto-stop, even if silence detection is confused.
- Show a waveform visualization while recording so users can see whether the mic is capturing audio (helps them self-diagnose "is it recording?").
- Provide a manual stop button alongside auto-stop — never rely solely on automatic detection.
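Whichever VAD library drives end-of-speech detection, the recording still needs the fallbacks listed above. Here is a sketch of a small state holder in which the first stop signal wins. The class and method names are hypothetical. speechEnded() is meant to be called from the VAD's end-of-speech callback. vad-web's MicVAD exposes an onSpeechEnd option; check its documentation for the exact API.

```typescript
type StopReason = "vad" | "manual" | "max-duration";

// One recording attempt. Whichever stop signal arrives first wins; later ones are ignored.
class RecordingSession {
  private readonly maxDurationMs: number;
  private readonly onStop: (reason: StopReason) => void;
  private reason: StopReason | null = null;

  constructor(maxDurationMs: number, onStop: (reason: StopReason) => void) {
    this.maxDurationMs = maxDurationMs;
    this.onStop = onStop;
  }

  get stopReason(): StopReason | null {
    return this.reason;
  }

  // From the VAD end-of-speech callback.
  speechEnded(): void {
    this.stop("vad");
  }

  // From the always-visible manual stop button.
  stopClicked(): void {
    this.stop("manual");
  }

  // From the waveform animation loop (or a timer), with milliseconds since recording began.
  tick(elapsedMs: number): void {
    if (elapsedMs >= this.maxDurationMs) this.stop("max-duration");
  }

  private stop(reason: StopReason): void {
    if (this.reason !== null) return;
    this.reason = reason;
    this.onStop(reason); // e.g. stop the MediaRecorder and upload the audio
  }
}
```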
Warning signs:
- Transcription of "I want to" submitted as complete message
- Recording indicator stays active for 30+ seconds in normal use
- Users repeatedly clicking the mic button because auto-stop didn't fire
- Silence threshold value hardcoded to a constant (needs to be calibrated per device)
Phase to address: Phase 2 (Web chat mic button) — VAD library choice must be made before the recording UI is built.
Pitfall 32: Voice Mode Flag Not Propagated Through the Agent Session Layer
What goes wrong: The voice mode flag (isVoiceMode: true) attached to a user message in the web chat must reach the agent's system prompt generator to trigger voice-optimized response formatting (shorter sentences, no markdown, no code blocks). If the flag is stripped or not forwarded at any point in the message pipeline — SSE event, message persistence, agent session codec, Hermes adapter — the agent responds in its default text format. The TTS then tries to synthesize text containing backtick code blocks, markdown headers, and bullet points, producing robotic-sounding output like "backtick backtick backtick bash backtick backtick backtick."
Why it happens: Message metadata (anything beyond content and role) is treated as optional. Each layer in the pipeline — the Express route handler, the SSE broadcaster, the DB persistence layer, the adapter's session encoder — may serialize/deserialize the message and drop non-standard fields. The flag is present in the browser but never reaches the agent runtime.
How to avoid:
- Audit every layer the message passes through: client → POST /api/chat/messages → message persistence → agent session codec → Hermes adapter system prompt. Verify that isVoiceMode (or equivalent) is preserved at each layer.
- Use Paperclip's existing message metadata mechanism (if present) rather than adding a top-level field that might be stripped. Check whether the message schema has a metadata JSON column.
- Test the full pipeline end-to-end: send a voice-flagged message and check the agent's system prompt (log it in development mode) to confirm the voice formatting instruction is present.
- The dual output pattern (voice-optimized response + full text with code blocks) requires the LLM to produce two outputs or a structured output with separate fields. Design this contract before implementing either end.
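Here is a sketch of carrying the flag in a metadata object and checking that it survives a serialization round trip. ChatMessage, VOICE_INSTRUCTION, and persistAndReload are hypothetical stand-ins for the real message schema and persistence layer. The round-trip function plays the role of the no-op handler test mentioned in the phase note for this pitfall.

```typescript
// Hypothetical message shape: the voice flag travels inside metadata, not as an ad-hoc top-level field.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  metadata?: { isVoiceMode?: boolean; [key: string]: unknown };
}

const VOICE_INSTRUCTION =
  "The user is speaking and will hear your reply read aloud. Answer in short, plain sentences. " +
  "Do not use markdown, bullet points, headings, or code blocks.";

function buildSystemPrompt(basePrompt: string, latest: ChatMessage): string {
  return latest.metadata?.isVoiceMode === true ? `${basePrompt}\n\n${VOICE_INSTRUCTION}` : basePrompt;
}

// Stand-in for one persistence hop (e.g. a JSON metadata column). Every layer should copy
// metadata through wholesale rather than rebuilding the message from role and content only.
function persistAndReload(msg: ChatMessage): ChatMessage {
  const row = { role: msg.role, content: msg.content, metadata: JSON.stringify(msg.metadata ?? {}) };
  return { role: row.role, content: row.content, metadata: JSON.parse(row.metadata) };
}
```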
Warning signs:
- Agent responses in voice mode contain markdown formatting
- TTS output includes spoken "asterisk", "backtick", or "hash" characters
- isVoiceMode present in browser network request payload but absent in server-side agent session log
- Voice and text responses are identical in content and format
Phase to address: Phase 1 (Voice mode flag) — the flag propagation path must be fully designed and tested with a no-op handler before the dual output pattern is built on top of it.
Pitfall 33: Telegram Bridge Creating a Competing Session Identity for the Same Agent
What goes wrong: Paperclip's agent session model assigns one session per agent per "channel" (web, API, etc.). The Telegram bridge opens a new channel. If the bridge creates a new session for each Telegram message instead of maintaining a persistent session for the Telegram channel, the agent loses context between Telegram messages — each message starts a fresh conversation.
A documented related bug in similar systems: when the Telegram bridge relays a message through the web channel gateway instead of its own channel, the session's channel field gets overwritten from telegram to webchat. Subsequent agent replies are then routed to the web UI, not back to Telegram. The user sends a Telegram message and the reply appears in the browser but never arrives in Telegram.
Why it happens: Reusing the existing web session mechanism (the path of least resistance) overwrites session channel metadata. The Telegram bridge needs its own channel identity that the session persists.
How to avoid:
- Create a dedicated telegram channel type in the session layer. Do not route Telegram messages through the webchat gateway; use a separate message path that preserves channel: "telegram".
- Maintain a persistent session per Telegram chat ID (not per message). Store the sessionId ↔ chatId mapping in a lightweight lookup (in-memory Map for single-user deployment, or a simple JSON file).
- On agent reply, inspect the originating session's channel field. Route replies to Telegram if channel === "telegram", and to the web UI if channel === "webchat". Never allow this routing to be overwritten by message relay logic.
- Test the routing explicitly: send a Telegram message and verify the reply arrives in Telegram (not in the web UI); send a web chat message to the same agent and verify the reply arrives in the browser (not in Telegram).
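Here is a sketch of per-chat session identity and channel-based reply routing. It assumes a single-user deployment where an in-memory Map is enough. Session, TelegramSessionRegistry, routeReply, and the send callbacks are hypothetical. Two properties matter. The channel is fixed when the session is created. Routing reads that field rather than the path the message was relayed through.

```typescript
type Channel = "telegram" | "webchat";

interface Session {
  readonly id: string;
  readonly agentId: string;
  readonly channel: Channel; // fixed at creation; relay code must never rewrite it
  readonly telegramChatId?: number;
}

interface ReplyTargets {
  sendTelegram(chatId: number, text: string): void;
  sendWeb(sessionId: string, text: string): void;
}

// In-memory chatId -> session map: enough for a single-user deployment.
class TelegramSessionRegistry {
  private readonly byChat = new Map<number, Session>();
  private nextId = 1;

  // One persistent session per Telegram chat, reused for every message in that chat.
  forChat(chatId: number, agentId: string): Session {
    const existing = this.byChat.get(chatId);
    if (existing) return existing;
    const session: Session = { id: `tg-${this.nextId++}`, agentId, channel: "telegram", telegramChatId: chatId };
    this.byChat.set(chatId, session);
    return session;
  }
}

// Route by the originating session's channel, never by the relay path.
function routeReply(session: Session, text: string, targets: ReplyTargets): void {
  if (session.channel === "telegram") {
    if (session.telegramChatId === undefined) {
      throw new Error(`telegram session ${session.id} has no chat id`);
    }
    targets.sendTelegram(session.telegramChatId, text);
  } else {
    targets.sendWeb(session.id, text);
  }
}
```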
Warning signs:
- Telegram messages receive no reply in Telegram but a reply appears in the web chat interface
- Session channel field changes from telegram to webchat after the first reply
- New session created for every Telegram message (no conversation continuity)
- Agent session table grows unboundedly (new session per Telegram message)
Phase to address: Phase 3 (Telegram bridge session layer) — session identity design must be finalized before the bridge handler is implemented.
Pitfall 34: Telegram Webhook vs. Long Polling — Wrong Choice for This Deployment
What goes wrong: Telegram bots receive updates either via long polling (getUpdates) or webhooks. The choice matters for this deployment:
- Webhook requires: a publicly accessible HTTPS URL, a valid TLS certificate, and a port in [80, 88, 443, 8443]. The Nexus deployment is a local Mac Mini without a public URL. Webhooks do not work behind NAT/LAN without a tunnel (ngrok, Cloudflare Tunnel, etc.).
- Long polling works from behind NAT. The bot repeatedly calls getUpdates, and each call can wait (up to its timeout parameter, in seconds) for new updates before returning. No public URL is required.
The trap: developers set up webhooks because webhook tutorials are more common, then wonder why Telegram isn't delivering updates. Or they use long polling but run multiple processes that all call getUpdates simultaneously — Telegram delivers each update to only one caller, so updates are split between processes and lost.
Why it happens: Webhook is the "production-grade" recommendation in most Telegram bot guides. Local deployment contexts are underrepresented in tutorials.
How to avoid:
- For this deployment (local Mac Mini, no public URL, single user): use long polling. It is simpler, works behind NAT, and the latency difference (1–2 seconds vs. real-time) is irrelevant for a personal assistant.
- Ensure only ONE process calls getUpdates. If the Express server restarts, verify the previous polling loop has stopped before starting a new one.
- Use a Telegram bot library (Telegraf, grammY) rather than raw HTTP polling; these libraries handle the polling loop, update acknowledgement, and error recovery correctly.
- Never mix polling and webhooks: if a webhook was previously registered, it must be explicitly deleted (deleteWebhook) before long polling will work.
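Bot libraries such as grammY run the long-polling loop for you, but they cannot tell whether another copy of the server is also polling. One way to enforce a single poller on one machine is a PID lockfile. The sketch below uses Node's exclusive-create flag ("wx"). The lock location is a hypothetical choice. A stale lock left by a crashed process is reclaimed after checking that the recorded PID is no longer alive.

```typescript
import { closeSync, openSync, readFileSync, unlinkSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical lock location; any path writable by the server process works.
const POLLER_LOCK_PATH = join(tmpdir(), "nexus-telegram-poller.lock");

function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM"; // exists, but owned by another user
  }
}

// Returns true if this process may start polling getUpdates.
function acquirePollerLock(lockPath: string = POLLER_LOCK_PATH): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const fd = openSync(lockPath, "wx"); // exclusive create: fails with EEXIST if the lock exists
      try {
        writeSync(fd, String(process.pid));
      } finally {
        closeSync(fd);
      }
      return true;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    }
    let holder: number;
    try {
      holder = Number(readFileSync(lockPath, "utf8").trim());
    } catch {
      continue; // lock disappeared between calls: retry
    }
    // Empty or garbled content may mean another process is mid-write: treat the lock as held.
    if (!Number.isInteger(holder) || holder <= 0 || isProcessAlive(holder)) return false;
    try {
      unlinkSync(lockPath); // stale lock from a crashed process
    } catch {
      // already removed by someone else
    }
  }
  return false;
}

function releasePollerLock(lockPath: string = POLLER_LOCK_PATH): void {
  try {
    if (Number(readFileSync(lockPath, "utf8").trim()) === process.pid) unlinkSync(lockPath);
  } catch {
    // no lock file: nothing to release
  }
}
```

Call acquirePollerLock before starting the library's polling loop, and refuse to start if it returns false. Call releasePollerLock in the shutdown handler.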
Warning signs:
- Telegram updates not arriving despite correct bot token
- Some messages received, others not (multiple processes polling simultaneously)
- setWebhook called during testing but server not publicly accessible
- getWebhookInfo returns a webhook URL pointing to localhost
Phase to address: Phase 1 (Telegram bridge setup) — polling vs. webhook decision must be made before any bot code is written.
Pitfall 35: TTS Synthesizing Agent Prefixes, Timestamps, and Metadata Verbatim
What goes wrong: The Telegram bridge prefixes agent replies with the agent name: [Nexus] Here is your answer.... The web chat renders this as a styled badge. When the same message content is passed to TTS, Piper synthesizes "bracket Nexus bracket Here is your answer" — the prefix is read aloud verbatim.
Similarly, if any message metadata (issue IDs like #ISS-42, timestamps, Markdown formatting characters) reaches the TTS synthesis input without being stripped, the audio sounds broken and robotic.
Why it happens: The message content as stored is the same string used for both display (where the prefix is rendered as a badge) and audio synthesis. The stripping step is obvious in retrospect but is easily forgotten when the display rendering works correctly.
How to avoid:
- Create a sanitizeForTTS(text: string): string utility function applied before any text reaches the Piper synthesis call. It strips: agent prefixes ([AgentName] patterns), Markdown formatting (**, *, #, backticks, > blockquote markers), issue/task IDs (#ISS-\d+, #TSK-\d+), URLs (replace with "link"), and code blocks (replace with "code example").
- Apply this sanitization at the TTS layer, not at the storage layer: the stored message should remain unmodified so the web UI can render it correctly.
- For the dual output pattern (voice-optimized + full text), the voice-optimized variant should already be prose-formatted; sanitizeForTTS is a safety net, not the primary formatting mechanism.
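Here is a sketch of sanitizeForTTS that applies the stripping rules listed above as a series of ordered regex passes. The regexes are assumptions tuned for typical chat markdown, not a full Markdown parser. Fenced code is replaced first so its contents never reach the inline rules. Markdown links keep their visible text. Italics are only stripped in their asterisk form, so snake_case identifiers survive.

```typescript
// Ordered regex passes; fenced code is handled first so its contents never reach later rules.
function sanitizeForTTS(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, " code example. ") // fenced code blocks
    .replace(/`([^`]*)`/g, "$1") // inline code: keep the words, drop the backticks
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // markdown links: keep the visible text
    .replace(/^\s*\[[^\]\n]+\]\s*/gm, "") // leading [AgentName] prefixes
    .replace(/https?:\/\/\S+/g, "link") // bare URLs
    .replace(/#(?:ISS|TSK)-\d+\b/g, "") // issue/task IDs
    .replace(/^\s{0,3}#{1,6}\s+/gm, "") // markdown headers
    .replace(/^\s*>\s?/gm, "") // blockquote markers
    .replace(/^\s*[-*+]\s+/gm, "") // bullet markers
    .replace(/(\*\*|__)(.+?)\1/g, "$2") // bold
    .replace(/\*(.+?)\*/g, "$1") // italics (asterisk form only)
    .replace(/[ \t]+/g, " ") // collapse runs of spaces
    .replace(/\s*\n\s*/g, " ") // join lines into flowing prose
    .trim();
}
```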
Warning signs:
- TTS reads "asterisk asterisk important asterisk asterisk" instead of "important"
- TTS reads "hash" or "pound" characters from Markdown headers
- Agent prefix brackets audible in playback
- Code block content being read aloud character by character
Phase to address: Phase 2 (Piper TTS pipeline + dual output pattern) — sanitizeForTTS must exist before the first TTS integration test.
Pitfall 36: CPU-Only Whisper Model Size Too Large for Acceptable Latency
What goes wrong: The Whisper model family spans: tiny (39M params, ~200MB), base (74M, ~300MB), small (244M, ~500MB), medium (769M, ~1.5GB), large (1.5B, ~3GB). On Apple Silicon M4 with Metal/MLX acceleration, the medium model runs in under 1 second for typical voice input. On CPU-only fallback (or if MLX is not configured), medium model transcription takes 4–15 seconds for a 10-second clip — too slow for interactive voice use.
Developers test on the primary deployment target (M4 with MLX fast path) and set model: "medium" as the default. On any other machine (CI server, Docker container, Linux server without Metal), the same default makes the feature unusable.
Why it happens: The bottleneck is hardware-dependent and only surfaces when Metal/MLX is unavailable. The test environment is the Mac Mini M4 where everything is fast.
How to avoid:
- Make the Whisper model size configurable at startup, not hardcoded. Default to small (good accuracy, fast on CPU), and allow upgrading to medium or large in config.
- Add hardware detection to the STT sidecar startup: if Apple Silicon + MLX is available, default to medium; if CPU-only, default to small or tiny.
- Benchmark the chosen model on the target hardware before committing to it: time python3 -c "from faster_whisper import WhisperModel; m = WhisperModel('small'); segs, _ = m.transcribe('test.wav'); print(len(list(segs)))". faster-whisper's transcribe() returns a lazy segment generator plus an info object, so the segments must be consumed for the timing to include decoding.
- For the Mac Mini M4 specifically: mlx-whisper or whisper-mlx uses Apple's MLX framework, is 2–8x faster than faster-whisper's CPU path, and does not require CUDA.
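Here is a sketch of startup-time model selection that follows the defaults above. HardwareInfo, chooseWhisperModel, and probeMlx are hypothetical names. The MLX probe runs the python3 -c "import mlx" check mentioned in this pitfall. An explicitly configured model always overrides the detected default.

```typescript
import { spawnSync } from "node:child_process";

type WhisperModelSize = "tiny" | "base" | "small" | "medium" | "large";

interface HardwareInfo {
  platform: string; // process.platform
  arch: string; // process.arch
  mlxAvailable: boolean;
}

// Runs python3 -c "import mlx"; a missing interpreter or a failed import both count as "no MLX".
function probeMlx(pythonPath: string): boolean {
  const result = spawnSync(pythonPath, ["-c", "import mlx"], { stdio: "ignore", timeout: 10_000 });
  return result.status === 0;
}

function chooseWhisperModel(hw: HardwareInfo, configured?: WhisperModelSize): WhisperModelSize {
  if (configured) return configured; // explicit config always wins
  const appleSilicon = hw.platform === "darwin" && hw.arch === "arm64";
  if (appleSilicon && hw.mlxAvailable) return "medium";
  return "small"; // CPU-only default; drop to "tiny" if benchmarks are still too slow
}

// Startup usage (config.whisperModel is a hypothetical config field; use an absolute python path):
//   const model = chooseWhisperModel(
//     { platform: process.platform, arch: process.arch, mlxAvailable: probeMlx("/usr/bin/python3") },
//     config.whisperModel,
//   );
```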
Warning signs:
- Transcription taking 5+ seconds for a 5-second voice clip
- Default model is medium or large without hardware detection
- MLX not installed or not used (check: python3 -c "import mlx" should succeed on M4)
- STT latency acceptable on the dev machine but reported as "frozen" on other hardware
Phase to address: Phase 1 (Whisper STT pipeline + CPU fallback) — model selection logic and hardware detection must be in place before the latency target (<3 seconds) is validated.
Pitfall 37: Telegram File Downloads Blocking the Bot Event Loop
What goes wrong: When a Telegram voice message arrives, the bot must: (1) call getFile to get the file path, (2) download the file from Telegram's CDN, (3) transcode with ffmpeg, (4) transcribe with Whisper. Steps 2–4 each take 0.5–3 seconds. If the bot processes messages synchronously in the main event loop, it cannot acknowledge incoming updates during this window. Telegram resends unacknowledged updates after a timeout, causing the bot to process the same voice message multiple times and flood the agent with duplicate transcriptions.
Why it happens: Bot frameworks (Telegraf, grammY) handle one update at a time by default. Voice message handling is I/O-heavy. The simple implementation puts all processing in the message handler, which blocks the next update from being processed.
How to avoid:
- Acknowledge the Telegram update immediately (return from the handler without awaiting the full pipeline). Kick off the transcription + agent call pipeline asynchronously.
- Use a per-chat-ID in-flight tracker: if a voice transcription is already in progress for a given chatId, queue the next one rather than spawning a second concurrent pipeline.
- Send an intermediate "Transcribing..." status message to Telegram immediately after receiving the voice message, so the user gets immediate feedback while the pipeline runs.
- Set a timeout on the ffmpeg + Whisper steps. If transcription takes longer than 30 seconds, send an error reply and discard the audio.
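Here is a sketch of the per-chat in-flight tracker together with a timeout wrapper. It assumes the bot handler only enqueues work and then returns immediately. PerChatQueue, withTimeout, and the transcribeAndReply helper in the usage comment are hypothetical. The timeout only stops waiting. The caller should still kill any ffmpeg process or abort any sidecar request that is still running.

```typescript
type Job = () => Promise<void>;

// At most one voice pipeline per chat; later voice messages from the same chat wait their turn.
class PerChatQueue {
  private readonly waiting = new Map<number, Job[]>();
  private readonly running = new Set<number>();

  enqueue(chatId: number, job: Job): "started" | "queued" {
    if (this.running.has(chatId)) {
      const list = this.waiting.get(chatId) ?? [];
      list.push(job);
      this.waiting.set(chatId, list);
      return "queued";
    }
    this.running.add(chatId);
    void this.drain(chatId, job); // not awaited: the bot handler returns immediately
    return "started";
  }

  private async drain(chatId: number, first: Job): Promise<void> {
    let job: Job | undefined = first;
    while (job) {
      try {
        await job();
      } catch (err) {
        console.error(`voice pipeline failed for chat ${chatId}:`, err);
      }
      job = this.waiting.get(chatId)?.shift();
    }
    this.waiting.delete(chatId);
    this.running.delete(chatId);
  }
}

// Rejects after ms milliseconds if the wrapped promise has not settled.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// In the voice handler:
//   queue.enqueue(chatId, () => withTimeout(transcribeAndReply(chatId, fileId), 30_000, "voice pipeline"));
```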
Warning signs:
- Same Telegram voice message transcribed 2–3 times (duplicate update delivery)
- Bot stops responding to text messages while a voice message is being processed
- Telegram delivery reports showing retried updates
- getFile + downloadFile calls in the main event handler (not in a background task)
Phase to address: Phase 3 (Telegram bridge) — async pipeline architecture must be in place before end-to-end testing with real voice messages.
Pitfall 38: Piper Binary Not Found When Node.js Server Starts as a Service
What goes wrong: piper is installed to a user directory like ~/.local/share/piper-tts/piper or /usr/local/bin/piper. When Node.js server runs interactively in a terminal, the shell PATH includes this directory. When the server starts via a system service (launchd on macOS, systemd on Linux), the service environment has a minimal PATH that does not include user-local directories. child_process.spawn('piper', ...) throws ENOENT: no such file or directory.
This is a common and non-obvious failure: the feature works in development (interactive terminal) and silently fails in production (service startup).
Why it happens: Service environment PATH is not the same as interactive shell PATH. This is a standard UNIX gotcha that every server deployment eventually encounters.
How to avoid:
- Never rely on PATH resolution for subprocess binaries in server code. Store the absolute path to the Piper binary in the Nexus config file and use it explicitly in spawn(): spawn('/usr/local/bin/piper', ...).
- Check for the binary at server startup and log its absolute path: which piper || echo "piper not found in PATH" in the startup health check.
- Add a voices.piper_binary_path config key that can be overridden in ~/.paperclip/nexus.yaml without code changes.
- The same issue applies to ffmpeg. Both must be resolved to absolute paths.
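Here is a sketch of resolving a binary once at startup. The configured path stands in for a key like voices.piper_binary_path. The fallback directories are assumptions about common macOS/Linux install locations that a launchd or systemd service PATH may omit.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, isAbsolute, join } from "node:path";

// Assumed common install locations that a launchd/systemd service PATH may not include.
const EXTRA_DIRS = ["/opt/homebrew/bin", "/usr/local/bin", "/usr/bin"];

function isExecutable(path: string): boolean {
  try {
    accessSync(path, constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

// A configured absolute path wins; otherwise search absolute PATH entries plus EXTRA_DIRS.
function resolveBinary(name: string, configuredPath?: string): string | null {
  if (configuredPath) {
    return isAbsolute(configuredPath) && isExecutable(configuredPath) ? configuredPath : null;
  }
  const dirs = [...(process.env.PATH ?? "").split(delimiter), ...EXTRA_DIRS].filter((d) => d.length > 0);
  for (const dir of dirs) {
    const candidate = join(dir, name);
    if (isAbsolute(candidate) && isExecutable(candidate)) return candidate;
  }
  return null;
}

// Run once at startup so a missing binary fails loudly instead of as ENOENT on the first request.
function requireBinary(name: string, configuredPath?: string): string {
  const resolved = resolveBinary(name, configuredPath);
  if (resolved === null) {
    throw new Error(`${name} not found (configured path: ${configuredPath ?? "none"})`);
  }
  console.log(`[startup] using ${name} at ${resolved}`);
  return resolved;
}
```

The resolved strings are what later spawn() calls should receive, so TTS and transcoding no longer depend on the service's PATH.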
Warning signs:
- TTS works when running pnpm dev but fails when running via launchctl/systemctl
- ENOENT errors in server logs for piper or ffmpeg processes
- process.env.PATH in server context is shorter than interactive shell PATH
Phase to address: Phase 2 (Piper TTS pipeline) — absolute path configuration must be in place before any service deployment testing.
Moderate Pitfalls (v1.6)
Pitfall 39: Dual Output Pattern Producing Two Separate LLM Calls Per Voice Message
What goes wrong: The dual output pattern (voice-optimized response + full text with code blocks) is straightforward to implement as two separate LLM calls: one with a "respond in plain spoken prose" system prompt for TTS, one with the standard formatting for display. But two calls per voice message doubles cost and doubles latency. For a local Hermes/Ollama backend, this doubles the time-to-response.
An alternative (one call, structured output) requires the LLM to produce a JSON object with { voice: "...", text: "..." }. This requires reliable structured output, which is model-dependent — smaller models (7B) produce malformed JSON under structured output constraints more often than larger ones.
Why it happens: Two-call is the obvious, correct first implementation. The optimization is non-trivial and model-dependent.
How to avoid:
- For the MVP, use a single LLM call. Ask the agent to produce a voice-formatted response (plain prose, no markdown). Display the voice-formatted text in the chat UI as well — users reading the chat still get the content, just formatted for voice.
- Reserve the dual-output (voice prose + full-text-with-code) pattern for a later iteration when the voice pipeline is stable and the cost/latency of two calls is measurable.
- If dual output is required from the start: use function calling / tool use to get structured output rather than relying on JSON in the completion text. Most current models support structured output via the tools API more reliably than via raw JSON generation.
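If dual output is needed early, the response can be validated and then degraded gracefully. Here is a sketch that assumes the model was asked for an object with voice and text fields, ideally through a tool/function definition. The tool definition below is a generic JSON-schema example; its exact envelope varies by provider. On malformed output the parser falls back to the single-output MVP. The fallback flag can be logged to measure how often that happens.

```typescript
interface DualOutput {
  voice: string; // plain spoken prose for TTS
  text: string; // full answer for display; may contain markdown and code
}

// Generic JSON-schema tool definition; the exact envelope differs between providers.
const DUAL_OUTPUT_TOOL = {
  name: "respond",
  description: "Return the answer twice: once as plain spoken prose, once as full formatted text.",
  parameters: {
    type: "object",
    properties: {
      voice: { type: "string", description: "Short plain sentences, no markdown or code." },
      text: { type: "string", description: "Full answer with markdown and code blocks as needed." },
    },
    required: ["voice", "text"],
  },
};

// Validate tool-call arguments (or a raw completion). On malformed output, use the raw text
// for both channels; sanitizeForTTS then acts as the safety net on the voice side.
function parseDualOutput(raw: string): { output: DualOutput; fallback: boolean } {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { voice, text } = parsed as Record<string, unknown>;
      if (typeof voice === "string" && typeof text === "string" && voice.trim() && text.trim()) {
        return { output: { voice, text }, fallback: false };
      }
    }
  } catch {
    // not JSON at all: fall through to the fallback below
  }
  return { output: { voice: raw, text: raw }, fallback: true };
}
```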
Warning signs:
- Two sequential LLM calls observed in the server log per voice-flagged message
- Latency in voice mode is 2× text mode latency
- Structured output JSON malformed ~10% of the time on the 7B model
Phase to address: Phase 1 (Voice mode flag + dual output pattern) — design the output contract before implementation to avoid a costly rewrite.
Pitfall 40: Audio Playback Autoplay Blocked by Browser Policy
What goes wrong: Browsers block audio.play() calls that are not triggered by a user gesture. The voice pipeline flow is: user records → server transcribes → agent responds → server synthesizes → client receives audio blob → client plays. The final audio.play() call is triggered by an SSE event or fetch response completion, not by a user gesture. Chrome and Safari block this as autoplay.
The feature appears to work in development (because the developer's browser has granted the page autoplay permissions during testing) and fails for first-time users on a clean browser profile.
Why it happens: Autoplay policies protect users from unexpected audio. Developers habitually run with autoplay unlocked in their dev browsers.
How to avoid:
- Require an explicit user gesture to initiate the voice mode session. The "start voice mode" button click counts as a user gesture; use it to create and unlock an AudioContext (const ctx = new AudioContext(); await ctx.resume()). Once unlocked, the AudioContext can play audio without further gesture requirements for the remainder of the session.
- Do not use <audio> element autoplay. Instead, read the received audio blob into an ArrayBuffer, decode it with AudioContext.decodeAudioData(), and play it via an AudioBufferSourceNode; this uses the already-unlocked context.
- Test on a clean browser profile with default settings to verify autoplay works before shipping.
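The unlock-then-play flow above can be sketched as a small player object. To keep it runnable outside a browser, the sketch codes against a minimal structural subset of the Web Audio API; VoicePlayer, AudioContextLike, and SourceNodeLike are hypothetical names. In the UI you would pass () => new AudioContext() as the factory. The design point is that the context is created and resumed inside the click handler, and every later playback reuses that same context.

```typescript
// Minimal structural subset of the Web Audio API used here; a real AudioContext satisfies it.
interface SourceNodeLike {
  buffer: unknown;
  connect(destination: unknown): unknown;
  start(): void;
}
interface AudioContextLike {
  readonly state: string; // "suspended" | "running" | "closed" in browsers
  readonly destination: unknown;
  resume(): Promise<void>;
  decodeAudioData(data: ArrayBuffer): Promise<unknown>;
  createBufferSource(): SourceNodeLike;
}

class VoicePlayer {
  private ctx: AudioContextLike | null = null;
  private readonly createContext: () => AudioContextLike;

  constructor(createContext: () => AudioContextLike) {
    this.createContext = createContext;
  }

  // Call from the "start voice mode" click handler: the gesture is what lets resume() succeed.
  async unlock(): Promise<void> {
    const ctx = this.ctx ?? (this.ctx = this.createContext());
    if (ctx.state === "suspended") await ctx.resume();
  }

  // Called later from an SSE or fetch callback; no gesture needed once the context is running.
  async play(audio: ArrayBuffer): Promise<void> {
    const ctx = this.ctx;
    if (!ctx || ctx.state !== "running") {
      throw new Error("AudioContext not unlocked; show a 'tap to enable audio' prompt");
    }
    const source = ctx.createBufferSource();
    source.buffer = await ctx.decodeAudioData(audio);
    source.connect(ctx.destination);
    source.start();
  }
}
```

In the browser the wiring would look like player.unlock() in the button's click handler and player.play(await blob.arrayBuffer()) when the synthesized audio arrives; the thrown error in play() is the hook for the "tap to enable audio" prompt.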
Warning signs:
- Audio plays fine in development but silently fails on first user visit
- Browser DevTools console shows "play() failed because the user didn't interact with the document first"
- AudioContext state is suspended when audio playback is attempted
Phase to address: Phase 3 (Web chat audio playback) — AudioContext unlock must be part of the "start voice mode" button handler design.
Pitfall 41: Telegram Bot Token Stored in Environment Variable That Leaks Into Client Bundle
What goes wrong: The Telegram bot token (TELEGRAM_BOT_TOKEN) is a server-side secret. In a Vite/React monorepo, environment variables prefixed with VITE_ are bundled into the client. A developer who adds VITE_TELEGRAM_BOT_TOKEN to expose it to React code, or who imports a .env file in a Vite config context, risks the token appearing in the compiled JS bundle served to the browser.
Even without VITE_ prefix, if the token is loaded into a shared packages/shared module that is imported by both server and client code, Vite may tree-shake incorrectly and include it in the client bundle.
Why it happens: The monorepo's shared-package boundary between server and client code is not enforced by Vite's environment variable system. The VITE_ prefix is the documented mechanism for exposing variables to client code, but developers sometimes work around it.
How to avoid:
- Store TELEGRAM_BOT_TOKEN in .env.server (not .env or .env.local, which Vite reads). Use dotenv on the server explicitly; never load this file through Vite.
- Validate at build time: add a lint check or Vite plugin that fails the build if any variable containing TOKEN, SECRET, or KEY appears in the client bundle.
- Keep Telegram bridge code entirely in server/src/, never in packages/shared/ or any package imported by the UI.
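One way to implement the build-time check is a post-build scan for the values of secret-looking server variables inside the emitted client files. This is a sketch under assumptions, not an existing Nexus script: findLeakedSecrets is a hypothetical name, and the wiring (reading process.env and every .js file under ui/dist, then exiting non-zero on any hit) is left to the build script. Scanning for values rather than names also catches a token that was inlined under a different variable name.

```typescript
// Hypothetical post-build guard: report secret-looking server env values found in client bundle text.
const SECRET_NAME = /TOKEN|SECRET|KEY/;
const MIN_SECRET_LENGTH = 8; // skip short values such as "1" or "true" that would match by accident

function findLeakedSecrets(
  env: Record<string, string | undefined>,
  bundleFiles: Record<string, string>, // file path -> file contents
): string[] {
  const leaks: string[] = [];
  for (const [name, value] of Object.entries(env)) {
    if (!SECRET_NAME.test(name) || !value || value.length < MIN_SECRET_LENGTH) continue;
    for (const [file, contents] of Object.entries(bundleFiles)) {
      // Report the variable name, never the secret value, so CI logs stay clean.
      if (contents.includes(value)) leaks.push(`${name} found in ${file}`);
    }
  }
  return leaks;
}
```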
Warning signs:
- TELEGRAM_BOT_TOKEN value visible in browser DevTools → Sources → compiled JS
- .env file containing TELEGRAM_BOT_TOKEN in the repo root (Vite reads this)
- Telegram bridge code imported from a shared package used by UI code
Phase to address: Phase 1 (Telegram bridge setup) — secret handling policy must be validated before the bot token is added to any config file.
Pitfall 42: Voice Waveform UI Causing Unnecessary Re-renders During Recording
What goes wrong: The recording waveform visualization reads audio amplitude data from an AnalyserNode via requestAnimationFrame at ~60fps. If this data is stored in React state (useState), every frame re-renders the component that owns the state and all of its children; if that state lives in a parent such as the chat view, the whole message list re-renders 60 times per second. For a chat interface with many messages, this causes perceptible jank during recording (dropped frames, slow scrolling).
Why it happens: Waveform amplitude data is time-series state that changes at animation frame rate. React state is not designed for 60fps updates. The trap is copy-pasting a CanvasRenderingContext2D waveform example that stores amplitude in useState without considering the re-render cost.
How to avoid:
- Read amplitude data via useRef + direct Canvas 2D drawing inside the requestAnimationFrame loop. Never put waveform data in useState.
- Keep the waveform Canvas element isolated from the React component tree: render it outside the main message list DOM subtree (e.g., as an absolutely positioned overlay) so re-draws do not trigger layout recalculation for sibling components.
- Stop the requestAnimationFrame loop and disconnect the AnalyserNode immediately when recording stops; do not leave the loop running even at low amplitude.
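A framework-agnostic sketch of the Canvas-direct pattern, assuming the component keeps only the returned stop function in a useRef and calls it when recording stops or the component unmounts. startWaveform and the *Like interfaces are hypothetical names; the frame scheduler is injected so the loop can run outside a browser, and in the UI you would pass requestAnimationFrame and cancelAnimationFrame. The sample buffer is a local Uint8Array reused every frame, so the loop never calls a React state setter.

```typescript
// Structural subsets of AnalyserNode and CanvasRenderingContext2D used by this sketch.
interface AnalyserLike {
  readonly fftSize: number;
  getByteTimeDomainData(out: Uint8Array): void;
  disconnect(): void;
}
interface Canvas2DLike {
  clearRect(x: number, y: number, w: number, h: number): void;
  fillRect(x: number, y: number, w: number, h: number): void;
}

// Draws bars straight to the canvas on every animation frame and returns a stop function.
function startWaveform(
  analyser: AnalyserLike,
  ctx: Canvas2DLike,
  width: number,
  height: number,
  schedule: (cb: () => void) => number, // requestAnimationFrame in the browser
  cancel: (id: number) => void, // cancelAnimationFrame in the browser
  bars = 32,
): () => void {
  const samples = new Uint8Array(analyser.fftSize); // reused every frame
  const step = Math.max(1, Math.floor(samples.length / bars));
  const barWidth = width / bars;
  let frameId = 0;
  let running = true;

  const draw = (): void => {
    if (!running) return;
    analyser.getByteTimeDomainData(samples); // bytes centered on 128 (silence)
    ctx.clearRect(0, 0, width, height);
    for (let b = 0; b < bars; b++) {
      let peak = 0;
      const end = Math.min((b + 1) * step, samples.length);
      for (let i = b * step; i < end; i++) {
        peak = Math.max(peak, Math.abs(samples[i] - 128) / 128);
      }
      const h = Math.max(1, peak * height); // keep a 1px baseline for silent bars
      ctx.fillRect(b * barWidth, (height - h) / 2, barWidth * 0.8, h);
    }
    frameId = schedule(draw);
  };
  frameId = schedule(draw);

  return () => {
    running = false;
    cancel(frameId);
    analyser.disconnect(); // release the audio graph as soon as recording stops
  };
}
```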
Warning signs:
- React DevTools Profiler shows high commit count and component renders during recording
- Chat scroll is janky while recording
- requestAnimationFrame callback calling a useState setter
Phase to address: Phase 2 (Web chat mic button + waveform UI) — Canvas-direct rendering pattern must be established before the waveform component is built.
Pitfall 43: Missing ffmpeg on Production Mac Mini Silently Disabling Voice
What goes wrong: The entire STT + Telegram audio pipeline depends on ffmpeg for audio format conversion. If ffmpeg is not installed, the transcode step fails. Depending on error handling: (a) the transcription endpoint returns an HTTP 500, (b) it returns an empty transcription, or (c) it silently discards the audio and moves on. Outcomes (b) and (c) are worse than (a) because the user sees no error.
ffmpeg is not installed by default on macOS. It is available via Homebrew (brew install ffmpeg) but is not a Node.js dependency and will not be installed by pnpm install.
Why it happens: ffmpeg is a system dependency, not an npm dependency. It is easy to forget to document, and installation instructions are frequently missing from setup guides.
How to avoid:
- Add a startup check to the Nexus server: detect ffmpeg at boot time and log its version. If absent, log a prominent warning and disable the voice pipeline gracefully (return a clear error from the transcription endpoint, show a "voice unavailable" state in the UI).
- Add ffmpeg installation to the npx buildthis setup flow: if voice mode is enabled and ffmpeg is absent, the CLI should prompt to install it (brew install ffmpeg).
- Document ffmpeg as a hard prerequisite for voice features in the onboarding hardware detection step.
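The startup check can be sketched with Node's child_process by running ffmpeg -version once at boot. detectFfmpeg and parseFfmpegVersion are hypothetical names, and FFMPEG_PATH is an assumed override variable for service managers such as launchd whose PATH lacks /opt/homebrew/bin (the same problem as Pitfall 38 for Piper). A missing binary resolves to a status object rather than throwing, so the caller can disable voice routes cleanly.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

type FfmpegStatus =
  | { available: true; version: string }
  | { available: false; reason: string };

// The first line of `ffmpeg -version` looks like "ffmpeg version 7.1.1 Copyright (c) ...".
function parseFfmpegVersion(stdout: string): string | null {
  const match = /^ffmpeg version (\S+)/m.exec(stdout);
  return match ? match[1] : null;
}

// Run once at server boot. FFMPEG_PATH is a hypothetical override; Homebrew on Apple Silicon
// installs binaries under /opt/homebrew/bin, which service PATHs often omit.
async function detectFfmpeg(binary = process.env.FFMPEG_PATH ?? "ffmpeg"): Promise<FfmpegStatus> {
  try {
    const { stdout } = await execFileAsync(binary, ["-version"], { timeout: 5000 });
    const version = parseFfmpegVersion(stdout);
    return version
      ? { available: true, version }
      : { available: false, reason: "unrecognized ffmpeg -version output" };
  } catch (err) {
    return { available: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

At boot the server would log the version when available; otherwise it would log a prominent warning and have the transcription endpoint return a clear "voice unavailable" error instead of an empty transcript.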
Warning signs:
- which ffmpeg returns nothing on the production machine
- Voice features work in development (developer has ffmpeg) but fail in any fresh install
- Transcription endpoint returning 500 with no diagnostic message
Phase to address: Phase 1 (Whisper STT pipeline) — ffmpeg detection and graceful degradation must be implemented before any voice endpoint is exposed.
Minor Pitfalls (v1.6)
Pitfall 44: Telegram Agent Prefix Leaking Into Whisper Transcription Input
What goes wrong: The Telegram bridge formats replies as [AgentName] response text. If the bridge accidentally echoes the agent's own message back into the Whisper transcription pipeline (e.g., when relaying between agents or logging), Whisper transcribes the agent prefix along with the user's intended input. The resulting transcription contains [Nexus] previous response... prepended to whatever the user said. The agent receives this as its next input and behaves erratically.
Why it happens: Message relay and logging code passes message objects through the same pipeline as user input without filtering by sender type.
How to avoid:
- In the Telegram bridge handler, only transcribe messages where update.message.from.id !== bot.id; never transcribe messages sent by the bot itself.
- Apply a sender-type check before the transcription pipeline: if the message is from a bot, skip transcription and routing entirely.
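The sender check can be a small pure routing function that runs before any file download or transcription. Field names follow the Telegram Bot API Message and User objects (from.id, from.is_bot, voice.file_id); routeIncoming and the route labels are hypothetical. botId would come from a getMe call at startup. Checking is_bot in addition to the bot's own ID also stops other bots in a shared chat from injecting input.

```typescript
// Subset of the Telegram Bot API Message/User objects used by this sketch.
interface TelegramUser {
  id: number;
  is_bot: boolean;
}
interface TelegramMessage {
  message_id: number;
  from?: TelegramUser; // absent for channel posts
  voice?: { file_id: string; duration: number };
  text?: string;
}

type Route = "transcribe" | "text" | "ignore";

// Decide what to do with an incoming message before anything reaches Whisper or the agent.
function routeIncoming(msg: TelegramMessage, botId: number): Route {
  const sender = msg.from;
  if (!sender) return "ignore"; // no sender: nothing to attribute the input to
  if (sender.id === botId || sender.is_bot) return "ignore"; // never feed bot output back in
  if (msg.voice) return "transcribe";
  if (msg.text) return "text";
  return "ignore";
}
```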
Phase to address: Phase 3 (Telegram bridge) — sender-type filtering must be in the handler before end-to-end testing.
Technical Debt Patterns
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| Browser-side Puter.js SDK instead of server adapter | Faster to ship | Bypasses cost tracking, skill sync, session codec; creates split-brain | Never for production use |
| localStorage for OAuth tokens | Easy to implement | XSS exposure; migration required if key renamed; conflicts with upstream Paperclip keys | Never; use server-side secrets storage |
| os.totalmem() for RAM recommendations | One-line implementation | Overestimates available RAM on loaded systems; misleads model recommendations | Only as a fallback when freemem() is not available |
| Polling for hardware detection status | Avoids SSE complexity | Hammers server during onboarding; creates race conditions with slow detection | Only if SSE is unavailable |
| Inline Piper model download on first TTS call | Zero extra onboarding step | Silent hang on first use; poor UX; perceived as broken feature | Never; always pre-warm |
| Flat memory injection (all memories into every prompt) | Simple implementation | Context window overflow; irrelevant memories degrade response quality | Only for prototyping |
| No mode discriminator on conversations table | No schema change needed | Mode cross-contamination; hard to query assistant vs. agent conversations | Acceptable with explicit agent-based filtering |
| Spawn new Piper process per TTS request | Trivial first implementation | 200–800ms model reload per request; long-text truncation bug | Never; use persistent process |
| Skip ffmpeg transcode, send raw audio to Whisper | One less dependency | Silent transcription failures on non-WAV formats; broken on Safari/Firefox | Never; transcode is mandatory |
| Whisper in-process (Python subprocess per request) | No sidecar to manage | Model reload on every call; memory leak; concurrent request memory doubling | Only for one-off scripts, never for server |
| Telegram webhook on local server | "Production-grade" pattern | Requires public URL; breaks behind NAT; doesn't work for this deployment | Never for local Mac Mini deployment |
| useState for waveform animation data | Familiar React pattern | 60fps state updates cause continuous re-renders; UI jank during recording | Never; use useRef + Canvas direct |
Integration Gotchas
| Integration | Common Mistake | Correct Approach |
|---|---|---|
| Puter.js | Load browser SDK, call puter.ai.chat() directly | Implement as server-side HTTP adapter; Puter token stored in Paperclip config |
| Piper TTS (WASM) | Call synthesis on first user message | Pre-warm on background thread during onboarding step; show download progress |
| Ollama probe | Probe at onboarding time without board auth | Use a dedicated unauthenticated /system/providers endpoint for pre-auth hardware detection |
| MCP tools | Add tools with generic names (terminal, search) | Namespace all MCP tools: nexus_memory_*, nexus_context_* |
| Google OAuth | Store access token in localStorage | Exchange code server-side; store token in Paperclip secrets; never expose to browser |
| Upstream rebase after v1.5 | Forget to diff OnboardingWizard.tsx against upstream | Post-rebase protocol: diff the aliased-away file, integrate any new upstream props |
| Apple Silicon VRAM | Report os.totalmem() as available GPU memory | Use os.freemem() with explicit copy: "unified memory, shared with OS" |
| Whisper STT (server-side) | Pass raw browser WebM to Whisper | Transcode to 16 kHz mono WAV via ffmpeg first; Whisper expects PCM WAV |
| Telegram voice messages | Assume same pipeline as browser audio | Telegram sends OGG/Opus at 48 kHz; same ffmpeg transcode step applies but source is CDN download |
| Piper TTS (server-side binary) | Spawn new process per request | Keep Piper as persistent HTTP sidecar; model stays loaded between requests |
| Telegram bot updates | Use webhooks for local deployment | Use long polling (getUpdates) — works behind NAT, no public URL required |
| Telegram bot token | Add VITE_TELEGRAM_BOT_TOKEN for debugging | Keep token server-side only; never in Vite env variables |
| Audio autoplay (browser) | Call audio.play() from SSE event handler | Unlock AudioContext on the "start voice mode" gesture; play via AudioBufferSourceNode |
| ffmpeg dependency | Assume it is installed | Detect at server startup; degrade gracefully with clear error; add to npx buildthis setup |
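The long-polling setup from the Telegram rows above can be sketched without a bot framework by calling the Bot API's deleteWebhook and getUpdates methods directly. pollUpdates, callBotApi, and the injected fetch parameter are illustrative names; in the server you would pass Node's global fetch. A production loop would also retry with backoff on network errors instead of exiting.

```typescript
// Minimal fetch shape so the sketch runs with Node's global fetch or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ json(): Promise<unknown> }>;

interface TelegramUpdate {
  update_id: number;
  message?: unknown;
}

async function callBotApi<T>(fetchFn: FetchLike, token: string, method: string, params: object): Promise<T> {
  const res = await fetchFn(`https://api.telegram.org/bot${token}/${method}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(params),
  });
  const body = (await res.json()) as { ok: boolean; result?: T; description?: string };
  if (!body.ok) throw new Error(`${method} failed: ${body.description ?? "unknown error"}`);
  return body.result as T;
}

// Long polling: clear any stale webhook first (getUpdates is rejected while one is set),
// then ask Telegram to hold each request open for up to 30 s until updates arrive.
async function pollUpdates(
  fetchFn: FetchLike,
  token: string,
  onUpdate: (update: TelegramUpdate) => void, // keep fast; hand voice downloads to a background task
  shouldStop: () => boolean,
): Promise<void> {
  await callBotApi<boolean>(fetchFn, token, "deleteWebhook", {});
  let offset = 0;
  while (!shouldStop()) {
    const updates = await callBotApi<TelegramUpdate[]>(fetchFn, token, "getUpdates", { offset, timeout: 30 });
    for (const update of updates) {
      offset = update.update_id + 1; // confirms this update so Telegram will not resend it
      onUpdate(update);
    }
  }
}
```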
Performance Traps
| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| Sequential provider probes in onboarding | Each probe adds 3s+ to wizard load time | Probe all providers in parallel with Promise.allSettled() | Any multi-provider step with 3+ probes |
| Memory retrieval on every chat message | 200-500ms added to every response | Cache last N memories; only re-fetch if conversation context changes | Systems with >100 stored memory fragments |
| Piper TTS blocking main thread | UI freezes during synthesis | Run Piper WASM in a Web Worker; stream audio chunks as they generate | Models larger than small/medium quality |
| Ollama model catalog loaded from disk on every request | File I/O on every recommendation call | Load and cache catalog at server startup, not per-request | High-frequency polling during onboarding |
| MCP tool calls in the critical path of assistant response | Latency spikes when memory server is slow | Make MCP tool calls non-blocking where possible; set aggressive timeouts | MCP server under load or starting up |
| Whisper model reload per STT request | 500MB+ memory spike; 2–5s startup delay per transcription | Persistent sidecar process; model loaded once at startup | First concurrent request pair |
| Piper process spawn per TTS request | 200–800ms model reload per voice response | Persistent Piper process or HTTP sidecar | Any production traffic |
| Telegram file download in main bot handler | Bot stops processing messages during download | Download + transcode + transcribe in async background task | Any voice message >2 seconds |
| 60fps waveform data in React state | Chat UI jank during recording | Canvas-direct rendering via useRef, no React state for amplitude data | Any component tree with >20 chat messages |
| No request queue on Whisper sidecar | Memory doubles under concurrent requests | Semaphore pattern; max 1 concurrent transcription | 2+ simultaneous voice inputs |
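The semaphore prevention in the last row can be a few lines of TypeScript in the Node.js server, wrapping every call to the Whisper sidecar. This is a generic sketch; the Semaphore class is not an existing Nexus or library API. With a limit of 1, simultaneous voice inputs queue up in arrival order instead of hitting the sidecar at the same time; a call would look like whisperGate.run(() => postToWhisperSidecar(wav)), where postToWhisperSidecar is a hypothetical helper.

```typescript
// Minimal counting semaphore: at most `limit` tasks run at once; later callers wait in FIFO order.
class Semaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.limit) {
      this.active++;
    } else {
      // Wait until a finishing task hands its slot over (active stays unchanged on handoff).
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.active--;
    }
  }
}
```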
Security Mistakes
| Mistake | Risk | Prevention |
|---|---|---|
| Storing OAuth tokens in localStorage | XSS can steal tokens; Paperclip key collision | Server-side token storage in existing secrets mechanism |
| Persisting raw user input in memory without sanitization | Credential leakage; prompt injection across sessions | Regex-based blocklist at write time; strip instruction-like syntax |
| Unauthenticated MCP endpoint exposure | External callers invoking memory read/write | MCP server bound to localhost only; board auth required for all tool calls |
| Puter.js API key in browser bundle | Key exposure in DevTools | Server-side Puter adapter; no Puter credentials in browser |
| Recording audio without explicit per-session consent indicator | Privacy violation perception | Show persistent recording indicator; stop all audio tracks immediately on stop |
| VITE_TELEGRAM_BOT_TOKEN in environment | Token bundled into client JS; visible in DevTools | Server-only env vars for all bot tokens; no VITE_ prefix for secrets |
| Telegram bridge accepting messages from any chat ID | Unauthorized users can send commands to agent | Whitelist allowed chatId values in config; reject all other chat IDs |
| Audio files persisted to disk without cleanup | Disk space exhaustion; audio data retained longer than needed | Delete transcoded WAV files immediately after Whisper transcription |
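A minimal Node.js sketch of the cleanup rule in the last row, assuming each transcription request gets its own scratch directory under the OS temp dir. withAudioScratch is a hypothetical helper; the ffmpeg transcode and the Whisper call go inside work. The try/finally is the important part: the files are removed on success and on failure, which a cleanup call placed after the transcription line would miss.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Gives the STT pipeline a private scratch directory and always deletes it afterwards,
// so neither the raw upload nor the transcoded WAV outlives the request, even on errors.
async function withAudioScratch<T>(
  upload: Uint8Array,
  work: (inputPath: string, wavPath: string) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "nexus-stt-"));
  try {
    const inputPath = join(dir, "upload.bin"); // raw browser or Telegram audio
    const wavPath = join(dir, "input.wav"); // where the ffmpeg transcode should write
    await writeFile(inputPath, upload);
    return await work(inputPath, wavPath);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```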
UX Pitfalls
| Pitfall | User Impact | Better Approach |
|---|---|---|
| Multi-step wizard with no skip-all option | Users with existing tools feel trapped | "Skip setup" at top of wizard; minimum valid state if skipped |
| Showing all providers as equally valid | Decision paralysis; wrong choice for hardware | Pre-select the best option; others are secondary alternatives |
| TTS toggle with no download state | Appears broken; silent 15-30s wait | Pre-warm voice model; show download progress before toggle is active |
| Hardware detection with false confidence | User loads model that OOMs | Label recommendations as "estimated" not "guaranteed"; add safety margin |
| Mode selection before hardware detection | User picks "Personal AI Assistant" but their hardware can't run local models | Show hardware detection first; mode recommendation follows hardware capability |
| Summary screen with no way to change a step | User made wrong choice earlier; stuck | Every summary item links back to the relevant step |
| No intermediate "transcribing..." feedback on Telegram | User resends voice message thinking it was lost | Send immediate typing indicator + "Transcribing..." message to Telegram |
| Voice auto-stop firing mid-sentence | Partial input submitted; confusing agent response | Use a VAD library (Silero VAD via @ricky0123/vad-web) instead of threshold detection; add a manual stop button |
| TTS reading agent prefix brackets aloud | Robotic "bracket Nexus bracket" audio | sanitizeForTTS() strips all formatting before synthesis |
| Autoplay blocked with no feedback | Audio response plays silently; user thinks voice is broken | Unlock AudioContext on voice mode toggle; show clear "tap to enable audio" prompt if blocked |
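The document names sanitizeForTTS() but not its rules. Here is a regex-based sketch, assuming the rules from the checklist (agent prefixes, Markdown, issue IDs) and a placeholder issue prefix of "NEX"; substitute the real identifier format. Regex stripping is a heuristic, and a Markdown parser would be more robust at the cost of a dependency. The order matters: links are unwrapped before prefix removal so a line starting with a link is not mistaken for an agent prefix.

```typescript
// Heuristic cleanup before text reaches Piper. The issue prefix "NEX" is a hypothetical placeholder.
function sanitizeForTTS(input: string, issuePrefix = "NEX"): string {
  const issueId = new RegExp(`\\b${issuePrefix}-\\d+\\b`, "g");
  return input
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced code blocks: drop entirely, code is unreadable aloud
    .replace(/\[([^\]\n]+)\]\([^)\s]+\)/g, "$1") // Markdown links: keep the link text
    .replace(/^\s*\[[^\]\n]+\]\s*/gm, "") // leading agent prefixes such as "[Nexus] "
    .replace(/`([^`]*)`/g, "$1") // inline code: keep the words, drop the backticks
    .replace(/^\s{0,3}#{1,6}\s+/gm, "") // heading markers
    .replace(/^\s*(?:[-*+]|\d+\.)\s+/gm, "") // list bullets and numbers
    .replace(/\*{1,2}([^*\n]+)\*{1,2}/g, "$1") // bold and italic asterisks
    .replace(issueId, " ") // issue IDs such as NEX-42
    .replace(/\s+/g, " ") // collapse whitespace left behind by the removals
    .trim();
}
```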
"Looks Done But Isn't" Checklist
- Puter.js adapter: Is it going through the server-side adapter machinery (cost tracking, heartbeat, session codec) or calling Puter's API directly from the browser?
- Adapter probe during onboarding: Does it work before board auth is established (fresh install) or does it silently return 403?
- Piper TTS first use: Has the warmup been tested on a clean browser profile with no OPFS cache?
- Persistent memory: Are there sanitization filters at write time preventing credential storage?
- MCP tool names: Have all Nexus MCP tools been checked against the Hermes TOOLS.md skill bundle for name collisions?
- OAuth token storage: Is the refresh token stored server-side? Is the browser holding only a session indicator, not the raw token?
- Mode isolation: Can assistant conversation history be queried without surfacing project builder agent conversations?
- Onboarding skip: Does skipping every step produce a usable workspace with at least one agent?
- Apple Silicon VRAM copy: Does the hardware detection screen say "unified memory" not "VRAM" for M-series chips?
- npx buildthis package name: Has npm search buildthis been run to verify no collision?
- Upstream OnboardingWizard diff: After the v1.5 wizard is built, has OnboardingWizard.tsx been diffed against upstream to check for new props that NexusOnboardingWizard.tsx needs to handle?
- Audio format transcode: Does the /transcribe endpoint transcode to 16 kHz mono WAV before passing to Whisper? Test with a Safari recording (mp4) and Firefox recording (ogg).
- Telegram OGG pipeline: Is the Telegram voice download → ffmpeg → Whisper path tested with a real Telegram voice message (not a local file)?
- Piper persistent process: Is Piper running as a persistent process/sidecar, not spawned per request? Check the ps aux | grep piper count during consecutive TTS calls.
- Whisper sidecar health check: Does the server wait for the Whisper sidecar /health endpoint before routing STT requests?
- Voice mode flag propagation: Is isVoiceMode present in the agent's system prompt log? Check server logs for a voice-flagged message.
- TTS sanitization: Does sanitizeForTTS() strip agent prefixes, Markdown, and issue IDs? Test with a response containing backtick code blocks.
- Telegram session routing: After sending a Telegram message, does the reply appear in Telegram (not the web UI)? Check the session channel field in the DB.
- Long polling only: Is deleteWebhook called on bot startup to ensure no stale webhook is registered?
- AudioContext unlock: Does audio autoplay work on a fresh browser profile (no stored autoplay permissions)?
- ffmpeg at startup: Does the server log ffmpeg version on startup? Does it gracefully disable voice with a clear error if ffmpeg is absent?
- Telegram token not in client bundle: Does grep -r "VITE_TELEGRAM" ui/src return nothing?
- Telegram chat ID whitelist: Does the bot reject messages from unknown chat IDs?
- Audio file cleanup: Are transcoded WAV temp files deleted after transcription?
Recovery Strategies
| Pitfall | Recovery Cost | Recovery Steps |
|---|---|---|
| Puter.js browser-side integration shipped | HIGH | Rewrite as server-side adapter; migrate conversation history to route through server |
| OAuth tokens in localStorage shipped | HIGH | Server-side migration: on next load, detect browser-stored tokens, exchange for server-stored ones, clear localStorage |
| Persistent memory storing credentials | HIGH | Purge memory store; add retroactive scan-and-delete for credential patterns; add blocklist |
| Piper TTS no warmup (silent hang) | LOW | Add warmup call in background; show download progress indicator |
| Model catalog stale | LOW | Add fallback heuristic; document update process |
| Onboarding probe auth-gated on board auth | MEDIUM | Add unauthenticated system/providers endpoint; update wizard to use new endpoint |
| Mode contamination in conversations table | MEDIUM | Add agent-based filter to conversation queries; document the filtering convention |
| Piper spawn-per-request shipped | MEDIUM | Wrap Piper in persistent HTTP sidecar; update spawn calls to HTTP requests; no data migration needed |
| Whisper in-process (no sidecar) shipped | HIGH | Extract to standalone FastAPI/Flask sidecar; update all Node.js callers; retest on CPU fallback path |
| Telegram webhook on local deploy | LOW | Call deleteWebhook; switch to getUpdates long polling; update bot startup code |
| Telegram session channel overwritten | MEDIUM | Add dedicated telegram channel type; audit all sessions_send call sites; retest routing |
| VITE_TELEGRAM_BOT_TOKEN in bundle shipped | HIGH | Rotate bot token immediately; move to server-only env var; rebuild and redeploy |
| ffmpeg missing, voice silently broken | LOW | Install ffmpeg; add startup check to catch future regressions |
| Audio autoplay blocked | LOW | Implement AudioContext unlock on voice mode toggle; test on clean browser profile |
Pitfall-to-Phase Mapping
| Pitfall | Prevention Phase | Verification |
|---|---|---|
| Vite alias swap breaking upstream rebase (12) | Phase 1 — Hardware Wizard | Post-rebase diff protocol in place and documented |
| Hardware detection inaccuracy on Apple Silicon (13) | Phase 1 — Hardware Detection | Unit test: compare totalmem() vs freemem() recommendations; verify M4 copy says "unified" |
| Probe endpoint requires board auth (14) | Phase 1 — Hardware Detection | Test: call probe endpoint with no board auth cookie; should succeed |
| Puter.js bypassing adapter system (15) | Phase 2 — Zero-Config Cloud | Verify: Puter sessions appear in cost tracking with correct provider label |
| OAuth tokens in localStorage (16) | Phase 3 — OAuth | Verify: no OAuth tokens visible in browser DevTools localStorage |
| Multi-provider creating competing defaults (17) | Phase 1 — Mode Selection | Test: skip-all onboarding produces exactly one adapter type per agent |
| Piper TTS cold start hang (18) | Phase 4 — Voice TTS | Test: fresh browser profile, enable TTS, measure time-to-first-audio |
| Memory prompt injection (19) | Phase 5 — Persistent Memory | Test: paste a credential into chat; verify it is NOT stored in memory DB |
| MCP tool name collision (20) | Phase 5 — MCP Integration | Audit: compare MCP tool names against TOOLS.md before shipping |
| npx buildthis package name collision (21) | Phase 6 — CLI | Run npm search buildthis before publishing |
| Skip-all onboarding broken (22) | Phase 1 — Mode Selection | Test: skip every step; verify workspace + one agent created |
| Assistant/project builder context bleed (23) | Phase 2 — Mode Selection | Test: assistant query does not surface issue IDs from project builder |
| Subscription detection false positives (24) | Phase 3 — Subscription Detection | Test: revoke an API key; verify wizard shows "unverified" not "ready" |
| Project handoff losing context (25) | Phase 5 — Persistent Memory | Test: handoff includes conversation ID, not just flat text summary |
| Model catalog staleness (26) | Phase 1 — Hardware Detection | Test: install an uncatalogued Ollama model; verify fallback heuristic fires |
| Audio format mismatch browser → Whisper (27) | v1.6 Phase 1 — Whisper STT | Test: record on Safari + Firefox; verify both transcribe correctly |
| Telegram OGG/Opus 48 kHz mismatch (28) | v1.6 Phase 3 — Telegram audio | Test: send real Telegram voice message; verify transcription succeeds |
| Piper spawn-per-request (29) | v1.6 Phase 2 — Piper TTS | Verify: ps aux \| grep piper shows one persistent process, not N per request |
| Whisper model reload per request (30) | v1.6 Phase 1 — Whisper sidecar | Verify: memory stays flat across 10 consecutive transcription requests |
| Browser silence detection too eager/late (31) | v1.6 Phase 2 — Web mic button | Test: natural mid-sentence pause does not auto-submit; quiet room does not stall recording |
| Voice mode flag not propagated (32) | v1.6 Phase 1 — Voice mode flag | Test: voice-flagged message; verify agent system prompt contains voice formatting instruction |
| Telegram competing session identity (33) | v1.6 Phase 3 — Telegram session | Test: Telegram message reply arrives in Telegram, not web UI |
| Telegram webhook on local deploy (34) | v1.6 Phase 1 — Telegram setup | Verify: getWebhookInfo returns empty webhook URL; bot uses long polling |
| TTS synthesizing agent prefixes verbatim (35) | v1.6 Phase 2 — TTS sanitization | Test: agent reply with [Nexus] prefix; verify audio does not contain "bracket" |
| Whisper model too large for CPU fallback (36) | v1.6 Phase 1 — CPU fallback | Benchmark: transcribe 10s clip on CPU-only path; must complete in <5s |
| Telegram file download blocking event loop (37) | v1.6 Phase 3 — Telegram async | Test: send voice message; verify text messages still processed during download |
| Piper binary not found in service PATH (38) | v1.6 Phase 2 — Piper binary config | Test: start server via launchctl; verify Piper path resolves |
| Dual output two LLM calls doubling latency (39) | v1.6 Phase 1 — Output pattern design | Verify: single LLM call per voice message in server logs |
| Audio autoplay blocked by browser policy (40) | v1.6 Phase 3 — Web audio playback | Test: fresh browser profile; voice response plays without user interaction after voice mode toggle |
| Telegram bot token in client bundle (41) | v1.6 Phase 1 — Telegram setup | Verify: grep -r TELEGRAM ui/dist returns nothing |
| Waveform causing React re-renders (42) | v1.6 Phase 2 — Waveform UI | Profile: React DevTools shows no re-renders in parent components during recording |
| ffmpeg missing on production (43) | v1.6 Phase 1 — Whisper STT | Verify: server logs ffmpeg version on startup; which ffmpeg on production machine |
| Telegram agent prefix in transcription input (44) | v1.6 Phase 3 — Telegram handler | Verify: bot-originated messages are filtered before the transcription pipeline |
Critical Pitfalls — v1.7 Content Generation Layer
Pitfall 45: Calling bundle() Per Render Request
What goes wrong: @remotion/bundler's bundle() function runs Webpack to compile the Remotion composition. When called on every video render request, Webpack runs from scratch each time — taking 2–5 minutes before a single frame is encoded. At two concurrent render requests, the server becomes unresponsive. The first symptom is a request queue that grows indefinitely.
Why it happens: Remotion's SSR docs document bundle() and renderMedia() as a two-step pipeline. Developers naturally call both steps together per request. The anti-pattern is not obvious because the docs show the two functions sequentially in examples, even though they live in separate packages (@remotion/bundler and @remotion/renderer).
How to avoid:
- Call bundle() once at server startup (or once when compositions change) and cache the bundle path in memory.
- Each render request reuses the cached bundle path and only calls renderMedia() with different inputProps.
- If compositions change at runtime, invalidate the bundle cache explicitly and re-bundle asynchronously; do not block render requests.
- For the Mac Mini M4 single-user deployment: a startup bundle is fine; no need for elaborate cache invalidation. Re-bundle on process restart.
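A minimal sketch of the caching rule, assuming a single long-lived server process. createBundleCache is a hypothetical helper, and bundleFn stands in for a call to bundle({ entryPoint }) from @remotion/bundler (ideally delegated to a child process, per Pitfall 47); it is injected so the sketch runs without Remotion installed. Concurrent first requests share one in-flight promise, so two renders arriving at startup still trigger only one Webpack build.

```typescript
// Hypothetical cache around Remotion's bundle(); bundleFn resolves to the bundle's serve URL.
function createBundleCache(bundleFn: () => Promise<string>) {
  let current: Promise<string> | null = null;
  let rebuilding: Promise<string> | null = null;

  return {
    // All render requests share one promise, so Webpack runs once rather than once per request.
    getServeUrl(): Promise<string> {
      if (!current) {
        current = bundleFn().catch((err: unknown) => {
          current = null; // let the next request retry instead of caching the failure
          throw err;
        });
      }
      return current;
    },
    // Re-bundle in the background; renders keep using the previous bundle until this resolves.
    invalidate(): Promise<string> {
      if (!rebuilding) {
        rebuilding = bundleFn()
          .then((serveUrl) => {
            current = Promise.resolve(serveUrl);
            return serveUrl;
          })
          .finally(() => {
            rebuilding = null;
          });
      }
      return rebuilding;
    },
  };
}
```

Each render would then await cache.getServeUrl() and pass the result as serveUrl to selectComposition() and renderMedia() together with that request's inputProps.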
Warning signs:
- bundle() call inside the same function/route handler as renderMedia()
- Render requests taking 3+ minutes for a 30-second video
- Server logs showing Webpack compilation on every render
- CPU pegged at 100% from the second concurrent render request
Phase to address: Phase 1 (Remotion integration foundation) — bundle caching must be established before any render endpoint is exposed.
Pitfall 46: Remotion Chromium Concurrency Thrashing on Mac Mini M4
What goes wrong: Remotion spawns one headless Chromium instance per concurrent render frame by default. concurrency: "100%" on a 10-core M4 spawns 10 Chrome instances. Each Chromium instance uses ~200–400MB RAM. At 10 instances rendering a complex composition with video assets, the Mac Mini (16GB RAM) hits memory pressure, macOS begins swapping, and render times increase 3–10x. The system may become temporarily unresponsive to UI requests.
Why it happens: Remotion's concurrency model is designed for cloud rendering where the machine has many cores. On a shared personal machine running the full Nexus server stack (Node.js server, Hermes/Ollama, UI), the available RAM for rendering is significantly less than total system RAM.
How to avoid:
- Set concurrency: 4 as the default for the Mac Mini M4 (leaves ~6 of its 10 cores for other processes).
- Run npx remotion benchmark against the specific composition type to find the actual optimal concurrency for the hardware.
- Do not run Remotion renders concurrently with heavy Ollama inference: implement a simple render queue that checks if an Ollama session is active before starting a render.
- In headless mode, Chromium disables GPU acceleration by default (software rasterization). This is slower but more memory-stable than GPU mode for this use case.
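The render-queue advice can be sketched as a one-at-a-time queue that waits while local inference is busy. RenderQueue and isInferenceBusy are hypothetical; how Nexus knows an Ollama session is active is an assumption here (for example, the server's own count of in-flight Ollama requests). Each queued job would call renderMedia() with concurrency: 4, and the polling interval is an arbitrary default.

```typescript
// Hypothetical one-at-a-time render queue; renders wait while isInferenceBusy() returns true.
class RenderQueue {
  private readonly jobs: Array<() => Promise<void>> = [];
  private draining = false;
  private readonly isInferenceBusy: () => boolean;
  private readonly pollMs: number;

  constructor(isInferenceBusy: () => boolean, pollMs = 2000) {
    this.isInferenceBusy = isInferenceBusy;
    this.pollMs = pollMs;
  }

  // Resolves with the job's result once it has run.
  enqueue<T>(job: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      // Promise.resolve().then(job) turns synchronous throws into rejections for the caller.
      this.jobs.push(() => Promise.resolve().then(job).then(resolve, reject));
      void this.drain();
    });
  }

  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    try {
      while (this.jobs.length > 0) {
        while (this.isInferenceBusy()) {
          await new Promise((r) => setTimeout(r, this.pollMs));
        }
        const next = this.jobs.shift();
        if (next) await next(); // never rejects: errors go to the caller's promise
      }
    } finally {
      this.draining = false;
    }
  }
}
```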
Warning signs:
- System becoming sluggish during video render
- Memory pressure in Activity Monitor during render
- Render time increasing non-linearly with video length
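- concurrency not set explicitly (Remotion then picks a value from the host's CPU count rather than from the RAM actually free on a shared machine)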
Phase to address: Phase 1 (Remotion integration) — concurrency configuration must be set before first production render test.
Pitfall 47: Bundling Remotion Inside an Already-Bundled Server Context
What goes wrong: The Nexus server is built with tsc or esbuild into a dist/ directory and run from there. Remotion's bundle() function calls Webpack internally and must be invoked from a non-bundled context with access to the raw source file entry point. When bundle() is called from inside the compiled server bundle, it cannot find the Remotion composition source files and throws path resolution errors or silently produces empty bundles.
Why it happens: bundle() requires an absolute path to the Remotion entry point (the .tsx file). When the server is compiled, __dirname and relative paths change. The Remotion entry point lives in the UI package (ui/src/remotion/) but the server calls bundle() — a cross-package path dependency that breaks after compilation.
How to avoid:
- Keep the Remotion composition source files in a dedicated packages/remotion-compositions/ package that is never compiled (stays as TypeScript source).
- Pass the absolute path to this package as a config value (REMOTION_COMPOSITIONS_PATH) rather than computing it from __dirname at runtime.
- In the server, resolve the entry point at startup and log it: const entryPoint = path.resolve(process.env.REMOTION_COMPOSITIONS_PATH, 'index.ts'). Fail fast if it does not exist.
- Run bundle() in a separate worker process or child process; never run it inline in the main Express server process.
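The startup resolution step can be expanded into a small fail-fast function. This is a sketch, assuming the REMOTION_COMPOSITIONS_PATH variable and index.ts entry file named in the text; resolveRemotionEntryPoint is a hypothetical name. Taking env as a parameter keeps it testable; at boot you would call it with process.env, log the returned path, and refuse to register the render route if it throws.

```typescript
import { existsSync } from "node:fs";
import { isAbsolute, resolve } from "node:path";

// Resolve the Remotion entry point once at startup from an explicit config value,
// instead of deriving it from __dirname (which points into dist/ after compilation).
function resolveRemotionEntryPoint(env: Record<string, string | undefined>): string {
  const root = env.REMOTION_COMPOSITIONS_PATH;
  if (!root) throw new Error("REMOTION_COMPOSITIONS_PATH is not set; video rendering disabled");
  if (!isAbsolute(root)) throw new Error(`REMOTION_COMPOSITIONS_PATH must be absolute, got "${root}"`);
  const entryPoint = resolve(root, "index.ts");
  if (!existsSync(entryPoint)) throw new Error(`Remotion entry point not found: ${entryPoint}`);
  return entryPoint;
}
```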
Warning signs:
- bundle() working in development (ts-node, pnpm dev) but failing after pnpm build
- Path resolution errors pointing to dist/ subdirectories for the Remotion entry
- Webpack "module not found" errors for composition files during server-side render
Phase to address: Phase 1 (Remotion integration) — entry point resolution strategy must be validated in the compiled server build before any further Remotion work.
Pitfall 48: 10MB File Size Limit Blocks Video and Large Image Storage
What goes wrong: The existing Nexus/Paperclip storage layer enforces a 10MB maximum file size (MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024 in server/src/attachment-types.ts). A 30-second 1080p video rendered by Remotion is typically 20–200MB. A high-quality wallpaper image at 4K is 5–30MB. Any attempt to store a generated video or large image through the existing attachment/assets upload routes returns HTTP 422 with "File exceeds 10485760 bytes".
Additionally, video/mp4 and other video MIME types are not in DEFAULT_ALLOWED_TYPES. Both the byte limit and the MIME type allowlist must be extended.
Why it happens: The original limit was set for user-uploaded document attachments (PDFs, images for chat). Generated content is structurally different — it is produced by the system, not uploaded by the user — but routes through the same storage pipeline.
How to avoid:
- Create a separate storage namespace for generated content (namespace: "generated") with its own size limits (e.g., 500MB per file, 5GB total per workspace).
- Do not modify MAX_ATTACHMENT_BYTES globally; it is the correct limit for user attachments. Add a parallel constant, MAX_GENERATED_ASSET_BYTES.
- Add video MIME types (video/mp4, video/webm) to the allowed set for the generated-assets route only.
- For Remotion output, write directly to the storage provider with putObject after the render completes, bypassing the multipart upload route entirely. The render runs server-side, so no HTTP upload is needed.
- Add a manifest record linking the generated asset to its originating task/issue so the file can be garbage-collected when the task is deleted.
Warning signs:
- HTTP 422 errors when the server tries to store generated video
- video/mp4 silently rejected by isAllowedContentType()
- Large generated images silently truncated or rejected
- Trying to POST a 50MB video through the existing /api/companies/:id/assets upload route
Phase to address: Phase 1 (Storage and file size foundations) — must be resolved before any content type produces files larger than 10MB.
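One way to express separate limits per namespace, as a sketch: MAX_ATTACHMENT_BYTES mirrors the existing constant, while MAX_GENERATED_ASSET_BYTES, the namespace names, the short base type list and checkAsset() are illustrative assumptions rather than the actual Nexus code (the real DEFAULT_ALLOWED_TYPES list differs):

```typescript
// MAX_ATTACHMENT_BYTES mirrors server/src/attachment-types.ts; the rest is illustrative.
const MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024;
const MAX_GENERATED_ASSET_BYTES = 500 * 1024 * 1024;

type StorageNamespace = "attachments" | "generated";

interface NamespacePolicy {
  maxBytes: number;
  allowedTypes: ReadonlySet<string>;
}

const BASE_TYPES = ["image/png", "image/jpeg", "image/webp", "application/pdf"];

const POLICIES: Record<StorageNamespace, NamespacePolicy> = {
  // User uploads keep the existing 10MB cap and type list.
  attachments: { maxBytes: MAX_ATTACHMENT_BYTES, allowedTypes: new Set(BASE_TYPES) },
  // System-generated assets get a larger cap and may include video.
  generated: {
    maxBytes: MAX_GENERATED_ASSET_BYTES,
    allowedTypes: new Set([...BASE_TYPES, "video/mp4", "video/webm"]),
  },
};

// Returns null when the asset is acceptable, otherwise a rejection reason.
function checkAsset(ns: StorageNamespace, contentType: string, byteSize: number): string | null {
  const policy = POLICIES[ns];
  if (!policy.allowedTypes.has(contentType)) {
    return `content type ${contentType} not allowed in ${ns}`;
  }
  if (byteSize > policy.maxBytes) {
    return `file exceeds ${policy.maxBytes} bytes`;
  }
  return null;
}
```

Leaving the attachment policy untouched means user uploads still hit the 10MB cap, while the server-side render path checks the generated policy before calling putObject.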
Pitfall 49: Mermaid securityLevel "loose" Enabling XSS to RCE
What goes wrong: Mermaid diagrams rendered with securityLevel: "loose" allow click directives that execute arbitrary JavaScript. In an Electron-based or server-rendered context, this becomes remote code execution. In 2025–2026, multiple production apps (OneUptime, DeepChat) were exploited through this vector. The natural language → Mermaid pipeline means AI-generated diagram syntax reaches the renderer — AI models can be prompted to include malicious click directives.
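Per-diagram %%{init: {"securityLevel": "loose"}}%% directives try to change configuration from inside the diagram source. Recent Mermaid versions list securityLevel among the default "secure" config keys, which directives cannot override, but that protection is lost if the secure list is customized, and directives can still alter other settings. Strip them from untrusted source before it reaches mermaid.render() rather than relying on the global setting alone.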
Why it happens: "loose" mode is documented as enabling "interactive diagrams." Developers enable it to support click events in presentations. The security implication is not obvious from the API surface. AI-generated Mermaid is treated like static diagram syntax rather than untrusted input.
How to avoid:
- Always use securityLevel: "strict" globally, with no exceptions.
- Before passing any Mermaid source (including AI-generated source) to mermaid.render(), strip %%{init}%% directives and click statements with a regex preprocessor.
- After mermaid.render() returns SVG, sanitize the output with DOMPurify (using isomorphic-dompurify for Node.js server-side rendering) before storing it or returning it to the client.
- Treat all Mermaid source as untrusted input regardless of origin; even AI-generated diagrams can be manipulated via prompt injection.
Warning signs:
- securityLevel: "loose" anywhere in Mermaid config
- Mermaid source passed directly to mermaid.render() without preprocessing
- No SVG sanitization step after render
- %%{init}%% directives in AI-generated diagram source not stripped
Phase to address: Phase 3 (Mermaid diagram generation) — security config must be locked before any diagram rendering is exposed to the UI.
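The regex preprocessor could look roughly like this. It is a defense-in-depth sketch, not a Mermaid parser: it removes %%{...}%% init directives (including multi-line ones) and any line whose first token is click, and it still has to be paired with securityLevel: "strict" and DOMPurify on the rendered SVG. The function name is hypothetical:

```typescript
// Lazily match from "%%{" to the first "}%%", across newlines.
const INIT_DIRECTIVE = /%%\{[\s\S]*?\}%%/g;
// Any line whose first non-blank token is "click" (case-insensitive).
const CLICK_LINE = /^[ \t]*click\b.*$/gim;

function stripUnsafeMermaid(source: string): string {
  return source
    .replace(INIT_DIRECTIVE, "")
    .replace(CLICK_LINE, "")
    // Collapse the blank lines left behind by removed statements.
    .replace(/\n{2,}/g, "\n")
    .trim();
}
```

For example, stripping '%%{init: {"securityLevel": "loose"}}%%' followed by a flowchart with a click line leaves only the flowchart statements.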
Pitfall 50: DOMPurify Server-Side Memory Accumulation with JSDOM
What goes wrong: Server-side SVG sanitization with DOMPurify requires a DOM environment. The standard approach is isomorphic-dompurify backed by JSDOM. In a long-running Node.js process, each DOMPurify.sanitize() call accumulates DOM state inside the JSDOM window object. Over hundreds of diagram renders, the JSDOM window grows unboundedly, causing progressive memory increase and eventual OOM.
Additionally, using happy-dom instead of JSDOM as the DOM provider is documented as unsafe and likely to produce XSS bypasses.
Why it happens: JSDOM is designed for single-use in tests, not as a long-running in-process DOM. The memory accumulation is subtle — no immediate crash, just gradual slowdown.
How to avoid:
- Use isomorphic-dompurify with JSDOM (not happy-dom).
- After every N sanitization calls (e.g., 100), call the window cleanup method to release JSDOM state, or create a fresh JSDOM window per sanitization batch.
- For server-side diagram rendering, prefer rendering to SVG in a sandboxed child process (using the existing plugin-worker-manager.ts pattern) rather than in the main server process. The child process's memory is fully released on exit.
- Pin JSDOM to version 20 or later; version 19 has known attack vectors that allow XSS even with DOMPurify correctly applied.
Warning signs:
- Server heap growing steadily during diagram render load testing
- Using happy-dom as the DOMPurify DOM provider
- JSDOM version < 20 in package.json
- No DOM cleanup between sanitization calls in a long-running process
Phase to address: Phase 3 (Mermaid diagram generation) — establish the server-side sanitization pattern with memory management before rendering is enabled in production.
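A sketch of the recycle-every-N idea, with the DOM-backed sanitizer injected through a factory so the example has no dependencies. In production the factory would create a window with new JSDOM("").window, bind createDOMPurify(window) from the dompurify package to it, and have dispose() call window.close(); the class and interface names here are hypothetical:

```typescript
interface DisposableSanitizer {
  sanitize(markup: string): string;
  dispose(): void;
}

// Replaces the underlying sanitizer (and its DOM window) every maxUses calls so a
// long-running process never keeps one window alive indefinitely.
class RecyclingSanitizer {
  private current: DisposableSanitizer | null = null;
  private uses = 0;
  private readonly factory: () => DisposableSanitizer;
  private readonly maxUses: number;

  constructor(factory: () => DisposableSanitizer, maxUses = 100) {
    this.factory = factory;
    this.maxUses = maxUses;
  }

  sanitize(markup: string): string {
    let current = this.current;
    if (current === null || this.uses >= this.maxUses) {
      current?.dispose(); // release the old window before creating a new one
      current = this.factory();
      this.current = current;
      this.uses = 0;
    }
    this.uses += 1;
    return current.sanitize(markup);
  }

  close(): void {
    this.current?.dispose();
    this.current = null;
    this.uses = 0;
  }
}
```

Recycling bounds memory inside one process; the child-process approach above goes further by letting the operating system reclaim everything when the worker exits.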
Pitfall 51: HSL-Based Color Palette Generation Producing Perceptually Incoherent Themes
What goes wrong: A theme generator takes a brand color and generates a full palette by rotating hue in HSL space (e.g., complementary colors at +180°, triadic at +120°/+240°, tints by varying L). The generated palette looks visually unbalanced: some colors appear much brighter or darker than others even though their HSL lightness values are identical. A blue at L=50% looks significantly darker than a yellow at L=50%.
WCAG contrast calculations on these palettes pass numerically but the palette feels wrong to human designers, leading to rejection of the feature.
Why it happens: HSL is not perceptually uniform. Equal numeric steps in HSL lightness do not correspond to equal perceived brightness changes. This is a well-known limitation documented by the CSS working group. Tailwind CSS 4.0 redefined its default palette in OKLCH, whose lightness axis is perceptually uniform.
How to avoid:
- Use OKLCH (OKLab in cylindrical coordinates) for all palette generation operations. OKLCH is available via the culori npm library, which has no dependencies (TypeScript types are published separately as @types/culori).
- Generate tints/shades by varying L in OKLCH space (perceptually uniform lightness), not in HSL.
- Generate complementary/analogous colors by rotating H in OKLCH space.
- Convert to HEX/RGB for output and storage — OKLCH is the computation space, not the output format.
- Do not use HSL as an intermediate — go HEX input → OKLCH computation → HEX output.
Warning signs:
- Using hsl(), chroma-js with HSL operations, or manual (h + 180) % 360 hue rotation
- Palette colors appearing visually unbalanced (some look brighter/darker than intended)
- Design review rejecting AI-generated palettes as "off"
Phase to address: Phase 4 (Theme and palette generator) — color space selection is a foundation decision; switching after palette logic is built requires rewriting all generation functions.
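For illustration, here is the HEX → OKLCH → HEX round trip written out by hand with Björn Ottosson's published OKLab matrices, plus a tint/shade helper that varies only L. This is a sketch: it clamps out-of-gamut colors instead of gamut-mapping them (culori, recommended above, handles that properly), and the function names are hypothetical:

```typescript
interface Oklch {
  l: number; // perceptual lightness, 0–1
  c: number; // chroma
  h: number; // hue in degrees, 0–360
}

// sRGB transfer functions on normalized 0–1 values.
const toLinear = (v: number): number => (v <= 0.04045 ? v / 12.92 : ((v + 0.055) / 1.055) ** 2.4);
const toGamma = (v: number): number => (v <= 0.0031308 ? 12.92 * v : 1.055 * v ** (1 / 2.4) - 0.055);

function hexToOklch(hex: string): Oklch {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = toLinear(((n >> 16) & 255) / 255);
  const g = toLinear(((n >> 8) & 255) / 255);
  const b = toLinear((n & 255) / 255);
  // Linear sRGB -> LMS, then cube root.
  const l_ = Math.cbrt(0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b);
  const m_ = Math.cbrt(0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b);
  const s_ = Math.cbrt(0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b);
  // LMS' -> OKLab.
  const L = 0.2104542553 * l_ + 0.793617785 * m_ - 0.0040720468 * s_;
  const A = 1.9779984951 * l_ - 2.428592205 * m_ + 0.4505937099 * s_;
  const B = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.808675766 * s_;
  const h = (Math.atan2(B, A) * 180) / Math.PI;
  return { l: L, c: Math.hypot(A, B), h: h < 0 ? h + 360 : h };
}

function oklchToHex({ l, c, h }: Oklch): string {
  const A = c * Math.cos((h * Math.PI) / 180);
  const B = c * Math.sin((h * Math.PI) / 180);
  // OKLab -> LMS (cubed) -> linear sRGB.
  const lc = (l + 0.3963377774 * A + 0.2158037573 * B) ** 3;
  const mc = (l - 0.1055613458 * A - 0.0638541728 * B) ** 3;
  const sc = (l - 0.0894841775 * A - 1.291485548 * B) ** 3;
  const linear = [
    4.0767416621 * lc - 3.3077115913 * mc + 0.2309699292 * sc,
    -1.2684380046 * lc + 2.6097574011 * mc - 0.3413193965 * sc,
    -0.0041960863 * lc - 0.7034186147 * mc + 1.707614701 * sc,
  ];
  return (
    "#" +
    linear
      .map((v) => Math.round(Math.min(1, Math.max(0, toGamma(v))) * 255)) // clamp, not gamut-map
      .map((x) => x.toString(16).padStart(2, "0"))
      .join("")
  );
}

// Tints and shades: change only L, keep chroma and hue fixed.
function lightnessScale(hex: string, lightnesses: number[]): string[] {
  const base = hexToOklch(hex);
  return lightnesses.map((l) => oklchToHex({ ...base, l }));
}
```

Note that HEX stays the input and output format only; every arithmetic step happens in OKLCH, matching the "HEX input → OKLCH computation → HEX output" rule above.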
Pitfall 52: WCAG Contrast Ratio Computed on sRGB Without Linearization
What goes wrong: WCAG 2.x contrast ratio requires computing relative luminance from sRGB values. The correct computation linearizes the 8-bit channel value: values ≤ 0.04045 divide by 12.92; values > 0.04045 apply ((v + 0.055) / 1.055) ^ 2.4. Developers frequently skip the linearization step and compute luminance directly from the 0–255 byte values, producing incorrect contrast ratios. A pair that calculates as "passing WCAG AA (4.5:1)" may actually fail when correctly computed.
A secondary issue: the WCAG 2.x relative-luminance definition has long used 0.03928 as the threshold instead of the sRGB standard's 0.04045. For 8-bit input the difference is harmless: no channel value v/255 falls between the two thresholds (10/255 ≈ 0.0392 and 11/255 ≈ 0.0431), so both give identical results. It only matters for higher-precision input, which is why 0.04045 is still the constant to use.
Why it happens: The formula is copy-pasted from W3C documentation that carried the 0.03928 threshold, and many online contrast calculators reuse it, so implementations inherit whatever shortcuts their source happened to take. Skipping the linearization branch entirely is the mistake that actually changes results.
How to avoid:
- Use culori's built-in WCAG functions (wcagContrast(), wcagLuminance()), which implement the correct linearization.
- If implementing manually, use the threshold 0.04045 (not 0.03928) and linearize normalized 0–1 values, not 0–255 integers.
- Cross-validate computed ratios against the WebAIM Contrast Checker for known color pairs during development.
- For the draft WCAG 3.0 / APCA work: note that APCA uses different weights (0.2126729, 0.7151522, 0.0721750) and a polarity-sensitive formula. Use @colour-contrast/apca if APCA compliance is needed.
Warning signs:
- Contrast ratio formula not including the linearization conditional branch
- Using raw 0–255 integer values in the luminance calculation (missing /255 normalization)
- Threshold of 0.03928 in the linearization formula
- No cross-validation against a known-good reference calculator
Phase to address: Phase 4 (Theme generator) — validated in the WCAG AA export check step before theme output is presented to the user.
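A minimal WCAG 2.x implementation with the linearization branch and the 0.04045 threshold applied to normalized values; the function names are illustrative:

```typescript
// Linearize one 8-bit sRGB channel: normalize to 0–1 first, then apply the piecewise curve.
function channelToLinear(byte: number): number {
  const v = byte / 255;
  return v <= 0.04045 ? v / 12.92 : ((v + 0.055) / 1.055) ** 2.4;
}

// WCAG 2.x relative luminance of a "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = channelToLinear((n >> 16) & 0xff);
  const g = channelToLinear((n >> 8) & 0xff);
  const b = channelToLinear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// AA requires 4.5:1 for body text and 3:1 for large text.
const passesAA = (fg: string, bg: string, largeText = false): boolean =>
  contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
```

For example, #777777 on white computes to about 4.48:1 and fails AA for body text, while #767676 (about 4.54:1) passes; both make convenient reference pairs for the cross-validation step.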
Pitfall 53: PDF Generation Chromium Font Loading Failures in Headless Environments
What goes wrong: PDF generation via Puppeteer/Chromium-headless renders HTML to PDF. The generated PDF uses a specific brand font (e.g., Inter, a custom typeface). In development on the Mac Mini, the font is installed system-wide and loads correctly. In the production server process (started via launchctl), the font is not in the headless Chromium font search path. The PDF renders with a fallback system font, producing different page layouts and line breaks than the designed template — tables overflow, headings reflow, and the PDF looks broken.
Why it happens: Headless Chromium uses its own font resolution paths, not the macOS font manager. User-installed fonts in ~/Library/Fonts are not accessible to headless Chromium without explicit configuration. The failure is environment-dependent and invisible in development.
How to avoid:
- Bundle all fonts used in PDF templates as static assets in the Nexus codebase (e.g., packages/pdf-templates/fonts/) and self-host them via the Express static server.
- Reference fonts in PDF templates with @font-face and an explicit src: url('http://localhost:PORT/fonts/...'), using absolute localhost URLs rather than relative paths.
- In the Puppeteer page setup, call page.waitForNetworkIdle() after navigation so fonts are loaded before page.pdf() is called (awaiting document.fonts.ready inside the page is a more direct check).
- Add a font smoke test: render a one-page PDF at startup and verify that the fonts embedded in it (for example, as listed by poppler's pdffonts) include the expected font.
Warning signs:
- PDF layout differs between pnpm dev and the production server
- System fonts used instead of brand fonts in generated PDFs
- @font-face with relative URLs (./fonts/Inter.woff2) in PDF templates
- No waitForNetworkIdle() before the page.pdf() call
Phase to address: Phase 5 (PDF document generation) — font strategy must be defined before any PDF template is considered complete.
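A sketch of generating the @font-face rules with absolute localhost URLs. The port, the /fonts route prefix, the BundledFont shape and the assumption that every bundled file is WOFF2 are hypothetical configuration choices:

```typescript
interface BundledFont {
  family: string;
  file: string; // file name under packages/pdf-templates/fonts/
  weight: number;
  style?: "normal" | "italic";
}

// Emit @font-face rules whose src points at the Express static route on localhost,
// so headless Chromium fetches the bundled files instead of falling back to system fonts.
function buildFontFaceCss(port: number, fonts: BundledFont[], routePrefix = "/fonts"): string {
  return fonts
    .map((f) => {
      const url = new URL(`${routePrefix}/${encodeURIComponent(f.file)}`, `http://localhost:${port}`);
      return [
        "@font-face {",
        `  font-family: "${f.family}";`,
        `  src: url("${url.href}") format("woff2");`,
        `  font-weight: ${f.weight};`,
        `  font-style: ${f.style ?? "normal"};`,
        "}",
      ].join("\n");
    })
    .join("\n");
}
```

The generated CSS would be injected into each PDF template at render time, so the same template works in pnpm dev and under launchctl as long as the static route is served.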
Pitfall 54: Puppeteer Instance Not Reused Across PDF Render Requests
What goes wrong: Each PDF render request calls puppeteer.launch() to create a new browser instance, renders the page, and calls browser.close(). Launching a Chromium instance takes 0.5–2 seconds. For a feature that generates PDFs on demand (invoice on task completion, report at end of sprint), this adds significant latency to each render. At 3 concurrent PDF requests, 3 Chromium instances start simultaneously — using ~800MB RAM and 3 full startup sequences.
Why it happens: The code examples in Puppeteer documentation show launch() → newPage() → close() as the simple unit. Reuse is an optimization not shown in introductory examples.
How to avoid:
- Maintain a single persistent Puppeteer browser instance at the server level (similar to the Piper TTS persistent process pattern from v1.6).
- Use browser.newPage() per render request and page.close() when done; do not close the browser between requests.
- Add a health check: if the browser crashes (Puppeteer emits a 'disconnected' event on the Browser object), restart it automatically with the same backoff pattern used in plugin-worker-manager.ts.
- Limit concurrent PDF pages to 2–3 via a semaphore to prevent RAM exhaustion.
Warning signs:
- puppeteer.launch() inside the route handler or per-request function
- High memory and CPU spikes on PDF requests visible in Activity Monitor
- PDF generation latency >3 seconds for a simple one-page document
- No browser lifecycle management (launch once, keep alive)
Phase to address: Phase 5 (PDF generation) — establish browser lifecycle pattern before any PDF template work begins.
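The concurrency cap can be a small counting semaphore around a single long-lived browser. A sketch (the Semaphore class is hypothetical and not part of Puppeteer):

```typescript
// Counting semaphore: at most `permits` tasks run at once; the rest wait in FIFO order.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the next waiter
    } else {
      this.available += 1;
    }
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

With the browser launched once at startup and pdfPages = new Semaphore(2), each request would do pdfPages.run(async () => { const page = await browser.newPage(); try { /* load template */ return await page.pdf(); } finally { await page.close(); } }), so pages are opened and closed per request while the browser stays up.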
Pitfall 55: Remotion Video File Not Streamable Before Full Render Completes
What goes wrong: Remotion's renderMedia() produces a video file only after the entire render is complete. For a 2-minute pitch deck video, this takes 3–10 minutes on the Mac Mini M4. During rendering, the user sees no progress indicator and cannot access even the first few seconds of the video. If the render fails at frame 450 of 3600, all progress is lost with no partial recovery.
A secondary issue: the rendered video is written to a temp file by default. If the server process crashes or is restarted during a long render, the temp file is orphaned with no manifest record.
Why it happens: Remotion only finalizes the output file once every frame has been rendered and encoded, so there is no usable partial output while a render is in progress. Progress is available via the onProgress callback, but developers often don't wire it up.
How to avoid:
- Always use the onProgress callback in renderMedia() to emit render progress to the UI (via SSE or the existing live-events-ws.ts realtime layer, which can carry these events).
- Write the output to a deterministic path based on a render job ID (not a temp path): storage/generated/{jobId}/output.mp4. Create the manifest record before the render starts, not after.
- Implement a render job table in the DB (or a simple in-memory map for the single-user case) with the states queued → rendering → done, plus failed (reachable from queued or rendering). Store frame progress in the record.
- For failed renders, keep the manifest record with status: "failed" and the error message. Do not silently discard it.
Warning signs:
- renderMedia() called without an onProgress callback
- Output path using tmpdir() or a random temp file
- UI shows no progress during render (user cannot tell if server is working)
Phase to address: Phase 1 (Remotion integration) — progress reporting and job lifecycle management must be designed before any rendering is implemented.
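The job lifecycle can be enforced with an explicit transition table. This sketch assumes an in-memory record shaped like the job table described above; the field names and the createJob/transition/recordProgress helpers are hypothetical:

```typescript
type RenderStatus = "queued" | "rendering" | "done" | "failed";

interface RenderJob {
  jobId: string;
  status: RenderStatus;
  outputPath: string; // deterministic, derived from jobId (never a temp file)
  renderedFrames: number;
  totalFrames: number;
  error?: string;
}

// queued -> rendering -> done, with failed reachable from queued or rendering.
const ALLOWED_TRANSITIONS: Record<RenderStatus, RenderStatus[]> = {
  queued: ["rendering", "failed"],
  rendering: ["done", "failed"],
  done: [],
  failed: [],
};

// Persist this record before the render starts so a crash still leaves a trace.
function createJob(jobId: string, totalFrames: number): RenderJob {
  return {
    jobId,
    status: "queued",
    outputPath: `storage/generated/${jobId}/output.mp4`,
    renderedFrames: 0,
    totalFrames,
  };
}

function transition(job: RenderJob, next: RenderStatus, patch: Partial<RenderJob> = {}): RenderJob {
  if (!ALLOWED_TRANSITIONS[job.status].includes(next)) {
    throw new Error(`Illegal transition ${job.status} -> ${next} for job ${job.jobId}`);
  }
  return { ...job, ...patch, status: next };
}

// Intended to be fed from renderMedia()'s onProgress callback while rendering.
function recordProgress(job: RenderJob, renderedFrames: number): RenderJob {
  if (job.status !== "rendering") {
    throw new Error(`Job ${job.jobId} is ${job.status}, not rendering`);
  }
  return { ...job, renderedFrames: Math.min(renderedFrames, job.totalFrames) };
}
```

renderMedia()'s onProgress callback reports renderedFrames, which would be passed to recordProgress() and forwarded to the UI; terminal states reject further transitions, so a failed job cannot silently flip to done.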
Pitfall 56: Social Media Image Dimensions and MIME Type Constraints Ignored
What goes wrong: A "social media post" generator produces a 1200×628 OG image and outputs it as PNG. The Instagram API rejects it: Instagram accepts JPEG only, not PNG, for feed posts. Twitter/X accepts up to 5MB for photos but the three-step media upload flow (INIT → APPEND chunks → FINALIZE) is required for anything over ~1MB — a direct upload fails. The 2025 Instagram rate limit reduction from 5,000 to 200 API calls/hour was unannounced and broke production apps; the generator does not account for this and hammers the API during batch generation.
Why it happens: Platform-specific requirements are scattered across documentation pages that are updated without notice. Developers test with a single post and discover constraints only when attempting bulk generation or hitting edge cases in image format.
How to avoid:
- Encode platform constraints as explicit data structures in the skill:
    const PLATFORM_SPECS = {
      instagram: { format: 'jpeg', maxBytes: 8_388_608, dimensions: { feed: [1080, 1080], story: [1080, 1920] } },
      twitter: { format: 'jpeg_or_png', maxBytes: 5_242_880, useChunkedUpload: true },
      linkedin: { format: 'jpeg_or_png', maxBytes: 10_485_760 },
    }
- Convert all output images to JPEG at the generation step for cross-platform compatibility.
- Implement a rate-limit-aware upload queue with per-platform buckets. For Instagram: max 100 API publishes per 24-hour rolling window, 200 API calls per hour.
- For Twitter/X: always use the chunked upload flow (INIT+APPEND+FINALIZE) regardless of file size — it is more reliable than the simple upload endpoint.
Warning signs:
- Generating PNG images for Instagram posting
- Simple single-request media upload (not chunked) for Twitter/X
- No rate limit tracking between API calls
- Platform spec constants hardcoded as magic numbers scattered through posting code
Phase to address: Phase 6 (Social media content generation) — platform specs table must be defined as the first step, before any image generation or posting code is written.
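A rate-limit-aware queue has to honour several windows per platform at once, such as Instagram's 200 calls per hour and 100 publishes per 24 hours. A sketch of a sliding-window limiter with an injectable clock; the class name and its API are hypothetical:

```typescript
interface WindowLimit {
  max: number; // calls allowed per window
  windowMs: number;
}

class SlidingWindowLimiter {
  private readonly timestamps: number[] = []; // ascending call times
  private readonly limits: WindowLimit[];
  private readonly now: () => number;

  constructor(limits: WindowLimit[], now: () => number = Date.now) {
    this.limits = limits;
    this.now = now;
  }

  /** Milliseconds until the next call is allowed under every window (0 = go now). */
  delayUntilAllowed(): number {
    const t = this.now();
    let wait = 0;
    for (const { max, windowMs } of this.limits) {
      const inWindow = this.timestamps.filter((ts) => ts > t - windowMs);
      if (inWindow.length >= max) {
        // This call must age out before the window has room again.
        const blocking = inWindow[inWindow.length - max];
        wait = Math.max(wait, blocking + windowMs - t);
      }
    }
    return wait;
  }

  tryAcquire(): boolean {
    const t = this.now();
    this.prune(t);
    if (this.delayUntilAllowed() > 0) return false;
    this.timestamps.push(t);
    return true;
  }

  // Drop timestamps older than the largest window so memory stays bounded.
  private prune(t: number): void {
    const maxWindow = Math.max(...this.limits.map((l) => l.windowMs));
    while (this.timestamps.length > 0 && this.timestamps[0] <= t - maxWindow) {
      this.timestamps.shift();
    }
  }
}
```

A posting queue would keep one limiter per platform bucket (with a separate one for Instagram publishes) and wait delayUntilAllowed() milliseconds instead of firing the request whenever tryAcquire() returns false.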
Pitfall 57: Content Skills Bypassing Plugin Capability Enforcement
What goes wrong: A content generation skill is implemented as a Paperclip plugin. During development, the plugin worker directly calls internal server routes (e.g., fetch('http://localhost:PORT/api/companies/...')) or imports server-side modules (import { storageService } from '../../server/src/services/storage'). This works in development but violates the plugin isolation contract: plugins must only communicate with the host via the JSON-RPC bridge defined in the plugin SDK. Direct HTTP calls bypass capability checks and audit logging.
A related mistake: the plugin stores generated file bytes in ctx.state (the plugin key-value state store). ctx.state uses the plugin_state DB table and is designed for small JSON blobs (configuration, counters, IDs). A 50MB video stored in ctx.state as a base64 string (about 67MB of text) severely degrades DB performance, and larger files can exceed PostgreSQL's per-value size limits.
Why it happens: The host-side storage service is accessible from the same process. Developers shortcut the plugin boundary during rapid prototyping. ctx.state feels like the obvious place to persist plugin data.
How to avoid:
- Content skills must use ctx.host.storage.* RPC methods (when these are added to the plugin SDK for v1.7) to store generated files, never direct HTTP or module imports.
- ctx.state is for metadata only: store the asset's objectKey, contentType, byteSize, and generationParams as JSON. Never store binary content in state.
- Add a lint rule or TS path alias that prevents @paperclipai/plugin-sdk packages from importing from ../../server/.
- Review the plugin manifest capabilities array before each phase: a content skill generating PDFs needs plugin.storage.write but does not need plugin.agents.read.
Warning signs:
- fetch() calls to http://localhost inside plugin worker code
- import statements in plugin code referencing ../../server/ paths
- Binary content or large strings stored in ctx.state
- Plugin manifest with overly broad capabilities (* or all capabilities listed)
Phase to address: Phase 2 (Content skills architecture) — plugin boundary rules must be defined before any content skill implementation begins.
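A sketch of the metadata-only rule as a guard run before anything is written to ctx.state. The GeneratedAssetRef shape follows the field list above; the 16KB budget, the base64 heuristic and the function name are assumptions:

```typescript
interface GeneratedAssetRef {
  objectKey: string;
  contentType: string;
  byteSize: number;
  generationParams: Record<string, unknown>;
}

const MAX_STATE_VALUE_BYTES = 16 * 1024;
// A long unbroken base64 run is a strong hint that file content leaked into state.
const BASE64_RUN = /[A-Za-z0-9+\/]{4096,}={0,2}/;

// Returns the JSON to store, or throws if the value looks like file content.
function assertStateSafe(ref: GeneratedAssetRef): string {
  const json = JSON.stringify(ref);
  const bytes = Buffer.byteLength(json, "utf8");
  if (bytes > MAX_STATE_VALUE_BYTES) {
    throw new Error(`ctx.state value is ${bytes} bytes; store the file via storage and keep only its objectKey`);
  }
  if (BASE64_RUN.test(json)) {
    throw new Error("ctx.state value appears to contain base64-encoded file content");
  }
  return json;
}
```

The guard does not replace the capability checks; it only catches the common prototyping shortcut of stuffing bytes into state.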
Pitfall 58: Image Generation Model Loaded Per Request Without VRAM Management
What goes wrong: A local image generation endpoint loads the SDXL or Flux model on each request: model = load_model('flux-dev'). On the Mac Mini M4 (16–32GB unified memory), loading a 12GB model takes 8–15 seconds and allocates most available memory. If a second image request arrives during model loading, the second load attempt fails or causes memory exhaustion. When the request completes, the model is garbage-collected, only to be reloaded for the next request.
Why it happens: Stateless request handler pattern (load → infer → unload) is the natural first implementation. VRAM/unified memory management is not visible at the application layer.
How to avoid:
- Load the image generation model once at startup (or on first use, then keep in memory). Never reload per request.
- Use a semaphore to ensure only one inference runs at a time on the M4 — Apple Silicon unified memory does not support concurrent model instances efficiently.
- For the M4's unified memory architecture, the model's memory is shared with system RAM. Monitor memory pressure via os.totalmem() / os.freemem() and emit a warning if free memory falls below 4GB before starting inference.
- If multiple model sizes are available, load the smallest acceptable model by default. Allow the user to select a higher-quality model explicitly (with a warning about inference time).
- Implement a simple LRU model cache: if two different models are needed (e.g., icon generation uses a different model than photo generation), keep the most recently used loaded and unload the least recently used when switching.
Warning signs:
- Model loading call inside the request handler function
- No semaphore or mutex around inference
- Memory exhaustion errors on concurrent image generation requests
- Model reload happening on every request (check logs for "Loading model..." appearing multiple times)
Phase to address: Phase 7 (Local image generation) — model lifecycle management must be established before any inference endpoint is exposed.
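A sketch of the LRU-plus-mutex idea with the loader and unloader injected. The LoadedModel interface, the ModelCache class and the model names in the usage are hypothetical; a capacity of 1 means switching models always unloads the previous one before loading the next:

```typescript
interface LoadedModel {
  name: string;
  infer(prompt: string): Promise<Uint8Array>;
}

class ModelCache {
  private readonly loaded = new Map<string, LoadedModel>(); // Map order doubles as LRU order
  private queue: Promise<unknown> = Promise.resolve();
  private readonly load: (name: string) => Promise<LoadedModel>;
  private readonly unload: (model: LoadedModel) => Promise<void>;
  private readonly capacity: number;

  constructor(
    load: (name: string) => Promise<LoadedModel>,
    unload: (model: LoadedModel) => Promise<void>,
    capacity = 1,
  ) {
    this.load = load;
    this.unload = unload;
    this.capacity = Math.max(1, capacity);
  }

  private async acquire(name: string): Promise<LoadedModel> {
    const hit = this.loaded.get(name);
    if (hit) {
      // Re-insert so this model becomes the most recently used entry.
      this.loaded.delete(name);
      this.loaded.set(name, hit);
      return hit;
    }
    while (this.loaded.size >= this.capacity) {
      const lruName = this.loaded.keys().next().value as string;
      const lru = this.loaded.get(lruName) as LoadedModel;
      this.loaded.delete(lruName);
      await this.unload(lru); // free unified memory before loading the next model
    }
    const model = await this.load(name);
    this.loaded.set(name, model);
    return model;
  }

  // Serialize all work: each call waits for the previous load/inference to finish.
  generate(name: string, prompt: string): Promise<Uint8Array> {
    const run = this.queue.then(async () => (await this.acquire(name)).infer(prompt));
    this.queue = run.catch(() => undefined); // a failed job must not block later ones
    return run;
  }
}
```

Because every request chains onto the same promise, a second request that arrives during a load simply waits instead of triggering a parallel load.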
Pitfall 59: Mermaid Server-Side Rendering Requiring Full DOM in Node.js
What goes wrong: mermaid.render() requires a browser DOM environment. In Node.js server-side rendering (for SVG-to-PNG conversion or PDF embedding), calling mermaid.render() in the main Node.js process throws "document is not defined". The common workaround — using JSDOM — requires additional setup and has known limitations with Mermaid's SVG rendering (complex diagrams with foreignObject elements may not render correctly).
An alternative approach (spawning a headless Chromium page via Puppeteer to render Mermaid client-side, then extracting the SVG) adds Chromium as a dependency for what should be a lightweight diagram operation, and reintroduces the Puppeteer lifecycle pitfalls.
Why it happens: Mermaid is designed as a browser library. Its server-side story is underdeveloped — the GitHub issue tracking server-side SVG rendering with JSDOM has been open since 2023 with no complete resolution.
How to avoid:
- Use svgdom (not JSDOM) for server-side Mermaid rendering; svgdom is purpose-built for SVG rendering in Node.js. Validate its output against real diagrams before relying on it. (@mermaid-js/mermaid-zenuml is the ZenUML diagram plugin, not a server-side rendering approach.)
- Alternatively, use the mmdc CLI (@mermaid-js/mermaid-cli) as a child process: mmdc -i input.mmd -o output.svg. It uses Puppeteer internally and encapsulates the DOM requirement, but it launches its own Chromium; to reuse the PDF generator's browser instead, render Mermaid in a page on that shared browser rather than shelling out to mmdc.
- For the Nexus use case (agent generates diagram description → Mermaid → embedded in PDF or displayed in UI): render server-side for PDF embedding and client-side (in the browser) for UI display. These are two separate code paths.
- Cache rendered SVGs by Mermaid source hash — the same diagram definition always produces the same SVG.
Warning signs:
- Calling mermaid.render() in Node.js without DOM setup
- JSDOM used for Mermaid rendering (prone to foreignObject failures)
- Separate Chromium launch just for Mermaid (missed opportunity to reuse PDF's browser instance)
- No SVG cache — same diagram re-rendered on every page load
Phase to address: Phase 3 (Mermaid diagram generation) — server-side rendering approach must be validated before the diagram feature is integrated with PDF generation.
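Caching by source hash can be a thin layer in front of whichever renderer is chosen. This sketch also folds a config version into the key, so a theme or Mermaid upgrade invalidates old entries; the SvgCache class, the key scheme and the entry cap are assumptions:

```typescript
import { createHash } from "node:crypto";

class SvgCache {
  private readonly entries = new Map<string, string>();
  private readonly render: (source: string) => Promise<string>;
  private readonly maxEntries: number;

  constructor(render: (source: string) => Promise<string>, maxEntries = 500) {
    this.render = render;
    this.maxEntries = maxEntries;
  }

  static key(source: string, configVersion: string): string {
    // The NUL separator keeps the config and source parts from running together.
    return createHash("sha256").update(configVersion).update("\0").update(source).digest("hex");
  }

  async get(source: string, configVersion = "v1"): Promise<string> {
    const key = SvgCache.key(source, configVersion);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    const svg = await this.render(source);
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest insertion to bound memory.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, svg);
    return svg;
  }
}
```

Two concurrent misses for the same diagram will still render twice in this version; caching the in-flight promise instead of the string would deduplicate them.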
Pitfall 60: Agent Heartbeat Timeout Too Short for Long-Running Content Renders
What goes wrong: The Paperclip agent heartbeat model is designed for short execution windows. An agent checks out a task, starts a video render job (3–10 minutes), and the heartbeat timeout fires before the render completes. The heartbeat process exits; the render continues as an orphan child process. The task remains in_progress indefinitely. The next heartbeat re-checks the task and either starts a second render (wasting resources and producing duplicates) or reports as blocked.
Why it happens: The heartbeat model assumes agent work completes within a few minutes. Content generation tasks (video rendering, batch image generation, document compilation) violate this assumption. The existing patterns for long-running operations (e.g., git worktree operations) use a different lifecycle model.
How to avoid:
- Content generation tasks must use an async fire-and-forget pattern: the agent heartbeat starts the job, writes the job ID to the task's document, sets status to in_progress, and exits. A separate polling routine (using Paperclip's cron/routines feature) checks job status and updates the task to done when the render completes.
- Alternatively, use the execution workspace's workspace-operations.ts long-running operation pattern for renders; it is already designed for multi-minute operations.
- Never await a render inside a heartbeat handler. Use renderMedia({ ...options }).then(onComplete).catch(onError), with the completion callbacks posting a comment to the issue and updating status.
- Add a job ID to the task comment immediately after starting the render: "Render started. Job ID: {jobId}. Expected completion: ~5 minutes."
Warning signs:
- await renderMedia(...) inside a heartbeat route handler
- Heartbeat timeout shorter than the expected render time
- Orphaned render processes after the heartbeat exits (check ps aux | grep remotion)
- Tasks stuck in in_progress after the render completes
Phase to address: Phase 1 (Remotion integration) — async job model must be designed before the first render is attempted through the agent interface.
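A sketch of the fire-and-forget start, assuming the render function wraps renderMedia() and the callbacks post the issue comment and update task status. startRenderJob and the in-memory jobs map are hypothetical:

```typescript
import { randomUUID } from "node:crypto";

// In-memory job registry for the sketch; a real implementation would persist it.
type JobState = { status: "rendering" | "done" | "failed"; detail?: string };
const jobs = new Map<string, JobState>();

// Start a long render without blocking the caller and return the job ID at once.
function startRenderJob(
  render: (jobId: string) => Promise<string>, // resolves to the output path
  onDone: (jobId: string, outputPath: string) => void,
  onFailed: (jobId: string, error: unknown) => void,
): string {
  const jobId = randomUUID();
  jobs.set(jobId, { status: "rendering" });
  // Deliberately not awaited: the heartbeat handler returns while this runs.
  void render(jobId).then(
    (outputPath) => {
      jobs.set(jobId, { status: "done", detail: outputPath });
      onDone(jobId, outputPath);
    },
    (error: unknown) => {
      jobs.set(jobId, { status: "failed", detail: String(error) });
      onFailed(jobId, error);
    },
  );
  return jobId;
}
```

Passing both handlers to .then() keeps an exception thrown inside onDone from being reported as a render failure; the heartbeat only needs the returned ID to write its "Render started" comment.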
Pitfall 61: Placeholder Assets Without DRAFT Watermark Mistaken for Final Output
What goes wrong: An agent generates a placeholder for a video (a static slide with "DRAFT" intent) while the real render is queued. The placeholder is stored and linked to the task. A user reviews the task output, sees a static image, and marks the task as approved — not realizing it is a placeholder pending a full render. The actual video is never triggered because the task is now done.
Why it happens: Placeholder assets look similar to real output in the task file list. Without a clear visual indicator and a machine-readable flag, humans and agents alike cannot distinguish "this is final" from "this is a placeholder for X".
How to avoid:
- Store an isDraft: true flag in the asset manifest for all placeholder assets, and include this flag in the API response for asset listings.
- Render a visible "DRAFT" overlay directly into placeholder images/videos, not just in the filename. Use sharp to composite a semi-transparent "DRAFT" watermark onto generated placeholder images.
- In the UI asset list, show a distinct badge (yellow "DRAFT" tag) for assets with isDraft: true.
- The agent that queued a render should not mark the parent task as done until the render completes and the isDraft flag is cleared. Use the job polling routine (Pitfall 60) to trigger the status update.
Warning signs:
- Placeholder assets stored without an isDraft or status field
- UI showing placeholder and final assets identically
- Tasks marked done while the render job is still queued or rendering
- No visual DRAFT indicator in placeholder file content
Phase to address: Phase 2 (Placeholder assets and manifest tracking) — the DRAFT flag and visual indicator must be in place before any placeholder is stored.
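A sketch of the gate the agent (or the polling routine) could check before marking a task done, plus an overlay SVG for the watermark. The manifest fields and function names are hypothetical; with sharp, the overlay would be applied via sharp(input).composite([{ input: Buffer.from(svg) }]):

```typescript
interface AssetManifestEntry {
  objectKey: string;
  isDraft: boolean;
  renderJobStatus?: "queued" | "rendering" | "done" | "failed";
}

// Reasons the parent task must not move to done yet (empty array = OK to complete).
function blockersForDone(assets: AssetManifestEntry[]): string[] {
  return assets.flatMap((a) => {
    const reasons: string[] = [];
    if (a.isDraft) reasons.push(`${a.objectKey} is still a DRAFT placeholder`);
    if (a.renderJobStatus === "queued" || a.renderJobStatus === "rendering") {
      reasons.push(`${a.objectKey} render is ${a.renderJobStatus}`);
    }
    return reasons;
  });
}

// Semi-transparent diagonal "DRAFT" text sized to the image, for compositing.
function draftWatermarkSvg(width: number, height: number): string {
  const fontSize = Math.round(Math.min(width, height) / 5);
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle" ` +
    `font-family="sans-serif" font-weight="700" font-size="${fontSize}" ` +
    `fill="#ff0000" fill-opacity="0.35" transform="rotate(-30 ${width / 2} ${height / 2})">DRAFT</text></svg>`
  );
}
```

Returning reasons rather than a boolean lets the agent quote them in the task comment when it declines to close the task.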
Pitfall 62: Theme Export Using HEX Values That Lose Color Space Information
What goes wrong: A theme generator computes colors in OKLCH (perceptually uniform), validates WCAG contrast ratios, and produces a beautiful palette. It then exports the theme as a set of HEX values. Downstream consumers (CSS custom properties, design system tokens, Tailwind config) receive the HEX values and regenerate their own tints/shades — using HSL, because that is what most tools default to. The palette is immediately corrupted by the round-trip through the wrong color space.
Why it happens: HEX is the universal color exchange format. The perceptual uniformity of OKLCH is lost when values are converted to HEX and then re-processed by tools that use HSL.
How to avoid:
- Export theme tokens in multiple formats simultaneously: HEX (for compatibility), OKLCH (for tools that support it), and CSS custom properties with oklch() syntax.
- For Tailwind config export: Tailwind 4 defines its default palette in OKLCH and accepts oklch() values in its CSS-based @theme configuration, so export oklch(L% C H) strings directly.
- For CSS variable exports: --color-primary: oklch(0.65 0.15 250); is supported by modern browsers.
- Mark the HEX export as "sRGB approximate" in export metadata so consumers know it is lossy.
- Store the OKLCH source values in the theme manifest, not just HEX. The HEX representation is derived output.
Warning signs:
- Theme manifest storing only HEX values
- No OKLCH export format in the theme exporter
- Downstream tools re-deriving tints from the exported HEX using HSL
- Palette looking "off" after importing into Figma or Tailwind config
Phase to address: Phase 4 (Theme generator) — export format design must include OKLCH from the start. Retrofitting after the exporter is built requires changes to all downstream consumers.
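A sketch of a multi-format exporter that treats the OKLCH values as the source of truth; the ThemeToken shape, the manifest layout and the rounding precision are assumptions:

```typescript
interface ThemeToken {
  name: string; // e.g. "primary"
  oklch: { l: number; c: number; h: number }; // l in 0–1
  hexApprox: string; // derived sRGB approximation, not the source of truth
}

// Round for output, then drop trailing zeros ("0.6500" -> "0.65").
const fmt = (n: number, digits: number): string => Number(n.toFixed(digits)).toString();

function toCssVariables(tokens: ThemeToken[]): string {
  const lines = tokens.map(
    (t) => `  --color-${t.name}: oklch(${fmt(t.oklch.l, 4)} ${fmt(t.oklch.c, 4)} ${fmt(t.oklch.h, 2)});`,
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

// Manifest keeps OKLCH as the source and labels HEX as lossy.
function toManifest(tokens: ThemeToken[]) {
  return {
    source: "oklch",
    hexNote: "sRGB approximate (lossy); re-derive tints from the oklch values",
    tokens: tokens.map((t) => ({ name: t.name, oklch: t.oklch, hex: t.hexApprox })),
  };
}
```

Downstream tools that can read oklch() consume the CSS or manifest values directly; only tools limited to HEX fall back to the labelled approximation.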
Pitfall 63: pnpm-lock.yaml Merge Conflicts When Adding Remotion to Monorepo
What goes wrong: Remotion pulls in a large dependency tree: Webpack, Chromium binaries (via @remotion/renderer), React (specific version), and multiple @remotion/* sub-packages. Adding these to the monorepo's pnpm-lock.yaml produces a large lockfile diff. The next upstream rebase (git rebase upstream/master) that also touches pnpm-lock.yaml produces a conflicted lockfile that cannot be auto-merged. The manual merge is error-prone — resolving lockfile conflicts incorrectly causes pnpm install to fail with dependency resolution errors.
Why it happens: The Nexus fork performs periodic rebases onto upstream Paperclip. Both branches add/update dependencies and produce lockfile diffs. Lockfile merge conflicts in pnpm are notoriously difficult because a single dependency change can cascade across hundreds of lockfile lines.
How to avoid:
- Add all Remotion dependencies in a single commit immediately after an upstream rebase (while the lockfile is clean). This minimizes the conflict surface for the next rebase.
- For Remotion's renderer (@remotion/renderer, which drives headless Chromium): add it as a dependency of a dedicated packages/remotion-renderer/ package, isolated from the rest of the monorepo. This limits the lockfile impact to one sub-package.
- On lockfile conflicts, do not attempt a manual merge. After resolving package.json conflicts, run pnpm install --no-frozen-lockfile; pnpm regenerates the lockfile automatically.
- After each upstream rebase, run pnpm build and pnpm test to verify that the lockfile regeneration did not introduce version regressions.
Warning signs:
- Remotion dependencies added to the root package.json (adds to every workspace's resolution)
- Lockfile conflict during rebase with hundreds of conflicted lines
- Attempting to manually edit pnpm-lock.yaml to resolve conflicts
Phase to address: Phase 1 (Remotion integration) — dependency isolation strategy must be decided before installing Remotion packages.
Pitfall 64: SVG Icon Generation Producing Non-Sanitized Output Used in dangerouslySetInnerHTML
What goes wrong: An AI generates SVG markup for an icon (e.g., "generate a minimalist camera icon in SVG"). The generated SVG is stored as a string and rendered in React using dangerouslySetInnerHTML={{ __html: svgContent }}. A malicious or hallucinated SVG could contain <script> tags, onclick attributes, or <use xlink:href="..."> references to external resources — causing XSS or data exfiltration.
Why it happens: SVG is XML with embedded scripting capability. AI-generated SVG is treated as trusted content because it originated from the system, not from a user. The trust boundary between "system-generated" and "safe" is incorrectly equated.
How to avoid:
- All SVG content, regardless of source, must be sanitized before rendering. Use DOMPurify with an SVG-specific config: FORCE_BODY: true, USE_PROFILES: { svg: true, svgFilters: true }.
- For icon SVGs specifically: after sanitization, optimize with svgo to remove metadata, comments, and non-display elements. svgo's default preset does not strip scripts, so treat it as an optimizer, not as a second sanitizer.
- Use <img src="data:image/svg+xml;base64,..."> to display AI-generated icons rather than inline SVG. This prevents script execution entirely: the SVG is rendered as an image, not as DOM.
- Validate that the output is actually an SVG: check for an <svg root element, a valid namespace, and a reasonable file size before storing.
Warning signs:
- `dangerouslySetInnerHTML` used to render AI-generated SVG content
- No sanitization step between AI output and SVG storage
- SVG stored and served without Content-Security-Policy headers preventing script execution
- No file size or structure validation on generated SVG
Phase to address: Phase 8 (Icon generation) — sanitization pipeline must be in place before any generated SVG reaches the DOM.
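The structural validation and the `<img>` rendering path from Pitfall 64 can be sketched as follows. This is a minimal sketch, not a sanitizer: it assumes DOMPurify (and optionally svgo) has already run on the markup. `checkGeneratedSvg`, `toImgDataUri`, and the 64 KB cap are hypothetical names and values chosen for illustration, and the code relies on Node's `Buffer`.

```typescript
// Assumption: icon SVGs should be small; the cap is illustrative, tune as needed.
const MAX_ICON_SVG_BYTES = 64 * 1024;
// The root element must declare the SVG namespace (single or double quotes).
const SVG_NS_ATTR = /\bxmlns\s*=\s*["']http:\/\/www\.w3\.org\/2000\/svg["']/;

type SvgCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical structural check run before storing AI-generated SVG. Not a sanitizer.
function checkGeneratedSvg(svg: string): SvgCheck {
  const bytes = Buffer.byteLength(svg, "utf8");
  if (bytes === 0 || bytes > MAX_ICON_SVG_BYTES) {
    return { ok: false, reason: `size ${bytes} B is outside 1..${MAX_ICON_SVG_BYTES}` };
  }
  // Skip an optional XML declaration, then leading whitespace and comments.
  const body = svg
    .replace(/^\s*<\?xml[^>]*\?>/, "")
    .replace(/^\s*(?:<!--[\s\S]*?-->\s*)*/, "");
  if (!/^<svg[\s>]/i.test(body)) {
    return { ok: false, reason: "root element is not <svg>" };
  }
  // Inspect only the opening tag of the root element.
  const rootTag = body.slice(0, body.indexOf(">") + 1);
  if (!SVG_NS_ATTR.test(rootTag)) {
    return { ok: false, reason: "root <svg> lacks the SVG namespace" };
  }
  return { ok: true };
}

// Encodes already-sanitized SVG for <img src>, so the browser treats it as an image, not DOM.
function toImgDataUri(sanitizedSvg: string): string {
  return `data:image/svg+xml;base64,${Buffer.from(sanitizedSvg, "utf8").toString("base64")}`;
}
```

Base64 encoding via `Buffer` keeps non-ASCII characters intact, which `btoa` would not handle without extra escaping.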
Pitfall 65: Branding Media Kit Generation Treating All Assets as a Single Atomic Operation
What goes wrong: A branding media kit requires: logo (SVG), color palette, typography recommendation, banner images (5 sizes), social media templates (6 platforms), PDF one-pager, and icon set (24 icons). Implemented as a single agent task, the generation takes 15–45 minutes. If any single component fails (e.g., the PDF renderer crashes at step 7), the entire kit generation is abandoned with no partial output.
Why it happens: "Generate a brand kit" is naturally conceived as one task. The atomic approach matches how a human designer might present the deliverable — as a complete package. The failure mode only becomes apparent when the first long-running attempt is interrupted.
How to avoid:
- Decompose the brand kit into a parent task with sub-tasks per asset type, using Paperclip's existing `parentId` + `goalId` sub-task pattern.
- Each sub-task (logo generation, palette, PDFs, banners) runs independently and stores its output before the next sub-task begins.
- The parent task aggregates completed sub-task outputs into a final ZIP/manifest. It only moves to `done` when all sub-tasks complete.
- If a sub-task fails, it enters the `blocked` state with an error comment; the other sub-tasks continue. The user sees partial progress rather than total failure.
- Use placeholder assets (Pitfall 61) for each sub-task to signal "this component is queued."
Warning signs:
- All brand kit generation in a single agent run
- No sub-task decomposition in the agent's plan
- All-or-nothing completion: either full kit or nothing stored
- No intermediate progress visible in the UI during kit generation
Phase to address: Phase 9 (Branding media kit) — task decomposition design must be specified before implementation. This is an agent orchestration design decision, not just a code change.
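A parent-task roll-up for the decomposed kit might look like the sketch below. The status union, `BrandKitSubTask`, and `summarizeBrandKit` are hypothetical; the real Paperclip issue statuses and `parentId`/`goalId` wiring may differ. A sub-task counts as complete only once its output has been stored, and the parent reaches `done` only when every sub-task has.

```typescript
// Hypothetical status union for illustration; align with the real issue statuses.
type SubTaskStatus = "todo" | "in_progress" | "blocked" | "done";

interface BrandKitSubTask {
  assetType: string;        // e.g. "logo", "palette", "pdf-one-pager"
  status: SubTaskStatus;
  outputObjectKey?: string; // set once the sub-task has stored its output
}

interface BrandKitProgress {
  parentStatus: "in_progress" | "blocked" | "done";
  completed: string[];
  blocked: string[];
  pending: string[];
}

function summarizeBrandKit(subTasks: BrandKitSubTask[]): BrandKitProgress {
  const completed: string[] = [];
  const blocked: string[] = [];
  const pending: string[] = [];
  for (const t of subTasks) {
    // "done" without a stored output is not treated as complete.
    if (t.status === "done" && t.outputObjectKey) completed.push(t.assetType);
    else if (t.status === "blocked") blocked.push(t.assetType);
    else pending.push(t.assetType);
  }
  let parentStatus: BrandKitProgress["parentStatus"] = "in_progress";
  if (subTasks.length > 0 && completed.length === subTasks.length) {
    parentStatus = "done";
  } else if (blocked.length > 0 && pending.length === 0) {
    // Nothing else can progress; surface the block on the parent.
    parentStatus = "blocked";
  }
  return { parentStatus, completed, blocked, pending };
}
```

Marking the parent `blocked` only when nothing else can still progress is a design choice: it keeps the remaining sub-tasks running and lets the UI show the partial kit.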
Pitfall 66: Generated Assets Not Linked to Their Originating Task for Garbage Collection
What goes wrong: Content generation produces files: videos, PDFs, images, SVGs. These files accumulate in the storage directory. When a task is deleted or cancelled, its generated assets remain on disk because no relationship between the task and the generated files was established. Over months, the storage directory fills with orphaned files from cancelled, superseded, or test renders.
For a single-user deployment on a Mac Mini, disk space is finite. A few hundred video renders can consume 50–200GB of disk without the user being aware.
Why it happens: Generating the file and storing it is the primary flow. Cleanup is deferred as "we'll add it later." The relationship between task and asset is informal (mentioned in task comments) rather than machine-readable.
How to avoid:
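- Every generated asset must be stored with a `sourceTaskId` (issue ID) and `sourceRunId` in its manifest record. This is a hard requirement, not an optional field.
- When a task is deleted or moved to `cancelled`, a cleanup job queries all assets with that `sourceTaskId` and queues them for deletion.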
cancelled, a cleanup job queries all assets with thatsourceTaskIdand queues them for deletion. - Add a storage usage dashboard visible in the Nexus admin UI: total storage used, per-type breakdown (video, PDF, image), largest files.
- Set retention policies per content type: generated draft videos expire after 7 days unless explicitly pinned; final approved assets are retained indefinitely.
- The existing `storageService` already has `deleteObject`; wire it to the task lifecycle.
Warning signs:
- Assets stored with no `sourceTaskId` field
- Storage directory growing unboundedly over weeks
- No delete path in the generated asset manifest
- Task deletion not triggering asset cleanup
Phase to address: Phase 1 (Storage foundations) — the sourceTaskId manifest field must be present from the first generated asset stored, not added retroactively.
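The lifecycle and retention rules in Pitfall 66 reduce to a selection over manifest records. A minimal sketch, assuming a hypothetical `GeneratedAssetRecord` shape and the 7-day draft-video policy above; `deleteObject` is passed in as a parameter because the exact signature of the existing `storageService.deleteObject` is not shown here.

```typescript
// Hypothetical manifest record; sourceTaskId and sourceRunId are required, not optional.
interface GeneratedAssetRecord {
  objectKey: string;
  sourceTaskId: string;
  sourceRunId: string;
  kind: "video" | "pdf" | "image" | "svg";
  isDraft: boolean;
  pinned: boolean;
  createdAt: Date;
}

// Draft videos expire after 7 days unless pinned (retention policy from the text).
const DRAFT_VIDEO_TTL_MS = 7 * 24 * 60 * 60 * 1000;

function selectAssetsForDeletion(
  assets: GeneratedAssetRecord[],
  closedTaskIds: ReadonlySet<string>, // tasks that were deleted or moved to cancelled
  now: Date,
): string[] {
  return assets
    .filter((a) => {
      // Task lifecycle: everything produced for a closed task goes.
      if (closedTaskIds.has(a.sourceTaskId)) return true;
      // Retention: unpinned draft videos past their TTL go; final assets stay.
      const ageMs = now.getTime() - a.createdAt.getTime();
      return a.kind === "video" && a.isDraft && !a.pinned && ageMs > DRAFT_VIDEO_TTL_MS;
    })
    .map((a) => a.objectKey);
}

// Deletes sequentially through an injected storage call; returns how many were removed.
async function purgeAssets(
  objectKeys: string[],
  deleteObject: (objectKey: string) => Promise<void>,
): Promise<number> {
  let deleted = 0;
  for (const key of objectKeys) {
    await deleteObject(key);
    deleted += 1;
  }
  return deleted;
}
```

Because `sourceTaskId` is a required field in this shape, a record without it fails type-checking instead of silently becoming an orphan.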
Technical Debt Patterns — v1.7
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| HEX-only color storage in theme manifest | Simpler, universal format | OKLCH round-trip loss; palette corruption in downstream tools | Never — store OKLCH source values always |
| `bundle()` per render request | No bundle cache management | 5-minute render startup; server unresponsive under load | Never |
| HSL for palette generation | Familiar API surface | Perceptually incoherent palettes; design rejection | Never — use OKLCH via culori |
| Puppeteer `launch()` per PDF request | No browser lifecycle management | 2–3s overhead per PDF; RAM spikes | Never for production; OK for CLI one-shot scripts |
| All brand kit in one agent task | Simple orchestration | All-or-nothing failure; no partial recovery | MVP only if kit has <3 components |
| `ctx.state` for generated file storage | Simplest persistence path | DB row size limits; performance degradation with binary data | Never — use `objectKey` reference only |
| Global `MAX_ATTACHMENT_BYTES` bump | Quick fix for video storage | User-uploaded attachment limit also raised; security regression | Never — use separate generated assets namespace |
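The fix for the `bundle()`-per-request row is to bundle once and reuse the result. A minimal sketch as a memoized promise: the bundler is injected so the example runs without Remotion installed, and `makeBundleCache` is a hypothetical name. In the server, the injected function would wrap `bundle()` from `@remotion/bundler`, which resolves to a serve URL.

```typescript
// Resolves to a serve URL, which is what Remotion's bundle() returns.
type Bundler = () => Promise<string>;

// Hypothetical helper: runs the bundler at most once per process unless it fails.
// e.g. const getServeUrl = makeBundleCache(() => bundle({ entryPoint: "<entry file>" }));
function makeBundleCache(runBundler: Bundler): () => Promise<string> {
  let cached: Promise<string> | null = null;
  return () => {
    if (cached !== null) return cached; // later and concurrent callers share one bundle
    const pending = runBundler().catch((err: unknown) => {
      cached = null; // do not cache failures; the next caller retries
      throw err;
    });
    cached = pending;
    return pending;
  };
}
```

Calling the cached getter once at startup warms it; each render job then passes the awaited URL as `serveUrl` to `renderMedia()`. Concurrent callers during the first bundle share one promise, so a burst of early requests cannot trigger parallel Webpack builds.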
Integration Gotchas — v1.7
| Integration | Common Mistake | Correct Approach |
|---|---|---|
| Instagram API | PNG images for feed posts | Convert all output to JPEG before posting |
| Instagram API | 5,000 calls/hour assumed (pre-2025 rate) | Use 200 calls/hour budget; implement rate-limit queue |
| Twitter/X media upload | Simple single-request upload | Always use INIT+APPEND+FINALIZE three-step chunked upload |
| Remotion + pnpm | Adding `@remotion/renderer` to root workspace | Isolate in `packages/remotion-renderer/`; avoid lockfile cascade |
| Mermaid server-side | Calling `mermaid.render()` in Node.js without DOM | Use `svgdom` + DOMPurify, or `mmdc` CLI child process |
| Puppeteer fonts | Relying on system fonts in headless Chromium | Self-host all fonts; reference via localhost URL in templates |
| Paperclip plugin SDK | Direct HTTP calls from plugin worker to host | Use ctx.host.* RPC bridge only |
| WCAG calculation | WCAG 2.x spec's 0.03928 threshold | Use culori's wcagContrast() with correct 0.04045 threshold |
| OKLCH exports | HEX-only export from theme generator | Export HEX + OKLCH + CSS custom properties simultaneously |
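The Instagram rate-limit row implies a queue in front of every Graph API call. A minimal sliding-window sketch, assuming the 200 calls/hour budget from the table (verify it against Meta's current limits); `SlidingWindowLimiter` and `instagramLimiter` are hypothetical names.

```typescript
// Budget from the Integration Gotchas table; confirm against current platform docs.
const INSTAGRAM_CALLS_PER_HOUR = 200;
const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical limiter: allows at most maxCalls within any windowMs span.
class SlidingWindowLimiter {
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly timestamps: number[] = []; // start times of calls still inside the window

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Records the call and returns 0 if it may proceed at nowMs; otherwise returns ms to wait.
  tryAcquire(nowMs: number): number {
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxCalls) {
      this.timestamps.push(nowMs);
      return 0;
    }
    return this.timestamps[0] + this.windowMs - nowMs;
  }

  // Sleeps until a slot frees up; waiting callers are not guaranteed strict FIFO order.
  async acquire(): Promise<void> {
    for (;;) {
      const waitMs = this.tryAcquire(Date.now());
      if (waitMs === 0) return;
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

const instagramLimiter = new SlidingWindowLimiter(INSTAGRAM_CALLS_PER_HOUR, ONE_HOUR_MS);
```

Each Instagram request would `await instagramLimiter.acquire()` first. An in-process limiter like this only holds while a single Nexus server process makes the calls.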
Performance Traps — v1.7
| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| Bundle-per-render | Render queue backed up; server unresponsive | Cache bundle at startup; `renderMedia()` only per request | First concurrent render request |
| Chromium concurrency 100% | Memory pressure; render time 3–10x baseline | Set `concurrency: 4` on M4; benchmark with `npx remotion benchmark` | Second concurrent render on 16GB machine |
| Model-per-request inference | 15s startup on every image generation call | Keep model in memory; semaphore for single-concurrent inference | First concurrent image generation request |
| JSDOM DOMPurify accumulation | Slow diagram renders after 100+ requests | Periodic JSDOM window cleanup; or child process per sanitization batch | After ~200 diagram renders in one process lifetime |
| Puppeteer launch-per-PDF | 2–3s overhead per PDF; RAM spikes | Persistent browser instance; `newPage()` per request | Third concurrent PDF request |
| Unrestricted generated asset storage | Disk full after months of use | Per-type retention policies; `sourceTaskId` for cleanup | After ~100 video renders (50–200GB) |
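The "semaphore for single-concurrent inference" in the model-per-request row can be a small counting semaphore. A minimal sketch; `Semaphore` is a hypothetical helper, and loading the model once at startup is assumed to happen elsewhere.

```typescript
// Hypothetical counting semaphore; a single permit gives single-concurrent inference.
class Semaphore {
  private permits: number;
  private readonly waiters: Array<() => void> = []; // FIFO queue of blocked callers

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit straight to the oldest waiter
    else this.permits++;
  }

  // Runs task while holding a permit and always releases it, even if task throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

With one permit, wrapping each inference call in `run()` queues callers in arrival order instead of letting them load or drive the model concurrently. Handing the permit directly to the next waiter, rather than incrementing and letting callers race, is what keeps that ordering.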
Security Mistakes — v1.7
| Mistake | Risk | Prevention |
|---|---|---|
| Mermaid `securityLevel: "loose"` | XSS → RCE via AI-generated click directives | Always `"strict"`; strip `%%{init}%%` pre-render; DOMPurify post-render |
| AI-generated SVG via `dangerouslySetInnerHTML` | XSS via script/event injection in SVG | DOMPurify with SVG profile; prefer `<img>` over inline SVG for AI output |
| JSDOM version < 20 with DOMPurify | XSS bypass via known JSDOM 19 attack vectors | Pin JSDOM ≥ 20 |
| Plugin worker direct HTTP to host API | Capability bypass; audit trail gaps | Enforce JSON-RPC bridge only; no fetch() to localhost in plugins |
| Generated asset served without content-type validation | Browser interprets SVG as executable HTML | Always set explicit Content-Type header from manifest; never infer from file extension |
| Social media API credentials in generated content skill | Token exposure via plugin state leak | Store API credentials in Nexus server config; inject via ctx.host.secrets.* RPC |
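Stripping `%%{init}%%` before render can be sketched as below. This is defense in depth only: `securityLevel: "strict"` in the Mermaid config and DOMPurify on the output SVG remain the primary controls. `stripMermaidConfig` is a hypothetical name. The sketch also assumes that recent Mermaid versions read configuration from a leading YAML front-matter block, so it drops that block as well; check this against the Mermaid version in use.

```typescript
// Matches any %%{ ... }%% directive, including multi-line init blocks.
const MERMAID_DIRECTIVE = /%%\{[\s\S]*?\}%%/g;
// Assumption: a leading "---" YAML front-matter block can also carry config.
const MERMAID_FRONT_MATTER = /^\s*---\r?\n[\s\S]*?\r?\n---[ \t]*(?:\r?\n|$)/;

// Hypothetical pre-render step for AI-generated Mermaid source.
function stripMermaidConfig(source: string): string {
  let out = source;
  // Repeat until stable, so removing one directive cannot splice a new one together.
  for (;;) {
    const next = out.replace(MERMAID_FRONT_MATTER, "").replace(MERMAID_DIRECTIVE, "");
    if (next === out) return out;
    out = next;
  }
}
```

Ordinary `%%` comments are left alone because only the `%%{` ... `}%%` form is removed.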
"Looks Done But Isn't" Checklist — v1.7
- Remotion render: `onProgress` callback wired to SSE; render job manifest exists before render starts; output path is deterministic (not temp); job status tracked to completion.
- Remotion bundle: `bundle()` called once at startup, result cached; never called per request; entry point validated at startup.
- Mermaid rendering: `securityLevel: "strict"` set; `%%{init}%%` directives stripped; DOMPurify applied to output SVG; JSDOM ≥ 20.
- PDF generation: fonts self-hosted via localhost URL; Puppeteer browser instance persistent (not per-request); `waitForNetworkIdle()` before `page.pdf()`.
- Theme generator: OKLCH used for all computation; WCAG calculation uses `culori.wcagContrast()`; export includes OKLCH format alongside HEX.
- Color palette: `culori` library used (not HSL manipulation); perceptual uniformity validated by visual inspection; OKLCH L/C/H values stored in manifest.
- Storage limits: generated assets use separate namespace with raised limits; `MAX_ATTACHMENT_BYTES` unchanged; `video/mp4` only allowed on generated assets route; `sourceTaskId` present on all generated assets.
- Image generation: model loaded once (not per request); inference semaphore in place; memory pressure logged before inference start.
- Social media: platform specs table defined as code; JPEG conversion applied before Instagram posts; chunked upload used for Twitter/X; rate limit queue implemented.
- Content skills: all host communication via JSON-RPC bridge (no `fetch()` to localhost); `ctx.state` contains only metadata (`objectKey`, not binary content); `capabilities` array reviewed and minimal.
- SVG icons: DOMPurify + svgo applied to all AI-generated SVG; rendered as `<img>` not inline DOM where possible.
- Brand kit: decomposed into sub-tasks; each sub-task has its own output manifest; parent task only `done` when all sub-tasks complete.
- Asset lifecycle: all generated assets have `sourceTaskId`; task cancellation triggers asset cleanup query.
- Placeholder assets: `isDraft: true` flag in manifest; visible DRAFT watermark in file content; UI shows DRAFT badge.
Pitfall-to-Phase Mapping — v1.7
| Pitfall | Prevention Phase | Verification |
|---|---|---|
| bundle() per render (45) | Phase 1 — Remotion foundation | Server logs show "bundle cached" at startup; no Webpack compilation in render request logs |
| Chromium concurrency thrashing (46) | Phase 1 — Remotion foundation | concurrency: 4 in render config; Activity Monitor RAM stays below 12GB during render |
| bundle() in compiled server context (47) | Phase 1 — Remotion foundation | pnpm build && pnpm start with render request succeeds; no path resolution errors |
| 10MB file size limit (48) | Phase 1 — Storage foundations | Store a 50MB test file via generated assets route; HTTP 200 returned |
| Mermaid XSS via securityLevel loose (49) | Phase 3 — Mermaid generation | securityLevel: "strict" in code review; penetration test with %%{init:{"securityLevel":"loose"}}%% input |
| DOMPurify JSDOM memory accumulation (50) | Phase 3 — Mermaid generation | Load test: 200 diagram renders; server heap stays flat |
| HSL palette incoherence (51) | Phase 4 — Theme generator | No HSL in palette generation code; visual review of 10 generated palettes |
| WCAG incorrect linearization (52) | Phase 4 — Theme generator | Cross-validate 5 color pairs against WebAIM checker; results match |
| PDF font loading failures (53) | Phase 5 — PDF generation | Generate PDF via launchd service process; font matches dev environment |
| Puppeteer per-request launch (54) | Phase 5 — PDF generation | Browser process count stays at 1 during 10 concurrent PDF requests |
| No render progress reporting (55) | Phase 1 — Remotion foundation | UI shows progress bar during render; SSE events visible in browser DevTools |
| Social media constraints ignored (56) | Phase 6 — Social media skill | Platform specs table exists as typed constant; Instagram posts JPEG; Twitter uses chunked upload |
| Plugin capability bypass (57) | Phase 2 — Content skills architecture | No fetch('http://localhost') in any plugin worker file (grep check) |
| Image model per-request (58) | Phase 7 — Image generation | "Loading model" log line appears once at startup; semaphore visible in code |
| Mermaid DOM in Node.js (59) | Phase 3 — Mermaid generation | Server-side render test produces valid SVG; no "document is not defined" error |
| Heartbeat timeout for renders (60) | Phase 1 — Remotion foundation | Agent starts render and exits heartbeat; task still in_progress; completion fires via polling routine |
| Placeholder without DRAFT indicator (61) | Phase 2 — Placeholder assets | Placeholder image contains visible DRAFT watermark; manifest has isDraft:true |
| Theme HEX-only export (62) | Phase 4 — Theme generator | Export JSON contains oklch field; CSS export uses oklch() syntax |
| pnpm lockfile merge conflicts (63) | Phase 1 — Remotion foundation | Remotion in isolated sub-package; post-rebase pnpm install succeeds without manual lockfile edit |
| SVG icon XSS (64) | Phase 8 — Icon generation | DOMPurify + svgo applied; icons rendered as <img> not inline SVG |
| Brand kit atomic failure (65) | Phase 9 — Branding media kit | Kit generation uses sub-tasks; partial completion visible if one sub-task fails |
| Generated assets without cleanup (66) | Phase 1 — Storage foundations | All stored assets have sourceTaskId; task deletion query confirms cleanup |
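For the Pitfall 52 verification, a dependency-free reference of the WCAG relative-luminance and contrast formulas, using the 0.04045 linearization threshold, can sit in the test suite next to culori's `wcagContrast()` as a cross-check. A sketch; the function names are hypothetical, production code should keep using culori, and channels are sRGB floats in [0, 1].

```typescript
type Rgb = { r: number; g: number; b: number }; // sRGB channels in [0, 1]

// Inverse sRGB transfer function with the 0.04045 threshold.
function linearize(channel: number): number {
  return channel <= 0.04045 ? channel / 12.92 : Math.pow((channel + 0.055) / 1.055, 2.4);
}

function relativeLuminance({ r, g, b }: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05), order-independent.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

function hexToRgb(hex: string): Rgb {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  return { r: parseInt(m[1], 16) / 255, g: parseInt(m[2], 16) / 255, b: parseInt(m[3], 16) / 255 };
}
```

For 8-bit inputs, the 0.03928 and 0.04045 thresholds give identical results, because no value k/255 lies between them (10/255 ≈ 0.0392, 11/255 ≈ 0.0431). The difference only appears for unrounded floats, such as sRGB values converted from OKLCH, which is what an OKLCH-based theme generator produces.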
Sources
Codebase analysis (HIGH confidence):
- `/opt/nexus/server/src/services/ollama.ts` — RAM detection using `totalmem()`, catalog lookup
- `/opt/nexus/ui/src/components/NexusOnboardingWizard.tsx` — probe auth requirement, adapter detection
- `/opt/nexus/server/src/routes/agents.ts` — board-auth gate on probe endpoint
- `/opt/nexus/ui/vite.config.ts` — OnboardingWizard Vite alias pattern
- `/opt/nexus/ui/src/components/VoiceRecordButton.tsx` — existing Whisper STT implementation
- `/opt/nexus/ui/src/adapters/registry.ts` — adapter registration pattern
- `/opt/nexus/server/src/attachment-types.ts` — `MAX_ATTACHMENT_BYTES` = 10MB default; `DEFAULT_ALLOWED_TYPES` excludes `video/*` (HIGH confidence — direct read)
- `/opt/nexus/server/src/storage/local-disk-provider.ts` — local disk storage: no built-in size limits, atomic write via rename (HIGH confidence — direct read)
- `/opt/nexus/server/src/services/plugin-worker-manager.ts` — one worker per plugin, crash recovery backoff, 30s RPC timeout (HIGH confidence — direct read)
- `/opt/nexus/server/src/services/plugin-state-store.ts` — plugin state is scoped key-value JSON in DB; not designed for binary blobs (HIGH confidence — direct read)
Research (MEDIUM confidence unless noted):
- Puter.js Free Unlimited AI API — Puter is browser-SDK-first; server-side HTTP integration requires manual HTTP calls
- WebGPU/WebGL VRAM Limitations — VRAM not queryable from browser; integrated vs. dedicated GPU reporting issues (HIGH confidence — peer-reviewed)
- Ollama AMD VRAM Detection Bug — confirmed VRAM misreport on AMD/Vulkan
- MCP Tips, Tricks and Pitfalls — Nearform — TypeScript interface vs. type alias; SSE deprecated
- MCP Specification 2025-06-18 — SSE deprecated, Streamable HTTP preferred
- Memory Poison Attack — Palo Alto Unit 42 — persistent memory prompt injection attack vector (HIGH confidence)
- Piper TTS WASM cold start — first-run download, OPFS caching, warmup pattern
- OAuth PKCE SPA Best Practices — Curity — sessionStorage for verifiers, server-side token storage
- AI Agent Memory — Redis — context window overflow, hybrid vector+graph architecture
- faster-whisper audio format and WebM handling — WebM input requires ffmpeg transcode to 16 kHz WAV (MEDIUM confidence)
- Whisper concurrent requests memory issue — each worker loads full model; use semaphore pattern (HIGH confidence — official issue tracker)
- Piper long message truncation bug — messages >500 chars cause silent truncation; chunk input (HIGH confidence — confirmed upstream bug)
- Piper persistent process discussion — subprocess exits after each synthesis; keep alive pattern documented (HIGH confidence)
- Telegram webhooks guide — official — port and SSL requirements; mixing polling/webhook forbidden (HIGH confidence — official docs)
- Telegram voice message OGG/Opus format — OGG Opus 48 kHz; ffmpeg transcode required (MEDIUM confidence)
- Telegram session channel overwrite bug — `sessions_send` overwrites `channel` field to `webchat` (HIGH confidence — confirmed bug in similar system)
- Silero VAD vs WebRTC VAD comparison — WebRTC VAD 50% TPR vs Silero 87.7% at 5% FPR (HIGH confidence — benchmark data)
- Whisper on Apple Silicon M4 benchmarks — all models process 10s audio faster than real-time on M4 with MLX (MEDIUM confidence)
- MediaRecorder cross-browser format differences — Chrome/Firefox/Safari produce different formats and bitrates (HIGH confidence)
- Node.js child_process binary PATH issue — service environment PATH differs from interactive shell; use absolute paths (HIGH confidence)
- Whisper memory leak — RAM not fully released after transcription in some environments (MEDIUM confidence)
v1.7 Research (MEDIUM confidence unless noted):
- Remotion bundle() anti-pattern — official docs — calling bundle() per render is documented anti-pattern; bundle once, renderMedia() per job (HIGH confidence — official docs)
- Remotion compare-ssr options — custom server requires managing queuing, progress, error handling; CPU-only on self-hosted (HIGH confidence — official docs)
- Remotion concurrency issue #4300 — 100% concurrency with limited Docker CPU causes thrashing; npx remotion benchmark recommended (MEDIUM confidence)
- Remotion Chromium headless memory leak — angle GL backend memory leak in v2.4.3–2.6.6; use swangle (HIGH confidence — official changelog)
- Mermaid XSS via securityLevel:loose — OneUptime advisory — stored XSS via click directive (HIGH confidence — official CVE)
- Mermaid XSS RCE in DeepChat — CVE-2025-67744 — XSS to RCE via Electron IPC (HIGH confidence — official advisory)
- beautiful-mermaid SVG attribute injection — CVE-2026-26226 — SVG attribute injection without %%{init}%% (HIGH confidence — GitLab advisory)
- DOMPurify server-side JSDOM requirements — JSDOM ≥ 20 required; happy-dom unsafe; memory accumulation in long-running processes (HIGH confidence — official repo docs)
- Mermaid server-side SVG rendering issue #6634 — JSDOM limitations for foreignObject; svgdom preferred (MEDIUM confidence)
- OKLCH for palette generation — Evil Martians — HSL perceptual non-uniformity; OKLCH superior for palette generation (HIGH confidence — widely cited)
- Tailwind CSS 4.0 adopts OKLCH — Tailwind 4 uses OKLCH natively (MEDIUM confidence)
- WCAG contrast linearization formula — W3C — 0.04045 is correct threshold; 0.03928 in WCAG 2.x spec is erroneous (HIGH confidence — W3C official wiki)
- Puppeteer PDF font pitfalls — Joyfill — system vs headless font paths differ; self-host fonts; lock versions (MEDIUM confidence)
- Never use Puppeteer for PDFs on server — Medium — resource-intensive; cold start latency; recommend persistent instance (MEDIUM confidence)
- Social media API rules and rate limits 2026 — Postproxy — Instagram 200 calls/hour (down from 5,000); Twitter chunked upload; platform MIME constraints (MEDIUM confidence)
- Instagram PNG rejection / JPEG-only requirement — JPEG only for feed posts confirmed (MEDIUM confidence)
- Paperclip plugin SDK capability model — official spec — all Worker-to-Host calls gated by manifest.capabilities; plugin bundles must not import from host internals (HIGH confidence — official spec)
- pnpm lockfile merge conflicts — pnpm discussion #4324 — large dependency additions produce large lockfile diffs; run pnpm install after resolving package.json conflicts (MEDIUM confidence)
Pitfalls research for: Nexus v1.5 — Smart Onboarding + Personal AI Assistant; v1.6 — Voice Pipeline + Telegram Bridge; v1.7 — Content Generation Layer (Remotion, image gen, Mermaid, PDF, theme gen, social media, content skills, large file storage) Researched: 2026-04-02; Updated: 2026-04-03