nexus/.planning/research/SUMMARY.md
2026-04-04 04:25:21 +00:00


Project Research Summary

Project: Nexus v1.7 — Content Generation Layer
Domain: AI-driven local content generation (presentations, diagrams, PDFs, themes, social assets, icons)
Researched: 2026-04-04
Confidence: MEDIUM-HIGH

Executive Summary

Nexus v1.7 adds a local content generation layer to an existing Paperclip fork running on a Mac Mini M4. The scope is narrow but technically deep: agents produce visual and document deliverables (diagrams, PDFs, videos, color themes, social media assets, icons) entirely on-device, with no cloud API calls. The recommended approach is a pipeline of purpose-built libraries — Remotion for video, Playwright for PDFs, satori+resvg-js for social images, culori for OKLCH-based theme generation, and @mermaid-js/mermaid-cli for server-side diagrams — routed through a shared async job infrastructure built on top of the existing Paperclip assets, publishLiveEvent, and StorageService systems. Every content type is an installable skill, meaning the content layer is additive and does not touch the upstream Paperclip schema.

The single most important architectural decision is the async job pattern. Long-running renders (Remotion video: 3-10 min, PDF: 1-5 sec, Mermaid: fast) must return a job ID immediately and push progress via the existing SSE live-events bus. Serving any render over synchronous HTTP is the primary failure path. The second most important decision is Remotion bundle isolation: the webpack bundler must run once at startup in the dedicated packages/content-renderer/ workspace package, never on each render request, and never inside the main Vite/tsc server build context.
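
The bundle-once rule can be illustrated with a small memoization helper. This is a minimal sketch, not the project's implementation: `createBundleCache` and `BundleFn` are hypothetical names, and the bundler is passed in as a plain function so the sketch does not assume any particular Remotion call signature.

```typescript
// Hypothetical signature: any function that produces a bundle path.
// In the real system this would wrap the Remotion bundler (assumption).
type BundleFn = () => Promise<string>;

// Returns a getter that runs `bundleFn` at most once and shares the result.
function createBundleCache(bundleFn: BundleFn): () => Promise<string> {
  let cached: Promise<string> | undefined;
  return () => {
    if (cached === undefined) {
      // Cache the promise rather than the value, so concurrent callers
      // that arrive while bundling is in flight share the same run.
      cached = bundleFn().catch((err: unknown) => {
        cached = undefined; // allow a retry after a failed bundle
        throw err;
      });
    }
    return cached;
  };
}
```

Calling the getter once during server startup warms the cache; each render request then awaits the same getter and passes only its own input props to the renderer.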

The primary risks cluster around three areas: Remotion's CPU/RAM footprint competing with Ollama on the shared M4 machine (mitigated by capping concurrency at 4 and serializing renders with LLM inference); security in the diagram and icon pipeline (Mermaid securityLevel: "loose" has documented XSS-to-RCE exploits; all SVG output, AI-generated or not, must pass DOMPurify before reaching the DOM); and storage growth (video renders accumulate fast on a finite Mac Mini SSD — sourceTaskId linking and per-type retention policies are mandatory from day one, not deferred cleanup).

Key Findings

The v1.7 stack is entirely additive to the v1.6 base (Express, sharp, ffmpeg-static, grammy, mermaid). Seven new library groups cover the new content types. Remotion requires workspace isolation in packages/content-renderer/ due to its webpack bundler conflicting with Vite. Three separate Chromium binaries will be installed (Remotion, mermaid-cli, Playwright) totaling approximately 900MB on the Mac Mini SSD — acceptable, but worth attempting to share via PUPPETEER_EXECUTABLE_PATH.

One package name needs verification before installation: the correct package may be @resvg/resvg-js (v2.6.2, Rust napi-rs) rather than resvg-js (v0.1.97, older version). Confirm before pnpm add.

Core technologies:

  • remotion ^4.0.443 + @remotion/bundler + @remotion/renderer: React-based video/presentation rendering, Mac M4 arm64 confirmed, SSR API works in Node.js without browser UI — isolated in packages/content-renderer/
  • playwright-chromium ^1.50.0: HTML-to-PDF via headless Chromium, 42ms cold start vs Puppeteer's 147ms (2026 macOS arm64 benchmark), TypeScript-native — installed in server/
  • @mermaid-js/mermaid-cli ^11.12.0: Official server-side Mermaid-to-SVG via run() API, same version as mermaid ^11.12.0 already in ui/ — installed in server/
  • satori ^0.26.0 + @resvg/resvg-js ^2.6.2: JSX/CSS-to-SVG-to-PNG without a browser; used by @vercel/og internally; pipeline for OG images, social cards, wallpapers — installed in server/
  • culori ^4.0.2: OKLCH-native color math, correct WCAG contrast calculation (0.04045 threshold, not the erroneous 0.03928 in the W3C spec), 2026 community consensus over chroma-js for design-system work — installed in server/ and ui/
  • @stable-canvas/comfyui-client ^1.5.9: Zero-dependency MIT client for ComfyUI REST/WebSocket API; graceful degradation when ComfyUI not running on localhost:8188 — optional, installed in server/
  • sharp ^0.34.5 (already installed): image compositing, resizing, format conversion — extended for content use, not re-added
  • ffmpeg-static ^5.3.0 (already installed): Remotion detects it automatically via ensureFfmpeg(); no second FFmpeg needed
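
The WCAG contrast calculation mentioned in the culori entry can be written out by hand to make the 0.04045 threshold concrete. The sketch below is not culori's API; the function names and the `Rgb` type are assumptions introduced for illustration, and the 4.5:1 and 3:1 limits are the standard WCAG AA values for normal and large text.

```typescript
type Rgb = { r: number; g: number; b: number }; // 8-bit channels, 0-255

// sRGB channel to linear light, using the 0.04045 threshold cited above.
function linearize(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance of an sRGB color.
function relativeLuminance(color: Rgb): number {
  return (
    0.2126 * linearize(color.r) +
    0.7152 * linearize(color.g) +
    0.0722 * linearize(color.b)
  );
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsWcagAA(fg: Rgb, bg: Rgb, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

A theme generator could call `meetsWcagAA` on every foreground/background pair and reject or adjust a palette that fails, which is what "enforced, not optional" would mean in practice.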

Expected Features

The FEATURES.md establishes a clear three-tier priority. The critical insight is that the Content Skill System must come first because every other content type depends on it. Satori+Sharp is the single image pipeline for all 2D raster output — do not introduce per-type image libraries.

Must have (table stakes — P1):

  • Download produced file with correct MIME type and Content-Disposition: attachment
  • Preview output before downloading (inline SVG, iframe PDF, Remotion Player, image thumbnail)
  • Generation status feedback via SSE progress: queued → generating → ready → error
  • Structured error recovery with actionable suggestions (e.g., "Run: ollama pull llava")
  • Save output to file system with git versioning and PLACEHOLDERS.md manifest integration
  • Re-generate with revised prompt (store parameters per job)
  • Content type labeled clearly (distinct icon, preview strategy, type registry)
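
The download requirement above can be sketched as a small header builder driven by a type registry. Everything here is illustrative: the `CONTENT_TYPES` entries, the `video/mp4` choice, and the `downloadHeaders` name are assumptions, not values taken from the codebase.

```typescript
// Hypothetical type registry: content type key -> MIME type and extension.
const CONTENT_TYPES = {
  diagram: { mime: "image/svg+xml", ext: "svg" },
  pdf: { mime: "application/pdf", ext: "pdf" },
  video: { mime: "video/mp4", ext: "mp4" }, // output format is an assumption
  image: { mime: "image/png", ext: "png" },
};

type ContentTypeKey = keyof typeof CONTENT_TYPES;

// Builds response headers that force a download with a safe file name.
function downloadHeaders(type: ContentTypeKey, baseName: string): Record<string, string> {
  const entry = CONTENT_TYPES[type];
  const fileName = `${baseName}.${entry.ext}`;
  // ASCII fallback: replace anything outside a conservative safe set.
  const asciiName = fileName.replace(/[^A-Za-z0-9._-]/g, "_");
  // RFC 5987 extended value: percent-encode, including ' ( ) * which
  // encodeURIComponent leaves untouched.
  const encoded = encodeURIComponent(fileName).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
  return {
    "Content-Type": entry.mime,
    "Content-Disposition": `attachment; filename="${asciiName}"; filename*=UTF-8''${encoded}`,
  };
}
```

Keeping the MIME type and extension in one registry also gives the "content type labeled clearly" item a single source of truth.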

Should have (differentiators — P1/P2):

  • Agent-driven generation from chat (NL → skill routing → file attachment in chat)
  • Content types as installable skills (each generator is a separate skill file, not a monolithic feature)
  • PLACEHOLDERS.md manifest integration (draft flag, prompt_hash, generated_at on every asset)
  • Seed-color-to-full-theme pipeline with WCAG AA enforced (not optional) using OKLCH
  • Diagram from natural language (LLM → Mermaid syntax → server-side SVG)
  • Local-only operation (no data leaves Mac Mini)

Defer to v2+:

  • Branding media kit (high coordination cost; requires all other generators stable first)
  • Batch generation (job queue infrastructure not justified for v1.7)
  • Font embedding in PDF/video (licensing audit required)
  • Auto-publish to social platforms (OAuth token management, platform API complexity)
  • Template marketplace

Architecture Approach

The architecture builds entirely on existing Nexus/Paperclip patterns: factory functions (not classes), StorageService for all blob storage, publishLiveEvent for SSE fan-out, and the assets table for file metadata. The core addition is a content_jobs table tracking async render lifecycle, a renderPipelineService routing jobs to typed RendererAdapter implementations, and a themeEngineService as a pure computation service with no DB dependency. The ARCHITECTURE.md is derived from direct codebase inspection (HIGH confidence) — the patterns are proven.

Content types are implemented as Markdown skill files, not code. Agents read the skill instructions and call POST /api/companies/:id/content-jobs with the appropriate type and params. No new schema is needed for the skill layer.
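
To make the agent-facing contract concrete, the request an agent would send can be sketched as below. Only the route comes from the text; the `ContentJobType` values mirror the renderer adapter names listed in this document, and the `ContentJobRequest` shape and `buildContentJobRequest` helper are hypothetical.

```typescript
// Job types assumed to match the renderer adapter names in this document.
type ContentJobType = "mermaid" | "svg" | "pdf" | "remotion" | "image";

// Hypothetical request description; the real payload schema may differ.
interface ContentJobRequest {
  method: "POST";
  path: string;
  body: { type: ContentJobType; params: Record<string, unknown> };
}

function buildContentJobRequest(
  companyId: string,
  type: ContentJobType,
  params: Record<string, unknown>,
): ContentJobRequest {
  if (companyId.trim() === "") {
    throw new Error("companyId is required");
  }
  return {
    method: "POST",
    // Encode the id so it cannot alter the route structure.
    path: `/api/companies/${encodeURIComponent(companyId)}/content-jobs`,
    body: { type, params },
  };
}
```

A skill file would describe the same three pieces in prose (route, `type`, `params`), which is why no schema change is needed for the skill layer itself.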

Major components:

  1. contentJobService — Enqueues async render jobs, emits content.job.started/done/failed live events, tracks lifecycle in content_jobs table; returns 202 Accepted with job ID immediately
  2. renderPipelineService — Strategy dispatch: routes ContentJobType to the correct RendererAdapter; each adapter is independently pluggable behind a shared interface
  3. themeEngineService — Pure OKLCH computation: seed color → palette → WCAG AA validation → CSS/JSON/Tailwind exports; synchronous HTTP, no DB, client-side preview via CSS custom property injection
  4. Renderer adapters (mermaid, svg, pdf, remotion, image) — each isolated behind RendererAdapter interface; binary-dependent adapters in server/src/services/renderers/
  5. packages/content-renderer/ (Remotion workspace package) — Compositions bundled once at startup; renderMedia() called per request against cached bundle path
  6. UI components — ContentJobViewer, DiagramRenderer, ThemePreview, ContentGallery — consume SSE events and existing asset APIs
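
How components 1, 2, and 4 fit together can be sketched with an in-memory job service. This is a simplified illustration under stated assumptions: `publish` stands in for publishLiveEvent with a guessed signature, jobs live in a `Map` instead of the content_jobs table, and the `done` promise is exposed only so callers (and tests) can await completion.

```typescript
type JobType = "mermaid" | "svg" | "pdf" | "remotion" | "image";
type JobStatus = "queued" | "generating" | "ready" | "error";

// Shared interface every renderer adapter implements (shape is assumed).
interface RendererAdapter {
  render(params: Record<string, unknown>): Promise<Uint8Array>;
}

interface Job { id: string; type: JobType; status: JobStatus; error?: string }

function createContentJobService(
  adapters: Partial<Record<JobType, RendererAdapter>>,
  publish: (event: string, job: Job) => void, // stand-in for publishLiveEvent
) {
  const jobs = new Map<string, Job>();
  let nextId = 1;

  // Strategy dispatch: pick the adapter by job type and track the lifecycle.
  async function run(job: Job, params: Record<string, unknown>): Promise<void> {
    try {
      const adapter = adapters[job.type];
      if (!adapter) throw new Error(`no adapter registered for ${job.type}`);
      job.status = "generating";
      publish("content.job.started", job);
      await adapter.render(params);
      job.status = "ready";
      publish("content.job.done", job);
    } catch (err) {
      job.status = "error";
      job.error = err instanceof Error ? err.message : String(err);
      publish("content.job.failed", job);
    }
  }

  return {
    // Returns immediately with a queued job; rendering runs in the background.
    enqueue(type: JobType, params: Record<string, unknown>) {
      const job: Job = { id: `job-${nextId++}`, type, status: "queued" };
      jobs.set(job.id, job);
      const done = Promise.resolve().then(() => run(job, params));
      return { job, done };
    },
    get: (id: string) => jobs.get(id),
  };
}
```

An HTTP handler built on this would respond with 202 Accepted and the job ID as soon as `enqueue` returns, leaving progress to the live-events bus.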

Critical Pitfalls

The PITFALLS.md has 22 v1.7-specific pitfalls (45-66). The highest-severity items:

  1. Remotion bundle() called per render request (Pitfall 45) — Webpack takes 2-5 min; server becomes unresponsive under load. Prevention: call bundle() once at startup, cache the bundle path, pass only inputProps to renderMedia() per request.

  2. Storage 10MB limit blocks video/large image storage (Pitfall 48) — The existing MAX_ATTACHMENT_BYTES = 10MB and MIME type allowlist reject generated video files. Prevention: separate MAX_GENERATED_ASSET_BYTES constant and generated/ namespace in StorageService; write rendered output directly via putObject, bypassing the upload route entirely.

  3. Mermaid securityLevel: "loose" enabling XSS to RCE (Pitfall 49) — AI-generated Mermaid syntax with click directives executes arbitrary JS. Confirmed exploits in production apps (OneUptime, DeepChat) in 2025-2026. Prevention: always "strict", strip %%{init}%% and click statements before render, DOMPurify on SVG output.

  4. HSL-based palette generation producing perceptually incoherent themes (Pitfall 51) — Equal HSL lightness steps are not perceptually equal; blue at L=50% appears darker than yellow at L=50%. Prevention: use OKLCH via culori for all generation; never HSL as an intermediate.

  5. Agent heartbeat timeout too short for long renders (Pitfall 60) — A 3-10 min video render is orphaned when the heartbeat exits; the task stays in_progress indefinitely, or a second render starts. Prevention: fire-and-forget from heartbeat (write job ID to task, exit); a polling routine checks job status and closes the task on completion.

  6. Generated assets not linked to originating task (Pitfall 66) — Orphaned files accumulate on Mac Mini SSD (50-200GB over months). Prevention: sourceTaskId is a mandatory field on every generated asset from day one; cleanup job triggers on task deletion.

  7. AI-generated SVG rendered inline without sanitization (Pitfall 64) — XSS via <script> tags or event handlers in AI-generated SVG when set directly as innerHTML. Prevention: DOMPurify with SVG profile on all AI-generated SVG; prefer <img src="data:image/svg+xml;base64,..."> over inline SVG for untrusted content.
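
The pre-render stripping step named under Pitfall 49 can be sketched as a plain text filter. This is a hedged illustration with a hypothetical function name; it is not exhaustive, and it complements rather than replaces securityLevel "strict" and DOMPurify on the rendered SVG.

```typescript
// Removes config directives and click statements from untrusted Mermaid text.
function stripUnsafeMermaid(source: string): string {
  return source
    // Drop %%{init: ...}%% and any other %%{...}%% directive, since these
    // can override renderer configuration from inside the diagram text.
    .replace(/%%\{[\s\S]*?\}%%/g, "")
    .split(/\r?\n/)
    // Drop interaction statements such as `click A call fn()`.
    .filter((line) => !/^\s*click\s/i.test(line))
    .join("\n");
}
```

Running the filter on the LLM output before it reaches the renderer keeps the diagram structure intact while removing the two constructs the pitfall calls out.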

Implications for Roadmap

Based on the dependency graph in FEATURES.md and the build order in ARCHITECTURE.md, the natural phase structure has seven phases. The critical path runs: storage/job infrastructure → fast no-binary content types → UI pipeline → browser-dependent generators (PDF, video) → optional ML-dependent features.

Phase 1: Storage and Job Infrastructure

Rationale: Everything else depends on this. The content_jobs table, renderPipelineService stub, storage namespace extension, and the 10MB limit fix (Pitfall 48) must exist before any content type can be built. The sourceTaskId field (Pitfall 66) must be present from the first asset stored.

Delivers: content_jobs DB migration, contentJobService, renderPipelineService stub, extended storage namespace, LIVE_EVENT_TYPES for content jobs, API route scaffolding, MAX_GENERATED_ASSET_BYTES constant

Addresses: Table stakes (download, status feedback, save to file system, re-generate)

Avoids: Pitfall 48 (storage size limit), Pitfall 66 (orphaned assets), Pitfall 45 (bundle-per-render pre-empted by establishing async job model), Pitfall 60 (agent heartbeat — async fire-and-forget designed here)

Phase 2: Fast Content Types (No Binary Dependencies)

Rationale: SVG generation and theme engine are pure TypeScript with no Chromium, Webpack, or binary deps. They validate the end-to-end pipeline (job → render → asset → SSE → UI) at low risk before heavier renderers are added. WCAG contrast correctness (Pitfall 52) and OKLCH color space (Pitfall 51) must be locked here — retrofitting after the theme exporter is built is costly.

Delivers: svgGeneratorAdapter (icons, placeholders, banners), themeEngineService (OKLCH, WCAG AA enforcement, CSS/JSON/Tailwind export), placeholder asset system with DRAFT watermark, culori integration

Addresses: Theme + palette generator (P1), placeholder asset system (P1), icon generation scaffolding, OKLCH export in multiple formats

Avoids: Pitfall 51 (HSL perceptual incoherence), Pitfall 52 (WCAG linearization error), Pitfall 62 (HEX-only export losing OKLCH)
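
One way to keep OKLCH values through export (the concern behind Pitfall 62) is to emit CSS custom properties with the `oklch()` color function instead of converting to HEX. The sketch below assumes a hypothetical palette shape and function names; it only formats values and does no color math.

```typescript
// Hypothetical palette entry: lightness 0..1, chroma >= 0, hue in degrees.
interface OklchColor { l: number; c: number; h: number }

// Rounds to a fixed number of decimals to keep the CSS output compact.
function roundTo(value: number, digits: number): number {
  const factor = Math.pow(10, digits);
  return Math.round(value * factor) / factor;
}

// Serializes a palette as :root custom properties using CSS oklch().
function toCssCustomProperties(
  palette: Record<string, OklchColor>,
  prefix = "color",
): string {
  const lines = Object.keys(palette).map((name) => {
    const color = palette[name];
    const value = `oklch(${roundTo(color.l, 4)} ${roundTo(color.c, 4)} ${roundTo(color.h, 2)})`;
    return `  --${prefix}-${name}: ${value};`;
  });
  return `:root {\n${lines.join("\n")}\n}`;
}
```

The same palette object could feed JSON and Tailwind exporters, so every format is derived from the OKLCH source rather than from an intermediate HEX value.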

Phase 3: Mermaid Diagrams and UI Pipeline

Rationale: Mermaid is the highest-value, lowest-complexity content type. The UI pipeline (ContentJobViewer, DiagramRenderer, ContentGallery) validates the SSE progress flow end-to-end. The Mermaid security config (Pitfall 49) and DOMPurify memory pattern (Pitfall 50) must be established before any diagram renders reach the browser.

Delivers: mermaidRendererAdapter (server-side via @mermaid-js/mermaid-cli), ChatMarkdownMessage extension for client-side Mermaid fences, DiagramRenderer component, ThemePreview component, ContentJobViewer, ContentGallery, GeneratedAssetCard, assetService.list()

Addresses: Diagram generation (P1), content type preview, inline diagram rendering in chat

Avoids: Pitfall 49 (Mermaid XSS/RCE), Pitfall 50 (DOMPurify JSDOM memory accumulation), Pitfall 59 (server-side Mermaid DOM requirement)

Phase 4: Wallpapers and OG Images (Satori Pipeline)

Rationale: The satori+resvg-js+sharp pipeline is pure Node.js (no Chromium) and covers OG images, social headers, and wallpapers in a single code path. Establishes the reusable 2D raster pipeline before PDF and video introduce heavier binary deps.

Delivers: Platform-sized image outputs (OG 1200x630, Instagram 1080x1080, desktop wallpaper 2560x1440, etc.), social.ts service, platform dimension registry constant

Uses: satori, @resvg/resvg-js, sharp (already installed)

Addresses: Wallpapers + OG images (P1), social media content scaffolding (P2)

Avoids: Pitfall 56 (platform MIME type and dimension constraints encoded as explicit data structure, not magic numbers)
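
The platform dimension registry could look roughly like the following. The three sizes are the ones listed above; the key names, the `mime` values, and the `getPlatformSpec` helper are assumptions made for this sketch.

```typescript
interface PlatformSpec {
  width: number;
  height: number;
  mime: "image/png" | "image/jpeg"; // output format per platform is assumed
}

type PlatformKey = "og" | "instagramSquare" | "desktopWallpaper";

// Explicit data structure instead of magic numbers scattered in renderers.
const PLATFORM_DIMENSIONS: Record<PlatformKey, PlatformSpec> = {
  og: { width: 1200, height: 630, mime: "image/png" },
  instagramSquare: { width: 1080, height: 1080, mime: "image/png" },
  desktopWallpaper: { width: 2560, height: 1440, mime: "image/png" },
};

// Looks up a platform by an untrusted string key, failing loudly if unknown.
function getPlatformSpec(key: string): PlatformSpec {
  if (!Object.prototype.hasOwnProperty.call(PLATFORM_DIMENSIONS, key)) {
    throw new Error(`unknown platform: ${key}`);
  }
  return PLATFORM_DIMENSIONS[key as PlatformKey];
}
```

Because agents supply the platform as free text, validating the key at the boundary keeps unsupported sizes from reaching the raster pipeline.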

Phase 5: PDF Document Generation

Rationale: PDF introduces the first Chromium binary via playwright-chromium. Browser lifecycle must be established as a persistent instance (Pitfall 54) before any template work begins. Font self-hosting (Pitfall 53) must be designed before the first PDF template is considered complete.

Delivers: pdfRendererAdapter (Playwright persistent browser instance), HTML template PDF (reports, one-pagers), pdf-lib for data-driven invoices, font self-hosting via Express static server, PDF download flow in UI

Addresses: PDF generation (P1)

Avoids: Pitfall 53 (headless Chromium font loading), Pitfall 54 (Puppeteer launch-per-request overhead)

Phase 6: Video and Presentations (Remotion)

Rationale: Remotion is the highest-complexity and highest-risk content type — webpack bundler conflicts, three Chromium binaries total, M4 concurrency limits, and the agent heartbeat timeout problem. It comes last among P1/P2 features so the async job infrastructure (Phase 1) is fully proven before the longest-running render type is added.

Delivers: packages/content-renderer/ workspace package, remotionRendererAdapter (CLI subprocess with cached bundle), video playback UI, onProgress SSE progress events, render queue with concurrency: 4 on M4

Addresses: Remotion presentations + video (P2)

Avoids: Pitfall 45 (bundle-per-render), Pitfall 46 (Chromium concurrency thrashing), Pitfall 47 (bundler inside compiled server context), Pitfall 55 (video not streamable — onProgress mandatory), Pitfall 63 (pnpm lockfile conflicts — add Remotion immediately after upstream rebase)
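
A render queue with a fixed concurrency cap can be sketched in a few lines. This is an illustrative limiter under assumptions, not the planned implementation: `createRenderQueue` is a hypothetical name, and whether the cap of 4 applies to queued renders or to Remotion's own per-render concurrency setting is left open here.

```typescript
// Returns a function that runs tasks with at most `concurrency` in flight.
function createRenderQueue(concurrency: number) {
  if (concurrency < 1) throw new Error("concurrency must be at least 1");
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function runLimited<T>(task: () => Promise<T>): Promise<T> {
    if (active >= concurrency) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active += 1;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        active -= 1; // nobody waiting, so free the slot
      }
    }
  };
}
```

Handing the slot directly to the next waiter, instead of decrementing and re-checking, avoids a window in which a newly arriving task could push the number of running renders above the cap.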

Phase 7: Content as Skills

Rationale: No new code — this phase writes Markdown skill files for each content type in company_skills. It is last because skill instructions reference API contracts finalized in Phases 1-6. Plugin boundary rules (Pitfall 57) must be enforced before any skill implementation.

Delivers: Skill markdown files for diagram, theme, PDF, wallpaper, video content types; agent-callable via existing Skill Aggregator

Addresses: Content types as installable skills (differentiator)

Avoids: Pitfall 57 (plugin workers bypassing JSON-RPC bridge, using direct HTTP to host API)

Phase Ordering Rationale

  • Phases 1 → 2 → 3 follow the build-order diagram in ARCHITECTURE.md exactly: infrastructure unblocks fast types, fast types validate the pipeline, UI comes after the first adapter works end-to-end.
  • Phase 4 (Satori) precedes Phase 5 (PDF) because Satori has no Chromium dep; PDF introduces the first persistent browser instance that the diagram renderer (Phase 3) can optionally reuse to avoid a second Chromium binary.
  • Phase 6 (Remotion) is last among feature phases because it is CPU/RAM-intensive and its Webpack bundler is a build pipeline risk — isolating it reduces rebase conflict surface.
  • Phase 7 (Skills) is last because skill instructions reference finalized API contracts.

Research Flags

Phases likely needing deeper research during planning:

  • Phase 6 (Remotion): Chromium binary count on the specific Mac Mini M4 config (the 16GB vs 32GB RAM variant changes the concurrency budget); Remotion bundle vs Vite isolation needs validation in the actual monorepo build pipeline; run npx remotion benchmark before finalizing the concurrency setting
  • Phase 5 (PDF): Verify whether playwright-chromium and @mermaid-js/mermaid-cli can share a Chromium binary via PUPPETEER_EXECUTABLE_PATH to reduce total to two binaries instead of three
  • Phase 4 (Satori): Verify correct package name: @resvg/resvg-js vs resvg-js — npm shows different versions; confirm before pnpm add

Phases with standard patterns (can proceed without additional research):

  • Phase 1 (Infrastructure): Factory function pattern, content_jobs table schema, and SSE live events pattern are all directly codebase-confirmed — HIGH confidence, no research needed
  • Phase 2 (Theme/SVG): culori OKLCH API is documented and confirmed; WCAG threshold fix is specific and well-understood
  • Phase 3 (Mermaid): Mermaid CLI Node.js run() API confirmed in README; security config is a one-line change with documented correct value
  • Phase 7 (Skills): Skill markdown format is already established in the codebase

Confidence Assessment

| Area | Confidence | Notes |
|---|---|---|
| Stack | MEDIUM-HIGH | Remotion HIGH (official SSR docs confirmed). Playwright PDF benchmark MEDIUM (single benchmark source, pdf4.dev March 2026). resvg-js package name LOW (npm shows two packages — verify). culori MEDIUM (version and WCAG claim confirmed via npm + pkgpulse comparison). ComfyUI client MEDIUM (npm confirmed, Mac M4 support sourced from offlinecreator.com). |
| Features | MEDIUM-HIGH | Technology capabilities verified via docs. UX expectations inferred from Canva/Pitch/Mermaid Live comparisons. Skill architecture patterns based on existing Nexus skill system. |
| Architecture | HIGH | Derived entirely from direct codebase inspection of /opt/nexus/ on 2026-04-04. Factory function patterns, StorageService interface, live events bus, placeholder service, and asset service all confirmed by reading source files. |
| Pitfalls | HIGH | Critical pitfalls verified via multiple sources: Mermaid XSS confirmed via production exploit reports (OneUptime, DeepChat 2025-2026); WCAG linearization error confirmed vs W3C spec; HSL perceptual non-uniformity confirmed by Tailwind CSS 4.0 rationale; Remotion bundle timing confirmed via official Remotion SSR docs. |

Overall confidence: MEDIUM-HIGH

Gaps to Address

  • resvg-js package name: Run npm info @resvg/resvg-js before pnpm add — npm shows divergent versions between resvg-js (v0.1.97) and @resvg/resvg-js (v2.6.2). Use the scoped package.
  • Chromium binary sharing: Whether PUPPETEER_EXECUTABLE_PATH pointing to Playwright's Chromium satisfies @mermaid-js/mermaid-cli's bundled-puppeteer binary requirement needs a 10-minute test on the Mac Mini before Phase 3 begins — could eliminate one ~300MB download.
  • Remotion Vite isolation: Run pnpm build after adding packages/content-renderer/ to the workspace to verify no Vite/webpack conflicts surface before Phase 6 implementation work begins.
  • ComfyUI availability: Image generation (optional, Phase 7) assumes ComfyUI is already installed. Confirm whether this is in scope for v1.7 or defer to v2 — the install is multi-GB (ComfyUI + Flux.1 model).
  • pdf-lib scope: FEATURES.md recommends both Playwright (design-rich PDFs) and pdf-lib (invoices). Confirm during Phase 5 planning whether pdf-lib is in scope for v1.7 or whether all PDF generation is initially Playwright-only.

Sources

Primary (HIGH confidence)

Secondary (MEDIUM confidence)

Tertiary (LOW confidence — needs validation during implementation)

  • offlinecreator.com — ComfyUI Mac M4 2026 — ComfyUI Metal/MPS support on M4
  • Mermaid XSS via securityLevel: "loose" — referenced via exploit reports for OneUptime and DeepChat; the attack vector is documented in the Mermaid changelog and security advisories; specific CVE numbers not cited

Research completed: 2026-04-04
Ready for roadmap: yes