# Project Research Summary
**Project:** Nexus v1.7 — Content Generation Layer
**Domain:** AI-driven local content generation (presentations, diagrams, PDFs, themes, social assets, icons)
**Researched:** 2026-04-04
**Confidence:** MEDIUM-HIGH
## Executive Summary
Nexus v1.7 adds a local content generation layer to an existing Paperclip fork running on a Mac Mini M4. The scope is narrow but technically deep: agents produce visual and document deliverables (diagrams, PDFs, videos, color themes, social media assets, icons) entirely on-device, with no cloud API calls. The recommended approach is a pipeline of purpose-built libraries — Remotion for video, Playwright for PDFs, satori+resvg-js for social images, culori for OKLCH-based theme generation, and `@mermaid-js/mermaid-cli` for server-side diagrams — routed through a shared async job infrastructure built on top of the existing Paperclip `assets`, `publishLiveEvent`, and `StorageService` systems. Every content type is an installable skill, meaning the content layer is additive and does not touch the upstream Paperclip schema.
The single most important architectural decision is the async job pattern. Long-running renders (Remotion video: 3-10 min, PDF: 1-5 sec, Mermaid: fast) must return a job ID immediately and push progress via the existing SSE live-events bus. Synchronous HTTP for any render is the primary failure path. The second most important decision is Remotion bundle isolation: the webpack bundler must run once at startup in the dedicated `packages/content-renderer/` workspace package, never on each render request, and never inside the main Vite/tsc server build context.
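
The job-ID-first pattern can be sketched roughly as follows. This is a minimal in-memory illustration, not the Nexus implementation: `createContentJobService`, the `ContentJob` shape, and the `publish` callback (standing in for `publishLiveEvent`, whose real signature is not shown in this summary) are all assumptions. Only the event names come from the component list in this document.

```typescript
// Hypothetical shapes; the real content_jobs schema and publishLiveEvent signature may differ.
type JobStatus = "queued" | "generating" | "ready" | "error";
interface ContentJob { id: string; type: string; status: JobStatus; error?: string }
interface LiveEvent { type: string; jobId: string }
type Publish = (event: LiveEvent) => void;
type Render = (job: ContentJob) => Promise<void>;

function createContentJobService(publish: Publish, render: Render) {
  const jobs = new Map<string, ContentJob>();
  let nextId = 1;

  return {
    // Returns the queued job immediately; the render continues in the background.
    enqueue(type: string): ContentJob {
      const job: ContentJob = { id: `job-${nextId++}`, type, status: "queued" };
      jobs.set(job.id, job);
      // Defer the work so the caller can answer 202 Accepted before rendering starts.
      Promise.resolve()
        .then(() => {
          job.status = "generating";
          publish({ type: "content.job.started", jobId: job.id });
          return render(job);
        })
        .then(
          () => {
            job.status = "ready";
            publish({ type: "content.job.done", jobId: job.id });
          },
          (err: unknown) => {
            job.status = "error";
            job.error = err instanceof Error ? err.message : String(err);
            publish({ type: "content.job.failed", jobId: job.id });
          },
        );
      return job;
    },
    get: (id: string) => jobs.get(id),
  };
}
```

A route handler would call `enqueue()`, respond with the job ID, and leave all further status reporting to the live-events bus.
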
The primary risks cluster around three areas: Remotion's CPU/RAM footprint competing with Ollama on the shared M4 machine (mitigated by capping concurrency at 4 and serializing renders with LLM inference); security in the diagram and icon pipeline (Mermaid `securityLevel: "loose"` has documented XSS-to-RCE exploits; all SVG output, AI-generated or not, must pass DOMPurify before reaching the DOM); and storage growth (video renders accumulate fast on a finite Mac Mini SSD — `sourceTaskId` linking and per-type retention policies are mandatory from day one, not deferred cleanup).
## Key Findings
### Recommended Stack
The v1.7 stack is entirely additive to the v1.6 base (Express, sharp, ffmpeg-static, grammy, mermaid). Seven new library groups cover the new content types. Remotion requires workspace isolation in `packages/content-renderer/` due to its webpack bundler conflicting with Vite. Three separate Chromium binaries will be installed (Remotion, mermaid-cli, Playwright) totaling approximately 900MB on the Mac Mini SSD — acceptable, but worth attempting to share via `PUPPETEER_EXECUTABLE_PATH`.
One package name needs verification before installation: the correct package may be `@resvg/resvg-js` (v2.6.2, Rust napi-rs) rather than `resvg-js` (v0.1.97, older version). Confirm before `pnpm add`.
**Core technologies:**
- `remotion ^4.0.443` + `@remotion/bundler` + `@remotion/renderer`: React-based video/presentation rendering, Mac M4 arm64 confirmed, SSR API works in Node.js without browser UI — isolated in `packages/content-renderer/`
- `playwright-chromium ^1.50.0`: HTML-to-PDF via headless Chromium, 42ms cold start vs Puppeteer's 147ms (2026 macOS arm64 benchmark), TypeScript-native — installed in `server/`
- `@mermaid-js/mermaid-cli ^11.12.0`: Official server-side Mermaid-to-SVG via `run()` API, same version as `mermaid ^11.12.0` already in `ui/` — installed in `server/`
- `satori ^0.26.0` + `@resvg/resvg-js ^2.6.2`: JSX/CSS-to-SVG-to-PNG without a browser; used by `@vercel/og` internally; pipeline for OG images, social cards, wallpapers — installed in `server/`
- `culori ^4.0.2`: OKLCH-native color math, correct WCAG contrast calculation (0.04045 threshold, not the erroneous 0.03928 in the W3C spec), 2026 community consensus over chroma-js for design-system work — installed in `server/` and `ui/`
- `@stable-canvas/comfyui-client ^1.5.9`: Zero-dependency MIT client for ComfyUI REST/WebSocket API; graceful degradation when ComfyUI not running on `localhost:8188` — optional, installed in `server/`
- `sharp ^0.34.5` (already installed): image compositing, resizing, format conversion — extended for content use, not re-added
- `ffmpeg-static ^5.3.0` (already installed): no second FFmpeg is needed. Remotion v4 ships its own FFmpeg build, so the v3-era `ensureFfmpeg()` call does not apply
### Expected Features
The FEATURES.md establishes a clear three-tier priority. The critical insight is that the Content Skill System must come first because every other content type depends on it. Satori+Sharp is the single image pipeline for all 2D raster output — do not introduce per-type image libraries.
**Must have (table stakes — P1):**
- Download produced file with correct MIME type and `Content-Disposition: attachment`
- Preview output before downloading (inline SVG, iframe PDF, Remotion Player, image thumbnail)
- Generation status feedback via SSE progress: `queued → generating → ready`, with `error` as the failure state
- Structured error recovery with actionable suggestions (e.g., "Run: ollama pull llava")
- Save output to file system with git versioning and PLACEHOLDERS.md manifest integration
- Re-generate with revised prompt (store parameters per job)
- Content type labeled clearly (distinct icon, preview strategy, type registry)
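
The status lifecycle in the list above can be made explicit as a small transition table. This is a sketch under one assumption not stated in the source: `error` is treated as reachable from any non-terminal state rather than as a step after `ready`. `TRANSITIONS`, `canTransition`, and `advance` are hypothetical names.

```typescript
type JobStatus = "queued" | "generating" | "ready" | "error";

// Assumed transition table: `ready` and `error` are terminal.
const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  queued: ["generating", "error"],
  generating: ["ready", "error"],
  ready: [],
  error: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the move is not allowed.
function advance(from: JobStatus, to: JobStatus): JobStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal job transition: ${from} -> ${to}`);
  }
  return to;
}
```

Rejecting illegal transitions in one place keeps a late or duplicated SSE event from moving a finished job back to `generating`.
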
**Should have (differentiators — P1/P2):**
- Agent-driven generation from chat (NL → skill routing → file attachment in chat)
- Content types as installable skills (each generator is a separate skill file, not a monolithic feature)
- PLACEHOLDERS.md manifest integration (draft flag, `prompt_hash`, `generated_at` on every asset)
- Seed-color-to-full-theme pipeline with WCAG AA enforced (not optional) using OKLCH
- Diagram from natural language (LLM → Mermaid syntax → server-side SVG)
- Local-only operation (no data leaves Mac Mini)
**Defer to v2+:**
- Branding media kit (high coordination cost; requires all other generators stable first)
- Batch generation (job queue infrastructure not justified for v1.7)
- Font embedding in PDF/video (licensing audit required)
- Auto-publish to social platforms (OAuth token management, platform API complexity)
- Template marketplace
### Architecture Approach
The architecture builds entirely on existing Nexus/Paperclip patterns: factory functions (not classes), `StorageService` for all blob storage, `publishLiveEvent` for SSE fan-out, and the `assets` table for file metadata. The core addition is a `content_jobs` table tracking async render lifecycle, a `renderPipelineService` routing jobs to typed `RendererAdapter` implementations, and a `themeEngineService` as a pure computation service with no DB dependency. The ARCHITECTURE.md is derived from direct codebase inspection (HIGH confidence) — the patterns are proven.
Content types are implemented as Markdown skill files, not code. Agents read the skill instructions and call `POST /api/companies/:id/content-jobs` with the appropriate `type` and `params`. No new schema is needed for the skill layer.
**Major components:**
1. `contentJobService` — Enqueues async render jobs, emits `content.job.started/done/failed` live events, tracks lifecycle in `content_jobs` table; returns `202 Accepted` with job ID immediately
2. `renderPipelineService` — Strategy dispatch: routes `ContentJobType` to the correct `RendererAdapter`; each adapter is independently pluggable behind a shared interface
3. `themeEngineService` — Pure OKLCH computation: seed color → palette → WCAG AA validation → CSS/JSON/Tailwind exports; synchronous HTTP, no DB, client-side preview via CSS custom property injection
4. Renderer adapters (mermaid, svg, pdf, remotion, image) — each isolated behind `RendererAdapter` interface; binary-dependent adapters in `server/src/services/renderers/`
5. `packages/content-renderer/` (Remotion workspace package) — Compositions bundled once at startup; `renderMedia()` called per request against cached bundle path
6. UI components — `ContentJobViewer`, `DiagramRenderer`, `ThemePreview`, `ContentGallery` — consume SSE events and existing asset APIs
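
The strategy dispatch described for `renderPipelineService` might look roughly like this. The `ContentJobType` members are taken from the adapter names listed above; the `RendererAdapter` and `RenderResult` shapes and the `createRenderPipelineService` factory are assumptions for illustration, following the factory-function convention this document describes.

```typescript
// Hypothetical types; the real ContentJobType union and adapter interface may differ.
type ContentJobType = "mermaid" | "svg" | "pdf" | "remotion" | "image";

interface RenderResult { mimeType: string; bytes: Uint8Array }

interface RendererAdapter {
  readonly type: ContentJobType;
  render(params: Record<string, unknown>): Promise<RenderResult>;
}

// Factory function (not a class): builds a lookup table once, then dispatches by type.
function createRenderPipelineService(adapters: RendererAdapter[]) {
  const byType = new Map<ContentJobType, RendererAdapter>();
  for (const adapter of adapters) byType.set(adapter.type, adapter);

  return {
    supports: (type: ContentJobType): boolean => byType.has(type),
    render(type: ContentJobType, params: Record<string, unknown>): Promise<RenderResult> {
      const adapter = byType.get(type);
      if (!adapter) {
        return Promise.reject(new Error(`No renderer registered for "${type}"`));
      }
      return adapter.render(params);
    },
  };
}
```

Because adapters are registered by value, a phase can ship with only the adapters it has built and the service rejects other job types cleanly.
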
### Critical Pitfalls
The PITFALLS.md has 22 v1.7-specific pitfalls (45-66). The highest-severity items:
1. **Remotion `bundle()` called per render request** (Pitfall 45): Webpack takes 2-5 min; the server becomes unresponsive under load. Prevention: call `bundle()` once at startup, cache the bundle path, pass only `inputProps` to `renderMedia()` per request.
2. **Storage 10MB limit blocks video/large image storage** (Pitfall 48): The existing `MAX_ATTACHMENT_BYTES = 10MB` and MIME type allowlist reject generated video files. Prevention: separate `MAX_GENERATED_ASSET_BYTES` constant and `generated/` namespace in `StorageService`; write rendered output directly via `putObject`, bypassing the upload route entirely.
3. **Mermaid `securityLevel: "loose"` enabling XSS to RCE** (Pitfall 49): AI-generated Mermaid syntax with `click` directives executes arbitrary JS. Confirmed exploits in production apps (OneUptime, DeepChat) in 2025-2026. Prevention: always `"strict"`, strip `%%{init}%%` and `click` statements before render, DOMPurify on SVG output.
4. **HSL-based palette generation producing perceptually incoherent themes** (Pitfall 51): Equal HSL lightness steps are not perceptually equal; blue at L=50% appears darker than yellow at L=50%. Prevention: use OKLCH via `culori` for all generation; never HSL as an intermediate.
5. **Agent heartbeat timeout too short for long renders** (Pitfall 60): A 3-10 min video render is orphaned when the heartbeat exits; the task stays `in_progress` indefinitely, or a second render starts. Prevention: fire-and-forget from heartbeat (write job ID to task, exit); a polling routine checks job status and closes the task on completion.
6. **Generated assets not linked to originating task** (Pitfall 66): Orphaned files accumulate on the Mac Mini SSD (50-200GB over months). Prevention: `sourceTaskId` is a mandatory field on every generated asset from day one; cleanup job triggers on task deletion.
7. **AI-generated SVG rendered inline without sanitization** (Pitfall 64): XSS via `<script>` tags or event handlers in AI-generated SVG when set directly as innerHTML. Prevention: DOMPurify with SVG profile on all AI-generated SVG; prefer `<img src="data:image/svg+xml;base64,...">` over inline SVG for untrusted content.
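
The bundle-once prevention for Pitfall 45 can be sketched as a memoized promise. To keep the example free of webpack, the bundler call is injected: `bundleFn` stands in for Remotion's `bundle()`, and `createBundleCache` and `getBundlePath` are hypothetical names.

```typescript
// `bundleFn` is a stand-in for the real bundler call; it resolves to the bundle location.
type BundleFn = () => Promise<string>;

function createBundleCache(bundleFn: BundleFn) {
  let cached: Promise<string> | null = null;

  return function getBundlePath(): Promise<string> {
    if (cached === null) {
      // Store the promise itself so concurrent callers share one in-flight bundle.
      const pending = bundleFn().catch((err: unknown) => {
        cached = null; // allow a retry after a failed bundle
        throw err;
      });
      cached = pending;
      return pending;
    }
    return cached;
  };
}
```

Calling `getBundlePath()` during server startup warms the cache; each render request then awaits the same promise and passes only its own `inputProps` to the renderer.
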
## Implications for Roadmap
Based on the dependency graph in FEATURES.md and the build order in ARCHITECTURE.md, the natural phase structure has seven phases. The critical path runs: storage/job infrastructure → fast no-binary content types → UI pipeline → browser-dependent generators (PDF, video) → optional ML-dependent features.
### Phase 1: Storage and Job Infrastructure
**Rationale:** Everything else depends on this. The `content_jobs` table, `renderPipelineService` stub, storage namespace extension, and the 10MB limit fix (Pitfall 48) must exist before any content type can be built. The `sourceTaskId` field (Pitfall 66) must be present from the first asset stored.
**Delivers:** `content_jobs` DB migration, `contentJobService`, `renderPipelineService` stub, extended storage namespace, `LIVE_EVENT_TYPES` for content jobs, API route scaffolding, `MAX_GENERATED_ASSET_BYTES` constant
**Addresses:** Table stakes (download, status feedback, save to file system, re-generate)
**Avoids:** Pitfall 48 (storage size limit), Pitfall 66 (orphaned assets), Pitfall 45 (bundle-per-render pre-empted by establishing async job model), Pitfall 60 (agent heartbeat — async fire-and-forget designed here)
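
A minimal sketch of the storage-side guards, assuming a key layout under the `generated/` namespace. The layout, the `GeneratedAssetInput` shape, the `generatedAssetKey` helper, and the 2 GiB cap are all placeholders chosen for illustration; only the constant name `MAX_GENERATED_ASSET_BYTES` and the mandatory `sourceTaskId` come from this document.

```typescript
// Placeholder cap; the real value should be tuned per content type.
const MAX_GENERATED_ASSET_BYTES = 2 * 1024 * 1024 * 1024;

interface GeneratedAssetInput {
  companyId: string;
  sourceTaskId: string; // mandatory: links the file to its originating task
  jobId: string;
  extension: string;
  byteLength: number;
}

// Builds the storage key for a rendered file, refusing unlinked or oversized assets.
function generatedAssetKey(input: GeneratedAssetInput): string {
  if (input.sourceTaskId.trim() === "") {
    throw new Error("sourceTaskId is required for generated assets");
  }
  if (input.byteLength > MAX_GENERATED_ASSET_BYTES) {
    throw new Error(`Generated asset exceeds ${MAX_GENERATED_ASSET_BYTES} bytes`);
  }
  return `generated/${input.companyId}/${input.sourceTaskId}/${input.jobId}.${input.extension}`;
}
```

Putting the task ID in the key path is one way to make the cleanup-on-task-deletion job a simple prefix delete.
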
### Phase 2: Fast Content Types (No Binary Dependencies)
**Rationale:** SVG generation and theme engine are pure TypeScript with no Chromium, Webpack, or binary deps. They validate the end-to-end pipeline (job → render → asset → SSE → UI) at low risk before heavier renderers are added. WCAG contrast correctness (Pitfall 52) and OKLCH color space (Pitfall 51) must be locked here — retrofitting after the theme exporter is built is costly.
**Delivers:** `svgGeneratorAdapter` (icons, placeholders, banners), `themeEngineService` (OKLCH, WCAG AA enforcement, CSS/JSON/Tailwind export), placeholder asset system with DRAFT watermark, culori integration
**Addresses:** Theme + palette generator (P1), placeholder asset system (P1), icon generation scaffolding, OKLCH export in multiple formats
**Avoids:** Pitfall 51 (HSL perceptual incoherence), Pitfall 52 (WCAG linearization error), Pitfall 62 (HEX-only export losing OKLCH)
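
The linearization threshold behind Pitfall 52 can be illustrated with a dependency-free sketch of the WCAG contrast formula. This is hand-rolled math for illustration, not culori's API; `Rgb`, `linearize`, `relativeLuminance`, `contrastRatio`, and `meetsWcagAA` are hypothetical names. The 4.5:1 and 3:1 thresholds are the standard WCAG AA values for normal and large text.

```typescript
type Rgb = { r: number; g: number; b: number }; // 8-bit channels, 0-255

// sRGB channel to linear light, using the 0.04045 threshold from the sRGB standard.
function linearize(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance({ r, g, b }: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio is symmetric: the lighter color always goes in the numerator.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

function meetsWcagAA(fg: Rgb, bg: Rgb, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

A theme exporter that enforces AA would run `meetsWcagAA` on every foreground/background pair it emits and adjust OKLCH lightness until the check passes.
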
### Phase 3: Diagram Generation and Content Gallery UI
**Rationale:** Mermaid is the highest-value, lowest-complexity content type. The UI pipeline (ContentJobViewer, DiagramRenderer, ContentGallery) validates the SSE progress flow end-to-end. The Mermaid security config (Pitfall 49) and DOMPurify memory pattern (Pitfall 50) must be established before any diagram renders reach the browser.
**Delivers:** `mermaidRendererAdapter` (server-side via `@mermaid-js/mermaid-cli`), `ChatMarkdownMessage` extension for client-side Mermaid fences, `DiagramRenderer` component, `ThemePreview` component, `ContentJobViewer`, `ContentGallery`, `GeneratedAssetCard`, `assetService.list()`
**Addresses:** Diagram generation (P1), content type preview, inline diagram rendering in chat
**Avoids:** Pitfall 49 (Mermaid XSS/RCE), Pitfall 50 (DOMPurify JSDOM memory accumulation), Pitfall 59 (server-side Mermaid DOM requirement)
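
The pre-render stripping step for Pitfall 49 could be sketched as a plain string filter. `stripUnsafeMermaid` is a hypothetical helper, and the patterns are deliberately simple: they remove every `%%{...}%%` directive and every line beginning with `click`. This is not exhaustive and does not replace `securityLevel: "strict"` or DOMPurify on the output.

```typescript
// Removes `%%{...}%%` directives and `click` interaction lines from Mermaid source.
function stripUnsafeMermaid(source: string): string {
  return source
    .replace(/%%\{[\s\S]*?\}%%/g, "") // directives may span several lines
    .split("\n")
    .filter((line) => !/^\s*click\s/i.test(line))
    .join("\n");
}
```

Running this on LLM output before it reaches the renderer means a prompt-injected `init` block cannot switch the security level back to `loose`.
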
### Phase 4: Wallpapers and OG Images (Satori Pipeline)
**Rationale:** The satori+resvg-js+sharp pipeline is pure Node.js (no Chromium) and covers OG images, social headers, and wallpapers in a single code path. Establishes the reusable 2D raster pipeline before PDF and video introduce heavier binary deps.
**Delivers:** Platform-sized image outputs (OG 1200x630, Instagram 1080x1080, desktop wallpaper 2560x1440, etc.), `social.ts` service, platform dimension registry constant
**Uses:** satori, @resvg/resvg-js, sharp (already installed)
**Addresses:** Wallpapers + OG images (P1), social media content scaffolding (P2)
**Avoids:** Pitfall 56 (platform MIME type and dimension constraints encoded as explicit data structure, not magic numbers)
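
The platform dimension registry might be encoded along these lines. The three sizes are the ones listed in this phase; the `PlatformId` keys, the `PlatformSpec` shape, and the MIME type per platform are assumptions to be checked against each platform's actual constraints.

```typescript
interface PlatformSpec {
  width: number;
  height: number;
  mimeType: "image/png" | "image/jpeg";
}

type PlatformId = "og" | "instagramSquare" | "desktopWallpaper";

// Single source of truth for output sizes; MIME types here are illustrative defaults.
const PLATFORM_DIMENSIONS: Record<PlatformId, PlatformSpec> = {
  og: { width: 1200, height: 630, mimeType: "image/png" },
  instagramSquare: { width: 1080, height: 1080, mimeType: "image/jpeg" },
  desktopWallpaper: { width: 2560, height: 1440, mimeType: "image/png" },
};

// Type guard for untrusted input such as a job's `params.platform` value.
function isPlatformId(value: string): value is PlatformId {
  return Object.prototype.hasOwnProperty.call(PLATFORM_DIMENSIONS, value);
}

function getPlatformSpec(id: PlatformId): PlatformSpec {
  return PLATFORM_DIMENSIONS[id];
}
```

Adding a platform then becomes a one-line data change plus a union member, with the compiler flagging any registry entry that is missing.
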
### Phase 5: PDF Document Generation
**Rationale:** PDF introduces the first Chromium binary via `playwright-chromium`. Browser lifecycle must be established as a persistent instance (Pitfall 54) before any template work begins. Font self-hosting (Pitfall 53) must be designed before the first PDF template is considered complete.
**Delivers:** `pdfRendererAdapter` (Playwright persistent browser instance), HTML template PDF (reports, one-pagers), pdf-lib for data-driven invoices, font self-hosting via Express static server, PDF download flow in UI
**Addresses:** PDF generation (P1)
**Avoids:** Pitfall 53 (headless Chromium font loading), Pitfall 54 (Puppeteer launch-per-request overhead)
### Phase 6: Video and Presentations (Remotion)
**Rationale:** Remotion is the highest-complexity and highest-risk content type — webpack bundler conflicts, three Chromium binaries total, M4 concurrency limits, and the agent heartbeat timeout problem. It comes last among P1/P2 features so the async job infrastructure (Phase 1) is fully proven before the longest-running render type is added.
**Delivers:** `packages/content-renderer/` workspace package, `remotionRendererAdapter` (CLI subprocess with cached bundle), video playback UI, `onProgress` SSE progress events, render queue with `concurrency: 4` on M4
**Addresses:** Remotion presentations + video (P2)
**Avoids:** Pitfall 45 (bundle-per-render), Pitfall 46 (Chromium concurrency thrashing), Pitfall 47 (bundler inside compiled server context), Pitfall 55 (video not streamable — onProgress mandatory), Pitfall 63 (pnpm lockfile conflicts — add Remotion immediately after upstream rebase)
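
A render queue with a fixed concurrency cap can be sketched without any Remotion dependency. `createRenderQueue` and its `stats()` helper are hypothetical names; the cap of 4 is the figure this document proposes for the M4, pending the benchmark mentioned under Research Flags.

```typescript
type Task<T> = () => Promise<T>;

function createRenderQueue(concurrency: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  // Frees a slot and starts the next waiting task, if any.
  function release(): void {
    active -= 1;
    const next = waiting.shift();
    if (next) next();
  }

  function start<T>(task: Task<T>, resolve: (value: T) => void, reject: (err: unknown) => void): void {
    active += 1;
    // Promise.resolve().then(task) also captures synchronous throws from `task`.
    Promise.resolve()
      .then(task)
      .then(
        (value) => { release(); resolve(value); },
        (err: unknown) => { release(); reject(err); },
      );
  }

  return {
    enqueue<T>(task: Task<T>): Promise<T> {
      return new Promise<T>((resolve, reject) => {
        const run = () => start(task, resolve, reject);
        if (active < concurrency) run();
        else waiting.push(run);
      });
    },
    stats: () => ({ active, waiting: waiting.length }),
  };
}
```

Each queued task would wrap one render call; serializing renders against LLM inference would need an additional shared lock, which this sketch does not cover.
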
### Phase 7: Content as Skills
**Rationale:** No new code: this phase writes Markdown skill files for each content type in `company_skills`. It is last because skill instructions reference API contracts finalized in Phases 1-6. Plugin boundary rules (Pitfall 57) must be enforced before any skill implementation.
**Delivers:** Skill markdown files for diagram, theme, PDF, wallpaper, video content types; agent-callable via existing Skill Aggregator
**Addresses:** Content types as installable skills (differentiator)
**Avoids:** Pitfall 57 (plugin workers bypassing JSON-RPC bridge, using direct HTTP to host API)
### Phase Ordering Rationale
- Phases 1 → 2 → 3 follow the build-order diagram in ARCHITECTURE.md exactly: infrastructure unblocks fast types, fast types validate the pipeline, UI comes after the first adapter works end-to-end.
- Phase 4 (Satori) precedes Phase 5 (PDF) because Satori has no Chromium dep; PDF introduces the first persistent browser instance that the diagram renderer (Phase 3) can optionally reuse to avoid a second Chromium binary.
- Phase 6 (Remotion) is last among feature phases because it is CPU/RAM-intensive and its Webpack bundler is a build pipeline risk — isolating it reduces rebase conflict surface.
- Phase 7 (Skills) is last because skill instructions reference finalized API contracts.
### Research Flags
Phases likely needing deeper research during planning:
- **Phase 6 (Remotion):** Chromium binary count on the specific Mac Mini M4 config (the 16GB vs 32GB RAM variant changes the concurrency budget); Remotion bundle vs Vite isolation needs validation in the actual monorepo build pipeline; run `npx remotion benchmark` before finalizing the concurrency setting
- **Phase 5 (PDF):** Verify whether `playwright-chromium` and `@mermaid-js/mermaid-cli` can share a Chromium binary via `PUPPETEER_EXECUTABLE_PATH` to reduce total to two binaries instead of three
- **Phase 4 (Satori):** Verify correct package name: `@resvg/resvg-js` vs `resvg-js` — npm shows different versions; confirm before `pnpm add`
Phases with standard patterns (can proceed without additional research):
- **Phase 1 (Infrastructure):** Factory function pattern, `content_jobs` table schema, and SSE live events pattern are all directly codebase-confirmed — HIGH confidence, no research needed
- **Phase 2 (Theme/SVG):** culori OKLCH API is documented and confirmed; WCAG threshold fix is specific and well-understood
- **Phase 3 (Mermaid):** Mermaid CLI Node.js `run()` API confirmed in README; security config is a one-line change with documented correct value
- **Phase 7 (Skills):** Skill markdown format is already established in the codebase
## Confidence Assessment
| Area | Confidence | Notes |
|------|------------|-------|
| Stack | MEDIUM-HIGH | Remotion HIGH (official SSR docs confirmed). Playwright PDF benchmark MEDIUM (single benchmark source, pdf4.dev March 2026). resvg-js package name LOW (npm shows two packages — verify). culori MEDIUM (version and WCAG claim confirmed via npm + pkgpulse comparison). ComfyUI client MEDIUM (npm confirmed, Mac M4 support sourced from offlinecreator.com). |
| Features | MEDIUM-HIGH | Technology capabilities verified via docs. UX expectations inferred from Canva/Pitch/Mermaid Live comparisons. Skill architecture patterns based on existing Nexus skill system. |
| Architecture | HIGH | Derived entirely from direct codebase inspection of `/opt/nexus/` on 2026-04-04. Factory function patterns, StorageService interface, live events bus, placeholder service, and asset service all confirmed by reading source files. |
| Pitfalls | HIGH | Critical pitfalls verified via multiple sources: Mermaid XSS confirmed via production exploit reports (OneUptime, DeepChat 2025-2026); WCAG linearization error confirmed vs W3C spec; HSL perceptual non-uniformity confirmed by Tailwind CSS 4.0 rationale; Remotion bundle timing confirmed via official Remotion SSR docs. |
**Overall confidence:** MEDIUM-HIGH
### Gaps to Address
- **resvg-js package name:** Run `npm info @resvg/resvg-js` before `pnpm add` — npm shows divergent versions between `resvg-js` (v0.1.97) and `@resvg/resvg-js` (v2.6.2). Use the scoped package.
- **Chromium binary sharing:** Whether `PUPPETEER_EXECUTABLE_PATH` pointing to Playwright's Chromium satisfies `@mermaid-js/mermaid-cli`'s bundled-puppeteer binary requirement needs a 10-minute test on the Mac Mini before Phase 3 begins — could eliminate one ~300MB download.
- **Remotion Vite isolation:** Run `pnpm build` after adding `packages/content-renderer/` to the workspace to verify no Vite/webpack conflicts surface before Phase 6 implementation work begins.
- **ComfyUI availability:** Image generation (optional, Phase 7) assumes ComfyUI is already installed. Confirm whether this is in scope for v1.7 or defer to v2 — the install is multi-GB (ComfyUI + Flux.1 model).
- **pdf-lib scope:** FEATURES.md recommends both Playwright (design-rich PDFs) and pdf-lib (invoices). Confirm whether pdf-lib is in scope for v1.7 or if all PDF is Playwright-only initially during Phase 5 planning.
## Sources
### Primary (HIGH confidence)
- Direct codebase inspection of `/opt/nexus/` (2026-04-04) — service patterns, StorageService interface, live events bus, asset schema, placeholder service, package.json contents
- [Remotion SSR docs](https://www.remotion.dev/docs/ssr) — `@remotion/renderer` Node.js API, bundle caching pattern
- [Remotion Express render-server template](https://github.com/remotion-dev/template-render-server) — Express integration confirmed
- [vercel/satori GitHub](https://github.com/vercel/satori) — JSX-to-SVG API, font format constraints (TTF/OTF/WOFF, no WOFF2)
- [mermaid-cli GitHub](https://github.com/mermaid-js/mermaid-cli) — Node.js `run()` API confirmed
- [playwright-chromium npm](https://www.npmjs.com/package/playwright-chromium) — Chromium-only package confirmed
- [culori npm](https://www.npmjs.com/package/culori) — version 4.0.2, WCAG functions confirmed
### Secondary (MEDIUM confidence)
- [PDF benchmark 2026](https://pdf4.dev/blog/html-to-pdf-benchmark-2026) — Playwright vs Puppeteer macOS arm64 timing (single source)
- [thx/resvg-js GitHub](https://github.com/thx/resvg-js) — SVG-to-PNG Rust napi-rs; package name ambiguity noted
- [culori vs chroma-js 2026](https://www.pkgpulse.com/blog/culori-vs-chroma-js-vs-tinycolor2-color-manipulation-javascript-2026) — OKLCH accuracy comparison
- [@stable-canvas/comfyui-client npm](https://www.npmjs.com/package/@stable-canvas/comfyui-client) — zero deps, MIT confirmed
- [SocialSizes.io 2026](https://socialsizes.io/) — platform dimension registry
### Tertiary (LOW confidence — needs validation during implementation)
- [offlinecreator.com — ComfyUI Mac M4 2026](https://offlinecreator.com/blog/best-local-stable-diffusion-setup-2026) — ComfyUI Metal/MPS support on M4
- Mermaid XSS via `securityLevel: "loose"` — referenced via exploit reports for OneUptime and DeepChat; the attack vector is documented in the Mermaid changelog and security advisories; specific CVE numbers not cited
---
*Research completed: 2026-04-04*
*Ready for roadmap: yes*