# Phase 42: Wallpapers, Social, Format Conversion & Voice — Research

**Researched:** 2026-04-04
**Domain:** Image generation (sharp/SVG), format conversion (sharp/ffmpeg-static/AI-bridge), social text generation (LLM), voice transcription (Whisper)
**Confidence:** HIGH

---

<user_constraints>
## User Constraints (from CONTEXT.md)

### Locked Decisions
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.

### Claude's Discretion
All implementation choices are at Claude's discretion. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

### Deferred Ideas (OUT OF SCOPE)
None — discuss phase skipped.
</user_constraints>

---

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| WALL-01 | User can generate desktop and mobile wallpapers from a description | SVG-via-LLM + sharp rasterize at target dimensions; PLATFORM_DIMENSIONS constants in renderer |
| WALL-02 | User can generate social media banners with correct dimensions per platform | Same renderer; platform map covers OG Image, Twitter Card, Instagram, LinkedIn |
| WALL-03 | User can generate Open Graph and social preview images | Same renderer; OG Image = 1200×630 constant |
| WALL-04 | User can generate app icons and favicons in multiple sizes | Renderer returns multi-size bundle (1024, 512, 256, 64, 32); WallpaperPreview renders grid |
| SOCIAL-01 | User can generate platform-ready posts respecting character limits (Twitter, LinkedIn) | LLM prompt with platform limit injected; character count UI enforced per-platform constants |
| SOCIAL-02 | User can generate Instagram carousels and thread sequences | LLM returns JSON with slides array; carousel rendered as numbered collapsible sections |
| SOCIAL-03 | System suggests relevant hashtags for generated content | LLM prompt requests hashtag suggestions as JSON array alongside post text |
| CONV-01 | User can convert between image formats (PNG, JPG, SVG, WebP, GIF) via sharp | sharp 0.34.5 already installed; reads every listed format and writes all of them except SVG (raster → SVG has no direct sharp path and falls to the AI bridge) |
| CONV-02 | User can convert between audio/video formats via ffmpeg | ffmpeg-static 5.3.0 (bundled ffmpeg 7.0.2 binary) already installed and verified working |
| CONV-03 | User can convert between document formats via Pandoc/LibreOffice | pandoc/libreoffice NOT installed → falls to AI-bridge per CONV-08 |
| CONV-04 | User can convert between data formats (CSV, JSON, XLSX) | xlsx + csv-parse packages needed; pure-Node.js conversion |
| CONV-05 | User can convert between any format pair via AI-bridged conversion | puterChatComplete already established; handles unsupported pairs |
| CONV-06 | System provides conversion UI with source/target format selection and drag-drop | Standalone /convert page; ConvertPanel as described in UI spec |
| CONV-07 | User can deep-link to specific conversion flows via URL | /convert/:sourceFormat?/:targetFormat? route in App.tsx; pre-select chips on mount |
| CONV-08 | System detects available direct converters at startup | Startup probe service; GET /api/system/converters endpoint |
| CONV-09 | System validates uploaded file MIME type via magic-byte detection | file-type@22.0.0 (ESM, ships own types); validate at convert route before job dispatch |
| VOICE-01 | User can click mic button in web chat to record and auto-transcribe via Whisper | VoiceMicButton already in ChatInput when enableVoiceInput=true; already wired |
| VOICE-02 | User can toggle between text-only, voice-input, and full-voice modes | VoiceModeToggle already exists; already wired in ChatInput; Phase 42 verifies correctness |
| VOICE-03 | Voice input works offline with local Whisper model | voice-pipeline.ts already probes whisper-cpp → openai-whisper; WHISPER_MODEL env var + offline badge |
</phase_requirements>

---

## Summary

Phase 42 extends the Phase 41 content generation system with four new capabilities: platform-aware image generation (wallpapers, OG images, social banners, app icons), LLM-driven social post generation with hashtag suggestions, a full-featured file format conversion pipeline, and offline voice input via Whisper.

The server already has all critical dependencies for images (sharp@0.34.5, @resvg/resvg-js@2.6.2) and audio/video (ffmpeg-static@5.3.0, which bundles the ffmpeg 7.0.2 binary; verified working at /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0). Three packages need to be added: `file-type@22.0.0` (magic-byte MIME detection), `xlsx@0.18.5` (XLSX data conversion), and `csv-parse@6.2.1` (CSV parsing). Document conversion (pandoc/libreoffice) is not available on this system and falls through to the AI bridge per CONV-08, so no installation is needed.

The voice pipeline (`voice-pipeline.ts`) already handles Whisper probe and transcription. Phase 42's voice work is: (1) add `WHISPER_MODEL=local` env var support to signal offline capability, (2) expose whisper availability to the UI via the existing `/api/system/providers` endpoint (already returns `whisperAvailable`), (3) render the "Offline" badge in `ChatInput` alongside `VoiceMicButton`. The VoiceMicButton, VoiceModeToggle, and `enableVoiceInput=true` wiring already exist in `ChatPanel.tsx`.

**Primary recommendation:** Follow the established Phase 41 renderer pattern: add three new `jobType` cases to `content-job-runner.ts` (`wallpaper`, `social-post`, `convert`), create one renderer file per job type in `server/src/services/renderers/`, and wire three new ContentStudio tabs + one standalone `/convert` page in the UI.

---

## Standard Stack

### Core (all verified installed in server)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| sharp | 0.34.5 | Image format conversion + SVG rasterization at target dimensions | Already installed; used by icon-renderer and org-chart-svg |
| @resvg/resvg-js | 2.6.2 | High-fidelity SVG→PNG rasterization with fitTo dimensions | Already installed; used by diagram-renderer |
| ffmpeg-static | 5.3.0 (bin: 7.0.2) | Bundled ffmpeg binary for audio/video conversion | Already installed; used by voice-pipeline and telegram |
| culori | 4.0.2 | OKLCH color math (not directly needed but available) | Already installed |
| puterChatComplete | (internal) | LLM inference for wallpaper SVG generation, social posts, AI-bridge conversion | Established pattern in Phase 41 renderers |

### New Dependencies (needs `pnpm add` in server)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| file-type | 22.0.0 | Magic-byte MIME type detection for CONV-09 | ESM-native, ships own types, well-maintained |
| xlsx | 0.18.5 | XLSX read/write for data conversion CONV-04 | Most-used Excel library for Node.js |
| csv-parse | 6.2.1 | CSV parsing for data conversion CONV-04 | De-facto standard, streaming API |
| @types/xlsx | 0.0.36 | TypeScript types for xlsx | xlsx ships types/index.d.ts but @types available |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| file-type@22 (ESM) | mmmagic or mime-magic | file-type is pure JS, no native binding, ships own types; server is type:module so ESM is fine |
| xlsx | exceljs | xlsx is simpler API for read/write; exceljs has streaming but more complex |
| sharp for SVG rasterization | Playwright (like diagram-renderer) | sharp+resvg is faster for simple SVG → PNG; Playwright only needed for JavaScript-rendered content |

**Installation:**

```bash
# Run from /opt/nexus/server
pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1
pnpm add -D @types/xlsx@0.0.36
```

**Version verification (run before installing):**

```bash
npm view file-type version   # → 22.0.0
npm view xlsx version        # → 0.18.5
npm view csv-parse version   # → 6.2.1
```

---

## Architecture Patterns

### Established Renderer Pattern (from Phase 41)

Every new capability follows this exact structure:

1. **Renderer file:** `server/src/services/renderers/{name}-renderer.ts` exports `async function render{Name}(input: Record<string, unknown>): Promise<RenderResult>`
2. **Job runner switch:** Add `case '{jobtype}':` to `renderContent()` in `content-job-runner.ts`
3. **Bundle type (if needed):** Add `interface {Name}Bundle` to `types.ts`
4. **API route:** Submit via existing `POST /api/companies/:id/content-jobs` with `{ jobType, input }`
5. **UI hook:** `useContentJob(companyId)` already handles all SSE + state management
6. **UI component:** Panel reads `job.bundle` after `status === 'done'`

The format conversion job is the only exception — it requires a separate multipart upload route because the file binary cannot be passed as JSON input via the standard content-jobs endpoint.

### Recommended Project Structure (new files)

```
server/src/
├── services/renderers/
│   ├── types.ts                   # ADD: WallpaperBundle, SocialPostBundle, ConvertBundle
│   ├── wallpaper-renderer.ts      # NEW
│   ├── social-renderer.ts         # NEW
│   └── convert-renderer.ts        # NEW
├── services/
│   └── converter-capabilities.ts  # NEW: startup probe + cache
└── routes/
    └── convert.ts                 # NEW: POST /api/companies/:id/convert (multipart)
                                   #      GET /api/system/converters

ui/src/
├── pages/
│   └── ConvertPage.tsx            # NEW: standalone /convert page
├── components/
│   ├── WallpaperGeneratePanel.tsx # NEW
│   ├── WallpaperPreview.tsx       # NEW
│   ├── SocialPostPanel.tsx        # NEW
│   ├── SocialPostResult.tsx       # NEW
│   └── ConvertPanel.tsx           # NEW (contains ConvertSourceZone + ConvertTargetSelector + ConvertActionBar)
└── api/
    └── convert.ts                 # NEW: submitConvertJob (multipart), getConverterCapabilities
```

### Pattern 1: Wallpaper Generation (WALL-01 to WALL-04)

**What:** LLM generates an SVG at a conceptual level, then sharp rasterizes it to exact pixel dimensions for the requested platform.
**When to use:** Any fixed-dimension image asset (wallpaper, OG image, social banner, app icon).

```typescript
// Source: established pattern from icon-renderer.ts + sharp resize
// server/src/services/renderers/wallpaper-renderer.ts

export const PLATFORM_DIMENSIONS: Record<string, { width: number; height: number; label: string }> = {
  "desktop-hd":       { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "desktop-fhd":      { width: 1920, height: 1080, label: "Desktop FHD (1920 × 1080)" },
  "desktop-4k":       { width: 3840, height: 2160, label: "Desktop 4K (3840 × 2160)" },
  "mobile-portrait":  { width: 1080, height: 1920, label: "Mobile Portrait (1080 × 1920)" },
  "mobile-landscape": { width: 1920, height: 1080, label: "Mobile Landscape (1920 × 1080)" },
  "og-image":         { width: 1200, height: 630,  label: "OG Image (1200 × 630)" },
  "twitter-card":     { width: 1200, height: 628,  label: "Twitter Card (1200 × 628)" },
  "instagram-post":   { width: 1080, height: 1080, label: "Instagram Post (1080 × 1080)" },
  "instagram-banner": { width: 1080, height: 566,  label: "Instagram Banner (1080 × 566)" },
  "linkedin-banner":  { width: 1584, height: 396,  label: "LinkedIn Banner (1584 × 396)" },
  "app-icon":         { width: 1024, height: 1024, label: "App Icon (1024 × 1024)" },
  "favicon":          { width: 32,   height: 32,   label: "Favicon (32 × 32)" },
};

// App icon + favicon: render multiple sizes from one SVG
const APP_ICON_SIZES = [1024, 512, 256, 64, 32] as const;

// Render flow:
// 1. puterChatComplete → SVG string (LLM generates SVG matching aspect ratio)
// 2. sharp(svgBuffer).resize(width, height, { fit: 'fill' }).png() → PNG buffer
// 3. Return WallpaperBundle with pngBase64 + dimensions
```

**Critical constraint:** Platform dimensions MUST be constants, never magic numbers (success criterion 1). Export `PLATFORM_DIMENSIONS` from the renderer and re-export to the UI API client so the UI's Select options derive from the same source.

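One way to honour this constraint is to derive both the UI options and the LLM prompt from the single map. The following is a minimal sketch, not existing code: it uses a trimmed copy of `PLATFORM_DIMENSIONS`, and `toSelectOptions` and `buildSvgPrompt` are hypothetical helper names introduced only for illustration.

```typescript
// Trimmed copy of PLATFORM_DIMENSIONS, for illustration only.
type PlatformSpec = { width: number; height: number; label: string };

const PLATFORM_DIMENSIONS_SAMPLE: Record<string, PlatformSpec> = {
  "desktop-fhd": { width: 1920, height: 1080, label: "Desktop FHD (1920 × 1080)" },
  "og-image": { width: 1200, height: 630, label: "OG Image (1200 × 630)" },
};

// Hypothetical helper: Select options derived from the map, so the UI never
// repeats a dimension literal.
function toSelectOptions(map: Record<string, PlatformSpec>): { value: string; label: string }[] {
  return Object.entries(map).map(([value, spec]) => ({ value, label: spec.label }));
}

// Hypothetical helper: prompt fragment that pins the SVG viewBox to the target size.
function buildSvgPrompt(platformKey: string, description: string): string {
  const spec = PLATFORM_DIMENSIONS_SAMPLE[platformKey];
  if (!spec) throw new Error(`Unknown platform key: ${platformKey}`);
  return `Generate an SVG with viewBox="0 0 ${spec.width} ${spec.height}" for: ${description}`;
}
```

Because both helpers read from the same map, adding a platform is a one-line change and the prompt, the Select, and the rasterizer cannot drift apart.
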
### Pattern 2: Format Conversion Architecture (CONV-01 to CONV-09)

**What:** A multipart upload endpoint validates the MIME type, stores the file as base64 in the job input, and dispatches to the converter renderer, which routes to sharp/ffmpeg/xlsx/AI-bridge based on the format pair.
**Why separate route:** Content-jobs POST accepts JSON; file binary needs multipart handling.

```typescript
// server/src/routes/convert.ts — new multipart route
// POST /api/companies/:companyId/convert

import multer from "multer";
import { fileTypeFromBuffer } from "file-type";

router.post("/companies/:companyId/convert", async (req, res) => {
  // 1. multer.memoryStorage() upload (limit: MAX_ATTACHMENT_BYTES)
  // 2. fileTypeFromBuffer(file.buffer) → detected MIME
  // 3. Compare detected MIME against file extension claim
  // 4. If mismatch: res.status(422).json({ error: "...", actualMime, claimedMime })
  // 5. job input: { fileBase64: buffer.toString('base64'), sourceMime, targetFormat, originalFilename }
  // 6. contentJobStore.create + contentJobRunner.dispatch
  // 7. res.status(202).json({ jobId, status })
});

// GET /api/system/converters — capability map for UI
router.get("/system/converters", async (_req, res) => {
  const caps = await converterCapabilitiesService().get();
  res.json(caps);
  // Returns: { imageConverter: true, audioVideoConverter: true, docConverter: false, dataConverter: true }
});
```

```typescript
// server/src/services/renderers/convert-renderer.ts

async function renderConvert(input: Record<string, unknown>): Promise<RenderResult> {
  // Job input arrives untyped; narrow it once so the helpers below receive strings.
  const { fileBase64, sourceMime, targetFormat } = input as {
    fileBase64: string;
    sourceMime: string;
    targetFormat: string;
  };
  const fileBuffer = Buffer.from(fileBase64, "base64");

  // Route by format category:
  if (isImageFormat(sourceMime) && isImageFormat(targetFormat)) {
    return convertImageViaSharp(fileBuffer, sourceMime, targetFormat);
  }
  if (isAudioVideoFormat(sourceMime) || isAudioVideoFormat(targetFormat)) {
    return convertAVViaFfmpeg(fileBuffer, sourceMime, targetFormat);
  }
  if (isDataFormat(sourceMime) || isDataFormat(targetFormat)) {
    return convertDataFormat(fileBuffer, sourceMime, targetFormat);
  }
  // All other pairs: AI bridge (CONV-05)
  return convertViaAiBridge(fileBuffer, sourceMime, targetFormat);
}
```

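The routing decision above can be isolated into a pure function, which makes it easy to unit-test without touching sharp or ffmpeg. This is a sketch under stated assumptions: the format tables are illustrative, formats are identified by lowercase extension rather than MIME type, and `pickConverter` is a hypothetical name. Unlike the `||` checks in the renderer sketch, this variant requires both sides to share a category, so cross-category pairs fall through to the AI bridge; which behaviour is wanted is a design choice.

```typescript
type ConverterKind = "sharp" | "ffmpeg" | "data" | "ai-bridge";

// Illustrative format tables; the real lists would live next to the renderer.
const IMAGE_FORMATS = new Set(["png", "jpg", "jpeg", "svg", "webp", "gif"]);
const AV_FORMATS = new Set(["mp3", "wav", "ogg", "mp4", "webm", "mov"]);
const DATA_FORMATS = new Set(["csv", "json", "xlsx"]);

// Hypothetical helper: choose a converter for a source/target pair.
function pickConverter(sourceFormat: string, targetFormat: string): ConverterKind {
  const src = sourceFormat.toLowerCase();
  const dst = targetFormat.toLowerCase();
  // sharp reads SVG but does not write it, so raster → SVG is not a direct conversion.
  if (IMAGE_FORMATS.has(src) && IMAGE_FORMATS.has(dst) && dst !== "svg") return "sharp";
  if (AV_FORMATS.has(src) && AV_FORMATS.has(dst)) return "ffmpeg";
  if (DATA_FORMATS.has(src) && DATA_FORMATS.has(dst)) return "data";
  // Everything else, including document formats, goes to the AI bridge (CONV-05).
  return "ai-bridge";
}
```

Keeping the decision separate from the conversion also gives the UI a cheap way to predict when the AI fallback notice should appear.
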
### Pattern 3: Converter Capability Probe (CONV-08)

```typescript
// server/src/services/converter-capabilities.ts
// Probe at startup, cache result (same pattern as hardwareService)

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

export interface ConverterCapabilities {
  imageConverter: boolean;      // sharp — always true (npm dep)
  audioVideoConverter: boolean; // ffmpeg-static — always true (npm dep)
  docConverter: boolean;        // pandoc or libreoffice — probe at startup
  dataConverter: boolean;       // xlsx + csv-parse — always true (npm dep)
}

let cache: ConverterCapabilities | null = null;

export function converterCapabilitiesService() {
  async function get(): Promise<ConverterCapabilities> {
    if (cache) return cache;
    let docConverter = false;
    try {
      await execFileAsync("pandoc", ["--version"], { timeout: 2000 });
      docConverter = true;
    } catch {
      try {
        await execFileAsync("libreoffice", ["--version"], { timeout: 2000 });
        docConverter = true;
      } catch { /* not available */ }
    }
    cache = { imageConverter: true, audioVideoConverter: true, docConverter, dataConverter: true };
    return cache;
  }
  return { get };
}
```

### Pattern 4: Social Post Generation (SOCIAL-01 to SOCIAL-03)

```typescript
// server/src/services/renderers/social-renderer.ts

export const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

// LLM prompt asks for JSON: { post: string, hashtags: string[], slides?: string[] }
// For carousel: slides array, each under 300 chars
// puterChatComplete returns JSON; renderer parses + validates
```

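The "parses + validates" step could look roughly like the sketch below. It is an illustration rather than the renderer's actual code: `CHAR_LIMITS_SAMPLE` is a trimmed copy of `PLATFORM_CHAR_LIMITS`, and `SocialPostDraft` and `checkSocialDraft` are hypothetical names. The length check uses `String.length`, which is only a rough approximation because platforms count some characters (emoji, URLs) differently.

```typescript
// Trimmed copy of PLATFORM_CHAR_LIMITS, for illustration only.
const CHAR_LIMITS_SAMPLE: Record<string, number> = {
  "twitter-x": 280,
  "instagram-carousel": 300, // per slide
};

// Shape the LLM is asked to return.
interface SocialPostDraft {
  post: string;
  hashtags: string[];
  slides?: string[];
}

// Hypothetical validator: returns human-readable problems; an empty array means valid.
function checkSocialDraft(platform: string, draft: SocialPostDraft): string[] {
  const limit = CHAR_LIMITS_SAMPLE[platform];
  if (limit === undefined) return [`unknown platform: ${platform}`];
  const problems: string[] = [];
  // Carousels are limited per slide; every other platform is limited on the post body.
  const parts = platform === "instagram-carousel" ? draft.slides ?? [] : [draft.post];
  parts.forEach((text, index) => {
    if (text.length > limit) {
      problems.push(`part ${index + 1} is ${text.length} chars (limit ${limit})`);
    }
  });
  return problems;
}
```

Returning a list of problems instead of throwing lets the renderer decide whether to retry the LLM call or surface the overage in the character-count UI.
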
### Pattern 5: Voice Offline Badge (VOICE-03)

The voice pipeline already handles Whisper detection. Phase 42 adds two things:

1. **Server:** Read the `WHISPER_MODEL` env var in `voice-pipeline.ts`. When it is set to `"local"`, include `"local"` in the nexus-settings response or expose it via `GET /api/system/providers` (which already returns `whisperAvailable` from `hardwareService().detect()`).

2. **UI:** In `ChatInput.tsx`, read `whisperAvailable` from a `useSystemProviders()` hook. Show `<span aria-label="Voice input is offline (local model)">Offline</span>` next to `VoiceMicButton` when `whisperAvailable === true`.

**IMPORTANT:** The existing `GET /api/system/providers` already returns `{ whisperAvailable: boolean, piperAvailable: boolean, ... }` — no new endpoint needed. Create a `useSystemProviders()` hook that calls this endpoint once on mount.

### Pattern 6: ContentStudio Tab Extension + Standalone Convert Page

```typescript
// ui/src/pages/ContentStudio.tsx — extend TabsList
// Add three TabsTriggers: "Wallpapers", "Social", "Convert"
// "Convert" tab value triggers navigate() to /convert (standalone page)
// TabsContent for wallpapers and social are normal panel components
// TabsContent for convert is NOT a content panel — the tab click navigates away

// ui/src/App.tsx — add new routes in boardRoutes()
<Route path="content-studio" element={<ContentStudio />} />
<Route path="convert" element={<ConvertPage />} />
<Route path="convert/:sourceFormat" element={<ConvertPage />} />
<Route path="convert/:sourceFormat/:targetFormat" element={<ConvertPage />} />
```

### Anti-Patterns to Avoid

- **Magic number dimensions:** Never hardcode `2560` or `1440` in component code — always read from `PLATFORM_DIMENSIONS` constant exported from renderer or a shared types file.
- **Passing file buffer as base64 in SSE-triggered jobs with >10MB files:** The 10MB multer limit prevents oversized uploads; document this clearly in the convert route.
- **Blocking HTTP on render:** All conversion dispatched fire-and-forget via `contentJobRunner.dispatch()`. The POST /convert route returns 202 immediately.
- **Showing format pairs as "unavailable":** Per CONV-08, all format pairs are selectable in the UI. Unavailable direct converters show the AI fallback notice, never a disabled/grey chip.
- **Creating a separate `/api/convert/validate` endpoint:** Validate at job submit time in the convert route (simpler, fewer round trips). The UI spec notes this as an OR condition.
- **Satori for wallpaper generation:** Satori is NOT installed. Use the established pattern: LLM generates SVG → sharp rasterizes to exact dimensions. Satori would require JSX rendering infrastructure not needed here.

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| MIME type detection from file bytes | Custom magic-byte reader | `file-type@22.0.0` | Handles 500+ MIME types, handles edge cases like truncated files, streaming API |
| XLSX read/write | Custom binary parser | `xlsx@0.18.5` | XLSX is a zipped OOXML (XML) container with many edge cases; hand-rolling is weeks of work |
| CSV parsing | String.split() | `csv-parse@6.2.1` | Handles quoted fields, escaped commas, multiline values, BOM |
| Image format conversion | Native buffer manipulation | `sharp@0.34.5` | Already installed; handles color spaces, ICC profiles, transparency |
| Audio/video conversion | Custom codec wrappers | `ffmpeg-static@5.3.0` (ffmpeg 7.0.2 binary) | Already installed; handles all codec negotiation |
| SVG rasterization | canvas/Playwright | `@resvg/resvg-js@2.6.2` | Already installed; faster than Playwright for static SVG |
| LLM inference | New HTTP client | `puterChatComplete()` | Already implemented in Phase 41; puter-inference.ts is the project standard |

**Key insight:** All heavy-lifting tools are already installed. Phase 42 is primarily wiring (new renderers + routes + UI panels) rather than infrastructure.

---

## Common Pitfalls

### Pitfall 1: Sharp SVG Input at Large Dimensions
**What goes wrong:** `sharp(svgBuffer).resize(2560, 1440)` produces a blurry image when the SVG has a small implicit pixel density.
**Why it happens:** Sharp defaults to 72 DPI for SVG input; scaling up produces raster artifacts before the resize step.
**How to avoid:** Always pass `{ density: 300 }` option when loading SVG into sharp: `sharp(svgBuffer, { density: 300 }).resize(width, height, { fit: 'fill' }).png()`. Alternatively, ask the LLM to generate an SVG with `viewBox="0 0 {width} {height}"` matching the target dimensions, then use Resvg with `fitTo: { mode: 'width', value: width }`.
**Warning signs:** Generated wallpapers look pixelated or blurry at edges.

### Pitfall 2: file-type v22 Import Syntax
**What goes wrong:** `import FileType from 'file-type'` fails with "does not provide an export named 'default'".
**Why it happens:** file-type v22 is pure ESM with named exports only.
**How to avoid:** Use named import: `import { fileTypeFromBuffer } from 'file-type'`. Server is `type: module` with `module: NodeNext` — ESM imports work directly.
**Warning signs:** TypeScript error TS2613 or runtime "is not a function" errors.

### Pitfall 3: ffmpeg-static Path Resolution
**What goes wrong:** `spawn(ffmpegPath, ...)` throws ENOENT even though ffmpeg-static is installed.
**Why it happens:** `ffmpegPath` from `import ffmpegPath from 'ffmpeg-static'` is the binary path string, but it needs `as unknown as string` cast due to TS type mismatch. The actual binary is at `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/node_modules/ffmpeg-static/ffmpeg`.
**How to avoid:** Copy the existing pattern from `voice-pipeline.ts` exactly: `if (!ffmpegPath) throw new Error("ffmpeg-static binary not found"); const ffmpegBin = ffmpegPath as unknown as string;`.
**Warning signs:** `ffmpegBin` is null/undefined; ENOENT on spawn.

### Pitfall 4: Content-Job Input Size for Conversion
**What goes wrong:** Submitting a 10MB file as base64 in job input stores ~13.3MB of base64 in the `content_jobs.input` JSONB column per submission.
**Why it happens:** base64 adds ~33% overhead. For a 10MB file (MAX_ATTACHMENT_BYTES), this is ~13.3MB per job row.
**How to avoid:** This is acceptable for the single-user case (success criteria assume one conversion at a time). Document the max file size clearly in the UI (the multer limit enforces it). If this becomes a problem in future, change the renderer to accept storage object keys (requires extending content-job-runner signature).
**Warning signs:** Postgres table growth visible in db metrics after many conversions.

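The ~13.3MB figure follows directly from how base64 works: every 3 input bytes become 4 output characters, with the final group padded. A small worked calculation, assuming `MAX_ATTACHMENT_BYTES` is 10 MiB (the exact constant value is an assumption here) and using the hypothetical helper name `base64Length`:

```typescript
// Exact length of the padded base64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const ASSUMED_MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024; // assumption: 10 MiB
const encodedChars = base64Length(ASSUMED_MAX_ATTACHMENT_BYTES); // 13_981_016 characters
const encodedMiB = encodedChars / (1024 * 1024); // roughly 13.33
```

The JSON envelope around the string adds a little more on top, so the stored row is slightly larger than this figure.
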
### Pitfall 5: Social Carousel JSON Parsing from LLM
**What goes wrong:** LLM returns markdown-fenced JSON or adds explanation text, causing `JSON.parse()` to throw.
**Why it happens:** LLMs sometimes wrap JSON in `` ```json ... ``` `` fences.
**How to avoid:** Post-process LLM output to strip markdown fences before JSON.parse(). Use a robust extraction pattern: `const match = raw.match(/```json\s*([\s\S]*?)\s*```/) || raw.match(/({[\s\S]*})/); JSON.parse(match ? match[1] : raw)`. Apply the same fix pattern used by icon-renderer.ts SVG validation.
**Warning signs:** SocialPostResult shows "Generation failed" after seemingly valid LLM output.

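A slightly more defensive version of that extraction is sketched below. It is an illustration, not the code used by icon-renderer.ts: `extractJsonObject` is a hypothetical helper, and it also accepts fences without a `json` tag and trims prose on either side of the object.

```typescript
// Three backticks, built at runtime so this snippet contains no literal fence.
const FENCE = "`".repeat(3);

// Hypothetical helper: extract and parse the first JSON object in an LLM reply.
function extractJsonObject(raw: string): unknown {
  // Prefer the contents of a fenced block, with or without a "json" tag.
  const fenced = raw.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE));
  const candidate = fenced?.[1] ?? raw;
  // Drop any prose before the first "{" and after the last "}".
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end < start) throw new Error("No JSON object found in LLM output");
  return JSON.parse(candidate.slice(start, end + 1));
}
```

`JSON.parse` can still throw on malformed JSON, so the caller should catch that and treat it as a failed generation rather than letting the job crash.
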
### Pitfall 6: Deep-Link Route Parameter Case
**What goes wrong:** `/convert/PNG/SVG` doesn't pre-select chips because the component does a case-sensitive compare against format names.
**Why it happens:** URL params are case-sensitive; format chips may be stored as uppercase.
**How to avoid:** Normalize URL params to lowercase on read: `params.sourceFormat?.toLowerCase()`. Match against chip identifiers using `formatId.toLowerCase() === param.toLowerCase()`.
**Warning signs:** Deep-link URL works in one case but not when user types different casing.

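The normalization can live in one small lookup that both the source and target selectors share. A minimal sketch, assuming chip identifiers are plain strings; `FORMAT_CHIP_IDS` and `resolveFormatParam` are hypothetical names:

```typescript
// Illustrative chip identifiers; the real list would come from the convert UI.
const FORMAT_CHIP_IDS = ["PNG", "JPG", "SVG", "WebP", "GIF"];

// Hypothetical helper: map a URL segment to a known chip id, ignoring case.
// Returns undefined for a missing or unrecognised segment so the caller can
// leave the chip unselected instead of guessing.
function resolveFormatParam(param: string | undefined, chipIds: string[]): string | undefined {
  if (!param) return undefined;
  const wanted = param.toLowerCase();
  return chipIds.find((id) => id.toLowerCase() === wanted);
}
```

Returning the canonical chip id (rather than the lowercased param) means the rest of the component never has to think about casing again.
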
### Pitfall 7: Voice Offline Badge Always Showing
**What goes wrong:** The "Offline" badge shows even when whisper is not installed (whisperAvailable: false).
**Why it happens:** Misreading the UI spec: badge shows when `whisperAvailable === true` (local model detected), not when `WHISPER_MODEL=local` env var is set (which is confusing naming).
**How to avoid:** Read `whisperAvailable` from `GET /api/system/providers`. Show badge if `whisperAvailable === true`. The "offline capability" is proven by the binary being detected, not by an env var. The `WHISPER_MODEL` env var mentioned in the UI spec is a future extension point for model selection; do not implement it unless the spec explicitly requires it. Per VOICE-03, "works offline with locally cached model" means the whisper-cpp binary + base model are present.
**Warning signs:** Badge shows on machines where whisper is not installed.

---

## Code Examples

### Wallpaper Renderer: Sharp at Target Dimensions

```typescript
// Source: icon-renderer.ts pattern + sharp resize extension
// server/src/services/renderers/wallpaper-renderer.ts

import sharp from "sharp";
import { puterChatComplete } from "../puter-inference.js";
import type { RenderResult } from "./types.js";

async function renderSvgToWallpaper(svgString: string, width: number, height: number): Promise<Buffer> {
  return sharp(Buffer.from(svgString), { density: 300 })
    .resize(width, height, { fit: "fill" })
    .png({ compressionLevel: 9 })
    .toBuffer();
}
```

### Magic-Byte MIME Validation

```typescript
// Source: file-type@22 documentation — ESM named import
import { fileTypeFromBuffer } from "file-type";

async function validateMime(buffer: Buffer, claimedExtension: string): Promise<{ ok: boolean; actualMime?: string; claimedMime?: string }> {
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected) return { ok: true }; // unknown type, allow (SVG/text files have no magic bytes)
  const mimeForExtension = extensionToMime(claimedExtension); // lookup table
  if (mimeForExtension && detected.mime !== mimeForExtension) {
    return { ok: false, actualMime: detected.mime, claimedMime: mimeForExtension };
  }
  return { ok: true };
}
```

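`extensionToMime` is referenced above but not defined in this document. One possible shape is sketched below, assuming a small hand-maintained table; the entries shown are an illustrative subset and the table name `EXTENSION_MIME` is hypothetical. A `Map` is used instead of a plain object so that inputs such as `constructor` cannot match inherited properties.

```typescript
// Illustrative subset; extend to cover every format the convert UI offers.
const EXTENSION_MIME = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["webp", "image/webp"],
  ["gif", "image/gif"],
  ["mp3", "audio/mpeg"],
  ["mp4", "video/mp4"],
  ["xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
]);

// Sketch of the lookup used by validateMime: accepts "jpg", ".jpg" or ".JPG".
// Returns undefined for extensions without a table entry, which makes
// validateMime skip the comparison for those files.
function extensionToMime(extension: string): string | undefined {
  const key = extension.replace(/^\./, "").toLowerCase();
  return EXTENSION_MIME.get(key);
}
```

Text-based formats such as SVG, CSV, and JSON are deliberately left out of the table, since magic-byte detection returns nothing for them and the validator already lets them through.
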
### ffmpeg-static Conversion (audio/video)

```typescript
// Source: voice-pipeline.ts pattern (established in Phase 36)
import ffmpegPath from "ffmpeg-static";
import { spawn } from "node:child_process";

if (!ffmpegPath) throw new Error("ffmpeg-static binary not found");
const ffmpegBin = ffmpegPath as unknown as string;

// Streams the input through ffmpeg via stdin/stdout, so nothing touches disk.
// Note: muxers that need seekable output (e.g. MP4) cannot write to pipe:1
// without extra flags such as "-movflags frag_keyframe+empty_moov".
function convertAVViaFfmpeg(inputBuffer: Buffer, sourceFormat: string, targetFormat: string): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const ffmpeg = spawn(ffmpegBin, [
      "-f", sourceFormat, // input container format
      "-i", "pipe:0",     // read from stdin
      "-f", targetFormat, // output container format
      "pipe:1",           // write to stdout
    ], { stdio: ["pipe", "pipe", "pipe"] });
    const chunks: Buffer[] = [];
    ffmpeg.stdout.on("data", (c: Buffer) => chunks.push(c));
    ffmpeg.stderr.on("data", () => {}); // discard
    ffmpeg.on("close", (code) => code === 0 ? resolve(Buffer.concat(chunks)) : reject(new Error(`ffmpeg exited ${code}`)));
    ffmpeg.on("error", reject);
    ffmpeg.stdin.write(inputBuffer);
    ffmpeg.stdin.end();
  });
}
```

### Data Format Conversion (CSV ↔ JSON ↔ XLSX)

```typescript
// Source: xlsx documentation + csv-parse documentation
import * as XLSX from "xlsx";
import { parse as csvParse } from "csv-parse/sync";

// CSV → JSON
function csvToJson(buffer: Buffer): Record<string, unknown>[] {
  return csvParse(buffer, { columns: true, skip_empty_lines: true });
}

// JSON → XLSX
function jsonToXlsx(data: Record<string, unknown>[]): Buffer {
  const ws = XLSX.utils.json_to_sheet(data);
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
  return Buffer.from(XLSX.write(wb, { type: "buffer", bookType: "xlsx" }));
}
```

### useContentJob Pattern (UI — already exists)

```typescript
// Source: ui/src/hooks/useContentJob.ts (Phase 41)
// Usage in WallpaperGeneratePanel:
const job = useContentJob(companyId);

// Submit
job.submit("wallpaper", { prompt, platformKey: "desktop-hd" });

// Render result when done
if (job.status === "done" && job.bundle) {
  const bundle = job.bundle as WallpaperBundle;
  // bundle.pngBase64, bundle.dimensions, bundle.platformKey
}
```

### Converter Capabilities in UI

```typescript
// ui/src/hooks/useSystemProviders.ts (new)
// Calls GET /api/system/providers once on mount, caches result
// Returns: { whisperAvailable, piperAvailable, ... }
// Used by ChatInput for the offline badge

// ui/src/hooks/useConverterCapabilities.ts (new)
// Calls GET /api/system/converters once on mount
// Returns: { imageConverter, audioVideoConverter, docConverter, dataConverter }
// Used by ConvertTargetSelector for the AI-fallback notice
```

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual MIME detection via extension | Magic-byte detection via file-type | file-type v19+ | Required for CONV-09 — extension can be spoofed |
| Pandoc/LibreOffice for doc conversion | AI-bridge fallback when not available | CONV-08 design | No installer required; works everywhere |
| Separate validate endpoint | Validate at submit time | UI spec v1 | Fewer round trips, simpler client code |

**Deprecated/outdated:**
- `satori` for wallpaper generation: Not installed and not needed. The Phase 41 pattern (LLM SVG + sharp rasterize) is sufficient and consistent with existing code.
- Separate `/api/convert/validate` endpoint: Consolidate validation into the convert submit route.

---

## Open Questions
|
||
|
||
1. **WallpaperBundle storage format**
|
||
- What we know: Other bundles (DiagramBundle, IconSetBundle) store base64-encoded assets in JSON
|
||
- What's unclear: For wallpapers at 2560×1440, the PNG can be 5–15MB — base64 encoding adds ~33% → 20MB JSON blob stored in content_jobs.output. MAX_GENERATED_ASSET_BYTES = 500MB so it fits, but row size may be large for Postgres.
|
||
- Recommendation: Store the PNG as an asset (same as diagram-renderer stores to storage), and return `WallpaperBundle` with `assetId` + `dimensions` + `platformKey`. The UI downloads via `/api/assets/:id/content`. This avoids storing large base64 in the DB. Follow the same pattern if app-icon returns multiple sizes: store each size as a separate asset or as a multi-size ZIP.
|
||
|
||
2. **Convert job input size for large files**
|
||
- What we know: base64(10MB file) = ~13.3MB JSON in content_jobs.input column
|
||
- What's unclear: Whether Postgres/Drizzle has JSONB size limits that would reject this
|
||
- Recommendation: Postgres JSONB has no practical size limit beyond the max row size (1GB). 13.3MB is fine. Document the 10MB upload cap in the UI.
|
||
|
||
3. **Social post carousel slide format**
|
||
- What we know: SOCIAL-02 says "Instagram carousels and thread sequences"
|
||
- What's unclear: Whether thread sequences means Twitter threads (numbered tweets) or just a generic multi-part structure
|
||
- Recommendation: Implement as a unified `slides: string[]` field in SocialPostBundle. The `collapsible` sections in SocialPostResult handle both Twitter threads and Instagram carousel displays.
|
||
|
||
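
The recommendations above can be sketched as types plus the size arithmetic behind them. This is a minimal sketch under stated assumptions: only `assetId`, `dimensions`, `platformKey`, and `slides` come from the text above, while the `platform` field and the helpers `base64Length` and `assetContentUrl` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes: field names follow the recommendations above; nothing here is existing code.
interface WallpaperBundle {
  assetId: string; // reference to the stored PNG instead of inline base64
  platformKey: string;
  dimensions: { width: number; height: number };
}

interface SocialPostBundle {
  platform: string; // assumption: not specified in the source
  slides: string[]; // one entry per carousel slide or thread post
}

// Base64 emits 4 output characters for every 3 input bytes, padded to a multiple of 4.
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

// Download URL the UI would use, following the /api/assets/:id/content route named above.
function assetContentUrl(bundle: WallpaperBundle): string {
  return `/api/assets/${bundle.assetId}/content`;
}

// A 10,000,000-byte upload becomes about 13.3 million base64 characters,
// and a 15,000,000-byte PNG becomes exactly 20 million.
const tenMbEncoded = base64Length(10_000_000);
const fifteenMbEncoded = base64Length(15_000_000);
```
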
---

## Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| sharp | CONV-01, WALL-01-04 | ✓ | 0.34.5 | — |
| @resvg/resvg-js | WALL-01-04 | ✓ | 2.6.2 | — |
| ffmpeg-static | CONV-02 | ✓ | 7.0.2 (binary) | — |
| file-type | CONV-09 | ✗ | — | Install: `pnpm add file-type@22.0.0` |
| xlsx | CONV-04 | ✗ | — | Install: `pnpm add xlsx@0.18.5` |
| csv-parse | CONV-04 | ✗ | — | Install: `pnpm add csv-parse@6.2.1` |
| pandoc | CONV-03 | ✗ | — | AI-bridge (CONV-08 design) |
| libreoffice | CONV-03 | ✗ | — | AI-bridge (CONV-08 design) |
| whisper-cpp | VOICE-03 | ✗ | — | openai-whisper CLI fallback; error message if neither |
| whisper (openai) | VOICE-03 | ✗ | — | whisper-cpp fallback |
| satori | Phase goal wording | ✗ | — | Not needed — use LLM SVG + sharp pattern |

**Missing dependencies with no fallback:**
- `file-type`, `xlsx`, `csv-parse` — these MUST be installed in Wave 0. Phase cannot complete CONV-01/CONV-04/CONV-09 without them.

**Missing dependencies with fallback:**
- `pandoc`, `libreoffice` — document conversion falls through to AI-bridge per CONV-08 design. Planner should add a startup probe that logs "pandoc not found, doc conversion will use AI bridge" rather than failing.
- `whisper-cpp`, `whisper` — existing voice pipeline already handles both missing gracefully with an informative error. VOICE-03 "offline" badge is shown based on `whisperAvailable` from hardware detection.

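
The startup probe suggested for `pandoc` and `libreoffice` might look roughly like the sketch below. It is an assumption-laden illustration, not the existing hardware probe: `probeBinary`, `probeDocConverters`, and `docConverterLogLines` are hypothetical names, the 5-second timeout is arbitrary, and on some systems the LibreOffice binary is called `soffice` rather than `libreoffice`.

```typescript
import { execFile } from "node:child_process";

// Hypothetical result shape for the doc-converter probe.
interface DocConverterProbe {
  pandoc: boolean;
  libreoffice: boolean;
}

// Resolves true when `<command> --version` exits cleanly, false on any error
// (binary missing, non-zero exit, or timeout). Never rejects.
function probeBinary(command: string): Promise<boolean> {
  return new Promise((resolve) => {
    execFile(command, ["--version"], { timeout: 5000 }, (error) => resolve(error === null));
  });
}

// Probe both converters in parallel at startup.
async function probeDocConverters(): Promise<DocConverterProbe> {
  const [pandoc, libreoffice] = await Promise.all([
    probeBinary("pandoc"),
    probeBinary("libreoffice"), // assumption: may need "soffice" on some installs
  ]);
  return { pandoc, libreoffice };
}

// Build informational log lines instead of failing when a converter is missing.
function docConverterLogLines(probe: DocConverterProbe): string[] {
  const lines: string[] = [];
  if (!probe.pandoc) lines.push("pandoc not found, doc conversion will use AI bridge");
  if (!probe.libreoffice) lines.push("libreoffice not found, doc conversion will use AI bridge");
  return lines;
}
```
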
---

## Validation Architecture

### Test Framework

| Property | Value |
|----------|-------|
| Framework | vitest 3.0.5 |
| Server config | server/vitest.config.ts (environment: node) |
| UI config | ui/vitest.config.ts (environment: node, react plugin) |
| Quick run (server) | `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts` |
| Full suite (server) | `cd /opt/nexus/server && npx vitest run` |
| Quick run (UI) | `cd /opt/nexus/ui && npx vitest run src/**/*.test.{ts,tsx}` |

**Note:** Server baseline has 4 pre-existing failing test files (hardware-detection, skill-registry-routes, agent-permissions, heartbeat-workspace-session) — these are NOT caused by Phase 42. Phase 42 tests must not add to this count.

### Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| WALL-01/02/03 | `renderWallpaper()` returns PNG buffer at correct dimensions per platform key | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| WALL-04 | App icon renderer returns multi-size array | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| SOCIAL-01/02/03 | `renderSocialPost()` returns post text + hashtags; carousel returns slides array | unit | `npx vitest run src/__tests__/42-social-renderer.test.ts` | ❌ Wave 0 |
| CONV-01 | Image conversion round-trip (PNG→JPG) via sharp | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-02 | Audio conversion dispatch calls ffmpeg-static binary | unit (mocked) | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-04 | CSV→JSON and JSON→XLSX conversions | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-05 | Unknown pair falls through to AI bridge | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-08 | converterCapabilitiesService probes pandoc/libreoffice at startup | unit (mocked execFile) | `npx vitest run src/__tests__/42-converter-capabilities.test.ts` | ❌ Wave 0 |
| CONV-09 | MIME mismatch rejected with 422 at convert route | unit (supertest) | `npx vitest run src/__tests__/42-convert-routes.test.ts` | ❌ Wave 0 |
| VOICE-01/02 | VoiceMicButton renders in ChatInput when enableVoiceInput=true | manual (pre-existing wiring) | n/a — already wired in ChatPanel.tsx | ✅ Existing |
| VOICE-03 | Offline badge shows when whisperAvailable=true from /api/system/providers | unit (mocked hook) | `npx vitest run src/**/*.test.tsx` (UI test) | ❌ Wave 0 |

### Sampling Rate

- **Per task commit:** `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts`
- **Per wave merge:** `cd /opt/nexus/server && npx vitest run` (full server suite)
- **Phase gate:** Full server + UI suites green before `/gsd:verify-work`

### Wave 0 Gaps

- [ ] `server/src/__tests__/42-wallpaper-renderer.test.ts` — covers WALL-01 through WALL-04
- [ ] `server/src/__tests__/42-social-renderer.test.ts` — covers SOCIAL-01 through SOCIAL-03
- [ ] `server/src/__tests__/42-convert-renderer.test.ts` — covers CONV-01 through CONV-05
- [ ] `server/src/__tests__/42-converter-capabilities.test.ts` — covers CONV-08
- [ ] `server/src/__tests__/42-convert-routes.test.ts` — covers CONV-09 (MIME validation at HTTP layer)
- [ ] UI test for offline badge rendering (VOICE-03)
- [ ] Package install: `pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1 && pnpm add -D @types/xlsx@0.0.36`

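
For the WALL-01/02/03 row of the test map, the wallpaper test needs some way to assert the dimensions of the returned PNG buffer. One dependency-free option, offered here only as a sketch, is to read width and height straight from the PNG IHDR chunk. The helper name `readPngDimensions` is hypothetical; the actual test could just as well use `sharp(buffer).metadata()`.

```typescript
// Hypothetical test helper: reads width/height from a PNG's IHDR chunk.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" type,
// then width (bytes 16-19) and height (bytes 20-23), both big-endian.
function readPngDimensions(png: Uint8Array): { width: number; height: number } {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (png.length < 24 || !signature.every((b, i) => png[i] === b)) {
    throw new Error("not a PNG buffer");
  }
  // byteOffset/byteLength keep this correct for Node Buffers backed by a shared pool.
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Build a minimal fake PNG header (signature + IHDR prefix) for demonstration.
function fakePngHeader(width: number, height: number): Uint8Array {
  const bytes = new Uint8Array(24);
  bytes.set([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], 0);
  const view = new DataView(bytes.buffer);
  view.setUint32(8, 13); // IHDR data length
  bytes.set([0x49, 0x48, 0x44, 0x52], 12); // "IHDR"
  view.setUint32(16, width);
  view.setUint32(20, height);
  return bytes;
}

// Example: the OG Image size used elsewhere in this document.
const ogDimensions = readPngDimensions(fakePngHeader(1200, 630));
```
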
---

## Sources

### Primary (HIGH confidence)

- Codebase direct read: `server/src/services/renderers/icon-renderer.ts` — renderer pattern, sharp usage
- Codebase direct read: `server/src/services/renderers/diagram-renderer.ts` — Playwright + Resvg pattern
- Codebase direct read: `server/src/services/content-job-runner.ts` — job dispatch architecture
- Codebase direct read: `server/src/services/voice-pipeline.ts` — Whisper probe and transcription pattern
- Codebase direct read: `server/src/routes/voice.ts` — multer upload pattern for binary input
- Codebase direct read: `ui/src/hooks/useContentJob.ts` — SSE hook established in Phase 41
- Codebase direct read: `ui/src/components/ChatInput.tsx` — existing VoiceMicButton wiring
- Codebase direct read: `ui/src/hooks/useVoiceMode.ts` — existing voice mode settings pattern
- Codebase direct read: `server/src/services/hardware.ts` — whisperAvailable detection, probe pattern
- Codebase direct read: `server/src/routes/hardware.ts` — GET /api/system/providers returns whisperAvailable
- Codebase direct read: `server/src/app.ts` — route mounting pattern
- Codebase direct read: `server/package.json` — installed deps list
- `npm view file-type version` → 22.0.0 (verified 2026-04-04)
- `npm view xlsx version` → 0.18.5 (verified 2026-04-04)
- `npm view csv-parse version` → 6.2.1 (verified 2026-04-04)
- Binary probe: `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/.../ffmpeg -version` → 7.0.2 (verified working)

### Secondary (MEDIUM confidence)

- `.planning/STATE.md` — accumulated decisions: CONV-05, CONV-08, CONV-09 architectural choices locked
- Phase 41-01-SUMMARY.md — renderer pattern, useContentJob hook, tech stack context
- Phase 40-01-SUMMARY.md — content_jobs schema, RenderResult interface, MAX_GENERATED_ASSET_BYTES

### Tertiary (LOW confidence)

- None — all critical claims verified by codebase inspection or npm registry.

---

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH — all packages verified via codebase inspection + npm registry
- Architecture: HIGH — pattern directly derived from Phase 41 implementations in codebase
- Pitfalls: HIGH — most derived from actual code review (ffmpeg-static cast, file-type ESM, etc.)
- Environment availability: HIGH — verified via command execution on target system

**Research date:** 2026-04-04
**Valid until:** 2026-05-04 (packages stable; architecture unlikely to change)