# Phase 42: Wallpapers, Social, Format Conversion & Voice — Research
**Researched:** 2026-04-04
**Domain:** Image generation (sharp/SVG), format conversion (sharp/ffmpeg-static/AI-bridge), social text generation (LLM), voice transcription (Whisper)
**Confidence:** HIGH
---
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.
### Claude's Discretion
All implementation choices are at Claude's discretion. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
### Deferred Ideas (OUT OF SCOPE)
None — discuss phase skipped.
</user_constraints>
---
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| WALL-01 | User can generate desktop and mobile wallpapers from a description | SVG-via-LLM + sharp rasterize at target dimensions; PLATFORM_DIMENSIONS constants in renderer |
| WALL-02 | User can generate social media banners with correct dimensions per platform | Same renderer; platform map covers OG Image, Twitter Card, Instagram, LinkedIn |
| WALL-03 | User can generate Open Graph and social preview images | Same renderer; OG Image = 1200×630 constant |
| WALL-04 | User can generate app icons and favicons in multiple sizes | Renderer returns multi-size bundle (1024, 512, 256, 64, 32); WallpaperPreview renders grid |
| SOCIAL-01 | User can generate platform-ready posts respecting character limits (Twitter, LinkedIn) | LLM prompt with platform limit injected; character count UI enforced per-platform constants |
| SOCIAL-02 | User can generate Instagram carousels and thread sequences | LLM returns JSON with slides array; carousel rendered as numbered collapsible sections |
| SOCIAL-03 | System suggests relevant hashtags for generated content | LLM prompt requests hashtag suggestions as JSON array alongside post text |
| CONV-01 | User can convert between image formats (PNG, JPG, SVG, WebP, GIF) via sharp | sharp 0.34.5 already installed; reads all listed formats and writes PNG, JPG, WebP, GIF (SVG is input-only in sharp, so raster→SVG falls to AI-bridge) |
| CONV-02 | User can convert between audio/video formats via ffmpeg | ffmpeg-static 5.3.0 (bundled ffmpeg 7.0.2 binary) already installed and verified working |
| CONV-03 | User can convert between document formats via Pandoc/LibreOffice | pandoc/libreoffice NOT installed → falls to AI-bridge per CONV-08 |
| CONV-04 | User can convert between data formats (CSV, JSON, XLSX) | xlsx + csv-parse packages needed; pure-Node.js conversion |
| CONV-05 | User can convert between any format pair via AI-bridged conversion | puterChatComplete already established; handles unsupported pairs |
| CONV-06 | System provides conversion UI with source/target format selection and drag-drop | Standalone /convert page; ConvertPanel as described in UI spec |
| CONV-07 | User can deep-link to specific conversion flows via URL | /convert/:sourceFormat?/:targetFormat? route in App.tsx; pre-select chips on mount |
| CONV-08 | System detects available direct converters at startup | Startup probe service; GET /api/system/converters endpoint |
| CONV-09 | System validates uploaded file MIME type via magic-byte detection | file-type@22.0.0 (ESM, ships own types); validate at convert route before job dispatch |
| VOICE-01 | User can click mic button in web chat to record and auto-transcribe via Whisper | VoiceMicButton already in ChatInput when enableVoiceInput=true; already wired |
| VOICE-02 | User can toggle between text-only, voice-input, and full-voice modes | VoiceModeToggle already exists; already wired in ChatInput; Phase 42 verifies correctness |
| VOICE-03 | Voice input works offline with local Whisper model | voice-pipeline.ts already probes whisper-cpp → openai-whisper; WHISPER_MODEL env var + offline badge |
</phase_requirements>
---
## Summary
Phase 42 extends the Phase 41 content generation system with four new capabilities: platform-aware image generation (wallpapers, OG images, social banners, app icons), LLM-driven social post generation with hashtag suggestions, a full-featured file format conversion pipeline, and offline voice input via Whisper.
The server already has all critical dependencies for images (sharp@0.34.5, @resvg/resvg-js@2.6.2) and audio/video (ffmpeg-static@5.3.0, which bundles the ffmpeg 7.0.2 binary; verified working at /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0). Three packages need to be added: `file-type@22.0.0` (magic-byte MIME detection), `xlsx@0.18.5` (XLSX data conversion), and `csv-parse@6.2.1` (CSV parsing). Document conversion (pandoc/libreoffice) is not available on this system and will fall through to AI-bridge per CONV-08, so no installation is needed.
The voice pipeline (`voice-pipeline.ts`) already handles Whisper probe and transcription. Phase 42's voice work is: (1) add `WHISPER_MODEL=local` env var support to signal offline capability, (2) expose whisper availability to the UI via the existing `/api/system/providers` endpoint (already returns `whisperAvailable`), (3) render the "Offline" badge in `ChatInput` alongside `VoiceMicButton`. The VoiceMicButton, VoiceModeToggle, and `enableVoiceInput=true` wiring already exist in `ChatPanel.tsx`.
**Primary recommendation:** Follow the established Phase 41 renderer pattern: add three new `jobType` cases to `content-job-runner.ts` (`wallpaper`, `social-post`, `convert`), create one renderer file per job type in `server/src/services/renderers/`, and wire three new ContentStudio tabs + one standalone `/convert` page in the UI.
---
## Standard Stack
### Core (all verified installed in server)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| sharp | 0.34.5 | Image format conversion + SVG rasterization at target dimensions | Already installed; used by icon-renderer and org-chart-svg |
| @resvg/resvg-js | 2.6.2 | High-fidelity SVG→PNG rasterization with fitTo dimensions | Already installed; used by diagram-renderer |
| ffmpeg-static | 5.3.0 (bin: 7.0.2) | Bundled ffmpeg binary for audio/video conversion | Already installed; used by voice-pipeline and telegram |
| culori | 4.0.2 | OKLCH color math (not directly needed but available) | Already installed |
| puterChatComplete | (internal) | LLM inference for wallpaper SVG generation, social posts, AI-bridge conversion | Established pattern in Phase 41 renderers |
### New Dependencies (needs `pnpm add` in server)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| file-type | 22.0.0 | Magic-byte MIME type detection for CONV-09 | ESM-native, ships own types, well-maintained |
| xlsx | 0.18.5 | XLSX read/write for data conversion CONV-04 | Most-used Excel library for Node.js |
| csv-parse | 6.2.1 | CSV parsing for data conversion CONV-04 | De-facto standard, streaming API |
| @types/xlsx | 0.0.36 | TypeScript types for xlsx | Likely redundant: xlsx ships its own types/index.d.ts; install only if the bundled types fail to resolve |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| file-type@22 (ESM) | mmmagic or mime-magic | file-type is pure JS, no native binding, ships own types; server is type:module so ESM is fine |
| xlsx | exceljs | xlsx is simpler API for read/write; exceljs has streaming but more complex |
| sharp for SVG rasterization | Playwright (like diagram-renderer) | sharp+resvg is faster for simple SVG → PNG; Playwright only needed for JavaScript-rendered content |
**Installation:**
```bash
# Run from /opt/nexus/server
pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1
pnpm add -D @types/xlsx@0.0.36
```
**Version verification (run before installing):**
```bash
npm view file-type version # → 22.0.0
npm view xlsx version # → 0.18.5
npm view csv-parse version # → 6.2.1
```
---
## Architecture Patterns
### Established Renderer Pattern (from Phase 41)
Every new capability follows this exact structure:
1. **Renderer file:** `server/src/services/renderers/{name}-renderer.ts` exports `async function render{Name}(input: Record<string, unknown>): Promise<RenderResult>`
2. **Job runner switch:** Add `case '{jobtype}':` to `renderContent()` in `content-job-runner.ts`
3. **Bundle type (if needed):** Add `interface {Name}Bundle` to `types.ts`
4. **API route:** Submit via existing `POST /api/companies/:id/content-jobs` with `{ jobType, input }`
5. **UI hook:** `useContentJob(companyId)` already handles all SSE + state management
6. **UI component:** Panel reads `job.bundle` after `status === 'done'`
The format conversion job is the only exception — it requires a separate multipart upload route because the file binary cannot be passed as JSON input via the standard content-jobs endpoint.
### Recommended Project Structure (new files)
```
server/src/
├── services/renderers/
│   ├── types.ts                    # ADD: WallpaperBundle, SocialPostBundle, ConvertBundle
│   ├── wallpaper-renderer.ts       # NEW
│   ├── social-renderer.ts          # NEW
│   └── convert-renderer.ts         # NEW
├── services/
│   └── converter-capabilities.ts   # NEW: startup probe + cache
└── routes/
    └── convert.ts                  # NEW: POST /api/companies/:id/convert (multipart)
                                    #      GET /api/system/converters
ui/src/
├── pages/
│   └── ConvertPage.tsx             # NEW: standalone /convert page
├── components/
│   ├── WallpaperGeneratePanel.tsx  # NEW
│   ├── WallpaperPreview.tsx        # NEW
│   ├── SocialPostPanel.tsx         # NEW
│   ├── SocialPostResult.tsx        # NEW
│   └── ConvertPanel.tsx            # NEW (contains ConvertSourceZone + ConvertTargetSelector + ConvertActionBar)
└── api/
    └── convert.ts                  # NEW: submitConvertJob (multipart), getConverterCapabilities
```
### Pattern 1: Wallpaper Generation (WALL-01 to WALL-04)
**What:** LLM generates an SVG at a conceptual level, then sharp rasterizes it to exact pixel dimensions for the requested platform.
**When to use:** Any fixed-dimension image asset (wallpaper, OG image, social banner, app icon).
```typescript
// Source: established pattern from icon-renderer.ts + sharp resize
// server/src/services/renderers/wallpaper-renderer.ts
export const PLATFORM_DIMENSIONS: Record<string, { width: number; height: number; label: string }> = {
  "desktop-hd": { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "desktop-fhd": { width: 1920, height: 1080, label: "Desktop FHD (1920 × 1080)" },
  "desktop-4k": { width: 3840, height: 2160, label: "Desktop 4K (3840 × 2160)" },
  "mobile-portrait": { width: 1080, height: 1920, label: "Mobile Portrait (1080 × 1920)" },
  "mobile-landscape": { width: 1920, height: 1080, label: "Mobile Landscape (1920 × 1080)" },
  "og-image": { width: 1200, height: 630, label: "OG Image (1200 × 630)" },
  "twitter-card": { width: 1200, height: 628, label: "Twitter Card (1200 × 628)" },
  "instagram-post": { width: 1080, height: 1080, label: "Instagram Post (1080 × 1080)" },
  "instagram-banner": { width: 1080, height: 566, label: "Instagram Banner (1080 × 566)" },
  "linkedin-banner": { width: 1584, height: 396, label: "LinkedIn Banner (1584 × 396)" },
  "app-icon": { width: 1024, height: 1024, label: "App Icon (1024 × 1024)" },
  "favicon": { width: 32, height: 32, label: "Favicon (32 × 32)" },
};

// App icon + favicon: render multiple sizes from one SVG
const APP_ICON_SIZES = [1024, 512, 256, 64, 32] as const;

// Render flow:
// 1. puterChatComplete → SVG string (LLM generates SVG matching aspect ratio)
// 2. sharp(svgBuffer).resize(width, height, { fit: 'fill' }).png() → PNG buffer
// 3. Return WallpaperBundle with pngBase64 + dimensions
```
**Critical constraint:** Platform dimensions MUST be constants, never magic numbers (success criterion 1). Export `PLATFORM_DIMENSIONS` from the renderer and re-export to the UI API client so the UI's Select options derive from the same source.
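One way to keep the server and the UI on the same constants is sketched below. It is illustrative only: the three-entry map is an abbreviated copy of `PLATFORM_DIMENSIONS`, and `isPlatformKey`, `getDimensions`, and `platformSelectOptions` are hypothetical helper names, not existing project code.

```typescript
// Abbreviated copy of the PLATFORM_DIMENSIONS map (assumption: the real map has all twelve keys).
const PLATFORM_DIMENSIONS = {
  "desktop-hd": { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "og-image": { width: 1200, height: 630, label: "OG Image (1200 × 630)" },
  "favicon": { width: 32, height: 32, label: "Favicon (32 × 32)" },
} as const;

type PlatformKey = keyof typeof PLATFORM_DIMENSIONS;

// Type guard: narrows an arbitrary string (e.g. from job input) to a known key.
function isPlatformKey(key: string): key is PlatformKey {
  return Object.prototype.hasOwnProperty.call(PLATFORM_DIMENSIONS, key);
}

// Server side: resolve dimensions or fail loudly instead of falling back to a literal.
function getDimensions(key: string): { width: number; height: number } {
  if (!isPlatformKey(key)) throw new Error(`Unknown platform key: ${key}`);
  const { width, height } = PLATFORM_DIMENSIONS[key];
  return { width, height };
}

// UI side: Select options derived from the same constant.
function platformSelectOptions(): { value: PlatformKey; label: string }[] {
  return (Object.keys(PLATFORM_DIMENSIONS) as PlatformKey[]).map((key) => ({
    value: key,
    label: PLATFORM_DIMENSIONS[key].label,
  }));
}
```

Throwing on an unknown key keeps a typo in `platformKey` from silently producing an image at some default size.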
### Pattern 2: Format Conversion Architecture (CONV-01 to CONV-09)
**What:** A multipart upload endpoint validates the MIME type, stores the file as base64 in the job input, and dispatches to the converter renderer, which routes to sharp/ffmpeg/xlsx/AI-bridge based on the format pair.
**Why separate route:** Content-jobs POST accepts JSON; file binary needs multipart handling.
```typescript
// server/src/routes/convert.ts (new multipart route)
// POST /api/companies/:companyId/convert
import multer from "multer";
import { fileTypeFromBuffer } from "file-type";

// In-memory upload capped at MAX_ATTACHMENT_BYTES; "file" below is an assumed field name
const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: MAX_ATTACHMENT_BYTES } });

router.post("/companies/:companyId/convert", upload.single("file"), async (req, res) => {
  // 1. req.file.buffer holds the upload (multer.memoryStorage(), limit: MAX_ATTACHMENT_BYTES)
  // 2. fileTypeFromBuffer(file.buffer) → detected MIME
  // 3. Compare detected MIME against file extension claim
  // 4. If mismatch: res.status(422).json({ error: "...", actualMime, claimedMime })
  // 5. job input: { fileBase64: buffer.toString('base64'), sourceMime, targetFormat, originalFilename }
  // 6. contentJobStore.create + contentJobRunner.dispatch
  // 7. res.status(202).json({ jobId, status })
});

// GET /api/system/converters: capability map for UI
router.get("/system/converters", async (_req, res) => {
  const caps = await converterCapabilitiesService().get();
  res.json(caps);
  // Returns: { imageConverter: true, audioVideoConverter: true, docConverter: false, dataConverter: true }
});
```
```typescript
// server/src/services/renderers/convert-renderer.ts
async function renderConvert(input: Record<string, unknown>): Promise<RenderResult> {
  // Input values are `unknown` at this boundary; cast them to the shape set by the convert route.
  const { fileBase64, sourceMime, targetFormat } = input as {
    fileBase64: string;
    sourceMime: string;
    targetFormat: string;
  };
  const fileBuffer = Buffer.from(fileBase64, "base64");
  // Route by format category:
  if (isImageFormat(sourceMime) && isImageFormat(targetFormat)) {
    return convertImageViaSharp(fileBuffer, sourceMime, targetFormat);
  }
  if (isAudioVideoFormat(sourceMime) || isAudioVideoFormat(targetFormat)) {
    return convertAVViaFfmpeg(fileBuffer, sourceMime, targetFormat);
  }
  if (isDataFormat(sourceMime) || isDataFormat(targetFormat)) {
    return convertDataFormat(fileBuffer, sourceMime, targetFormat);
  }
  // All other pairs: AI bridge (CONV-05)
  return convertViaAiBridge(fileBuffer, sourceMime, targetFormat);
}
```
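The routing predicates are not defined in this document. The sketch below shows one possible shape for them, collapsed into a single hypothetical `pickRoute` function keyed by lowercase file extension. The format lists are assumptions. The either-side rule for audio/video and data mirrors the snippet above, and the extra check on the target reflects that sharp reads SVG but does not write it.

```typescript
type ConverterRoute = "sharp" | "ffmpeg" | "data" | "ai-bridge";

// Illustrative format sets (assumption: the real lists come from the UI spec).
const IMAGE_INPUTS = new Set(["png", "jpg", "jpeg", "webp", "gif", "svg"]);
const RASTER_OUTPUTS = new Set(["png", "jpg", "jpeg", "webp", "gif"]); // no "svg": sharp cannot write it
const AV_FORMATS = new Set(["mp3", "wav", "ogg", "mp4", "webm", "mov"]);
const DATA_FORMATS = new Set(["csv", "json", "xlsx"]);

function pickRoute(source: string, target: string): ConverterRoute {
  const s = source.toLowerCase();
  const t = target.toLowerCase();
  // Image pairs need a readable source and a writable raster target.
  if (IMAGE_INPUTS.has(s) && RASTER_OUTPUTS.has(t)) return "sharp";
  // Either side being audio/video sends the pair to ffmpeg (e.g. mp4 → gif).
  if (AV_FORMATS.has(s) || AV_FORMATS.has(t)) return "ffmpeg";
  // Either side being a data format sends the pair to the data converter.
  if (DATA_FORMATS.has(s) || DATA_FORMATS.has(t)) return "data";
  // Everything else, including raster → svg, falls through to the AI bridge (CONV-05).
  return "ai-bridge";
}
```

With the either-side rule, a pair such as docx → json reaches the data converter even though only one side is a data format; tightening that check to require both sides would send such pairs to the AI bridge instead.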
### Pattern 3: Converter Capability Probe (CONV-08)
```typescript
// server/src/services/converter-capabilities.ts
// Probe at startup, cache result (same pattern as hardwareService)
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

export interface ConverterCapabilities {
  imageConverter: boolean;      // sharp: always true (npm dep)
  audioVideoConverter: boolean; // ffmpeg-static: always true (npm dep)
  docConverter: boolean;        // pandoc or libreoffice: probe at startup
  dataConverter: boolean;       // xlsx + csv-parse: always true (npm dep)
}

let cache: ConverterCapabilities | null = null;

export function converterCapabilitiesService() {
  async function get(): Promise<ConverterCapabilities> {
    if (cache) return cache;
    let docConverter = false;
    try {
      await execFileAsync("pandoc", ["--version"], { timeout: 2000 });
      docConverter = true;
    } catch {
      try {
        await execFileAsync("libreoffice", ["--version"], { timeout: 2000 });
        docConverter = true;
      } catch { /* not available */ }
    }
    cache = { imageConverter: true, audioVideoConverter: true, docConverter, dataConverter: true };
    return cache;
  }
  return { get };
}
```
### Pattern 4: Social Post Generation (SOCIAL-01 to SOCIAL-03)
```typescript
// server/src/services/renderers/social-renderer.ts
export const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

// LLM prompt asks for JSON: { post: string, hashtags: string[], slides?: string[] }
// For carousel: slides array, each under 300 chars
// puterChatComplete returns JSON; renderer parses + validates
```
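The "parses + validates" step could look roughly like the sketch below. `SocialPostDraft`, `charCount`, and `findLimitViolations` are hypothetical names, and counting Unicode code points is a simplification: X/Twitter applies its own weighted counting (URLs and some characters count differently), so treat this as a first-pass check only.

```typescript
const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

// Assumed shape of the parsed LLM response.
interface SocialPostDraft {
  post: string;
  hashtags: string[];
  slides?: string[];
}

// Counts Unicode code points so that an emoji is one character, not two UTF-16 units.
function charCount(text: string): number {
  return Array.from(text).length;
}

// Returns a list of human-readable problems; an empty list means the draft fits.
function findLimitViolations(platform: string, draft: SocialPostDraft): string[] {
  const limit = PLATFORM_CHAR_LIMITS[platform];
  if (limit === undefined) return [`unknown platform: ${platform}`];
  const problems: string[] = [];
  if (draft.slides && draft.slides.length > 0) {
    // Carousels and threads: the limit applies to each slide separately.
    draft.slides.forEach((slide, i) => {
      if (charCount(slide) > limit) problems.push(`slide ${i + 1} exceeds ${limit} characters`);
    });
  } else if (charCount(draft.post) > limit) {
    problems.push(`post exceeds ${limit} characters`);
  }
  return problems;
}
```

Returning a list rather than throwing lets the renderer decide whether to retry the LLM call or surface the problems to the character-count UI.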
### Pattern 5: Voice Offline Badge (VOICE-03)
The voice pipeline already handles Whisper detection. Phase 42 adds two things:
1. **Server:** `WHISPER_MODEL` env var read in `voice-pipeline.ts` — when set to `"local"`, include `"local"` in nexus-settings response or expose via `GET /api/system/providers` (already returns `whisperAvailable` from `hardwareService().detect()`).
2. **UI:** In `ChatInput.tsx`, read `whisperAvailable` from a `useSystemProviders()` hook (`useConverterCapabilities()` wraps `/api/system/converters` and does not carry this flag). Show `<span aria-label="Voice input is offline (local model)">Offline</span>` next to `VoiceMicButton` when `whisperAvailable === true`.
**IMPORTANT:** The existing `GET /api/system/providers` already returns `{ whisperAvailable: boolean, piperAvailable: boolean, ... }` — no new endpoint needed. Create a `useSystemProviders()` hook that calls this endpoint once on mount.
### Pattern 6: ContentStudio Tab Extension + Standalone Convert Page
```typescript
// ui/src/pages/ContentStudio.tsx — extend TabsList
// Add three TabsTriggers: "Wallpapers", "Social", "Convert"
// "Convert" tab value triggers navigate() to /convert (standalone page)
// TabsContent for wallpapers and social are normal panel components
// TabsContent for convert is NOT a content panel — the tab click navigates away
// ui/src/App.tsx — add new routes in boardRoutes()
<Route path="content-studio" element={<ContentStudio />} />
<Route path="convert" element={<ConvertPage />} />
<Route path="convert/:sourceFormat" element={<ConvertPage />} />
<Route path="convert/:sourceFormat/:targetFormat" element={<ConvertPage />} />
```
### Anti-Patterns to Avoid
- **Magic number dimensions:** Never hardcode `2560` or `1440` in component code — always read from `PLATFORM_DIMENSIONS` constant exported from renderer or a shared types file.
- **Accepting uploads over 10MB:** File buffers travel as base64 in the job input, so the 10MB multer limit must stay in place to keep oversized files out of SSE-triggered jobs; document this limit clearly in the convert route.
- **Blocking HTTP on render:** All conversion dispatched fire-and-forget via `contentJobRunner.dispatch()`. The POST /convert route returns 202 immediately.
- **Showing format pairs as "unavailable":** Per CONV-08, all format pairs are selectable in the UI. Unavailable direct converters show the AI fallback notice, never a disabled/grey chip.
- **Creating a separate `/api/convert/validate` endpoint:** Validate at job submit time in the convert route (simpler, fewer round trips). The UI spec notes this as an OR condition.
- **Satori for wallpaper generation:** Satori is NOT installed. Use the established pattern: LLM generates SVG → sharp rasterizes to exact dimensions. Satori would require JSX rendering infrastructure not needed here.
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| MIME type detection from file bytes | Custom magic-byte reader | `file-type@22.0.0` | Handles 500+ MIME types, handles edge cases like truncated files, streaming API |
| XLSX read/write | Custom binary parser | `xlsx@0.18.5` | XLSX format is complex binary (OOXML); hand-rolling is weeks of work |
| CSV parsing | String.split() | `csv-parse@6.2.1` | Handles quoted fields, escaped commas, multiline values, BOM |
| Image format conversion | Native buffer manipulation | `sharp@0.34.5` | Already installed; handles color spaces, ICC profiles, transparency |
| Audio/video conversion | Custom codec wrappers | `ffmpeg-static@5.3.0` (bundled ffmpeg 7.0.2) | Already installed; handles all codec negotiation |
| SVG rasterization | canvas/Playwright | `@resvg/resvg-js@2.6.2` | Already installed; faster than Playwright for static SVG |
| LLM inference | New HTTP client | `puterChatComplete()` | Already implemented in Phase 41; puter-inference.ts is the project standard |
**Key insight:** All heavy-lifting tools are already installed. Phase 42 is primarily wiring (new renderers + routes + UI panels) rather than infrastructure.
---
## Common Pitfalls
### Pitfall 1: Sharp SVG Input at Large Dimensions
**What goes wrong:** `sharp(svgBuffer).resize(2560, 1440)` produces a blurry image when the SVG has a small implicit pixel density.
**Why it happens:** Sharp defaults to 72 DPI for SVG input; scaling up produces raster artifacts before the resize step.
**How to avoid:** Always pass `{ density: 300 }` option when loading SVG into sharp: `sharp(svgBuffer, { density: 300 }).resize(width, height, { fit: 'fill' }).png()`. Alternatively, ask the LLM to generate an SVG with `viewBox="0 0 {width} {height}"` matching the target dimensions, then use Resvg with `fitTo: { mode: 'width', value: width }`.
**Warning signs:** Generated wallpapers look pixelated or blurry at edges.
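A fixed `density: 300` can be more than needed for an SVG whose intrinsic size is already large, and too little for a very small one. As an alternative sketch, the density can be derived from the ratio between target and intrinsic size, relying on the 72 DPI default noted above. `densityForTarget` is a hypothetical helper, and the 100000 upper bound is taken from sharp's documented density range.

```typescript
const SVG_DEFAULT_DPI = 72;    // sharp's default density for vector input
const MAX_SHARP_DENSITY = 100000;

// Returns a density at which the SVG rasterizes at least as large as the target,
// so the subsequent resize() only ever scales down.
function densityForTarget(
  intrinsicWidth: number,
  intrinsicHeight: number,
  targetWidth: number,
  targetHeight: number,
): number {
  const scale = Math.max(targetWidth / intrinsicWidth, targetHeight / intrinsicHeight);
  // Never go below the default: downscaling from 72 DPI is already lossless enough.
  const density = Math.ceil(SVG_DEFAULT_DPI * Math.max(scale, 1));
  return Math.min(density, MAX_SHARP_DENSITY);
}

// Example: a 1280×720 SVG rendered for "desktop-hd" (2560×1440) needs twice the default density.
const desktopHdDensity = densityForTarget(1280, 720, 2560, 1440);
```

The result would be passed as `sharp(svgBuffer, { density })`. The intrinsic size has to be read from the SVG's `width`/`height` or `viewBox`, which this sketch leaves out.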
### Pitfall 2: file-type v22 Import Syntax
**What goes wrong:** `import FileType from 'file-type'` fails with "does not provide an export named 'default'".
**Why it happens:** file-type v22 is pure ESM with named exports only.
**How to avoid:** Use named import: `import { fileTypeFromBuffer } from 'file-type'`. Server is `type: module` with `module: NodeNext` — ESM imports work directly.
**Warning signs:** TypeScript error TS2613 or runtime "is not a function" errors.
### Pitfall 3: ffmpeg-static Path Resolution
**What goes wrong:** `spawn(ffmpegPath, ...)` throws ENOENT even though ffmpeg-static is installed.
**Why it happens:** `ffmpegPath` from `import ffmpegPath from 'ffmpeg-static'` is the binary path string, but it needs `as unknown as string` cast due to TS type mismatch. The actual binary is at `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/node_modules/ffmpeg-static/ffmpeg`.
**How to avoid:** Copy the existing pattern from `voice-pipeline.ts` exactly: `if (!ffmpegPath) throw new Error("ffmpeg-static binary not found"); const ffmpegBin = ffmpegPath as unknown as string;`.
**Warning signs:** `ffmpegBin` is null/undefined; ENOENT on spawn.
### Pitfall 4: Content-Job Input Size for Conversion
**What goes wrong:** Submitting a 10MB file as base64 in job input stores ~13.3MB of base64 in the `content_jobs.input` JSONB column per submission.
**Why it happens:** base64 adds ~33% overhead. For a 10MB file (MAX_ATTACHMENT_BYTES), this is ~13.3MB per job row.
**How to avoid:** This is acceptable for the single-user case (success criteria assume one conversion at a time). Document the max file size clearly in the UI (the multer limit enforces it). If this becomes a problem in future, change the renderer to accept storage object keys (requires extending content-job-runner signature).
**Warning signs:** Postgres table growth visible in db metrics after many conversions.
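The ~13.3MB figure follows from base64 emitting 4 output characters for every 3 input bytes, padded up to a multiple of 4. A small check of the arithmetic, assuming `MAX_ATTACHMENT_BYTES` is 10 MiB as the text implies (`base64EncodedLength` is a hypothetical helper):

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Assumed value: the document describes a 10MB upload cap.
const MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024;

const encodedLength = base64EncodedLength(MAX_ATTACHMENT_BYTES); // 13981016 characters
const encodedMiB = encodedLength / (1024 * 1024);                // about 13.33
const overheadRatio = encodedLength / MAX_ATTACHMENT_BYTES;      // about 1.333
```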
### Pitfall 5: Social Carousel JSON Parsing from LLM
**What goes wrong:** LLM returns markdown-fenced JSON or adds explanation text, causing `JSON.parse()` to throw.
**Why it happens:** LLMs sometimes wrap JSON in Markdown code fences (```` ```json ... ``` ````) or surround it with explanatory text.
**How to avoid:** Post-process LLM output to strip markdown fences before `JSON.parse()`. Use a robust extraction pattern: ```` const match = raw.match(/```json\s*([\s\S]*?)\s*```/) || raw.match(/({[\s\S]*})/); JSON.parse(match ? match[1] : raw) ````. Apply the same fix pattern used by icon-renderer.ts SVG validation.
**Warning signs:** SocialPostResult shows "Generation failed" after seemingly valid LLM output.
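A slightly more defensive version of the Pitfall 5 extraction is sketched below as a standalone helper. `extractJson` is a hypothetical name, and the fence pattern is built with `new RegExp` and a `{3}` quantifier only so that the snippet contains no literal fence of its own.

```typescript
// Matches a fenced block with an optional "json" info string and captures its body.
const FENCED_JSON = new RegExp("`{3}(?:json)?\\s*([\\s\\S]*?)\\s*`{3}");

function extractJson(raw: string): unknown {
  // 1. Prefer the body of a fenced block when one is present.
  const body = raw.match(FENCED_JSON)?.[1];
  if (body !== undefined) return JSON.parse(body);
  // 2. Otherwise take the outermost {...} span, which drops surrounding prose.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) return JSON.parse(raw.slice(start, end + 1));
  // 3. Last resort: parse the trimmed string and let JSON.parse throw on garbage.
  return JSON.parse(raw.trim());
}
```

The helper still throws when no valid JSON can be found, so the renderer should catch that and mark the job as failed rather than passing a partial result to `SocialPostResult`.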
### Pitfall 6: Deep-Link Route Parameter Case
**What goes wrong:** `/convert/PNG/SVG` doesn't pre-select chips because the component does a case-sensitive compare against format names.
**Why it happens:** URL params are case-sensitive; format chips may be stored as uppercase.
**How to avoid:** Normalize URL params to lowercase on read: `params.sourceFormat?.toLowerCase()`. Match against chip identifiers using `formatId.toLowerCase() === param.toLowerCase()`.
**Warning signs:** Deep-link URL works in one case but not when user types different casing.
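The normalization can be isolated in a small pure function so that `ConvertPage` only deals with canonical chip identifiers. In this sketch, `FORMAT_IDS` and `matchFormatParam` are hypothetical; the real identifier list comes from the UI spec.

```typescript
// Hypothetical chip identifiers, deliberately mixed-case to show the problem.
const FORMAT_IDS: readonly string[] = ["PNG", "JPG", "SVG", "WebP", "GIF", "CSV", "JSON", "XLSX"];

// Maps a raw URL param to the canonical chip id, or undefined when nothing matches.
function matchFormatParam(param: string | undefined, formatIds: readonly string[]): string | undefined {
  if (!param) return undefined;
  const wanted = param.trim().toLowerCase();
  return formatIds.find((id) => id.toLowerCase() === wanted);
}

// /convert/png/webp and /convert/PNG/WEBP resolve to the same chips.
const sourceChip = matchFormatParam("png", FORMAT_IDS);
const targetChip = matchFormatParam("WEBP", FORMAT_IDS);
```

An unmatched param returns `undefined`, which the page can treat as "no pre-selection" instead of an error.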
### Pitfall 7: Voice Offline Badge Always Showing
**What goes wrong:** The "Offline" badge shows even when whisper is not installed (whisperAvailable: false).
**Why it happens:** Misreading the UI spec: badge shows when `whisperAvailable === true` (local model detected), not when `WHISPER_MODEL=local` env var is set (which is confusing naming).
**How to avoid:** Read `whisperAvailable` from `GET /api/system/providers`. Show the badge if `whisperAvailable === true`. The "offline capability" is proven by the binary being detected, not by an env var. The `WHISPER_MODEL` env var mentioned in the UI spec is a future extension point for model selection; do not implement it unless the spec explicitly requires it. Per VOICE-03, "works offline with locally cached model" means the whisper-cpp binary + base model are present.
**Warning signs:** Badge shows on machines where whisper is not installed.
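The badge rule reduces to a single predicate, sketched below. `SystemProviders` lists only the two fields named in this document, and `shouldShowOfflineBadge` is a hypothetical helper.

```typescript
// Assumed subset of the GET /api/system/providers response.
interface SystemProviders {
  whisperAvailable: boolean;
  piperAvailable: boolean;
}

// True only when the providers response has loaded and reports a local Whisper binary.
// The WHISPER_MODEL env var is deliberately not consulted.
function shouldShowOfflineBadge(providers: SystemProviders | undefined): boolean {
  return providers?.whisperAvailable === true;
}
```

Treating a not-yet-loaded response as `false` avoids a badge that flashes on and then disappears.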
---
## Code Examples
### Wallpaper Renderer: Sharp at Target Dimensions
```typescript
// Source: icon-renderer.ts pattern + sharp resize extension
// server/src/services/renderers/wallpaper-renderer.ts
import sharp from "sharp";
import { puterChatComplete } from "../puter-inference.js";
import type { RenderResult } from "./types.js";

async function renderSvgToWallpaper(svgString: string, width: number, height: number): Promise<Buffer> {
  return sharp(Buffer.from(svgString), { density: 300 })
    .resize(width, height, { fit: "fill" })
    .png({ compressionLevel: 9 })
    .toBuffer();
}
```
### Magic-Byte MIME Validation
```typescript
// Source: file-type@22 documentation — ESM named import
import { fileTypeFromBuffer } from "file-type";

async function validateMime(buffer: Buffer, claimedExtension: string): Promise<{ ok: boolean; actualMime?: string; claimedMime?: string }> {
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected) return { ok: true }; // unknown type, allow (SVG/text files have no magic bytes)
  const mimeForExtension = extensionToMime(claimedExtension); // lookup table
  if (mimeForExtension && detected.mime !== mimeForExtension) {
    return { ok: false, actualMime: detected.mime, claimedMime: mimeForExtension };
  }
  return { ok: true };
}
```
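`extensionToMime` is referenced above only as a "lookup table". A minimal sketch of such a table follows. The entries are standard IANA media types, but which strings `file-type` actually reports for each format is an assumption to verify against the installed version before relying on strict equality.

```typescript
// Illustrative subset; extend as new formats are supported.
const EXTENSION_TO_MIME: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
  gif: "image/gif",
  svg: "image/svg+xml",
  mp3: "audio/mpeg",
  mp4: "video/mp4",
  pdf: "application/pdf",
  csv: "text/csv",
  json: "application/json",
  xlsx: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
};

// Accepts "png", ".png", or ".PNG"; returns undefined for extensions not in the table.
function extensionToMime(extension: string): string | undefined {
  const key = extension.trim().replace(/^\./, "").toLowerCase();
  return Object.prototype.hasOwnProperty.call(EXTENSION_TO_MIME, key)
    ? EXTENSION_TO_MIME[key]
    : undefined;
}
```

Returning `undefined` for unknown extensions matches the `if (mimeForExtension && ...)` guard in `validateMime`, which then skips the comparison instead of rejecting the file.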
### ffmpeg-static Conversion (audio/video)
```typescript
// Source: voice-pipeline.ts pattern (established in Phase 36)
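import ffmpegPath from "ffmpeg-static";
import { spawn } from "node:child_process";

if (!ffmpegPath) throw new Error("ffmpeg-static binary not found");
const ffmpegBin = ffmpegPath as unknown as string;

// sourceFormat/targetFormat are ffmpeg format names (e.g. "wav", "mp3"), not MIME types.
// Note: some containers (e.g. MP4) need seekable output and will not mux to pipe:1
// without extra flags; such pairs need temp files instead of pipes.
function convertAVViaFfmpeg(inputBuffer: Buffer, sourceFormat: string, targetFormat: string): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const ffmpeg = spawn(ffmpegBin, [
      "-f", sourceFormat,
      "-i", "pipe:0",
      "-f", targetFormat,
      "pipe:1",
    ], { stdio: ["pipe", "pipe", "pipe"] });
    const chunks: Buffer[] = [];
    const errChunks: Buffer[] = [];
    ffmpeg.stdout.on("data", (c: Buffer) => chunks.push(c));
    ffmpeg.stderr.on("data", (c: Buffer) => errChunks.push(c)); // kept for the error message
    ffmpeg.on("close", (code) => code === 0
      ? resolve(Buffer.concat(chunks))
      : reject(new Error(`ffmpeg exited ${code}: ${Buffer.concat(errChunks).toString().slice(-500)}`)));
    ffmpeg.on("error", reject);
    // ffmpeg may exit before reading all input; ignore EPIPE so "close" reports the real failure
    ffmpeg.stdin.on("error", () => {});
    ffmpeg.stdin.end(inputBuffer);
  });
}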
```
### Data Format Conversion (CSV ↔ JSON ↔ XLSX)
```typescript
// Source: xlsx documentation + csv-parse documentation
import * as XLSX from "xlsx";
import { parse as csvParse } from "csv-parse/sync";

// CSV → JSON
function csvToJson(buffer: Buffer): Record<string, unknown>[] {
  return csvParse(buffer, { columns: true, skip_empty_lines: true });
}

// JSON → XLSX
function jsonToXlsx(data: Record<string, unknown>[]): Buffer {
  const ws = XLSX.utils.json_to_sheet(data);
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
  return Buffer.from(XLSX.write(wb, { type: "buffer", bookType: "xlsx" }));
}
```
### useContentJob Pattern (UI — already exists)
```typescript
// Source: ui/src/hooks/useContentJob.ts (Phase 41)
// Usage in WallpaperGeneratePanel:
const job = useContentJob(companyId);
// Submit
job.submit("wallpaper", { prompt, platformKey: "desktop-hd" });
// Render result when done
if (job.status === "done" && job.bundle) {
  const bundle = job.bundle as WallpaperBundle;
  // bundle.pngBase64, bundle.dimensions, bundle.platformKey
}
```
### Converter Capabilities in UI
```typescript
// ui/src/hooks/useSystemProviders.ts (new)
// Calls GET /api/system/providers once on mount, caches result
// Returns: { whisperAvailable, piperAvailable, ... }
// Used by ChatInput for offline badge, by ConvertTargetSelector for AI-fallback notice
// ui/src/hooks/useConverterCapabilities.ts (new)
// Calls GET /api/system/converters once on mount
// Returns: { imageConverter, audioVideoConverter, docConverter, dataConverter }
```
---
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual MIME detection via extension | Magic-byte detection via file-type | file-type v19+ | Required for CONV-09 — extension can be spoofed |
| Pandoc/LibreOffice for doc conversion | AI-bridge fallback when not available | CONV-08 design | No installer required; works everywhere |
| Separate validate endpoint | Validate at submit time | UI spec v1 | Fewer round trips, simpler client code |
**Deprecated/outdated:**
- `satori` for wallpaper generation: Not installed and not needed. The Phase 41 pattern (LLM SVG + sharp rasterize) is sufficient and consistent with existing code.
- Separate `/api/convert/validate` endpoint: Consolidate validation into the convert submit route.
---
## Open Questions
1. **WallpaperBundle storage format**
- What we know: Other bundles (DiagramBundle, IconSetBundle) store base64-encoded assets in JSON
- What's unclear: For wallpapers at 2560×1440, the PNG can be 5-15MB; base64 encoding adds ~33%, giving up to a ~20MB JSON blob stored in content_jobs.output. MAX_GENERATED_ASSET_BYTES = 500MB so it fits, but row size may be large for Postgres.
- Recommendation: Store the PNG as an asset (same as diagram-renderer stores to storage), and return `WallpaperBundle` with `assetId` + `dimensions` + `platformKey`. The UI downloads via `/api/assets/:id/content`. This avoids storing large base64 in the DB. Follow the same pattern if app-icon returns multiple sizes: store each size as a separate asset or as a multi-size ZIP.
2. **Convert job input size for large files**
- What we know: base64(10MB file) = ~13.3MB JSON in content_jobs.input column
- What's unclear: Whether Postgres/Drizzle has JSONB size limits that would reject this
- Recommendation: Postgres caps a single field value at 1GB, far above this size, so 13.3MB is fine. Document the 10MB upload cap in the UI.
3. **Social post carousel slide format**
- What we know: SOCIAL-02 says "Instagram carousels and thread sequences"
- What's unclear: Whether thread sequences means Twitter threads (numbered tweets) or just a generic multi-part structure
- Recommendation: Implement as a unified `slides: string[]` field in SocialPostBundle. The `collapsible` sections in SocialPostResult handle both Twitter threads and Instagram carousel displays.
---
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| sharp | CONV-01, WALL-01-04 | ✓ | 0.34.5 | — |
| @resvg/resvg-js | WALL-01-04 | ✓ | 2.6.2 | — |
| ffmpeg-static | CONV-02 | ✓ | 7.0.2 (binary) | — |
| file-type | CONV-09 | ✗ | — | Install: `pnpm add file-type@22.0.0` |
| xlsx | CONV-04 | ✗ | — | Install: `pnpm add xlsx@0.18.5` |
| csv-parse | CONV-04 | ✗ | — | Install: `pnpm add csv-parse@6.2.1` |
| pandoc | CONV-03 | ✗ | — | AI-bridge (CONV-08 design) |
| libreoffice | CONV-03 | ✗ | — | AI-bridge (CONV-08 design) |
| whisper-cpp | VOICE-03 | ✗ | — | openai-whisper CLI fallback; error message if neither |
| whisper (openai) | VOICE-03 | ✗ | — | whisper-cpp fallback |
| satori | Phase goal wording | ✗ | — | Not needed — use LLM SVG + sharp pattern |
**Missing dependencies with no fallback:**
- `file-type`, `xlsx`, `csv-parse`: these MUST be installed in Wave 0. The phase cannot complete CONV-04 (xlsx, csv-parse) or CONV-09 (file-type) without them; CONV-01 depends only on sharp, which is already installed.
**Missing dependencies with fallback:**
- `pandoc`, `libreoffice` — document conversion falls through to AI-bridge per CONV-08 design. Planner should add a startup probe that logs "pandoc not found, doc conversion will use AI bridge" rather than failing.
- `whisper-cpp`, `whisper` — existing voice pipeline already handles both missing gracefully with an informative error. VOICE-03 "offline" badge is shown based on `whisperAvailable` from hardware detection.
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | vitest 3.0.5 |
| Server config | server/vitest.config.ts (environment: node) |
| UI config | ui/vitest.config.ts (environment: node, react plugin) |
| Quick run (server) | `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts` |
| Full suite (server) | `cd /opt/nexus/server && npx vitest run` |
| Quick run (UI) | `cd /opt/nexus/ui && npx vitest run src/**/*.test.{ts,tsx}` |
**Note:** Server baseline has 4 pre-existing failing test files (hardware-detection, skill-registry-routes, agent-permissions, heartbeat-workspace-session) — these are NOT caused by Phase 42. Phase 42 tests must not add to this count.
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| WALL-01/02/03 | `renderWallpaper()` returns PNG buffer at correct dimensions per platform key | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| WALL-04 | App icon renderer returns multi-size array | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| SOCIAL-01/02/03 | `renderSocialPost()` returns post text + hashtags; carousel returns slides array | unit | `npx vitest run src/__tests__/42-social-renderer.test.ts` | ❌ Wave 0 |
| CONV-01 | Image conversion round-trip (PNG→JPG) via sharp | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-02 | Audio conversion dispatch calls ffmpeg-static binary | unit (mocked) | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-04 | CSV→JSON and JSON→XLSX conversions | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-05 | Unknown pair falls through to AI bridge | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-08 | converterCapabilitiesService probes pandoc/libreoffice at startup | unit (mocked execFile) | `npx vitest run src/__tests__/42-converter-capabilities.test.ts` | ❌ Wave 0 |
| CONV-09 | MIME mismatch rejected with 422 at convert route | unit (supertest) | `npx vitest run src/__tests__/42-convert-routes.test.ts` | ❌ Wave 0 |
| VOICE-01/02 | VoiceMicButton renders in ChatInput when enableVoiceInput=true | manual (pre-existing wiring) | n/a — already wired in ChatPanel.tsx | ✅ Existing |
| VOICE-03 | Offline badge shows when whisperAvailable=true from /api/system/providers | unit (mocked hook) | `npx vitest run src/**/*.test.tsx` (UI test) | ❌ Wave 0 |
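As a rough illustration of the CONV-05 behavior in the table above (an unknown format pair falls through to the AI bridge), the dispatch could be sketched as a lookup with a default. The route names, the `NATIVE_ROUTES` table, and `resolveRoute` are assumptions made for this example; the `wav->mp3` pair is illustrative only and not taken from the requirements.

```typescript
type ConvertRoute = "sharp" | "ffmpeg" | "data" | "ai-bridge";

// Hypothetical table of natively supported pairs, keyed as "from->to".
const NATIVE_ROUTES: Record<string, ConvertRoute> = {
  "png->jpg": "sharp", // CONV-01 image round-trip
  "wav->mp3": "ffmpeg", // CONV-02 audio (illustrative pair)
  "csv->json": "data", // CONV-04
  "json->xlsx": "data", // CONV-04
};

// Resolve a conversion pair; anything not in the table goes to the AI bridge.
function resolveRoute(from: string, to: string): ConvertRoute {
  const key = `${from.toLowerCase()}->${to.toLowerCase()}`;
  return NATIVE_ROUTES[key] ?? "ai-bridge";
}
```

A unit test for CONV-05 then only needs to assert that an unlisted pair such as `resolveRoute("docx", "pdf")` resolves to `"ai-bridge"`.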
### Sampling Rate
- **Per task commit:** `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts`
- **Per wave merge:** `cd /opt/nexus/server && npx vitest run` (full server suite)
- **Phase gate:** Full server + UI suites green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `server/src/__tests__/42-wallpaper-renderer.test.ts` — covers WALL-01 through WALL-04
- [ ] `server/src/__tests__/42-social-renderer.test.ts` — covers SOCIAL-01 through SOCIAL-03
- [ ] `server/src/__tests__/42-convert-renderer.test.ts` — covers CONV-01 through CONV-05
- [ ] `server/src/__tests__/42-converter-capabilities.test.ts` — covers CONV-08
- [ ] `server/src/__tests__/42-convert-routes.test.ts` — covers CONV-09 (MIME validation at HTTP layer)
- [ ] UI test for offline badge rendering (VOICE-03)
- [ ] Package install: `pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1` (`xlsx` ships its own type definitions, so the deprecated `@types/xlsx` stub is not needed)
---
## Sources
### Primary (HIGH confidence)
- Codebase direct read: `server/src/services/renderers/icon-renderer.ts` — renderer pattern, sharp usage
- Codebase direct read: `server/src/services/renderers/diagram-renderer.ts` — Playwright + Resvg pattern
- Codebase direct read: `server/src/services/content-job-runner.ts` — job dispatch architecture
- Codebase direct read: `server/src/services/voice-pipeline.ts` — Whisper probe and transcription pattern
- Codebase direct read: `server/src/routes/voice.ts` — multer upload pattern for binary input
- Codebase direct read: `ui/src/hooks/useContentJob.ts` — SSE hook established in Phase 41
- Codebase direct read: `ui/src/components/ChatInput.tsx` — existing VoiceMicButton wiring
- Codebase direct read: `ui/src/hooks/useVoiceMode.ts` — existing voice mode settings pattern
- Codebase direct read: `server/src/services/hardware.ts` — whisperAvailable detection, probe pattern
- Codebase direct read: `server/src/routes/hardware.ts` — GET /api/system/providers returns whisperAvailable
- Codebase direct read: `server/src/app.ts` — route mounting pattern
- Codebase direct read: `server/package.json` — installed deps list
- `npm view file-type version` → 22.0.0 (verified 2026-04-04)
- `npm view xlsx version` → 0.18.5 (verified 2026-04-04)
- `npm view csv-parse version` → 6.2.1 (verified 2026-04-04)
- Binary probe: `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/.../ffmpeg -version` → 7.0.2 (verified working)
### Secondary (MEDIUM confidence)
- `.planning/STATE.md` — accumulated decisions: CONV-05, CONV-08, CONV-09 architectural choices locked
- Phase 41-01-SUMMARY.md — renderer pattern, useContentJob hook, tech stack context
- Phase 40-01-SUMMARY.md — content_jobs schema, RenderResult interface, MAX_GENERATED_ASSET_BYTES
### Tertiary (LOW confidence)
- None — all critical claims verified by codebase inspection or npm registry.
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all packages verified via codebase inspection + npm registry
- Architecture: HIGH — pattern directly derived from Phase 41 implementations in codebase
- Pitfalls: HIGH — most derived from actual code review (ffmpeg-static cast, file-type ESM, etc.)
- Environment availability: HIGH — verified via command execution on target system
**Research date:** 2026-04-04
**Valid until:** 2026-05-04 (packages stable; architecture unlikely to change)