# Phase 42: Wallpapers, Social, Format Conversion & Voice — Research
**Researched:** 2026-04-04
**Domain:** Image generation (sharp/SVG), format conversion (sharp/ffmpeg-static/AI-bridge), social text generation (LLM), voice transcription (Whisper)
**Confidence:** HIGH
---
## User Constraints (from CONTEXT.md)
### Locked Decisions
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.
### Claude's Discretion
All implementation choices are at Claude's discretion. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
### Deferred Ideas (OUT OF SCOPE)
None — discuss phase skipped.
---
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| WALL-01 | User can generate desktop and mobile wallpapers from a description | SVG-via-LLM + sharp rasterize at target dimensions; PLATFORM_DIMENSIONS constants in renderer |
| WALL-02 | User can generate social media banners with correct dimensions per platform | Same renderer; platform map covers OG Image, Twitter Card, Instagram, LinkedIn |
| WALL-03 | User can generate Open Graph and social preview images | Same renderer; OG Image = 1200×630 constant |
| WALL-04 | User can generate app icons and favicons in multiple sizes | Renderer returns multi-size bundle (1024, 512, 256, 64, 32); WallpaperPreview renders grid |
| SOCIAL-01 | User can generate platform-ready posts respecting character limits (Twitter, LinkedIn) | LLM prompt with platform limit injected; character count UI enforced per-platform constants |
| SOCIAL-02 | User can generate Instagram carousels and thread sequences | LLM returns JSON with slides array; carousel rendered as numbered collapsible sections |
| SOCIAL-03 | System suggests relevant hashtags for generated content | LLM prompt requests hashtag suggestions as JSON array alongside post text |
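| CONV-01 | User can convert between image formats (PNG, JPG, SVG, WebP, GIF) via sharp | sharp 0.34.5 already installed; reads all listed formats and writes all except SVG (SVG is input-only in sharp, so raster → SVG needs another path such as the AI-bridge in CONV-05) |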
| CONV-02 | User can convert between audio/video formats via ffmpeg | ffmpeg-static 5.3.0 (bundled ffmpeg 7.0.2) already installed and verified working |
| CONV-03 | User can convert between document formats via Pandoc/LibreOffice | pandoc/libreoffice NOT installed → falls to AI-bridge per CONV-08 |
| CONV-04 | User can convert between data formats (CSV, JSON, XLSX) | xlsx + csv-parse packages needed; pure-Node.js conversion |
| CONV-05 | User can convert between any format pair via AI-bridged conversion | puterChatComplete already established; handles unsupported pairs |
| CONV-06 | System provides conversion UI with source/target format selection and drag-drop | Standalone /convert page; ConvertPanel as described in UI spec |
| CONV-07 | User can deep-link to specific conversion flows via URL | /convert/:sourceFormat?/:targetFormat? route in App.tsx; pre-select chips on mount |
| CONV-08 | System detects available direct converters at startup | Startup probe service; GET /api/system/converters endpoint |
| CONV-09 | System validates uploaded file MIME type via magic-byte detection | file-type@22.0.0 (ESM, ships own types); validate at convert route before job dispatch |
| VOICE-01 | User can click mic button in web chat to record and auto-transcribe via Whisper | VoiceMicButton already in ChatInput when enableVoiceInput=true; already wired |
| VOICE-02 | User can toggle between text-only, voice-input, and full-voice modes | VoiceModeToggle already exists; already wired in ChatInput; Phase 42 verifies correctness |
| VOICE-03 | Voice input works offline with local Whisper model | voice-pipeline.ts already probes whisper-cpp → openai-whisper; WHISPER_MODEL env var + offline badge |
---
## Summary
Phase 42 extends the Phase 41 content generation system with four new capabilities: platform-aware image generation (wallpapers, OG images, social banners, app icons), LLM-driven social post generation with hashtag suggestions, a full-featured file format conversion pipeline, and offline voice input via Whisper.
The server already has all critical dependencies for images (sharp@0.34.5, @resvg/resvg-js@2.6.2) and audio/video (ffmpeg-static@5.3.0, which bundles the ffmpeg 7.0.2 binary; verified working at /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0). Three packages need to be added: `file-type@22.0.0` (magic-byte MIME detection), `xlsx@0.18.5` (XLSX data conversion), and `csv-parse@6.2.1` (CSV parsing). Document conversion (pandoc/libreoffice) is not available on this system and will fall through to AI-bridge per CONV-08; no installation is needed.
The voice pipeline (`voice-pipeline.ts`) already handles Whisper probe and transcription. Phase 42's voice work is: (1) add `WHISPER_MODEL=local` env var support to signal offline capability, (2) expose whisper availability to the UI via the existing `/api/system/providers` endpoint (already returns `whisperAvailable`), (3) render the "Offline" badge in `ChatInput` alongside `VoiceMicButton`. The VoiceMicButton, VoiceModeToggle, and `enableVoiceInput=true` wiring already exist in `ChatPanel.tsx`.
**Primary recommendation:** Follow the established Phase 41 renderer pattern: add three new `jobType` cases to `content-job-runner.ts` (`wallpaper`, `social-post`, `convert`), create one renderer file per job type in `server/src/services/renderers/`, and wire three new ContentStudio tabs plus one standalone `/convert` page in the UI.
---
## Standard Stack
### Core (all verified installed in server)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| sharp | 0.34.5 | Image format conversion + SVG rasterization at target dimensions | Already installed; used by icon-renderer and org-chart-svg |
| @resvg/resvg-js | 2.6.2 | High-fidelity SVG→PNG rasterization with fitTo dimensions | Already installed; used by diagram-renderer |
| ffmpeg-static | 5.3.0 (bin: 7.0.2) | Bundled ffmpeg binary for audio/video conversion | Already installed; used by voice-pipeline and telegram |
| culori | 4.0.2 | OKLCH color math (not directly needed but available) | Already installed |
| puterChatComplete | (internal) | LLM inference for wallpaper SVG generation, social posts, AI-bridge conversion | Established pattern in Phase 41 renderers |
### New Dependencies (needs `pnpm add` in server)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| file-type | 22.0.0 | Magic-byte MIME type detection for CONV-09 | ESM-native, ships own types, well-maintained |
| xlsx | 0.18.5 | XLSX read/write for data conversion CONV-04 | Most-used Excel library for Node.js |
| csv-parse | 6.2.1 | CSV parsing for data conversion CONV-04 | De-facto standard, streaming API |
| @types/xlsx | 0.0.36 | TypeScript types for xlsx | xlsx ships types/index.d.ts but @types available |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| file-type@22 (ESM) | mmmagic or mime-magic | file-type is pure JS, no native binding, ships own types; server is type:module so ESM is fine |
| xlsx | exceljs | xlsx is simpler API for read/write; exceljs has streaming but more complex |
| sharp for SVG rasterization | Playwright (like diagram-renderer) | sharp+resvg is faster for simple SVG → PNG; Playwright only needed for JavaScript-rendered content |
**Installation:**
```bash
# Run from /opt/nexus/server
pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1
pnpm add -D @types/xlsx@0.0.36
```
**Version verification (run before installing):**
```bash
npm view file-type version # → 22.0.0
npm view xlsx version # → 0.18.5
npm view csv-parse version # → 6.2.1
```
---
## Architecture Patterns
### Established Renderer Pattern (from Phase 41)
Every new capability follows this exact structure:
1. **Renderer file:** `server/src/services/renderers/{name}-renderer.ts` exports `async function render{Name}(input: Record<string, unknown>): Promise<RenderResult>`
2. **Job runner switch:** Add `case '{jobtype}':` to `renderContent()` in `content-job-runner.ts`
3. **Bundle type (if needed):** Add `interface {Name}Bundle` to `types.ts`
4. **API route:** Submit via existing `POST /api/companies/:id/content-jobs` with `{ jobType, input }`
5. **UI hook:** `useContentJob(companyId)` already handles all SSE + state management
6. **UI component:** Panel reads `job.bundle` after `status === 'done'`
The format conversion job is the only exception — it requires a separate multipart upload route because the file binary cannot be passed as JSON input via the standard content-jobs endpoint.
### Recommended Project Structure (new files)
```
server/src/
├── services/renderers/
│ ├── types.ts # ADD: WallpaperBundle, SocialPostBundle, ConvertBundle
│ ├── wallpaper-renderer.ts # NEW
│ ├── social-renderer.ts # NEW
│ └── convert-renderer.ts # NEW
├── services/
│ └── converter-capabilities.ts # NEW: startup probe + cache
└── routes/
└── convert.ts # NEW: POST /api/companies/:id/convert (multipart)
# GET /api/system/converters
ui/src/
├── pages/
│ └── ConvertPage.tsx # NEW: standalone /convert page
├── components/
│ ├── WallpaperGeneratePanel.tsx # NEW
│ ├── WallpaperPreview.tsx # NEW
│ ├── SocialPostPanel.tsx # NEW
│ ├── SocialPostResult.tsx # NEW
│ └── ConvertPanel.tsx # NEW (contains ConvertSourceZone + ConvertTargetSelector + ConvertActionBar)
└── api/
└── convert.ts # NEW: submitConvertJob (multipart), getConverterCapabilities
```
### Pattern 1: Wallpaper Generation (WALL-01 to WALL-04)
**What:** LLM generates an SVG at a conceptual level, then sharp rasterizes it to exact pixel dimensions for the requested platform.
**When to use:** Any fixed-dimension image asset (wallpaper, OG image, social banner, app icon).
```typescript
// Source: established pattern from icon-renderer.ts + sharp resize
// server/src/services/renderers/wallpaper-renderer.ts
export const PLATFORM_DIMENSIONS: Record<string, { width: number; height: number; label: string }> = {
"desktop-hd": { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
"desktop-fhd": { width: 1920, height: 1080, label: "Desktop FHD (1920 × 1080)" },
"desktop-4k": { width: 3840, height: 2160, label: "Desktop 4K (3840 × 2160)" },
"mobile-portrait": { width: 1080, height: 1920, label: "Mobile Portrait (1080 × 1920)" },
"mobile-landscape": { width: 1920, height: 1080, label: "Mobile Landscape (1920 × 1080)" },
"og-image": { width: 1200, height: 630, label: "OG Image (1200 × 630)" },
"twitter-card": { width: 1200, height: 628, label: "Twitter Card (1200 × 628)" },
"instagram-post": { width: 1080, height: 1080, label: "Instagram Post (1080 × 1080)" },
"instagram-banner": { width: 1080, height: 566, label: "Instagram Banner (1080 × 566)" },
"linkedin-banner": { width: 1584, height: 396, label: "LinkedIn Banner (1584 × 396)" },
"app-icon": { width: 1024, height: 1024, label: "App Icon (1024 × 1024)" },
"favicon": { width: 32, height: 32, label: "Favicon (32 × 32)" },
};
// App icon + favicon: render multiple sizes from one SVG
const APP_ICON_SIZES = [1024, 512, 256, 64, 32] as const;
// Render flow:
// 1. puterChatComplete → SVG string (LLM generates SVG matching aspect ratio)
// 2. sharp(svgBuffer).resize(width, height, { fit: 'fill' }).png() → PNG buffer
// 3. Return WallpaperBundle with pngBase64 + dimensions
```
**Critical constraint:** Platform dimensions MUST be constants, never magic numbers (success criterion 1). Export `PLATFORM_DIMENSIONS` from the renderer and re-export to the UI API client so the UI's Select options derive from the same source.
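One way to keep dimensions out of component code is a typed lookup plus a derived option list. The sketch below is illustrative only: `PlatformKey`, `resolveDimensions`, and `platformOptions` are hypothetical names, and the map is trimmed to three entries from the constant above.

```typescript
interface PlatformDimension {
  width: number;
  height: number;
  label: string;
}

// Trimmed copy of PLATFORM_DIMENSIONS, for illustration only.
const PLATFORM_DIMENSIONS = {
  "desktop-hd": { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "og-image": { width: 1200, height: 630, label: "OG Image (1200 × 630)" },
  "linkedin-banner": { width: 1584, height: 396, label: "LinkedIn Banner (1584 × 396)" },
} as const;

// Hypothetical key type derived from the map, so new entries need no extra typing.
export type PlatformKey = keyof typeof PLATFORM_DIMENSIONS;

function isPlatformKey(key: string): key is PlatformKey {
  return Object.prototype.hasOwnProperty.call(PLATFORM_DIMENSIONS, key);
}

// Server side: reject unknown keys before any rendering happens.
export function resolveDimensions(key: string): PlatformDimension {
  if (!isPlatformKey(key)) throw new Error(`Unknown platform key: ${key}`);
  return PLATFORM_DIMENSIONS[key];
}

// UI side: Select options come from the same map, never from literals.
export function platformOptions(): { value: PlatformKey; label: string }[] {
  return (Object.keys(PLATFORM_DIMENSIONS) as PlatformKey[]).map((value) => ({
    value,
    label: PLATFORM_DIMENSIONS[value].label,
  }));
}
```

With this shape, adding a platform is a one-line change to the map, and both the renderer and the Select pick it up.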
### Pattern 2: Format Conversion Architecture (CONV-01 to CONV-09)
**What:** A multipart upload endpoint validates the MIME type, stores the file as base64 in the job input, and dispatches to the converter renderer, which routes to sharp/ffmpeg/xlsx/AI-bridge based on the format pair.
**Why separate route:** Content-jobs POST accepts JSON; file binary needs multipart handling.
```typescript
// server/src/routes/convert.ts — new multipart route
// POST /api/companies/:companyId/convert
import multer from "multer";
import { fileTypeFromBuffer } from "file-type";
router.post("/companies/:companyId/convert", async (req, res) => {
// 1. multer.memoryStorage() upload (limit: MAX_ATTACHMENT_BYTES)
// 2. fileTypeFromBuffer(file.buffer) → detected MIME
// 3. Compare detected MIME against file extension claim
// 4. If mismatch: res.status(422).json({ error: "...", actualMime, claimedMime })
// 5. job input: { fileBase64: buffer.toString('base64'), sourceMime, targetFormat, originalFilename }
// 6. contentJobStore.create + contentJobRunner.dispatch
// 7. res.status(202).json({ jobId, status })
});
// GET /api/system/converters — capability map for UI
router.get("/system/converters", async (_req, res) => {
const caps = await converterCapabilitiesService().get();
res.json(caps);
// Returns: { imageConverter: true, audioVideoConverter: true, docConverter: false, dataConverter: true }
});
```
```typescript
// server/src/services/renderers/convert-renderer.ts
async function renderConvert(input: Record<string, unknown>): Promise<RenderResult> {
const { fileBase64, sourceMime, targetFormat } = input;
const fileBuffer = Buffer.from(fileBase64 as string, "base64");
// Route by format category:
if (isImageFormat(sourceMime) && isImageFormat(targetFormat)) {
return convertImageViaSharp(fileBuffer, sourceMime, targetFormat);
}
if (isAudioVideoFormat(sourceMime) || isAudioVideoFormat(targetFormat)) {
return convertAVViaFfmpeg(fileBuffer, sourceMime, targetFormat);
}
if (isDataFormat(sourceMime) || isDataFormat(targetFormat)) {
return convertDataFormat(fileBuffer, sourceMime, targetFormat);
}
// All other pairs: AI bridge (CONV-05)
return convertViaAiBridge(fileBuffer, sourceMime, targetFormat);
}
```
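The routing decision can be isolated into a pure function, which makes it easy to unit test. This is a hedged sketch, not the project's implementation: `pickConverter` and the format sets are hypothetical, it works on lowercase extensions rather than MIME strings, and it requires both sides of a pair to be in the same category, which is stricter than the `||` checks above. It also treats SVG as input-only for sharp, since sharp does not write SVG.

```typescript
type ConverterKind = "sharp" | "ffmpeg" | "data" | "ai-bridge";

// Hypothetical format sets keyed by lowercase extension.
const SHARP_INPUTS = new Set(["png", "jpg", "jpeg", "webp", "gif", "svg"]);
const SHARP_OUTPUTS = new Set(["png", "jpg", "jpeg", "webp", "gif"]); // sharp reads SVG but does not write it
const AV_FORMATS = new Set(["mp3", "wav", "ogg", "mp4", "webm", "mov"]);
const DATA_FORMATS = new Set(["csv", "json", "xlsx"]);

export function pickConverter(sourceFormat: string, targetFormat: string): ConverterKind {
  const src = sourceFormat.toLowerCase();
  const dst = targetFormat.toLowerCase();
  if (SHARP_INPUTS.has(src) && SHARP_OUTPUTS.has(dst)) return "sharp";
  if (AV_FORMATS.has(src) && AV_FORMATS.has(dst)) return "ffmpeg";
  if (DATA_FORMATS.has(src) && DATA_FORMATS.has(dst)) return "data";
  // Anything else, including mixed-category pairs, goes to the AI bridge (CONV-05).
  return "ai-bridge";
}
```

Under this variant a pair such as CSV → PDF reaches the AI bridge instead of the data converter. Whether cross-category pairs like video → GIF should stay on ffmpeg is a design choice the sketch leaves open.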
### Pattern 3: Converter Capability Probe (CONV-08)
```typescript
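// server/src/services/converter-capabilities.ts
// Probe at startup, cache result (same pattern as hardwareService)
import { execFile } from "node:child_process";
import { promisify } from "node:util";
// Promise-returning execFile, so each probe can be awaited with a timeout
const execFileAsync = promisify(execFile);
let cache: ConverterCapabilities | null = null;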
export interface ConverterCapabilities {
imageConverter: boolean; // sharp — always true (npm dep)
audioVideoConverter: boolean; // ffmpeg-static — always true (npm dep)
docConverter: boolean; // pandoc or libreoffice — probe at startup
dataConverter: boolean; // xlsx + csv-parse — always true (npm dep)
}
export function converterCapabilitiesService() {
async function get(): Promise<ConverterCapabilities> {
if (cache) return cache;
let docConverter = false;
try {
await execFileAsync("pandoc", ["--version"], { timeout: 2000 });
docConverter = true;
} catch {
try {
await execFileAsync("libreoffice", ["--version"], { timeout: 2000 });
docConverter = true;
} catch { /* not available */ }
}
cache = { imageConverter: true, audioVideoConverter: true, docConverter, dataConverter: true };
return cache;
}
return { get };
}
```
### Pattern 4: Social Post Generation (SOCIAL-01 to SOCIAL-03)
```typescript
// server/src/services/renderers/social-renderer.ts
export const PLATFORM_CHAR_LIMITS: Record<string, number> = {
"twitter-x": 280,
"linkedin": 3000,
"instagram-caption": 2200,
"instagram-carousel": 300, // per slide
};
// LLM prompt asks for JSON: { post: string, hashtags: string[], slides?: string[] }
// For carousel: slides array, each under 300 chars
// puterChatComplete returns JSON; renderer parses + validates
```
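The "parses + validates" step could look roughly like the following. This is a sketch under assumptions: `SocialPostDraft` and `validateSocialPost` are hypothetical names, and length is counted in Unicode code points, which only approximates each platform's own counting rules.

```typescript
const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

// Hypothetical shape matching the JSON the prompt asks for.
interface SocialPostDraft {
  post: string;
  hashtags: string[];
  slides?: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

export function validateSocialPost(platform: string, value: unknown): SocialPostDraft {
  const limit = PLATFORM_CHAR_LIMITS[platform];
  if (limit === undefined) throw new Error(`Unknown platform: ${platform}`);
  if (typeof value !== "object" || value === null) throw new Error("Expected a JSON object");

  const { post, hashtags, slides } = value as Record<string, unknown>;
  if (typeof post !== "string") throw new Error("post must be a string");
  if (!isStringArray(hashtags)) throw new Error("hashtags must be a string array");

  let checkedSlides: string[] | undefined;
  if (slides !== undefined) {
    if (!isStringArray(slides)) throw new Error("slides must be a string array");
    checkedSlides = slides;
  }

  // Carousel limits apply per slide; every other platform limits the post itself.
  const measured = platform === "instagram-carousel" ? (checkedSlides ?? []) : [post];
  for (const text of measured) {
    // Array.from splits by code point, so an emoji counts once.
    if (Array.from(text).length > limit) throw new Error(`Text exceeds ${limit} characters`);
  }
  return { post, hashtags, slides: checkedSlides };
}
```

Rejecting over-limit output in the renderer lets the job fail or retry server-side, while the UI counter remains a second line of defence.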
### Pattern 5: Voice Offline Badge (VOICE-03)
The voice pipeline already handles Whisper detection. Phase 42 adds two things:
1. **Server:** `WHISPER_MODEL` env var read in `voice-pipeline.ts` — when set to `"local"`, include `"local"` in nexus-settings response or expose via `GET /api/system/providers` (already returns `whisperAvailable` from `hardwareService().detect()`).
2. **UI:** In `ChatInput.tsx`, read `whisperAvailable` from a `useSystemProviders()` hook (`useConverterCapabilities()` only covers `/api/system/converters` and does not carry this flag). Show `Offline` next to `VoiceMicButton` when `whisperAvailable === true`.
**IMPORTANT:** The existing `GET /api/system/providers` already returns `{ whisperAvailable: boolean, piperAvailable: boolean, ... }` — no new endpoint needed. Create a `useSystemProviders()` hook that calls this endpoint once on mount.
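The "call once on mount" behaviour can be sketched without React as a memoized loader plus a badge predicate. Both `once` and `shouldShowOfflineBadge` are hypothetical helpers, and the `SystemProviders` shape lists only the two fields named above.

```typescript
// Subset of the /api/system/providers response named in this document.
interface SystemProviders {
  whisperAvailable: boolean;
  piperAvailable: boolean;
}

// Memoize an async loader so repeated callers share one in-flight request.
export function once<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = loader().catch((err) => {
        pending = null; // allow a retry after a failed request
        throw err;
      });
    }
    return pending;
  };
}

// Badge rule: show only when the server reports a local Whisper install.
export function shouldShowOfflineBadge(providers: SystemProviders | undefined): boolean {
  return providers?.whisperAvailable === true;
}

// Possible wiring inside a hook (assumes a global fetch):
//   const loadProviders = once(() =>
//     fetch("/api/system/providers").then((r) => r.json() as Promise<SystemProviders>));
```

While the request is pending the hook would hold `undefined`, so the badge stays hidden until the server confirms availability.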
### Pattern 6: ContentStudio Tab Extension + Standalone Convert Page
```typescript
// ui/src/pages/ContentStudio.tsx — extend TabsList
// Add three TabsTriggers: "Wallpapers", "Social", "Convert"
// "Convert" tab value triggers navigate() to /convert (standalone page)
// TabsContent for wallpapers and social are normal panel components
// TabsContent for convert is NOT a content panel — the tab click navigates away
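// ui/src/App.tsx: add new routes in boardRoutes() (paths per CONV-07)
<Route path="convert" element={<ConvertPage />} />
<Route path="convert/:sourceFormat" element={<ConvertPage />} />
<Route path="convert/:sourceFormat/:targetFormat" element={<ConvertPage />} />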
```
### Anti-Patterns to Avoid
- **Magic number dimensions:** Never hardcode `2560` or `1440` in component code — always read from `PLATFORM_DIMENSIONS` constant exported from renderer or a shared types file.
- **Passing file buffer as base64 in SSE-triggered jobs with >10MB files:** The 10MB multer limit prevents oversized uploads; document this clearly in the convert route.
- **Blocking HTTP on render:** All conversion dispatched fire-and-forget via `contentJobRunner.dispatch()`. The POST /convert route returns 202 immediately.
- **Showing format pairs as "unavailable":** Per CONV-08, all format pairs are selectable in the UI. Unavailable direct converters show the AI fallback notice, never a disabled/grey chip.
- **Creating a separate `/api/convert/validate` endpoint:** Validate at job submit time in the convert route (simpler, fewer round trips). The UI spec notes this as an OR condition.
- **Satori for wallpaper generation:** Satori is NOT installed. Use the established pattern: LLM generates SVG → sharp rasterizes to exact dimensions. Satori would require JSX rendering infrastructure not needed here.
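The "never a disabled chip" rule can be captured in a small pure function that maps the capability flags to chip state. This is a sketch: `FormatCategory`, `chipState`, and the notice wording are hypothetical, and the capability shape mirrors the `/api/system/converters` response described earlier.

```typescript
interface ConverterCapabilities {
  imageConverter: boolean;
  audioVideoConverter: boolean;
  docConverter: boolean;
  dataConverter: boolean;
}

// Hypothetical category names; the real UI may group formats differently.
type FormatCategory = "image" | "audioVideo" | "document" | "data";

const CAPABILITY_FOR_CATEGORY: Record<FormatCategory, keyof ConverterCapabilities> = {
  image: "imageConverter",
  audioVideo: "audioVideoConverter",
  document: "docConverter",
  data: "dataConverter",
};

interface ChipState {
  disabled: false; // chips stay selectable in every case
  notice: string | null;
}

export function chipState(caps: ConverterCapabilities, category: FormatCategory): ChipState {
  const direct = caps[CAPABILITY_FOR_CATEGORY[category]];
  return {
    disabled: false,
    notice: direct ? null : "No direct converter found; this conversion will use the AI bridge.",
  };
}
```

Typing `disabled` as the literal `false` means a future change that tries to grey out a chip fails at compile time.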
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| MIME type detection from file bytes | Custom magic-byte reader | `file-type@22.0.0` | Handles 500+ MIME types, handles edge cases like truncated files, streaming API |
| XLSX read/write | Custom binary parser | `xlsx@0.18.5` | XLSX is a zipped set of OOXML parts, not a single flat file; hand-rolling is weeks of work |
| CSV parsing | String.split() | `csv-parse@6.2.1` | Handles quoted fields, escaped commas, multiline values, BOM |
| Image format conversion | Native buffer manipulation | `sharp@0.34.5` | Already installed; handles color spaces, ICC profiles, transparency |
| Audio/video conversion | Custom codec wrappers | `ffmpeg-static@5.3.0` (ffmpeg 7.0.2) | Already installed; handles all codec negotiation |
| SVG rasterization | canvas/Playwright | `@resvg/resvg-js@2.6.2` | Already installed; faster than Playwright for static SVG |
| LLM inference | New HTTP client | `puterChatComplete()` | Already implemented in Phase 41; puter-inference.ts is the project standard |
**Key insight:** All heavy-lifting tools are already installed. Phase 42 is primarily wiring (new renderers + routes + UI panels) rather than infrastructure.
---
## Common Pitfalls
### Pitfall 1: Sharp SVG Input at Large Dimensions
**What goes wrong:** `sharp(svgBuffer).resize(2560, 1440)` produces a blurry image when the SVG has a small implicit pixel density.
**Why it happens:** sharp rasterizes SVG input at 72 DPI by default, so a small SVG is first rendered at its small intrinsic size and `resize()` then upscales that raster.
**How to avoid:** Always pass `{ density: 300 }` option when loading SVG into sharp: `sharp(svgBuffer, { density: 300 }).resize(width, height, { fit: 'fill' }).png()`. Alternatively, ask the LLM to generate an SVG with `viewBox="0 0 {width} {height}"` matching the target dimensions, then use Resvg with `fitTo: { mode: 'width', value: width }`.
**Warning signs:** Generated wallpapers look pixelated or blurry at edges.
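Instead of a fixed `density: 300`, the density can be derived from the SVG's intrinsic width and the target width. The sketch assumes sharp renders an SVG at its intrinsic size when density is 72 and scales linearly with density; `densityForTarget` is a hypothetical helper, and the clamp follows sharp's documented 1 to 100000 range with 72 as a floor.

```typescript
const DEFAULT_SVG_DENSITY = 72; // sharp's default DPI for vector input
const MAX_SVG_DENSITY = 100000; // upper bound sharp accepts

// Smallest density at which the SVG rasterizes at or above the target width.
export function densityForTarget(svgWidth: number, targetWidth: number): number {
  if (svgWidth <= 0 || targetWidth <= 0) {
    throw new RangeError("widths must be positive");
  }
  const needed = Math.ceil((DEFAULT_SVG_DENSITY * targetWidth) / svgWidth);
  return Math.min(MAX_SVG_DENSITY, Math.max(DEFAULT_SVG_DENSITY, needed));
}

// A 1280-wide SVG rendered for a 2560-wide wallpaper needs 72 * 2560 / 1280 = 144.
const exampleDensity = densityForTarget(1280, 2560);
```

The result would be passed as `sharp(svgBuffer, { density })`, so `resize()` only ever scales down or not at all.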
### Pitfall 2: file-type v22 Import Syntax
**What goes wrong:** `import FileType from 'file-type'` fails with "does not provide an export named 'default'".
**Why it happens:** file-type v22 is pure ESM with named exports only.
**How to avoid:** Use named import: `import { fileTypeFromBuffer } from 'file-type'`. Server is `type: module` with `module: NodeNext` — ESM imports work directly.
**Warning signs:** TypeScript error TS2613 or runtime "is not a function" errors.
### Pitfall 3: ffmpeg-static Path Resolution
**What goes wrong:** `spawn(ffmpegPath, ...)` throws ENOENT even though ffmpeg-static is installed.
**Why it happens:** `ffmpegPath` from `import ffmpegPath from 'ffmpeg-static'` is the binary path string, but it needs `as unknown as string` cast due to TS type mismatch. The actual binary is at `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/node_modules/ffmpeg-static/ffmpeg`.
**How to avoid:** Copy the existing pattern from `voice-pipeline.ts` exactly: `if (!ffmpegPath) throw new Error("ffmpeg-static binary not found"); const ffmpegBin = ffmpegPath as unknown as string;`.
**Warning signs:** `ffmpegBin` is null/undefined; ENOENT on spawn.
### Pitfall 4: Content-Job Input Size for Conversion
**What goes wrong:** Submitting a 10MB file as base64 in job input stores ~13.3MB of base64 in the `content_jobs.input` JSONB column per submission.
**Why it happens:** base64 adds ~33% overhead. For a 10MB file (MAX_ATTACHMENT_BYTES), this is ~13.3MB per job row.
**How to avoid:** This is acceptable for the single-user case (success criteria assume one conversion at a time). Document the max file size clearly in the UI (the multer limit enforces it). If this becomes a problem in future, change the renderer to accept storage object keys (requires extending content-job-runner signature).
**Warning signs:** Postgres table growth visible in db metrics after many conversions.
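The ~13.3MB figure follows from base64 emitting 4 characters for every 3 input bytes, padded up to a whole group. A small worked calculation, assuming `MAX_ATTACHMENT_BYTES` is 10 × 1024 × 1024 (the actual constant may differ):

```typescript
// Length in characters of the padded base64 encoding of byteCount bytes.
export function base64Length(byteCount: number): number {
  if (!Number.isInteger(byteCount) || byteCount < 0) {
    throw new RangeError("byteCount must be a non-negative integer");
  }
  return 4 * Math.ceil(byteCount / 3);
}

// Assumed value of the 10MB upload cap.
const ASSUMED_MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024;

// 4 * ceil(10485760 / 3) = 4 * 3495254 = 13981016 characters, about 13.33 MiB.
const encodedSize = base64Length(ASSUMED_MAX_ATTACHMENT_BYTES);
```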
### Pitfall 5: Social Carousel JSON Parsing from LLM
**What goes wrong:** LLM returns markdown-fenced JSON or adds explanation text, causing `JSON.parse()` to throw.
**Why it happens:** LLMs sometimes wrap JSON in ````json ... ```` fences.
**How to avoid:** Post-process LLM output to strip markdown fences before JSON.parse(). Use a robust extraction pattern: `const match = raw.match(/```json\s*([\s\S]*?)\s*```/) || raw.match(/({[\s\S]*})/); JSON.parse(match ? match[1] : raw)`. Apply the same fix pattern used by icon-renderer.ts SVG validation.
**Warning signs:** SocialPostResult shows "Generation failed" after seemingly valid LLM output.
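The extraction pattern above can be wrapped in a helper so every renderer strips fences the same way. A minimal sketch: `extractJson` is a hypothetical name, and the fence marker is built at runtime purely so the snippet contains no literal fence.

```typescript
// Three backticks, assembled at runtime.
const FENCE = "`".repeat(3);
// Matches a fenced block, with or without a "json" info string.
const FENCED_JSON = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE);

// Pull a JSON value out of raw LLM text; throws if nothing parseable is found.
export function extractJson(raw: string): unknown {
  const fenced = raw.match(FENCED_JSON);
  if (fenced) return JSON.parse(fenced[1]);
  // Fallback: take everything from the first "{" to the last "}".
  const braces = raw.match(/\{[\s\S]*\}/);
  return JSON.parse(braces ? braces[0] : raw);
}
```

`JSON.parse` still throws on malformed output, so the caller should catch the error and mark the job as failed or retry the prompt.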
### Pitfall 6: Deep-Link Route Parameter Case
**What goes wrong:** `/convert/PNG/SVG` doesn't pre-select chips because the component does a case-sensitive compare against format names.
**Why it happens:** URL params are case-sensitive; format chips may be stored as uppercase.
**How to avoid:** Normalize URL params to lowercase on read: `params.sourceFormat?.toLowerCase()`. Match against chip identifiers using `formatId.toLowerCase() === param.toLowerCase()`.
**Warning signs:** Deep-link URL works in one case but not when user types different casing.
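A case-insensitive match can return the canonical chip identifier so the rest of the component never sees the raw URL casing. The helper name `matchFormatParam` and the sample identifiers are hypothetical.

```typescript
// Resolve a URL param to the canonical chip id, ignoring case.
export function matchFormatParam(
  param: string | undefined,
  formatIds: readonly string[],
): string | null {
  if (!param) return null;
  const wanted = param.toLowerCase();
  return formatIds.find((id) => id.toLowerCase() === wanted) ?? null;
}

// Sample chip ids with mixed casing, as they might be stored in the UI.
const SAMPLE_FORMAT_IDS = ["PNG", "JPG", "SVG", "WebP", "GIF"] as const;

// "/convert/png/webp" and "/convert/PNG/WEBP" resolve to the same chips.
const sourceChip = matchFormatParam("png", SAMPLE_FORMAT_IDS);
const targetChip = matchFormatParam("WEBP", SAMPLE_FORMAT_IDS);
```

Returning `null` for unknown or missing params lets the page fall back to an empty selection instead of throwing on a mistyped URL.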
### Pitfall 7: Voice Offline Badge Always Showing
**What goes wrong:** The "Offline" badge shows even when whisper is not installed (whisperAvailable: false).
**Why it happens:** Misreading the UI spec: badge shows when `whisperAvailable === true` (local model detected), not when `WHISPER_MODEL=local` env var is set (which is confusing naming).
**How to avoid:** Read `whisperAvailable` from `GET /api/system/providers`. Show badge if `whisperAvailable === true`. The "offline capability" is proven by the binary being detected, not by an env var. The `WHISPER_MODEL` env var mentioned in the UI spec is a future extension point for model selection; do not implement it unless the spec explicitly requires it. Per VOICE-03, "works offline with locally cached model" means the whisper-cpp binary + base model are present.
**Warning signs:** Badge shows on machines where whisper is not installed.
---
## Code Examples
### Wallpaper Renderer: Sharp at Target Dimensions
```typescript
// Source: icon-renderer.ts pattern + sharp resize extension
// server/src/services/renderers/wallpaper-renderer.ts
import sharp from "sharp";
import { puterChatComplete } from "../puter-inference.js";
import type { RenderResult } from "./types.js";
async function renderSvgToWallpaper(svgString: string, width: number, height: number): Promise<Buffer> {
return sharp(Buffer.from(svgString), { density: 300 })
.resize(width, height, { fit: "fill" })
.png({ compressionLevel: 9 })
.toBuffer();
}
```
### Magic-Byte MIME Validation
```typescript
// Source: file-type@22 documentation — ESM named import
import { fileTypeFromBuffer } from "file-type";
async function validateMime(buffer: Buffer, claimedExtension: string): Promise<{ ok: boolean; actualMime?: string; claimedMime?: string }> {
const detected = await fileTypeFromBuffer(buffer);
if (!detected) return { ok: true }; // unknown type, allow (SVG/text files have no magic bytes)
const mimeForExtension = extensionToMime(claimedExtension); // lookup table
if (mimeForExtension && detected.mime !== mimeForExtension) {
return { ok: false, actualMime: detected.mime, claimedMime: mimeForExtension };
}
return { ok: true };
}
```
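`validateMime` relies on an `extensionToMime` lookup that is not shown. A minimal sketch of such a table follows; the entries are an assumption limited to formats whose MIME strings are well established, and the real table would need to match what `file-type` actually reports for each format.

```typescript
// Hypothetical lookup table: lowercase extension (no dot) → expected MIME type.
const EXTENSION_TO_MIME: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
  gif: "image/gif",
  mp3: "audio/mpeg",
  mp4: "video/mp4",
  pdf: "application/pdf",
  xlsx: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
};

// Accepts "png", ".png", or ".PNG"; returns undefined for unknown extensions.
export function extensionToMime(extension: string): string | undefined {
  const key = extension.trim().replace(/^\./, "").toLowerCase();
  return Object.prototype.hasOwnProperty.call(EXTENSION_TO_MIME, key)
    ? EXTENSION_TO_MIME[key]
    : undefined;
}
```

Returning `undefined` for unknown extensions keeps `validateMime` permissive for formats outside the table, which matches its existing `mimeForExtension &&` guard.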
### ffmpeg-static Conversion (audio/video)
```typescript
// Source: voice-pipeline.ts pattern (established in Phase 36)
import ffmpegPath from "ffmpeg-static";
import { spawn } from "node:child_process";
if (!ffmpegPath) throw new Error("ffmpeg-static binary not found");
const ffmpegBin = ffmpegPath as unknown as string;
function convertAVViaFfmpeg(inputBuffer: Buffer, sourceFormat: string, targetFormat: string): Promise<Buffer> {
return new Promise((resolve, reject) => {
const ffmpeg = spawn(ffmpegBin, [
"-f", sourceFormat,
"-i", "pipe:0",
"-f", targetFormat,
"pipe:1",
], { stdio: ["pipe", "pipe", "pipe"] });
const chunks: Buffer[] = [];
ffmpeg.stdout.on("data", (c: Buffer) => chunks.push(c));
ffmpeg.stderr.on("data", () => {}); // discard
ffmpeg.on("close", (code) => code === 0 ? resolve(Buffer.concat(chunks)) : reject(new Error(`ffmpeg exited ${code}`)));
ffmpeg.on("error", reject);
ffmpeg.stdin.write(inputBuffer);
ffmpeg.stdin.end();
});
}
```
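One caveat worth checking before relying on `pipe:1` for every target: the MP4/MOV muxer normally needs a seekable output to write its index, so piping those containers tends to fail unless the output is fragmented with `-movflags frag_keyframe+empty_moov`. The argument builder below is a hedged sketch; `buildPipeArgs` is a hypothetical helper and the list of containers that need the flag is an assumption limited to `mp4` and `mov`.

```typescript
// Extra output flags for containers that cannot be written to a non-seekable pipe.
const PIPE_OUTPUT_FLAGS = new Map<string, string[]>([
  ["mp4", ["-movflags", "frag_keyframe+empty_moov"]],
  ["mov", ["-movflags", "frag_keyframe+empty_moov"]],
]);

// Build the ffmpeg argument list for a stdin → stdout conversion.
export function buildPipeArgs(sourceFormat: string, targetFormat: string): string[] {
  const extra = PIPE_OUTPUT_FLAGS.get(targetFormat.toLowerCase()) ?? [];
  return [
    "-hide_banner",
    "-loglevel", "error", // keep stderr short so failures are readable
    "-f", sourceFormat,
    "-i", "pipe:0",
    ...extra,
    "-f", targetFormat,
    "pipe:1",
  ];
}
```

The result would replace the inline array passed to `spawn(ffmpegBin, ...)`. If fragmented output is not acceptable for a target, writing to a temporary file instead of `pipe:1` is the usual alternative.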
### Data Format Conversion (CSV ↔ JSON ↔ XLSX)
```typescript
// Source: xlsx documentation + csv-parse documentation
import * as XLSX from "xlsx";
import { parse as csvParse } from "csv-parse/sync";
// CSV → JSON
function csvToJson(buffer: Buffer): Record<string, string>[] {
return csvParse(buffer, { columns: true, skip_empty_lines: true });
}
// JSON → XLSX
function jsonToXlsx(data: Record<string, unknown>[]): Buffer {
const ws = XLSX.utils.json_to_sheet(data);
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
return Buffer.from(XLSX.write(wb, { type: "buffer", bookType: "xlsx" }));
}
```
### useContentJob Pattern (UI — already exists)
```typescript
// Source: ui/src/hooks/useContentJob.ts (Phase 41)
// Usage in WallpaperGeneratePanel:
const job = useContentJob(companyId);
// Submit
job.submit("wallpaper", { prompt, platformKey: "desktop-hd" });
// Render result when done
if (job.status === "done" && job.bundle) {
const bundle = job.bundle as WallpaperBundle;
// bundle.pngBase64, bundle.dimensions, bundle.platformKey
}
```
### Converter Capabilities in UI
```typescript
// ui/src/hooks/useSystemProviders.ts (new)
// Calls GET /api/system/providers once on mount, caches result
// Returns: { whisperAvailable, piperAvailable, ... }
// Used by ChatInput for the offline badge
// ui/src/hooks/useConverterCapabilities.ts (new)
// Calls GET /api/system/converters once on mount
// Returns: { imageConverter, audioVideoConverter, docConverter, dataConverter }
// Used by ConvertTargetSelector for the AI-fallback notice
```
---
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual MIME detection via extension | Magic-byte detection via file-type | file-type v19+ | Required for CONV-09 — extension can be spoofed |
| Pandoc/LibreOffice for doc conversion | AI-bridge fallback when not available | CONV-08 design | No installer required; works everywhere |
| Separate validate endpoint | Validate at submit time | UI spec v1 | Fewer round trips, simpler client code |
**Deprecated/outdated:**
- `satori` for wallpaper generation: Not installed and not needed. The Phase 41 pattern (LLM SVG + sharp rasterize) is sufficient and consistent with existing code.
- Separate `/api/convert/validate` endpoint: Consolidate validation into the convert submit route.
---
## Open Questions
1. **WallpaperBundle storage format**
- What we know: Other bundles (DiagramBundle, IconSetBundle) store base64-encoded assets in JSON
- What's unclear: For wallpapers at 2560×1440, the PNG can be 5–15MB — base64 encoding adds ~33% → 20MB JSON blob stored in content_jobs.output. MAX_GENERATED_ASSET_BYTES = 500MB so it fits, but row size may be large for Postgres.
- Recommendation: Store the PNG as an asset (same as diagram-renderer stores to storage), and return `WallpaperBundle` with `assetId` + `dimensions` + `platformKey`. The UI downloads via `/api/assets/:id/content`. This avoids storing large base64 in the DB. Follow the same pattern if app-icon returns multiple sizes: store each size as a separate asset or as a multi-size ZIP.
2. **Convert job input size for large files**
- What we know: base64(10MB file) = ~13.3MB JSON in content_jobs.input column
- What's unclear: Whether Postgres/Drizzle has JSONB size limits that would reject this
- Recommendation: Postgres limits a single field value to 1 GB and stores large JSONB values out of line via TOAST, so 13.3MB is fine. Document the 10MB upload cap in the UI.
3. **Social post carousel slide format**
- What we know: SOCIAL-02 says "Instagram carousels and thread sequences"
- What's unclear: Whether thread sequences means Twitter threads (numbered tweets) or just a generic multi-part structure
- Recommendation: Implement as a unified `slides: string[]` field in SocialPostBundle. The `collapsible` sections in SocialPostResult handle both Twitter threads and Instagram carousel displays.
---
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| sharp | CONV-01, WALL-01-04 | ✓ | 0.34.5 | — |
| @resvg/resvg-js | WALL-01-04 | ✓ | 2.6.2 | — |
| ffmpeg-static | CONV-02 | ✓ | 7.0.2 (binary) | — |
| file-type | CONV-09 | ✗ | — | Install: `pnpm add file-type@22.0.0` |
| xlsx | CONV-04 | ✗ | — | Install: `pnpm add xlsx@0.18.5` |
| csv-parse | CONV-04 | ✗ | — | Install: `pnpm add csv-parse@6.2.1` |
| pandoc | CONV-03 | ✗ | — | AI-bridge (CONV-08 design) |
| libreoffice | CONV-03 | ✗ | — | AI-bridge (CONV-08 design) |
| whisper-cpp | VOICE-03 | ✗ | — | openai-whisper CLI fallback; error message if neither |
| whisper (openai) | VOICE-03 | ✗ | — | whisper-cpp fallback |
| satori | Phase goal wording | ✗ | — | Not needed — use LLM SVG + sharp pattern |
**Missing dependencies with no fallback:**
- `file-type`, `xlsx`, `csv-parse`: these MUST be installed in Wave 0. The phase cannot complete CONV-04 (needs `xlsx` and `csv-parse`) or CONV-09 (needs `file-type`) without them.
**Missing dependencies with fallback:**
- `pandoc`, `libreoffice` — document conversion falls through to AI-bridge per CONV-08 design. Planner should add a startup probe that logs "pandoc not found, doc conversion will use AI bridge" rather than failing.
- `whisper-cpp`, `whisper` — existing voice pipeline already handles both missing gracefully with an informative error. VOICE-03 "offline" badge is shown based on `whisperAvailable` from hardware detection.
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | vitest 3.0.5 |
| Server config | server/vitest.config.ts (environment: node) |
| UI config | ui/vitest.config.ts (environment: node, react plugin) |
| Quick run (server) | `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts` |
| Full suite (server) | `cd /opt/nexus/server && npx vitest run` |
| Quick run (UI) | `cd /opt/nexus/ui && npx vitest run src/**/*.test.{ts,tsx}` |
**Note:** Server baseline has 4 pre-existing failing test files (hardware-detection, skill-registry-routes, agent-permissions, heartbeat-workspace-session) — these are NOT caused by Phase 42. Phase 42 tests must not add to this count.
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| WALL-01/02/03 | `renderWallpaper()` returns PNG buffer at correct dimensions per platform key | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| WALL-04 | App icon renderer returns multi-size array | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| SOCIAL-01/02/03 | `renderSocialPost()` returns post text + hashtags; carousel returns slides array | unit | `npx vitest run src/__tests__/42-social-renderer.test.ts` | ❌ Wave 0 |
| CONV-01 | Image conversion round-trip (PNG→JPG) via sharp | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-02 | Audio conversion dispatch calls ffmpeg-static binary | unit (mocked) | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-04 | CSV→JSON and JSON→XLSX conversions | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-05 | Unknown pair falls through to AI bridge | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-08 | converterCapabilitiesService probes pandoc/libreoffice at startup | unit (mocked execFile) | `npx vitest run src/__tests__/42-converter-capabilities.test.ts` | ❌ Wave 0 |
| CONV-09 | MIME mismatch rejected with 422 at convert route | unit (supertest) | `npx vitest run src/__tests__/42-convert-routes.test.ts` | ❌ Wave 0 |
| VOICE-01/02 | VoiceMicButton renders in ChatInput when enableVoiceInput=true | manual (pre-existing wiring) | n/a — already wired in ChatPanel.tsx | ✅ Existing |
| VOICE-03 | Offline badge shows when whisperAvailable=true from /api/system/providers | unit (mocked hook) | `npx vitest run src/**/*.test.tsx` (UI test) | ❌ Wave 0 |
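
As an illustration of the CONV-09 behavior in the table above, the following sketch compares a declared MIME type against the file's leading bytes and produces a 422 result on mismatch. It is dependency-free for readability: the hand-written `MAGIC_BYTES` table and the names `sniffMime` and `checkDeclaredMime` are hypothetical, and the actual route is expected to delegate detection to the `file-type` package.

```typescript
// Hypothetical magic-byte table for illustration only; the real route
// would rely on the `file-type` package for detection.
const MAGIC_BYTES: ReadonlyArray<{ mime: string; signature: number[] }> = [
  { mime: "image/png", signature: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", signature: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", signature: [0x47, 0x49, 0x46, 0x38] },
];

// Returns the MIME type whose signature matches the start of the buffer,
// or undefined when nothing in the table matches.
function sniffMime(bytes: Uint8Array): string | undefined {
  const match = MAGIC_BYTES.find(({ signature }) =>
    signature.every((byte, i) => bytes[i] === byte),
  );
  return match?.mime;
}

type MimeCheck = { ok: true } | { ok: false; status: 422; error: string };

// Accepts the upload only when the sniffed type equals the declared type.
function checkDeclaredMime(declaredMime: string, bytes: Uint8Array): MimeCheck {
  const detected = sniffMime(bytes);
  if (detected === declaredMime) return { ok: true };
  return {
    ok: false,
    status: 422,
    error: `Declared ${declaredMime} but content looks like ${detected ?? "an unknown type"}`,
  };
}
```

A route handler could call `checkDeclaredMime(file.mimetype, file.buffer)` on the uploaded file and respond with the returned status and error when `ok` is false; the `file` field names here assume a multer-style upload object.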
### Sampling Rate
- **Per task commit:** `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts`
- **Per wave merge:** `cd /opt/nexus/server && npx vitest run` (full server suite)
- **Phase gate:** Full server + UI suites green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `server/src/__tests__/42-wallpaper-renderer.test.ts` — covers WALL-01 through WALL-04
- [ ] `server/src/__tests__/42-social-renderer.test.ts` — covers SOCIAL-01 through SOCIAL-03
- [ ] `server/src/__tests__/42-convert-renderer.test.ts` — covers CONV-01, CONV-02, CONV-04, and CONV-05 (the rows mapped to this file in the test map)
- [ ] `server/src/__tests__/42-converter-capabilities.test.ts` — covers CONV-08
- [ ] `server/src/__tests__/42-convert-routes.test.ts` — covers CONV-09 (MIME validation at HTTP layer)
- [ ] UI test for offline badge rendering (VOICE-03)
- [ ] Package install: `pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1` (`xlsx` bundles its own TypeScript definitions, so `@types/xlsx` is not needed)
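
For the wallpaper renderer test file listed above, one way to assert output dimensions without decoding the image is to read the PNG header directly. The helper below is a sketch under that assumption; `readPngSize` is a hypothetical name, and the `result.buffer` field in the usage comment is assumed rather than taken from the actual `RenderResult` interface.

```typescript
// Reads width and height from a PNG's IHDR chunk.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
// then big-endian uint32 width (offset 16) and height (offset 20).
function readPngSize(png: Uint8Array): { width: number; height: number } {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (png.length < 24 || !signature.every((byte, i) => png[i] === byte)) {
    throw new Error("Not a PNG buffer");
  }
  // Node Buffers may share a pooled ArrayBuffer, so honor byteOffset.
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Possible use inside a vitest case (field name `buffer` is an assumption):
//   expect(readPngSize(result.buffer)).toEqual({ width: 1200, height: 630 });
```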
---
## Sources
### Primary (HIGH confidence)
- Codebase direct read: `server/src/services/renderers/icon-renderer.ts` — renderer pattern, sharp usage
- Codebase direct read: `server/src/services/renderers/diagram-renderer.ts` — Playwright + Resvg pattern
- Codebase direct read: `server/src/services/content-job-runner.ts` — job dispatch architecture
- Codebase direct read: `server/src/services/voice-pipeline.ts` — Whisper probe and transcription pattern
- Codebase direct read: `server/src/routes/voice.ts` — multer upload pattern for binary input
- Codebase direct read: `ui/src/hooks/useContentJob.ts` — SSE hook established in Phase 41
- Codebase direct read: `ui/src/components/ChatInput.tsx` — existing VoiceMicButton wiring
- Codebase direct read: `ui/src/hooks/useVoiceMode.ts` — existing voice mode settings pattern
- Codebase direct read: `server/src/services/hardware.ts` — whisperAvailable detection, probe pattern
- Codebase direct read: `server/src/routes/hardware.ts` — GET /api/system/providers returns whisperAvailable
- Codebase direct read: `server/src/app.ts` — route mounting pattern
- Codebase direct read: `server/package.json` — installed deps list
- `npm view file-type version` → 22.0.0 (verified 2026-04-04)
- `npm view xlsx version` → 0.18.5 (verified 2026-04-04)
- `npm view csv-parse version` → 6.2.1 (verified 2026-04-04)
- Binary probe: `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/.../ffmpeg -version` → 7.0.2 (verified working)
### Secondary (MEDIUM confidence)
- `.planning/STATE.md` — accumulated decisions: CONV-05, CONV-08, CONV-09 architectural choices locked
- Phase 41-01-SUMMARY.md — renderer pattern, useContentJob hook, tech stack context
- Phase 40-01-SUMMARY.md — content_jobs schema, RenderResult interface, MAX_GENERATED_ASSET_BYTES
### Tertiary (LOW confidence)
- None — all critical claims verified by codebase inspection or npm registry.
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all packages verified via codebase inspection + npm registry
- Architecture: HIGH — pattern directly derived from Phase 41 implementations in codebase
- Pitfalls: HIGH — most derived from actual code review (ffmpeg-static cast, file-type ESM, etc.)
- Environment availability: HIGH — verified via command execution on target system
**Research date:** 2026-04-04
**Valid until:** 2026-05-04 (packages stable; architecture unlikely to change)