# Phase 42: Wallpapers, Social, Format Conversion & Voice - Research

**Researched:** 2026-04-04
**Domain:** Image generation (sharp/SVG), format conversion (sharp/ffmpeg-static/AI-bridge), social text generation (LLM), voice transcription (Whisper)
**Confidence:** HIGH

---

## User Constraints (from CONTEXT.md)

### Locked Decisions

All implementation choices are at Claude's discretion; the discuss phase was skipped per user setting.

### Claude's Discretion

All implementation choices are at Claude's discretion. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

### Deferred Ideas (OUT OF SCOPE)

None (discuss phase skipped).

---

## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| WALL-01 | User can generate desktop and mobile wallpapers from a description | SVG-via-LLM + sharp rasterize at target dimensions; PLATFORM_DIMENSIONS constants in renderer |
| WALL-02 | User can generate social media banners with correct dimensions per platform | Same renderer; platform map covers OG Image, Twitter Card, Instagram, LinkedIn |
| WALL-03 | User can generate Open Graph and social preview images | Same renderer; OG Image = 1200×630 constant |
| WALL-04 | User can generate app icons and favicons in multiple sizes | Renderer returns multi-size bundle (1024, 512, 256, 64, 32); WallpaperPreview renders grid |
| SOCIAL-01 | User can generate platform-ready posts respecting character limits (Twitter, LinkedIn) | LLM prompt with platform limit injected; character count enforced in the UI from per-platform constants |
| SOCIAL-02 | User can generate Instagram carousels and thread sequences | LLM returns JSON with slides array; carousel rendered as numbered collapsible sections |
| SOCIAL-03 | System suggests relevant hashtags for generated content | LLM prompt requests hashtag suggestions as JSON array alongside post text |
| CONV-01 | User can convert between image formats (PNG, JPG, SVG, WebP, GIF) via sharp | sharp 0.34.5 already installed; supports all listed formats |
| CONV-02 | User can convert between audio/video formats via ffmpeg | ffmpeg-static 5.3.0 (bundles the ffmpeg 7.0.2 binary) already installed and verified working |
| CONV-03 | User can convert between document formats via Pandoc/LibreOffice | pandoc/libreoffice NOT installed → falls to AI-bridge per CONV-08 |
| CONV-04 | User can convert between data formats (CSV, JSON, XLSX) | xlsx + csv-parse packages needed; pure-Node.js conversion |
| CONV-05 | User can convert between any format pair via AI-bridged conversion | puterChatComplete already established; handles unsupported pairs |
| CONV-06 | System provides conversion UI with source/target format selection and drag-drop | Standalone /convert page; ConvertPanel as described in UI spec |
| CONV-07 | User can deep-link to specific conversion flows via URL | /convert/:sourceFormat?/:targetFormat? route in App.tsx; pre-select chips on mount |
| CONV-08 | System detects available direct converters at startup | Startup probe service; GET /api/system/converters endpoint |
| CONV-09 | System validates uploaded file MIME type via magic-byte detection | file-type@22.0.0 (ESM, ships own types); validate at convert route before job dispatch |
| VOICE-01 | User can click mic button in web chat to record and auto-transcribe via Whisper | VoiceMicButton already in ChatInput when enableVoiceInput=true; already wired |
| VOICE-02 | User can toggle between text-only, voice-input, and full-voice modes | VoiceModeToggle already exists; already wired in ChatInput; Phase 42 verifies correctness |
| VOICE-03 | Voice input works offline with local Whisper model | voice-pipeline.ts already probes whisper-cpp → openai-whisper; WHISPER_MODEL env var + offline badge |

---

## Summary

Phase 42 extends the Phase 41 content generation system with four new capabilities: platform-aware image generation (wallpapers, OG images, social banners, app icons), LLM-driven social post generation with hashtag suggestions, a
full-featured file format conversion pipeline, and offline voice input via Whisper.

The server already has all critical dependencies for images (sharp@0.34.5, @resvg/resvg-js@2.6.2) and audio/video (ffmpeg-static@5.3.0, which bundles the ffmpeg 7.0.2 binary; verified working at /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0). Three packages need to be added: `file-type@22.0.0` (magic-byte MIME detection), `xlsx@0.18.5` (XLSX data conversion), and `csv-parse@6.2.1` (CSV parsing). Document conversion (pandoc/libreoffice) is not available on this system and will fall through to AI-bridge per CONV-08, so no installation is needed.

The voice pipeline (`voice-pipeline.ts`) already handles Whisper probe and transcription. Phase 42's voice work is: (1) add `WHISPER_MODEL=local` env var support to signal offline capability (optional; Pitfall 7 treats this as a future extension point), (2) expose whisper availability to the UI via the existing `/api/system/providers` endpoint (already returns `whisperAvailable`), (3) render the "Offline" badge in `ChatInput` alongside `VoiceMicButton`. The VoiceMicButton, VoiceModeToggle, and `enableVoiceInput=true` wiring already exist in `ChatPanel.tsx`.

**Primary recommendation:** Follow the established Phase 41 renderer pattern: add three new `jobType` cases to `content-job-runner.ts` (`wallpaper`, `social-post`, `convert`), create one renderer file per job type in `server/src/services/renderers/`, and wire three new ContentStudio tabs plus one standalone `/convert` page in the UI.
---

## Standard Stack

### Core (all verified installed in server)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| sharp | 0.34.5 | Image format conversion + SVG rasterization at target dimensions | Already installed; used by icon-renderer and org-chart-svg |
| @resvg/resvg-js | 2.6.2 | High-fidelity SVG→PNG rasterization with fitTo dimensions | Already installed; used by diagram-renderer |
| ffmpeg-static | 5.3.0 (bin: 7.0.2) | Bundled ffmpeg binary for audio/video conversion | Already installed; used by voice-pipeline and telegram |
| culori | 4.0.2 | OKLCH color math (not directly needed but available) | Already installed |
| puterChatComplete | (internal) | LLM inference for wallpaper SVG generation, social posts, AI-bridge conversion | Established pattern in Phase 41 renderers |

### New Dependencies (needs `pnpm add` in server)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| file-type | 22.0.0 | Magic-byte MIME type detection for CONV-09 | ESM-native, ships own types, well-maintained |
| xlsx | 0.18.5 | XLSX read/write for data conversion CONV-04 | Most-used Excel library for Node.js |
| csv-parse | 6.2.1 | CSV parsing for data conversion CONV-04 | De-facto standard, streaming API |
| @types/xlsx | 0.0.36 | TypeScript types for xlsx | Optional: xlsx ships its own `types/index.d.ts`, so this package is likely unnecessary |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| file-type@22 (ESM) | mmmagic or mime-magic | file-type is pure JS, no native binding, ships own types; server is type:module so ESM is fine |
| xlsx | exceljs | xlsx has a simpler API for read/write; exceljs has streaming but is more complex |
| sharp for SVG rasterization | Playwright (like diagram-renderer) | sharp+resvg is faster for simple SVG → PNG; Playwright is only needed for JavaScript-rendered content |

**Installation:**

```bash
# Run from /opt/nexus/server
pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1
pnpm add -D @types/xlsx@0.0.36
```

**Version verification (run before installing):**

```bash
npm view file-type version   # → 22.0.0
npm view xlsx version        # → 0.18.5
npm view csv-parse version   # → 6.2.1
```

---

## Architecture Patterns

### Established Renderer Pattern (from Phase 41)

Every new capability follows this exact structure:

1. **Renderer file:** `server/src/services/renderers/{name}-renderer.ts` exports `async function render{Name}(input: Record<string, unknown>): Promise<RenderResult>`
2. **Job runner switch:** Add `case '{jobtype}':` to `renderContent()` in `content-job-runner.ts`
3. **Bundle type (if needed):** Add `interface {Name}Bundle` to `types.ts`
4. **API route:** Submit via existing `POST /api/companies/:id/content-jobs` with `{ jobType, input }`
5. **UI hook:** `useContentJob(companyId)` already handles all SSE + state management
6. **UI component:** Panel reads `job.bundle` after `status === 'done'`

The format conversion job is the only exception: it requires a separate multipart upload route because the file binary cannot be passed as JSON input via the standard content-jobs endpoint.
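The renderer-plus-switch structure in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the project's code: the `RenderResult` shape, the stub renderers, and the `resolveRenderer` helper are assumptions made so the example is self-contained. The real types live in `renderers/types.ts` and may differ.

```typescript
// Hypothetical shape; the real RenderResult is defined in renderers/types.ts.
interface RenderResult {
  bundle: Record<string, unknown>;
}

type Renderer = (input: Record<string, unknown>) => Promise<RenderResult>;

// Stub renderers standing in for wallpaper-renderer.ts, social-renderer.ts
// and convert-renderer.ts. Each one only echoes a tag so the dispatch is visible.
const renderWallpaper: Renderer = async (input) => ({
  bundle: { kind: "wallpaper", platformKey: input.platformKey },
});
const renderSocialPost: Renderer = async (input) => ({
  bundle: { kind: "social-post", platform: input.platform },
});
const renderConvert: Renderer = async (input) => ({
  bundle: { kind: "convert", targetFormat: input.targetFormat },
});

// Mirrors step 2: one `case` per jobType. Unknown job types fail loudly
// instead of silently producing an empty bundle.
function resolveRenderer(jobType: string): Renderer {
  switch (jobType) {
    case "wallpaper":
      return renderWallpaper;
    case "social-post":
      return renderSocialPost;
    case "convert":
      return renderConvert;
    default:
      throw new Error(`Unknown jobType: ${jobType}`);
  }
}

// The job runner would await this and store the bundle on the job row.
async function renderContent(
  jobType: string,
  input: Record<string, unknown>,
): Promise<RenderResult> {
  return resolveRenderer(jobType)(input);
}
```

Keeping the lookup separate from the async call makes the dispatch table easy to unit-test without mocking any renderer.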
### Recommended Project Structure (new files)

```
server/src/
├── services/renderers/
│   ├── types.ts                    # ADD: WallpaperBundle, SocialPostBundle, ConvertBundle
│   ├── wallpaper-renderer.ts       # NEW
│   ├── social-renderer.ts          # NEW
│   └── convert-renderer.ts         # NEW
├── services/
│   └── converter-capabilities.ts   # NEW: startup probe + cache
└── routes/
    └── convert.ts                  # NEW: POST /api/companies/:id/convert (multipart)
                                    #      GET /api/system/converters

ui/src/
├── pages/
│   └── ConvertPage.tsx             # NEW: standalone /convert page
├── components/
│   ├── WallpaperGeneratePanel.tsx  # NEW
│   ├── WallpaperPreview.tsx        # NEW
│   ├── SocialPostPanel.tsx         # NEW
│   ├── SocialPostResult.tsx        # NEW
│   └── ConvertPanel.tsx            # NEW (contains ConvertSourceZone + ConvertTargetSelector + ConvertActionBar)
└── api/
    └── convert.ts                  # NEW: submitConvertJob (multipart), getConverterCapabilities
```

### Pattern 1: Wallpaper Generation (WALL-01 to WALL-04)

**What:** LLM generates an SVG at a conceptual level, then sharp rasterizes it to exact pixel dimensions for the requested platform.

**When to use:** Any fixed-dimension image asset (wallpaper, OG image, social banner, app icon).
```typescript
// Source: established pattern from icon-renderer.ts + sharp resize
// server/src/services/renderers/wallpaper-renderer.ts

export const PLATFORM_DIMENSIONS: Record<string, { width: number; height: number; label: string }> = {
  "desktop-hd":       { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "desktop-fhd":      { width: 1920, height: 1080, label: "Desktop FHD (1920 × 1080)" },
  "desktop-4k":       { width: 3840, height: 2160, label: "Desktop 4K (3840 × 2160)" },
  "mobile-portrait":  { width: 1080, height: 1920, label: "Mobile Portrait (1080 × 1920)" },
  "mobile-landscape": { width: 1920, height: 1080, label: "Mobile Landscape (1920 × 1080)" },
  "og-image":         { width: 1200, height: 630,  label: "OG Image (1200 × 630)" },
  "twitter-card":     { width: 1200, height: 628,  label: "Twitter Card (1200 × 628)" },
  "instagram-post":   { width: 1080, height: 1080, label: "Instagram Post (1080 × 1080)" },
  "instagram-banner": { width: 1080, height: 566,  label: "Instagram Banner (1080 × 566)" },
  "linkedin-banner":  { width: 1584, height: 396,  label: "LinkedIn Banner (1584 × 396)" },
  "app-icon":         { width: 1024, height: 1024, label: "App Icon (1024 × 1024)" },
  "favicon":          { width: 32,   height: 32,   label: "Favicon (32 × 32)" },
};

// App icon + favicon: render multiple sizes from one SVG
const APP_ICON_SIZES = [1024, 512, 256, 64, 32] as const;

// Render flow:
// 1. puterChatComplete → SVG string (LLM generates SVG matching aspect ratio)
// 2. sharp(svgBuffer).resize(width, height, { fit: 'fill' }).png() → PNG buffer
// 3. Return WallpaperBundle with pngBase64 + dimensions
```

**Critical constraint:** Platform dimensions MUST be constants, never magic numbers (success criterion 1). Export `PLATFORM_DIMENSIONS` from the renderer and re-export it to the UI API client so the UI's Select options derive from the same source.
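To illustrate how a renderer might consume these constants, here is a minimal sketch that resolves which pixel sizes to rasterize for a platform key and builds a `viewBox` hint for the LLM prompt. The helper names `resolveTargetSizes` and `viewBoxFor` are hypothetical, and only a subset of the platform map is repeated so the block stands alone.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Subset of PLATFORM_DIMENSIONS from the renderer (labels omitted for brevity).
const PLATFORM_DIMENSIONS: Record<string, Dimensions> = {
  "desktop-hd": { width: 2560, height: 1440 },
  "og-image": { width: 1200, height: 630 },
  "app-icon": { width: 1024, height: 1024 },
  "favicon": { width: 32, height: 32 },
};

const APP_ICON_SIZES = [1024, 512, 256, 64, 32] as const;

// Hypothetical helper: returns every size that should be rasterized.
// App icons and favicons fan out to the full multi-size set; all other
// platforms produce exactly one image at the platform's dimensions.
function resolveTargetSizes(platformKey: string): Dimensions[] {
  const base = PLATFORM_DIMENSIONS[platformKey];
  if (!base) {
    throw new Error(`Unknown platform key: ${platformKey}`);
  }
  if (platformKey === "app-icon" || platformKey === "favicon") {
    return APP_ICON_SIZES.map((size) => ({ width: size, height: size }));
  }
  return [base];
}

// Hypothetical helper: a viewBox string to inject into the LLM prompt so the
// generated SVG already matches the target aspect ratio.
function viewBoxFor(platformKey: string): string {
  const base = PLATFORM_DIMENSIONS[platformKey];
  if (!base) {
    throw new Error(`Unknown platform key: ${platformKey}`);
  }
  return `0 0 ${base.width} ${base.height}`;
}
```

Because both helpers read from the same map, adding a platform means adding one entry and nothing else.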
### Pattern 2: Format Conversion Architecture (CONV-01 to CONV-09)

**What:** A multipart upload endpoint validates the MIME type, stores the file as base64 in the job input, and dispatches to the converter renderer, which routes to sharp/ffmpeg/xlsx/AI-bridge based on the format pair.

**Why separate route:** Content-jobs POST accepts JSON; file binary needs multipart handling.

```typescript
// server/src/routes/convert.ts (new multipart route)
// POST /api/companies/:companyId/convert
import multer from "multer";
import { fileTypeFromBuffer } from "file-type";

router.post("/companies/:companyId/convert", async (req, res) => {
  // 1. multer.memoryStorage() upload (limit: MAX_ATTACHMENT_BYTES)
  // 2. fileTypeFromBuffer(file.buffer) → detected MIME
  // 3. Compare detected MIME against file extension claim
  // 4. If mismatch: res.status(422).json({ error: "...", actualMime, claimedMime })
  // 5. job input: { fileBase64: buffer.toString('base64'), sourceMime, targetFormat, originalFilename }
  // 6. contentJobStore.create + contentJobRunner.dispatch
  // 7. res.status(202).json({ jobId, status })
});

// GET /api/system/converters: capability map for the UI
router.get("/system/converters", async (_req, res) => {
  const caps = await converterCapabilitiesService().get();
  res.json(caps);
  // Returns: { imageConverter: true, audioVideoConverter: true, docConverter: false, dataConverter: true }
});
```

```typescript
// server/src/services/renderers/convert-renderer.ts
async function renderConvert(input: Record<string, unknown>): Promise<RenderResult> {
  const { fileBase64, sourceMime, targetFormat } = input;
  const fileBuffer = Buffer.from(fileBase64 as string, "base64");

  // Route by format category:
  if (isImageFormat(sourceMime) && isImageFormat(targetFormat)) {
    return convertImageViaSharp(fileBuffer, sourceMime, targetFormat);
  }
  if (isAudioVideoFormat(sourceMime) || isAudioVideoFormat(targetFormat)) {
    return convertAVViaFfmpeg(fileBuffer, sourceMime, targetFormat);
  }
  if (isDataFormat(sourceMime) || isDataFormat(targetFormat)) {
    return convertDataFormat(fileBuffer, sourceMime, targetFormat);
  }
  // All other pairs: AI bridge (CONV-05)
  return convertViaAiBridge(fileBuffer, sourceMime, targetFormat);
}
```

### Pattern 3: Converter Capability Probe (CONV-08)

```typescript
// server/src/services/converter-capabilities.ts
// Probe at startup, cache result (same pattern as hardwareService)
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

let cache: ConverterCapabilities | null = null;

export interface ConverterCapabilities {
  imageConverter: boolean;      // sharp: always true (npm dep)
  audioVideoConverter: boolean; // ffmpeg-static: always true (npm dep)
  docConverter: boolean;        // pandoc or libreoffice: probe at startup
  dataConverter: boolean;       // xlsx + csv-parse: always true (npm dep)
}

export function converterCapabilitiesService() {
  async function get(): Promise<ConverterCapabilities> {
    if (cache) return cache;

    let docConverter = false;
    try {
      await execFileAsync("pandoc", ["--version"], { timeout: 2000 });
      docConverter = true;
    } catch {
      try {
        await execFileAsync("libreoffice", ["--version"], { timeout: 2000 });
        docConverter = true;
      } catch {
        /* not available */
      }
    }

    cache = { imageConverter: true, audioVideoConverter: true, docConverter, dataConverter: true };
    return cache;
  }

  return { get };
}
```

### Pattern 4: Social Post Generation (SOCIAL-01 to SOCIAL-03)

```typescript
// server/src/services/renderers/social-renderer.ts
export const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

// LLM prompt asks for JSON: { post: string, hashtags: string[], slides?: string[] }
// For carousel: slides array, each under 300 chars
// puterChatComplete returns JSON; renderer parses + validates
```

### Pattern 5: Voice Offline Badge (VOICE-03)

The voice pipeline already handles Whisper detection. Phase 42 adds two things:

1. **Server:** Read the `WHISPER_MODEL` env var in `voice-pipeline.ts`. When it is set to `"local"`, surface that value either in the nexus-settings response or via `GET /api/system/providers` (which already returns `whisperAvailable` from `hardwareService().detect()`).
2. **UI:** In `ChatInput.tsx`, read `whisperAvailable` from a `useSystemProviders()` hook. Show an "Offline" badge next to `VoiceMicButton` when `whisperAvailable === true`.

**IMPORTANT:** The existing `GET /api/system/providers` already returns `{ whisperAvailable: boolean, piperAvailable: boolean, ... }`, so no new endpoint is needed. Create a `useSystemProviders()` hook that calls this endpoint once on mount.
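Returning to Pattern 4, the "renderer parses + validates" step could look roughly like the sketch below. It assumes the LLM was asked for `{ post, hashtags, slides? }` and may wrap the JSON in a Markdown fence. `extractJson` and `parseSocialPost` are hypothetical names, and `String.length` is used as a simple stand-in for each platform's real character-counting rules.

```typescript
const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

interface SocialPost {
  post: string;
  hashtags: string[];
  slides?: string[];
}

// Built at runtime so this example contains no literal Markdown fence.
const FENCE = "`".repeat(3);
const FENCED_JSON = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE);

// Prefer the contents of a fenced block, then the outermost {...} span,
// then the raw text. JSON.parse throws if none of these is valid JSON.
function extractJson(raw: string): unknown {
  const fenced = raw.match(FENCED_JSON);
  const braces = raw.match(/\{[\s\S]*\}/);
  const candidate = fenced ? fenced[1] : braces ? braces[0] : raw;
  return JSON.parse(candidate);
}

function parseSocialPost(raw: string, platform: string): SocialPost {
  const limit = PLATFORM_CHAR_LIMITS[platform];
  if (limit === undefined) {
    throw new Error(`Unknown platform: ${platform}`);
  }
  const data = extractJson(raw) as Partial<SocialPost>;
  if (typeof data.post !== "string" || !Array.isArray(data.hashtags)) {
    throw new Error("LLM output is missing 'post' or 'hashtags'");
  }
  const slides = Array.isArray(data.slides) ? data.slides : undefined;
  // Carousels are limited per slide; every other platform is limited per post.
  const parts = platform === "instagram-carousel" ? (slides ?? []) : [data.post];
  for (const part of parts) {
    if (part.length > limit) {
      throw new Error(`Text exceeds the ${limit}-character limit for ${platform}`);
    }
  }
  return { post: data.post, hashtags: data.hashtags, slides };
}
```

Rejecting over-limit output in the renderer lets the job fail with a clear message instead of handing the UI text it cannot publish.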
### Pattern 6: ContentStudio Tab Extension + Standalone Convert Page

```typescript
// ui/src/pages/ContentStudio.tsx: extend TabsList
// Add three TabsTriggers: "Wallpapers", "Social", "Convert"
// "Convert" tab value triggers navigate() to /convert (standalone page)
// TabsContent for wallpapers and social are normal panel components
// TabsContent for convert is NOT a content panel; the tab click navigates away

// ui/src/App.tsx: add new routes in boardRoutes()
// Register the standalone convert page, including the optional
// :sourceFormat and :targetFormat segments (CONV-07), with ConvertPage as the element.
```

### Anti-Patterns to Avoid

- **Magic number dimensions:** Never hardcode `2560` or `1440` in component code. Always read from the `PLATFORM_DIMENSIONS` constant exported from the renderer or a shared types file.
- **Passing file buffer as base64 in SSE-triggered jobs with >10MB files:** The 10MB multer limit prevents oversized uploads; document this clearly in the convert route.
- **Blocking HTTP on render:** All conversion is dispatched fire-and-forget via `contentJobRunner.dispatch()`. The POST /convert route returns 202 immediately.
- **Showing format pairs as "unavailable":** Per CONV-08, all format pairs are selectable in the UI. Unavailable direct converters show the AI fallback notice, never a disabled/grey chip.
- **Creating a separate `/api/convert/validate` endpoint:** Validate at job submit time in the convert route (simpler, fewer round trips). The UI spec notes this as an OR condition.
- **Satori for wallpaper generation:** Satori is NOT installed. Use the established pattern: LLM generates SVG → sharp rasterizes to exact dimensions. Satori would require JSX rendering infrastructure that is not needed here.
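For the deep-link route in Pattern 6, the pre-selection logic on mount could be sketched as a small normalizing lookup. The format list, the `jpeg` → `jpg` alias, and the name `resolveFormatParam` are assumptions for illustration; the real chip identifiers come from the UI spec.

```typescript
// Assumed chip identifiers; the actual list is defined by the convert UI.
const FORMAT_IDS = ["png", "jpg", "svg", "webp", "gif", "csv", "json", "xlsx"] as const;
type FormatId = (typeof FORMAT_IDS)[number];

// Normalizes an optional URL segment to a known chip identifier.
// Returns null when the segment is absent or not a recognized format,
// in which case the page simply starts with nothing pre-selected.
function resolveFormatParam(param: string | undefined): FormatId | null {
  if (!param) return null;
  const normalized = param.trim().toLowerCase();
  // Assumption: treat the common "jpeg" spelling as the "jpg" chip.
  const canonical = normalized === "jpeg" ? "jpg" : normalized;
  return (FORMAT_IDS as readonly string[]).includes(canonical)
    ? (canonical as FormatId)
    : null;
}
```

With this in place, `/convert/PNG/SVG` and `/convert/png/svg` select the same chips, and an unknown segment degrades to an empty selection rather than an error page.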
---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| MIME type detection from file bytes | Custom magic-byte reader | `file-type@22.0.0` | Detects a wide range of binary formats, handles edge cases like truncated files, offers a streaming API |
| XLSX read/write | Custom parser | `xlsx@0.18.5` | XLSX is a zipped OOXML package with many parts; hand-rolling is weeks of work |
| CSV parsing | String.split() | `csv-parse@6.2.1` | Handles quoted fields, escaped commas, multiline values, BOM |
| Image format conversion | Native buffer manipulation | `sharp@0.34.5` | Already installed; handles color spaces, ICC profiles, transparency |
| Audio/video conversion | Custom codec wrappers | `ffmpeg-static@5.3.0` (ffmpeg 7.0.2 binary) | Already installed; handles all codec negotiation |
| SVG rasterization | canvas/Playwright | `@resvg/resvg-js@2.6.2` | Already installed; faster than Playwright for static SVG |
| LLM inference | New HTTP client | `puterChatComplete()` | Already implemented in Phase 41; puter-inference.ts is the project standard |

**Key insight:** All heavy-lifting tools are already installed. Phase 42 is primarily wiring (new renderers + routes + UI panels) rather than infrastructure.

---

## Common Pitfalls

### Pitfall 1: Sharp SVG Input at Large Dimensions

**What goes wrong:** `sharp(svgBuffer).resize(2560, 1440)` produces a blurry image when the SVG has a small implicit pixel density.

**Why it happens:** Sharp defaults to 72 DPI for SVG input, so a small SVG is rasterized at low resolution first and the resize step then upscales that raster.

**How to avoid:** Always pass the `{ density: 300 }` option when loading SVG into sharp: `sharp(svgBuffer, { density: 300 }).resize(width, height, { fit: 'fill' }).png()`. Alternatively, ask the LLM to generate an SVG with `viewBox="0 0 {width} {height}"` matching the target dimensions, then use Resvg with `fitTo: { mode: 'width', value: width }`.

**Warning signs:** Generated wallpapers look pixelated or blurry at edges.
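A fixed `density: 300` is a reasonable default, but the value can also be derived from the sizes involved. The sketch below assumes that sharp rasterizes an SVG at `density / 72` times its nominal pixel width, which matches its documented 72 DPI default. The helper name `densityForTarget` is hypothetical and the bounds are taken from sharp's documented density range.

```typescript
const SHARP_DEFAULT_DENSITY = 72;  // sharp's documented default DPI for vector input
const SHARP_MAX_DENSITY = 100000;  // upper bound of sharp's accepted density range

// Returns a density at which an SVG of nominal width `svgWidth` rasterizes
// to at least `targetWidth` pixels, so the later resize never has to upscale.
// Never goes below the default, since downscaling a sharper raster is harmless.
function densityForTarget(svgWidth: number, targetWidth: number): number {
  if (svgWidth <= 0 || targetWidth <= 0) {
    throw new RangeError("svgWidth and targetWidth must be positive");
  }
  const needed = Math.ceil((SHARP_DEFAULT_DENSITY * targetWidth) / svgWidth);
  return Math.min(SHARP_MAX_DENSITY, Math.max(SHARP_DEFAULT_DENSITY, needed));
}
```

For example, a 512-pixel-wide SVG targeting a 2560-pixel wallpaper would need a density of 360 under this assumption, which is higher than the fixed 300 suggested above.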
### Pitfall 2: file-type v22 Import Syntax

**What goes wrong:** `import FileType from 'file-type'` fails with "does not provide an export named 'default'".

**Why it happens:** file-type v22 is pure ESM with named exports only.

**How to avoid:** Use the named import: `import { fileTypeFromBuffer } from 'file-type'`. The server is `type: module` with `module: NodeNext`, so ESM imports work directly.

**Warning signs:** TypeScript error TS2613 or runtime "is not a function" errors.

### Pitfall 3: ffmpeg-static Path Resolution

**What goes wrong:** `spawn(ffmpegPath, ...)` throws ENOENT even though ffmpeg-static is installed.

**Why it happens:** `ffmpegPath` from `import ffmpegPath from 'ffmpeg-static'` is the binary path string, but its TypeScript type does not match `string`, so it needs an `as unknown as string` cast. The actual binary is at `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/node_modules/ffmpeg-static/ffmpeg`.

**How to avoid:** Copy the existing pattern from `voice-pipeline.ts` exactly: `if (!ffmpegPath) throw new Error("ffmpeg-static binary not found"); const ffmpegBin = ffmpegPath as unknown as string;`.

**Warning signs:** `ffmpegBin` is null/undefined; ENOENT on spawn.

### Pitfall 4: Content-Job Input Size for Conversion

**What goes wrong:** Submitting a 10MB file (MAX_ATTACHMENT_BYTES) as base64 in job input stores ~13.3MB in the `content_jobs.input` JSONB column per submission.

**Why it happens:** base64 encodes every 3 bytes as 4 characters, which adds ~33% overhead.

**How to avoid:** This is acceptable for the single-user case (success criteria assume one conversion at a time). Document the max file size clearly in the UI (the multer limit enforces it). If this becomes a problem in future, change the renderer to accept storage object keys (requires extending the content-job-runner signature).

**Warning signs:** Postgres table growth visible in db metrics after many conversions.
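The ~13.3MB figure in Pitfall 4 follows directly from how base64 pads its output. A minimal worked calculation, assuming `MAX_ATTACHMENT_BYTES` is 10 MiB (the exact constant lives in the codebase and may be defined differently):

```typescript
// Standard padded base64 emits 4 output characters for every 3 input bytes,
// rounding the final partial group up to a full 4 characters.
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Assumption: the upload cap is 10 MiB.
const MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024;

const encodedChars = base64EncodedLength(MAX_ATTACHMENT_BYTES);
const overheadRatio = encodedChars / MAX_ATTACHMENT_BYTES;

console.log(encodedChars);             // 13981016 characters in the JSON string
console.log(overheadRatio.toFixed(3)); // "1.333"
```

The same helper can be used to size a pre-flight check in the route, so the job input is rejected before a row that large is ever written.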
### Pitfall 5: Social Carousel JSON Parsing from LLM

**What goes wrong:** LLM returns markdown-fenced JSON or adds explanation text, causing `JSON.parse()` to throw.

**Why it happens:** LLMs sometimes wrap JSON in ```` ```json ... ``` ```` fences.

**How to avoid:** Post-process LLM output to strip markdown fences before `JSON.parse()`. Use a robust extraction pattern: ````const match = raw.match(/```json\s*([\s\S]*?)\s*```/) || raw.match(/({[\s\S]*})/); JSON.parse(match ? match[1] : raw)````. Apply the same fix pattern used by icon-renderer.ts SVG validation.

**Warning signs:** SocialPostResult shows "Generation failed" after seemingly valid LLM output.

### Pitfall 6: Deep-Link Route Parameter Case

**What goes wrong:** `/convert/PNG/SVG` doesn't pre-select chips because the component does a case-sensitive compare against format names.

**Why it happens:** URL params are case-sensitive; format chips may be stored as uppercase.

**How to avoid:** Normalize URL params to lowercase on read: `params.sourceFormat?.toLowerCase()`. Match against chip identifiers using `formatId.toLowerCase() === param.toLowerCase()`.

**Warning signs:** Deep-link URL works in one casing but not when the user types a different one.

### Pitfall 7: Voice Offline Badge Always Showing

**What goes wrong:** The "Offline" badge shows even when whisper is not installed (`whisperAvailable: false`).

**Why it happens:** Misreading the UI spec: the badge shows when `whisperAvailable === true` (local model detected), not when the `WHISPER_MODEL=local` env var is set (the naming is confusing).

**How to avoid:** Read `whisperAvailable` from `GET /api/system/providers`. Show the badge if `whisperAvailable === true`. The "offline capability" is proven by the binary being detected, not by an env var. The `WHISPER_MODEL` env var mentioned in the UI spec is a future extension point for model selection; do not implement it unless the spec explicitly requires it.
Per VOICE-03, "works offline with locally cached model" means the whisper-cpp binary + base model are present.

**Warning signs:** Badge shows on machines where whisper is not installed.

---

## Code Examples

### Wallpaper Renderer: Sharp at Target Dimensions

```typescript
// Source: icon-renderer.ts pattern + sharp resize extension
// server/src/services/renderers/wallpaper-renderer.ts
import sharp from "sharp";
import { puterChatComplete } from "../puter-inference.js";
import type { RenderResult } from "./types.js";

async function renderSvgToWallpaper(svgString: string, width: number, height: number): Promise<Buffer> {
  return sharp(Buffer.from(svgString), { density: 300 })
    .resize(width, height, { fit: "fill" })
    .png({ compressionLevel: 9 })
    .toBuffer();
}
```

### Magic-Byte MIME Validation

```typescript
// Source: file-type@22 documentation (ESM named import)
import { fileTypeFromBuffer } from "file-type";

async function validateMime(
  buffer: Buffer,
  claimedExtension: string,
): Promise<{ ok: boolean; actualMime?: string; claimedMime?: string }> {
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected) return { ok: true }; // unknown type, allow (SVG/text files have no magic bytes)

  const mimeForExtension = extensionToMime(claimedExtension); // lookup table
  if (mimeForExtension && detected.mime !== mimeForExtension) {
    return { ok: false, actualMime: detected.mime, claimedMime: mimeForExtension };
  }
  return { ok: true };
}
```

### ffmpeg-static Conversion (audio/video)

```typescript
// Source: voice-pipeline.ts pattern (established in Phase 36)
import ffmpegPath from "ffmpeg-static";
import { spawn } from "node:child_process";

if (!ffmpegPath) throw new Error("ffmpeg-static binary not found");
const ffmpegBin = ffmpegPath as unknown as string;

// Note: some containers (e.g. MP4) expect seekable output and may fail when
// written to pipe:1 without extra muxer flags; test each target format.
function convertAVViaFfmpeg(inputBuffer: Buffer, sourceFormat: string, targetFormat: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn(ffmpegBin, [
      "-f", sourceFormat,
      "-i", "pipe:0",
      "-f", targetFormat,
      "pipe:1",
    ], { stdio: ["pipe", "pipe", "pipe"] });

    const chunks: Buffer[] = [];
    ffmpeg.stdout.on("data", (c: Buffer) => chunks.push(c));
    ffmpeg.stderr.on("data", () => {}); // discard
    ffmpeg.on("close", (code) =>
      code === 0 ? resolve(Buffer.concat(chunks)) : reject(new Error(`ffmpeg exited ${code}`)));
    ffmpeg.on("error", reject);

    ffmpeg.stdin.write(inputBuffer);
    ffmpeg.stdin.end();
  });
}
```

### Data Format Conversion (CSV ↔ JSON ↔ XLSX)

```typescript
// Source: xlsx documentation + csv-parse documentation
import * as XLSX from "xlsx";
import { parse as csvParse } from "csv-parse/sync";

// CSV → JSON
function csvToJson(buffer: Buffer): Record<string, string>[] {
  return csvParse(buffer, { columns: true, skip_empty_lines: true });
}

// JSON → XLSX
function jsonToXlsx(data: Record<string, unknown>[]): Buffer {
  const ws = XLSX.utils.json_to_sheet(data);
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
  return Buffer.from(XLSX.write(wb, { type: "buffer", bookType: "xlsx" }));
}
```

### useContentJob Pattern (UI, already exists)

```typescript
// Source: ui/src/hooks/useContentJob.ts (Phase 41)
// Usage in WallpaperGeneratePanel:
const job = useContentJob(companyId);

// Submit
job.submit("wallpaper", { prompt, platformKey: "desktop-hd" });

// Render result when done
if (job.status === "done" && job.bundle) {
  const bundle = job.bundle as WallpaperBundle;
  // bundle.pngBase64, bundle.dimensions, bundle.platformKey
}
```

### Converter Capabilities in UI

```typescript
// ui/src/hooks/useSystemProviders.ts (new)
// Calls GET /api/system/providers once on mount, caches result
// Returns: { whisperAvailable, piperAvailable, ... }
// Used by ChatInput for offline badge, by ConvertTargetSelector for AI-fallback notice

// ui/src/hooks/useConverterCapabilities.ts (new)
// Calls GET /api/system/converters once on mount
// Returns: { imageConverter, audioVideoConverter, docConverter, dataConverter }
```

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual MIME detection via extension | Magic-byte detection via file-type | file-type v19+ | Required for CONV-09: the extension can be spoofed |
| Pandoc/LibreOffice for doc conversion | AI-bridge fallback when not available | CONV-08 design | No installer required; works everywhere |
| Separate validate endpoint | Validate at submit time | UI spec v1 | Fewer round trips, simpler client code |

**Deprecated/outdated:**

- `satori` for wallpaper generation: Not installed and not needed. The Phase 41 pattern (LLM SVG + sharp rasterize) is sufficient and consistent with existing code.
- Separate `/api/convert/validate` endpoint: Consolidate validation into the convert submit route.

---

## Open Questions

1. **WallpaperBundle storage format**
   - What we know: Other bundles (DiagramBundle, IconSetBundle) store base64-encoded assets in JSON.
   - What's unclear: For wallpapers at 2560×1440, the PNG can be 5–15MB, and base64 encoding adds ~33%, giving a JSON blob of up to ~20MB stored in content_jobs.output. MAX_GENERATED_ASSET_BYTES = 500MB so it fits, but the row may be large for Postgres.
   - Recommendation: Store the PNG as an asset (same as diagram-renderer stores to storage), and return `WallpaperBundle` with `assetId` + `dimensions` + `platformKey`. The UI downloads via `/api/assets/:id/content`. This avoids storing large base64 in the DB. Follow the same pattern if app-icon returns multiple sizes: store each size as a separate asset or as a multi-size ZIP.

2. **Convert job input size for large files**
   - What we know: base64(10MB file) = ~13.3MB JSON in the content_jobs.input column.
   - What's unclear: Whether Postgres/Drizzle has JSONB size limits that would reject this.
   - Recommendation: 13.3MB is well below Postgres's per-field limits (a single field value is capped at 1GB, and jsonb containers at roughly 256MB), so it is fine. Document the 10MB upload cap in the UI.

3. **Social post carousel slide format**
   - What we know: SOCIAL-02 says "Instagram carousels and thread sequences".
   - What's unclear: Whether thread sequences means Twitter threads (numbered tweets) or just a generic multi-part structure.
   - Recommendation: Implement as a unified `slides: string[]` field in SocialPostBundle. The collapsible sections in SocialPostResult handle both Twitter threads and Instagram carousel displays.

---

## Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| sharp | CONV-01, WALL-01-04 | ✓ | 0.34.5 | n/a |
| @resvg/resvg-js | WALL-01-04 | ✓ | 2.6.2 | n/a |
| ffmpeg-static | CONV-02 | ✓ | 5.3.0 (binary: 7.0.2) | n/a |
| file-type | CONV-09 | ✗ | n/a | Install: `pnpm add file-type@22.0.0` |
| xlsx | CONV-04 | ✗ | n/a | Install: `pnpm add xlsx@0.18.5` |
| csv-parse | CONV-04 | ✗ | n/a | Install: `pnpm add csv-parse@6.2.1` |
| pandoc | CONV-03 | ✗ | n/a | AI-bridge (CONV-08 design) |
| libreoffice | CONV-03 | ✗ | n/a | AI-bridge (CONV-08 design) |
| whisper-cpp | VOICE-03 | ✗ | n/a | openai-whisper CLI fallback; error message if neither |
| whisper (openai) | VOICE-03 | ✗ | n/a | whisper-cpp fallback |
| satori | Phase goal wording | ✗ | n/a | Not needed; use LLM SVG + sharp pattern |

**Missing dependencies with no fallback:**

- `file-type`, `xlsx`, `csv-parse`: these MUST be installed in Wave 0. Phase cannot complete CONV-01/CONV-04/CONV-09 without them.

**Missing dependencies with fallback:**

- `pandoc`, `libreoffice`: document conversion falls through to AI-bridge per CONV-08 design.
  Planner should add a startup probe that logs "pandoc not found, doc conversion will use AI bridge" rather than failing.
- `whisper-cpp`, `whisper`: the existing voice pipeline already handles both being missing gracefully, with an informative error. The VOICE-03 "offline" badge is shown based on `whisperAvailable` from hardware detection.

---

## Validation Architecture

### Test Framework

| Property | Value |
|----------|-------|
| Framework | vitest 3.0.5 |
| Server config | server/vitest.config.ts (environment: node) |
| UI config | ui/vitest.config.ts (environment: node, react plugin) |
| Quick run (server) | `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts` |
| Full suite (server) | `cd /opt/nexus/server && npx vitest run` |
| Quick run (UI) | `cd /opt/nexus/ui && npx vitest run src/**/*.test.{ts,tsx}` |

**Note:** Server baseline has 4 pre-existing failing test files (hardware-detection, skill-registry-routes, agent-permissions, heartbeat-workspace-session). These are NOT caused by Phase 42. Phase 42 tests must not add to this count.

### Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| WALL-01/02/03 | `renderWallpaper()` returns PNG buffer at correct dimensions per platform key | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| WALL-04 | App icon renderer returns multi-size array | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | ❌ Wave 0 |
| SOCIAL-01/02/03 | `renderSocialPost()` returns post text + hashtags; carousel returns slides array | unit | `npx vitest run src/__tests__/42-social-renderer.test.ts` | ❌ Wave 0 |
| CONV-01 | Image conversion round-trip (PNG→JPG) via sharp | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-02 | Audio conversion dispatch calls ffmpeg-static binary | unit (mocked) | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-04 | CSV→JSON and JSON→XLSX conversions | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-05 | Unknown pair falls through to AI bridge | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | ❌ Wave 0 |
| CONV-08 | converterCapabilitiesService probes pandoc/libreoffice at startup | unit (mocked execFile) | `npx vitest run src/__tests__/42-converter-capabilities.test.ts` | ❌ Wave 0 |
| CONV-09 | MIME mismatch rejected with 422 at convert route | unit (supertest) | `npx vitest run src/__tests__/42-convert-routes.test.ts` | ❌ Wave 0 |
| VOICE-01/02 | VoiceMicButton renders in ChatInput when enableVoiceInput=true | manual (pre-existing wiring) | n/a (already wired in ChatPanel.tsx) | ✅ Existing |
| VOICE-03 | Offline badge shows when whisperAvailable=true from /api/system/providers | unit (mocked hook) | `npx vitest run src/**/*.test.tsx` (UI test) | ❌ Wave 0 |

### Sampling Rate

- **Per task commit:** `cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts`
- **Per wave merge:** `cd /opt/nexus/server && npx vitest run` (full server suite)
- **Phase gate:** Full server + UI suites green before `/gsd:verify-work`

### Wave 0 Gaps

- [ ] `server/src/__tests__/42-wallpaper-renderer.test.ts`: covers WALL-01 through WALL-04
- [ ] `server/src/__tests__/42-social-renderer.test.ts`: covers SOCIAL-01 through SOCIAL-03
- [ ] `server/src/__tests__/42-convert-renderer.test.ts`: covers CONV-01 through CONV-05
- [ ] `server/src/__tests__/42-converter-capabilities.test.ts`: covers CONV-08
- [ ] `server/src/__tests__/42-convert-routes.test.ts`: covers CONV-09 (MIME validation at HTTP layer)
- [ ] UI test for offline badge rendering (VOICE-03)
- [ ] Package install: `pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1 && pnpm add -D @types/xlsx@0.0.36`

---

## Sources

### Primary (HIGH confidence)

- Codebase direct read: `server/src/services/renderers/icon-renderer.ts` (renderer pattern, sharp usage)
- Codebase direct read: `server/src/services/renderers/diagram-renderer.ts` (Playwright + Resvg pattern)
- Codebase direct read: `server/src/services/content-job-runner.ts` (job dispatch architecture)
- Codebase direct read: `server/src/services/voice-pipeline.ts` (Whisper probe and transcription pattern)
- Codebase direct read: `server/src/routes/voice.ts` (multer upload pattern for binary input)
- Codebase direct read: `ui/src/hooks/useContentJob.ts` (SSE hook established in Phase 41)
- Codebase direct read: `ui/src/components/ChatInput.tsx` (existing VoiceMicButton wiring)
- Codebase direct read: `ui/src/hooks/useVoiceMode.ts` (existing voice mode settings pattern)
- Codebase direct read: `server/src/services/hardware.ts` (whisperAvailable detection, probe pattern)
- Codebase direct read: `server/src/routes/hardware.ts` (GET /api/system/providers returns whisperAvailable)
- Codebase direct read: `server/src/app.ts` (route mounting pattern)
- Codebase direct read: `server/package.json` (installed deps list)
- `npm view file-type version` → 22.0.0 (verified 2026-04-04)
- `npm view xlsx version` → 0.18.5 (verified 2026-04-04)
- `npm view csv-parse version` → 6.2.1 (verified 2026-04-04) - Binary probe: `/opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/.../ffmpeg -version` → 7.0.2 (verified working) ### Secondary (MEDIUM confidence) - `.planning/STATE.md` — accumulated decisions: CONV-05, CONV-08, CONV-09 architectural choices locked - Phase 41-01-SUMMARY.md — renderer pattern, useContentJob hook, tech stack context - Phase 40-01-SUMMARY.md — content_jobs schema, RenderResult interface, MAX_GENERATED_ASSET_BYTES ### Tertiary (LOW confidence) - None — all critical claims verified by codebase inspection or npm registry. --- ## Metadata **Confidence breakdown:** - Standard stack: HIGH — all packages verified via codebase inspection + npm registry - Architecture: HIGH — pattern directly derived from Phase 41 implementations in codebase - Pitfalls: HIGH — most derived from actual code review (ffmpeg-static cast, file-type ESM, etc.) - Environment availability: HIGH — verified via command execution on target system **Research date:** 2026-04-04 **Valid until:** 2026-05-04 (packages stable; architecture unlikely to change)
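
**Illustrative sketch (CONV-08 startup probe):** The probe that logs "pandoc not found, doc conversion will use AI bridge" rather than failing could look roughly like the following. This is a sketch under assumptions, not the codebase's implementation: the names `Runner`, `probeBinary`, and `detectConverters` are hypothetical, and `soffice` is assumed to be the LibreOffice binary name on the target system. The command runner is injected so a unit test can substitute a stub instead of mocking `execFile` globally.

```typescript
import { execFile } from "node:child_process";

// Hypothetical types and names for illustration only; not taken from the codebase.
type Runner = (cmd: string, args: string[]) => Promise<void>;

interface ConverterCapabilities {
  pandoc: boolean;
  libreoffice: boolean;
}

// Default runner: resolves when the binary starts and exits cleanly,
// rejects when it is missing (ENOENT), times out, or exits non-zero.
const execRunner: Runner = (cmd, args) =>
  new Promise<void>((resolve, reject) => {
    execFile(cmd, args, { timeout: 5000 }, (err) => (err ? reject(err) : resolve()));
  });

// Returns true when `<cmd> --version` succeeds; never throws.
async function probeBinary(cmd: string, run: Runner = execRunner): Promise<boolean> {
  try {
    await run(cmd, ["--version"]);
    return true;
  } catch {
    return false;
  }
}

// Probes both converters in parallel and logs a warning instead of failing startup.
async function detectConverters(
  run: Runner = execRunner,
  log: (msg: string) => void = console.warn,
): Promise<ConverterCapabilities> {
  const [pandoc, libreoffice] = await Promise.all([
    probeBinary("pandoc", run),
    probeBinary("soffice", run), // assumed binary name for LibreOffice
  ]);
  if (!pandoc) log("pandoc not found, doc conversion will use AI bridge");
  if (!libreoffice) log("libreoffice not found, doc conversion will use AI bridge");
  return { pandoc, libreoffice };
}
```

A route such as GET /api/system/converters could return the resolved object directly. Injecting the runner is a design choice of this sketch; the test map above lists mocking `execFile` as the planned approach, which would work equally well.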