nexus/.planning/phases/42-wallpapers-social-format-conversion-voice/42-RESEARCH.md

Phase 42: Wallpapers, Social, Format Conversion & Voice — Research

Researched: 2026-04-04

Domain: Image generation (sharp/SVG), format conversion (sharp/ffmpeg-static/AI-bridge), social text generation (LLM), voice transcription (Whisper)

Confidence: HIGH


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.

Claude's Discretion

All implementation choices are at Claude's discretion. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

Deferred Ideas (OUT OF SCOPE)

None: discuss phase skipped.

</user_constraints>


<phase_requirements>

Phase Requirements

| ID | Description | Research Support |
|---|---|---|
| WALL-01 | User can generate desktop and mobile wallpapers from a description | SVG-via-LLM + sharp rasterize at target dimensions; PLATFORM_DIMENSIONS constants in renderer |
| WALL-02 | User can generate social media banners with correct dimensions per platform | Same renderer; platform map covers OG Image, Twitter Card, Instagram, LinkedIn |
| WALL-03 | User can generate Open Graph and social preview images | Same renderer; OG Image = 1200×630 constant |
| WALL-04 | User can generate app icons and favicons in multiple sizes | Renderer returns multi-size bundle (1024, 512, 256, 64, 32); WallpaperPreview renders grid |
| SOCIAL-01 | User can generate platform-ready posts respecting character limits (Twitter, LinkedIn) | LLM prompt with platform limit injected; character count UI enforced per-platform constants |
| SOCIAL-02 | User can generate Instagram carousels and thread sequences | LLM returns JSON with slides array; carousel rendered as numbered collapsible sections |
| SOCIAL-03 | System suggests relevant hashtags for generated content | LLM prompt requests hashtag suggestions as JSON array alongside post text |
| CONV-01 | User can convert between image formats (PNG, JPG, SVG, WebP, GIF) via sharp | sharp 0.34.5 already installed; reads all listed formats and writes PNG, JPG, WebP, and GIF (sharp has no SVG output, so pairs that target SVG fall to the AI bridge per CONV-05) |
| CONV-02 | User can convert between audio/video formats via ffmpeg | ffmpeg-static 5.3.0 (bundled ffmpeg binary 7.0.2) already installed and verified working |
| CONV-03 | User can convert between document formats via Pandoc/LibreOffice | pandoc/libreoffice NOT installed → falls to AI-bridge per CONV-08 |
| CONV-04 | User can convert between data formats (CSV, JSON, XLSX) | xlsx + csv-parse packages needed; pure-Node.js conversion |
| CONV-05 | User can convert between any format pair via AI-bridged conversion | puterChatComplete already established; handles unsupported pairs |
| CONV-06 | System provides conversion UI with source/target format selection and drag-drop | Standalone /convert page; ConvertPanel as described in UI spec |
| CONV-07 | User can deep-link to specific conversion flows via URL | /convert/:sourceFormat?/:targetFormat? route in App.tsx; pre-select chips on mount |
| CONV-08 | System detects available direct converters at startup | Startup probe service; GET /api/system/converters endpoint |
| CONV-09 | System validates uploaded file MIME type via magic-byte detection | file-type@22.0.0 (ESM, ships own types); validate at convert route before job dispatch |
| VOICE-01 | User can click mic button in web chat to record and auto-transcribe via Whisper | VoiceMicButton already in ChatInput when enableVoiceInput=true; already wired |
| VOICE-02 | User can toggle between text-only, voice-input, and full-voice modes | VoiceModeToggle already exists; already wired in ChatInput; Phase 42 verifies correctness |
| VOICE-03 | Voice input works offline with local Whisper model | voice-pipeline.ts already probes whisper-cpp → openai-whisper; WHISPER_MODEL env var + offline badge |
</phase_requirements>

Summary

Phase 42 extends the Phase 41 content generation system with four new capabilities: platform-aware image generation (wallpapers, OG images, social banners, app icons), LLM-driven social post generation with hashtag suggestions, a full-featured file format conversion pipeline, and offline voice input via Whisper.

The server already has all critical dependencies for images (sharp@0.34.5, @resvg/resvg-js@2.6.2) and audio/video (ffmpeg-static@5.3.0, which bundles the ffmpeg 7.0.2 binary; verified working at /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0). Three packages need to be added: file-type@22.0.0 (magic-byte MIME detection), xlsx@0.18.5 (XLSX data conversion), and csv-parse@6.2.1 (CSV parsing). Document conversion (pandoc/libreoffice) is not available on this system and will fall through to the AI bridge per CONV-08, so no installation is needed.

The voice pipeline (voice-pipeline.ts) already handles Whisper probe and transcription. Phase 42's voice work is: (1) add WHISPER_MODEL=local env var support to signal offline capability, (2) expose whisper availability to the UI via the existing /api/system/providers endpoint (already returns whisperAvailable), (3) render the "Offline" badge in ChatInput alongside VoiceMicButton. The VoiceMicButton, VoiceModeToggle, and enableVoiceInput=true wiring already exist in ChatPanel.tsx.

Primary recommendation: Follow the established Phase 41 renderer pattern: add three new jobType cases to content-job-runner.ts (wallpaper, social-post, convert), create one renderer file per job type in server/src/services/renderers/, and wire three new ContentStudio tabs + one standalone /convert page in the UI.


Standard Stack

Core (all verified installed in server)

| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| sharp | 0.34.5 | Image format conversion + SVG rasterization at target dimensions | Already installed; used by icon-renderer and org-chart-svg |
| @resvg/resvg-js | 2.6.2 | High-fidelity SVG→PNG rasterization with fitTo dimensions | Already installed; used by diagram-renderer |
| ffmpeg-static | 5.3.0 (bin: 7.0.2) | Bundled ffmpeg binary for audio/video conversion | Already installed; used by voice-pipeline and telegram |
| culori | 4.0.2 | OKLCH color math (not directly needed but available) | Already installed |
| puterChatComplete | (internal) | LLM inference for wallpaper SVG generation, social posts, AI-bridge conversion | Established pattern in Phase 41 renderers |

New Dependencies (needs pnpm add in server)

| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| file-type | 22.0.0 | Magic-byte MIME type detection for CONV-09 | ESM-native, ships own types, well-maintained |
| xlsx | 0.18.5 | XLSX read/write for data conversion CONV-04 | Most-used Excel library for Node.js |
| csv-parse | 6.2.1 | CSV parsing for data conversion CONV-04 | De-facto standard, streaming API |
| @types/xlsx | 0.0.36 | TypeScript types for xlsx | xlsx ships types/index.d.ts, but @types is available |

Alternatives Considered

| Instead of | Could Use | Tradeoff |
|---|---|---|
| file-type@22 (ESM) | mmmagic or mime-magic | file-type is pure JS, no native binding, ships own types; server is type:module so ESM is fine |
| xlsx | exceljs | xlsx is simpler API for read/write; exceljs has streaming but more complex |
| sharp for SVG rasterization | Playwright (like diagram-renderer) | sharp+resvg is faster for simple SVG → PNG; Playwright only needed for JavaScript-rendered content |

Installation:

# Run from /opt/nexus/server
pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1
pnpm add -D @types/xlsx@0.0.36

Version verification (run before installing):

npm view file-type version   # → 22.0.0
npm view xlsx version        # → 0.18.5
npm view csv-parse version   # → 6.2.1

Architecture Patterns

Established Renderer Pattern (from Phase 41)

Every new capability follows this exact structure:

  1. Renderer file: server/src/services/renderers/{name}-renderer.ts exports async function render{Name}(input: Record<string, unknown>): Promise<RenderResult>
  2. Job runner switch: Add case '{jobtype}': to renderContent() in content-job-runner.ts
  3. Bundle type (if needed): Add interface {Name}Bundle to types.ts
  4. API route: Submit via existing POST /api/companies/:id/content-jobs with { jobType, input }
  5. UI hook: useContentJob(companyId) already handles all SSE + state management
  6. UI component: Panel reads job.bundle after status === 'done'

The format conversion job is the only exception — it requires a separate multipart upload route because the file binary cannot be passed as JSON input via the standard content-jobs endpoint.

server/src/
├── services/renderers/
│   ├── types.ts                    # ADD: WallpaperBundle, SocialPostBundle, ConvertBundle
│   ├── wallpaper-renderer.ts       # NEW
│   ├── social-renderer.ts          # NEW
│   └── convert-renderer.ts         # NEW
├── services/
│   └── converter-capabilities.ts   # NEW: startup probe + cache
└── routes/
    └── convert.ts                  # NEW: POST /api/companies/:id/convert (multipart)
                                    #      GET /api/system/converters

ui/src/
├── pages/
│   └── ConvertPage.tsx             # NEW: standalone /convert page
├── components/
│   ├── WallpaperGeneratePanel.tsx  # NEW
│   ├── WallpaperPreview.tsx        # NEW
│   ├── SocialPostPanel.tsx         # NEW
│   ├── SocialPostResult.tsx        # NEW
│   └── ConvertPanel.tsx            # NEW (contains ConvertSourceZone + ConvertTargetSelector + ConvertActionBar)
└── api/
    └── convert.ts                  # NEW: submitConvertJob (multipart), getConverterCapabilities

Pattern 1: Wallpaper Generation (WALL-01 to WALL-04)

What: LLM generates an SVG at a conceptual level, then sharp rasterizes it to exact pixel dimensions for the requested platform.

When to use: Any fixed-dimension image asset (wallpaper, OG image, social banner, app icon).

// Source: established pattern from icon-renderer.ts + sharp resize
// server/src/services/renderers/wallpaper-renderer.ts

export const PLATFORM_DIMENSIONS: Record<string, { width: number; height: number; label: string }> = {
  "desktop-hd":       { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "desktop-fhd":      { width: 1920, height: 1080, label: "Desktop FHD (1920 × 1080)" },
  "desktop-4k":       { width: 3840, height: 2160, label: "Desktop 4K (3840 × 2160)" },
  "mobile-portrait":  { width: 1080, height: 1920, label: "Mobile Portrait (1080 × 1920)" },
  "mobile-landscape": { width: 1920, height: 1080, label: "Mobile Landscape (1920 × 1080)" },
  "og-image":         { width: 1200, height: 630,  label: "OG Image (1200 × 630)" },
  "twitter-card":     { width: 1200, height: 628,  label: "Twitter Card (1200 × 628)" },
  "instagram-post":   { width: 1080, height: 1080, label: "Instagram Post (1080 × 1080)" },
  "instagram-banner": { width: 1080, height: 566,  label: "Instagram Banner (1080 × 566)" },
  "linkedin-banner":  { width: 1584, height: 396,  label: "LinkedIn Banner (1584 × 396)" },
  "app-icon":         { width: 1024, height: 1024, label: "App Icon (1024 × 1024)" },
  "favicon":          { width: 32,   height: 32,   label: "Favicon (32 × 32)" },
};

// App icon + favicon: render multiple sizes from one SVG
const APP_ICON_SIZES = [1024, 512, 256, 64, 32] as const;

// Render flow:
// 1. puterChatComplete → SVG string (LLM generates SVG matching aspect ratio)
// 2. sharp(svgBuffer).resize(width, height, { fit: 'fill' }).png() → PNG buffer
// 3. Return WallpaperBundle with pngBase64 + dimensions

Critical constraint: Platform dimensions MUST be constants, never magic numbers (success criterion 1). Export PLATFORM_DIMENSIONS from the renderer and re-export to the UI API client so the UI's Select options derive from the same source.
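
A minimal sketch of how calling code might resolve a platform key against the shared constants instead of repeating literal numbers. The `resolvePlatform` and `viewBoxFor` helpers are hypothetical names, and only a subset of PLATFORM_DIMENSIONS is repeated so the example stands alone.

```typescript
// Hypothetical helpers; the map below repeats three entries of
// PLATFORM_DIMENSIONS so the sketch is self-contained.
type PlatformSpec = { width: number; height: number; label: string };

const PLATFORM_DIMENSIONS: Record<string, PlatformSpec> = {
  "desktop-hd": { width: 2560, height: 1440, label: "Desktop HD (2560 × 1440)" },
  "og-image": { width: 1200, height: 630, label: "OG Image (1200 × 630)" },
  "linkedin-banner": { width: 1584, height: 396, label: "LinkedIn Banner (1584 × 396)" },
};

// Look up a platform key and fail loudly on unknown keys, so a typo
// never silently falls back to a hardcoded size.
function resolvePlatform(key: string): PlatformSpec {
  const spec = PLATFORM_DIMENSIONS[key];
  if (!spec) throw new Error(`Unknown platform key: ${key}`);
  return spec;
}

// Build the viewBox value that the LLM prompt can request for its SVG.
function viewBoxFor(key: string): string {
  const { width, height } = resolvePlatform(key);
  return `0 0 ${width} ${height}`;
}
```

The UI Select options could then be derived with `Object.entries(PLATFORM_DIMENSIONS)`, which keeps labels and pixel sizes in one place.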

Pattern 2: Format Conversion Architecture (CONV-01 to CONV-09)

What: A multipart upload endpoint validates the MIME type, stores the file as base64 in the job input, and dispatches to the convert renderer, which routes to sharp/ffmpeg/xlsx/AI-bridge based on the format pair.

Why separate route: Content-jobs POST accepts JSON; file binary needs multipart handling.

// server/src/routes/convert.ts — new multipart route
// POST /api/companies/:companyId/convert

import multer from "multer";
import { fileTypeFromBuffer } from "file-type";

router.post("/companies/:companyId/convert", async (req, res) => {
  // 1. multer.memoryStorage() upload (limit: MAX_ATTACHMENT_BYTES)
  // 2. fileTypeFromBuffer(file.buffer) → detected MIME
  // 3. Compare detected MIME against file extension claim
  // 4. If mismatch: res.status(422).json({ error: "...", actualMime, claimedMime })
  // 5. job input: { fileBase64: buffer.toString('base64'), sourceMime, targetFormat, originalFilename }
  // 6. contentJobStore.create + contentJobRunner.dispatch
  // 7. res.status(202).json({ jobId, status })
});

// GET /api/system/converters — capability map for UI
router.get("/system/converters", async (_req, res) => {
  const caps = await converterCapabilitiesService().get();
  res.json(caps);
  // Returns: { imageConverter: true, audioVideoConverter: true, docConverter: false, dataConverter: true }
});

// server/src/services/renderers/convert-renderer.ts

async function renderConvert(input: Record<string, unknown>): Promise<RenderResult> {
  // Job input is untyped JSON, so narrow the fields before use.
  const { fileBase64, sourceMime, targetFormat } = input as {
    fileBase64: string;
    sourceMime: string;
    targetFormat: string;
  };
  const fileBuffer = Buffer.from(fileBase64, "base64");

  // Route by format category.
  // Image conversion requires both sides to be image formats.
  if (isImageFormat(sourceMime) && isImageFormat(targetFormat)) {
    return convertImageViaSharp(fileBuffer, sourceMime, targetFormat);
  }
  // Note: the `||` checks below also send mixed pairs (for example video → GIF)
  // to that converter, so each converter should reject pairs it cannot handle.
  if (isAudioVideoFormat(sourceMime) || isAudioVideoFormat(targetFormat)) {
    return convertAVViaFfmpeg(fileBuffer, sourceMime, targetFormat);
  }
  if (isDataFormat(sourceMime) || isDataFormat(targetFormat)) {
    return convertDataFormat(fileBuffer, sourceMime, targetFormat);
  }
  // All other pairs: AI bridge (CONV-05)
  return convertViaAiBridge(fileBuffer, sourceMime, targetFormat);
}
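
The category predicates used above are not defined in this document. A minimal sketch of one way to express the routing decision as a pure, testable function follows. The format sets and the `pickConverter` name are assumptions. This variant is stricter than the `||` checks above because it requires both sides to be in the same category; whether mixed pairs such as video → GIF should go to ffmpeg is left open. SVG is treated as input-only because sharp has no SVG output.

```typescript
type ConverterKind = "sharp" | "ffmpeg" | "data" | "ai-bridge";

// Hypothetical format sets, keyed by lowercase extension.
const SHARP_INPUTS = new Set(["png", "jpg", "jpeg", "webp", "gif", "svg"]);
const SHARP_OUTPUTS = new Set(["png", "jpg", "jpeg", "webp", "gif"]); // no SVG output
const AV_FORMATS = new Set(["mp3", "wav", "ogg", "mp4", "webm", "mov"]);
const DATA_FORMATS = new Set(["csv", "json", "xlsx"]);

// Decide which converter handles a source/target pair.
// Anything without a direct converter falls through to the AI bridge (CONV-05).
function pickConverter(source: string, target: string): ConverterKind {
  const s = source.toLowerCase();
  const t = target.toLowerCase();
  if (SHARP_INPUTS.has(s) && SHARP_OUTPUTS.has(t)) return "sharp";
  if (AV_FORMATS.has(s) && AV_FORMATS.has(t)) return "ffmpeg";
  if (DATA_FORMATS.has(s) && DATA_FORMATS.has(t)) return "data";
  return "ai-bridge";
}
```

Keeping the decision separate from the conversion itself means the CONV-05 fallthrough can be unit-tested without touching sharp or ffmpeg.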

Pattern 3: Converter Capability Probe (CONV-08)

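// server/src/services/converter-capabilities.ts
// Probe at startup, cache result (same pattern as hardwareService)
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// Promise-returning wrapper so the probes below can be awaited.
const execFileAsync = promisify(execFile);

let cache: ConverterCapabilities | null = null;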

export interface ConverterCapabilities {
  imageConverter: boolean;   // sharp — always true (npm dep)
  audioVideoConverter: boolean; // ffmpeg-static — always true (npm dep)
  docConverter: boolean;     // pandoc or libreoffice — probe at startup
  dataConverter: boolean;    // xlsx + csv-parse — always true (npm dep)
}

export function converterCapabilitiesService() {
  async function get(): Promise<ConverterCapabilities> {
    if (cache) return cache;
    let docConverter = false;
    try {
      await execFileAsync("pandoc", ["--version"], { timeout: 2000 });
      docConverter = true;
    } catch {
      try {
        await execFileAsync("libreoffice", ["--version"], { timeout: 2000 });
        docConverter = true;
      } catch { /* not available */ }
    }
    cache = { imageConverter: true, audioVideoConverter: true, docConverter, dataConverter: true };
    return cache;
  }
  return { get };
}

Pattern 4: Social Post Generation (SOCIAL-01 to SOCIAL-03)

// server/src/services/renderers/social-renderer.ts

export const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

// LLM prompt asks for JSON: { post: string, hashtags: string[], slides?: string[] }
// For carousel: slides array, each under 300 chars
// puterChatComplete returns JSON; renderer parses + validates
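
A minimal sketch of the "validates" step for character limits, assuming the JSON shape from the comment above. `findLimitViolations` is a hypothetical helper. It counts Unicode code points, which only approximates each platform's own counting rules, and it does not count hashtags toward the limit (the source does not specify this).

```typescript
const PLATFORM_CHAR_LIMITS: Record<string, number> = {
  "twitter-x": 280,
  "linkedin": 3000,
  "instagram-caption": 2200,
  "instagram-carousel": 300, // per slide
};

interface SocialPayload {
  post: string;
  hashtags: string[];
  slides?: string[];
}

// Count code points rather than UTF-16 units so an emoji counts once.
function charCount(text: string): number {
  return Array.from(text).length;
}

// Return a human-readable problem per part that exceeds the platform limit.
// When slides are present each slide is checked; otherwise the single post is.
function findLimitViolations(platform: string, payload: SocialPayload): string[] {
  const limit = PLATFORM_CHAR_LIMITS[platform];
  if (limit === undefined) return [`unknown platform: ${platform}`];
  const parts = payload.slides ?? [payload.post];
  const problems: string[] = [];
  parts.forEach((part, i) => {
    const n = charCount(part);
    if (n > limit) problems.push(`part ${i + 1} is ${n} chars (limit ${limit})`);
  });
  return problems;
}
```

An empty result means the payload can be shown as-is; a non-empty one could trigger a retry of the LLM call or a warning in SocialPostResult.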

Pattern 5: Voice Offline Badge (VOICE-03)

The voice pipeline already handles Whisper detection. Phase 42 adds two things:

  1. Server: WHISPER_MODEL env var read in voice-pipeline.ts — when set to "local", include "local" in nexus-settings response or expose via GET /api/system/providers (already returns whisperAvailable from hardwareService().detect()).

  2. UI: In ChatInput.tsx, read whisperAvailable from a useSystemProviders() hook. Show <span aria-label="Voice input is offline (local model)">Offline</span> next to VoiceMicButton when whisperAvailable === true.

IMPORTANT: The existing GET /api/system/providers already returns { whisperAvailable: boolean, piperAvailable: boolean, ... } — no new endpoint needed. Create a useSystemProviders() hook that calls this endpoint once on mount.

Pattern 6: ContentStudio Tab Extension + Standalone Convert Page

// ui/src/pages/ContentStudio.tsx — extend TabsList
// Add three TabsTriggers: "Wallpapers", "Social", "Convert"
// "Convert" tab value triggers navigate() to /convert (standalone page)
// TabsContent for wallpapers and social are normal panel components
// TabsContent for convert is NOT a content panel — the tab click navigates away

// ui/src/App.tsx — add new routes in boardRoutes()
<Route path="content-studio" element={<ContentStudio />} />
<Route path="convert" element={<ConvertPage />} />
<Route path="convert/:sourceFormat" element={<ConvertPage />} />
<Route path="convert/:sourceFormat/:targetFormat" element={<ConvertPage />} />

Anti-Patterns to Avoid

  • Magic number dimensions: Never hardcode 2560 or 1440 in component code — always read from PLATFORM_DIMENSIONS constant exported from renderer or a shared types file.
  • Passing file buffer as base64 in SSE-triggered jobs with >10MB files: The 10MB multer limit prevents oversized uploads; document this clearly in the convert route.
  • Blocking HTTP on render: All conversion dispatched fire-and-forget via contentJobRunner.dispatch(). The POST /convert route returns 202 immediately.
  • Showing format pairs as "unavailable": Per CONV-08, all format pairs are selectable in the UI. Unavailable direct converters show the AI fallback notice, never a disabled/grey chip.
  • Creating a separate /api/convert/validate endpoint: Validate at job submit time in the convert route (simpler, fewer round trips). The UI spec notes this as an OR condition.
  • Satori for wallpaper generation: Satori is NOT installed. Use the established pattern: LLM generates SVG → sharp rasterizes to exact dimensions. Satori would require JSX rendering infrastructure not needed here.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| MIME type detection from file bytes | Custom magic-byte reader | file-type@22.0.0 | Handles 500+ MIME types, handles edge cases like truncated files, streaming API |
| XLSX read/write | Custom binary parser | xlsx@0.18.5 | XLSX is a zipped OOXML package with a large spec; hand-rolling is weeks of work |
| CSV parsing | String.split() | csv-parse@6.2.1 | Handles quoted fields, escaped commas, multiline values, BOM |
| Image format conversion | Native buffer manipulation | sharp@0.34.5 | Already installed; handles color spaces, ICC profiles, transparency |
| Audio/video conversion | Custom codec wrappers | ffmpeg-static@5.3.0 (ffmpeg 7.0.2) | Already installed; handles all codec negotiation |
| SVG rasterization | canvas/Playwright | @resvg/resvg-js@2.6.2 | Already installed; faster than Playwright for static SVG |
| LLM inference | New HTTP client | puterChatComplete() | Already implemented in Phase 41; puter-inference.ts is the project standard |

Key insight: All heavy-lifting tools are already installed. Phase 42 is primarily wiring (new renderers + routes + UI panels) rather than infrastructure.


Common Pitfalls

Pitfall 1: Sharp SVG Input at Large Dimensions

What goes wrong: sharp(svgBuffer).resize(2560, 1440) produces a blurry image when the SVG has a small implicit pixel density.

Why it happens: Sharp defaults to 72 DPI for SVG input; scaling up produces raster artifacts before the resize step.

How to avoid: Always pass { density: 300 } option when loading SVG into sharp: sharp(svgBuffer, { density: 300 }).resize(width, height, { fit: 'fill' }).png(). Alternatively, ask the LLM to generate an SVG with viewBox="0 0 {width} {height}" matching the target dimensions, then use Resvg with fitTo: { mode: 'width', value: width }.

Warning signs: Generated wallpapers look pixelated or blurry at edges.

Pitfall 2: file-type v22 Import Syntax

What goes wrong: import FileType from 'file-type' fails with "does not provide an export named 'default'".

Why it happens: file-type v22 is pure ESM with named exports only.

How to avoid: Use named import: import { fileTypeFromBuffer } from 'file-type'. Server is type: module with module: NodeNext, so ESM imports work directly.

Warning signs: TypeScript error TS2613 or runtime "is not a function" errors.

Pitfall 3: ffmpeg-static Path Resolution

What goes wrong: spawn(ffmpegPath, ...) throws ENOENT even though ffmpeg-static is installed.

Why it happens: ffmpegPath from import ffmpegPath from 'ffmpeg-static' is the binary path string, but it needs an as unknown as string cast due to a TS type mismatch. The actual binary is at /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/node_modules/ffmpeg-static/ffmpeg.

How to avoid: Copy the existing pattern from voice-pipeline.ts exactly: if (!ffmpegPath) throw new Error("ffmpeg-static binary not found"); const ffmpegBin = ffmpegPath as unknown as string;.

Warning signs: ffmpegBin is null/undefined; ENOENT on spawn.

Pitfall 4: Content-Job Input Size for Conversion

What goes wrong: Submitting a 10MB file as base64 in job input stores ~13.3MB of base64 in the content_jobs.input JSONB column per submission.

Why it happens: base64 adds ~33% overhead. For a 10MB file (MAX_ATTACHMENT_BYTES), this is ~13.3MB per job row.

How to avoid: This is acceptable for the single-user case (success criteria assume one conversion at a time). Document the max file size clearly in the UI (the multer limit enforces it). If this becomes a problem in future, change the renderer to accept storage object keys (requires extending content-job-runner signature).

Warning signs: Postgres table growth visible in db metrics after many conversions.
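
The ~13.3MB figure follows from base64 emitting 4 characters for every 3 input bytes. A small worked calculation, assuming MAX_ATTACHMENT_BYTES is 10 MiB (the constant's actual value is not shown in this document):

```typescript
// Length of padded base64 output for a given number of raw bytes:
// every group of up to 3 bytes becomes 4 characters.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Assumed value of MAX_ATTACHMENT_BYTES (10 MiB).
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// 13_981_016 characters, i.e. about 13.33 MiB of JSON string per job row.
const encodedChars = base64Length(MAX_UPLOAD_BYTES);

// Overhead factor, about 1.333.
const overheadRatio = encodedChars / MAX_UPLOAD_BYTES;
```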

Pitfall 5: LLM JSON Output Wrapped in Markdown Fences

What goes wrong: LLM returns markdown-fenced JSON or adds explanation text, causing JSON.parse() to throw.

Why it happens: LLMs sometimes wrap JSON in ```json ... ``` fences.

How to avoid: Post-process LLM output to strip markdown fences before JSON.parse(). Use a robust extraction pattern: const match = raw.match(/```json\s*([\s\S]*?)\s*```/) || raw.match(/({[\s\S]*})/); JSON.parse(match ? match[1] : raw). Apply the same fix pattern used by icon-renderer.ts SVG validation.

Warning signs: SocialPostResult shows "Generation failed" after seemingly valid LLM output.
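
The extraction step can be sketched as a standalone helper. `extractJson` is a hypothetical name, not an existing function in the codebase. The fence marker is built at runtime only so that this example contains no literal fence, and the `json` language tag is treated as optional.

```typescript
// Three backticks, assembled at runtime.
const FENCE = "`".repeat(3);

// Pull a JSON value out of raw LLM output.
// 1. Prefer the body of a fenced block (with or without a "json" tag).
// 2. Otherwise take the span from the first "{" to the last "}".
// 3. Otherwise parse the trimmed text as-is (JSON.parse throws on failure).
function extractJson(raw: string): unknown {
  const fenced = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE).exec(raw);
  if (fenced) return JSON.parse(fenced[1]);
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) return JSON.parse(raw.slice(start, end + 1));
  return JSON.parse(raw.trim());
}
```

Callers would still need to validate the shape of the parsed value (for example that `hashtags` is an array of strings) before building a SocialPostBundle.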

Pitfall 6: Deep-Link Format Params Are Case-Sensitive

What goes wrong: /convert/PNG/SVG doesn't pre-select chips because the component does a case-sensitive compare against format names.

Why it happens: URL params are case-sensitive; format chips may be stored as uppercase.

How to avoid: Normalize URL params to lowercase on read: params.sourceFormat?.toLowerCase(). Match against chip identifiers using formatId.toLowerCase() === param.toLowerCase().

Warning signs: Deep-link URL works in one case but not when user types different casing.
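
A minimal sketch of the case-insensitive match, assuming chip identifiers are kept in a fixed list. `FORMAT_IDS` and `matchFormatParam` are illustrative names, and the mixed-case ids are chosen only to show that stored casing does not matter.

```typescript
// Hypothetical chip identifiers as they might be stored in the UI.
const FORMAT_IDS = ["PNG", "JPG", "SVG", "WebP", "GIF"] as const;
type FormatId = (typeof FORMAT_IDS)[number];

// Map a raw URL param to a known chip id, ignoring case and stray whitespace.
// Returns undefined for a missing or unknown param so the page can leave
// the chip unselected instead of failing.
function matchFormatParam(param: string | undefined): FormatId | undefined {
  if (!param) return undefined;
  const wanted = param.trim().toLowerCase();
  return FORMAT_IDS.find((id) => id.toLowerCase() === wanted);
}
```

In ConvertPage this would be called once per route param on mount, for example `matchFormatParam(params.sourceFormat)`.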

Pitfall 7: Voice Offline Badge Always Showing

What goes wrong: The "Offline" badge shows even when whisper is not installed (whisperAvailable: false).

Why it happens: Misreading the UI spec: the badge shows when whisperAvailable === true (local model detected), not when the WHISPER_MODEL=local env var is set (which is confusing naming).

How to avoid: Read whisperAvailable from GET /api/system/providers. Show the badge if whisperAvailable === true. The "offline capability" is proven by the binary being detected, not by an env var. The WHISPER_MODEL env var mentioned in the UI spec is a future extension point for model selection; do not implement it unless the spec explicitly requires it. Per VOICE-03, "works offline with locally cached model" means the whisper-cpp binary + base model are present.

Warning signs: Badge shows on machines where whisper is not installed.


Code Examples

Wallpaper Renderer: Sharp at Target Dimensions

// Source: icon-renderer.ts pattern + sharp resize extension
// server/src/services/renderers/wallpaper-renderer.ts

import sharp from "sharp";
import { puterChatComplete } from "../puter-inference.js";
import type { RenderResult } from "./types.js";

async function renderSvgToWallpaper(svgString: string, width: number, height: number): Promise<Buffer> {
  return sharp(Buffer.from(svgString), { density: 300 })
    .resize(width, height, { fit: "fill" })
    .png({ compressionLevel: 9 })
    .toBuffer();
}

Magic-Byte MIME Validation

// Source: file-type@22 documentation — ESM named import
import { fileTypeFromBuffer } from "file-type";

async function validateMime(buffer: Buffer, claimedExtension: string): Promise<{ ok: boolean; actualMime?: string; claimedMime?: string }> {
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected) return { ok: true }; // unknown type, allow (SVG/text files have no magic bytes)
  const mimeForExtension = extensionToMime(claimedExtension); // lookup table
  if (mimeForExtension && detected.mime !== mimeForExtension) {
    return { ok: false, actualMime: detected.mime, claimedMime: mimeForExtension };
  }
  return { ok: true };
}
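
`extensionToMime` is referenced above but not defined. One possible shape for the lookup table is sketched here. The entries are an illustrative subset, and the MIME strings should be checked against what file-type actually reports for each format before relying on them.

```typescript
// Illustrative subset: only binary formats that have magic bytes.
// Text formats (csv, json, svg) are deliberately absent, so they return
// undefined and validateMime skips the comparison for them.
const EXTENSION_MIME: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg", // both spellings map to the same MIME type
  webp: "image/webp",
  gif: "image/gif",
  pdf: "application/pdf",
  mp4: "video/mp4",
};

// Accepts "png", ".png", or ".PNG" and returns the expected MIME type,
// or undefined when the extension is not in the table.
function extensionToMime(extension: string): string | undefined {
  const key = extension.replace(/^\./, "").toLowerCase();
  return EXTENSION_MIME[key];
}
```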

ffmpeg-static Conversion (audio/video)

// Source: voice-pipeline.ts pattern (established in Phase 36)
import ffmpegPath from "ffmpeg-static";
import { spawn } from "node:child_process";

if (!ffmpegPath) throw new Error("ffmpeg-static binary not found");
const ffmpegBin = ffmpegPath as unknown as string;

function convertAVViaFfmpeg(inputBuffer: Buffer, sourceFormat: string, targetFormat: string): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const ffmpeg = spawn(ffmpegBin, [
      "-f", sourceFormat,
      "-i", "pipe:0",
      "-f", targetFormat,
      "pipe:1",
    ], { stdio: ["pipe", "pipe", "pipe"] });
    const chunks: Buffer[] = [];
    ffmpeg.stdout.on("data", (c: Buffer) => chunks.push(c));
    ffmpeg.stderr.on("data", () => {}); // discard
    ffmpeg.on("close", (code) => code === 0 ? resolve(Buffer.concat(chunks)) : reject(new Error(`ffmpeg exited ${code}`)));
    ffmpeg.on("error", reject);
    ffmpeg.stdin.write(inputBuffer);
    ffmpeg.stdin.end();
  });
}
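
One caveat with piping output: ffmpeg's MP4/MOV muxer normally needs a seekable output to finalize the file, which pipe:1 is not. A hedged sketch of building the argument list separately so such per-format flags stay in one place follows. `buildFfmpegArgs` is a hypothetical helper, and fragmented output via `-movflags frag_keyframe+empty_moov` is one common workaround, not something taken from voice-pipeline.ts.

```typescript
// Build the ffmpeg argument list for a stdin → stdout conversion.
// Format names are ffmpeg demuxer/muxer names (e.g. "wav", "mp3", "mp4").
function buildFfmpegArgs(sourceFormat: string, targetFormat: string): string[] {
  const args = ["-hide_banner", "-f", sourceFormat, "-i", "pipe:0"];
  // The mp4/mov muxers seek back to write the index; a pipe cannot seek,
  // so request fragmented output that can be written sequentially.
  if (targetFormat === "mp4" || targetFormat === "mov") {
    args.push("-movflags", "frag_keyframe+empty_moov");
  }
  args.push("-f", targetFormat, "pipe:1");
  return args;
}
```

The result can be passed straight to `spawn(ffmpegBin, buildFfmpegArgs(sourceFormat, targetFormat), ...)`. If pipes prove unreliable for some containers, writing to temporary files is the fallback.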

Data Format Conversion (CSV ↔ JSON ↔ XLSX)

// Source: xlsx documentation + csv-parse documentation
import * as XLSX from "xlsx";
import { parse as csvParse } from "csv-parse/sync";

// CSV → JSON
function csvToJson(buffer: Buffer): Record<string, unknown>[] {
  return csvParse(buffer, { columns: true, skip_empty_lines: true });
}

// JSON → XLSX
function jsonToXlsx(data: Record<string, unknown>[]): Buffer {
  const ws = XLSX.utils.json_to_sheet(data);
  const wb = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
  return Buffer.from(XLSX.write(wb, { type: "buffer", bookType: "xlsx" }));
}

useContentJob Pattern (UI — already exists)

// Source: ui/src/hooks/useContentJob.ts (Phase 41)
// Usage in WallpaperGeneratePanel:
const job = useContentJob(companyId);

// Submit
job.submit("wallpaper", { prompt, platformKey: "desktop-hd" });

// Render result when done
if (job.status === "done" && job.bundle) {
  const bundle = job.bundle as WallpaperBundle;
  // bundle.pngBase64, bundle.dimensions, bundle.platformKey
}

Converter Capabilities in UI

// ui/src/hooks/useSystemProviders.ts (new)
// Calls GET /api/system/providers once on mount, caches result
// Returns: { whisperAvailable, piperAvailable, ... }
// Used by ChatInput for offline badge, by ConvertTargetSelector for AI-fallback notice

// ui/src/hooks/useConverterCapabilities.ts (new)
// Calls GET /api/system/converters once on mount
// Returns: { imageConverter, audioVideoConverter, docConverter, dataConverter }

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| Manual MIME detection via extension | Magic-byte detection via file-type | file-type v19+ | Required for CONV-09: extension can be spoofed |
| Pandoc/LibreOffice for doc conversion | AI-bridge fallback when not available | CONV-08 design | No installer required; works everywhere |
| Separate validate endpoint | Validate at submit time | UI spec v1 | Fewer round trips, simpler client code |

Deprecated/outdated:

  • satori for wallpaper generation: Not installed and not needed. The Phase 41 pattern (LLM SVG + sharp rasterize) is sufficient and consistent with existing code.
  • Separate /api/convert/validate endpoint: Consolidate validation into the convert submit route.

Open Questions

  1. WallpaperBundle storage format

    • What we know: Other bundles (DiagramBundle, IconSetBundle) store base64-encoded assets in JSON
    • What's unclear: For wallpapers at 2560×1440, the PNG can be 5–15MB; base64 encoding adds ~33%, giving a JSON blob of up to ~20MB stored in content_jobs.output. MAX_GENERATED_ASSET_BYTES = 500MB so it fits, but row size may be large for Postgres.
    • Recommendation: Store the PNG as an asset (same as diagram-renderer stores to storage), and return WallpaperBundle with assetId + dimensions + platformKey. The UI downloads via /api/assets/:id/content. This avoids storing large base64 in the DB. Follow the same pattern if app-icon returns multiple sizes: store each size as a separate asset or as a multi-size ZIP.
  2. Convert job input size for large files

    • What we know: base64(10MB file) = ~13.3MB JSON in content_jobs.input column
    • What's unclear: Whether Postgres/Drizzle has JSONB size limits that would reject this
    • Recommendation: Postgres JSONB has no practical size limit at this scale; the ceiling is the maximum size of a single field value (1GB). 13.3MB is fine. Document the 10MB upload cap in the UI.
  3. Social post carousel slide format

    • What we know: SOCIAL-02 says "Instagram carousels and thread sequences"
    • What's unclear: Whether thread sequences means Twitter threads (numbered tweets) or just a generic multi-part structure
    • Recommendation: Implement as a unified slides: string[] field in SocialPostBundle. The collapsible sections in SocialPostResult handle both Twitter threads and Instagram carousel displays.

Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|---|---|---|---|---|
| sharp | CONV-01, WALL-01-04 | Yes | 0.34.5 | n/a |
| @resvg/resvg-js | WALL-01-04 | Yes | 2.6.2 | n/a |
| ffmpeg-static | CONV-02 | Yes | 7.0.2 (binary) | n/a |
| file-type | CONV-09 | No | n/a | Install: pnpm add file-type@22.0.0 |
| xlsx | CONV-04 | No | n/a | Install: pnpm add xlsx@0.18.5 |
| csv-parse | CONV-04 | No | n/a | Install: pnpm add csv-parse@6.2.1 |
| pandoc | CONV-03 | No | n/a | AI-bridge (CONV-08 design) |
| libreoffice | CONV-03 | No | n/a | AI-bridge (CONV-08 design) |
| whisper-cpp | VOICE-03 | No | n/a | openai-whisper CLI fallback; error message if neither |
| whisper (openai) | VOICE-03 | No | n/a | whisper-cpp fallback |
| satori | Phase goal wording | No | n/a | Not needed: use LLM SVG + sharp pattern |

Missing dependencies with no fallback:

  • file-type, xlsx, csv-parse: these MUST be installed in Wave 0. Phase cannot complete CONV-04 (xlsx, csv-parse) or CONV-09 (file-type) without them.

Missing dependencies with fallback:

  • pandoc, libreoffice — document conversion falls through to AI-bridge per CONV-08 design. Planner should add a startup probe that logs "pandoc not found, doc conversion will use AI bridge" rather than failing.
  • whisper-cpp, whisper — existing voice pipeline already handles both missing gracefully with an informative error. VOICE-03 "offline" badge is shown based on whisperAvailable from hardware detection.

Validation Architecture

Test Framework

| Property | Value |
|---|---|
| Framework | vitest 3.0.5 |
| Server config | server/vitest.config.ts (environment: node) |
| UI config | ui/vitest.config.ts (environment: node, react plugin) |
| Quick run (server) | cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts |
| Full suite (server) | cd /opt/nexus/server && npx vitest run |
| Quick run (UI) | cd /opt/nexus/ui && npx vitest run src/**/*.test.{ts,tsx} |

Note: Server baseline has 4 pre-existing failing test files (hardware-detection, skill-registry-routes, agent-permissions, heartbeat-workspace-session) — these are NOT caused by Phase 42. Phase 42 tests must not add to this count.

Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|---|---|---|---|---|
| WALL-01/02/03 | renderWallpaper() returns PNG buffer at correct dimensions per platform key | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | Wave 0 |
| WALL-04 | App icon renderer returns multi-size array | unit | `npx vitest run src/__tests__/42-wallpaper-renderer.test.ts` | Wave 0 |
| SOCIAL-01/02/03 | renderSocialPost() returns post text + hashtags; carousel returns slides array | unit | `npx vitest run src/__tests__/42-social-renderer.test.ts` | Wave 0 |
| CONV-01 | Image conversion round-trip (PNG→JPG) via sharp | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | Wave 0 |
| CONV-02 | Audio conversion dispatch calls ffmpeg-static binary | unit (mocked) | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | Wave 0 |
| CONV-04 | CSV→JSON and JSON→XLSX conversions | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | Wave 0 |
| CONV-05 | Unknown pair falls through to AI bridge | unit | `npx vitest run src/__tests__/42-convert-renderer.test.ts` | Wave 0 |
| CONV-08 | converterCapabilitiesService probes pandoc/libreoffice at startup | unit (mocked execFile) | `npx vitest run src/__tests__/42-converter-capabilities.test.ts` | Wave 0 |
| CONV-09 | MIME mismatch rejected with 422 at convert route | unit (supertest) | `npx vitest run src/__tests__/42-convert-routes.test.ts` | Wave 0 |
| VOICE-01/02 | VoiceMicButton renders in ChatInput when enableVoiceInput=true | manual (pre-existing wiring) | n/a (already wired in ChatPanel.tsx) | Existing |
| VOICE-03 | Offline badge shows when whisperAvailable=true from /api/system/providers | unit (mocked hook) | `npx vitest run src/**/*.test.tsx` (UI test) | Wave 0 |
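
The CONV-05 behavior ("unknown pair falls through to AI bridge") can be pictured as a lookup with a default. This is only a sketch of that idea: `ConverterKind`, `NATIVE_CONVERTERS`, `selectConverter`, and the specific format pairs (including `wav->mp3`) are hypothetical and do not reflect the real dispatch code in the convert renderer.

```typescript
type ConverterKind = "sharp" | "ffmpeg" | "structured" | "ai-bridge";

// Hypothetical routing table keyed by "source->target". The pairs are
// illustrative only, not the full set the phase supports.
const NATIVE_CONVERTERS = new Map<string, ConverterKind>([
  ["png->jpg", "sharp"],
  ["png->webp", "sharp"],
  ["wav->mp3", "ffmpeg"],
  ["csv->json", "structured"],
  ["json->xlsx", "structured"],
]);

function selectConverter(source: string, target: string): ConverterKind {
  // Normalize case so "PNG" and "png" resolve to the same entry.
  const key = `${source.toLowerCase()}->${target.toLowerCase()}`;
  // Any pair without a native converter falls through to the AI bridge.
  return NATIVE_CONVERTERS.get(key) ?? "ai-bridge";
}
```

A unit test for CONV-05 would then only need to assert that an unlisted pair, for example `selectConverter("docx", "pdf")`, resolves to `"ai-bridge"`.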

Sampling Rate

  • Per task commit: cd /opt/nexus/server && npx vitest run src/__tests__/42-*.test.ts
  • Per wave merge: cd /opt/nexus/server && npx vitest run (full server suite)
  • Phase gate: Full server + UI suites green before /gsd:verify-work

Wave 0 Gaps

  • server/src/__tests__/42-wallpaper-renderer.test.ts — covers WALL-01 through WALL-04
  • server/src/__tests__/42-social-renderer.test.ts — covers SOCIAL-01 through SOCIAL-03
  • server/src/__tests__/42-convert-renderer.test.ts — covers CONV-01 through CONV-05
  • server/src/__tests__/42-converter-capabilities.test.ts — covers CONV-08
  • server/src/__tests__/42-convert-routes.test.ts — covers CONV-09 (MIME validation at HTTP layer)
  • UI test for offline badge rendering (VOICE-03)
  • Package install: pnpm add file-type@22.0.0 xlsx@0.18.5 csv-parse@6.2.1 && pnpm add -D @types/xlsx@0.0.36
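
For the CONV-09 route test listed above, the core decision (declared MIME type versus the type sniffed from the file bytes) can be isolated in a pure function. The sketch below is an assumption-laden illustration: `checkDeclaredMime`, `MimeCheck`, and the policy for formats that cannot be sniffed are hypothetical. The `detected` value is assumed to come from a byte-sniffing library such as file-type, and is `undefined` when nothing is recognized.

```typescript
type MimeCheck =
  | { ok: true }
  | { ok: false; status: 422; error: string };

// Assumption: text-based formats carry no magic bytes, so byte sniffing yields
// no result for them and the declared type is accepted as-is.
const UNSNIFFABLE_TEXT_TYPES = new Set(["text/csv", "application/json"]);

function checkDeclaredMime(declared: string, detected: string | undefined): MimeCheck {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const [base = ""] = declared.split(";");
  const normalized = base.trim().toLowerCase();

  if (detected === undefined) {
    return UNSNIFFABLE_TEXT_TYPES.has(normalized)
      ? { ok: true }
      : { ok: false, status: 422, error: `Could not verify content for declared type ${normalized}` };
  }
  if (detected.toLowerCase() !== normalized) {
    return { ok: false, status: 422, error: `Declared ${normalized} but content looks like ${detected}` };
  }
  return { ok: true };
}
```

Keeping this check separate from the Express handler means the 422 path can be unit-tested directly, with the supertest case covering only the wiring.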

Sources

Primary (HIGH confidence)

  • Codebase direct read: server/src/services/renderers/icon-renderer.ts — renderer pattern, sharp usage
  • Codebase direct read: server/src/services/renderers/diagram-renderer.ts — Playwright + Resvg pattern
  • Codebase direct read: server/src/services/content-job-runner.ts — job dispatch architecture
  • Codebase direct read: server/src/services/voice-pipeline.ts — Whisper probe and transcription pattern
  • Codebase direct read: server/src/routes/voice.ts — multer upload pattern for binary input
  • Codebase direct read: ui/src/hooks/useContentJob.ts — SSE hook established in Phase 41
  • Codebase direct read: ui/src/components/ChatInput.tsx — existing VoiceMicButton wiring
  • Codebase direct read: ui/src/hooks/useVoiceMode.ts — existing voice mode settings pattern
  • Codebase direct read: server/src/services/hardware.ts — whisperAvailable detection, probe pattern
  • Codebase direct read: server/src/routes/hardware.ts — GET /api/system/providers returns whisperAvailable
  • Codebase direct read: server/src/app.ts — route mounting pattern
  • Codebase direct read: server/package.json — installed deps list
  • npm view file-type version → 22.0.0 (verified 2026-04-04)
  • npm view xlsx version → 0.18.5 (verified 2026-04-04)
  • npm view csv-parse version → 6.2.1 (verified 2026-04-04)
  • Binary probe: /opt/nexus/node_modules/.pnpm/ffmpeg-static@5.3.0/.../ffmpeg -version → 7.0.2 (verified working)

Secondary (MEDIUM confidence)

  • .planning/STATE.md — accumulated decisions: CONV-05, CONV-08, CONV-09 architectural choices locked
  • Phase 41-01-SUMMARY.md — renderer pattern, useContentJob hook, tech stack context
  • Phase 40-01-SUMMARY.md — content_jobs schema, RenderResult interface, MAX_GENERATED_ASSET_BYTES

Tertiary (LOW confidence)

  • None — all critical claims verified by codebase inspection or npm registry.

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all packages verified via codebase inspection + npm registry
  • Architecture: HIGH — pattern directly derived from Phase 41 implementations in codebase
  • Pitfalls: HIGH — most derived from actual code review (ffmpeg-static cast, file-type ESM, etc.)
  • Environment availability: HIGH — verified via command execution on target system

Research date: 2026-04-04

Valid until: 2026-05-04 (packages stable; architecture unlikely to change)