# Phase 43: Documents & Branding - Research

**Researched:** 2026-04-04
**Domain:** PDF generation via Playwright, brand identity kit assembly, ZIP packaging
**Confidence:** HIGH

---

## User Constraints (from CONTEXT.md)

### Locked Decisions

All implementation choices are at Claude's discretion; the discuss phase was skipped per user setting.

### Claude's Discretion

All implementation choices are at Claude's discretion. Use the ROADMAP phase goal, success criteria, and codebase conventions.

### Deferred Ideas (OUT OF SCOPE)

None (discuss phase skipped).

---

## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| DOC-01 | User can generate formatted PDF reports from conversation content | Playwright `page.pdf()` with HTML-to-PDF approach; follows the `diagram-renderer.ts` Playwright pattern |
| DOC-02 | User can generate invoices and contracts from templates | Template-driven HTML rendered to PDF via Playwright; `pdf-lib` was considered for data-only invoices but is not needed (see "Why NOT pdf-lib") |
| DOC-03 | User can generate one-pagers and API documentation | LLM-generated HTML + CSS styled page → Playwright PDF; same pipeline as DOC-01 |
| BRAND-01 | User can generate a full brand identity from a single conversation | Multi-step LLM extraction: brand name/colors/typography → feeds each sub-renderer |
| BRAND-02 | System produces logo mark (SVG), avatar in multiple sizes | SVG logo via LLM + `validateAndCleanSvg`; sharp rasterizes to [512, 256, 128, 64, 32]px PNGs |
| BRAND-03 | System produces social media profile images and banners per platform | Re-uses `PLATFORM_DIMENSIONS` from wallpaper-renderer; logo composited onto brand-colored background |
| BRAND-04 | System produces email signature and letterhead templates | LLM generates HTML; stored as HTML string inside bundle + PNG preview via Playwright screenshot |
| BRAND-05 | System produces a brand guidelines document (PDF) | Playwright PDF of a styled brand guidelines HTML page generated by LLM |
| BRAND-06 | User can download all brand assets as a zip package | `archiver` v7 streams all bundle assets into a ZIP buffer returned as a single RenderResult |

---

## Summary

Phase 43 adds two new job types, `pdf-document` (DOC-01..03) and `brand-kit` (BRAND-01..06), to the existing content job pipeline. Both use infrastructure from Phase 40 (job store, SSE, asset storage) and follow the renderer pattern established in Phases 41-42.

PDF generation uses the Playwright Chromium browser already installed at `~/.cache/ms-playwright/chromium-1217/`. The `resolveBrowserPath()` function in `diagram-renderer.ts` is already in place and reusable. The pattern is: LLM generates an HTML page → Playwright renders it → `page.pdf()` outputs a PDF buffer. This avoids any native binary dependencies beyond the already-present Chromium.

Brand kit generation orchestrates multiple sub-renders (logo SVG, avatar PNGs, social images, email signature HTML, letterhead HTML, brand guidelines PDF) and then packages them with `archiver` into a single ZIP buffer. The ZIP is stored as a single generated asset; the UI fetches it and triggers a browser download.

`pdf-lib` is NOT needed for Phase 43: Playwright PDF covers all three document types.

**Primary recommendation:** One `pdf-renderer.ts` for DOC-01..03 and one `brand-renderer.ts` for BRAND-01..06. Both follow the `renderX(input) → RenderResult` contract. Add two new `case` blocks in `content-job-runner.ts`. Add two new tabs to `ContentStudio.tsx`.

---

## Project Constraints (from CLAUDE.md)

No `CLAUDE.md` exists in the project root. Constraints are derived from codebase conventions documented in STATE.md:

- **Async job pattern is mandatory**: all render requests return 202 + job ID immediately; never block HTTP on render
- **sourceTaskId required** on every generated asset from day one
- **MAX_GENERATED_ASSET_BYTES** applies to all generated assets (bypasses the 10MB upload limit for the "generated" namespace)
- **Playwright Chromium** already decided for design-rich PDFs (confirmed in STATE.md blocker note: "Confirm pdf-lib scope: Playwright for design-rich PDFs, pdf-lib for data-driven invoices — decide at Phase 43 planning")
- **Renderer pattern**: `renderX(input: Record<string, unknown>): Promise<RenderResult>`, exported and loaded via dynamic import in `content-job-runner.ts`
- **Bundle pattern**: rich JSON blob stored as the RenderResult; UI fetches the asset URL, parses JSON, hydrates component
- **puterChatComplete** for all LLM calls; reads `PUTER_AUTH_TOKEN` from env
- **No new binary dependencies** beyond what is already installed (`sharp`, `svgo`, `playwright-core`, `@resvg/resvg-js`, `ffmpeg-static`)
- **TypeScript strict**: all new files need proper types
- **Test mocks**: `playwright-core` and `puter-inference.js` are always `vi.mock()`ed in tests

---

## Standard Stack

### Core (already installed; no new installs needed for PDF)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `playwright-core` | 1.58.2 | HTML → PDF via Chromium headless | Already installed; `resolveBrowserPath()` already written |
| `sharp` | ^0.34.5 | SVG → PNG rasterization for avatars and social images | Already in use (wallpaper-renderer.ts) |
| `svgo` | ^4.0.1 | SVG cleanup/validation | Already in use (icon-renderer.ts) |
| `@resvg/resvg-js` | ^2.6.2 | High-fidelity SVG → PNG for logo mark | Already in use (diagram-renderer.ts) |

### New Dependency: ZIP Packaging

| Library | Version | Purpose | Why |
|---------|---------|---------|-----|
| `archiver` | ^7.0.1 | Stream multiple buffers into a ZIP buffer | Well-maintained (MIT), streams-based, works entirely in memory via `archive.finalize()` + collecting output into a Buffer; no disk I/O |

**Installation (one new package):**

```bash
pnpm --filter @paperclipai/server add archiver
pnpm --filter @paperclipai/server add -D @types/archiver
```

**Version verification (npm registry, 2026-04-04):**

- `archiver`: 7.0.1 (latest), confirmed
- `@types/archiver`: published alongside, v5.3.4

### Why NOT pdf-lib

The STATE.md blocker note says "Confirm pdf-lib scope: Playwright for design-rich PDFs, pdf-lib for data-driven invoices — decide at Phase 43 planning."

**Decision: Use Playwright for all three doc types (DOC-01, DOC-02, DOC-03).**

Rationale:

- DOC-01 (reports), DOC-02 (invoices), and DOC-03 (one-pagers) all need styled output: headings, tables, code blocks
- The LLM can generate HTML + inline CSS in a single shot; Playwright renders it faithfully
- `pdf-lib` is excellent for programmatic PDF manipulation (merge, fill form fields) but poor for styled layout
- We already have Playwright Chromium; adding `pdf-lib` adds another package for no benefit
- Invoice "templates" work cleanly as HTML templates: the LLM fills in the line items, Playwright renders to PDF

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Playwright HTML→PDF | `pdf-lib` | `pdf-lib` has no layout engine; styling complex reports requires hand-coding coordinates, which is far harder than HTML+CSS |
| archiver | `jszip` | jszip is promise-based but slower; archiver's streaming API handles large asset sets better |
| archiver | `adm-zip` | adm-zip v0.5 is synchronous; blocks the event loop for large zips |
| LLM-generated SVG logo | Stable Diffusion / DALL-E | Out of scope per REQUIREMENTS.md |

---

## Architecture Patterns

### Recommended Project Structure

```
server/src/services/renderers/
├── pdf-renderer.ts        # NEW: DOC-01, DOC-02, DOC-03
├── brand-renderer.ts      # NEW: BRAND-01..06
├── diagram-renderer.ts    # existing
├── icon-renderer.ts       # existing
├── wallpaper-renderer.ts  # existing
├── social-renderer.ts     # existing
├── convert-renderer.ts    # existing
└── types.ts               # add PdfDocumentBundle + BrandKitBundle

ui/src/components/
├── DocumentGeneratePanel.tsx  # NEW: DOC tab UI
├── BrandKitPanel.tsx          # NEW: Brand tab UI
├── BrandKitResult.tsx         # NEW: brand kit display + ZIP download trigger
└── ... (existing)

ui/src/pages/
└── ContentStudio.tsx          # add "Documents" tab + "Brand" tab
```

### Pattern 1: Playwright PDF Renderer

Same structure as the Playwright usage in `diagram-renderer.ts`: launch the browser from `resolveBrowserPath()`, create a page, set the HTML content, call `page.pdf()`, close the browser.

```typescript
// server/src/services/renderers/pdf-renderer.ts
import { chromium } from "playwright-core";
import { resolveBrowserPath } from "./diagram-renderer.js";
import { puterChatComplete } from "../puter-inference.js";
import type { RenderResult, PdfDocumentBundle } from "./types.js";

// buildPdfSystemPrompt() and stripMarkdownFences() are helpers assumed to be
// defined elsewhere in this file (not shown).
export async function renderPdfDocument(
  input: Record<string, unknown>,
): Promise<RenderResult> {
  const docType = typeof input.docType === "string" ? input.docType : "report";
  const prompt = typeof input.prompt === "string" ? input.prompt : "";
  const title = typeof input.title === "string" ? input.title : "Document";

  // LLM generates a complete, self-contained HTML document
  const html = await puterChatComplete([
    { role: "system", content: buildPdfSystemPrompt(docType) },
    { role: "user", content: prompt },
  ]);
  const cleanHtml = stripMarkdownFences(html);

  const executablePath = resolveBrowserPath();
  const browser = await chromium.launch({
    executablePath,
    headless: true,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  let pdfBuffer: Buffer;
  try {
    const page = await browser.newPage();
    await page.setContent(cleanHtml, { waitUntil: "networkidle" });
    const pdfUint8 = await page.pdf({
      format: "A4",
      printBackground: true,
      margin: { top: "20mm", bottom: "20mm", left: "20mm", right: "20mm" },
    });
    // Normalize to a Node Buffer regardless of the returned byte-array type
    pdfBuffer = Buffer.from(pdfUint8);
  } finally {
    // Always release the browser, even if rendering throws
    await browser.close();
  }

  const bundle: PdfDocumentBundle = {
    type: "pdf-document-bundle",
    docType,
    title,
    pdfBase64: pdfBuffer.toString("base64"),
  };

  return {
    filename: `document-${docType}.json`,
    contentType: "application/json",
    buffer: Buffer.from(JSON.stringify(bundle)),
  };
}
```

**Key Playwright PDF options:**

- `page.pdf({ format: "A4", printBackground: true })`: produces a Buffer (Uint8Array in Playwright v1.58)
- `waitUntil: "networkidle"`: ensures any web fonts / images finish loading before capture; use `"domcontentloaded"` as a fallback for offline-only HTML
- Margins are given as strings with mm units (e.g. `"20mm"`)
- `printBackground: true`: needed for colored headers/footers in styled documents

### Pattern 2: Brand Kit Orchestration

Brand kit is a multi-step job: one LLM call extracts the brand specification, then sub-renderers produce each asset in sequence, then archiver packages everything.
```typescript
// server/src/services/renderers/brand-renderer.ts
import type { RenderResult, BrandKitBundle } from "./types.js";

interface BrandSpec {
  name: string;
  tagline: string;
  primaryColor: string; // hex
  secondaryColor: string; // hex
  fontStyle: "sans" | "serif" | "mono";
  logoDescription: string;
  industry: string;
}

export async function renderBrandKit(
  input: Record<string, unknown>,
): Promise<RenderResult> {
  const prompt = typeof input.prompt === "string" ? input.prompt : "";

  // Step 1: Extract brand specification
  const spec = await extractBrandSpec(prompt);

  // Step 2: Generate logo SVG
  const logoSvg = await generateLogoSvg(spec);

  // Step 3: Rasterize logo to avatar sizes [512, 256, 128, 64, 32]
  const avatarPngs = await rasterizeAvatars(logoSvg);

  // Step 4: Generate social platform images (profile + banner per platform)
  const socialImages = await generateSocialImages(spec, logoSvg);

  // Step 5: Generate email signature HTML + letterhead HTML
  const { signature, letterhead } = await generateTemplates(spec, logoSvg);

  // Step 6: Generate brand guidelines PDF via Playwright
  const guidelinesPdf = await generateGuidelinesPdf(spec, logoSvg, signature);

  // Step 7: Package everything into a ZIP
  // (buildZip is assumed to map these assets to named entries; see Pattern 3)
  const zipBuffer = await buildZip({
    logoSvg,
    avatarPngs,
    socialImages,
    signature,
    letterhead,
    guidelinesPdf,
  });

  const bundle: BrandKitBundle = {
    type: "brand-kit-bundle",
    spec,
    logoSvgBase64: Buffer.from(logoSvg).toString("base64"),
    avatarPngs, // { "512": base64, "256": base64, ... }
    socialImages, // { "twitter-profile": base64, ... }
    signatureHtml: signature,
    letterheadHtml: letterhead,
    guidelinesPdfBase64: guidelinesPdf.toString("base64"),
    zipBase64: zipBuffer.toString("base64"),
  };

  return {
    filename: "brand-kit-bundle.json",
    contentType: "application/json",
    buffer: Buffer.from(JSON.stringify(bundle)),
  };
}
```

### Pattern 3: archiver ZIP buffer assembly

```typescript
// Source: archiver v7 official docs
import archiver from "archiver";
import { Writable } from "stream";

async function buildZipBuffer(
  entries: Array<{ name: string; data: Buffer }>,
): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = [];
    // In-memory sink: collect every compressed chunk the archive emits
    const sink = new Writable({
      write(chunk: Buffer, _enc, cb) {
        chunks.push(chunk);
        cb();
      },
    });

    const archive = archiver("zip", { zlib: { level: 6 } });
    archive.on("error", reject);
    sink.on("finish", () => resolve(Buffer.concat(chunks)));
    archive.pipe(sink);

    for (const entry of entries) {
      archive.append(entry.data, { name: entry.name });
    }
    // Failures are reported through the "error" handler registered above
    void archive.finalize();
  });
}
```

### Pattern 4: content-job-runner.ts additions

```typescript
// Add to renderContent() switch in content-job-runner.ts
case "pdf-document": {
  const { renderPdfDocument } = await import("./renderers/pdf-renderer.js");
  return renderPdfDocument(input);
}
case "brand-kit": {
  const { renderBrandKit } = await import("./renderers/brand-renderer.js");
  return renderBrandKit(input);
}
```

### Pattern 5: UI: useContentJob + bundle fetch (established pattern)

The UI pattern for both new tabs is identical to `SocialPostPanel.tsx`:

1. `useContentJob(companyId)` for submit + SSE progress
2. `if (job.status === "done" && job.resultAssetId && !bundle)` → fetch asset URL → `fetch()` → `JSON.parse()` → set bundle state
3. Display result component
4. Download button triggers `URL.createObjectURL(base64ToBinary(...))` + `.click()`

For BRAND-06 (ZIP download): the `zipBase64` field in the bundle drives a single download button that triggers a browser `<a download>` with the ZIP blob.
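
Step 4 above refers to a `base64ToBinary` helper without showing it. A minimal sketch of such a decoder follows, assuming the bundle fields hold standard (not URL-safe) base64 and that a global `atob` is available (browsers and recent Node versions). The name `base64ToBytes` is hypothetical and not an existing function in the codebase.

```typescript
// Hypothetical helper (not part of the existing codebase): decode a standard
// base64 string, such as bundle.zipBase64, into raw bytes.
// Assumes a global `atob` (browsers, Node 16+).
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "UEsDBA==" decodes to the 4-byte ZIP local file header signature "PK\x03\x04".
const zipMagic = base64ToBytes("UEsDBA==");
console.log(Array.from(zipMagic)); // 80, 75, 3, 4
```

In the browser, the decoded bytes could then be wrapped in `new Blob([bytes], { type: "application/zip" })`, passed to `URL.createObjectURL()`, assigned to the `href` of a temporary anchor with a `download` attribute, and released with `URL.revokeObjectURL()` after the click.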
### Anti-Patterns to Avoid

- **Do NOT use `waitUntil: "load"` for Playwright PDF**: network requests for CDN fonts will fail in the sandbox; keep the HTML self-contained (inline CSS, no `@import`), or use `waitUntil: "domcontentloaded"` with system fonts only
- **Do NOT open a new Playwright browser per sub-step in brand kit**: open once, generate the guidelines PDF in that session, close; reuse the same `resolveBrowserPath()` pattern
- **Do NOT pass a file path to archiver from tmpdir**: use `archive.append(buffer, { name })` in memory to avoid temp files on disk
- **Do NOT build the brand kit as a single sequential LLM mega-prompt**: extract the spec first (structured JSON), then feed spec fields into individual generators; this gives predictable output shapes
- **Do NOT define BrandKitBundle only in the panel component file**: unlike wallpaper/social bundles (which are panel-local per STATE.md decision), `BrandKitBundle` and `PdfDocumentBundle` must be added to `server/src/services/renderers/types.ts` because the server-side renderers reference them directly

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| PDF from styled HTML | Custom layout engine | `playwright-core` `page.pdf()` | CSS layout is already solved; Playwright has CSS paged media support |
| ZIP archive in memory | Manually writing the ZIP file format | `archiver` v7 | The ZIP format has CRC32, compression, and directory entries; it is easy to get wrong when hand-rolled |
| SVG logo cleanup | Custom regex stripper | `svgo` (already installed) via `validateAndCleanSvg()` in icon-renderer.ts | Already written and tested |
| SVG → PNG rasterization | Sharp for logo mark | `@resvg/resvg-js` (already installed, used in diagram-renderer.ts) | Handles embedded fonts and complex gradients better than sharp for LLM-generated logos |
| Brand color parsing | A hex parser | Pass the raw hex string from the LLM spec directly to SVG `fill` attributes | No parsing needed |

**Key insight:** Every difficult sub-problem in this phase already has a solved dependency in the codebase. This phase is integration work, not new infrastructure.

---

## Common Pitfalls

### Pitfall 1: Playwright `page.pdf()` returns Uint8Array, not Buffer

**What goes wrong:** TypeScript infers `Uint8Array`; passing it directly to `Buffer.from()` works, but the `byteLength` comparison for MAX_GENERATED_ASSET_BYTES expects a Buffer.

**Why it happens:** Playwright v1.58 changed the return type of `page.pdf()` to `Promise<Uint8Array>`.

**How to avoid:** Always wrap: `const pdfBuffer = Buffer.from(await page.pdf({...}))`.

**Warning signs:** TS2345 type error, or `buffer.byteLength` returning the wrong value.

### Pitfall 2: LLM-generated HTML includes external resource references

**What goes wrong:** The HTML links to the Google Fonts CDN or external images. Playwright in `--no-sandbox` headless mode may not fetch them, producing blank or missing content.

**Why it happens:** The LLM follows standard web conventions; it doesn't know the rendering context is offline.

**How to avoid:** System prompt must explicitly say: "Use only inline CSS. No external URLs. Use web-safe system fonts (Arial, Georgia, monospace). No `` or `