diff --git a/.planning/research/ARCHITECTURE.md b/.planning/research/ARCHITECTURE.md
index 08468d96..26ba985f 100644
--- a/.planning/research/ARCHITECTURE.md
+++ b/.planning/research/ARCHITECTURE.md
@@ -1,507 +1,535 @@
 # Architecture Research
 
-**Domain:** Voice Pipeline + Minimal Telegram Bridge (v1.6) — integration with existing Nexus/Paperclip monorepo
-**Researched:** 2026-04-03
-**Confidence:** HIGH — based on direct codebase inspection + verified current documentation
+**Domain:** Content generation integration — Nexus v1.7
+**Researched:** 2026-04-04
+**Confidence:** HIGH (based on direct codebase inspection of /opt/nexus)
 
----
+## Standard Architecture
 
-## System Overview
-
-v1.6 adds two parallel capability tracks onto the existing monorepo: a transport-agnostic voice pipeline (Whisper STT + Piper TTS) and a disposable Telegram bridge that reuses those pipeline primitives for phone access. The architecture constraint is that no voice or chat logic is Telegram-specific — Telegram is an interchangeable transport layer that calls the same server services as the web UI.
+### System Overview
 
 ```
-+-----------------------------------------------------------------------------------+
-| UI Layer (React/Vite)                                                             |
-|                                                                                   |
-| +-------------------------------------------------------------------------------+ |
-| | ChatPanel / PersonalAssistant (MODIFIED)                                      | |
-| | +---------------------+ +--------------------+ +------------------+           | |
-| | | VoiceMicButton (NEW)| | WaveformDisplay    | | TtsButton (v1.5) |           | |
-| | | silence detection   | | (NEW) animated bars| | + auto-play prop |           | |
-| | | auto-send on silence| +--------------------+ +------------------+           | |
-| | +---------------------+                                                       | |
-| | +---------------------------------------------------------------------------+ | |
-| | | ChatMessage (MODIFIED) — voice_mode badge, dual output toggle             | | |
-| | +---------------------------------------------------------------------------+ | |
-| | +---------------------------------------------------------------------------+ | |
-| | | VoiceModeToggle (NEW) — text only / voice input / full voice              | | |
-| | +---------------------------------------------------------------------------+ | |
-| +-------------------------------------------------------------------------------+ |
-+-----------------------------------------------------------------------------------+
-                                          | HTTP + SSE
-+-----------------------------------------------------------------------------------+
-| Server Layer (Express)                                                            |
-|                                                                                   |
-| +------------------------------------+     +------------------------------------+ |
-| | voice.ts (NEW route)               |     | telegram.ts (NEW route/service)    | |
-| | POST /transcribe (MOVED)           |     | grammY long-poll process           | |
-| | POST /synthesize (NEW)             |     | text + voice relay                 | |
-| +------------------------------------+     +------------------------------------+ |
-|                   |                                          |                    |
-| +-----------------v------------------------------------------v------------------+ |
-| | voicePipelineService (NEW — core)                                             | |
-| |   transcribe(audioBuffer, format) -> string                                   | |
-| |   synthesize(text, voiceId?) -> Buffer (WAV)                                  | |
-| |   formatForVoice(text) -> { voice: string, full: string }                     | |
-| +-------------------------------------------------------------------------------+ |
-|                   |                                                               |
-| +-----------------v-------------------------------------------------------------+ |
-| | chatService / nexusSettingsService (EXISTING)                                 | |
-| |   conversations . messages . stream SSE . memory . voiceEnabled               | |
-| +-------------------------------------------------------------------------------+ |
-|                   |                                                               |
-| +-----------------v-------------------------------------------------------------+ |
-| | External Processes (spawned via child_process.spawn / execFile)               | |
-| |   whisper-cpp / whisper (STT)          piper (TTS)                            | |
-| +-------------------------------------------------------------------------------+ |
-+-----------------------------------------------------------------------------------+
-         ^
-         | Telegram Bot API (HTTPS long-poll)
-+--------+--------------------------------------------------------------------------+
-| Telegram (external service)                                                       |
-| User sends text -> bot relays to chatService -> SSE reply -> bot sends back       |
-| User sends voice -> bot downloads OGG -> voicePipelineService.transcribe()        |
-|                  -> chatService -> reply -> voicePipelineService.synthesize()     |
-|                  -> bot sends OGG audio reply                                     |
-+-----------------------------------------------------------------------------------+
++-----------------------------------------------------------------------------------+
+| UI Layer (React/Vite)                                                             |
+| +------------------+  +------------------+  +--------------+  +----------------+  |
+| | ChatPanel        |  | ContentJobViewer |  | ThemePreview |  | DiagramRenderer|  |
+| | (existing,       |  | (new)            |  | (new)        |  | (new, wraps    |  |
+| | minor extension) |  | progress+result  |  | CSS vars     |  | mermaid dep)   |  |
+| +--------+---------+  +--------+---------+  +------+-------+  +-------+--------+  |
+|          |                     |                   |                  |           |
++----------|---------------------|-------------------|------------------|-----------+
+           | HTTP/SSE            | HTTP/SSE          | HTTP             | (client-side)
++----------|---------------------|-------------------|------------------------------+
+|          |                     |                   | API Layer (Express)          |
+| +--------v---------------------v-------------------v----------------------------+ |
+| | /api/companies/:id/content-jobs     (new)                                     | |
+| | /api/content-jobs/:id               (new)                                     | |
+| | /api/companies/:id/themes/generate  (new)                                     | |
+| +-------------------------------------------------------------------------------+ |
++-----------------------------------------------------------------------------------+
+| Service Layer (Node.js)                                                           |
+| +-------------------+ +--------------------+ +----------------------------------+ |
+| | contentJobService | | themeEngineService | | renderPipelineService            | |
+| | (new)             | | (new)              | | (new)                            | |
+| | enqueue, status,  | | palette gen,       | | routes jobs to renderer adapters | |
+| | list              | | WCAG check, export | |                                  | |
+| +--------+----------+ +--------------------+ +---------------+------------------+ |
+|          |                                                   |                    |
+| +--------v---------------------------------------------------v------------------+ |
+| | Renderer Adapters (new, behind interface)                                     | |
+| | +-------------+  +------------+  +----------+  +------------+  +----------+   | |
+| | | Mermaid     |  | SVG        |  | Remotion |  | PDF        |  | Image    |   | |
+| | | (isomorphic)|  | (generator)|  | (CLI)    |  | (Puppeteer)|  | (Sharp)  |   | |
+| | +-------------+  +------------+  +----------+  +------------+  +----------+   | |
+| +-------------------------------------------------------------------------------+ |
++-----------------------------------------------------------------------------------+
+| Storage + Events Layer (existing, minimally extended)                             |
+| +------------------+  +--------------------+  +-----------------------------+     |
+| | StorageService   |  | publishLiveEvent   |  | assets table (existing)     |     |
+| | (existing)       |  | (existing +3 types)|  | content_jobs table (new)    |     |
+| +------------------+  +--------------------+  +-----------------------------+     |
++-----------------------------------------------------------------------------------+
 ```
 
----
+### Component Responsibilities
 
-## Integration Points: New vs. Existing
-
-### What Stays Unchanged
-
-| Component | Location | Status |
-|-----------|----------|--------|
-| `chatService` | `server/src/services/chat.ts` | No changes — voice pipeline uses it as-is |
-| `nexusSettingsService` | `server/src/services/nexus-settings.ts` | Extend schema only (add `voiceMode`, `telegramToken`) |
-| `chatFileRoutes` | `server/src/routes/chat-files.ts` | `/transcribe` moves out; file upload stays |
-| `usePiperTts` | `ui/src/hooks/usePiperTts.ts` | No changes — TtsButton continues using browser WASM |
-| `TtsButton` | `ui/src/components/TtsButton.tsx` | Add auto-play prop only |
-| SSE stream endpoint | `server/src/routes/chat.ts` | No changes — Telegram bridge calls services directly |
-| DB schema | `packages/db` | No changes — voice is file/process, not a DB column |
-
-### What Changes (MODIFIED)
-
-| Component | Location | Change |
-|-----------|----------|--------|
-| `VoiceRecordButton` | `ui/src/components/VoiceRecordButton.tsx` | Add silence detection, waveform data emission, auto-send on silence |
-| `ChatInput` | `ui/src/components/ChatInput.tsx` | Wire new VoiceMicButton, add voice mode prop |
-| `ChatMessage` | `ui/src/components/ChatMessage.tsx` | Show voice_mode badge, show dual output collapse/expand |
-| `nexusSettingsSchema` | `server/src/services/nexus-settings.ts` | Add `voiceMode` enum and `telegramToken` optional string |
-| `app.ts` | `server/src/app.ts` | Register `voiceRoutes`, `telegramRoutes` |
-| `createMessageSchema` | `packages/shared/src/validators/chat.ts` | Add `voiceMode: z.boolean().optional()` flag on messages |
-| `ChatMessage` type | `packages/shared/src/types/chat.ts` | Add `voiceMode: boolean | null` field |
-| `chat-files.ts` | `server/src/routes/chat-files.ts` | Remove `/transcribe` handler (moved to voice.ts) |
-
-### What Is New (NEW)
-
-| Component | Location | Purpose |
-|-----------|----------|---------|
-| `voicePipelineService` | `server/src/services/voice-pipeline.ts` | Transport-agnostic STT/TTS core — used by web routes AND Telegram bridge |
-| `voice.ts` (route) | `server/src/routes/voice.ts` | `POST /api/transcribe`, `POST /api/synthesize` — thin HTTP wrappers |
-| `telegram.ts` (service) | `server/src/services/telegram.ts` | grammY bot init, long-poll loop, message relay, voice relay |
-| `telegram.ts` (route) | `server/src/routes/telegram.ts` | `GET /api/telegram/status`, `POST /api/telegram/token` management endpoints |
-| `VoiceMicButton` | `ui/src/components/VoiceMicButton.tsx` | Enhanced mic button with silence detection and waveform display |
-| `WaveformDisplay` | `ui/src/components/WaveformDisplay.tsx` | Animated audio waveform bars using AnalyserNode |
-| `VoiceModeToggle` | `ui/src/components/VoiceModeToggle.tsx` | Three-state toggle: text only / voice input / full voice |
-| `useVoiceMode` | `ui/src/hooks/useVoiceMode.ts` | Reads/writes voice mode setting via `/api/nexus-settings` |
-| `useSilenceDetection` | `ui/src/hooks/useSilenceDetection.ts` | Web Audio API AnalyserNode watching for 1.5s silence threshold |
-
----
-
-## Component Boundaries
-
-### voicePipelineService (Core)
-
-This is the key abstraction for v1.6. Both the web HTTP route and the Telegram bridge call this service — neither knows about the other.
-
-| Method | Input | Output | Implementation |
-|--------|-------|--------|----------------|
-| `transcribe(buffer, format)` | `Buffer`, `"webm" or "ogg"` | `Promise<string>` | Writes temp file, uses `execFile` (not `exec`) to spawn `whisper-cpp` or `whisper` CLI, reads stdout, cleans up |
-| `synthesize(text, voiceId?)` | `string`, optional voiceId | `Promise<Buffer>` | Spawns `piper` CLI via `spawn`, pipes text to stdin, collects WAV stdout |
-| `formatForVoice(text)` | `string` | `{ voice: string; full: string }` | Strips code blocks and markdown for voice; returns both variants |
-
-The `transcribe` method extends the existing `/transcribe` implementation from `chat-files.ts` by adding an `ogg` format path alongside the existing `webm` path. The same cascade (whisper-cpp first, openai-whisper fallback) is preserved.
-
-**Why a dedicated service vs. inline in routes:**
-The Telegram bridge cannot call the web route (circular HTTP call within the same process). Both transports need the same logic. Extracting to a service eliminates duplication and makes both implementations testable in isolation.
-
-### telegram service
-
-A thin relay, not a feature-rich bot. It:
-1. Holds a single grammY `Bot` instance, initialized when `telegramToken` is set in nexus-settings
-2. Routes text messages to `chatService.addMessage()` then collects AI response via `puterProxyService.chatStream()`
-3. Routes voice messages — downloads OGG file, calls `voicePipelineService.transcribe()`, then same text path
-4. If `voiceMode === "full_voice"`: calls `voicePipelineService.synthesize()`, sends audio back via `ctx.replyWithAudio()`
-5. Prefixes agent name on replies: `[Agent Name]: message text`
-
-**No per-user conversation tracking.** All Telegram messages go to a single conversation (or create one on first use) associated with the workspace. This is the intentional "thin bridge" design — full sync is out of scope per PROJECT.md.
-
-### Voice Route vs. Chat Files Route
-
-The existing `/transcribe` endpoint lives inside `chatFileRoutes` in `chat-files.ts`. For v1.6, the endpoint moves to a dedicated `voice.ts` route. This is a path-preserving refactor: the endpoint behavior is unchanged, but the code now lives in a Nexus-specific file rather than inside a mostly-upstream file.
-
-Moving the handler reduces merge conflict surface on future upstream rebases of `chat-files.ts`.
-
----
+| Component | Responsibility | Status | Notes |
+|-----------|----------------|--------|-------|
+| `contentJobService` | Queue and track async render jobs; emit live events on status change | New | Factory function, matches `chatService` pattern |
+| `renderPipelineService` | Route render requests to the correct renderer adapter | New | Strategy pattern over adapters |
+| `themeEngineService` | Palette generation, WCAG AA validation, CSS/JSON/Tailwind exports | New | Pure computation, no DB, deterministic |
+| `mermaidRendererAdapter` | Mermaid DSL string to SVG buffer, server-side | New | Uses `@mermaid-js/mermaid-isomorphic`; confirm its Node.js runtime needs (the `mermaid-isomorphic` package renders through headless Playwright outside the browser) |
+| `remotionRendererAdapter` | Invoke Remotion CLI subprocess, return MP4/WebM path | New | Subprocess; outputs go to storage namespace `generated/videos` |
+| `svgGeneratorAdapter` | Template-based SVG generation (icons, banners, placeholders) | New | No binary deps; pure string construction + existing sanitizer |
+| `pdfRendererAdapter` | HTML to PDF via Puppeteer (arm64 Chromium on M4) | New | Subprocess; Puppeteer arm64 works on Apple Silicon |
+| `imageProcessorAdapter` | Composite and resize via Sharp | Modified | Sharp already in `server/package.json`; extend for content use |
+| `placeholderService` | Manifest tracking for draft assets | Existing | Already implemented; optionally extend `PlaceholderEntry` with `contentJobId` |
+| `assetService` | CRUD for the `assets` table | Existing | Already handles `createdByAgentId`; use as-is |
+| `StorageService` | Provider-agnostic blob storage | Existing | Use `generated/` namespace prefix for all new content |
+| `publishLiveEvent` | SSE fan-out to UI subscribers | Existing | Extend `LIVE_EVENT_TYPES` with 3 new content job event types |
+| `ContentJobViewer` (UI) | Poll/stream job status; show progress, render result inline | New | Subscribes to SSE live events |
+| `DiagramRenderer` (UI) | Client-side Mermaid render using existing `mermaid` dep | New | `mermaid ^11.12.0` already in `ui/package.json` |
+| `ThemePreview` (UI) | Live palette preview via CSS custom properties | New | No server round-trip for preview |
+| `ContentGallery` (UI) | Workspace page showing all generated assets | New | Pagination via `assetService.list` |
 
 ## Recommended Project Structure
 
+New files follow existing monorepo conventions: factory functions, co-located types, no class syntax.
+
 ```
 server/src/
-  app.ts                      # MODIFY: register voiceRoutes, telegramRoutes
-  routes/
-    chat-files.ts             # MODIFY: remove /transcribe handler (moved to voice.ts)
-    voice.ts                  # NEW: POST /transcribe, POST /synthesize
-    nexus-settings.ts         # MODIFY: expose voiceMode + telegramToken fields
-    telegram.ts               # NEW: GET /telegram/status, POST /telegram/token
-  services/
-    voice-pipeline.ts         # NEW: transcribe(), synthesize(), formatForVoice()
-    telegram.ts               # NEW: grammY bot lifecycle + relay logic
-    nexus-settings.ts         # MODIFY: add voiceMode + telegramToken to schema
+├── services/
+│   ├── content-job.ts            # contentJobService factory
+│   ├── render-pipeline.ts        # renderPipelineService — adapter dispatch
+│   ├── theme-engine.ts           # themeEngineService — pure palette computation
+│   └── renderers/
+│       ├── index.ts              # RendererAdapter interface + barrel
+│       ├── mermaid-renderer.ts   # Mermaid DSL -> SVG (server-side isomorphic)
+│       ├── remotion-renderer.ts  # Remotion CLI subprocess wrapper
+│       ├── svg-generator.ts      # Template SVG (icons, placeholders, banners)
+│       └── pdf-renderer.ts       # HTML -> PDF via Puppeteer
+├── routes/
+│   ├── content-jobs.ts           # GET/POST /companies/:id/content-jobs
+│   └── themes.ts                 # POST /companies/:id/themes/generate
+└── types/
+    └── content.ts                # Server-internal ContentJobType, ContentJobStatus
 
-ui/src/
-  components/
-    VoiceMicButton.tsx        # NEW: replaces VoiceRecordButton in ChatInput
-    WaveformDisplay.tsx       # NEW: animated bars from AnalyserNode data
-    VoiceModeToggle.tsx       # NEW: 3-state toggle (text / voice-in / full-voice)
-    VoiceRecordButton.tsx     # KEEP as-is (still used in file upload contexts)
-    TtsButton.tsx             # MODIFY: add autoPlay prop
-    ChatInput.tsx             # MODIFY: add VoiceModeToggle, swap in VoiceMicButton
-    ChatMessage.tsx           # MODIFY: voice_mode badge + dual output expand
-  hooks/
-    useVoiceMode.ts           # NEW: reads/writes voiceMode setting
-    useSilenceDetection.ts    # NEW: AnalyserNode silence threshold
-    usePiperTts.ts            # KEEP as-is (browser-side TTS unchanged)
+packages/db/src/
+├── schema/
+│   └── content_jobs.ts           # NEW table (upstream-safe, no upstream equivalent)
+└── migrations/
+    └── NNNN_add_content_jobs.sql
 
 packages/shared/src/
-  validators/chat.ts          # MODIFY: add voiceMode flag to createMessageSchema
-  types/chat.ts               # MODIFY: add voiceMode field to ChatMessage
+├── types/
+│   └── content.ts                # ContentJob, ContentJobStatus shared types
+└── constants.ts                  # LIVE_EVENT_TYPES extended (+3 content.job.* types)
+
+packages/
+└── remotion-compositions/        # NEW workspace package
+    ├── package.json
+    └── src/
+        └── index.ts              # Remotion composition definitions
+
+ui/src/
+├── components/
+│   ├── ContentJobViewer.tsx      # Job progress + result display
+│   ├── ContentJobCard.tsx        # Compact job status card
+│   ├── DiagramRenderer.tsx       # Mermaid client-side wrapper
+│   ├── ThemePreview.tsx          # Live palette preview
+│   └── GeneratedAssetCard.tsx    # Thumbnail + download + metadata
+└── pages/
+    └── ContentGallery.tsx        # Gallery of generated assets per workspace
 ```
 
----
+### Structure Rationale
+
+- **`server/src/services/renderers/`**: Isolates binary-dependent adapters behind a shared `RendererAdapter` interface. New renderers plug in without touching the core job service.
+- **`content_jobs` table**: Separate from `assets`. A job tracks the render lifecycle (queued, then running, then done or failed); on success it writes an `assets` row and records the `assetId`. This mirrors how `heartbeat_runs` tracks execution separately from its outputs.
+- **`packages/remotion-compositions/`**: Remotion compositions must be bundled ahead of time. Keeping them in a dedicated workspace package lets the bundle step run once at startup, not on every render request.
+- **Content as skills**: Skills (`company_skills`) are Markdown instruction files. Content-type skills tell agents which `/api/companies/:id/content-jobs` endpoint to call and with what parameters. No new schema needed.
 
 ## Architectural Patterns
 
-### Pattern 1: Transport-Agnostic Voice Service
+### Pattern 1: Async Job with SSE Progress
 
-**What:** A server service (`voicePipelineService`) owns STT and TTS logic. HTTP routes and Telegram relay both call the service — neither implements STT/TTS directly.
+**What:** Long-running renders (Remotion, Puppeteer PDF) run asynchronously. The service creates a `content_jobs` row with `status: "queued"`, immediately returns the job record, and starts the render without awaiting it. Live events (`content.job.started`, `content.job.done`, `content.job.failed`) push status changes to the UI over the existing SSE stream.
 
-**When to use:** Any time two transports (web + bot) need the same capability.
+**When to use:** Any render taking more than ~200ms: Remotion, PDF, large Mermaid diagrams. Fast operations (SVG generation, theme palette) can be synchronous HTTP.
 
-**Trade-offs:** Adds one indirection layer. Worth it: eliminates duplication, makes each transport testable independently.
+**Trade-offs:** One DB row per render, which doubles as a durable history of what was generated. Acceptable at solo-user scale.
 
-**Shape:**
+**Example:**
 ```typescript
-// server/src/services/voice-pipeline.ts
-export function voicePipelineService() {
-  // Uses execFile (not exec) — prevents shell injection, consistent with codebase pattern
-  async function transcribe(buffer: Buffer, format: "webm" | "ogg"): Promise<string>;
-  async function synthesize(text: string, voiceId?: string): Promise<Buffer>;
-  function formatForVoice(text: string): { voice: string; full: string };
-  return { transcribe, synthesize, formatForVoice };
+// server/src/services/content-job.ts
+import { eq } from "drizzle-orm";
+// Db, StorageService, contentJobs, ContentJob, ContentJobInput, publishLiveEvent,
+// renderPipelineService and toContentJob come from other Nexus modules (import paths omitted).
+
+export function contentJobService(db: Db, storage: StorageService) {
+  return {
+    async enqueue(companyId: string, input: ContentJobInput): Promise<ContentJob> {
+      const [row] = await db
+        .insert(contentJobs)
+        .values({ companyId, type: input.type, params: input.params, status: "queued" })
+        .returning();
+      publishLiveEvent({ companyId, type: "content.job.started", payload: { jobId: row.id } });
+      // Non-blocking — kick off render
+      renderPipelineService(storage).render(row)
+        .then(async (result) => {
+          await db.update(contentJobs)
+            .set({ status: "done", assetId: result.assetId, completedAt: new Date() })
+            .where(eq(contentJobs.id, row.id));
+          publishLiveEvent({ companyId, type: "content.job.done", payload: { jobId: row.id, assetId: result.assetId } });
+        })
+        .catch(async (err) => {
+          await db.update(contentJobs)
+            .set({ status: "failed", errorMessage: String(err) })
+            .where(eq(contentJobs.id, row.id));
+          publishLiveEvent({ companyId, type: "content.job.failed", payload: { jobId: row.id } });
+        })
+        // If the failure write itself throws, log it instead of leaving an unhandled rejection
+        .catch((err) => console.error("content job bookkeeping failed", err));
+      return toContentJob(row);
+    },
+  };
 }
 ```
-The existing `/transcribe` handler in `chat-files.ts` already uses `promisify(execFile)` — this pattern is the right model. The service wraps it with format selection (`webm` vs `ogg`) and the same whisper-cpp → openai-whisper cascade.
+### Pattern 2: RendererAdapter Interface
 
-### Pattern 2: Thin Telegram Relay
+**What:** Each renderer implements a shared interface. `renderPipelineService` selects the adapter based on `ContentJobType`. Adding a new renderer requires only: (a) implement the interface, (b) register in the dispatch table.
 
-**What:** The Telegram bot is a relay, not a first-class UI. It translates Telegram message events into the same chatService calls the web UI makes, then sends the response back via Telegram.
+**When to use:** Every new content type.
 
-**When to use:** Building a disposable bridge that will be replaced by a richer implementation later.
+**Trade-offs:** Thin abstraction, no framework needed. Appropriate for the codebase size and single-user scale.
 
-**Trade-offs:** No rich UI (no inline keyboards, no threading). Acceptable: PROJECT.md explicitly calls out "thin bridge only" and "Telegram threads/topics/inline keyboards" are out of scope.
-
-**Shape:**
+**Example:**
 ```typescript
-// server/src/services/telegram.ts
-import { Bot } from "grammy";
-
-export function telegramService(db: Db) {
-  let bot: Bot | null = null;
-
-  function start(token: string): void; // idempotent, long-poll
-  function stop(): void;
-  function isRunning(): boolean;
-
-  return { start, stop, isRunning };
+// server/src/services/renderers/index.ts
+export interface RendererAdapter {
+  type: ContentJobType;
+  render(
+    params: Record<string, unknown>,
+    storage: StorageService,
+    companyId: string
+  ): Promise<{ objectKey: string; contentType: string; byteSize: number }>;
 }
 ```
-The bot calls `chatService(db)` and `puterProxyService(db)` directly — no HTTP round-trip to the same server.
+### Pattern 3: Content Types as Skill Files
 
-### Pattern 3: Voice Mode Flag on Messages
+**What:** A Mermaid-generation "skill" is a Markdown file in `company_skills` that instructs agents: "When asked for a diagram, call `POST /api/companies/:id/content-jobs` with `{type: 'mermaid', params: {dsl: '...'}}` and wait for the `content.job.done` event." No new schema required.
 
-**What:** Each message carries an optional `voiceMode: boolean` flag. When `true`, the server formats the response for voice (dual output: `voice` + `full`), and the client auto-plays TTS and shows the full text in a collapsible block.
+**When to use:** All content types — this is how they become installable skills.
 
-**When to use:** Differentiating voice-initiated messages from text messages within the same conversation.
+**Trade-offs:** The agent must know the API contract, which the skill markdown supplies. Works with all adapters (Claude Code, Hermes, Ollama) since skills are plain text.
 
-**Trade-offs:** Adds a field to `createMessageSchema` and the `ChatMessage` type. The field is optional and defaults to `false`, so existing messages and the upstream schema are not broken.
+### Pattern 4: Theme Engine as Pure Function, Preview Client-Side
 
-**Schema change:**
-```typescript
-// packages/shared/src/validators/chat.ts — additive only
-export const createMessageSchema = z.object({
-  role: z.enum(["user", "assistant", "system"]),
-  content: z.string().min(1).max(100_000),
-  agentId: z.string().uuid().optional(),
-  messageType: z.string().optional(),
-  voiceMode: z.boolean().optional(), // NEW in v1.6
-});
-```
+**What:** Theme generation is a pure computation: seed hex color in, palette object out. Preview injects CSS custom properties directly into the DOM — no server round-trip. Saving a theme stores the palette JSON via `StorageService`.
 
-### Pattern 4: Direct Service Calls in Telegram Bridge
+**When to use:** Theme generation and live preview.
 
-**What:** The Telegram bot does not call the Express HTTP API to get AI responses. It calls `chatService(db)` and `puterProxyService(db)` as regular TypeScript function calls within the same server process.
-
-**When to use:** Any time a server-side integration needs the same AI response capability as the web UI without an HTTP round-trip.
-
-**Trade-offs:** Telegram handler and web handler share the same in-process service instances. If chatService has connection pooling issues, both paths are affected. This is acceptable — single-user deployment, same DB connection pool.
-
-**Why not HTTP:** A `fetch("http://localhost:PORT/api/...")` call from within the same server requires auth token injection, port discovery, and creates circular request chains that are hard to test and fragile in development.
-
-### Pattern 5: grammY Long-Poll for Single-User Local Deployment
-
-**What:** Use grammY `bot.start()` (long polling) rather than webhooks. The bot polls Telegram for new messages continuously while the server is running.
-
-**When to use:** Local single-user deployments where a public HTTPS endpoint is not available. No reverse proxy needed, no SSL cert, no domain.
-
-**Trade-offs:** Long polling is slightly less efficient than webhooks (Telegram must respond to each poll request) but functionally equivalent for <5,000 messages/hour. Fine for personal use.
-
-**Lifecycle:**
-- Start: `nexusSettingsService().get()` finds `telegramToken` set → `telegramService(db).start(token)`
-- Stop: `server.close()` → `telegramService(db).stop()`
-- Runtime toggle: `POST /api/telegram/token` updates nexus-settings and calls start/stop
-
----
+**Trade-offs:** No server latency for preview feedback. The JSON is reusable for all export formats (CSS, Tailwind config, design tokens).
 
 ## Data Flow
 
-### Web Voice Input Flow
+### Async Content Job Request Flow
 
 ```
-User holds mic button
+Agent / UI
     |
     v
-VoiceMicButton: MediaRecorder + AnalyserNode
-    |
-    v (silence detected after 1.5s or stop pressed)
-POST /api/transcribe {audio: webm blob}
+POST /api/companies/:id/content-jobs
+  { type: "mermaid", params: { dsl: "graph TD..." } }
     |
     v
-voice.ts route -> voicePipelineService.transcribe(buffer, "webm")
+contentJobService.enqueue()
+  INSERT content_jobs (status = "queued")
+  publishLiveEvent("content.job.started")
+  (non-blocking) -> renderPipelineService.render()
     |
-    v (whisper-cpp or openai-whisper CLI via execFile)
-{ text: "transcribed text" }
+    v (async)
+renderPipelineService
+  selects MermaidRendererAdapter
     |
     v
-ChatInput fills textarea -> user sends (message tagged voiceMode: true)
+MermaidRendererAdapter.render()
+  Mermaid DSL -> SVG Buffer (via @mermaid-js/mermaid-isomorphic)
     |
     v
-POST /conversations/:id/stream -> chatService + puterProxyService
-    |
-    v (SSE tokens arrive)
-ChatMessage with voice_mode badge + dual output (voice text + full text collapsible)
+StorageService.putFile()
+  objectKey: "{companyId}/generated/diagrams/2026/04/04/{uuid}-diagram.svg"
     |
     v
-TtsButton auto-plays (browser-side piper-tts-web WASM — unchanged from v1.5)
+assetService.create()
+  INSERT assets (objectKey, contentType, byteSize, createdByAgentId)
+    |
+    v
+contentJobService callback
+  UPDATE content_jobs SET status = "done", asset_id = ...
+  publishLiveEvent("content.job.done", { jobId, assetId })
+    |
+    v
+UI (SSE subscriber)
+  ContentJobViewer receives event -> fetches asset URL -> renders preview inline
 ```
 
-### Server-Side TTS Flow (POST /synthesize)
+### Theme Generation Flow (Synchronous)
 
 ```
-POST /api/synthesize { text, voiceId? }
+User picks seed color
     |
     v
-voice.ts route -> voicePipelineService.synthesize(text)
+ThemePreview component
+  (no server round-trip — CSS custom properties injected directly into DOM)
     |
-    v (piper CLI via spawn: text -> stdin, WAV bytes <- stdout)
-Response: Content-Type audio/wav, Buffer body
+    v (on "Save Theme")
+POST /api/companies/:id/themes/generate
+  { seedColor: "#4a90d9" }
     |
     v
-Client: new Audio(URL.createObjectURL(blob)).play()
+themeEngineService.generate()
+  Compute palette (tints, shades, semantic tokens, WCAG AA checks)
+  Returns palette JSON
+    |
+    v
+StorageService.putFile()
+  objectKey: "{companyId}/themes/{uuid}-theme.json"
+  contentType: "application/json"
+    |
+    v
+assetService.create() -> 201 { assetId, downloadUrl }
 ```
 
-Note: Server-side `/synthesize` is new in v1.6. Its primary consumer is the Telegram bridge (which cannot use browser WASM). Web chat continues using browser-side `usePiperTts` WASM (v1.5 unchanged). The route is available for headless/server scenarios going forward.
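+The WCAG AA check inside `themeEngineService.generate()` could look roughly like the sketch below. It is illustrative only. The helper names (`hexToRgb`, `relativeLuminance`, `contrastRatio`, `meetsWcagAA`) are hypothetical, and the sketch assumes six-digit `#rrggbb` seed colors. It follows the WCAG 2.x relative-luminance and contrast-ratio definitions, with AA thresholds of 4.5:1 for normal text and 3:1 for large text.
+
+```typescript
+// Hypothetical themeEngineService helpers: WCAG 2.x contrast check.
+// Assumes six-digit "#rrggbb" colors; "#rgb" shorthand is rejected.
+export function hexToRgb(hex: string): [number, number, number] {
+  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
+  if (!m) throw new Error(`Unsupported color: ${hex}`);
+  const n = parseInt(m[1], 16);
+  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
+}
+
+// Relative luminance: linearize each sRGB channel, then apply the Rec. 709 weights.
+export function relativeLuminance(hex: string): number {
+  const [r, g, b] = hexToRgb(hex).map((v) => {
+    const c = v / 255;
+    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
+  });
+  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
+}
+
+// Contrast ratio runs from 1 (identical colors) to 21 (black on white).
+export function contrastRatio(a: string, b: string): number {
+  const [lighter, darker] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
+  return (lighter + 0.05) / (darker + 0.05);
+}
+
+// AA requires 4.5:1 for normal text and 3:1 for large text.
+export function meetsWcagAA(fg: string, bg: string, largeText = false): boolean {
+  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
+}
+```
+
+For the seed `#4a90d9` used in the flow above, `contrastRatio("#4a90d9", "#ffffff")` comes out at about 3.3. That color would therefore pass AA on white only for large text, and the palette would need a darker shade for body text.
+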
-
-### Telegram Text Message Flow
+### Mermaid Client-Side Fast Path
 
 ```
-Telegram user sends text
+Agent sends message with ```mermaid code block
     |
     v
-grammY bot.on("message:text") handler
+ChatMarkdownMessage (existing component, minor extension)
+  Detects ```mermaid fence
     |
     v
-telegramService: resolveOrCreateConversation(db)
-    |
-    v
-chatService(db).addMessage(conversationId, { role: "user", content: text })
-    |
-    v
-telegramService: collect full response via puterProxyService(db).chatStream()
-    |
-    v (if voiceMode !== "full_voice")
-ctx.reply("[AgentName]: full_response_text")
-
-    | (if voiceMode === "full_voice")
-    v
-voicePipelineService.formatForVoice(response) -> { voice, full }
-ctx.reply("[AgentName]: " + full)  -- text message with full details
-    |
-    v
-voicePipelineService.synthesize(voice) -> WAV Buffer
-ctx.replyWithAudio(InputFile(wavBuffer, "reply.ogg"))
+DiagramRenderer (new component, wraps existing mermaid dep)
+  Calls mermaid.render() client-side (mermaid ^11.12 already in ui/package.json)
+  Displays SVG inline + "Save as asset" button -> POST /api/companies/:id/content-jobs (server path)
 ```
 
-### Telegram Voice Message Flow
+### State Transitions: content_jobs
 
 ```
-Telegram user sends voice note (OGG Opus format)
-    |
-    v
-grammY bot.on("message:voice") -> ctx.getFile() -> download Buffer
-    |
-    v
-voicePipelineService.transcribe(buffer, "ogg") -> whisper CLI -> text
-    |
-    v
-(same path as Telegram text message above)
+queued  -> running  (renderPipelineService picks up job)
+running -> done     (renderer returns, asset created)
+running -> failed   (renderer throws, error recorded)
 ```
 
-### nexus-settings Schema Evolution
+## New vs Modified: Explicit Breakdown
 
-```
-v1.5: { mode, voiceEnabled }
-v1.6: { mode, voiceEnabled, voiceMode, telegramToken }
+### New (does not exist)
 
-  voiceMode: "text" | "voice_input" | "full_voice" (default: "text")
-  telegramToken: string | undefined (set by user via UI or POST /telegram/token)
+| Artifact | Type | Purpose |
+|----------|------|---------| +| `content_jobs` table | DB schema + migration | Track async render job lifecycle | +| `contentJobService` | Server service | Enqueue, status, list jobs | +| `renderPipelineService` | Server service | Route jobs to renderer adapters | +| `themeEngineService` | Server service | Palette generation + WCAG validation | +| `mermaidRendererAdapter` | Server renderer | Server-side Mermaid to SVG | +| `remotionRendererAdapter` | Server renderer | Remotion CLI to MP4/WebM | +| `svgGeneratorAdapter` | Server renderer | Template SVG generation | +| `pdfRendererAdapter` | Server renderer | HTML to PDF via Puppeteer | +| `content-jobs.ts` route | API route | Create and list content jobs | +| `themes.ts` route | API route | Synchronous theme generation | +| `packages/remotion-compositions/` | Workspace package | Remotion composition definitions | +| `packages/shared/src/types/content.ts` | Shared type | `ContentJob`, `ContentJobStatus` | +| `ContentJobViewer` | UI component | Job progress + result display | +| `DiagramRenderer` | UI component | Client-side Mermaid wrapper | +| `ThemePreview` | UI component | Live palette preview | +| `GeneratedAssetCard` | UI component | Asset thumbnail + actions | +| `ContentGallery` | UI page | Workspace content library | + +### Modified (exists, needs extension) + +| Artifact | Change | Risk | +|----------|--------|------| +| `packages/shared/src/constants.ts` | Add 3 new `LIVE_EVENT_TYPES`: `content.job.started`, `content.job.done`, `content.job.failed` | LOW — additive only | +| `server/src/app.ts` | Mount `contentJobRoutes` and `themeRoutes` | LOW — two lines | +| `ChatMarkdownMessage` | Detect triple-backtick mermaid fence; render via `DiagramRenderer` | MEDIUM — existing component, test carefully | +| `assetService` | Add `list(companyId, opts)` method for gallery pagination | LOW — new method, no schema change | +| `packages/db/src/schema/index.ts` | Export `contentJobs` table | LOW | +| 
`packages/db/src/index.ts` | Export `contentJobs` from db package | LOW | + +## Data Model + +### New Table: `content_jobs` + +No changes to existing tables. Standalone new table; upstream-safe because Paperclip has no content generation system. + +```sql +CREATE TABLE content_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + company_id UUID NOT NULL REFERENCES companies(id), + type TEXT NOT NULL, -- 'mermaid' | 'remotion' | 'pdf' | 'svg' | 'theme' | 'image' + status TEXT NOT NULL DEFAULT 'queued', -- 'queued' | 'running' | 'done' | 'failed' + params JSONB NOT NULL DEFAULT '{}', -- renderer-specific input params + asset_id UUID REFERENCES assets(id), -- set on success + error_message TEXT, -- set on failure + created_by_agent_id UUID REFERENCES agents(id), + created_by_user_id TEXT, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +CREATE INDEX content_jobs_company_status_idx ON content_jobs(company_id, status); +CREATE INDEX content_jobs_company_created_idx ON content_jobs(company_id, created_at DESC); ``` -`voiceMode` is a workspace-level setting (not per-agent). 
The three states map to: -- `"text"`: mic button transcribes to text input, TTS manual-only, Telegram text-only -- `"voice_input"`: mic transcribes and auto-sends, TTS manual-only, Telegram voice-in + text-out -- `"full_voice"`: mic auto-sends, TTS auto-plays on every response, Telegram voice-in + voice-out +### Storage Namespaces (extends existing StorageService path conventions) ---- +``` +{companyId}/generated/diagrams/YYYY/MM/DD/{uuid}-{name}.svg +{companyId}/generated/videos/YYYY/MM/DD/{uuid}-{name}.mp4 +{companyId}/generated/pdfs/YYYY/MM/DD/{uuid}-{name}.pdf +{companyId}/generated/images/YYYY/MM/DD/{uuid}-{name}.png +{companyId}/generated/icons/YYYY/MM/DD/{uuid}-{name}.svg +{companyId}/themes/{uuid}-theme.json +{companyId}/placeholders/{uuid}-placeholder.svg +``` -## Scaling Considerations - -This system targets a single user on Mac Mini M4 throughout its lifetime. Scaling is not a concern. The architecture is optimized for simplicity and upstream merge compatibility. - -| Concern | At 1 user (target) | Notes | -|---------|-------------------|-------| -| STT latency | whisper-cpp base.en on M4: ~1-3s | Acceptable; shows transcribing spinner | -| TTS latency | piper CLI on M4: ~0.3-1s for short text | <3s target met | -| Telegram poll | grammY `bot.start()`, 1 process | Adequate for <5,000 msgs/hour | -| Memory overhead | ~10-20MB for polling loop | Acceptable on 16GB+ M4 | -| Piper model | First server-side synthesize: cold start | Piper loads model into memory; subsequent calls fast | - ---- - -## Anti-Patterns - -### Anti-Pattern 1: Telegram-Specific Voice Logic - -**What people do:** Implement OGG-to-text and text-to-OGG directly inside the Telegram bot handler. - -**Why it's wrong:** Creates two separate STT/TTS code paths that diverge over time. Voice bugs must be fixed in two places. Untestable in isolation. - -**Do this instead:** All voice processing goes through `voicePipelineService`. 
The Telegram handler calls `transcribe(buf, "ogg")` — the service handles format differences. The web route calls `transcribe(buf, "webm")` — same service, different format argument. - -### Anti-Pattern 2: Circular HTTP Call for Telegram AI Response - -**What people do:** Telegram bot handler calls `fetch("http://localhost:PORT/api/conversations/:id/stream")` to get AI responses from within the same server process. - -**Why it's wrong:** Requires auth token injection. Fragile (port discovery). Extra TCP round-trip. Fails in test environments where the HTTP server may not be running. - -**Do this instead:** `telegramService` imports `chatService(db)` and `puterProxyService(db)` directly. Collect tokens from the async generator into a string, then send to Telegram as a single message. - -### Anti-Pattern 3: Blocking grammY on Slow CLI Processes - -**What people do:** `await synthesize()` inside a bot handler with no timeout, assuming piper is always available and fast. - -**Why it's wrong:** If the `piper` binary is not installed or hangs, the grammY update queue stalls. The same update gets retried indefinitely. - -**Do this instead:** Wrap CLI calls in a `Promise.race([piperCall, timeout(8_000)])`. If piper times out or is not installed, fall back to text-only reply and log the failure. Bot degrades gracefully to text mode. - -### Anti-Pattern 4: Keeping /transcribe Inside chat-files.ts - -**What people do:** Leave the STT handler in `chat-files.ts` and call `voicePipelineService` from there, adding Nexus-specific logic to an upstream-sourced file. - -**Why it's wrong:** `chat-files.ts` is a mostly-upstream Paperclip file. Each rebase introduces merge conflicts. More Nexus-specific code in the file = more conflict surface. - -**Do this instead:** Move `/transcribe` and `/synthesize` to a new `voice.ts` route file (Nexus-only, never in upstream). Keep `chat-files.ts` as close to upstream as possible. 
- -### Anti-Pattern 5: Storing Telegram Token in Database - -**What people do:** Create a new DB table or add a column to `instance_settings` to store the Telegram bot token. - -**Why it's wrong:** Any DB schema change blocks upstream rebase (migration files conflict). The `nexus-settings.json` file-backed service is the established Nexus pattern for project-specific config that has no upstream equivalent. - -**Do this instead:** Store `telegramToken` in `nexus-settings.json` via the existing `nexusSettingsService`. Same pattern as `voiceEnabled`, `mode`. - ---- +The `StorageService.putFile()` method already handles path construction from `namespace` + `originalFilename` + timestamp. Pass `namespace: "generated/diagrams"` etc. ## Integration Points -### External Services +### Existing System Touch Points -| Service | Integration Pattern | Notes | -|---------|---------------------|-------| -| Telegram Bot API | grammY `bot.start()` long-polling (Node.js) | No public URL required; polling starts on server boot if token present in nexus-settings | -| whisper-cpp / openai-whisper | `execFile` cascade (same as existing `/transcribe`) | Format argument added: writes `.webm` or `.ogg` temp file based on input | -| piper TTS binary | `child_process.spawn` stdin -> stdout | Text piped to stdin; WAV or raw PCM bytes collected from stdout | +| Integration Point | How Content Gen Connects | Notes | +|-------------------|--------------------------|-------| +| `assetService` + `assets` table | Every rendered output creates an asset row | `createdByAgentId` already supported; agents get credit | +| `StorageService` | All rendered blobs stored via existing `putFile()` | Use `generated/` namespace prefix; no service changes | +| `publishLiveEvent` | Job lifecycle events push to SSE stream | Extend `LIVE_EVENT_TYPES` in `packages/shared/src/constants.ts` | +| `ChatMarkdownMessage` | Inline diagram rendering; "save as asset" button | Mermaid already a UI dep; add `DiagramRenderer` 
wrapper | +| `companySkills` + `skill-registry` | Content types as installable skill markdown files | No schema change; skills are text files agents read as context | +| `placeholderService` | Placeholder assets tracked in PLACEHOLDERS.md manifest | Optionally extend `PlaceholderEntry` with `contentJobId` | +| `hardwareService` | Detect if Remotion/Puppeteer can run | M4 Mac Mini: arm64 Chromium available, 24GB unified memory sufficient | +| `companyId` scoping | All content jobs scoped to `companyId` | Consistent with every other resource in the system | +| Agent task sessions | Agents invoke content APIs during task execution | Use `createdByAgentId`; same pattern as `documents`, `work-products` | -### Internal Boundaries +### External Dependencies (new server deps) -| Boundary | Communication | Notes | -|----------|---------------|-------| -| voice route <-> voicePipelineService | Direct function call | Route is thin HTTP wrapper; all logic in service | -| telegram service <-> voicePipelineService | Direct function call | Same service used by both transports | -| telegram service <-> chatService | Direct function call | Bot calls `chatService(db)` directly — no HTTP round-trip | -| telegram service <-> nexusSettingsService | Direct function call | Reads `voiceMode` and `telegramToken` at start and on each message | -| web UI <-> voice route | REST: `POST /api/transcribe`, `POST /api/synthesize` | Web client uses browser-side piper WASM for TTS; `/synthesize` primarily for Telegram | -| UI VoiceModeToggle <-> nexus-settings | REST: `PATCH /api/nexus-settings` | Reads/writes `voiceMode` setting | +| Dependency | Purpose | Platform Notes | +|------------|---------|---------------| +| `@mermaid-js/mermaid-isomorphic` | Server-side Mermaid to SVG | In Node it renders through Playwright, so a Playwright Chromium install is required; still simpler than hand-wiring Mermaid into Puppeteer | +| `puppeteer` | HTML to PDF | ~300MB install; bundled arm64 Chromium works on M4; only add if PDF is a phase priority | +| `remotion` (CLI) | 
Video/presentation render | Add as devDep in `remotion-compositions` package; CLI called via subprocess | ---- +Mermaid client-side and Sharp are already present. No changes needed for those paths. -## Build Order +## Scaling Considerations -Based on component dependencies, the recommended build order within this milestone: +This is a Mac Mini M4 single-user deployment. Analysis focuses on resource contention, not user count. -| Step | Component(s) | Reason | -|------|-------------|--------| -| 1 | `nexus-settings` schema extensions (`voiceMode`, `telegramToken`) | Everything downstream reads settings | -| 2 | `voicePipelineService` | Backs all voice. No new deps. Independently testable. | -| 3 | `voice.ts` route (`POST /transcribe`, `POST /synthesize`) | Thin wrapper. Register in `app.ts`. Move handler from chat-files. | -| 4 | `VoiceMicButton` + `WaveformDisplay` + `useSilenceDetection` | Pure UI. Depends only on `/transcribe`. | -| 5 | `VoiceModeToggle` + `useVoiceMode` | Depends on `voiceMode` in nexus-settings schema (Step 1). | -| 6 | `ChatMessage` dual output | Depends on `voiceMode` in shared `ChatMessage` type. | -| 7 | `createMessageSchema` + `ChatMessage` type (`voiceMode` flag) | Shared package change. Required by Steps 5-6. Could move earlier. | -| 8 | `telegramService` | Depends on voicePipelineService (2), chatService (existing), nexusSettings (1). | -| 9 | `telegram.ts` route + app.ts registration | Management endpoints. Needs telegramService. | -| 10 | Onboarding STT/TTS hardware detection step | Final: wires all voice detection into onboarding flow. | +| Concern | Approach | +|---------|----------| +| Concurrent render jobs | Node.js event loop is safe for I/O. CPU-bound renders (Remotion, Puppeteer) spawn subprocesses, keeping the event loop responsive. | +| Remotion render duration | Renders can take minutes. Never synchronous HTTP. Async job pattern + SSE progress is mandatory. 
| +| Chromium memory (PDF/Puppeteer) | Puppeteer can use 500MB+ per render. Serialize PDF renders via an in-memory queue (one at a time). | +| Storage growth | Generated content accumulates. Add `retention_days` field to `content_jobs`; implement a cleanup cron using the existing `cron.ts` service. | +| Remotion bundle step | Bundle compositions once at server startup (or on demand). Never bundle on each render request — it takes 30-60s. | -Steps 4-6 can run in parallel with Steps 7-9 if split across phases. +## Build Order (Phase Dependencies) ---- +Dependencies flow from infrastructure upward to content types upward to UI. + +``` +Phase A: Core Infrastructure (unblocks everything) + - Add content_jobs schema + migration (db package) + - Extend LIVE_EVENT_TYPES with content.job.* (shared package) + - Implement contentJobService (server) + - Implement renderPipelineService stub (server) + - Add API routes + app.ts mounts (server) + +Phase B: Fast Content Types (no Remotion or Puppeteer deps; validates pipeline end-to-end) + - svgGeneratorAdapter (pure TypeScript; icons, placeholders) + - mermaidRendererAdapter (@mermaid-js/mermaid-isomorphic; needs a Playwright Chromium install in Node) + - themeEngineService (pure computation) + - UI: DiagramRenderer, ThemePreview, ContentJobViewer + +Phase C: Client-Side Mermaid + Content Gallery + - ChatMarkdownMessage extension (detect mermaid fence) + - DiagramRenderer client-side component + - ContentGallery page + assetService.list() + - GeneratedAssetCard + +Phase D: Document Generation (introduces Puppeteer) + - Add puppeteer to server deps + - pdfRendererAdapter + - PDF download flow in UI + +Phase E: Video / Presentations (introduces Remotion) + - packages/remotion-compositions/ workspace package + - remotionRendererAdapter (CLI subprocess) + - Video playback in UI + +Phase F: Image Generation + - imageProcessorAdapter using Sharp (banners, OG images, social cards) + - imageGenerationAdapter interface (Stable Diffusion / cloud APIs — future) + - Social media
content generation + +Phase G: Content as Skills (no code, pure skill markdown) + - Skill markdown files for each content type in company_skills + - Agent-callable via existing skill system +``` + +## Anti-Patterns + +### Anti-Pattern 1: Rendering inside chatService + +**What people do:** Add Mermaid rendering to `chatService` or `documentService` because content requests arrive from chat. + +**Why it's wrong:** Couples unrelated concerns. Future content types (video, PDF) would bloat chatService and block upstream rebases. + +**Do this instead:** `chatService` calls `contentJobService.enqueue()`. Rendering is entirely separate. Chat is a trigger, not an owner. + +### Anti-Pattern 2: Synchronous HTTP response for long renders + +**What people do:** `POST /render/remotion` holds the connection open for 2+ minutes while rendering. + +**Why it's wrong:** Long-held requests risk hitting client or reverse-proxy timeouts, give the user no progress feedback, and invite client retries that can enqueue duplicate renders. + +**Do this instead:** Return a `contentJobId` immediately with `202 Accepted`. The client subscribes to the SSE `content.job.done` and `content.job.failed` events. + +### Anti-Pattern 3: One DB table per content type + +**What people do:** Add separate `diagrams`, `presentations`, `themes` tables. + +**Why it's wrong:** The existing `assets` table already handles typed binary blobs. The `content_jobs` table handles any render job regardless of output type. Fragmented schema multiplies migration surface. + +**Do this instead:** Use `content_jobs.type` to discriminate job types. Use `assets.content_type` to discriminate output format. One jobs table, one assets table. + +### Anti-Pattern 4: Bypassing StorageService for renderer output + +**What people do:** Remotion adapter writes to `/tmp` and returns a filesystem path. + +**Why it's wrong:** Bypasses the provider abstraction (local disk vs S3), deduplication (sha256), the `assets` table, and download URL generation.
+ +**Do this instead:** Renderer writes output to a `Buffer`, passes to `StorageService.putFile()`, returns `objectKey`. Asset serving goes through the existing `/api` asset download route. + +### Anti-Pattern 5: Modifying upstream DB tables + +**What people do:** Add a `generated_content_type` column to the existing `assets` table. + +**Why it's wrong:** Modifies upstream schema — migration conflict on next `git rebase upstream/master`. Violates the display-only fork constraint. + +**Do this instead:** Use the `content_jobs.asset_id` FK as the signal that an asset is generated. Query `content_jobs JOIN assets` to distinguish generated from uploaded. Keep `assets` table untouched. + +### Anti-Pattern 6: Remotion bundle on every render request + +**What people do:** Call `bundle()` inside the render adapter on each job. + +**Why it's wrong:** Bundling takes 30-60s. Renders that should take 5s take 90s. + +**Do this instead:** Bundle once at server startup (or lazily on first render, cached). The `remotionRendererAdapter` calls `renderMedia()` against the pre-built bundle path. 
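A minimal sketch of the "bundle once, cache the result" rule from Anti-Pattern 6, assuming the real bundling step (for example a wrapper around Remotion's bundler) is injected as a function that resolves to the bundle location. `createBundleCache` and `BundleFn` are hypothetical names, not existing Nexus or Remotion APIs.

```typescript
// Hypothetical: whatever builds the Remotion bundle resolves to its location on disk.
type BundleFn = () => Promise<string>;

// Returns a getter that runs `build` at most once while it succeeds.
// Concurrent callers share the same in-flight promise; a rejected build is
// evicted so the next caller retries instead of reusing the cached failure.
function createBundleCache(build: BundleFn): () => Promise<string> {
  let pending: Promise<string> | null = null;
  return () => {
    if (pending === null) {
      pending = build().catch((err: unknown) => {
        pending = null; // allow a later retry
        throw err;
      });
    }
    return pending;
  };
}
```

In this sketch the server would create one getter at startup, and `remotionRendererAdapter` would await it before each `renderMedia()` call. Evicting a rejected promise keeps a single failed bundle from breaking every later render job.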
## Sources -- Direct codebase inspection: `server/src/routes/chat-files.ts` (lines 297-386), `server/src/routes/chat.ts`, `server/src/services/nexus-settings.ts`, `server/src/app.ts`, `ui/src/components/VoiceRecordButton.tsx`, `ui/src/components/TtsButton.tsx`, `ui/src/hooks/usePiperTts.ts`, `packages/shared/src/validators/chat.ts`, `packages/shared/src/types/chat.ts` -- `.planning/STATE.md` — v1.6 architectural decisions (transport-agnostic, disposable bridge, dual output, per-message flag) -- `.planning/milestones/v1.5-phases/34-voice/34-RESEARCH.md` — existing voice implementation details, WASM TTS pattern -- [grammY documentation](https://grammy.dev/) — TypeScript-native, Bot API 9.6 (April 2026), long-polling vs webhooks -- [grammY deployment types guide](https://grammy.dev/guide/deployment-types) — long polling recommended for single-user local; Express integration pattern -- [rhasspy/piper (archived)](https://github.com/rhasspy/piper) — CLI: `echo "text" | piper --model voice.onnx -f -`; development moved to OHF-Voice/piper1-gpl Oct 2025 -- grammY supports Telegram Bot API 9.6 (released April 3, 2026) — latest version confirmed +- Direct inspection of `/opt/nexus` codebase (2026-04-04): + - `server/src/services/` — factory function service patterns + - `server/src/storage/` — `StorageService` / `StorageProvider` interfaces + - `server/src/storage/service.ts` — `buildObjectKey()` namespace + path conventions + - `server/src/services/live-events.ts` — SSE event bus (`publishLiveEvent`, `subscribeCompanyLiveEvents`) + - `server/src/services/voice-pipeline.ts` — async subprocess service pattern + - `server/src/services/placeholder-service.ts` — existing `PlaceholderEntry` manifest service + - `server/src/services/assets.ts` — `assetService` factory (minimal; extend for listing) + - `server/src/services/work-products.ts` — job/output separation pattern + - `packages/db/src/schema/assets.ts` — existing `assets` table + - `packages/db/src/schema/documents.ts` — 
document + revision pattern + - `packages/shared/src/constants.ts` — `LIVE_EVENT_TYPES` (currently 9 types) + - `server/src/app.ts` — route mounting conventions + - `server/src/routes/voice.ts` — SSE streaming response pattern + - `ui/package.json` — confirms `mermaid ^11.12.0` already installed + - `server/package.json` — confirms `sharp`, `ffmpeg-static` already installed +- Mermaid v11 usage docs: https://mermaid.js.org/config/usage.html +- Remotion CLI rendering: https://www.remotion.dev/docs/cli/render +- `@mermaid-js/mermaid-isomorphic` for server-side rendering (uses Playwright-driven headless Chromium when run in Node) --- -*Architecture research for: Voice Pipeline + Minimal Telegram Bridge (v1.6)* -*Researched: 2026-04-03* +*Architecture research for: Nexus v1.7 Content Generation* +*Researched: 2026-04-04* diff --git a/.planning/research/FEATURES.md b/.planning/research/FEATURES.md index beb4599b..50e3bcfb 100644 --- a/.planning/research/FEATURES.md +++ b/.planning/research/FEATURES.md @@ -1,30 +1,34 @@ # Feature Research -**Domain:** Voice Pipeline (Whisper STT + Piper TTS) + Telegram Bridge (Nexus v1.6) -**Researched:** 2026-04-03 -**Confidence:** MEDIUM-HIGH — STT/TTS pipeline patterns are well-documented; Telegram bot API is stable; dual-output formatting and voice mode UX patterns inferred from ChatGPT/Meta AI voice implementations and community patterns +**Domain:** Content Generation Layer (Nexus v1.7) — agents produce visual, document, and media deliverables +**Researched:** 2026-04-04 +**Confidence:** MEDIUM-HIGH — technology capabilities verified via docs and ecosystem research; UX expectations inferred from comparable tools (Canva, Pitch, Mermaid Live, Figma tokens); skill system patterns based on existing Nexus skill architecture --- ## Milestone Scope -This document covers only the NEW features in v1.6. +This document covers only NEW features in v1.7.
The following are already built and are dependencies, not deliverables: -- VoiceRecordButton with MediaRecorder API in ChatInput (v1.3) -- TtsButton with @mintplex-labs/piper-tts-web WASM synthesis (v1.3/v1.5) -- POST /transcribe endpoint with whisper-cpp/openai-whisper cascade (v1.3) -- VoiceStep in onboarding wizard (v1.5) -- voiceEnabled in nexus-settings (v1.5) -- Full chat system with streaming SSE (v1.3) +- File system with upload, git versioning, PLACEHOLDERS.md manifest (v1.3) +- Skill system with Skill Aggregator and company skills API (Paperclip upstream) +- Chat interface with streaming SSE (v1.3) +- Agent orchestration, heartbeat lifecycle (Paperclip upstream) +- Voice I/O with Whisper STT + Piper TTS (v1.6) +- Hermes adapter with native skills, Ollama integration (v1.4) **New features being researched:** -- Transport-agnostic voice pipeline (server-side, not just browser WASM) -- Voice mode flag on messages (affects response formatting) -- Dual output pattern: voice-optimized prose + full markdown text -- Web chat voice UI improvements: silence detection, waveform, auto-submit -- Web chat audio playback: inline player, auto-play toggle -- Voice mode toggle setting (text only / voice input / full voice) -- Minimal Telegram bridge: single bot, text + voice relay, agent prefixing + +- Presentations and video generation via Remotion +- Placeholder assets with DRAFT styling and manifest tracking +- Theme and palette generator (seed color → full theme, WCAG AA, exports) +- Wallpapers and visual assets (desktop/mobile, banners, OG images) +- Diagram generation (natural language → Mermaid → SVG/PNG) +- Document generation (PDF reports, invoices, one-pagers) +- Icon generation (SVG from description, consistent sets) +- Social media content (platform-formatted posts, carousels, hashtags) +- Branding media kit (full brand identity from conversation) +- Content types as installable Nexus skills --- @@ -32,131 +36,148 @@ This document covers only the NEW features 
in v1.6. The following are already bu ### Table Stakes (Users Expect These) -Features users assume exist when voice or Telegram is mentioned. Missing these makes the feature feel broken or incomplete. +Features that must exist for content generation to feel complete. Missing any of these and the deliverable is not production-ready. | Feature | Why Expected | Complexity | Notes | |---------|--------------|------------|-------| -| Silence-based auto-submit | Every voice input UI (Siri, Google, Whisper demos) stops recording on silence; holding a button feels archaic | MEDIUM | WebRTC VAD or AudioWorklet amplitude monitoring; 1.5s silence threshold typical; must show countdown so user knows what's happening | -| Waveform/amplitude visualization while recording | Users expect visual feedback that the mic is active; a static "recording..." text feels broken | LOW | Canvas or SVG with 30-50 data points; AnalyserNode from Web Audio API; real-time amplitude bars, not pre-rendered waveform | -| Voice response auto-play toggle | If the AI responded with audio, playing it automatically is expected unless the user disabled it; manual play-only feels incomplete | LOW | Boolean setting in nexus-settings (voiceAutoPlay); inline HTML5 `