nexus/.planning/research/ARCHITECTURE.md

# Architecture Research

**Domain:** Voice Pipeline + Minimal Telegram Bridge (v1.6) — integration with existing Nexus/Paperclip monorepo
**Researched:** 2026-04-03
**Confidence:** HIGH — based on direct codebase inspection + verified current documentation

---

## System Overview

v1.6 adds two parallel capability tracks onto the existing monorepo: a transport-agnostic voice pipeline (Whisper STT + Piper TTS) and a disposable Telegram bridge that reuses those pipeline primitives for phone access. The architecture constraint is that no voice or chat logic is Telegram-specific — Telegram is an interchangeable transport layer that calls the same server services as the web UI.

```
+-----------------------------------------------------------------------------------+
|                              UI Layer (React/Vite)                                |
|                                                                                   |
|  +-------------------------------------------------------------------------+     |
|  |  ChatPanel / PersonalAssistant (MODIFIED)                               |     |
|  |  +---------------------+  +--------------------+  +------------------+ |     |
|  |  | VoiceMicButton (NEW)|  | WaveformDisplay    |  | TtsButton (v1.5) | |     |
|  |  | silence detection   |  | (NEW) animated bars|  | + auto-play prop | |     |
|  |  | auto-send on silence|  +--------------------+  +------------------+ |     |
|  |  +---------------------+                                               |     |
|  |  +-------------------------------------------------------------------+ |     |
|  |  | ChatMessage (MODIFIED) — voice_mode badge, dual output toggle     | |     |
|  |  +-------------------------------------------------------------------+ |     |
|  |  +-------------------------------------------------------------------+ |     |
|  |  | VoiceModeToggle (NEW) — text only / voice input / full voice      | |     |
|  |  +-------------------------------------------------------------------+ |     |
|  +-------------------------------------------------------------------------+     |
+-----------------------------------------------------------------------------------+
                                        | HTTP + SSE
+-----------------------------------------------------------------------------------+
|                              Server Layer (Express)                               |
|                                                                                   |
|  +------------------------------------+  +------------------------------------+   |
|  |  voice.ts (NEW route)              |  |  telegram.ts (NEW route/service)   |   |
|  |  POST /transcribe  (MOVED)         |  |  grammY long-poll process          |   |
|  |  POST /synthesize  (NEW)           |  |  text + voice relay                |   |
|  +------------------------------------+  +------------------------------------+   |
|                    |                                    |                         |
|  +-----------------v--------------------------------------------v--------------+ |
|  |                    voicePipelineService (NEW — core)                          | |
|  |  transcribe(audioBuffer, format) -> string                                   | |
|  |  synthesize(text, voiceId?) -> Buffer (WAV)                                  | |
|  |  formatForVoice(text) -> { voice: string, full: string }                     | |
|  +------------------------------------------------------------------------------+ |
|                    |                                                               |
|  +-----------------v--------------------------------------------------------------+|
|  |               chatService / nexusSettingsService (EXISTING)                   ||
|  |   conversations . messages . stream SSE . memory . voiceEnabled               ||
|  +--------------------------------------------------------------------------------+|
|                    |                                                               |
|  +-----------------v--------------------------------------------------------------+|
|  |         External Processes (spawned via child_process.spawn / execFile)       ||
|  |   whisper-cpp / whisper (STT)          piper (TTS)                            ||
|  +--------------------------------------------------------------------------------+|
+-----------------------------------------------------------------------------------+
         ^
         | Telegram Bot API (HTTPS long-poll)
+--------+------------------------------------------------------------------------+
|                        Telegram (external service)                               |
|  User sends text -> bot relays to chatService -> SSE reply -> bot sends back     |
|  User sends voice -> bot downloads OGG -> voicePipelineService.transcribe()      |
|                    -> chatService -> reply -> voicePipelineService.synthesize()  |
|                    -> bot sends OGG audio reply                                  |
+----------------------------------------------------------------------------------+
```

---

## Integration Points: New vs. Existing

### What Stays Unchanged

| Component | Location | Status |
|-----------|----------|--------|
| `chatService` | `server/src/services/chat.ts` | No changes — voice pipeline uses it as-is |
| `nexusSettingsService` | `server/src/services/nexus-settings.ts` | Extend schema only (add `voiceMode`, `telegramToken`) |
| `chatFileRoutes` | `server/src/routes/chat-files.ts` | `/transcribe` moves out; file upload stays |
| `usePiperTts` | `ui/src/hooks/usePiperTts.ts` | No changes — TtsButton continues using browser WASM |
| `TtsButton` | `ui/src/components/TtsButton.tsx` | Add auto-play prop only |
| SSE stream endpoint | `server/src/routes/chat.ts` | No changes — Telegram bridge calls services directly |
| DB schema | `packages/db` | No changes — voice is file/process, not a DB column |

### What Changes (MODIFIED)

| Component | Location | Change |
|-----------|----------|--------|
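| `VoiceRecordButton` | `ui/src/components/VoiceRecordButton.tsx` | Superseded in `ChatInput` by the new `VoiceMicButton`, which adds silence detection, waveform data emission, and auto-send on silence; the component file itself is kept as-is for file upload contexts |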
| `ChatInput` | `ui/src/components/ChatInput.tsx` | Wire new VoiceMicButton, add voice mode prop |
| `ChatMessage` | `ui/src/components/ChatMessage.tsx` | Show voice_mode badge, show dual output collapse/expand |
| `nexusSettingsSchema` | `server/src/services/nexus-settings.ts` | Add `voiceMode` enum and `telegramToken` optional string |
| `app.ts` | `server/src/app.ts` | Register `voiceRoutes`, `telegramRoutes` |
| `createMessageSchema` | `packages/shared/src/validators/chat.ts` | Add `voiceMode: z.boolean().optional()` flag on messages |
| `ChatMessage` type | `packages/shared/src/types/chat.ts` | Add `voiceMode: boolean \| null` field |
| `chat-files.ts` | `server/src/routes/chat-files.ts` | Remove `/transcribe` handler (moved to voice.ts) |

### What Is New (NEW)

| Component | Location | Purpose |
|-----------|----------|---------|
| `voicePipelineService` | `server/src/services/voice-pipeline.ts` | Transport-agnostic STT/TTS core — used by web routes AND Telegram bridge |
| `voice.ts` (route) | `server/src/routes/voice.ts` | `POST /api/transcribe`, `POST /api/synthesize` — thin HTTP wrappers |
| `telegram.ts` (service) | `server/src/services/telegram.ts` | grammY bot init, long-poll loop, message relay, voice relay |
| `telegram.ts` (route) | `server/src/routes/telegram.ts` | `GET /api/telegram/status`, `POST /api/telegram/token` management endpoints |
| `VoiceMicButton` | `ui/src/components/VoiceMicButton.tsx` | Enhanced mic button with silence detection and waveform display |
| `WaveformDisplay` | `ui/src/components/WaveformDisplay.tsx` | Animated audio waveform bars using AnalyserNode |
| `VoiceModeToggle` | `ui/src/components/VoiceModeToggle.tsx` | Three-state toggle: text only / voice input / full voice |
| `useVoiceMode` | `ui/src/hooks/useVoiceMode.ts` | Reads/writes voice mode setting via `/api/nexus-settings` |
| `useSilenceDetection` | `ui/src/hooks/useSilenceDetection.ts` | Web Audio API AnalyserNode watching for 1.5s silence threshold |

---

## Component Boundaries

### voicePipelineService (Core)

This is the key abstraction for v1.6. Both the web HTTP route and the Telegram bridge call this service — neither knows about the other.

| Method | Input | Output | Implementation |
|--------|-------|--------|----------------|
| `transcribe(buffer, format)` | `Buffer`, `"webm" or "ogg"` | `Promise<string>` | Writes temp file, uses `execFile` (not `exec`) to spawn `whisper-cpp` or `whisper` CLI, reads stdout, cleans up |
| `synthesize(text, voiceId?)` | `string`, optional voiceId | `Promise<Buffer>` | Spawns `piper` CLI via `spawn`, pipes text to stdin, collects WAV stdout |
| `formatForVoice(text)` | `string` | `{ voice: string; full: string }` | Strips code blocks and markdown for voice; returns both variants |

The `transcribe` method extends the existing `/transcribe` implementation from `chat-files.ts` by adding an `ogg` format path alongside the existing `webm` path. The same cascade (whisper-cpp first, openai-whisper fallback) is preserved.
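
The cascade described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in `chat-files.ts`: the `SttCommand` list, the `Runner` type, and the injectable `run` parameter are hypothetical names introduced here so the fallback order can be exercised without the real binaries. The CLI flags for each whisper backend are deliberately left to the caller.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

type AudioFormat = "webm" | "ogg";
// Hypothetical: one STT backend, described as a binary plus an argument builder.
type SttCommand = { bin: string; args: (inputPath: string) => string[] };
// Hypothetical: runs a CLI and resolves with its stdout.
type Runner = (bin: string, args: string[]) => Promise<string>;

// Default runner: execFile takes an argument array, so no shell is involved.
const runCli: Runner = async (bin, args) => {
  const { stdout } = await execFileAsync(bin, args, { timeout: 60_000 });
  return stdout;
};

export async function transcribe(
  buffer: Buffer,
  format: AudioFormat,
  commands: SttCommand[],
  run: Runner = runCli,
): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), "stt-"));
  // The temp file extension follows the format argument (.webm or .ogg).
  const inputPath = join(dir, `input.${format}`);
  try {
    await writeFile(inputPath, buffer);
    let lastError: unknown;
    // Try each backend in order; the first one that succeeds wins.
    for (const cmd of commands) {
      try {
        return (await run(cmd.bin, cmd.args(inputPath))).trim();
      } catch (err) {
        lastError = err;
      }
    }
    throw new Error(`No STT backend succeeded: ${String(lastError)}`);
  } finally {
    // Clean up the temp directory whether or not transcription worked.
    await rm(dir, { recursive: true, force: true });
  }
}
```

A caller would pass the whisper-cpp command first and the openai-whisper command second, which preserves the existing fallback order while keeping the format handling in one place.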

**Why a dedicated service vs. inline in routes:**
The Telegram bridge should not call the web route: that would be a circular HTTP call within the same process. Both transports need the same logic, so extracting it into a service eliminates duplication and makes both implementations testable in isolation.

### telegram service

A thin relay, not a feature-rich bot. It:
1. Holds a single grammY `Bot` instance, initialized when `telegramToken` is set in nexus-settings
2. Routes text messages to `chatService.addMessage()` then collects AI response via `puterProxyService.chatStream()`
3. Routes voice messages — downloads OGG file, calls `voicePipelineService.transcribe()`, then same text path
4. If `voiceMode === "full_voice"`: calls `voicePipelineService.synthesize()`, sends audio back via `ctx.replyWithAudio()`
5. Prefixes agent name on replies: `[Agent Name]: message text`

**No per-user conversation tracking.** All Telegram messages go to a single conversation (or create one on first use) associated with the workspace. This is the intentional "thin bridge" design — full sync is out of scope per PROJECT.md.

### Voice Route vs. Chat Files Route

The existing `/transcribe` endpoint lives inside `chatFileRoutes` in `chat-files.ts`. For v1.6, the endpoint moves to a dedicated `voice.ts` route. This is a path-preserving refactor: the endpoint behavior is unchanged, but the code now lives in a Nexus-specific file rather than inside a mostly-upstream file.

Moving the handler reduces merge conflict surface on future upstream rebases of `chat-files.ts`.

---

## Recommended Project Structure

```
server/src/
  app.ts                         # MODIFY: register voiceRoutes, telegramRoutes
  routes/
    chat-files.ts                # MODIFY: remove /transcribe handler (moved to voice.ts)
    voice.ts                     # NEW: POST /transcribe, POST /synthesize
    nexus-settings.ts            # MODIFY: expose voiceMode + telegramToken fields
    telegram.ts                  # NEW: GET /telegram/status, POST /telegram/token
  services/
    voice-pipeline.ts            # NEW: transcribe(), synthesize(), formatForVoice()
    telegram.ts                  # NEW: grammY bot lifecycle + relay logic
    nexus-settings.ts            # MODIFY: add voiceMode + telegramToken to schema

ui/src/
  components/
    VoiceMicButton.tsx           # NEW: replaces VoiceRecordButton in ChatInput
    WaveformDisplay.tsx          # NEW: animated bars from AnalyserNode data
    VoiceModeToggle.tsx          # NEW: 3-state toggle (text / voice-in / full-voice)
    VoiceRecordButton.tsx        # KEEP as-is (still used in file upload contexts)
    TtsButton.tsx                # MODIFY: add autoPlay prop
    ChatInput.tsx                # MODIFY: add VoiceModeToggle, swap in VoiceMicButton
    ChatMessage.tsx              # MODIFY: voice_mode badge + dual output expand
  hooks/
    useVoiceMode.ts              # NEW: reads/writes voiceMode setting
    useSilenceDetection.ts       # NEW: AnalyserNode silence threshold
    usePiperTts.ts               # KEEP as-is (browser-side TTS unchanged)

packages/shared/src/
  validators/chat.ts             # MODIFY: add voiceMode flag to createMessageSchema
  types/chat.ts                  # MODIFY: add voiceMode field to ChatMessage
```

---

## Architectural Patterns

### Pattern 1: Transport-Agnostic Voice Service

**What:** A server service (`voicePipelineService`) owns STT and TTS logic. HTTP routes and Telegram relay both call the service — neither implements STT/TTS directly.

**When to use:** Any time two transports (web + bot) need the same capability.

**Trade-offs:** Adds one indirection layer. Worth it: eliminates duplication, makes each transport testable independently.

**Shape:**
```typescript
// server/src/services/voice-pipeline.ts
export interface VoicePipelineService {
  transcribe(buffer: Buffer, format: "webm" | "ogg"): Promise<string>;
  synthesize(text: string, voiceId?: string): Promise<Buffer>;
  formatForVoice(text: string): { voice: string; full: string };
}

// Factory. CLI calls inside use execFile (not exec): no shell is involved,
// which prevents shell injection and matches the existing codebase pattern.
export declare function voicePipelineService(): VoicePipelineService;
```

The existing `/transcribe` handler in `chat-files.ts` already uses `promisify(execFile)` — this pattern is the right model. The service wraps it with format selection (`webm` vs `ogg`) and the same whisper-cpp → openai-whisper cascade.

### Pattern 2: Thin Telegram Relay

**What:** The Telegram bot is a relay, not a first-class UI. It translates Telegram message events into the same chatService calls the web UI makes, then sends the response back via Telegram.

**When to use:** Building a disposable bridge that will be replaced by a richer implementation later.

**Trade-offs:** No rich UI (no inline keyboards, no threading). Acceptable: PROJECT.md explicitly calls out "thin bridge only" and "Telegram threads/topics/inline keyboards" are out of scope.

**Shape:**
```typescript
// server/src/services/telegram.ts
export interface TelegramService {
  start(token: string): void; // idempotent; begins long polling
  stop(): void;
  isRunning(): boolean;
}

// Factory. Holds a single grammY `Bot | null` internally. `Db` is the same
// database handle type that chatService(db) takes (import omitted here).
export declare function telegramService(db: Db): TelegramService;
```

The bot calls `chatService(db)` and `puterProxyService(db)` directly — no HTTP round-trip to the same server.

### Pattern 3: Voice Mode Flag on Messages

**What:** Each message carries an optional `voiceMode: boolean` flag. When `true`, the server formats the response for voice (dual output: `voice` + `full`), and the client auto-plays TTS and shows the full text in a collapsible block.

**When to use:** Differentiating voice-initiated messages from text messages within the same conversation.

**Trade-offs:** Adds a field to `createMessageSchema` and the `ChatMessage` type. The field is optional, and an absent value is treated as `false`, so existing messages and the upstream schema are not broken.

**Schema change:**
```typescript
// packages/shared/src/validators/chat.ts (additive only)
import { z } from "zod";

export const createMessageSchema = z.object({
  role: z.enum(["user", "assistant", "system"]),
  content: z.string().min(1).max(100_000),
  agentId: z.string().uuid().optional(),
  messageType: z.string().optional(),
  voiceMode: z.boolean().optional(),  // NEW in v1.6; absent is treated as false
});
```
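
The dual output that a `voiceMode: true` message triggers could be produced along these lines. This is a rough sketch of `formatForVoice`, assuming a regex-based cleanup is good enough for spoken output; the exact set of markdown constructs to strip is an assumption, and a real markdown parser may be preferable if responses get complex.

```typescript
export function formatForVoice(text: string): { voice: string; full: string } {
  const voice = text
    // Drop fenced code blocks entirely: code is not useful when read aloud.
    .replace(/```[\s\S]*?```/g, " ")
    // Keep the content of inline code spans, drop the backticks.
    .replace(/`([^`]*)`/g, "$1")
    // Images become their alt text, links become their label.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Strip heading, blockquote, and list markers at the start of a line.
    .replace(/^[ \t]{0,3}(#{1,6}|>|[-*+]|\d+\.)[ \t]+/gm, "")
    // Strip emphasis markers; single underscores are left alone so that
    // identifiers like voice_mode survive.
    .replace(/\*\*|__|~~|\*/g, "")
    // Collapse all whitespace runs into single spaces.
    .replace(/\s+/g, " ")
    .trim();
  // `full` is the untouched response, shown in the collapsible block.
  return { voice, full: text };
}
```

The client would pass `voice` to TTS and render `full` as the expandable text, so nothing is lost when code blocks are skipped in speech.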

### Pattern 4: Direct Service Calls in Telegram Bridge

**What:** The Telegram bot does not call the Express HTTP API to get AI responses. It calls `chatService(db)` and `puterProxyService(db)` as regular TypeScript function calls within the same server process.

**When to use:** Any time a server-side integration needs the same AI response capability as the web UI without an HTTP round-trip.

**Trade-offs:** Telegram handler and web handler share the same in-process service instances. If chatService has connection pooling issues, both paths are affected. This is acceptable — single-user deployment, same DB connection pool.

**Why not HTTP:** A `fetch("http://localhost:PORT/api/...")` call from within the same server requires auth token injection, port discovery, and creates circular request chains that are hard to test and fragile in development.

### Pattern 5: grammY Long-Poll for Single-User Local Deployment

**What:** Use grammY `bot.start()` (long polling) rather than webhooks. The bot polls Telegram for new messages continuously while the server is running.

**When to use:** Local single-user deployments where a public HTTPS endpoint is not available. No reverse proxy needed, no SSL cert, no domain.

**Trade-offs:** Long polling is slightly less efficient than webhooks (Telegram must respond to each poll request) but functionally equivalent for <5,000 messages/hour. Fine for personal use.

**Lifecycle:**
- Start: `nexusSettingsService().get()` finds `telegramToken` set → `telegramService(db).start(token)`
- Stop: `server.close()` → `telegramService(db).stop()`
- Runtime toggle: `POST /api/telegram/token` updates nexus-settings and calls start/stop
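
The lifecycle above can be sketched with the bot injected through a factory, which keeps the start/stop logic testable without a real token. `BotLike`, `BotFactory`, and `createTelegramLifecycle` are hypothetical names; the only grammY behavior assumed is that `bot.start()` returns a promise that settles when polling ends and `bot.stop()` returns a promise. Unlike the synchronous shape shown in Pattern 2, this sketch makes `start` and `stop` async so a token change can wait for the old bot to stop.

```typescript
// Minimal surface the lifecycle needs; a grammY Bot satisfies it structurally.
interface BotLike {
  start(): Promise<void>;
  stop(): Promise<void>;
}

type BotFactory = (token: string) => BotLike;

export function createTelegramLifecycle(createBot: BotFactory) {
  let bot: BotLike | null = null;
  let activeToken: string | null = null;

  async function stop(): Promise<void> {
    const current = bot;
    bot = null;
    activeToken = null;
    if (current) await current.stop();
  }

  async function start(token: string): Promise<void> {
    // Idempotent: a second start with the same token does nothing.
    if (bot && activeToken === token) return;
    // A different token replaces the running bot.
    await stop();
    const next = createBot(token);
    bot = next;
    activeToken = token;
    // Polling runs until stop(), so the promise is not awaited here.
    void next.start().catch((err) => {
      console.error("telegram polling stopped:", err);
      if (bot === next) {
        bot = null;
        activeToken = null;
      }
    });
  }

  const isRunning = (): boolean => bot !== null;

  return { start, stop, isRunning };
}
```

With grammY, the factory would be something like `(token) => new Bot(token)` after handlers are registered. The `POST /api/telegram/token` handler can then call `start(newToken)` or `stop()` without caring whether a bot is already running.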

---

## Data Flow

### Web Voice Input Flow

```
User holds mic button
    |
    v
VoiceMicButton: MediaRecorder + AnalyserNode
    |
    v (silence detected after 1.5s or stop pressed)
POST /api/transcribe {audio: webm blob}
    |
    v
voice.ts route -> voicePipelineService.transcribe(buffer, "webm")
    |
    v (whisper-cpp or openai-whisper CLI via execFile)
{ text: "transcribed text" }
    |
    v
ChatInput fills textarea -> user sends (message tagged voiceMode: true)
    |
    v
POST /conversations/:id/stream -> chatService + puterProxyService
    |
    v (SSE tokens arrive)
ChatMessage with voice_mode badge + dual output (voice text + full text collapsible)
    |
    v
TtsButton auto-plays (browser-side piper-tts-web WASM — unchanged from v1.5)
```
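
The silence step in this flow can be reduced to two small pure functions, sketched below. `rms` and `createSilenceDetector` are hypothetical helpers; the sketch assumes samples come from `AnalyserNode.getFloatTimeDomainData()` (values in the range -1 to 1) and that the caller supplies timestamps, for example from `performance.now()`. The 1.5 s window is from this document, while the 0.02 level threshold in the usage note is a guess that would need tuning per microphone.

```typescript
// Root-mean-square level of one frame of time-domain samples.
export function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sum += s * s;
  }
  return Math.sqrt(sum / samples.length);
}

export function createSilenceDetector(opts: { threshold: number; silenceMs: number }) {
  let heardSpeech = false;
  let silentSince: number | null = null;

  // Feed one level reading per frame. Returns true exactly once, when the
  // level has stayed below the threshold for silenceMs after some speech.
  return function update(level: number, nowMs: number): boolean {
    if (level >= opts.threshold) {
      heardSpeech = true;
      silentSince = null;
      return false;
    }
    // Leading silence before the user speaks never triggers auto-send.
    if (!heardSpeech) return false;
    if (silentSince === null) silentSince = nowMs;
    if (nowMs - silentSince >= opts.silenceMs) {
      heardSpeech = false;
      silentSince = null;
      return true;
    }
    return false;
  };
}
```

In `useSilenceDetection`, a hook could create the detector with `{ threshold: 0.02, silenceMs: 1500 }`, call `update(rms(frame), performance.now())` on each animation frame, and stop the `MediaRecorder` when it returns `true`.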

### Server-Side TTS Flow (POST /synthesize)

```
POST /api/synthesize { text, voiceId? }
    |
    v
voice.ts route -> voicePipelineService.synthesize(text)
    |
    v (piper CLI via spawn: text -> stdin, WAV bytes <- stdout)
Response: Content-Type audio/wav, Buffer body
    |
    v
Client: new Audio(URL.createObjectURL(blob)).play()
```
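
The stdin-to-stdout step in this flow might look like the sketch below. `pipeThroughCli` is a hypothetical helper name; it is written generically so it works for any CLI that reads text on stdin and writes bytes to stdout, and it makes no assumption about piper beyond that.

```typescript
import { spawn } from "node:child_process";

// Spawns a CLI, writes `input` to its stdin, and resolves with everything
// it wrote to stdout. Rejects on spawn failure or a non-zero exit code.
export function pipeThroughCli(bin: string, args: string[], input: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, args);
    const out: Buffer[] = [];
    const err: Buffer[] = [];

    child.stdout.on("data", (chunk: Buffer) => out.push(chunk));
    child.stderr.on("data", (chunk: Buffer) => err.push(chunk));

    // Emitted when the binary cannot be started, e.g. it is not installed.
    child.on("error", reject);

    child.on("close", (code) => {
      if (code === 0) {
        resolve(Buffer.concat(out));
      } else {
        const detail = Buffer.concat(err).toString().trim();
        reject(new Error(`${bin} exited with code ${code}: ${detail}`));
      }
    });

    // Ignore write errors on stdin if the child exits before reading;
    // the close or error handler above reports the real failure.
    child.stdin.on("error", () => {});
    child.stdin.end(input);
  });
}
```

Based on the piper invocation listed under Sources (`echo "text" | piper --model voice.onnx -f -`), `synthesize` could call `pipeThroughCli("piper", ["--model", modelPath, "-f", "-"], text)`, where `modelPath` is a placeholder for wherever the voice model is stored.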

Note: Server-side `/synthesize` is new in v1.6. Its primary consumer is the Telegram bridge (which cannot use browser WASM). Web chat continues using browser-side `usePiperTts` WASM (v1.5 unchanged). The route is available for headless/server scenarios going forward.

### Telegram Text Message Flow

```
Telegram user sends text
    |
    v
grammY bot.on("message:text") handler
    |
    v
telegramService: resolveOrCreateConversation(db)
    |
    v
chatService(db).addMessage(conversationId, { role: "user", content: text })
    |
    v
telegramService: collect full response via puterProxyService(db).chatStream()
    |
    v (if voiceMode !== "full_voice")
ctx.reply("[AgentName]: full_response_text")

    | (if voiceMode === "full_voice")
    v
voicePipelineService.formatForVoice(response) -> { voice, full }
ctx.reply("[AgentName]: " + full)  -- text message with full details
    |
    v
voicePipelineService.synthesize(voice) -> WAV Buffer
ctx.replyWithAudio(new InputFile(wavBuffer, "reply.wav"))  -- buffer is WAV as synthesized; transcode first if an OGG reply is required
```

### Telegram Voice Message Flow

```
Telegram user sends voice note (OGG Opus format)
    |
    v
grammY bot.on("message:voice") -> ctx.getFile() -> download Buffer
    |
    v
voicePipelineService.transcribe(buffer, "ogg") -> whisper CLI -> text
    |
    v
(same path as Telegram text message above)
```
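
The download step in this flow can be sketched as below. `ctx.getFile()` returns a file object whose `file_path` is combined with the bot token to form the download URL on `api.telegram.org`. The helper names and the injectable `fetchImpl` parameter are hypothetical, added so the logic can be tested without network access.

```typescript
// Builds the Bot API download URL for a file_path returned by getFile.
export function telegramFileUrl(token: string, filePath: string): string {
  return `https://api.telegram.org/file/bot${token}/${filePath}`;
}

// Downloads the file into memory so it can be handed to transcribe().
export async function downloadTelegramFile(
  token: string,
  filePath: string,
  fetchImpl: typeof fetch = fetch,
): Promise<Buffer> {
  const res = await fetchImpl(telegramFileUrl(token, filePath));
  if (!res.ok) {
    throw new Error(`Telegram file download failed: HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Inside the `message:voice` handler, the sequence would be roughly: `const file = await ctx.getFile()`, check that `file.file_path` is present, download the buffer, then call `voicePipelineService.transcribe(buffer, "ogg")`.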

### nexus-settings Schema Evolution

```
v1.5:  { mode, voiceEnabled }
v1.6:  { mode, voiceEnabled, voiceMode, telegramToken }

  voiceMode:     "text" | "voice_input" | "full_voice"  (default: "text")
  telegramToken: string | undefined                      (set by user via UI or POST /telegram/token)
```

`voiceMode` is a workspace-level setting (not per-agent). The three states map to:
- `"text"`: mic button transcribes to text input, TTS manual-only, Telegram text-only
- `"voice_input"`: mic transcribes and auto-sends, TTS manual-only, Telegram voice-in + text-out
- `"full_voice"`: mic auto-sends, TTS auto-plays on every response, Telegram voice-in + voice-out
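
One way to keep these three states consistent across the web UI and the Telegram bridge is a single lookup table, sketched here. The `VoiceBehavior` field names, `parseVoiceMode`, and `behaviorFor` are hypothetical; only the three mode values, their meanings, and the `"text"` default come from this document.

```typescript
export type VoiceMode = "text" | "voice_input" | "full_voice";

// Hypothetical flags derived from the three-state description above.
export interface VoiceBehavior {
  micAutoSend: boolean;      // send the transcript without a manual submit
  ttsAutoPlay: boolean;      // play TTS on every response
  telegramVoiceIn: boolean;  // accept Telegram voice notes
  telegramVoiceOut: boolean; // reply on Telegram with synthesized audio
}

const BEHAVIOR: Record<VoiceMode, VoiceBehavior> = {
  text: { micAutoSend: false, ttsAutoPlay: false, telegramVoiceIn: false, telegramVoiceOut: false },
  voice_input: { micAutoSend: true, ttsAutoPlay: false, telegramVoiceIn: true, telegramVoiceOut: false },
  full_voice: { micAutoSend: true, ttsAutoPlay: true, telegramVoiceIn: true, telegramVoiceOut: true },
};

// Falls back to "text" for missing or unknown values, e.g. a v1.5 settings
// file that has no voiceMode key yet.
export function parseVoiceMode(value: unknown): VoiceMode {
  return value === "voice_input" || value === "full_voice" ? value : "text";
}

export function behaviorFor(mode: VoiceMode): VoiceBehavior {
  return BEHAVIOR[mode];
}
```

Both `useVoiceMode` on the client and the Telegram relay on the server could read behavior from the same table, which avoids scattering `voiceMode === "full_voice"` comparisons through the code.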

---

## Scaling Considerations

This system targets a single user on Mac Mini M4 throughout its lifetime. Scaling is not a concern. The architecture is optimized for simplicity and upstream merge compatibility.

| Concern | At 1 user (target) | Notes |
|---------|-------------------|-------|
| STT latency | whisper-cpp base.en on M4: ~1-3s | Acceptable; shows transcribing spinner |
| TTS latency | piper CLI on M4: ~0.3-1s for short text | <3s target met |
| Telegram poll | grammY `bot.start()`, 1 process | Adequate for <5,000 msgs/hour |
| Memory overhead | ~10-20MB for polling loop | Acceptable on 16GB+ M4 |
| Piper model | First server-side synthesize: cold start | Piper loads model into memory; subsequent calls fast |

---

## Anti-Patterns

### Anti-Pattern 1: Telegram-Specific Voice Logic

**What people do:** Implement OGG-to-text and text-to-OGG directly inside the Telegram bot handler.

**Why it's wrong:** Creates two separate STT/TTS code paths that diverge over time. Voice bugs must be fixed in two places. Untestable in isolation.

**Do this instead:** All voice processing goes through `voicePipelineService`. The Telegram handler calls `transcribe(buf, "ogg")` — the service handles format differences. The web route calls `transcribe(buf, "webm")` — same service, different format argument.

### Anti-Pattern 2: Circular HTTP Call for Telegram AI Response

**What people do:** Telegram bot handler calls `fetch("http://localhost:PORT/api/conversations/:id/stream")` to get AI responses from within the same server process.

**Why it's wrong:** Requires auth token injection. Fragile (port discovery). Extra TCP round-trip. Fails in test environments where the HTTP server may not be running.

**Do this instead:** `telegramService` imports `chatService(db)` and `puterProxyService(db)` directly. Collect tokens from the async generator into a string, then send to Telegram as a single message.
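
Collecting the stream into one message is a few lines, sketched below. The helper names are hypothetical, and the sketch assumes only what is stated above: that `chatStream()` yields string tokens from an async generator, and that replies are prefixed with the agent name.

```typescript
// Drains an async stream of tokens into a single string.
export async function collectStream(tokens: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const token of tokens) {
    text += token;
  }
  return text;
}

// Applies the "[Agent Name]: message text" prefix used on Telegram replies.
export function withAgentPrefix(agentName: string, text: string): string {
  return `[${agentName}]: ${text}`;
}
```

The handler would then do something like `ctx.reply(withAgentPrefix(name, await collectStream(stream)))`, where `stream` is whatever `puterProxyService(db).chatStream()` returns.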

### Anti-Pattern 3: Blocking grammY on Slow CLI Processes

**What people do:** `await synthesize()` inside a bot handler with no timeout, assuming piper is always available and fast.

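**Why it's wrong:** If the `piper` binary is not installed or hangs, the handler fails or never returns. The built-in long-polling loop handles updates sequentially, so a hung handler stalls every update queued behind it, and the unconfirmed update can be delivered again after a restart.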

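**Do this instead:** Wrap CLI calls in a `Promise.race([piperCall, timeout(8_000)])` and kill the child process when the timeout wins, since `Promise.race` alone leaves a hung `piper` running. If piper times out or is not installed, fall back to a text-only reply and log the failure, so the bot degrades gracefully to text mode.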
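
A minimal sketch of that timeout-and-fallback wrapper follows. `withTimeout` and `orNullOnFailure` are hypothetical helper names; the 8 second default comes from the text above, and the optional `onTimeout` callback is an addition meant for cleanup such as killing a spawned process.

```typescript
// Rejects if `work` has not settled within `ms`. The timer is always
// cleared, so a fast result does not leave a pending timeout behind.
export async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  onTimeout?: () => void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout?.();
      reject(new Error(`timed out after ${ms}ms`));
    }, ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Runs a voice step and returns null on any failure (timeout, missing
// binary, non-zero exit), so the caller can fall back to a text reply.
export async function orNullOnFailure<T>(
  work: () => Promise<T>,
  ms = 8_000,
): Promise<T | null> {
  try {
    return await withTimeout(work(), ms);
  } catch (err) {
    console.warn("voice step failed, falling back to text:", err);
    return null;
  }
}
```

In the bot handler, `const audio = await orNullOnFailure(() => voice.synthesize(text))` followed by a check for `null` keeps the text reply path working even when TTS is broken.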

### Anti-Pattern 4: Keeping /transcribe Inside chat-files.ts

**What people do:** Leave the STT handler in `chat-files.ts` and call `voicePipelineService` from there, adding Nexus-specific logic to an upstream-sourced file.

**Why it's wrong:** `chat-files.ts` is a mostly-upstream Paperclip file. Each rebase introduces merge conflicts. More Nexus-specific code in the file = more conflict surface.

**Do this instead:** Move `/transcribe` and `/synthesize` to a new `voice.ts` route file (Nexus-only, never in upstream). Keep `chat-files.ts` as close to upstream as possible.

### Anti-Pattern 5: Storing Telegram Token in Database

**What people do:** Create a new DB table or add a column to `instance_settings` to store the Telegram bot token.

**Why it's wrong:** Any DB schema change blocks upstream rebase (migration files conflict). The `nexus-settings.json` file-backed service is the established Nexus pattern for project-specific config that has no upstream equivalent.

**Do this instead:** Store `telegramToken` in `nexus-settings.json` via the existing `nexusSettingsService`. Same pattern as `voiceEnabled`, `mode`.

---

## Integration Points

### External Services

| Service | Integration Pattern | Notes |
|---------|---------------------|-------|
| Telegram Bot API | grammY `bot.start()` long-polling (Node.js) | No public URL required; polling starts on server boot if token present in nexus-settings |
| whisper-cpp / openai-whisper | `execFile` cascade (same as existing `/transcribe`) | Format argument added: writes `.webm` or `.ogg` temp file based on input |
| piper TTS binary | `child_process.spawn` stdin -> stdout | Text piped to stdin; WAV or raw PCM bytes collected from stdout |

### Internal Boundaries

| Boundary | Communication | Notes |
|----------|---------------|-------|
| voice route <-> voicePipelineService | Direct function call | Route is thin HTTP wrapper; all logic in service |
| telegram service <-> voicePipelineService | Direct function call | Same service used by both transports |
| telegram service <-> chatService | Direct function call | Bot calls `chatService(db)` directly — no HTTP round-trip |
| telegram service <-> nexusSettingsService | Direct function call | Reads `voiceMode` and `telegramToken` at start and on each message |
| web UI <-> voice route | REST: `POST /api/transcribe`, `POST /api/synthesize` | Web client uses browser-side piper WASM for TTS; `/synthesize` primarily for Telegram |
| UI VoiceModeToggle <-> nexus-settings | REST: `PATCH /api/nexus-settings` | Reads/writes `voiceMode` setting |

---

## Build Order

Based on component dependencies, the recommended build order within this milestone:

| Step | Component(s) | Reason |
|------|-------------|--------|
| 1 | `nexus-settings` schema extensions (`voiceMode`, `telegramToken`) | Everything downstream reads settings |
| 2 | `voicePipelineService` | Backs all voice. No new deps. Independently testable. |
| 3 | `voice.ts` route (`POST /transcribe`, `POST /synthesize`) | Thin wrapper. Register in `app.ts`. Move handler from chat-files. |
| 4 | `VoiceMicButton` + `WaveformDisplay` + `useSilenceDetection` | Pure UI. Depends only on `/transcribe`. |
| 5 | `VoiceModeToggle` + `useVoiceMode` | Depends on `voiceMode` in nexus-settings schema (Step 1). |
| 6 | `createMessageSchema` + `ChatMessage` type (`voiceMode` flag) | Shared package change. Required by Step 7. |
| 7 | `ChatMessage` dual output | Depends on `voiceMode` in shared `ChatMessage` type (Step 6). |
| 8 | `telegramService` | Depends on voicePipelineService (2), chatService (existing), nexusSettings (1). |
| 9 | `telegram.ts` route + app.ts registration | Management endpoints. Needs telegramService. |
| 10 | Onboarding STT/TTS hardware detection step | Final: wires all voice detection into onboarding flow. |

Steps 4-7 can run in parallel with Steps 8-9 if split across phases.

---

## Sources

- Direct codebase inspection: `server/src/routes/chat-files.ts` (lines 297-386), `server/src/routes/chat.ts`, `server/src/services/nexus-settings.ts`, `server/src/app.ts`, `ui/src/components/VoiceRecordButton.tsx`, `ui/src/components/TtsButton.tsx`, `ui/src/hooks/usePiperTts.ts`, `packages/shared/src/validators/chat.ts`, `packages/shared/src/types/chat.ts`
- `.planning/STATE.md` — v1.6 architectural decisions (transport-agnostic, disposable bridge, dual output, per-message flag)
- `.planning/milestones/v1.5-phases/34-voice/34-RESEARCH.md` — existing voice implementation details, WASM TTS pattern
- [grammY documentation](https://grammy.dev/) — TypeScript-native, Bot API 9.6 (April 2026), long-polling vs webhooks
- [grammY deployment types guide](https://grammy.dev/guide/deployment-types) — long polling recommended for single-user local; Express integration pattern
- [rhasspy/piper (archived)](https://github.com/rhasspy/piper) — CLI: `echo "text" | piper --model voice.onnx -f -`; development moved to OHF-Voice/piper1-gpl Oct 2025
- grammY supports Telegram Bot API 9.6 (released April 3, 2026) — latest version confirmed

---
*Architecture research for: Voice Pipeline + Minimal Telegram Bridge (v1.6)*
*Researched: 2026-04-03*