# Architecture Research
**Domain:** Voice Pipeline + Minimal Telegram Bridge (v1.6) — integration with existing Nexus/Paperclip monorepo
**Researched:** 2026-04-03
**Confidence:** HIGH — based on direct codebase inspection + verified current documentation
---
## System Overview
v1.6 adds two parallel capability tracks onto the existing monorepo: a transport-agnostic voice pipeline (Whisper STT + Piper TTS) and a disposable Telegram bridge that reuses those pipeline primitives for phone access. The architecture constraint is that no voice or chat logic is Telegram-specific — Telegram is an interchangeable transport layer that calls the same server services as the web UI.
```
+----------------------------------------------------------------------------------+
| UI Layer (React/Vite)                                                            |
|                                                                                  |
| +------------------------------------------------------------------------------+ |
| | ChatPanel / PersonalAssistant (MODIFIED)                                     | |
| | +---------------------+  +--------------------+  +------------------+        | |
| | | VoiceMicButton (NEW)|  | WaveformDisplay    |  | TtsButton (v1.5) |        | |
| | | silence detection   |  | (NEW) animated bars|  | + auto-play prop |        | |
| | | auto-send on silence|  +--------------------+  +------------------+        | |
| | +---------------------+                                                      | |
| | +--------------------------------------------------------------------------+ | |
| | | ChatMessage (MODIFIED) - voice_mode badge, dual output toggle            | | |
| | +--------------------------------------------------------------------------+ | |
| | +--------------------------------------------------------------------------+ | |
| | | VoiceModeToggle (NEW) - text only / voice input / full voice             | | |
| | +--------------------------------------------------------------------------+ | |
| +------------------------------------------------------------------------------+ |
+----------------------------------------------------------------------------------+
                                        | HTTP + SSE
+----------------------------------------------------------------------------------+
| Server Layer (Express)                                                           |
|                                                                                  |
| +------------------------------------+    +------------------------------------+ |
| | voice.ts (NEW route)               |    | telegram.ts (NEW route/service)    | |
| | POST /transcribe (MOVED)           |    | grammY long-poll process           | |
| | POST /synthesize (NEW)             |    | text + voice relay                 | |
| +------------------------------------+    +------------------------------------+ |
|                   |                                         |                    |
| +-----------------v-----------------------------------------v------------------+ |
| | voicePipelineService (NEW - core)                                            | |
| |   transcribe(audioBuffer, format) -> string                                  | |
| |   synthesize(text, voiceId?) -> Buffer (WAV)                                 | |
| |   formatForVoice(text) -> { voice: string, full: string }                    | |
| +------------------------------------------------------------------------------+ |
|                   |                                                              |
| +-----------------v------------------------------------------------------------+ |
| | chatService / nexusSettingsService (EXISTING)                                | |
| |   conversations . messages . stream SSE . memory . voiceEnabled              | |
| +------------------------------------------------------------------------------+ |
|                   |                                                              |
| +-----------------v------------------------------------------------------------+ |
| | External Processes (spawned via child_process.spawn / execFile)              | |
| |   whisper-cpp / whisper (STT)        piper (TTS)                             | |
| +------------------------------------------------------------------------------+ |
+----------------------------------------------------------------------------------+
         ^
         | Telegram Bot API (HTTPS long-poll)
+--------+-------------------------------------------------------------------------+
| Telegram (external service)                                                      |
| User sends text -> bot relays to chatService -> SSE reply -> bot sends back      |
| User sends voice -> bot downloads OGG -> voicePipelineService.transcribe()       |
|                  -> chatService -> reply -> voicePipelineService.synthesize()    |
|                  -> bot sends OGG audio reply                                    |
+----------------------------------------------------------------------------------+
```
---
## Integration Points: New vs. Existing
### What Stays Unchanged
| Component | Location | Status |
|-----------|----------|--------|
| `chatService` | `server/src/services/chat.ts` | No changes — voice pipeline uses it as-is |
| `nexusSettingsService` | `server/src/services/nexus-settings.ts` | Extend schema only (add `voiceMode`, `telegramToken`) |
| `chatFileRoutes` | `server/src/routes/chat-files.ts` | `/transcribe` moves out; file upload stays |
| `usePiperTts` | `ui/src/hooks/usePiperTts.ts` | No changes — TtsButton continues using browser WASM |
| `TtsButton` | `ui/src/components/TtsButton.tsx` | Add auto-play prop only |
| SSE stream endpoint | `server/src/routes/chat.ts` | No changes — Telegram bridge calls services directly |
| DB schema | `packages/db` | No changes — voice is file/process, not a DB column |
### What Changes (MODIFIED)
| Component | Location | Change |
|-----------|----------|--------|
| `VoiceRecordButton` | `ui/src/components/VoiceRecordButton.tsx` | Add silence detection, waveform data emission, auto-send on silence |
| `ChatInput` | `ui/src/components/ChatInput.tsx` | Wire new VoiceMicButton, add voice mode prop |
| `ChatMessage` | `ui/src/components/ChatMessage.tsx` | Show voice_mode badge, show dual output collapse/expand |
| `nexusSettingsSchema` | `server/src/services/nexus-settings.ts` | Add `voiceMode` enum and `telegramToken` optional string |
| `app.ts` | `server/src/app.ts` | Register `voiceRoutes`, `telegramRoutes` |
| `createMessageSchema` | `packages/shared/src/validators/chat.ts` | Add `voiceMode: z.boolean().optional()` flag on messages |
| `ChatMessage` type | `packages/shared/src/types/chat.ts` | Add `voiceMode: boolean | null` field |
| `chat-files.ts` | `server/src/routes/chat-files.ts` | Remove `/transcribe` handler (moved to voice.ts) |
### What Is New (NEW)
| Component | Location | Purpose |
|-----------|----------|---------|
| `voicePipelineService` | `server/src/services/voice-pipeline.ts` | Transport-agnostic STT/TTS core — used by web routes AND Telegram bridge |
| `voice.ts` (route) | `server/src/routes/voice.ts` | `POST /api/transcribe`, `POST /api/synthesize` — thin HTTP wrappers |
| `telegram.ts` (service) | `server/src/services/telegram.ts` | grammY bot init, long-poll loop, message relay, voice relay |
| `telegram.ts` (route) | `server/src/routes/telegram.ts` | `GET /api/telegram/status`, `POST /api/telegram/token` management endpoints |
| `VoiceMicButton` | `ui/src/components/VoiceMicButton.tsx` | Enhanced mic button with silence detection and waveform display |
| `WaveformDisplay` | `ui/src/components/WaveformDisplay.tsx` | Animated audio waveform bars using AnalyserNode |
| `VoiceModeToggle` | `ui/src/components/VoiceModeToggle.tsx` | Three-state toggle: text only / voice input / full voice |
| `useVoiceMode` | `ui/src/hooks/useVoiceMode.ts` | Reads/writes voice mode setting via `/api/nexus-settings` |
| `useSilenceDetection` | `ui/src/hooks/useSilenceDetection.ts` | Web Audio API AnalyserNode watching for 1.5s silence threshold |
---
## Component Boundaries
### voicePipelineService (Core)
This is the key abstraction for v1.6. Both the web HTTP route and the Telegram bridge call this service — neither knows about the other.
| Method | Input | Output | Implementation |
|--------|-------|--------|----------------|
| `transcribe(buffer, format)` | `Buffer`, `"webm" or "ogg"` | `Promise<string>` | Writes temp file, uses `execFile` (not `exec`) to spawn `whisper-cpp` or `whisper` CLI, reads stdout, cleans up |
| `synthesize(text, voiceId?)` | `string`, optional voiceId | `Promise<Buffer>` | Spawns `piper` CLI via `spawn`, pipes text to stdin, collects WAV stdout |
| `formatForVoice(text)` | `string` | `{ voice: string; full: string }` | Strips code blocks and markdown for voice; returns both variants |
The `transcribe` method extends the existing `/transcribe` implementation from `chat-files.ts` by adding an `ogg` format path alongside the existing `webm` path. The same cascade (whisper-cpp first, openai-whisper fallback) is preserved.
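To make the `formatForVoice` row in the table above concrete, here is a minimal sketch of one way to derive the `voice` variant. The function body and its regular expressions are assumptions for illustration only; the document specifies the return shape but not the stripping rules, and underscores are deliberately left alone so identifiers such as `voice_mode` survive.

```typescript
// Hypothetical sketch of formatForVoice; the stripping rules are illustrative,
// not the project's actual implementation.
export function formatForVoice(text: string): { voice: string; full: string } {
  const voice = text
    .replace(/```[\s\S]*?```/g, " ") // drop fenced code blocks entirely
    .replace(/`([^`]*)`/g, "$1") // unwrap inline code, keep its text
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1") // keep link or image text, drop the URL
    .replace(/^[ \t]{0,3}(#{1,6}|>|[-*+])[ \t]+/gm, "") // strip heading, quote, bullet markers
    .replace(/(\*\*|\*|~~)/g, "") // strip asterisk emphasis and strikethrough markers
    .replace(/\s+/g, " ") // collapse whitespace into single spaces
    .trim();
  // `full` is the untouched original, shown as text alongside the spoken variant.
  return { voice, full: text };
}
```

The `full` field is returned unchanged so a client can still render code blocks and links that were removed from the spoken variant.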
**Why a dedicated service vs. inline in routes:**
The Telegram bridge should not call the web route, since that would be a circular HTTP call within the same process. Both transports need the same logic. Extracting it to a service eliminates duplication and makes both transports testable in isolation.
### telegram service
A thin relay, not a feature-rich bot. It:
1. Holds a single grammY `Bot` instance, initialized when `telegramToken` is set in nexus-settings
2. Routes text messages to `chatService.addMessage()` then collects AI response via `puterProxyService.chatStream()`
3. Routes voice messages — downloads OGG file, calls `voicePipelineService.transcribe()`, then same text path
4. If `voiceMode === "full_voice"`: calls `voicePipelineService.synthesize()`, sends audio back via `ctx.replyWithAudio()`
5. Prefixes agent name on replies: `[Agent Name]: message text`
**No per-user conversation tracking.** All Telegram messages go to a single conversation (or create one on first use) associated with the workspace. This is the intentional "thin bridge" design — full sync is out of scope per PROJECT.md.
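Steps 2 and 5 of the list above (collect the streamed response, then prefix the agent name) can be sketched as a small helper. This is an illustration under assumptions: `collectReply` and `fakeStream` are hypothetical names, and the token stream is modelled as an `AsyncIterable<string>` because the real signature of `puterProxyService.chatStream()` is not shown in this document.

```typescript
// Hypothetical helper: drains a token stream into one string so the bot can
// send a single Telegram message instead of many partial ones.
export async function collectReply(
  tokens: AsyncIterable<string>,
  agentName: string,
): Promise<string> {
  let text = "";
  for await (const token of tokens) {
    text += token;
  }
  // Prefix format from step 5: "[Agent Name]: message text"
  return `[${agentName}]: ${text.trim()}`;
}

// Stand-in generator used only to demonstrate the helper.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello";
  yield ", ";
  yield "world";
}
```

Buffering the whole response trades streaming latency for simplicity, which fits the thin-bridge goal described above.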
### Voice Route vs. Chat Files Route
The existing `/transcribe` endpoint lives inside `chatFileRoutes` in `chat-files.ts`. For v1.6, the endpoint moves to a dedicated `voice.ts` route. This is a path-preserving refactor: the endpoint behavior is unchanged, but the code now lives in a Nexus-specific file rather than inside a mostly-upstream file.
Moving the handler reduces merge conflict surface on future upstream rebases of `chat-files.ts`.
---
## Recommended Project Structure
```
server/src/
  app.ts                        # MODIFY: register voiceRoutes, telegramRoutes
  routes/
    chat-files.ts               # MODIFY: remove /transcribe handler (moved to voice.ts)
    voice.ts                    # NEW: POST /transcribe, POST /synthesize
    nexus-settings.ts           # MODIFY: expose voiceMode + telegramToken fields
    telegram.ts                 # NEW: GET /telegram/status, POST /telegram/token
  services/
    voice-pipeline.ts           # NEW: transcribe(), synthesize(), formatForVoice()
    telegram.ts                 # NEW: grammY bot lifecycle + relay logic
    nexus-settings.ts           # MODIFY: add voiceMode + telegramToken to schema
ui/src/
  components/
    VoiceMicButton.tsx          # NEW: replaces VoiceRecordButton in ChatInput
    WaveformDisplay.tsx         # NEW: animated bars from AnalyserNode data
    VoiceModeToggle.tsx         # NEW: 3-state toggle (text / voice-in / full-voice)
    VoiceRecordButton.tsx       # KEEP as-is (still used in file upload contexts)
    TtsButton.tsx               # MODIFY: add autoPlay prop
    ChatInput.tsx               # MODIFY: add VoiceModeToggle, swap in VoiceMicButton
    ChatMessage.tsx             # MODIFY: voice_mode badge + dual output expand
  hooks/
    useVoiceMode.ts             # NEW: reads/writes voiceMode setting
    useSilenceDetection.ts      # NEW: AnalyserNode silence threshold
    usePiperTts.ts              # KEEP as-is (browser-side TTS unchanged)
packages/shared/src/
  validators/chat.ts            # MODIFY: add voiceMode flag to createMessageSchema
  types/chat.ts                 # MODIFY: add voiceMode field to ChatMessage
```
---
## Architectural Patterns
### Pattern 1: Transport-Agnostic Voice Service
**What:** A server service (`voicePipelineService`) owns STT and TTS logic. HTTP routes and Telegram relay both call the service — neither implements STT/TTS directly.
**When to use:** Any time two transports (web + bot) need the same capability.
**Trade-offs:** Adds one indirection layer. Worth it: eliminates duplication, makes each transport testable independently.
**Shape:**
```typescript
// server/src/services/voice-pipeline.ts
// Shape only: method bodies are omitted.
export interface VoicePipelineService {
  transcribe(buffer: Buffer, format: "webm" | "ogg"): Promise<string>;
  synthesize(text: string, voiceId?: string): Promise<Buffer>;
  formatForVoice(text: string): { voice: string; full: string };
}

// The implementation uses execFile (not exec): prevents shell injection,
// consistent with the codebase pattern.
export declare function voicePipelineService(): VoicePipelineService;
```
The existing `/transcribe` handler in `chat-files.ts` already uses `promisify(execFile)` — this pattern is the right model. The service wraps it with format selection (`webm` vs `ogg`) and the same whisper-cpp → openai-whisper cascade.
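The cascade described above (try one CLI, fall back to the next) might look like the following sketch. `runCascade` and `CliCandidate` are hypothetical names, and the binaries and flags are deliberately left to the caller, since this document does not list the exact whisper arguments used in `chat-files.ts`.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// One CLI candidate in the cascade. Concrete commands and flags are supplied
// by the caller; none are asserted here.
export interface CliCandidate {
  command: string;
  args: string[];
}

// Tries each candidate in order and returns the first successful stdout.
// execFile passes args as an array without a shell, so temp file names are
// never interpreted as shell syntax.
export async function runCascade(
  candidates: CliCandidate[],
  timeoutMs = 30_000,
): Promise<string> {
  const failures: string[] = [];
  for (const { command, args } of candidates) {
    try {
      const { stdout } = await execFileAsync(command, args, { timeout: timeoutMs });
      return String(stdout).trim();
    } catch (err) {
      // Missing binary (ENOENT), non-zero exit, or timeout: record and fall through.
      failures.push(`${command}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all candidates failed: ${failures.join("; ")}`);
}
```

Collecting the failure messages keeps the final error useful when neither binary is installed.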
### Pattern 2: Thin Telegram Relay
**What:** The Telegram bot is a relay, not a first-class UI. It translates Telegram message events into the same chatService calls the web UI makes, then sends the response back via Telegram.
**When to use:** Building a disposable bridge that will be replaced by a richer implementation later.
**Trade-offs:** No rich UI (no inline keyboards, no threading). Acceptable: PROJECT.md explicitly calls out "thin bridge only" and "Telegram threads/topics/inline keyboards" are out of scope.
**Shape:**
```typescript
// server/src/services/telegram.ts
import { Bot } from "grammy";

// Shape only: method bodies are omitted. `Db` is the codebase's database handle type.
export interface TelegramService {
  start(token: string): void; // idempotent, long-poll
  stop(): void;
  isRunning(): boolean;
}

// Holds a single `Bot | null` instance internally.
export declare function telegramService(db: Db): TelegramService;
```
The bot calls `chatService(db)` and `puterProxyService(db)` directly — no HTTP round-trip to the same server.
### Pattern 3: Voice Mode Flag on Messages
**What:** Each message carries an optional `voiceMode: boolean` flag. When `true`, the server formats the response for voice (dual output: `voice` + `full`), and the client auto-plays TTS and shows the full text in a collapsible block.
**When to use:** Differentiating voice-initiated messages from text messages within the same conversation.
**Trade-offs:** Adds a field to `createMessageSchema` and the `ChatMessage` type. The field is optional and is treated as `false` when absent, so existing messages and the upstream schema are not broken.
**Schema change:**
```typescript
// packages/shared/src/validators/chat.ts (additive only)
import { z } from "zod";

export const createMessageSchema = z.object({
  role: z.enum(["user", "assistant", "system"]),
  content: z.string().min(1).max(100_000),
  agentId: z.string().uuid().optional(),
  messageType: z.string().optional(),
  voiceMode: z.boolean().optional(), // NEW in v1.6
});
```
### Pattern 4: Direct Service Calls in Telegram Bridge
**What:** The Telegram bot does not call the Express HTTP API to get AI responses. It calls `chatService(db)` and `puterProxyService(db)` as regular TypeScript function calls within the same server process.
**When to use:** Any time a server-side integration needs the same AI response capability as the web UI without an HTTP round-trip.
**Trade-offs:** Telegram handler and web handler share the same in-process service instances. If chatService has connection pooling issues, both paths are affected. This is acceptable — single-user deployment, same DB connection pool.
**Why not HTTP:** A `fetch("http://localhost:PORT/api/...")` call from within the same server requires auth token injection, port discovery, and creates circular request chains that are hard to test and fragile in development.
### Pattern 5: grammY Long-Poll for Single-User Local Deployment
**What:** Use grammY `bot.start()` (long polling) rather than webhooks. The bot polls Telegram for new messages continuously while the server is running.
**When to use:** Local single-user deployments where a public HTTPS endpoint is not available. No reverse proxy needed, no SSL cert, no domain.
**Trade-offs:** Long polling is slightly less efficient than webhooks (Telegram must respond to each poll request) but functionally equivalent for <5,000 messages/hour. Fine for personal use.
**Lifecycle:**
- Start: `nexusSettingsService().get()` finds `telegramToken` set → `telegramService(db).start(token)`
- Stop: `server.close()` → `telegramService(db).stop()`
- Runtime toggle: `POST /api/telegram/token` updates nexus-settings and calls start/stop
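The start, stop, and runtime-toggle lifecycle above can be sketched without a grammY dependency. `BotLike` is a hypothetical stand-in interface and `createTelegramLifecycle` a hypothetical name. grammY's `Bot` exposes `start()` and `stop()` methods of this general shape, but how the real service wires them is an assumption here.

```typescript
// Hypothetical minimal surface of a long-polling bot, so the sketch stays
// dependency-free.
interface BotLike {
  start(): Promise<void>;
  stop(): Promise<void>;
}

export function createTelegramLifecycle(makeBot: (token: string) => BotLike) {
  let bot: BotLike | null = null;
  let activeToken: string | null = null;

  async function stop(): Promise<void> {
    if (!bot) return;
    const current = bot;
    bot = null;
    activeToken = null;
    await current.stop();
  }

  // Idempotent: the same token is a no-op, a new token restarts polling.
  async function start(token: string): Promise<void> {
    if (bot && activeToken === token) return;
    await stop();
    bot = makeBot(token);
    activeToken = token;
    // A long-poll start() settles only when polling ends, so it is not awaited.
    void bot.start().catch((err) => console.error("telegram polling stopped:", err));
  }

  // Runtime toggle: an empty or missing token stops the bot.
  async function applyToken(token: string | undefined): Promise<void> {
    if (token) await start(token);
    else await stop();
  }

  return { start, stop, applyToken, isRunning: () => bot !== null };
}
```

Under this sketch, a `POST /api/telegram/token` handler would persist the token and then call `applyToken`, so boot-time start and runtime toggling share one code path.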
---
## Data Flow
### Web Voice Input Flow
```
User holds mic button
|
v
VoiceMicButton: MediaRecorder + AnalyserNode
|
v (silence detected after 1.5s or stop pressed)
POST /api/transcribe {audio: webm blob}
|
v
voice.ts route -> voicePipelineService.transcribe(buffer, "webm")
|
v (whisper-cpp or openai-whisper CLI via execFile)
{ text: "transcribed text" }
|
v
ChatInput fills textarea -> user sends (message tagged voiceMode: true)
|
v
POST /conversations/:id/stream -> chatService + puterProxyService
|
v (SSE tokens arrive)
ChatMessage with voice_mode badge + dual output (voice text + full text collapsible)
|
v
TtsButton auto-plays (browser-side piper-tts-web WASM — unchanged from v1.5)
```
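The silence step in the flow above can be isolated as a pure function, which keeps it testable outside the browser. This is a sketch under assumptions: `createSilenceDetector` and `rms` are hypothetical names, the frames would come from `AnalyserNode.getFloatTimeDomainData()` in the real hook, and the `0.01` RMS floor is an assumed value that would need tuning per microphone.

```typescript
// Root-mean-square level of one analyser frame (samples are in [-1, 1]).
export function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical core for a hook like useSilenceDetection.
export function createSilenceDetector(silenceMs = 1500, rmsFloor = 0.01) {
  let quietSince: number | null = null;
  let heardSpeech = false;

  // Feed one frame plus its timestamp. Returns true once the speaker has been
  // quiet for `silenceMs` after having spoken at least once.
  return function onFrame(samples: Float32Array, nowMs: number): boolean {
    if (rms(samples) >= rmsFloor) {
      heardSpeech = true;
      quietSince = null;
      return false;
    }
    if (!heardSpeech) return false; // never auto-send before any speech
    if (quietSince === null) quietSince = nowMs;
    return nowMs - quietSince >= silenceMs;
  };
}
```

Requiring speech before the timer starts avoids sending an empty recording when the user opens the mic and pauses before talking.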
### Server-Side TTS Flow (POST /synthesize)
```
POST /api/synthesize { text, voiceId? }
|
v
voice.ts route -> voicePipelineService.synthesize(text)
|
v (piper CLI via spawn: text -> stdin, WAV bytes <- stdout)
Response: Content-Type audio/wav, Buffer body
|
v
Client: new Audio(URL.createObjectURL(blob)).play()
```
Note: Server-side `/synthesize` is new in v1.6. Its primary consumer is the Telegram bridge (which cannot use browser WASM). Web chat continues using browser-side `usePiperTts` WASM (v1.5 unchanged). The route is available for headless/server scenarios going forward.
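The stdin-to-stdout step in the flow above could be implemented along these lines. `runWithStdin` is a hypothetical helper name; the piper arguments in the usage comment are the ones quoted in this document's sources, with the model path as a placeholder.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: pipes `input` to a CLI's stdin and resolves with the
// bytes it writes to stdout.
export function runWithStdin(command: string, args: string[], input: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const chunks: Buffer[] = [];
    let stderr = "";

    child.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    child.stderr.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    child.on("error", reject); // e.g. ENOENT when the binary is not installed
    child.on("close", (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`${command} exited with code ${code}: ${stderr.trim()}`));
    });

    // Swallow EPIPE if the child exits early; the close or error event reports it.
    child.stdin.on("error", () => {});
    child.stdin.end(input);
  });
}

// Usage (model path is a placeholder):
//   const wav = await runWithStdin("piper", ["--model", "voice.onnx", "-f", "-"], "Hello there");
```

Collecting chunks and concatenating once at the end avoids repeated buffer copies for longer replies.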
### Telegram Text Message Flow
```
Telegram user sends text
|
v
grammY bot.on("message:text") handler
|
v
telegramService: resolveOrCreateConversation(db)
|
v
chatService(db).addMessage(conversationId, { role: "user", content: text })
|
v
telegramService: collect full response via puterProxyService(db).chatStream()
|
v (if voiceMode !== "full_voice")
ctx.reply("[AgentName]: full_response_text")
| (if voiceMode === "full_voice")
v
voicePipelineService.formatForVoice(response) -> { voice, full }
ctx.reply("[AgentName]: " + full) -- text message with full details
|
v
voicePipelineService.synthesize(voice) -> WAV Buffer
ctx.replyWithAudio(new InputFile(wavBuffer, "reply.wav"))
```
### Telegram Voice Message Flow
```
Telegram user sends voice note (OGG Opus format)
|
v
grammY bot.on("message:voice") -> ctx.getFile() -> download Buffer
|
v
voicePipelineService.transcribe(buffer, "ogg") -> whisper CLI -> text
|
v
(same path as Telegram text message above)
```
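The "download Buffer" step above needs the file URL that corresponds to the `file_path` returned by `getFile`. The sketch below uses the URL pattern documented for the Telegram Bot API. The helper names are hypothetical, and it assumes a runtime with a global `fetch` (Node 18 or later).

```typescript
// Builds the download URL for a file whose `file_path` came from getFile.
export function telegramFileUrl(botToken: string, filePath: string): string {
  return `https://api.telegram.org/file/bot${botToken}/${filePath}`;
}

// Hypothetical helper: downloads the voice note into a Buffer that can be
// passed to transcribe(buffer, "ogg").
export async function downloadVoiceNote(botToken: string, filePath: string): Promise<Buffer> {
  const res = await fetch(telegramFileUrl(botToken, filePath));
  if (!res.ok) {
    throw new Error(`Telegram file download failed: HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Because the URL embeds the bot token, it should not be written to logs.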
### nexus-settings Schema Evolution
```
v1.5: { mode, voiceEnabled }
v1.6: { mode, voiceEnabled, voiceMode, telegramToken }
  voiceMode: "text" | "voice_input" | "full_voice" (default: "text")
  telegramToken: string | undefined (set by user via UI or POST /telegram/token)
```
`voiceMode` is a workspace-level setting (not per-agent). The three states map to:
- `"text"`: mic button transcribes to text input, TTS manual-only, Telegram text-only
- `"voice_input"`: mic transcribes and auto-sends, TTS manual-only, Telegram voice-in + text-out
- `"full_voice"`: mic auto-sends, TTS auto-plays on every response, Telegram voice-in + voice-out
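The schema evolution and the three-state mapping above can be written down as types plus a defaulting helper. This is a sketch: `withV16Defaults`, `ModeBehavior`, and `VOICE_MODE_BEHAVIOR` are hypothetical names, and `mode` is typed as a plain `string` because its allowed values are not listed in this document.

```typescript
type VoiceMode = "text" | "voice_input" | "full_voice";
const VOICE_MODES: readonly VoiceMode[] = ["text", "voice_input", "full_voice"];

interface NexusSettingsV16 {
  mode: string; // existing v1.5 field
  voiceEnabled: boolean; // existing v1.5 field
  voiceMode: VoiceMode; // new in v1.6, default "text"
  telegramToken?: string; // new in v1.6
}

interface ModeBehavior {
  micAutoSend: boolean;
  ttsAutoPlay: boolean;
  telegramVoiceIn: boolean;
  telegramVoiceOut: boolean;
}

// Behaviour per mode, transcribed from the list above.
const VOICE_MODE_BEHAVIOR: Record<VoiceMode, ModeBehavior> = {
  text: { micAutoSend: false, ttsAutoPlay: false, telegramVoiceIn: false, telegramVoiceOut: false },
  voice_input: { micAutoSend: true, ttsAutoPlay: false, telegramVoiceIn: true, telegramVoiceOut: false },
  full_voice: { micAutoSend: true, ttsAutoPlay: true, telegramVoiceIn: true, telegramVoiceOut: true },
};

// Hypothetical upgrade helper: a v1.5 settings file has no voiceMode, so an
// absent or unknown value falls back to "text".
function withV16Defaults(raw: {
  mode: string;
  voiceEnabled: boolean;
  voiceMode?: unknown;
  telegramToken?: unknown;
}): NexusSettingsV16 {
  const voiceMode = VOICE_MODES.find((m) => m === raw.voiceMode) ?? "text";
  const token = raw.telegramToken;
  const telegramToken = typeof token === "string" && token !== "" ? token : undefined;
  return { mode: raw.mode, voiceEnabled: raw.voiceEnabled, voiceMode, telegramToken };
}
```

Keeping the behaviour table in one place would let the web UI and the Telegram relay read the same flags instead of re-deriving them from the mode string.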
---
## Scaling Considerations
This system targets a single user on Mac Mini M4 throughout its lifetime. Scaling is not a concern. The architecture is optimized for simplicity and upstream merge compatibility.
| Concern | At 1 user (target) | Notes |
|---------|-------------------|-------|
| STT latency | whisper-cpp base.en on M4: ~1-3s | Acceptable; shows transcribing spinner |
| TTS latency | piper CLI on M4: ~0.3-1s for short text | <3s target met |
| Telegram poll | grammY `bot.start()`, 1 process | Adequate for <5,000 msgs/hour |
| Memory overhead | ~10-20MB for polling loop | Acceptable on 16GB+ M4 |
| Piper model | First server-side synthesize: cold start | Piper loads model into memory; subsequent calls fast |
---
## Anti-Patterns
### Anti-Pattern 1: Telegram-Specific Voice Logic
**What people do:** Implement OGG-to-text and text-to-OGG directly inside the Telegram bot handler.
**Why it's wrong:** Creates two separate STT/TTS code paths that diverge over time. Voice bugs must be fixed in two places. Untestable in isolation.
**Do this instead:** All voice processing goes through `voicePipelineService`. The Telegram handler calls `transcribe(buf, "ogg")` and the service handles format differences. The web route calls `transcribe(buf, "webm")`: same service, different format argument.
### Anti-Pattern 2: Circular HTTP Call for Telegram AI Response
**What people do:** Telegram bot handler calls `fetch("http://localhost:PORT/api/conversations/:id/stream")` to get AI responses from within the same server process.
**Why it's wrong:** Requires auth token injection. Fragile (port discovery). Extra TCP round-trip. Fails in test environments where the HTTP server may not be running.
**Do this instead:** `telegramService` imports `chatService(db)` and `puterProxyService(db)` directly. Collect tokens from the async generator into a string, then send to Telegram as a single message.
### Anti-Pattern 3: Blocking grammY on Slow CLI Processes
**What people do:** `await synthesize()` inside a bot handler with no timeout, assuming piper is always available and fast.
**Why it's wrong:** If the `piper` binary is not installed or hangs, the grammY update queue stalls. The same update gets retried indefinitely.
**Do this instead:** Wrap CLI calls in a `Promise.race([piperCall, timeout(8_000)])`. If piper times out or is not installed, fall back to text-only reply and log the failure. Bot degrades gracefully to text mode.
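A minimal sketch of that timeout-and-fallback wrapper follows. `withTimeout` and `synthesizeOrNull` are hypothetical names. Note that `Promise.race` does not cancel the losing promise, so the optional `onTimeout` callback is where a caller would kill the child process.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
export function withTimeout<T>(work: Promise<T>, ms: number, onTimeout?: () => void): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout?.(); // e.g. child.kill() for a hung CLI process
      reject(new Error(`timed out after ${ms} ms`));
    }, ms);
  });
  // Clear the timer either way so it does not keep the event loop alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical handler step: try voice, degrade to a text-only reply on
// timeout or a missing binary.
export async function synthesizeOrNull(
  synthesize: () => Promise<Buffer>,
  ms = 8_000,
): Promise<Buffer | null> {
  try {
    return await withTimeout(synthesize(), ms);
  } catch (err) {
    console.warn("voice reply skipped:", err);
    return null; // caller sends the text reply only
  }
}
```

Returning `null` rather than throwing keeps the bot handler on its normal path, so the text reply is still sent.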
### Anti-Pattern 4: Keeping /transcribe Inside chat-files.ts
**What people do:** Leave the STT handler in `chat-files.ts` and call `voicePipelineService` from there, adding Nexus-specific logic to an upstream-sourced file.
**Why it's wrong:** `chat-files.ts` is a mostly-upstream Paperclip file. Each rebase introduces merge conflicts. More Nexus-specific code in the file = more conflict surface.
**Do this instead:** Move `/transcribe` and `/synthesize` to a new `voice.ts` route file (Nexus-only, never in upstream). Keep `chat-files.ts` as close to upstream as possible.
### Anti-Pattern 5: Storing Telegram Token in Database
**What people do:** Create a new DB table or add a column to `instance_settings` to store the Telegram bot token.
**Why it's wrong:** Any DB schema change blocks upstream rebase (migration files conflict). The `nexus-settings.json` file-backed service is the established Nexus pattern for project-specific config that has no upstream equivalent.
**Do this instead:** Store `telegramToken` in `nexus-settings.json` via the existing `nexusSettingsService`. Same pattern as `voiceEnabled`, `mode`.
---
## Integration Points
### External Services
| Service | Integration Pattern | Notes |
|---------|---------------------|-------|
| Telegram Bot API | grammY `bot.start()` long-polling (Node.js) | No public URL required; polling starts on server boot if token present in nexus-settings |
| whisper-cpp / openai-whisper | `execFile` cascade (same as existing `/transcribe`) | Format argument added: writes `.webm` or `.ogg` temp file based on input |
| piper TTS binary | `child_process.spawn` stdin -> stdout | Text piped to stdin; WAV or raw PCM bytes collected from stdout |
### Internal Boundaries
| Boundary | Communication | Notes |
|----------|---------------|-------|
| voice route <-> voicePipelineService | Direct function call | Route is thin HTTP wrapper; all logic in service |
| telegram service <-> voicePipelineService | Direct function call | Same service used by both transports |
| telegram service <-> chatService | Direct function call | Bot calls `chatService(db)` directly — no HTTP round-trip |
| telegram service <-> nexusSettingsService | Direct function call | Reads `voiceMode` and `telegramToken` at start and on each message |
| web UI <-> voice route | REST: `POST /api/transcribe`, `POST /api/synthesize` | Web client uses browser-side piper WASM for TTS; `/synthesize` primarily for Telegram |
| UI VoiceModeToggle <-> nexus-settings | REST: `PATCH /api/nexus-settings` | Reads/writes `voiceMode` setting |
---
## Build Order
Based on component dependencies, the recommended build order within this milestone:
| Step | Component(s) | Reason |
|------|-------------|--------|
| 1 | `nexus-settings` schema extensions (`voiceMode`, `telegramToken`) | Everything downstream reads settings |
| 2 | `voicePipelineService` | Backs all voice. No new deps. Independently testable. |
| 3 | `voice.ts` route (`POST /transcribe`, `POST /synthesize`) | Thin wrapper. Register in `app.ts`. Move handler from chat-files. |
| 4 | `VoiceMicButton` + `WaveformDisplay` + `useSilenceDetection` | Pure UI. Depends only on `/transcribe`. |
| 5 | `VoiceModeToggle` + `useVoiceMode` | Depends on `voiceMode` in nexus-settings schema (Step 1). |
| 6 | `ChatMessage` dual output | Depends on `voiceMode` in shared `ChatMessage` type. |
| 7 | `createMessageSchema` + `ChatMessage` type (`voiceMode` flag) | Shared package change. Required by Steps 5-6, so build it before them when working strictly in order. |
| 8 | `telegramService` | Depends on voicePipelineService (2), chatService (existing), nexusSettings (1). |
| 9 | `telegram.ts` route + app.ts registration | Management endpoints. Needs telegramService. |
| 10 | Onboarding STT/TTS hardware detection step | Final: wires all voice detection into onboarding flow. |
Steps 4-6 can run in parallel with Steps 7-9 if split across phases, provided the Step 7 shared-type change lands before Step 6 consumes it.
---
## Sources
- Direct codebase inspection: `server/src/routes/chat-files.ts` (lines 297-386), `server/src/routes/chat.ts`, `server/src/services/nexus-settings.ts`, `server/src/app.ts`, `ui/src/components/VoiceRecordButton.tsx`, `ui/src/components/TtsButton.tsx`, `ui/src/hooks/usePiperTts.ts`, `packages/shared/src/validators/chat.ts`, `packages/shared/src/types/chat.ts`
- `.planning/STATE.md` — v1.6 architectural decisions (transport-agnostic, disposable bridge, dual output, per-message flag)
- `.planning/milestones/v1.5-phases/34-voice/34-RESEARCH.md` — existing voice implementation details, WASM TTS pattern
- [grammY documentation](https://grammy.dev/) — TypeScript-native, Bot API 9.6 (April 2026), long-polling vs webhooks
- [grammY deployment types guide](https://grammy.dev/guide/deployment-types) — long polling recommended for single-user local; Express integration pattern
- [rhasspy/piper (archived)](https://github.com/rhasspy/piper) — CLI: `echo "text" | piper --model voice.onnx -f -`; development moved to OHF-Voice/piper1-gpl Oct 2025
- grammY supports Telegram Bot API 9.6 (released April 3, 2026) — latest version confirmed
---
*Architecture research for: Voice Pipeline + Minimal Telegram Bridge (v1.6)*
*Researched: 2026-04-03*