docs(38): research Telegram bridge phase

Commit 4073625cb0 (parent 1d64140575): 1 changed file with 576 additions and 0 deletions

Added file: .planning/phases/38-telegram-bridge/38-RESEARCH.md (576 lines)

# Phase 38: Telegram Bridge - Research

**Researched:** 2026-04-03
**Domain:** Telegram bot integration (grammY), voice note relay (OGG/ffmpeg/Whisper), onboarding wizard step
**Confidence:** HIGH

---

<user_constraints>
## User Constraints (from CONTEXT.md)

### Locked Decisions
- `grammy ^1.41.1`: TypeScript-native Telegram bot framework, long polling, clean file handling
- Long polling via `bot.start()`: no public HTTPS required for Mac Mini behind NAT
- Single bot for all agents: messages prefixed with `[AgentName]`
- Telegram voice messages are OGG/Opus: download via `ctx.getFile()`, transcode to WAV 16kHz via ffmpeg before Whisper
- TTS reply: synthesize via VoicePipelineService, convert WAV → OGG/Opus via ffmpeg, send via `ctx.replyWithVoice()`
- Telegram token stored in nexus-settings.json (already in schema from Phase 36)
- Bridge calls chatService and voicePipelineService directly (same-process, no HTTP round-trip)
- Acknowledge updates immediately, process async to prevent Telegram resending
- chatId → sessionId mapping: lightweight in-memory Map (single-user deployment)
- Bridge service must be under 500 lines (TGRAM-06)
- Onboarding BotFather setup: wizard step with guided token entry and validation

### Claude's Discretion
All implementation choices are at Claude's discretion; the discuss phase was skipped per user setting.

### Deferred Ideas (OUT OF SCOPE)
None; the discuss phase was skipped. Per REQUIREMENTS.md out-of-scope:
- Deep Telegram ↔ web chat session sync (requires Postgres event bus)
- Telegram inline keyboards/threaded replies
- Per-agent Telegram bots
- GSD formatting in Telegram
- Transcription editing before sending
</user_constraints>

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| TGRAM-01 | Single Telegram bot relays text messages bidirectionally between user and agents | grammY `bot.on("message:text")` + `puterProxyService.chatStream` collector + `chatService.addMessage` |
| TGRAM-02 | Agent replies in Telegram are prefixed with agent identity (e.g. `[PM]`, `[Engineer]`) | Resolve agent name from `agentService.list(companyId)` before each reply; prepend `[AgentName]:` |
| TGRAM-03 | Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text | `ctx.getFile()` → fetch buffer → `voicePipelineService.transcodeToWav16k(buf, "ogg")` → `transcribe` |
| TGRAM-04 | Agent responses can be sent back as Telegram voice notes (TTS → OGG) | `voicePipelineService.synthesize(text)` (returns raw PCM) → ffmpeg raw PCM → OGG/Opus → `ctx.replyWithVoice(new InputFile(buffer))` |
| TGRAM-05 | Telegram bridge uses long polling (no public HTTPS required) | `bot.start()`: confirmed correct for NAT/local deployments |
| TGRAM-06 | Telegram bridge is under 500 lines of code | Thin relay over existing services keeps this achievable; a line-count check enforces it |
| ONBRD-03 | Guided BotFather setup flow for Telegram bot token during onboarding | New `TelegramStep` component added as step 5 in `NexusOnboardingWizard.tsx`; token validated and saved via `POST /api/telegram/token`, which writes `telegramToken` to nexus-settings |
</phase_requirements>

---

## Summary

Phase 38 builds a thin Telegram relay bridge that connects a user's phone to Nexus agents already running in the same process. The bridge is a pure consumer: grammY handles the Telegram protocol, `voicePipelineService` (shipped in Phase 36) handles all audio conversion, and `chatService` + `puterProxyService` handle message persistence and LLM generation. No new business logic is introduced: `telegram.ts` is a factory-function service that only wires existing services together.

The critical design constraint is the 500-line limit (TGRAM-06). It is achievable because the bridge does no LLM work, no audio DSP, and no session management beyond an in-memory Map. The full pipeline for a text message is: receive → persist user message → collect LLM stream → prefix with agent name → send reply. For a voice message: receive → download OGG → transcode to WAV → transcribe → relay as text → same text pipeline.

The onboarding step (ONBRD-03) adds a new wizard step, `TelegramStep`, as step 5 in `NexusOnboardingWizard.tsx`; the existing root directory step moves from 5 to 6 and the summary step from 6 to 7. The step guides the user through creating a bot with BotFather, validates the token with a live API call (`bot.api.getMe()`, run server-side), and saves it to nexus-settings.

**Primary recommendation:** Build in three plans: (1) `telegram.ts` service + `app.ts` wiring for text relay (TGRAM-01, TGRAM-02, TGRAM-05, TGRAM-06), (2) voice relay extension (TGRAM-03, TGRAM-04), then (3) the onboarding wizard step (ONBRD-03).

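
The text pipeline described above can be sketched as one function whose collaborators are injected. This is a minimal sketch, not the project's code: the `RelayDeps` interface and its method names are hypothetical stand-ins for `chatService`, `agentService` and `puterProxyService`, chosen so the flow can be exercised without Telegram or an LLM.

```typescript
// Hypothetical dependency shape standing in for chatService / agentService / puterProxyService.
interface RelayDeps {
  resolveAgent(chatId: number): Promise<{ agentId: string; agentName: string }>;
  getConversation(chatId: number, agentId: string): Promise<string>;
  addMessage(conversationId: string, role: "user" | "assistant", content: string): Promise<void>;
  chatStream(agentId: string, conversationId: string): AsyncIterable<string>;
  sendReply(chatId: number, text: string): Promise<void>;
}

// receive → persist user message → collect LLM stream → prefix with agent name → send reply
async function relayTextMessage(deps: RelayDeps, chatId: number, userText: string): Promise<string> {
  const { agentId, agentName } = await deps.resolveAgent(chatId);
  const conversationId = await deps.getConversation(chatId, agentId);
  await deps.addMessage(conversationId, "user", userText);

  // Telegram cannot receive a token stream, so drain the whole generator first.
  let full = "";
  for await (const token of deps.chatStream(agentId, conversationId)) full += token;
  const answer = full.trim();

  await deps.addMessage(conversationId, "assistant", answer);
  const reply = `[${agentName}]: ${answer}`;
  await deps.sendReply(chatId, reply);
  return reply;
}
```

Injecting the collaborators keeps the grammY handler itself to a few lines and would let the TGRAM-01/02 unit tests run against plain fakes instead of a mocked bot.

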

---

## Standard Stack

### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| grammy | ^1.42.0 | Telegram Bot API framework | TypeScript-native, long polling, `ctx.getFile()`, `InputFile` buffer upload, Bot API 9.6; 1.4M weekly downloads; verified current 2026-04-03; satisfies the locked `^1.41.1` range |
| ffmpeg-static | ^5.3.0 | FFmpeg binary (already installed) | Ships FFmpeg 6.1.1 macOS arm64 binary; already in `server/package.json`; used by `voicePipelineService` |

### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| node:child_process spawn | built-in | Raw PCM → OGG/Opus transcoding | Only for the voice reply path (TGRAM-04); same pattern as `voicePipelineService.transcodeToWav16k` |

### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| grammy | Telegraf | Telegraf is older (800K weekly downloads vs 1.4M) and less TypeScript-native; grammY has a cleaner file API |
| in-memory Map | grammY session storage | Persistent session storage adapters add `@grammyjs/storage-*` deps; a Map is sufficient for a single-user deployment |
| in-memory Map | grammY conversations plugin | The conversations plugin models stateful multi-turn dialogs; not needed for a thin relay |

**Installation:**
```bash
cd server && pnpm add grammy
```

**Version verification:** `npm view grammy version` → `1.42.0` (verified 2026-04-03)

---

## Architecture Patterns

### Recommended File Structure
```
server/src/services/
└── telegram.ts                  # NEW: grammY bot lifecycle + relay logic (< 500 lines)
server/src/routes/
└── telegram.ts                  # NEW: POST /api/telegram/token, GET /api/telegram/status
ui/src/components/onboarding/
└── TelegramStep.tsx             # NEW: BotFather guided step with token validation
ui/src/components/
└── NexusOnboardingWizard.tsx    # MODIFY: insert TelegramStep as step 5, shift root-dir to step 6, summary to step 7
```

### Pattern 1: Bot Lifecycle (Factory Function)
**What:** `telegramService(db)` returns `{ start, stop, isRunning }`. It is called at server startup after settings are loaded (see Pattern 6) and follows the existing factory-function service pattern.
**When to use:** Always; it matches all other services in `server/src/services/`.

```typescript
// Source: grammy.dev/guide/getting-started (verified 2026-04-03)
import { Bot } from "grammy";
import type { Db } from "@paperclipai/db";
// `logger` and `registerHandlers` come from elsewhere in the server (not shown).

export function telegramService(db: Db) {
  let bot: Bot | null = null;

  async function start(token: string) {
    if (bot) await stop(); // re-saving a token restarts the bot instead of leaking a second poller
    const current = new Bot(token);
    bot = current;
    current.catch((err) => logger.error({ err }, "Telegram bot error"));
    registerHandlers(current, db);
    // Not awaited: the Promise returned by bot.start() only settles once polling stops.
    // The .catch logs startup failures (invalid token, webhook conflict) instead of leaving them unhandled.
    current.start().catch((err) => {
      logger.error({ err }, "Telegram polling stopped");
      if (bot === current) bot = null; // keep isRunning() honest
    });
  }

  async function stop() {
    await bot?.stop();
    bot = null;
  }

  function isRunning() {
    return bot !== null;
  }

  return { start, stop, isRunning };
}
```

### Pattern 2: Text Message Handler
**What:** On `message:text`, persist the user message, collect the full LLM stream, prefix it with `[AgentName]`, and send the reply.
**Key detail:** The Telegram bridge cannot use SSE streaming. It must collect all tokens from `puterProxyService.chatStream` into a full string before sending. This is the only place in the codebase that drains the async generator into a complete response.

```typescript
// Source: grammy.dev guide + codebase chat.ts pattern
// Runs inside telegramService: chatSvc, puterProxy, logger and the helpers are in scope.
bot.on("message:text", async (ctx) => {
  const chatId = ctx.chat.id;
  // Optional quick feedback while the reply is generated.
  await ctx.react("👍").catch(() => {});

  try {
    const userText = ctx.message.text;
    // Resolve the agent first: conversations are keyed by chatId + agentId (Pattern 5).
    const { agentName, agentId, companyId } = await resolveAgent(db, chatId);
    const conversationId = await getOrCreateConversation(chatId, agentId, db);
    await chatSvc.addMessage(conversationId, { role: "user", content: userText });

    const messages = await buildMessagesArray(conversationId, userText, chatSvc);

    // No SSE to Telegram: drain the async generator into one string.
    let fullResponse = "";
    const stream = puterProxy.chatStream(companyId, agentId, messages, undefined, undefined);
    for await (const token of stream) {
      fullResponse += token;
    }
    const answer = fullResponse.trim();

    await chatSvc.addMessage(conversationId, { role: "assistant", content: answer, agentId });
    // Sent as plain text: LLM output often has unbalanced Markdown, which makes
    // parse_mode "Markdown" fail with "can't parse entities". Split long replies (Pitfall 3).
    await ctx.reply(`[${agentName}]: ${answer}`);
  } catch (err) {
    logger.error({ err, chatId }, "Telegram text relay failed");
    await ctx.reply("Sorry, something went wrong.").catch(() => {});
  }
});
```

### Pattern 3: Voice Message Handler (Async: Acknowledge First)
**What:** On `message:voice`, immediately send a "Transcribing..." status, then run OGG download → WAV transcode → transcribe → relay as a text message.
**Critical:** The download + transcode + Whisper pipeline takes several seconds. `bot.start()` handles updates one at a time, so awaiting the pipeline inside the handler blocks every later message until it finishes (see Pitfall 1). Detaching it keeps the bot responsive.

```typescript
// Source: grammy.dev/guide/files (verified 2026-04-03)
import type { Context } from "grammy";
import type { Db } from "@paperclipai/db";

bot.on("message:voice", async (ctx) => {
  const chatId = ctx.chat.id;
  await ctx.reply("Transcribing...").catch(() => {});

  // Do NOT await the heavy pipeline: return so the next update can be processed.
  // Errors here bypass bot.catch(), so handle them on the detached Promise.
  processVoiceMessage(ctx, chatId, db).catch((err) => {
    logger.error({ err, chatId }, "Telegram voice relay failed");
    return ctx.reply("Voice transcription failed.").catch(() => {});
  });
});

async function processVoiceMessage(ctx: Context, chatId: number, db: Db) {
  const file = await ctx.getFile();
  // Build the download URL from file_path (files plugin not needed).
  // file_path can be absent, e.g. for files above the Bot API's 20 MB download limit.
  if (!file.file_path) throw new Error("Telegram returned no file_path");
  const url = `https://api.telegram.org/file/bot${ctx.api.token}/${file.file_path}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Telegram file download failed: HTTP ${response.status}`);
  const oggBuffer = Buffer.from(await response.arrayBuffer());

  const { text } = await voiceSvc.transcribe(oggBuffer, "ogg");
  // ... then relay `text` through the same pipeline as Pattern 2, starting from the user message
}
```

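
Detaching the pipeline has a side effect: two voice notes sent back-to-back now run concurrently, and the shorter one can be answered first. If reply order matters, one option is a per-chat promise chain. This is a hand-written sketch, not a grammY feature; `createChatQueue` and `enqueue` are hypothetical names.

```typescript
// Hypothetical helper: serializes async tasks per chat while different chats run in parallel.
function createChatQueue() {
  const tails = new Map<number, Promise<void>>();

  return function enqueue<T>(chatId: number, task: () => Promise<T>): Promise<T> {
    const previous = tails.get(chatId) ?? Promise.resolve();
    // Start only after the previous task for this chat has settled.
    const run = previous.then(task);
    // The stored tail never rejects, so one failed task does not block the chat forever.
    const tail: Promise<void> = run.then(() => undefined, () => undefined);
    tails.set(chatId, tail);
    // Drop the entry once this is the last queued task, so the Map does not grow unbounded.
    void tail.then(() => {
      if (tails.get(chatId) === tail) tails.delete(chatId);
    });
    return run;
  };
}
```

In the voice handler, the detached call `processVoiceMessage(ctx, chatId, db)` would become `enqueue(chatId, () => processVoiceMessage(ctx, chatId, db))`, keeping the same `.catch`. The trade-off is that a second voice note from the same chat waits for the first to finish, while other chats are unaffected.

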
### Pattern 4: Raw PCM → OGG/Opus Transcoding (for TTS reply, TGRAM-04)
**What:** `voicePipelineService.synthesize()` returns raw PCM (no WAV header). Telegram voice notes need OGG/Opus, so reuse the ffmpeg-static spawn pattern from `transcodeToWav16k` with different arguments.
**Telegram requirement:** `sendVoice` expects an .OGG file encoded with Opus (current Bot API docs also accept MP3 and M4A). Encoding at 48kHz mono is a safe choice because 48kHz is Opus's full-band rate.

```typescript
// Source: Telegram Bot API docs + ffmpeg pattern from voice-pipeline.ts
import { spawn } from "node:child_process";
import ffmpegPath from "ffmpeg-static";

const ffmpegBin = ffmpegPath ?? "ffmpeg"; // same binary voicePipelineService uses

async function transcodeToOggOpus(rawPcmBuffer: Buffer, inputSampleRate = 22050): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    // Input: raw PCM s16le mono at the Piper model's sample rate (22050Hz for en_US-lessac-medium)
    // Output: OGG container with Opus audio, 48kHz mono, for sendVoice
    const ffmpeg = spawn(ffmpegBin, [
      "-f", "s16le", "-ar", String(inputSampleRate), "-ac", "1", "-i", "pipe:0",
      "-c:a", "libopus", "-ar", "48000", "-ac", "1", "-f", "ogg", "pipe:1",
    ], { stdio: ["pipe", "pipe", "pipe"] });

    const chunks: Buffer[] = [];
    ffmpeg.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    ffmpeg.stderr.on("data", () => {}); // drain stderr so ffmpeg never blocks on a full pipe
    ffmpeg.on("error", reject);
    ffmpeg.stdin.on("error", reject); // e.g. EPIPE if ffmpeg exits early
    ffmpeg.on("close", (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });

    ffmpeg.stdin.end(rawPcmBuffer);
  });
}
```

**Note on Piper output format:** Piper `--output-raw` produces raw s16le PCM. The sample rate depends on the voice model: `en_US-lessac-medium` is expected to output 22050Hz. Confirm it from the model's `.onnx.json` config (`audio.sample_rate`); the ffmpeg input `-ar` value must match.

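
Rather than hard-coding 22050, the rate can be read from the voice model's config. Piper voices ship a `<model>.onnx.json` file whose `audio.sample_rate` field holds the output rate; the sketch below assumes that layout, and `parsePiperSampleRate` / `readPiperSampleRate` are hypothetical helper names. Parsing is separated from file access so the logic does not depend on where the model is installed.

```typescript
import { readFileSync } from "node:fs";

// Assumed Piper voice config layout: { "audio": { "sample_rate": 22050, ... }, ... }
interface PiperVoiceConfig {
  audio?: { sample_rate?: unknown };
}

function parsePiperSampleRate(configJson: string): number {
  const config = JSON.parse(configJson) as PiperVoiceConfig;
  const rate = config.audio?.sample_rate;
  if (typeof rate !== "number" || !Number.isInteger(rate) || rate <= 0) {
    throw new Error("Piper voice config has no valid audio.sample_rate");
  }
  return rate;
}

// Hypothetical wrapper: `modelPath` points at the .onnx file; its config sits next to it as .onnx.json.
function readPiperSampleRate(modelPath: string): number {
  return parsePiperSampleRate(readFileSync(`${modelPath}.json`, "utf8"));
}
```

With that value read once at startup, the raw PCM → OGG/Opus ffmpeg call can pass `String(rate)` as its input `-ar` instead of a hard-coded `22050`.

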
### Pattern 5: chatId → conversationId Mapping
**What:** A process-lifetime in-memory Map. In a single-user deployment, losing the mapping on server restart is acceptable: the next message simply starts a new conversation. Each `chatId` gets one conversation per agent.

```typescript
// Key: `${chatId}:${agentId}` → conversationId
const sessionMap = new Map<string, string>();

async function getOrCreateConversation(chatId: number, agentId: string, db: Db): Promise<string> {
  const key = `${chatId}:${agentId}`;
  const existing = sessionMap.get(key);
  if (existing) return existing;

  // Single-user deployment: use the first company (assumes list() resolves to an array).
  const [company] = await companyService(db).list();
  if (!company) throw new Error("No company configured");

  const conv = await chatSvc.createConversation(company.id, {
    title: `Telegram:${chatId}`,
    agentId,
  });
  sessionMap.set(key, conv.id);
  return conv.id;
}
```

### Pattern 6: app.ts Integration
**What:** After `createApp` resolves (in `server/src/index.ts` or the equivalent startup entry), read settings and start the Telegram service if a token is configured. `telegramService` is not mounted as an Express route; it runs as a background poller alongside the HTTP server.

```typescript
// In server/src/index.ts or app startup (after createApp resolves)
const settings = await nexusSettingsService().get();
const tg = telegramService(db);
if (settings.telegramToken) {
  // A bad token or network failure must not prevent the HTTP server from starting.
  await tg.start(settings.telegramToken).catch((err) =>
    logger.error({ err }, "Telegram bridge failed to start"),
  );
}

// On shutdown, call `await tg.stop()` from the server's existing SIGINT/SIGTERM handling
// so polling ends cleanly.

// Also expose management endpoints via Express:
// POST /api/telegram/token: saves token and (re)starts bot
// GET /api/telegram/status: returns { running: boolean }
```

### Pattern 7: Onboarding TelegramStep Component
**What:** A new step inserted as step 5 in `NexusOnboardingWizard.tsx` (after VoiceStep at step 4). It shows instructions for creating a bot via BotFather and provides a token input field. Validation and saving happen in one server call, `POST /api/telegram/token`, which calls `getMe` (the same check as `GET https://api.telegram.org/bot<token>/getMe`) and persists the token to nexus-settings, so the browser never talks to Telegram directly.

```typescript
// ui/src/components/onboarding/TelegramStep.tsx
interface TelegramStepProps {
  onSave: (token: string) => void;
  onSkip: () => void;
}

// Validation: call POST /api/telegram/token with the token
// Server validates via bot.api.getMe() before saving
// On success: show bot username, enable Continue button
```

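
Before spending a network round-trip, the step can reject input that is obviously not a token, such as a pasted BotFather message or stray quotes. BotFather tokens look like `<numeric bot id>:<secret>`, but the Bot API docs do not specify the secret's exact length, so the pattern below is a deliberately loose assumption; `normalizeBotToken` is a hypothetical helper, and the server-side `getMe()` call remains the real validation.

```typescript
// Heuristic only (assumed shape "<digits>:<secret>"); the server-side getMe() call is authoritative.
const BOT_TOKEN_SHAPE = /^\d+:[A-Za-z0-9_-]{20,}$/;

function normalizeBotToken(input: string): string | null {
  // Users often paste surrounding whitespace or quotes from the BotFather chat.
  const token = input.trim().replace(/^["'`]+|["'`]+$/g, "");
  return BOT_TOKEN_SHAPE.test(token) ? token : null;
}
```

The Validate button can stay disabled while `normalizeBotToken(token)` returns `null`, and the normalized value is what gets posted to `/api/telegram/token`.

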
### Anti-Patterns to Avoid
- **Awaiting the full voice pipeline inside the handler:** `bot.start()` processes updates sequentially, so a handler that spends several seconds downloading, transcoding, and transcribing delays every later message. Send an intermediate status message and detach the heavy processing, with its own `.catch`.
- **Using `exec` instead of `spawn` for ffmpeg:** `exec` runs through a shell, buffers all stdout in memory up to `maxBuffer` (1 MiB by default) and kills the child when output exceeds it, and decodes stdout as UTF-8 unless `encoding: "buffer"` is passed, which corrupts binary audio. Use `spawn` with pipes.
- **Awaiting `bot.start()`:** The Promise returned by `bot.start()` only resolves once the bot is stopped. Awaiting it at the top level blocks startup; call it without `await` and attach a `.catch`.
- **Hardcoding `22050` for the Piper sample rate without checking the model:** Different Piper voice models have different sample rates. Read `audio.sample_rate` from the model's `.onnx.json` config at startup.
- **Building the download URL from `ctx.msg.voice.file_id`:** The URL must use the `file_path` of the `File` object returned by `ctx.getFile()`: `https://api.telegram.org/file/bot<TOKEN>/<file_path>`. A `file_id` is not a path.

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Telegram Bot API protocol | Custom HTTP polling loop | grammY `bot.start()` | Handles the getUpdates loop, retries, graceful shutdown, error isolation |
| File download from Telegram | Custom URL construction + retry | `ctx.getFile()` + fetch | `ctx.getFile()` resolves the `file_path`; download links are only temporarily valid (Telegram guarantees at least 1 hour), so fetch right away |
| OGG audio parsing | Custom OGG/Opus demuxer | ffmpeg-static (already installed) | OGG/Opus has multiple container/codec variants; ffmpeg handles all of them |
| Token validation | Manually calling the Bot API | `new Bot(token).api.getMe()` | A single call returns bot info or throws on an invalid token |
| Session management | Custom SQLite session store | In-memory Map | Single-user deployment; restarts are rare; a Map is sufficient |

**Key insight:** The Telegram bridge is a relay, not a platform. Every non-trivial problem (audio, LLM, persistence) is already solved by existing services.

---

## Common Pitfalls

### Pitfall 1: Slow Voice Handlers Stall the Update Queue
**What goes wrong:** The handler downloads the OGG, runs ffmpeg, and runs Whisper, which takes several seconds. While it runs, no other update is processed, so messages sent in the meantime are answered late. If the process restarts before the update is confirmed, the same voice note is delivered and processed again, producing duplicate replies.
**Why it happens:** With long polling, an update is confirmed only when `getUpdates` is called with an `offset` higher than its `update_id`, and `bot.start()` handles updates one at a time before making that call. A blocking handler delays the next call. (Timeout-driven redelivery is a webhook behavior; long polling does not resend on a timer.)
**How to avoid:** Run the heavy pipeline as a detached async task (do NOT await it in the handler body) and attach its own `.catch`. Send an immediate "Transcribing..." reply so the user gets feedback. The handler returns immediately and the next update can be processed.
**Warning signs:** Text replies lag behind voice notes; duplicate replies, or "Transcribing..." appearing twice, for the same voice note after a restart.

### Pitfall 2: Piper Output Sample Rate Mismatch
**What goes wrong:** `synthesize()` returns raw s16le PCM from Piper `--output-raw`, and ffmpeg is invoked with an assumed `-ar 22050`. If the voice model actually outputs 16000Hz or another rate, the voice note plays back at the wrong speed and pitch.
**Why it happens:** Piper sample rates vary by model: `en_US-lessac-medium` is 22050Hz, but `en_US-amy-low` is 16000Hz. Raw PCM carries no header, so ffmpeg cannot detect the mismatch and converts without error.
**How to avoid:** At service startup, read the voice's `.onnx.json` model config and use its `audio.sample_rate` value in the ffmpeg input `-ar` flag.
**Warning signs:** Voice replies sound sped up or slowed down.

### Pitfall 3: Long Messages Exceed Telegram's Limit
**What goes wrong:** Telegram limits a text message to 4096 characters. LLM responses on complex topics can exceed this, and `ctx.reply()` then fails with "Bad Request: message is too long".
**Why it happens:** It is a hard Bot API limit; Telegram does not split long messages itself.
**How to avoid:** Split the reply into chunks of at most 4000 characters (headroom below the limit, including the `[AgentName]:` prefix) and send them in order: `for (const chunk of chunks) await ctx.reply(chunk)`.
**Warning signs:** Errors on long agent replies; without a `bot.catch` handler, the bot stops replying.

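
A minimal sketch of the splitting step, under stated assumptions: the reply is sent as plain text (so no Markdown entity can be cut in half), `splitForTelegram` is a hypothetical helper, and length is measured in JavaScript UTF-16 code units, which is never smaller than the character count Telegram applies. It breaks at the last newline in the window, then at the last space, and only hard-cuts when neither exists, without splitting an emoji's surrogate pair.

```typescript
// Hypothetical helper: splits text into chunks of at most `limit` UTF-16 code units.
function splitForTelegram(text: string, limit = 4000): string[] {
  if (limit < 2) throw new RangeError("limit must be at least 2");
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    const window = rest.slice(0, limit + 1);
    // Prefer the last newline, then the last space, inside the allowed window.
    let cut = window.lastIndexOf("\n");
    if (cut <= 0) cut = window.lastIndexOf(" ");
    if (cut <= 0) {
      cut = limit;
      // Never split a surrogate pair (e.g. an emoji) across two messages.
      const code = rest.charCodeAt(cut - 1);
      if (code >= 0xd800 && code <= 0xdbff) cut -= 1;
    }
    chunks.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart(); // drops the newline/space used as the break
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Splitting the already-prefixed reply means only the first chunk carries `[AgentName]:`; repeating the prefix on every chunk would be a simple variation if the user finds continuation messages confusing.

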
### Pitfall 4: Webhook Conflict Blocking Long Polling
**What goes wrong:** Polling fails with "Conflict: can't use getUpdates method while webhook is active; use deleteWebhook to delete the webhook first".
**Why it happens:** If the bot token was ever configured with a webhook (e.g. during testing), the webhook remains registered and blocks long polling. A similar Conflict ("terminated by other getUpdates request") appears when two pollers use the same token, for example an old bot instance that was not stopped before a restart.
**How to avoid:** Call `await bot.api.deleteWebhook()` before `bot.start()`. grammY may already do this during startup in some versions; calling it explicitly is harmless. Always `stop()` the previous bot before starting a new one.
**Warning signs:** The Promise returned by `bot.start()` rejects with a Conflict error immediately on startup.

### Pitfall 5: `bot.start()` Crash Kills Express Process
**What goes wrong:** An error escaping the bot crashes the Node process, taking the Express server down with it.
**Why it happens:** grammY routes errors thrown inside middleware to `bot.catch`, but rejections outside that chain, such as the Promise returned by `bot.start()` or a detached voice pipeline, become unhandled rejections, which terminate Node by default (Node 15+).
**How to avoid:** Call `bot.catch((err) => logger.error({ err }, "Telegram bot error"))` before `bot.start()`. Because `bot.start()` is not awaited, a `try/catch` around it does not help: attach `.catch()` to its Promise and to every detached task.
**Warning signs:** Express server exits unexpectedly; Telegram bot stops responding.

### Pitfall 6: Missing Agent for Telegram Conversation
**What goes wrong:** The company exists but has no agents (it was reset, or agents were deleted). `agentService.list(companyId)` returns `[]`, and the bot crashes trying to read `agents[0].name`.
**Why it happens:** An edge case in the single-user setup; agents can be deleted via the UI.
**How to avoid:** If `agents.length === 0`, reply with "No agents configured. Please set up an agent in the Nexus dashboard." and return.
**Warning signs:** `TypeError: Cannot read properties of undefined (reading 'name')` on agent name access.

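
The guard can live in one small function that both the text and voice handlers call before building a reply. `AgentSummary`, `chooseAgent` and the "first agent answers" rule are assumptions for illustration (the rule mirrors the `agents[0]` access described above); the real `agentService.list(companyId)` rows may carry more fields.

```typescript
// Hypothetical minimal shape of what agentService.list(companyId) returns.
interface AgentSummary {
  id: string;
  name: string;
}

type AgentChoice =
  | { ok: true; agent: AgentSummary; prefix: string }
  | { ok: false; message: string };

const NO_AGENT_MESSAGE = "No agents configured. Please set up an agent in the Nexus dashboard.";

// Picks the agent that answers Telegram messages (assumed: the first one) or explains why none can.
function chooseAgent(agents: readonly AgentSummary[]): AgentChoice {
  const agent = agents[0];
  if (!agent) return { ok: false, message: NO_AGENT_MESSAGE };
  return { ok: true, agent, prefix: `[${agent.name}]:` };
}
```

In a handler, assuming `agentService(db).list(companyId)` resolves to an array: `const choice = chooseAgent(await agentService(db).list(companyId)); if (!choice.ok) { await ctx.reply(choice.message); return; }`.

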

---

## Code Examples

### Complete telegramService skeleton (verified patterns)
```typescript
// server/src/services/telegram.ts
// Source: grammy.dev guide + codebase patterns from voice-pipeline.ts, chat.ts
import { Bot, InputFile } from "grammy";
import type { Db } from "@paperclipai/db";
import { chatService } from "./chat.js";
import { agentService } from "./agents.js";
import { companyService } from "./companies.js";
import { puterProxyService } from "./puter-proxy.js";
import { voicePipelineService } from "./voice-pipeline.js";

export function telegramService(db: Db) {
  let bot: Bot | null = null;
  const sessionMap = new Map<string, string>(); // `${chatId}:${agentId}` → conversationId

  // ... handler registration, start/stop, etc. (InputFile is used by the voice-reply path)

  return { start, stop, isRunning };
}
```

### Token validation endpoint
```typescript
// server/src/routes/telegram.ts
router.post("/telegram/token", async (req, res) => {
  assertBoard(req);
  const raw = (req.body as { token?: unknown } | undefined)?.token;
  const token = typeof raw === "string" ? raw.trim() : "";
  if (!token) { res.status(400).json({ error: "token required" }); return; }

  // Validate token with Telegram: getMe() rejects on an invalid token.
  // Catch it so an invalid token yields 400 instead of an unhandled rejection
  // (network failures also land here).
  const me = await new Bot(token).api.getMe().catch(() => null);
  if (!me) { res.status(400).json({ error: "Invalid Telegram bot token" }); return; }

  // Save to nexus-settings
  await nexusSettingsService().set({ telegramToken: token });

  // (Re)start telegram service if already initialized
  // ...

  res.json({ ok: true, botUsername: me.username });
});
```

### TelegramStep onboarding component structure
```typescript
// ui/src/components/onboarding/TelegramStep.tsx
import { useState } from "react";

// TelegramStepProps as defined in Pattern 7
export function TelegramStep({ onSave, onSkip }: TelegramStepProps) {
  const [token, setToken] = useState("");
  const [validating, setValidating] = useState(false);
  const [botUsername, setBotUsername] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);

  async function handleValidate() {
    setValidating(true);
    setError(null);
    setBotUsername(null);
    try {
      const res = await fetch("/api/telegram/token", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ token: token.trim() }),
      });
      const data = await res.json().catch(() => ({}));
      if (!res.ok) throw new Error(data.error ?? "Invalid token");
      setBotUsername(data.botUsername);
    } catch (e) {
      setError(e instanceof Error ? e.message : "Invalid token");
    } finally {
      setValidating(false);
    }
  }

  // ... render: BotFather instructions + input + validate button + success state
  // (Continue calls onSave(token) once botUsername is set; Skip calls onSkip)
}
```

---

## Codebase Integration Map

### Files to Create
| File | Purpose |
|------|---------|
| `server/src/services/telegram.ts` | grammY bot lifecycle + all relay handlers |
| `server/src/routes/telegram.ts` | `POST /api/telegram/token`, `GET /api/telegram/status` |
| `ui/src/components/onboarding/TelegramStep.tsx` | BotFather guided token entry step |

### Files to Modify
| File | Change |
|------|--------|
| `server/src/app.ts` | Import and mount `telegramRoutes()`; export `telegramService` reference |
| `server/src/index.ts` (or startup entry) | Read `telegramToken` from settings on startup; call `tg.start(token)` if present |
| `ui/src/components/NexusOnboardingWizard.tsx` | Insert `TelegramStep` as step 5; shift current step 5 (root dir) → step 6; shift step 6 (summary) → step 7; update step counter label |

### Key Service Dependencies (all already exist)
| Service | Method Used | From |
|---------|-------------|------|
| `chatService(db)` | `createConversation`, `addMessage`, `listMessages` | `server/src/services/chat.ts` |
| `agentService(db)` | `list(companyId)` | `server/src/services/agents.ts` |
| `companyService(db)` | `list()` | `server/src/services/companies.ts` |
| `puterProxyService(db)` | `chatStream(companyId, agentId, messages)` | `server/src/services/puter-proxy.ts` |
| `voicePipelineService()` | `transcribe(buf, "ogg")`, `synthesize(text)`, `transcodeToWav16k` | `server/src/services/voice-pipeline.ts` |
| `nexusSettingsService()` | `get()`, `set({ telegramToken })` | `server/src/services/nexus-settings.ts` |

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Telegraf (older Node.js framework) | grammY 1.42 | grammY introduced ~2021, now dominant | TypeScript-first, cleaner `ctx.getFile()`, Bot API 9.6 support |
| Webhooks for all deployments | Long polling for local/NAT | N/A (always deployment-specific) | Mac Mini behind NAT → long polling is the only option that needs no public endpoint or tunnel |
| Custom ffmpeg binary download | ffmpeg-static package | ~2019 | Already installed; no PATH issues in service environment |

**Deprecated/outdated:**
- Telegraf: still maintained, but grammY has overtaken it; the ecosystem (plugins, docs) is grammY-centric as of 2025
- fluent-ffmpeg: archived May 2025; use `child_process.spawn` with the ffmpeg-static path directly

---

## Open Questions

1. **Piper output sample rate for voice model**
   - What we know: `en_US-lessac-medium` likely outputs 22050Hz (common for medium models)
   - What's unclear: Exact sample rate not verified from installed model metadata
   - Recommendation: At service startup, read `/path/to/en_US-lessac-medium.onnx.json` and extract `audio.sample_rate`. Use that value for the raw PCM → OGG ffmpeg transcode.

2. **`puterProxyService`: handling a missing token for Telegram**
   - What we know: `puterProxyService.chatStream` throws `unprocessable("Puter auth token not configured")` if no token is set
   - What's unclear: Should the bot reply "AI not configured" or silently fall back to `chatService.streamEcho`?
   - Recommendation: Reply with a user-friendly message: "AI provider not configured. Please connect a provider in the Nexus dashboard." This matches the existing UI behavior.

3. **grammY concurrent update processing**
   - What we know: `bot.start()` processes updates sequentially by default ("processes all updates sequentially" per docs). Updates that arrive while a handler runs wait on Telegram's side until the next `getUpdates` call; they are queued, not dropped.
   - What's unclear: How much sequential handling matters in practice. Because the voice handler detaches its pipeline (Pattern 3), two voice notes already transcribe concurrently and may be answered out of order.
   - Recommendation: Sequential processing is acceptable for a single-user deployment. If more parallelism is needed, grammY's runner plugin (`@grammyjs/runner`) adds concurrent update handling; defer to a future phase.

---

## Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|------------|-------------|-----------|---------|----------|
| ffmpeg-static binary | Voice transcoding (TGRAM-03, TGRAM-04) | ✓ | package 5.3.0, FFmpeg 6.1.1 (binary at `node_modules/ffmpeg-static/ffmpeg`) | n/a |
| grammy | Telegram protocol | ✗ (not yet installed) | n/a | Install: `pnpm add grammy` in `server/` |
| whisper / whisper-cpp | Voice transcription (TGRAM-03) | ✗ (not in PATH) | n/a | Runtime error with user-friendly message if not installed |
| piper | TTS voice reply (TGRAM-04) | ✗ (not in PATH) | n/a | Skip voice reply; text-only reply is an acceptable fallback |
| Telegram Bot token | All TGRAM-* requirements | ✗ (not configured yet) | n/a | ONBRD-03 provides the setup flow |

**Missing dependencies with no fallback:**
- `grammy` package: must be installed (`pnpm add grammy` in `server/`) before any code can run

**Missing dependencies with fallback:**
- `whisper`/`whisper-cpp`: voice transcription unavailable; bot replies "Voice transcription not available on this server."
- `piper`: TTS reply unavailable; bot sends a text-only reply (TGRAM-04 degrades gracefully)

---

## Validation Architecture
|
||||
|
||||
### Test Framework
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| Framework | vitest (workspace config at `/opt/nexus/vitest.config.ts`) |
|
||||
| Config file | `server/vitest.config.ts` — `environment: "node"` |
|
||||
| Quick run command | `pnpm --filter @paperclipai/server test run src/__tests__/38-telegram*.test.ts` |
|
||||
| Full suite command | `pnpm test:run` |
|
||||
|
||||
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| TGRAM-01 | Text message relay: user message persisted, LLM stream collected, reply sent | unit | `pnpm --filter @paperclipai/server test run src/__tests__/38-telegram-text.test.ts` | ❌ Wave 0 |
| TGRAM-02 | Agent prefix: reply starts with `[AgentName]:` | unit (co-located with TGRAM-01) | same file | ❌ Wave 0 |
| TGRAM-03 | Voice transcription: OGG buffer → transcribe called with `"ogg"` format | unit | `pnpm --filter @paperclipai/server test run src/__tests__/38-telegram-voice.test.ts` | ❌ Wave 0 |
| TGRAM-04 | TTS reply: synthesize called, OGG transcoding produces buffer, `replyWithVoice` called | unit (co-located with TGRAM-03) | same file | ❌ Wave 0 |
| TGRAM-05 | Long polling: `bot.start()` called (not `bot.api.setWebhook()`) | unit | same file as TGRAM-01 | ❌ Wave 0 |
| TGRAM-06 | Line count: `telegram.ts` under 500 lines | static check | `wc -l server/src/services/telegram.ts` in CI | manual |
| ONBRD-03 | Token validation endpoint: invalid token → 400; valid token → saved to nexus-settings | unit | `pnpm --filter @paperclipai/server test run src/__tests__/38-telegram-routes.test.ts` | ❌ Wave 0 |

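TGRAM-06 is currently a manual `wc -l` check. One way to automate it is to add a small script to CI. The sketch below assumes such a script exists; the function and constant names are hypothetical. A CI step could call `checkLineBudget("server/src/services/telegram.ts")` and fail the job on a non-null result.

The sketch counts lines the same way `wc -l` does, which counts newline characters. Because of this, a file without a trailing newline reports one line fewer, and the script and the table's command agree.

```typescript
import { readFileSync } from "node:fs";

// TGRAM-06: telegram.ts must stay under 500 lines.
export const TGRAM06_MAX_LINES = 500;

// Count lines the way `wc -l` does: one per newline character, so a final
// line without a trailing "\n" is not counted.
export function countLinesLikeWc(text: string): number {
  return text.split("\n").length - 1;
}

// Returns an error message if the file is at or over the budget, else null.
export function checkLineBudget(path: string, max: number = TGRAM06_MAX_LINES): string | null {
  const lines = countLinesLikeWc(readFileSync(path, "utf8"));
  return lines < max ? null : `${path}: ${lines} lines (must be under ${max})`;
}
```
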
**Test approach for grammY:** Mock the `grammy` module with `vi.mock("grammy", ...)`. The mocked `Bot` constructor should return an object with spies on `on()`, `start()`, `stop()`, `api.getMe()` and `api.deleteWebhook()`. Because `vi.mock` calls are hoisted above imports, any spies defined outside the factory must be created with `vi.hoisted()`. Follow the exact pattern in `36-voice-pipeline.test.ts`, which mocks `node:child_process` with EventEmitter-based fakes.

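The TGRAM-01, TGRAM-02 and TGRAM-05 assertions can be illustrated without vitest. Here a hand-rolled fake that records calls stands in for the `vi.fn()` spies described above. Everything in this sketch is hypothetical:

- `FakeBot` mimics only the slice of grammY's `Bot` surface named in the plan.
- `startBridge` and `formatAgentReply` are illustrative stand-ins for the eventual `telegram.ts` code.
- The chatService round-trip is replaced by an injected `askAgent` function.

The real tests should still use `vi.mock` as described.

```typescript
// Minimal context shape the handler needs (a tiny subset of grammY's Context).
interface FakeCtx {
  chat: { id: number };
  message: { text: string };
  reply(text: string): Promise<void>;
}

type Handler = (ctx: FakeCtx) => Promise<void>;

// Records every call so a test can assert on it, as vi.fn() spies would.
class FakeBot {
  readonly calls: string[] = [];
  readonly handlers = new Map<string, Handler>();
  readonly api = {
    getMe: async () => {
      this.calls.push("api.getMe");
      return { id: 1, is_bot: true, first_name: "Nexus", username: "nexus_test_bot" };
    },
    deleteWebhook: async () => {
      this.calls.push("api.deleteWebhook");
      return true;
    },
    setWebhook: async (_url: string) => {
      this.calls.push("api.setWebhook");
      return true;
    },
  };
  on(filter: string, handler: Handler): this {
    this.calls.push(`on:${filter}`);
    this.handlers.set(filter, handler);
    return this;
  }
  start(): Promise<void> {
    this.calls.push("start");
    // Never resolves in this fake; grammY's resolves once the bot is stopped.
    return new Promise<void>(() => {});
  }
  async stop(): Promise<void> {
    this.calls.push("stop");
  }
}

// Hypothetical TGRAM-02 prefix format: "[AgentName]: ..."
function formatAgentReply(agentName: string, text: string): string {
  return `[${agentName}]: ${text}`;
}

// Hypothetical bridge wiring: register the text handler, then long-poll (TGRAM-05).
function startBridge(
  bot: FakeBot,
  agentName: string,
  askAgent: (chatId: number, text: string) => Promise<string>,
): void {
  bot.on("message:text", async (ctx) => {
    const answer = await askAgent(ctx.chat.id, ctx.message.text);
    await ctx.reply(formatAgentReply(agentName, answer));
  });
  void bot.start(); // not awaited: the promise stays pending while polling
}

// Drive one incoming text message through the registered handler.
async function simulateTextMessage(bot: FakeBot, chatId: number, text: string): Promise<string[]> {
  const sent: string[] = [];
  const handler = bot.handlers.get("message:text");
  if (!handler) throw new Error("no message:text handler registered");
  await handler({ chat: { id: chatId }, message: { text }, reply: async (t) => { sent.push(t); } });
  return sent;
}
```
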
### Sampling Rate
- **Per task commit:** `pnpm --filter @paperclipai/server test run src/__tests__/38-telegram*.test.ts`
- **Per wave merge:** `pnpm test:run`
- **Phase gate:** Full suite green before `/gsd:verify-work`

### Wave 0 Gaps
- [ ] `server/src/__tests__/38-telegram-text.test.ts`: covers TGRAM-01, TGRAM-02, TGRAM-05
- [ ] `server/src/__tests__/38-telegram-voice.test.ts`: covers TGRAM-03, TGRAM-04
- [ ] `server/src/__tests__/38-telegram-routes.test.ts`: covers the ONBRD-03 token validation endpoint

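For the voice test file, pure functions that build the ffmpeg argument vectors are easy to assert on without spawning a process. This sketch rests on three assumptions:

- The function names are hypothetical.
- Input and output are file paths. The existing `transcodeToWav16k` in `voice-pipeline.ts` may use pipes instead.
- The bundled ffmpeg is built with `libopus`.

The flags themselves are standard ffmpeg options: `-ar` sets the sample rate, `-ac` the channel count, `-c:a` the audio codec and `-f` the container. The 48 kHz output rate follows the sendVoice note in the Secondary sources. A synthesized WAV carries its own rate, for example Piper's assumed 22050 Hz, in its header, so ffmpeg resamples it without further configuration.

```typescript
import { spawn } from "node:child_process";

// Telegram voice note (OGG/Opus) -> 16 kHz mono 16-bit PCM WAV for Whisper.
export function oggToWav16kArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", "-f", "wav", output];
}

// Synthesized WAV -> mono OGG/Opus at 48 kHz for ctx.replyWithVoice().
export function wavToOggOpusArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-ar", "48000", "-ac", "1", "-c:a", "libopus", "-f", "ogg", output];
}

// Run ffmpeg (e.g. the ffmpegBin path from voice-pipeline.ts) and reject on a non-zero exit,
// keeping the tail of stderr for diagnostics.
export function runFfmpeg(ffmpegBin: string, args: string[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn(ffmpegBin, args, { stdio: ["ignore", "ignore", "pipe"] });
    let stderr = "";
    proc.stderr?.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    proc.on("error", reject);
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with ${code}: ${stderr.slice(-500)}`));
    });
  });
}
```
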
---

## Sources

### Primary (HIGH confidence)
- [grammY getting started guide](https://grammy.dev/guide/getting-started): `Bot` constructor, `bot.start()`, message handlers
- [grammY file handling guide](https://grammy.dev/guide/files): `ctx.getFile()`, `InputFile(buffer)`, download URL pattern
- [grammY deployment types guide](https://grammy.dev/guide/deployment-types): long polling vs. webhooks; confirms sequential update processing
- [grammY `bot.start()` reference](https://grammy.dev/ref/core/bot#start): `PollingOptions`, `bot.stop()`, and the returned Promise, which does not resolve until the bot is stopped
- Direct codebase inspection: `server/src/services/voice-pipeline.ts` (`ffmpegBin`, spawn pattern, `transcodeToWav16k`)
- Direct codebase inspection: `server/src/services/nexus-settings.ts` (`telegramToken` in schema)
- Direct codebase inspection: `server/src/routes/chat.ts` (stream collection pattern, `agentId` handling)
- Direct codebase inspection: `ui/src/components/NexusOnboardingWizard.tsx` (step flow, `VoiceStep` insertion pattern)
- npm registry: `npm view grammy version` → `1.42.0` (verified 2026-04-03)

### Secondary (MEDIUM confidence)
- [Telegram Bot API sendVoice](https://core.telegram.org/bots/api#sendvoice): OGG/Opus format, 48kHz requirement
- Project research SUMMARY.md: grammY session management gap flagged, OGG → WAV transcode pattern
- `.planning/STATE.md`: grammY session decision (in-memory Map), 500-line constraint, long polling decision

### Tertiary (LOW confidence, inferred)
- Piper `en_US-lessac-medium` sample rate = 22050 Hz: inferred from common Piper model metadata; verify at implementation time from the model's `.onnx.json`
- grammY sequential update processing detail: confirmed via the deployment guide, but exact timeout behavior has not been benchmarked

---

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH. grammy 1.42.0 verified on the npm registry on 2026-04-03; ffmpeg-static already installed
- Architecture: HIGH. Based on direct codebase inspection of all integration points; the factory-function pattern matches all existing services
- Pitfalls: HIGH. Sourced from SUMMARY.md pre-research and the Telegram Bot API docs; all 6 pitfalls are specific and actionable
- Test approach: HIGH. The vitest pattern matches `36-voice-pipeline.test.ts` exactly; the grammY mock strategy follows the existing `child_process` mock pattern

**Research date:** 2026-04-03

**Valid until:** 2026-05-03 (grammy releases frequently; re-verify the version before installing if more than 30 days have elapsed)