Nexus Dev 46bad4cc60 docs(38): research Telegram bridge phase

2026-04-04 03:02:48 +00:00

Phase 38: Telegram Bridge - Research

Researched: 2026-04-03
Domain: Telegram bot integration (grammY), voice note relay (OGG/ffmpeg/Whisper), onboarding wizard step
Confidence: HIGH

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

grammy ^1.41.1 — TypeScript-native Telegram bot framework, long polling, clean file handling
Long polling via bot.start() — no public HTTPS required for Mac Mini behind NAT
Single bot for all agents — messages prefixed with [AgentName]
Telegram voice messages are OGG/Opus — download via ctx.getFile(), transcode to WAV 16kHz via ffmpeg before Whisper
TTS reply: synthesize via VoicePipelineService, convert WAV → OGG/Opus via ffmpeg, send via ctx.replyWithVoice()
Telegram token stored in nexus-settings.json (already in schema from Phase 36)
Bridge calls chatService and voicePipelineService directly (same-process, no HTTP round-trip)
Acknowledge updates immediately, process async to prevent Telegram resending
chatId → sessionId mapping: lightweight in-memory Map (single-user deployment)
Bridge service must be under 500 lines (TGRAM-06)
Onboarding BotFather setup: wizard step with guided token entry and validation

Claude's Discretion

All implementation choices are at Claude's discretion — discuss phase was skipped per user setting.

Deferred Ideas (OUT OF SCOPE)

None — discuss phase skipped. Per REQUIREMENTS.md out-of-scope:

Deep Telegram ↔ web chat session sync (requires Postgres event bus)
Telegram inline keyboards/threaded replies
Per-agent Telegram bots
GSD formatting in Telegram
Transcription editing before sending
</user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
TGRAM-01	Single Telegram bot relays text messages bidirectionally between user and agents	grammY `bot.on("message:text")` + `puterProxyService.chatStream` collector + `chatService.addMessage`
TGRAM-02	Agent replies in Telegram are prefixed with agent identity (e.g. `[PM]`, `[Engineer]`)	Resolve agent name from `agentService.list(companyId)` before each reply; prepend `[AgentName]:`
TGRAM-03	Telegram voice messages are transcribed (OGG → Whisper) and forwarded to agent as text	`ctx.getFile()` → fetch buffer → `voicePipelineService.transcodeToWav16k(buf, "ogg")` → `transcribe`
TGRAM-04	Agent responses can be sent back as Telegram voice notes (TTS → OGG)	`voicePipelineService.synthesize(text)` (returns raw PCM) → ffmpeg WAV→OGG/Opus → `ctx.replyWithVoice(new InputFile(buffer))`
TGRAM-05	Telegram bridge uses long polling (no public HTTPS required)	`bot.start()` — confirmed correct for NAT/local deployments
TGRAM-06	Telegram bridge is under 500 lines of code	Service pattern + thin relay architecture enforces this
ONBRD-03	Guided BotFather setup flow for Telegram bot token during onboarding	New `TelegramStep` component added as step 5 in `NexusOnboardingWizard.tsx`; saves via `PATCH /api/nexus/settings`
</phase_requirements>

Summary

Phase 38 builds a thin Telegram relay bridge that connects a user's phone to Nexus agents already running in the same process. The architecture is pure consumer — grammY handles Telegram protocol, voicePipelineService (shipped in Phase 36) handles all audio conversion, and chatService + puterProxyService handle message persistence and LLM generation. No new services are invented: telegram.ts is a factory function that wires existing services together.

The critical design constraint is under-500-lines (TGRAM-06). This is achievable because the bridge does no LLM work, no audio DSP, and no session management beyond an in-memory Map. The full pipeline for a text message is: receive → persist user message → collect LLM stream → prefix with agent name → send reply. For a voice message: receive → download OGG → transcode to WAV → transcribe → relay as text → same text pipeline.

The onboarding step (ONBRD-03) adds a new wizard step TelegramStep inserted as step 5 in NexusOnboardingWizard.tsx (before the existing root directory step 5, pushing it to step 6). The step guides the user through BotFather token creation, validates the token with a live API call (bot.api.getMe()), and saves it via the existing PATCH /api/nexus/settings endpoint.

Primary recommendation: Build in three plans: (1) telegram.ts service + app.ts wiring for text relay (TGRAM-01, TGRAM-02, TGRAM-05, TGRAM-06), (2) voice relay extension (TGRAM-03, TGRAM-04), and (3) onboarding wizard step (ONBRD-03).

Standard Stack

Core

Library	Version	Purpose	Why Standard
grammy	^1.42.0	Telegram Bot API framework	TypeScript-native, long polling, `ctx.getFile()`, `InputFile` buffer upload, Bot API 9.6; 1.4M weekly downloads; verified current 2026-04-03
ffmpeg-static	^5.3.0	FFmpeg binary (already installed)	Ships FFmpeg 6.1.1 macOS arm64 binary; already in `server/package.json`; used by `voicePipelineService`

Supporting

Library	Version	Purpose	When to Use
node:child_process spawn	built-in	WAV → OGG/Opus transcoding	Only for voice reply path (TGRAM-04); same pattern as `voicePipelineService.transcodeToWav16k`

Alternatives Considered

Instead of	Could Use	Tradeoff
grammy	Telegraf	Telegraf is older, has fewer weekly downloads (800K vs 1.4M for grammY), and is less TypeScript-native; grammY has a cleaner file API
in-memory Map	grammY session storage	grammY session plugin adds `@grammyjs/storage-*` deps; Map is correct for single-user deployment
in-memory Map	grammY conversations plugin	Conversation plugin is stateful multi-turn; not needed for thin relay pattern

Installation:

cd server && pnpm add grammy

Version verification: npm view grammy version → 1.42.0 (verified 2026-04-03)

Architecture Patterns

Recommended File Structure

server/src/services/
├── telegram.ts          # NEW: grammY bot lifecycle + relay logic (< 500 lines)
server/src/routes/
├── telegram.ts          # NEW: POST /api/telegram/token, GET /api/telegram/status
ui/src/components/onboarding/
├── TelegramStep.tsx     # NEW: BotFather guided step with token validation
ui/src/components/
├── NexusOnboardingWizard.tsx  # MODIFY: insert TelegramStep as step 5, shift root-dir to step 6, summary to step 7

Pattern 1: Bot Lifecycle (Factory Function)

What: telegramService(db) returns { start, stop, isRunning }. Called from app.ts after settings are loaded. Uses existing factory-function service pattern.
When to use: Always; it matches all other services in server/src/services/.

// Source: grammy.dev/guide/getting-started (verified 2026-04-03)
import { Bot } from "grammy";

export function telegramService(db: Db) {
  let bot: Bot | null = null;

  async function start(token: string) {
    bot = new Bot(token);
    bot.catch((err) => logger.error({ err }, "Telegram bot error"));
    registerHandlers(bot, db);
    // Do not await: the returned Promise resolves only after bot.stop().
    // Attach a catch so a polling failure cannot become an unhandled rejection.
    bot.start().catch((err) => {
      logger.error({ err }, "Telegram polling stopped");
      bot = null;
    });
  }

  async function stop() {
    await bot?.stop();
    bot = null;
  }

  function isRunning() { return bot !== null; }

  return { start, stop, isRunning };
}

Pattern 2: Text Message Handler

What: On message:text, persist user message, collect full LLM stream, prefix with [AgentName], send reply.
Key detail: The Telegram bridge cannot use SSE streaming. Must collect all tokens from puterProxyService.chatStream into a full string before sending. This is the only place in the codebase where we consume an async generator to collect a full response.

// Source: grammy.dev guide + codebase chat.ts pattern
bot.on("message:text", async (ctx) => {
  const chatId = ctx.chat.id;

  // Acknowledge immediately: Telegram resends if no response within ~15s
  await ctx.react("👍").catch(() => {}); // optional status feedback

  try {
    const userText = ctx.message.text;

    // Resolve the agent first: the session key is `${chatId}:${agentId}` (Pattern 5).
    // Assumption: resolveAgent also returns the companyId that chatStream needs.
    const { agentName, agentId, companyId } = await resolveAgent(db, chatId);
    const conversationId = await getOrCreateConversation(chatId, agentId, db);

    await chatSvc.addMessage(conversationId, { role: "user", content: userText });
    const messages = await buildMessagesArray(conversationId, userText, chatSvc);

    // Telegram cannot consume SSE, so collect the whole stream before replying
    let fullResponse = "";
    const stream = puterProxy.chatStream(companyId, agentId, messages, undefined, undefined);
    for await (const token of stream) {
      fullResponse += token;
    }
    const reply = `[${agentName}]: ${fullResponse.trim()}`;

    await chatSvc.addMessage(conversationId, {
      role: "assistant",
      content: fullResponse.trim(),
      agentId,
    });
    await ctx.reply(reply, { parse_mode: "Markdown" });
  } catch (err) {
    logger.error({ err }, "Telegram text relay failed");
    await ctx.reply("Sorry, something went wrong.").catch(() => {});
  }
});
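The stream-collection step in the handler above can be isolated as a small helper. The following is a minimal sketch, assuming chatStream yields plain string tokens as an AsyncIterable<string> (inferred from the for await loop above). The names collectStream, maxChars, and fakeStream are hypothetical and do not exist in the codebase.

```typescript
// Hypothetical helper: drains an async token stream into one string.
// Assumes the stream yields plain string tokens, as the loop in Pattern 2 does.
async function collectStream(
  stream: AsyncIterable<string>,
  maxChars: number = Number.POSITIVE_INFINITY,
): Promise<string> {
  const parts: string[] = [];
  let length = 0;
  for await (const token of stream) {
    parts.push(token);
    length += token.length;
    if (length >= maxChars) break; // leaving the loop early also closes the generator
  }
  return parts.join("").slice(0, maxChars).trim();
}

// Stand-in for puterProxy.chatStream, for illustration only.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello";
  yield ", ";
  yield "world ";
}
```

An optional cap such as maxChars is one way to bound memory if a model produces an unexpectedly long response; whether a cap is wanted here is a design choice the source does not specify.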

Pattern 3: Voice Message Handler (Async — Acknowledge First)

What: On message:voice, immediately send "Transcribing..." status, then process OGG download → WAV transcode → transcribe → relay as text message.
Critical: The download + transcode + Whisper pipeline takes 2–5 seconds. If the handler does not return quickly, Telegram resends the update and the bot processes the same voice message multiple times.

// Source: grammy.dev/guide/files (verified 2026-04-03)
// Context is imported from "grammy"
bot.on("message:voice", async (ctx) => {
  const chatId = ctx.chat.id;
  await ctx.reply("Transcribing...").catch(() => {});

  // Do NOT await the heavy pipeline: let the handler return and process async
  processVoiceMessage(ctx, chatId, db).catch((err) => {
    logger.error({ err }, "Telegram voice relay failed");
    return ctx.reply("Voice transcription failed.").catch(() => {});
  });
});

async function processVoiceMessage(ctx: Context, chatId: number, db: Db) {
  const file = await ctx.getFile();
  if (!file.file_path) throw new Error("Telegram returned no file_path");

  // Construct download URL manually (files plugin not needed)
  const url = `https://api.telegram.org/file/bot${ctx.api.token}/${file.file_path}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Voice download failed: HTTP ${response.status}`);
  const oggBuffer = Buffer.from(await response.arrayBuffer());

  const { text } = await voiceSvc.transcribe(oggBuffer, "ogg");
  // ... then relay as text (same pipeline as Pattern 2, starting from user message)
}

Pattern 4: WAV → OGG/Opus Transcoding (for TTS reply, TGRAM-04)

What: voicePipelineService.synthesize() returns raw PCM (no WAV header). For Telegram voice notes, must produce OGG/Opus. Use same ffmpeg-static spawn pattern as transcodeToWav16k, but with different args.
Telegram requirement: OGG Opus, 48kHz, mono (Bot API sendVoice spec).

// Source: Telegram Bot API docs + ffmpeg pattern from voice-pipeline.ts
// Requires: import { spawn } from "node:child_process"; ffmpegBin as resolved in voice-pipeline.ts
async function transcodeToOggOpus(rawPcmBuffer: Buffer, inputSampleRate = 22050): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    // Input: raw PCM s16le mono at the Piper model's sample rate (22050Hz for en_US-lessac-medium)
    // Output: OGG Opus 48kHz for Telegram
    const ffmpeg = spawn(ffmpegBin, [
      "-f", "s16le", "-ar", String(inputSampleRate), "-ac", "1", "-i", "pipe:0",
      "-c:a", "libopus", "-ar", "48000", "-f", "ogg", "pipe:1"
    ], { stdio: ["pipe", "pipe", "pipe"] });

    const chunks: Buffer[] = [];
    ffmpeg.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    ffmpeg.stderr.on("data", () => {}); // drain stderr so ffmpeg never blocks on a full pipe
    ffmpeg.on("close", (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`ffmpeg exited ${code}`));
    });
    ffmpeg.on("error", reject);
    ffmpeg.stdin.on("error", reject); // e.g. EPIPE if ffmpeg exits before reading all input
    ffmpeg.stdin.end(rawPcmBuffer);
  });
}

Note on Piper output format: Piper --output-raw produces raw s16le PCM. The sample rate depends on the voice model — en_US-lessac-medium outputs 22050Hz. Verify with piper --help or model metadata. The ffmpeg -ar 22050 input flag must match.

Pattern 5: chatId → conversationId Mapping

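What: In-memory Map that lives for the lifetime of the server process. Single-user deployment means no persistence across server restarts is needed; after a restart, the next message simply creates a new conversation. Each chatId gets one conversation per agent.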

// chatId:agentId → conversationId
const sessionMap = new Map<string, string>();

async function getOrCreateConversation(chatId: number, agentId: string, db: Db): Promise<string> {
  const key = `${chatId}:${agentId}`;
  if (sessionMap.has(key)) return sessionMap.get(key)!;

  const company = await getFirstCompany(db);  // companyService(db).list()[0]
  const conv = await chatSvc.createConversation(company.id, {
    title: `Telegram:${chatId}`,
    agentId,
  });
  sessionMap.set(key, conv.id);
  return conv.id;
}
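Because Pattern 3 runs voice processing as a detached task, two messages for the same chat can reach the has/set sequence above at the same time and each create a conversation. One possible guard, sketched here under that assumption, is to cache the in-flight Promise instead of the resolved id. The name createOnceCache is hypothetical and not part of the codebase.

```typescript
// Hypothetical generic cache: stores the in-flight Promise so concurrent callers
// for the same key share one creation call instead of racing.
function createOnceCache<V>() {
  const pending = new Map<string, Promise<V>>();
  return function getOrCreate(key: string, create: () => Promise<V>): Promise<V> {
    const existing = pending.get(key);
    if (existing) return existing;
    const created = create().catch((err: unknown) => {
      pending.delete(key); // allow a retry after a failed creation
      throw err;
    });
    pending.set(key, created);
    return created;
  };
}
```

In the bridge, the key would be the same `${chatId}:${agentId}` string and create would wrap chatSvc.createConversation. For strictly sequential text handling the plain Map above is sufficient.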

Pattern 6: app.ts Integration

What: After createApp, read settings and conditionally start telegram service. The telegramService is not mounted as an Express route — it runs as a side-effect process.

// In server/src/index.ts or app startup (after createApp resolves)
const settings = await nexusSettingsService().get();
const tg = telegramService(db);
if (settings.telegramToken) {
  await tg.start(settings.telegramToken);
}

// Also expose management endpoints via Express:
// POST /api/telegram/token — saves token and (re)starts bot
// GET /api/telegram/status — returns { running: boolean }

Pattern 7: Onboarding TelegramStep Component

What: New step inserted as step 5 in NexusOnboardingWizard.tsx (after VoiceStep at step 4). Shows instructions for creating a bot via BotFather, provides a token input field, validates the token with GET https://api.telegram.org/bot<token>/getMe (or via server endpoint), saves via PATCH /api/nexus/settings.

// ui/src/components/onboarding/TelegramStep.tsx
interface TelegramStepProps {
  onSave: (token: string) => void;
  onSkip: () => void;
}
// Validation: call POST /api/telegram/token with the token
// Server validates via bot.api.getMe() before saving
// On success: show bot username, enable Continue button

Anti-Patterns to Avoid

Awaiting the full pipeline synchronously in the voice handler: Telegram will resend the update if the handler takes >15 seconds. Always fire-and-forget heavy processing, sending an intermediate status message.
Using exec instead of spawn for ffmpeg: exec buffers stdout — for large OGG files this causes memory spikes and truncation. Always use spawn with streaming pipes.
Awaiting bot.start() at the top level: bot.start() returns a Promise that resolves only when the bot is stopped. Never await it during startup; call it without await and attach a .catch() handler.
Hardcoding 22050 for Piper sample rate without checking the model: Different Piper voice models have different sample rates. Read from model metadata or confirm with --output-file /dev/stdout | soxi -r at startup.
Constructing the download URL from ctx.msg.voice.file_id: the file_id is not part of the download URL. Call ctx.getFile(), which returns a File object with file_path, and build the URL as https://api.telegram.org/file/bot{TOKEN}/{file_path}.

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
Telegram Bot API protocol	Custom HTTP polling loop	`grammy Bot.start()`	Handles getUpdates loop, retry, graceful shutdown, error isolation
File download from Telegram	Custom URL construction + retry	`ctx.getFile()` + fetch	`ctx.getFile()` handles file_path resolution; file_path URLs are temporarily valid
OGG audio parsing	Custom OGG/Opus demuxer	ffmpeg-static (already installed)	OGG/Opus has multiple container/codec variants; ffmpeg handles all
Token validation	Manually calling Bot API	`new Bot(token).api.getMe()`	Single call returns bot info or throws on invalid token
Session management	Custom SQLite session store	In-memory Map	Single-user deployment; restarts are rare; Map is idiomatic for this

Key insight: The Telegram bridge is a relay, not a platform. Every non-trivial problem (audio, LLM, persistence) is already solved by existing services.

Common Pitfalls

Pitfall 1: Telegram Resending Voice Updates (Async Pipeline)

What goes wrong: Handler downloads OGG + calls ffmpeg + runs Whisper. This takes 3–8 seconds. Telegram's getUpdates loop sees no acknowledgement and resends the same update. Bot processes the same voice note 2–3 times, sending duplicate replies.
Why it happens: grammY's long polling acknowledges updates when getUpdates is called with the next offset. If the handler blocks the loop (sequential processing), the offset isn't advanced until the handler completes.
How to avoid: Fire off the heavy pipeline as a detached async task (do NOT await it in the handler body). Send an immediate "Transcribing..." reply so the user gets feedback. The handler returns immediately; offset advances; Telegram does not resend.
Warning signs: Duplicate replies for voice messages; "Transcribing..." appearing twice.

Pitfall 2: Piper Output Sample Rate Mismatch

What goes wrong: synthesize() returns raw s16le PCM from Piper --output-raw. ffmpeg is invoked with -ar 22050 (assumed). If the voice model outputs 16000Hz or 44100Hz, the OGG will be pitched wrong or malformed.
Why it happens: Piper model sample rates vary by model. en_US-lessac-medium is 22050Hz, but en_US-amy-low is 16000Hz.
How to avoid: At service startup, run piper --model <voice> --output-raw < /dev/null 2>&1 | grep "sample rate" or read the .onnx.json model config to get audio.sample_rate. Use that value in the ffmpeg -ar flag.
Warning signs: Voice messages play back at wrong pitch; ffmpeg exits with "Invalid data found when processing input".
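The config-file route can be sketched as two pure functions: one that extracts audio.sample_rate from the text of the model's .onnx.json, and one that feeds it into the ffmpeg arguments from Pattern 4. This is a sketch under the assumption that the config stores the rate as an integer at audio.sample_rate, as described above. The function names are hypothetical, and reading the file from disk is left to the caller.

```typescript
// Hypothetical helper: extracts audio.sample_rate from the text of a Piper
// voice config (the .onnx.json file that sits next to the model).
function parsePiperSampleRate(configJson: string): number {
  const config: unknown = JSON.parse(configJson);
  const audio = (config as { audio?: { sample_rate?: unknown } } | null)?.audio;
  const rate = audio?.sample_rate;
  if (typeof rate !== "number" || !Number.isInteger(rate) || rate <= 0) {
    throw new Error("Piper config has no valid audio.sample_rate");
  }
  return rate;
}

// Builds the ffmpeg argument list from Pattern 4 with the detected input rate.
function oggOpusArgs(inputSampleRate: number): string[] {
  return [
    "-f", "s16le", "-ar", String(inputSampleRate), "-ac", "1", "-i", "pipe:0",
    "-c:a", "libopus", "-ar", "48000", "-f", "ogg", "pipe:1",
  ];
}
```

Failing loudly at startup when the field is missing seems preferable to silently assuming 22050, since a wrong rate only shows up later as mis-pitched audio.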

Pitfall 3: Long Message Truncation in Telegram

What goes wrong: Telegram has a 4096-character limit per message. LLM responses on complex topics can exceed this. ctx.reply() throws with "Bad Request: message is too long".
Why it happens: Telegram Bot API hard limit.
How to avoid: Split responses at 4000 chars (buffer for prefix): const chunks = splitAt4000(reply); for (const chunk of chunks) await ctx.reply(chunk).
Warning signs: Error thrown on long agent replies; bot crashes without error handler.
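A splitting helper along these lines could look as follows. It is a minimal sketch, not an existing codebase function: the name splitMessage is hypothetical, and it measures length in JavaScript string units, which is only an approximation of how Telegram counts characters.

```typescript
// Telegram rejects messages longer than 4096 characters (see Pitfall 3).
const TELEGRAM_LIMIT = 4096;

// Hypothetical helper: splits text into chunks no longer than `limit`,
// preferring to break at the last newline or space inside each chunk.
function splitMessage(text: string, limit: number = TELEGRAM_LIMIT): string[] {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    const head = rest.slice(0, limit);
    const breakAt = Math.max(head.lastIndexOf("\n"), head.lastIndexOf(" "));
    // Fall back to a hard cut when the chunk has no whitespace to break on.
    const cut = breakAt > 0 ? breakAt : limit;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Calling splitMessage(reply, 4000) matches the 4000-character margin suggested above. Note that a hard cut can land inside a Markdown entity or a surrogate pair, so chunks sent with parse_mode may still need a plain-text fallback.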

Pitfall 4: Webhook Conflict Blocking Long Polling

What goes wrong: bot.start() throws "Conflict: can't use getUpdates method while webhook is active; use deleteWebhook to delete the webhook first".
Why it happens: If the bot token was ever configured with a webhook (e.g. during testing), the webhook remains registered and blocks long polling.
How to avoid: Before bot.start(), always call await bot.api.deleteWebhook(). grammY may do this automatically in some versions; add it explicitly to be safe.
Warning signs: bot.start() throws Conflict error immediately on startup.
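The startup order described here can be sketched against a small structural interface, so the example runs without grammY installed. The PollingBot interface and startLongPolling are hypothetical names; the two methods they rely on, api.deleteWebhook() and start(), are the grammY calls named above.

```typescript
// Structural subset of grammY's Bot that this sketch relies on.
// Declared locally so the example is self-contained.
interface PollingBot {
  api: { deleteWebhook(): Promise<unknown> };
  start(): Promise<void>;
}

// Hypothetical helper: clears any stale webhook, then starts long polling
// without awaiting it, because start() only settles once the bot stops.
async function startLongPolling(
  bot: PollingBot,
  onPollingError: (err: unknown) => void,
): Promise<void> {
  await bot.api.deleteWebhook();
  bot.start().catch(onPollingError);
}
```

A real Bot instance should satisfy this interface structurally, and onPollingError is where the Conflict error would surface if another process is already polling with the same token.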

Pitfall 5: `bot.start()` Crash Kills Express Process

What goes wrong: An unhandled error in a grammY middleware crashes the Node process. Express server goes down.
Why it happens: grammY wraps handlers but unhandled promise rejections outside handlers can propagate.
How to avoid: Always call bot.catch((err) => logger.error(err, "Telegram bot error")) before bot.start(). Wrap the start() call in try/catch.
Warning signs: Express server exits unexpectedly; Telegram bot stops responding.

Pitfall 6: Missing Agent for Telegram Conversation

What goes wrong: Company exists but has no agents (was reset, or agents were deleted). agentService.list(companyId) returns []. Bot crashes trying to access agents[0].name.
Why it happens: Edge case in single-user setup; agents can be deleted via UI.
How to avoid: If agents.length === 0, reply with "No agents configured. Please set up an agent in the Nexus dashboard." and return.
Warning signs: TypeError "Cannot read properties of undefined (reading 'name')" on agent name access.
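One way to make the empty-list case explicit is to return a tagged result that the handler must check before touching the agent. This is a sketch only: AgentSummary is an assumed minimal shape for the entries returned by agentService.list(companyId), and pickAgent is a hypothetical helper.

```typescript
// Minimal shape assumed for the entries returned by agentService.list(companyId).
interface AgentSummary {
  id: string;
  name: string;
}

type AgentChoice =
  | { ok: true; agent: AgentSummary }
  | { ok: false; message: string };

// Hypothetical helper: picks the first agent, or returns a user-facing
// message when the company has none, instead of reading agents[0].name blindly.
function pickAgent(agents: readonly AgentSummary[]): AgentChoice {
  const first = agents[0];
  if (first === undefined) {
    return {
      ok: false,
      message: "No agents configured. Please set up an agent in the Nexus dashboard.",
    };
  }
  return { ok: true, agent: first };
}
```

In a handler, the failure branch would send choice.message with ctx.reply and return early, so the prefixing code only ever sees a defined agent.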

Code Examples

Complete telegramService skeleton (verified patterns)

// server/src/services/telegram.ts
// Source: grammy.dev guide + codebase patterns from voice-pipeline.ts, chat.ts
import { Bot, InputFile } from "grammy";
import type { Db } from "@paperclipai/db";
import { chatService } from "./chat.js";
import { agentService } from "./agents.js";
import { companyService } from "./companies.js";
import { puterProxyService } from "./puter-proxy.js";
import { voicePipelineService } from "./voice-pipeline.js";

export function telegramService(db: Db) {
  let bot: Bot | null = null;
  const sessionMap = new Map<string, string>(); // `${chatId}:${agentId}` → conversationId

  // ... handler registration, start/stop, etc.
  return { start, stop, isRunning };
}

Token validation endpoint

// server/src/routes/telegram.ts
router.post("/telegram/token", async (req, res) => {
  assertBoard(req);
  const { token } = req.body as { token?: string };
  if (!token) { res.status(400).json({ error: "token required" }); return; }

  // Validate token with Telegram; getMe() rejects on an invalid token,
  // so catch it here and answer 400 instead of leaving the rejection unhandled
  let me;
  try {
    me = await new Bot(token).api.getMe();
  } catch {
    res.status(400).json({ error: "Telegram token validation failed" });
    return;
  }

  // Save to nexus-settings
  await nexusSettingsService().set({ telegramToken: token });

  // (Re)start telegram service if already initialized
  // ...

  res.json({ ok: true, botUsername: me.username });
});
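Before spending a network call on getMe(), the wizard or the route could reject input that is obviously not a token. The check below is a heuristic sketch, not part of the source design. It assumes tokens have the shape shown in Telegram's own documentation example (numeric bot id, colon, secret), and the minimum secret length in the pattern is an assumption chosen to stay lenient. normalizeToken is a hypothetical name.

```typescript
// Heuristic only: Telegram documents tokens that look like
// "123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11" (numeric bot id, colon, secret).
// The minimum secret length is an assumption, so the pattern stays lenient.
const TOKEN_SHAPE = /^\d+:[A-Za-z0-9_-]{20,}$/;

// Hypothetical pre-check: trims pasted input and rejects obviously malformed
// tokens before the server calls getMe().
function normalizeToken(input: string): string | null {
  const token = input.trim();
  return TOKEN_SHAPE.test(token) ? token : null;
}
```

A passing pre-check says nothing about whether the token is valid; the getMe() call remains the only authoritative validation.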

TelegramStep onboarding component structure

// ui/src/components/onboarding/TelegramStep.tsx
export function TelegramStep({ onSave, onSkip }: TelegramStepProps) {
  const [token, setToken] = useState("");
  const [validating, setValidating] = useState(false);
  const [botUsername, setBotUsername] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);

  async function handleValidate() {
    setValidating(true);
    setError(null);
    try {
      const res = await fetch("/api/telegram/token", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ token }),
      });
      const data = await res.json();
      if (!res.ok) throw new Error(data.error);
      setBotUsername(data.botUsername);
    } catch (e) {
      setError(e instanceof Error ? e.message : "Invalid token");
    } finally {
      setValidating(false);
    }
  }
  // ... render: BotFather instructions + input + validate button + success state
}

Codebase Integration Map

Files to Create

File	Purpose
`server/src/services/telegram.ts`	grammY bot lifecycle + all relay handlers
`server/src/routes/telegram.ts`	`POST /api/telegram/token`, `GET /api/telegram/status`
`ui/src/components/onboarding/TelegramStep.tsx`	BotFather guided token entry step

Files to Modify

File	Change
`server/src/app.ts`	Import and mount `telegramRoutes()`; export `telegramService` reference
`server/src/index.ts` (or startup entry)	Read `telegramToken` from settings on startup; call `tg.start(token)` if present
`ui/src/components/NexusOnboardingWizard.tsx`	Insert `TelegramStep` as step 5; shift current step 5 (root dir) → step 6; shift step 6 (summary) → step 7; update step counter label

Key Service Dependencies (all already exist)

Service	Method Used	From
`chatService(db)`	`createConversation`, `addMessage`, `listMessages`	`server/src/services/chat.ts`
`agentService(db)`	`list(companyId)`	`server/src/services/agents.ts`
`companyService(db)`	`list()`	`server/src/services/companies.ts`
`puterProxyService(db)`	`chatStream(companyId, agentId, messages)`	`server/src/services/puter-proxy.ts`
`voicePipelineService()`	`transcribe(buf, "ogg")`, `synthesize(text)`, `transcodeToWav16k`	`server/src/services/voice-pipeline.ts`
`nexusSettingsService()`	`get()`, `set({ telegramToken })`	`server/src/services/nexus-settings.ts`

State of the Art

Old Approach	Current Approach	When Changed	Impact
Telegraf (older alternative framework)	grammY 1.42	grammY introduced ~2021, now dominant	TypeScript-first, cleaner `ctx.getFile()`, Bot API 9.6 support
Webhooks for all deployments	Long polling for local/NAT	N/A — always deployment-specific	Mac Mini behind NAT → long polling is the only workable option
Custom ffmpeg binary download	ffmpeg-static package	~2019	Already installed; no PATH issues in service environment

Deprecated/outdated:

Telegraf: still maintained but grammY has overtaken it; ecosystem (plugins, docs) is grammY-centric as of 2025
fluent-ffmpeg: archived May 2025 — use child_process.spawn with ffmpeg-static path directly

Open Questions

Piper output sample rate for voice model
- What we know: en_US-lessac-medium likely outputs 22050Hz (common for medium models)
- What's unclear: Exact sample rate not verified from installed model metadata
- Recommendation: At service startup, read /path/to/en_US-lessac-medium.onnx.json and extract audio.sample_rate. Use that value for ffmpeg WAV→OGG transcode.
puterProxyService — handling missing token for Telegram
- What we know: puterProxyService.chatStream throws unprocessable("Puter auth token not configured") if no token
- What's unclear: Should the bot reply "AI not configured" or silently fall back to chatService.streamEcho?
- Recommendation: Reply with a user-friendly message: "AI provider not configured. Please connect a provider in the Nexus dashboard." This matches the existing UI behavior.
grammY concurrent update processing
- What we know: bot.start() processes updates sequentially by default ("processes all updates sequentially" per docs)
- What's unclear: If two voice messages arrive simultaneously, does the second queue or get dropped?
- Recommendation: Sequential processing is acceptable for single-user deployment. If concurrent voice processing is needed, grammY's runner plugin adds parallelism — defer to future phase.

Environment Availability

Dependency	Required By	Available	Version	Fallback
ffmpeg-static binary	Voice transcoding (TGRAM-03, TGRAM-04)	✓	5.3.0 (binary at node_modules/ffmpeg-static/ffmpeg)	—
grammy	Telegram protocol	✗ (not yet installed)	—	Install: `pnpm add grammy` in server/
whisper / whisper-cpp	Voice transcription (TGRAM-03)	✗ (not in PATH)	—	Runtime error with user-friendly message if not installed
piper	TTS voice reply (TGRAM-04)	✗ (not in PATH)	—	Skip voice reply; text-only reply is acceptable fallback
Telegram Bot token	All TGRAM-* requirements	✗ (not configured yet)	—	ONBRD-03 provides the setup flow

Missing dependencies with no fallback:

grammy package — must be installed (pnpm add grammy in server/) before any code can run

Missing dependencies with fallback:

whisper/whisper-cpp — voice transcription unavailable; bot replies "Voice transcription not available on this server."
piper — TTS reply unavailable; bot sends text-only reply (TGRAM-04 feature gracefully degrades)

Validation Architecture

Test Framework

Property	Value
Framework	vitest (workspace config at `/opt/nexus/vitest.config.ts`)
Config file	`server/vitest.config.ts` — `environment: "node"`
Quick run command	`pnpm --filter @paperclipai/server test run src/__tests__/38-telegram*.test.ts`
Full suite command	`pnpm test:run`

Phase Requirements → Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
TGRAM-01	Text message relay: user message persisted, LLM stream collected, reply sent	unit	`pnpm --filter @paperclipai/server test run src/__tests__/38-telegram-text.test.ts`	❌ Wave 0
TGRAM-02	Agent prefix: reply starts with `[AgentName]:`	unit (co-located with TGRAM-01)	same file	❌ Wave 0
TGRAM-03	Voice transcription: OGG buffer → transcribe called with `"ogg"` format	unit	`pnpm --filter @paperclipai/server test run src/__tests__/38-telegram-voice.test.ts`	❌ Wave 0
TGRAM-04	TTS reply: synthesize called, OGG transcoding produces buffer, replyWithVoice called	unit (co-located with TGRAM-03)	same file	❌ Wave 0
TGRAM-05	Long polling: `bot.start()` called (not `bot.setWebhook`)	unit	same as TGRAM-01 file	❌ Wave 0
TGRAM-06	Line count: telegram.ts under 500 lines	static check	`wc -l server/src/services/telegram.ts` in CI	manual
ONBRD-03	Token validation endpoint: invalid token → 400; valid token → saves to nexus-settings	unit	`pnpm --filter @paperclipai/server test run src/__tests__/38-telegram-routes.test.ts`	❌ Wave 0

Test approach for grammY: Mock grammy module with vi.mock("grammy", ...). The Bot constructor should return a mock with spies on on(), start(), stop(), api.getMe(), api.deleteWebhook(). Follow the exact pattern in 36-voice-pipeline.test.ts (mock node:child_process with EventEmitter mocks).

Sampling Rate

Per task commit: pnpm --filter @paperclipai/server test run src/__tests__/38-telegram*.test.ts
Per wave merge: pnpm test:run
Phase gate: Full suite green before /gsd:verify-work

Wave 0 Gaps

server/src/__tests__/38-telegram-text.test.ts — covers TGRAM-01, TGRAM-02, TGRAM-05
server/src/__tests__/38-telegram-voice.test.ts — covers TGRAM-03, TGRAM-04
server/src/__tests__/38-telegram-routes.test.ts — covers ONBRD-03 token validation endpoint

Sources

Primary (HIGH confidence)

grammY getting started guide — Bot constructor, bot.start(), message handlers
grammY file handling guide — ctx.getFile(), InputFile(buffer), download URL pattern
grammY deployment types guide — long polling vs webhook; sequential processing confirmation
grammY bot.start() reference — PollingOptions, bot.stop(), never-resolving Promise behavior
Direct codebase inspection: server/src/services/voice-pipeline.ts — ffmpegBin, spawn pattern, transcodeToWav16k
Direct codebase inspection: server/src/services/nexus-settings.ts — telegramToken in schema
Direct codebase inspection: server/src/routes/chat.ts — stream collection pattern, agentId handling
Direct codebase inspection: ui/src/components/NexusOnboardingWizard.tsx — step flow, VoiceStep insertion pattern
npm registry: npm view grammy version → 1.42.0 (verified 2026-04-03)

Secondary (MEDIUM confidence)

Telegram Bot API sendVoice — OGG Opus format, 48kHz requirement
Project research SUMMARY.md — grammY session management gap flagged, OGG → WAV transcode pattern
.planning/STATE.md — grammY session decision (in-memory Map), 500-line constraint, long polling decision

Tertiary (LOW confidence — inferred)

Piper en_US-lessac-medium sample rate = 22050Hz — inferred from common Piper model metadata; verify at implementation time from .onnx.json
grammY sequential update processing detail — confirmed via deployment guide but exact timeout behavior not benchmarked

Metadata

Confidence breakdown:

Standard stack: HIGH — grammy 1.42.0 verified on npm registry 2026-04-03; ffmpeg-static already installed
Architecture: HIGH — based on direct codebase inspection of all integration points; factory function pattern matches all existing services
Pitfalls: HIGH — sourced from SUMMARY.md pre-research + Telegram Bot API docs; all 6 pitfalls are specific and actionable
Test approach: HIGH — vitest pattern matches 36-voice-pipeline.test.ts exactly; grammY mock strategy follows existing child_process mock pattern

Research date: 2026-04-03
Valid until: 2026-05-03 (grammy releases frequently; re-verify version before install if > 30 days elapsed)
