Telegram has the highest throughput ceiling for bots among the major chat platforms: 30 messages/sec to different chats and 1 message/sec to the same chat. A self-hosted LLM on the RTX 5060 Ti 16GB at our UK dedicated GPU hosting keeps pace without per-token fees. The Blackwell card runs Mistral 7B FP8 at 122 t/s and Llama 3 8B FP8 at 112 t/s, which matches or beats Telegram’s delivery ceiling on one card.
Contents
- python-telegram-bot setup
- Webhook vs long-polling
- Streaming responses
- Capacity by bot size
- Image and voice commands
Bot setup
Use python-telegram-bot v20+. The handler calls your vLLM endpoint and edits the reply as tokens arrive, giving users a typing-effect response.
from openai import AsyncOpenAI

# Point at your vLLM server's OpenAI-compatible endpoint.
llm = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def handle(update, context):
    msg = await update.message.reply_text("...")
    buf, last_edit = "", 0
    stream = await llm.chat.completions.create(
        model="mistral-7b-fp8",
        messages=[{"role": "user", "content": update.message.text}],
        stream=True,
    )
    async for chunk in stream:
        buf += chunk.choices[0].delta.content or ""
        # Edit every ~60 new characters. A `len(buf) % 60 == 0` check
        # almost never fires, because chunks carry multiple characters.
        if len(buf) - last_edit >= 60:
            await msg.edit_text(buf)
            last_edit = len(buf)
    if len(buf) > last_edit:  # skip a no-op final edit
        await msg.edit_text(buf)
Webhook vs long-polling
| Mode | Pros | Cons |
|---|---|---|
| Long-polling (getUpdates) | No public URL required, easy local dev | ~500 ms extra latency, single-process only |
| Webhook | Sub-100 ms delivery, horizontal scaling | Needs HTTPS and a valid certificate |
Run webhooks behind Nginx with a Cloudflare origin certificate. Telegram accepts any CA, so there is no Let’s Encrypt dance. For low-volume bots, long-polling is fine and simpler.
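A minimal Nginx server block for the setup above. The domain, port, webhook path, and certificate paths are all placeholders; the bot process is assumed to listen on localhost behind the proxy.

```nginx
server {
    listen 443 ssl;
    server_name bot.example.com;

    # Cloudflare origin certificate -- Telegram accepts any CA
    ssl_certificate     /etc/ssl/cloudflare-origin.pem;
    ssl_certificate_key /etc/ssl/cloudflare-origin.key;

    # Telegram POSTs updates to the path you passed to setWebhook
    location /telegram-webhook {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
    }
}
```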
Streaming
Telegram allows roughly 1 edit/sec on the same message. Buffer tokens and edit every 60-80 characters to stay under the rate limit while keeping the typing effect. For a 300-token reply, perceived latency drops from ~2.5 s (waiting for the full blob) to ~300 ms until the first words are visible.
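The buffering rule above can be sketched as a small helper that gates each `edit_text` call on both new characters and elapsed time. The 60-character threshold comes from the text; the 1-second interval mirrors Telegram's approximate edit limit, and the class name is hypothetical.

```python
import time

class EditThrottle:
    """Decide when a streamed buffer has grown enough to justify
    another edit_text call, staying under ~1 edit/sec."""

    def __init__(self, min_chars=60, min_interval=1.0):
        self.min_chars = min_chars        # new characters required per edit
        self.min_interval = min_interval  # seconds between edits
        self.last_len = 0
        self.last_ts = 0.0

    def should_edit(self, buf, now=None):
        now = time.monotonic() if now is None else now
        if len(buf) - self.last_len < self.min_chars:
            return False          # not enough new text yet
        if now - self.last_ts < self.min_interval:
            return False          # too soon after the last edit
        self.last_len, self.last_ts = len(buf), now
        return True
```

In the streaming loop, replace the inline length check with `if throttle.should_edit(buf): await msg.edit_text(buf)`.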
Capacity
| Bot profile | Peak msg/min | 5060 Ti |
|---|---|---|
| Small group bot (500 users) | 10-30 | Trivial |
| Public service bot (50k users) | 100-300 | Comfortable (16 concurrent streams) |
| News channel + interactive Q&A (500k subs) | 500-1000 | Queue some; still under Telegram’s 1800/min ceiling |
Image and voice commands
/image {prompt} routes to SDXL Lightning 4-step at about 2.2 seconds for 1024×1024 on the same card, which you send back as a JPEG with reply_photo. Telegram voice messages (Opus OGG) can be transcribed with Whisper Turbo at roughly 60x real-time on the 5060 Ti, so a 30-second voice note is transcribed in under a second. See our Whisper benchmark.
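A sketch of the `/image` command path, assuming the pieces the text describes: `generate` stands in for whatever async call hits your SDXL Lightning endpoint and returns JPEG bytes (its name and signature are assumptions, not a real library API), while `extract_prompt` handles the `/image@BotName` form Telegram uses in group chats.

```python
from io import BytesIO

def extract_prompt(text, command="/image"):
    """Pull the prompt out of a '/image {prompt}' message."""
    first, _, rest = text.partition(" ")
    if first.split("@")[0] != command:  # strip the @BotName suffix
        return ""
    return rest.strip()

async def image_command(update, context, generate):
    """`generate` is a hypothetical async callable: prompt -> JPEG bytes,
    backed by SDXL Lightning (~2.2 s per 1024x1024 on the 5060 Ti)."""
    prompt = extract_prompt(update.message.text)
    if not prompt:
        await update.message.reply_text("Usage: /image {prompt}")
        return
    jpeg = await generate(prompt)
    await update.message.reply_photo(BytesIO(jpeg), caption=prompt)
```

Voice notes follow the same shape: download the Opus OGG with `get_file`, run it through Whisper, and reply with the transcript.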
Private Telegram AI bot
Streamed replies on Blackwell 16GB. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: chatbot backend, customer support, Whisper benchmark, FP8 Llama deployment, internal tooling.