What You’ll Connect
After this guide, your Telegram bot will answer messages using your own GPU-hosted LLM — delivering ChatGPT-like conversations inside Telegram with multi-turn memory, markdown formatting, and streaming-style progressive replies. The bot connects to your vLLM or Ollama endpoint on dedicated GPU hardware, giving your team or customers a private AI assistant accessible from any device where Telegram runs.
The integration uses the Telegram Bot API with webhooks for real-time message delivery. Your bot server receives messages, calls your OpenAI-compatible API, and sends the response back to the chat. Progressive replies use Telegram’s message editing to simulate streaming — updating the message as new tokens arrive.
Prerequisites
- A GigaGPU server running a self-hosted LLM (setup guide)
- A Telegram Bot Token from @BotFather
- Python 3.10+ with python-telegram-bot and httpx
- A public HTTPS endpoint for the webhook (your GPU server or a separate host)
Integration Steps
Create a Telegram bot via @BotFather and save the token. Set up a webhook URL pointing to your bot server, which can run on the same GPU server or a separate lightweight host. When a user sends a message, Telegram posts it to your webhook; your server calls the GPU endpoint and sends the response back to the chat.
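Registering the webhook is a single call to the Bot API's setWebhook method. A minimal sketch (the token and URL values are placeholders for your own):

```python
API_BASE = "https://api.telegram.org"

def set_webhook_endpoint(token: str) -> str:
    """Build the Bot API setWebhook endpoint for this bot token."""
    return f"{API_BASE}/bot{token}/setWebhook"

def register_webhook(token: str, webhook_url: str) -> dict:
    """Tell Telegram to POST updates to webhook_url (must be HTTPS)."""
    import httpx  # same HTTP client the bot already uses
    resp = httpx.post(set_webhook_endpoint(token), data={"url": webhook_url})
    resp.raise_for_status()
    return resp.json()
```

If you use python-telegram-bot end to end, its Application.run_webhook method can register the URL and start the listening server for you in one step.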
Implement conversation memory by storing chat history per Telegram user in Redis or a simple dictionary. Each incoming message is appended to the user's history, the full history is sent to the LLM for context-aware replies, and the model's response is appended in turn. Cap the history at a token limit so it fits within the model's context window.
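The token cap can be approximated without a tokenizer by estimating roughly four characters per token. A sketch (the trim_history helper and the 3000-token budget are illustrative, not part of the Bot API):

```python
def trim_history(history, max_tokens=3000, chars_per_token=4):
    """Drop the oldest non-system messages until a rough token estimate fits.

    Token counts are approximated as total characters / chars_per_token;
    swap in a real tokenizer (e.g. tiktoken) for an exact count.
    """
    def estimate(msgs):
        return sum(len(m["content"]) for m in msgs) // chars_per_token

    system, rest = history[:1], history[1:]  # always keep the system prompt
    while rest and estimate(system + rest) > max_tokens:
        rest.pop(0)  # drop the oldest exchange first
    return system + rest
```

Call it just before building the request body, so the messages array always fits the model's context window regardless of how long the conversation runs.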
Add progressive replies for long responses: send an initial “Thinking…” message, then edit it with accumulated tokens every few hundred milliseconds. This simulates streaming in Telegram, which does not support true server-sent events in messages.
Code Example
Telegram bot with conversation memory and progressive replies from your self-hosted LLM:
from telegram import Update
from telegram.ext import Application, MessageHandler, filters
import httpx
import json
import time

BOT_TOKEN = "your-telegram-bot-token"
GPU_URL = "https://your-gpu-server.gigagpu.com/v1/chat/completions"
GPU_KEY = "your-api-key"

chat_history = {}  # user_id -> list of messages

async def handle_message(update: Update, context):
    user_id = update.effective_user.id
    user_text = update.message.text
    if user_id not in chat_history:
        chat_history[user_id] = [
            {"role": "system", "content": "You are a helpful assistant."}
        ]
    chat_history[user_id].append({"role": "user", "content": user_text})

    # Send an initial placeholder, then edit it as tokens arrive
    reply = await update.message.reply_text("Thinking...")
    full_response = ""
    last_edit = time.monotonic()

    async with httpx.AsyncClient(timeout=60) as client:
        async with client.stream("POST", GPU_URL, json={
            "model": "meta-llama/Llama-3-70b-chat-hf",
            "messages": chat_history[user_id][-20:],  # last 20 messages
            "stream": True, "max_tokens": 1024,
        }, headers={"Authorization": f"Bearer {GPU_KEY}"}) as resp:
            async for line in resp.aiter_lines():
                if not line.startswith("data: ") or line == "data: [DONE]":
                    continue
                data = json.loads(line[6:])
                token = data["choices"][0]["delta"].get("content", "")
                if not token:
                    continue
                full_response += token
                # Edit at most about once per second to stay under
                # Telegram's message-editing rate limit
                if time.monotonic() - last_edit > 1.0:
                    await reply.edit_text(full_response)
                    last_edit = time.monotonic()

    try:
        await reply.edit_text(full_response, parse_mode="Markdown")
    except Exception:
        # Fall back to plain text if the model emitted invalid markdown
        await reply.edit_text(full_response)
    chat_history[user_id].append(
        {"role": "assistant", "content": full_response}
    )

app = Application.builder().token(BOT_TOKEN).build()
# Exclude commands such as /clear so they can get their own handlers
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
app.run_polling()  # simplest for testing; switch to run_webhook in production
Testing Your Integration
Start the bot server and send a test message in Telegram. Verify the bot responds with AI-generated text and that the message updates progressively during generation. Send follow-up messages to test conversation memory — the bot should reference previous exchanges. Test the /clear command (add a handler) to reset conversation history.
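A /clear handler only needs to delete the user's entry from the history store. A sketch, assuming the chat_history dictionary from the code example (clear_user_history is a hypothetical helper, not a library function):

```python
chat_history: dict = {}  # in the real bot, reuse the dict from the code example

def clear_user_history(history: dict, user_id: int) -> bool:
    """Drop a user's stored conversation; True if there was anything to drop."""
    return history.pop(user_id, None) is not None

async def clear_command(update, context):
    cleared = clear_user_history(chat_history, update.effective_user.id)
    await update.message.reply_text(
        "Conversation cleared." if cleared else "Nothing to clear.")

# Register it next to the message handler:
# from telegram.ext import CommandHandler
# app.add_handler(CommandHandler("clear", clear_command))
```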
Test with multiple users simultaneously to confirm chat histories are isolated per user. Test with long responses to verify Telegram’s message editing works smoothly. Check the Telegram rate limits — the bot can edit a message roughly once per second.
Production Tips
Move chat history from in-memory storage to Redis for persistence across server restarts and horizontal scaling. Add user authentication if the bot is for internal use — check the Telegram user ID against an allowlist. Implement a /model command that lets users switch between models hosted on your GPU server.
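Moving the history to Redis mostly means serializing each user's message list to JSON under a per-user key. A sketch written against any client with get/set methods, such as a redis.Redis instance (the history: key prefix is an arbitrary choice):

```python
import json

def load_history(kv, user_id, system_prompt="You are a helpful assistant."):
    """Fetch a user's history from a key-value store (e.g. redis.Redis);
    fall back to a fresh history containing only the system message."""
    raw = kv.get(f"history:{user_id}")
    if raw is None:
        return [{"role": "system", "content": system_prompt}]
    return json.loads(raw)

def save_history(kv, user_id, history):
    """Persist the full message list as JSON under the user's key."""
    kv.set(f"history:{user_id}", json.dumps(history))
```

In the handler, replace the in-memory dictionary lookups with load_history at the start and save_history after appending the assistant's reply; the rest of the code is unchanged.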
For group chats, configure the bot to respond only when mentioned (@botname) to avoid triggering on every message. Add rate limiting per user to prevent GPU abuse. Build a comprehensive AI chatbot with admin controls, usage analytics, and custom personas per group. Explore more tutorials or get started with GigaGPU to power your Telegram bot.
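The mention check for group chats can be a small guard at the top of the message handler. A sketch (should_respond is a hypothetical helper; the bot username comes from context.bot.username at runtime):

```python
def should_respond(text: str, bot_username: str, chat_type: str) -> bool:
    """Always respond in private chats; in groups, only when @mentioned."""
    if chat_type == "private":
        return True
    return f"@{bot_username}" in text

# At the top of handle_message:
# if not should_respond(update.message.text or "",
#                       context.bot.username,
#                       update.effective_chat.type):
#     return
```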