What You’ll Connect
After this guide, your Microsoft Teams environment will have an AI chatbot powered entirely by your own GPU server — no API costs, no rate limits. Team members will message the bot directly or mention it in channels, and responses come from a large language model running on dedicated GPU infrastructure you control.
The integration uses the Microsoft Bot Framework to receive messages from Teams, routes them through a lightweight middleware service, and forwards prompts to your vLLM or Ollama inference endpoint. This gives enterprise teams a private AI assistant embedded in their daily workflow without exposing sensitive conversations to external providers.
Azure Bot Service  -->  Bot Framework Middleware  -->  GPU Server (vLLM)
        |                         |                          |
 User @mentions          TeamsActivityHandler           LLM inference
   or DMs bot           processes turn context        on dedicated GPU
        |                         |                          |
Bot replies  <--  Teams Connector  <--  Middleware returns  <--  Model completion

Prerequisites
- A GigaGPU dedicated GPU server with an LLM exposed via an OpenAI-compatible API (setup guide)
- An Azure account with permission to register applications in Entra ID (formerly Azure AD)
- Node.js 18+ or Python 3.10+ on the machine hosting your bot middleware
- HTTPS endpoint for your middleware — use Nginx as a reverse proxy with a valid TLS certificate
- Microsoft Teams admin access to sideload or publish custom apps
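For the HTTPS requirement, a minimal Nginx server block might look like the following. This is a sketch, not a complete hardened config: it assumes the middleware listens on localhost port 3978 (the Bot Framework SDK's conventional default), the `bot.yourdomain.com` hostname from the steps below, and Let's Encrypt certificate paths — adjust all three for your environment.

```nginx
server {
    listen 443 ssl;
    server_name bot.yourdomain.com;

    # Assumed Let's Encrypt paths; substitute your own certificate files.
    ssl_certificate     /etc/letsencrypt/live/bot.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/bot.yourdomain.com/privkey.pem;

    location /api/messages {
        # Forward Bot Framework traffic to the local middleware process.
        proxy_pass http://127.0.0.1:3978;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```

Azure Bot Service will only deliver activities to an endpoint with a valid, publicly trusted TLS certificate, so self-signed certificates will not work here.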
Integration Steps
Begin in the Azure Portal. Navigate to App registrations and create a new registration. Select “Accounts in this organizational directory only” for tenant scope. Note the Application (client) ID and generate a client secret — these authenticate your bot with Azure Bot Service.
Next, create an Azure Bot resource. Link it to your app registration and set the messaging endpoint to your middleware URL (e.g., https://bot.yourdomain.com/api/messages). Enable the Microsoft Teams channel in the bot resource’s Channels blade.
Build the middleware using the Bot Framework SDK. The bot listens for incoming activities, extracts the user’s text, calls your self-hosted AI endpoint, and returns the completion as a reply. Package the bot as a Teams app using a manifest that references your Azure app ID, then sideload it into your tenant.
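The Teams app package is a zip containing two icon PNGs and a manifest. A minimal manifest along these lines should work; the placeholder IDs must be replaced with your Application (client) ID from the app registration, and the schema version shown is one current version — check Microsoft's Teams manifest schema reference for the latest.

```json
{
  "$schema": "https://developer.microsoft.com/en-us/json-schemas/teams/v1.16/MicrosoftTeams.schema.json",
  "manifestVersion": "1.16",
  "version": "1.0.0",
  "id": "<your-app-client-id>",
  "packageName": "com.yourdomain.aibot",
  "developer": {
    "name": "Your Org",
    "websiteUrl": "https://yourdomain.com",
    "privacyUrl": "https://yourdomain.com/privacy",
    "termsOfUseUrl": "https://yourdomain.com/terms"
  },
  "name": { "short": "AI Assistant" },
  "description": {
    "short": "Self-hosted AI chatbot",
    "full": "AI assistant backed by a dedicated GPU server."
  },
  "icons": { "outline": "outline.png", "color": "color.png" },
  "accentColor": "#2B5797",
  "bots": [
    {
      "botId": "<your-app-client-id>",
      "scopes": ["personal", "team"]
    }
  ]
}
```

The `"personal"` and `"team"` scopes enable the direct-message and channel @mention behaviour described above.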
Code Example
This Node.js bot uses the Bot Framework SDK and calls your GPU-hosted model through its OpenAI-compatible endpoint (vLLM's built-in server, for example):
const { TeamsActivityHandler, TurnContext } = require('botbuilder');
const OpenAI = require('openai');

const llm = new OpenAI({
  baseURL: 'https://your-gpu-server.gigagpu.com/v1',
  apiKey: process.env.GPU_API_KEY,
});

class AIBot extends TeamsActivityHandler {
  constructor() {
    super();
    // Register the handler in the constructor: the SDK's onMessage takes a
    // callback and must not be overridden as a method.
    this.onMessage(async (context, next) => {
      // removeRecipientMention returns the activity text with the bot's
      // @mention stripped, so channel messages yield a clean prompt.
      const userText = TurnContext.removeRecipientMention(context.activity);
      const prompt = (userText || context.activity.text || '').trim();

      const completion = await llm.chat.completions.create({
        model: 'meta-llama/Llama-3-70b-chat-hf',
        messages: [
          { role: 'system', content: 'You are a corporate AI assistant.' },
          { role: 'user', content: prompt },
        ],
        max_tokens: 1024,
      });

      await context.sendActivity(completion.choices[0].message.content);
      await next();
    });
  }
}

module.exports.AIBot = AIBot;
Testing Your Integration
Open Teams and send a direct message to your bot: “Summarise our Q3 priorities.” The reply should appear within seconds. Test in both one-to-one chats and channel @mentions to confirm the bot handles both activity types.
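If the bot should behave differently in the two cases, the incoming activity already says where it arrived: Teams sets conversation.conversationType on each activity. A small helper, sketched here (the function name and return labels are ours), can branch on it:

```javascript
// Classify a Teams activity by where it arrived. Teams sets
// conversationType to 'personal' for 1:1 chats, 'groupChat' for group
// chats, and 'channel' for channel @mentions.
function activityScope(activity) {
  const type = activity?.conversation?.conversationType;
  if (type === 'personal') return 'direct-message';
  if (type === 'channel') return 'channel-mention';
  return 'group-chat';
}
```

In a channel, for instance, you might prepend the sender's name to the prompt so the model knows who is asking; in a direct message that context is unnecessary.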
Use the Bot Framework Emulator for local debugging before deploying to Azure. Verify that your GPU server logs show incoming inference requests and that responses return valid completions. Check the Azure Bot Service metrics blade for message delivery success rates.
Production Tips
Teams enforces a response timeout — if your model is large and inference takes more than 15 seconds, send a “typing” indicator immediately and follow up with the completed answer. This prevents Teams from flagging the bot as unresponsive.
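A sketch of that pattern, with `send` standing in for context.sendActivity and `infer` for your LLM call (both parameter names are placeholders for this example); `{ type: 'typing' }` is the Bot Framework typing activity that Teams renders as the "..." indicator:

```javascript
// Send a typing indicator immediately, then deliver the real answer once
// inference finishes.
async function replyWithTyping(send, infer) {
  await send({ type: 'typing' }); // shown to the user right away
  const answer = await infer();   // slow GPU inference happens here
  await send(answer);
  return answer;
}
```

Inside the onMessage handler this becomes `replyWithTyping(a => context.sendActivity(a), () => callLLM(prompt))`. If inference can outlast the HTTP timeout entirely, the usual escalation is to acknowledge the turn and deliver the answer later as a proactive message via the adapter's continueConversation.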
For multi-turn conversations, maintain a session store keyed by Teams conversation ID. Include the last few exchanges as context in each LLM call, but cap total tokens to stay within your model’s window. Redis works well for ephemeral conversation state.
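Redis is the right store in production; as an illustration of the trimming logic only, here is an in-memory sketch (the class name and the rough four-characters-per-token estimate are our assumptions, not a real tokenizer):

```javascript
// Minimal in-memory conversation store keyed by Teams conversation ID.
class ConversationMemory {
  constructor(maxTokens = 2048) {
    this.maxTokens = maxTokens;
    this.store = new Map(); // conversationId -> [{ role, content }, ...]
  }

  append(conversationId, role, content) {
    const history = this.store.get(conversationId) || [];
    history.push({ role, content });
    // Crude estimate: ~4 characters per token. Drop the oldest messages
    // until the history fits the budget, always keeping the newest one.
    const estimate = msgs =>
      msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
    while (history.length > 1 && estimate(history) > this.maxTokens) {
      history.shift();
    }
    this.store.set(conversationId, history);
  }

  messages(conversationId) {
    return this.store.get(conversationId) || [];
  }
}
```

Each turn, append the user message and the assistant reply, then send `[systemMessage, ...memory.messages(id)]` to the LLM; the key comes from `context.activity.conversation.id`.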
Lock down access by requiring authentication (an API key or mutual TLS) between the middleware and the GPU server, so the inference endpoint is never exposed unauthenticated. Apply Azure Conditional Access policies so only authorised tenant users can interact with the bot. For organisations evaluating AI chatbot hosting on open-source models, a dedicated GPU removes per-query cost uncertainty entirely. Explore more tutorials or provision your GigaGPU server to bring AI into Teams today.