What You’ll Connect
After this guide, your Intercom workspace will have AI-powered conversation assistance driven by your own GPU server — no API costs, no rate limits. When a customer sends a message, your middleware analyzes it using a large language model on dedicated GPU hardware, then posts a suggested reply as an internal note for the agent or sends an automated response directly.
The integration uses Intercom webhooks to stream conversation events to your middleware, which calls your vLLM or Ollama endpoint for inference. Support teams get AI-drafted answers without any customer conversation data leaving your private infrastructure.
Webhook Event               –>  Middleware (Node.js)  –>  GPU Server (vLLM)
(customer msg:                  parses message,           LLM inference on
conversation.user.created       builds prompt             dedicated GPU
or conversation.user.replied)

Agent sees      <–  Intercom API   <–  Middleware posts  <–  AI draft
AI-drafted          admin note         note or reply         returned
suggestion          or reply

Prerequisites
- A GigaGPU server with an LLM behind an OpenAI-compatible API (self-host guide)
- An Intercom workspace with API access (Developer Hub app)
- A middleware server running Node.js 18+ or Python 3.10+
- HTTPS endpoints for the middleware and GPU server (Nginx proxy guide)
- Intercom access token with conversation read/write permissions
Integration Steps
Create an app in the Intercom Developer Hub. Under Webhooks, subscribe to the conversation.user.created and conversation.user.replied topics. Set the webhook URL to your middleware endpoint (e.g., https://middleware.yourdomain.com/intercom/webhook).
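For reference, a delivered payload looks roughly like this (abridged and illustrative — exact fields vary by topic and API version, so inspect a real delivery in the Developer Hub before relying on any of them):

```json
{
  "type": "notification_event",
  "topic": "conversation.user.replied",
  "data": {
    "item": {
      "type": "conversation",
      "id": "215061924378",
      "source": { "body": "<p>Hi, I was charged twice this month.</p>" },
      "conversation_parts": {
        "conversation_parts": [
          { "part_type": "comment", "body": "<p>Also, can I get a refund?</p>" }
        ]
      }
    }
  }
}
```

Note that message bodies arrive as HTML, which is why the middleware below strips tags before prompting the model.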
Your middleware receives each webhook payload containing the conversation ID and latest message text. It sends the customer’s message to your GPU inference API along with a system prompt tailored to your support context. The AI generates a draft reply.
The middleware then posts the draft back to Intercom using the Conversations API — either as an internal note (visible only to agents) or as an automated reply (sent directly to the customer). For most teams, internal notes are safer initially, letting agents review before responding.
Code Example
This Node.js middleware handles Intercom webhooks and calls your OpenAI-compatible inference endpoint:
import express from 'express';
import OpenAI from 'openai';
import axios from 'axios';

const app = express();
app.use(express.json());

// OpenAI-compatible client pointed at the self-hosted vLLM/Ollama endpoint
const llm = new OpenAI({
  baseURL: 'https://your-gpu-server.gigagpu.com/v1',
  apiKey: process.env.GPU_API_KEY,
});

const INTERCOM_TOKEN = process.env.INTERCOM_TOKEN;

app.post('/intercom/webhook', async (req, res) => {
  const { topic, data } = req.body;

  // Ignore topics we did not subscribe to
  if (!['conversation.user.created', 'conversation.user.replied'].includes(topic)) {
    return res.sendStatus(200);
  }

  // Acknowledge immediately so Intercom does not retry while inference runs
  res.sendStatus(200);

  try {
    const conversationId = data.item.id;

    // The latest message lives in conversation_parts for replies,
    // or in source for the first message of a new conversation
    const userMessage = data.item.conversation_parts?.conversation_parts?.[0]?.body
      || data.item.source?.body || '';
    const plainText = userMessage.replace(/<[^>]*>/g, '').trim(); // strip Intercom's HTML

    const completion = await llm.chat.completions.create({
      model: 'meta-llama/Llama-3-70b-chat-hf',
      messages: [
        { role: 'system', content: 'You are a customer support agent. Draft a helpful, professional reply.' },
        { role: 'user', content: plainText },
      ],
      max_tokens: 400,
    });
    const draft = completion.choices[0].message.content;

    // Post the draft back as an internal note so agents review before sending
    await axios.post(
      `https://api.intercom.io/conversations/${conversationId}/reply`,
      { message_type: 'note', type: 'admin', admin_id: process.env.ADMIN_ID, body: `AI Draft: ${draft}` },
      { headers: { Authorization: `Bearer ${INTERCOM_TOKEN}`, 'Content-Type': 'application/json' } },
    );
  } catch (err) {
    console.error('AI draft failed:', err.message);
  }
});

app.listen(3000);
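In production you should also verify that each webhook genuinely came from Intercom. Intercom signs the raw request body with your app's client secret and sends the result in the `X-Hub-Signature` header as `sha1=<hex digest>`. A minimal verification helper, assuming you capture the raw body (e.g., via `express.json({ verify: (req, _res, buf) => { req.rawBody = buf; } })`):

```javascript
import crypto from 'crypto';

// Returns true when the X-Hub-Signature header matches an HMAC-SHA1
// of the raw request body keyed with your Intercom client secret.
function verifyIntercomSignature(rawBody, signatureHeader, clientSecret) {
  const expected = 'sha1=' + crypto
    .createHmac('sha1', clientSecret)
    .update(rawBody)
    .digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader || '');
  // timingSafeEqual avoids leaking information via comparison timing
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Reject requests that fail this check with a 401 before doing any inference work.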
Testing Your Integration
Send a test message through your Intercom Messenger as a visitor or test user. Your middleware should receive the webhook, call the GPU server, and post an internal note on the conversation within a few seconds. Open the conversation in the Intercom inbox to verify the AI draft appears as a note.
Test with different types of inquiries — billing questions, technical issues, feature requests — to validate that the system prompt produces appropriate responses. Check webhook delivery logs in Intercom’s Developer Hub for any failed deliveries.
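You can also exercise the message-extraction logic without a full webhook round trip. This helper (a hypothetical name, mirroring the handler's logic above) pulls the newest customer message out of a payload item and strips Intercom's HTML:

```javascript
// Extract the newest customer message from a webhook payload item and
// strip HTML tags, mirroring the logic in the webhook handler.
function extractPlainText(item) {
  const html = item.conversation_parts?.conversation_parts?.[0]?.body
    || item.source?.body || '';
  return html.replace(/<[^>]*>/g, '').trim();
}

// Simulated payload items for a new conversation and for a reply
const newConvo = { source: { body: '<p>My invoice is wrong.</p>' } };
const reply = {
  source: { body: '<p>first message</p>' },
  conversation_parts: { conversation_parts: [{ body: '<p>Any update?</p>' }] },
};
console.log(extractPlainText(newConvo)); // "My invoice is wrong."
console.log(extractPlainText(reply));    // "Any update?"
```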
Production Tips
Intercom conversations often span multiple messages. For better AI responses, fetch the full conversation history via the API and include the last 3-5 exchanges in the prompt. This gives the model context for follow-up questions rather than treating each message in isolation.
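One way to build that context, assuming you have already fetched the conversation's parts via the API, is a small helper (illustrative, not from the Intercom SDK) that converts the most recent parts into chat-completion messages:

```javascript
// Convert recent Intercom conversation parts into chat-completion
// messages: customer comments become "user" turns, admin replies
// become "assistant" turns; notes and other part types are skipped.
function historyToMessages(parts, maxExchanges = 5) {
  return parts
    .filter((p) => p.part_type === 'comment' && p.body)
    .slice(-maxExchanges)
    .map((p) => ({
      role: p.author?.type === 'admin' ? 'assistant' : 'user',
      content: p.body.replace(/<[^>]*>/g, '').trim(),
    }));
}

const parts = [
  { part_type: 'comment', body: '<p>Hi, billing question</p>', author: { type: 'user' } },
  { part_type: 'comment', body: '<p>Sure, what is it?</p>', author: { type: 'admin' } },
  { part_type: 'note', body: '<p>internal note</p>', author: { type: 'admin' } },
  { part_type: 'comment', body: '<p>I was double charged</p>', author: { type: 'user' } },
];
console.log(historyToMessages(parts));
```

Prepend your system prompt to the returned array before calling the completion endpoint.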
Use Intercom’s conversation tags to mark which conversations should trigger AI assistance. Not every chat needs AI — simple handoffs or spam conversations can be excluded by checking tags before calling the GPU endpoint.
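A simple gate might look like this — the tag names here (`no-ai`, `spam`) are assumptions, so substitute whatever convention your team uses:

```javascript
// Decide whether a conversation should get an AI draft, based on its
// tags. Conversations tagged "no-ai" or "spam" are skipped; flip the
// logic to require an "ai-assist" tag if you prefer an opt-in model.
function shouldAssist(conversation) {
  const tags = (conversation.tags?.tags || []).map((t) => t.name);
  return !tags.includes('no-ai') && !tags.includes('spam');
}

console.log(shouldAssist({ tags: { tags: [{ name: 'billing' }] } })); // true
console.log(shouldAssist({ tags: { tags: [{ name: 'spam' }] } }));    // false
console.log(shouldAssist({}));                                        // true
```

Call this before the completion request so excluded conversations never hit the GPU endpoint.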
For teams using Intercom as their primary customer communication platform, self-hosted AI on open-source models keeps all conversation data private. Authenticate every request between the middleware and GPU server (see our secure API guide). Browse more tutorials or get started with GigaGPU to add AI to your Intercom workflow.