
Connect Intercom to Self-Hosted AI on GPU

Add a private AI layer to Intercom using your own GPU server. This tutorial covers Intercom's webhook system, a middleware for routing conversations to your self-hosted LLM, and configuring AI-assisted replies that help agents respond faster.

What You’ll Connect

After this guide, your Intercom workspace will have AI-powered conversation assistance driven by your own GPU server — no API costs, no rate limits. When a customer sends a message, your middleware analyses it using a large language model on dedicated GPU hardware, then posts a suggested reply as an internal note for the agent or sends an automated response directly.

The integration uses Intercom webhooks to stream conversation events to your middleware, which calls your vLLM or Ollama endpoint for inference. Support teams get AI-drafted answers without any customer conversation data leaving your private infrastructure.

Webhook event (conversation.user.created, customer message)
        │
        ▼
Middleware (Node.js): parses the message, builds the prompt
        │
        ▼
GPU server (vLLM): LLM inference on dedicated GPU
        │  AI draft returned
        ▼
Middleware posts a note or reply via the Intercom API
        │
        ▼
Agent sees the AI-drafted suggestion (or the customer receives the reply)

Prerequisites

  • A GigaGPU server with an LLM behind an OpenAI-compatible API (self-host guide)
  • An Intercom workspace with API access (Developer Hub app)
  • A middleware server running Node.js 18+ or Python 3.10+
  • HTTPS endpoints for the middleware and GPU server (Nginx proxy guide)
  • Intercom access token with conversation read/write permissions

Integration Steps

Create an app in the Intercom Developer Hub. Under Webhooks, subscribe to the conversation.user.created and conversation.user.replied topics. Set the webhook URL to your middleware endpoint (e.g., https://middleware.yourdomain.com/intercom/webhook).
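For reference, the notification payload for these topics looks roughly like this. This is a simplified sketch: the field names follow Intercom's conversation model, but verify the exact shape against the request logs in your Developer Hub. The `extractMessage` helper is a hypothetical utility mirroring the extraction the middleware performs:

```javascript
// Simplified sketch of an Intercom webhook payload. Check your
// Developer Hub request logs for the authoritative shape.
const samplePayload = {
  type: 'notification_event',
  topic: 'conversation.user.replied',
  data: {
    item: {
      type: 'conversation',
      id: '123456789',
      source: { body: '<p>Hi, I was double-charged this month.</p>' },
      conversation_parts: {
        conversation_parts: [
          { part_type: 'comment', body: '<p>Any update on my refund?</p>' },
        ],
      },
    },
  },
};

// Hypothetical helper: pull the latest customer message out of a payload,
// falling back to the conversation's opening message.
function extractMessage(payload) {
  const item = payload.data.item;
  return (
    item.conversation_parts?.conversation_parts?.[0]?.body ||
    item.source?.body ||
    ''
  );
}

console.log(extractMessage(samplePayload));
// prints "<p>Any update on my refund?</p>"
```

Note that `conversation.user.created` events carry the first message in `source.body` with an empty parts list, which is why the fallback matters.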

Your middleware receives each webhook payload containing the conversation ID and latest message text. It sends the customer’s message to your GPU inference API along with a system prompt tailored to your support context. The AI generates a draft reply.

The middleware then posts the draft back to Intercom using the Conversations API — either as an internal note (visible only to agents) or as an automated reply (sent directly to the customer). For most teams, internal notes are safer initially, letting agents review before responding.
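The choice between the two delivery modes comes down to the `message_type` field on the Conversations API reply call: `note` stays internal, `comment` goes to the customer. A small hypothetical helper makes the switch explicit (`adminId` is a placeholder for your Intercom admin ID):

```javascript
// Build the body for Intercom's "reply to a conversation" endpoint.
// message_type 'note' is visible only to agents; 'comment' is sent
// straight to the customer.
function buildReplyBody(draft, adminId, { asNote = true } = {}) {
  return {
    message_type: asNote ? 'note' : 'comment',
    type: 'admin',
    admin_id: adminId,
    body: asNote ? `AI Draft: ${draft}` : draft,
  };
}

const note = buildReplyBody('Thanks for reaching out!', '42');
console.log(note.message_type); // prints "note"

const reply = buildReplyBody('Thanks for reaching out!', '42', { asNote: false });
console.log(reply.message_type); // prints "comment"
```

Keeping this in one helper makes it easy to flip a conversation from review mode to auto-reply later without touching the webhook handler.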

Code Example

This Node.js middleware handles Intercom webhooks and calls your FastAPI inference server:

import express from 'express';
import OpenAI from 'openai';
import axios from 'axios';

const app = express();
app.use(express.json());

const llm = new OpenAI({
  baseURL: 'https://your-gpu-server.gigagpu.com/v1',
  apiKey: process.env.GPU_API_KEY,
});

const INTERCOM_TOKEN = process.env.INTERCOM_TOKEN;

app.post('/intercom/webhook', async (req, res) => {
  const { topic, data } = req.body;
  if (!['conversation.user.created', 'conversation.user.replied'].includes(topic)) {
    return res.sendStatus(200);
  }

  // Acknowledge immediately: Intercom retries webhooks that respond
  // slowly, and LLM inference can take several seconds.
  res.sendStatus(200);

  try {
    const conversationId = data.item.id;
    const userMessage = data.item.conversation_parts?.conversation_parts?.[0]?.body
      || data.item.source?.body || '';

    // Intercom message bodies are HTML; strip tags before prompting.
    const plainText = userMessage.replace(/<[^>]*>/g, '').trim();
    if (!plainText) return;

    const completion = await llm.chat.completions.create({
      model: 'meta-llama/Llama-3-70b-chat-hf',
      messages: [
        { role: 'system', content: 'You are a customer support agent. Draft a helpful, professional reply.' },
        { role: 'user', content: plainText }
      ],
      max_tokens: 400,
    });

    const draft = completion.choices[0].message.content;

    // Post the draft as an internal note so agents can review it first.
    await axios.post(
      `https://api.intercom.io/conversations/${conversationId}/reply`,
      { message_type: 'note', type: 'admin', admin_id: process.env.ADMIN_ID, body: `AI Draft: ${draft}` },
      { headers: { Authorization: `Bearer ${INTERCOM_TOKEN}`, 'Content-Type': 'application/json' } }
    );
  } catch (err) {
    // Never let an inference failure crash the webhook route.
    console.error('AI draft failed:', err.message);
  }
});

app.listen(3000);
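The regex strip in the handler is deliberately minimal: it removes tags but leaves HTML entities like `&amp;` in place. A slightly more thorough plain-text conversion might look like the sketch below; for complex markup, a proper HTML parser is the safer choice:

```javascript
// Convert an Intercom HTML message body to plain text: preserve line
// breaks, drop tags, decode common entities, collapse stray whitespace.
function htmlToText(html) {
  const entities = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'", '&nbsp;': ' ' };
  return html
    .replace(/<br\s*\/?>/gi, '\n')        // keep line breaks as newlines
    .replace(/<[^>]*>/g, '')              // strip remaining tags
    .replace(/&amp;|&lt;|&gt;|&quot;|&#39;|&nbsp;/g, (m) => entities[m])
    .replace(/[ \t]+/g, ' ')
    .trim();
}

console.log(htmlToText('<p>Billing &amp; invoices<br>are broken</p>'));
// prints "Billing & invoices" and "are broken" on two lines
```

Decoding entities after stripping tags matters: doing it in the other order would let a decoded `<` be misread as the start of a tag.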

Testing Your Integration

Send a test message through your Intercom Messenger as a visitor or test user. Your middleware should receive the webhook, call the GPU server, and post an internal note on the conversation within a few seconds. Open the conversation in the Intercom inbox to verify the AI draft appears as a note.

Test with different types of inquiries — billing questions, technical issues, feature requests — to validate that the system prompt produces appropriate responses. Check webhook delivery logs in Intercom’s Developer Hub for any failed deliveries.
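You can also exercise the middleware locally before wiring up Intercom by posting a hand-built payload to the webhook route. The snippet below is a sketch: the payload is a simplified stand-in for a real notification, and `MIDDLEWARE_URL` is a placeholder for your endpoint:

```javascript
// A fake conversation.user.replied notification, shaped like the
// simplified payloads above.
const fakeEvent = {
  type: 'notification_event',
  topic: 'conversation.user.replied',
  data: {
    item: {
      type: 'conversation',
      id: 'test-1',
      conversation_parts: {
        conversation_parts: [{ body: '<p>How do I reset my password?</p>' }],
      },
    },
  },
};

// POST the event to the middleware exactly as Intercom would.
async function sendTestEvent(url) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(fakeEvent),
  });
  return res.status;
}

// e.g. MIDDLEWARE_URL=http://localhost:3000/intercom/webhook
if (process.env.MIDDLEWARE_URL) {
  sendTestEvent(process.env.MIDDLEWARE_URL).then((s) => console.log('middleware responded:', s));
}
```

With the handler above, the middleware should return 200 immediately and post the note shortly after (the Intercom API call will fail for a made-up conversation ID, which is fine for a smoke test of the inference path).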

Production Tips

Intercom conversations often span multiple messages. For better AI responses, fetch the full conversation history via the API and include the last 3-5 exchanges in the prompt. This gives the model context for follow-up questions rather than treating each message in isolation.
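Turning fetched history into chat messages is mostly a mapping exercise. Assuming you have retrieved the conversation via `GET /conversations/{id}` (the part list below is a simplified sketch of Intercom's shape), a hypothetical helper that keeps the last few exchanges might look like:

```javascript
// Map Intercom conversation parts to chat-completion messages, keeping
// only the most recent `limit` parts so prompts stay small. Customer
// parts become 'user' turns; admin parts become 'assistant' turns.
function historyToMessages(parts, limit = 6) {
  return parts
    .filter((p) => p.body)
    .slice(-limit)
    .map((p) => ({
      role: p.author?.type === 'admin' ? 'assistant' : 'user',
      content: p.body.replace(/<[^>]*>/g, '').trim(),
    }));
}

// Simplified sketch of parts from GET /conversations/{id}
const parts = [
  { author: { type: 'user' }, body: '<p>My invoice looks wrong.</p>' },
  { author: { type: 'admin' }, body: '<p>Which line item?</p>' },
  { author: { type: 'user' }, body: '<p>The GPU add-on, billed twice.</p>' },
];

console.log(historyToMessages(parts, 2).length); // prints 2
console.log(historyToMessages(parts)[0].role);   // prints "user"
```

The resulting array slots straight into the `messages` parameter of the chat completion call, after the system prompt.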

Use Intercom’s conversation tags to mark which conversations should trigger AI assistance. Not every chat needs AI — simple handoffs or spam conversations can be excluded by checking tags before calling the GPU endpoint.
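A tag check is a quick guard before spending GPU time. Assuming the middleware has fetched the conversation (whose `tags.tags` list follows Intercom's conversation model), a hypothetical filter like this can gate inference; the tag names here are made up for illustration:

```javascript
// Decide whether a conversation should get an AI draft based on its
// tags: opt in with 'ai-assist', opt out with 'spam' or 'handoff'.
function shouldAssist(conversation) {
  const tags = (conversation.tags?.tags ?? []).map((t) => t.name);
  if (tags.includes('spam') || tags.includes('handoff')) return false;
  return tags.includes('ai-assist');
}

console.log(shouldAssist({ tags: { tags: [{ name: 'ai-assist' }] } })); // prints true
console.log(shouldAssist({ tags: { tags: [{ name: 'spam' }, { name: 'ai-assist' }] } })); // prints false
console.log(shouldAssist({})); // prints false
```

An explicit opt-in default (returning false for untagged conversations) keeps a new rollout conservative; flip the final return to `true` once you trust the drafts.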

For teams using Intercom as their primary customer communication platform, self-hosted AI on open-source models keeps all conversation data private. Authenticate every request between your middleware and GPU server by following our secure API guide. Browse more tutorials or get started with GigaGPU to add AI to your Intercom workflow.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
