What You’ll Connect
After this guide, your Notion workspace will have AI capabilities powered by your own GPU server — no API costs, no rate limits. A middleware service reads Notion pages via the Notion API, sends content to your vLLM or Ollama endpoint for processing, and writes AI-generated summaries, translations, or analyses back into Notion databases.
This integration is ideal for teams that store documentation, project notes, or knowledge bases in Notion and want to apply AI across that content — without sending proprietary information to external AI providers. Everything runs on dedicated GPU infrastructure you control.
```
Notion API        -->   Middleware Service     -->   GPU Server (vLLM)
(pages, databases       (Python/Node.js)             LLM inference on
 tagged for AI          Fetches page blocks,         dedicated GPU
 processing)            builds prompts,
                        posts results back
      |                        |                            |
Updated pages    <--  Notion API      <--  Middleware  <--  Completion
with AI content       PATCH blocks         writes AI        returned
                                           output to DB
```
Prerequisites
- A GigaGPU server with a running model behind an OpenAI-compatible API (self-host guide)
- A Notion workspace where you can create integrations (Settings & Members > Connections)
- A Notion internal integration token with read and update capabilities
- Python 3.10+ or Node.js 18+ for the middleware service
- HTTPS endpoint for your GPU server (Nginx proxy guide)
Integration Steps
Create a Notion internal integration at notion.so/my-integrations. Grant it Read content, Update content, and Insert content capabilities. Copy the integration token. Then share the specific Notion databases or pages you want the AI to access with your integration.
Build a middleware script that queries the Notion API for pages matching certain criteria — for example, pages in a database with a “Needs Summary” status. The script extracts the page’s text content from Notion’s block structure, sends it to your GPU inference API, and writes the AI output back as a new block or database property.
Schedule the middleware to run on a cron job or trigger it via a webhook when Notion database entries change. This creates a hands-off workflow where tagging a page in Notion automatically processes it through your private LLM.
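For the cron approach, a single crontab entry is enough. This is a sketch with hypothetical paths (`/opt/notion-ai/middleware.py` and the log location are placeholders for wherever you deploy the script):

```shell
# Run the Notion AI middleware every 10 minutes, appending output to a log
*/10 * * * * /usr/bin/python3 /opt/notion-ai/middleware.py >> /var/log/notion-ai.log 2>&1
```

Make sure the crontab environment exports `NOTION_TOKEN` and `GPU_API_KEY`, since cron does not inherit your shell's environment variables.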
Code Example
This Python script fetches Notion pages, sends them to your GPU-hosted model, and writes summaries back using the OpenAI-compatible API:
```python
import os

from notion_client import Client as NotionClient
from openai import OpenAI

notion = NotionClient(auth=os.environ["NOTION_TOKEN"])
llm = OpenAI(
    base_url="https://your-gpu-server.gigagpu.com/v1",
    api_key=os.environ["GPU_API_KEY"],
)

DATABASE_ID = "your-notion-database-id"


def get_pending_pages():
    """Return pages whose "AI Status" select property is "Pending"."""
    results = notion.databases.query(
        database_id=DATABASE_ID,
        filter={"property": "AI Status", "select": {"equals": "Pending"}},
    )
    return results["results"]


def extract_text(page_id):
    """Concatenate the plain text of every rich-text block on the page."""
    blocks = notion.blocks.children.list(block_id=page_id)["results"]
    texts = []
    for block in blocks:
        btype = block["type"]
        # Only block types that carry rich text (paragraphs, headings, etc.)
        if "rich_text" in block.get(btype, {}):
            texts.append("".join(t["plain_text"] for t in block[btype]["rich_text"]))
    return "\n".join(texts)


def summarise_and_update(page_id, content):
    completion = llm.chat.completions.create(
        model="meta-llama/Llama-3-70b-chat-hf",
        messages=[
            {"role": "system", "content": "Summarise the following document concisely."},
            {"role": "user", "content": content},
        ],
        max_tokens=500,
    )
    summary = completion.choices[0].message.content
    notion.pages.update(page_id=page_id, properties={
        # Notion caps a single rich_text item at 2,000 characters
        "AI Summary": {"rich_text": [{"text": {"content": summary[:2000]}}]},
        "AI Status": {"select": {"name": "Complete"}},
    })


for page in get_pending_pages():
    content = extract_text(page["id"])
    if content:
        summarise_and_update(page["id"], content)
```
Testing Your Integration
Create a test page in your Notion database and set its “AI Status” property to “Pending.” Run the script manually and verify that the “AI Summary” field populates with a relevant summary and the status flips to “Complete.” Check your GPU server logs to confirm the inference request was processed.
Test with pages of varying lengths — short notes and long documents — to verify the middleware handles token limits gracefully. Truncate input text if it exceeds your model’s context window.
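A simple character-based heuristic is often enough for truncation. This sketch assumes roughly four characters per token, which is a common rule of thumb for English text; swap in a real tokenizer count (e.g. from `tiktoken`) if you need precision:

```python
def truncate_for_context(text, max_input_tokens=7000, chars_per_token=4):
    """Crudely cap input length, assuming ~4 characters per token.

    The 4-chars-per-token figure is a heuristic, not an exact count.
    """
    max_chars = max_input_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Cut at the last newline before the limit to avoid splitting mid-sentence
    cut = text.rfind("\n", 0, max_chars)
    return text[:cut] if cut > 0 else text[:max_chars]
```

Call this on the output of `extract_text` before building the prompt, leaving headroom below the model's context window for the system message and the completion.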
Production Tips
Notion’s API has rate limits (approximately three requests per second). When processing a large backlog of pages, add delays between API calls or use exponential backoff on 429 responses. Batch your GPU inference calls separately from Notion API calls to decouple the two rate limits.
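A small retry wrapper covers the 429 case. This sketch assumes the client's rate-limit exception exposes the HTTP status as a `status` attribute (as `notion-client`'s `APIResponseError` does); the injectable `sleep` parameter exists only to make the helper testable:

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a zero-argument API call with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as err:
            # Re-raise anything that isn't a rate limit, or if retries are spent
            if getattr(err, "status", None) != 429 or attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Wrap each Notion call, e.g. `with_backoff(lambda: notion.databases.query(...))`, and leave GPU inference calls outside the wrapper so the two services back off independently.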
For real-time processing, use Notion’s webhook capabilities (or poll the database on a short interval) to detect new entries immediately. Pair this with a task queue so your middleware can process pages asynchronously without blocking.
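The poller/worker split can be sketched with the standard library alone. Here a poller thread would enqueue page IDs and a worker drains them; `process` stands in for the per-page handler (fetch text, summarise, write back), and the names are illustrative rather than a fixed API:

```python
import queue
import threading


def run_worker(work_queue, process, stop):
    """Drain page IDs from the queue, handing each to the process callback.

    `stop` is a threading.Event used to shut the worker down cleanly.
    """
    while not stop.is_set():
        try:
            page_id = work_queue.get(timeout=1.0)
        except queue.Empty:
            continue  # no work yet; check the stop flag again
        try:
            process(page_id)
        finally:
            work_queue.task_done()
```

With this shape, a webhook handler or polling loop only does the cheap work of enqueueing IDs, while slow LLM inference happens on the worker thread without blocking detection of new pages.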
Secure your pipeline with API key authentication between the middleware and your GPU inference endpoint. For teams managing knowledge bases with open-source models, this integration keeps all data on your own infrastructure. Browse more tutorials or get started with GigaGPU to power AI across your Notion workspace.