What You’ll Connect
After this guide, your Snowflake warehouse will call your GPU-hosted AI directly from SQL queries — classifying records, generating summaries, extracting entities, and answering natural language questions about your data. Snowflake external functions connect to your vLLM endpoint on dedicated GPU hardware through an API gateway, letting analysts run AI operations on warehouse data without leaving SQL.
The integration uses Snowflake’s external function feature to call your OpenAI-compatible API via an API integration and proxy service. A Python middleware translates Snowflake’s batch function call format into individual or batched inference requests to your GPU endpoint, returning results that Snowflake inserts back into your query results.
Prerequisites
- A GigaGPU server running a self-hosted LLM (setup guide)
- A Snowflake account with ACCOUNTADMIN or CREATE INTEGRATION privileges
- An API gateway (AWS API Gateway, Azure API Management, or a simple proxy)
- Python 3.10+ with FastAPI (`fastapi`) for the middleware proxy
Integration Steps
Set up a middleware proxy that accepts Snowflake’s external function request format and forwards inference requests to your GPU endpoint. Snowflake sends batch rows as a JSON array, and your proxy maps each row to an inference call, batches them efficiently, and returns results in Snowflake’s expected response format.
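The translation step above can be sketched as a pure function, independent of any web framework. Snowflake posts a JSON body of the form `{"data": [[row_number, value, ...], ...]}` and expects the same shape back, with every row number echoed in the response; `classify_fn` here is a stand-in for the real GPU inference call.

```python
# Sketch of the translation the proxy performs (no network calls here).
# classify_fn stands in for the real GPU inference call.

def translate_batch(snowflake_body, classify_fn):
    """Map Snowflake's batch format to per-row results.

    Snowflake sends {"data": [[row_number, text], ...]} and expects
    {"data": [[row_number, result], ...]} with each row number echoed back.
    """
    results = [[row_num, classify_fn(text)]
               for row_num, text in snowflake_body["data"]]
    return {"data": results}

# Example with a stub classifier:
body = {"data": [[0, "Great product"], [1, "Broken on arrival"]]}
out = translate_batch(body, lambda t: "positive" if "Great" in t else "negative")
# → {"data": [[0, "positive"], [1, "negative"]]}
```

Keeping this mapping separate from the HTTP layer makes it easy to unit-test the row bookkeeping without a live GPU endpoint.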
Create an API integration in Snowflake that points to your proxy endpoint. Then create external functions that analysts can call in SQL queries. Each function maps to a specific AI capability — AI_CLASSIFY(text) for classification, AI_SUMMARISE(text) for summarisation, AI_EXTRACT(text, entity_type) for entity extraction. Analysts use these functions like any built-in Snowflake function.
Build a natural language query interface by creating a function that accepts a question in English and returns a SQL query. The LLM generates SQL based on your schema metadata, and a wrapper function executes the generated query. This lets non-technical users query Snowflake by asking questions rather than writing SQL.
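A minimal sketch of the text-to-SQL wrapper might look like the following. The helper names (`build_nl2sql_prompt`, `is_safe_select`) and the schema hint are illustrative assumptions, not part of any Snowflake API; the guardrail shown is deliberately simple and should be hardened for production.

```python
# Hypothetical text-to-SQL helpers; names and schema hint are illustrative.

SCHEMA_HINT = ("Table customer_feedback(customer_id INT, "
               "feedback_text VARCHAR, created_at DATE)")

def build_nl2sql_prompt(question: str) -> str:
    """Embed schema metadata so the model generates valid SQL."""
    return (
        f"You write Snowflake SQL. Schema:\n{SCHEMA_HINT}\n"
        "Return a single SELECT statement, no commentary.\n"
        f"Question: {question}"
    )

def is_safe_select(sql: str) -> bool:
    """Reject anything other than a single read-only SELECT statement."""
    stmt = sql.strip().rstrip(";")
    banned = {"insert", "update", "delete", "drop", "alter", "merge",
              "create", "truncate", "grant", "call"}
    tokens = stmt.lower().split()
    return (bool(tokens)
            and tokens[0] == "select"
            and ";" not in stmt              # one statement only
            and not banned.intersection(tokens))

# The wrapper sends build_nl2sql_prompt(question) to the LLM, then
# executes the model's reply only if is_safe_select(reply) is True.
```

A keyword filter like this is a first line of defense; parsing the generated SQL with a real SQL parser before execution is a stronger guardrail.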
Code Example
Middleware proxy and Snowflake SQL for AI analytics from your self-hosted models:
```python
# middleware_proxy.py — Translates Snowflake format to GPU API
from fastapi import FastAPI, Request
import requests

app = FastAPI()

GPU_URL = "http://gpu-server:8000/v1/chat/completions"
GPU_KEY = "your-api-key"

@app.post("/ai/classify")
async def classify(request: Request):
    body = await request.json()
    rows = body["data"]  # Snowflake sends [[row_num, text], ...]
    results = []
    for row_num, text in rows:
        resp = requests.post(
            GPU_URL,
            json={
                "model": "meta-llama/Llama-3-8b-chat-hf",
                "messages": [{
                    "role": "user",
                    "content": f"Classify into one category "
                               f"(positive/negative/neutral): {text}",
                }],
                "max_tokens": 10,
                "temperature": 0.1,
            },
            headers={"Authorization": f"Bearer {GPU_KEY}"},
        )
        label = resp.json()["choices"][0]["message"]["content"].strip()
        results.append([row_num, label])
    return {"data": results}  # Snowflake expects every row number echoed back
```
```sql
-- Snowflake SQL setup (replace the account ID placeholder with your own)
CREATE OR REPLACE API INTEGRATION gpu_ai_integration
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::<account_id>:role/snowflake-gpu'
  ENABLED = TRUE
  API_ALLOWED_PREFIXES = ('https://your-proxy.example.com/');

CREATE OR REPLACE EXTERNAL FUNCTION ai_classify(text VARCHAR)
  RETURNS VARCHAR
  API_INTEGRATION = gpu_ai_integration
  AS 'https://your-proxy.example.com/ai/classify';

CREATE OR REPLACE EXTERNAL FUNCTION ai_summarise(text VARCHAR)
  RETURNS VARCHAR
  API_INTEGRATION = gpu_ai_integration
  AS 'https://your-proxy.example.com/ai/summarise';

-- Usage in SQL queries:
SELECT customer_id, feedback_text,
       ai_classify(feedback_text) AS sentiment,
       ai_summarise(feedback_text) AS summary
FROM customer_feedback
WHERE created_at > DATEADD(day, -7, CURRENT_DATE);
```
Testing Your Integration
Deploy the middleware proxy and test it directly with curl using Snowflake’s request format. Create the API integration and external function in Snowflake. Run a simple query: SELECT ai_classify('Great product, fast delivery') and verify it returns a classification. Then test on a table with 100 rows to verify batch processing works correctly.
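A stdlib-only smoke test for the deployed proxy might look like this; `PROXY_URL` is an assumption to replace with your middleware's address, and `make_test_batch` builds the same batch shape Snowflake sends.

```python
# Smoke-test the proxy with a hand-built Snowflake-format batch.
import json
from urllib import request as urlrequest

PROXY_URL = "https://your-proxy.example.com/ai/classify"  # assumption

def make_test_batch(texts):
    """Build the {"data": [[row_number, text], ...]} body Snowflake sends."""
    return {"data": [[i, t] for i, t in enumerate(texts)]}

if __name__ == "__main__":
    body = json.dumps(make_test_batch([
        "Great product, fast delivery",
        "Item arrived damaged",
    ])).encode()
    req = urlrequest.Request(PROXY_URL, data=body,
                             headers={"Content-Type": "application/json"})
    with urlrequest.urlopen(req, timeout=30) as resp:
        print(json.loads(resp.read()))
        # Every input row number should come back in the "data" array.
```

If this round-trip works, the Snowflake-side failures are almost always in the API integration or gateway configuration rather than the proxy itself.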
Measure query execution time to establish baseline latency. External function calls add network round-trip and GPU inference time to query execution. For large result sets, test with LIMIT clauses first, then scale up gradually. Verify that the middleware handles concurrent requests from multiple Snowflake queries.
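One way to reduce the per-row round-trip cost is to split Snowflake's batch into chunks and send each chunk as a concurrent inference request. This chunking helper is a sketch; `CHUNK_SIZE` is a tunable assumption that depends on your model's context window and latency targets.

```python
# Chunking helper so the proxy can batch rows instead of one call per row.
CHUNK_SIZE = 16  # assumption: tune against your GPU's throughput

def chunk_rows(rows, size=CHUNK_SIZE):
    """Split Snowflake's [[row_number, text], ...] batch into chunks."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

# Each chunk can then be dispatched concurrently (e.g. with
# httpx.AsyncClient and asyncio.gather) instead of a serial loop,
# keeping the GPU busy while Snowflake waits on a single HTTP call.
rows = [[i, f"feedback {i}"] for i in range(40)]
chunks = chunk_rows(rows)
# → 3 chunks: 16 + 16 + 8 rows
```

Whatever concurrency scheme you choose, the proxy must still return one result per input row, in the `{"data": [[row_number, result], ...]}` shape, after all chunks complete.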
Production Tips
Cache AI results in Snowflake by writing enrichment outputs to a dedicated table rather than calling the external function in every query. Run a scheduled enrichment job that processes new rows nightly, storing AI classifications, summaries, and entities alongside the source data. Analysts query the pre-computed results without triggering live GPU inference.
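A nightly enrichment job along these lines could be scheduled with a Snowflake task; the table, task, and warehouse names below are illustrative examples, not fixed conventions.

```sql
-- Illustrative nightly enrichment job; object names are examples.
CREATE TABLE IF NOT EXISTS feedback_enriched (
    customer_id INT,
    feedback_text VARCHAR,
    sentiment VARCHAR,
    summary VARCHAR,
    enriched_at TIMESTAMP
);

CREATE OR REPLACE TASK enrich_feedback_nightly
  WAREHOUSE = analytics_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  INSERT INTO feedback_enriched
  SELECT f.customer_id, f.feedback_text,
         ai_classify(f.feedback_text),
         ai_summarise(f.feedback_text),
         CURRENT_TIMESTAMP
  FROM customer_feedback f
  LEFT JOIN feedback_enriched e ON f.customer_id = e.customer_id
  WHERE e.customer_id IS NULL;  -- only rows not yet enriched
```

Remember that newly created tasks start suspended, so the task must be resumed (`ALTER TASK enrich_feedback_nightly RESUME;`) before it runs on schedule.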
For the natural language query interface, provide schema metadata and sample queries in the LLM prompt so it generates accurate SQL. Restrict the interface to SELECT queries only — never let AI-generated SQL modify data. Add guardrails that validate generated SQL before execution. Build an AI chatbot frontend for your Snowflake text-to-SQL interface. Explore more tutorials or get started with GigaGPU to power AI analytics on your warehouse data.