For data-platform-aligned teams (most enterprises), self-hosted AI needs to plug into Snowflake / Databricks / BigQuery / dbt workflows. The integration patterns are well-defined; pick the one that suits your data flow direction.
Two main patterns: (1) data → AI: the data platform queries your AI tier as a UDF / external function, and results land back in the warehouse. (2) AI → data: the AI tier reads from the warehouse via SQL and writes results back. Snowflake External Functions, Databricks Spark UDFs, and BigQuery Remote Functions all support pattern 1 cleanly.
Patterns
- Snowflake External Functions / SnowPark: SQL queries call AI as UDF; results join with warehouse data
- Databricks Mosaic AI / Spark UDF: Spark DataFrame operations call AI; results land back in Databricks
- BigQuery Remote Functions: SQL functions call AI tier; integrated into BigQuery dialect
- dbt + AI: dbt models invoke the warehouse-side AI functions above as part of transformations (generating descriptions from data, etc.)
- Reverse (AI → data): the AI service queries Snowflake / BigQuery via standard JDBC / ODBC for retrieval / context
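The wire format behind pattern 1 is simple: Snowflake External Functions POST row batches as `{"data": [[row_index, arg1, ...], ...]}` and expect the same shape back, one `[row_index, result]` pair per input row. A minimal sketch of the AI-tier handler logic, with a toy classifier standing in for the real model call (in production this would hit the self-hosted OpenAI-compatible endpoint):

```python
def handle_snowflake_batch(payload: dict, infer_fn) -> dict:
    """Snowflake External Functions POST {"data": [[row_index, args...], ...]}
    and expect {"data": [[row_index, result], ...]} back, preserving order."""
    out = []
    for row in payload["data"]:
        idx, *args = row
        out.append([idx, infer_fn(*args)])
    return {"data": out}

# Toy stand-in for the model call; the real infer_fn would call the
# self-hosted inference service.
def toy_classify(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "other"

request = {"data": [[0, "Invoice is wrong"], [1, "App crashes on login"]]}
response = handle_snowflake_batch(request, toy_classify)
# response == {"data": [[0, "billing"], [1, "other"]]}
```

BigQuery Remote Functions use an analogous JSON batch contract, so one thin adapter per platform in front of the same inference service covers both.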
Examples
- Generate product descriptions: dbt model takes product attributes; calls AI; produces marketing description as derived column
- Categorise support tickets: SQL function classifies ticket text via AI; result available as warehouse column
- Embed for vector search: dbt batch job embeds product catalog; loads to Qdrant; warehouse + vector store stay in sync
- Anomaly explanation: anomaly detected in metric; AI tier generates structured explanation grounded in context data
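The embed-for-vector-search example above amounts to a paging loop: read catalog rows from the warehouse, embed each text, and upsert points keyed by the warehouse primary key so re-runs are idempotent. A sketch of the batching step (function and field names are illustrative; `toy_embed` stands in for the self-hosted embedding endpoint, and the emitted dicts mirror Qdrant's point shape):

```python
def build_upsert_batches(rows, embed_fn, batch_size=64):
    """Group (product_id, text) rows into point batches for upserting.
    Using the warehouse primary key as the point ID keeps the warehouse
    and vector store in sync across re-runs (upsert overwrites)."""
    batch = []
    for product_id, text in rows:
        batch.append({
            "id": product_id,
            "vector": embed_fn(text),
            "payload": {"text": text},
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Toy embedder: the real embed_fn would call the embedding service.
toy_embed = lambda text: [float(len(text)), 0.0]

rows = [(1, "red shoe"), (2, "blue coat"), (3, "green hat")]
batches = list(build_upsert_batches(rows, toy_embed, batch_size=2))
# → two batches: one of 2 points, one of 1
```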
Verdict
For data-platform-aligned enterprises, self-hosted AI integrates cleanly via External Functions / Remote Functions / Spark UDFs. The patterns are mature, the cost economics still favour self-hosting, and latency is acceptable for batch workloads. Build the AI tier as a standard OpenAI-compatible service; the data platform calls it as just another external function.
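"OpenAI-compatible" here means the AI tier exposes the standard `/v1/chat/completions` contract, so any warehouse-side function can reach it with a plain HTTP POST. A sketch of building that request (the hostname and model name are placeholders, not real endpoints):

```python
def chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for a standard OpenAI-compatible
    chat completions call against a self-hosted inference service."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic-ish output for pipeline use
    }
    return url, body

# Placeholder host and model; substitute your own deployment.
url, body = chat_request("http://ai-tier.internal:8000",
                         "llama-3.1-8b",
                         "Summarise Q3 anomalies")
# url == "http://ai-tier.internal:8000/v1/chat/completions"
```

Because the contract is standard, the same AI tier serves Snowflake, BigQuery, and ad-hoc Python callers without per-platform model code.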
Bottom line
Use External Functions / Remote Functions for SQL-driven workflows; expose the AI tier behind an OpenAI-compatible API. See OpenAI API guide.