What You’ll Connect
After this guide, your Firebase application will have AI capabilities powered by your own GPU server — no API costs, no rate limits. Cloud Functions call your vLLM or Ollama endpoint on dedicated GPU hardware, and Firestore triggers process documents through your self-hosted model automatically.
This integration suits mobile and web apps built on Firebase that need AI features — chat assistants, content moderation, or smart search. Firebase handles user auth and real-time data sync while your GPU server handles the heavy inference work.
Client (web/mobile) --> Callable Cloud Function --> fetch() to GPU       --> vLLM inference
                        onCall or onRequest        /v1/chat/completions      on dedicated GPU

Firestore           --> Firestore trigger      --> Cloud Function       --> GPU server
(new document)          onCreate/onUpdate          processes doc            LLM inference

Updated doc with    <-- Function writes        <-- Completion           <-- Model returns
AI field                back to Firestore          parsed                   response

Prerequisites
- A GigaGPU server with an LLM on an OpenAI-compatible API (self-host guide)
- A Firebase project with Cloud Functions enabled (Blaze plan required)
- Firebase CLI installed: npm install -g firebase-tools
- HTTPS access to your GPU server (Nginx proxy guide)
- GPU API key set as a Firebase function config secret
Integration Steps
Initialise Cloud Functions in your Firebase project: firebase init functions. Choose TypeScript for type safety. Install the OpenAI SDK: npm install openai in the functions directory.
Store your GPU API credentials using Firebase function secrets: firebase functions:secrets:set GPU_API_KEY and firebase functions:secrets:set GPU_API_URL. These are encrypted at rest and only available to your Cloud Functions at runtime.
Create two types of functions: a callable function that your client app invokes directly for chat interactions, and a Firestore trigger that automatically processes new documents. The callable function handles real-time AI chat, while the trigger handles background AI enrichment on database writes.
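On the client side, the callable function is invoked with the Firebase SDK's httpsCallable. The helper below is a minimal sketch (names like buildChatRequest and askAi are ours, not Firebase's); it shows the payload shape the aiChat callable in this guide expects, with the actual SDK call noted in comments since it requires a configured Firebase app.

```typescript
// Shape of the payload the aiChat callable expects: { messages: [...] }.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Append the user's new question to any prior chat history.
function buildChatRequest(
  history: ChatMessage[],
  question: string
): { messages: ChatMessage[] } {
  return { messages: [...history, { role: "user", content: question }] };
}

// In the app itself (requires the firebase client SDK):
//   import { getFunctions, httpsCallable } from "firebase/functions";
//   const aiChat = httpsCallable(getFunctions(), "aiChat");
//   const { data } = await aiChat(buildChatRequest(history, "Hello"));

const req = buildChatRequest([], "Summarise my last order");
console.log(req.messages.length); // 1
```

Keeping the payload construction in one helper makes it easy to evolve the request shape (for example, adding a model or temperature field) in a single place.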
Code Example
Cloud Functions connecting to your GPU inference server via the OpenAI-compatible API:
import { onCall, HttpsError } from "firebase-functions/v2/https";
import { onDocumentCreated } from "firebase-functions/v2/firestore";
import { defineSecret } from "firebase-functions/params";
import OpenAI from "openai";
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

// Secrets set via `firebase functions:secrets:set`
const gpuApiKey = defineSecret("GPU_API_KEY");
const gpuApiUrl = defineSecret("GPU_API_URL");

// Callable function: invoked directly from the client SDK for chat.
export const aiChat = onCall(
  { secrets: [gpuApiKey, gpuApiUrl] },
  async (request) => {
    if (!request.auth) throw new HttpsError("unauthenticated", "Login required");

    // Never trust client input blindly -- validate before forwarding to the GPU.
    const messages = request.data?.messages;
    if (!Array.isArray(messages) || messages.length === 0) {
      throw new HttpsError("invalid-argument", "messages must be a non-empty array");
    }

    const llm = new OpenAI({
      baseURL: gpuApiUrl.value() + "/v1",
      apiKey: gpuApiKey.value(),
    });

    const completion = await llm.chat.completions.create({
      model: "meta-llama/Llama-3-70b-chat-hf",
      messages,
      max_tokens: 1024,
    });

    return { content: completion.choices[0].message.content };
  }
);

// Firestore trigger: summarises every new document in the "documents" collection.
export const processDocument = onDocumentCreated(
  { document: "documents/{docId}", secrets: [gpuApiKey, gpuApiUrl] },
  async (event) => {
    const snap = event.data;
    if (!snap) return;

    const { text } = snap.data();
    if (!text) return; // nothing to summarise

    const llm = new OpenAI({
      baseURL: gpuApiUrl.value() + "/v1",
      apiKey: gpuApiKey.value(),
    });

    const result = await llm.chat.completions.create({
      model: "meta-llama/Llama-3-70b-chat-hf",
      messages: [
        { role: "system", content: "Summarise this document in 2-3 sentences." },
        { role: "user", content: text },
      ],
      max_tokens: 200,
    });

    // Write the summary back onto the same document.
    await snap.ref.update({ ai_summary: result.choices[0].message.content });
  }
);
Testing Your Integration
Deploy with firebase deploy --only functions. Test the callable function from your client app or the Firebase Emulator Suite. Verify the response contains a valid AI completion. For the Firestore trigger, add a document to the “documents” collection and watch the ai_summary field populate automatically.
Check the Firebase Functions logs for execution times and any errors. Ensure your GPU server responds within your Cloud Functions timeout — 60 seconds by default for callable functions (raisable via the timeoutSeconds option), while event-driven triggers can run for up to 540 seconds.
Production Tips
Cloud Functions cold starts add 1-3 seconds to the first invocation. For latency-sensitive chat features, keep at least one function instance warm using minimum instances configuration. This adds cost on the Firebase side but eliminates cold start delays for users.
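In v2, minimum instances and the other runtime knobs are set in the options object passed to onCall. The values below are illustrative assumptions to tune for your own traffic and budget, not recommendations:

```typescript
// Illustrative v2 runtime options for the aiChat callable above.
const warmOptions = {
  minInstances: 1,     // keep one instance warm (billed while idle)
  concurrency: 20,     // a warm v2 instance can serve many requests at once
  timeoutSeconds: 120, // raise if GPU inference regularly exceeds 60 s
};

// Usage (merged with the secrets from the main example):
//   export const aiChat = onCall({ ...warmOptions, secrets: [gpuApiKey, gpuApiUrl] }, handler);
console.log(`min instances: ${warmOptions.minInstances}`);
```

Raising concurrency is particularly useful here: long-running GPU calls are I/O-bound from the function's point of view, so one warm instance can wait on many inference requests in parallel.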
Use Firebase Auth checks in the callable function to control who can access AI features. Combine this with Firestore security rules that block clients from writing the AI summary fields; the Admin SDK your function runs with bypasses security rules entirely, so the function remains the only writer of AI-generated content and clients cannot tamper with it.
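A rules sketch along these lines (collection and field names match the trigger in this guide; adapt them to your schema) lets authenticated clients read and write their documents while rejecting any client write that touches ai_summary:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /documents/{docId} {
      allow read: if request.auth != null;
      // Clients may create documents but never set ai_summary themselves.
      allow create: if request.auth != null
        && !request.resource.data.keys().hasAny(['ai_summary']);
      // Updates must not touch ai_summary; the Cloud Function (Admin SDK)
      // bypasses these rules and remains the only writer of that field.
      allow update: if request.auth != null
        && !request.resource.data.diff(resource.data)
             .affectedKeys().hasAny(['ai_summary']);
    }
  }
}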
Firebase’s real-time listeners mean your client app can display AI processing status in real time — write a “processing” flag to Firestore before the GPU call, then update it with the result. Users see the status change instantly via Firestore subscriptions. Explore open-source models, browse more tutorials, or get started with GigaGPU to power your Firebase AI features.
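The status-flag pattern can be sketched as below. Firestore and the GPU call are replaced with stubs (the ai_status field name and the summarise helper are our assumptions) so the control flow is runnable on its own; in the real trigger, ref would be snap.ref and summarise would be the vLLM call from the main example:

```typescript
// Record of writes, standing in for a Firestore document reference.
type DocUpdate = Record<string, string>;
const writes: DocUpdate[] = [];
const ref = { update: async (u: DocUpdate) => { writes.push(u); } };

// Stand-in for the GPU inference call.
async function summarise(text: string): Promise<string> {
  return `summary of: ${text.slice(0, 20)}`;
}

// Bracket the slow GPU call with a status field so clients subscribed to
// the document see progress (and failures) in real time.
async function processWithStatus(text: string): Promise<void> {
  await ref.update({ ai_status: "processing" }); // clients see this instantly
  try {
    const summary = await summarise(text);
    await ref.update({ ai_status: "done", ai_summary: summary });
  } catch {
    await ref.update({ ai_status: "error" }); // surface failures to the UI
  }
}
```

Writing the error state explicitly matters: without it, a failed GPU call leaves the document stuck on "processing" and the client spinner never resolves.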