What You’ll Connect
After this guide, your Firebase application will have AI capabilities powered by your own GPU server — no API costs, no rate limits. Cloud Functions call your vLLM or Ollama endpoint on dedicated GPU hardware, and Firestore triggers process documents through your self-hosted model automatically.
This integration suits mobile and web apps built on Firebase that need AI features — chat assistants, content moderation, or smart search. Firebase handles user auth and real-time data sync while your GPU server handles the heavy inference work.
Client (web/mobile) --> Callable Cloud Function --> fetch() to GPU       --> vLLM inference
                        onCall or onRequest        /v1/chat/completions      on dedicated GPU

Firestore           --> Firestore trigger      --> Cloud Function       --> GPU server
(new document)          onCreate/onUpdate          processes doc            LLM inference

Updated doc with    <-- Function writes        <-- Completion           <-- Model returns
AI field                back to Firestore          parsed                   response

Prerequisites
- A GigaGPU server with an LLM on an OpenAI-compatible API (self-host guide)
- A Firebase project with Cloud Functions enabled (Blaze plan required)
- Firebase CLI installed: npm install -g firebase-tools
- HTTPS access to your GPU server (Nginx proxy guide)
- GPU API key set as a Firebase function config secret
Integration Steps
Initialise Cloud Functions in your Firebase project: firebase init functions. Choose TypeScript for type safety. Install the OpenAI SDK: npm install openai in the functions directory.
Store your GPU API credentials using Firebase function secrets: firebase functions:secrets:set GPU_API_KEY and firebase functions:secrets:set GPU_API_URL. These are encrypted at rest and only available to your Cloud Functions at runtime.
Create two types of functions: a callable function that your client app invokes directly for chat interactions, and a Firestore trigger that automatically processes new documents. The callable function handles real-time AI chat, while the trigger handles background AI enrichment on database writes.
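On the client side, the callable function is invoked with the Firebase SDK's httpsCallable. The helper below is a minimal sketch (names like buildChatRequest and askAi are ours, not Firebase's); it shows the payload shape the aiChat callable in this guide expects, with the actual SDK call noted in comments since it requires a configured Firebase app.

```typescript
// Shape of the payload the aiChat callable expects: { messages: [...] }.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Append the user's new question to any prior chat history.
function buildChatRequest(
  history: ChatMessage[],
  question: string
): { messages: ChatMessage[] } {
  return { messages: [...history, { role: "user", content: question }] };
}

// In the app itself (requires the firebase client SDK):
//   import { getFunctions, httpsCallable } from "firebase/functions";
//   const aiChat = httpsCallable(getFunctions(), "aiChat");
//   const { data } = await aiChat(buildChatRequest(history, "Hello"));

const req = buildChatRequest([], "Summarise my last order");
console.log(req.messages.length); // 1
```

Keeping the payload construction in one helper makes it easy to evolve the request shape (for example, adding a model or temperature field) in a single place.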
Code Example
Cloud Functions connecting to your GPU inference server via the OpenAI-compatible API:
import { onCall, HttpsError } from "firebase-functions/v2/https";
import { onDocumentCreated } from "firebase-functions/v2/firestore";
import { defineSecret } from "firebase-functions/params";
import OpenAI from "openai";
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

// Secrets set via `firebase functions:secrets:set`
const gpuApiKey = defineSecret("GPU_API_KEY");
const gpuApiUrl = defineSecret("GPU_API_URL");

// Callable function: invoked directly from the client SDK for chat.
export const aiChat = onCall(
  { secrets: [gpuApiKey, gpuApiUrl] },
  async (request) => {
    if (!request.auth) throw new HttpsError("unauthenticated", "Login required");

    // Never trust client input blindly -- validate before forwarding to the GPU.
    const messages = request.data?.messages;
    if (!Array.isArray(messages) || messages.length === 0) {
      throw new HttpsError("invalid-argument", "messages must be a non-empty array");
    }

    const llm = new OpenAI({
      baseURL: gpuApiUrl.value() + "/v1",
      apiKey: gpuApiKey.value(),
    });

    const completion = await llm.chat.completions.create({
      model: "meta-llama/Llama-3-70b-chat-hf",
      messages,
      max_tokens: 1024,
    });

    return { content: completion.choices[0].message.content };
  }
);

// Firestore trigger: summarises every new document in the "documents" collection.
export const processDocument = onDocumentCreated(
  { document: "documents/{docId}", secrets: [gpuApiKey, gpuApiUrl] },
  async (event) => {
    const snap = event.data;
    if (!snap) return;

    const { text } = snap.data();
    if (!text) return; // nothing to summarise

    const llm = new OpenAI({
      baseURL: gpuApiUrl.value() + "/v1",
      apiKey: gpuApiKey.value(),
    });

    const result = await llm.chat.completions.create({
      model: "meta-llama/Llama-3-70b-chat-hf",
      messages: [
        { role: "system", content: "Summarise this document in 2-3 sentences." },
        { role: "user", content: text },
      ],
      max_tokens: 200,
    });

    // Write the summary back onto the same document.
    await snap.ref.update({ ai_summary: result.choices[0].message.content });
  }
);
Testing Your Integration
Deploy with firebase deploy --only functions. Test the callable function from your client app or the Firebase Emulator Suite. Verify the response contains a valid AI completion. For the Firestore trigger, add a document to the “documents” collection and watch the ai_summary field populate automatically.
Check the Firebase Functions logs for execution times and any errors. Ensure your GPU server responds within your Cloud Functions timeout — 60 seconds by default for callable functions (raisable via the timeoutSeconds option), while event-driven triggers can run for up to 540 seconds.
Production Tips
Cloud Functions cold starts add 1-3 seconds to the first invocation. For latency-sensitive chat features, keep at least one function instance warm using minimum instances configuration. This adds cost on the Firebase side but eliminates cold start delays for users.
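In v2, minimum instances and the other runtime knobs are set in the options object passed to onCall. The values below are illustrative assumptions to tune for your own traffic and budget, not recommendations:

```typescript
// Illustrative v2 runtime options for the aiChat callable above.
const warmOptions = {
  minInstances: 1,     // keep one instance warm (billed while idle)
  concurrency: 20,     // a warm v2 instance can serve many requests at once
  timeoutSeconds: 120, // raise if GPU inference regularly exceeds 60 s
};

// Usage (merged with the secrets from the main example):
//   export const aiChat = onCall({ ...warmOptions, secrets: [gpuApiKey, gpuApiUrl] }, handler);
console.log(`min instances: ${warmOptions.minInstances}`);
```

Raising concurrency is particularly useful here: long-running GPU calls are I/O-bound from the function's point of view, so one warm instance can wait on many inference requests in parallel.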
Use Firebase Auth checks in the callable function to control who can access AI features. Combine this with Firestore security rules that block clients from writing the AI summary fields; the Admin SDK your function runs with bypasses security rules entirely, so the function remains the only writer of AI-generated content and clients cannot tamper with it.
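A rules sketch along these lines (collection and field names match the trigger in this guide; adapt them to your schema) lets authenticated clients read and write their documents while rejecting any client write that touches ai_summary:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /documents/{docId} {
      allow read: if request.auth != null;
      // Clients may create documents but never set ai_summary themselves.
      allow create: if request.auth != null
        && !request.resource.data.keys().hasAny(['ai_summary']);
      // Updates must not touch ai_summary; the Cloud Function (Admin SDK)
      // bypasses these rules and remains the only writer of that field.
      allow update: if request.auth != null
        && !request.resource.data.diff(resource.data)
             .affectedKeys().hasAny(['ai_summary']);
    }
  }
}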
Firebase’s real-time listeners mean your client app can display AI processing status in real time — write a “processing” flag to Firestore before the GPU call, then update it with the result. Users see the status change instantly via Firestore subscriptions. Explore open-source models, browse more tutorials, or get started with GigaGPU to power your Firebase AI features.
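The status-flag pattern can be sketched as below. Firestore and the GPU call are replaced with stubs (the ai_status field name and the summarise helper are our assumptions) so the control flow is runnable on its own; in the real trigger, ref would be snap.ref and summarise would be the vLLM call from the main example:

```typescript
// Record of writes, standing in for a Firestore document reference.
type DocUpdate = Record<string, string>;
const writes: DocUpdate[] = [];
const ref = { update: async (u: DocUpdate) => { writes.push(u); } };

// Stand-in for the GPU inference call.
async function summarise(text: string): Promise<string> {
  return `summary of: ${text.slice(0, 20)}`;
}

// Bracket the slow GPU call with a status field so clients subscribed to
// the document see progress (and failures) in real time.
async function processWithStatus(text: string): Promise<void> {
  await ref.update({ ai_status: "processing" }); // clients see this instantly
  try {
    const summary = await summarise(text);
    await ref.update({ ai_status: "done", ai_summary: summary });
  } catch {
    await ref.update({ ai_status: "error" }); // surface failures to the UI
  }
}
```

Writing the error state explicitly matters: without it, a failed GPU call leaves the document stuck on "processing" and the client spinner never resolves.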