
Connect React Native to Self-Hosted AI

Stream AI responses from your GPU server into a React Native application. This guide covers implementing streaming fetch on mobile, managing chat state, and building a native AI chat experience on iOS and Android with your self-hosted LLM backend.

What You’ll Connect

After this guide, your React Native app will stream AI responses from your own GPU server on both iOS and Android — tokens rendering progressively as the model generates them. The app connects to your vLLM or Ollama endpoint on dedicated GPU hardware using the OpenAI-compatible API, delivering a native mobile chat experience powered by your self-hosted infrastructure.

React Native’s built-in fetch does not expose streaming response bodies; the react-native-fetch-api polyfill adds ReadableStream support on both platforms. The integration pattern mirrors a web React app but accounts for mobile-specific considerations — background-state handling, network transitions, and native keyboard management.

Prerequisites

  • A GigaGPU server running a self-hosted LLM (setup guide)
  • HTTPS access to your inference endpoint
  • React Native 0.72+ with Expo or bare workflow
  • API key for your GPU inference server

Integration Steps

Install the streaming fetch polyfill for React Native — the built-in Fetch implementation does not expose response bodies as a ReadableStream. Configure the polyfill at app startup so all fetch calls support streaming. Store your GPU endpoint URL and API key in environment variables using react-native-config.
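Reading those environment variables through a small validation helper makes missing configuration fail fast at startup rather than mid-request. A minimal sketch (resolveApiConfig is a hypothetical name; it assumes react-native-config exposes GPU_API_URL and GPU_API_KEY as in the steps above):

```javascript
// Hypothetical startup check: fail fast if the .env values are missing.
// Pass in the Config object exported by react-native-config.
function resolveApiConfig(env) {
  const { GPU_API_URL, GPU_API_KEY } = env;
  if (!GPU_API_URL || !GPU_API_KEY) {
    throw new Error("Missing GPU_API_URL or GPU_API_KEY in .env");
  }
  return {
    // Normalise a trailing slash so the path joins cleanly.
    chatUrl: GPU_API_URL.replace(/\/$/, "") + "/v1/chat/completions",
    apiKey: GPU_API_KEY,
  };
}

// usage (Config comes from react-native-config):
// const { chatUrl, apiKey } = resolveApiConfig(Config);
```

Calling this once at app startup surfaces a misconfigured build immediately instead of as a cryptic network error on the first chat request.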

Build a custom hook that manages the streaming connection, message state, and request lifecycle. The hook calls your GPU endpoint with stream: true, reads the response body as chunks, parses server-sent events, and updates state with each token. An AbortController handles cancellation when the user navigates away or sends a new message.
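The chunk-parsing step deserves care: a network read can end mid-line, so a server-sent event may arrive split across two chunks. A standalone sketch of a buffering parser (createSSEParser is a hypothetical helper; the payload shape assumes an OpenAI-compatible stream):

```javascript
// Hypothetical helper: buffers partial SSE lines across network chunks and
// returns the token deltas found in each chunk of decoded text.
function createSSEParser() {
  let buffer = "";
  return function parse(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    const deltas = [];
    for (const line of lines) {
      const data = line.trim();
      if (!data.startsWith("data: ") || data === "data: [DONE]") continue;
      try {
        const json = JSON.parse(data.slice(6));
        const delta = json.choices?.[0]?.delta?.content;
        if (delta) deltas.push(delta);
      } catch {
        // Ignore malformed fragments rather than crash the stream.
      }
    }
    return deltas;
  };
}
```

Because the partial line stays in the buffer, a token split across two reads is parsed correctly once its second half arrives.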

Create the chat screen with a FlatList for messages (better performance than ScrollView for long lists), a TextInput with keyboard-aware positioning, and send/cancel buttons. Use onContentSizeChange to auto-scroll to the latest message during streaming.

Code Example

React Native hook for streaming from your self-hosted LLM:

import { useState, useCallback, useRef } from "react";
import Config from "react-native-config";

const API_URL = Config.GPU_API_URL + "/v1/chat/completions";
const API_KEY = Config.GPU_API_KEY;

export function useChat() {
  const [messages, setMessages] = useState([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const abortRef = useRef(null);

  const sendMessage = useCallback(async (text) => {
    // Append the user message plus an empty assistant placeholder to fill in.
    const updated = [...messages,
      { role: "user", content: text },
      { role: "assistant", content: "" }
    ];
    setMessages(updated);
    setIsStreaming(true);
    abortRef.current = new AbortController();

    try {
      const resp = await fetch(API_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify({
          model: "meta-llama/Llama-3-70b-chat-hf",
          messages: updated.slice(0, -1), // exclude the empty placeholder
          stream: true,
          max_tokens: 1024,
        }),
        signal: abortRef.current.signal,
        // react-native-fetch-api flag: expose the body as a ReadableStream
        reactNative: { textStreaming: true },
      });
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`);

      const reader = resp.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      let content = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop(); // keep a partial SSE line for the next chunk
        for (const line of lines) {
          const data = line.trim();
          if (!data.startsWith("data: ") || data === "data: [DONE]") continue;
          const json = JSON.parse(data.slice(6));
          content += json.choices[0]?.delta?.content || "";
          setMessages(prev => {
            const copy = [...prev];
            copy[copy.length - 1] = { role: "assistant", content };
            return copy;
          });
        }
      }
    } catch (err) {
      if (err.name !== "AbortError") throw err;
    } finally {
      setIsStreaming(false);
    }
  }, [messages]);

  const cancel = () => { abortRef.current?.abort(); setIsStreaming(false); };
  return { messages, sendMessage, isStreaming, cancel };
}

Testing Your Integration

Run the app on iOS Simulator and Android Emulator in parallel. Send test messages on both platforms and verify tokens stream progressively. Test background/foreground transitions mid-stream — the app should handle backgrounding gracefully without crashing. Test network switching (WiFi to cellular) to verify reconnection behaviour.
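If you add automatic reconnection for those network transitions, a capped exponential backoff keeps retry traffic polite. A sketch (backoffDelayMs is a hypothetical helper; jitter is omitted to keep it deterministic):

```javascript
// Hypothetical retry schedule: 1 s, 2 s, 4 s, ... capped at 30 s.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```

In practice you would add random jitter to avoid synchronised retries and reset the attempt counter once a stream completes successfully.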

Profile rendering performance with the React Native Performance Monitor. During streaming, only the last message should re-render — ensure FlatList’s keyExtractor and getItemLayout are configured to prevent unnecessary re-renders of the entire message list.
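Stable references for those props matter: recreating keyExtractor or getItemLayout inline on every render defeats FlatList's memoisation. A sketch assuming uniform row heights (ROW_HEIGHT is a made-up value; getItemLayout is only valid when rows really are fixed-height):

```javascript
// Assumed fixed row height; only supply getItemLayout when rows are uniform.
const ROW_HEIGHT = 72;

// Stable callbacks defined at module scope, not inline in the render.
const keyExtractor = (item, index) => item.id ?? String(index);
const getItemLayout = (data, index) => ({
  length: ROW_HEIGHT,
  offset: ROW_HEIGHT * index,
  index,
});
```

Pass both to the FlatList; with getItemLayout supplied, the list can jump to any offset without measuring every row first.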

Production Tips

Route requests through your own backend to keep the API key off the mobile device. Use certificate pinning for added security when connecting to your backend. Store conversation history in AsyncStorage or an SQLite database for offline access and faster app restarts.
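For the AsyncStorage route, it helps to version the stored payload so a schema change between app releases degrades to an empty history instead of a crash. A sketch (serializeHistory and deserializeHistory are hypothetical helpers):

```javascript
// Bump this when the message shape changes between releases.
const HISTORY_VERSION = 1;

function serializeHistory(messages) {
  return JSON.stringify({ v: HISTORY_VERSION, messages });
}

function deserializeHistory(raw) {
  if (!raw) return [];
  try {
    const parsed = JSON.parse(raw);
    // Discard payloads from an older schema or with an unexpected shape.
    return parsed.v === HISTORY_VERSION && Array.isArray(parsed.messages)
      ? parsed.messages
      : [];
  } catch {
    return []; // corrupt storage degrades to an empty history
  }
}

// usage with AsyncStorage (assumed):
// await AsyncStorage.setItem("chat-history", serializeHistory(messages));
// const restored = deserializeHistory(await AsyncStorage.getItem("chat-history"));
```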

Handle the soft keyboard carefully — use KeyboardAvoidingView and adjust the FlatList's content inset when the keyboard appears. Implement push notifications to alert users when long-running AI tasks complete in the background. From here you can build out a full AI chatbot mobile experience: explore more tutorials or get started with GigaGPU to power your React Native apps.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
