What You’ll Connect
After this guide, your React Native app will stream AI responses from your own GPU server on both iOS and Android — tokens rendering progressively as the model generates them. The app connects to your vLLM or Ollama endpoint on dedicated GPU hardware using the OpenAI-compatible API, delivering a native mobile chat experience powered by your self-hosted infrastructure.
React Native’s built-in Fetch implementation does not expose response bodies as streams, but the react-native-fetch-api polyfill adds ReadableStream support on both iOS and Android. The integration pattern mirrors a web React app while accounting for mobile-specific considerations: background state handling, network transitions, and native keyboard management.
Prerequisites
- A GigaGPU server running a self-hosted LLM (setup guide)
- HTTPS access to your inference endpoint
- React Native 0.72+ with Expo or bare workflow
- API key for your GPU inference server
Integration Steps
Install the streaming fetch polyfill for React Native — the default Fetch implementation does not support ReadableStream on all platforms. Configure the polyfill at app startup so all fetch calls support streaming. Store your GPU endpoint URL and API key in environment variables using react-native-config.
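A typical bootstrap file, run at app startup before anything calls fetch, might look like the sketch below. The exact packages (web-streams-polyfill, text-encoding) and import paths are assumptions based on the react-native-fetch-api documentation; verify them against your installed versions:

```javascript
// index.js — install streaming-capable globals before the app loads.
import { polyfillGlobal } from "react-native/Libraries/Utilities/PolyfillFunctions";
import { ReadableStream } from "web-streams-polyfill/ponyfill/es6";
import { fetch, Headers, Request, Response } from "react-native-fetch-api";
import { TextEncoder, TextDecoder } from "text-encoding";

polyfillGlobal("ReadableStream", () => ReadableStream);
polyfillGlobal("fetch", () => fetch);
polyfillGlobal("Headers", () => Headers);
polyfillGlobal("Request", () => Request);
polyfillGlobal("Response", () => Response);
polyfillGlobal("TextEncoder", () => TextEncoder);
polyfillGlobal("TextDecoder", () => TextDecoder);
```

With the globals in place, any fetch call that passes `reactNative: { textStreaming: true }` can read its response body incrementally.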
Build a custom hook that manages the streaming connection, message state, and request lifecycle. The hook calls your GPU endpoint with stream: true, reads the response body as chunks, parses server-sent events, and updates state with each token. An AbortController handles cancellation when the user navigates away or sends a new message.
Create the chat screen with a FlatList for messages (better performance than ScrollView for long lists), a TextInput with keyboard-aware positioning, and send/cancel buttons. Use onContentSizeChange to auto-scroll to the latest message during streaming.
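A minimal screen wiring these pieces together could look like this sketch. The `ChatScreen` component, the `./useChat` import path, and the inline styles are our assumptions, not a prescribed layout:

```javascript
import React, { useRef, useState } from "react";
import {
  Button, FlatList, KeyboardAvoidingView, Platform, Text, TextInput, View,
} from "react-native";
import { useChat } from "./useChat"; // the streaming hook from the previous step

export function ChatScreen() {
  const { messages, sendMessage, isStreaming, cancel } = useChat();
  const [draft, setDraft] = useState("");
  const listRef = useRef(null);

  return (
    <KeyboardAvoidingView
      behavior={Platform.OS === "ios" ? "padding" : undefined}
      style={{ flex: 1 }}
    >
      <FlatList
        ref={listRef}
        data={messages}
        keyExtractor={(_, i) => String(i)}
        renderItem={({ item }) => <Text>{item.role}: {item.content}</Text>}
        // keep the newest tokens in view while the response streams in
        onContentSizeChange={() => listRef.current?.scrollToEnd({ animated: false })}
      />
      <View style={{ flexDirection: "row" }}>
        <TextInput style={{ flex: 1 }} value={draft} onChangeText={setDraft} />
        {isStreaming ? (
          <Button title="Stop" onPress={cancel} />
        ) : (
          <Button title="Send" onPress={() => { sendMessage(draft); setDraft(""); }} />
        )}
      </View>
    </KeyboardAvoidingView>
  );
}
```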
Code Example
React Native hook for streaming from your self-hosted LLM:
import { useState, useCallback, useRef } from "react";
import Config from "react-native-config";

const API_URL = `${Config.GPU_API_URL}/v1/chat/completions`;
const API_KEY = Config.GPU_API_KEY;

export function useChat() {
  const [messages, setMessages] = useState([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const abortRef = useRef(null);

  const sendMessage = useCallback(async (text) => {
    // Cancel any request still in flight before starting a new one.
    abortRef.current?.abort();
    abortRef.current = new AbortController();

    const updated = [
      ...messages,
      { role: "user", content: text },
      { role: "assistant", content: "" }, // placeholder, filled as tokens arrive
    ];
    setMessages(updated);
    setIsStreaming(true);

    try {
      const resp = await fetch(API_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify({
          model: "meta-llama/Llama-3-70b-chat-hf",
          messages: updated.slice(0, -1), // exclude the empty placeholder
          stream: true,
          max_tokens: 1024,
        }),
        signal: abortRef.current.signal,
        // react-native-fetch-api flag: expose the body as a ReadableStream
        reactNative: { textStreaming: true },
      });
      if (!resp.ok) throw new Error(`Request failed: ${resp.status}`);

      const reader = resp.body.getReader();
      const decoder = new TextDecoder();
      let buffer = ""; // holds a partial SSE line split across network chunks
      let content = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop(); // last element may be incomplete; keep it
        for (const line of lines) {
          if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
          const json = JSON.parse(line.slice(6));
          content += json.choices[0]?.delta?.content || "";
          // Replace only the last (assistant) message; earlier rows are untouched.
          setMessages((prev) => {
            const copy = [...prev];
            copy[copy.length - 1] = { role: "assistant", content };
            return copy;
          });
        }
      }
    } catch (err) {
      if (err.name !== "AbortError") throw err; // cancellation is expected
    } finally {
      setIsStreaming(false);
    }
  }, [messages]);

  const cancel = () => {
    abortRef.current?.abort();
    setIsStreaming(false);
  };

  return { messages, sendMessage, isStreaming, cancel };
}
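The SSE parsing is the easiest part to get wrong, because a network read can end in the middle of a `data:` line. Pulling the logic into a pure function makes it unit-testable off-device; a minimal sketch (the `createSSEParser` name is ours):

```javascript
// Incremental SSE parser: feed it raw text chunks, get back the content
// deltas found in complete "data: ..." lines. A trailing partial line is
// buffered until the next chunk completes it.
function createSSEParser() {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // last element may be incomplete; keep it
    const deltas = [];
    for (const line of lines) {
      if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
      const json = JSON.parse(line.slice(6));
      const delta = json.choices?.[0]?.delta?.content;
      if (delta) deltas.push(delta);
    }
    return deltas;
  };
}
```

Feeding it a stream split mid-line returns each delta exactly once, which is the invariant to assert in a unit test.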
Testing Your Integration
Run the app on iOS Simulator and Android Emulator in parallel. Send test messages on both platforms and verify tokens stream progressively. Test background/foreground transitions mid-stream — the app should handle backgrounding gracefully without crashing. Test network switching (WiFi to cellular) to verify reconnection behaviour.
Profile rendering performance with the React Native Performance Monitor. During streaming, only the last message should re-render — ensure FlatList’s keyExtractor and getItemLayout are configured to prevent unnecessary re-renders of the entire message list.
Production Tips
Route requests through your own backend to keep the API key off the mobile device. Use certificate pinning for added security when connecting to your backend. Store conversation history in AsyncStorage or an SQLite database for offline access and faster app restarts.
Handle the soft keyboard carefully — use KeyboardAvoidingView and adjust the FlatList’s content inset when the keyboard appears. Implement push notifications to alert users when long-running AI tasks complete in the background. From here you can build out a full AI chatbot experience on mobile: explore more tutorials, or get started with GigaGPU to power your React Native apps.