What You’ll Connect
After this guide, your Next.js application will stream AI responses from your own GPU server through a server-side API route — keeping your API key secure on the server while delivering real-time token streaming to the browser. Your vLLM or Ollama endpoint on dedicated GPU hardware powers the AI features, and the Next.js API route acts as a secure proxy between your frontend and the GPU backend.
The integration uses Next.js App Router API routes with the Vercel AI SDK pattern for streaming. Your GPU endpoint serves an OpenAI-compatible API, and the Next.js backend streams completions to the client using the standard ReadableStream interface.
Prerequisites
- A GigaGPU server running a self-hosted LLM (setup guide)
- Network access from your Next.js server to the GPU endpoint
- Next.js 14+ with App Router enabled
- API key for your GPU inference server stored in environment variables
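The GPU_API_URL and GPU_API_KEY variables used throughout this guide live in .env.local, which Next.js loads automatically at startup. A sketch with placeholder values:

```shell
# .env.local (keep this file out of git)
# Base URL of your vLLM or Ollama endpoint (placeholder value)
GPU_API_URL=https://gpu.example.com
# No NEXT_PUBLIC_ prefix, so the key stays server-side and is never bundled for the browser
GPU_API_KEY=replace-with-your-key
```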
Integration Steps
Create an API route in your Next.js App Router that accepts chat messages from the frontend. The route calls your GPU server’s completion endpoint with streaming enabled, then pipes the response stream directly back to the client. This keeps the GPU API key server-side while giving the frontend real-time token streaming.
Build the client-side hook that calls your Next.js API route and parses the streaming response. Use the useChat pattern — manage conversation history in state, append user messages, stream assistant responses, and handle loading and error states. The Vercel AI SDK provides this hook out of the box. Note that useChat expects the SDK's own streaming wire format rather than raw OpenAI-style SSE, so the API route should convert the GPU server's stream (the SDK ships helpers for OpenAI-compatible backends) instead of forwarding it verbatim.
For server-rendered pages, use React Server Components to fetch AI-generated content at request time. The server component calls your GPU endpoint directly — no streaming needed for pre-rendered content — and includes the result in the initial HTML response.
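The Server Component call can be sketched as a plain async helper, assuming the same GPU_API_URL and GPU_API_KEY environment variables and the model name used later in this guide (the prompt and function names are illustrative):

```typescript
// Request body for a non-streaming completion (model name illustrative)
export function buildSummaryBody(topic: string) {
  return {
    model: "meta-llama/Llama-3-70b-chat-hf",
    messages: [
      { role: "user", content: `Write a two-sentence summary of ${topic}.` },
    ],
    stream: false, // full response in one shot, since it goes straight into the HTML
    max_tokens: 256,
  };
}

// Called from an async Server Component, e.g.:
//   export default async function Page() {
//     const summary = await generateSummary("GPU inference");
//     return <p>{summary}</p>;
//   }
export async function generateSummary(topic: string): Promise<string> {
  const res = await fetch(`${process.env.GPU_API_URL}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GPU_API_KEY}`,
    },
    body: JSON.stringify(buildSummaryBody(topic)),
    cache: "no-store", // render per request; adjust if the output can be cached
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```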
Code Example
Next.js API route and client hook for streaming from your self-hosted LLM:
// app/api/chat/route.ts — Server-side API route
import { NextRequest } from "next/server";
import { OpenAIStream, StreamingTextResponse } from "ai";

export async function POST(req: NextRequest) {
  const { messages } = await req.json();

  const response = await fetch(
    `${process.env.GPU_API_URL}/v1/chat/completions`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.GPU_API_KEY}`,
      },
      body: JSON.stringify({
        model: "meta-llama/Llama-3-70b-chat-hf",
        messages,
        stream: true,
        max_tokens: 1024,
      }),
    }
  );

  if (!response.ok) {
    return new Response("Upstream GPU server error", { status: 502 });
  }

  // Convert the GPU server's OpenAI-style SSE stream into the
  // AI SDK wire format that useChat on the client knows how to parse
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
// app/chat/page.tsx — Client component with streaming
"use client";
import { useChat } from "ai/react"; // Vercel AI SDK

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({ api: "/api/chat" });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask something..."
        />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}
Testing Your Integration
Start your Next.js dev server and open the chat page. Send a test message and verify tokens stream progressively. Check the Network tab to confirm requests go to your /api/chat route, not directly to the GPU server. Verify that the GPU_API_KEY environment variable is not exposed in the client bundle by searching the browser source.
Test edge cases: rapid message sending, very long responses, network interruptions, and concurrent users. The API route should handle each request independently with proper stream cleanup on client disconnection.
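Stream cleanup on disconnection can be handled by forwarding the route's AbortSignal to the upstream fetch. The helper below is a sketch (using the same model and env-var names as the Code Example): when the browser drops the connection, Next.js aborts req.signal, and passing that signal to fetch cancels token generation on the GPU server.

```typescript
// Builds the upstream fetch options, wiring the client's AbortSignal
// through so a browser disconnect cancels generation on the GPU server.
export function buildUpstreamInit(
  messages: unknown[],
  signal: AbortSignal,
  apiKey: string | undefined
) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "meta-llama/Llama-3-70b-chat-hf",
      messages,
      stream: true,
      max_tokens: 1024,
    }),
    // When the client disconnects, the route's req.signal aborts;
    // attaching it here propagates the abort to the GPU connection.
    signal,
  };
}
```

Inside the route handler this becomes `fetch(url, buildUpstreamInit(messages, req.signal, process.env.GPU_API_KEY))`.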
Production Tips
Add rate limiting to your API route using middleware to prevent abuse. Implement user authentication so each chat session is tied to a verified user. Store conversation history in a database rather than client-side state for persistence across sessions. Add request logging to track token usage and response latency per user.
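The rate-limiting middleware can be sketched as a minimal in-memory fixed-window counter. The limit, window size, and keying by IP are assumptions, and the Map lives in a single server process; multi-instance deployments would back this with Redis or a hosted limiter instead.

```typescript
// Fixed-window rate limiter: at most MAX_REQUESTS per key per window.
const WINDOW_MS = 60_000; // 1-minute window (assumption)
const MAX_REQUESTS = 20;  // per key per window (assumption)

const hits = new Map<string, { count: number; windowStart: number }>();

export function isRateLimited(key: string, now: number = Date.now()): boolean {
  const entry = hits.get(key);
  // No entry yet, or the window has elapsed: start a fresh window.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(key, { count: 1, windowStart: now });
    return false;
  }
  entry.count += 1;
  return entry.count > MAX_REQUESTS;
}
```

In middleware.ts, call isRateLimited with the client IP (e.g. taken from the x-forwarded-for header) and return a 429 response when it reports true.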
For SEO-critical pages, use React Server Components to pre-render AI-generated summaries, descriptions, or metadata at request time. The server component calls the GPU endpoint, embeds the result in the HTML response, and search engines see the full page without executing any JavaScript.
With these pieces in place you can build a complete AI chatbot product on Next.js. Explore more tutorials or get started with GigaGPU to power your Next.js apps.