
Connect Vue.js to Self-Hosted AI

Stream AI responses from your GPU server into a Vue.js application. This guide covers building a composable for streaming chat completions, managing reactive conversation state, and creating a chat interface with your self-hosted LLM backend.

What You’ll Connect

After this guide, your Vue.js app will stream AI responses from your own GPU server with tokens appearing in real time using Vue’s reactivity system. A custom composable handles streaming from your vLLM or Ollama endpoint on dedicated GPU hardware, and the template renders each token as it arrives — delivering a fluid chat experience backed entirely by your own infrastructure.

The integration uses a Vue 3 composable wrapping the Fetch API with streaming support. Your GPU server’s OpenAI-compatible API sends server-sent events that the composable parses and feeds into reactive refs, automatically updating every bound component.

Prerequisites

  • A GigaGPU server running a self-hosted LLM (setup guide)
  • HTTPS access to your inference endpoint with CORS configured
  • Vue 3 application with Composition API
  • API key for your GPU inference server

Integration Steps

Create a Vue composable that encapsulates all AI communication logic — connection management, streaming, state tracking, and error handling. The composable exposes reactive refs for messages, loading state, and error state that any component can bind to. Configure the endpoint URL through Vite environment variables.
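The composable reads its endpoint configuration from Vite environment variables. As an illustrative sketch (the variable names must match whatever your composable reads; the values here are placeholders):

```
# .env.local — values are illustrative, not real credentials
VITE_GPU_API_URL=https://inference.example.com
VITE_GPU_API_KEY=your-inference-api-key
```

Note that anything prefixed `VITE_` is embedded in the client bundle, which is why the Production Tips section recommends proxying through your own backend instead of shipping the key to browsers.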

Build the streaming parser inside the composable. The GPU endpoint sends server-sent events with delta tokens. The parser reads the stream chunk by chunk, extracts token content, and appends it to the current assistant message ref. Vue’s reactivity system propagates changes instantly to the template without manual DOM updates.
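One subtlety worth handling explicitly: a network chunk can end mid-line, splitting a JSON payload across two reads. In isolation, the buffering logic can be sketched as a pure function (the name `parseSSEChunk` is hypothetical, not part of any library):

```javascript
// Parse one SSE text chunk into an array of delta tokens, keeping any
// incomplete trailing line in state.buffer so a JSON object split
// across network chunks is parsed only once it arrives in full.
function parseSSEChunk(state, chunk) {
  state.buffer += chunk;
  const lines = state.buffer.split("\n");
  state.buffer = lines.pop(); // last element may be a partial line
  const tokens = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data: ") || trimmed === "data: [DONE]") continue;
    const payload = JSON.parse(trimmed.slice(6));
    const token = payload.choices?.[0]?.delta?.content;
    if (token) tokens.push(token);
  }
  return tokens;
}

// A token split across two chunks still comes out whole:
const state = { buffer: "" };
parseSSEChunk(state, 'data: {"choices":[{"delta":{"content":"Hel'); // []
parseSSEChunk(state, 'lo"}}]}\n'); // ["Hello"]
```

The composable below inlines the same idea; pulling it out like this also makes the parsing easy to unit test.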

Create a chat component that uses the composable. The component renders the message list with v-for, shows a typing indicator during streaming, and provides an input form with submit and cancel controls. Vue’s transition system can animate new messages into view.

Code Example

Vue 3 composable for streaming from your self-hosted LLM:

// composables/useChat.js
import { ref, readonly } from "vue";

const API_URL = import.meta.env.VITE_GPU_API_URL + "/v1/chat/completions";
const API_KEY = import.meta.env.VITE_GPU_API_KEY;

export function useChat() {
  const messages = ref([]);
  const isStreaming = ref(false);
  const error = ref(null);
  let abortController = null;

  async function sendMessage(userContent) {
    messages.value.push({ role: "user", content: userContent });
    messages.value.push({ role: "assistant", content: "" });
    isStreaming.value = true;
    error.value = null;
    abortController = new AbortController();
    const lastIdx = messages.value.length - 1;

    try {
      const response = await fetch(API_URL, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify({
          model: "meta-llama/Llama-3-70b-chat-hf",
          // Send the history without the empty assistant placeholder.
          messages: messages.value.slice(0, -1),
          stream: true,
          max_tokens: 1024,
        }),
        signal: abortController.signal,
      });
      if (!response.ok) throw new Error(`Request failed: HTTP ${response.status}`);

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = ""; // holds a partial SSE line split across chunks

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop(); // keep any incomplete trailing line for the next read
        for (const line of lines) {
          if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
          const json = JSON.parse(line.slice(6));
          const token = json.choices[0]?.delta?.content || "";
          messages.value[lastIdx].content += token;
        }
      }
    } catch (err) {
      // A user-initiated cancel() aborts the fetch; don't treat it as an error.
      if (err.name !== "AbortError") error.value = err;
    } finally {
      isStreaming.value = false;
    }
  }

  function cancel() { abortController?.abort(); }
  function clear() { messages.value = []; error.value = null; }

  return { messages: readonly(messages), isStreaming: readonly(isStreaming),
           error: readonly(error), sendMessage, cancel, clear };
}

Testing Your Integration

Import the composable into a chat component and bind the messages ref to a v-for list. Send a test message and verify tokens stream in real time — the last message should grow character by character. Test the cancel function mid-stream and verify the UI recovers cleanly. Check Vue DevTools to confirm reactive state updates without memory leaks.
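A minimal chat component for this test might look like the following sketch (the `@/composables/useChat` import path assumes the default `@` alias; markup and styling are illustrative):

```vue
<!-- components/ChatPanel.vue -->
<script setup>
import { ref } from "vue";
import { useChat } from "@/composables/useChat";

const { messages, isStreaming, sendMessage, cancel } = useChat();
const draft = ref("");

function submit() {
  if (!draft.value.trim() || isStreaming.value) return;
  sendMessage(draft.value);
  draft.value = "";
}
</script>

<template>
  <ul>
    <!-- Index keys are fine here because messages are append-only -->
    <li v-for="(msg, i) in messages" :key="i" :class="msg.role">
      {{ msg.content }}
    </li>
  </ul>
  <p v-if="isStreaming">Assistant is typing…</p>
  <form @submit.prevent="submit">
    <input v-model="draft" placeholder="Ask something…" />
    <button type="submit" :disabled="isStreaming">Send</button>
    <button type="button" v-if="isStreaming" @click="cancel">Cancel</button>
  </form>
</template>
```

Disabling the submit button while `isStreaming` is true is the simplest guard against the overlapping-request problem mentioned below.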

Test with rapid consecutive messages to ensure state management handles overlapping requests. Verify the AbortController properly cleans up abandoned streams. Check browser console for CORS warnings if connecting directly to the GPU server.

Production Tips

Route requests through your own backend (Nuxt server routes or a separate API) to keep the GPU API key out of the client bundle. Add a Pinia store if you need conversation persistence across routes or components. Implement message history pagination for long conversations to keep the reactive array manageable.

For Nuxt.js applications, use a server API route (server/api/chat.post.ts), mirroring the Next.js pattern: the server proxies requests to your GPU endpoint and streams responses back, keeping the API key server-side. From there you can build a full AI chatbot on Vue’s component system. Explore more tutorials or get started with GigaGPU to power your Vue apps with self-hosted AI.
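A minimal sketch of such a proxy route, assuming Nitro’s support for returning a ReadableStream and hypothetical server-side `GPU_API_URL` / `GPU_API_KEY` environment variables:

```typescript
// server/api/chat.post.ts — proxy sketch; env var names are illustrative.
import { defineEventHandler, readBody, setResponseHeader, createError } from "h3";

export default defineEventHandler(async (event) => {
  const body = await readBody(event);

  // The key lives only on the server, never in the client bundle.
  const upstream = await fetch(`${process.env.GPU_API_URL}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GPU_API_KEY}`,
    },
    body: JSON.stringify({ ...body, stream: true }),
  });
  if (!upstream.ok || !upstream.body) {
    throw createError({ statusCode: 502, statusMessage: "Upstream inference error" });
  }

  // Nitro streams a returned ReadableStream straight back to the client.
  setResponseHeader(event, "Content-Type", "text/event-stream");
  return upstream.body;
});
```

With this in place, the composable’s `API_URL` simply becomes `/api/chat` and the `Authorization` header can be dropped from the client entirely.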

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
