
Ollama Remote Access & Network Setup

Configure Ollama for secure remote access on dedicated GPU servers. Covers bind address configuration, SSH tunnels, reverse proxy with Nginx, API authentication, and TLS encryption.

The Problem: Ollama Only Listens on Localhost

You have Ollama running on your GPU server and it works perfectly when you SSH in and curl localhost. But your application runs on a different machine, and connecting to the server's public IP on port 11434 fails: the connection is refused (or times out, if a firewall drops the packets). Ollama binds to 127.0.0.1 by default, so it never sees remote connections at all. Here is how to open it up safely.

Option 1: Bind to All Interfaces

The fastest approach, suitable when your server sits behind a network firewall:

# Start Ollama on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# For systemd-managed Ollama, create a drop-in override
sudo systemctl edit ollama

# Add these lines in the editor (saved to
# /etc/systemd/system/ollama.service.d/override.conf):
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Then apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify it's listening on all interfaces
ss -tlnp | grep 11434
# Should show 0.0.0.0:11434 instead of 127.0.0.1:11434

This exposes Ollama to every machine that can reach your server. Without additional protection, anyone who knows your IP can send API requests and consume your GPU resources.
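Before adding protection, it is worth confirming the change actually took effect from the client's side. A quick check from the remote machine (the address below is a documentation-range placeholder; substitute your server's real IP):

```shell
# Run from the remote client, not the server
curl http://203.0.113.10:11434/api/tags
# A JSON list of installed models indicates the port is reachable
```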

Lock Down with Firewall Rules

Restrict access to known client IPs immediately after opening the port:

# Allow only your office IP
sudo ufw allow from 203.0.113.50 to any port 11434

# Allow a VPN subnet
sudo ufw allow from 10.8.0.0/24 to any port 11434

# Explicitly deny the port from all other sources
# (redundant if UFW's default incoming policy is already deny)
sudo ufw deny 11434

# Verify rules
sudo ufw status numbered

Option 2: SSH Tunnel (No Port Exposure)

The most secure approach. Keep Ollama on localhost and forward the port through an encrypted SSH connection:

# From your local machine, create the tunnel
ssh -L 11434:localhost:11434 user@your-gpu-server

# Now http://localhost:11434 on your local machine
# forwards to the GPU server's Ollama instance
curl http://localhost:11434/api/tags

# For persistent tunnels, use autossh
sudo apt install autossh
autossh -M 0 -f -N -L 11434:localhost:11434 user@your-gpu-server

SSH tunnels require no changes to Ollama’s configuration and add zero attack surface.
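For a tunnel you don't have to retype, the forward can also live in your SSH configuration. A sketch, where the host alias and hostname are placeholders:

```
# ~/.ssh/config
Host gpu-server
    HostName your-gpu-server
    User user
    LocalForward 11434 localhost:11434
    ServerAliveInterval 30
    ServerAliveCountMax 3
```

With this in place, `ssh -N gpu-server` (or `autossh -M 0 -f -N gpu-server`) establishes the same tunnel, and the keepalive settings let the client notice and drop a dead connection.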

Option 3: Nginx Reverse Proxy with TLS

For production deployments serving multiple clients, put Ollama behind Nginx with TLS and basic authentication:

# Install Nginx and create password file
sudo apt install nginx apache2-utils
sudo htpasswd -c /etc/nginx/.ollama-auth apiuser

# Save the following as /etc/nginx/sites-available/ollama
server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/ollama.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.yourdomain.com/privkey.pem;

    auth_basic "Ollama API";
    auth_basic_user_file /etc/nginx/.ollama-auth;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 600s;
        proxy_buffering off;
    }
}
# Enable and test
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Client connects with authentication
curl -u apiuser:password https://ollama.yourdomain.com/api/tags
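The config above assumes a Let's Encrypt certificate already exists at those paths. If it doesn't, one common way to obtain it (assuming certbot is installed and DNS for ollama.yourdomain.com already points at this server):

```
sudo certbot certonly --nginx -d ollama.yourdomain.com
```

Certbot writes the certificate and key to the /etc/letsencrypt/live/ paths referenced in the server block and sets up automatic renewal.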

Option 4: API Key Authentication Middleware

For more granular access control, add a lightweight authentication layer:

# Simple auth proxy with Python
# save as ollama_proxy.py
from flask import Flask, request, Response
import requests

app = Flask(__name__)
API_KEY = "your-secret-key-here"  # in practice, load from an environment variable
OLLAMA_URL = "http://localhost:11434"

@app.before_request
def check_auth():
    if request.headers.get("Authorization") != f"Bearer {API_KEY}":
        return Response("Unauthorized", status=401)

@app.route("/", defaults={"path": ""}, methods=["GET", "POST", "DELETE"])
@app.route("/<path:path>", methods=["GET", "POST", "DELETE"])
def proxy(path):
    resp = requests.request(
        method=request.method,
        url=f"{OLLAMA_URL}/{path}",
        headers={k: v for k, v in request.headers if k != "Host"},
        data=request.get_data(),
        stream=True
    )
    return Response(resp.iter_content(chunk_size=1024),
                    status=resp.status_code,
                    content_type=resp.headers.get("Content-Type"))

if __name__ == "__main__":
    # For production, run behind a WSGI server (e.g. gunicorn) instead of the dev server
    app.run(host="0.0.0.0", port=8080)
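One hardening detail worth noting: a plain `!=` comparison of the API key can leak timing information to an attacker probing key prefixes. Python's `hmac.compare_digest` performs the check in constant time. A minimal sketch of the same authorization test (`authorized` is an illustrative helper, not part of the code above):

```python
import hmac

API_KEY = "your-secret-key-here"

def authorized(header_value):
    """Return True only for a correct 'Bearer <key>' header, compared in constant time."""
    if not header_value or not header_value.startswith("Bearer "):
        return False
    supplied = header_value[len("Bearer "):]
    return hmac.compare_digest(supplied, API_KEY)
```

In `check_auth`, `authorized(request.headers.get("Authorization"))` would replace the direct string comparison.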

Production Security Checklist

Before exposing your Ollama API remotely, verify these are in place:

- TLS encryption on all external connections
- Authentication on every endpoint
- Firewall rules restricting source IPs
- Rate limiting to prevent abuse
- Monitoring for unusual request patterns

The infrastructure guides cover server hardening in depth. For OpenAI-compatible API serving with built-in auth support, consider vLLM as described in the production guide. Our Docker GPU guide covers isolated container networking, and the tutorials section has more LLM hosting configurations. Review the CUDA setup guide to ensure your GPU stack is production-ready.
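The rate-limiting item can be handled directly in the Nginx setup from Option 3 using the `limit_req` module. A sketch, where the zone name, zone size, and rate are illustrative values to tune for your traffic:

```
# In the http block (e.g. /etc/nginx/nginx.conf)
limit_req_zone $binary_remote_addr zone=ollama_limit:10m rate=10r/m;

# Inside the server block from Option 3
location / {
    limit_req zone=ollama_limit burst=5;
    proxy_pass http://127.0.0.1:11434;
}
```

This caps each client IP at roughly ten requests per minute, with a small burst allowance; excess requests receive a 503 by default.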

Secure GPU Servers for Remote AI APIs

GigaGPU dedicated servers with DDoS protection, private networking, and full root access for secure Ollama deployments.

Browse GPU Servers

