
Ollama API Not Responding: Debug

Debug and fix Ollama API connection failures including port binding issues, firewall blocks, reverse proxy misconfigurations, and systemd service problems on GPU servers.

Symptom: API Requests Return Connection Refused

Your application sends requests to the Ollama API at http://localhost:11434 and gets nothing back. The error varies by client but the result is the same:

curl: (7) Failed to connect to localhost port 11434 after 0 ms: Connection refused
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded

The Ollama server should be listening on port 11434 by default. When connections fail, the server is either not running, bound to a different address, or blocked by a firewall. Here is how to diagnose and fix each scenario on your GPU server.

Check If Ollama Is Running

# Check the systemd service
sudo systemctl status ollama

# Check if the process exists
pgrep -a ollama

# Check if anything is listening on port 11434
ss -tlnp | grep 11434

# Check Ollama logs for startup errors
sudo journalctl -u ollama --no-pager -n 50

If the service is not running, start it with sudo systemctl start ollama. If it crashes on startup, the logs will reveal whether the issue is a missing GPU driver, corrupted model, or port conflict.
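If systemd reports the service as active but clients still fail, probe the API directly. A minimal sketch, assuming curl is installed and Ollama is on the default port (the function name is ours; /api/version is Ollama's version endpoint):

```shell
#!/bin/bash
# Print "up" if the Ollama API answers, "down" otherwise (default port assumed)
probe_ollama() {
    local code
    code=$(curl -s -o /dev/null -m 2 -w "%{http_code}" \
        "http://${1:-localhost}:11434/api/version" || true)
    if [ "$code" = "200" ]; then echo "up"; else echo "down"; fi
}

probe_ollama   # checks localhost by default
```

If this prints "down" while systemctl says the service is active, the server is likely bound to a different address, which the next fix covers.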

Fix 1: Correct the Bind Address

By default Ollama binds to 127.0.0.1:11434, which only accepts connections from the local machine. If your application runs on a different host or inside a Docker container, localhost resolves to that machine's (or container's) own loopback interface, so its requests never reach the Ollama host:

# Bind to all interfaces for remote access
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# For systemd, edit the service override
sudo systemctl edit ollama
# Add:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0:11434"

sudo systemctl daemon-reload
sudo systemctl restart ollama

Binding to 0.0.0.0 exposes the API to the network. Always pair this with firewall rules to restrict access to trusted sources.
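To confirm the new bind address actually took effect, check the local-address column that ss reports for port 11434. This small helper (a sketch; the function name is ours) classifies the address so you can see at a glance whether the server is still loopback-only:

```shell
#!/bin/bash
# Classify a listen address as reported by `ss -tln` for port 11434
classify_bind() {
    case "$1" in
        127.0.0.1:*|'[::1]:'*)      echo "local-only" ;;         # loopback: remote clients refused
        0.0.0.0:*|'*:'*|'[::]:'*)   echo "all-interfaces" ;;     # reachable from the network
        '')                         echo "not-listening" ;;      # nothing bound on the port
        *)                          echo "specific-interface" ;;
    esac
}

# Feed it the local-address column for port 11434
classify_bind "$(ss -tln 2>/dev/null | awk '$4 ~ /:11434$/ {print $4; exit}')"
```

After applying the OLLAMA_HOST override, this should print "all-interfaces" instead of "local-only".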

Fix 2: Firewall Blocking the Port

Server firewalls often block non-standard ports by default:

# Check UFW rules (Ubuntu)
sudo ufw status verbose

# Allow Ollama port from specific IP
sudo ufw allow from 10.0.0.0/24 to any port 11434

# Or allow from anywhere (less secure)
sudo ufw allow 11434/tcp

# Check iptables directly
sudo iptables -L -n | grep 11434

For cloud-hosted servers, also check the provider’s security group or network firewall settings in their control panel.
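A refused connection behaves differently from one a firewall silently drops: refusal fails instantly (nothing is listening), while a DROP rule makes the client hang until it times out. This sketch, using bash's /dev/tcp and coreutils timeout (the function name is ours), tells the two apart:

```shell
#!/bin/bash
# Distinguish refused (no listener) from filtered (likely firewall DROP)
probe_port() {
    local host=$1 port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"        # something accepted the connection
    elif [ $? -eq 124 ]; then
        echo "filtered"    # timed out: packets likely dropped by a firewall
    else
        echo "closed"      # refused immediately: nothing listening
    fi
}

probe_port localhost 11434
```

"closed" points back at the service or bind address; "filtered" points at UFW, iptables, or a cloud security group.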

Fix 3: Reverse Proxy Timeout Configuration

When running Ollama behind Nginx or another reverse proxy, the proxy may time out before Ollama finishes loading a model or generating a response:

# Nginx configuration for Ollama
server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_connect_timeout 10s;

        # Required for streaming responses
        proxy_buffering off;
        chunked_transfer_encoding on;
    }
}

The proxy_read_timeout must be long enough for model loading (which can take 30+ seconds for large models) and for complete response generation.

Fix 4: Docker Networking Issues

If Ollama runs in a container, the networking context changes entirely:

# Wrong: trying to reach host's localhost from another container
curl http://localhost:11434  # Fails inside a container

# Correct: use the container name or Docker network
docker network create ai-net
docker run -d --name ollama --network ai-net --gpus all ollama/ollama
docker run --network ai-net my-app
# Inside my-app, connect to http://ollama:11434

# Or use host networking
docker run -d --network host --gpus all ollama/ollama

Our Docker GPU guide covers container networking in detail.
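The same two-container setup can also be written as a Compose file, which keeps the shared network implicit. A sketch with assumed service names (my-app and its image are placeholders, as is the OLLAMA_URL variable name):

```yaml
# docker-compose.yml — sketch; "my-app" and its image are placeholders
services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  my-app:
    image: my-app:latest
    environment:
      OLLAMA_URL: http://ollama:11434   # service name resolves via Compose DNS
    depends_on:
      - ollama
```

Compose places both services on a default project network, so my-app reaches the API at http://ollama:11434 without any manual docker network commands.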

Set Up a Health Check

Once the API is responding, add monitoring to catch future outages early:

#!/bin/bash
# Simple health check: restart Ollama if the API stops answering
RESPONSE=$(curl -s -o /dev/null -m 5 -w "%{http_code}" http://localhost:11434/api/tags)
if [ "$RESPONSE" != "200" ]; then
    echo "Ollama API down, restarting..."
    sudo systemctl restart ollama
fi

For production Ollama hosting, run this via cron every minute. Consider vLLM for deployments needing built-in health endpoints and load balancing. The vLLM production setup guide covers robust API serving, and the CUDA installation guide ensures your GPU stack is properly configured. Browse the tutorials for more LLM hosting configurations and the infrastructure section for server hardening tips.
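To schedule the check every minute, a system crontab entry works well. A sketch assuming you saved the script as /usr/local/bin/ollama-healthcheck.sh and made it executable (both the path and the log location are assumptions):

```shell
# /etc/cron.d/ollama-health — script path and log file are placeholders
* * * * * root /usr/local/bin/ollama-healthcheck.sh >> /var/log/ollama-health.log 2>&1
```

Entries in /etc/cron.d take a user field (root here), so the restart runs with the privileges systemctl needs.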


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
