Symptom: Ollama Is Running on CPU
You installed Ollama on your GPU server, pulled a model, and ran it. But generation is painfully slow, and when you check resource usage, the GPU sits idle at 0% utilisation while the CPU is maxed out. Running ollama ps shows 100% CPU in the PROCESSOR column, or the logs contain:
level=WARN msg="no NVIDIA GPU detected"
msg="inference compute" id=0 library=cpu
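If Ollama runs as a systemd service (the official install script's default, with the unit named ollama), these lines land in the journal rather than your terminal. A quick way to pull out the detection messages:

```shell
# Filter recent Ollama service logs for GPU/CUDA detection lines.
# No matches at all is itself a clue, so don't fail on an empty result.
if command -v journalctl >/dev/null; then
  journalctl -u ollama --no-pager -n 200 | grep -iE 'gpu|cuda|nvidia' || true
fi
```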
Ollama should detect NVIDIA GPUs automatically and use them. When it falls back to CPU, the cause is almost always one of a small set of configuration issues between Ollama, the NVIDIA driver, and the CUDA libraries.
Diagnostic Steps
# 1. Confirm the GPU is visible to the OS
nvidia-smi
# 2. Check Ollama's GPU detection
ollama serve  # Run in the foreground so detection logs print to the terminal
# Look for "NVIDIA GPU detected" or "no GPU" messages
# 3. Check which compute library Ollama selected
curl http://localhost:11434/api/ps
If nvidia-smi fails, fix the driver first using our CUDA installation guide.
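The three checks above can be rolled into a small script. This is a hypothetical helper, not part of Ollama; it reports which layer fails first (driver, binary, or daemon API) so you know which fix below applies:

```shell
# One-shot diagnostic: driver -> ollama binary -> daemon API, in order.
check_stack() {
  if command -v nvidia-smi >/dev/null && nvidia-smi >/dev/null 2>&1; then
    echo "driver: OK"
  else
    echo "driver: FAIL (nvidia-smi missing or erroring; fix the driver first)"
  fi
  if command -v ollama >/dev/null; then
    echo "binary: OK"
  else
    echo "binary: FAIL (ollama not on PATH)"
  fi
  if command -v curl >/dev/null && curl -fs http://localhost:11434/api/ps >/dev/null 2>&1; then
    echo "api: OK (daemon reachable on :11434)"
  else
    echo "api: FAIL (daemon not reachable on :11434)"
  fi
}
check_stack
```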
Fix 1: Install Missing CUDA Libraries
Ollama bundles its own CUDA runtime, but it still needs the NVIDIA driver and the driver's shared libraries (libcuda and libnvidia-ml):
# Ensure the NVIDIA driver is installed
sudo apt install nvidia-driver-550
# Ollama needs these libraries accessible
ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*
ls /usr/lib/x86_64-linux-gnu/libcuda.so*
If these libraries are missing, the driver installation is incomplete. Reinstall the driver and reboot.
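A sketch of that check as a reusable function, using the loader cache instead of hard-coded paths so it also works on distributions that install the libraries elsewhere:

```shell
# Check the loader cache for the two driver libraries Ollama links against.
# Returns non-zero if either is missing, which points to a broken driver install.
check_gpu_libs() {
  local missing=0
  for lib in libnvidia-ml.so libcuda.so; do
    if ldconfig -p 2>/dev/null | grep -q "$lib"; then
      echo "found: $lib"
    else
      echo "MISSING: $lib"
      missing=1
    fi
  done
  return "$missing"
}
check_gpu_libs || echo "Driver install looks incomplete: reinstall the driver and reboot."
```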
Fix 2: Docker GPU Passthrough for Ollama
If running Ollama in Docker, the container must have GPU access:
# Wrong: no GPU access
docker run -d ollama/ollama
# Correct: with GPU passthrough
docker run -d --gpus all -p 11434:11434 ollama/ollama
The NVIDIA Container Toolkit must be installed on the host. Our Docker GPU guide covers the full setup.
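If you manage the container with Docker Compose instead, the same GPU reservation is expressed in the service definition. A minimal sketch, assuming Compose v2 and the NVIDIA Container Toolkit already installed on the host:

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all    # or a number, e.g. 1, to limit GPU count
              capabilities: [gpu]
```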
Fix 3: Environment Variable Override
Force Ollama to use specific GPUs:
# Use GPU 0 only
CUDA_VISIBLE_DEVICES=0 ollama serve
# Use GPUs 0 and 1
CUDA_VISIBLE_DEVICES=0,1 ollama serve
If CUDA_VISIBLE_DEVICES is set to an empty string or an invalid index elsewhere in your environment, Ollama will see no GPUs.
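Note that if Ollama was installed with the official script, it runs as a systemd service, so exporting CUDA_VISIBLE_DEVICES in your shell never reaches the daemon. Set it in a unit override instead (sudo systemctl edit ollama, then sudo systemctl restart ollama):

```ini
[Service]
Environment="CUDA_VISIBLE_DEVICES=0"
```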
Fix 4: Update Ollama
Older Ollama versions had limited GPU support. Update to the latest release:
curl -fsSL https://ollama.com/install.sh | sh
Or download the latest binary directly. Newer releases regularly improve GPU detection and add support for newer driver versions.
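To see whether you are actually behind, compare the installed version with the latest release tag. A best-effort sketch (the GitHub lookup is an illustration, and prints "unknown" offline):

```shell
# Extract the installed version from `ollama --version`.
installed=$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
# Best-effort lookup of the latest release tag on GitHub.
latest=$(curl -fs https://api.github.com/repos/ollama/ollama/releases/latest \
  | grep -oE '"tag_name": *"v[0-9.]+"' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || true)
echo "installed: ${installed:-not found}   latest: ${latest:-unknown}"
```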
Verification
# Pull a small model and test
ollama pull llama3.2:1b
ollama run llama3.2:1b "Hello, are you using my GPU?"
# While it runs, check GPU utilisation
nvidia-smi
You should see Ollama’s process in the nvidia-smi output, consuming VRAM and GPU compute. If GPU utilisation climbs during generation, the fix worked.
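For a scriptable check, ollama ps reports residency in its PROCESSOR column: "100% GPU" when the model is fully offloaded, "100% CPU" when the fallback is still happening, or a CPU/GPU split. A minimal helper, assuming that output format:

```shell
# Succeeds if the `ollama ps` output on stdin mentions GPU residency at all;
# "100% CPU" means the model is still running on the CPU.
gpu_resident() {
  grep -q 'GPU'
}

# Usage: ollama ps | gpu_resident && echo "model is (at least partly) on the GPU"
```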
Multi-GPU Configuration
On multi-GPU servers, Ollama can split large models across GPUs automatically. Ensure all GPUs are visible:
CUDA_VISIBLE_DEVICES=0,1,2,3 ollama serve
For dedicated Ollama hosting, configure it as a systemd service with the correct GPU environment. Use GPU monitoring to verify all assigned GPUs are active during inference. For alternative LLM serving options, consider vLLM for higher-throughput production workloads. Check the tutorials section for related setup guides.
GPU Servers for Ollama
GigaGPU dedicated servers with NVIDIA GPUs and pre-installed drivers — Ollama detects your GPU automatically.
Browse GPU Servers