Symptom: Ollama Is Running on CPU
You installed Ollama on your GPU server, pulled a model, and ran it. But generation is painfully slow, and when you check resource usage, the GPU sits idle at 0% utilisation while the CPU is maxed out. Running ollama ps shows 100% CPU in the PROCESSOR column, or the logs contain:
level=WARN msg="no NVIDIA GPU detected"
msg="inference compute" id=0 library=cpu
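If Ollama runs as a systemd service (the official install script's default, with the unit named ollama), these lines land in the journal rather than your terminal. A quick way to pull out the detection messages:

```shell
# Filter recent Ollama service logs for GPU/CUDA detection lines.
# No matches at all is itself a clue, so don't fail on an empty result.
if command -v journalctl >/dev/null; then
  journalctl -u ollama --no-pager -n 200 | grep -iE 'gpu|cuda|nvidia' || true
fi
```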
Ollama should detect NVIDIA GPUs automatically and use them. When it falls back to CPU, the cause is almost always one of a small set of configuration issues between Ollama, the NVIDIA driver, and the CUDA libraries.
Diagnostic Steps
# 1. Confirm the GPU is visible to the OS
nvidia-smi
# 2. Check Ollama's GPU detection
ollama serve  # Run in the foreground so detection logs print to the terminal
# Look for "NVIDIA GPU detected" or "no GPU" messages
# 3. Check which compute library Ollama selected
curl http://localhost:11434/api/ps
If nvidia-smi fails, fix the driver first using our CUDA installation guide.
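The three checks above can be rolled into a small script. This is a hypothetical helper, not part of Ollama; it reports which layer fails first (driver, binary, or daemon API) so you know which fix below applies:

```shell
# One-shot diagnostic: driver -> ollama binary -> daemon API, in order.
check_stack() {
  if command -v nvidia-smi >/dev/null && nvidia-smi >/dev/null 2>&1; then
    echo "driver: OK"
  else
    echo "driver: FAIL (nvidia-smi missing or erroring; fix the driver first)"
  fi
  if command -v ollama >/dev/null; then
    echo "binary: OK"
  else
    echo "binary: FAIL (ollama not on PATH)"
  fi
  if command -v curl >/dev/null && curl -fs http://localhost:11434/api/ps >/dev/null 2>&1; then
    echo "api: OK (daemon reachable on :11434)"
  else
    echo "api: FAIL (daemon not reachable on :11434)"
  fi
}
check_stack
```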
Fix 1: Install Missing CUDA Libraries
Ollama bundles its own CUDA runtime, but it still needs the NVIDIA driver and the driver's shared libraries (libcuda and libnvidia-ml):
# Ensure the NVIDIA driver is installed
sudo apt install nvidia-driver-550
# Ollama needs these libraries accessible
ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*
ls /usr/lib/x86_64-linux-gnu/libcuda.so*
If these libraries are missing, the driver installation is incomplete. Reinstall the driver and reboot.
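A sketch of that check as a reusable function, using the loader cache instead of hard-coded paths so it also works on distributions that install the libraries elsewhere:

```shell
# Check the loader cache for the two driver libraries Ollama links against.
# Returns non-zero if either is missing, which points to a broken driver install.
check_gpu_libs() {
  local missing=0
  for lib in libnvidia-ml.so libcuda.so; do
    if ldconfig -p 2>/dev/null | grep -q "$lib"; then
      echo "found: $lib"
    else
      echo "MISSING: $lib"
      missing=1
    fi
  done
  return "$missing"
}
check_gpu_libs || echo "Driver install looks incomplete: reinstall the driver and reboot."
```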
Fix 2: Docker GPU Passthrough for Ollama
If running Ollama in Docker, the container must have GPU access:
# Wrong: no GPU access
docker run -d ollama/ollama
# Correct: with GPU passthrough
docker run -d --gpus all -p 11434:11434 ollama/ollama
The NVIDIA Container Toolkit must be installed on the host. Our Docker GPU guide covers the full setup.
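If you manage the container with Docker Compose instead, the same GPU reservation is expressed in the service definition. A minimal sketch, assuming Compose v2 and the NVIDIA Container Toolkit already installed on the host:

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all    # or a number, e.g. 1, to limit GPU count
              capabilities: [gpu]
```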
Fix 3: Environment Variable Override
Force Ollama to use specific GPUs:
# Use GPU 0 only
CUDA_VISIBLE_DEVICES=0 ollama serve
# Use GPUs 0 and 1
CUDA_VISIBLE_DEVICES=0,1 ollama serve
If CUDA_VISIBLE_DEVICES is set to an empty string or an invalid index elsewhere in your environment, Ollama will see no GPUs.
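Note that if Ollama was installed with the official script, it runs as a systemd service, so exporting CUDA_VISIBLE_DEVICES in your shell never reaches the daemon. Set it in a unit override instead (sudo systemctl edit ollama, then sudo systemctl restart ollama):

```ini
[Service]
Environment="CUDA_VISIBLE_DEVICES=0"
```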
Fix 4: Update Ollama
Older Ollama versions had limited GPU support. Update to the latest release:
curl -fsSL https://ollama.com/install.sh | sh
Or download the latest binary directly. Newer releases regularly improve GPU detection and add support for newer driver versions.
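To see whether you are actually behind, compare the installed version with the latest release tag. A best-effort sketch (the GitHub lookup is an illustration, and prints "unknown" offline):

```shell
# Extract the installed version from `ollama --version`.
installed=$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
# Best-effort lookup of the latest release tag on GitHub.
latest=$(curl -fs https://api.github.com/repos/ollama/ollama/releases/latest \
  | grep -oE '"tag_name": *"v[0-9.]+"' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || true)
echo "installed: ${installed:-not found}   latest: ${latest:-unknown}"
```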
Verification
# Pull a small model and test
ollama pull llama3.2:1b
ollama run llama3.2:1b "Hello, are you using my GPU?"
# While it runs, check GPU utilisation
nvidia-smi
You should see Ollama’s process in the nvidia-smi output, consuming VRAM and GPU compute. If GPU utilisation climbs during generation, the fix worked.
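For a scriptable check, ollama ps reports residency in its PROCESSOR column: "100% GPU" when the model is fully offloaded, "100% CPU" when the fallback is still happening, or a CPU/GPU split. A minimal helper, assuming that output format:

```shell
# Succeeds if the `ollama ps` output on stdin mentions GPU residency at all;
# "100% CPU" means the model is still running on the CPU.
gpu_resident() {
  grep -q 'GPU'
}

# Usage: ollama ps | gpu_resident && echo "model is (at least partly) on the GPU"
```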
Multi-GPU Configuration
On multi-GPU servers, Ollama can split large models across GPUs automatically. Ensure all GPUs are visible:
CUDA_VISIBLE_DEVICES=0,1,2,3 ollama serve
For dedicated Ollama hosting, configure it as a systemd service with the correct GPU environment. Use GPU monitoring to verify all assigned GPUs are active during inference. For alternative LLM serving options, consider vLLM for higher-throughput production workloads. Check the tutorials section for related setup guides.
GPU Servers for Ollama
GigaGPU dedicated servers with NVIDIA GPUs and pre-installed drivers — Ollama detects your GPU automatically.
Browse GPU Servers