The Exact Symptom
You open a Python shell on your GPU server, type the standard check, and get:
>>> import torch
>>> torch.cuda.is_available()
False
This single boolean is the gatekeeper for all GPU-accelerated computation in PyTorch. When it returns False, every .cuda() call will fail, every model will train on CPU, and your expensive GPU hardware sits idle. The good news: the causes are well-documented and each has a concrete resolution.
Systematic Diagnosis in 60 Seconds
Run this diagnostic block to pinpoint exactly where the chain breaks:
import torch
import subprocess

# Check 1: PyTorch build type
cuda_version = torch.version.cuda
print(f"PyTorch version: {torch.__version__}")
print(f"PyTorch CUDA build: {cuda_version}")
if cuda_version is None:
    print("PROBLEM: CPU-only PyTorch installed!")

# Check 2: Driver accessibility
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    if result.returncode == 0:
        print("Driver: OK (nvidia-smi works)")
    else:
        print("PROBLEM: nvidia-smi failed")
except FileNotFoundError:
    print("PROBLEM: nvidia-smi not found on PATH")
This tells you immediately whether the fault lies with PyTorch’s build, the NVIDIA driver, or something else. The most frequent result on PyTorch hosting servers is the first check revealing a CPU-only build.
Fix: Wrong PyTorch Build Installed
If torch.version.cuda returns None, PyTorch was installed without CUDA support. The fix:
# Remove the CPU-only version
pip uninstall torch torchvision torchaudio -y
# Install with CUDA 12.4 support (adjust version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Restart your Python process entirely after reinstalling. Importing torch in the same session after a reinstall can produce stale references.
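Once the fresh interpreter is up, you can sanity-check the wheel type from version strings alone. A minimal sketch — the helper name `is_cuda_build` is mine, not part of PyTorch:

```python
def is_cuda_build(torch_version: str, cuda_version) -> bool:
    """Heuristic: is this a CUDA-enabled PyTorch wheel?

    CPU-only wheels report torch.version.cuda as None and usually carry a
    '+cpu' local-version suffix; CUDA wheels carry a tag like '+cu124'.
    """
    return cuda_version is not None and "+cpu" not in torch_version

# In a fresh interpreter after reinstalling:
#   import torch
#   print(is_cuda_build(torch.__version__, torch.version.cuda))
```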
Fix: NVIDIA Driver Not Loaded
If nvidia-smi fails with “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver”, the kernel module is not loaded:
# Check if the module exists
lsmod | grep nvidia
# If nothing appears, load it manually
sudo modprobe nvidia
# If modprobe fails, reinstall the driver
sudo apt install --reinstall nvidia-driver-550
sudo reboot
After reboot, nvidia-smi should display your GPU. Our CUDA installation guide covers the full driver setup sequence if you need to start from scratch.
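If you want to script the module check instead of eyeballing it, lsmod is essentially a pretty-printer for /proc/modules, which you can parse directly. A sketch (the function name is mine):

```python
def nvidia_module_loaded(proc_modules_text: str) -> bool:
    """Equivalent of `lsmod | grep nvidia`: scan /proc/modules lines for
    modules named nvidia, nvidia_uvm, nvidia_drm, and so on."""
    return any(
        line.split()[0].startswith("nvidia")
        for line in proc_modules_text.splitlines()
        if line.strip()
    )

# Usage on a live system:
#   with open("/proc/modules") as f:
#       print(nvidia_module_loaded(f.read()))
```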
Fix: CUDA Version Exceeds Driver Capability
PyTorch compiled with CUDA 12.4 will not work if your driver only supports up to CUDA 12.1. The CUDA version reported by nvidia-smi is the ceiling — not the floor. Two options:
- Upgrade the driver to one that supports CUDA 12.4 or higher.
- Downgrade PyTorch to a build matching your driver’s CUDA ceiling:
pip uninstall torch -y
pip install torch --index-url https://download.pytorch.org/whl/cu121
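The ceiling check itself is easy to automate. A sketch that assumes nvidia-smi's banner contains a line like "CUDA Version: 12.4" (the helper names are mine):

```python
import re
import subprocess

def parse_version(v: str) -> tuple:
    """'12.4' -> (12, 4), so versions compare numerically, not lexically."""
    return tuple(int(p) for p in v.split("."))

def driver_supports_build(driver_cuda: str, build_cuda: str) -> bool:
    """nvidia-smi's CUDA Version is a ceiling: the PyTorch build must not exceed it."""
    return parse_version(build_cuda) <= parse_version(driver_cuda)

if __name__ == "__main__":
    import torch
    banner = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    m = re.search(r"CUDA Version:\s*([\d.]+)", banner)
    if m and torch.version.cuda:
        if driver_supports_build(m.group(1), torch.version.cuda):
            print(f"OK: driver ceiling {m.group(1)} covers build {torch.version.cuda}")
        else:
            print(f"PROBLEM: build {torch.version.cuda} exceeds driver ceiling {m.group(1)}")
```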
For workloads that mix frameworks — running vLLM alongside custom training scripts — Docker containers let each workload use its own CUDA version. See our Docker GPU guide.
Fix: Container Missing GPU Access
Inside Docker or Kubernetes, GPUs are not visible by default. The NVIDIA Container Toolkit must be installed on the host, and containers must be launched with explicit GPU access:
# Test GPU visibility in a container
docker run --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If this command shows your GPU, the container toolkit works. If not, install it following the Docker GPU passthrough instructions.
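From inside the container itself, two quick signals indicate whether the toolkit mounted the GPU: /dev/nvidia* device nodes and the NVIDIA_VISIBLE_DEVICES environment variable. A minimal sketch (heuristics only, not a complete check; the function name is mine):

```python
import os
import pathlib

def container_gpu_hints() -> dict:
    """Collect in-container evidence of GPU passthrough: the
    NVIDIA_VISIBLE_DEVICES env var set by the container toolkit, and
    any /dev/nvidia* device nodes it mounted."""
    return {
        "nvidia_visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "device_nodes": sorted(p.name for p in pathlib.Path("/dev").glob("nvidia*")),
    }

print(container_gpu_hints())
```

An empty `device_nodes` list inside a container that was launched with `--gpus all` points at a host-side toolkit problem rather than a PyTorch one.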
Permanent Verification Script
After applying your fix, run a compute-level test — not just the boolean check:
import torch
# Boolean check
assert torch.cuda.is_available(), "Still False after fix!"
# Actual GPU computation
x = torch.randn(1000, 1000, device='cuda')
y = torch.mm(x, x)
print(f"Computed on: {y.device}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Total VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
This confirms not just detection but actual computation on the GPU. For production deployments on your dedicated GPU server, add this as a health check that runs at boot. Our monitoring guide shows how to set up continuous GPU health checks with alerting.
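As a starting point for such a boot-time check, here is a self-contained sketch that wraps the compute test in a function returning a clean pass/fail; the systemd or cron wiring is up to you:

```python
import sys

def gpu_healthy() -> bool:
    """True if a small CUDA matmul actually runs; False on any failure,
    including a missing torch install or an unavailable GPU."""
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        x = torch.randn(256, 256, device="cuda")
        # A finite matmul result means the kernel actually ran on the device.
        return bool((x @ x).isfinite().all())
    except Exception:
        return False

# As a boot health check, exit nonzero on failure so systemd/cron can alert:
#   sys.exit(0 if gpu_healthy() else 1)
```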
Guaranteed GPU Detection
GigaGPU dedicated servers are configured with tested NVIDIA driver and CUDA stacks. torch.cuda.is_available() returns True out of the box.
Browse GPU Servers