The Problem: nvidia-smi Sees Nothing
You run nvidia-smi on your GPU server and get one of these frustrating outputs:
No devices were found
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.
This means the NVIDIA kernel driver is either not installed, not loaded, or cannot bind to your GPU. Without nvidia-smi working, nothing else does — no PyTorch, no TensorFlow, no vLLM, no inference of any kind.
Systematic Diagnosis
Work through these checks in order. Stop at the first failure — that is your root cause.
Check 1: Does the hardware exist?
lspci | grep -i nvidia
If this returns nothing, the system does not see an NVIDIA GPU on the PCIe bus. This could mean the GPU is not physically seated, the server needs a BIOS update, or the card is faulty. On a dedicated GPU server, contact your provider.
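When the card does appear, it can also help to see which kernel driver, if any, has claimed it. The sketch below parses the output of lspci -k; the helper name nvidia_bound_drivers is made up for illustration, and it assumes the usual "Kernel driver in use:" line that lspci prints under each device.

```shell
# Print "<slot> <driver>" for each NVIDIA display device found in `lspci -k`
# output read from stdin. "none" means no kernel driver has claimed the device.
# nvidia_bound_drivers is an illustrative helper, not a standard tool.
nvidia_bound_drivers() {
  awk '
    /^[0-9a-fA-F]/ {                      # device records start in column 0
      if (slot != "") print slot, driver  # flush the previous NVIDIA device
      slot = ""; driver = "none"
      line = tolower($0)
      if (line ~ /nvidia/ && line ~ /(vga compatible controller|3d controller)/) slot = $1
    }
    slot != "" && /Kernel driver in use:/ { driver = $NF }
    END { if (slot != "") print slot, driver }
  '
}

# Only runs where lspci is available.
if command -v lspci >/dev/null 2>&1; then
  lspci -k | nvidia_bound_drivers
fi
```

A result such as "01:00.0 nouveau" suggests the open-source nouveau driver holds the card, while "none" suggests that no driver is bound at all.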
Check 2: Is the driver package installed?
dpkg -l | grep nvidia-driver
# or
rpm -qa | grep nvidia-driver
If no package appears, install the driver (550 is the branch used throughout this guide; substitute the branch that supports your GPU):
sudo apt update && sudo apt install nvidia-driver-550
Check 3: Is the kernel module loaded?
lsmod | grep nvidia
You should see nvidia, nvidia_modeset, nvidia_drm, and usually nvidia_uvm (which may only load once a CUDA application starts). If none appear:
sudo modprobe nvidia
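If modprobe fails, read the error before reinstalling. A "Module nvidia not found" error means no module was built for your current kernel: install matching headers and reinstall the driver. Other errors usually point to Secure Boot (Check 4) or a conflicting driver such as nouveau, and dmesg will show the details.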
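If the driver was installed through DKMS, dkms status shows which kernels the module has been built for. This is a minimal sketch under two assumptions: that DKMS is in use (Ubuntu's packaged drivers may ship prebuilt modules instead, in which case dkms status lists nothing for nvidia), and that the status lines use the comma-separated format noted in the comments. The helper name nvidia_dkms_built_for is hypothetical.

```shell
# Succeed if `dkms status` output on stdin reports an nvidia module as
# installed for the given kernel release. Illustrative helper; the line format
# ("nvidia/<ver>, <kernel>, <arch>: installed") is assumed. Older DKMS
# releases print "nvidia, <ver>, <kernel>, ..." instead, which also matches.
nvidia_dkms_built_for() {
  local kernel="$1"
  grep -E '^nvidia' | grep -F -- ", ${kernel}," | grep -q 'installed'
}

# Only runs where dkms is available.
if command -v dkms >/dev/null 2>&1; then
  if dkms status | nvidia_dkms_built_for "$(uname -r)"; then
    echo "nvidia module is built for $(uname -r)"
  else
    echo "no nvidia DKMS build found for $(uname -r)"
  fi
fi
```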
Check 4: Is Secure Boot blocking the module?
mokutil --sb-state
If Secure Boot is enabled, unsigned kernel modules cannot load. Either disable Secure Boot in the BIOS or sign the NVIDIA module with a Machine Owner Key (MOK). On cloud GPU servers, Secure Boot is typically disabled by default.
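The error text from a failed modprobe often tells Check 3 and Check 4 apart. The sketch below maps a few commonly reported messages to a likely cause. The message strings are assumptions based on typical modprobe output and may differ between kernel versions, and classify_modprobe_error is an illustrative name rather than an existing tool.

```shell
# Read modprobe's combined output from stdin and print a likely cause.
# The matched strings are assumed, commonly seen messages; adjust as needed.
classify_modprobe_error() {
  local msg
  msg="$(cat)"
  case "$msg" in
    "")
      echo "ok: no error output" ;;
    *"Required key not available"*|*"Key was rejected by service"*)
      echo "secure-boot: module signature not trusted" ;;
    *"Module nvidia not found"*)
      echo "missing-module: no nvidia module for the running kernel" ;;
    *"No such device"*)
      echo "no-device: the module found no GPU it could claim" ;;
    *)
      echo "unknown: check dmesg for details" ;;
  esac
}

# Usage (needs root, so it is not executed here):
#   sudo modprobe nvidia 2>&1 | classify_modprobe_error
```

A "secure-boot" result points at the signing or BIOS options above, while "missing-module" points back at Check 3.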
Clean Driver Reinstallation
When diagnosis points to a broken driver, the most reliable path is a clean reinstall:
# Remove all existing NVIDIA packages
sudo apt purge 'nvidia-*' -y
sudo apt autoremove -y
# Reboot to ensure all modules are unloaded
sudo reboot
# After reboot, install fresh
sudo apt update
sudo apt install nvidia-driver-550
sudo reboot
After the second reboot, nvidia-smi should display your GPU. Our CUDA installation guide covers the full driver installation process including CUDA toolkit setup.
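If the reinstall still misbehaves, it can help to repeat the purge and confirm that nothing NVIDIA-related is left installed or half-removed before installing again. The sketch below filters dpkg -l output for such packages; leftover_nvidia_packages is a hypothetical helper, and it assumes the standard dpkg -l column layout (state, then package name).

```shell
# List packages whose name contains "nvidia" from `dpkg -l` output on stdin,
# together with their dpkg state ("ii" = installed, "rc" = removed but
# configuration files remain). Illustrative helper, not a standard tool.
leftover_nvidia_packages() {
  awk '$1 ~ /^[a-z][A-Za-z]+$/ && $2 ~ /nvidia/ { print $1, $2 }'
}

# Only runs on dpkg-based systems.
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | leftover_nvidia_packages
fi
```

Empty output after the purge step suggests the old driver is fully gone.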
When a Kernel Update Breaks the Driver
A kernel update is one of the most common reasons for nvidia-smi to suddenly stop working on a previously functioning server. Ubuntu’s automatic updates can install a new kernel for which the NVIDIA module was never built, so after the next reboot the module fails to load.
# Check if the running kernel matches the installed headers
uname -r
apt list --installed 2>/dev/null | grep linux-headers
If they do not match:
sudo apt install linux-headers-$(uname -r)
sudo apt install --reinstall nvidia-driver-550
sudo reboot
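The header comparison can also be scripted rather than read by eye. This is a rough sketch that assumes the "name/suite version arch [status]" line format printed by apt list --installed; headers_installed_for is a hypothetical helper name.

```shell
# Succeed if `apt list --installed` output on stdin contains the
# linux-headers package for the given kernel release.
# Illustrative helper; matches on the literal package-name prefix.
headers_installed_for() {
  local kernel="$1"
  awk -v p="linux-headers-${kernel}/" 'index($0, p) == 1 { found = 1 } END { exit !found }'
}

# Only runs on apt-based systems.
if command -v apt >/dev/null 2>&1; then
  if apt list --installed 2>/dev/null | headers_installed_for "$(uname -r)"; then
    echo "headers present for $(uname -r)"
  else
    echo "headers missing for $(uname -r)"
  fi
fi
```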
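To prevent this in future, hold the driver package together with the kernel metapackages. Holding only the currently running kernel image does not stop a newer kernel package from being installed. On stock Ubuntu the metapackages are linux-image-generic and linux-headers-generic; cloud images may use a different flavour:
sudo apt-mark hold nvidia-driver-550 linux-image-generic linux-headers-generic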
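To confirm that the holds took effect, apt-mark showhold lists every held package, one per line. The sketch below reports which of the packages you expected to be held are missing from that list; missing_holds is a hypothetical helper name, and the package names passed to it are whatever you chose to hold.

```shell
# Read `apt-mark showhold` output from stdin and print every package named
# as an argument that is NOT currently held. Illustrative helper.
missing_holds() {
  local held pkg
  held="$(cat)"
  for pkg in "$@"; do
    printf '%s\n' "$held" | grep -Fxq -- "$pkg" || echo "$pkg"
  done
}

# Only runs on apt-based systems.
if command -v apt-mark >/dev/null 2>&1; then
  apt-mark showhold | missing_holds nvidia-driver-550
fi
```

Empty output means every listed package is held.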
Verification After the Fix
# nvidia-smi should now show your GPU
nvidia-smi
# Verify CUDA works end-to-end
python3 -c "
import torch
print(f'GPU detected: {torch.cuda.is_available()}')
print(f'Device: {torch.cuda.get_device_name(0)}')
"
Preventing Future Detection Failures
- Pin your NVIDIA driver version with apt-mark hold to prevent automatic updates from breaking it.
- Set up GPU monitoring that alerts when nvidia-smi fails, catching the issue before it impacts production.
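- Use Docker containers for inference workloads so that CUDA toolkit and library versions are isolated from host package updates. Containers still rely on the host's NVIDIA kernel driver, so plan to restart them after a host driver update.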
- Schedule driver updates during planned maintenance windows, not through unattended-upgrades.
- For multi-GPU setups running PyTorch or TensorFlow, test nvidia-smi before and after any system update.
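The monitoring suggestion above can start as a very small health check. The following is a sketch, not a production monitor: gpu_health is a hypothetical helper that runs whatever command it is given (for example nvidia-smi -L, which prints one "GPU N: ..." line per device) and fails if the command errors out or lists no GPUs. How the alert is delivered is left open.

```shell
# Run the given command and report "ok" only if it exits successfully and
# prints at least one line starting with "GPU <n>". Illustrative helper.
gpu_health() {
  local out
  out="$("$@" 2>/dev/null)" || { echo "fail: command exited non-zero"; return 1; }
  if printf '%s\n' "$out" | grep -q '^GPU [0-9]'; then
    echo "ok"
  else
    echo "fail: no GPUs listed"
    return 1
  fi
}

# Example cron-style usage (not executed here); replace the echo with
# whatever alerting you already use:
#   gpu_health timeout 30 nvidia-smi -L || echo "GPU check failed on $(hostname)" >&2
```

Passing the command as arguments keeps the check easy to wrap in timeout, so a hung driver does not stall the monitor itself.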
GPU Servers That Just Work
GigaGPU pre-configures NVIDIA drivers on every dedicated server. nvidia-smi shows your GPU from the first login.