
TensorFlow GPU Not Using CUDA: Fix Guide

Fix TensorFlow silently falling back to CPU instead of using your NVIDIA GPU. Covers driver compatibility, missing CUDA libraries, environment conflicts, and verification steps.

The Symptom: TensorFlow Ignores Your GPU

You have an NVIDIA GPU on your dedicated server, but TensorFlow refuses to use it:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
# Output: []

Or you see this warning during import:

Could not load dynamic library 'libcudart.so.12'; dlerror: libcudart.so.12:
cannot open shared object file: No such file or directory

TensorFlow is running entirely on CPU, making training and inference orders of magnitude slower than it should be. The GPU hardware is there, nvidia-smi shows it, but TensorFlow cannot connect the pieces.

Root Causes for TensorFlow GPU Blindness

TensorFlow’s GPU support is more fragile than PyTorch’s: PyTorch wheels bundle their own CUDA libraries, while TensorFlow has historically expected a matching CUDA install on the system, with strict version requirements:

  • Wrong TensorFlow package. On Linux, pip install tensorflow has shipped GPU support since TF 2.1; the separate tensorflow-gpu package was removed in TF 2.12, and since TF 2.15 pip install tensorflow[and-cuda] can pull the CUDA libraries in as pip wheels.
  • CUDA version mismatch. TensorFlow 2.16+ requires CUDA 12.3. TensorFlow 2.15 needs CUDA 12.2, and 2.14 needs CUDA 11.8. Older versions need earlier CUDA 11.x releases.
  • Missing cuDNN. TensorFlow loads cuDNN at runtime and fails silently if it is absent.
  • LD_LIBRARY_PATH not set. The CUDA libraries exist but are not on the linker’s search path.
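A quick way to see which of these applies, without involving TensorFlow at all, is to try loading the CUDA runtime directly, the same way TensorFlow does at import time (a minimal sketch; the library names assume CUDA 11.x/12.x on Linux):

```python
import ctypes

# Attempt to load the CUDA runtime exactly as TensorFlow would.
# libcudart.so.12 is CUDA 12.x; libcudart.so.11.0 is CUDA 11.x.
results = {}
for lib in ("libcudart.so.12", "libcudart.so.11.0"):
    try:
        ctypes.CDLL(lib)
        results[lib] = True
    except OSError:
        results[lib] = False

for lib, ok in results.items():
    print(f"{lib}: {'loaded OK' if ok else 'NOT found on the loader path'}")
```

If neither loads, the problem is the CUDA install or LD_LIBRARY_PATH, not TensorFlow itself.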

Diagnostic Script

import tensorflow as tf
import sys

print(f"TensorFlow version: {tf.__version__}")
print(f"Built with CUDA: {tf.test.is_built_with_cuda()}")
print(f"GPU devices: {tf.config.list_physical_devices('GPU')}")

# More detailed GPU info
from tensorflow.python.client import device_lib
devices = device_lib.list_local_devices()
for d in devices:
    print(f"  {d.device_type}: {d.name}")

If is_built_with_cuda() returns False, you installed a CPU-only build. If it returns True but no GPU devices appear, the runtime cannot find the CUDA libraries.
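If you are not sure which package variant you actually installed, pip's own metadata settles it. A small standard-library sketch (the package names listed are the common TensorFlow variants):

```python
from importlib import metadata

# Check which TensorFlow distribution, if any, is installed in this environment.
variants = ("tensorflow", "tensorflow-cpu", "tensorflow-gpu", "tf-nightly")
installed = {}
for name in variants:
    try:
        installed[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        installed[name] = None

for name, version in installed.items():
    print(f"{name}: {version or 'not installed'}")
```

Seeing tensorflow-cpu here explains an is_built_with_cuda() of False immediately.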

Fix 1: Install Required CUDA Libraries

For TensorFlow 2.16+, you need CUDA 12.3 and cuDNN 8.9:

# Install CUDA toolkit
sudo apt install cuda-toolkit-12-3

# Install cuDNN
sudo apt install libcudnn8=8.9.7.*-1+cuda12.2

# Set library path
export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH

Add the export to your ~/.bashrc for persistence. Our CUDA installation guide provides the full procedure.

Fix 2: Install the Correct TensorFlow Version

If your CUDA version is fixed and you cannot change it, install the TensorFlow version that matches:

# For CUDA 12.3 (current)
pip install "tensorflow>=2.16"

# For CUDA 11.8 (legacy)
pip install tensorflow==2.14.0

On a TensorFlow GPU server where you also run PyTorch, use separate virtual environments to avoid CUDA version conflicts between the frameworks.
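One way to keep the frameworks apart is one virtualenv per framework (a sketch; the paths under ~/venvs are an example layout, not a requirement):

```shell
# Separate virtualenvs keep TensorFlow's and PyTorch's CUDA wheels apart.
python3 -m venv "$HOME/venvs/tf"
python3 -m venv "$HOME/venvs/torch"

# Install each framework with its own pip so nothing leaks across:
#   "$HOME/venvs/tf/bin/pip" install "tensorflow>=2.16"
#   "$HOME/venvs/torch/bin/pip" install torch

"$HOME/venvs/tf/bin/python" --version
```

Calling each environment's pip directly, rather than activating, makes scripts unambiguous about where a package lands.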

Fix 3: Upgrade NVIDIA Driver

If your driver is too old for the required CUDA version:

# Check current driver
nvidia-smi | grep "Driver Version"

# Update to a driver supporting CUDA 12.3
sudo apt install nvidia-driver-545
sudo reboot

Full Verification

import tensorflow as tf
import time

# Confirm GPU visibility
gpus = tf.config.list_physical_devices('GPU')
assert len(gpus) > 0, "No GPU detected!"
print(f"GPUs found: {[g.name for g in gpus]}")

# Run a computation to confirm GPU is actually used
with tf.device('/GPU:0'):
    a = tf.random.normal([5000, 5000])
    b = tf.matmul(a, a)      # warm-up run: absorbs one-off kernel setup cost
    start = time.time()
    b = tf.matmul(a, a)
    _ = b.numpy()            # GPU ops are asynchronous; block until the result is ready
    elapsed = time.time() - start
    print(f"Matrix multiply on GPU: {elapsed:.4f}s")

The timed multiply should complete in well under a second on any modern GPU. If it takes multiple seconds, TensorFlow may be silently using the CPU despite listing the GPU in the device list; check for memory growth settings or XLA compilation overhead.

TensorFlow GPU Performance Tips

  • Enable memory growth to prevent TensorFlow from allocating all VRAM at startup: tf.config.experimental.set_memory_growth(gpus[0], True)
  • Use mixed precision for faster training: tf.keras.mixed_precision.set_global_policy('mixed_float16')
  • Set TF_GPU_ALLOCATOR=cuda_malloc_async for better memory management on newer GPUs.
  • For production inference on a dedicated GPU server, consider TensorRT integration for maximum throughput.
  • Use Docker with the official TensorFlow GPU images to avoid library version headaches entirely.
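Putting the first three tips together at the top of a training script might look like this (a sketch; the import guard simply keeps the snippet runnable on machines without TensorFlow installed):

```python
import os

# The allocator must be chosen before TensorFlow initialises the GPU.
os.environ.setdefault("TF_GPU_ALLOCATOR", "cuda_malloc_async")

try:
    import tensorflow as tf
except ImportError:
    tf = None  # keeps the sketch runnable where TensorFlow is absent

if tf is not None:
    # Memory growth must be set before the first GPU op runs.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    # Mixed precision: compute in float16, keep variables in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")
```

Order matters here: the environment variable and memory-growth settings only take effect if applied before TensorFlow touches the GPU.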

Monitor your GPU utilisation with our monitoring guide to confirm TensorFlow is actually saturating the GPU during training.

TensorFlow-Ready GPU Servers

GigaGPU dedicated servers include the CUDA stack that TensorFlow expects. No library hunting, no version juggling.

Browse GPU Servers


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
