TensorFlow remains one of the most widely used deep learning frameworks, powering everything from image classification to speech recognition. Setting it up correctly on a dedicated GPU server ensures you get full CUDA acceleration from day one. This tutorial covers both pip and Docker installation methods, TensorFlow GPU verification, multi-GPU training configuration, and deploying models with TensorFlow Serving on Ubuntu 22.04 and 24.04.
Prerequisites and CUDA Compatibility
TensorFlow requires specific CUDA and cuDNN versions, and using the wrong combination is the most common source of installation failures. Recent releases (TensorFlow 2.16 and later) build against CUDA 12.x; the matching cuDNN version varies by release, so check the tested build configurations table in the TensorFlow install documentation for your exact version.
Verify your CUDA installation:
# Check NVIDIA driver
nvidia-smi
# Check CUDA version
nvcc --version
# Check cuDNN
dpkg -l | grep cudnn
If CUDA is not installed or you need a different version, follow our CUDA installation guide. For GPU hardware selection, the best GPU for inference guide covers cards that work well with TensorFlow.
Set up a Python virtual environment:
# Install Python and venv
sudo apt update
sudo apt install -y python3 python3-pip python3-venv
# Create and activate virtual environment
python3 -m venv /opt/tensorflow/venv
source /opt/tensorflow/venv/bin/activate
pip install --upgrade pip
Install TensorFlow with pip
The pip installation is the most common approach. TensorFlow 2.16+ includes GPU support by default — there is no separate tensorflow-gpu package. The [and-cuda] extra shown below additionally pulls in matching CUDA and cuDNN libraries as pip wheels, so TensorFlow does not depend on the exact system-wide toolkit version.
# Install TensorFlow (includes GPU support automatically)
pip install tensorflow[and-cuda]
# Or install a specific version
pip install tensorflow[and-cuda]==2.17.0
# Verify the installation
python3 -c "
import tensorflow as tf
print(f'TensorFlow version: {tf.__version__}')
print(f'GPU available: {tf.config.list_physical_devices(\"GPU\")}')
print(f'Built with CUDA: {tf.test.is_built_with_cuda()}')
"
If the GPU is not detected, check that your CUDA paths are set:
# Ensure CUDA is in the library path
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
# Check TensorFlow can find CUDA libraries
python3 -c "
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
c = tf.matmul(a, b)
print(c)
"
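When TensorFlow was installed with the [and-cuda] extra, the CUDA components arrive as nvidia-* pip wheels rather than system packages. A standard-library sketch to list whichever of those wheels are present in the active environment (the package names in the docstring are examples):

```python
from importlib import metadata

def bundled_cuda_packages() -> list[str]:
    """List pip-installed nvidia-* wheels (e.g. nvidia-cudnn-cu12, nvidia-cublas-cu12)."""
    return sorted(
        dist.metadata["Name"]
        for dist in metadata.distributions()
        if (dist.metadata["Name"] or "").startswith("nvidia-")
    )

print(bundled_cuda_packages())
```

An empty list suggests TensorFlow was installed without the [and-cuda] extra and is relying on a system-wide CUDA toolkit instead, in which case the LD_LIBRARY_PATH settings above matter.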
Install TensorFlow with Docker
Docker provides the cleanest installation path with guaranteed CUDA compatibility. Ensure you have Docker and the NVIDIA Container Toolkit installed (see our Docker for AI workloads guide).
# Pull the TensorFlow GPU image
docker pull tensorflow/tensorflow:latest-gpu
# Run an interactive TensorFlow container
docker run --gpus all -it --rm \
-v $(pwd):/workspace \
-w /workspace \
tensorflow/tensorflow:latest-gpu bash
# Run a quick GPU test inside the container
docker run --gpus all --rm tensorflow/tensorflow:latest-gpu \
python3 -c "
import tensorflow as tf
print('GPUs:', tf.config.list_physical_devices('GPU'))
print('Num GPUs:', len(tf.config.list_physical_devices('GPU')))
"
For persistent development, use a docker-compose setup. You can also compare TensorFlow Serving performance against vLLM for transformer-based models by running both in parallel containers:
# docker-compose.yml
services:
  tensorflow:
    # Use the -jupyter tag: the plain latest-gpu image does not ship Jupyter
    image: tensorflow/tensorflow:latest-gpu-jupyter
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ./data:/data
      - ./models:/models
      - ./scripts:/scripts
    ports:
      - "8888:8888" # Jupyter
      - "6006:6006" # TensorBoard
    command: jupyter notebook --ip=0.0.0.0 --allow-root --no-browser
docker compose up -d
# Access Jupyter at http://your-server:8888
Verify GPU Acceleration
Run a comprehensive GPU verification that tests actual computation speed. Verifying GPU acceleration is essential before deploying any workload, whether it is image generation with Stable Diffusion or speech model hosting:
python3 << 'EOF'
import tensorflow as tf
import time
print("=" * 50)
print("TensorFlow GPU Verification")
print("=" * 50)
# List all devices
print("\nPhysical devices:")
for device in tf.config.list_physical_devices():
print(f" {device}")
gpus = tf.config.list_physical_devices('GPU')
print(f"\nGPU count: {len(gpus)}")
if not gpus:
print("ERROR: No GPUs detected!")
exit(1)
for gpu in gpus:
details = tf.config.experimental.get_device_details(gpu)
print(f" {gpu.name}: {details}")
# Benchmark: Matrix multiplication
print("\nBenchmark: Matrix multiplication (4096x4096)")
# CPU benchmark
with tf.device('/CPU:0'):
a_cpu = tf.random.normal([4096, 4096])
b_cpu = tf.random.normal([4096, 4096])
# Warmup
tf.matmul(a_cpu, b_cpu)
start = time.perf_counter()
for _ in range(10):
tf.matmul(a_cpu, b_cpu)
cpu_time = time.perf_counter() - start
print(f" CPU: {cpu_time:.3f}s (10 iterations)")
# GPU benchmark
with tf.device('/GPU:0'):
a_gpu = tf.random.normal([4096, 4096])
b_gpu = tf.random.normal([4096, 4096])
# Warmup
tf.matmul(a_gpu, b_gpu)
start = time.perf_counter()
for _ in range(10):
c = tf.matmul(a_gpu, b_gpu)
c.numpy() # Force synchronization
gpu_time = time.perf_counter() - start
print(f" GPU: {gpu_time:.3f}s (10 iterations)")
print(f" Speedup: {cpu_time/gpu_time:.1f}x")
print("\nGPU verification complete!")
EOF
Multi-GPU Training Configuration
TensorFlow's MirroredStrategy distributes training across all available GPUs on a multi-GPU server:
python3 << 'EOF'
import tensorflow as tf
# Enable memory growth to avoid OOM on multi-GPU setups
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
# Create a mirrored strategy
strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")
# Build a model inside the strategy scope
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Load data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
# Train with automatic multi-GPU distribution
# Batch size scales with number of GPUs
batch_size = 256 * strategy.num_replicas_in_sync
model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.1)
model.evaluate(x_test, y_test)
EOF
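The global batch size above is derived from a fixed per-replica batch size. When scaling the batch this way, the learning rate is often scaled linearly as well — a common heuristic, not something MirroredStrategy applies for you. A pure-Python sketch of the arithmetic:

```python
# Per-replica settings stay fixed; global values scale with replica count
PER_REPLICA_BATCH = 256
BASE_LR = 1e-3

def scaled_hparams(num_replicas: int) -> tuple[int, float]:
    """Global batch size and linearly scaled learning rate for N replicas."""
    return PER_REPLICA_BATCH * num_replicas, BASE_LR * num_replicas

# On a 4-GPU server: global batch 1024, learning rate 0.004
print(scaled_hparams(4))
```

Linear learning-rate scaling works well up to moderate replica counts; very large global batches usually also need a warmup schedule.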
For multi-GPU inference workloads, see the multi-GPU server setup guide.
TensorFlow Serving for Production
Deploy trained models with TensorFlow Serving for production inference:
# Export a model in SavedModel format (the format TensorFlow Serving loads)
python3 -c "
import tensorflow as tf
model = tf.keras.applications.ResNet50(weights='imagenet')
# Keras 3 (TF 2.16+): model.save() writes .keras archives, so use export()
# to produce a SavedModel directory
model.export('/opt/models/resnet50/1')
print('Model exported to /opt/models/resnet50/1')
"
# Run TensorFlow Serving with Docker
docker run -d --gpus all \
--name tf-serving \
-p 8501:8501 \
-p 8500:8500 \
-v /opt/models:/models \
-e MODEL_NAME=resnet50 \
tensorflow/serving:latest-gpu
# Test the REST API
curl -s http://localhost:8501/v1/models/resnet50
# Test with a prediction request
python3 -c "
import requests
import numpy as np
data = np.random.rand(1, 224, 224, 3).tolist()
resp = requests.post(
'http://localhost:8501/v1/models/resnet50:predict',
json={'instances': data}
)
print(f'Status: {resp.status_code}')
print(f'Prediction vector length: {len(resp.json()[\"predictions\"][0])}')
"
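The :predict endpoint accepts the "row" format used above: a JSON object whose instances list holds one input per prediction. A standard-library sketch of how that request body is structured (the 2x3 shape is a placeholder, not ResNet50's real 224x224x3 input):

```python
import json

# One instance per prediction; batch size = length of the "instances" list
instance = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder 2x3 input
request_body = json.dumps({"instances": [instance]})

decoded = json.loads(request_body)
print(f"batch size: {len(decoded['instances'])}")
```

The response mirrors this shape: a "predictions" list with one entry per instance sent.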
Create a systemd service for non-Docker deployments:
# Install TensorFlow Serving natively
# (apt-key is deprecated on Ubuntu 22.04/24.04; use a signed-by keyring instead)
curl -fsSL https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/tensorflow-serving.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/tensorflow-serving.gpg] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | \
  sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
sudo apt update && sudo apt install -y tensorflow-model-server
# Create a systemd service
sudo tee /etc/systemd/system/tf-serving.service > /dev/null << 'EOF'
[Unit]
Description=TensorFlow Serving
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/tensorflow_model_server \
--port=8500 \
--rest_api_port=8501 \
--model_name=resnet50 \
--model_base_path=/opt/models/resnet50 \
--per_process_gpu_memory_fraction=0.5
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now tf-serving
Performance Tuning
Optimise TensorFlow performance on your GPU server. Proper tuning is especially important for vision model workloads where batch sizes directly impact throughput. For cost comparisons against managed services, use the cost per million tokens calculator:
# Enable mixed precision training (FP16 on supported GPUs)
python3 -c "
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
print(f'Compute dtype: {mixed_precision.global_policy().compute_dtype}')
print(f'Variable dtype: {mixed_precision.global_policy().variable_dtype}')
"
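Under mixed_float16, Keras also applies loss scaling automatically: the loss is multiplied by a large factor before the backward pass so small gradients survive FP16's limited range, then the gradients are divided by the same factor. A NumPy sketch of the underflow problem this solves (an illustration of the idea, not Keras's actual implementation):

```python
import numpy as np

grad = np.float16(1e-5)   # a small gradient-sized value
tiny = np.float16(1e-4)   # a small multiplier

# In FP16 the product (~1e-9) underflows to zero and the signal is lost
print(grad * tiny)        # 0.0

# Loss scaling: scale up before the FP16 multiply, scale back down in FP32
scale = np.float16(1024.0)
scaled = (grad * scale) * tiny            # stays representable in FP16
recovered = np.float32(scaled) / np.float32(scale)
print(recovered)          # ~1e-9, preserved
```

Keras adjusts the scale factor dynamically during training, backing off when gradients overflow.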
# Set environment variables for optimal performance
export TF_GPU_THREAD_MODE=gpu_private
export TF_GPU_THREAD_COUNT=2
export TF_ENABLE_ONEDNN_OPTS=1
# Monitor GPU utilisation during training
watch -n 1 nvidia-smi
# Enable XLA (Accelerated Linear Algebra) compilation
python3 << 'EOF'
import tensorflow as tf

@tf.function(jit_compile=True)
def train_step(model, x, y):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        # Reduce the per-example losses to a scalar before differentiating
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y, predictions))
    gradients = tape.gradient(loss, model.trainable_variables)
    return loss, gradients
EOF
Place an Nginx reverse proxy in front of TensorFlow Serving for TLS and load balancing. For API security, follow the secure AI inference API guide. Benchmark your TensorFlow performance using our GPU benchmarking guide. If you are also working with PyTorch, see the PyTorch GPU installation guide. Find more setup guides in the tutorials section.
GPU Servers Optimised for TensorFlow
Run TensorFlow training and serving on dedicated NVIDIA GPUs with pre-installed CUDA, cuDNN, and full root access. GigaGPU servers ship ready for deep learning.
Browse GPU Servers