TensorFlow remains one of the most widely used deep learning frameworks, powering everything from image classification to speech recognition. Setting it up correctly on a dedicated GPU server ensures you get full CUDA acceleration from day one. This tutorial covers both pip and Docker installation methods, TensorFlow GPU verification, multi-GPU training configuration, and deploying models with TensorFlow Serving on Ubuntu 22.04 and 24.04.
Prerequisites and CUDA Compatibility
TensorFlow requires specific CUDA and cuDNN versions, and using the wrong combination is the most common source of installation failures. Recent releases (TensorFlow 2.16 and later) build against CUDA 12.x; the matching cuDNN version varies by release, so check the tested build configurations table in the TensorFlow install documentation for your exact version.
Verify your CUDA installation:
# Check NVIDIA driver
nvidia-smi
# Check CUDA version
nvcc --version
# Check cuDNN
dpkg -l | grep cudnn
If CUDA is not installed or you need a different version, follow our CUDA installation guide. For GPU hardware selection, the best GPU for inference guide covers cards that work well with TensorFlow.
Set up a Python virtual environment:
# Install Python and venv
sudo apt update
sudo apt install -y python3 python3-pip python3-venv
# Create and activate virtual environment
python3 -m venv /opt/tensorflow/venv
source /opt/tensorflow/venv/bin/activate
pip install --upgrade pip
Install TensorFlow with pip
The pip installation is the most common approach. TensorFlow 2.16+ includes GPU support by default — there is no separate tensorflow-gpu package. The [and-cuda] extra shown below additionally pulls in matching CUDA and cuDNN libraries as pip wheels, so TensorFlow does not depend on the exact system-wide toolkit version.
# Install TensorFlow (includes GPU support automatically)
pip install tensorflow[and-cuda]
# Or install a specific version
pip install tensorflow[and-cuda]==2.17.0
# Verify the installation
python3 -c "
import tensorflow as tf
print(f'TensorFlow version: {tf.__version__}')
print(f'GPU available: {tf.config.list_physical_devices(\"GPU\")}')
print(f'Built with CUDA: {tf.test.is_built_with_cuda()}')
"
If the GPU is not detected, check that your CUDA paths are set:
# Ensure CUDA is in the library path
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
# Check TensorFlow can find CUDA libraries
python3 -c "
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
c = tf.matmul(a, b)
print(c)
"
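When TensorFlow was installed with the [and-cuda] extra, the CUDA components arrive as nvidia-* pip wheels rather than system packages. A standard-library sketch to list whichever of those wheels are present in the active environment (the package names in the docstring are examples):

```python
from importlib import metadata

def bundled_cuda_packages() -> list[str]:
    """List pip-installed nvidia-* wheels (e.g. nvidia-cudnn-cu12, nvidia-cublas-cu12)."""
    return sorted(
        dist.metadata["Name"]
        for dist in metadata.distributions()
        if (dist.metadata["Name"] or "").startswith("nvidia-")
    )

print(bundled_cuda_packages())
```

An empty list suggests TensorFlow was installed without the [and-cuda] extra and is relying on a system-wide CUDA toolkit instead, in which case the LD_LIBRARY_PATH settings above matter.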
Install TensorFlow with Docker
Docker provides the cleanest installation path with guaranteed CUDA compatibility. Ensure you have Docker and the NVIDIA Container Toolkit installed (see our Docker for AI workloads guide).
# Pull the TensorFlow GPU image
docker pull tensorflow/tensorflow:latest-gpu
# Run an interactive TensorFlow container
docker run --gpus all -it --rm \
-v $(pwd):/workspace \
-w /workspace \
tensorflow/tensorflow:latest-gpu bash
# Run a quick GPU test inside the container
docker run --gpus all --rm tensorflow/tensorflow:latest-gpu \
python3 -c "
import tensorflow as tf
print('GPUs:', tf.config.list_physical_devices('GPU'))
print('Num GPUs:', len(tf.config.list_physical_devices('GPU')))
"
For persistent development, use a docker-compose setup. You can also compare TensorFlow Serving performance against vLLM for transformer-based models by running both in parallel containers:
# docker-compose.yml
services:
  tensorflow:
    # Use the -jupyter tag: the plain latest-gpu image does not ship Jupyter
    image: tensorflow/tensorflow:latest-gpu-jupyter
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ./data:/data
      - ./models:/models
      - ./scripts:/scripts
    ports:
      - "8888:8888" # Jupyter
      - "6006:6006" # TensorBoard
    command: jupyter notebook --ip=0.0.0.0 --allow-root --no-browser
docker compose up -d
# Access Jupyter at http://your-server:8888
Verify GPU Acceleration
Run a comprehensive GPU verification that tests actual computation speed. Verifying GPU acceleration is essential before deploying any workload, whether it is image generation with Stable Diffusion or speech model hosting:
python3 << 'EOF'
import tensorflow as tf
import time
print("=" * 50)
print("TensorFlow GPU Verification")
print("=" * 50)
# List all devices
print("\nPhysical devices:")
for device in tf.config.list_physical_devices():
print(f" {device}")
gpus = tf.config.list_physical_devices('GPU')
print(f"\nGPU count: {len(gpus)}")
if not gpus:
print("ERROR: No GPUs detected!")
exit(1)
for gpu in gpus:
details = tf.config.experimental.get_device_details(gpu)
print(f" {gpu.name}: {details}")
# Benchmark: Matrix multiplication
print("\nBenchmark: Matrix multiplication (4096x4096)")
# CPU benchmark
with tf.device('/CPU:0'):
a_cpu = tf.random.normal([4096, 4096])
b_cpu = tf.random.normal([4096, 4096])
# Warmup
tf.matmul(a_cpu, b_cpu)
start = time.perf_counter()
for _ in range(10):
tf.matmul(a_cpu, b_cpu)
cpu_time = time.perf_counter() - start
print(f" CPU: {cpu_time:.3f}s (10 iterations)")
# GPU benchmark
with tf.device('/GPU:0'):
a_gpu = tf.random.normal([4096, 4096])
b_gpu = tf.random.normal([4096, 4096])
# Warmup
tf.matmul(a_gpu, b_gpu)
start = time.perf_counter()
for _ in range(10):
c = tf.matmul(a_gpu, b_gpu)
c.numpy() # Force synchronization
gpu_time = time.perf_counter() - start
print(f" GPU: {gpu_time:.3f}s (10 iterations)")
print(f" Speedup: {cpu_time/gpu_time:.1f}x")
print("\nGPU verification complete!")
EOF
Multi-GPU Training Configuration
TensorFlow's MirroredStrategy distributes training across all available GPUs on a multi-GPU server:
python3 << 'EOF'
import tensorflow as tf
# Enable memory growth to avoid OOM on multi-GPU setups
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
# Create a mirrored strategy
strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")
# Build a model inside the strategy scope
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Load data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
# Train with automatic multi-GPU distribution
# Batch size scales with number of GPUs
batch_size = 256 * strategy.num_replicas_in_sync
model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.1)
model.evaluate(x_test, y_test)
EOF
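The global batch size above is derived from a fixed per-replica batch size. When scaling the batch this way, the learning rate is often scaled linearly as well — a common heuristic, not something MirroredStrategy applies for you. A pure-Python sketch of the arithmetic:

```python
# Per-replica settings stay fixed; global values scale with replica count
PER_REPLICA_BATCH = 256
BASE_LR = 1e-3

def scaled_hparams(num_replicas: int) -> tuple[int, float]:
    """Global batch size and linearly scaled learning rate for N replicas."""
    return PER_REPLICA_BATCH * num_replicas, BASE_LR * num_replicas

# On a 4-GPU server: global batch 1024, learning rate 0.004
print(scaled_hparams(4))
```

Linear learning-rate scaling works well up to moderate replica counts; very large global batches usually also need a warmup schedule.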
For multi-GPU inference workloads, see the multi-GPU server setup guide.
TensorFlow Serving for Production
Deploy trained models with TensorFlow Serving for production inference:
# Export a model in SavedModel format (the format TensorFlow Serving loads)
python3 -c "
import tensorflow as tf
model = tf.keras.applications.ResNet50(weights='imagenet')
# Keras 3 (TF 2.16+): model.save() writes .keras archives, so use export()
# to produce a SavedModel directory
model.export('/opt/models/resnet50/1')
print('Model exported to /opt/models/resnet50/1')
"
# Run TensorFlow Serving with Docker
docker run -d --gpus all \
--name tf-serving \
-p 8501:8501 \
-p 8500:8500 \
-v /opt/models:/models \
-e MODEL_NAME=resnet50 \
tensorflow/serving:latest-gpu
# Test the REST API
curl -s http://localhost:8501/v1/models/resnet50
# Test with a prediction request
python3 -c "
import requests
import numpy as np
data = np.random.rand(1, 224, 224, 3).tolist()
resp = requests.post(
'http://localhost:8501/v1/models/resnet50:predict',
json={'instances': data}
)
print(f'Status: {resp.status_code}')
print(f'Prediction vector length: {len(resp.json()[\"predictions\"][0])}')
"
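The :predict endpoint accepts the "row" format used above: a JSON object whose instances list holds one input per prediction. A standard-library sketch of how that request body is structured (the 2x3 shape is a placeholder, not ResNet50's real 224x224x3 input):

```python
import json

# One instance per prediction; batch size = length of the "instances" list
instance = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder 2x3 input
request_body = json.dumps({"instances": [instance]})

decoded = json.loads(request_body)
print(f"batch size: {len(decoded['instances'])}")
```

The response mirrors this shape: a "predictions" list with one entry per instance sent.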
Create a systemd service for non-Docker deployments:
# Install TensorFlow Serving natively
# (apt-key is deprecated on Ubuntu 22.04/24.04; use a signed-by keyring instead)
curl -fsSL https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/tensorflow-serving.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/tensorflow-serving.gpg] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | \
  sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
sudo apt update && sudo apt install -y tensorflow-model-server
# Create a systemd service
sudo tee /etc/systemd/system/tf-serving.service > /dev/null << 'EOF'
[Unit]
Description=TensorFlow Serving
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/tensorflow_model_server \
--port=8500 \
--rest_api_port=8501 \
--model_name=resnet50 \
--model_base_path=/opt/models/resnet50 \
--per_process_gpu_memory_fraction=0.5
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now tf-serving
Performance Tuning
Optimise TensorFlow performance on your GPU server. Proper tuning is especially important for vision model workloads where batch sizes directly impact throughput. For cost comparisons against managed services, use the cost per million tokens calculator:
# Enable mixed precision training (FP16 on supported GPUs)
python3 -c "
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
print(f'Compute dtype: {mixed_precision.global_policy().compute_dtype}')
print(f'Variable dtype: {mixed_precision.global_policy().variable_dtype}')
"
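Under mixed_float16, Keras also applies loss scaling automatically: the loss is multiplied by a large factor before the backward pass so small gradients survive FP16's limited range, then the gradients are divided by the same factor. A NumPy sketch of the underflow problem this solves (an illustration of the idea, not Keras's actual implementation):

```python
import numpy as np

grad = np.float16(1e-5)   # a small gradient-sized value
tiny = np.float16(1e-4)   # a small multiplier

# In FP16 the product (~1e-9) underflows to zero and the signal is lost
print(grad * tiny)        # 0.0

# Loss scaling: scale up before the FP16 multiply, scale back down in FP32
scale = np.float16(1024.0)
scaled = (grad * scale) * tiny            # stays representable in FP16
recovered = np.float32(scaled) / np.float32(scale)
print(recovered)          # ~1e-9, preserved
```

Keras adjusts the scale factor dynamically during training, backing off when gradients overflow.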
# Set environment variables for optimal performance
export TF_GPU_THREAD_MODE=gpu_private
export TF_GPU_THREAD_COUNT=2
export TF_ENABLE_ONEDNN_OPTS=1
# Monitor GPU utilisation during training
watch -n 1 nvidia-smi
# Enable XLA (Accelerated Linear Algebra) compilation
python3 << 'EOF'
import tensorflow as tf

@tf.function(jit_compile=True)
def train_step(model, x, y):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        # Reduce the per-example losses to a scalar before differentiating
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y, predictions))
    gradients = tape.gradient(loss, model.trainable_variables)
    return loss, gradients
EOF
Place an Nginx reverse proxy in front of TensorFlow Serving for TLS and load balancing. For API security, follow the secure AI inference API guide. Benchmark your TensorFlow performance using our GPU benchmarking guide. If you are also working with PyTorch, see the PyTorch GPU installation guide. Find more setup guides in the tutorials section.
GPU Servers Optimised for TensorFlow
Run TensorFlow training and serving on dedicated NVIDIA GPUs with pre-installed CUDA, cuDNN, and full root access. GigaGPU servers ship ready for deep learning.
Browse GPU Servers