Prerequisites & Server Requirements
Before installing PyTorch, you need a server with a dedicated NVIDIA GPU and full root access. A dedicated GPU server gives you bare-metal performance with no virtualisation overhead, which matters when every percentage point of CUDA utilisation counts. If you haven’t chosen a GPU yet, our best GPU for LLM inference guide covers which cards deliver the most performance per pound.
This tutorial assumes a fresh Ubuntu 22.04 LTS installation. Here are the minimum requirements:
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA (Turing or newer) | RTX 3090 / RTX 5080 |
| VRAM | 8 GB | 24 GB+ |
| System RAM | 16 GB | 64 GB |
| Storage | 50 GB free | 500 GB NVMe |
| OS | Ubuntu 20.04 | Ubuntu 22.04 LTS |
SSH into your server and confirm you have sudo access before continuing:
```bash
ssh user@your-server-ip
sudo whoami
# Should output: root
```
Install NVIDIA Drivers
The NVIDIA driver is the foundation of the entire GPU software stack. Start by removing any existing driver installations to avoid conflicts:
```bash
sudo apt update && sudo apt upgrade -y
sudo apt remove --purge '^nvidia-.*' -y
sudo apt autoremove -y
sudo reboot
```
After the reboot, add the official NVIDIA driver PPA and install the latest production driver:
```bash
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update
ubuntu-drivers devices
sudo apt install nvidia-driver-560 -y
sudo reboot
```
Once the server comes back up, verify the driver is loaded:
```bash
nvidia-smi
```
You should see your GPU model, driver version, and CUDA version listed. If `nvidia-smi` fails, check that Secure Boot is disabled in your server’s BIOS — this is one of the most common causes of driver load failures on PyTorch GPU servers.
Install the CUDA Toolkit
PyTorch ships with its own CUDA runtime, but having the full CUDA Toolkit installed is necessary for compiling custom CUDA kernels and for libraries like FlashAttention. Install CUDA 12.4 (compatible with PyTorch 2.5+):
```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-4 -y
```
Add CUDA to your system PATH by appending these lines to your shell profile:
```bash
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```
Confirm the installation:
```bash
nvcc --version
# Should show: Cuda compilation tools, release 12.4
```
Install cuDNN
cuDNN accelerates deep learning primitives — convolutions, attention layers, and normalisations all run faster with it. Install cuDNN 9 from the NVIDIA repository:
```bash
sudo apt install cudnn9-cuda-12 -y
```
Verify the library is installed correctly:
```bash
dpkg -l | grep cudnn
# Should list cudnn9-cuda-12 packages
```
With cuDNN in place, PyTorch will automatically use optimised kernels for operations like `torch.nn.Conv2d` and scaled dot-product attention. This makes a measurable difference on workloads like LLM inference — check our tokens per second benchmarks for the raw numbers.
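As a minimal sketch of what this buys you, the snippet below enables cuDNN's autotuning benchmark mode and runs a convolution. The shapes are arbitrary, and the script falls back to CPU if no GPU is visible:

```python
import torch
import torch.nn as nn

# cuDNN autotunes convolution algorithms when benchmark mode is on;
# this pays off when input shapes stay fixed across iterations.
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1).to(device)
x = torch.randn(8, 3, 224, 224, device=device)
with torch.no_grad():
    y = conv(x)
print(tuple(y.shape))  # (8, 64, 224, 224)
```

Benchmark mode trades a slower first iteration (while cuDNN tries candidate algorithms) for faster steady-state throughput, so it suits training loops and fixed-shape inference rather than scripts that run a single forward pass.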
Set Up Conda & Create an Environment
Using isolated environments prevents dependency conflicts between projects. Install Miniconda:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
eval "$($HOME/miniconda3/bin/conda shell.bash hook)"
conda init bash
source ~/.bashrc
```
Create a dedicated environment for PyTorch:
```bash
conda create -n pytorch python=3.11 -y
conda activate pytorch
```
You now have a clean Python 3.11 environment ready for PyTorch and any additional libraries your project needs.
Install PyTorch with GPU Support
Install PyTorch with CUDA 12.4 support using pip (recommended over conda for the latest builds):
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```
This downloads the CUDA 12.4-compatible build of PyTorch. The download is roughly 2.5 GB, so local NVMe storage on a dedicated server makes this significantly faster than network-attached cloud storage.
For projects that also use TensorFlow, install it in a separate conda environment to avoid dependency conflicts between the two frameworks.
Verify the Installation
Run the following Python script to confirm PyTorch can see and use your GPU:
```bash
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
print(f'cuDNN version: {torch.backends.cudnn.version()}')
print(f'GPU count: {torch.cuda.device_count()}')
print(f'GPU name: {torch.cuda.get_device_name(0)}')
"
```
Expected output on an RTX 3090 server:
```
PyTorch version: 2.5.1+cu124
CUDA available: True
CUDA version: 12.4
cuDNN version: 90100
GPU count: 1
GPU name: NVIDIA GeForce RTX 3090
```
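In your own training and inference scripts, it's good practice to select the device defensively rather than hard-coding `'cuda'` — a common PyTorch pattern, not specific to this install:

```python
import torch

# Fall back to CPU gracefully so the same script runs anywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(3, device=device)
print(x.sum().item())  # 3.0
```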
Next, run a quick matrix multiplication benchmark to confirm GPU compute is working under load:
```bash
python -c "
import torch, time
size = 8192
a = torch.randn(size, size, device='cuda')
b = torch.randn(size, size, device='cuda')
torch.cuda.synchronize()
start = time.time()
for _ in range(10):
    c = torch.matmul(a, b)
torch.cuda.synchronize()
elapsed = time.time() - start
tflops = (2 * size**3 * 10) / elapsed / 1e12
print(f'Matrix multiply: {elapsed:.2f}s, ~{tflops:.1f} TFLOPS')
"
```
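The TFLOPS figure in that script comes from counting 2·size³ floating-point operations per matrix multiply (one multiply and one add for each term of every inner product). A worked example with a hypothetical elapsed time:

```python
size = 8192
iters = 10
elapsed = 4.0  # hypothetical timing, for illustration only

flops = 2 * size**3 * iters  # each multiply-add counts as 2 operations
tflops = flops / elapsed / 1e12
print(f"~{tflops:.1f} TFLOPS")  # ~2.7 TFLOPS
```

Compare the measured number against your GPU's published FP32 throughput; landing within the same order of magnitude confirms the card is actually doing the work.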
If CUDA is not available, work through this checklist:
- Run `nvidia-smi` — if it fails, the driver is not loaded
- Check `nvcc --version` matches the CUDA version in your PyTorch build
- Confirm you installed the `cu124` wheel, not the CPU-only version
- Ensure you are inside the correct conda environment (`conda activate pytorch`)
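Most of that checklist can be scripted. This sketch uses only the standard library plus an optional torch import, so it runs even in a broken environment:

```python
import importlib.util
import shutil

# Which binaries are reachable from this shell?
print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)
print("nvcc on PATH:", shutil.which("nvcc") is not None)

# Is the right PyTorch wheel installed in this environment?
if importlib.util.find_spec("torch") is not None:
    import torch
    print("torch version:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)  # None means a CPU-only wheel
    print("cuda available:", torch.cuda.is_available())
else:
    print("torch is not installed in this environment")
```

If `torch.version.cuda` prints `None`, you installed the CPU-only wheel — rerun the pip install command above with the `cu124` index URL.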
Need a GPU Server for PyTorch?
RTX 3090 and RTX 5080 dedicated servers with full root access, NVMe storage, and 1Gbps networking. Deployed from our UK datacenter.
Browse GPU Servers
Next Steps
With PyTorch installed and verified, your server is ready for production workloads. Here are some directions depending on your use case:
- Serve LLMs via API — Set up vLLM or Ollama on top of your PyTorch install for high-throughput inference
- Self-host open source models — Follow our self-host LLM guide for end-to-end deployment
- Compare inference costs — Use our cost per 1M tokens data to estimate your running expenses
- Explore more tutorials — Browse the tutorials category for vLLM setup, model deployment, and optimisation guides
If you run into driver or CUDA issues specific to your GPU model, GigaGPU support can help — all dedicated GPU hosting plans include technical support for software stack setup.