
Python Environments on GPU Servers

Manage Python environments on GPU servers for AI inference. Covers venv, conda, CUDA-aware environments, dependency isolation, multi-model setups, and avoiding library conflicts on dedicated GPU servers.

One pip Install Broke Every Model on the Server

You upgraded transformers for a new model and suddenly your existing vLLM deployment throws import errors. PyTorch expects one CUDA version, TensorFlow wants another, and the system Python is contaminated with conflicting packages. A dedicated GPU server running multiple AI workloads needs isolated Python environments that prevent one project’s dependencies from destroying another.

Virtual Environments with venv

Python’s built-in venv creates lightweight, isolated environments:

# Create a dedicated environment for each workload
python3 -m venv /opt/envs/vllm-prod
python3 -m venv /opt/envs/training
python3 -m venv /opt/envs/ollama-middleware

# Activate and install for vLLM
source /opt/envs/vllm-prod/bin/activate
pip install --upgrade pip wheel
pip install vllm torch --extra-index-url https://download.pytorch.org/whl/cu121

# Pin exact versions for reproducibility
pip freeze > /opt/envs/vllm-prod/requirements.lock

# Deactivate when done
deactivate

# Use without activating (useful in systemd services)
/opt/envs/vllm-prod/bin/python -c "import vllm; print(vllm.__version__)"
/opt/envs/vllm-prod/bin/pip list
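A quick sanity check that an interpreter really belongs to an isolated environment, not the system Python, is to compare `sys.prefix` with `sys.base_prefix` — the two differ only inside a venv. A minimal check, runnable with any of the interpreters above:

```shell
# Inside a venv, sys.prefix points at the environment directory;
# sys.base_prefix still points at the system Python installation.
python3 -c '
import sys
print(f"Interpreter: {sys.executable}")
print(f"Environment: {sys.prefix}")
print(f"Inside a venv: {sys.prefix != sys.base_prefix}")
'
```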

CUDA-Aware Environment Configuration

GPU libraries depend on specific CUDA versions. Mismatches cause silent failures or crashes:

# Check system CUDA version
nvcc --version
nvidia-smi | grep "CUDA Version"

# PyTorch with CUDA 12.1
source /opt/envs/vllm-prod/bin/activate
pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu121

# Verify GPU access within the environment
python3 -c "
import torch
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
print(f'GPU: {torch.cuda.get_device_name(0)}')
"

# Set CUDA paths for environments that need them
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH

# Persist in environment activation script
cat <<'EOF' >> /opt/envs/vllm-prod/bin/activate
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
EOF
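A mismatch between the driver's supported CUDA ceiling and the CUDA build PyTorch ships with is a common reason `torch.cuda.is_available()` returns False. The comparison can be scripted — a sketch reusing the `/opt/envs/vllm-prod` path from above; both commands degrade gracefully when the driver or environment is absent:

```shell
# Driver side: the maximum CUDA runtime the installed driver supports
driver=$(nvidia-smi 2>/dev/null | grep -o 'CUDA Version: [0-9.]*' | awk '{print $3}' || true)

# Environment side: the CUDA version this env's PyTorch was built against
built=$(/opt/envs/vllm-prod/bin/python -c 'import torch; print(torch.version.cuda)' 2>/dev/null || true)

echo "Driver supports CUDA:   ${driver:-unknown}"
echo "PyTorch built for CUDA: ${built:-not installed}"
# The driver's supported version must be >= the build version for the GPU to be usable
```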

Multi-Model Environment Isolation

Production servers often host models with conflicting requirements:

# Environment matrix for common setups
# /opt/envs/vllm-prod/    — vLLM + PyTorch 2.3 + CUDA 12.1
# /opt/envs/sd-comfyui/   — ComfyUI + PyTorch 2.1 + CUDA 11.8
# /opt/envs/training/     — Training stack + DeepSpeed + CUDA 12.1

# Systemd service using isolated environment
# /etc/systemd/system/vllm-inference.service
[Unit]
Description=vLLM Inference Server
After=network.target

[Service]
Type=simple
User=inference
ExecStart=/opt/envs/vllm-prod/bin/python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --port 8000
Environment=CUDA_VISIBLE_DEVICES=0,1
Environment=CUDA_HOME=/usr/local/cuda-12.1
WorkingDirectory=/opt/inference
Restart=always

[Install]
WantedBy=multi-user.target

# Each service uses its own environment — no conflicts
# /etc/systemd/system/comfyui.service
[Unit]
Description=ComfyUI Image Generation

[Service]
Type=simple
User=inference
ExecStart=/opt/envs/sd-comfyui/bin/python main.py --listen 0.0.0.0 --port 8188
Environment=CUDA_VISIBLE_DEVICES=2
WorkingDirectory=/opt/ComfyUI
Restart=always

[Install]
WantedBy=multi-user.target
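After dropping in the unit files, activate them with `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now vllm-inference comfyui`. Before doing so, it is worth confirming that each unit's ExecStart interpreter actually exists — `check_execstart` below is an illustrative helper, not a systemd feature:

```shell
# check_execstart UNIT_FILE — verify the ExecStart interpreter exists and is executable
check_execstart() {
    # Extract the first token after "ExecStart=" (the interpreter path)
    interp=$(sed -n 's/^ExecStart=\([^ ]*\).*/\1/p' "$1")
    if [ -x "$interp" ]; then
        echo "OK: $interp"
    else
        echo "MISSING: $interp"
        return 1
    fi
}

# Example: audit every unit on the box
# for unit in /etc/systemd/system/*.service; do check_execstart "$unit"; done
```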

Dependency Pinning and Reproducibility

# Generate locked requirements from a working environment
source /opt/envs/vllm-prod/bin/activate
pip freeze > /opt/envs/vllm-prod/requirements.lock

# Recreate environment identically on another server
python3 -m venv /opt/envs/vllm-prod
source /opt/envs/vllm-prod/bin/activate
pip install -r /opt/envs/vllm-prod/requirements.lock

# Audit for security vulnerabilities
pip install pip-audit
pip-audit

# Check for outdated packages
pip list --outdated

# Upgrade carefully — test in a clone first
python3 -m venv /opt/envs/vllm-staging
source /opt/envs/vllm-staging/bin/activate
pip install -r /opt/envs/vllm-prod/requirements.lock
pip install --upgrade vllm
# Run tests, then promote to production
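Before promoting, it helps to see exactly which pins the upgrade moved. A small bash helper for diffing two frozen requirement sets — `diff_pins` is a hypothetical name, not a pip command:

```shell
# diff_pins OLD_LOCK NEW_LOCK — show which pinned versions changed between two lock files
diff_pins() {
    # Sort both files so diff lines up package-by-package
    diff <(sort "$1") <(sort "$2")
}

# Example:
# pip freeze > /tmp/staging.lock   # inside the staging environment
# diff_pins /opt/envs/vllm-prod/requirements.lock /tmp/staging.lock
```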

Environment Maintenance

# List all environments and their sizes
du -sh /opt/envs/*/

# Remove unused environment
rm -rf /opt/envs/old-experiment/

# Clean pip caches (can grow to several GB)
pip cache purge

# Check which environments are actively used by services
systemctl list-units --type=service | grep -E "vllm|ollama|comfy"
grep -r "ExecStart=/opt/envs" /etc/systemd/system/*.service

# Backup environment specs (not the environment itself)
mkdir -p /opt/backups/envs
for env in /opt/envs/*/; do
    name=$(basename "$env")
    "${env}bin/pip" freeze > "/opt/backups/envs/${name}.requirements.txt"
done
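Restoring from those specs is the reverse operation. A sketch with the environment directory and spec file as parameters, matching the layout above:

```shell
# restore_env ENV_DIR SPEC_FILE — rebuild a venv from a backed-up requirements spec
restore_env() {
    python3 -m venv "$1"
    "$1/bin/pip" install -r "$2"
}

# Example:
# restore_env /opt/envs/vllm-prod /opt/backups/envs/vllm-prod.requirements.txt
```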

Isolated Python environments keep your GPU server stable when running multiple AI workloads. For next steps, see our guides on deploying vLLM in production with dedicated environments, installing PyTorch correctly, running Ollama separately, and monitoring environment health, or browse our infrastructure articles and tutorials.

Multi-Workload GPU Servers

GigaGPU dedicated servers with full root access. Isolate environments, run multiple AI models, and maintain complete control.

Browse GPU Servers


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
