One pip Install Broke Every Model on the Server
You upgraded transformers for a new model and suddenly your existing vLLM deployment throws import errors. PyTorch expects one CUDA version, TensorFlow wants another, and the system Python is contaminated with conflicting packages. A dedicated GPU server running multiple AI workloads needs isolated Python environments that prevent one project’s dependencies from destroying another.
Virtual Environments with venv
Python’s built-in venv creates lightweight, isolated environments:
# Create a dedicated environment for each workload
python3 -m venv /opt/envs/vllm-prod
python3 -m venv /opt/envs/training
python3 -m venv /opt/envs/ollama-middleware
# Activate and install for vLLM
source /opt/envs/vllm-prod/bin/activate
pip install --upgrade pip wheel
# Install torch from the CUDA 12.1 index first, then vLLM from PyPI
# (--index-url replaces PyPI entirely, and vLLM is not hosted on the PyTorch index)
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install vllm
# Pin exact versions for reproducibility
pip freeze > /opt/envs/vllm-prod/requirements.lock
# Deactivate when done
deactivate
# Use without activating (useful in systemd services)
/opt/envs/vllm-prod/bin/python -c "import vllm; print(vllm.__version__)"
/opt/envs/vllm-prod/bin/pip list
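If you're ever unsure whether a command is running in a venv or in the system interpreter, `sys.prefix` tells you: inside a venv it points into the environment and differs from `sys.base_prefix`. A quick check, using a throwaway environment for illustration:

```shell
# In a venv, sys.prefix points into the environment and differs
# from sys.base_prefix; for the base system Python the two are equal
python3 -m venv /tmp/isolation-demo
/tmp/isolation-demo/bin/python -c \
    "import sys; print(sys.prefix != sys.base_prefix)"   # → True
```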
CUDA-Aware Environment Configuration
GPU libraries depend on specific CUDA versions. Mismatches cause silent failures or crashes:
# Check system CUDA version
nvcc --version
nvidia-smi | grep "CUDA Version"
# PyTorch with CUDA 12.1
source /opt/envs/vllm-prod/bin/activate
pip install torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/cu121
# Verify GPU access within the environment
python3 -c "
import torch
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
print(f'GPU: {torch.cuda.get_device_name(0)}')
"
# Set CUDA paths for environments that need them
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH
# Persist in environment activation script
cat <<'EOF' >> /opt/envs/vllm-prod/bin/activate
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
EOF
Multi-Model Environment Isolation
Production servers often host models with conflicting requirements:
# Environment matrix for common setups
# /opt/envs/vllm-prod/ — vLLM + PyTorch 2.3 + CUDA 12.1
# /opt/envs/sd-comfyui/ — ComfyUI + PyTorch 2.1 + CUDA 11.8
# /opt/envs/training/ — Training stack + DeepSpeed + CUDA 12.1
# Systemd service using isolated environment
# /etc/systemd/system/vllm-inference.service
[Unit]
Description=vLLM Inference Server
After=network.target
[Service]
Type=simple
User=inference
ExecStart=/opt/envs/vllm-prod/bin/python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 2 \
    --port 8000
Environment=CUDA_VISIBLE_DEVICES=0,1
Environment=CUDA_HOME=/usr/local/cuda-12.1
WorkingDirectory=/opt/inference
Restart=always
[Install]
WantedBy=multi-user.target
# Each service uses its own environment — no conflicts
# /etc/systemd/system/comfyui.service
[Unit]
Description=ComfyUI Image Generation
After=network.target
[Service]
Type=simple
User=inference
ExecStart=/opt/envs/sd-comfyui/bin/python main.py --listen 0.0.0.0 --port 8188
Environment=CUDA_VISIBLE_DEVICES=2
WorkingDirectory=/opt/ComfyUI
Restart=always
[Install]
WantedBy=multi-user.target
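With both unit files in place, reload systemd and start the services. These are the standard systemd management commands, run as root or via sudo:

```shell
# Pick up the new unit files and start both services on boot and now
sudo systemctl daemon-reload
sudo systemctl enable --now vllm-inference.service comfyui.service

# Confirm each service runs out of its own environment
systemctl show -p ExecStart vllm-inference.service
journalctl -u vllm-inference.service -n 50 --no-pager
```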
Dependency Pinning and Reproducibility
# Generate locked requirements from a working environment
source /opt/envs/vllm-prod/bin/activate
pip freeze > /opt/envs/vllm-prod/requirements.lock
# Recreate the environment identically on another server
# (copy requirements.lock to the new server first)
python3 -m venv /opt/envs/vllm-prod
source /opt/envs/vllm-prod/bin/activate
pip install --upgrade pip wheel
# +cu121 builds live on the PyTorch index, not PyPI
pip install -r /opt/envs/vllm-prod/requirements.lock \
    --extra-index-url https://download.pytorch.org/whl/cu121
# Audit for security vulnerabilities
pip install pip-audit
pip-audit
# Check for outdated packages
pip list --outdated
# Upgrade carefully — test in a clone first
python3 -m venv /opt/envs/vllm-staging
source /opt/envs/vllm-staging/bin/activate
pip install -r /opt/envs/vllm-prod/requirements.lock \
    --extra-index-url https://download.pytorch.org/whl/cu121
pip install --upgrade vllm
# Run tests, then promote to production
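A lightweight way to gate promotion is an import smoke test across the packages you depend on. `check_env` is a hypothetical helper, and the package list in the example call is illustrative:

```shell
# Hypothetical smoke test: verify that critical packages still import
# in a given environment's interpreter after an upgrade
check_env() {
    local py="$1"; shift
    local pkg
    for pkg in "$@"; do
        "$py" -c "import $pkg" 2>/dev/null \
            || { echo "FAIL: $pkg broken in $py"; return 1; }
    done
    echo "all imports OK"
}

# e.g. check_env /opt/envs/vllm-staging/bin/python vllm torch transformers
```

If the staging check passes, regenerate the lock file from staging and rebuild production from it, rather than upgrading the production environment in place.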
Environment Maintenance
# List all environments and their sizes
du -sh /opt/envs/*/
# Remove unused environment
rm -rf /opt/envs/old-experiment/
# Clean the pip download cache — ~/.cache/pip is shared across
# environments and can grow to several GB
pip cache purge
# Check which environments are actively used by services
systemctl list-units --type=service | grep -E "vllm|ollama|comfy"
grep -r "ExecStart=/opt/envs" /etc/systemd/system/*.service
# Backup environment specs (not the environments themselves)
mkdir -p /opt/backups/envs
for env in /opt/envs/*/; do
    name=$(basename "$env")
    "${env}bin/pip" freeze > "/opt/backups/envs/${name}.requirements.txt"
done
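To keep those snapshots current without manual runs, the same loop can go in a cron entry. The path and schedule here are illustrative:

```shell
# Hypothetical /etc/cron.d entry: freeze every environment's spec at 03:00 daily
cat <<'EOF' | sudo tee /etc/cron.d/env-spec-backup
0 3 * * * root for env in /opt/envs/*/; do "${env}bin/pip" freeze > "/opt/backups/envs/$(basename "$env").requirements.txt"; done
EOF
```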
Isolated Python environments keep your GPU server stable when running multiple AI workloads. Deploy vLLM in a dedicated environment with the production guide, set up PyTorch correctly with our installation guide, run Ollama in its own environment, and track environment health with our monitoring guide. Browse our infrastructure articles and tutorials for more.
Multi-Workload GPU Servers
GigaGPU dedicated servers with full root access. Isolate environments, run multiple AI models, and maintain complete control.