
Version Pinning Strategy for AI Deployments: What to Pin, How Tight

AI stacks have many independently moving versions — driver, CUDA, vLLM, model commit. Pin the wrong layer too tight and you fall behind on security patches; pin it too loose and you lose reproducibility.

One unattended-upgrades incident at 3 AM is enough to motivate version pinning. The question is what to pin.
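One way to prevent that 3 AM incident is to tell unattended-upgrades to leave the GPU stack alone. A minimal sketch, assuming a stock Ubuntu install — the drop-in filename is arbitrary, and the package-name prefixes should be checked against your own `dpkg -l` output:

```shell
# Keep unattended-upgrades away from the GPU stack.
# Prefixes below are assumptions; verify against: dpkg -l | grep -i nvidia
sudo tee /etc/apt/apt.conf.d/52-gpu-blacklist <<'EOF'
Unattended-Upgrade::Package-Blacklist {
    "nvidia-";
    "cuda-";
    "libnvidia-";
    "libcudnn";
    "libnccl";
};
EOF
```

Security updates for the rest of the OS still flow; only the blacklisted prefixes are skipped.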

TL;DR

Pin tight: NVIDIA driver, CUDA, vLLM, model commit SHA. Pin loose: OS minor versions, language libraries. Update only during a scheduled maintenance window, validating with your eval harness before and after.

Versioning layers

  1. OS: Ubuntu 22.04 LTS — pin to LTS, allow security updates
  2. NVIDIA driver: pinned to exact version (e.g., 555.42)
  3. CUDA toolkit: pinned to exact version (e.g., 12.4)
  4. cuDNN / NCCL: pinned
  5. Python: pinned to minor (e.g., 3.10.x)
  6. vLLM: pinned to an exact version (e.g., 0.6.3)
  7. Model: pinned to a commit SHA, never a tag
  8. LiteLLM, TEI, Qdrant: pinned to exact versions
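Before you can detect drift across these layers, you need a record of what is currently installed. A rough sketch of a lockfile-style snapshot — the query commands are standard, but install paths and availability vary, so absent tools simply produce empty fields:

```shell
#!/usr/bin/env bash
# Snapshot the exact version of each layer into versions.lock
# so later drift is detectable with a diff.
set -u
{
  echo "os=$(. /etc/os-release && echo "$VERSION_ID")"
  echo "driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)"
  echo "cuda=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p')"
  echo "python=$(python3 --version | awk '{print $2}')"
  echo "vllm=$(pip show vllm 2>/dev/null | awk '/^Version:/{print $2}')"
} > versions.lock
cat versions.lock
```

Check versions.lock into the deployment repo; `diff` against a fresh snapshot makes an unexpected upgrade obvious.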

Pinning strategy

Use apt-mark hold for system packages. Use a requirements.txt with exact (==) versions for Python dependencies. Pin the model download with an explicit --revision set to a full commit SHA, never a branch or tag.
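Concretely, the three mechanisms might look like the following. This is a sketch: the package names, pinned versions, model name, and SHA placeholder are illustrative assumptions, not values from this post.

```shell
# 1. Hold system packages so apt upgrade and unattended-upgrades skip them
#    (package names assumed; list yours with: dpkg -l | grep -i nvidia)
sudo apt-mark hold nvidia-driver-555 cuda-toolkit-12-4

# 2. Exact-pin the Python layer (versions illustrative)
cat > requirements.txt <<'EOF'
vllm==0.6.3
torch==2.4.0
EOF
pip install -r requirements.txt

# 3. Pin the model to a commit SHA, never a branch or tag
#    (model name is an example; substitute your model's real commit SHA)
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --revision "REPLACE_WITH_COMMIT_SHA"
```

Held packages survive a blanket apt upgrade; unhold them explicitly during the maintenance window, upgrade, run the eval harness, then re-hold.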

Verdict

Version pinning is boring infrastructure that pays back the first time something breaks. Always pin the GPU stack tight.

Bottom line

Pin everything in the GPU stack. Update on maintenance windows. See driver setup.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

gigagpu

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
