Table of Contents
One unattended-upgrades incident at 3 AM is enough to motivate version pinning. The question is what to pin.
Pin tight: NVIDIA driver, CUDA, vLLM, model commit SHA. Pin loose: OS minor versions, language libraries. Update on a maintenance window with eval harness validation.
Versioning layers
- OS: Ubuntu 22.04 LTS — pin to LTS, allow security updates
- NVIDIA driver: pinned to exact version (e.g., 555.42)
- CUDA toolkit: pinned to exact version (e.g., 12.4)
- cuDNN / NCCL: pinned
- Python: pinned to minor (e.g., 3.10.x)
- vLLM: pinned to exact (0.6.3)
- Model: pinned to commit SHA, never tag
- LiteLLM, TEI, Qdrant: pinned to exact
Pinning strategy
Use apt-mark hold for system packages. Use requirements.txt with exact versions for Python. Pin model with explicit revision: --revision sha256....
Verdict
Version pinning is boring infrastructure that pays back the first time something breaks. Always pin the GPU stack tight.
Bottom line
Pin everything in the GPU stack. Update on maintenance windows. See driver setup.