RTX 5060 Ti 16GB First Day Checklist

The commands, config and sanity checks to work through on day one of a new Blackwell 16GB dedicated server - end up production-ready in about an hour.

Day one of a new RTX 5060 Ti 16GB server on our UK dedicated GPU hosting should leave you with a secured, monitored box running its first model. Work through this checklist in order – the whole thing takes roughly an hour including the first benchmark.

Verify Hardware

nvidia-smi                           # Expect: RTX 5060 Ti 16GB, driver 570 or newer
lspci | grep -i nvidia               # Should list GB206
sudo dmesg | grep -i nvidia          # No errors expected
sudo nvidia-smi -pm 1                # Enable persistence mode

If the driver is older than 570, reinstall it – see Ubuntu driver install. Earlier driver branches do not support Blackwell cards. Persistence mode keeps the driver loaded between jobs, which shaves cold-start time.
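
The driver check above can be scripted so it fails loudly rather than relying on someone reading the nvidia-smi banner. This is a minimal sketch: driver_major and driver_ok are hypothetical helper names, and the minimum branch is passed in as an argument rather than hard-coded.

```shell
#!/bin/sh
# Sketch: compare an NVIDIA driver version string against a minimum branch.
# driver_major and driver_ok are illustrative helpers, not NVIDIA tooling.

# Print the major component of a version, e.g. "570.144" -> "570".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed when the installed major version is >= the required one.
# $1 = installed version string, $2 = minimum major version
driver_ok() {
    [ "$(driver_major "$1")" -ge "$2" ] 2>/dev/null
}

# Usage on the server (MIN_MAJOR is the minimum branch from this checklist):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_ok "$installed" "$MIN_MAJOR" || echo "driver $installed is too old"
```

A version string that is not numeric (for example an empty result when the driver is not loaded) is treated as a failure.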

Secure the Box

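  • Disable password SSH auth, keys only: edit /etc/ssh/sshd_config, set PasswordAuthentication no, then reload the SSH service. Confirm key login works in a second session before closing the first
  • UFW allow-list: 22 (SSH), 80/443 (public apps only), deny everything else inbound
  • Patch the OS and reboot: sudo apt update && sudo apt full-upgrade -y && sudo reboot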
  • Install fail2ban for SSH brute-force protection
  • Create a non-root user for all AI services; never serve vLLM as root
  • Enable unattended-upgrades for security patches
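
The UFW allow-list from the list above could be generated as below. This is a sketch, not a hardened firewall policy: ufw_plan is a hypothetical helper that only prints the commands so they can be reviewed before anything changes, and the explicit "allow outgoing" default is an assumption the checklist does not state.

```shell
#!/bin/sh
# Sketch: print the ufw commands for an inbound TCP allow-list.
# ufw_plan is an illustrative helper; it prints commands and runs nothing.

# $@ = TCP ports to allow inbound; all other inbound traffic is denied.
ufw_plan() {
    echo "ufw default deny incoming"
    echo "ufw default allow outgoing"   # assumption: outbound stays open
    for port in "$@"; do
        echo "ufw allow ${port}/tcp"
    done
    # Enable last, so the SSH rule already exists when the firewall comes up.
    echo "ufw --force enable"
}

# Review:  ufw_plan 22 80 443
# Apply:   ufw_plan 22 80 443 | sudo sh
```

Leave 80 and 443 out of the port list on a box that serves no public apps.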

Install Runtime Stack

Layer                              Install
CUDA toolkit 12.8                  sudo apt install cuda-toolkit-12-8
Docker + NVIDIA Container Toolkit  See Docker CUDA setup
Python 3.12 + uv                   curl -LsSf https://astral.sh/uv/install.sh | sh
vLLM venv                          uv venv ~/.venvs/vllm && . ~/.venvs/vllm/bin/activate && uv pip install vllm
Reverse proxy                      Caddy (simplest TLS) or nginx
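
Once the stack is installed, a quick PATH check catches a layer that was skipped. The sketch below assumes the binary names nvcc, docker, uv and caddy; missing_tools is a hypothetical helper, and nvcc may be installed but absent from PATH until /usr/local/cuda/bin is added.

```shell
#!/bin/sh
# Sketch: report which expected commands are not on PATH.
# missing_tools is an illustrative helper name.

# Print each named command that cannot be found, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (tool names are assumptions based on the table above):
#   missing_tools nvcc docker uv caddy
```

Empty output means every listed command resolved.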

Monitoring

Ship three signals to a dashboard from day one: GPU utilisation, VRAM usage, p99 request latency.

  • DCGM Exporter on port 9400 for GPU metrics
  • Node Exporter on 9100 for CPU/disk/network
  • Prometheus scraping both, Grafana for dashboards
  • Alert rules: p99 latency > 2s, GPU temp > 80°C, VRAM > 95%
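
Before the Grafana dashboards exist, the exporter output can be spot-checked from the shell. This sketch filters DCGM Exporter's text output for temperature samples above a limit. It assumes the exporter exposes the DCGM_FI_DEV_GPU_TEMP metric, gpu_temp_alerts is a hypothetical helper name, and it is a manual check rather than a replacement for Prometheus alert rules.

```shell
#!/bin/sh
# Sketch: flag hot GPUs from Prometheus text-format metrics on stdin.
# gpu_temp_alerts is an illustrative helper, not part of DCGM Exporter.

# $1 = temperature limit in degrees C (defaults to 80).
# Prints every DCGM_FI_DEV_GPU_TEMP sample whose value exceeds the limit.
gpu_temp_alerts() {
    awk -v limit="${1:-80}" '
        /^DCGM_FI_DEV_GPU_TEMP[{ ]/ && $NF + 0 > limit + 0 { print }
    '
}

# Usage on the server:
#   curl -s http://localhost:9400/metrics | gpu_temp_alerts 80
```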

Performance Tuning

  • sudo nvidia-smi -pm 1 – persistence mode on
  • CPU governor to performance: sudo cpupower frequency-set -g performance
  • Disable transparent huge pages for latency workloads: echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
  • Move HuggingFace cache to fastest NVMe: export HF_HOME=/fast-nvme/hf
  • Ensure PCIe is negotiated at Gen 5 x8 – check with sudo lspci -vv | grep LnkSta (Gen 5 x8 reports as Speed 32GT/s, Width x8)
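
The PCIe check above is easy to misread when lspci prints a LnkSta line for every device in the box. The sketch below tests one device's output for a given speed and width. pcie_link_ok and GPU_BDF are hypothetical names, and it is worth running while the GPU is busy, since the link may negotiate a lower speed at idle to save power.

```shell
#!/bin/sh
# Sketch: verify the negotiated PCIe link from `lspci -vv` output on stdin.
# pcie_link_ok is an illustrative helper name.

# $1 = expected speed (e.g. 32GT/s for Gen 5), $2 = expected width (e.g. x8)
# Succeeds when a LnkSta line reports both values.
pcie_link_ok() {
    awk -v speed="$1" -v width="$2" '
        /LnkSta:/ && index($0, "Speed " speed) &&
            $0 ~ ("Width " width "([^0-9]|$)") { found = 1 }
        END { exit !found }
    '
}

# Usage (GPU_BDF is the bus address from `lspci | grep -i nvidia`, e.g. 01:00.0):
#   sudo lspci -vv -s "$GPU_BDF" | pcie_link_ok 32GT/s x8 || echo "link below Gen 5 x8"
```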

First Serve and Benchmark

Kick off Llama 3.1 8B FP8 with the standard config from our vLLM setup guide, then run the sanity test script and the benchmark script. Expected numbers:

Metric                          Pass threshold
TTFT p99 at batch 8             < 500 ms
Decode t/s at batch 1           > 100
GPU temp under load             < 78°C
Aggregate throughput batch 32   > 650 t/s

If everything hits the marks you’re ready for your first real traffic.
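
The pass thresholds above can be turned into a simple gate to run after each benchmark. This is a sketch only: check_benchmarks is a hypothetical helper, the four numbers are typed in by hand from whatever the benchmark script prints, and the metric labels are made up for readability.

```shell
#!/bin/sh
# Sketch: compare measured numbers with the pass thresholds from the table.
# check_benchmarks is an illustrative helper; all four arguments are required.

# $1 = TTFT p99 at batch 8 (ms)   $2 = decode t/s at batch 1
# $3 = GPU temp under load (C)    $4 = aggregate t/s at batch 32
# Prints one PASS/FAIL line per metric; exit status is non-zero on any FAIL.
check_benchmarks() {
    awk -v ttft="$1" -v decode="$2" -v temp="$3" -v agg="$4" '
        function report(name, ok) {
            printf "%s %s\n", (ok ? "PASS" : "FAIL"), name
            return !ok
        }
        BEGIN {
            fails  = report("ttft_p99_ms_batch8", ttft < 500)
            fails += report("decode_tps_batch1", decode > 100)
            fails += report("gpu_temp_c", temp < 78)
            fails += report("aggregate_tps_batch32", agg > 650)
            exit (fails > 0)
        }
    '
}

# Usage: check_benchmarks 420 112 71 700 && echo "ready for traffic"
```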

Production-Ready in an Hour

UK dedicated hosting with drivers preinstalled.

See also: sanity test script, benchmark script, driver install, Docker CUDA setup, vLLM setup.

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
