
RTX 5060 Ti 16GB Sanity Test Script

A 15-minute sanity test for a newly provisioned RTX 5060 Ti 16GB server - verify hardware, CUDA, PyTorch, and baseline inference performance.

After provisioning a new RTX 5060 Ti 16GB server on our dedicated hosting, run a sanity test before putting a workload on it. Fifteen minutes of validation catches hardware or driver issues before they surprise you mid-deployment.


Hardware

nvidia-smi
# Expect: RTX 5060 Ti, 16 GB VRAM, driver 570 or newer (Blackwell requires the R570 branch or later)
nvidia-smi --query-gpu=name,memory.total,memory.used,temperature.gpu,power.draw --format=csv
# Baseline: ~15 MB used, ~35°C, ~15 W idle

If nvidia-smi returns "No devices found", the driver is not loaded; reboot the server or reinstall the driver.
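If you run this check across several servers, a small parser over the CSV output saves eyeballing. A sketch, assuming the same field order as the query above; the function name and thresholds are ours, chosen to mirror the idle baselines:

```python
def check_idle_baseline(csv_line: str) -> list[str]:
    """Parse one row of `nvidia-smi --query-gpu=... --format=csv,noheader`
    output and return a list of warnings (empty list means all clear)."""
    name, mem_total, mem_used, temp, power = [f.strip() for f in csv_line.split(",")]
    warnings = []
    if "5060 Ti" not in name:
        warnings.append(f"unexpected GPU: {name}")
    if int(mem_total.split()[0]) < 15800:   # MiB; a 16 GB card reports ~16384
        warnings.append(f"low total VRAM: {mem_total}")
    if int(temp) > 50:                      # idle card should sit around 35°C
        warnings.append(f"high idle temperature: {temp}C")
    if float(power.split()[0]) > 40:        # idle draw should be ~15 W
        warnings.append(f"high idle power: {power}")
    return warnings

# Example row, in the query's field order:
sample = "NVIDIA GeForce RTX 5060 Ti, 16311 MiB, 15 MiB, 35, 14.50 W"
print(check_idle_baseline(sample))  # -> []
```

Feed it each line of `nvidia-smi --query-gpu=name,memory.total,memory.used,temperature.gpu,power.draw --format=csv,noheader`.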

CUDA and PyTorch

python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Expect: True NVIDIA GeForce RTX 5060 Ti

python3 -c "import torch; a = torch.randn(1024, 1024).cuda(); b = torch.randn(1024, 1024).cuda(); print((a @ b).sum().item())"
# Any finite number; tests matmul on GPU
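"Any finite number" catches hard failures, but a flaky card can also return subtly wrong values. A stricter variant compares the device result against a float64 CPU reference; the helper below is our own sketch, and it falls back to CPU-only verification when CUDA is absent:

```python
import torch

def matmul_consistency(n: int = 512, tol: float = 1e-2) -> bool:
    """Check a matmul on the test device against a float64 CPU reference.
    Silent memory/clock corruption shows up as a large elementwise error."""
    torch.manual_seed(0)
    a, b = torch.randn(n, n), torch.randn(n, n)
    ref = a.double() @ b.double()                      # high-precision reference
    device = "cuda" if torch.cuda.is_available() else "cpu"
    got = (a.to(device) @ b.to(device)).double().cpu() # device under test
    return (got - ref).abs().max().item() < tol

print(matmul_consistency())  # Expect: True
```

The tolerance is loose enough for normal fp32 rounding but far below what genuine corruption produces.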

Stress Test

Load the card briefly to check thermals:

python3 -c "
import torch, time
a = torch.randn(8192, 8192, device='cuda', dtype=torch.half)
b = torch.randn(8192, 8192, device='cuda', dtype=torch.half)
c = a @ b                     # warm-up iteration: kernel selection happens here
torch.cuda.synchronize()      # finish setup before starting the clock
start = time.time()
for _ in range(200):
    c = a @ b
torch.cuda.synchronize()      # wait for all queued matmuls to complete
print(f'200 iters: {time.time()-start:.1f}s')
"

Expected: 200 iterations in ~8-12 seconds. Monitor nvidia-smi in a second terminal during the run; temperature should stay under 80°C, power near 170 W.
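The wall time maps directly to throughput: multiplying two n×n matrices costs 2n³ FLOPs, so the loop's duration converts to TFLOP/s with one line of arithmetic. A quick pure-Python helper (the name is ours), using the numbers above:

```python
def achieved_tflops(n: int, iters: int, seconds: float) -> float:
    """Convert stress-test wall time into TFLOP/s: 2*n^3 FLOPs per n x n matmul."""
    return 2 * n**3 * iters / seconds / 1e12

# 200 iterations of 8192x8192 half-precision matmuls in 10 s:
print(f"{achieved_tflops(8192, 200, 10.0):.1f} TFLOP/s")  # -> 22.0 TFLOP/s
```

Record this number; a later run that comes in well below your own baseline suggests thermal throttling or a driver regression.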

Baseline Inference

Quick vLLM smoke test:

pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model neuralmagic/Llama-3.1-8B-Instruct-FP8 \
  --quantization fp8 --max-model-len 4096 &

# wait until vLLM reports ready (the first run also downloads the weights)
until curl -sf http://localhost:8000/health > /dev/null; do sleep 5; done

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"neuralmagic/Llama-3.1-8B-Instruct-FP8","messages":[{"role":"user","content":"Hello, say GigaGPU."}],"max_tokens":50}'

Expected: response with generated text in under 3 seconds.
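The response follows the OpenAI chat-completions schema, so the generated text lives at `choices[0].message.content`. A small extractor makes scripted smoke tests easier to assert on; the sample payload below is a trimmed illustration, not a real capture:

```python
import json

def extract_reply(body: str) -> str:
    """Pull the generated text out of an OpenAI-style chat completion response."""
    return json.loads(body)["choices"][0]["message"]["content"]

# Trimmed example of the shape the curl above returns:
sample = '{"choices": [{"message": {"role": "assistant", "content": "GigaGPU"}}]}'
print(extract_reply(sample))  # -> GigaGPU
```

On the command line, `jq -r '.choices[0].message.content'` does the same job.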

VRAM Ceiling

python3 -c "
import torch
total = torch.cuda.get_device_properties(0).total_memory / 1024**3
reserved = torch.cuda.memory_reserved(0) / 1024**3
print(f'Total VRAM: {total:.1f} GiB, Reserved: {reserved:.1f} GiB')
"
# Expect Total ~16.0 GiB (dividing by 1e9 would misleadingly report ~17 GB)

If the reported total is under 15.5 GiB, check that you were not handed a different card variant, and contact support.
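In scripted provisioning, that cut-off can live in a tiny guard so a wrong variant fails loudly instead of surfacing as an OOM later. A sketch, with the threshold in bytes (the helper name is ours); feed it `torch.cuda.get_device_properties(0).total_memory`:

```python
MIN_BYTES = 15.5 * 1024**3   # 15.5 GiB floor for a 16 GB card

def vram_ok(total_bytes: int) -> bool:
    """True when the reported total VRAM is plausible for a 16 GB variant."""
    return total_bytes >= MIN_BYTES

# A 16 GB card reports roughly 16 GiB minus a small firmware carve-out:
print(vram_ok(16311 * 1024**2))  # 16311 MiB -> True
print(vram_ok(8192 * 1024**2))   # an 8 GB variant -> False
```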

Every server we ship passes this test before handoff.

Once sanity passes, run the full benchmark script to establish your baseline throughput numbers.


