
RTX 5060 Ti 16GB Batch Size Tuning

Finding the right max-num-seqs for Blackwell 16GB: throughput vs latency vs TTFT trade-offs, with concrete numbers.

Batch size (--max-num-seqs in vLLM, the maximum number of sequences processed per iteration) is the single knob with the biggest effect on the throughput vs latency trade-off. Below are concrete numbers measured on the RTX 5060 Ti 16GB in our hosting to help you pick a value.


Batch Sweep (Llama 3.1 8B FP8 + FP8 KV)

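| max-num-seqs | Aggregate t/s | Per-user t/s | p50 TTFT | p99 TTFT |
|---|---|---|---|---|
| 1 | 112 | 112 | 120 ms | 180 ms |
| 4 | 355 | 89 | 160 ms | 310 ms |
| 8 | 510 | 64 | 200 ms | 480 ms |
| 16 | 640 | 40 | 280 ms | 780 ms |
| 32 | 720 | 22 | 420 ms | 1,450 ms |
| 48 | 750 | 16 | 560 ms | 2,100 ms |
| 64 | 760 | 12 | 720 ms | 2,800 ms |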

Aggregate throughput is nearly flat past batch 32 (720 t/s at 32, 760 t/s at 64): diminishing returns as memory bandwidth saturates. Meanwhile per-user speed keeps falling and TTFT keeps rising.
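One way to see the plateau is to compute the marginal gain: how many extra aggregate tokens/sec each additional sequence slot buys between adjacent rows of the sweep. This is a minimal sketch that simply re-encodes the table's numbers; the `Row` type and `marginal_gain` helper are illustrative names, not part of vLLM.

```python
from typing import NamedTuple


class Row(NamedTuple):
    seqs: int          # --max-num-seqs
    agg_tps: int       # aggregate tokens/sec across all sequences
    user_tps: int      # tokens/sec seen by a single user
    p50_ttft_ms: int
    p99_ttft_ms: int


# Numbers copied from the sweep table (Llama 3.1 8B FP8 + FP8 KV, RTX 5060 Ti 16GB).
SWEEP = [
    Row(1, 112, 112, 120, 180),
    Row(4, 355, 89, 160, 310),
    Row(8, 510, 64, 200, 480),
    Row(16, 640, 40, 280, 780),
    Row(32, 720, 22, 420, 1450),
    Row(48, 750, 16, 560, 2100),
    Row(64, 760, 12, 720, 2800),
]


def marginal_gain(sweep):
    """Extra aggregate t/s per additional sequence slot between adjacent rows."""
    return [
        (cur.seqs, (cur.agg_tps - prev.agg_tps) / (cur.seqs - prev.seqs))
        for prev, cur in zip(sweep, sweep[1:])
    ]


for seqs, gain in marginal_gain(SWEEP):
    print(f"up to {seqs:>2} seqs: +{gain:.2f} t/s per extra slot")
```

On these numbers, each slot added past 32 contributes under 2 t/s of aggregate throughput while every user pays for it in per-user speed and TTFT.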

Interactive Chat Target

  • Goal: 30-60 tokens/sec per user (faster than reading speed)
  • Recommended: --max-num-seqs 16 (~40 t/s per user, 640 t/s aggregate)
  • p99 TTFT under 800 ms (780 ms measured)
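The interactive targets can be applied mechanically: keep only the sweep rows that meet both the per-user speed floor and the p99 TTFT ceiling, then take the largest max-num-seqs, since that serves the most concurrent users. A minimal sketch; `pick_for_sla` is a hypothetical helper and its default thresholds are the targets listed above.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    seqs: int
    agg_tps: int
    user_tps: int
    p50_ttft_ms: int
    p99_ttft_ms: int


# Sweep results from the table (Llama 3.1 8B FP8 + FP8 KV).
SWEEP = [
    Row(1, 112, 112, 120, 180),
    Row(4, 355, 89, 160, 310),
    Row(8, 510, 64, 200, 480),
    Row(16, 640, 40, 280, 780),
    Row(32, 720, 22, 420, 1450),
    Row(48, 750, 16, 560, 2100),
    Row(64, 760, 12, 720, 2800),
]


def pick_for_sla(sweep, min_user_tps: int = 30,
                 max_p99_ttft_ms: int = 800) -> Optional[Row]:
    """Largest max-num-seqs that meets both the per-user speed and p99 TTFT targets."""
    feasible = [
        r for r in sweep
        if r.user_tps >= min_user_tps and r.p99_ttft_ms <= max_p99_ttft_ms
    ]
    # None when no measured setting satisfies both targets.
    return max(feasible, key=lambda r: r.seqs, default=None)


print(pick_for_sla(SWEEP))
```

Raising the per-user floor to 60 t/s would push the choice down to 8, which is why the recommendation sits at 16 for a 30-60 t/s goal.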

Bulk API Target

  • Goal: maximise completions per minute
  • Recommended: --max-num-seqs 32-48 (720-750 t/s aggregate, close to the 760 t/s peak)
  • Accept a p99 TTFT of roughly 1.5-2 s (1,450-2,100 ms measured)
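For bulk work the question becomes where the aggregate curve flattens. The sketch below returns the smallest max-num-seqs that reaches a chosen fraction of peak aggregate throughput, optionally within a p99 TTFT budget. The 90% and 98% cut-offs are illustrative choices, not part of the benchmark, and `smallest_near_peak` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    seqs: int
    agg_tps: int
    user_tps: int
    p50_ttft_ms: int
    p99_ttft_ms: int


# Sweep results from the table (Llama 3.1 8B FP8 + FP8 KV).
SWEEP = [
    Row(1, 112, 112, 120, 180),
    Row(4, 355, 89, 160, 310),
    Row(8, 510, 64, 200, 480),
    Row(16, 640, 40, 280, 780),
    Row(32, 720, 22, 420, 1450),
    Row(48, 750, 16, 560, 2100),
    Row(64, 760, 12, 720, 2800),
]


def smallest_near_peak(sweep, fraction: float = 0.9,
                       max_p99_ttft_ms: Optional[int] = None) -> Optional[Row]:
    """Smallest max-num-seqs whose aggregate t/s is at least `fraction` of the peak,
    optionally also respecting a p99 TTFT budget."""
    peak = max(r.agg_tps for r in sweep)
    for r in sorted(sweep, key=lambda r: r.seqs):
        within_budget = max_p99_ttft_ms is None or r.p99_ttft_ms <= max_p99_ttft_ms
        if r.agg_tps >= fraction * peak and within_budget:
            return r
    return None


print(smallest_near_peak(SWEEP).seqs)        # 32
print(smallest_near_peak(SWEEP, 0.98).seqs)  # 48
```

With this sweep, 90% of peak lands on 32 and 98% on 48, which brackets the recommended range. Taking the smallest qualifying value keeps TTFT as low as the throughput goal allows.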

Recommended Defaults

| Workload | max-num-seqs |
|---|---|
| Interactive chat (SLA) | 16 |
| General purpose (balanced) | 24 |
| Bulk completion API | 32-48 |
| Throughput benchmark | 64+ |
| Larger model with less KV headroom (14B AWQ) | 8 |
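To apply the table when launching a server, the value only needs to reach `vllm serve` through `--max-num-seqs`. A minimal sketch that builds the command line per workload; the short workload keys are hypothetical labels, the model ID is just an example, and `--kv-cache-dtype fp8` is included only to mirror the FP8 KV setup used in the sweep (drop it if your model or setup doesn't support it).

```python
import shlex

# Recommended --max-num-seqs per workload as (latency-leaning, throughput-leaning)
# values taken from the defaults table. Keys are illustrative labels.
DEFAULTS = {
    "interactive": (16, 16),
    "balanced": (24, 24),
    "bulk": (32, 48),
    "benchmark": (64, 64),  # table says "64+"; 64 is the largest value in the sweep
    "14b-awq": (8, 8),
}


def serve_command(model: str, workload: str, prefer_throughput: bool = False) -> str:
    """Build a `vllm serve` command line with max-num-seqs set for the workload."""
    low, high = DEFAULTS[workload]
    seqs = high if prefer_throughput else low
    args = [
        "vllm", "serve", model,
        "--max-num-seqs", str(seqs),
        "--kv-cache-dtype", "fp8",  # matches the FP8 KV configuration of the sweep
    ]
    return shlex.join(args)  # quotes any argument that needs shell escaping


print(serve_command("meta-llama/Llama-3.1-8B-Instruct", "bulk", prefer_throughput=True))
```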

vLLM’s default max-num-seqs is 256, which is too high for a 16 GB card: that many concurrent sequences compete for a small KV cache and create KV cache pressure. Always override it.
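A rough KV cache calculation shows why 256 slots is too many on this card. Llama 3.1 8B has 32 layers, 8 KV heads (grouped-query attention) and a head dimension of 128, so with a 1-byte FP8 KV cache each token costs 2 × 32 × 8 × 128 = 65,536 bytes. The 5 GiB KV budget below is an assumed figure for illustration (roughly what could remain of 16 GB after FP8 weights and runtime overhead), not a measurement; check the real capacity on your own deployment.

```python
# Llama 3.1 8B attention shape (from the published model config).
NUM_LAYERS = 32
NUM_KV_HEADS = 8        # grouped-query attention: 8 KV heads shared by 32 query heads
HEAD_DIM = 128
KV_BYTES_PER_ELEM = 1   # FP8 KV cache

# ASSUMPTION: KV cache budget left after weights and overhead on a 16 GB card.
ASSUMED_KV_BUDGET_GIB = 5


def kv_bytes_per_token() -> int:
    # One K and one V vector per layer per KV head.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * KV_BYTES_PER_ELEM


def tokens_per_sequence(kv_budget_gib: float, max_num_seqs: int) -> int:
    """Average context length each sequence can hold if every slot is busy at once."""
    total_tokens = int(kv_budget_gib * 1024**3) // kv_bytes_per_token()
    return total_tokens // max_num_seqs


for seqs in (16, 32, 256):
    print(f"{seqs:>3} seqs -> ~{tokens_per_sequence(ASSUMED_KV_BUDGET_GIB, seqs):,} tokens each")
```

Under this assumption, 256 busy slots would average only about 320 tokens of context each, so longer conversations quickly exhaust the cache and vLLM has to queue or preempt requests; at 16 slots each sequence averages about 5,120 tokens.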


See also: max throughput, concurrent users, TTFT p99, decode benchmark.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
