RTX 5060 Ti 16GB QLoRA Training Speed

QLoRA fine-tuning speed on the Blackwell-based RTX 5060 Ti 16GB: per-step times, tokens per second, peak VRAM, and the batch size and sequence length combinations that fit.

QLoRA freezes the quantised base weights and trains only low-rank adapters, which makes it the realistic way to fine-tune 7B-14B models on 16 GB of VRAM. The numbers below were measured on an RTX 5060 Ti 16GB in our hosting environment.

Stack

  • Transformers 4.46, PEFT 0.13, bitsandbytes 0.44, Accelerate 1.0
  • 4-bit NF4 quantisation, double quant enabled, bf16 compute dtype
  • Paged AdamW optimiser, 8-bit (bitsandbytes)
  • FlashAttention 2.6
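As a rough illustration of how this stack is usually wired together, the sketch below loads a base model in 4-bit NF4 with double quantisation and bf16 compute, requests FlashAttention 2, and attaches LoRA adapters through PEFT. The function name `load_qlora_model` is hypothetical, and the adapter rank, alpha, dropout and target modules are assumptions, since the benchmark settings for those are not listed here.

```python
def load_qlora_model(model_id: str):
    """Load a 4-bit NF4 base model and attach LoRA adapters (illustrative sketch)."""
    # Imported lazily so this file can be imported on a machine without the GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # 4-bit NF4 quantisation, double quant enabled, bf16 compute dtype.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        attn_implementation="flash_attention_2",  # requires the flash-attn package
        torch_dtype=torch.bfloat16,
    )

    # Gradient checkpointing is left off here; turn it on for longer sequences.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)

    # ASSUMED adapter settings: not taken from the benchmark runs.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```

Calling `load_qlora_model(...)` with a Hugging Face model ID returns a PEFT-wrapped model whose base weights stay frozen in 4-bit while only the adapter weights receive gradients.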

Per-Step Timings (sec per iteration)

| Model | Seq 1024 | Seq 2048 | Seq 4096 |
| --- | --- | --- | --- |
| Mistral 7B (bs=4) | 0.78 | 1.55 | 3.20 |
| Llama 3 8B (bs=4) | 0.85 | 1.68 | 3.45 |
| Gemma 2 9B (bs=2) | 0.62 | 1.30 | 2.80 |
| Qwen 2.5 14B (bs=2) | 1.05 | 2.10 | OOM |

Tokens/sec

| Model | Config | Tokens/s |
| --- | --- | --- |
| Llama 3 8B | bs=4, seq=2048 | 4,900 |
| Mistral 7B | bs=4, seq=2048 | 5,300 |
| Qwen 2.5 14B | bs=2, seq=2048 | 1,950 |
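The throughput figures follow from the per-step timings: tokens per second is batch size times sequence length divided by seconds per step. A minimal sketch of that arithmetic, assuming every sequence is packed to the full length so all positions count as tokens:

```python
def tokens_per_second(batch_size: int, seq_len: int, step_seconds: float) -> float:
    """Training throughput for one step: tokens processed / time taken."""
    return batch_size * seq_len / step_seconds


# (batch size, sequence length, seconds per iteration) from the timing table.
runs = {
    "Llama 3 8B": (4, 2048, 1.68),
    "Mistral 7B": (4, 2048, 1.55),
    "Qwen 2.5 14B": (2, 2048, 2.10),
}

for name, (bs, seq, step) in runs.items():
    print(f"{name}: {tokens_per_second(bs, seq, step):,.0f} tokens/s")
```

The results round to the published 4,900, 5,300 and 1,950 tokens/s.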

Memory Usage

| Model | Config | Peak VRAM |
| --- | --- | --- |
| Llama 3 8B | bs=4, seq=2048 | 11.8 GB |
| Llama 3 8B | bs=2, seq=4096 | 13.2 GB |
| Qwen 2.5 14B | bs=2, seq=2048 | 14.5 GB |
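To see how much room each configuration leaves, subtract the peak from the card's capacity. The sketch below assumes the full nominal 16 GB is available; in practice the usable figure is somewhat lower once the driver and CUDA context take their share.

```python
CARD_VRAM_GB = 16.0  # nominal capacity; usable VRAM is slightly lower (assumption)

# (model, batch size, sequence length) -> peak VRAM in GB, from the table above.
peak_vram_gb = {
    ("Llama 3 8B", 4, 2048): 11.8,
    ("Llama 3 8B", 2, 4096): 13.2,
    ("Qwen 2.5 14B", 2, 2048): 14.5,
}

headroom_gb = {
    config: round(CARD_VRAM_GB - peak, 1) for config, peak in peak_vram_gb.items()
}

for (model, bs, seq), free in headroom_gb.items():
    print(f"{model} bs={bs} seq={seq}: {free} GB headroom")
```

Qwen 2.5 14B at seq 2048 has the least slack, which is consistent with it running out of memory at seq 4096.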

Recommended Recipe

  • Llama 3 8B at seq 2048, bs 4: peaks at 11.8 GB, leaving about 4 GB of headroom, at roughly 4,900 tokens/s
  • Enable gradient checkpointing if pushing to seq 4096
  • Set gradient_accumulation_steps=4 to reach an effective batch size of 16 (4 × 4)
  • Use Paged AdamW to avoid OOM on longer sequences
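The recipe above can be sketched as a set of keyword arguments for `transformers.TrainingArguments`. The output path is hypothetical, and `paged_adamw_8bit` is our reading of the Paged AdamW setting in the stack list, so treat this as a starting point, not the exact benchmark configuration.

```python
# Keyword arguments intended for transformers.TrainingArguments(**recipe_args).
recipe_args = {
    "output_dir": "qlora-llama3-8b",      # hypothetical path
    "per_device_train_batch_size": 4,     # bs 4
    "gradient_accumulation_steps": 4,     # 4 micro-batches per optimiser step
    "gradient_checkpointing": False,      # set True when pushing to seq 4096
    "optim": "paged_adamw_8bit",          # assumed optimiser string
    "bf16": True,
}

# Sequence length is set at tokenisation time, not in TrainingArguments.
SEQ_LEN = 2048

effective_batch_size = (
    recipe_args["per_device_train_batch_size"]
    * recipe_args["gradient_accumulation_steps"]
)
tokens_per_optimizer_step = effective_batch_size * SEQ_LEN

print(f"effective batch size: {effective_batch_size}")
print(f"tokens per optimiser step: {tokens_per_optimizer_step:,}")
```

Gradient accumulation raises the effective batch size without raising peak VRAM, because only one micro-batch of 4 sequences is resident at a time.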

For higher throughput, switch to Unsloth, which reaches the same results in roughly half the time.
