QLoRA freezes the quantised base weights and trains low-rank adapters on top. It is the realistic way to fine-tune 7B-14B models on 16 GB of VRAM. The numbers below were measured on the RTX 5060 Ti 16GB in our hosting environment:
## Stack
- Transformers 4.46, PEFT 0.13, bitsandbytes 0.44, Accelerate 1.0
- 4-bit NF4 quantisation, double quant enabled, bf16 compute dtype
- Paged 8-bit AdamW optimiser (`paged_adamw_8bit` from bitsandbytes)
- FlashAttention 2.6
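For reference, a minimal setup matching this stack might look like the following. The model name, LoRA rank, alpha, and target modules are illustrative assumptions (common QLoRA defaults), not values taken from the benchmark:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 with double quantisation and bf16 compute, as in the stack above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",             # illustrative; any 7B-14B causal LM
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention 2, as listed above
    torch_dtype=torch.bfloat16,
)

# LoRA hyperparameters below are assumptions, not benchmarked values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```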
## Per-Step Timings (seconds per iteration)
| Model | Seq 1024 | Seq 2048 | Seq 4096 |
|---|---|---|---|
| Mistral 7B (bs=4) | 0.78 | 1.55 | 3.20 |
| Llama 3 8B (bs=4) | 0.85 | 1.68 | 3.45 |
| Gemma 2 9B (bs=2) | 0.62 | 1.30 | 2.80 |
| Qwen 2.5 14B (bs=2) | 1.05 | 2.10 | OOM |
## Tokens/sec
| Model | Config | tokens/s |
|---|---|---|
| Llama 3 8B | bs=4, seq=2048 | 4,900 |
| Mistral 7B | bs=4, seq=2048 | 5,300 |
| Qwen 2.5 14B | bs=2, seq=2048 | 1,950 |
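The two tables are consistent: tokens/s is simply batch size × sequence length divided by the step time from the first table.

```python
# Throughput check: tokens/s = batch_size * seq_len / sec_per_step
print(4 * 2048 / 1.68)  # Llama 3 8B   -> ~4876, matches ~4,900
print(4 * 2048 / 1.55)  # Mistral 7B   -> ~5285, matches ~5,300
print(2 * 2048 / 2.10)  # Qwen 2.5 14B -> ~1950, matches 1,950
```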
## Memory Usage
| Model | Config | Peak VRAM |
|---|---|---|
| Llama 3 8B | bs=4, seq=2048 | 11.8 GB |
| Llama 3 8B | bs=2, seq=4096 | 13.2 GB |
| Qwen 2.5 14B | bs=2, seq=2048 | 14.5 GB |
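One way to reproduce peak-VRAM readings like these is PyTorch's allocator stats; note this reports allocated memory, which can sit slightly below what nvidia-smi shows as reserved. A minimal sketch:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run a handful of training steps here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM (allocated): {peak_gb:.1f} GB")
```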
## Recommended Recipe
- Llama 3 8B at seq 2048, bs 4 – fits with 4 GB headroom, ~5k tokens/s
- Enable gradient checkpointing if pushing to seq 4096
- Set the effective batch size via `gradient_accumulation_steps=4` to reach eff_bs=16 (see the config sketch after this list)
- Use paged AdamW to avoid optimiser-state OOM on longer sequences
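Put together, the recipe maps onto Hugging Face `TrainingArguments` roughly as below. The learning rate and output directory are placeholder assumptions; the rest follows the bullets above:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3-8b-qlora",    # placeholder
    per_device_train_batch_size=4,   # bs=4 at seq 2048, ~11.8 GB peak
    gradient_accumulation_steps=4,   # effective batch size 16
    gradient_checkpointing=True,     # recommended when pushing to seq 4096
    optim="paged_adamw_8bit",        # paged AdamW optimiser states
    bf16=True,                       # bf16 compute dtype
    learning_rate=2e-4,              # assumption: a common QLoRA default
    logging_steps=10,
)
```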
For higher throughput, switch to Unsloth – same results in roughly half the time.
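A sketch of the equivalent Unsloth setup, assuming its `FastLanguageModel` API; the checkpoint name and LoRA settings are illustrative, not benchmarked values:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative pre-quantised checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```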
See also: QLoRA guide, LoRA speed, Unsloth, fine-tune throughput, LoRA guide.