Stable Diffusion XL at 1024×1024 is the standard image-gen workload. The RTX 5060 Ti 16GB on our hosting has plenty of headroom. Numbers below.
Setup
- Diffusers 0.30, PyTorch 2.5, xFormers 0.0.28
- Model: stabilityai/stable-diffusion-xl-base-1.0
- Resolution: 1024×1024
- Sampler: DPM++ 2M Karras
- Backends compared: Diffusers FP16, ComfyUI, Automatic1111
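The Diffusers FP16 baseline in the tables below can be reproduced with a short script. This is a minimal sketch: the prompt is illustrative, and the scheduler swap is how Diffusers expresses "DPM++ 2M Karras" (DPMSolverMultistep with Karras sigmas).

```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

# Load SDXL base in FP16 -- matches the "Diffusers FP16" rows below
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# DPM++ 2M Karras = multistep DPM-Solver with Karras sigma schedule
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    "a lighthouse at dusk, volumetric light",  # illustrative prompt
    num_inference_steps=30,
    height=1024,
    width=1024,
).images[0]
image.save("out.png")
```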
Batch 1, 1024×1024
| Backend | Steps | Time | VRAM peak |
|---|---|---|---|
| Diffusers FP16 | 30 | 3.8 s | 9.2 GB |
| Diffusers + SDPA | 30 | 3.4 s | 9.0 GB |
| Diffusers + torch.compile | 30 | 2.9 s | 9.4 GB |
| ComfyUI (default) | 30 | 3.6 s | 8.9 GB |
| Automatic1111 (xFormers) | 30 | 4.1 s | 9.5 GB |
Expect roughly 3-4 seconds per 1024×1024 image at 30 steps. At 20 steps (fine for many styles) that drops to ~2.5 s.
Batch 4, 1024×1024
- Diffusers FP16: 12.2 s for 4 images (3.05 s / image)
- VRAM peak: 13.8 GB
- Batch 6: OOM at 1024×1024; workable at 896×896
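Batching in Diffusers is a single argument, assuming a `pipe` loaded as in the Setup section. This sketch produced the batch-4 numbers above; the prompt is illustrative.

```python
# Assumes `pipe` is an FP16 StableDiffusionXLPipeline on CUDA (see Setup).
out = pipe(
    "a lighthouse at dusk",        # illustrative prompt
    num_inference_steps=30,
    height=1024,
    width=1024,
    num_images_per_prompt=4,       # batch 4: ~13.8 GB peak on the 16 GB card
)
for i, img in enumerate(out.images):
    img.save(f"batch_{i}.png")
```

Batch 6 at 1024×1024 over-commits the 16 GB card; dropping to 896×896 brings the activations back within budget.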
SDXL-Turbo (1-4 step)
| Steps | Time | Images/min |
|---|---|---|
| 1 | 0.28 s | 214 |
| 4 | 0.85 s | 70 |
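The images/min column is just 60 divided by the per-image latency, rounded down. A quick sanity check:

```python
def images_per_minute(seconds_per_image: float) -> int:
    # Throughput, rounded down to whole images per minute
    return int(60 / seconds_per_image)

print(images_per_minute(0.28))  # 1-step Turbo -> 214
print(images_per_minute(0.85))  # 4-step Turbo -> 70
```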
Turbo is the option for interactive creative tools, where sub-second generation matters more than peak quality.
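Running Turbo differs from base SDXL in two ways: step count and guidance. A minimal sketch (illustrative prompt; Turbo defaults to 512×512 output):

```python
import torch
from diffusers import AutoPipelineForText2Image

# SDXL-Turbo: distilled for 1-4 steps
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    "a lighthouse at dusk",   # illustrative prompt
    num_inference_steps=1,
    guidance_scale=0.0,       # Turbo was trained without CFG; leave it off
).images[0]
```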
Optimisation Knobs
- torch.compile – ~20% speedup (2.9 s vs 3.4 s above), with a one-off ~60 s compile cost on the first run
- SDPA vs xFormers – native PyTorch SDPA is now faster than xFormers on Blackwell, so there is little reason to install xFormers for SDXL
- VAE tiling – needed at very large resolutions, where decoding the full latent in one pass runs out of VRAM
- FP8 UNet – experimental; ~25% faster at a minor quality cost
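The first two software knobs are one line each in Diffusers, assuming the `pipe` from the Setup section:

```python
import torch

# Assumes `pipe` is the SDXL pipeline from Setup.

# torch.compile: ~20% faster steady-state; first call pays ~60 s of compile
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# VAE tiling: decode the latent in tiles so huge resolutions fit in VRAM
pipe.enable_vae_tiling()
```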
SDXL on Blackwell 16GB
Expect 3-4 s/image at 1024px and batches up to 4 on the 16 GB card, on UK dedicated hosting.
Order the RTX 5060 Ti 16GB.
See also: FLUX.1 Schnell, SD 1.5 benchmark, SD setup guide, ComfyUI setup, image studio.