
Best GPU for Stable Diffusion XL in 2026

Per-GPU SDXL benchmarks, batch-size capability and value picks from RTX 3060 12GB through to the RTX 6000 Pro 96GB.

SDXL remains the workhorse image model of 2026. Flux has the higher ceiling, but SDXL still wins on prompt variety, LoRA availability and end-to-end speed. Picking the right GPU matters: the spread between a 12 GB RTX 3060 and an RTX 5090 is more than 3x per image, and more than 10x once you factor in batch size. This guide benchmarks seven cards on a standard 1024×1024 30-step run and calls out the value picks. To deploy any of these, see our dedicated GPU hosting.


Per-GPU benchmark table

All numbers below use SDXL 1.0 base at 1024×1024, 30 steps with the DPM++ 2M Karras sampler, FP16 precision (FP8 was tested separately on Blackwell cards). Same seed throughout, ComfyUI 0.3.12.

| GPU | VRAM | Seconds / image | Images / hour | Architecture |
|---|---|---|---|---|
| RTX 3060 12GB | 12 GB | 6.0 | 600 | Ampere |
| RTX 4060 Ti 16GB | 16 GB | 4.1 | 878 | Ada |
| RTX 5060 Ti 16GB | 16 GB | 3.4 | 1,058 | Blackwell |
| RTX 3090 24GB | 24 GB | 2.8 | 1,285 | Ampere |
| RTX 5080 16GB | 16 GB | 2.2 | 1,636 | Blackwell |
| RTX 5090 32GB | 32 GB | 1.8 | 2,000 | Blackwell |
| RTX 6000 Pro 96GB | 96 GB | 1.5 | 2,400 | Blackwell |

For the 5060 Ti numbers in detail, see our SDXL benchmark page.
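As a sanity check, the images/hour column follows directly from the seconds/image column: 3600 seconds divided by per-image latency, truncated to whole images. A minimal sketch:

```python
def images_per_hour(seconds_per_image: float) -> int:
    # imgs/hour = 3600 s / (seconds per image), truncated to whole images
    return int(3600 / seconds_per_image)

# Spot-check against the benchmark table above:
print(images_per_hour(6.0))  # 600  (RTX 3060)
print(images_per_hour(3.4))  # 1058 (RTX 5060 Ti)
print(images_per_hour(1.8))  # 2000 (RTX 5090)
```

Every row in the table is consistent with this formula, so you can plug in your own measured latency to place any card on the same scale.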

Batch size capability

Throughput stops scaling with batch size once VRAM runs out. For commercial pipelines where you want to generate tens of thousands of images per day, batch is the biggest lever.

| GPU | Max batch (1024×1024 FP16) | Effective imgs/hour at max batch |
|---|---|---|
| RTX 3060 12GB | 2 | 720 |
| RTX 4060 Ti 16GB | 4 | 1,450 |
| RTX 5060 Ti 16GB | 4 | 1,720 |
| RTX 3090 24GB | 6 | 2,150 |
| RTX 5080 16GB | 4 | 2,600 |
| RTX 5090 32GB | 12 | 4,800 |
| RTX 6000 Pro 96GB | 40 | 9,000+ |
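Effective throughput at a given batch size reduces to one measurement: how long a full batch takes end to end. A small sketch (the 9.0 s batch latency for the RTX 5090 is not a published figure, just the value implied by the table above):

```python
def effective_imgs_per_hour(batch_size: int, seconds_per_batch: float) -> int:
    # A full batch of `batch_size` images completes every `seconds_per_batch`,
    # so hourly throughput is 3600 / seconds_per_batch batches.
    return int(3600 * batch_size / seconds_per_batch)

# RTX 5090 at batch 12: ~9.0 s per batch, as implied by the table
print(effective_imgs_per_hour(12, 9.0))  # 4800
```

Measuring one warm batch with your own workflow (resolution, LoRAs, ControlNet all change the number) gives a more reliable figure than extrapolating from single-image latency.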

Value picks by budget

  • Under £400/mo: RTX 5060 Ti 16GB. Best FP8 support in its class and roughly two-thirds the speed of a 5080 at roughly half the hosting price.
  • £500-800/mo: RTX 5080 16GB or an RTX 3090 24GB if you need the extra VRAM for big LoRA stacks.
  • £1000+/mo: RTX 5090 32GB. The fastest single GPU a normal team needs.
  • Studio scale: RTX 6000 Pro 96GB. Batch 40, run ControlNet and two LoRAs without offload.

Architecture notes

Blackwell cards (5060 Ti, 5080, 5090, 6000 Pro) have native FP8 tensor cores, which SDXL can exploit via torch.compile or Stable-Fast. FP8 typically gives a further 20-30% throughput on top of FP16 at negligible quality loss. Ampere (3060, 3090) has no FP8 hardware; use FP16 and rely on memory bandwidth. Bandwidth ranking: RTX 5090 1,792 GB/s, RTX 6000 Pro 1,400 GB/s, RTX 5080 960 GB/s, RTX 3090 936 GB/s, RTX 5060 Ti 448 GB/s.

Cost per image

Assume UK dedicated pricing of £299-£1,499/month depending on card. At typical utilisation (70%), cost per 1000 SDXL images ranges from ~£0.21 on an RTX 6000 Pro to ~£1.10 on an RTX 3060.
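The cost-per-image arithmetic is worth making explicit: monthly price divided by images actually produced in a month at a given utilisation. A sketch using the same 70% utilisation assumption (the £299/month figure is a hypothetical price from the low end of the range above, not a quote):

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month

def cost_per_1000_images(monthly_price_gbp: float,
                         imgs_per_hour: int,
                         utilisation: float = 0.7) -> float:
    # Images produced per month at the given utilisation, then £ per 1,000
    monthly_images = imgs_per_hour * HOURS_PER_MONTH * utilisation
    return monthly_price_gbp / monthly_images * 1000

# e.g. an RTX 3060 (600 imgs/hour) at a hypothetical £299/month:
print(round(cost_per_1000_images(299, 600), 2))  # 0.98 (£ per 1,000 images)
```

Exact figures depend on your pricing and real utilisation, so treat the quoted range as indicative and rerun the sums with your own numbers.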

Deploy SDXL on dedicated hardware

RTX 5060 Ti 16GB through to RTX 6000 Pro 96GB. UK dedicated hosting.

Browse GPU Servers

Which to buy

If you are starting from zero and plan to run one or two users with LoRAs and ControlNet, the 5060 Ti 16GB is the correct pick. If you are running a studio generating thousands of images daily, skip straight to the 5090 or 6000 Pro. A 3090 is still sensible as a legacy workhorse, but its £/image is no longer competitive with Blackwell at the consumer tier.

See also: 5060 Ti SDXL benchmark, 5060 Ti vs 3090, 5060 Ti vs 5080, SDXL for product images, image-generation studio.


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
