SDXL remains the workhorse image model of 2026. Flux has the higher ceiling, but SDXL still wins on prompt variety, LoRA availability and end-to-end speed. Picking the right GPU matters: the spread between a 12 GB RTX 3060 and an RTX 5090 is more than 3x per image, and nearly 7x once you factor in batch size. This guide benchmarks seven cards on a standard 1024×1024, 30-step run and calls out the value picks. To deploy any of these, see our dedicated GPU hosting.
Contents
- Per-GPU benchmark table
- Batch size capability
- Value picks by budget
- Architecture notes
- Cost per image
- Which to buy
Per-GPU benchmark table
All numbers below use SDXL 1.0 base, 30 steps DPM++ 2M Karras, 1024×1024, FP16 (except Blackwell cards where FP8 was tested separately). Same seed, ComfyUI 0.3.12.
| GPU | VRAM | Seconds / image | Images / hour | Architecture |
|---|---|---|---|---|
| RTX 3060 12GB | 12 GB | 6.0 | 600 | Ampere |
| RTX 4060 Ti 16GB | 16 GB | 4.1 | 878 | Ada |
| RTX 5060 Ti 16GB | 16 GB | 3.4 | 1,058 | Blackwell |
| RTX 3090 24GB | 24 GB | 2.8 | 1,285 | Ampere |
| RTX 5080 16GB | 16 GB | 2.2 | 1,636 | Blackwell |
| RTX 5090 32GB | 32 GB | 1.8 | 2,000 | Blackwell |
| RTX 6000 Pro 96GB | 96 GB | 1.5 | 2,400 | Blackwell |
For the 5060 Ti numbers in detail, see our SDXL benchmark page.
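The images/hour column is pure arithmetic on the seconds-per-image figure. A quick sanity check of the table above (the seconds/image values are copied from the runs):

```python
# Sanity-check the images/hour column: imgs_per_hour = 3600 / seconds_per_image.
seconds_per_image = {
    "RTX 3060 12GB": 6.0,
    "RTX 4060 Ti 16GB": 4.1,
    "RTX 5060 Ti 16GB": 3.4,
    "RTX 3090 24GB": 2.8,
    "RTX 5080 16GB": 2.2,
    "RTX 5090 32GB": 1.8,
    "RTX 6000 Pro 96GB": 1.5,
}

def images_per_hour(sec_per_image: float) -> int:
    """Single-image throughput, rounded down to whole images."""
    return int(3600 / sec_per_image)

for gpu, sec in seconds_per_image.items():
    print(f"{gpu}: {images_per_hour(sec)} imgs/hour")
```

Every row in the benchmark table reproduces this way, e.g. 3600 / 1.8 s = 2,000 imgs/hour for the 5090.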
Batch size capability
Throughput stops scaling with batch size once VRAM runs out. For commercial pipelines where you want to generate tens of thousands of images per day, batch is the biggest lever.
| GPU | Max batch (1024×1024 FP16) | Effective imgs/hour at max batch |
|---|---|---|
| RTX 3060 12GB | 2 | 720 |
| RTX 4060 Ti 16GB | 4 | 1,450 |
| RTX 5060 Ti 16GB | 4 | 1,720 |
| RTX 3090 24GB | 6 | 2,150 |
| RTX 5080 16GB | 4 | 2,600 |
| RTX 5090 32GB | 12 | 4,800 |
| RTX 6000 Pro 96GB | 40 | 9,000+ |
Value picks by budget
- Under £400/mo: RTX 5060 Ti 16GB. Best FP8 support in its class and roughly two-thirds the speed of a 5080 at about half the hosting price.
- £500-800/mo: RTX 5080 16GB or an RTX 3090 24GB if you need the extra VRAM for big LoRA stacks.
- £1000+/mo: RTX 5090 32GB. The fastest single GPU a normal team needs.
- Studio scale: RTX 6000 Pro 96GB. Batch 40, run ControlNet and two LoRAs without offload.
Architecture notes
Blackwell cards (5060 Ti, 5080, 5090, 6000 Pro) have native FP8 tensor cores, which SDXL can exploit via torch.compile or Stable-Fast. FP8 typically gives a further 20-30% throughput on top of FP16 at negligible quality loss. Ampere (3060, 3090) does not have FP8 hardware; use FP16 and rely on memory bandwidth. Bandwidth ranking: 5090 1,792 GB/s, 6000 Pro 1,400 GB/s, 5080 960 GB/s, 3090 936 GB/s, 5060 Ti 448 GB/s.
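In a pipeline that serves mixed hardware, the precision choice follows directly from architecture. An illustrative helper, using the article's grouping (only Blackwell treated as FP8-capable); in a real deployment you would pair the FP8 path with torch.compile and an FP8-quantised UNet, which this sketch does not attempt:

```python
# Illustrative precision picker following the grouping above:
# Blackwell gets FP8, everything else falls back to FP16.
FP8_ARCHS = {"Blackwell"}

def pick_precision(arch: str) -> str:
    """Return the fastest SDXL-safe precision for a GPU architecture."""
    return "fp8" if arch in FP8_ARCHS else "fp16"

for arch in ("Ampere", "Ada", "Blackwell"):
    print(arch, "->", pick_precision(arch))
```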
Cost per image
Assume UK dedicated pricing of £299-£1,499/month depending on card. At typical utilisation (70%), cost per 1000 SDXL images ranges from ~£0.21 on an RTX 6000 Pro to ~£1.10 on an RTX 3060.
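The per-image figure falls out of monthly price, throughput, and utilisation: cost per 1,000 images = monthly price / (imgs/hour × 720 hours × utilisation) × 1,000. A sketch with placeholder inputs; the £299 and 600 imgs/hour values are the bottom of the ranges quoted above, not a quote for a specific card, so the output will not exactly match the article's figures:

```python
def cost_per_1000_images(monthly_gbp: float, imgs_per_hour: float,
                         utilisation: float = 0.70) -> float:
    """Cost in GBP per 1,000 images, assuming a 720-hour month."""
    monthly_images = imgs_per_hour * 720 * utilisation
    return monthly_gbp / monthly_images * 1000

# Placeholder: a £299/month card doing 600 imgs/hour at 70% utilisation.
print(round(cost_per_1000_images(299, 600), 2))
```

Raising utilisation or running at max batch moves the figure down proportionally, which is why the batch-capable cards dominate on £/image.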
Deploy SDXL on dedicated hardware
RTX 5060 Ti 16GB through to RTX 6000 Pro 96GB. UK dedicated hosting.
Browse GPU Servers
Which to buy
If you are starting from zero and plan to run one or two users with LoRAs and ControlNet, the 5060 Ti 16GB is the correct pick. If you are running a studio generating thousands of images daily, skip straight to the 5090 or 6000 Pro. A 3090 is still sensible as a legacy workhorse, but its £/image is no longer competitive with Blackwell at the consumer tier.
See also: 5060 Ti SDXL benchmark, 5060 Ti vs 3090, 5060 Ti vs 5080, SDXL for product images, image-generation studio.