SDXL Turbo Benchmark Overview
SDXL Turbo uses adversarial diffusion distillation to generate images in as few as one sampling step, making it one of the fastest high-quality image generation models available. On a dedicated GPU server, every card we tested generated a 512×512 image in under a second at one step. We benchmark images per second across six GPUs, from the RTX 3050 to the RTX 5090.
All tests were run on GigaGPU servers at 512×512 resolution (SDXL Turbo’s optimal resolution) using 1-step and 4-step generation. SDXL Turbo requires approximately 6.5 GB of VRAM. For model comparisons, see the SD 1.5 vs SDXL speed benchmark.
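The exact harness behind these numbers is not published here, so the following is only a minimal sketch of how an images-per-second figure could be measured. The warm-up count, the image count and the prompt are assumptions, and `make_sdxl_turbo_generator` is a hypothetical helper that wires in the Hugging Face `diffusers` pipeline for `stabilityai/sdxl-turbo`; it needs `torch`, `diffusers` and a CUDA GPU, and is not called unless you call it yourself.

```python
import time
from typing import Callable


def measure_throughput(generate: Callable[[], object],
                       n_images: int = 20,
                       warmup: int = 3) -> float:
    """Return images per second for `generate`, which makes one image per call."""
    # Warm-up calls absorb one-time costs such as kernel selection and caching.
    for _ in range(warmup):
        generate()
    start = time.perf_counter()
    for _ in range(n_images):
        generate()
    elapsed = time.perf_counter() - start
    return n_images / elapsed


def make_sdxl_turbo_generator(steps: int = 1) -> Callable[[], object]:
    """Hypothetical wiring to SDXL Turbo (assumed setup, not the article's harness)."""
    # Imported lazily so the timing helper above works without a GPU stack.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    def generate():
        # SDXL Turbo runs without classifier-free guidance, so guidance_scale is 0.0.
        # The call returns PIL images, so the GPU work has finished when it returns.
        return pipe(
            prompt="a photo of a red fox in snow",  # placeholder prompt (assumption)
            num_inference_steps=steps,
            guidance_scale=0.0,
            height=512,
            width=512,
        ).images[0]

    return generate
```

On a GPU machine, `measure_throughput(make_sdxl_turbo_generator(steps=1))` would give a 1-step figure and `steps=4` a 4-step figure, both at batch size 1.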
Images/sec Results by GPU
| GPU | VRAM | SDXL Turbo 1-Step (img/s) | Time per Image |
|---|---|---|---|
| RTX 3050 | 6 GB | 1.8 img/s | ~556ms |
| RTX 4060 | 8 GB | 3.5 img/s | ~286ms |
| RTX 4060 Ti | 16 GB | 4.8 img/s | ~208ms |
| RTX 3090 | 24 GB | 6.2 img/s | ~161ms |
| RTX 5080 | 16 GB | 9.5 img/s | ~105ms |
| RTX 5090 | 32 GB | 13.8 img/s | ~72ms |
SDXL Turbo is remarkably fast. Even the RTX 3050 delivers nearly 2 images per second at 1-step, while the RTX 5090 manages 13.8 images/sec at roughly 72ms per image, fast enough for truly real-time applications.
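The per-image times in the table are the reciprocal of the throughput figures. A short sketch of that conversion, assuming images are generated one at a time (batch size 1), with the rates copied from the table:

```python
# 1-step throughput in images per second, copied from the results table.
IMG_PER_SEC_1STEP = {
    "RTX 3050": 1.8,
    "RTX 4060": 3.5,
    "RTX 4060 Ti": 4.8,
    "RTX 3090": 6.2,
    "RTX 5080": 9.5,
    "RTX 5090": 13.8,
}


def ms_per_image(img_per_sec: float) -> float:
    """Convert throughput (img/s) to latency (ms per image) for serial generation."""
    return 1000.0 / img_per_sec


for gpu, rate in IMG_PER_SEC_1STEP.items():
    print(f"{gpu}: ~{ms_per_image(rate):.0f} ms per image")
```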
1-Step vs 4-Step Comparison
More steps improve quality at the cost of speed. Below we compare 1-step and 4-step generation.
| GPU | 1-Step (img/s) | 4-Step (img/s) |
|---|---|---|
| RTX 3050 | 1.8 | 0.48 |
| RTX 4060 | 3.5 | 0.92 |
| RTX 4060 Ti | 4.8 | 1.25 |
| RTX 3090 | 6.2 | 1.62 |
| RTX 5080 | 9.5 | 2.48 |
| RTX 5090 | 13.8 | 3.60 |
At 4 steps, throughput falls to roughly a quarter of the 1-step rate on every card (about 3.8× slower). The RTX 5090 still manages 3.6 images/sec (~278ms per image), which is faster than most models at any step count. For applications where quality matters more than speed, 4-step generation is recommended.
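To see how much each card slows down when moving from one step to four, divide the 1-step rate by the 4-step rate. The sketch below does this with the figures from the comparison table. The ratios come out slightly under 4, which is consistent with some fixed per-image work outside the denoising steps (for example text encoding and image decoding), although that reading is an assumption rather than something this benchmark measured.

```python
# Throughput in images per second, copied from the 1-step vs 4-step table.
ONE_STEP = {"RTX 3050": 1.8, "RTX 4060": 3.5, "RTX 4060 Ti": 4.8,
            "RTX 3090": 6.2, "RTX 5080": 9.5, "RTX 5090": 13.8}
FOUR_STEP = {"RTX 3050": 0.48, "RTX 4060": 0.92, "RTX 4060 Ti": 1.25,
             "RTX 3090": 1.62, "RTX 5080": 2.48, "RTX 5090": 3.60}


def slowdown(gpu: str) -> float:
    """How many times slower 4-step generation is than 1-step on the given GPU."""
    return ONE_STEP[gpu] / FOUR_STEP[gpu]


for gpu in ONE_STEP:
    print(f"{gpu}: 4-step is {slowdown(gpu):.2f}x slower than 1-step")
```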
Cost Efficiency Analysis
| GPU | 1-Step (img/s) | Approx. Monthly Cost | img/s per £ of Monthly Cost |
|---|---|---|---|
| RTX 3050 | 1.8 | ~£45 | 0.040 |
| RTX 4060 | 3.5 | ~£60 | 0.058 |
| RTX 4060 Ti | 4.8 | ~£75 | 0.064 |
| RTX 3090 | 6.2 | ~£110 | 0.056 |
| RTX 5080 | 9.5 | ~£160 | 0.059 |
| RTX 5090 | 13.8 | ~£250 | 0.055 |
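The RTX 4060 Ti leads on cost efficiency at 0.064 img/s per pound of monthly cost, ahead of the RTX 5080 (0.059) and the RTX 4060 (0.058). If you are choosing the best GPU for Stable Diffusion on a budget, it offers outstanding value for SDXL Turbo.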
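The efficiency column is the 1-step throughput divided by the approximate monthly cost. A minimal sketch that reproduces it and ranks the cards, using the figures from the table as given:

```python
# 1-step throughput (img/s) and approximate monthly cost (GBP) from the table.
IMG_PER_SEC = {"RTX 3050": 1.8, "RTX 4060": 3.5, "RTX 4060 Ti": 4.8,
               "RTX 3090": 6.2, "RTX 5080": 9.5, "RTX 5090": 13.8}
MONTHLY_COST_GBP = {"RTX 3050": 45, "RTX 4060": 60, "RTX 4060 Ti": 75,
                    "RTX 3090": 110, "RTX 5080": 160, "RTX 5090": 250}


def img_per_sec_per_pound(gpu: str) -> float:
    """Throughput bought per pound of monthly server cost."""
    return IMG_PER_SEC[gpu] / MONTHLY_COST_GBP[gpu]


# Most cost-efficient card first.
ranked = sorted(IMG_PER_SEC, key=img_per_sec_per_pound, reverse=True)
for gpu in ranked:
    print(f"{gpu}: {img_per_sec_per_pound(gpu):.3f} img/s per pound")
```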
GPU Recommendations
- Budget: RTX 4060 — 3.5 img/s at 1-step for development and moderate-traffic APIs.
- Best value: RTX 4060 Ti — top cost efficiency with 4.8 img/s.
- Real-time: RTX 5090 — 72ms per image enables truly interactive generation.
- High throughput: RTX 5080 — excellent balance of speed and cost for production.
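One way to turn these recommendations into a rule of thumb is to pick the cheapest card that meets a target throughput. The sketch below is an illustration only: `cheapest_gpu_for` is a hypothetical helper, the target is whatever your application needs, and the rates and prices are the 1-step figures and approximate monthly costs from the tables above.

```python
from typing import Optional

# (name, 1-step img/s, approx. monthly cost in GBP), copied from the tables.
GPUS = [
    ("RTX 3050", 1.8, 45),
    ("RTX 4060", 3.5, 60),
    ("RTX 4060 Ti", 4.8, 75),
    ("RTX 3090", 6.2, 110),
    ("RTX 5080", 9.5, 160),
    ("RTX 5090", 13.8, 250),
]


def cheapest_gpu_for(target_img_per_sec: float) -> Optional[str]:
    """Return the lowest-cost GPU whose 1-step rate meets the target, or None."""
    candidates = [(cost, name) for name, rate, cost in GPUS
                  if rate >= target_img_per_sec]
    return min(candidates)[1] if candidates else None


print(cheapest_gpu_for(3.0))
print(cheapest_gpu_for(9.0))
```

A target above 13.8 img/s returns `None`, since no single card in this benchmark reaches it at batch size 1.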
For higher-quality image generation at lower speed, see our Flux.1 benchmark. Compare SDXL Turbo with the full SDXL model in our SD 1.5 vs SDXL comparison. Browse all results in the Benchmarks category.
Conclusion
SDXL Turbo is the fastest high-quality image generation model we have benchmarked. Its single-step capability means even the budget RTX 3050 serves an image in under a second (~556ms), while the RTX 5090 reaches 13.8 images per second. For applications requiring instant visual feedback, SDXL Turbo on dedicated GPU hardware is the optimal choice.