
Image Generation Latency by GPU (Time per Image)

Image generation latency benchmarks across six GPUs — p50, p90, and p99 time per image for Stable Diffusion XL, FLUX.1, and SD 1.5 on dedicated GPU servers.

Image Generation Latency Overview

For AI image generation APIs and user-facing creative tools running on a dedicated GPU server, the metric users feel most is time per image — how long they wait from submitting a prompt to receiving a completed image. We benchmarked per-image latency across six GPUs with the most popular diffusion models to help you size your hardware for responsive image generation.

All tests ran on GigaGPU bare-metal servers with default inference settings (20 steps for SD 1.5, 30 steps for SDXL, 28 steps for FLUX.1). Images were generated at each model’s native resolution (512×512 for SD 1.5, 1024×1024 for SDXL and FLUX.1). For throughput-oriented benchmarks, see the concurrent image generation benchmark.
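The percentile figures in the tables below can be reproduced from raw per-image timings. Here is a minimal sketch of that aggregation step using the nearest-rank percentile method; the `timings` list is illustrative sample data, not our measured values, and in a real run each entry would be the wall-clock time of one generation call.

```python
import math

def latency_percentile(samples_s, pct):
    """Nearest-rank percentile: the smallest latency with at least
    pct% of samples at or below it."""
    ordered = sorted(samples_s)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative per-image timings in seconds (not measured data).
timings = [8.1, 8.2, 8.2, 8.3, 8.4, 8.6, 8.6, 8.7, 9.0, 9.1]
p50 = latency_percentile(timings, 50)
p90 = latency_percentile(timings, 90)
p99 = latency_percentile(timings, 99)
```

Nearest-rank keeps the reported number an actual observed latency rather than an interpolated value, which is why p99 over a small sample simply returns one of the slowest runs.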

SDXL Latency by GPU

Stable Diffusion XL at 1024×1024 resolution with 30 denoising steps.

| GPU | VRAM | p50 Latency | p90 Latency | p99 Latency |
| --- | --- | --- | --- | --- |
| RTX 3050 | 6 GB | 42.0 s | 43.5 s | 45.0 s |
| RTX 4060 | 8 GB | 18.5 s | 19.2 s | 20.0 s |
| RTX 4060 Ti | 16 GB | 12.8 s | 13.4 s | 14.0 s |
| RTX 3090 | 24 GB | 8.2 s | 8.6 s | 9.1 s |
| RTX 5080 | 16 GB | 5.5 s | 5.8 s | 6.2 s |
| RTX 5090 | 32 GB | 3.4 s | 3.6 s | 3.9 s |

The RTX 5090 generates SDXL images in under 4 seconds — fast enough for near-real-time creative workflows. The RTX 3090 at 8.2 seconds is acceptable for API workloads where users expect a short wait. The RTX 4060 at 18.5 seconds is viable only for batch processing or non-interactive use.

FLUX.1 Latency by GPU

FLUX.1 (dev variant) at 1024×1024 with 28 inference steps. FLUX.1 requires more VRAM than SDXL.

| GPU | p50 Latency | p90 Latency | p99 Latency |
| --- | --- | --- | --- |
| RTX 3050 (6 GB) | OOM | OOM | OOM |
| RTX 4060 (8 GB) | OOM | OOM | OOM |
| RTX 4060 Ti (16 GB) | 28.5 s | 29.8 s | 31.0 s |
| RTX 3090 (24 GB) | 16.0 s | 16.8 s | 17.5 s |
| RTX 5080 (16 GB) | 11.2 s | 11.8 s | 12.4 s |
| RTX 5090 (32 GB) | 6.8 s | 7.2 s | 7.6 s |

FLUX.1 requires at least 12-13 GB of VRAM, eliminating the RTX 3050 and RTX 4060. The RTX 5090 generates FLUX.1 images in under 7 seconds. For a broader comparison of these models, see the Benchmarks category which includes model-specific speed comparisons.
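A sizing check like the one implied above is easy to automate before provisioning. The sketch below uses the roughly 13 GB weight footprint quoted in the text; the 1 GB of headroom for activations and the CUDA context is an illustrative assumption, not a measured value.

```python
# VRAM per card, from the benchmark lineup on this page.
GPU_VRAM_GB = {
    "RTX 3050": 6, "RTX 4060": 8, "RTX 4060 Ti": 16,
    "RTX 3090": 24, "RTX 5080": 16, "RTX 5090": 32,
}

def can_run(gpu, model_weights_gb, headroom_gb=1.0):
    """True if the card fits the model weights plus working headroom."""
    return GPU_VRAM_GB[gpu] >= model_weights_gb + headroom_gb

# Cards that clear the ~13 GB FLUX.1 requirement.
flux_capable = [gpu for gpu in GPU_VRAM_GB if can_run(gpu, 13.0)]
```

Filtering the lineup this way reproduces the OOM rows in the table: the 6 GB and 8 GB cards drop out, and the four 16 GB-and-up cards remain.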

Stable Diffusion 1.5 Latency by GPU

SD 1.5 at 512×512 with 20 steps is the lightest workload and runs on every GPU tested.

| GPU | p50 Latency | p90 Latency | p99 Latency |
| --- | --- | --- | --- |
| RTX 3050 (6 GB) | 8.5 s | 8.9 s | 9.3 s |
| RTX 4060 (8 GB) | 3.8 s | 4.0 s | 4.2 s |
| RTX 4060 Ti (16 GB) | 2.6 s | 2.8 s | 3.0 s |
| RTX 3090 (24 GB) | 1.7 s | 1.8 s | 2.0 s |
| RTX 5080 (16 GB) | 1.1 s | 1.2 s | 1.3 s |
| RTX 5090 (32 GB) | 0.7 s | 0.75 s | 0.8 s |

SD 1.5 is fast enough on the RTX 5090 for real-time preview generation. Even the RTX 3050 can handle SD 1.5 in under 10 seconds, making it viable for low-volume creative tools.

Latency Under Concurrent Load

Unlike LLM inference, image generation does not benefit from continuous batching in the same way. Each image saturates the full GPU, so concurrent requests queue sequentially unless you use batch generation (multiple images in one forward pass). With batch-2 on the RTX 3090, SDXL wall-clock latency rises from 8.2 s to 12.5 s per pass: a 52 percent increase in each user's wait in exchange for two images per pass instead of one.
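The trade-off can be quantified directly from the RTX 3090 SDXL figures above. A quick sketch, using this page's measurements rather than general constants:

```python
def throughput_imgs_per_s(batch_size, wall_latency_s):
    """Images completed per second when one pass yields batch_size images."""
    return batch_size / wall_latency_s

batch1 = throughput_imgs_per_s(1, 8.2)   # one SDXL image per 8.2 s pass
batch2 = throughput_imgs_per_s(2, 12.5)  # two SDXL images per 12.5 s pass
speedup = batch2 / batch1                # ~1.31x more images per second
```

So batch-2 on this card trades a 52 percent longer wait per request for roughly 31 percent more sustained throughput, which is worth it for queue-heavy APIs but not for interactive single-user sessions.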

For production APIs handling multiple requests, the effective throughput depends on whether users can tolerate queued latency. See the concurrent image generation benchmark for detailed throughput data across GPUs and batch sizes. For capacity planning, our GPU capacity planning for AI SaaS guide covers image generation workloads.
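Whether queued latency is tolerable is easy to estimate for a single GPU serving requests first-in, first-out. This simplified sketch ignores batching and assumes every request takes the p50 service time, so it is a back-of-the-envelope bound rather than a full queueing model:

```python
def expected_wait_s(queue_depth, service_time_s):
    """Seconds a new request waits before its own generation starts."""
    return queue_depth * service_time_s

def total_latency_s(queue_depth, service_time_s):
    """Queue wait plus the request's own generation time."""
    return expected_wait_s(queue_depth, service_time_s) + service_time_s

# Three SDXL requests already queued on an RTX 3090 (p50 = 8.2 s):
# the fourth user waits ~24.6 s and receives their image after ~32.8 s.
```

Even a shallow queue multiplies user-visible latency several times over, which is why the per-image numbers above should be read as a floor, not a typical experience under load.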

Conclusion

Image generation latency varies by over 10x across our GPU lineup. The RTX 5090 delivers sub-4-second SDXL images and sub-1-second SD 1.5 images, making it suitable for near-real-time creative applications. The RTX 3090 provides a strong mid-range option at 8 seconds for SDXL. For the best GPU recommendations covering both LLM and image workloads, browse the full GPU comparisons category on GigaGPU.

Size Your GPU Server

Tell us your workload — we’ll recommend the right GPU.

Browse GPU Servers


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
