
Image Generation Latency by GPU (Time per Image)

Image generation latency benchmarks across six GPUs — p50, p90, and p99 time per image for Stable Diffusion XL, FLUX.1, and SD 1.5 on dedicated GPU servers.

Image Generation Latency Overview

For AI image generation APIs and user-facing creative tools running on a dedicated GPU server, the metric users feel most is time per image — how long they wait from submitting a prompt to receiving a completed image. We benchmarked per-image latency across six GPUs with the most popular diffusion models to help you size your hardware for responsive image generation.

All tests ran on GigaGPU bare-metal servers with default inference settings (20 steps for SD 1.5, 30 steps for SDXL, 28 steps for FLUX.1). Images were generated at each model’s native resolution (512×512 for SD 1.5, 1024×1024 for SDXL and FLUX.1). For throughput-oriented benchmarks, see the concurrent image generation benchmark.
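The percentile figures in the tables below can be reproduced from raw per-image timings. Here is a minimal sketch of that aggregation step using the nearest-rank percentile method; the `timings` list is illustrative sample data, not our measured values, and in a real run each entry would be the wall-clock time of one generation call.

```python
import math

def latency_percentile(samples_s, pct):
    """Nearest-rank percentile: the smallest latency with at least
    pct% of samples at or below it."""
    ordered = sorted(samples_s)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative per-image timings in seconds (not measured data).
timings = [8.1, 8.2, 8.2, 8.3, 8.4, 8.6, 8.6, 8.7, 9.0, 9.1]
p50 = latency_percentile(timings, 50)
p90 = latency_percentile(timings, 90)
p99 = latency_percentile(timings, 99)
```

Nearest-rank keeps the reported number an actual observed latency rather than an interpolated value, which is why p99 over a small sample simply returns one of the slowest runs.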

SDXL Latency by GPU

Stable Diffusion XL at 1024×1024 resolution with 30 denoising steps.

| GPU | VRAM | p50 Latency | p90 Latency | p99 Latency |
| --- | --- | --- | --- | --- |
| RTX 3050 | 6 GB | 42.0 s | 43.5 s | 45.0 s |
| RTX 4060 | 8 GB | 18.5 s | 19.2 s | 20.0 s |
| RTX 4060 Ti | 16 GB | 12.8 s | 13.4 s | 14.0 s |
| RTX 3090 | 24 GB | 8.2 s | 8.6 s | 9.1 s |
| RTX 5080 | 16 GB | 5.5 s | 5.8 s | 6.2 s |
| RTX 5090 | 32 GB | 3.4 s | 3.6 s | 3.9 s |

The RTX 5090 generates SDXL images in under 4 seconds — fast enough for near-real-time creative workflows. The RTX 3090 at 8.2 seconds is acceptable for API workloads where users expect a short wait. The RTX 4060 at 18.5 seconds is viable only for batch processing or non-interactive use.

FLUX.1 Latency by GPU

FLUX.1 (dev variant) at 1024×1024 with 28 inference steps. FLUX.1 requires more VRAM than SDXL.

| GPU | p50 Latency | p90 Latency | p99 Latency |
| --- | --- | --- | --- |
| RTX 3050 (6 GB) | OOM | OOM | OOM |
| RTX 4060 (8 GB) | OOM | OOM | OOM |
| RTX 4060 Ti (16 GB) | 28.5 s | 29.8 s | 31.0 s |
| RTX 3090 (24 GB) | 16.0 s | 16.8 s | 17.5 s |
| RTX 5080 (16 GB) | 11.2 s | 11.8 s | 12.4 s |
| RTX 5090 (32 GB) | 6.8 s | 7.2 s | 7.6 s |

FLUX.1 requires at least 12-13 GB of VRAM, eliminating the RTX 3050 and RTX 4060. The RTX 5090 generates FLUX.1 images in under 7 seconds. For a broader comparison of these models, see the Benchmarks category which includes model-specific speed comparisons.
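A sizing check like the one implied above is easy to automate before provisioning. The sketch below uses the roughly 13 GB weight footprint quoted in the text; the 1 GB of headroom for activations and the CUDA context is an illustrative assumption, not a measured value.

```python
# VRAM per card, from the benchmark lineup on this page.
GPU_VRAM_GB = {
    "RTX 3050": 6, "RTX 4060": 8, "RTX 4060 Ti": 16,
    "RTX 3090": 24, "RTX 5080": 16, "RTX 5090": 32,
}

def can_run(gpu, model_weights_gb, headroom_gb=1.0):
    """True if the card fits the model weights plus working headroom."""
    return GPU_VRAM_GB[gpu] >= model_weights_gb + headroom_gb

# Cards that clear the ~13 GB FLUX.1 requirement.
flux_capable = [gpu for gpu in GPU_VRAM_GB if can_run(gpu, 13.0)]
```

Filtering the lineup this way reproduces the OOM rows in the table: the 6 GB and 8 GB cards drop out, and the four 16 GB-and-up cards remain.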

Stable Diffusion 1.5 Latency by GPU

SD 1.5 at 512×512 with 20 steps is the lightest workload and runs on every GPU tested.

| GPU | p50 Latency | p90 Latency | p99 Latency |
| --- | --- | --- | --- |
| RTX 3050 (6 GB) | 8.5 s | 8.9 s | 9.3 s |
| RTX 4060 (8 GB) | 3.8 s | 4.0 s | 4.2 s |
| RTX 4060 Ti (16 GB) | 2.6 s | 2.8 s | 3.0 s |
| RTX 3090 (24 GB) | 1.7 s | 1.8 s | 2.0 s |
| RTX 5080 (16 GB) | 1.1 s | 1.2 s | 1.3 s |
| RTX 5090 (32 GB) | 0.7 s | 0.75 s | 0.8 s |

SD 1.5 is fast enough on the RTX 5090 for real-time preview generation. Even the RTX 3050 can handle SD 1.5 in under 10 seconds, making it viable for low-volume creative tools.

Latency Under Concurrent Load

Unlike LLM inference, image generation does not benefit from continuous batching in the same way. Each image saturates the full GPU, so concurrent requests queue sequentially unless you use batch generation (multiple images in one forward pass). With batch-2 on the RTX 3090, SDXL wall-clock latency rises from 8.2 s to 12.5 s per pass: a 52 percent increase in each user's wait in exchange for two images per pass instead of one.
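The trade-off can be quantified directly from the RTX 3090 SDXL figures above. A quick sketch, using this page's measurements rather than general constants:

```python
def throughput_imgs_per_s(batch_size, wall_latency_s):
    """Images completed per second when one pass yields batch_size images."""
    return batch_size / wall_latency_s

batch1 = throughput_imgs_per_s(1, 8.2)   # one SDXL image per 8.2 s pass
batch2 = throughput_imgs_per_s(2, 12.5)  # two SDXL images per 12.5 s pass
speedup = batch2 / batch1                # ~1.31x more images per second
```

So batch-2 on this card trades a 52 percent longer wait per request for roughly 31 percent more sustained throughput, which is worth it for queue-heavy APIs but not for interactive single-user sessions.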

For production APIs handling multiple requests, the effective throughput depends on whether users can tolerate queued latency. See the concurrent image generation benchmark for detailed throughput data across GPUs and batch sizes. For capacity planning, our GPU capacity planning for AI SaaS guide covers image generation workloads.
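Whether queued latency is tolerable is easy to estimate for a single GPU serving requests first-in, first-out. This simplified sketch ignores batching and assumes every request takes the p50 service time, so it is a back-of-the-envelope bound rather than a full queueing model:

```python
def expected_wait_s(queue_depth, service_time_s):
    """Seconds a new request waits before its own generation starts."""
    return queue_depth * service_time_s

def total_latency_s(queue_depth, service_time_s):
    """Queue wait plus the request's own generation time."""
    return expected_wait_s(queue_depth, service_time_s) + service_time_s

# Three SDXL requests already queued on an RTX 3090 (p50 = 8.2 s):
# the fourth user waits ~24.6 s and receives their image after ~32.8 s.
```

Even a shallow queue multiplies user-visible latency several times over, which is why the per-image numbers above should be read as a floor, not a typical experience under load.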

Conclusion

Image generation latency varies by over 10x across our GPU lineup. The RTX 5090 delivers sub-4-second SDXL images and sub-1-second SD 1.5 images, making it suitable for near-real-time creative applications. The RTX 3090 provides a strong mid-range option at 8 seconds for SDXL. For the best GPU recommendations covering both LLM and image workloads, browse the full GPU comparisons category on GigaGPU.

Size Your GPU Server

Tell us your workload — we’ll recommend the right GPU.

Browse GPU Servers


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
