
Stable Diffusion: Concurrent Image Generation by GPU

How many images can each GPU generate concurrently with Stable Diffusion? Batch throughput benchmarks for SDXL, SD 1.5, and FLUX.1 across six GPUs.

Concurrent Generation Overview

Image generation APIs need to handle multiple requests simultaneously. Unlike autoregressive LLM inference, where output lengths vary per request, diffusion models run a fixed number of denoising steps, so multiple images can share every step of a single batched denoising loop, trading per-image latency for higher aggregate throughput. We tested batch image generation across six GPUs on dedicated GPU servers to measure how many images per minute each card can produce.

Tests ran on GigaGPU bare-metal servers using the diffusers library with default step counts (20 for SD 1.5, 30 for SDXL, 28 for FLUX.1). We measured images per minute at batch sizes 1, 2, 4, and 8 (where VRAM permits). For per-image latency, see the image generation latency benchmark.
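Every throughput figure below is derived from wall-clock batch latency. A minimal conversion helper makes the relationship explicit (illustrative only; the actual benchmark harness is not shown here):

```python
def images_per_minute(batch_size: int, batch_latency_s: float) -> float:
    """Aggregate throughput implied by one batch's wall-clock latency."""
    return 60.0 * batch_size / batch_latency_s

# A batch of 8 SDXL images finishing in ~8.7 s implies ~55 img/min.
print(round(images_per_minute(8, 8.7), 1))  # 55.2
```

This is why larger batches win on throughput even though each individual image waits longer: the fixed per-step overhead is amortized across the whole batch.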

SDXL Batch Throughput by GPU

SDXL at 1024×1024, 30 steps. Images per minute at each batch size.

| GPU | Batch 1 (img/min) | Batch 2 (img/min) | Batch 4 (img/min) | Batch 8 (img/min) |
| --- | --- | --- | --- | --- |
| RTX 3050 (6 GB) | 1.4 | OOM | OOM | OOM |
| RTX 4060 (8 GB) | 3.2 | 4.8 | OOM | OOM |
| RTX 4060 Ti (16 GB) | 4.7 | 7.4 | 10.8 | OOM |
| RTX 3090 (24 GB) | 7.3 | 11.6 | 17.2 | 22.0 |
| RTX 5080 (16 GB) | 10.9 | 17.4 | 25.6 | OOM |
| RTX 5090 (32 GB) | 17.6 | 28.2 | 42.0 | 55.0 |

The RTX 5090 produces 55 SDXL images per minute at batch 8 — nearly 80,000 images per day. The RTX 3090 manages 22 images per minute at batch 8, roughly 32,000 per day. The RTX 4060 is limited to batch 2 for SDXL due to VRAM constraints.
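The per-day figures are the per-minute rate scaled out, assuming sustained 24/7 utilization with no downtime:

```python
def images_per_day(img_per_min: float) -> int:
    """Scale a steady per-minute rate to daily capacity (24/7 operation)."""
    return int(img_per_min * 60 * 24)

print(images_per_day(55))  # RTX 5090, SDXL batch 8 -> 79200
print(images_per_day(22))  # RTX 3090, SDXL batch 8 -> 31680
```

Real-world capacity will be lower once you account for queue gaps, model reloads, and maintenance windows.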

SD 1.5 Batch Throughput by GPU

SD 1.5 at 512×512, 20 steps. The lighter model allows larger batch sizes.

| GPU | Batch 1 (img/min) | Batch 2 (img/min) | Batch 4 (img/min) | Batch 8 (img/min) |
| --- | --- | --- | --- | --- |
| RTX 3050 (6 GB) | 7.1 | 10.8 | 15.2 | OOM |
| RTX 4060 (8 GB) | 15.8 | 25.2 | 36.0 | 44.0 |
| RTX 4060 Ti (16 GB) | 23.1 | 37.8 | 56.0 | 72.0 |
| RTX 3090 (24 GB) | 35.3 | 58.4 | 88.0 | 114.0 |
| RTX 5080 (16 GB) | 54.5 | 90.0 | 134.0 | 172.0 |
| RTX 5090 (32 GB) | 85.7 | 142.0 | 212.0 | 280.0 |

SD 1.5 is dramatically faster — the RTX 5090 produces 280 images per minute at batch 8. Even the RTX 4060 handles 44 images per minute, making it viable for lightweight image generation APIs.

FLUX.1 Throughput by GPU

FLUX.1 (dev) at 1024×1024, 28 steps. FLUX.1’s higher VRAM requirements limit batch sizes on smaller cards.

| GPU | Batch 1 (img/min) | Batch 2 (img/min) | Batch 4 (img/min) |
| --- | --- | --- | --- |
| RTX 4060 Ti (16 GB) | 2.1 | OOM | OOM |
| RTX 3090 (24 GB) | 3.8 | 5.8 | OOM |
| RTX 5080 (16 GB) | 5.4 | OOM | OOM |
| RTX 5090 (32 GB) | 8.8 | 14.0 | 20.4 |

FLUX.1 is VRAM-hungry — only the 32 GB RTX 5090 supports batch 4, and the 16 GB cards cannot batch at all. For FLUX.1 production APIs, the 5090 is effectively the minimum card for any meaningful throughput.

Queue vs Batch: Throughput Strategy

For image generation APIs, you have two strategies: queue individual requests (batch 1, lowest latency) or accumulate requests into batches (higher throughput, higher latency). Queue mode delivers images in 3-18 seconds depending on GPU and model. Batch mode can double or triple throughput but adds wait time while the batch fills.

A common production approach is dynamic batching with a short timeout (200-500 ms). If multiple requests arrive within the timeout window, they batch together; otherwise, they process individually. This balances throughput and latency automatically. For overall API capacity planning, see the GPU capacity planning for AI SaaS guide. For more image generation analysis, explore the Benchmarks category.
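The dynamic-batching pattern above can be sketched as a small asyncio worker; the names (`DynamicBatcher`, `generate_batch`) are illustrative, not a specific framework's API:

```python
import asyncio

class DynamicBatcher:
    """Collect requests for up to `timeout` seconds (or until `max_batch`
    arrive), then run them through the model as a single batch."""

    def __init__(self, generate_batch, max_batch=8, timeout=0.3):
        self.generate_batch = generate_batch  # list[prompt] -> list[image]
        self.max_batch = max_batch
        self.timeout = timeout
        self.queue = asyncio.Queue()

    async def submit(self, prompt):
        """Called once per API request; resolves when the image is ready."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self):
        """Worker loop: block for the first request, then keep filling the
        batch until the timeout window closes or the batch is full."""
        while True:
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.timeout
            while len(batch) < self.max_batch:
                remaining = deadline - asyncio.get_running_loop().time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            images = self.generate_batch([prompt for prompt, _ in batch])
            for (_, fut), image in zip(batch, images):
                fut.set_result(image)
```

In production, `generate_batch` would wrap a batched diffusion pipeline call: requests arriving within the window share one denoising loop, while a lone request still processes immediately after the timeout, capping worst-case added latency at the window length.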

Conclusion

Concurrent image generation throughput depends heavily on GPU VRAM and model size. The RTX 5090 leads with 55 SDXL images per minute at batch 8, while the RTX 3090 delivers 22 — both strong options for production image APIs. For SD 1.5 workloads, even budget GPUs offer high throughput. Match your GPU choice to your model, batch strategy, and volume requirements at GigaGPU dedicated hosting. See also the RTX 3090 vs RTX 5090 throughput per dollar comparison and the GPU comparisons category.

Size Your GPU Server

Tell us your workload — we’ll recommend the right GPU.

Browse GPU Servers


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
