FLUX.1 has overtaken Stable Diffusion XL as the de facto open image model. Sizing it for a production image-generation API requires real benchmark numbers on the actual hardware you’d rent. This page consolidates those numbers in one place.
For maximum throughput per pound: RTX 5090 + FP8 at ~10 images/min on FLUX.1 dev. For absolute speed: same card. For budget: RTX 5060 Ti + GGUF Q5 at ~3.5 images/min. FLUX.1 schnell is roughly 2× to 4× faster than dev on the same card, depending on the GPU.
Benchmark setup
- ComfyUI on Ubuntu 22.04, NVIDIA driver 555.x
- 1024×1024 output, no upscaling
- FLUX.1 dev: 25 sampling steps, Euler scheduler
- FLUX.1 schnell: 4 sampling steps
- Single-image generation; batch 1
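A minimal sketch of how per-image timings like these could be collected, assuming you can wrap one complete 1024×1024 generation in a blocking Python callable. The `benchmark` helper, its parameters, and the stand-in workload are illustrative assumptions, not the harness used for the numbers on this page.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[], object], warmup: int = 1, runs: int = 5,
              clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time a single-image generation call and report the median latency."""
    for _ in range(warmup):
        generate()  # warm-up runs are not timed (model load, kernel compilation)
    samples = []
    for _ in range(runs):
        start = clock()
        generate()
        samples.append(clock() - start)
    seconds = statistics.median(samples)
    return {"seconds_per_image": seconds, "images_per_minute": 60.0 / seconds}


if __name__ == "__main__":
    # Stand-in workload (hypothetical): replace the lambda with a call that
    # queues one batch-1 image and blocks until it has finished.
    print(benchmark(lambda: time.sleep(0.05), warmup=1, runs=3))
```

Taking the median rather than the mean keeps one slow run (for example a cold cache) from skewing the figure.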
FLUX.1 schnell numbers
| GPU | Precision | Time per 1024² image | Images per minute |
|---|---|---|---|
| RTX 3050 6 GB | GGUF Q5 | ~22 s | 2.7 |
| RTX 3060 12 GB | GGUF Q5 | ~14 s | 4.3 |
| RTX 4060 8 GB | GGUF Q5 | ~10 s | 6.0 |
| RTX 5060 Ti 16 GB | FP8 | ~6 s | 10.0 |
| RTX 5080 16 GB | FP8 | ~3 s | 20.0 |
| RTX 4090 24 GB | FP16 | ~5 s | 12.0 |
| RTX 5090 32 GB | FP8 | ~1.6 s | 37.5 |
| RTX 6000 Pro | FP16 | ~1.6 s | 37.5 |
FLUX.1 dev numbers
| GPU | Precision | Time per 1024² image | Images per minute |
|---|---|---|---|
| RTX 3050 6 GB | GGUF Q4 | ~80 s | 0.75 |
| RTX 3060 12 GB | GGUF Q5 | ~32 s | 1.9 |
| RTX 4060 8 GB | GGUF Q5 | ~25 s | 2.4 |
| RTX 5060 Ti 16 GB | GGUF Q5 | ~17 s | 3.5 |
| RTX 5080 16 GB | FP8 | ~9 s | 6.7 |
| RTX 4090 24 GB | FP16 | ~8 s | 7.5 |
| RTX 5090 32 GB | FP8 | ~6 s | 10.0 |
| RTX 6000 Pro | FP16 | ~6 s | 10.0 |
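The "Images per minute" column is just 60 divided by the time per image, and the same timings give the schnell-over-dev speedup for each card. The sketch below recomputes both from the two tables above. Some rows use different precisions in the two tables (for example the RTX 5060 Ti is FP8 for schnell and GGUF Q5 for dev), so the ratio is indicative rather than a like-for-like comparison.

```python
# Seconds per 1024x1024 image, copied from the schnell and dev tables above.
SCHNELL_S = {
    "RTX 3050": 22, "RTX 3060": 14, "RTX 4060": 10, "RTX 5060 Ti": 6,
    "RTX 5080": 3, "RTX 4090": 5, "RTX 5090": 1.6, "RTX 6000 Pro": 1.6,
}
DEV_S = {
    "RTX 3050": 80, "RTX 3060": 32, "RTX 4060": 25, "RTX 5060 Ti": 17,
    "RTX 5080": 9, "RTX 4090": 8, "RTX 5090": 6, "RTX 6000 Pro": 6,
}


def images_per_minute(seconds_per_image: float) -> float:
    """Convert a batch-1 latency into sustained images per minute."""
    return 60.0 / seconds_per_image


def schnell_speedup(gpu: str) -> float:
    """How many times faster schnell is than dev on the same card."""
    return DEV_S[gpu] / SCHNELL_S[gpu]


for gpu in DEV_S:
    print(f"{gpu:<13} dev {images_per_minute(DEV_S[gpu]):5.2f}/min  "
          f"schnell {images_per_minute(SCHNELL_S[gpu]):5.2f}/min  "
          f"speedup {schnell_speedup(gpu):.1f}x")
```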
Verdict
- Cheapest credible: RTX 5060 Ti 16 GB at £119/mo, ~3.5 FLUX.1 dev images/min.
- Best per-pound: RTX 5090 at £399/mo, ~10 FLUX.1 dev images/min.
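- Highest absolute throughput: RTX 5090 or RTX 6000 Pro (essentially tied at ~10 FLUX.1 dev images/min).
- Fastest single image: the same two cards at ~1.6 s on FLUX.1 schnell; the RTX 5090 gets there on the Blackwell FP8 path, the RTX 6000 Pro at FP16.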
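To turn a monthly price and a throughput figure into a cost per image, a rough sketch follows. It uses only the two prices quoted above and the throughput figures from the tables; the 30-day month and the `utilisation` parameter are assumptions added for illustration, and a card that sits idle part of the time costs proportionally more per image.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: 30-day month


def cost_per_1000_images(price_per_month: float, images_per_minute: float,
                         utilisation: float = 1.0) -> float:
    """Pounds per 1000 images for a card busy `utilisation` of the time."""
    images_per_month = images_per_minute * MINUTES_PER_MONTH * utilisation
    return 1000.0 * price_per_month / images_per_month


# Prices from the verdict list; throughput from the dev and schnell tables.
CARDS = {
    "RTX 5060 Ti 16 GB": {"price": 119, "dev": 3.5, "schnell": 10.0},
    "RTX 5090 32 GB": {"price": 399, "dev": 10.0, "schnell": 37.5},
}

for name, card in CARDS.items():
    for model in ("dev", "schnell"):
        cost = cost_per_1000_images(card["price"], card[model])
        print(f"{name:<18} FLUX.1 {model:<8} £{cost:.3f} per 1000 images")
```

Lower is better. The ranking can differ between dev and schnell, so rerun it with the model, prices, and utilisation you expect.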
Bottom line
For a FLUX.1 image API the RTX 5090 is the price/performance leader. The 5060 Ti is the budget pick. See "Best GPU for FLUX" and "Can RTX 5090 run FLUX?" for the deployment context.
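When sizing an API from these figures, a first estimate of card count is the peak request rate divided by per-card throughput, with some headroom so queues do not build up. The sketch below is a back-of-the-envelope helper; the `gpus_needed` name and the 0.7 headroom default are assumptions chosen for illustration, not measured values.

```python
import math


def gpus_needed(target_images_per_minute: float,
                images_per_minute_per_gpu: float,
                headroom: float = 0.7) -> int:
    """Cards needed to serve a peak load while keeping each card at or
    below `headroom` of its benchmarked batch-1 throughput."""
    usable = images_per_minute_per_gpu * headroom
    return math.ceil(target_images_per_minute / usable)


# Example: a peak of 25 FLUX.1 dev images per minute.
print(gpus_needed(25, 10.0))  # RTX 5090 at ~10 images/min
print(gpus_needed(25, 3.5))   # RTX 5060 Ti at ~3.5 images/min
```

Card count only covers throughput. Per-request latency stays at the per-image times in the tables, however many cards are added.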