FLUX.1 Schnell is the fastest of the top-tier diffusion models in 2026: Apache-2.0 licensed, four inference steps per image, and output quality that rivals or beats SDXL. Every tier of our dedicated GPU hosting runs it; below is measured throughput per card.
VRAM
FLUX.1 Schnell needs roughly 22 GB of VRAM at FP16, ~12 GB at FP8, and ~7 GB quantized to 4-bit with bitsandbytes.
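These figures line up with simple arithmetic on Schnell's roughly 12B-parameter transformer (the parameter count is an approximation, and activations plus the T5 text encoder add overhead, so treat the result as a floor for the weights alone):

```python
def model_weight_gb(n_params: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N = 12e9  # approximate FLUX transformer parameter count (assumption)
for name, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
    print(f"{name}: ~{model_weight_gb(N, bits):.1f} GiB of weights")
```

FP16 comes out to ~22.4 GiB of weights, which matches the ~22 GB figure once you remember the runtime total also includes text encoders and activations.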
Setup
```python
from diffusers import FluxPipeline
import torch

# Schnell is distilled for few-step sampling; bfloat16 is the recommended dtype.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "A detailed urban sketch, high contrast ink on paper",
    num_inference_steps=4,    # Schnell is trained for 4 steps
    guidance_scale=0.0,       # guidance is distilled away; keep at 0
    max_sequence_length=256,  # shorter T5 context saves memory
).images[0]
image.save("sketch.png")
```
Benchmarks
1024×1024, 4 steps, time per image:
| GPU | Precision | Time |
|---|---|---|
| 4060 Ti 16GB | FP8 | ~2.8s |
| 3090 24GB | BF16 | ~1.6s |
| 5080 16GB | FP8 | ~1.1s |
| 5090 32GB | BF16 | ~0.7s |
| 6000 Pro 96GB | BF16 | ~0.6s |
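To turn the per-image times above into capacity-planning numbers, a small helper is enough (the hourly rates below are illustrative placeholders, not our actual pricing):

```python
def images_per_hour(seconds_per_image: float) -> int:
    """Sustained throughput, ignoring batching and model load time."""
    return int(3600 / seconds_per_image)

def cost_per_image(seconds_per_image: float, hourly_rate_usd: float) -> float:
    return hourly_rate_usd / images_per_hour(seconds_per_image)

# Times from the benchmark table; hourly rates are assumptions.
for gpu, t, rate in [("3090", 1.6, 0.30), ("5090", 0.7, 0.80)]:
    print(f"{gpu}: {images_per_hour(t)} img/h, ${cost_per_image(t, rate):.5f}/image")
```

At ~1.6 s/image a 3090 sustains over 2,000 images per hour from a single pipeline, before any batching.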
Schnell vs Dev
FLUX.1 Dev is a 28-step variant with higher quality and a non-commercial license. For commercial use, stick with Schnell; for portfolio or other non-commercial work, Dev can be worth the extra steps. In our comparisons on commercial workflows, Schnell typically lands within 5-10% of Dev's quality on most prompts while generating images 7-8x faster.
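The speed ratio follows almost directly from the step counts, assuming per-step cost is comparable between the two models (Dev's true classifier-free guidance adds some overhead on top, which is where the "8x" end of the range comes from):

```python
dev_steps, schnell_steps = 28, 4
ratio = dev_steps / schnell_steps
print(f"Dev runs {ratio:.0f}x the denoising steps of Schnell")
```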