Yes, the RTX 5090 runs Flux.1 in full FP16 precision. With 32GB of GDDR7 VRAM, it is one of the few single-GPU options that can load Flux.1 Dev at FP16 without any quantisation. That gives you the highest image quality Flux.1 can produce, with fast generation times.
The Short Answer
YES. Flux.1 Dev FP16 needs ~26GB peak at 1024×1024. The RTX 5090’s 32GB handles this with 6GB to spare.
Flux.1 is a 12B-parameter diffusion transformer. In FP16, each parameter takes 2 bytes, so the model weights consume approximately 24GB. During 1024×1024 generation with 20 steps, peak VRAM usage including latent tensors, attention maps, and the text encoders (T5-XXL + CLIP-L) reaches roughly 26GB. The RTX 5090 fits this comfortably. For comparison, the RTX 3090’s 24GB cannot fit Flux.1 FP16 reliably, which makes the 5090 the entry point for full-precision Flux among the cards compared here. See our image model VRAM guide for comparisons with other models.
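The 24GB weight figure follows directly from parameter count times bytes per parameter. The sketch below reproduces that arithmetic; the function and constant names are illustrative, and it uses decimal gigabytes (1 GB = 10^9 bytes) to match the figures in this article. It covers the weights only, not generation overhead.

```python
# Back-of-the-envelope weight footprint: parameters x bytes per parameter.
# Decimal gigabytes (1 GB = 1e9 bytes) are used to match the article's figures.

BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}  # storage width of one weight

FLUX_PARAMS = 12e9  # 12B parameters, as stated above


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights alone, in decimal GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("fp16", "fp8"):
        size = weight_footprint_gb(FLUX_PARAMS, precision)
        print(f"{precision}: ~{size:.0f} GB of weights")
```

The same calculation shows why FP8 halves the weight footprint, which is what lets smaller cards run the quantised model.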
VRAM Analysis
| Configuration | Model VRAM | Generation Overhead | Total (1024×1024) | RTX 5090 (32GB) |
|---|---|---|---|---|
| Flux.1 Dev FP16 | ~24GB | ~2GB | ~26GB | Fits |
| Flux.1 Dev FP16 + ControlNet | ~24GB | ~4.5GB | ~28.5GB | Fits |
| Flux.1 Dev FP16 (1536×1536) | ~24GB | ~4GB | ~28GB | Fits |
| Flux.1 Dev FP16 (2048×2048) | ~24GB | ~7GB | ~31GB | Tight |
| Flux.1 Schnell FP16 | ~24GB | ~1.5GB | ~25.5GB | Fits well |
At 1024×1024, you have about 6GB of headroom for ControlNet, IP-Adapter, or other add-ons. At 1536×1536, roughly 4GB remains, which is still comfortable. Only at 2048×2048 does it get tight, with about 1GB to spare. Batch size is limited to 1 at FP16 due to the model’s large footprint.
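The headroom figures can be checked by subtracting each configuration's total from the card's 32GB. The sketch below does that with the totals from the table above. The 2GB cut-off for calling a fit "Tight" is an assumption chosen for illustration, not a threshold stated in this article.

```python
GPU_VRAM_GB = 32.0  # RTX 5090

# Total VRAM per configuration in GB, copied from the table above.
TOTALS_GB = {
    "Flux.1 Dev FP16": 26.0,
    "Flux.1 Dev FP16 + ControlNet": 28.5,
    "Flux.1 Dev FP16 (1536x1536)": 28.0,
    "Flux.1 Dev FP16 (2048x2048)": 31.0,
    "Flux.1 Schnell FP16": 25.5,
}

TIGHT_BELOW_GB = 2.0  # assumed cut-off for "Tight"; not from the article


def headroom_gb(total_gb: float, vram_gb: float = GPU_VRAM_GB) -> float:
    """VRAM left over after the workload's peak usage."""
    return vram_gb - total_gb


def verdict(total_gb: float, vram_gb: float = GPU_VRAM_GB) -> str:
    """Classify the fit based on remaining headroom."""
    spare = headroom_gb(total_gb, vram_gb)
    if spare < 0:
        return "Does not fit"
    return "Tight" if spare < TIGHT_BELOW_GB else "Fits"


if __name__ == "__main__":
    for name, total in TOTALS_GB.items():
        print(f"{name}: {headroom_gb(total):.1f} GB spare, {verdict(total)}")
```

Passing a different `vram_gb` value gives the same check for another card, for example 24.0 for an RTX 3090.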
Performance Benchmarks
| GPU | Flux.1 Dev FP16 1024×1024 (20 steps) | Schnell FP16 1024×1024 (4 steps) |
|---|---|---|
| RTX 3090 (24GB) | OOM / borderline | OOM / borderline |
| RTX 5080 (16GB) | N/A (insufficient VRAM) | N/A (insufficient VRAM) |
| RTX 5090 (32GB) | ~20s | ~4.5s |
The RTX 5090 generates a full-quality Flux.1 Dev FP16 image in about 20 seconds at 1024×1024 with 20 steps. Schnell at 4 steps takes roughly 4.5 seconds. These are true FP16 results with no quality loss from quantisation. For FP8 benchmarks on more GPUs, see our Flux.1 images/sec benchmark.
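Dividing each benchmark time by its step count shows where Schnell's speed comes from. The sketch below uses only the two RTX 5090 timings from the table; the helper names are illustrative, and the hourly figure assumes back-to-back single-image generation with no model loading or idle time.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Average wall-clock time spent on each sampling step."""
    return total_seconds / steps


def images_per_hour(seconds_per_image: float) -> float:
    """Sustained throughput if images are generated back to back."""
    return 3600 / seconds_per_image


# (time in seconds, step count) for the RTX 5090, from the table above.
BENCHMARKS = {
    "Flux.1 Dev FP16": (20.0, 20),
    "Flux.1 Schnell FP16": (4.5, 4),
}

if __name__ == "__main__":
    for name, (seconds, steps) in BENCHMARKS.items():
        print(
            f"{name}: {seconds_per_step(seconds, steps):.3f} s/step, "
            f"{images_per_hour(seconds):.0f} images/hour"
        )
```

On these figures the per-step cost is close for both models (about 1.0s versus about 1.1s), so Schnell's advantage comes from needing 4 steps rather than 20.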
Setup Guide
ComfyUI is the recommended interface for Flux.1 FP16:
```bash
# Clone and set up ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

# Launch without --lowvram (not needed on 32GB)
python main.py --listen 0.0.0.0 --port 8188
```
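Download the Flux.1 Dev FP16 checkpoint (flux1-dev.safetensors) and the T5-XXL and CLIP-L text encoder files. Place them in the matching models/ subdirectories: in recent ComfyUI releases that is typically models/diffusion_models/ for the checkpoint (models/unet/ in older releases) and models/text_encoders/ for the encoders (models/clip/ in older releases). Do NOT use the --lowvram flag, as the RTX 5090 has enough VRAM to keep everything resident.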
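A quick way to confirm the files landed in the right place before launching is to check for them from a script. This is a minimal sketch under stated assumptions: only flux1-dev.safetensors is named in this guide, while the folder names and the two encoder file names below are assumptions about a typical recent ComfyUI layout, so adjust them to match your install.

```python
from pathlib import Path

# Paths relative to the ComfyUI checkout. Only the checkpoint file name comes
# from this guide; the folders and encoder file names are assumptions.
EXPECTED_FILES = [
    "models/diffusion_models/flux1-dev.safetensors",
    "models/text_encoders/t5xxl_fp16.safetensors",  # assumed file name
    "models/text_encoders/clip_l.safetensors",      # assumed file name
]


def missing_files(comfy_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("ComfyUI")
    if missing:
        for rel in missing:
            print(f"missing: {rel}")
    else:
        print("All expected model files are in place.")
```

Run it from the directory that contains the ComfyUI checkout, or pass a different root path to `missing_files`.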
For the best-quality workflow, use the full Dev model with 20-30 steps and the Euler scheduler. The FP16 model generally produces sharper details and better text rendering than the FP8 or NF4 quantised versions.
Recommended Alternative
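If 32GB is not enough (for example, for batch generation or 2048×2048 work), a multi-GPU setup with two RTX 3090 cards provides 48GB in total. That memory is split across two 24GB devices rather than pooled, so the workflow has to place model components on separate cards. For Flux.1 on a budget, the RTX 5080 runs Flux.1 in FP8 with good quality at a lower price point.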
For SDXL on the 5090, check our SD 1.5 vs SDXL speed comparison. For LLM workloads, see the LLaMA 3 70B INT4 guide or multi-LLM guide. Browse all GPU options in the GPU Comparisons category or on our dedicated GPU hosting page.