VRAM Check: Flux.1 on RTX 3090
Flux.1 is a leading family of open-weight text-to-image models from Black Forest Labs, and the RTX 3090, with 24 GB of GDDR6X, is one of the best single GPUs for running it at full FP16 precision on a dedicated GPU server. Here is how each Flux.1 variant fits:
| Variant | Precision | Model VRAM | Peak During Generation | Fits RTX 3090? |
|---|---|---|---|---|
| Flux.1 Dev | FP16 | ~24 GB | ~18-20 GB | Yes (with CPU offload) |
| Flux.1 Schnell | FP16 | ~24 GB | ~18-20 GB | Yes (with CPU offload) |
| Flux.1 Dev | FP8 | ~12 GB | ~13-15 GB | Yes (9 GB spare) |
| Flux.1 Dev | NF4 | ~6 GB | ~8-10 GB | Yes (14 GB spare) |
At FP16 the RTX 3090 runs Flux.1 at full quality by offloading the text encoder after the prompt encoding step, keeping peak VRAM around 18-20 GB. With FP8 quantisation you free enough memory for ControlNet or LoRA extensions. For full VRAM sizing, read our Flux.1 VRAM requirements guide.
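The headroom figures in the table follow from simple arithmetic against the card's 24 GB. A quick sanity-check sketch, using the peak upper bounds from the table above:

```python
# Spare VRAM on a 24 GB RTX 3090 for each variant's peak usage
# (peak figures are the upper bounds from the table above)
CARD_VRAM_GB = 24

def headroom_gb(peak_gb: float, card_gb: float = CARD_VRAM_GB) -> float:
    """Spare VRAM in GB; negative means the configuration does not fit."""
    return card_gb - peak_gb

for name, peak in [("Dev FP16", 20), ("Dev FP8", 15), ("Dev NF4", 10)]:
    print(f"{name}: {headroom_gb(peak):g} GB spare")
```

FP8's 9 GB and NF4's 14 GB of spare memory are what the table counts as room for ControlNet and LoRA extensions.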
Setup with Diffusers
```shell
# Install dependencies
pip install diffusers transformers accelerate torch sentencepiece
```

```python
# Generate with Flux.1 Dev
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A futuristic GPU data centre in London, cyberpunk neon glow",
    num_inference_steps=28,
    guidance_scale=3.5,
    height=1024,
    width=1024
).images[0]
image.save("flux_output.png")
```
Using enable_model_cpu_offload() keeps peak VRAM under 20 GB by moving components to CPU when not in use. For a comparison of Flux versus SDXL, see our Run SDXL on RTX 3090 guide.
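Dev and Schnell take different sampler settings: Schnell is step-distilled to run in 4 steps with classifier-free guidance disabled, while Dev uses the full 28 steps with moderate guidance. A small helper (the function name is my own, collecting the values used in this guide) makes switching between them explicit:

```python
def flux_call_kwargs(variant: str) -> dict:
    """Sampler settings per Flux.1 variant, as used in this guide."""
    if variant == "schnell":
        # Schnell is step-distilled: 4 steps, guidance disabled
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    if variant == "dev":
        # Dev: full-quality 28-step sampling with moderate guidance
        return {"num_inference_steps": 28, "guidance_scale": 3.5}
    raise ValueError(f"unknown variant: {variant}")

# Usage with the pipeline from the setup above, e.g.:
#   image = pipe(prompt="...", height=1024, width=1024,
#                **flux_call_kwargs("schnell")).images[0]
```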
Setup with ComfyUI
```shell
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt

# Download Flux.1 Dev checkpoint (gated repo: accept the licence on Hugging Face first)
wget -P models/unet/ \
  https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors

# Download required text encoders (CLIP-L + T5-XXL)
wget -P models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget -P models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors

# Download the Flux VAE (needed by the default Flux workflow)
wget -P models/vae/ \
  https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors

# Launch ComfyUI
python main.py --listen 0.0.0.0 --port 8188
```
ComfyUI is the most popular workflow tool for Flux, supporting ControlNet, IP-Adapter, and LoRA nodes in a visual pipeline editor.
RTX 3090 Generation Benchmarks
Tested at 1024×1024 using diffusers with enable_model_cpu_offload(). See the benchmark tool for more data.
| Configuration | Steps | Time per Image | Images per Minute | Peak VRAM |
|---|---|---|---|---|
| Dev FP16, 1024×1024 | 28 | 12.4s | ~4.8 | 19.2 GB |
| Schnell FP16, 1024×1024 | 4 | 2.8s | ~21 | 19.0 GB |
| Dev FP8, 1024×1024 | 28 | 14.1s | ~4.2 | 14.6 GB |
| Dev FP16, 1280×1280 | 28 | 19.7s | ~3.0 | 23.1 GB |
Schnell at 4 steps delivers over 21 images per minute, making it suitable for real-time previewing workflows. Dev at full 28 steps produces higher-quality output at nearly 5 images per minute.
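The throughput column is derived directly from the per-image times in the table:

```python
def images_per_minute(seconds_per_image: float) -> float:
    """Convert a per-image generation time into throughput."""
    return 60.0 / seconds_per_image

# Times from the benchmark table above
print(round(images_per_minute(12.4), 1))  # Dev FP16 at 28 steps -> 4.8
print(round(images_per_minute(2.8), 1))   # Schnell at 4 steps   -> 21.4
```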
Optimisation Tips
- Use Schnell for previews and Dev for final renders. Schnell uses only 4 steps versus 28 for Dev, delivering 4x faster iterations.
- Enable torch.compile() on the transformer for a 15-25% speedup on repeated generations with PyTorch 2.x.
- FP8 quantisation loses minimal quality while halving model VRAM, freeing room for ControlNet and LoRA adapters.
- Use VAE tiling for resolutions above 1280×1280 to prevent OOM errors.
- Batch with Schnell at FP8 to generate 2 images simultaneously on 24 GB, doubling throughput.
Use the cost calculator to estimate per-image costs. Browse more deployment guides in the model guides section.
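As a rough stand-in for the calculator, per-image cost is just the hourly server price multiplied by generation time (the £0.45/hr figure below is a placeholder assumption, not a quoted price):

```python
def cost_per_image(seconds_per_image: float, price_per_hour: float) -> float:
    """Per-image cost given a generation time and an hourly GPU price."""
    return price_per_hour * seconds_per_image / 3600.0

# Placeholder hourly price; Dev FP16 time from the benchmark table
print(f"£{cost_per_image(12.4, 0.45):.4f} per image")
```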
Next Steps
The RTX 3090 is one of the strongest single-GPU choices for Flux.1 at full quality. For complex multi-model ComfyUI pipelines, the RTX 5090 with 32 GB gives additional headroom. Compare generation costs across GPUs with the GPU comparisons tool. For the best budget alternative, see our Stable Diffusion VRAM guide.
Deploy Flux.1 Now
Generate stunning images with Flux.1 on a dedicated RTX 3090 server. Full root access, no generation limits, and UK data centre hosting.
Browse GPU Servers