
Run Flux.1 on RTX 3090 (Image Generation Guide)

Complete guide to running Flux.1 Dev and Schnell on an RTX 3090. Covers VRAM requirements, ComfyUI and diffusers setup, generation benchmarks, and optimisation tips.

VRAM Check: Flux.1 on RTX 3090

Flux.1 is the leading open-weight text-to-image model family from Black Forest Labs, and the RTX 3090 with 24 GB of GDDR6X is one of the best GPUs for running it at full 16-bit precision on a dedicated GPU server. Here is how each Flux.1 variant fits:

| Variant | Precision | Model VRAM | Peak During Generation | Fits RTX 3090? |
|---|---|---|---|---|
| Flux.1 Dev | FP16 | ~24 GB | ~18-20 GB | Yes |
| Flux.1 Schnell | FP16 | ~24 GB | ~18-20 GB | Yes |
| Flux.1 Dev | FP8 | ~12 GB | ~13-15 GB | Yes (9 GB spare) |
| Flux.1 Dev | NF4 | ~6 GB | ~8-10 GB | Yes (14 GB spare) |

At FP16 the RTX 3090 runs Flux.1 at full quality by offloading the text encoder after the prompt encoding step, keeping peak VRAM around 18-20 GB. With FP8 quantisation you free enough memory for ControlNet or LoRA extensions. For full VRAM sizing, read our Flux.1 VRAM requirements guide.
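As a rough sanity check, the model-VRAM figures in the table follow from multiplying Flux.1's roughly 12 billion transformer parameters by the byte width of each format. A minimal sketch (the ~12 B parameter count and the per-format byte widths are the only assumptions; activations, text encoders, and the VAE come on top):

```python
# Rough weight-memory estimate for Flux.1's ~12B-parameter transformer.
PARAMS = 12e9  # assumed parameter count

BYTES_PER_PARAM = {
    "FP16": 2.0,  # 16-bit floats
    "FP8": 1.0,   # 8-bit floats
    "NF4": 0.5,   # 4-bit NormalFloat
}

def model_vram_gb(precision: str, params: float = PARAMS) -> float:
    """Weight memory only; activations and text encoders add to this."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{model_vram_gb(precision):.0f} GB")
```

This reproduces the ~24 GB / ~12 GB / ~6 GB model-VRAM column above; the peak-during-generation column is higher or lower depending on what is offloaded at each step.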

Setup with Diffusers

# Install dependencies
pip install diffusers transformers accelerate torch sentencepiece protobuf

# Generate with Flux.1 Dev
# Note: FLUX.1-dev is a gated repo -- accept the licence on Hugging Face
# and run `huggingface-cli login` first.
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16  # the dtype the model card recommends
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM under ~20 GB

image = pipe(
    prompt="A futuristic GPU data centre in London, cyberpunk neon glow",
    num_inference_steps=28,
    guidance_scale=3.5,
    height=1024,
    width=1024
).images[0]
image.save("flux_output.png")

Using enable_model_cpu_offload() keeps peak VRAM under 20 GB by moving components to CPU when not in use. For a comparison of Flux versus SDXL, see our Run SDXL on RTX 3090 guide.

Setup with ComfyUI

# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt

# Download Flux.1 Dev checkpoint (gated repo -- requires an accepted licence
# and an HF token, e.g. --header="Authorization: Bearer $HF_TOKEN")
wget -P models/unet/ \
  https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors

# Download required text encoders (CLIP-L + T5-XXL)
wget -P models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget -P models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors

# Download the Flux VAE
wget -P models/vae/ \
  https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors

# Launch ComfyUI
python main.py --listen 0.0.0.0 --port 8188

ComfyUI is the most popular workflow tool for Flux, supporting ControlNet, IP-Adapter, and LoRA nodes in a visual pipeline editor.
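Beyond the visual editor, the running server also accepts workflows over HTTP: a graph exported from the UI in API format can be queued with a POST to the `/prompt` endpoint. A minimal sketch using only the standard library (the host and port match the launch command above; the workflow filename and `client_id` are placeholders):

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # matches the --port 8188 launch flag

def build_payload(workflow: dict, client_id: str = "flux-guide") -> bytes:
    """Wrap an API-format workflow graph for ComfyUI's POST /prompt endpoint."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_workflow(workflow: dict) -> dict:
    """Queue a workflow on a running ComfyUI server and return its response."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (with a running server and a workflow exported via "Save (API Format)"):
#   workflow = json.load(open("flux_workflow_api.json"))
#   queue_workflow(workflow)
```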

RTX 3090 Generation Benchmarks

Tested at 1024×1024 using diffusers with enable_model_cpu_offload(). See the benchmark tool for more data.

| Configuration | Steps | Time per Image | Images per Minute | Peak VRAM |
|---|---|---|---|---|
| Dev FP16, 1024×1024 | 28 | 12.4 s | ~4.8 | 19.2 GB |
| Schnell FP16, 1024×1024 | 4 | 2.8 s | ~21 | 19.0 GB |
| Dev FP8, 1024×1024 | 28 | 14.1 s | ~4.2 | 14.6 GB |
| Dev FP16, 1280×1280 | 28 | 19.7 s | ~3.0 | 23.1 GB |

Schnell at 4 steps delivers over 21 images per minute, making it suitable for real-time previewing workflows. Dev at full 28 steps produces higher-quality output at nearly 5 images per minute.
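The images-per-minute column is simply 60 divided by the per-image time, which makes it easy to project throughput from your own timings:

```python
def images_per_minute(seconds_per_image: float) -> float:
    """Convert a measured per-image generation time into throughput."""
    return 60.0 / seconds_per_image

# Per-image times from the benchmark table above
for config, secs in {
    "Dev FP16 @ 28 steps": 12.4,
    "Schnell FP16 @ 4 steps": 2.8,
    "Dev FP8 @ 28 steps": 14.1,
}.items():
    print(f"{config}: ~{images_per_minute(secs):.1f} img/min")
```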

Optimisation Tips

  • Use Schnell for previews and Dev for final renders. Schnell uses only 4 steps versus 28 for Dev, delivering 4x faster iterations.
  • Enable torch.compile() on the transformer for a 15-25% speedup on repeated generations with PyTorch 2.x.
  • FP8 quantisation loses minimal quality while halving model VRAM, freeing room for ControlNet and LoRA adapters.
  • Use VAE tiling for resolutions above 1280×1280 to prevent OOM errors.
  • Batch with Schnell at FP8 to generate 2 images simultaneously on 24 GB, doubling throughput.

Use the cost calculator to estimate per-image costs. Browse more deployment guides in the model guides section.
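Per-image cost follows directly from the hourly server price and the measured throughput. A sketch with a hypothetical £0.50/hour rate (substitute your actual plan price):

```python
def cost_per_image(hourly_rate: float, images_per_minute: float) -> float:
    """Cost of one image given an hourly server price and throughput."""
    return hourly_rate / (images_per_minute * 60)

# Hypothetical £0.50/hour server, using the benchmark throughputs above
print(f"Dev FP16: £{cost_per_image(0.50, 4.8):.4f}/image")
print(f"Schnell:  £{cost_per_image(0.50, 21):.4f}/image")
```

At these throughputs even full-quality Dev renders cost a fraction of a penny per image, which is why per-image pricing is dominated by utilisation rather than model choice.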

Next Steps

The RTX 3090 is one of the strongest single-GPU choices for Flux.1 at full quality. For complex multi-model ComfyUI pipelines, the RTX 5090 with 32 GB gives additional headroom. Compare generation costs across GPUs with the GPU comparisons tool. For the best budget alternative, see our Stable Diffusion VRAM guide.

Deploy Flux.1 Now

Generate stunning images with Flux.1 on a dedicated RTX 3090 server. Full root access, no generation limits, and UK data centre hosting.

Browse GPU Servers
