VRAM Check: Flux.1 on RTX 3090
Flux.1 is a leading family of open-weight text-to-image models from Black Forest Labs, and the RTX 3090, with 24 GB of GDDR6X, is one of the best single GPUs for running it at full FP16 precision on a dedicated GPU server. Here is how each Flux.1 variant fits:
| Variant | Precision | Model VRAM | Peak During Generation | Fits RTX 3090? |
|---|---|---|---|---|
| Flux.1 Dev | FP16 | ~24 GB | ~18-20 GB | Yes (with CPU offload) |
| Flux.1 Schnell | FP16 | ~24 GB | ~18-20 GB | Yes (with CPU offload) |
| Flux.1 Dev | FP8 | ~12 GB | ~13-15 GB | Yes (9 GB spare) |
| Flux.1 Dev | NF4 | ~6 GB | ~8-10 GB | Yes (14 GB spare) |
At FP16 the RTX 3090 runs Flux.1 at full quality by offloading the text encoder after the prompt encoding step, keeping peak VRAM around 18-20 GB. With FP8 quantisation you free enough memory for ControlNet or LoRA extensions. For full VRAM sizing, read our Flux.1 VRAM requirements guide.
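The headroom figures in the table follow from simple arithmetic against the card's 24 GB. A quick sanity-check sketch, using the peak upper bounds from the table above:

```python
# Spare VRAM on a 24 GB RTX 3090 for each variant's peak usage
# (peak figures are the upper bounds from the table above)
CARD_VRAM_GB = 24

def headroom_gb(peak_gb: float, card_gb: float = CARD_VRAM_GB) -> float:
    """Spare VRAM in GB; negative means the configuration does not fit."""
    return card_gb - peak_gb

for name, peak in [("Dev FP16", 20), ("Dev FP8", 15), ("Dev NF4", 10)]:
    print(f"{name}: {headroom_gb(peak):g} GB spare")
```

FP8's 9 GB and NF4's 14 GB of spare memory are what the table counts as room for ControlNet and LoRA extensions.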
Setup with Diffusers
```shell
# Install dependencies
pip install diffusers transformers accelerate torch sentencepiece
```

```python
# Generate with Flux.1 Dev
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A futuristic GPU data centre in London, cyberpunk neon glow",
    num_inference_steps=28,
    guidance_scale=3.5,
    height=1024,
    width=1024
).images[0]
image.save("flux_output.png")
```
Using enable_model_cpu_offload() keeps peak VRAM under 20 GB by moving components to CPU when not in use. For a comparison of Flux versus SDXL, see our Run SDXL on RTX 3090 guide.
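Dev and Schnell take different sampler settings: Schnell is step-distilled to run in 4 steps with classifier-free guidance disabled, while Dev uses the full 28 steps with moderate guidance. A small helper (the function name is my own, collecting the values used in this guide) makes switching between them explicit:

```python
def flux_call_kwargs(variant: str) -> dict:
    """Sampler settings per Flux.1 variant, as used in this guide."""
    if variant == "schnell":
        # Schnell is step-distilled: 4 steps, guidance disabled
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    if variant == "dev":
        # Dev: full-quality 28-step sampling with moderate guidance
        return {"num_inference_steps": 28, "guidance_scale": 3.5}
    raise ValueError(f"unknown variant: {variant}")

# Usage with the pipeline from the setup above, e.g.:
#   image = pipe(prompt="...", height=1024, width=1024,
#                **flux_call_kwargs("schnell")).images[0]
```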
Setup with ComfyUI
```shell
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt

# Download Flux.1 Dev checkpoint (gated repo: accept the licence on Hugging Face first)
wget -P models/unet/ \
  https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors

# Download required text encoders (CLIP-L + T5-XXL)
wget -P models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget -P models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors

# Download the Flux VAE (needed by the default Flux workflow)
wget -P models/vae/ \
  https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors

# Launch ComfyUI
python main.py --listen 0.0.0.0 --port 8188
```
ComfyUI is the most popular workflow tool for Flux, supporting ControlNet, IP-Adapter, and LoRA nodes in a visual pipeline editor.
RTX 3090 Generation Benchmarks
Tested at 1024×1024 using diffusers with enable_model_cpu_offload(). See the benchmark tool for more data.
| Configuration | Steps | Time per Image | Images per Minute | Peak VRAM |
|---|---|---|---|---|
| Dev FP16, 1024×1024 | 28 | 12.4s | ~4.8 | 19.2 GB |
| Schnell FP16, 1024×1024 | 4 | 2.8s | ~21 | 19.0 GB |
| Dev FP8, 1024×1024 | 28 | 14.1s | ~4.2 | 14.6 GB |
| Dev FP16, 1280×1280 | 28 | 19.7s | ~3.0 | 23.1 GB |
Schnell at 4 steps delivers over 21 images per minute, making it suitable for real-time previewing workflows. Dev at full 28 steps produces higher-quality output at nearly 5 images per minute.
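The throughput column is derived directly from the per-image times in the table:

```python
def images_per_minute(seconds_per_image: float) -> float:
    """Convert a per-image generation time into throughput."""
    return 60.0 / seconds_per_image

# Times from the benchmark table above
print(round(images_per_minute(12.4), 1))  # Dev FP16 at 28 steps -> 4.8
print(round(images_per_minute(2.8), 1))   # Schnell at 4 steps   -> 21.4
```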
Optimisation Tips
- Use Schnell for previews and Dev for final renders. Schnell uses only 4 steps versus 28 for Dev, delivering 4x faster iterations.
- Enable torch.compile() on the transformer for a 15-25% speedup on repeated generations with PyTorch 2.x.
- FP8 quantisation loses minimal quality while halving model VRAM, freeing room for ControlNet and LoRA adapters.
- Use VAE tiling for resolutions above 1280×1280 to prevent OOM errors.
- Batch with Schnell at FP8 to generate 2 images simultaneously on 24 GB, doubling throughput.
Use the cost calculator to estimate per-image costs. Browse more deployment guides in the model guides section.
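As a rough stand-in for the calculator, per-image cost is just the hourly server price multiplied by generation time (the £0.45/hr figure below is a placeholder assumption, not a quoted price):

```python
def cost_per_image(seconds_per_image: float, price_per_hour: float) -> float:
    """Per-image cost given a generation time and an hourly GPU price."""
    return price_per_hour * seconds_per_image / 3600.0

# Placeholder hourly price; Dev FP16 time from the benchmark table
print(f"£{cost_per_image(12.4, 0.45):.4f} per image")
```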
Next Steps
The RTX 3090 is one of the strongest single-GPU choices for Flux.1 at full quality. For complex multi-model ComfyUI pipelines, the RTX 5090 with 32 GB gives additional headroom. Compare generation costs across GPUs with the GPU comparisons tool. For the best budget alternative, see our Stable Diffusion VRAM guide.
Deploy Flux.1 Now
Generate stunning images with Flux.1 on a dedicated RTX 3090 server. Full root access, no generation limits, and UK data centre hosting.
Browse GPU Servers