Flux.1 Model Family Overview
Flux.1 is Black Forest Labs’ state-of-the-art text-to-image model, a significant step up from Stable Diffusion in quality and prompt adherence. The model family includes three variants: Dev (full quality, 20+ steps), Schnell (distilled, 1-4 steps), and Pro (API-only, highest quality). For self-hosting on a dedicated GPU server, Dev and Schnell are the practical options, since Pro weights are not distributed.
With approximately 12 billion parameters in the diffusion transformer, Flux.1 demands considerably more VRAM than SDXL. Understanding the exact requirements for each variant prevents out-of-memory errors and helps you choose the right GPU.
VRAM Requirements by Variant
| Variant | Parameters | Weight Size | Total VRAM (1024×1024) | Steps |
|---|---|---|---|---|
| Flux.1 Dev | ~12B | ~24 GB | ~18-20 GB | 20-50 |
| Flux.1 Schnell | ~12B | ~24 GB | ~18-20 GB | 1-4 |
| Flux.1 Dev (FP8) | ~12B | ~12 GB | ~13-15 GB | 20-50 |
| Flux.1 Dev (NF4) | ~12B | ~6 GB | ~8-10 GB | 20-50 |
Note that total VRAM during generation is often less than the raw model weight size because the VAE, text encoders, and diffusion model do not all need to be in memory simultaneously. Modern inference pipelines use model offloading to manage this, but the peak usage during the diffusion process still requires substantial VRAM.
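As a concrete example, here is a minimal diffusers sketch that loads Flux.1 Dev in bfloat16 and enables model offloading so the text encoders, transformer, and VAE are not all resident at once. The model IDs are the official Hugging Face repos; peak VRAM will vary with your diffusers and PyTorch versions.

```python
import torch
from diffusers import FluxPipeline

# Load Flux.1 Dev in bfloat16 (~24 GB of weights on disk).
# Swap in "black-forest-labs/FLUX.1-schnell" for the 1-4 step distilled variant.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

# Offload idle components to CPU so the text encoders, transformer,
# and VAE do not all occupy VRAM simultaneously (see the note above).
pipe.enable_model_cpu_offload()

image = pipe(
    "a lighthouse on a cliff at sunset, photorealistic",
    height=1024,
    width=1024,
    num_inference_steps=28,  # Dev: 20-50 steps; Schnell: 1-4 steps
    guidance_scale=3.5,
).images[0]
image.save("flux_dev.png")
```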
Resolution and Batch Size Impact
| Resolution | FP16 Total VRAM | FP8 Total VRAM |
|---|---|---|
| 512×512 | ~16 GB | ~11 GB |
| 768×768 | ~17 GB | ~12 GB |
| 1024×1024 | ~18-20 GB | ~13-15 GB |
| 1280×1280 | ~22-24 GB | ~16-18 GB |
| 1024×1024, batch 2 | ~28-32 GB | ~20-24 GB |
Flux VRAM scales more aggressively with resolution than SDXL due to the larger transformer architecture. Batch generation at FP16 is only feasible on 32GB+ GPUs. For comparison with older models, see the Stable Diffusion VRAM requirements guide.
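If you want to verify these figures on your own hardware, a quick approach is to reset and read PyTorch's peak-allocation counter around a generation call. A minimal sketch, assuming the `pipe` object from the earlier example; note that PyTorch's allocator stats can read slightly lower than what nvidia-smi reports:

```python
import torch

def peak_vram_gb(pipe, width: int, height: int, steps: int = 28) -> float:
    """Generate one image and return PyTorch's peak VRAM allocation in GB."""
    torch.cuda.reset_peak_memory_stats()
    pipe("a test prompt", width=width, height=height, num_inference_steps=steps)
    return torch.cuda.max_memory_allocated() / 1024**3

for size in (512, 768, 1024, 1280):
    print(f"{size}x{size}: {peak_vram_gb(pipe, size, size):.1f} GB")
```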
FP8 and NF4 Quantisation
Flux.1 supports FP8 quantisation, which halves the model weight memory from ~24GB to ~12GB with minimal quality loss. This is the recommended approach for running Flux on 16GB GPUs. NF4 (4-bit) quantisation further reduces weights to ~6GB but introduces more visible quality degradation, particularly in fine details and text rendering.
FP8 Flux on a 16GB card like the RTX 4060 Ti is feasible at 1024×1024 with around 13-15GB total usage. NF4 Flux fits on 8GB cards like the RTX 4060 but quality is noticeably reduced. For best quality, FP16 on a 24GB RTX 3090 or higher is recommended.
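For NF4, diffusers can load the Flux transformer through bitsandbytes. A sketch, assuming a recent diffusers build with bitsandbytes quantisation support installed (`pip install bitsandbytes`); exact savings depend on library versions:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# NF4 (4-bit) config: shrinks the ~24 GB transformer to roughly 6 GB,
# at the cost of fine detail and text-rendering quality.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```

For FP8, one common route is optimum-quanto's qfloat8 weight quantisation applied to the same transformer before assembling the pipeline; the end result is the ~13-15GB footprint shown in the table above.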
GPU Recommendations
| GPU | VRAM | Flux Capability | Quality Level |
|---|---|---|---|
| RTX 3050 (6GB) | 6 GB | Not feasible | N/A |
| RTX 4060 (8GB) | 8 GB | NF4 only, reduced quality | Low |
| RTX 4060 Ti (16GB) | 16 GB | FP8, good quality | Good |
| RTX 3090 (24GB) | 24 GB | FP16, full quality | Best |
| RTX 5090 (32GB) | 32 GB | FP16 + extensions | Best + extras |
ComfyUI Workflow VRAM Overhead
Running Flux through ComfyUI adds VRAM overhead for workflow components. ControlNet adds 1-3GB depending on the model. IP-Adapter adds 2-4GB. Combining Flux with multiple control models can push total VRAM to 24-30GB, requiring a 32GB GPU for complex pipelines.
For ComfyUI VRAM planning, account for each node’s memory footprint separately. The VRAM cost guide provides formulas for estimating multi-model pipeline requirements. Compare GPU options with the GPU comparisons tool.
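As a back-of-the-envelope check, you can sum component footprints the same way the tables above do. A toy estimator using the rough midpoints of the ranges quoted in this article (illustrative figures, not measured values):

```python
# Rough additive VRAM estimate for a Flux ComfyUI pipeline, in GB.
# Midpoints of the ranges quoted in this article; treat the result as a
# planning figure, since actual peaks depend on resolution, offloading,
# and ComfyUI's own memory management.
BASE_VRAM = {"fp16": 19.0, "fp8": 14.0, "nf4": 9.0}  # Flux @ 1024x1024
EXTRAS = {"controlnet": 2.0, "ip_adapter": 3.0}       # per attached model

def estimate_vram(precision: str, extras: list[str]) -> float:
    return BASE_VRAM[precision] + sum(EXTRAS[e] for e in extras)

# FP16 Flux + ControlNet + IP-Adapter: ~24 GB, so plan for a 32 GB card.
print(estimate_vram("fp16", ["controlnet", "ip_adapter"]))
```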
Run Flux.1 on Dedicated GPU Servers
Generate stunning images with Flux.1 Dev and Schnell on dedicated GPU servers, from 16GB FP8 configurations up to 32GB full-quality setups.