Stable Diffusion XL VRAM Requirements: From 6 GB Minimum to Production-Ready

How much VRAM does SDXL actually need? Numbers for FP16, FP8, INT8, with and without ControlNets, LoRAs and refiners. Plus the GPU we recommend for each tier.

Model Guides May 4, 2026 3 min read gigagpu

SDXL is the open-weight image model that scales most predictably with VRAM. Weight memory halves cleanly from FP16 to FP8, the model plays well with offloading, and the rendering pipeline (UNet → VAE → optional refiner) has a known VRAM signature you can plan against. This page is the sizing reference.

TL;DR

SDXL base needs ~8.5 GB at FP16: ~5.2 GB for the UNet, ~1.9 GB for the VAE and text encoders, and ~1.5 GB for activations and buffers. With LoRAs and a single ControlNet you’re at ~12 GB. Adding the refiner to a base pipeline takes you to ~13 GB, so plan for a 16 GB card. So:

6 GB cards — works with offloading, slow
8 GB — comfortable base SDXL
12 GB — base + LoRAs + 1 ControlNet
16 GB — full refiner ensemble
24 GB+ — multi-pipeline, batch generation

SDXL base model VRAM

Component	FP32	FP16	FP8	INT8
UNet (2.6B params)	10.4 GB	5.2 GB	2.6 GB	2.6 GB
VAE	0.4 GB	0.2 GB	0.1 GB	0.1 GB
Text encoder 1 (CLIP-L)	0.5 GB	0.25 GB	0.13 GB	0.13 GB
Text encoder 2 (OpenCLIP-G)	2.8 GB	1.4 GB	0.7 GB	0.7 GB
Activations + buffers	~2 GB	~1.5 GB	~1.5 GB	~1.5 GB
Total (1024×1024)	~16 GB	~8.5 GB	~5 GB	~5 GB

Almost nobody runs SDXL at FP32 in production — FP16 is the default. FP8 is supported on Blackwell (5080/5090/6000 Pro) via TensorRT-LLM-style quantisation; quality drop is <1%.
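
The weight rows in the table follow from parameter count times bytes per parameter. Here is a minimal sketch of that arithmetic using the table's 2.6B UNet figure. The function name `weight_gb` and the use of decimal gigabytes (10^9 bytes) are assumptions made for illustration. Activations and buffers are not modelled.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1, "int8": 1}


def weight_gb(params_billions: float, precision: str) -> float:
    """Return weight memory in decimal GB (1 GB = 1e9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes-per-GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    # UNet parameter count taken from the table (2.6B).
    for precision in BYTES_PER_PARAM:
        print(f"UNet at {precision}: {weight_gb(2.6, precision):.1f} GB")
```

The same function reproduces the other weight rows if you plug in each component's parameter count. The "Activations + buffers" row is measured rather than derived, so it stays a separate line item.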

SDXL variants — Turbo, Lightning, refiner

SDXL Base — the original. 25–50 sampling steps. ~8 GB FP16.
SDXL Refiner — second-pass model that adds detail. ~5 GB FP16. Add to a Base pipeline → ~13 GB total.
SDXL Turbo — distilled for 1–4 step generation. Same VRAM as Base; just faster.
SDXL Lightning — LCM-style 2/4/8-step distilled model. Same VRAM.
Hyper-SDXL — 1-step variant, same VRAM.

ControlNets, IP-Adapters, LoRAs

Add-on	VRAM cost	Notes
LoRA (single, rank 32-128)	~50–200 MB	Trivial. Hot-load 8–10 LoRAs on a 12 GB card.
LoRA (XL trained, rank 256+)	~400 MB	Larger; trim if VRAM-tight.
ControlNet (one)	~2.5 GB FP16 / 1.3 GB FP8	Each adds an additional UNet pass.
IP-Adapter	~0.5 GB	Cheap. Hot-load several.
Inpainting model	~1 GB additional	On top of base.
Refiner	~5 GB FP16	Raises peak pipeline VRAM from ~8.5 GB to ~13 GB when both models stay resident.

Concrete example: SDXL Base + 1 ControlNet + 2 LoRAs + IP-Adapter at FP16 = ~12 GB peak. Comfortable on a 3090 24 GB; tight on a 5080 16 GB.
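
The concrete example can be reproduced by summing the table entries. The sketch below assumes every component stays resident on the GPU at the same time (no offloading) and uses the upper end of the small-LoRA range. The names `ADDON_FP16_GB` and `peak_vram_gb` are hypothetical, chosen for this illustration.

```python
# Approximate FP16 costs in GB, copied from the tables in this article.
BASE_FP16_GB = 8.5  # SDXL Base total at 1024x1024

ADDON_FP16_GB = {
    "lora": 0.2,        # upper end of the ~50-200 MB range
    "lora_xl": 0.4,     # rank 256+
    "controlnet": 2.5,
    "ip_adapter": 0.5,
    "inpainting": 1.0,
    "refiner": 5.0,
}


def peak_vram_gb(addons: dict[str, int]) -> float:
    """Sum base + add-ons, assuming everything is resident at once."""
    total = BASE_FP16_GB
    for name, count in addons.items():
        total += ADDON_FP16_GB[name] * count
    return total


if __name__ == "__main__":
    stack = {"controlnet": 1, "lora": 2, "ip_adapter": 1}
    # 8.5 + 2.5 + 2 * 0.2 + 0.5 = 11.9, in line with the ~12 GB figure.
    print(f"Estimated peak: {peak_vram_gb(stack):.1f} GB")
```

Treat the result as a planning number rather than a measurement: real peaks depend on resolution, batch size and attention kernel.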

GPU recommendations by tier

Tier	GPU	What it handles
Minimum	RTX 3050 6 GB	SDXL FP16 with sequential CPU offload only. ~25 s per 1024×1024 image.
Entry	RTX 4060 8 GB	SDXL Base FP16 fits. No room for ControlNets at FP16.
Comfortable	RTX 3060 12 GB	Base + LoRAs + 1 ControlNet. The first card we recommend.
Sweet spot	RTX 5080 16 GB	Base + LoRAs + 2 ControlNets. FP8 path. Fast.
Production	RTX 5090 32 GB	Full ensemble + batch + multiple pipelines hot-loaded.
Workstation	RTX 6000 Pro 96 GB	Multi-model serving (SDXL + FLUX + SD3).
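
As a rough illustration of how the tier table can be used, the sketch below picks the smallest listed card whose VRAM covers an estimated peak. The `headroom` parameter is an assumption added here, not something the table defines, and the function ignores offloading, which is how the 6 GB tier works at all.

```python
from typing import Optional

# (card, VRAM in GB) from the tier table, ordered smallest first.
CARDS = [
    ("RTX 3050", 6),
    ("RTX 4060", 8),
    ("RTX 3060", 12),
    ("RTX 5080", 16),
    ("RTX 5090", 32),
    ("RTX 6000 Pro", 96),
]


def smallest_card(required_gb: float, headroom: float = 0.0) -> Optional[str]:
    """Return the first card with VRAM >= required_gb * (1 + headroom).

    headroom is a hypothetical safety margin (0.1 means 10% spare).
    Returns None when no listed card is large enough.
    """
    needed = required_gb * (1.0 + headroom)
    for name, vram_gb in CARDS:
        if vram_gb >= needed:
            return name
    return None


if __name__ == "__main__":
    print(smallest_card(11.9))   # base + LoRAs + 1 ControlNet
    print(smallest_card(13.5))   # base + refiner
    print(smallest_card(120.0))  # larger than any listed card
```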

Memory-saving tricks that actually work

Sequential CPU offload: keeps components in system RAM and moves them onto the GPU only while they run. Cuts VRAM to ~5 GB. Costs ~30% in latency. Diffusers: pipe.enable_sequential_cpu_offload() offloads per submodule; pipe.enable_model_cpu_offload() swaps whole components instead, saving less VRAM but running faster.
VAE tiling + slicing: tiling decodes the latent in tiles, slicing decodes a batch one image at a time. Lets you generate >1024² on small cards. pipe.enable_vae_tiling() and pipe.enable_vae_slicing().
FP8 weights + FP16 activations: halves UNet weight VRAM with <1% quality regression on Blackwell.
Attention slicing: computes attention in chunks instead of all at once. Mostly obsoleted by xformers and SDPA, but still useful as a fallback.
xformers / SDPA: memory-efficient attention kernels. Saves 20–30% peak VRAM and is faster.
torch.compile: JIT-compiles the UNet. ~15% faster, no VRAM cost, but a long warm-up.
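
The offload and VAE options in the list above can be combined in one helper. This is a sketch, not a tuned recipe: the 8 GB and 12 GB thresholds are illustrative assumptions, and `apply_memory_savers` is a hypothetical name. The method names are real Diffusers pipeline methods, including `enable_model_cpu_offload()`, which moves whole components rather than individual submodules. Newer Diffusers releases may prefer calling the VAE methods on `pipe.vae` directly.

```python
def apply_memory_savers(pipe, vram_gb: float) -> list[str]:
    """Enable Diffusers memory savers on `pipe` based on available VRAM.

    The thresholds are assumptions for illustration, not measured cut-offs.
    Returns the names of the pipeline methods that were called.
    """
    # VAE slicing and tiling are cheap, so enable them unconditionally.
    calls = ["enable_vae_slicing", "enable_vae_tiling"]
    if vram_gb < 8:
        # Lowest VRAM, slowest: weights are streamed per submodule.
        calls.append("enable_sequential_cpu_offload")
    elif vram_gb < 12:
        # Whole components (text encoders, UNet, VAE) are swapped in turn.
        calls.append("enable_model_cpu_offload")
    for name in calls:
        getattr(pipe, name)()
    return calls


# Usage (needs diffusers, torch and a CUDA GPU; not executed here):
#   import torch
#   from diffusers import StableDiffusionXLPipeline
#   pipe = StableDiffusionXLPipeline.from_pretrained(
#       "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
#   )
#   apply_memory_savers(pipe, vram_gb=6)  # do not also call pipe.to("cuda")
#   image = pipe("a lighthouse at dusk", num_inference_steps=30).images[0]
```

On cards with 12 GB or more the helper leaves offloading off, so the pipeline should be moved to the GPU with `pipe.to("cuda")` as usual.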

Speed by GPU

Steps × resolution × sampler combine to determine wall time. Reference numbers below use Euler-A with 30 steps at 1024×1024.

GPU	SDXL Base FP16	SDXL Turbo (4-step)	Notes
RTX 3050 6 GB	25 s	4 s	CPU offload required
RTX 3060 12 GB	11 s	2 s	Comfortable
RTX 5060 Ti 16 GB	7 s	1.4 s	Best entry-tier speed
RTX 5080	5 s	1.0 s	FP8 path drops to 3.5 s
RTX 3090	5 s	1.1 s	24 GB headroom
RTX 5090	2.5 s	0.6 s	FP8 path; batch-friendly
RTX 6000 Pro	2.6 s	0.6 s	Same speed, more headroom
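
To turn the per-image times into capacity planning numbers, the sketch below converts seconds per image into images per hour and rescales a reference time to a different step count. The linear-in-steps scaling is an assumption that ignores fixed costs such as text encoding and VAE decode, so treat the estimates as rough. The function names are hypothetical.

```python
# Seconds per 1024x1024 image, SDXL Base FP16, Euler-A, 30 steps (table above).
BASE_FP16_SECONDS = {
    "RTX 3050 6 GB": 25.0,
    "RTX 3060 12 GB": 11.0,
    "RTX 5060 Ti 16 GB": 7.0,
    "RTX 5080": 5.0,
    "RTX 3090": 5.0,
    "RTX 5090": 2.5,
    "RTX 6000 Pro": 2.6,
}
REFERENCE_STEPS = 30


def images_per_hour(seconds_per_image: float) -> float:
    """Sequential throughput for a single pipeline, batch size 1."""
    return 3600.0 / seconds_per_image


def estimate_seconds(gpu: str, steps: int) -> float:
    """Rescale the 30-step reference time, assuming time is linear in steps."""
    return BASE_FP16_SECONDS[gpu] * steps / REFERENCE_STEPS


if __name__ == "__main__":
    for gpu, seconds in BASE_FP16_SECONDS.items():
        print(f"{gpu}: {images_per_hour(seconds):.0f} images/hour at 30 steps")
    print(f"RTX 3060 12 GB at 20 steps: {estimate_seconds('RTX 3060 12 GB', 20):.1f} s")
```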

Bottom line

For new SDXL deployments on dedicated hardware, take the RTX 5090: it is the fastest per image, has FP8 hardware, and its 32 GB lets you keep multiple pipelines hot-loaded. If you are cost-anchored, the RTX 3060 12 GB at £99/mo is the cheapest dedicated server we host that runs full SDXL comfortably. See our image generator hosting hub for a deeper deployment guide and our RTX 5090 + Stable Diffusion verdict for the spec-by-spec breakdown.

gigagpu

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
