Can RTX 4060 Run SDXL?
Yes, the RTX 4060 can run Stable Diffusion XL and generate 1024×1024 images, but you need FP16 precision with memory optimizations enabled. The RTX 4060 has 8 GB of GDDR6 VRAM, which is tight for SDXL but workable with the right settings. Expect generation times of 8-12 seconds per image at 20-30 steps on a dedicated GPU server.
SDXL is a 6.6 billion parameter model (base + refiner), significantly larger than SD 1.5. It was designed for higher resolution output but demands more VRAM. The 4060 handles it, though not as comfortably as GPUs with 12+ GB.
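A rough rule of thumb puts this in perspective (an estimate for weights alone, not a measurement — real usage also includes activations, the VAE, and text encoders, and offloading can reduce the resident footprint):

```python
def fp16_weight_gb(params: float) -> float:
    """Approximate VRAM for model weights alone in FP16: 2 bytes per parameter."""
    return params * 2 / 1e9

# SDXL base (~3.5B params) needs ~7 GB just for FP16 weights,
# which is why an 8 GB card leaves so little headroom.
print(round(fp16_weight_gb(3.5e9), 1))  # → 7.0
```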
VRAM Analysis: SDXL on 8 GB
Here is how VRAM requirements break down across Stable Diffusion versions, and where SDXL sits among them:
| Model | Parameters | FP16 VRAM (Generating) | With Refiner | Fits RTX 4060? |
|---|---|---|---|---|
| SD 1.5 | 860M | ~3.5 GB | N/A | Yes (comfortable) |
| SD 2.1 | 865M | ~3.8 GB | N/A | Yes (comfortable) |
| SDXL Base | 3.5B | ~6.5 GB | N/A | Yes (tight) |
| SDXL Base + Refiner | 6.6B | ~7.5 GB | ~12 GB if loaded together | Sequential only |
| Flux.1 Dev | 12B | ~12 GB | N/A | No (needs quantization) |
SDXL Base alone uses about 6.5 GB during generation at 1024×1024, leaving roughly 1.5 GB headroom. The refiner must be loaded sequentially (not alongside the base model) to fit. For full details across all SD variants, see our Stable Diffusion VRAM requirements guide.
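Sequential loading can be sketched with Hugging Face diffusers (a hedged example, not a tuned setup: it requires a CUDA GPU, downloads several GB of weights, and the 0.8 denoising split and prompt are illustrative defaults):

```python
# Sketch: run the SDXL base, free its VRAM, then run the refiner on the
# base's latents -- so both models never occupy the 8 GB card at once.
HIGH_NOISE_FRAC = 0.8  # base handles the first 80% of denoising steps

def main():
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    )
    base.enable_model_cpu_offload()  # keep only active components in VRAM

    prompt = "a lighthouse at dawn, detailed oil painting"
    latents = base(
        prompt, num_inference_steps=30,
        denoising_end=HIGH_NOISE_FRAC, output_type="latent",
    ).images

    del base  # unload the base model before the refiner comes in
    torch.cuda.empty_cache()

    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    )
    refiner.enable_model_cpu_offload()

    image = refiner(
        prompt, num_inference_steps=30,
        denoising_start=HIGH_NOISE_FRAC, image=latents,
    ).images[0]
    image.save("output.png")

if __name__ == "__main__":
    main()
```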
Generation Speed Benchmarks
Real-world generation times on the RTX 4060 with SDXL at various configurations:
| Resolution | Steps | Sampler | Time (seconds) | it/s |
|---|---|---|---|---|
| 1024×1024 | 20 | DPM++ 2M | ~8.5 | ~2.4 |
| 1024×1024 | 30 | DPM++ 2M | ~12.5 | ~2.4 |
| 1024×1024 | 20 | Euler a | ~8.0 | ~2.5 |
| 768×768 | 20 | DPM++ 2M | ~5.5 | ~3.6 |
| 1024×1024 | 4 | LCM | ~3.5 | ~1.1 |
These times are measured with xformers enabled and FP16 precision. Without optimizations, times can double. For speed comparisons across GPUs, see our best GPU for Stable Diffusion guide.
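The timings above follow a simple back-of-envelope relationship (an approximation that ignores variation in VAE decode and scheduler overhead; the 0.2 s fixed overhead here is an assumption, not a measurement):

```python
def estimated_seconds(steps: int, it_per_s: float, overhead_s: float = 0.2) -> float:
    """Rough generation time: denoising steps divided by throughput,
    plus a small fixed overhead for VAE decode and setup."""
    return steps / it_per_s + overhead_s

# 20 steps at ~2.4 it/s lands near the ~8.5 s measured above.
print(round(estimated_seconds(20, 2.4), 1))  # → 8.5
```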
Required Optimizations for 8 GB
To run SDXL reliably on 8 GB VRAM, enable these optimizations:
- xformers or SDP attention: Reduces VRAM usage during the attention computation by 30-40%. Essential for 8 GB cards.
- FP16 VAE: Use the FP16-fix VAE to avoid black images while saving memory.
- Sequential refiner loading: Load the refiner after unloading the base model, not simultaneously.
- --medvram or --medvram-sdxl: In Automatic1111, these flags move parts of the model between GPU and CPU as needed; --medvram-sdxl applies the behavior only when an SDXL model is loaded.
- Token merging (ToMe): Optional 20-30% speedup with minimal quality loss.
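The flag choices above can be condensed into a small hypothetical helper (the thresholds are this guide's rules of thumb, not official values, and the function name is made up for illustration):

```python
def a1111_flags(vram_gb: float, sdxl: bool = True) -> list[str]:
    """Suggest Automatic1111 launch flags for a given VRAM budget
    (rule-of-thumb thresholds from this guide, not official defaults)."""
    flags = ["--xformers"]  # memory-efficient attention: essential on small cards
    if sdxl and vram_gb <= 8:
        flags.append("--medvram-sdxl")  # offload model parts between GPU and CPU
    if vram_gb <= 6:
        flags.append("--lowvram")  # last resort: dramatically slower generation
    return flags

print(a1111_flags(8))  # → ['--xformers', '--medvram-sdxl']
```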
Do NOT use --lowvram unless absolutely necessary, as it dramatically slows generation. The 4060’s 8 GB is sufficient with the above optimizations. Read our Stable Diffusion hosting page for deployment best practices.
What Can You Actually Generate?
Here is what works and what doesn’t on the RTX 4060 with SDXL:
- 1024×1024, single image: Works well. 8-12 seconds per image.
- 1024×1024 with refiner: Works (sequential loading). ~15-18 seconds total.
- 1536×1536 or higher: Likely to OOM. Reduce to 1024×1024 or use tiled upscaling.
- Batch of 2+ images: Risky at 1024×1024. Works at 768×768.
- ControlNet + SDXL: Very tight. May need --medvram-sdxl.
- SDXL + LoRA: Works fine. LoRAs add minimal VRAM overhead.
For workflows requiring higher resolution, batching, or ControlNet, an RTX 3090 with 24 GB gives you much more headroom. See also our image generator hosting options.
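The resolution limit comes down to pixel count: activation memory grows at least linearly with the number of pixels (a simplification — attention can grow faster without memory-efficient implementations), and 1536×1536 more than doubles it:

```python
def pixel_ratio(w1: int, h1: int, w2: int, h2: int) -> float:
    """How many times more pixels (and thus, roughly, activation memory)
    the second resolution needs compared to the first."""
    return (w2 * h2) / (w1 * h1)

# 1536x1536 has 2.25x the pixels of 1024x1024 -- with only ~1.5 GB of
# headroom left after SDXL's weights, that is why it tends to OOM on 8 GB.
print(pixel_ratio(1024, 1024, 1536, 1536))  # → 2.25
```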
Setup Guide (A1111 + ComfyUI)
Get SDXL running on your RTX 4060 server:
Automatic1111 Web UI
```bash
# Clone and launch with SDXL optimizations
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
./webui.sh --xformers --medvram-sdxl

# Place the SDXL model in models/Stable-diffusion/
# Download from HuggingFace: stabilityai/stable-diffusion-xl-base-1.0
```
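To avoid retyping the flags, they can be made persistent in webui-user.sh so a plain ./webui.sh picks them up (the standard A1111 convention; adjust the flag list to your setup):

```bash
# webui-user.sh -- flags applied automatically on every launch
export COMMANDLINE_ARGS="--xformers --medvram-sdxl"
```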
ComfyUI (Recommended for SDXL)
```bash
# Clone and launch ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
python main.py --force-fp16
```
ComfyUI is generally recommended over Automatic1111 for SDXL on 8 GB cards due to better memory management. For server deployment guidance, see our deploy Stable Diffusion server tutorial.
GPU Alternatives for SDXL
| GPU | VRAM | SDXL 1024×1024 | Batch Size | Best For |
|---|---|---|---|---|
| RTX 3050 | 8 GB | ~12s (tight) | 1 | Testing only |
| RTX 4060 | 8 GB | ~8.5s | 1 | Personal use |
| RTX 4060 Ti | 16 GB | ~7s | 2-3 | Light production |
| RTX 3090 | 24 GB | ~5s | 4-6 | Production |
For serious image generation workloads, 16+ GB VRAM makes a significant difference. Compare pricing and performance in our cheapest GPU for AI inference guide. You can also explore the RTX 3090 Flux.1 analysis if you’re considering next-gen models.
Deploy This Model Now
Dedicated GPU servers with the VRAM you need. UK datacenter, full root access.
Browse GPU Servers