SDXL on AMD used to be a rough road. The ROCm stack has closed most of the gap in 2026 and the R9700 now ships with 32 GB of VRAM – double what the RTX 5080 offers. On our dedicated GPU servers we have run both cards through the same SDXL pipeline so you can compare something beyond vendor datasheets.
In This Guide
- Spec sheet comparison
- Software stack maturity in 2026
- Images per minute, both cards
- Where the VRAM difference matters
- Which card actually wins
Specifications
| Spec | R9700 Pro | RTX 5080 |
|---|---|---|
| VRAM | 32 GB GDDR6 | 16 GB GDDR7 |
| Memory bandwidth | ~640 GB/s | ~960 GB/s |
| FP16 throughput | Competitive with the 5080 (RDNA 4) | Higher on paper |
| Software | ROCm 6.x | CUDA 12.x / 13.x |
| TDP | ~260 W | 360 W |
Software Stack in 2026
The old “AMD cannot run SDXL” story is gone. PyTorch supports ROCm natively, Diffusers works without monkey patches, and xFormers alternatives like Flash Attention for ROCm are stable. The 5080 still has the easier path – every tutorial on the internet assumes CUDA – but the R9700 is no longer a science project. Our ROCm vs CUDA production guide walks through the real-world delta in tooling.
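One practical consequence of the converged stack is that the same PyTorch code path serves both cards. A minimal sketch of detecting which build is installed – the function name is ours, and it assumes the common convention that `torch.version.hip` is `None` on CUDA builds:

```python
import importlib.util

def detect_backend():
    """Return 'rocm', 'cuda', or 'cpu' depending on the installed PyTorch build."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # no PyTorch at all
    import torch
    # ROCm wheels report a HIP version; CUDA wheels leave it as None.
    if getattr(torch.version, "hip", None) is not None:
        return "rocm"
    if torch.version.cuda is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(detect_backend())
```

On both backends the device string stays `"cuda"` in user code (`pipe.to("cuda")` works on ROCm too), which is why most Diffusers tutorials run unmodified on the R9700.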
Images Per Minute
SDXL 1024×1024 at 30 steps, batch 1:
| Pipeline | R9700 32GB | RTX 5080 16GB |
|---|---|---|
| SDXL base only | ~3.1 s/image | ~2.3 s/image |
| SDXL + refiner | ~4.8 s/image | ~3.6 s/image |
| SDXL + 2 ControlNets | ~6.2 s/image | ~4.9 s/image |
| Batch 4, base only | Fits easily | Tight, needs VAE slicing |
Per-image speed favours the 5080 by 25-35%. Batch capacity favours the R9700 by a wider margin.
Benchmark Your Own SDXL Pipeline
We’ll set you up with either GPU on a dedicated server – you bring the prompts, we bring the hardware.
Browse GPU Servers
When 32 GB Beats 16 GB
If you run SDXL alone at 1024×1024, batch 1, both cards work. The moment you start combining – base plus refiner plus two ControlNets plus an IP-Adapter plus a LoRA stack – the 16 GB 5080 runs out of headroom first. The R9700 swallows the entire pipeline without VAE tiling or model offload. See our IP-Adapter production setup guide for how quickly VRAM fills in a real stack.
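A back-of-envelope VRAM budget makes the headroom difference concrete. The per-component fp16 footprints below are illustrative assumptions, not measurements – real numbers vary with resolution, attention backend, and checkpoint:

```python
# Illustrative fp16 footprints in GB for a fully loaded SDXL stack.
# These are rough assumptions for budgeting, not measured values.
components_gb = {
    "SDXL base UNet":        5.1,
    "SDXL refiner UNet":     4.5,
    "2x ControlNet":         5.0,
    "IP-Adapter":            0.9,
    "text encoders":         1.6,
    "VAE":                   0.2,
    "activations @ 1024px":  4.0,
}

total = sum(components_gb.values())
print(f"estimated resident VRAM: {total:.1f} GB")
# Lands above a 16 GB budget but comfortably inside 32 GB.
```

Once the estimate crosses 16 GB, the 5080 has to offload or tile something on every generation; the same stack stays fully resident on the R9700.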
The Verdict
For a pure “make SDXL images fast, one at a time” workload the 5080 wins on raw speed. For production pipelines with multiple auxiliary models loaded simultaneously, the R9700 wins because it does not force you to hot-swap components between GPU and CPU. If you are doing training or fine-tuning LoRAs on your own datasets, the 32 GB is decisive.
For broader context see our best GPU for SDXL guide and the SDXL VRAM requirements page.