A common dedicated server scenario: you want to host Stable Diffusion XL for image generation and a Llama or Qwen model for text, on one box. On our hosting this is a normal use case. The tricky part is that the ideal GPU for pure SDXL differs from the ideal GPU for pure LLM inference – pick the wrong one and you will regret it on the workload you overlooked.
Topics
- What SDXL wants
- What LLMs want
- Where the ideals overlap
- Best GPUs for mixed workloads
- When to split across two cards
SDXL’s Demands
SDXL at 1024×1024 needs around 10-12 GB at FP16 for the base model plus VAE. Add a refiner and you cross 14 GB; add ControlNet and IP-Adapter and you can push 18-22 GB. SDXL cares more about compute (TFLOPS) than raw bandwidth – the UNet is compute-bound – so it favors larger tensor cores and newer architectures, and per-image latency scales with compute.
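A quick back-of-the-envelope budget for those figures. This is a sketch using the article's rough numbers, not measurements; the per-component costs in `ADDON_GB` are assumptions chosen to match the ranges above.

```python
# Rough SDXL FP16 VRAM budget at 1024x1024. Numbers are illustrative
# estimates from the text, not benchmarks of any specific stack.

BASE_VAE_GB = 11.0        # base UNet + VAE, middle of the 10-12 GB range
ADDON_GB = {              # assumed per-component costs
    "refiner": 3.5,
    "controlnet": 2.5,
    "ip_adapter": 1.5,
}

def sdxl_vram_estimate(addons):
    """Sum a rough FP16 VRAM budget for an SDXL pipeline."""
    return BASE_VAE_GB + sum(ADDON_GB[a] for a in addons)

print(f"{sdxl_vram_estimate([]):.1f} GB")                                      # base + VAE
print(f"{sdxl_vram_estimate(['refiner']):.1f} GB")                             # crosses 14 GB
print(f"{sdxl_vram_estimate(['refiner', 'controlnet', 'ip_adapter']):.1f} GB") # full pipeline
```

The point of running the numbers: a "just SDXL" card sized for the base model alone leaves no headroom once the pipeline grows.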
What LLMs Want
LLM decode is bandwidth-bound. A 7B model at FP16 is roughly 14 GB of weights, and the full weight set must be re-read for every generated token. Memory bandwidth (not TFLOPS) sets your tokens/sec ceiling; VRAM capacity sets which models you can load at which precision.
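That relationship gives a simple upper bound you can compute before buying anything: tokens/sec cannot exceed bandwidth divided by model size. A minimal sketch, assuming single-stream decode and ignoring KV-cache traffic and kernel overhead:

```python
# Decode ceiling: every generated token re-reads the full weight set,
# so tokens/sec <= memory_bandwidth / model_bytes. Real throughput is
# lower (KV cache, overhead); batching can exceed this per-GPU.

def decode_ceiling_tok_s(params_b, bytes_per_param, bandwidth_gb_s):
    """Upper bound on single-stream decode speed in tokens/sec."""
    model_gb = params_b * bytes_per_param  # 7B at FP16 -> 14 GB
    return bandwidth_gb_s / model_gb

# 7B FP16 on a ~936 GB/s card (RTX 3090 class):
print(round(decode_ceiling_tok_s(7, 2, 936), 1))
# Same model quantized to INT4 (0.5 bytes/param) quadruples the ceiling:
print(round(decode_ceiling_tok_s(7, 0.5, 936), 1))
```

This is also why quantization helps decode speed, not just capacity: fewer bytes per parameter means fewer bytes read per token.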
Where Ideals Align
Both workloads benefit from more VRAM. Both benefit from newer silicon with FP8 support and modern fused-attention kernels. The divergence is in the balance: SDXL is happy on cards with strong compute and moderate bandwidth; LLMs are happy on cards with strong bandwidth and moderate compute.
| GPU | SDXL | LLM | Mixed Verdict |
|---|---|---|---|
| RTX 5080 | Fast | Fast, 16 GB ceiling limits models | Best mixed card up to 13B LLM |
| RTX 5090 | Very fast | Excellent, 32 GB fits 70B INT4 | Best all-round mixed |
| RTX 3090 | Adequate | Very fast for 24 GB tier | Best value mixed |
| RTX 4060 Ti 16GB | Adequate | Slower decode | Budget mixed if latency tolerant |
| R9700 | Good | Good for 32 GB tier | Best non-CUDA mixed |
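The table's verdicts reduce to a simple capacity filter. A toy sketch encoding them as data – the VRAM figures are the cards' real capacities, but the verdict strings are this article's framing, not benchmarks:

```python
# Filter the table's cards by the combined SDXL + LLM VRAM footprint
# you plan to run. Verdicts are the article's, not measured results.

GPUS = [
    {"name": "RTX 4060 Ti 16GB", "vram_gb": 16, "verdict": "budget mixed"},
    {"name": "RTX 5080",         "vram_gb": 16, "verdict": "best mixed up to 13B LLM"},
    {"name": "RTX 3090",         "vram_gb": 24, "verdict": "best value mixed"},
    {"name": "R9700",            "vram_gb": 32, "verdict": "best non-CUDA mixed"},
    {"name": "RTX 5090",         "vram_gb": 32, "verdict": "best all-round mixed"},
]

def fits(required_gb):
    """Cards with enough VRAM for the combined footprint, in list order."""
    return [g["name"] for g in GPUS if g["vram_gb"] >= required_gb]

# e.g. full SDXL pipeline (~18 GB) plus a small quantized LLM:
print(fits(20))
```

If the filter returns only one card, you have your answer; if it returns several, the speed and value verdicts in the table break the tie.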
One Server, Both Workloads
We size cards to your full stack – no oversized GPU just because one workload asks for it.
Browse GPU Servers
The Practical Picks
For 80% of mixed workloads, the 5090 is the best single card: 32 GB handles both SDXL’s pipeline VRAM demand and sizeable LLMs, Blackwell speed is top-tier on both, and FP8 support helps LLMs more than SDXL today but will matter on both going forward. If budget matters, the 3090 remains an outstanding value pick.
Split Across Two Cards
If your traffic mix is heavy on both (image and text simultaneously with sustained concurrent users), consider two separate cards rather than one big one. A 5080 for SDXL and a 3090 for LLM gives you workload isolation, eliminates VRAM competition, and often costs less than one 6000 Pro. See multi-model serving.
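A minimal deployment fragment for that split, assuming each model runs as its own process and each process is pinned to one card via `CUDA_VISIBLE_DEVICES` (the script names and ports are hypothetical):

```shell
# GPU 0 (e.g. the 5080) serves images; GPU 1 (e.g. the 3090) serves text.
# Each process sees only its own card, so neither can steal the other's VRAM.
CUDA_VISIBLE_DEVICES=0 python sdxl_server.py --port 8001 &
CUDA_VISIBLE_DEVICES=1 python llm_server.py  --port 8002 &
```

Inside each process, the visible card appears as `cuda:0`, so the serving code needs no changes when you re-pin or swap hardware.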
For pure SDXL decisions see best GPU for SDXL. For pure LLM decisions see best GPU for LLM inference.