Blackwell silicon runs across the RTX 5080 and RTX 5090 on our dedicated GPU hosting. Both ship with FP8 tensor cores and GDDR7. The 5090 is meaningfully more expensive. For AI workloads, how much better does it actually perform?
Sections
- Spec gap
- The 16 GB vs 32 GB split
- Token throughput
- SDXL and video
- When to pick the 5090
Spec Gap
| Spec | RTX 5080 | RTX 5090 |
|---|---|---|
| VRAM | 16 GB GDDR7 | 32 GB GDDR7 |
| Memory bandwidth | ~960 GB/s | ~1,792 GB/s |
| CUDA cores | ~10,752 | ~21,760 |
| FP8 tensor | Yes | Yes |
| TDP | 360 W | 575 W |
The 5090 has roughly double everything: VRAM, bandwidth, compute, power draw. On paper it is close to two 5080s in a single card. In practice, the gap varies by workload.
The 16 GB vs 32 GB Split
This is the most important line in the table. 16 GB hosts 7B models comfortably at FP16, 13B models at INT8, and 30B models at INT4 with tight context. 32 GB hosts 13B at FP16, 30B at INT8, and opens 70B INT4 for single-card serving. If your target model sits above 13B, the 5090 is not an upgrade – it is the only one of the two that works. See our pages on whether the 5090 can run 70B models and on 70B INT4 VRAM requirements.
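The quantisation arithmetic behind these fit claims is simple enough to sketch. The bytes-per-parameter figures below are nominal assumptions: real quantised checkpoints also store scales and keep some tensors at higher precision, and the KV cache needs headroom on top of the weights.

```python
# Weights-only VRAM footprint at common quantisations (nominal bytes/param;
# KV cache and activations need headroom on top of this).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GB for a model of the given size."""
    return params_billion * BYTES_PER_PARAM[quant]

# The 13B boundary from the text:
print(weight_gb(13, "fp16"))  # 26.0 GB: needs the 5090's 32 GB
print(weight_gb(13, "int8"))  # 13.0 GB: fits on the 5080's 16 GB
```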
Token Throughput
Where both cards fit a model, the 5090 runs roughly 60-80% faster per token on decode-heavy workloads because it has near-double the memory bandwidth. For prefill-heavy workloads (large prompts, RAG) the compute gap matters more and the 5090 advantage approaches 90-100%.
| Workload | 5080 | 5090 |
|---|---|---|
| Mistral 7B INT8 decode | ~85 t/s | ~145 t/s |
| Llama 3 8B INT4 decode | ~110 t/s | ~185 t/s |
| Llama 3 70B INT4 | Does not fit | ~38 t/s |
| SDXL 1024×1024 30 steps | ~2.3 s | ~1.4 s |
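The decode numbers above are consistent with a simple bandwidth-bound model: each generated token streams the full weight set from VRAM once, so throughput is roughly bandwidth divided by weight bytes, times a utilisation factor. The 0.6 efficiency below is our assumption, chosen to line up with the table, not a measured constant.

```python
# Bandwidth-bound decode estimate: tokens/s ~= bandwidth / weight bytes,
# scaled by an assumed utilisation factor (0.6 is a guess, not measured).
def decode_tps(bandwidth_gb_s: float, weights_gb: float,
               efficiency: float = 0.6) -> float:
    return bandwidth_gb_s / weights_gb * efficiency

# Mistral 7B at INT8 is roughly 7 GB of weights:
print(round(decode_tps(960, 7)))   # ~82 t/s on the 5080 (table: ~85)
print(round(decode_tps(1792, 7)))  # ~154 t/s on the 5090 (table: ~145)
```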
Upgrade Only When It Pays Back
We host both cards on fixed UK monthly pricing – no need to guess from synthetic benchmarks.
Browse GPU Servers
SDXL and Video
For pure image generation the 5090 runs about 40-50% faster than the 5080 per image. For video models where VRAM matters (CogVideoX, Hunyuan Video), only the 5090 is in the running. The 5080 runs out of memory on most modern video models. See our Hunyuan Video VRAM page.
When to Pick the 5090
Jump to the 5090 if any of these apply: your model is above 13B, you serve video, you batch many concurrent users, or you fine-tune. Stick with the 5080 if you are serving 7-13B LLMs with modest concurrency, running SDXL production at reasonable pace, or testing the economics before committing. The 5090 is not a luxury upgrade – it is a capability upgrade. Workloads either need it or they do not.
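The decision rule above condenses into a small helper. The parameter names and the 13B threshold are taken directly from the criteria in the text; this is a rule-of-thumb sketch, not sizing advice for every workload.

```python
# Rule of thumb from the text: above 13B, video, heavy batching, or
# fine-tuning all push you to the 5090; otherwise the 5080 suffices.
def pick_card(model_params_billion: float, video: bool = False,
              heavy_batching: bool = False, fine_tuning: bool = False) -> str:
    if model_params_billion > 13 or video or heavy_batching or fine_tuning:
        return "RTX 5090"  # capability requirement, not a luxury
    return "RTX 5080"      # 7-13B serving with modest concurrency

print(pick_card(8))                    # RTX 5080
print(pick_card(8, fine_tuning=True))  # RTX 5090
```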
For step-up decisions see 6000 Pro vs dual 5090, and for value floor analysis see VRAM per pound.