The natural next step from the RTX 5060 Ti 16GB is the RTX 5090 32GB on our hosting. Here is what changes.
Spec Delta
| Spec | 5060 Ti 16GB | 5090 | Delta |
|---|---|---|---|
| VRAM | 16 GB | 32 GB | 2x |
| Bandwidth | 448 GB/s | 1,792 GB/s | 4x |
| CUDA cores | 4,608 | 21,760 | 4.7x |
| TDP | 180 W | 575 W | 3.2x |
| Price per month | ~£300 | ~£900 | 3x |
Models Unlocked
Upgrading unlocks:
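- Llama 3 70B INT4 with light offload (the weights alone are about 35 GB at 4 bits per parameter, so they do not fit entirely in 32 GB; on the 5060 Ti the model barely runs even with heavy offload)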
- Qwen 2.5 32B with AWQ, with usable headroom left for the KV cache
- Gemma 2 27B at FP8
- Mixtral 8x7B with AWQ
- Long-context Mistral Nemo at 128k with 8+ concurrent users
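A rough way to check which models a card can hold is to estimate the weight footprint as parameters times bits per weight, divided by 8. The sketch below is only a back-of-the-envelope estimate: the parameter counts are rounded approximations, the 2 GB reserve for KV cache and runtime overhead is an assumed placeholder, and real quantization formats such as AWQ carry some extra overhead for scales.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes): parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, reserve_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed reserve fit in the given VRAM.

    reserve_gb is a hypothetical allowance for KV cache, activations and
    runtime overhead; tune it for the actual workload.
    """
    return weight_gb(params_billion, bits_per_weight) + reserve_gb <= vram_gb


# Rounded parameter counts (assumptions, in billions) and nominal bit widths.
MODELS = [
    ("Qwen 2.5 32B, 4-bit AWQ", 32, 4),
    ("Gemma 2 27B, FP8", 27, 8),
    ("Mixtral 8x7B, 4-bit AWQ", 47, 4),
]

if __name__ == "__main__":
    for name, params, bits in MODELS:
        size = weight_gb(params, bits)
        print(f"{name}: ~{size:.1f} GB weights, "
              f"16 GB: {fits(params, bits, 16)}, 32 GB: {fits(params, bits, 32)}")
```

Under these assumptions, a 32B model at 4 bits already consumes the full 16 GB for weights alone, which is why the KV cache headroom only appears on the larger card.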
Throughput
For models that fit on both cards:
- Llama 3 8B FP8: 5060 Ti 820 t/s, 5090 ~1,450 t/s aggregate at batch 16 (+77%)
- Mistral 7B FP8: 5060 Ti 650 t/s, 5090 ~1,200 t/s aggregate (+85%)
- SDXL Lightning: 5060 Ti 0.95 s/img, 5090 0.45 s/img (about 2.1x the images per second, or +110%)
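The percentages above follow from simple ratios. A minimal sketch of the arithmetic, using only the figures quoted in this section (the function name is illustrative): for tokens per second, higher is better, while for seconds per image, lower is better, so the ratio is inverted.

```python
def speedup_pct(baseline: float, upgraded: float,
                lower_is_better: bool = False) -> float:
    """Percentage throughput gain of the upgraded card over the baseline.

    For latency-style metrics (seconds per image) pass lower_is_better=True
    so the ratio is inverted into a throughput ratio.
    """
    ratio = baseline / upgraded if lower_is_better else upgraded / baseline
    return (ratio - 1) * 100


if __name__ == "__main__":
    print(f"Llama 3 8B FP8:  {speedup_pct(820, 1450):+.0f}%")
    print(f"Mistral 7B FP8:  {speedup_pct(650, 1200):+.0f}%")
    print(f"SDXL Lightning:  {speedup_pct(0.95, 0.45, lower_is_better=True):+.0f}%")
```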
Pays Back
At 3x the monthly cost, the 5090 needs to deliver 3x the value. On throughput alone it falls short, at roughly 2x. It does pay back if any of the following applies:
- Your target model fits in 32 GB but not in 16 GB
- Latency is worth a premium to users
- You are replacing two 5060 Ti deployments with one bigger card
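To put the throughput gap in cost terms, the sketch below converts a monthly price and a sustained token rate into a cost per million tokens. It uses the approximate prices from the spec table and the Llama 3 8B FP8 figures from the throughput section, and it assumes a 30-day month and 100% utilization, which real deployments rarely reach. The function name and the utilization parameter are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def cost_per_million_tokens(monthly_price: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Monthly price divided by millions of tokens generated in that month."""
    tokens_per_month = tokens_per_second * SECONDS_PER_MONTH * utilization
    return monthly_price / (tokens_per_month / 1_000_000)


if __name__ == "__main__":
    # Approximate prices (GBP per month) and Llama 3 8B FP8 aggregate rates.
    small = cost_per_million_tokens(300, 820)
    large = cost_per_million_tokens(900, 1450)
    print(f"5060 Ti: ~£{small:.3f} per million tokens")
    print(f"5090:    ~£{large:.3f} per million tokens")
    print(f"5090 costs {large / small:.2f}x as much per token")
```

On these numbers the smaller card is cheaper per token for models that fit on both, which is why the case for the 5090 rests on the other conditions listed above rather than on raw throughput.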