The RTX 3090 (Ampere, 24 GB) and the RTX 5060 Ti 16GB (Blackwell) are both popular picks on our hosting. Here is the full comparison:
Specs
| Spec | RTX 5060 Ti 16GB | RTX 3090 24GB |
|---|---|---|
| Arch | Blackwell GB206 | Ampere GA102 |
| CUDA cores | 4,608 | 10,496 |
| VRAM | 16 GB GDDR7 | 24 GB GDDR6X |
| Bandwidth | 448 GB/s | 936 GB/s |
| FP8 tensor cores | 5th gen, native | None (emulated) |
| TDP | 180 W | 350 W |
| PCIe | Gen 5 x8 | Gen 4 x16 |
LLM Decode (Llama 3.1 8B, batch 1)
| Precision | 5060 Ti t/s | 3090 t/s | Winner |
|---|---|---|---|
| FP16 | N/A (OOM) | 78 | 3090 fits |
| FP8 | 112 | 65 (emulated) | 5060 Ti +72% |
| AWQ INT4 | 135 | 150 | 3090 +11% |
| GGUF Q4 | 95 | 110 | 3090 +16% |
The 3090 has more than double the raw bandwidth (936 vs 448 GB/s), so at INT4, where decode is memory-bound, it wins on pure throughput. At FP8 the 5060 Ti's native tensor cores decisively beat the 3090's emulated path.
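The bandwidth argument can be sketched as a roofline-style estimate: at batch 1, every generated token streams the full weight set from VRAM, so memory bandwidth sets a hard ceiling on tokens/sec. The bandwidth figures below come from the specs table; the 8B parameter count and effective bytes-per-weight at INT4 are illustrative assumptions.

```python
# Upper bound on batch-1 decode throughput from memory bandwidth alone.
# Assumption: each token reads the full weight set once; no compute limit.

def decode_ceiling_tps(bandwidth_gbs: float, params_b: float,
                       bytes_per_weight: float) -> float:
    """Bandwidth-bound ceiling on tokens/sec for single-stream decode."""
    weight_gb = params_b * bytes_per_weight  # GB streamed per token
    return bandwidth_gbs / weight_gb

# Llama 3.1 8B at INT4 (~0.56 effective bytes/weight incl. scales, assumed):
for name, bw in [("RTX 5060 Ti 16GB", 448), ("RTX 3090", 936)]:
    print(f"{name}: ~{decode_ceiling_tps(bw, 8, 0.56):.0f} t/s ceiling")
```

The measured numbers land well under these ceilings (kernels are never perfectly bandwidth-efficient), but the ratio between the two cards tracks the bandwidth gap, which is why the 3090 keeps its INT4 lead.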
FP8 Is the Game-Changer
FP8 serving on Blackwell is a different regime:
- 5060 Ti aggregate at batch 32, FP8: 720 t/s
- 5060 Ti aggregate at batch 32, AWQ INT4: 620 t/s
- 3090 aggregate at batch 32, AWQ INT4: 950 t/s
The 3090 still wins on aggregate throughput thanks to its bandwidth, but the 5060 Ti does it at roughly half the power.
VRAM Implications
- The 3090's 24 GB serves FP16 7–8B models or INT4 Mixtral 8x7B – neither fits on the 5060 Ti
- The 5060 Ti caps out around 14B AWQ and can't hold Mixtral without CPU offload
- For FP8-era serving, the 5060 Ti's 16 GB is enough for the vast majority of mainstream models
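A back-of-envelope fit check makes these VRAM limits concrete: weights plus KV cache plus runtime overhead must fit in the card. The layer/head figures below are assumptions loosely based on an 8B-class GQA model, and the overhead constant is a guess; real fits depend on the serving framework and fragmentation.

```python
# Rough VRAM fit check: weights + KV cache + fixed overhead vs. card VRAM.
# Model shape (layers, KV heads, head dim) and overhead are illustrative
# assumptions for an 8B-class GQA model, not exact figures.

def fits(vram_gb: float, params_b: float, bytes_per_weight: float,
         ctx_tokens: int = 8192, layers: int = 32, kv_heads: int = 8,
         head_dim: int = 128, kv_bytes: int = 2,
         overhead_gb: float = 1.5) -> bool:
    weights_gb = params_b * bytes_per_weight
    # K and V per token per layer, across all KV heads
    kv_gb = ctx_tokens * layers * 2 * kv_heads * head_dim * kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb <= vram_gb

print(fits(16, 8, 2))  # FP16 8B on 16 GB -> False (matches the OOM row)
print(fits(24, 8, 2))  # FP16 8B on 24 GB -> True
print(fits(16, 8, 1))  # FP8 8B on 16 GB  -> True
```

The same arithmetic shows why FP8 changes the picture: halving bytes-per-weight frees roughly 8 GB on an 8B model, which is why 16 GB suddenly covers most mainstream deployments.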
Verdict
- Pick 5060 Ti: FP8 serving, tokens/watt, lower TDP, new driver/CUDA support, brand-new hardware warranty
- Pick 3090: need 24 GB VRAM for larger models, running INT4 workloads at peak throughput, secondhand pricing
Blackwell Efficiency vs Ampere Bandwidth
Compare both cards on our UK-based GPU hosting.
Order the RTX 5060 Ti 16GB. See also: 5060 Ti or 3090 decision, vs 4060, vs 5080, vs 5060 8GB, tokens/watt.