The RTX 3090 and RTX 5090 represent two eras of NVIDIA consumer GPUs — and both are offered as dedicated GPU hosting at GigaGPU. The 3090 (24GB GDDR6X, Ampere) remains the cost-efficiency champion for LLM inference. The 5090 (32GB GDDR7, Blackwell) is the new flagship, with FP8 tensor cores and nearly double the memory bandwidth. Which one should you deploy?
## Specs Comparison
| Spec | RTX 3090 | RTX 5090 |
|---|---|---|
| Architecture | Ampere (GA102) | Blackwell (GB202) |
| VRAM | 24 GB GDDR6X | 32 GB GDDR7 |
| Memory Bandwidth | 936 GB/s | 1,792 GB/s |
| CUDA Cores | 10,496 | 21,760 |
| Tensor Cores | 328 (3rd gen) | 680 (5th gen) |
| FP8 support | No | Yes (native) |
| TDP | 350W | 575W |
The 5090’s advantages: nearly 2x memory bandwidth, 2x CUDA cores, native FP8, and 33% more VRAM. The 3090’s advantages: lower power, cheaper monthly hosting, and mature framework support.
## LLM Inference Performance
Benchmarked with vLLM serving open-source LLMs:
| Model | RTX 3090 (tok/s) | RTX 5090 (tok/s) | Speedup |
|---|---|---|---|
| LLaMA 3 8B (FP16) | 62 | 100 | 1.61x |
| Mistral 7B (FP16) | 45 | 82 | 1.82x |
| DeepSeek 7B (FP16) | 40 | 74 | 1.85x |
| LLaMA 3 13B (GPTQ 4-bit) | 28 | 51 | 1.82x |
See our tokens per second benchmark for the complete dataset.
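As a sanity check, the speedup column and per-request generation time follow directly from the throughput figures above. A minimal sketch (the tok/s numbers are copied from the table, not re-measured; the 512-token request length is an illustrative assumption):

```python
# Throughput (tokens/sec) per GPU, copied from the benchmark table above.
BENCH = {
    "LLaMA 3 8B (FP16)": (62, 100),
    "Mistral 7B (FP16)": (45, 82),
    "DeepSeek 7B (FP16)": (40, 74),
    "LLaMA 3 13B (GPTQ 4-bit)": (28, 51),
}

def speedup(tok_s_3090: float, tok_s_5090: float) -> float:
    """5090 throughput relative to the 3090."""
    return tok_s_5090 / tok_s_3090

def gen_seconds(tok_s: float, n_tokens: int = 512) -> float:
    """Wall-clock seconds to generate n_tokens at a steady tok/s rate."""
    return n_tokens / tok_s

for model, (t3090, t5090) in BENCH.items():
    print(f"{model}: {speedup(t3090, t5090):.2f}x, "
          f"512 tok in {gen_seconds(t3090):.1f}s vs {gen_seconds(t5090):.1f}s")
```

Note that steady-state tok/s understates the interactive difference: time-to-first-token also improves on the 5090, which matters for streaming APIs.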
## Stable Diffusion & Image Generation
For image generation workloads:
| Model | RTX 3090 (it/s) | RTX 5090 (it/s) |
|---|---|---|
| SDXL 1024×1024 | 3.2 | 6.8 |
| Flux.1 Dev 1024×1024 | 1.4 | 3.1 |
The 5090 is roughly 2x faster on image workloads — the Blackwell tensor cores handle attention-heavy diffusion models very well. See our best GPU for Stable Diffusion guide for more.
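To translate it/s into per-image latency, divide sampler steps by iteration rate. A quick sketch (the 30-step count is an assumption, a common SDXL sampler setting rather than part of the benchmark):

```python
# it/s figures copied from the table above; 30 steps/image is an
# assumed sampler setting, not part of the original benchmark.
STEPS = 30

IMG_BENCH = {
    "SDXL 1024x1024": (3.2, 6.8),
    "Flux.1 Dev 1024x1024": (1.4, 3.1),
}

def seconds_per_image(it_per_s: float, steps: int = STEPS) -> float:
    """Approximate wall-clock time for one image: steps / iterations-per-second."""
    return steps / it_per_s

for model, (r3090, r5090) in IMG_BENCH.items():
    print(f"{model}: {seconds_per_image(r3090):.1f}s on 3090, "
          f"{seconds_per_image(r5090):.1f}s on 5090")
```

At 30 steps, SDXL works out to roughly 9.4s per image on the 3090 versus 4.4s on the 5090.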
## Deploy an RTX 3090 or RTX 5090 Server
Both are available on GigaGPU: full root access, NVMe storage, and 1 Gbps connectivity in a UK datacenter.
Browse GPU Servers

## Cost per Token Analysis
The RTX 3090 delivers roughly 55-60% of the 5090's throughput at a significantly lower monthly cost. For batch inference and non-latency-critical workloads, the 3090 wins on cost per token. For real-time APIs where latency matters, the 5090's throughput advantage justifies the premium. Use our LLM cost calculator to model your specific workload.
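The cost-per-token comparison can be modelled in a few lines. A sketch using the LLaMA 3 8B throughput from the benchmark table; the monthly prices and 50% utilisation figure are placeholder assumptions for illustration only, not GigaGPU pricing:

```python
# Monthly prices below are PLACEHOLDERS for illustration -- check the
# hosting provider's pricing page for real figures.
SECONDS_PER_MONTH = 30 * 24 * 3600

def cost_per_million_tokens(monthly_price: float, tok_s: float,
                            utilisation: float = 0.5) -> float:
    """Monthly price divided by millions of tokens generated at the
    given average utilisation."""
    tokens = tok_s * SECONDS_PER_MONTH * utilisation
    return monthly_price / (tokens / 1_000_000)

# Assumed $300/mo for a 3090 at 62 tok/s, $600/mo for a 5090 at 100 tok/s
# (throughput from the LLaMA 3 8B row above; prices are hypothetical).
print(f"3090: ${cost_per_million_tokens(300, 62):.2f} per 1M tokens")
print(f"5090: ${cost_per_million_tokens(600, 100):.2f} per 1M tokens")
```

Under these placeholder numbers the 3090 comes out cheaper per token, which is the pattern the analysis above describes; the crossover point depends entirely on the actual price ratio and your utilisation.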
## Which Should You Choose?
**Pick the RTX 3090 if:**
- You need 24GB VRAM at the best price
- Your workload is batch or async (latency isn’t critical)
- You’re optimising cost per 1M tokens — see cost-per-token breakdowns
**Pick the RTX 5090 if:**
- You need 32GB VRAM for larger models or bigger batches
- You’re serving real-time APIs where time-to-first-token matters
- You want FP8 support for next-gen quantisation (see FP16 vs FP8 guide)
- Image generation is a major workload
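A quick way to check whether the VRAM difference matters for your model: weight memory is roughly parameter count times bytes per parameter, plus headroom for KV cache and activations. A rough sketch (the 20% overhead factor is a crude assumption; real usage depends on context length, batch size, and framework):

```python
# Rough VRAM fit check. weights = params x bytes-per-param; the 1.2x
# overhead for KV cache/activations is an assumed rule of thumb.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def fits(params_b: float, dtype: str, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """True if a params_b-billion-parameter model plausibly fits in vram_gb."""
    weights_gb = params_b * BYTES_PER_PARAM[dtype]
    return weights_gb * overhead <= vram_gb

# A 13B model in FP16 (~26 GB of weights) overflows 24 GB but fits in 32 GB:
print(fits(13, "fp16", 24))  # False
print(fits(13, "fp16", 32))  # True
```

This is also where the 5090's native FP8 helps: halving bytes per parameter roughly doubles the model size that fits in the same VRAM.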
For workloads beyond either, consider multi-GPU clusters or the 96GB RTX 6000 Pro.