The RTX 4090 (Ada Lovelace, AD102 die) launched in 2022 and remained the consumer flagship until the RTX 5090 arrived in early 2025. For AI inference it’s still relevant: 24 GB of GDDR6X, strong FP16 throughput, and now meaningfully cheaper than the Blackwell flagship. This page is the consolidated AI-buyer’s reference.
RTX 4090 = 24 GB GDDR6X, 16,384 CUDA cores, 1,008 GB/s memory bandwidth, ~83 TFLOPS FP16. No native FP8 hardware (uses software emulation). Still excellent for FP16 LLM serving up to 13B; weaker than 5090 on FP8 paths. We host it at £289/mo.
Full spec sheet
| Spec | RTX 4090 |
|---|---|
| Architecture | Ada Lovelace (AD102) |
| Process | TSMC 4N (custom 5nm) |
| CUDA cores | 16,384 |
| Tensor cores | 512 (4th gen) |
| RT cores | 128 (3rd gen) |
| Base / boost clock | 2,235 / 2,520 MHz |
| VRAM | 24 GB GDDR6X |
| Memory bus | 384-bit |
| Memory bandwidth | 1,008 GB/s |
| L2 cache | 72 MB |
| FP32 compute | ~82.6 TFLOPS |
| FP16 compute (Tensor) | ~165 TFLOPS dense / 330 sparse |
| BF16 | ~165 TFLOPS dense |
| FP8 | Software path only (~165 TFLOPS via FP16 emulation) |
| INT8 (Tensor) | ~660 TOPS dense |
| TDP | 450 W |
| PCIe | Gen 4 x16 |
| Power connector | 12VHPWR (16-pin) |
| Launch year | 2022 |
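The bandwidth figure in the table above sets a hard ceiling on single-stream decode speed: every weight byte must be read once per generated token, so tokens/s is bounded by bandwidth divided by model size. A back-of-envelope sketch (not a benchmark — it ignores KV-cache reads and kernel overheads):

```python
# Rough, bandwidth-bound decode throughput ceiling. During single-stream
# decode, every weight byte is streamed from VRAM once per token, so
# tokens/s <= memory bandwidth / model size in bytes.

def decode_tokens_per_sec(bandwidth_gb_s: float, params_b: float,
                          bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed, ignoring KV-cache traffic."""
    model_gb = params_b * bytes_per_param
    return bandwidth_gb_s / model_gb

# RTX 4090: 1,008 GB/s. Llama 3 8B at FP16 (2 bytes/param) = 16 GB of weights.
print(round(decode_tokens_per_sec(1008, 8, 2)))  # ~63 tokens/s ceiling
```

Real serving throughput lands below this ceiling, but the ratio explains why the bandwidth row matters more than raw TFLOPS for decode-heavy workloads.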
What matters for AI workloads
- 24 GB VRAM — fits Llama 3 8B FP16 + KV cache, Qwen 2.5 14B with quantisation, Llama 3 70B INT3 (tight). The single most important number.
- 1,008 GB/s memory bandwidth — strong. Higher than 3090 (936) but lower than 5090 (1,792).
- 165 TFLOPS FP16 — solid. Matters for prefill latency on long prompts.
- No native FP8 — the big architecture limitation in 2026. Models that have shipped FP8 quantised checkpoints (Llama 3, Mistral, Qwen, FLUX.1) get a 1.5–2× speedup on Blackwell that you don’t get on Ada.
- 4th gen tensor cores — fine for mixed-precision training, no FP8 acceleration.
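The 24 GB sizing calls in the bullets above reduce to simple arithmetic: weights plus KV cache must fit in VRAM. A minimal sketch — the layer counts, KV-head counts, and head dimensions below are illustrative config values, not guaranteed to match every checkpoint:

```python
# Sketch of the VRAM-fit arithmetic: weights + KV cache <= VRAM.
# Model configs below are assumptions for illustration.

def fits_in_vram(params_b: float, bytes_per_param: float,
                 n_layers: int, kv_heads: int, head_dim: int,
                 context: int, kv_bytes: int = 2, vram_gb: float = 24.0) -> bool:
    weights_gb = params_b * bytes_per_param
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes
    kv_gb = 2 * n_layers * kv_heads * head_dim * context * kv_bytes / 1e9
    return weights_gb + kv_gb <= vram_gb

# Llama 3 8B FP16 (32 layers, 8 GQA KV heads, head_dim 128) at 8k context:
# ~16 GB weights + ~1 GB KV cache — fits.
print(fits_in_vram(8, 2, 32, 8, 128, 8192))   # True

# Qwen 2.5 14B at FP16 would need ~28 GB of weights alone — doesn't fit,
# which is why the bullet above says "with quantisation".
print(fits_in_vram(14, 2, 48, 8, 128, 8192))  # False
```

This ignores activation memory and framework overhead (typically another 1–2 GB), so treat a near-miss as a miss.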
RTX 4090 vs RTX 5090 — spec deltas
| Spec | RTX 4090 | RTX 5090 | Delta |
|---|---|---|---|
| VRAM | 24 GB GDDR6X | 32 GB GDDR7 | +33% |
| Memory bandwidth | 1,008 GB/s | 1,792 GB/s | +78% |
| CUDA cores | 16,384 | 21,760 | +33% |
| FP16 TFLOPS (Tensor, dense) | ~165 | ~210 | +27% |
| FP8 hardware | No | Yes (~838 TOPS) | ∞ |
| FP4 hardware | No | Yes (~1,676 TOPS) | ∞ |
| TDP | 450 W | 575 W | +28% |
| Monthly (GigaGPU) | £289 | £399 | +38% |
The 5090 is meaningfully more capable but not dramatically so on workloads the 4090 already handles. The FP8 path is the actual generational gap.
RTX 4090 vs RTX 3090 — spec deltas
| Spec | RTX 3090 | RTX 4090 | Delta |
|---|---|---|---|
| Architecture | Ampere | Ada Lovelace | +1 gen |
| VRAM | 24 GB GDDR6X | 24 GB GDDR6X | Same |
| Memory bandwidth | 936 GB/s | 1,008 GB/s | +8% |
| CUDA cores | 10,496 | 16,384 | +56% |
| FP16 TFLOPS (shader, non-tensor) | ~36 | ~83 | +131% |
| Monthly (GigaGPU) | £159 | £289 | +82% |
The 4090 is roughly 2.3× faster on FP16 with the same VRAM at 1.82× the cost — better cost-per-throughput than the 3090 if FP16 throughput is your bottleneck.
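The cost-per-throughput claim is just a price-per-TFLOP comparison; as arithmetic (prices are the GigaGPU monthly rates quoted on this page, TFLOPS the non-tensor FP16 figures):

```python
# Price-per-FP16-TFLOP comparison using this page's monthly rates
# and non-tensor FP16 throughput figures.

cards = {"RTX 3090": (159, 36.0), "RTX 4090": (289, 83.0)}  # (£/mo, FP16 TFLOPS)

for name, (price, tflops) in cards.items():
    print(f"{name}: £{price / tflops:.2f} per FP16 TFLOP per month")
# 3090 ≈ £4.42/TFLOP/mo, 4090 ≈ £3.48/TFLOP/mo
```

So despite the higher sticker price, the 4090 delivers each unit of FP16 throughput about 21% cheaper — but only if you can actually saturate it.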
Verdict — when to pick the 4090
- You don’t need FP8 and the 5090’s price premium isn’t worth the speed delta.
- Your workload fits comfortably in 24 GB — Code Llama 13B at FP16, Qwen 2.5 14B at INT4, Mixtral 8x7B at INT4 (tight).
- You want 24 GB at the cheapest Ada price — a solid choice for image generation, including FLUX.1 via the software FP8 path.
- Stock availability of 5090 is a problem — 4090 is more available right now.
Bottom line
The RTX 4090 remains a credible 2026 AI GPU at £289/mo. Pick it when 24 GB is enough and FP8 isn't critical. For FP8-aware workloads (most modern LLMs ship FP8 checkpoints now), the 5090 is meaningfully better. For sizing across the catalogue see best GPU for LLM inference.