NVIDIA’s GeForce flagship and datacenter cards now overlap on raw FP16 TFLOPS in ways that weren’t true a few years ago. The RTX 4090’s ~165 TFLOPS dense FP16 lands in the same neighbourhood as the A100 SXM4 (312 TFLOPS) and well above older datacenter parts. This page maps where it sits.
RTX 4090 = ~165 TFLOPS FP16 dense, ~330 with sparsity. That is roughly 53% of A100 SXM4 at under 5% of the price. For inference workloads the gap shrinks further in practice, since real-world throughput depends on memory bandwidth and software as much as peak TFLOPS. For training, the A100 still wins decisively.
## The TFLOPS class
| GPU | FP16 TFLOPS (dense) | FP16 TFLOPS (sparse) | FP8 TFLOPS (sparse) | Mem BW (GB/s) |
|---|---|---|---|---|
| RTX 3090 | ~36 | ~71 | No native | 936 |
| RTX 4090 | ~165 | ~330 | ~661 | 1,008 |
| RTX 5080 | ~75 | ~150 | ~600 | 960 |
| RTX 5090 | ~210 | ~420 | ~838 | 1,792 |
| RTX 6000 Pro | ~234 | ~468 | ~936 | 1,792 |
| A100 80 GB SXM4 | ~312 | ~624 | No native | 2,039 |
| H100 80 GB SXM5 | ~989 | ~1,979 | ~3,958 | 3,350 |
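To make the relative positioning concrete, the dense FP16 column can be collapsed into a ratio against the A100 SXM4. A quick sketch using the approximate table values (`FP16_DENSE_TFLOPS` and `fraction_of_a100` are just names for this sketch, not any real API):

```python
# Dense FP16 tensor TFLOPS from the table above (approximate vendor figures).
FP16_DENSE_TFLOPS = {
    "RTX 3090": 36, "RTX 4090": 165, "RTX 5080": 75,
    "RTX 5090": 210, "RTX 6000 Pro": 234,
    "A100 SXM4": 312, "H100 SXM5": 989,
}

def fraction_of_a100(gpu: str) -> float:
    """Dense FP16 throughput relative to A100 SXM4."""
    return FP16_DENSE_TFLOPS[gpu] / FP16_DENSE_TFLOPS["A100 SXM4"]

for gpu in FP16_DENSE_TFLOPS:
    print(f"{gpu:14s} {fraction_of_a100(gpu):5.0%}")
```

The 4090 comes out at ~53% of the A100 on paper; the benchmark section below shows why the practical gap is often smaller.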
## Where the 4090 sits
The 4090 sits in the "upper-mid" tier of NVIDIA’s lineup for FP16 inference: not as fast as current Hopper or Blackwell datacenter cards, but well ahead of older datacenter SKUs (V100, T4) and competitive with the A100 PCIe.
## Real-world benchmarks vs theoretical
Theoretical TFLOPS rarely match real-world tokens per second. Memory bandwidth, kernel maturity, and software stack all matter:
| Workload | RTX 4090 (real) | A100 80 GB (real) | 4090 % of A100 |
|---|---|---|---|
| Mistral 7B FP16 aggregate | 950 tok/s | 1,310 tok/s | 73% |
| Llama 3.1 8B FP16 aggregate | 910 tok/s | 1,200 tok/s | 76% |
| SDXL 1024² (s/image; lower is better) | 8 s | 9 s | 112% |
| BF16 fine-tuning (8B model) | ~12 hours/epoch | ~6 hours/epoch | 50% |
| Training (50B token corpus) | 24 hours | 11 hours | 46% |
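One way to see why real-world numbers diverge from peak TFLOPS: single-stream LLM decoding is usually memory-bandwidth-bound, so a crude upper bound on tokens per second is bandwidth divided by the bytes of weights streamed per token. A back-of-envelope sketch (the `decode_tokens_per_s` formula is an illustrative assumption; batching, KV-cache traffic, and kernel efficiency all move real numbers):

```python
def decode_tokens_per_s(bandwidth_gbps: float, params_b: float,
                        bytes_per_weight: int = 2) -> float:
    """Rough upper bound for bandwidth-bound single-stream decoding:
    each generated token streams all model weights from VRAM once."""
    model_bytes = params_b * 1e9 * bytes_per_weight  # FP16 = 2 bytes/weight
    return bandwidth_gbps * 1e9 / model_bytes

# 7B-parameter model in FP16 (~14 GB of weights):
print(round(decode_tokens_per_s(1008, 7)))  # RTX 4090 -> 72
print(round(decode_tokens_per_s(2039, 7)))  # A100 80 GB -> 146
```

By this bound the single-stream ratio tracks the bandwidth ratio (~49%); the aggregate rows above are batched, which shifts work toward compute and narrows the gap to 70–80%.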
For inference, the 4090 delivers 70–80% of A100 throughput at less than 5% of the A100’s effective monthly cost on AWS. For training, the A100 still wins, because the 4090’s lack of NVLink and ECC matters for multi-week jobs.
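The cost argument can be framed as tokens per dollar. The hourly rates below are placeholder assumptions purely for illustration (substitute current pricing before drawing conclusions); only the throughput figures come from the benchmark table above:

```python
# Hypothetical hourly rates -- illustrative placeholders, NOT real quotes.
HOURLY_USD = {"RTX 4090 (rented)": 0.40, "A100 80 GB (cloud)": 4.00}
# Mistral 7B FP16 aggregate throughput, from the benchmark table.
REAL_TOKS_PER_S = {"RTX 4090 (rented)": 950, "A100 80 GB (cloud)": 1310}

for gpu in HOURLY_USD:
    tokens_per_dollar = REAL_TOKS_PER_S[gpu] * 3600 / HOURLY_USD[gpu]
    print(f"{gpu}: {tokens_per_dollar:,.0f} tokens/$")
```

Under these placeholder rates the 4090 generates several times more tokens per dollar, which is the whole argument for it as an inference card.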
## Verdict
The RTX 4090’s TFLOPS class is "A100-light" for inference and "solid mid-tier" for training. Worth the price for inference; not worth it for training serious models. For training, look at multi-GPU clusters or A100.
## Bottom line
The 4090 punches well above its weight for inference. Treat it as a budget A100 for FP16 chatbot work; treat it as obsolete for training. For broader tier comparison see RTX 5090 vs RTX 3090.