
RTX 4090 24 GB TFLOPS Benchmark Class: Where It Sits in the AI Hierarchy

The RTX 4090 competes in roughly the same FP16 TFLOPS class as datacenter A100 cards. Here is the precise benchmark hierarchy and what it means in practice.

NVIDIA’s GeForce flagship and datacenter cards now overlap on raw FP16 TFLOPS in ways that weren’t true a few years ago. The RTX 4090’s ~165 TFLOPS FP16 sits at roughly half the A100 SXM4’s 312 TFLOPS and well above older datacenter parts. This page maps the hierarchy precisely.

TL;DR

RTX 4090 = ~165 TFLOPS FP16 dense, ~330 with sparsity. That is roughly 53% of an A100 SXM4 at under 5% of the price. For inference workloads the gap shrinks further (memory bandwidth catches up). For training, the A100 still wins decisively.

The TFLOPS class

| GPU | FP16 TFLOPS (dense) | FP16 TFLOPS (sparse) | FP8 TOPS | Mem BW (GB/s) |
|---|---|---|---|---|
| RTX 3090 | ~36 | ~71 | No native | 936 |
| RTX 4090 | ~165 | ~330 | No native (sw) | 1,008 |
| RTX 5080 | ~75 | ~150 | ~600 | 960 |
| RTX 5090 | ~210 | ~420 | ~838 | 1,792 |
| RTX 6000 Pro | ~234 | ~468 | ~936 | 1,792 |
| A100 80 GB SXM4 | ~312 | ~624 | No native | 2,039 |
| H100 80 GB SXM5 | ~989 | ~1,979 | ~3,958 | 3,350 |
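
If you want to sanity-check where your own card lands in the dense FP16 column, a large half-precision GEMM gets close to the tensor-core ceiling. A minimal PyTorch sketch (the matrix size and iteration count are arbitrary choices; achieved throughput depends on clocks, thermals, and which cuBLAS kernel gets selected, so treat the output as indicative rather than a spec figure):

```python
import time
import torch

# Large square FP16 GEMM; N = 8192 is an arbitrary size that keeps tensor cores busy.
N = 8192
a = torch.randn(N, N, device="cuda", dtype=torch.float16)
b = torch.randn(N, N, device="cuda", dtype=torch.float16)

# Warm-up so cuBLAS picks its kernel before timing starts.
for _ in range(3):
    torch.matmul(a, b)
torch.cuda.synchronize()

iters = 20
start = time.time()
for _ in range(iters):
    torch.matmul(a, b)
torch.cuda.synchronize()
elapsed = time.time() - start

flops = 2 * N**3 * iters  # a square GEMM costs roughly 2 * N^3 FLOPs
print(f"~{flops / elapsed / 1e12:.0f} TFLOPS FP16 (dense GEMM)")
```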

Where the 4090 sits

The 4090 is in the "upper-mid" tier of NVIDIA’s hardware lineup for FP16 inference. Not as fast as the latest Blackwell or Hopper datacenter cards, but considerably faster than older datacenter SKUs (V100, T4) and competitive with A100-PCIe.

Real-world benchmarks vs theoretical

Theoretical TFLOPS rarely match real-world tokens per second. Memory bandwidth, kernel maturity, and software stack all matter:

| Workload | RTX 4090 (real) | A100 80 GB (real) | 4090 % of A100 |
|---|---|---|---|
| Mistral 7B FP16, aggregate | 950 tok/s | 1,310 tok/s | 73% |
| Llama 3.1 8B FP16, aggregate | 910 tok/s | 1,200 tok/s | 76% |
| SDXL 1024² (s/image) | 8 s | 9 s | 112% |
| BF16 fine-tuning (8B model) | ~12 hours/epoch | ~6 hours/epoch | 50% |
| Training (50B token corpus) | 24 hours | 11 hours | 46% |
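
The inference rows above are aggregate serving throughput with batching. For a rough single-stream sanity check on your own hardware, a plain Hugging Face generate loop is enough to see the shape of the gap; the sketch below uses a placeholder model ID and token count, and batch-size-1 decoding will land well below the aggregate numbers in the table:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint; swap in your own

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Explain memory bandwidth in one paragraph.", return_tensors="pt").to("cuda")

# Warm-up pass so CUDA kernels are compiled/selected before timing.
model.generate(**inputs, max_new_tokens=32)

torch.cuda.synchronize()
start = time.time()
out = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s (single request, batch size 1)")
```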

For inference, the 4090 delivers 70–80% of A100 throughput at less than 5% of the A100’s effective monthly cost on AWS. For training, the A100 still wins because the 4090’s lack of NVLink and ECC matters for multi-week jobs.
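
To turn throughput and price into a comparable cost-per-token figure, the arithmetic is short. The hourly rates below are illustrative assumptions, not quotes (and not the numbers behind the <5% figure above); the throughput values are the Mistral 7B aggregates from the table:

```python
# Illustrative assumptions only: substitute your provider's actual hourly rates.
gpus = {
    "RTX 4090 (dedicated)": {"usd_per_hour": 0.20, "tok_per_s": 950},
    "A100 80 GB (cloud)":   {"usd_per_hour": 4.10, "tok_per_s": 1310},
}

for name, g in gpus.items():
    tokens_per_hour = g["tok_per_s"] * 3600
    usd_per_million_tokens = g["usd_per_hour"] / tokens_per_hour * 1_000_000
    print(f"{name}: ${usd_per_million_tokens:.3f} per million tokens")
```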

Verdict

The RTX 4090’s TFLOPS class is "A100-light" for inference and solid mid-tier for training. It is worth the price for inference; for training serious models, look instead at an A100 or a multi-GPU cluster.

Bottom line

The 4090 punches well above its weight for inference. Treat it as a budget A100 for FP16 chatbot work; treat it as obsolete for training. For a broader tier comparison, see RTX 5090 vs RTX 3090.
