RTX 4090 Software FP8 vs RTX 5090 Hardware FP8: Real Difference

vLLM can software-emulate FP8 on the RTX 4090 (Ada Lovelace). It works, but it is much slower than Blackwell's native hardware FP8. Here are the numbers.

TL;DR

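4090 software FP8 runs at roughly the same speed as 4090 FP16: no throughput gain, just memory savings. 5090 hardware FP8 is ~1.5× faster than FP16, and in the benchmark below it delivers about 2× the tokens per second of the 4090 in either mode. For FP8-shaped workloads, the 5090 is the right card.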

Software vs hardware FP8

  • Hardware FP8 (Blackwell): dedicated tensor-core path, ~2× FP16 throughput
  • Software FP8 (Ada): cast to FP16 internally, runs at FP16 speed but uses FP8 memory
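The "store in FP8, compute in a wider type" idea can be illustrated with a toy sketch. This is not vLLM's kernel. It assumes the E4M3 layout (4 exponent bits, 3 mantissa bits, maximum finite value 448), and the name `quantize_e4m3` is hypothetical. Python floats stand in for the FP16 compute type.

```python
import math

E4M3_MAX = 448.0      # largest finite E4M3 value
E4M3_MIN_EXP = -6     # exponent of the smallest normal value (2**-6)
MANTISSA_BITS = 3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    # Clamping the exponent makes small values fall onto the subnormal grid.
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing between neighbouring values
    return sign * min(round(mag / step) * step, E4M3_MAX)

# Weights are kept at FP8 precision (1 byte each on a real GPU).
weights = [0.3, -1.7, 0.05]
stored = [quantize_e4m3(w) for w in weights]

# "Software FP8": upcast the stored values, then multiply-accumulate in the
# wider type. The arithmetic costs the same as FP16; only storage shrinks.
activations = [1.0, 2.0, 4.0]
dot = sum(w * a for w, a in zip(stored, activations))

print(stored)  # [0.3125, -1.75, 0.05078125]
print(dot)     # -2.984375
```

The rounding error (0.3 becomes 0.3125) is the precision cost of FP8 storage. The multiply-accumulate itself is unchanged, which is why this path saves memory but not time.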

Benchmarks

Workload                     4090 FP16   4090 sw FP8   5090 hw FP8
Mistral 7B aggregate tok/s   950         960           1,920
Memory pressure              14 GB       7 GB          7 GB
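The figures in the table reduce to simple arithmetic, sketched below. The helper names are hypothetical. The memory estimate assumes Mistral 7B is rounded to 7 billion parameters and counts weights only, ignoring KV cache and activations.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: int) -> float:
    """Weight footprint in GB (10**9 bytes): billions of params x bytes each."""
    return params_billion * bytes_per_param

def speedup(tok_s: float, baseline_tok_s: float) -> float:
    """Throughput relative to a baseline configuration."""
    return tok_s / baseline_tok_s

PARAMS_B = 7.0  # assumption: Mistral 7B rounded to 7 billion parameters

print(weight_memory_gb(PARAMS_B, 2))  # FP16, 2 bytes per weight -> 14.0
print(weight_memory_gb(PARAMS_B, 1))  # FP8, 1 byte per weight   -> 7.0

# Aggregate tok/s taken from the benchmark table above.
results = {"4090 FP16": 950, "4090 sw FP8": 960, "5090 hw FP8": 1920}
baseline = results["4090 FP16"]
for name, tok_s in results.items():
    print(f"{name}: {speedup(tok_s, baseline):.2f}x")
# 4090 FP16: 1.00x
# 4090 sw FP8: 1.01x
# 5090 hw FP8: 2.02x
```

Read this way, software FP8 halves the weight footprint on the 4090 while leaving throughput within about 1% of FP16.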

Verdict

Software FP8 on Ada saves VRAM but not time. For throughput, you need Blackwell hardware FP8.

Bottom line

If you need FP8 throughput, get a 5090. If you need FP8 memory savings only, 4090 software FP8 works. See FP8 vs FP16 comparison.

gigagpu

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
