RTX 4090 Software FP8 vs RTX 5090 Hardware FP8: Real Difference

vLLM can software-emulate FP8 on the RTX 4090, but throughput stays at FP16 levels, well short of Blackwell's native FP8. Here are the numbers.

GPU Comparisons May 5, 2026 1 min read gigagpu

vLLM offers FP8 emulation on Ada Lovelace cards. It works but doesn't hit Blackwell's hardware speed.

TL;DR

4090 software FP8 runs at roughly the same speed as 4090 FP16: no throughput gain, just memory savings. 5090 hardware FP8 is ~1.5× faster than FP16, and in the benchmark below it delivers about twice the 4090's throughput. For throughput-bound FP8 workloads, the 5090 is the right card.

Software vs hardware FP8

Hardware FP8 (Blackwell): dedicated tensor-core path, up to ~2× FP16 throughput
Software FP8 (Ada): weights are stored as FP8 and cast to FP16 for compute, so it runs at FP16 speed with an FP8 memory footprint
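
To make the software path concrete, here is a minimal pure-Python sketch of the idea: weights live in memory as 1-byte FP8 codes and are upcast before any arithmetic happens. It assumes the E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities), which the article does not specify. The function names `decode_e4m3` and `emulated_dot` are hypothetical and are not vLLM APIs; real kernels do this on the GPU in bulk.

```python
import math

def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 code (0..255) into a Python float.

    Assumed layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # E4M3 reserves this pattern for NaN and has no infinities
    if exponent == 0:
        # Subnormal range: no implicit leading 1
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)

def emulated_dot(weight_codes: bytes, activations: list[float]) -> float:
    """Upcast each 1-byte weight, then multiply-accumulate in higher precision.

    Storage cost is 1 byte per weight, but the arithmetic runs at the
    wider precision, which is why emulation saves memory and not time.
    """
    return sum(decode_e4m3(w) * a for w, a in zip(weight_codes, activations))

# Three weights stored as FP8 codes: 1.0, 2.0 and -1.0
weights = bytes([0x38, 0x40, 0xB8])
print(emulated_dot(weights, [0.5, 0.25, 1.0]))
```

The upcast step is extra work on top of an FP16 multiply, so the best case for this approach is FP16 speed, which matches the benchmark row for 4090 software FP8.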

Benchmarks

Metric	4090 FP16	4090 sw FP8	5090 hw FP8
Mistral 7B aggregate throughput (tok/s)	950	960	1,920
Model weights in VRAM	14 GB	7 GB	7 GB
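
The table's figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes a nominal 7 billion parameters for Mistral 7B and counts weights only (no KV cache or activations); the helper names are hypothetical and the throughput values are copied from the table above.

```python
def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Approximate VRAM needed for model weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9

def speedup(tok_s: float, baseline_tok_s: float) -> float:
    """Throughput relative to a baseline configuration."""
    return tok_s / baseline_tok_s

PARAMS = 7e9  # assumption: nominal parameter count for a "7B" model

# Aggregate tok/s figures from the benchmark table
BASELINE_4090_FP16 = 950
RESULTS = {"4090 sw FP8": 960, "5090 hw FP8": 1920}

print(f"FP16 weights: {weight_memory_gb(PARAMS, 2):.0f} GB")
print(f"FP8 weights:  {weight_memory_gb(PARAMS, 1):.0f} GB")
for name, tok_s in RESULTS.items():
    print(f"{name}: {speedup(tok_s, BASELINE_4090_FP16):.2f}x vs 4090 FP16")
```

Halving bytes per weight halves the footprint on both cards, while only the 5090 column shows a throughput ratio meaningfully above 1.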

Verdict

Software FP8 on Ada saves VRAM but not time. For throughput, you need Blackwell hardware FP8.

Bottom line

If you need FP8 throughput, get a 5090. If you need FP8 memory savings only, 4090 software FP8 works. See FP8 vs FP16 comparison.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

gigagpu

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
