vLLM offers FP8 emulation on Ada Lovelace cards. It works but doesn't hit Blackwell's hardware speed.
4090 software FP8 runs at roughly the same speed as 4090 FP16: no throughput gain, just memory savings. 5090 hardware FP8 delivers roughly 2× the 4090's FP16 throughput in the benchmark below (1,920 vs 950 tok/s). For FP8-shaped workloads, the 5090 is the right card.
Software vs hardware FP8
- Hardware FP8 (Blackwell): dedicated tensor-core path, ~2× FP16 throughput
- Software FP8 (Ada): weights are stored in FP8 and cast to FP16 internally for compute, so it runs at FP16 speed with an FP8 memory footprint
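To make the "stored in FP8, computed in a wider type" idea concrete, here is a minimal pure-Python sketch of weight-only FP8 emulation. It is not vLLM's kernel. It assumes the E4M3 layout (3 mantissa bits, maximum value 448) and simple per-tensor scaling, neither of which is specified above, and Python floats stand in for FP16. The names `round_to_e4m3`, `quantize`, and `emulated_fp8_dot` are hypothetical.

```python
import math

# Assumed format: FP8 E4M3 (not stated in the article).
E4M3_MAX = 448.0        # largest finite E4M3 value
MIN_NORMAL_EXP = -6     # exponent of the smallest normal E4M3 value
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3, saturating at the max."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    _, exp = math.frexp(mag)               # mag = m * 2**exp with 0.5 <= m < 1
    exp = max(exp - 1, MIN_NORMAL_EXP)     # subnormals reuse the smallest exponent
    step = 2.0 ** (exp - MANTISSA_BITS)    # spacing between neighbours at this exponent
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize(weights):
    """Per-tensor scaling: map the largest weight onto E4M3_MAX, then round."""
    scale = (max(abs(w) for w in weights) / E4M3_MAX) or 1.0  # avoid a zero scale
    return [round_to_e4m3(w / scale) for w in weights], scale


def emulated_fp8_dot(q_weights, scale, activations):
    """Upcast the stored FP8 weights, then multiply-accumulate in the wider type."""
    upcast = [q * scale for q in q_weights]
    return sum(w * a for w, a in zip(upcast, activations))


weights = [0.3, -1.2, 0.05, 0.9]
activations = [1.0, 0.5, -2.0, 0.25]
q, s = quantize(weights)
print("emulated FP8:", emulated_fp8_dot(q, s, activations))
print("reference:   ", sum(w * a for w, a in zip(weights, activations)))
```

The arithmetic in `emulated_fp8_dot` is ordinary wide-precision math; only the storage is 8-bit. That is why a path like this can halve weight memory without changing throughput.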
Benchmarks
| Metric | 4090 FP16 | 4090 sw FP8 | 5090 hw FP8 |
|---|---|---|---|
| Mistral 7B aggregate throughput (tok/s) | 950 | 960 | 1,920 |
| Memory pressure (GB) | 14 | 7 | 7 |
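The table's figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes exactly 7e9 parameters (taking "7B" at face value) and counts weights only, ignoring KV cache and activations. The throughput numbers are copied from the table; the variable and function names are illustrative.

```python
# Assumption: "7B" means exactly 7e9 parameters; only weights are counted.
PARAMS = 7e9
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gb(params: float, dtype: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


# Aggregate tok/s taken from the benchmark table.
tok_per_s = {"4090 FP16": 950, "4090 sw FP8": 960, "5090 hw FP8": 1920}
baseline = tok_per_s["4090 FP16"]
speedup = {name: rate / baseline for name, rate in tok_per_s.items()}

print("FP16 weights:", weight_gb(PARAMS, "fp16"), "GB")
print("FP8 weights: ", weight_gb(PARAMS, "fp8"), "GB")
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x vs 4090 FP16")
```

Under these assumptions the memory row follows directly from bytes per parameter. The speedup ratios show software FP8 within about 1% of FP16 on the 4090, and hardware FP8 on the 5090 at about twice that baseline.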
Verdict
Software FP8 on Ada saves VRAM but not time. For throughput, you need Blackwell hardware FP8.
Bottom line
If you need FP8 throughput, get a 5090. If you need FP8 memory savings only, 4090 software FP8 works. See FP8 vs FP16 comparison.