Token generation speed on LLMs is almost entirely limited by memory bandwidth. Raw compute TFLOPS matter for prefill and training, but for the per-token decode path – the experience end-users actually perceive – your GPU’s memory bandwidth is the ceiling. This is why you can roughly predict tokens per second from the bandwidth spec alone. Below is every GPU on our dedicated hosting, ranked by memory bandwidth.
Why Bandwidth Dominates
On every decode step, an autoregressive model reads its entire weight set from VRAM, does the math, and emits one token. For a 7B FP16 model (2 bytes per parameter, so 14 GB of weights) that is 14 GB of memory read per token. If your GPU has 500 GB/s of bandwidth, the theoretical ceiling is roughly 500 / 14 ≈ 35.7 tokens/sec for a single request stream. Real-world throughput tops out 20-30% below that because of KV cache reads and overhead. Compute TFLOPS barely matter at decode because the math pipeline is rarely the bottleneck.
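The arithmetic above can be sketched in a few lines of Python. This is a rough estimator, not a benchmark: the function names are our own, and the 20-30% loss figures are simply the range quoted above, applied as flat multipliers.

```python
def decode_ceiling_tps(bandwidth_gbs: float, params_billions: float,
                       bytes_per_param: float = 2.0) -> float:
    """Theoretical single-stream decode ceiling in tokens/sec.

    Assumes every decode step reads all weights exactly once.
    bytes_per_param=2.0 corresponds to FP16.
    """
    weight_gb = params_billions * bytes_per_param  # 7B * 2 bytes = 14 GB
    return bandwidth_gbs / weight_gb


def realistic_range_tps(ceiling: float, low_loss: float = 0.20,
                        high_loss: float = 0.30) -> tuple[float, float]:
    """Apply the 20-30% real-world loss to a ceiling, returning (low, high)."""
    return ceiling * (1 - high_loss), ceiling * (1 - low_loss)


ceiling = decode_ceiling_tps(bandwidth_gbs=500, params_billions=7)
low, high = realistic_range_tps(ceiling)
print(f"ceiling {ceiling:.1f} tok/s, realistic {low:.1f}-{high:.1f} tok/s")
# prints: ceiling 35.7 tok/s, realistic 25.0-28.6 tok/s
```

Changing `bytes_per_param` gives a quick feel for how quantisation shifts the ceiling, since fewer bytes per token read means proportionally more tokens per second.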
Bandwidth Ranking
| GPU | Memory | Bandwidth |
|---|---|---|
| RTX 6000 Pro | 96 GB GDDR7 | ~1,800 GB/s |
| RTX 5090 | 32 GB GDDR7 | ~1,792 GB/s |
| RTX 5080 | 16 GB GDDR7 | ~960 GB/s |
| RTX 3090 | 24 GB GDDR6X | ~936 GB/s |
| R9700 | 32 GB GDDR6 | ~640 GB/s |
| AMD RX 9070 XT | 16 GB GDDR6 | ~640 GB/s |
| Intel Arc Pro B70 | 32 GB | ~560 GB/s |
| RTX 5060 | 8 GB GDDR7 | ~448 GB/s |
| RTX 4060 Ti | 16 GB GDDR6 | ~288 GB/s |
| RTX 4060 | 8 GB GDDR6 | ~272 GB/s |
| Ryzen AI Max+ 395 | 96 GB LPDDR5X | ~256 GB/s |
| RTX 3050 | 6 GB GDDR6 | ~224 GB/s |
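To turn the table into rough decode ceilings, the sketch below divides each card's bandwidth by the size of the weights and skips cards whose memory cannot hold them. The memory and bandwidth figures are copied from the table; the fit check is a deliberate simplification that ignores KV cache and runtime overhead, and the names `GPUS` and `estimate_ceilings` are ours.

```python
# (name, memory in GB, approximate bandwidth in GB/s), taken from the table above
GPUS = [
    ("RTX 6000 Pro", 96, 1800),
    ("RTX 5090", 32, 1792),
    ("RTX 5080", 16, 960),
    ("RTX 3090", 24, 936),
    ("R9700", 32, 640),
    ("AMD RX 9070 XT", 16, 640),
    ("Intel Arc Pro B70", 32, 560),
    ("RTX 5060", 8, 448),
    ("RTX 4060 Ti", 16, 288),
    ("RTX 4060", 8, 272),
    ("Ryzen AI Max+ 395", 96, 256),
    ("RTX 3050", 6, 224),
]


def estimate_ceilings(weight_gb: float):
    """Return (name, fits, ceiling_tps) per GPU, highest bandwidth first.

    Simplifying assumption: a model "fits" if its weights alone fit in
    memory. Real deployments also need room for KV cache and overhead.
    """
    rows = []
    for name, mem_gb, bw_gbs in sorted(GPUS, key=lambda g: g[2], reverse=True):
        fits = mem_gb >= weight_gb
        ceiling = bw_gbs / weight_gb if fits else None
        rows.append((name, fits, ceiling))
    return rows


# 7B model at FP16 = 14 GB of weights
for name, fits, ceiling in estimate_ceilings(14.0):
    status = f"~{ceiling:.0f} tok/s ceiling" if fits else "weights do not fit"
    print(f"{name:<20} {status}")
```

These are theoretical ceilings only; per the section above, expect real-world decode to land 20-30% lower.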
Bandwidth Per Pound
On fixed monthly pricing, the RTX 3090 remains the best bandwidth per pound once you need serious decode throughput. The 5090 has nearly double the 3090’s bandwidth (~1,792 vs ~936 GB/s) but costs meaningfully more. For decode-bound chat serving, consider whether the 5090 upgrade is worth it relative to simply running two 3090s. See 6000 Pro vs pair of 3090s.
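One way to frame the comparison without quoting prices is as a break-even ratio: a faster card wins on bandwidth per pound only if its price is a smaller multiple of the cheaper card's price than its bandwidth is. The sketch below assumes bandwidth is the only thing you are paying for, which ignores VRAM capacity and power; the function names are illustrative.

```python
def break_even_price_ratio(bw_new_gbs: float, bw_old_gbs: float) -> float:
    """Price multiple at which both cards deliver equal GB/s per pound."""
    if bw_new_gbs <= 0 or bw_old_gbs <= 0:
        raise ValueError("bandwidth must be positive")
    return bw_new_gbs / bw_old_gbs


def wins_on_bandwidth_per_pound(price_ratio: float, bw_new_gbs: float,
                                bw_old_gbs: float) -> bool:
    """True if the new card gives more GB/s per pound.

    price_ratio is (new card monthly price) / (old card monthly price).
    """
    return price_ratio < break_even_price_ratio(bw_new_gbs, bw_old_gbs)


# RTX 5090 (~1,792 GB/s) vs RTX 3090 (~936 GB/s), figures from the table
ratio = break_even_price_ratio(1792, 936)
print(f"5090 breaks even at {ratio:.2f}x the 3090's price")
# prints: 5090 breaks even at 1.91x the 3090's price
```

If the 5090 costs more than about 1.91 times a 3090 per month, the 3090 delivers more bandwidth per pound. Two 3090s only add up to roughly 1,872 GB/s of useful bandwidth when the serving setup actually spreads work across both cards, for example by running one model replica on each.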
Takeaways
First, the Ryzen AI Max+ 395’s huge 96 GB memory pool is bandwidth-starved at ~256 GB/s. It fits giant models but decodes them slowly. Second, the 3090’s ~936 GB/s is competitive with the 5080’s ~960 GB/s despite the 3090 being four years older. Third, the 6000 Pro and 5090 are nearly tied on bandwidth – the 6000 Pro’s advantage over the 5090 is capacity, not speed. When you see someone say “the 6000 Pro is much faster than the 5090,” check the workload: on decode of models that fit both, the two perform nearly identically.
For deeper analysis see tokens per watt and TDP and power draw.