Theoretical TFLOPS numbers are marketing. Sustained TFLOPS under real AI workloads are what you actually get on our dedicated GPU hosting. Here is how the RTX 5060 Ti 16GB performs across formats with both peak and sustained numbers.
Contents
- Peak tensor throughput
- Sustained AI workload
- CUDA general compute
- Which format when
- Versus 4060 Ti and 5080
Peak Tensor Throughput
| Format | Peak TFLOPS (TOPS for integer formats) | Notes |
|---|---|---|
| FP32 | ~12 | Rarely used for AI |
| BF16 dense | ~200 | Mixed precision training |
| FP16 dense | ~200 | Inference without FP8 |
| FP8 dense | ~400 | Best default for 2026 inference |
| INT8 dense | ~400 | AWQ/GPTQ quantised inference |
| INT4 dense | ~800 | Aggressive quantisation |
| FP16 sparse (2:4) | ~400 | Structured sparse models |
| FP8 sparse | ~800 | Future sparse + FP8 |
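The table follows a simple pattern: halving the numeric width roughly doubles throughput, and 2:4 structured sparsity doubles it again. A quick sanity check over the approximate values above:

```python
# Peak figures from the table above (TFLOPS; TOPS for integer formats).
peak = {
    "fp32": 12, "bf16": 200, "fp16": 200,
    "fp8": 400, "int8": 400, "int4": 800,
    "fp16_sparse": 400, "fp8_sparse": 800,
}

# Halving precision doubles the tensor-core rate...
assert peak["fp8"] == 2 * peak["fp16"]
assert peak["int4"] == 2 * peak["int8"]
# ...and 2:4 structured sparsity doubles it once more.
assert peak["fp16_sparse"] == 2 * peak["fp16"]
assert peak["fp8_sparse"] == 2 * peak["fp8"]
print("scaling relations hold")
```

Knowing this pattern makes it easy to estimate a format's ceiling without looking it up: start from the FP16 dense number and double per halving.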
Sustained AI Workload
Real AI workloads see 30-60% of peak tensor throughput because of:
- Memory bandwidth limits on decode
- Kernel launch overhead
- Attention pattern irregularities
- CPU-GPU sync stalls
For LLM inference, sustained FP8 throughput lands around 120-200 TFLOPS, plenty for 7-14B models at production speeds. For SDXL and FLUX image generation, sustained FP16 throughput is 80-120 TFLOPS.
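Two back-of-envelope calculations recover those sustained numbers. The efficiency range and the ~448 GB/s memory bandwidth figure below are assumptions for illustration, not measured values:

```python
# Sustained compute = peak x achieved efficiency (30-50% assumed here).
PEAK_FP8_TFLOPS = 400
EFF_LO, EFF_HI = 0.30, 0.50

lo = PEAK_FP8_TFLOPS * EFF_LO
hi = PEAK_FP8_TFLOPS * EFF_HI
print(f"sustained FP8: {lo:.0f}-{hi:.0f} TFLOPS")

# Decode is memory-bound: each generated token re-reads the full weights,
# so bandwidth / weight-bytes bounds single-stream tokens/s from above.
BANDWIDTH_GBPS = 448   # assumed GDDR7 bandwidth for the 5060 Ti 16GB
weights_gb = 7         # 7B-parameter model at FP8, ~1 byte per parameter
print(f"decode ceiling: ~{BANDWIDTH_GBPS / weights_gb:.0f} tok/s")
```

The second calculation is why decode-heavy serving never touches peak tensor throughput: the GPU spends its time streaming weights, not multiplying.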
CUDA General Compute
- FP32 shader: ~23 TFLOPS
- FP16 shader: ~46 TFLOPS
Shader compute matters for diffusion UNet paths, ControlNet conditioning, and custom CUDA kernels. Tensor cores handle the matmul bulk; shaders fill in the rest.
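The shader numbers come from a standard formula: CUDA cores x 2 FLOPs per clock (one fused multiply-add) x boost clock. The core count and clock below are assumed specs for illustration:

```python
# Shader FLOPS estimate; core count and boost clock are assumptions.
cuda_cores = 4608
boost_ghz = 2.5

fp32_tflops = cuda_cores * 2 * boost_ghz / 1000  # 2 FLOPs/clock via FMA
print(f"FP32 shader: ~{fp32_tflops:.0f} TFLOPS")
print(f"FP16 shader: ~{fp32_tflops * 2:.0f} TFLOPS")  # FP16 at 2x FP32 rate
```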
Which Format When
| Use Case | Best Format | Why |
|---|---|---|
| LLM serving 2026 | FP8 | Half memory, double compute, minimal quality loss |
| LLM legacy or no FP8 checkpoint | AWQ INT4 | Best quality-size at INT4 |
| Fine-tuning | BF16 | FP8 training still experimental for most toolchains |
| Image diffusion | FP16/BF16 | UNets run well at half precision |
| Vision inference (YOLO, CLIP) | FP16 or INT8 TensorRT | Small models, latency-sensitive |
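The decision table above reduces to a short lookup. This is an illustrative sketch, not a real serving API; the function and task names are made up:

```python
# Hypothetical dtype picker mirroring the "Which Format When" table.
def pick_format(task: str, has_fp8_checkpoint: bool = True) -> str:
    if task == "llm-serving":
        # FP8 halves memory and doubles compute; fall back to AWQ INT4
        # when no FP8 checkpoint exists.
        return "fp8" if has_fp8_checkpoint else "awq-int4"
    if task == "fine-tuning":
        return "bf16"   # FP8 training still experimental in most toolchains
    if task == "diffusion":
        return "fp16"   # UNets run well at half precision
    if task == "vision":
        return "fp16"   # or INT8 via TensorRT when latency-critical
    raise ValueError(f"unknown task: {task}")

print(pick_format("llm-serving"))                             # fp8
print(pick_format("llm-serving", has_fp8_checkpoint=False))   # awq-int4
```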
Versus 4060 Ti and 5080
| Format (peak TFLOPS) | 4060 Ti | 5060 Ti | 5080 |
|---|---|---|---|
| FP16 peak | ~177 | ~200 | ~450 |
| FP8 peak | N/A | ~400 | ~900 |
| INT8 peak | ~353 | ~400 | ~900 |
The 5060 Ti is a modest step over the 4060 Ti on raw FP16 but a major one on FP8, a capability the 4060 Ti lacks entirely. The 5080 is roughly 2x across the board at around 2.5x the price.
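The ratios behind that summary, computed directly from the comparison table:

```python
# Peak figures copied from the comparison table above.
fp16 = {"4060ti": 177, "5060ti": 200, "5080": 450}
fp8 = {"5060ti": 400, "5080": 900}

print(f"FP16 gen-on-gen: {fp16['5060ti'] / fp16['4060ti']:.2f}x")   # modest
print(f"5080 over 5060 Ti, FP16: {fp16['5080'] / fp16['5060ti']:.2f}x")
print(f"5080 over 5060 Ti, FP8: {fp8['5080'] / fp8['5060ti']:.2f}x")
```

The gen-on-gen FP16 uplift is about 13%; the 5080 sits at roughly 2.25x in both FP16 and FP8, which is where the "2x at 2.5x the price" framing comes from.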
200+ TFLOPS at Mid-Tier
FP8-native Blackwell tensor cores on UK dedicated hosting.
Order the RTX 5060 Ti 16GB. See also: FP8 deep dive, 5th-gen tensor cores.