
RTX 5060 Ti 16GB Prefill Benchmark

Isolated prefill throughput on Blackwell 16GB - input tokens per second per model and precision, the compute-bound half of LLM serving.

Prefill is the phase where the model reads the entire prompt before generating the first token. It is compute-bound and is usually the TTFT (time to first token) bottleneck. The numbers below were measured on the RTX 5060 Ti 16GB at our hosting:

Setup

  • vLLM 0.6.4 with `max_tokens=1`, so each request runs prefill plus a single decode step
  • Metric: input tokens per second
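The measurement itself reduces to timing a single-token generation and dividing prompt tokens by wall time. A minimal sketch of that loop; the warm-up, the run count, and the callable wrapping vLLM's `llm.generate([prompt], SamplingParams(max_tokens=1))` are our assumptions, not the exact harness used for the numbers below:

```python
import time

def prefill_throughput(generate, prompt_tokens: int, runs: int = 5) -> float:
    """Return input tokens per second for a callable that runs
    prefill plus exactly one decode step.

    With vLLM, `generate` would wrap something like
    llm.generate([prompt], SamplingParams(max_tokens=1)),
    so the measured time is dominated by prefill.
    """
    generate()  # warm-up run: excludes graph capture and cache allocation
    start = time.perf_counter()
    for _ in range(runs):
        generate()
    elapsed = (time.perf_counter() - start) / runs
    return prompt_tokens / elapsed
```

Averaging over several runs after a warm-up keeps one-off startup costs out of the throughput figure.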

By Model

| Model | Precision | Prefill t/s |
|---|---|---|
| Phi-3-mini | FP8 | 14,000 |
| Llama 3.2 3B | FP8 | 11,500 |
| Mistral 7B | FP8 | 7,200 |
| Llama 3.1 8B | FP8 | 6,800 |
| Gemma 2 9B | FP8 | 5,400 |
| Qwen 2.5 14B | AWQ INT4 | 2,100 |

By Prompt Length (Llama 3.1 8B FP8)

| Prompt | Prefill time | TTFT impact |
|---|---|---|
| 128 tok | 19 ms | +19 ms |
| 512 tok | 75 ms | +75 ms |
| 2,048 tok | 301 ms | +301 ms |
| 8,192 tok | 1,205 ms | +1,205 ms |
| 32,768 tok | 4,820 ms | +4,820 ms |

Prefill throughput holds at roughly 6,800 input t/s across the whole tested range, so wall time scales almost perfectly linearly from 128 to 32,768 tokens; attention's quadratic cost is not yet the dominant term at these lengths.
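Given that near-linear scaling, prefill time for any prompt length is just tokens divided by throughput. A back-of-envelope helper; the 6,800 t/s default is the Llama 3.1 8B FP8 figure from the table above:

```python
def prefill_ms(prompt_tokens: int, throughput_tps: float = 6800.0) -> float:
    """Estimated prefill wall time in milliseconds, assuming the
    near-linear scaling seen in the table: time = tokens / throughput."""
    return prompt_tokens / throughput_tps * 1000.0
```

This reproduces the measured rows to within a millisecond, e.g. `round(prefill_ms(2048))` gives 301.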

Implications

  • For short prompts (<1k): prefill is negligible, decode dominates TTFT
  • For long prompts (8k+): prefill dominates – enable prefix caching or chunked prefill
  • RAG: Retrieved passages are usually 2-4k tokens – prefill is ~300-600 ms per query
  • FP8 vs INT4: FP8 prefill is 2-3x faster than AWQ INT4 because AWQ weights must be dequantized before each GEMM, while FP8 runs natively on Blackwell's tensor cores at full rate
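For the RAG case above, the same arithmetic gives a per-query TTFT budget. A sketch; the 50-token question default and the 6,800 t/s Llama 3.1 8B FP8 figure are illustrative assumptions:

```python
def rag_ttft_contribution_ms(context_tokens: int,
                             question_tokens: int = 50,
                             throughput_tps: float = 6800.0) -> float:
    """Prefill's contribution to TTFT for a RAG query: retrieved
    passages plus the user question, all read before the first token."""
    return (context_tokens + question_tokens) / throughput_tps * 1000.0
```

A 3,000-token retrieval plus a 50-token question lands around 449 ms of prefill before the first output token, consistent with the ~300-600 ms range quoted for 2-4k-token contexts.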

Prefill-Optimised LLM Hosting

6,800 input t/s on Llama 3.1 8B FP8. UK dedicated hosting.

Order the RTX 5060 Ti 16GB

See also: decode benchmark, TTFT p99, long-context perf, prefix caching.
