Rerankers re-score candidate documents against a query, lifting final RAG answer quality. Measured throughput on the RTX 5060 Ti 16GB via our hosting:
Setup
- Text Embeddings Inference (TEI) 1.5 with rerank endpoint
- Input: query ~20 tokens + doc ~256 tokens
- Metric: pairs/s
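TEI's rerank endpoint takes the query plus all candidate texts in one request and returns a score per candidate. A minimal client sketch, assuming a TEI instance on `localhost:8080` (the default port is an assumption; adjust the URL to your deployment):

```python
import json
import urllib.request

TEI_URL = "http://localhost:8080/rerank"  # assumption: TEI's default port

def build_rerank_payload(query: str, docs: list[str]) -> bytes:
    # TEI's /rerank endpoint expects {"query": ..., "texts": [...]}
    return json.dumps({"query": query, "texts": docs}).encode()

def parse_rerank_response(body: bytes, docs: list[str]) -> list[tuple[str, float]]:
    # The response is a list of {"index": i, "score": s} entries;
    # map the indices back to the original documents.
    return [(docs[r["index"]], r["score"]) for r in json.loads(body)]

def rerank(query: str, docs: list[str], url: str = TEI_URL):
    req = urllib.request.Request(
        url,
        data=build_rerank_payload(query, docs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_rerank_response(resp.read(), docs)
```

Keeping payload building and response parsing as pure functions makes the client easy to test without a running server.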
Models Compared
| Model | Params | Context | FP16 VRAM |
|---|---|---|---|
| BGE-reranker-base | 278M | 512 | 1.1 GB |
| BGE-reranker-large | 560M | 512 | 2.2 GB |
| Jina-reranker-v2 | 568M | 1024 | 2.3 GB |
| Mixedbread-rerank-v1 | 335M | 512 | 1.3 GB |
Pairs per Second (Batch 32)
| Model | pairs/s |
|---|---|
| BGE-reranker-base | 3,200 |
| BGE-reranker-large | 1,850 |
| Jina-reranker-v2 | 1,700 |
| Mixedbread-rerank-v1 | 2,400 |
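The pairs/s figures above can be reproduced with a simple timing loop around whatever batch-scoring call your stack exposes. A sketch; `score_batch` here is a hypothetical stand-in for the real call (an HTTP request to TEI, or a local cross-encoder forward pass):

```python
import time

def measure_pairs_per_s(score_batch, pairs, batch_size=32):
    """Time batched scoring end to end and report sustained pairs/s.

    score_batch is a stand-in for the real model call; pairs is a
    list of (query, document) tuples.
    """
    start = time.perf_counter()
    for i in range(0, len(pairs), batch_size):
        score_batch(pairs[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(pairs) / elapsed
```

Run a few warm-up batches before timing, since the first requests pay kernel-compilation and cache-warming costs.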
Per-query latency for 1 query × 100 candidates: ~31 ms with BGE-reranker-base, ~55 ms with BGE-reranker-large. Reranking is cheap enough to include in every RAG query.
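Those latencies follow directly from the throughput table: 100 candidate pairs divided by sustained pairs/s. A quick sanity check of the arithmetic:

```python
def rerank_latency_ms(candidates: int, pairs_per_s: float) -> float:
    # Lower-bound per-query latency: every candidate pair is scored once.
    return candidates / pairs_per_s * 1000

print(f"{rerank_latency_ms(100, 3200):.0f} ms")  # BGE-reranker-base, ~31 ms
print(f"{rerank_latency_ms(100, 1850):.0f} ms")  # BGE-reranker-large, ~54 ms
```

This is a lower bound; request overhead accounts for the extra millisecond seen on BGE-large.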
End-to-End RAG Latency
- Embed query: 3 ms
- Vector search top-100: 20 ms (vector DB, not GPU)
- Rerank top-100: 31 ms (BGE-reranker-base)
- LLM generation: 2,000 ms
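Summing the budget above (numbers from this section) shows how small the rerank share of end-to-end latency is:

```python
budget_ms = {
    "embed_query": 3,
    "vector_search_top100": 20,
    "rerank_top100": 31,   # BGE-reranker-base
    "llm_generation": 2000,
}
total = sum(budget_ms.values())
rerank_share = budget_ms["rerank_top100"] / total
print(total, f"{rerank_share:.1%}")  # 2054 ms total; rerank is ~1.5%
```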
Reranking adds only ~30 ms to the RAG pipeline while the typical NDCG@10 uplift is 10-15%, so it is almost always worth enabling.
Reranking on Blackwell 16GB
3,200 pairs/s on BGE-base. UK dedicated hosting.
Order the RTX 5060 Ti 16GB

See also: reranker server setup, embedding throughput, RAG install, SaaS RAG.