Embedding generation is the backbone of RAG and semantic search. The RTX 5060 Ti 16GB available in our hosting lineup is a workhorse for this workload: high parallelism on small models plus FP8 support.
Setup
- Text Embeddings Inference (TEI) 1.5 (a minimal client sketch follows this list)
- Input: 256-token sentences, truncation left at defaults
- Metric: texts per second (texts/s)
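As a reference point, this is roughly how a client talks to TEI; a minimal sketch, assuming a local instance on port 8080. The docker command, port, and model id are illustrative and may differ in your deployment; the `/embed` route with `inputs`/`truncate` fields follows TEI's HTTP API.

```python
# Minimal client for a local TEI instance (assumed to be on port 8080).
# The launch command is an assumption and may need different flags/tags, e.g.:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:1.5 \
#       --model-id BAAI/bge-base-en-v1.5
import requests

TEI_URL = "http://localhost:8080"  # assumed host/port

def embed(texts: list[str]) -> list[list[float]]:
    """Return one embedding vector per input text via TEI's /embed route."""
    resp = requests.post(f"{TEI_URL}/embed", json={"inputs": texts, "truncate": True})
    resp.raise_for_status()
    return resp.json()

vectors = embed(["retrieval-augmented generation", "semantic search on GPUs"])
print(len(vectors), len(vectors[0]))  # 2 vectors, 768 dimensions for BGE-base
```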
Models
| Model | Params | Dim | Context (tokens) | FP16 VRAM |
|---|---|---|---|---|
| BGE-small-en-v1.5 | 33M | 384 | 512 | 0.3 GB |
| BGE-base-en-v1.5 | 109M | 768 | 512 | 0.7 GB |
| BGE-large-en-v1.5 | 335M | 1024 | 512 | 1.3 GB |
| E5-large-v2 | 335M | 1024 | 512 | 1.3 GB |
| Nomic-embed-text-v1.5 | 137M | 768 | 8192 | 1.0 GB |
| Snowflake-arctic-embed-l | 335M | 1024 | 512 | 1.3 GB |
Throughput by Batch
BGE-base, 256-token sentences, FP16, TEI:
| Batch | texts/s |
|---|---|
| 1 | 420 |
| 8 | 2,800 |
| 32 | 7,200 |
| 64 | 9,100 |
| 128 | 9,800 |
| 256 | 10,200 |
Throughput plateaus around 10,200 texts/s; past batch 128 the workload is memory-bandwidth bound rather than compute bound, so larger batches add little.
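A sketch of how a batch-size sweep like the one above can be run against the same assumed TEI endpoint. A single synchronous client understates the peak; the table numbers assume the server is kept saturated (e.g. with concurrent clients), so treat this as a sanity check rather than a faithful reproduction.

```python
# Rough batch-size sweep against the assumed local TEI endpoint.
import time
import requests

TEI_URL = "http://localhost:8080"
TEXT = "benchmark sentence " * 85  # stand-in for a roughly 256-token input

def texts_per_second(batch_size: int, rounds: int = 20) -> float:
    batch = [TEXT] * batch_size
    payload = {"inputs": batch, "truncate": True}
    requests.post(f"{TEI_URL}/embed", json=payload).raise_for_status()  # warm-up
    start = time.perf_counter()
    for _ in range(rounds):
        requests.post(f"{TEI_URL}/embed", json=payload).raise_for_status()
    return batch_size * rounds / (time.perf_counter() - start)

for bs in (1, 8, 32, 64, 128, 256):
    print(f"batch {bs:4d}: {texts_per_second(bs):,.0f} texts/s")
```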
TEI Per-Model Peak
| Model | Peak texts/s |
|---|---|
| BGE-small | 28,000 |
| BGE-base | 10,200 |
| BGE-large | 3,400 |
| Nomic-embed-v1.5 | 7,800 |
| Snowflake-arctic-l | 3,200 |
For reference: at 10,200 texts/s, a 10-million-document corpus (one embedding per document) takes about 16 minutes of pure embedding time, comfortably under 20 minutes.
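The arithmetic behind that claim, extended to the other models in the peak table (pure embedding time only; chunking, I/O and vector-store writes are extra):

```python
# Pure embedding time for a 10M-document corpus at the peak rates above.
peak_texts_per_s = {
    "BGE-small": 28_000,
    "BGE-base": 10_200,
    "BGE-large": 3_400,
    "Nomic-embed-v1.5": 7_800,
    "Snowflake-arctic-l": 3_200,
}
corpus_size = 10_000_000  # one embedding per document

for model, tps in peak_texts_per_s.items():
    print(f"{model:<20} ~{corpus_size / tps / 60:,.0f} min")
```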
Recommendation
- Default for RAG: BGE-base, or Nomic-embed-v1.5 when you need long context (8k tokens)
- Accuracy priority: BGE-large
- Bulk throughput: BGE-small; at 28k texts/s, re-indexing often is cheap (see the batching sketch below)
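For bulk indexing, a simple batched loop against TEI is usually enough. A minimal sketch, assuming the same local endpoint as earlier; it ignores retries and error handling, and batch size 128 is a reasonable default given the batch table above.

```python
# Batched bulk-indexing loop against the assumed local TEI endpoint.
# No retries or backpressure handling; illustrative only.
import requests

TEI_URL = "http://localhost:8080"

def embed_corpus(docs: list[str], batch_size: int = 128) -> list[list[float]]:
    """Embed docs in fixed-size batches; larger batches amortize request overhead."""
    vectors: list[list[float]] = []
    for i in range(0, len(docs), batch_size):
        resp = requests.post(
            f"{TEI_URL}/embed",
            json={"inputs": docs[i:i + batch_size], "truncate": True},
        )
        resp.raise_for_status()
        vectors.extend(resp.json())
    return vectors
```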
Embedding Throughput on Blackwell 16GB
10k texts/s on BGE-base scales easily to millions of documents. UK dedicated hosting.
Order the RTX 5060 Ti 16GB. See also: TEI server setup, reranker throughput, SaaS RAG, RAG stack install, RAG pipeline.