
RTX 5060 Ti 16GB Embedding Throughput

Embedding model throughput on Blackwell 16GB for BGE, E5, Nomic, and Arctic models, measured in texts per second at different batch sizes.

Embedding generation is the backbone of RAG and semantic search. The RTX 5060 Ti 16GB on our hosting is a workhorse for this workload: it offers high parallelism on small models and supports FP8.

Setup

  • Text Embeddings Inference (TEI) 1.5
  • Input: 256-token sentences, default truncation settings
  • Metric: texts per second (texts/s)
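
A measurement of this kind can be sketched with a small client. The code below is a minimal harness under stated assumptions, not the script behind the numbers in this post: it assumes a TEI server is reachable at a hypothetical `http://localhost:8080` and uses TEI's `/embed` route, which takes a JSON body with an `inputs` list. The names `tei_embed` and `measure_throughput` are illustrative.

```python
import json
import time
import urllib.request
from typing import Callable, List, Sequence

TEI_URL = "http://localhost:8080/embed"  # assumption: local TEI server address


def tei_embed(batch: Sequence[str], url: str = TEI_URL) -> List[List[float]]:
    """POST one batch to TEI's /embed route and return one vector per text."""
    payload = json.dumps({"inputs": list(batch)}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def measure_throughput(
    embed: Callable[[Sequence[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int,
) -> float:
    """Embed all texts in fixed-size batches and return texts per second."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    start = time.perf_counter()
    embedded = 0
    for i in range(0, len(texts), batch_size):
        vectors = embed(texts[i : i + batch_size])
        embedded += len(vectors)  # count what the server actually returned
    elapsed = time.perf_counter() - start
    return embedded / elapsed
```

With a server running, `measure_throughput(tei_embed, texts, batch_size=32)` returns texts/s for that batch size. A single sequential client like this one may understate what the server can sustain, and TEI caps the number of inputs per request (the `--max-client-batch-size` launcher option), so larger batches may require raising that limit.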

Models

Model                      Params   Dim    Context (tokens)   FP16 VRAM
BGE-small-en-v1.5          33M      384    512                0.3 GB
BGE-base-en-v1.5           109M     768    512                0.7 GB
BGE-large-en-v1.5          335M     1024   512                1.3 GB
E5-large-v2                335M     1024   512                1.3 GB
Nomic-embed-text-v1.5      137M     768    8192               1.0 GB
Snowflake-arctic-embed-l   335M     1024   512                1.3 GB

Throughput by Batch

BGE-base, 256-token sentences, FP16, TEI:

Batch   texts/s
1       420
8       2,800
32      7,200
64      9,100
128     9,800
256     10,200

Throughput plateaus at about 10k texts/s: going from batch 128 to batch 256 adds only about 4%. We attribute the ceiling to memory bandwidth.
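
To make the plateau concrete, the sketch below derives two quantities from the batch table above: the time per batch implied by each throughput figure, and the relative gain from each step up in batch size. The implied latency is an estimate that assumes batches are processed one after another at the measured rate. It was not measured directly, and the function names are illustrative.

```python
# Measured BGE-base throughput from the table above: batch size -> texts/s.
THROUGHPUT = {1: 420, 8: 2_800, 32: 7_200, 64: 9_100, 128: 9_800, 256: 10_200}


def batch_latency_ms(batch_size: int, texts_per_second: float) -> float:
    """Time to process one full batch, implied by steady-state throughput."""
    return batch_size / texts_per_second * 1000.0


def marginal_gains(table: dict) -> dict:
    """Relative throughput gain at each batch size versus the previous one."""
    sizes = sorted(table)
    return {
        size: table[size] / table[prev] - 1.0
        for prev, size in zip(sizes, sizes[1:])
    }


for size, rate in THROUGHPUT.items():
    print(f"batch {size:>3}: {batch_latency_ms(size, rate):6.2f} ms per batch")

for size, gain in marginal_gains(THROUGHPUT).items():
    print(f"batch {size:>3}: {gain:+.0%} vs previous batch size")
```

Under that assumption, doubling the batch from 128 to 256 raises throughput by about 4% while the implied time per batch roughly doubles, from about 13 ms to about 25 ms. For latency-sensitive serving, a batch size in the 32 to 64 range may be the better trade-off.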

TEI Per-Model Peak

Model                Peak texts/s
BGE-small            28,000
BGE-base             10,200
BGE-large            3,400
Nomic-embed-v1.5     7,800
Snowflake-arctic-l   3,200

For reference: at 10k texts/s, a 10-million-document corpus takes about 17 minutes to embed (10,000,000 / 10,000 = 1,000 s), assuming one 256-token text per document.
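
The same arithmetic can be applied to every model in the peak table. The sketch below is a back-of-the-envelope estimate only: it assumes one 256-token text per document and counts GPU embedding time alone, so chunking, network transfer, and vector-store writes are not included. `indexing_minutes` is an illustrative helper name.

```python
# Peak texts/s per model, copied from the table above.
PEAK_TEXTS_PER_SECOND = {
    "BGE-small": 28_000,
    "BGE-base": 10_200,
    "BGE-large": 3_400,
    "Nomic-embed-v1.5": 7_800,
    "Snowflake-arctic-l": 3_200,
}

CORPUS = 10_000_000  # assumption: one 256-token text per document


def indexing_minutes(num_texts: int, texts_per_second: float) -> float:
    """Minutes needed to embed num_texts at a sustained texts/s rate."""
    if texts_per_second <= 0:
        raise ValueError("texts_per_second must be positive")
    return num_texts / texts_per_second / 60.0


for model, rate in PEAK_TEXTS_PER_SECOND.items():
    print(f"{model:<20} {indexing_minutes(CORPUS, rate):6.1f} min")
```

At these peak rates, the 10-million-text corpus takes roughly 6 minutes on BGE-small and roughly 49 minutes on BGE-large. Documents longer than 256 tokens would be split into several texts, which scales the estimate accordingly.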

Recommendation

  • Default for RAG: BGE-base, or Nomic-embed-v1.5 when you need long context (up to 8,192 tokens)
  • Accuracy priority: BGE-large
  • Bulk throughput: BGE-small – 28k texts/s lets you re-index often

See also: TEI server setup, reranker throughput, SaaS RAG, RAG stack install, RAG pipeline.

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
