Embedding workloads scale with the card, but not the way you might expect from LLM benchmarks. At small embedding-model sizes, even budget cards keep up. Here are measured throughput numbers for common embedders on our dedicated GPU hosting.
## Setup
Text Embeddings Inference (TEI) v1.5 Docker container, FP16, 200-token input per document, batch size tuned per card to saturate VRAM.
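The throughput figures below come from timing batches against the server. A minimal sketch of that measurement, assuming a TEI container listening on a hypothetical `localhost:8080` and its `/embed` endpoint accepting a JSON `inputs` list:

```python
import json
import time
import urllib.request

TEI_URL = "http://localhost:8080/embed"  # hypothetical: wherever your TEI container listens


def embed_batch(texts, url=TEI_URL):
    """POST one batch of documents to the TEI /embed endpoint and return the vectors."""
    body = json.dumps({"inputs": texts}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def docs_per_sec(n_docs, elapsed_s):
    """Throughput in the same units as the tables below."""
    return n_docs / elapsed_s


# Example run (requires a live TEI container):
#   batch = ["a roughly 200-token document ..."] * 256
#   t0 = time.perf_counter()
#   embed_batch(batch)
#   print(docs_per_sec(len(batch), time.perf_counter() - t0))
```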
## BGE-M3 (568M)
| GPU | Batch | Docs/sec (dense) |
|---|---|---|
| RTX 3050 6GB | 64 | ~1,400 |
| RTX 4060 8GB | 96 | ~2,100 |
| RTX 4060 Ti 16GB | 256 | ~3,400 |
| RTX 3090 | 512 | ~7,200 |
| RTX 5080 | 384 | ~9,800 |
| RTX 5090 | 768 | ~16,000 |
| RTX 6000 Pro | 2048 | ~28,000 |
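The batch sizes in the table were tuned per card, not fixed. A generic way to find that ceiling is to double the batch until the backend errors out and keep the last size that worked. A minimal sketch, assuming failures surface as `RuntimeError` (which torch's CUDA OOM error subclasses):

```python
def max_stable_batch(try_batch, start=16, limit=4096):
    """Double the batch size until try_batch(size) raises,
    then return the largest size that succeeded (None if even `start` fails)."""
    best = None
    size = start
    while size <= limit:
        try:
            try_batch(size)  # e.g. run one embedding pass at this batch size
            best = size
            size *= 2
        except RuntimeError:  # e.g. CUDA out-of-memory
            break
    return best
```

In practice you would point `try_batch` at a real embedding pass and re-run the search once per card, since the VRAM ceiling differs between a 6 GB 3050 and a 6000 Pro.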
## BGE-large (335M)
| GPU | Docs/sec |
|---|---|
| RTX 3050 | ~2,000 |
| RTX 4060 Ti | ~5,500 |
| RTX 3090 | ~10,500 |
| RTX 5080 | ~14,000 |
| RTX 5090 | ~22,000 |
| RTX 6000 Pro | ~42,000 |
## Verdict
For embedding workloads under 10k docs/sec, the 4060 Ti 16GB is usually the right economic choice. For high-volume indexing (>20k docs/sec) a 5090 or 6000 Pro pays for itself. Do not provision a 6000 Pro for embedding-only workloads unless you are pushing 100M+ documents; use the freed budget for a second card or a larger LLM.
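The document-count thresholds above translate directly into wall-clock time. A back-of-envelope helper, using the measured throughputs as an ideal sustained rate (real jobs add I/O and tokenization overhead):

```python
def indexing_hours(n_docs, docs_per_sec):
    """Wall-clock hours to embed a corpus at a sustained throughput."""
    return n_docs / docs_per_sec / 3600


# 100M documents through BGE-M3, using the table's figures:
#   4060 Ti 16GB (~3,400 docs/sec) -> roughly 8 hours
#   6000 Pro    (~28,000 docs/sec) -> roughly 1 hour
```

For a one-off backfill, an overnight run on a mid-tier card is often acceptable; the big card only pays off when re-indexing at that scale is routine.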
For the broader GPU selection see the 2026 tier ladder and VRAM per pound.
## Right-Sized Embedding Hosting
We match GPU tier to your expected document throughput.
Browse GPU Servers · See batch tuning.