How Many TTS Requests per Second per GPU?

Text-to-speech throughput benchmarks — requests per second across six GPUs for Kokoro, Bark, and XTTS v2, with p50/p90/p99 latency per request.

Benchmarks | April 17, 2026 | 3 min read | admin

Table of Contents

TTS Throughput Overview
Kokoro TTS Throughput by GPU
Bark TTS Throughput by GPU
XTTS v2 Throughput by GPU
Per-Request Latency Comparison
Conclusion

TTS Throughput Overview

Text-to-speech is a core component of voice agents, accessibility tools, and audio content platforms. When serving TTS at scale on a dedicated GPU server, you need to know how many requests per second each GPU can handle before latency becomes unacceptable. We benchmarked three popular open-source TTS models across six GPUs to provide concrete capacity planning data.

All tests ran on GigaGPU bare-metal servers. Each request synthesised approximately 30 words of English text (~3 seconds of output audio). We measured sustained requests per second and per-request latency at the 50th, 90th, and 99th percentiles (p50, p90, p99). For voice pipeline latency, see the voice agent latency benchmark.
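The measurement described above can be sketched roughly as follows. This is a minimal illustration and not the harness used for these benchmarks: `synthesize` is a hypothetical placeholder for whatever call runs the TTS model, requests are issued one at a time, and percentiles use the nearest-rank method, all of which are assumptions.

```python
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Ceiling division keeps the rank arithmetic in integers.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def run_benchmark(
    synthesize: Callable[[str], object],  # hypothetical TTS call, not a real library API
    text: str,
    n_requests: int,
) -> Dict[str, float]:
    """Issue requests sequentially and report throughput plus latency percentiles."""
    latencies_ms: List[float] = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        synthesize(text)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "requests_per_sec": n_requests / elapsed,
        "p50_ms": percentile(latencies_ms, 50),
        "p90_ms": percentile(latencies_ms, 90),
        "p99_ms": percentile(latencies_ms, 99),
    }
```

A real run would also want a warm-up phase and enough requests for the p99 figure to be meaningful; with only 100 samples, p99 is effectively the second-slowest request.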

Kokoro TTS Throughput by GPU

Kokoro is a lightweight, low-latency TTS model that prioritises speed — ideal for real-time voice agents.

GPU	Requests/sec	p50 Latency	p90 Latency	p99 Latency
RTX 3050 (6 GB)	5.2	185 ms	210 ms	240 ms
RTX 4060 (8 GB)	10.5	92 ms	105 ms	120 ms
RTX 4060 Ti (16 GB)	14.8	65 ms	75 ms	88 ms
RTX 3090 (24 GB)	22.0	44 ms	50 ms	58 ms
RTX 5080 (16 GB)	30.5	32 ms	36 ms	42 ms
RTX 5090 (32 GB)	42.0	23 ms	26 ms	30 ms

The RTX 5090 handles 42 Kokoro TTS requests per second with a p99 latency of 30 ms, effectively invisible in a voice pipeline. Even the RTX 4060 manages 10.5 req/s with a p99 latency of 120 ms, which is acceptable for most applications.
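For capacity planning, the Kokoro throughput figures can be turned into a GPU count for a target load. The sketch below assumes a hypothetical peak of 50 req/s and a 70% utilisation ceiling per GPU; both numbers are illustrative assumptions rather than benchmark results, and the function name `gpus_needed` is ours.

```python
import math

# Sustained Kokoro requests/sec per GPU, copied from the table above.
KOKORO_RPS = {
    "RTX 3050": 5.2,
    "RTX 4060": 10.5,
    "RTX 4060 Ti": 14.8,
    "RTX 3090": 22.0,
    "RTX 5080": 30.5,
    "RTX 5090": 42.0,
}


def gpus_needed(peak_rps: float, per_gpu_rps: float, headroom: float = 0.7) -> int:
    """GPUs required so each one runs at no more than `headroom` of its measured throughput.

    The 0.7 default is an assumed safety margin, not a measured value.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return max(1, math.ceil(peak_rps / (per_gpu_rps * headroom)))


if __name__ == "__main__":
    peak = 50.0  # hypothetical peak load in requests/sec
    for gpu, rps in KOKORO_RPS.items():
        print(f"{gpu}: {gpus_needed(peak, rps)} GPU(s)")
```

Under these assumptions, 50 req/s would need two RTX 5090s or four RTX 3090s. Running GPUs closer to their measured maximum reduces the count but leaves less room for traffic spikes.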

Bark TTS Throughput by GPU

Bark produces high-quality, expressive audio but requires significantly more compute than Kokoro.

GPU	Requests/sec	p50 Latency	p90 Latency	p99 Latency
RTX 3050 (6 GB)	0.18	5,200 ms	5,600 ms	6,100 ms
RTX 4060 (8 GB)	0.40	2,400 ms	2,650 ms	2,900 ms
RTX 4060 Ti (16 GB)	0.58	1,680 ms	1,850 ms	2,050 ms
RTX 3090 (24 GB)	0.92	1,050 ms	1,150 ms	1,280 ms
RTX 5080 (16 GB)	1.35	720 ms	790 ms	880 ms
RTX 5090 (32 GB)	2.10	460 ms	510 ms	570 ms

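Bark is roughly 20-29x slower than Kokoro on the same GPU, measured by either throughput or p50 latency. On the RTX 3090 it delivers under 1 request per second (0.92 req/s), which is usable for batch audio generation but too slow for real-time voice agents. The RTX 5090, at 2.1 req/s and 460 ms p50 latency, is borderline for interactive use.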
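For batch use, the Bark throughput figures translate directly into wall-clock time for a job. The sketch below assumes a hypothetical batch of 10,000 clips, each similar to the benchmark request (about 30 words, about 3 seconds of audio), and assumes the sustained rate holds for the whole run.

```python
# Sustained Bark requests/sec per GPU, copied from the table above.
BARK_RPS = {
    "RTX 3050": 0.18,
    "RTX 4060": 0.40,
    "RTX 4060 Ti": 0.58,
    "RTX 3090": 0.92,
    "RTX 5080": 1.35,
    "RTX 5090": 2.10,
}


def batch_hours(n_clips: int, rps: float) -> float:
    """Hours needed to synthesise n_clips at a sustained rate of rps requests/sec."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    return n_clips / rps / 3600


if __name__ == "__main__":
    n_clips = 10_000  # hypothetical batch size
    for gpu, rps in BARK_RPS.items():
        print(f"{gpu}: {batch_hours(n_clips, rps):.2f} h")
```

At these rates, the hypothetical batch takes about 3.0 hours on an RTX 3090 and about 15.4 hours on an RTX 3050.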

XTTS v2 Throughput by GPU

XTTS v2 supports voice cloning and produces natural speech at moderate latency.

GPU	Requests/sec	p50 Latency	p90 Latency	p99 Latency
RTX 3050 (6 GB)	0.65	1,480 ms	1,620 ms	1,800 ms
RTX 4060 (8 GB)	1.30	740 ms	820 ms	910 ms
RTX 4060 Ti (16 GB)	1.85	525 ms	580 ms	645 ms
RTX 3090 (24 GB)	2.80	345 ms	380 ms	425 ms
RTX 5080 (16 GB)	3.90	248 ms	275 ms	305 ms
RTX 5090 (32 GB)	5.60	172 ms	190 ms	215 ms

XTTS v2 sits between Kokoro and Bark in both quality and speed. On the RTX 3090, at 2.8 req/s and 345 ms p50 latency, it is usable for near-real-time voice applications with voice cloning.
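One way to sanity-check a throughput table like the one above is Little's law: average requests in flight equals throughput multiplied by average latency. The sketch below applies it to the XTTS v2 figures, using p50 latency as a stand-in for the mean, which is an assumption. The article does not state the concurrency or batch size used, so the result is an inference from the published numbers, not a documented test parameter.

```python
# (requests/sec, p50 latency in ms) per GPU, copied from the XTTS v2 table above.
XTTS_V2 = {
    "RTX 3050": (0.65, 1480),
    "RTX 4060": (1.30, 740),
    "RTX 4060 Ti": (1.85, 525),
    "RTX 3090": (2.80, 345),
    "RTX 5080": (3.90, 248),
    "RTX 5090": (5.60, 172),
}


def in_flight(rps: float, latency_ms: float) -> float:
    """Little's law: mean concurrent requests = arrival rate x mean time in system."""
    return rps * latency_ms / 1000


if __name__ == "__main__":
    for gpu, (rps, p50_ms) in XTTS_V2.items():
        print(f"{gpu}: {in_flight(rps, p50_ms):.2f} requests in flight")
```

Every row comes out just under 1.0, which is consistent with requests being processed one at a time. If that reading is right, batching or concurrent streams could shift both throughput and latency, but that is not measured here.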

Per-Request Latency Comparison

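Choosing between TTS models is fundamentally a latency-quality trade-off. Kokoro delivers p50 latency under 50 ms from the RTX 3090 upward and stays under 200 ms even on the RTX 3050, which suits real-time voice agents where speed matters more than expressiveness. XTTS v2 provides voice cloning at roughly 250-750 ms p50 on the RTX 4060 through RTX 5080 (172 ms on the RTX 5090, 1,480 ms on the RTX 3050), suitable for personalised but not fully real-time use. Bark produces the most expressive audio but at roughly 0.5-5 seconds of p50 latency depending on the GPU, limiting it largely to batch and offline use.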
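The trade-off can be made concrete by filtering the benchmark results against a latency budget. The sketch below lists every model and GPU pairing whose p99 latency fits a given budget; the 100 ms and 500 ms budgets are hypothetical examples, and the helper name `fits_budget` is ours.

```python
from typing import Dict, List, Tuple

# p99 latency in ms per GPU, copied from the three tables above.
P99_MS: Dict[str, Dict[str, int]] = {
    "Kokoro": {
        "RTX 3050": 240, "RTX 4060": 120, "RTX 4060 Ti": 88,
        "RTX 3090": 58, "RTX 5080": 42, "RTX 5090": 30,
    },
    "XTTS v2": {
        "RTX 3050": 1800, "RTX 4060": 910, "RTX 4060 Ti": 645,
        "RTX 3090": 425, "RTX 5080": 305, "RTX 5090": 215,
    },
    "Bark": {
        "RTX 3050": 6100, "RTX 4060": 2900, "RTX 4060 Ti": 2050,
        "RTX 3090": 1280, "RTX 5080": 880, "RTX 5090": 570,
    },
}


def fits_budget(budget_ms: int) -> List[Tuple[str, str, int]]:
    """Return (model, gpu, p99_ms) for every pairing with p99 latency within the budget."""
    return [
        (model, gpu, p99)
        for model, per_gpu in P99_MS.items()
        for gpu, p99 in per_gpu.items()
        if p99 <= budget_ms
    ]


if __name__ == "__main__":
    for budget in (100, 500):  # hypothetical TTS-stage budgets in ms
        print(f"p99 <= {budget} ms:")
        for model, gpu, p99 in fits_budget(budget):
            print(f"  {model} on {gpu}: {p99} ms")
```

With a 100 ms budget only Kokoro qualifies, on the RTX 4060 Ti and faster. At 500 ms, XTTS v2 on the RTX 3090, RTX 5080, and RTX 5090 also fits, while Bark does not fit on any tested GPU.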

For voice agent pipelines, Kokoro is the default recommendation because the TTS stage needs to be nearly invisible. See the voice agent latency benchmark for full pipeline numbers. For capacity planning across all workload types, see the GPU capacity planning for AI SaaS guide. Use the LLM cost calculator to model total costs.

Conclusion

TTS throughput varies enormously by model: from 42 req/s (Kokoro on RTX 5090) to 0.18 req/s (Bark on RTX 3050). For real-time voice agents, Kokoro on an RTX 3090 (22 req/s, 44 ms p50 latency) is the value leader. For voice cloning workloads, XTTS v2 on the RTX 5080 delivers 3.9 req/s at 248 ms p50 latency. Browse all speech and audio benchmarks in the Benchmarks category at GigaGPU.


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
