
RTX 5060 Ti 16GB Coqui TTS Benchmark

Coqui XTTS v2 and Bark-small on Blackwell 16GB - real-time factor, VRAM, batch throughput for self-hosted TTS.

Coqui XTTS v2 is the leading open TTS model for multilingual voice cloning. Below are the numbers we measured on the RTX 5060 Ti 16GB on our hosting.

Setup

  • Coqui TTS 0.22
  • Model: XTTS v2 (multilingual, 17 languages)
  • Sample rate: 24 kHz, mel 80-band
  • FP16 inference, CUDA 12.6

XTTS v2 Throughput (Batch 1)

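| Output audio length | Generation time | RTF   |
|---------------------|-----------------|-------|
| 5 sec               | 0.85 s          | 0.170 |
| 10 sec              | 1.25 s          | 0.125 |
| 20 sec              | 2.20 s          | 0.110 |
| 60 sec              | 6.10 s          | 0.102 |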

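Real-time factor (RTF) is generation time divided by the duration of the audio produced. An RTF below 0.2 means audio is generated at least 5x faster than it plays back; the measured values of 0.17 down to 0.10 correspond to roughly 6-10x real time. That is fast enough for interactive voice assistants.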
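The RTF column can be reproduced from the two measured columns. The snippet below is a minimal sketch of that arithmetic; the function names `real_time_factor` and `speedup` are our own for illustration and are not part of Coqui TTS.

```python
def real_time_factor(gen_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return gen_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than playback the generation runs."""
    return 1.0 / rtf


# (output audio length in s, generation time in s) from the batch 1 table
measurements = [(5, 0.85), (10, 1.25), (20, 2.20), (60, 6.10)]

for audio_s, gen_s in measurements:
    rtf = real_time_factor(gen_s, audio_s)
    print(f"{audio_s:>3} s audio: RTF {rtf:.3f}, {speedup(rtf):.1f}x real time")
```

Shorter clips show a higher RTF because fixed per-request overhead is spread over less audio.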

Batch 4

| Length      | Total time (4 items) | Per-item time |
|-------------|----------------------|---------------|
| 5 sec each  | 2.2 s                | 0.55 s        |
| 10 sec each | 3.4 s                | 0.85 s        |

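Batching 4 requests cuts per-item time by roughly 32-35% compared with batch 1 (0.55 s vs 0.85 s for 5-second clips, 0.85 s vs 1.25 s for 10-second clips). Peak VRAM is ~6 GB.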
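The per-item saving follows directly from the batch 4 totals and the batch 1 timings. This is a small sketch of that calculation; `per_item_saving` is a hypothetical helper name used only for illustration.

```python
def per_item_saving(batch_total_s: float, batch_size: int, single_item_s: float):
    """Return (per-item time in the batch, fractional saving vs. batch 1)."""
    per_item = batch_total_s / batch_size
    saving = 1.0 - per_item / single_item_s
    return per_item, saving


# (clip length in s, batch 4 total time in s, batch 1 time in s) from the tables
cases = [(5, 2.2, 0.85), (10, 3.4, 1.25)]

for clip_s, batch_total, single in cases:
    per_item, saving = per_item_saving(batch_total, 4, single)
    print(f"{clip_s} s clips: {per_item:.2f} s per item, {saving:.0%} less than batch 1")
```

Note that batching lowers the average cost per clip but every clip in the batch waits for the whole batch to finish, so it suits bulk generation better than single low-latency requests.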

Voice Cloning Latency

Provide a 6-second reference clip, then generate new speech in the cloned voice:

  • Speaker encoding (one-time): ~300 ms
  • Generation: same as uncloned (RTF ~0.1)

For persistent cloned voices, cache the speaker embedding in memory to skip the 300 ms on subsequent calls.
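One way to cache the embedding is a small in-memory map keyed by a hash of the reference clip. The sketch below assumes nothing about Coqui's internals: `SpeakerEmbeddingCache` and `fake_encoder` are hypothetical names, and the encoder is a stand-in you would replace with the real speaker-encoding call (in Coqui's XTTS API that role appears to be played by `get_conditioning_latents`, but check the docs for your version).

```python
import hashlib
from typing import Any, Callable, Dict


class SpeakerEmbeddingCache:
    """Keeps speaker embeddings in memory, keyed by a hash of the reference audio."""

    def __init__(self, encode: Callable[[bytes], Any]) -> None:
        self._encode = encode          # the expensive (~300 ms) encoding step
        self._store: Dict[str, Any] = {}
        self.misses = 0                # number of times the encoder actually ran

    def get(self, reference_audio: bytes) -> Any:
        key = hashlib.sha256(reference_audio).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._encode(reference_audio)
        return self._store[key]


def fake_encoder(audio: bytes) -> list:
    # Stand-in for the real speaker encoder; returns a dummy "embedding".
    return [len(audio)]


cache = SpeakerEmbeddingCache(fake_encoder)
clip = b"\x00" * 1024              # placeholder bytes for a reference clip

first = cache.get(clip)            # encoder runs once
second = cache.get(clip)           # served from memory, encoder skipped
print(cache.misses)
```

This sketch is not thread-safe and never evicts entries; a service with many voices would likely want a lock and a size limit.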


See also: Bark TTS, Whisper benchmark, voice pipeline, voice assistant, podcast tools.


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
