Twelve-and-a-half times faster than real-time. That means Coqui XTTS-v2 on the RTX 5090 synthesises a one-minute voice clip in under five seconds — with full voice cloning from a short reference sample. We pushed this combination through our benchmark suite on GigaGPU because the numbers seemed almost too good to be true.
Peak TTS Speed
| Metric | Value |
|---|---|
| Real-Time Factor (lower = faster) | 0.08 |
| Synthesis speed | 12.5x real-time |
| Audio hours processed per GPU-hour | 12.5 |
| Precision | FP16 |
| Performance rating | Very Good |
Benchmark conditions: FP16 inference, single-stream processing, 24 kHz output, English, single speaker, XTTS-v2 streaming server.
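The headline claim is easy to sanity-check: real-time factor (RTF) is synthesis time divided by audio duration, so a clip's wall-clock cost is simply its length times the RTF. A minimal sketch using the benchmark figures above:

```python
def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds needed to synthesise a clip of the given length."""
    return audio_seconds * rtf

rtf = 0.08          # measured RTF for XTTS-v2 on the RTX 5090 (FP16)
one_minute = 60.0

print(f"{synthesis_time(one_minute, rtf):.1f} s")  # 4.8 s for a one-minute clip
print(f"{1 / rtf:.1f}x real-time")                 # 12.5x real-time
```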
29 GB Free After Loading
| Component | VRAM |
|---|---|
| Model weights (FP16) | 2.4 GB |
| Audio buffer + runtime | ~0.4 GB |
| Total RTX 5090 VRAM | 32 GB |
| Free headroom | ~29.2 GB |
XTTS-v2 consumes under 9% of the 5090’s VRAM. The remaining ~29.2 GB is enough to co-host Whisper Large-v3, a quantised 13B-parameter LLM, and Flux.1 — all simultaneously. If you are building a multi-modal AI product, the 5090 is the one card that can genuinely host your entire inference stack.
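The headroom figure follows directly from the table; a quick sketch of the budget (only the XTTS-v2 numbers are measured here — sizes for any co-hosted models are yours to verify):

```python
# Rough VRAM budget for the RTX 5090 using the figures above.
TOTAL_VRAM_GB = 32.0
xtts_weights = 2.4   # XTTS-v2 model weights, FP16
runtime = 0.4        # audio buffer + runtime overhead (approximate)

used = xtts_weights + runtime
free = TOTAL_VRAM_GB - used
print(f"used {used:.1f} GB ({used / TOTAL_VRAM_GB:.0%}), free {free:.1f} GB")
# used 2.8 GB (9%), free 29.2 GB
```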
Economics of Scale
| Cost Metric | Value |
|---|---|
| Server cost | £1.50/hr (£299/mo) |
| Cost per audio hour | £0.12 |
| Audio hours per £1 | 8.3 |
Twelve pence per hour of synthesised voice. At 12.5x speed, a single 5090 generates 300 hours of audio per day. Audiobook publishers, large-scale accessibility services, and enterprise notification systems all benefit from this throughput. See how it compares in our cross-GPU benchmark.
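All three cost figures fall out of one throughput number. A quick sketch using the hourly rate and speed from the tables above:

```python
server_cost_per_hr = 1.50     # £/hr for the RTX 5090 server
audio_hrs_per_gpu_hr = 12.5   # 12.5x real-time synthesis

cost_per_audio_hr = server_cost_per_hr / audio_hrs_per_gpu_hr
audio_hrs_per_pound = 1 / cost_per_audio_hr
daily_output = audio_hrs_per_gpu_hr * 24

print(f"£{cost_per_audio_hr:.2f} per audio hour")        # £0.12
print(f"{audio_hrs_per_pound:.1f} audio hours per £1")   # 8.3
print(f"{daily_output:.0f} audio hours per day")         # 300
```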
Maximum Performance, Maximum Flexibility
The 5090 is not the cheapest way to run XTTS-v2 — the RTX 5080 offers similar per-hour costs at 8.3x speed. What the 5090 offers is headroom for growth. Start with TTS, add speech recognition, layer in language understanding. As your product complexity increases, the 5090 absorbs new models without needing a second server. That flexibility has a value that does not show up in the per-hour cost alone. Full comparison: best GPU for TTS.
Quick deploy:
```shell
docker run --gpus all -p 8000:8000 ghcr.io/coqui-ai/xtts-streaming-server:latest
```
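Once the container is up, clients talk to it over HTTP. The sketch below builds a synthesis request against a `/tts_stream` endpoint as described in the upstream coqui-ai/xtts-streaming-server README; the endpoint name, payload fields, and the `speaker` dict (normally fetched from the server's speaker-cloning or studio-speakers endpoint) are assumptions to verify against your deployed version.

```python
import json
import urllib.request

def build_tts_request(base_url: str, text: str, language: str,
                      speaker: dict) -> urllib.request.Request:
    """Build (but do not send) a streaming-TTS request.

    `speaker` is assumed to carry the conditioning latents the server
    returned when the reference voice was cloned.
    """
    payload = {
        "text": text,
        "language": language,
        "speaker_embedding": speaker["speaker_embedding"],
        "gpt_cond_latent": speaker["gpt_cond_latent"],
    }
    return urllib.request.Request(
        f"{base_url}/tts_stream",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Dummy latents stand in for real cloning output in this sketch.
dummy_speaker = {"speaker_embedding": [0.0], "gpt_cond_latent": [[0.0]]}
req = build_tts_request("http://localhost:8000", "Hello", "en", dummy_speaker)
```

Sending `req` with `urllib.request.urlopen` would then yield the audio stream to buffer or play.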
Explore: Coqui hosting guide, all benchmarks, SD hosting.
Deploy Coqui XTTS-v2 on RTX 5090
Order this exact configuration. UK datacenter, full root access.
Order RTX 5090 Server