Bark from Suno is the most expressive open TTS model: it handles emotion, laughter, other non-verbal sounds, and even hints of music. It is slower than XTTS, but the capability is unique. Numbers below are from the RTX 5060 Ti 16GB via our hosting:
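Bark takes its expressive cues from the text prompt itself. The Bark README documents bracketed tags such as `[laughter]`, `[laughs]`, `[sighs]`, `[music]`, `[gasps]` and `[clears throat]`, plus `[MAN]`/`[WOMAN]` to bias the speaker, `...` for hesitation, `♪` around lyrics, and CAPITALS for emphasis. The sketch below is a hypothetical helper (`build_prompt`, `unknown_cues` and `KNOWN_CUES` are our names, not part of Bark) that rejects undocumented bracketed tags before a prompt is passed to `bark.generate_audio`.

```python
import re

# Bracketed cues documented in the Bark README (our hypothetical allow-list).
KNOWN_CUES = {
    "laughter", "laughs", "sighs", "music", "gasps",
    "clears throat", "MAN", "WOMAN",
}

# Matches "[something]" and captures the text inside the brackets.
_CUE_RE = re.compile(r"\[([^\[\]]+)\]")


def unknown_cues(prompt: str) -> list:
    """Return bracketed cues in `prompt` that are not in KNOWN_CUES."""
    return [cue for cue in _CUE_RE.findall(prompt) if cue not in KNOWN_CUES]


def build_prompt(*parts: str) -> str:
    """Join non-empty prompt fragments with spaces, rejecting undocumented cues."""
    prompt = " ".join(p.strip() for p in parts if p.strip())
    bad = unknown_cues(prompt)
    if bad:
        raise ValueError(f"undocumented Bark cues: {bad}")
    return prompt


# Hesitation via "...", emphasis via CAPITALS, and a laugh cue.
text = build_prompt("Well...", "I did NOT expect that.", "[laughs]")
print(text)
```

Bark may still react to tags outside this list, so treat the check as a guard against typos rather than a hard rule.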
Setup
- suno-ai/bark v0.1.5
- FP16 inference, CUDA 12.6
- 24 kHz output
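Bark returns a mono float array at 24 kHz (`bark.SAMPLE_RATE`). Bark's README saves it with `scipy.io.wavfile.write`, which produces a float WAV. The stdlib-only sketch below instead converts the samples to 16-bit PCM, which more players accept. `write_wav_16bit` is a hypothetical helper, and the sine tone only stands in for real Bark output.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # matches Bark's 24 kHz output (bark.SAMPLE_RATE)


def write_wav_16bit(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV; return frame count."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 conversion cannot overflow
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # Bark output is mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
    return len(pcm) // 2


# Stand-in for a Bark output array: 0.5 s of a quiet 440 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 2)]
out_path = os.path.join(tempfile.gettempdir(), "bark_demo.wav")
n_frames = write_wav_16bit(out_path, tone)
```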
Variants
| Model | VRAM | Quality |
|---|---|---|
| bark-small | 4 GB | Acceptable, faster |
| bark (full) | 12 GB | Best |
| bark (full) + CPU offload | 7 GB | Best, slower |
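Bark selects these variants through environment flags that `bark.generation` reads when it is first imported: `SUNO_USE_SMALL_MODELS` for the small checkpoints and `SUNO_OFFLOAD_CPU` for offloading. The sketch below picks the highest-quality variant that fits a VRAM budget using the table's numbers. It assumes that the 7 GB offload row corresponds to the full models with `SUNO_OFFLOAD_CPU` set. `pick_variant` and `VARIANTS` are hypothetical names.

```python
import os

# VRAM needs from the Variants table, ordered best quality first.
# Assumption: the 7 GB offload row = full models with SUNO_OFFLOAD_CPU.
VARIANTS = [
    ("bark (full)", 12, {}),
    ("bark (full) + CPU offload", 7, {"SUNO_OFFLOAD_CPU": "True"}),
    ("bark-small", 4, {"SUNO_USE_SMALL_MODELS": "True"}),
]


def pick_variant(free_vram_gb):
    """Return (name, env_vars) for the highest-quality variant that fits."""
    for name, vram_gb, env in VARIANTS:
        if vram_gb <= free_vram_gb:
            return name, env
    raise ValueError(f"no Bark variant fits in {free_vram_gb} GB of VRAM")


name, env = pick_variant(16)  # the whole 16 GB of the RTX 5060 Ti
# The flags are read at import time, so set them before `import bark`.
os.environ.update(env)
```

On a dedicated 16 GB card the full model fits outright. The offload and small variants matter only when other workloads share the GPU.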
Generation Time (Batch 1)
| Audio length | bark-small (gen time) | bark full (gen time) |
|---|---|---|
| 5 sec | 1.8 s | 4.1 s |
| 10 sec | 3.4 s | 8.2 s |
| 15 sec (max practical) | 5.1 s | 12 s |
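A compact way to read this table is the real-time factor (RTF): generation time divided by audio length, where values below 1 mean faster than real time. The sketch below computes RTF from the measured numbers. It also gives a rough estimate for longer narration, under the simplifying assumption that chunks are generated one after another and each costs a full 15-second chunk's time. Function names are hypothetical.

```python
import math

# (audio seconds, bark-small gen seconds, bark full gen seconds), from the table.
MEASURED = [(5, 1.8, 4.1), (10, 3.4, 8.2), (15, 5.1, 12.0)]


def rtf(gen_s, audio_s):
    """Real-time factor: generation time / audio duration (<1 = faster than real time)."""
    return gen_s / audio_s


def estimate_narration_s(narration_s, chunk_s=15, gen_per_chunk_s=12.0):
    """Rough sequential estimate: every chunk is charged a full chunk's generation time."""
    return math.ceil(narration_s / chunk_s) * gen_per_chunk_s


for audio_s, small_s, full_s in MEASURED:
    print(f"{audio_s:>2} s clip: bark-small RTF {rtf(small_s, audio_s):.2f}, "
          f"bark full RTF {rtf(full_s, audio_s):.2f}")

print(estimate_narration_s(60))  # one minute of narration with bark full -> 48.0
```

Both models stay faster than real time at every length measured: about 0.34 to 0.36 RTF for bark-small and 0.80 to 0.82 for bark full.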
Bark has a hard limit of roughly 15 seconds of audio per generation call. For longer narration, split the text into chunks and concatenate the generated audio.
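The sketch below shows one way to do that chunking. It greedily packs whole sentences into chunks under a word budget, then joins the resulting audio pieces with short silences. The 30-word budget is an assumption (about 15 s at a moderate speaking pace), not a Bark constant. Tune it for your voice preset. `chunk_text` and `concat_with_silence` are hypothetical helpers.

```python
import re

SAMPLE_RATE = 24_000
MAX_WORDS = 30  # assumption: ~30 spoken words stays inside Bark's ~15 s window


def chunk_text(text, max_words=MAX_WORDS):
    """Greedily pack whole sentences into chunks of at most max_words words.
    A single sentence longer than max_words becomes its own (oversized) chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def concat_with_silence(pieces, gap_s=0.25, sample_rate=SAMPLE_RATE):
    """Join audio pieces (sequences of float samples) with gap_s seconds of silence."""
    gap = [0.0] * int(gap_s * sample_rate)
    out = []
    for i, piece in enumerate(pieces):
        if i:
            out.extend(gap)
        out.extend(piece)
    return out


story = "The door creaked open. Nobody was there. She laughed nervously, then stepped inside."
for chunk in chunk_text(story, max_words=8):
    print(chunk)
```

In real use each chunk would go to `bark.generate_audio(chunk, history_prompt=...)`. Reusing the same speaker preset for every chunk should help keep the voice consistent across the joins.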
vs XTTS v2
| Metric | Bark full | XTTS v2 |
|---|---|---|
| 5-sec gen time | 4.1 s | 0.85 s |
| Voice cloning | No (built-in speaker presets only) | Yes, from a 6 s reference clip |
| Emotion/expression | Strong | Moderate |
| Non-verbal sounds | Yes (laughter, sighs) | No |
| Languages | 13 officially supported | 17 |
| Licence | MIT (commercial use allowed) | CPML (non-commercial) |
Bark's MIT licence is a major advantage for commercial products. Use Bark for AI characters with emotion, audiobook narration with non-verbal cues, and creative audio content. Use XTTS for live voice assistants, where its speed matters, and for voice cloning.
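The guidance above reduces to a few hard constraints. The sketch below encodes them as a hypothetical filter. `Requirements` and `viable_engines` are our names, the latency budget is measured against the 5-second generation times in the comparison table, and CPML is treated as ruling out commercial use, as the table does. An empty result means the requirements conflict.

```python
from dataclasses import dataclass

# 5-second generation times from the comparison table.
GEN_5S = {"bark": 4.1, "xtts_v2": 0.85}


@dataclass
class Requirements:
    needs_voice_cloning: bool = False
    needs_nonverbal: bool = False    # laughter, sighs, etc.
    commercial: bool = False
    latency_budget_s: float = 10.0   # max acceptable wait for ~5 s of audio


def viable_engines(req):
    """Return engines meeting every hard requirement, fastest first."""
    out = []
    # Bark: no voice cloning, and ~4.1 s per 5 s clip must fit the budget.
    if not req.needs_voice_cloning and GEN_5S["bark"] <= req.latency_budget_s:
        out.append("bark")
    # XTTS v2: no non-verbal sounds, and CPML rules out commercial use.
    if (not req.needs_nonverbal and not req.commercial
            and GEN_5S["xtts_v2"] <= req.latency_budget_s):
        out.append("xtts_v2")
    return sorted(out, key=GEN_5S.__getitem__)


print(viable_engines(Requirements(needs_nonverbal=True, commercial=True)))
```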
Bark TTS on Blackwell 16GB
Expressive multilingual TTS, commercially licensed. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: Coqui TTS, Whisper, voice assistant, MusicGen.