Meta’s MusicGen generates music from text prompts. On the RTX 5060 Ti 16GB in our hosting lineup, every variant below fits in VRAM.
Setup
- AudioCraft 1.3 library (Meta)
- FP16 inference, CUDA 12.6
- 32 kHz EnCodec decoder
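MusicGen generates EnCodec frames autoregressively, so generation time grows roughly linearly with clip length. Each frame carries several codebooks, which MusicGen predicts in parallel with a small delay offset, so the number of decoding steps is roughly the frame count. The sketch below hard-codes the 32 kHz EnCodec configuration described for MusicGen: a 640-sample hop (50 frames per second) and 4 codebooks. These constants are assumptions written into the code, not values queried from AudioCraft.

```python
# Sketch: map clip length to EnCodec frames and tokens for MusicGen.
# Assumption: 32 kHz EnCodec, 640-sample hop (50 frames/s), 4 codebooks.
# These constants are hard-coded here, not read from AudioCraft.

SAMPLE_RATE_HZ = 32_000
HOP_LENGTH = 640      # audio samples per EnCodec frame
NUM_CODEBOOKS = 4     # parallel token streams per frame


def frame_rate() -> float:
    """EnCodec frames per second of audio."""
    return SAMPLE_RATE_HZ / HOP_LENGTH


def num_frames(duration_s: float) -> int:
    """Frames the language model must generate for a clip of this length."""
    return round(duration_s * frame_rate())


def num_tokens(duration_s: float) -> int:
    """Discrete tokens across all codebooks for a clip of this length."""
    return num_frames(duration_s) * NUM_CODEBOOKS


for seconds in (5, 10, 30):
    print(f"{seconds:>2} s clip: {num_frames(seconds):>5} frames, "
          f"{num_tokens(seconds):>5} tokens")
```

Under these assumptions, a 30-sec clip needs 1,500 frames (6,000 tokens). That is six times the work of a 5-sec clip, which is consistent with the scaling in the Generation Time table.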
Variants
| Model | Params | VRAM (FP16) | Conditioning |
|---|---|---|---|
| facebook/musicgen-small | 300M | 1.8 GB | Text |
| facebook/musicgen-medium | 1.5B | 5.2 GB | Text |
| facebook/musicgen-large | 3.3B | 10.4 GB | Text |
| facebook/musicgen-melody | 1.5B | 5.6 GB | Text + melody |
| facebook/musicgen-stereo-large | 3.3B | 11.2 GB | Text (stereo output) |
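You can use the VRAM column as a quick pre-flight check before loading a checkpoint. The sketch below copies the table's figures into a list and filters it by GPU budget and by required conditioning. The `fitting_variants` helper and its default 1 GB headroom are hypothetical choices, not part of AudioCraft. The headroom is meant to leave room for the CUDA context, larger batches or other processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    vram_gb: float   # FP16 VRAM from the Variants table
    melody: bool     # accepts a reference melody for conditioning
    stereo: bool     # produces stereo output


# Figures copied from the Variants table.
VARIANTS = [
    Variant("facebook/musicgen-small", 1.8, melody=False, stereo=False),
    Variant("facebook/musicgen-medium", 5.2, melody=False, stereo=False),
    Variant("facebook/musicgen-large", 10.4, melody=False, stereo=False),
    Variant("facebook/musicgen-melody", 5.6, melody=True, stereo=False),
    Variant("facebook/musicgen-stereo-large", 11.2, melody=False, stereo=True),
]


def fitting_variants(budget_gb: float, headroom_gb: float = 1.0,
                     need_melody: bool = False,
                     need_stereo: bool = False) -> list[str]:
    """Hypothetical helper: variants whose measured VRAM fits budget minus headroom."""
    usable_gb = budget_gb - headroom_gb
    return [
        v.name
        for v in VARIANTS
        if v.vram_gb <= usable_gb
        and (v.melody or not need_melody)
        and (v.stereo or not need_stereo)
    ]


print(fitting_variants(16))
print(fitting_variants(8))
print(fitting_variants(16, need_melody=True))
```

On a 16 GB card every variant passes. With an 8 GB budget, only small, medium and melody remain.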
Generation Time
| Model | 5-sec clip | 10-sec clip | 30-sec clip |
|---|---|---|---|
| small | 1.4 s | 2.6 s | 8.8 s |
| medium | 3.8 s | 7.5 s | 24 s |
| large | 8.9 s | 17 s | 55 s |
| melody | 4.1 s | 8.2 s | 25 s |
| stereo-large | 10.5 s | 21 s | 65 s |
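Dividing clip length by generation time gives a real-time factor (RTF). Values above 1 mean the audio is produced faster than it plays back. The following is a minimal sketch with the timings copied from the table above. `real_time_factor` is a hypothetical helper name.

```python
# Generation times in seconds, copied from the Generation Time table,
# keyed by model and then by clip length in seconds.
GEN_TIME_S = {
    "small":        {5: 1.4, 10: 2.6, 30: 8.8},
    "medium":       {5: 3.8, 10: 7.5, 30: 24.0},
    "large":        {5: 8.9, 10: 17.0, 30: 55.0},
    "melody":       {5: 4.1, 10: 8.2, 30: 25.0},
    "stereo-large": {5: 10.5, 10: 21.0, 30: 65.0},
}


def real_time_factor(model: str, clip_s: int) -> float:
    """Seconds of audio produced per second of wall-clock time (>1 = faster than real time)."""
    return clip_s / GEN_TIME_S[model][clip_s]


for model in GEN_TIME_S:
    print(f"{model:<13} RTF on a 30 s clip: {real_time_factor(model, 30):.2f}")
```

By this measure, only small (about 3.4x), medium (1.25x) and melody (1.2x) generate 30-sec clips faster than real time. Large (about 0.55x) and stereo-large (about 0.46x) take roughly twice the clip's length.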
Verdict
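For prototyping and SFX production, medium is a good default: it renders a 5-sec clip in 3.8 s at decent quality. Use large for final-quality cuts where you can afford to wait. Use the melody variant when you need to condition generation on a reference melody. Continuing from an audio prompt also works with the text-only models.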
Use cases include background music for video games, ad soundbed prototyping and app notification sounds. Check licensing before any commercial use. Meta released the MusicGen model weights under CC-BY-NC 4.0, a non-commercial license, so read the model card.
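To put "afford to wait" into numbers, you can scale the per-clip timings up to a batch job. This sketch assumes that per-clip times simply add up, with clips rendered one after another. Batching several prompts into one call may change throughput, and the table does not measure that case. `batch_minutes` and the 40-cue job are hypothetical.

```python
# Per-clip generation times in seconds for medium and large,
# copied from the Generation Time table.
GEN_TIME_S = {
    "medium": {5: 3.8, 10: 7.5, 30: 24.0},
    "large":  {5: 8.9, 10: 17.0, 30: 55.0},
}


def batch_minutes(model: str, clip_s: int, n_clips: int) -> float:
    """Hypothetical estimate: wall-clock minutes to render n_clips sequentially."""
    return GEN_TIME_S[model][clip_s] * n_clips / 60


# Hypothetical job: 40 background-music cues of 10 s each.
for model in ("medium", "large"):
    print(f"{model}: {batch_minutes(model, 10, 40):.1f} min")
```

For 40 ten-second cues, this gives 5.0 minutes on medium and about 11.3 minutes on large. That gap is the trade-off between drafting on medium and rendering final cuts on large.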
MusicGen on Blackwell 16GB
The large model fits and renders 30-sec clips in under a minute. UK dedicated hosting.