MusicGen Large from Meta’s AudioCraft generates up to 30 seconds of music from a text prompt; the companion melody checkpoints additionally accept a reference melody. At 3.3B parameters it is one of the largest music models that is practical to self-host in 2026. On our dedicated GPU hosting it fits comfortably on a 16 GB card.
VRAM
MusicGen Large needs roughly 8-10 GB of VRAM at FP16, so it runs comfortably on any card with 12 GB or more.
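As a rough sanity check on that figure, the arithmetic for the weights alone can be sketched as below. The helper name `estimate_weight_gb` is hypothetical, and the explanation of the remaining headroom (the T5 text encoder, the EnCodec audio codec, activations and CUDA context) is an assumption rather than a measured breakdown.

```python
def estimate_weight_gb(num_params, bytes_per_param=2):
    """Memory taken by model weights alone, in GB (10^9 bytes).

    bytes_per_param=2 corresponds to FP16; use 4 for FP32.
    """
    return num_params * bytes_per_param / 1e9


# 3.3B parameters is the size quoted for MusicGen Large
weights_gb = estimate_weight_gb(3.3e9)
print(f"FP16 weights: {weights_gb:.1f} GB")  # FP16 weights: 6.6 GB
```

The weights account for about 6.6 GB, which leaves a few GB of the quoted 8-10 GB for everything else loaded at inference time. Actual usage is best confirmed on the target card, since it varies with clip duration and batch size.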
Deployment
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the 3.3B text-to-music checkpoint onto the GPU
model = MusicGen.get_pretrained("facebook/musicgen-large", device="cuda")
model.set_generation_params(duration=20)  # clip length in seconds

prompts = ["Calm acoustic guitar with rain ambience"]
wav = model.generate(prompts)  # one waveform per prompt

for idx, one_wav in enumerate(wav):
    # Writes output_{idx}.wav; audio_write appends the file extension
    audio_write(f"output_{idx}", one_wav.cpu(), model.sample_rate)
Conditioning
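Text conditioning is the default: describe genre, instruments, mood, and tempo. Melody conditioning takes a reference WAV and generates music that follows its melody. It requires one of the melody checkpoints (for example facebook/musicgen-melody-large, also 3.3B) rather than the text-only facebook/musicgen-large:
import torchaudio
model = MusicGen.get_pretrained("facebook/musicgen-melody-large", device="cuda")  # melody checkpoint
melody, sr = torchaudio.load("reference.wav")  # shape [channels, samples]
wav = model.generate_with_chroma(["upbeat rock"], melody[None], sr)  # melody[None] adds the batch dimension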
Melody conditioning is useful for arranging variations on a theme.
Quality
MusicGen produces convincingly musical output but with limitations:
- Max output is 30 seconds. Longer pieces require chaining (concatenating clips with overlap), which can introduce audible seams.
- No vocals – instrumental only
- Quality is genre-dependent – simple pop and ambient work well, complex jazz or classical less so
- Can sound repetitive at longer durations
Good for background tracks, game loops, and podcast intros. Not a replacement for a composer.
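The chaining mentioned in the limitations above can be illustrated with a minimal crossfade sketch. It is not AudioCraft's own method: the function name `crossfade_concat` and the linear fade are illustrative choices, and it works on plain sequences of mono float samples to stay dependency-free. With real MusicGen output you would first convert each tensor to a flat list or apply the same arithmetic with tensor operations.

```python
def crossfade_concat(a, b, overlap):
    """Join two mono sample sequences, linearly crossfading `overlap` samples.

    The last `overlap` samples of `a` are blended with the first `overlap`
    samples of `b`, so the result has len(a) + len(b) - overlap samples.
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter segment length")
    if overlap == 0:
        return list(a) + list(b)

    head = list(a[:len(a) - overlap])   # part of `a` before the blend region
    tail = list(b[overlap:])            # part of `b` after the blend region
    start = len(a) - overlap

    blended = []
    for i in range(overlap):
        # Weight for `b` rises from 0 to 1 across the overlap
        w = i / (overlap - 1) if overlap > 1 else 0.5
        blended.append((1.0 - w) * a[start + i] + w * b[i])

    return head + blended + tail


# Toy example: fade from a constant 1.0 signal into silence
joined = crossfade_concat([1.0] * 4, [0.0] * 4, overlap=3)
print(joined)  # [1.0, 1.0, 0.5, 0.0, 0.0]
```

A crossfade only smooths the level change at the join. It does not make the two clips musically continuous, which is why seams can remain audible when the clips were generated independently.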
Self-Hosted Music Generation
MusicGen Large on UK dedicated GPUs, available from 4060 Ti upward.
Browse GPU Servers
See also: Stable Audio Open.