LLM weights are big – roughly 40 GB for Llama 3 70B at INT4, 140+ GB at FP16. Naive backup burns NVMe space, copies the same weights across servers, and slows model rollout. On our dedicated GPU hosting, a smarter pattern keeps things tidy.
Don’t Back Up Public Weights
Llama 3, Qwen, Mistral – these are freely available from Hugging Face and will be for the foreseeable future. Backing up your own copy is redundant: if a server dies, re-downloading over datacenter bandwidth is typically faster than restoring from your own remote backup.
Exception: if HF removes a model or you run in air-gapped mode. In that case, keep one canonical copy somewhere cheap.
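Back-of-envelope arithmetic makes the point. A quick sketch of the transfer-time math – the link speeds and the 80% efficiency factor here are illustrative assumptions, not measurements:

```python
def transfer_minutes(size_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Wall-clock minutes to move size_gb over a link_gbps connection,
    assuming the link sustains `efficiency` of its rated speed."""
    seconds = (size_gb * 8) / (link_gbps * efficiency)
    return seconds / 60

# 140 GB of FP16 weights over a 10 Gbps datacenter uplink:
print(round(transfer_minutes(140, 10), 1))  # → 2.3 minutes
# The same weights from a remote backup capped at 1 Gbps:
print(round(transfer_minutes(140, 1), 1))   # → 23.3 minutes
```

Unless your backup store can saturate the same uplink Hugging Face's CDN can, the re-download wins.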
Fine-Tunes
Your fine-tunes are precious – they are the actual IP. Back them up.
Strategy:
- Tag each fine-tune with a version and training config
- Upload to an S3-compatible store (R2, B2, Wasabi – all cheaper than AWS S3)
- Keep the training data separately backed up
- Store the training script in git alongside
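The versioning step can be as simple as a manifest written next to the adapter before upload. A minimal sketch – the function name and manifest fields are illustrative, and the upload itself can be done with any S3 client (aws cli, rclone, boto3):

```python
import hashlib
import json
import pathlib

def build_manifest(adapter_path: str, version: str, training_config: dict) -> dict:
    """Manifest stored alongside the adapter in the bucket (illustrative schema)."""
    data = pathlib.Path(adapter_path).read_bytes()
    return {
        "version": version,                          # e.g. a tag like "support-bot-v3"
        "sha256": hashlib.sha256(data).hexdigest(),  # detect silent corruption on restore
        "size_bytes": len(data),
        "training_config": training_config,          # hyperparams frozen with the artifact
    }

# Write the manifest next to the adapter, then upload both objects together:
# pathlib.Path("adapter.manifest.json").write_text(json.dumps(manifest, indent=2))
```

The checksum matters most: a restore you cannot verify is a restore you cannot trust.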
Storage
At ~1 GB per LoRA adapter and ~15 GB per fully fine-tuned 7B model (FP16):
| Store | Price/GB-month | Egress |
|---|---|---|
| AWS S3 Standard | ~$0.023 | $0.09/GB |
| Cloudflare R2 | ~$0.015 | Free egress |
| Backblaze B2 | ~$0.006 | $0.01/GB |
| Wasabi | ~$0.0069 | Free |
Cloudflare R2 wins for most model-weight workflows – storage is cheap, and free egress means re-pulling weights to a fresh server costs nothing.
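To see why egress dominates, plug the table's prices into a quick cost model. The 500 GB stored / 1,000 GB monthly egress workload below is a hypothetical example:

```python
# ($/GB-month storage, $/GB egress) from the table above
PROVIDERS = {
    "s3":     (0.023,  0.09),
    "r2":     (0.015,  0.0),
    "b2":     (0.006,  0.01),
    "wasabi": (0.0069, 0.0),
}

def monthly_cost(gb_stored: float, gb_egress: float, provider: str) -> float:
    store_rate, egress_rate = PROVIDERS[provider]
    return gb_stored * store_rate + gb_egress * egress_rate

# 500 GB of fine-tunes, re-pulled twice a month to fresh servers (1,000 GB egress):
for name in PROVIDERS:
    print(name, round(monthly_cost(500, 1000, name), 2))
```

Note the ranking flips with workload: B2 is cheapest at rest, but once you re-pull weights regularly, the zero-egress providers pull ahead.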
Multi-Server Caching
On multi-server deployments, cache weights on a fast shared NFS or local NVMe with a sync cron. First server to download fills the cache; subsequent servers pull locally. Saves bandwidth and startup time.
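The fill-once logic needs a lock so two servers booting at the same time don't both download. A minimal sketch assuming a POSIX filesystem – the function names are illustrative, and `fetch` stands in for whatever downloader you use:

```python
import fcntl
import pathlib

def ensure_cached(model_id: str, cache_dir: str, fetch) -> pathlib.Path:
    """First server to ask downloads under an exclusive lock and fills the
    shared cache; later callers block briefly, then reuse the cached copy.
    `fetch(dest)` is any downloader that writes the weights to `dest`."""
    cache = pathlib.Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    dest = cache / model_id.replace("/", "--")
    lock_path = cache / (dest.name + ".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks while another host downloads
        if not dest.exists():
            tmp = cache / (dest.name + ".partial")
            fetch(tmp)          # fill the cache exactly once
            tmp.rename(dest)    # atomic publish on the same filesystem
    return dest
```

One caveat: advisory locks over NFS depend on the mount supporting them; a local NVMe cache per rack avoids that wrinkle entirely.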
Sensible Model Storage
UK dedicated GPU hosting with NVMe for weights and clean backup integration.
Browse GPU Servers