Most AI deployments don't have a DR plan because "the model is on Hugging Face". But the weights are the easy part: the other 80% of your state — vectors, adapters, config, logs — is what actually needs backing up.
Back up: vector store data (Qdrant snapshots), fine-tuned LoRA adapters, per-tenant config, request logs, and a build manifest with pinned versions. RTO target: 1 hour once replacement hardware is available. Test the restore quarterly.
What to back up
- Qdrant snapshots (daily, off-server)
- LoRA adapters (after every training run)
- LiteLLM config + API key database
- Build manifest: vLLM version, driver version, model commit SHAs
- Request logs (compliance-required retention period)
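The build manifest can be a small, version-controlled JSON file. A sketch — field names and version numbers here are illustrative, not prescriptive:

```json
{
  "vllm": "0.6.3",
  "nvidia_driver": "550.90.07",
  "cuda": "12.4",
  "models": [
    {"repo": "meta-llama/Llama-3.1-8B-Instruct", "revision": "pinned-commit-sha"}
  ],
  "lora_adapters": ["tenant-a/adapter-2025-06-01"]
}
```

Pinning model repos to a commit SHA (not a branch) is what makes the rebuild reproducible: `main` on Hugging Face can move under you.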
Don't back up: model weights from Hugging Face (re-downloadable).
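Triggering the daily Qdrant snapshot is one HTTP call against Qdrant's snapshot API (`POST /collections/{name}/snapshots`). A minimal sketch, assuming a local Qdrant on port 6333 and a collection named `docs` — verify the endpoint against your Qdrant version:

```python
import json
import urllib.request

QDRANT = "http://localhost:6333"  # assumed local instance; adjust


def snapshot_url(base: str, collection: str) -> str:
    """Qdrant's create-snapshot endpoint for a collection."""
    return f"{base}/collections/{collection}/snapshots"


def create_snapshot(base: str, collection: str) -> str:
    """POST to the snapshot endpoint; returns the new snapshot's name."""
    req = urllib.request.Request(snapshot_url(base, collection), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]["name"]


if __name__ == "__main__":
    # Cron runs this daily; copy the resulting file off-server afterwards.
    print("created:", create_snapshot(QDRANT, "docs"))
```

The snapshot file still has to leave the box (rsync, object storage); a snapshot sitting on the same disk is not a backup.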
DR plan
- Provision new dedicated GPU server (under 24h via GigaGPU)
- Install pinned versions from build manifest
- Restore Qdrant snapshot
- Restore LoRA adapters
- Restore LiteLLM config
- Re-download model weights from Hugging Face, at the commit SHAs pinned in the build manifest
- Smoke test, flip DNS
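The steps above lend themselves to a pre-cutover gate: refuse to flip DNS until every restored artifact is actually on disk. A minimal sketch — the paths are assumptions about your restore layout, not a standard:

```python
from pathlib import Path

# Artifacts that must exist before cutover. Paths are illustrative;
# adjust to wherever your restore jobs actually land files.
REQUIRED = [
    "backups/qdrant/latest.snapshot",
    "backups/lora-adapters",
    "backups/litellm/config.yaml",
    "backups/build-manifest.json",
]


def missing_artifacts(root: str, required=REQUIRED) -> list[str]:
    """Return the artifacts not yet present under root; empty means safe to proceed."""
    base = Path(root)
    return [p for p in required if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_artifacts("/srv/restore")
    if missing:
        raise SystemExit(f"do not flip DNS; missing: {missing}")
    print("all artifacts present; run smoke tests, then flip DNS")
```

Running this as the last step before the DNS change turns "did we restore everything?" from a memory exercise into a mechanical check.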
Verdict
DR for AI is mostly the same as DR for any backend, with one twist: model weights are external (Hugging Face) and re-downloadable.
Bottom line
Boring infrastructure pays for itself when something breaks. See the on-call runbook.