Model load time matters during cold starts, rolling deployments, and autoscaling. A single Gen4 NVMe drive reads at ~5-7 GB/s sequentially, and striping two drives in RAID 0 can roughly double that. On a dedicated GPU server with fast storage, model loads drop to a few seconds.
Numbers
Loading a 70B model from disk to GPU:
| Storage | Load Time (~40 GB Q4) |
|---|---|
| SATA SSD | ~80-120 seconds |
| Gen3 NVMe single | ~15 seconds |
| Gen4 NVMe single | ~8 seconds |
| Gen4 NVMe RAID 0 × 2 | ~4-5 seconds |
| Gen5 NVMe single | ~4-5 seconds |
In practice, load time is often bottlenecked by the CPU or the PCIe link to the GPU rather than raw disk speed. Beyond a Gen4 RAID 0 pair or a single Gen5 drive, striping in more drives gives diminishing returns.
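As a sanity check on the table, you can get a lower bound on load time by dividing the file size by the sustained sequential read speed. Below is a minimal sketch. The per-device throughputs are assumptions, not measurements: ~0.5 GB/s for SATA, ~3.5 GB/s for Gen3 NVMe, ~7 GB/s for Gen4, and ~14 GB/s for a two-drive Gen4 stripe. The helper name `estimate_load_seconds` is hypothetical. The estimate ignores the CPU and PCIe overheads described above.

```shell
#!/bin/sh
# Lower-bound load time in seconds: model size (GB) / sequential read (GB/s).
# Ignores CPU-side parsing and the PCIe copy to the GPU.
estimate_load_seconds() {
  # $1 = model size in GB, $2 = sustained read throughput in GB/s
  awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

# Assumed throughputs for a ~40 GB Q4 70B model (illustrative, not measured)
echo "SATA SSD:           $(estimate_load_seconds 40 0.5) s"
echo "Gen3 NVMe single:   $(estimate_load_seconds 40 3.5) s"
echo "Gen4 NVMe single:   $(estimate_load_seconds 40 7) s"
echo "Gen4 NVMe RAID 0x2: $(estimate_load_seconds 40 14) s"
```

Under these assumptions the RAID 0 row comes out at ~2.9 s of pure disk time, against ~4-5 s in the table. That gap is plausibly the non-disk overhead.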
Setup
Software RAID 0 via mdadm:
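```shell
# Stripe both NVMe drives into /dev/md0 (destroys any existing data on them)
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
# Create an ext4 filesystem on the array
sudo mkfs.ext4 /dev/md0
# Mount it at /data
sudo mkdir -p /data && sudo mount /dev/md0 /data
```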
Add an entry to /etc/fstab so the array mounts at boot, and store model weights under /data.
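Here is a minimal sketch of the boot-time setup, assuming a Debian/Ubuntu host. The helper `fstab_line` is hypothetical and only prints the entry.
- Referencing the filesystem by UUID rather than /dev/md0 guards against the array being assembled under a different md number at boot.
- The `nofail` option lets the server finish booting even if the array is missing.

The commented commands must run as root, and the mdadm.conf path differs on other distributions.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab entry for the RAID 0 array.
# $1 = filesystem UUID of /dev/md0, $2 = mount point (defaults to /data)
fstab_line() {
  printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$1" "${2:-/data}"
}

# On the server (as root, Debian/Ubuntu paths assumed):
#   uuid=$(blkid -s UUID -o value /dev/md0)
#   fstab_line "$uuid" >> /etc/fstab
#   mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # record the array so it reassembles consistently
#   update-initramfs -u                              # pick up the updated mdadm.conf at early boot
```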
Data Risk
RAID 0 has no redundancy – if any one drive fails, everything on the array is lost. For model weights this is usually acceptable because the weights are backed up elsewhere (see backing up weights). Recovery means re-downloading the weights from backup, which takes minutes.
Do not use RAID 0 for user data or logs you cannot reconstruct.
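A RAID 0 failure takes the whole array with it, so it can help to confirm the array is up before a deployment tries to load weights from it. Below is a minimal sketch, assuming the array is named md0 as in the setup commands. `raid0_active` is a hypothetical helper that reads /proc/mdstat; the file path is a parameter so the helper can be tested against a saved copy. It only confirms that the array assembled as raid0 and will not catch every failure mode of a running array.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named array is listed as active raid0.
# $1 = array name (e.g. md0), $2 = mdstat file (defaults to /proc/mdstat)
raid0_active() {
  grep -qs "^$1 : active raid0 " "${2:-/proc/mdstat}"
}

# Example: check before starting the model server
if raid0_active md0; then
  echo "md0 is active"
else
  echo "md0 is not active; restore weights from backup before loading" >&2
fi
```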
When to Bother
RAID 0 pays off when:
- You do rolling deployments or blue-green upgrades frequently – model load time is visible to you
- You have large models (70B+) where single-NVMe load is >15 seconds
- You host many different models that swap in and out during the day
For a single model that loads once at server start, single Gen4 NVMe is plenty.
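To put "pays off" into numbers, multiply the per-load saving by how often models load. Below is a minimal sketch using the table's Gen4 figures: ~8 s for a single drive versus ~4.5 s for a two-drive stripe, taking the midpoint of 4-5 s. The loads-per-day counts and the helper name `daily_seconds_saved` are hypothetical.

```shell
#!/bin/sh
# Seconds of load time saved per day by switching storage.
# $1 = loads per day, $2 = load time before (s), $3 = load time after (s)
daily_seconds_saved() {
  awk -v n="$1" -v before="$2" -v after="$3" 'BEGIN { printf "%.1f\n", n * (before - after) }'
}

# Hypothetical workloads: Gen4 single (~8 s) vs Gen4 RAID 0 x2 (~4.5 s)
echo "1 load/day (load once at start): $(daily_seconds_saved 1 8 4.5) s saved"
echo "50 model swaps/day:              $(daily_seconds_saved 50 8 4.5) s saved"
```

Under these assumptions, a server that loads once saves a few seconds a day, in line with the advice above. Frequent swaps add up to about three minutes.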
Fast-Storage GPU Hosting
UK dedicated GPU servers with Gen4 or Gen5 NVMe and optional RAID 0.
See ZFS vs ext4.