The filesystem choice on a dedicated GPU server affects model loading speed, backup workflow, and storage efficiency. ext4 is the default; ZFS offers features that sometimes justify the overhead. Here is how to choose.
ext4
The default Linux filesystem. Fast, well-tested, minimal overhead. Loading a 40 GB model from NVMe on ext4 hits near-raw NVMe read speeds.
No built-in snapshots or compression. If you need them, add a layer underneath ext4, such as LVM snapshots, or use a filesystem that has them built in, such as btrfs or ZFS.
ZFS
Features that matter for ML:
- Transparent compression – LZ4 shrinks model weights by roughly 10-20% at negligible CPU cost
- Snapshots – instant “rollback” to a previous weight set
- Checksums – detect silent bit-rot, important for multi-year weight archives
- Deduplication – if you store multiple variants of one model, dedup saves space (but dedup is memory-heavy; use sparingly)
Cost: ~10-15% overhead on large sequential reads, noticeable extra RAM usage for the ARC (ZFS's in-memory read cache), and tuning complexity.
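How much compression actually saves depends on the weights themselves, so it can be worth estimating before committing to ZFS. The sketch below samples a file and compresses the samples with zlib level 1 from the Python standard library. This is an assumption-laden stand-in: zlib is not LZ4, and LZ4 typically compresses somewhat less, so treat the result as a rough upper estimate. The function name and parameters are illustrative.

```python
import zlib
from pathlib import Path


def estimate_compression_savings(path, sample_size=1 << 20, samples=16):
    """Estimate the fraction of disk space compression would save (0.0 to 1.0).

    Reads about `samples` evenly spaced chunks of `sample_size` bytes and
    compresses each with zlib level 1 (a stand-in for LZ4, not LZ4 itself).
    """
    size = Path(path).stat().st_size
    if size == 0:
        return 0.0

    raw_total = 0
    packed_total = 0
    # Space the samples across the file; never let samples overlap.
    step = max(size // samples, sample_size)
    with open(path, "rb") as f:
        for offset in range(0, size, step):
            f.seek(offset)
            chunk = f.read(sample_size)
            if not chunk:
                break
            raw_total += len(chunk)
            packed_total += len(zlib.compress(chunk, 1))

    # Incompressible data can grow slightly; report that as zero savings.
    return max(0.0, 1.0 - packed_total / raw_total)
```

Calling `estimate_compression_savings("/models/example.gguf")` (a hypothetical path) returns a fraction such as 0.12 for 12% savings. A value near zero suggests compression will cost read time without freeing meaningful space.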
Benchmarks
Loading Llama 3 70B Q4 (~40 GB) from local NVMe:
| Filesystem | Load time | Disk space saved |
|---|---|---|
| ext4 | ~8 seconds | |
| XFS | ~8.5 seconds | |
| ZFS (no compression) | ~9-10 seconds | |
| ZFS (LZ4) | ~10-11 seconds | ~12% |
| btrfs | ~9 seconds | |
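To reproduce this kind of comparison on your own hardware, a minimal sketch is to time a plain sequential read of the model file. It measures only the filesystem read path, not what a real loader does afterwards (parsing, memory mapping, copying to the GPU), so it is an approximation of the numbers above and not the method used to produce them. The function name and chunk size are assumptions.

```python
import time


def time_sequential_read(path, chunk_size=16 << 20):
    """Read `path` from start to end; return (bytes_read, seconds, gb_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    gb_per_s = total / 1e9 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, gb_per_s
```

A repeat run is usually served from RAM (the Linux page cache, or the ARC on ZFS), so time a file that has not been read since boot if you want a cold-read figure. For scale, a 40 GB file read in 8 seconds works out to 5 GB/s.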
Which to pick
Most deployments: ext4. Simple, fast, works.
Pick ZFS when:
- You need instant snapshots for weight version management
- You archive many TB of model variants and want compression
- You are running a long-term archive where bit-rot detection matters
For hot-path inference loading, ext4’s raw speed wins.
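If ext4 wins on speed but you still want some protection against silent corruption, one option is a file-level checksum manifest. The sketch below is a rough approximation of the idea, not an equivalent of ZFS checksums: it works per file instead of per block, it only detects changes when you run it, and it cannot repair anything. All function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=8 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to `root`) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """Return relative paths that are missing or no longer match the manifest."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Build the manifest once when weights are archived, store it alongside a backup, and rerun `changed_files` periodically. Any path it returns should be restored from a known-good copy.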
Flexible Storage GPU Hosting
UK dedicated GPU hosting with ext4, XFS, or ZFS on request.