Most teams default to Docker for everything. For GPU AI inference, the container tax is small but real, and the operational model is sometimes a poor fit.
Docker overhead for GPU AI: roughly 2-5% lower throughput, ~50-100 ms of cold-start delay, and more complicated GPU passthrough. For ephemeral workloads, Docker wins on portability. For a long-running, single-server production deployment, bare-metal systemd is simpler and slightly faster.
Performance overhead
- vLLM throughput in Docker vs bare-metal: ~2-5% lower in Docker (via the NVIDIA Container Toolkit); see the sketch after this list
- Cold start: Docker container init adds ~50-100 ms
- VRAM accounting: container memory limits don't govern GPU memory, so in-container reporting can misstate available VRAM
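To make the overhead comparison concrete, here is a minimal sketch of the containerised side: serving a model with vLLM's published Docker image through the NVIDIA Container Toolkit. The image tag, port, and model name are illustrative assumptions, not measured configuration from this post.

```sh
# Minimal sketch: vLLM in Docker via the NVIDIA Container Toolkit.
# Assumes the toolkit is installed and the Docker daemon can see the GPUs.
# Image tag, port, and model name are illustrative.
docker run --rm --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2
```

Benchmarking this against the same server launched directly on the host is how you'd verify the ~2-5% figure for your own stack.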
When Docker is right
- Multi-tenant isolation (see the GPU-pinning sketch after this list)
- Many models swapped in/out
- Want portability between hosts
- CI/CD pipelines that build container images
- Kubernetes-managed deployments
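Where multi-tenant isolation is the driver, Docker's per-device GPU assignment is the relevant feature. A hypothetical sketch, one container pinned per GPU; container names, ports, and models are placeholders:

```sh
# Hypothetical multi-tenant layout: one container per GPU so tenants never
# contend for the same device. Names, ports, and models are placeholders.
docker run -d --name tenant-a --gpus '"device=0"' -p 8000:8000 \
  vllm/vllm-openai:latest --model meta-llama/Llama-3.1-8B-Instruct
docker run -d --name tenant-b --gpus '"device=1"' -p 8001:8000 \
  vllm/vllm-openai:latest --model mistralai/Mistral-7B-Instruct-v0.2
```

Swapping a model in or out is then just a container restart with a different --model argument, which is the operational win the list above points at.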
When bare-metal wins
- Single-tenant, long-running production server
- Latency-critical workload (every ms counts)
- Simpler operational model (no Docker layer); see the unit-file sketch after this list
- Custom kernel modules / driver tuning
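For the bare-metal path, the whole operational model fits in one unit file. A minimal sketch, assuming vLLM is installed in a virtualenv at /opt/vllm/venv and runs as a dedicated vllm user; the paths, user, and model name are assumptions:

```ini
# /etc/systemd/system/vllm.service — minimal sketch, not a hardened unit.
# Paths, user, and model name are assumptions for illustration.
[Unit]
Description=vLLM inference server
After=network-online.target
Wants=network-online.target

[Service]
User=vllm
ExecStart=/opt/vllm/venv/bin/vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now vllm`; logs go straight to journald (`journalctl -u vllm`). No image builds, no registry, no container runtime between the server and the driver.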
Verdict
For dedicated single-server AI inference, bare-metal systemd is simpler and slightly faster. For multi-tenant or fleet deployments, Docker / Kubernetes wins on operational scale.
Bottom line
Default to systemd on dedicated hardware. Use Docker when you genuinely need it. See Kubernetes vs systemd.