For multi-tenant SaaS that serves multiple customers from the same GPU, isolation matters for both security and fairness. The options below trade isolation strength against efficiency and operational complexity.
The three main levels: (1) process isolation: separate OS processes, the default behaviour, decent isolation. (2) MPS (Multi-Process Service): kernels from different processes run concurrently instead of being time-sliced, with optional per-client resource limits. (3) MIG (Multi-Instance GPU): hardware partitioning on data-center cards such as the A100 and H100. For most SaaS on consumer cards, process isolation plus per-tenant API keys and rate limits is enough.
Isolation levels
- Process isolation: tenant requests are served by separate OS processes (for example, one vLLM process each), isolated by ordinary Linux process boundaries. This is the default and is good enough for most SaaS.
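- CUDA MPS: NVIDIA's Multi-Process Service lets kernels from different processes run on the GPU concurrently instead of time-slicing between them. It adds per-client limits on compute and memory, but it is mostly an efficiency feature, and a fatal GPU fault in one client can affect other clients on the same MPS server.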
- CUDA streams + per-tenant queues: in-process isolation; cheapest; weakest isolation.
- MIG: data-center GPUs only (for example A100, A30, H100, H200). Hardware partitioning into independent GPU instances, each with its own memory and compute slices. Strong isolation; consumer cards don't support it.
- Per-tenant container: each tenant gets own vLLM in own container; OS namespaces; strong isolation; ops cost per tenant
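As a rough illustration of the process-isolation pattern, the sketch below builds the command line and environment for one server process per tenant on a shared GPU. The `TenantProcess` type, the `launch_spec` and `check_fits` helpers, the tenant names, and the model name are hypothetical. It assumes vLLM's `vllm serve` entry point with its `--port` and `--gpu-memory-utilization` options; the memory fraction is a budget that the server process respects by itself, not a limit enforced by the driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantProcess:
    """Hypothetical description of one tenant's server process."""
    tenant: str
    port: int
    gpu_memory_fraction: float  # share of GPU memory this process may claim


def launch_spec(proc, model, gpu_index=0):
    """Return (argv, env overrides) for one tenant's server process."""
    if not 0.0 < proc.gpu_memory_fraction <= 1.0:
        raise ValueError("gpu_memory_fraction must be in (0, 1]")
    argv = [
        "vllm", "serve", model,
        "--port", str(proc.port),
        "--gpu-memory-utilization", f"{proc.gpu_memory_fraction:.2f}",
    ]
    # Pin the process to a single physical GPU.
    env = {"CUDA_VISIBLE_DEVICES": str(gpu_index)}
    return argv, env


def check_fits(procs):
    """Reject a plan whose memory budgets exceed the whole GPU."""
    total = sum(p.gpu_memory_fraction for p in procs)
    if total > 1.0 + 1e-9:
        raise ValueError(f"memory budgets sum to {total:.2f}, above 1.0")


tenants = [
    TenantProcess("tenant-a", 8001, 0.45),
    TenantProcess("tenant-b", 8002, 0.45),
]
check_fits(tenants)
for t in tenants:
    argv, env = launch_spec(t, "example-org/example-model")
    print(t.tenant, env, " ".join(argv))
```

Each `(argv, env)` pair could then be handed to `subprocess.Popen` or a process supervisor. Every process loads its own copy of the model weights, which is the main efficiency cost of this pattern on a single card.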
Tools
- cgroups: resource limits per process group (host memory, CPU time); they do not cap GPU memory
- nvidia-container-toolkit: GPU access in containers with optional restrictions
- vLLM multi-LoRA: per-tenant fine-tuned variants from one base model
- Per-tenant API keys + rate limits: prevent one tenant from saturating the GPU
- Per-tenant request queues: fair scheduling in your gateway
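One way to get fair scheduling from per-tenant request queues is plain round-robin: take one request from each tenant with pending work in turn, so a tenant with a deep backlog cannot starve the others. The `FairQueue` class below is a hypothetical, single-threaded sketch of that idea, not part of vLLM or any gateway product.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin scheduler over per-tenant FIFO queues."""

    def __init__(self):
        # tenant id -> pending requests; dict order is the service order
        self._queues = OrderedDict()

    def put(self, tenant, request):
        """Enqueue a request; a newly active tenant joins the back of the rotation."""
        self._queues.setdefault(tenant, deque()).append(request)

    def get(self):
        """Return (tenant, request) for the next turn, or None when idle."""
        if not self._queues:
            return None
        tenant, queue = next(iter(self._queues.items()))
        request = queue.popleft()
        if queue:
            # Tenant still has work: send it to the back of the rotation.
            self._queues.move_to_end(tenant)
        else:
            # No pending work: drop the tenant until it enqueues again.
            del self._queues[tenant]
        return tenant, request


fq = FairQueue()
for i in range(3):
    fq.put("tenant-a", f"a{i}")  # tenant-a floods the queue
fq.put("tenant-b", "b0")         # tenant-b sends one request

order = []
while (item := fq.get()) is not None:
    order.append(item[1])
print(order)  # ['a0', 'b0', 'a1', 'a2']
```

A production gateway would likely add locking or an async interface, and might weight turns by token count rather than request count, since one long generation costs far more GPU time than a short one.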
Trade-offs
| Pattern | Isolation | Efficiency | Ops cost |
|---|---|---|---|
| Process + API key + rate limit | Decent | High | Low |
| MPS | Decent+ | High | Medium |
| Per-tenant container | Strong | Medium | High |
| MIG (A100/H100) | Strongest | Medium | Medium |
Verdict
For most consumer-card multi-tenant SaaS, process isolation + per-tenant API keys + rate limits is the right pattern. It is strong enough for typical SaaS isolation needs, cheap to operate, and efficient. Step up to MIG only when you're on data-center hardware and have hard isolation requirements (for example, regulated tenants).
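The API-key and rate-limit half of this pattern can be sketched as a small admission check in the gateway: map the API key to a tenant, then charge that tenant's token bucket. The `TokenBucket` and `Gateway` classes, the key names, and the limits are all hypothetical, and the token bucket is one common rate-limiting choice rather than something this pattern prescribes. The `now` parameter exists only so the example can run with a fake clock.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token if available; return whether the request may pass."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Gateway:
    """Resolves API keys to tenants and rate-limits each tenant separately."""

    def __init__(self, api_keys, rate, capacity):
        self._api_keys = api_keys  # api key -> tenant id
        self._rate = rate
        self._capacity = capacity
        self._buckets = {}         # tenant id -> TokenBucket

    def admit(self, api_key, now=None):
        tenant = self._api_keys.get(api_key)
        if tenant is None:
            return "unauthorized"
        bucket = self._buckets.get(tenant)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, now)
            self._buckets[tenant] = bucket
        return "ok" if bucket.allow(now) else "rate_limited"


gw = Gateway({"key-a": "tenant-a", "key-b": "tenant-b"}, rate=1.0, capacity=2)
print([gw.admit("key-a", now=0.0) for _ in range(3)])  # burst of 2, then limited
print(gw.admit("key-b", now=0.0))                       # other tenant unaffected
print(gw.admit("key-a", now=1.0))                       # one token refilled
```

Because each tenant has its own bucket, a tenant that exhausts its burst is throttled without slowing anyone else, which is the fairness property the pattern relies on.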
Bottom line
Process + API keys + rate limits is the SaaS default. See RAG isolation.