Serving multiple customers from one dedicated GPU server introduces isolation problems. Without care, one tenant’s traffic spike steals compute from another, one tenant’s crash takes down others, and VRAM allocation becomes a scramble. Several patterns handle this cleanly.
One Tenant Per GPU
The simplest pattern: give each tenant a dedicated GPU inside the chassis. A four-card server hosts four tenants. Each tenant's processes are pinned to a single device via CUDA_VISIBLE_DEVICES. VRAM is physically separate. One tenant's OOM does not affect others. Throughput is predictable because resources are not shared.
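A minimal sketch of the per-tenant pinning, assuming a hypothetical `serve` inference binary and four tenants on a four-card box (the binary name and ports are placeholders, not from the original):

```shell
# One inference server per tenant, each pinned to its own card.
# Each process only "sees" the device named in CUDA_VISIBLE_DEVICES,
# so its allocations and crashes are confined to that card.
CUDA_VISIBLE_DEVICES=0 serve --port 8000 &   # tenant A -> GPU 0
CUDA_VISIBLE_DEVICES=1 serve --port 8001 &   # tenant B -> GPU 1
CUDA_VISIBLE_DEVICES=2 serve --port 8002 &   # tenant C -> GPU 2
CUDA_VISIBLE_DEVICES=3 serve --port 8003 &   # tenant D -> GPU 3
```

Inside each process, the visible device is enumerated as device 0, so tenant code never needs to know which physical card it landed on.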
Downside: underutilisation. A tenant using 20% of their card leaves 80% idle.
Shared GPU with MPS
Nvidia Multi-Process Service (MPS) lets multiple CUDA processes share a GPU more efficiently than default time-slicing: client work is funnelled through a shared server context, so kernels from different tenants can execute concurrently on the SMs. Utilisation improves, but MPS has sharp edges: a fatal error or OOM in one client can bring down the MPS server and, with it, every tenant on the card.
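A sketch of running two tenants under MPS on one card; `serve` is the same placeholder binary as above, and the 50% SM cap is an illustrative choice, not a requirement:

```shell
# Start the MPS control daemon (one per node; clients find it via these dirs).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
nvidia-cuda-mps-control -d

# Two tenants sharing GPU 0. On Volta and later,
# CUDA_MPS_ACTIVE_THREAD_PERCENTAGE caps each client's SM share.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 CUDA_VISIBLE_DEVICES=0 serve --port 8000 &
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 CUDA_VISIBLE_DEVICES=0 serve --port 8001 &

# Shut the daemon down when done.
echo quit | nvidia-cuda-mps-control
```

Note the SM cap bounds compute share only; it does not partition VRAM, which is why one tenant's OOM remains a shared risk.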
Container Isolation
Docker containers with --gpus '"device=0"' or Kubernetes with GPU resource requests give process isolation plus resource accounting. Each tenant runs in their own namespace. Combine with one GPU per container for the cleanest outcome. See Kubernetes AI inference.
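The one-GPU-per-container pattern can be sketched like this; the image name and memory cap are hypothetical:

```shell
# Tenant A: container pinned to GPU 0. The quoting matters --
# the inner double quotes stop Docker parsing the device spec as a list.
docker run -d --name tenant-a --gpus '"device=0"' \
  --memory 16g --restart unless-stopped \
  inference-image:latest

# Tenant B gets its own container and its own card.
docker run -d --name tenant-b --gpus '"device=1"' \
  --memory 16g --restart unless-stopped \
  inference-image:latest
```

The Kubernetes equivalent is a pod requesting `nvidia.com/gpu: 1` in its resource limits; the device plugin then handles the per-pod device assignment for you.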
| Pattern | Isolation | Utilisation | Complexity |
|---|---|---|---|
| Tenant per GPU | Strongest | Can be low | Lowest |
| MPS shared | Moderate | Higher | Medium |
| Container per tenant | Process-level | Depends on scheduling | Medium |
| MIG (datacenter GPUs) | Strong hardware partitioning | Good | Medium (A100/H100 class only) |
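For the MIG row, partitioning is done with `nvidia-smi`; a sketch, noting that profile IDs vary by card (list yours with `nvidia-smi mig -lgip` before choosing):

```shell
# Enable MIG mode on GPU 0 (A100/H100-class only; needs a GPU reset).
sudo nvidia-smi -i 0 -mig 1

# Create two GPU instances from profile ID 9 and their compute
# instances (-C). Which profile ID maps to which slice size
# depends on the card, so check -lgip output first.
sudo nvidia-smi mig -i 0 -cgi 9,9 -C

# List the resulting MIG devices; their UUIDs can be passed to
# CUDA_VISIBLE_DEVICES or a container runtime to pin a tenant.
nvidia-smi -L
```

Unlike MPS, each MIG instance has its own dedicated SMs and VRAM slice, so one tenant's OOM cannot touch another's memory.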
Which to Choose
For customer-facing SaaS with payment SLAs, use one GPU per tenant: the utilisation loss is the price of reliability. For internal tooling with trusted consumers, MPS or containerisation works. For hybrid cases, mix the patterns: premium customers on dedicated cards, free tier on shared. See AI for agencies multi-client and access control for self-hosted AI.