Cloud GPU instances typically share physical hardware with other tenants. When a noisy neighbour saturates shared NVMe storage or network bandwidth, your AI workload slows down. When spot capacity is reclaimed, your inference goes offline. On our dedicated UK hosting neither happens, but the cost of these events on cloud is worth quantifying.
SLA
Hyperscaler GPU SLAs typically commit to 99.9% monthly uptime, which allows about 43 minutes of downtime in a 30-day month. Service credits for a breach are a percentage of the affected instance’s bill, not compensation for your lost revenue.
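The downtime budget behind an uptime percentage is simple arithmetic. A minimal Python sketch, assuming a 30-day (720-hour) month; `allowed_downtime_minutes` is a hypothetical helper name, and real SLAs define their own measurement windows and exclusions:

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Maximum downtime, in minutes, that an uptime SLA permits over a period."""
    return period_hours * 60 * (1 - sla_percent / 100)


if __name__ == "__main__":
    # 30-day month (720 hours): 99.9% leaves about 43 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9, 720), 1))        # 43.2
    # Full year (8,760 hours): the same SLA allows about 8.8 hours.
    print(round(allowed_downtime_minutes(99.9, 8760) / 60, 2))  # 8.76
```

Each extra "nine" cuts the budget tenfold: 99.99% over the same month allows roughly 4.3 minutes.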
Dedicated hosting often offers the same 99.9%+ at the infrastructure level, with the added benefit of no shared tenancy.
Noisy Neighbours
Cloud instances share physical host resources: CPU, network and the storage bus. When another tenant starts a heavy workload, your LLM latency can rise by 20-50%. No SLA breach occurs because the instance is still “up”, but your customers see a degraded experience.
Dedicated physical hardware eliminates this. Your card is your card.
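Because the instance stays "up", an uptime probe alone will not catch this kind of degradation. One hedged way to surface it is to compare recent tail latency against a baseline. The sketch below uses Python's `statistics.quantiles`; the function names and the 20% threshold (the low end of the range quoted earlier) are illustrative assumptions, not a standard:

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th-percentile latency; quantiles(n=20) returns 19 cut points, index 18 is p95."""
    return statistics.quantiles(samples_ms, n=20, method="inclusive")[18]


def latency_degraded(baseline_ms: list[float], recent_ms: list[float],
                     threshold: float = 0.20) -> bool:
    """True if recent p95 exceeds baseline p95 by more than `threshold` (0.20 = 20%).

    An uptime check would still pass here: the instance responds, just slowly.
    """
    base = p95(baseline_ms)
    return (p95(recent_ms) - base) / base > threshold


if __name__ == "__main__":
    baseline = [100.0] * 50   # illustrative samples, not measurements
    contended = [130.0] * 50
    print(latency_degraded(baseline, contended))  # True: p95 is 30% above baseline
```

Alerting on p95 rather than the mean keeps a few fast requests from hiding a slowdown that most users feel.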
Spot Preemption
Spot instances typically cost 60-70% less than on-demand. The trade-off: the provider can reclaim them at short notice (two minutes on AWS, around 30 seconds on Google Cloud and Azure). For LLM serving this means:
- Active requests terminated
- Model reload time (30-120 s for a 70B-class model)
- A load balancer that detects preempted nodes and fails over cleanly
- Engineer time to build and maintain preemption handling
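Much of that preemption handling comes down to draining: stop admitting new requests once a termination signal arrives and give in-flight ones a bounded time to finish. A minimal, framework-agnostic Python sketch follows. `DrainController` and `install_sigterm_drain` are hypothetical names, and it assumes the preemption notice reaches the process as SIGTERM (for example via an orchestrator or OS shutdown); how the notice is actually delivered varies by provider.

```python
import signal
import threading


class DrainController:
    """Hypothetical helper: tracks in-flight requests and refuses new ones once draining."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0
        self._draining = False

    def begin_drain(self) -> None:
        # Plain attribute write, no lock: safe to call from a signal handler
        # that may interrupt a thread already holding self._lock.
        self._draining = True

    def try_admit(self) -> bool:
        """Admit a request unless draining; every True must be paired with release()."""
        with self._lock:
            if self._draining:
                return False  # caller should reject so the load balancer retries elsewhere
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def wait_until_idle(self, timeout_s: float) -> bool:
        """Block until no requests are in flight; False if timeout_s elapsed first."""
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)


def install_sigterm_drain(controller: DrainController) -> None:
    # Assumption: the preemption notice reaches this process as SIGTERM.
    # Must be called from the main thread (a requirement of signal.signal).
    signal.signal(signal.SIGTERM, lambda signum, frame: controller.begin_drain())
```

In use, the serving code would call `try_admit()` per request, return an error such as HTTP 503 when it is refused, and call `release()` in a `finally` block. After draining starts, `wait_until_idle()` should use a timeout shorter than the provider's notice window so the process exits before the instance is reclaimed.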
Downtime Cost
For a SaaS charging £50/user/month with 10,000 users (£500,000 monthly revenue):
- Revenue per hour: ~£685 (£500,000 ÷ ~730 hours in an average month)
- 1 hour of inference downtime: ~£685 of revenue impact, plus damage to customer trust
- Across a year at 99.9% SLA: up to 8.76 hours × ~£685 ≈ £6,000
- Plus churn effect of visible outages
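The arithmetic behind these figures can be checked in a few lines. A sketch assuming an average month of 730 hours (8,760 / 12), revenue accruing evenly across every hour, and downtime blocking all of it; churn and trust effects are left out because there is no agreed formula for them:

```python
HOURS_PER_YEAR = 8760
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730-hour average month


def revenue_per_hour(price_per_user_month: float, users: int) -> float:
    """Subscription revenue per hour, assuming it accrues evenly over the month."""
    return price_per_user_month * users / HOURS_PER_MONTH


def annual_sla_downtime_cost(price_per_user_month: float, users: int,
                             sla_percent: float) -> float:
    """Revenue lost if downtime uses the full yearly allowance of an uptime SLA."""
    downtime_hours = HOURS_PER_YEAR * (1 - sla_percent / 100)
    return downtime_hours * revenue_per_hour(price_per_user_month, users)


if __name__ == "__main__":
    print(round(revenue_per_hour(50, 10_000)))                # 685
    print(round(annual_sla_downtime_cost(50, 10_000, 99.9)))  # 6000
```

Under these assumptions the annual figure is simply 0.1% of annual revenue (£6m), which is a quick sanity check for other price points.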
Dedicated hosting’s higher per-month cost can be justified by avoiding a single customer-visible outage per year, once lost revenue and churn are both counted.
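One way to test that trade-off for your own numbers is a break-even calculation: how many outage hours per year would cost as much in lost revenue as the extra hosting spend? A minimal sketch; `breakeven_outage_hours` is a hypothetical helper, and the monthly premium is an input you would take from your own quotes, since no figure is given here:

```python
def breakeven_outage_hours(monthly_premium: float, revenue_per_hour: float) -> float:
    """Yearly outage hours whose lost revenue equals 12 months of hosting premium.

    Ignores churn and reputational cost, so the real break-even is lower.
    """
    if revenue_per_hour <= 0:
        raise ValueError("revenue_per_hour must be positive")
    return monthly_premium * 12 / revenue_per_hour
```

Compare the result with the outage hours you realistically expect to avoid; any churn caused by visible outages pushes the break-even point further in favour of avoiding them.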
Dedicated Uptime Without Neighbours
UK dedicated GPU hosting with no shared tenants and no preemption risk.
Browse GPU Servers
See hidden cloud costs and annual TCO.