Spot GPU pricing looks unbeatable until you total the preemption-handling engineering, cold-start latency and user-visible outages. A dedicated RTX 5060 Ti 16GB on our UK dedicated hosting is always-on by construction, and the math works out better than the headline $/hr suggests.
Contents
- Spot cloud economics
- Head-to-head comparison
- Hidden cost of preemption
- When spot still wins
- When dedicated wins
Spot cloud economics
Spot instances are excess capacity that cloud providers sell at a discount, on the condition that they can reclaim the GPU at short notice (typically 30-120 seconds). The typical discount versus on-demand is 40-65%.
- 16GB-class on-demand: $0.40-0.75/hr.
- 16GB-class spot: $0.15-0.30/hr.
- Our 5060 Ti dedicated flat rate: ~£0.41/hr effective ($0.53).
Head-to-head comparison
| Dimension | Spot cloud 16GB | On-demand 16GB | GigaGPU 5060 Ti dedicated |
|---|---|---|---|
| Hourly cost | $0.15 – $0.30 | $0.40 – $0.75 | ~$0.53 effective |
| Monthly 24×7 | $108 – $216 | $288 – $540 | $380 flat |
| Preemption rate | 5-30% per day in busy regions | 0% | 0% |
| Warning before kill | 30-120s | n/a | n/a |
| Boot time | 60-180s | 60-120s | Always warm |
| Model load time | +30-120s per cycle | +30-120s at start | 0s (persisted) |
| SLA | None | 99.5-99.9% | 99.9% |
| Consistent GPU type | No – best effort | Yes | Yes – fixed 5060 Ti Blackwell |
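The "Monthly 24×7" row can be reproduced from the hourly rates. The snippet below is a minimal sketch that assumes a 720-hour month (24 × 30), which is what the table's figures imply; the function names are illustrative, not part of any provider's tooling. It also adds boot time and model load time to estimate how long a reclaimed spot instance is out of service per cycle.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 720-hour month, matching the table's figures


def monthly_cost(hourly_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Always-on monthly cost in USD for a given hourly rate."""
    return round(hourly_usd * hours, 2)


def recovery_seconds(boot_s: float, model_load_s: float) -> float:
    """Seconds a replacement instance needs before it can serve again."""
    return boot_s + model_load_s


# Hourly ranges taken from the comparison table above.
rates = {
    "spot": (0.15, 0.30),
    "on-demand": (0.40, 0.75),
    "dedicated": (0.53, 0.53),
}

for name, (low, high) in rates.items():
    print(f"{name}: ${monthly_cost(low):.2f} - ${monthly_cost(high):.2f} per month")

# Spot recovery per reclaim, using the table's boot and model-load ranges.
print("best case:", recovery_seconds(60, 30), "s")
print("worst case:", recovery_seconds(180, 120), "s")
```

The dedicated figure comes out at $381.60 rather than the $380 in the table because the $0.53 hourly rate is itself rounded.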
Hidden cost of preemption
A proper spot architecture requires real engineering. Expect:
- Preemption-signal handling: SIGTERM trap, graceful shutdown, request draining. ~1-2 engineer weeks initial plus ongoing maintenance.
- Checkpoint / restart: model weights cached to local SSD to avoid re-download; vLLM KV-cache discarded on every preemption.
- Request replay: in-flight requests must be retried on another instance – adds complexity to your API layer.
- Cold-start latency: user-visible p99 spikes of 5-30 seconds on every reclaim event.
- Capacity scarcity: when the region is busy, spot capacity is simply not available, and on-call incidents follow.
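To give a sense of what the first item in the list above involves, here is a minimal sketch of SIGTERM handling with request draining, using only the Python standard library. The `DrainingServer` class, its method names and the 25-second drain deadline are assumptions made for illustration (the deadline is chosen to sit under the 30-second minimum warning quoted earlier); a real deployment would wire this into its own web framework and into the provider's specific preemption notice.

```python
import signal
import threading
import time


class DrainingServer:
    """Tracks in-flight requests so a preempted instance can shut down cleanly."""

    def __init__(self, drain_deadline_s: float = 25.0):
        self.accepting = True
        self.in_flight = 0
        self.drain_deadline_s = drain_deadline_s
        self._idle = threading.Condition()

    def try_begin(self) -> bool:
        """Call at the start of each request; False means reject or reroute it."""
        with self._idle:
            if not self.accepting:
                return False
            self.in_flight += 1
            return True

    def finish(self) -> None:
        """Call when a request completes, whether it succeeded or failed."""
        with self._idle:
            self.in_flight -= 1
            if self.in_flight == 0:
                self._idle.notify_all()

    def stop_accepting(self, signum=None, frame=None) -> None:
        """Signal handler: only flips a flag, so it never blocks on a lock."""
        self.accepting = False

    def drain(self) -> bool:
        """Wait for in-flight requests; True if all finished before the deadline."""
        deadline = time.monotonic() + self.drain_deadline_s
        with self._idle:
            while self.in_flight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # leftovers must be replayed elsewhere
                self._idle.wait(timeout=remaining)
        return True


def install_sigterm_handler(server: DrainingServer) -> None:
    """Register the handler; Python requires this to run in the main thread."""
    signal.signal(signal.SIGTERM, server.stop_accepting)
```

At startup the service would call `install_sigterm_handler(server)` once, wrap each request in `try_begin()` / `finish()`, and call `drain()` when `accepting` turns false. Any request still running when `drain()` returns `False` is exactly the case the request-replay item covers.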
Conservatively, the engineer-hours needed to run spot in production cost £500-£2,000/month amortised. On top of that comes the user-visible reliability hit, which is harder to price but is often the bigger issue.
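The effect of that engineering overhead on the headline saving can be worked through in a few lines. This is a rough sketch, not a pricing model: the exchange rate is an assumption derived from the article's own "£0.41/hr ($0.53)" figure, the function names are hypothetical, and it assumes the engineering cost is shared across the fleet rather than paid per GPU.

```python
import math

# Assumption: exchange rate implied by the article's "£0.41/hr ($0.53)" figure.
USD_PER_GBP = 0.53 / 0.41


def spot_all_in_usd(spot_monthly_usd: float, engineering_gbp: float, gpus: int = 1) -> float:
    """Monthly spot bill plus amortised engineering cost, in USD."""
    return spot_monthly_usd * gpus + engineering_gbp * USD_PER_GBP


def break_even_gpus(spot_monthly_usd: float, dedicated_monthly_usd: float,
                    engineering_gbp: float):
    """Smallest fleet size at which spot savings cover the engineering cost.

    Returns None when spot is not cheaper per GPU in the first place.
    """
    saving_per_gpu = dedicated_monthly_usd - spot_monthly_usd
    if saving_per_gpu <= 0:
        return None
    return math.ceil(engineering_gbp * USD_PER_GBP / saving_per_gpu)


# Best case for spot: cheapest rate, lowest engineering estimate.
print(round(spot_all_in_usd(108, 500)), break_even_gpus(108, 380, 500))
# Worst case for spot: priciest rate, highest engineering estimate.
print(round(spot_all_in_usd(216, 2000)), break_even_gpus(216, 380, 2000))
```

Under these assumptions a single spot GPU costs roughly $754 to $2,801 per month all-in, against $380 flat for dedicated, and the spot discount only covers the engineering overhead somewhere between 3 and 16 GPUs. None of this prices in the reliability hit.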
When spot still wins
- Overnight fine-tuning runs that tolerate restart from checkpoint.
- Bulk document processing pipelines with idempotent tasks.
- Data preprocessing / feature extraction.
- Research experimentation on one-off models.
When dedicated wins
- Real-time inference APIs with user SLA.
- Chatbot or RAG backends where p99 latency matters.
- Any UK-data-residency requirement.
- Teams without the engineering budget to build preemption handling.
- Workloads where you want predictable concurrency.
For most production inference, spot is a false economy: the hidden costs outweigh the quoted savings. See our comparisons vs RunPod and vs Lambda Labs.
Always-on dedicated without preemption drama
No reclaims, no cold starts, no engineer-time tax. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: vs RunPod, vs Lambda Labs, concurrent users, ROI analysis.