Hyperscale cloud GPU pricing on AWS, GCP, and Azure looks reasonable on the pricing page, but the actual bill lands consistently 30-50% higher than the headline compute number. Understanding where the extra comes from makes for an honest comparison with our dedicated hosting.
Egress
AWS charges ~$0.09/GB for data transferred out to the internet. Serving model responses to users outside AWS adds up fast: an LLM API pushing 100 GB/month of tokens out to customers costs ~$9 in egress alone; a busy API pushing 10 TB/month, $900.
Dedicated hosting typically bundles bandwidth or charges flat rates far below $0.09/GB.
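The egress arithmetic above can be sketched in a few lines, assuming a flat $0.09/GB rate (real AWS rates tier down with volume, so this is illustrative, not a quote):

```python
EGRESS_PER_GB = 0.09  # $/GB, illustrative flat rate


def monthly_egress_cost(gb_out: float, rate: float = EGRESS_PER_GB) -> float:
    """Monthly bill for gb_out gigabytes leaving the region."""
    return gb_out * rate


print(round(monthly_egress_cost(100), 2))     # 100 GB/month -> 9.0
print(round(monthly_egress_cost(10_000), 2))  # 10 TB/month  -> 900.0
```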
Storage
Model weights are 10-50 GB each. On AWS EBS gp3 at $0.08/GB-month, a dozen fine-tune checkpoints (120-600 GB) run roughly $10-$50/month, and the bill grows with every experiment you keep. S3 is cheaper per GB, but access latency matters when loading weights at startup.
Dedicated hosting includes generous local NVMe – model weights live on-server, no additional charge.
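A quick sketch of the checkpoint-storage arithmetic, assuming the $0.08/GB-month gp3 rate quoted above:

```python
GP3_PER_GB_MONTH = 0.08  # $/GB-month, illustrative gp3-style rate


def checkpoint_storage_cost(n_checkpoints: int, gb_each: float,
                            rate: float = GP3_PER_GB_MONTH) -> float:
    """Monthly block-storage cost of keeping n_checkpoints weight files."""
    return n_checkpoints * gb_each * rate


# A dozen checkpoints at the small and large ends of the 10-50 GB range:
print(round(checkpoint_storage_cost(12, 10), 2))  # -> 9.6
print(round(checkpoint_storage_cost(12, 50), 2))  # -> 48.0
```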
Monitoring
CloudWatch, Cloud Monitoring, and Azure Monitor all charge per metric and per log ingest. A moderately instrumented LLM deployment easily adds $100-$500/month in observability costs on top of compute.
On dedicated hosting you run open-source Prometheus and Grafana at zero marginal cost.
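Per-metric plus per-GB-ingest pricing compounds quietly. A minimal model, with rates that are assumptions in the ballpark of CloudWatch-style pricing rather than quotes from any provider's price list:

```python
METRIC_PER_MONTH = 0.30   # $ per custom metric per month (assumed)
LOG_INGEST_PER_GB = 0.50  # $ per GB of logs ingested (assumed)


def observability_cost(custom_metrics: int, log_gb: float) -> float:
    """Monthly metrics + log-ingest bill."""
    return custom_metrics * METRIC_PER_MONTH + log_gb * LOG_INGEST_PER_GB


# 500 custom metrics and 400 GB of logs lands mid-range of $100-$500:
print(round(observability_cost(500, 400), 2))  # -> 350.0
```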
Opportunity Cost
Spot instances save money but get preempted. Recovery code, checkpoint handling, and the occasional cold start all cost engineer time. If you are paying a senior engineer £500/day, a week of preemption-handling work is £2,500 that dedicated hosting avoids.
Bursty auto-scaling has similar overhead – warming up an LLM replica means pulling tens of GB of weights into GPU memory, so autoscaling that works reliably takes real engineering effort.
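The trade-off can be put as a back-of-envelope check: does the spot discount actually beat the engineering it creates? All inputs below are illustrative assumptions, not measurements:

```python
def spot_net_savings(on_demand_monthly: float, discount: float,
                     eng_days: float, day_rate: float = 500.0) -> float:
    """Monthly spot saving minus preemption-engineering cost,
    amortised over a year."""
    gross = on_demand_monthly * discount        # saving vs on-demand
    eng_monthly = eng_days * day_rate / 12      # build cost spread over 12 months
    return gross - eng_monthly


# £2,000/month on-demand, 60% spot discount, the week of preemption
# plumbing from the example above:
print(round(spot_net_savings(2000, 0.60, 5), 2))  # -> 991.67
```

At these numbers spot still wins, but the margin shrinks fast if the preemption handling needs ongoing maintenance rather than a one-off build.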
No-Surprises UK Dedicated Hosting
One monthly invoice. No egress, no per-metric monitoring fees, no preemption.
Browse GPU Servers. See annual TCO and the cost of downtime.