Lambda Labs prices the RTX 4090 24GB at $0.50/hour on-demand, undercutting most US hyperscalers by 25-40% and putting it within touching distance of UK dedicated 4090 hosting at GigaGPU. The economics tilt sharply depending on whether your workload is a three-day fine-tune sprint or a permanent UK-facing inference endpoint. This article works through both with real numbers, using the dedicated GPU range as the flat-rate baseline and accounting for the hidden costs Lambda’s marketing page does not show.
Contents
- Lambda 4090 pricing today
- Flat UK dedicated baseline
- Three-day fine-tuning sprint
- Monthly always-on inference workload
- Hidden costs: storage, egress, boot, queue
- Region and latency considerations
- Production gotchas
- Verdict by workload
Lambda 4090 pricing today
Lambda Cloud exposes the 4090 mainly through single-GPU and multi-GPU on-demand instances. The headline rate is $0.50/hr (~£0.40), billed by the second once the instance boots. There is no spot tier, no preemption discount, and no long-term commit programme on this SKU – what you see is what you pay. Storage is metered separately at $0.20/GB/month for persistent volumes attached to instances, and egress above the included 1TB monthly quota is $0.05-0.09/GB depending on destination region.
| Item | Rate USD | GBP equivalent (~) | Notes |
|---|---|---|---|
| 4090 on-demand | $0.50/hr | £0.40/hr | Per-second billing after boot |
| Persistent storage | $0.20/GB-month | £0.16/GB-mo | ~3x RunPod pricing |
| Egress (above 1TB) | $0.05-0.09/GB | £0.04-0.07/GB | Region-dependent |
| Boot time | ~90 seconds | n/a | Plus model load time |
| SSH/Jupyter included | Yes | Yes | No surcharge |
| Reserved capacity (1yr) | Not on 4090 | n/a | Available on H100/A100 only |
Capacity reality
Lambda’s 4090 fleet is much smaller than its A100/H100 fleet. Capacity is regularly exhausted, particularly in the popular us-east-1 and us-west-1 regions during US business hours. “Try again later” is a common UX. Build retry logic into provisioning scripts, and do not assume capacity is available when you need it for a deadline-sensitive sprint.
Flat UK dedicated baseline
A dedicated RTX 4090 24GB at GigaGPU is priced at roughly £550/month, inclusive of host CPU (Xeon or EPYC), 64-128GB system RAM, 1-2TB local NVMe scratch, and 1Gbps unmetered transit on a UK datacentre backbone. For the comparisons below we use £550/month as the midpoint, equivalent to roughly $690 at the ~0.80 exchange rate used throughout. There is no per-hour meter, no boot time, no per-GB storage charge, and no egress overage. The card is yours every minute of the month, including the minutes you don’t use it. Cross-reference with the monthly hosting cost and ROI analysis for adjacent maths.
Three-day fine-tuning sprint
Suppose you need to QLoRA fine-tune Llama 3.1 8B for 72 wall-clock hours. Lambda costs 72 × $0.50 = $36 for compute, plus a few dollars of prorated persistent storage for 100GB during the run. Total ~$40, or about £32. A dedicated 4090 for the same 3 days, prorated against a £550/mo bill, would cost ~£55. For one-off bursts, Lambda wins clearly. On the raw meter the maths never inverts; the always-on case for dedicated rests on the bundled extras covered below.
| Sprint length | Lambda compute | Lambda + 200GB storage | Dedicated prorated | Cheapest |
|---|---|---|---|---|
| 1 day (24 hrs) | $12 / £9.60 | $52 / £42 | £18 | Lambda on compute alone; dedicated once storage is added |
| 3 days (72 hrs) | $36 / £29 | $76 / £61 | £55 | Lambda |
| 7 days (168 hrs) | $84 / £67 | $124 / £99 | £128 | Lambda |
| 14 days (336 hrs) | $168 / £134 | $208 / £166 | £257 | Lambda |
| 21 days (504 hrs) | $252 / £202 | $292 / £234 | £385 | Lambda |
| 30 days (720 hrs) | $360 / £288 | $400 / £320 | £550 | Lambda |
| 60 days (1,440 hrs) | $720 / £576 | $800 / £640 | £1,100 | Lambda |
Why the sprint maths still favours Lambda
Even at full month-long usage, $0.50/hr × 720 = $360/mo, well under the dedicated £550 baseline. The flat-rate proposition never wins on the raw compute meter at any sprint length in the table – the dedicated case rests on what is bundled in, not on the meter.
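For anyone adapting these numbers, here is a minimal Python sketch of the table arithmetic above. It assumes storage is billed per started month (which is how the 200GB column reads, including the 60-day row) and the ~0.80 USD→GBP rate used throughout:

```python
# Sprint cost model behind the table above. Assumptions: $0.50/hr compute,
# $0.20/GB-month storage billed per started 720-hour month, £550/mo dedicated
# prorated by the hour, and a ~0.80 USD->GBP conversion.
import math

def lambda_sprint_usd(hours: float, storage_gb: float = 0.0) -> float:
    """Metered Lambda compute plus persistent storage for the sprint."""
    storage_months = math.ceil(hours / 720) if storage_gb else 0
    return hours * 0.50 + storage_gb * 0.20 * storage_months

def dedicated_prorated_gbp(hours: float) -> float:
    """£550/mo flat rate prorated against a 720-hour month."""
    return 550.0 * hours / 720

for days in (1, 3, 7, 14, 21, 30, 60):
    hrs = days * 24
    usd = lambda_sprint_usd(hrs, storage_gb=200)
    print(f"{days:>2}d  Lambda+200GB ${usd:>4.0f} (~£{usd * 0.80:>3.0f})"
          f"  dedicated £{dedicated_prorated_gbp(hrs):.0f}")
```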
Monthly always-on inference workload
Lambda’s $0.50/hr never crosses dedicated 4090 pricing on compute alone, even over a full 30-day month. So why would a sane infra team choose flat hosting? Three reasons keep recurring: predictable costs across multi-month engagements, NVMe-local datasets that avoid the $0.20/GB/mo storage tax, and no boot-time latency for production endpoints. Stretch a Lambda instance into a multi-month always-on engagement and the storage gap widens too.
| Months always-on | Lambda compute | Lambda + 500GB storage | Dedicated cumulative | Delta (Lambda + storage vs dedicated) |
|---|---|---|---|---|
| 1 | £288 | £368 | £550 | Lambda -£182 |
| 3 | £864 | £1,104 | £1,650 | Lambda -£546 |
| 6 | £1,728 | £2,208 | £3,300 | Lambda -£1,092 |
| 12 | £3,456 | £4,416 | £6,600 | Lambda -£2,184 |
| 24 | £6,912 | £8,832 | £13,200 | Lambda -£4,368 |
On pure compute Lambda wins on absolute cost at every duration. The dedicated proposition lives in the bundle: included NVMe (saving £40-200/mo storage), unmetered egress (saving £30-200/mo for inference APIs serving meaningful token volume), UK datacentre presence for GDPR-bound clients, and predictable invoicing for finance teams that hate variable line items. Once you net those in, the gap closes considerably.
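To put a number on “closes considerably”, a rough monthly TCO sketch – the 500GB of volumes and 4TB of egress in the example are illustrative assumptions, not measured figures:

```python
# Monthly total-cost sketch for an always-on Lambda 4090, netting in the
# storage and egress line items discussed in this article. Workload figures
# are illustrative assumptions, not quotes.
def lambda_monthly_gbp(storage_gb: float, egress_tb: float,
                       egress_usd_per_gb: float = 0.07,
                       usd_to_gbp: float = 0.80) -> float:
    compute_usd = 720 * 0.50                       # always-on month of compute
    storage_usd = storage_gb * 0.20                # $0.20/GB-month volumes
    overage_usd = max(egress_tb - 1.0, 0) * 1000 * egress_usd_per_gb
    return (compute_usd + storage_usd + overage_usd) * usd_to_gbp

# 500GB of volumes plus 4TB/month of streamed responses:
print(f"£{lambda_monthly_gbp(500, 4.0):.0f}/mo vs £550/mo dedicated")  # ~£536
```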
Hidden costs: storage, egress, boot, queue
Storage at roughly 3x RunPod prices
Lambda’s $0.20/GB/mo persistent storage is the highest in the major-provider league. A 1TB index of model weights, embeddings and chat history adds $200/mo – 56% of the headline compute cost. Compare to RunPod’s $0.07/GB/mo, or the £0 marginal cost of the included 1-2TB NVMe on dedicated.
Egress above the included quota
The first 1TB/month of egress is included. A production inference API pushing 400GB/day of streamed completions and downloads reaches 12TB/month – 11TB over quota at $0.05-0.09/GB, or $550-990 in egress alone. Inference APIs are egress-heavy by nature; this charge alone can flip the economics.
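The overage arithmetic generalises; a back-of-envelope estimator using this article’s quota and rate figures:

```python
# Back-of-envelope egress maths using the 1TB included quota and the
# $0.05-0.09/GB overage range quoted above. Decimal units throughout.
def monthly_egress_tb(gb_per_day: float) -> float:
    """Monthly egress volume in TB from a steady daily rate."""
    return gb_per_day * 30 / 1000

def overage_usd(egress_tb: float, usd_per_gb: float) -> float:
    """Cost of egress above the included 1TB/month."""
    return max(egress_tb - 1.0, 0.0) * 1000 * usd_per_gb

tb = monthly_egress_tb(400)                           # 12.0 TB
print(overage_usd(tb, 0.05), overage_usd(tb, 0.09))   # 550.0 990.0
```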
Boot time and idle padding
Lambda boots in ~90 seconds, plus 30-180 seconds for model load depending on size. If you build serverless-style spin-up on every request, you pay 2-4 minutes of unusable time per cold start. Most teams keep instances warm 24/7 to avoid this, which means paying the full hourly rate regardless of utilisation.
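A toy model of the keep-warm trade-off, assuming the $0.50/hr rate and a ~3-minute cold start; the traffic figures in the usage lines are hypothetical:

```python
# Toy keep-warm economics: billed hours per month at $0.50/hr, counting the
# billed-but-unusable boot window on each cold start. Traffic figures below
# are hypothetical.
def monthly_compute_usd(active_hours_per_day: float,
                        cold_starts_per_day: float = 0.0,
                        cold_start_minutes: float = 3.0) -> float:
    billed_per_day = active_hours_per_day + cold_starts_per_day * cold_start_minutes / 60
    return billed_per_day * 30 * 0.50

print(monthly_compute_usd(24))      # always warm: $360/mo, zero cold starts
print(monthly_compute_usd(6, 40))   # scale-to-zero: $120/mo, 40 cold starts/day
```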
Queue and capacity unavailability
Lambda’s 4090 capacity in popular regions runs out routinely. A deadline-sensitive sprint that needs to start at 09:00 may not get capacity until 14:00. Build retry-with-backoff into provisioning, and have a fallback (RunPod Community, or your dedicated box) for time-critical work.
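A provisioning sketch along those lines, in Python. The endpoint path and bearer-token auth reflect Lambda’s public REST API as we understand it, and treating “capacity” in the error body as the retry signal is an assumption – check the current API docs before relying on either:

```python
# Retry-with-backoff launcher for capacity-constrained 4090 instances.
# Assumptions: Lambda's launch endpoint lives at the path below and accepts
# a bearer token; capacity errors mention "capacity" in the response body.
import random
import time

import requests

LAUNCH_URL = "https://cloud.lambdalabs.com/api/v1/instance-operations/launch"

def launch_with_backoff(api_key: str, payload: dict,
                        max_wait_s: float = 4 * 3600) -> dict:
    """Retry a launch with jittered exponential backoff until capacity appears."""
    deadline = time.monotonic() + max_wait_s
    delay = 30.0
    while time.monotonic() < deadline:
        resp = requests.post(LAUNCH_URL, json=payload, timeout=30,
                             headers={"Authorization": f"Bearer {api_key}"})
        if resp.ok:
            return resp.json()
        if resp.status_code != 429 and "capacity" not in resp.text.lower():
            resp.raise_for_status()  # auth/validation errors: fail fast
        time.sleep(delay + random.uniform(0, delay / 2))  # jitter avoids herding
        delay = min(delay * 2, 600)  # cap the backoff at 10 minutes
    raise TimeoutError("no capacity within the time budget – fall back to plan B")
```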
Region and latency considerations
Lambda’s 4090 capacity is overwhelmingly US-based. UK-originating traffic to a Lambda 4090 endpoint adds 80-110ms RTT before the model even starts decoding. For a chat UX targeting British or European users, that pushes a 200ms server-side time-to-first-token to roughly 280-310ms at the client. UK-hosted dedicated kit at GigaGPU eliminates that hop. For data-residency-bound work (NHS, financial services, public sector), Lambda US is often outright disqualifying.
| Origin | To Lambda us-east-1 | To Lambda us-west-1 | To GigaGPU UK |
|---|---|---|---|
| London office | ~85ms RTT | ~150ms RTT | ~10ms RTT |
| Manchester | ~95ms RTT | ~160ms RTT | ~15ms RTT |
| Frankfurt | ~95ms RTT | ~165ms RTT | ~25ms RTT |
| New York | ~25ms RTT | ~75ms RTT | ~80ms RTT |
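To reproduce rough figures like these from your own network, timing a TCP handshake gives a workable floor for request RTT. The hostnames below are placeholders for your own endpoints:

```python
# Median TCP connect time to an endpoint – a practical floor for request RTT.
# Hostnames are placeholders; substitute your actual Lambda and UK endpoints.
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Median TCP handshake time in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            timings.append((time.perf_counter() - start) * 1000)
    return sorted(timings)[len(timings) // 2]

# e.g. tcp_rtt_ms("your-lambda-box.example.com") vs tcp_rtt_ms("your-uk-box.example.com")
```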
Production gotchas
- Capacity is not guaranteed. Lambda 4090 routinely runs out in busy regions. Build provisioning retry; do not put it on a critical-path deploy.
- Storage is the silent cost killer. $0.20/GB/mo means a 1TB model and embedding store costs $200 over and above compute. Audit your volumes monthly.
- Egress overage hits inference APIs hardest. 1TB included is generous for training but tight for streaming completions at scale. Monitor or you will be surprised.
- No spot tier. Unlike RunPod or AWS, Lambda has no preemptible discount on the 4090. The on-demand rate is the only rate.
- Boot + load time is real. Plan 3-5 minutes from `lambda instance launch` to “ready for first request” for a quantised 30B-class model (a 70B won’t fit in 24GB even at 4-bit). Bake this into your warm-pool sizing.
- UK/EU latency is significant. 80-110ms transatlantic RTT is unavoidable. For UK users, dedicated is structurally faster, not just cheaper.
- Reserved pricing not available on 4090. Long-term commit discounts apply to A100/H100 only – the 4090 stays at $0.50/hr regardless of duration.
Verdict by workload
For sprints – even month-long ones – Lambda Labs is genuinely cheaper at $0.50/hr; the dedicated breakeven never arrives on the compute meter alone. For permanent endpoints serving UK or European users with serious egress, NVMe-resident datasets, GDPR data residency requirements, or strict latency SLAs, dedicated UK hosting wins on total cost of ownership and on user-experience metrics. The crossover is not on the compute meter but in the bundled extras (storage, egress) and in latency to UK end users. For a one-off training sprint, pick Lambda. For a 12-month customer-facing chat API, pick a dedicated 4090 from GigaGPU.
Predictable monthly billing
One Ada AD102, in the UK, no per-hour meter and no surprise storage or egress invoices. UK dedicated hosting with included NVMe and unmetered transit.
Order the RTX 4090 24GB

See also: vs RunPod pricing, vs Together AI, monthly hosting cost, ROI analysis, vs OpenAI API cost, vs Anthropic API cost, vs cloud H100, break-even calculator, fine-tune throughput, Llama 8B benchmark, FP8 deployment, tier positioning 2026, spec breakdown, 5060 Ti vs Lambda, for SaaS RAG, best GPU for fine-tuning.