Reselling a self-hosted LLM as an API product is a legitimate business in 2026. Dedicated GPU hosting gives you predictable COGS; customers get flat-rate plans or per-token prices cheaper than the hyperscale APIs. Here is how to price it profitably.
COGS
Monthly server cost divided by expected monthly token throughput at your target utilisation. For an RTX 5090 serving a 13B model at INT8 (worked through in the sketch after this list):
- Monthly: £500
- Throughput: ~1,000 tokens/s aggregate = ~2.6 billion tokens/month at 100% utilisation
- Realistic target 50% utilisation: ~1.3 billion tokens/month
- COGS: £0.38 per million tokens
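A minimal sketch of that calculation, assuming a 30-day month; the £500, 1,000 t/s and 50% figures are the ones from the list above.

```python
# COGS per million tokens from monthly server cost, throughput and utilisation.
# Figures below are the RTX 5090 example from the list above.

def cogs_per_million_tokens(monthly_cost_gbp: float,
                            tokens_per_second: float,
                            utilisation: float) -> float:
    """Cost of goods sold per million tokens served."""
    seconds_per_month = 60 * 60 * 24 * 30  # ~30-day month
    monthly_tokens = tokens_per_second * utilisation * seconds_per_month
    return monthly_cost_gbp / (monthly_tokens / 1_000_000)

print(f"£{cogs_per_million_tokens(500, 1000, 1.0):.2f}/M at 100% utilisation")  # ~£0.19/M
print(f"£{cogs_per_million_tokens(500, 1000, 0.5):.2f}/M at the 50% target")    # ~£0.39/M
# (~£0.38/M above comes from rounding the 50% figure down to 1.3bn tokens/month)
```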
Pricing models
- Per-token: compete head-on with OpenAI/Anthropic. Charge ~$1-2/M tokens.
- Monthly flat with cap: $X/month for up to N tokens. Predictable spend for customers.
- Dedicated capacity: “your own instance” at $X/month. Highest margin, but a smaller pool of buyers. All three models are compared in the sketch below.
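A rough comparison under illustrative numbers: the ~$0.48/M COGS assumes the ~£0.38/M figure above at roughly £1 ≈ $1.27, and the plan prices, cap and 300M-token usage are hypothetical placeholders, not recommendations.

```python
# Compare gross margin across the three pricing models for one hypothetical
# customer month. COGS and all plan parameters are assumptions (see lead-in).

COGS_PER_M = 0.48  # USD per million tokens (assumed conversion of ~£0.38/M)

def per_token(price_per_m: float, tokens_m: float) -> float:
    """Per-token plan: revenue scales directly with usage."""
    return price_per_m * tokens_m

def flat_with_cap(monthly_fee: float, cap_m: float, used_m: float) -> float:
    """Flat monthly fee; usage above the cap needs its own policy (throttle or upsell)."""
    assert used_m <= cap_m, "over-cap usage not handled in this sketch"
    return monthly_fee

def dedicated(monthly_fee: float) -> float:
    """Dedicated instance: fixed revenue regardless of usage."""
    return monthly_fee

used_m = 300  # hypothetical usage: 300M tokens this month
for name, revenue in [
    ("per-token @ $1.50/M", per_token(1.50, used_m)),
    ("flat $400/mo, 500M cap", flat_with_cap(400, 500, used_m)),
    ("dedicated $600/mo", dedicated(600)),
]:
    cost = COGS_PER_M * used_m
    margin = (revenue - cost) / revenue
    print(f"{name}: revenue ${revenue:.0f}, gross margin {margin:.0%}")
```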
Benchmarks
Competitor pricing (Q2 2026, per million output tokens):
- OpenAI GPT-4o: ~$15
- OpenAI GPT-4o-mini: ~$0.60
- Anthropic Claude Haiku 4: ~$1-2
- Together.ai Llama 3.3 70B: ~$0.60-$0.90
- Your self-hosted Llama 3.3 70B: charge $0.50-$1.50/M and still turn a profit
Example
Charge $1/M tokens on Llama 3.3 70B. Your COGS: ~$0.30/M. Gross margin: ~70%. Target 500k paying requests/month at ~1,000 tokens each (500M tokens): $500 revenue, $150 COGS, $350 gross profit per month.
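The same arithmetic as code; the ~1,000 tokens per request average is the assumption implied by 500k requests mapping to 500M tokens.

```python
# Worked example: $1/M price, ~$0.30/M COGS, 500k requests/month.
price_per_m = 1.00           # USD per million tokens
cogs_per_m = 0.30            # USD per million tokens
requests = 500_000
tokens_per_request = 1_000   # assumed average request size

tokens_m = requests * tokens_per_request / 1_000_000  # 500M tokens
revenue = price_per_m * tokens_m                       # $500
cogs = cogs_per_m * tokens_m                           # $150
gross = revenue - cogs                                 # $350
print(f"revenue ${revenue:.0f}, COGS ${cogs:.0f}, "
      f"gross ${gross:.0f} ({gross / revenue:.0%} margin)")
```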
Scale the math with more dedicated servers as you grow. Each additional server expands capacity linearly at predictable COGS.
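A small sketch of that linear scaling, reusing the £500/month cost and ~1.3bn tokens/month (50% utilisation) capacity per server from the COGS section: capacity grows with fleet size while per-million COGS stays flat.

```python
# Capacity and COGS as the fleet grows; one server's figures from the COGS section.
capacity_per_server_m = 1_300  # million tokens/month at 50% utilisation
cost_per_server = 500          # £ per month

for servers in (1, 2, 4, 8):
    capacity_m = servers * capacity_per_server_m
    cogs_per_m = (servers * cost_per_server) / capacity_m  # stays ~£0.38/M at any size
    print(f"{servers} server(s): {capacity_m:,}M tokens/mo, COGS £{cogs_per_m:.2f}/M")
```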
Resell Dedicated GPU Capacity
Predictable UK hosting lets you price an AI API with confident margins.
Browse GPU Servers