Gross margin for AI products is a product of pricing, utilisation, and infrastructure choice. On dedicated GPU hosting you can model it precisely because your biggest variable becomes a fixed line item.
Formula
Gross Margin = (Revenue - COGS) / Revenue
COGS = Infrastructure + Direct Support + Per-Customer Third-Party Fees
For AI products, infrastructure dominates COGS unless your product is very light on inference.
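The formula is simple enough to encode directly; a minimal sketch of the two equations above:

```python
def gross_margin(revenue, infra, support, third_party_fees):
    """(Revenue - COGS) / Revenue, where COGS = infra + support + fees."""
    cogs = infra + support + third_party_fees
    return (revenue - cogs) / revenue

# £100 revenue, £20 infra, £10 support, £0 third-party fees -> 0.70 (70%)
print(gross_margin(100, 20, 10, 0))
```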
Inputs
Collect for each customer segment:
- Monthly revenue per customer
- Average AI requests per customer per month
- Average tokens per request
- Your per-token COGS (if API) or server allocation (if dedicated)
- Direct customer success / support cost
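From those inputs, the API-side AI infrastructure cost per user is just requests × tokens × price. A sketch, assuming a hypothetical £0.04 per 1,000 output tokens (check your provider's current rates):

```python
def ai_infra_cost_per_user(requests_per_month, tokens_per_request, price_per_1k_tokens):
    """Monthly API inference spend per user from the inputs above."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens

# 1,000 requests x 500 tokens at an assumed £0.04/1k tokens -> £20/user/month
print(ai_infra_cost_per_user(1000, 500, 0.04))
```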
Example
B2B SaaS, Pro plan £100/month, 1,000 AI queries/month per user, 500 output tokens average:
| Line | OpenAI API | Dedicated 5090 |
|---|---|---|
| Revenue | £100 | £100 |
| AI infra per user | £15-25 | £2-5 (amortised) |
| Other infra | £3 | £3 |
| Support/CS | £10 | £10 |
| Gross margin | 62-72% | 82-85% |
The dedicated hosting gap widens with user count: the server is a fixed monthly cost, so the per-user amortisation improves as you add users.
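The amortisation effect is easy to see numerically. A sketch, assuming a hypothetical £500/month dedicated server (your actual server price will differ):

```python
def dedicated_infra_per_user(monthly_server_cost, active_users):
    """Per-user share of a fixed monthly server cost (simple amortisation)."""
    return monthly_server_cost / active_users

# Assumed £500/month server: per-user cost falls as the user base grows
for users in (100, 250, 500):
    print(users, dedicated_infra_per_user(500, users))  # 5.0, 2.0, 1.0
```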
Levers
- Switch from API to dedicated hosting above break-even
- Quantise models (INT4/FP8) – near-identical quality for most workloads, roughly half the VRAM, lower per-token cost
- Cache aggressively (prefix caching, response caching)
- Tier caps – charge heavy users more, not average users
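The first lever turns on a break-even point: a fixed-cost server beats per-user API spend once the user base is large enough. A sketch with assumed figures (a hypothetical £500/month server vs £20/user/month API spend):

```python
import math

def break_even_users(server_cost_per_month, api_cost_per_user_month):
    """Smallest user count at which a fixed server cost undercuts API spend."""
    return math.ceil(server_cost_per_month / api_cost_per_user_month)

# Assumed £500/month server vs £20/user/month API spend -> 25 users
print(break_even_users(500, 20))
```

Above that count, every additional user widens the margin advantage of dedicated hosting.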
Fixed-Infra Gross Margin
UK dedicated hosting with predictable infra costs that support 80%+ gross margin.
Browse GPU Servers. See SaaS unit economics.