Carbon footprint per inference request depends on three things: GPU efficiency, grid carbon intensity, and server utilisation. On our UK dedicated hosting, the numbers are measurable and reasonable to report on a sustainability page.
Methodology
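gCO2e per request = kWh per request × grid carbon intensity (gCO2e/kWh) × PUE, where PUE (power usage effectiveness) is total facility energy divided by IT equipment energy. It scales the server's own energy up to include cooling and other data-centre overhead.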
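The formula is simple enough to encode directly. Below is a minimal Python sketch. The function name is hypothetical, and the input checks are an illustrative choice: PUE is a ratio of total to IT energy, so it cannot be below 1.0.

```python
def co2e_per_request_g(kwh_per_request: float,
                       grid_g_per_kwh: float,
                       pue: float) -> float:
    """Grams CO2e for one request: IT energy x grid intensity x PUE."""
    if kwh_per_request < 0 or grid_g_per_kwh < 0:
        raise ValueError("energy and intensity must be non-negative")
    if pue < 1.0:
        # Total facility energy can never be less than the IT energy it includes.
        raise ValueError("PUE cannot be below 1.0")
    return kwh_per_request * grid_g_per_kwh * pue


# Example: 0.0002 kWh per request on a 200 gCO2e/kWh grid with PUE 1.3
print(round(co2e_per_request_g(0.0002, 200, 1.3), 3))  # prints 0.052
```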
UK Grid
The 2026 average UK grid carbon intensity is roughly 150-250 gCO2e/kWh on a generation basis, and trending down. That compares favourably with grids reliant on coal or gas, which are often around 400 gCO2e/kWh or more.
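Because the methodology is linear in grid intensity, the choice of grid scales every request's footprint proportionally. The short Python sketch below makes this concrete. It assumes the 350 W, 2-second request and the PUE of 1.3 from the worked example below. The 400 gCO2e/kWh entry is the rough coal/gas figure above, not measured data.

```python
# Assumed request energy, matching the worked example in this article:
# 350 W for 2 s, converted W*s -> Wh -> kWh.
REQUEST_KWH = 350 * 2 / 3600 / 1000
PUE = 1.3  # assumed facility overhead

# Grid intensities in gCO2e/kWh: the two ends of the UK range and
# the ~400 figure used for coal- or gas-reliant grids.
grids = {"UK (low)": 150, "UK (high)": 250, "coal/gas grid": 400}

for name, intensity in grids.items():
    grams = REQUEST_KWH * intensity * PUE
    print(f"{name:>13}: {grams:.3f} gCO2e per request")
```

The request, hardware and PUE are identical in every row, so the spread comes entirely from the grid. A 400 gCO2e/kWh grid costs 1.6 to 2.7 times as much CO2e per request as the UK range.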
Per-Request
Consider a Llama 3 70B INT4 request on an RTX 5090 drawing 350 W for a 2-second inference:
- Energy: 350 W × 2 s / 3600 s/h ≈ 0.19 Wh (0.00019 kWh)
- At 200 gCO2e/kWh (mid-range UK): ≈ 0.039 gCO2e per request
- With PUE 1.3: ≈ 0.05 gCO2e per request
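The arithmetic in the list can be checked in a few lines of Python. The helper names are hypothetical. Power multiplied by time gives joules, which convert to kWh via 1 kWh = 3.6 MJ.

```python
def request_energy_kwh(power_w: float, seconds: float) -> float:
    """Energy for one request: watts x seconds = joules, then joules -> kWh."""
    joules = power_w * seconds
    return joules / 3_600_000  # 1 kWh = 3.6 MJ


def request_co2e_g(power_w: float, seconds: float,
                   grid_g_per_kwh: float, pue: float = 1.0) -> float:
    """Grams CO2e: request energy x grid intensity x PUE."""
    return request_energy_kwh(power_w, seconds) * grid_g_per_kwh * pue


energy_kwh = request_energy_kwh(350, 2)
print(f"Energy: {energy_kwh * 1000:.2f} Wh")                     # 0.19 Wh
print(f"At 200 g/kWh: {request_co2e_g(350, 2, 200):.3f} g")       # 0.039 g
print(f"With PUE 1.3: {request_co2e_g(350, 2, 200, 1.3):.3f} g")  # 0.051 g
```

The last line prints 0.051 g, which the list rounds to 0.05 g.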
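For perspective, a Google search is commonly cited at ~0.2 gCO2e. By this estimate, an LLM request of this size emits roughly a quarter as much as a web search, which puts it in the same order of magnitude.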
Versus Cloud
Hyperscale cloud providers often claim near-zero carbon by purchasing renewable energy certificates (RECs). The physical generation mix where the server actually runs may still be fossil-heavy. RECs are an accounting mechanism and do not change what the local grid generates.
UK dedicated hosting on a relatively clean grid typically has lower physical emissions than cloud regions on fossil-heavy grids, whatever certificates those regions hold. For accurate sustainability reporting, cite the actual carbon intensity of your grid region rather than REC-based corporate claims.
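The GHG Protocol Scope 2 Guidance formalises the gap between physics and accounting as two reporting methods. The location-based method applies the local grid's average intensity. The market-based method counts certificate-matched energy as zero and applies a residual-mix factor to the remainder.

The Python sketch below simplifies the market-based method, which in practice can also use supplier-specific factors. The function names, the 300 kWh month, the 450 gCO2e/kWh grid and the 500 gCO2e/kWh residual-mix factor are all hypothetical.

```python
def location_based_g(kwh: float, grid_g_per_kwh: float) -> float:
    """Physical view: every kWh is charged at the local grid's average intensity."""
    return kwh * grid_g_per_kwh


def market_based_g(kwh: float, rec_covered_kwh: float,
                   residual_g_per_kwh: float) -> float:
    """Accounting view (simplified): certificate-matched kWh count as zero;
    only the uncovered remainder is charged at a residual-mix factor."""
    uncovered = max(kwh - rec_covered_kwh, 0.0)
    return uncovered * residual_g_per_kwh


# Hypothetical month for one server: 300 kWh on a 450 gCO2e/kWh grid,
# fully matched by RECs, with an assumed residual-mix factor of 500 gCO2e/kWh.
monthly_kwh = 300
print(location_based_g(monthly_kwh, 450) / 1000, "kg CO2e (location-based)")
print(market_based_g(monthly_kwh, 300, 500) / 1000, "kg CO2e (market-based)")
```

Both figures describe the same 300 kWh. Only the location-based figure reflects what the grid physically generated.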
UK-Grid Dedicated GPU Hosting
Transparent carbon reporting and a relatively low-carbon grid mix.
See also: UK energy cost analysis and tokens per watt.