Tokens per Watt: Energy Efficiency

Benchmarking AI inference energy efficiency across GPU models measured in tokens per watt. Comparing power consumption against throughput to find the most cost-effective GPU for sustainable AI hosting.

Benchmark Overview

As AI inference scales, energy consumption becomes both a significant cost factor and a sustainability concern. A data centre running 100 GPUs at 400W each draws 40kW continuously, which costs thousands of pounds a month in electricity alone (roughly 8,700 GBP at 0.30 GBP/kWh). We measured tokens-per-watt efficiency across GPU models and LLM sizes to identify the most energy-efficient configurations for production inference on dedicated GPU hosting.

Test Configuration

GPUs: RTX 5090 (450W TDP), RTX 6000 Pro (350W TDP), RTX 6000 Pro 96 GB (300W TDP), RTX 6000 Pro (700W TDP). Because three of the cards share a name, they are distinguished by TDP throughout. Models: Llama 3 8B INT4, Llama 3 70B INT4. Workload: sustained inference at 10 concurrent users via vLLM. Power: read from nvidia-smi and averaged over a 30-minute sustained load. Energy efficiency (tokens per watt) = throughput in tokens per second / average power draw in watts, which is equivalent to tokens generated per joule. See token benchmarks for raw throughput numbers.
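The measurement described above can be sketched in Python. This is a minimal illustration, not the harness used for the benchmark: the function names are hypothetical, the sampling loop is left to the caller, and it assumes the GPU driver reports `power.draw` through nvidia-smi.

```python
import subprocess
from statistics import mean


def read_power_watts(gpu_index=0):
    """Return one instantaneous power reading in watts from nvidia-smi."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "-i", str(gpu_index),
            "--query-gpu=power.draw",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    # With noheader,nounits the output is a bare number, one line per GPU.
    return float(result.stdout.strip().splitlines()[0])


def tokens_per_watt(tokens_generated, duration_s, power_samples_w):
    """Throughput (tok/s) divided by mean power draw (W)."""
    if duration_s <= 0 or not power_samples_w:
        raise ValueError("need a positive duration and at least one power sample")
    throughput = tokens_generated / duration_s
    return throughput / mean(power_samples_w)
```

In practice one would call `read_power_watts` at a fixed interval while the load test runs, collect the readings in a list, and pass that list to `tokens_per_watt` together with the token count and the wall-clock duration of the run.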

Energy Efficiency: 8B INT4 Model

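| GPU | Throughput (tok/s) | Power Draw (W) | Tokens per Watt | Monthly Energy Cost (24/7, UK) |
| --- | --- | --- | --- | --- |
| RTX 5090 (450W TDP) | 680 | 285 | 2.39 | ~62 GBP |
| RTX 6000 Pro (350W TDP) | 750 | 225 | 3.33 | ~49 GBP |
| RTX 6000 Pro 96 GB (300W TDP) | 890 | 200 | 4.45 | ~44 GBP |
| RTX 6000 Pro (700W TDP) | 1,420 | 390 | 3.64 | ~85 GBP |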

Energy Efficiency: 70B INT4 Model

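| GPU | Throughput (tok/s) | Power Draw (W) | Tokens per Watt | Monthly Energy Cost (24/7, UK) |
| --- | --- | --- | --- | --- |
| RTX 5090 (450W TDP) | 320 | 380 | 0.84 | ~83 GBP |
| RTX 6000 Pro (350W TDP) | 380 | 310 | 1.23 | ~68 GBP |
| RTX 6000 Pro 96 GB (300W TDP) | 450 | 265 | 1.70 | ~58 GBP |
| RTX 6000 Pro (700W TDP) | 720 | 550 | 1.31 | ~120 GBP |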

Efficiency Rankings

The RTX 6000 Pro 96 GB (300W TDP) leads in energy efficiency for both model sizes, at 4.45 tokens per watt on the 8B model and 1.70 on the 70B model. Its 300W TDP and HBM2e bandwidth give it the best ratio of throughput to power. The 700W TDP RTX 6000 Pro ranks second (3.64 and 1.31): it delivers the highest absolute throughput but draws proportionally more power. The 350W TDP RTX 6000 Pro ranks third (3.33 and 1.23) despite its lower power profile. The RTX 5090 is consistently the least efficient (2.39 and 0.84) due to its consumer power profile. See GPU comparisons for full specifications.
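The ranking can be reproduced directly from the throughput and power figures in the two tables. The sketch below is illustrative only; the row labels add each card's TDP from the test configuration to tell the three RTX 6000 Pro entries apart, and `rank_by_efficiency` is a hypothetical helper name.

```python
# (label, throughput in tok/s, average power draw in W), taken from the tables
RESULTS = {
    "8B INT4": [
        ("RTX 5090 (450W TDP)", 680, 285),
        ("RTX 6000 Pro (350W TDP)", 750, 225),
        ("RTX 6000 Pro 96 GB (300W TDP)", 890, 200),
        ("RTX 6000 Pro (700W TDP)", 1420, 390),
    ],
    "70B INT4": [
        ("RTX 5090 (450W TDP)", 320, 380),
        ("RTX 6000 Pro (350W TDP)", 380, 310),
        ("RTX 6000 Pro 96 GB (300W TDP)", 450, 265),
        ("RTX 6000 Pro (700W TDP)", 720, 550),
    ],
}


def rank_by_efficiency(rows):
    """Return (label, tokens per watt) pairs, most efficient first."""
    scored = [(name, round(tps / watts, 2)) for name, tps, watts in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for model, rows in RESULTS.items():
    print(model)
    for name, tpw in rank_by_efficiency(rows):
        print(f"  {name}: {tpw:.2f} tokens per watt")
```

Both model sizes produce the same order: the 300W card first, the 700W card second, the 350W card third, and the RTX 5090 last.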

Cost-Efficiency Analysis

At UK electricity rates of approximately 0.30 GBP/kWh, monthly energy costs range from 44 to 120 GBP per GPU running 24/7. For multi-GPU clusters with 4-8 GPUs, energy adds 175-960 GBP monthly. This is 5-15% of the total hosting cost but becomes significant at scale. On managed dedicated servers, power costs are typically included in the monthly price, simplifying budgeting.
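The monthly figures follow from average power draw, hours of operation, and the electricity rate. The sketch below assumes 730 hours per month (8,760 hours per year divided by 12); that value is not stated in the article but is consistent with the published costs. The function name is hypothetical.

```python
PRICE_GBP_PER_KWH = 0.30  # rate quoted in the article
HOURS_PER_MONTH = 730     # assumption: 8,760 hours per year / 12


def monthly_energy_cost_gbp(avg_power_w, gpus=1,
                            price=PRICE_GBP_PER_KWH,
                            hours=HOURS_PER_MONTH):
    """Electricity cost in GBP for `gpus` cards running 24/7 for one month."""
    kwh = avg_power_w / 1000 * hours * gpus
    return kwh * price


# Single-GPU extremes from the tables: 200 W (8B) and 550 W (70B).
print(round(monthly_energy_cost_gbp(200)))          # low end per GPU
print(round(monthly_energy_cost_gbp(550)))          # high end per GPU
# Cluster extremes: 4 GPUs at 200 W and 8 GPUs at 550 W.
print(round(monthly_energy_cost_gbp(200, gpus=4)))
print(round(monthly_energy_cost_gbp(550, gpus=8)))
```

Under these assumptions the per-GPU results land on the 44 and 120 GBP figures, and the cluster results land close to the quoted 175-960 GBP range.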

When comparing total cost per million tokens, the RTX 6000 Pro 96 GB (300W TDP) leads for energy-conscious deployments, while the 700W TDP RTX 6000 Pro wins on time-efficiency, where faster completion matters more than energy cost. Configure efficiency-optimised deployments via the vLLM production guide.
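The energy component of cost per million tokens can be estimated from the same table values. This sketch covers electricity only; it deliberately ignores hardware and hosting fees, so it is not the total cost the paragraph refers to. The function names are hypothetical.

```python
def energy_cost_per_million_tokens_gbp(throughput_tok_s, avg_power_w,
                                       price_gbp_per_kwh=0.30):
    """Electricity cost in GBP to generate one million tokens."""
    tokens_per_hour = throughput_tok_s * 3600
    cost_per_hour = avg_power_w / 1000 * price_gbp_per_kwh
    return cost_per_hour / tokens_per_hour * 1_000_000


def seconds_per_million_tokens(throughput_tok_s):
    """Wall-clock time in seconds to generate one million tokens."""
    return 1_000_000 / throughput_tok_s


# 8B INT4 figures for the 300W TDP and 700W TDP cards.
for label, tps, watts in [
    ("RTX 6000 Pro 96 GB (300W TDP)", 890, 200),
    ("RTX 6000 Pro (700W TDP)", 1420, 390),
]:
    cost = energy_cost_per_million_tokens_gbp(tps, watts)
    secs = seconds_per_million_tokens(tps)
    print(f"{label}: {cost:.4f} GBP and {secs:.0f} s per million tokens")
```

On the 8B model this works out to about 0.019 GBP per million tokens for the 300W card against about 0.023 GBP for the 700W card, while the 700W card finishes the same million tokens in roughly 700 seconds instead of roughly 1,120.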

Recommendations

For maximum energy efficiency, the RTX 6000 Pro 96 GB (300W TDP) delivers the best tokens-per-watt across both model sizes. For maximum throughput where energy cost is secondary, the 700W TDP RTX 6000 Pro leads. Factor energy costs into GPU selection when running 24/7 inference, especially in multi-GPU configurations. Deploy on GigaGPU dedicated servers with private AI hosting where power and cooling are managed. Visit the benchmarks section, LLM hosting guide, and infrastructure blog for comprehensive deployment planning.


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
