Benchmark Overview
GPU power consumption directly affects operating costs and thermal management requirements. The highest-rated RTX 6000 Pro configuration tested has a 700W TDP while the RTX 5090 has a 450W TDP, but rated power says little about throughput per watt, and neither card reaches its TDP under sustained inference. We measured real-world power draw during AI inference across GPU models and LLM sizes on dedicated GPU hosting to quantify energy efficiency.
Test Configuration
GPUs: RTX 5090 (450W TDP), RTX 6000 Pro (350W TDP), RTX 6000 Pro 96 GB (300W TDP), RTX 6000 Pro (700W TDP). Models: Llama 3 8B INT4, Llama 3 70B INT4. Workload: continuous inference at 10 concurrent users via vLLM. Power was sampled with nvidia-smi at 1-second intervals and averaged over a 10-minute sustained load.
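
The sampling procedure described above could be reproduced with a small script along these lines. This is a minimal sketch, not the harness used for these measurements: the function names (`read_power_watts`, `parse_power`, `average_power`) are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports `power.draw` for the card.

```python
import statistics
import subprocess
import time


def parse_power(raw: str) -> float:
    """Parse nvidia-smi's single-value CSV output, e.g. '283.41\\n'."""
    return float(raw.strip().splitlines()[0])


def read_power_watts(gpu_index: int = 0) -> float:
    """Return the current board power draw in watts for one GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "-i", str(gpu_index),
            "--query-gpu=power.draw",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_power(out)


def average_power(duration_s: float = 600, interval_s: float = 1.0,
                  reader=read_power_watts) -> float:
    """Sample power roughly every interval_s seconds and return the mean.

    The interval is approximate: each iteration also spends time in the
    reader call, so slightly fewer than duration_s / interval_s samples
    are collected.
    """
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(reader())
        time.sleep(interval_s)
    return statistics.mean(samples)
```

Calling `average_power()` while the vLLM workload is running would return a 10-minute mean comparable to the load figures reported here. The `reader` parameter exists only so the averaging logic can be exercised without a GPU.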
Power Draw During Inference
| GPU | Idle (W) | 8B INT4 Load (W) | 70B INT4 Load (W) | TDP Utilisation (8B to 70B) |
|---|---|---|---|---|
| RTX 5090 (450W TDP) | 25 | 280 | 380 | 62-84% |
| RTX 6000 Pro (350W TDP) | 30 | 220 | 310 | 63-89% |
| RTX 6000 Pro 96 GB (300W TDP) | 35 | 195 | 265 | 65-88% |
| RTX 6000 Pro (700W TDP) | 45 | 380 | 550 | 54-79% |
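
The TDP utilisation column is the measured load divided by the rated TDP from the test configuration. As a quick check, the sketch below recomputes it from the table values. The dictionary keys add each card's TDP only to tell the similarly named entries apart, and `tdp_utilisation` is an illustrative helper name.

```python
# (TDP W, 8B INT4 load W, 70B INT4 load W), copied from the table above
readings = {
    "RTX 5090 (450W TDP)": (450, 280, 380),
    "RTX 6000 Pro (350W TDP)": (350, 220, 310),
    "RTX 6000 Pro 96 GB (300W TDP)": (300, 195, 265),
    "RTX 6000 Pro (700W TDP)": (700, 380, 550),
}


def tdp_utilisation(load_w: float, tdp_w: float) -> int:
    """Measured load as a whole-number percentage of rated TDP."""
    return round(100 * load_w / tdp_w)


utilisation = {
    gpu: (tdp_utilisation(load_8b, tdp), tdp_utilisation(load_70b, tdp))
    for gpu, (tdp, load_8b, load_70b) in readings.items()
}

for gpu, (low, high) in utilisation.items():
    print(f"{gpu}: {low}-{high}%")
```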
Throughput per Watt (70B INT4, 10 Users)
| GPU | Throughput (tok/s) | Power (W) | Tokens per Watt (tok/s ÷ W) | Monthly Energy Cost (UK) |
|---|---|---|---|---|
| RTX 5090 (450W TDP) | 320 | 380 | 0.84 | ~85 GBP |
| RTX 6000 Pro (350W TDP) | 380 | 310 | 1.23 | ~70 GBP |
| RTX 6000 Pro 96 GB (300W TDP) | 450 | 265 | 1.70 | ~60 GBP |
| RTX 6000 Pro (700W TDP) | 720 | 550 | 1.31 | ~125 GBP |
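
The tokens-per-watt figure is throughput in tokens per second divided by average power in watts, which is equivalent to tokens per joule. The following sketch recomputes the column from the table. The TDP suffixes in the keys are added only to distinguish the similarly named cards, and the helper name is illustrative.

```python
# (throughput tok/s, average power W) for 70B INT4 at 10 users, from the table above
results = {
    "RTX 5090 (450W TDP)": (320, 380),
    "RTX 6000 Pro (350W TDP)": (380, 310),
    "RTX 6000 Pro 96 GB (300W TDP)": (450, 265),
    "RTX 6000 Pro (700W TDP)": (720, 550),
}


def tokens_per_watt(tok_per_s: float, watts: float) -> float:
    """Throughput per watt; numerically equal to tokens per joule."""
    return tok_per_s / watts


efficiency = {
    gpu: round(tokens_per_watt(tok_s, watts), 2)
    for gpu, (tok_s, watts) in results.items()
}
most_efficient = max(efficiency, key=efficiency.get)

for gpu, value in efficiency.items():
    print(f"{gpu}: {value:.2f} tok/s per W")
print("Most efficient:", most_efficient)
```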
Energy Efficiency Analysis
The RTX 6000 Pro 96 GB (300W TDP) delivers the best energy efficiency at 1.70 tokens per watt for 70B inference. The 700W-TDP RTX 6000 Pro produces the highest absolute throughput at 720 tok/s but consumes more power per token, at 1.31 tokens per watt. The RTX 5090 is the least efficient at 0.84 tokens per watt, due to its consumer-grade power profile optimised for peak gaming performance rather than sustained compute. See token speed benchmarks and GPU comparisons for throughput data.
Monthly energy costs at UK electricity rates (approximately 0.30 GBP/kWh) range from roughly 60 to 125 GBP per GPU running 24/7 inference. For multi-GPU clusters, these costs scale linearly with the number of GPUs.
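
The cost arithmetic is average power in kW, times hours per month, times the electricity rate. The sketch below works through it under two assumptions that are not stated in the benchmark: a 730-hour month (8,760 hours per year divided by 12) and table figures rounded up to the next 5 GBP. The function names are illustrative.

```python
import math

RATE_GBP_PER_KWH = 0.30   # approximate UK rate quoted above
HOURS_PER_MONTH = 730     # assumption: 8760 h per year / 12


def monthly_cost_gbp(avg_watts: float, gpus: int = 1) -> float:
    """Energy cost of running `gpus` cards 24/7 for one month."""
    kwh = avg_watts / 1000 * HOURS_PER_MONTH * gpus
    return kwh * RATE_GBP_PER_KWH


def round_up_to_5(amount: float) -> int:
    """Assumed rounding: up to the next multiple of 5 GBP."""
    return 5 * math.ceil(amount / 5)


# Average 70B INT4 power draw (W) for the four cards, in table order
for watts in (380, 310, 265, 550):
    cost = monthly_cost_gbp(watts)
    print(f"{watts} W: {cost:.2f} GBP/month (~{round_up_to_5(cost)} GBP)")

# Linear scaling: eight cards at 550 W cost eight times one card
print(f"8 x 550 W: {monthly_cost_gbp(550, gpus=8):.2f} GBP/month")
```

Under these assumptions the rounded results line up with the table; a different month length or tariff would shift the figures proportionally.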
Cooling Implications
Higher power draw means more heat. The 700W-TDP RTX 6000 Pro at 550W sustained requires enterprise-grade cooling. The RTX 6000 Pro 96 GB at 265W runs comfortably in a standard server chassis. For private AI hosting in colocation, power and cooling costs can exceed GPU rental costs. Managed dedicated servers include cooling in the monthly price, simplifying budgeting.
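
Cooling capacity is often specified in BTU/h, so it can help to convert sustained power draw into heat load. The sketch below assumes that essentially all electrical power drawn by a GPU ends up as heat and uses the standard conversion of 1 W ≈ 3.412 BTU/h. The function name is illustrative, and the result covers the GPU only, not the rest of the server.

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # standard conversion factor


def heat_load_btu_per_hour(sustained_watts: float, gpus: int = 1) -> float:
    """Approximate heat output, assuming all drawn power becomes heat."""
    return sustained_watts * gpus * BTU_PER_HOUR_PER_WATT


# Sustained 70B INT4 draw for the two cards discussed above
for watts in (550, 265):
    print(f"{watts} W sustained: {heat_load_btu_per_hour(watts):.0f} BTU/h")
```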
Recommendations
For energy-conscious deployments, the RTX 6000 Pro 96 GB (300W TDP) offers the best tokens-per-watt efficiency. For maximum throughput regardless of power, the 700W-TDP RTX 6000 Pro leads. Factor energy costs into your total cost of ownership when comparing GPU options. Deploy on GigaGPU dedicated servers where power and cooling are included. See the benchmarks section, LLM hosting guide, and infrastructure blog for more analysis.