A modern GPU draws 150-575 W depending on model and workload. On our dedicated GPU hosting, power is bundled, but the settings still matter: they affect thermal headroom, performance consistency, and in some cases longevity.
Persistence Mode
Persistence mode keeps the NVIDIA driver loaded even when no process is using the GPU. Without it, the driver reinitialises each time a process starts on an idle GPU, adding 1-3 seconds of cold-start latency. Always enable it on a server.
sudo nvidia-smi -pm 1
The setting does not survive a reboot, so apply it from a systemd unit at boot.
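A minimal sketch of such a unit, written as a shell function that prints the unit file. The unit name `gpu-persistence.service` and the `/usr/bin/nvidia-smi` path are assumptions; check both against your distribution. NVIDIA also ships an `nvidia-persistenced` daemon that serves the same purpose, and where your driver package provides a service for it, that may be the better route.

```shell
# Print a oneshot systemd unit that runs `nvidia-smi -pm 1` at boot.
# Assumption: nvidia-smi lives at /usr/bin/nvidia-smi on this host.
persistence_unit() {
  cat <<'EOF'
[Unit]
Description=Enable NVIDIA persistence mode

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
}

# Install and enable it (hypothetical unit name):
#   persistence_unit | sudo tee /etc/systemd/system/gpu-persistence.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now gpu-persistence.service
```

Afterwards, `nvidia-smi --query-gpu=persistence_mode --format=csv` should report `Enabled`.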
Power Limit
You can cap power draw below the card's default limit. This is useful when thermals are marginal or when you want predictable power consumption:
sudo nvidia-smi -pl 300 # cap at 300 W
On an RTX 5090 (575 W default), capping at 400 W reduces performance by roughly 10-15% but cuts the power budget by about 30%. For batch workloads where peak throughput is not the priority, this is often a favourable trade.
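To make the trade concrete, the arithmetic can be sketched as below. The function name `cap_tradeoff` is hypothetical, and the inputs are the rough figures quoted above, not measurements.

```shell
# Quantify a power cap from three inputs:
#   $1 = default limit (W), $2 = capped limit (W), $3 = fraction of performance kept (0-1)
cap_tradeoff() {
  awk -v d="$1" -v c="$2" -v p="$3" 'BEGIN {
    power_frac = c / d
    printf "power cut: %.1f%%\n", (1 - power_frac) * 100
    printf "perf per watt vs default: %.2fx\n", p / power_frac
  }'
}

cap_tradeoff 575 400 0.85   # pessimistic end: 15% performance loss
cap_tradeoff 575 400 0.90   # optimistic end: 10% performance loss
```

With these inputs the power cut comes out at 30.4%, and performance per watt lands between about 1.22x and 1.29x of the default. That assumes the card actually runs at its limit under load, which will not hold for every workload.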
Clock Locking
For benchmark repeatability you can lock clocks:
sudo nvidia-smi --lock-gpu-clocks=1500,1980
sudo nvidia-smi --reset-gpu-clocks # unlock
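Locking removes most clock-related variance between benchmark runs, because the GPU no longer boosts and backs off as temperature changes. For production serving, leave auto-boost enabled: when thermal and power headroom allow, the GPU will push clocks higher than your lock would.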
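For benchmark scripts, it helps to make sure the clocks are always unlocked afterwards, even when the benchmark fails. A sketch of a wrapper that does this follows; the function name and the `SMI`/`SUDO` override variables are hypothetical, and the clock values must be ones your card supports (`nvidia-smi -q -d SUPPORTED_CLOCKS` lists them).

```shell
# Overridable so the wrapper can be dry-run without a GPU (e.g. SMI=echo SUDO="").
SMI="${SMI:-nvidia-smi}"
SUDO="${SUDO-sudo}"

# Usage: run_with_locked_clocks MIN_MHZ MAX_MHZ COMMAND [ARGS...]
run_with_locked_clocks() {
  min_mhz="$1"; max_mhz="$2"; shift 2
  # Lock first; give up if the driver rejects the requested range.
  $SUDO "$SMI" --lock-gpu-clocks="${min_mhz},${max_mhz}" || return 1
  # Run the benchmark and remember its exit status.
  status=0
  "$@" || status=$?
  # Always unlock, whether or not the benchmark succeeded.
  $SUDO "$SMI" --reset-gpu-clocks
  return "$status"
}

# Example (hypothetical benchmark binary):
#   run_with_locked_clocks 1500 1980 ./my_benchmark --iters 100
```

The wrapper returns the benchmark's own exit status, so it can be dropped into a CI step without hiding failures.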
Tradeoffs
| Setting | Relative performance | Relative power limit | Thermal |
|---|---|---|---|
| Default | 100% | 100% | Baseline |
| -pl at 80% of default | ~90-95% | 80% | Lower temps |
| -pl at 60% of default | ~75% | 60% | Much lower temps |
Most production deployments run at default. Apply a power limit when the chassis runs hot or when you want deterministic thermal behaviour.
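The table expresses limits as percentages, but `nvidia-smi -pl` takes a value in watts. A small helper can do the conversion; `pct_to_watts` is a hypothetical name, and the 575 W figure is the RTX 5090 default mentioned earlier.

```shell
# Convert a percentage of a reference limit into whole watts (truncated).
#   $1 = reference limit in W (decimals such as 575.00 are accepted), $2 = percentage
pct_to_watts() {
  awk -v w="$1" -v pct="$2" 'BEGIN { printf "%d\n", w * pct / 100 }'
}

pct_to_watts 575 80   # prints 460
pct_to_watts 575 60   # prints 345

# Reading the reference from the card instead of hard-coding it (single-GPU host assumed):
#   default_w="$(nvidia-smi --query-gpu=power.default_limit --format=csv,noheader,nounits)"
#   sudo nvidia-smi -pl "$(pct_to_watts "$default_w" 80)"
```

The driver rejects values outside the card's permitted range, which `nvidia-smi --query-gpu=power.min_limit,power.max_limit --format=csv` reports.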
GPU Servers with Sensible Defaults
UK dedicated hosting with persistence mode and thermal envelopes preconfigured.