
Monitoring GPU Usage on a Dedicated Server: Tools, Metrics, and Alerts

How to monitor GPU usage on a dedicated AI inference server — nvidia-smi, DCGM exporter, vLLM metrics, and the alerts that catch real problems.

Table of Contents

  1. Tools
  2. Metrics that matter
  3. Alerts

Most AI deployments under-monitor their GPUs. This is the practical setup we ship.

TL;DR

Three layers: nvidia-smi for ad-hoc checks, the DCGM exporter for fleet-wide Prometheus metrics, and vLLM's /metrics endpoint for inference-engine metrics. Alert on TTFT p99, queue depth, and GPU memory utilisation > 95%.

Tools

  • nvidia-smi: built-in CLI, manual checks
  • nvitop: htop-style live view
  • DCGM exporter: NVIDIA's Prometheus exporter — docker run -d --gpus all -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter (metrics on :9400)
  • vLLM: the OpenAI-compatible server exposes /metrics in Prometheus format
  • Grafana: dashboards on top of Prometheus
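
Wiring the two Prometheus sources together takes only a short scrape config. This is a sketch, not a drop-in file: the targets assume the DCGM exporter on its default port 9400 and vLLM on its default port 8000, both on the same host — adjust to your layout.

```yaml
# prometheus.yml (fragment) — scrape both GPU metric sources.
# Ports are assumptions: 9400 is the dcgm-exporter default,
# 8000 is the vLLM OpenAI-compatible server default.
scrape_configs:
  - job_name: dcgm
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9400"]
  - job_name: vllm
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8000"]
```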

Metrics that matter

  • DCGM_FI_DEV_GPU_UTIL: GPU compute utilisation %
  • DCGM_FI_DEV_FB_USED: framebuffer (VRAM) used, MiB — the most important metric for AI workloads
  • DCGM_FI_DEV_POWER_USAGE: power draw W
  • DCGM_FI_DEV_GPU_TEMP: temperature (alarm at >85°C)
  • DCGM_FI_DEV_CLOCK_THROTTLE_REASONS: bitmask of active throttle reasons — non-zero under load = problem
  • vllm:num_requests_waiting: queue depth (alert >100)
  • vllm:gpu_cache_usage_perc: KV cache util (alert >95%)
  • vllm:time_to_first_token_seconds: TTFT (alert p99 >2s)
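
TTFT is exported as a Prometheus histogram, so the p99 you alert on is an estimate: Prometheus's histogram_quantile() linearly interpolates within the bucket that crosses the target rank. A small self-contained sketch of that estimation — the bucket bounds and counts here are made-up sample data, not real vLLM output:

```python
# Sketch of Prometheus-style histogram quantile estimation.
# buckets: sorted (upper_bound_seconds, cumulative_count) pairs,
# mirroring vllm:time_to_first_token_seconds_bucket samples.

def histogram_quantile(q, buckets):
    """Estimate the q-quantile by linear interpolation, as
    Prometheus's histogram_quantile() does."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical sample: 1000 requests, most under 0.5s.
buckets = [(0.1, 500), (0.5, 900), (1.0, 980), (2.0, 995), (5.0, 1000)]
print(round(histogram_quantile(0.99, buckets), 3))  # prints 1.667
```

The estimate is only as good as the bucket layout — if p99 lands in a wide bucket (here 1.0–2.0s), the interpolated value can be well off the true quantile, so keep fine-grained buckets around your 2s alert threshold.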

Alerts

  • p99 TTFT > 2s for 5 min — queue blowout
  • GPU memory util > 95% for 5 min — about to OOM
  • Throttle reasons != 0 under load — thermal or power issue
  • 5xx error rate > 1% — vLLM crashing or rejecting requests
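
These thresholds translate directly into Prometheus alerting rules. A hedged sketch: metric names assume default dcgm-exporter output and vLLM's Prometheus endpoint, the memory ratio is derived from DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE, and the 5xx alert is omitted because error-rate metrics come from your reverse proxy, not vLLM itself.

```yaml
# alert-rules.yml (sketch) — thresholds from the list above.
groups:
  - name: gpu_inference
    rules:
      - alert: TTFTp99High
        expr: >
          histogram_quantile(0.99,
            sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        annotations:
          summary: "p99 TTFT above 2s — queue blowout"
      - alert: GPUMemoryNearFull
        expr: >
          DCGM_FI_DEV_FB_USED
            / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) > 0.95
        for: 5m
        annotations:
          summary: "GPU memory above 95% — about to OOM"
      - alert: GPUThrottling
        expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS != 0
        for: 2m
        annotations:
          summary: "GPU throttling — check thermals and power"
```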

Bottom line

Three-tier monitoring covers it: nvidia-smi for spot checks, DCGM exporter for hardware metrics, vLLM's /metrics for inference health. Wire up the four alerts above and you'll catch problems before your users do.
