
Monitoring GPU Usage on a Dedicated Server: Tools, Metrics, and Alerts

How to monitor GPU usage on a dedicated AI inference server — nvidia-smi, DCGM exporter, vLLM metrics, and the alerts that catch real problems.

Table of Contents

  1. Tools
  2. Metrics that matter
  3. Alerts

Most AI deployments under-monitor their GPUs. This is the practical setup we ship.

TL;DR

Three layers: nvidia-smi for ad-hoc checks, the DCGM exporter for fleet-wide Prometheus metrics, and vLLM's /metrics endpoint for inference-engine metrics. Alert on TTFT p99, queue depth, and GPU memory utilisation > 95%.

Tools

  • nvidia-smi: built-in CLI, manual checks
  • nvitop: htop-style live view
  • DCGM exporter: NVIDIA's Prometheus exporter — docker run -d --gpus all -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter (metrics on :9400)
  • vLLM: the OpenAI-compatible server exposes /metrics in Prometheus format
  • Grafana: dashboards on top of Prometheus
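
Wiring the two Prometheus sources together takes only a short scrape config. This is a sketch, not a drop-in file: the targets assume the DCGM exporter on its default port 9400 and vLLM on its default port 8000, both on the same host — adjust to your layout.

```yaml
# prometheus.yml (fragment) — scrape both GPU metric sources.
# Ports are assumptions: 9400 is the dcgm-exporter default,
# 8000 is the vLLM OpenAI-compatible server default.
scrape_configs:
  - job_name: dcgm
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9400"]
  - job_name: vllm
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8000"]
```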

Metrics that matter

  • DCGM_FI_DEV_GPU_UTIL: GPU compute utilisation %
  • DCGM_FI_DEV_FB_USED: framebuffer (VRAM) used, MiB — the most important metric for AI workloads
  • DCGM_FI_DEV_POWER_USAGE: power draw W
  • DCGM_FI_DEV_GPU_TEMP: temperature (alarm at >85°C)
  • DCGM_FI_DEV_CLOCK_THROTTLE_REASONS: bitmask of active throttle reasons — non-zero under load = problem
  • vllm:num_requests_waiting: queue depth (alert >100)
  • vllm:gpu_cache_usage_perc: KV cache util (alert >95%)
  • vllm:time_to_first_token_seconds: TTFT (alert p99 >2s)
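
TTFT is exported as a Prometheus histogram, so the p99 you alert on is an estimate: Prometheus's histogram_quantile() linearly interpolates within the bucket that crosses the target rank. A small self-contained sketch of that estimation — the bucket bounds and counts here are made-up sample data, not real vLLM output:

```python
# Sketch of Prometheus-style histogram quantile estimation.
# buckets: sorted (upper_bound_seconds, cumulative_count) pairs,
# mirroring vllm:time_to_first_token_seconds_bucket samples.

def histogram_quantile(q, buckets):
    """Estimate the q-quantile by linear interpolation, as
    Prometheus's histogram_quantile() does."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical sample: 1000 requests, most under 0.5s.
buckets = [(0.1, 500), (0.5, 900), (1.0, 980), (2.0, 995), (5.0, 1000)]
print(round(histogram_quantile(0.99, buckets), 3))  # prints 1.667
```

The estimate is only as good as the bucket layout — if p99 lands in a wide bucket (here 1.0–2.0s), the interpolated value can be well off the true quantile, so keep fine-grained buckets around your 2s alert threshold.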

Alerts

  • p99 TTFT > 2s for 5 min — queue blowout
  • GPU memory util > 95% for 5 min — about to OOM
  • Throttle reasons != 0 under load — thermal or power issue
  • 5xx error rate > 1% — vLLM crashing or rejecting requests
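
These thresholds translate directly into Prometheus alerting rules. A hedged sketch: metric names assume default dcgm-exporter output and vLLM's Prometheus endpoint, the memory ratio is derived from DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE, and the 5xx alert is omitted because error-rate metrics come from your reverse proxy, not vLLM itself.

```yaml
# alert-rules.yml (sketch) — thresholds from the list above.
groups:
  - name: gpu_inference
    rules:
      - alert: TTFTp99High
        expr: >
          histogram_quantile(0.99,
            sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        annotations:
          summary: "p99 TTFT above 2s — queue blowout"
      - alert: GPUMemoryNearFull
        expr: >
          DCGM_FI_DEV_FB_USED
            / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) > 0.95
        for: 5m
        annotations:
          summary: "GPU memory above 95% — about to OOM"
      - alert: GPUThrottling
        expr: DCGM_FI_DEV_CLOCK_THROTTLE_REASONS != 0
        for: 2m
        annotations:
          summary: "GPU throttling — check thermals and power"
```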

Bottom line

Three-tier monitoring covers it: nvidia-smi for spot checks, DCGM exporter for hardware metrics, vLLM's /metrics for inference health. Wire up the four alerts above and you'll catch problems before your users do.
