Keeping an eye on GPU utilisation is critical when running AI workloads on a dedicated GPU server. Underutilised GPUs waste money, while overloaded ones throttle and degrade inference latency. This tutorial covers monitoring approaches from simple CLI tools to full Prometheus and Grafana dashboards, so you can track every metric that matters on your LLM hosting infrastructure.
nvidia-smi: Quick GPU Monitoring
The fastest way to check GPU status is nvidia-smi, which ships with the NVIDIA driver. If you need to install or update your drivers, see our CUDA installation guide.
# One-shot GPU status
nvidia-smi
# Continuous monitoring every 1 second
nvidia-smi -l 1
# Structured CSV output for scripting
nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu,utilization.memory,memory.used,memory.total,power.draw \
--format=csv -l 1
# Monitor specific GPU processes
nvidia-smi pmon -i 0 -s u -d 1
# Query per-process memory usage
nvidia-smi --query-compute-apps=pid,name,used_memory --format=csv
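The `csv,noheader,nounits` output format is easy to consume from a script. As a minimal sketch (the `parse_gpu_csv` helper and the sample line are illustrative, not part of nvidia-smi):

```python
# Parse nvidia-smi --query-gpu CSV output into dicts.
# parse_gpu_csv and FIELDS are illustrative helpers; the field order must
# match the --query-gpu argument you pass to nvidia-smi.
import csv
import io

FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "utilization.memory", "memory.used", "memory.total", "power.draw"]

def parse_gpu_csv(text):
    """Turn 'csv,noheader,nounits' output into a list of dicts, one per GPU."""
    rows = csv.reader(io.StringIO(text.strip()))
    return [dict(zip(FIELDS, (v.strip() for v in row))) for row in rows]

# Example line in the shape nvidia-smi emits (values are made up):
sample = "0, NVIDIA A100-SXM4-80GB, 41, 87, 55, 61234, 81920, 312.45\n"
gpus = parse_gpu_csv(sample)
print(gpus[0]["utilization.gpu"])  # "87"
```

Pair this with `subprocess.run` (as in the logging script later in this guide) to poll live values.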
For a persistent terminal monitor, use nvtop, which provides a top-like interface for GPUs:
# Install nvtop
sudo apt update && sudo apt install -y nvtop
# Launch the interactive monitor
nvtop
These tools are excellent for quick debugging but inadequate for production monitoring at scale. For that, you need a metrics pipeline. If you are running containerised workloads, nvidia-smi also works inside Docker GPU containers.
NVIDIA DCGM Exporter for Prometheus
NVIDIA Data Center GPU Manager (DCGM) exposes detailed GPU metrics in Prometheus format. This is especially important for multi-GPU cluster environments where tracking individual card health is critical. Install the DCGM exporter as a Docker container or systemd service:
# Run DCGM exporter as a Docker container
docker run -d --gpus all --rm \
-p 9400:9400 \
--name dcgm-exporter \
nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
# Verify metrics are being exported
curl -s localhost:9400/metrics | head -20
# Check specific metrics
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_FB_USED
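The exporter speaks the standard Prometheus text format, so you can also read individual gauges directly from the endpoint. A minimal sketch of a line parser (the sample lines are illustrative; real output carries more labels):

```python
# Minimal parser for the Prometheus text exposition format served by the
# DCGM exporter on :9400/metrics. Illustrative helper, not a full parser.
def parse_metric(text, name):
    """Return a list of (labels_string, float_value) for one metric name."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(name):  # skips "# HELP"/"# TYPE" lines too
            continue
        # Line shape: NAME{label="x",...} VALUE
        labels, _, value = line[len(name):].rpartition(" ")
        results.append((labels.strip(), float(value)))
    return results

sample = """# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-abc"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-def"} 3
"""
utils = parse_metric(sample, "DCGM_FI_DEV_GPU_UTIL")
print([v for _, v in utils])  # [87.0, 3.0]
```

In production you would let Prometheus do the scraping, but a parser like this is handy for one-off health checks.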
Alternatively, install DCGM natively:
# Install DCGM packages (from the NVIDIA CUDA apt repository; add it first if needed)
sudo apt install -y datacenter-gpu-manager
# Start DCGM service
sudo systemctl enable --now nvidia-dcgm
# Query GPU health
dcgmi discovery -l
dcgmi diag -r 1
Set Up Prometheus for GPU Metrics
Install Prometheus to scrape and store GPU metrics from DCGM exporter:
# Download and install Prometheus
cd /opt
sudo wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
sudo tar xzf prometheus-2.51.0.linux-amd64.tar.gz
sudo mv prometheus-2.51.0.linux-amd64 prometheus
Configure Prometheus to scrape the DCGM exporter and (optionally) your vLLM metrics endpoint:
# /opt/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'dcgm-exporter'
    static_configs:
      - targets: ['localhost:9400']

  - job_name: 'vllm'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: /metrics

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
Create a systemd service for Prometheus:
# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/prometheus \
    --config.file=/opt/prometheus/prometheus.yml \
    --storage.tsdb.path=/opt/prometheus/data \
    --storage.tsdb.retention.time=30d
Restart=always

[Install]
WantedBy=multi-user.target
# Create user and start service
sudo useradd --system --no-create-home prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
# Verify Prometheus is scraping
curl -s localhost:9090/api/v1/targets | python3 -m json.tool | head -30
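Once targets are up, you can also read GPU metrics back out of Prometheus' instant-query HTTP API from a script. A sketch, assuming Prometheus on localhost:9090 as configured above (the helper names are illustrative):

```python
# Query Prometheus' instant-query API (/api/v1/query) for GPU metrics.
# extract_values/query are illustrative helper names.
import json
import urllib.parse
import urllib.request

def extract_values(api_json):
    """Pull (labels, value) pairs out of an instant-query response."""
    return [(r["metric"], float(r["value"][1]))
            for r in api_json["data"]["result"]]

def query(expr, base="http://localhost:9090"):
    url = base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url) as resp:
        return extract_values(json.load(resp))

# Offline example of the response shape the API returns:
sample = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"gpu": "0"}, "value": [1715000000.0, "87"]}]}}
print(extract_values(sample))  # [({'gpu': '0'}, 87.0)]
```

For example, `query("DCGM_FI_DEV_GPU_UTIL")` returns current utilisation per GPU against a live server.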
If you are serving models with vLLM, the vLLM metrics endpoint provides token throughput and KV cache utilisation. See our vLLM memory optimisation guide for tuning based on these metrics.
Grafana GPU Dashboards
Install Grafana to visualise your GPU metrics:
# Install Grafana (apt-key is deprecated; use a signed-by keyring instead)
sudo apt install -y apt-transport-https software-properties-common
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
# Start Grafana
sudo systemctl enable --now grafana-server
Access Grafana at http://your-server:3000 (default credentials: admin/admin). Add Prometheus as a data source, then import the NVIDIA DCGM dashboard:
# Add Prometheus as a data source via the API
curl -X POST http://admin:admin@localhost:3000/api/datasources \
-H "Content-Type: application/json" \
-d '{
"name": "Prometheus",
"type": "prometheus",
"url": "http://localhost:9090",
"access": "proxy",
"isDefault": true
}'
# Import the NVIDIA DCGM dashboard (ID: 12239)
# Easiest via the UI: Dashboards -> Import -> enter 12239 and select the
# Prometheus data source. To script it instead, download the dashboard JSON
# and wrap it in an import request (requires jq):
curl -s https://grafana.com/api/dashboards/12239/revisions/latest/download \
-o dcgm-dashboard.json
jq '{dashboard: ., overwrite: true, inputs: [{name: "DS_PROMETHEUS", type: "datasource", pluginId: "prometheus", value: "Prometheus"}]}' dcgm-dashboard.json | \
curl -X POST http://admin:admin@localhost:3000/api/dashboards/import \
-H "Content-Type: application/json" -d @-
Custom Monitoring Scripts
For lightweight monitoring without a full stack, use this Python script to log GPU stats. This is a good approach if you are hosting a private AI deployment and want simple observability without external services:
#!/usr/bin/env python3
# gpu_monitor.py — Log GPU stats to CSV
import subprocess
import csv
import time
from datetime import datetime

LOG_FILE = "/var/log/gpu_monitor.csv"
INTERVAL = 10  # seconds

def get_gpu_stats():
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,temperature.gpu,utilization.gpu,"
         "utilization.memory,memory.used,memory.total,power.draw,power.limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True
    )
    return result.stdout.strip().split("\n")

def main():
    with open(LOG_FILE, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            timestamp = datetime.now().isoformat()
            for line in get_gpu_stats():
                values = [v.strip() for v in line.split(",")]
                writer.writerow([timestamp] + values)
            f.flush()
            time.sleep(INTERVAL)

if __name__ == "__main__":
    main()
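The resulting log is plain CSV, so analysing it later is straightforward. A sketch that computes average utilisation per GPU from the logged rows (the helper names are illustrative; the column order matches the --query-gpu fields in the script above):

```python
# Summarise the CSV written by gpu_monitor.py: average utilisation per GPU.
# summarise/average_utilisation are illustrative helper names.
import csv
from collections import defaultdict

def summarise(rows):
    """rows: iterables shaped like the CSV lines gpu_monitor.py writes."""
    totals = defaultdict(list)
    for row in rows:
        # row = [timestamp, index, name, temp, util.gpu, util.mem,
        #        mem.used, mem.total, power.draw, power.limit]
        totals[row[1]].append(float(row[4]))
    return {gpu: sum(v) / len(v) for gpu, v in totals.items()}

def average_utilisation(path="/var/log/gpu_monitor.csv"):
    with open(path, newline="") as f:
        return summarise(csv.reader(f))
```

Running `average_utilisation()` over a day of samples quickly shows whether a card is sitting idle or saturated.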
# Run as a systemd service
sudo tee /etc/systemd/system/gpu-monitor.service > /dev/null << 'EOF'
[Unit]
Description=GPU Monitor Script
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/gpu_monitor.py
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now gpu-monitor
Alerting on GPU Anomalies
Set up Prometheus alerting rules to get notified when GPU metrics cross thresholds. Alerting is essential for auto-scaling inference architectures, where scaling decisions depend on real-time GPU data. Pair alerting with your API security layer to detect abuse patterns:
# /opt/prometheus/alert_rules.yml
groups:
  - name: gpu_alerts
    rules:
      - alert: GPUHighTemperature
        expr: DCGM_FI_DEV_GPU_TEMP > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GPU {{ $labels.gpu }} temperature above 85°C"

      - alert: GPUMemoryExhausted
        # FB_USED and FB_FREE together make up total framebuffer memory
        expr: DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) > 0.95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "GPU {{ $labels.gpu }} VRAM usage above 95%"

      - alert: GPUUtilizationLow
        expr: DCGM_FI_DEV_GPU_UTIL < 10
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "GPU {{ $labels.gpu }} utilization below 10% for 15 minutes"
# Add alert rules to Prometheus config
# In /opt/prometheus/prometheus.yml, add:
rule_files:
  - "alert_rules.yml"
# Restart Prometheus
sudo systemctl restart prometheus
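One detail worth double-checking in the VRAM alert: DCGM reports framebuffer memory as two gauges, FB_USED and FB_FREE (both in MiB), which together make up the card's total memory. The utilisation fraction is therefore used / (used + free), not used / free. A quick sanity check:

```python
# VRAM utilisation fraction as used in the GPUMemoryExhausted alert.
# DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE are MiB gauges that sum to
# the total framebuffer size, so the fraction is used / (used + free).
def vram_fraction(fb_used_mib, fb_free_mib):
    return fb_used_mib / (fb_used_mib + fb_free_mib)

# 77824 MiB used of an 81920 MiB (80 GB) card:
print(round(vram_fraction(77824, 4096), 3))  # 0.95
```

The same arithmetic applies whether you evaluate it in PromQL or in a script.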
Proper monitoring helps you right-size your GPU allocation and identify when it is time to scale. For performance testing, see our GPU benchmarking guide. Compare performance across different hardware using the tokens per second benchmark. For cost analysis, check the cost per million tokens calculator. Browse all infrastructure guides in the AI hosting and infrastructure category.
Full Visibility Into Your GPU Infrastructure
GigaGPU dedicated servers include IPMI access and full root control for complete monitoring flexibility. Deploy Prometheus, Grafana, and DCGM on high-performance NVIDIA hardware.
Browse GPU Servers