What You’ll Connect
After this guide, your Datadog dashboard will display real-time GPU metrics, inference latency, and throughput data from your self-hosted AI servers — giving you full observability over your dedicated GPU infrastructure. Alerts fire when GPU memory approaches capacity, inference latency spikes, or your vLLM server stops responding.
The integration installs the Datadog Agent on your GPU server, enables the NVIDIA GPU integration for hardware metrics, and adds custom metrics from your inference endpoint. This gives operations teams a single pane of glass for monitoring AI workloads.
Datadog Agent ──────────> Datadog Platform ──────> Dashboard + Alerts
      |                         |                        |
 NVML metrics             ships metrics            visualises GPU
 (GPU util, mem,          every 15 seconds         utilisation,
 temperature)                                      inference throughput
      |
vLLM /metrics endpoint ─> custom check ──────────> latency, queue depth,
                          scrapes stats            requests per second

Prerequisites
- A GigaGPU server running an LLM on vLLM or Ollama (setup guide)
- A Datadog account with an API key (free trial works for testing)
- SSH access to your GPU server for installing the Datadog Agent
- NVIDIA drivers installed with NVML library available (standard on CUDA installations)
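One quick way to confirm the NVML prerequisite before installing anything is to check whether the NVML shared library loads. A minimal sketch (the library filename is the standard one shipped with NVIDIA drivers; this only tests loadability, not GPU health):

```python
import ctypes

def nvml_available() -> bool:
    """Return True if the NVML shared library can be loaded.

    The Datadog NVIDIA GPU integration relies on NVML, so if this
    fails, the integration will have nothing to collect.
    """
    try:
        ctypes.CDLL("libnvidia-ml.so.1")
        return True
    except OSError:
        return False

print(nvml_available())  # True on a server with NVIDIA drivers installed
```

On a machine without NVIDIA drivers this simply returns False, which tells you to fix the driver installation before continuing.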
Integration Steps
1. Install the Datadog Agent on your GPU server using the one-line installer from the Datadog platform. The agent runs as a system service and begins shipping host-level metrics (CPU, memory, disk, network) immediately.
2. Enable the NVIDIA GPU integration by creating a configuration file at /etc/datadog-agent/conf.d/nvidia.d/conf.yaml. The integration uses NVML to collect GPU-specific metrics: utilisation percentage, memory used and total, temperature, power draw, and clock speeds. Restart the agent to begin collection.
3. For inference-specific metrics, configure a custom check that scrapes your vLLM server's metrics endpoint or queries the /v1/models endpoint. Track request count, average latency, queue depth, and tokens generated per second. These application-level metrics complement the hardware metrics from NVML.
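At its core, the custom check just parses the Prometheus text format that vLLM's /metrics endpoint returns. A dependency-free sketch of that parsing step (the sample payload is invented but shaped like vLLM's output; inside the Agent you would subclass AgentCheck and submit these values rather than print them):

```python
def parse_prometheus_text(payload: str) -> dict[str, float]:
    """Parse simple gauge/counter lines from Prometheus text format."""
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        name, _, value = line.rpartition(" ")
        name = name.split("{", 1)[0]  # drop any {label="..."} suffix
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # ignore lines that are not "name value" pairs
    return metrics

# Illustrative payload using metric names from the vLLM config above
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running 3.0
vllm:num_requests_waiting 7.0
vllm:gpu_cache_usage_perc 0.42
"""
parsed = parse_prometheus_text(sample)
print(parsed["vllm:num_requests_waiting"])  # 7.0
```

Each parsed name/value pair maps directly onto a gauge submission in a Datadog check.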
Code Example
Datadog configuration files for monitoring your AI inference server:
# /etc/datadog-agent/conf.d/nvidia.d/conf.yaml
instances:
  - {}

# /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml
# Scrape the vLLM Prometheus metrics endpoint
instances:
  - openmetrics_endpoint: http://localhost:8000/metrics
    namespace: vllm
    metrics:
      - vllm:num_requests_running
      - vllm:num_requests_waiting
      - vllm:gpu_cache_usage_perc
      - vllm:avg_generation_throughput_toks_per_s
      - vllm:request_success_total
      - vllm:e2e_request_latency_seconds
# Custom Datadog monitor (via API or UI):
# {
#   "name": "GPU Memory > 90%",
#   "type": "metric alert",
#   "query": "avg(last_5m):avg:nvidia.gpu.mem.used{host:inference-1} / avg:nvidia.gpu.mem.total{host:inference-1} * 100 > 90",
#   "message": "GPU memory usage exceeded 90% on {{host.name}}. Risk of OOM. @pagerduty",
#   "options": { "thresholds": { "critical": 90, "warning": 80 } }
# }

# Inference latency monitor:
# {
#   "name": "Inference Latency > 5s",
#   "type": "metric alert",
#   "query": "avg(last_5m):avg:vllm.e2e_request_latency_seconds{*} > 5",
#   "message": "Average inference latency exceeded 5 seconds. Check GPU load. @ops-team"
# }
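The GPU-memory monitor above averages the last five minutes of datapoints, converts used memory to a percentage of total, and classifies the result against the warning and critical thresholds. A pure-Python sketch of that evaluation (the datapoints and card size are invented for illustration):

```python
def memory_alert_status(used_mib: list[float], total_mib: float,
                        warning: float = 80.0, critical: float = 90.0) -> str:
    """Mimic avg(last_5m): average the datapoints, convert to a
    percentage of total GPU memory, and classify against thresholds."""
    avg_pct = sum(used_mib) / len(used_mib) / total_mib * 100
    if avg_pct > critical:
        return "critical"
    if avg_pct > warning:
        return "warning"
    return "ok"

# Five 1-minute datapoints on an illustrative 24 GiB (24576 MiB) card
print(memory_alert_status([22000, 22300, 22500, 22600, 22800], 24576))
# -> critical  (average is ~91.3% of total memory)
```

Averaging before comparing is what keeps a single transient spike from paging anyone; only sustained pressure crosses the threshold.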
Testing Your Integration
After installing the agent and enabling integrations, check the Datadog agent status: sudo datadog-agent status. Look for the nvidia and openmetrics checks under “Running Checks.” Navigate to your Datadog dashboard and verify GPU metrics appear under the Infrastructure section.
Send a few inference requests to your GPU server and confirm the vLLM metrics update in Datadog within 30 seconds. Create a test monitor with a low threshold to verify alert delivery to your notification channel.
Production Tips
Build a dedicated dashboard combining GPU hardware metrics (utilisation, memory, temperature) with inference application metrics (latency, throughput, queue depth). This correlation view helps diagnose whether performance issues stem from GPU saturation or application-level bottlenecks.
Set up anomaly detection monitors rather than static thresholds for inference latency. This accounts for natural variation in request complexity and alerts only on genuine degradation. Pair anomaly monitors with an HTTP health check that pings the model endpoint every 60 seconds to catch a hung or crashed vLLM server outright.
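The difference between the two approaches can be sketched as comparing the latest latency sample against a rolling baseline instead of a fixed number. A toy illustration (this is not Datadog's actual anomaly algorithm, and the latency samples are invented):

```python
import statistics

def is_anomalous(history: list[float], latest: float,
                 n_sigma: float = 3.0) -> bool:
    """Flag `latest` only if it deviates strongly from the recent
    baseline, rather than comparing it to a fixed cutoff like 5 s."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1e-9  # guard against zero spread
    return abs(latest - mean) / stdev > n_sigma

# A naturally noisy workload: ~4-5 s latencies are normal here,
# so a static "latency > 5 s" monitor would page on routine requests.
noisy = [3.8, 4.1, 5.2, 4.6, 3.9, 4.8, 4.3]
print(is_anomalous(noisy, 5.5))   # False - within normal variation
print(is_anomalous(noisy, 20.0))  # True  - genuine degradation
```

A static 5-second monitor fires on the 5.5 s sample; the baseline comparison stays quiet there and still catches the real outlier.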
For teams running multiple open-source model servers, Datadog’s tagging system lets you filter by model name, GPU type, and server role. This scales monitoring across your entire GPU fleet without duplicating dashboards. Browse more tutorials or get started with GigaGPU to monitor your AI infrastructure professionally.