
Connect Datadog to Monitor GPU AI Infrastructure

Monitor your GPU AI inference infrastructure with Datadog. This guide covers the Datadog Agent setup, GPU metrics collection via NVML, custom dashboards for inference latency, and alerting on GPU utilisation thresholds for your self-hosted LLM servers.

What You’ll Connect

After this guide, your Datadog dashboard will display real-time GPU metrics, inference latency, and throughput data from your self-hosted AI servers — giving you full observability over your dedicated GPU infrastructure. Alerts fire when GPU memory approaches capacity, inference latency spikes, or your vLLM server stops responding.

The integration installs the Datadog Agent on your GPU server, enables the NVIDIA GPU integration for hardware metrics, and adds custom metrics from your inference endpoint. This gives operations teams a single pane of glass for monitoring AI workloads.

Datadog Agent    ──>  Datadog Platform  ──>  Dashboard + Alerts
 NVML metrics          Ships metrics          Visualises GPU
 (GPU util, mem,       every 15 seconds       utilisation, temps,
 temperature)                                 inference throughput

vLLM /metrics    ──>  Custom check      ──>  Latency, queue depth,
endpoint               scrapes stats          requests per second

Prerequisites

  • A GigaGPU server running an LLM on vLLM or Ollama (setup guide)
  • A Datadog account with an API key (free trial works for testing)
  • SSH access to your GPU server for installing the Datadog Agent
  • NVIDIA drivers installed with NVML library available (standard on CUDA installations)

Integration Steps

Install the Datadog Agent on your GPU server using the one-line installer from the Datadog platform. The agent runs as a system service and begins shipping host-level metrics (CPU, memory, disk, network) immediately.
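The install step above can be sketched as follows. The API key and site values are placeholders; copy the exact one-liner shown in your own Datadog account under Integrations → Agent, as the script URL and variables may change between Agent versions.

```shell
# Install the Datadog Agent (Agent 7) -- API key and site are placeholders,
# substitute the values from your Datadog account.
DD_API_KEY="<YOUR_DATADOG_API_KEY>" \
DD_SITE="datadoghq.com" \
bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# Confirm the agent service is running and shipping host metrics
sudo systemctl status datadog-agent --no-pager
```

Host-level metrics (CPU, memory, disk, network) should appear in Datadog within a minute or two of the service starting.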

Enable the NVIDIA GPU integration by creating a configuration file at /etc/datadog-agent/conf.d/nvidia.d/conf.yaml. The integration uses NVML to collect GPU-specific metrics: utilisation percentage, memory used and total, temperature, power draw, and clock speeds. Restart the agent to begin collection.
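A minimal sketch of that configuration step, assuming the `nvidia.d` directory name used in this guide (check `sudo datadog-agent status` for the exact check name on your agent version):

```shell
# Create the NVIDIA GPU integration config with an empty instance,
# which enables collection with default settings via NVML.
sudo mkdir -p /etc/datadog-agent/conf.d/nvidia.d
sudo tee /etc/datadog-agent/conf.d/nvidia.d/conf.yaml >/dev/null <<'EOF'
instances:
  - {}
EOF

# Restart the agent so it picks up the new check
sudo systemctl restart datadog-agent
```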

For inference-specific metrics, configure a custom check that scrapes your vLLM server’s metrics endpoint or queries the /v1/models endpoint. Track request count, average latency, queue depth, and tokens generated per second. These application-level metrics complement the hardware metrics from NVML.

Code Example

Datadog configuration files for monitoring your AI inference server:

# /etc/datadog-agent/conf.d/nvidia.d/conf.yaml
instances:
  - {}

# /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml
# Scrape vLLM Prometheus metrics endpoint
instances:
  - openmetrics_endpoint: http://localhost:8000/metrics
    namespace: vllm
    metrics:
      - vllm:num_requests_running
      - vllm:num_requests_waiting
      - vllm:gpu_cache_usage_perc
      - vllm:avg_generation_throughput_toks_per_s
      - vllm:request_success_total
      - vllm:e2e_request_latency_seconds

# Custom Datadog Monitor (via API or UI):
# {
#   "name": "GPU Memory > 90%",
#   "type": "metric alert",
#   "query": "avg(last_5m):avg:nvidia.gpu.mem.used{host:inference-1} / avg:nvidia.gpu.mem.total{host:inference-1} * 100 > 90",
#   "message": "GPU memory usage exceeded 90% on {{host.name}}. Risk of OOM. @pagerduty",
#   "thresholds": { "critical": 90, "warning": 80 }
# }

# Inference latency monitor:
# {
#   "name": "Inference Latency > 5s",
#   "type": "metric alert",
#   "query": "avg(last_5m):avg:vllm.e2e_request_latency_seconds{*} > 5",
#   "message": "Average inference latency exceeded 5 seconds. Check GPU load. @ops-team"
# }

Testing Your Integration

After installing the agent and enabling integrations, check the Datadog agent status: sudo datadog-agent status. Look for the nvidia and openmetrics checks under “Running Checks.” Navigate to your Datadog dashboard and verify GPU metrics appear under the Infrastructure section.
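A quick way to narrow the status output to the two checks this guide configures:

```shell
# Show only the nvidia and openmetrics check sections of the agent status
sudo datadog-agent status | grep -i -A 4 -E "nvidia|openmetrics"
```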

Send a few inference requests to your GPU server and confirm the vLLM metrics update in Datadog within 30 seconds. Create a test monitor with a low threshold to verify alert delivery to your notification channel.
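A sketch of that smoke test against vLLM's OpenAI-compatible API, assuming the server listens on port 8000 as in the configs above (the model name is a placeholder; substitute whatever you serve):

```shell
# Send a small test completion to generate inference traffic
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<your-model>", "prompt": "Hello", "max_tokens": 16}'

# Confirm the counters moved on the raw Prometheus endpoint
# before checking Datadog
curl -s http://localhost:8000/metrics | grep vllm:request_success_total
```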

Production Tips

Build a dedicated dashboard combining GPU hardware metrics (utilisation, memory, temperature) with inference application metrics (latency, throughput, queue depth). This correlation view helps diagnose whether performance issues stem from GPU saturation or application-level bottlenecks.

Set up anomaly detection monitors rather than static thresholds for inference latency. This accounts for natural variations in request complexity and alerts only on genuine degradation patterns. Pair with a secure endpoint health check that pings the model every 60 seconds.
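An anomaly monitor can reuse the same latency metric with Datadog's `anomalies()` query function. The sketch below follows the comment-style monitor format used earlier; the algorithm (`'agile'`) and deviation bound (`2`) are assumptions to tune against your traffic patterns:

```shell
# Anomaly-based latency monitor (sketch; tune algorithm and bounds):
# {
#   "name": "Inference Latency Anomaly",
#   "type": "query alert",
#   "query": "avg(last_4h):anomalies(avg:vllm.e2e_request_latency_seconds{*}, 'agile', 2) >= 1",
#   "message": "Inference latency deviates from its learned baseline. @ops-team"
# }
```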

For teams running multiple open-source model servers, Datadog’s tagging system lets you filter by model name, GPU type, and server role. This scales monitoring across your entire GPU fleet without duplicating dashboards. Browse more tutorials or get started with GigaGPU to monitor your AI infrastructure professionally.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
