nvidia-smi Deep Dive for GPU Server Operators

Beyond the default dashboard view, nvidia-smi has subcommands for process listings, ECC status, topology, and continuous logging.

Every dedicated GPU operator knows nvidia-smi, but most use only the default summary view. The tool does far more: per-process tracking, ECC error reporting, topology queries, and scriptable output. On our dedicated GPU hosting these less-used modes are worth knowing.

Process Listing

nvidia-smi pmon -c 1

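Shows per-process GPU usage: PID, process type, SM and memory utilisation, and command name. Add -s um to include the framebuffer memory each process holds. Useful for finding which vLLM replica is using which GPU on a multi-process server.

The query interface reports the same mapping as CSV, with the PCI bus ID of the GPU each process is running on:

nvidia-smi --query-compute-apps=pid,process_name,gpu_bus_id,used_memory --format=csv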
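
To total the memory held by all processes on each GPU, the CSV can be piped through awk. The following is a minimal sketch, not a feature of nvidia-smi: the function name mem_per_gpu is hypothetical, and it assumes the query is run with --format=csv,noheader,nounits so that every line ends with the bus ID followed by a bare MiB figure.

```shell
# Hypothetical helper: sum used GPU memory per PCI bus ID.
# Reads CSV lines of the form "pid, process_name, gpu_bus_id, used_memory" on stdin.
# Fields are counted from the end of the line so a comma inside a
# process name does not shift the bus ID or memory columns.
mem_per_gpu() {
  awk -F', *' '
    NF >= 4 { used[$(NF - 1)] += $NF + 0 }
    END     { for (id in used) print id ": " used[id] " MiB" }
  ' | sort
}

# Usage (requires an NVIDIA driver):
#   nvidia-smi --query-compute-apps=pid,process_name,gpu_bus_id,used_memory \
#     --format=csv,noheader,nounits | mem_per_gpu
```

Keeping the parsing in a function that reads standard input means it can be tried against a saved CSV file as easily as against a live GPU.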

Machine-Readable

For scripting or monitoring:

nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,memory.used,memory.total \
  --format=csv,noheader,nounits

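The noheader and nounits options drop the header row and the unit suffixes, leaving one comma-separated line per GPU that is easy to parse. Good for one-off checks without setting up Prometheus.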
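
As an example of a one-off check, the sketch below flags any GPU at or above a temperature limit. The function name hot_gpus and the default limit of 80 are illustrative assumptions, and the field order is assumed to match the query above (name first, temperature second).

```shell
# Hypothetical helper: list GPUs whose temperature is at or above a limit.
# Reads "name, temperature, utilisation, memory.used, memory.total" lines on stdin.
# $1 is the limit in degrees Celsius (defaults to 80, an arbitrary example value).
hot_gpus() {
  limit="${1:-80}"
  awk -F', *' -v limit="$limit" '
    $2 + 0 >= limit + 0 { print $1 ": " $2 " C" }
  '
}

# Usage (requires an NVIDIA driver):
#   nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,memory.used,memory.total \
#     --format=csv,noheader,nounits | hot_gpus 85
```

The function prints nothing when every GPU is below the limit, so it slots into a cron job or shell conditional without extra filtering.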

Topology

nvidia-smi topo -m

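Prints a matrix of the link type between every pair of GPUs, along with each GPU's CPU and NUMA affinity. The legend under the matrix explains the codes: NV# is NVLink, PIX and PXB stay within PCIe bridges, PHB and NODE cross a PCIe host bridge inside one NUMA node, and SYS crosses the interconnect between NUMA nodes. Critical for multi-GPU tensor-parallel setups: two GPUs on the same NUMA node communicate faster than cross-socket (SYS) pairs.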
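
On a server with many GPUs the matrix is easier to act on once the slow pairs are pulled out. The sketch below lists every GPU pair whose link is reported as SYS. The function name cross_socket_pairs is hypothetical, and the parsing rests on assumptions about the layout: data rows begin with GPUn and the GPU columns come first. The exact layout can vary between driver versions, so check it against your own output.

```shell
# Hypothetical helper: print GPU pairs connected via SYS (cross-NUMA) links.
# Reads the text of `nvidia-smi topo -m` on stdin.
cross_socket_pairs() {
  awk '
    # Data rows start with GPUn and have "X" or a link code in column 2;
    # this also skips the header row, whose second field is a GPU name.
    $1 ~ /^GPU[0-9]+$/ && ($2 == "X" || $2 ~ /^(NV[0-9]+|PIX|PXB|PHB|NODE|SYS)$/) {
      n++
      name[n] = $1
      for (i = 2; i <= NF; i++) cell[n, i - 1] = $i
    }
    END {
      # The matrix is symmetric, so only the upper triangle is checked.
      for (r = 1; r <= n; r++)
        for (c = r + 1; c <= n; c++)
          if (cell[r, c] == "SYS") print name[r], name[c]
    }
  '
}

# Usage (requires an NVIDIA driver):
#   nvidia-smi topo -m | cross_socket_pairs
```

An empty result means no pair is reported as SYS, which is what you want for the GPUs that share a tensor-parallel group.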

Continuous Logging

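nvidia-smi dmon -s pum -c 300 > gpu-log.txt

Samples power and temperature, utilisation, and memory once per second (the default interval, adjustable with -d) for 300 samples. The output is whitespace-aligned columns rather than CSV. Useful for a post-mortem on a load test or for identifying thermal throttling.

Metric groups for -s are single letters written together with no separators (for example -s pucm):

  • p: power usage and temperature
  • u: utilisation
  • c: processor and memory clocks
  • v: power and thermal violations
  • m: framebuffer and BAR1 memory
  • e: ECC errors and PCIe replay errors
  • t: PCIe Rx and Tx throughput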
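
For a quick post-mortem, the peak temperature per GPU can be pulled from a saved dmon log. This is a rough sketch under stated assumptions: the function name max_gtemp is hypothetical, the log must include the p metric group (which carries the gtemp column), the header line is assumed to start with "#", and the log must have been captured without the date and time columns.

```shell
# Hypothetical helper: report the highest gtemp reading per GPU in a dmon log.
# Reads the dmon output on stdin.
max_gtemp() {
  awk '
    /^#/ {
      # Locate the gtemp column from the first header line that names it.
      # A standalone "#" counts as a field in the header but not in data rows.
      if (!col) {
        off = ($1 == "#") ? 1 : 0
        for (i = 1; i <= NF; i++) if ($i == "gtemp") col = i - off
      }
      next
    }
    # Skip rows where the reading is missing (printed as "-").
    col && $col ~ /^[0-9]+$/ {
      if ($col + 0 > max[$1] + 0) max[$1] = $col + 0
    }
    END { for (g in max) print "GPU " g ": " max[g] " C" }
  ' | sort
}

# Usage, with the path to a log captured by `nvidia-smi dmon`:
#   max_gtemp < /path/to/dmon-log
```

Reading the column position from the header instead of hard-coding it keeps the helper working when a driver version adds or reorders columns.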

See DCGM Exporter and GPU power management.

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.