Most teams ship AI without analytics. The first time the bill spikes is also the first time anyone looks at the data.
Analytics comes in three layers: infrastructure metrics (Prometheus + DCGM), request analytics (LiteLLM or custom logging), and business analytics (per-tenant cost, usage patterns). Build all three before launch.
Three analytics layers
- Infra: GPU utilisation, VRAM usage, temperature, power draw
- Request: per-request tokens, latency, cost, error class
- Business: per-tenant cost, top users, top prompts, model split
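To make the request and business layers concrete, here is a minimal sketch of how per-request records might roll up into per-tenant cost and model split. The `RequestRecord` fields, the model names and the prices in `PRICE_PER_1K` are all hypothetical placeholders, not values from any real provider or from LiteLLM; swap in whatever your gateway actually logs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per 1,000 tokens; replace with your real rates.
PRICE_PER_1K = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 0.10, "output": 0.30},
}


@dataclass
class RequestRecord:
    """One row of request-level analytics (field names are assumptions)."""
    tenant: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    error_class: Optional[str] = None  # None means the request succeeded

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_1K[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


def cost_per_tenant(records):
    """Business layer: total cost in USD for each tenant."""
    totals = defaultdict(float)
    for r in records:
        totals[r.tenant] += r.cost_usd
    return dict(totals)


def model_split(records):
    """Business layer: share of requests served by each model."""
    counts = defaultdict(int)
    for r in records:
        counts[r.model] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


records = [
    RequestRecord("acme", "model-a", 1000, 1000, 120.0),
    RequestRecord("acme", "model-b", 2000, 0, 80.0),
    RequestRecord("globex", "model-b", 1000, 1000, 95.0, "timeout"),
]
print(cost_per_tenant(records))
print(model_split(records))
```

This sketch bills failed requests the same as successful ones. Whether that is right depends on how your provider charges, so treat it as a design choice to revisit.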
Dashboards
- Real-time ops: time to first token (TTFT) p99, queue depth, GPU memory utilisation
- Daily cost: tokens-per-tenant, cost-per-tenant, % vs budget
- Weekly trends: usage growth, error rate trend, cache hit rate
- Monthly review: top 10 prompts, cost outliers, fine-tune candidates
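Several of the dashboard numbers above are simple to compute once request records exist. The sketch below shows one way to derive a p99 latency, the percentage of budget used and a cache hit rate. The function names are illustrative. The percentile uses the nearest-rank method, which is an assumption; Prometheus and most dashboard tools estimate quantiles from histogram buckets instead, so their numbers may differ slightly.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def budget_used_pct(cost_usd, budget_usd):
    """Percentage of a tenant's budget consumed so far."""
    return 100.0 * cost_usd / budget_usd


def cache_hit_rate(hits, misses):
    """Fraction of lookups served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


# Example inputs (made up for illustration).
ttft_ms = list(range(1, 101))  # 100 samples: 1 ms .. 100 ms
print(percentile(ttft_ms, 99))
print(budget_used_pct(45.0, 60.0))
print(cache_hit_rate(3, 1))
```

For the real-time panel you would normally compute the percentile over a sliding window (for example the last five minutes) rather than over all samples.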
Verdict
If you can't see your usage, you can't optimise it. Analytics is half of running production AI.
Bottom line
Build the dashboards before launch. See the monitoring guide.