Tutorials

AI Runtime Tracing with OpenTelemetry

OpenTelemetry instrumentation for AI applications — traces from gateway through embeddings, retrieval, LLM, response.

OpenTelemetry is the standard distributed tracing framework. For AI applications with multiple service hops (gateway → embeddings → vector store → LLM → response), traces are essential for diagnosing where latency, cost, and failures originate. AI-specific span attributes capture model, token counts, and cost per hop.

TL;DR

Add OTel SDK to your AI app; instrument each service hop as a span; ship to Jaeger / Honeycomb / Grafana Tempo. AI-specific attributes: model, prompt_tokens, completion_tokens, cost_usd. One trace per request from gateway to response. Diagnoses latency / cost / failure root causes in seconds.

Why OTel

  • Distributed traces: a slow request traverses gateway / vector store / LLM; the trace shows where the time was spent
  • Standard format: any compatible backend (Jaeger, Honeycomb, Datadog, Grafana Tempo)
  • Vendor neutrality: switch backends without code changes
  • AI-specific attributes: capture model / tokens / cost per span
  • Sampling: trace 1-10% of requests, with full coverage on errors (see the sketch after this list)
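
Head sampling is configured on the TracerProvider. A minimal sketch, assuming a 10% sample rate; names otherwise match the setup below:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Record 10% of new traces. ParentBased makes child spans follow the parent's
# decision, so a trace is never half-sampled across service hops.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))

Note that "full coverage on errors" means tail sampling: the SDK decides before a request's outcome is known, so keep-all-errors policies live in the OpenTelemetry Collector, not in application code.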

Setup

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Name the service so spans are attributed correctly in the backend.
provider = TracerProvider(resource=Resource.create({"service.name": "rag-api"}))
# OTLPSpanExporter defaults to localhost:4317 (gRPC); point it at your collector.
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# One root span per request; each hop (embed, retrieve, generate) nests inside it.
with tracer.start_as_current_span("rag.query") as span:
    span.set_attribute("user_id", user_id)

    with tracer.start_as_current_span("embed"):
        emb = embed_query(query)  # app-specific embedding helper

    with tracer.start_as_current_span("retrieve") as retrieve_span:
        retrieve_span.set_attribute("k", 10)
        chunks = vector_store.search(emb, k=10)

    # Assemble the prompt from the query and retrieved chunks (app-specific helper).
    prompt = build_prompt(query, chunks)

    with tracer.start_as_current_span("llm.generate") as llm_span:
        llm_span.set_attribute("model", "llama-3.1-8b-fp8")
        llm_span.set_attribute("prompt_tokens", count_tokens(prompt))
        response = llm.generate(prompt)
        llm_span.set_attribute("completion_tokens", count_tokens(response))

AI-specific spans

Per-span attributes for AI workloads (a sketch of attaching them follows this list):

  • model: which model was called
  • prompt_tokens, completion_tokens
  • cost_usd or cost_gbp per call
  • cache_hit: prefix or semantic
  • fallback: was hosted-API fallback used?
  • tenant_id, feature_id, request_id
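
A minimal sketch of attaching these to the llm.generate span from the setup example. The per-token prices are illustrative, and llm and count_tokens are the same app-specific helpers assumed above:

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

# Illustrative per-1K-token rates (prompt, completion) in USD; use your real costs.
PRICE_PER_1K = {"llama-3.1-8b-fp8": (0.0001, 0.0002)}

def traced_generate(prompt, model, tenant_id, request_id):
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("model", model)
        span.set_attribute("tenant_id", tenant_id)
        span.set_attribute("request_id", request_id)
        try:
            response = llm.generate(prompt)  # app-specific client, as above
        except Exception as exc:
            span.record_exception(exc)  # failure is recorded on the span
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise
        pt, ct = count_tokens(prompt), count_tokens(response)
        in_rate, out_rate = PRICE_PER_1K[model]
        span.set_attribute("prompt_tokens", pt)
        span.set_attribute("completion_tokens", ct)
        span.set_attribute("cost_usd", pt / 1000 * in_rate + ct / 1000 * out_rate)
        return response

cache_hit and fallback are set the same way, at the point in the code where those decisions are made.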

Verdict

For production AI applications with multi-service hops, OpenTelemetry tracing is essential. Standard format + AI-specific attributes + flexible backends. Setup takes about half a day; the payoff during incident response and performance debugging is decisive. Build it in on day one of production deployment.

Bottom line

OTel + AI-specific attributes. See the observability stack guide.
