ELK Stack for AI Inference Logging

Complete guide to setting up the ELK stack for AI inference logging, covering Elasticsearch indexing, Logstash pipelines, Kibana dashboards, structured logging, and debugging model issues on GPU servers.

You will set up the ELK stack (Elasticsearch, Logstash, Kibana) to capture, index, and visualise logs from your AI inference pipeline. By the end, you will have structured logging on your GPU server that lets you debug inference failures, track request patterns, and monitor model behaviour across your entire stack.

Logging Architecture

| Component     | Role                        | Port |
|---------------|-----------------------------|------|
| Application   | Structured JSON logs        | —    |
| Filebeat      | Log collection and shipping | —    |
| Logstash      | Parse, transform, enrich    | 5044 |
| Elasticsearch | Index and search            | 9200 |
| Kibana        | Dashboards and exploration  | 5601 |

Structured Logging in Python

Emit structured JSON logs from your inference server so that the ELK stack can parse fields automatically.

import logging
import json
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "inference-api",
            "message": record.getMessage(),
        }
        if hasattr(record, "request_id"):
            log_data["request_id"] = record.request_id
        if hasattr(record, "model"):
            log_data["model"] = record.model
        if hasattr(record, "tokens"):
            log_data["tokens"] = record.tokens
        if hasattr(record, "latency_ms"):
            log_data["latency_ms"] = record.latency_ms
        return json.dumps(log_data)

logger = logging.getLogger("inference")
handler = logging.FileHandler("/var/log/inference/api.json")
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Log inference requests
def log_inference(request_id, model, prompt_tokens, completion_tokens, latency):
    logger.info(
        "Inference completed",
        extra={
            "request_id": request_id,
            "model": model,
            "tokens": prompt_tokens + completion_tokens,
            "latency_ms": round(latency * 1000, 2)
        }
    )
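A quick way to confirm the formatter emits parseable JSON is to log to an in-memory stream and read the line back. This sketch uses a compact version of the formatter above (the field list and service name mirror it; the request values are made up for illustration) so it runs anywhere without touching /var/log:

```python
import json
import logging
from datetime import datetime, timezone
from io import StringIO

# Compact mirror of the JSONFormatter above; logs to a StringIO
# instead of /var/log/inference/api.json so the sketch is self-contained.
class JSONFormatter(logging.Formatter):
    FIELDS = ("request_id", "model", "tokens", "latency_ms")

    def format(self, record):
        data = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "inference-api",
            "message": record.getMessage(),
        }
        # Keys passed via `extra=` become attributes on the LogRecord
        for field in self.FIELDS:
            if hasattr(record, field):
                data[field] = getattr(record, field)
        return json.dumps(data)

stream = StringIO()
logger = logging.getLogger("inference-demo")
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Illustrative values, not real model output
logger.info(
    "Inference completed",
    extra={"request_id": "req-123", "model": "llama-3-8b",
           "tokens": 182, "latency_ms": 240.5},
)

line = json.loads(stream.getvalue())
print(line["model"], line["tokens"])  # llama-3-8b 182
```

Each log line is one JSON object, which is exactly the shape Filebeat and Logstash expect later in this guide.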

For the inference server itself, see the FastAPI server guide or the Flask API guide.

ELK Stack Setup

Deploy the stack with Docker Compose on your GPU server or a separate monitoring server.

# docker-compose.yml
version: "3.8"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  es_data:
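After `docker compose up -d`, it is worth checking cluster health before wiring up the pipeline, e.g. with `curl http://localhost:9200/_cluster/health`. A single-node cluster normally reports "yellow" rather than "green" because replica shards cannot be allocated. A minimal sketch for interpreting that response (the sample payload is illustrative):

```python
import json

# Interpret the /_cluster/health response: on a single-node setup,
# "yellow" is expected and fine; only "red" means data is unavailable.
def cluster_ok(health_json: str) -> bool:
    health = json.loads(health_json)
    return health.get("status") in ("green", "yellow")

# Sample response body (trimmed) from a fresh single-node cluster
sample = '{"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}'
print(cluster_ok(sample))  # True
```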

Logstash Pipeline

Configure Logstash to parse JSON logs and enrich them with GPU server metadata.

# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  json {
    source => "message"
    target => "inference"
  }

  if [inference][latency_ms] {
    mutate {
      convert => { "[inference][latency_ms]" => "float" }
      convert => { "[inference][tokens]" => "integer" }
    }
  }

  mutate {
    add_field => { "gpu_server" => "%{[host][name]}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "inference-logs-%{+YYYY.MM.dd}"
  }
}
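Once documents are flowing into the daily `inference-logs-*` indices, you can query them directly for debugging. This sketch builds a search body that finds the slowest requests; the field names match the Logstash pipeline above, while the threshold and result size are arbitrary choices. POST the body to `http://localhost:9200/inference-logs-*/_search`:

```python
import json

# Build an Elasticsearch search body for requests slower than a
# threshold, sorted worst-first. Field paths match the Logstash
# filter above ([inference][latency_ms] converted to float).
def slow_request_query(threshold_ms: float = 2000.0) -> str:
    body = {
        "query": {
            "range": {"inference.latency_ms": {"gte": threshold_ms}}
        },
        "sort": [{"inference.latency_ms": "desc"}],
        "size": 20,
    }
    return json.dumps(body)

print(slow_request_query())
```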

Filebeat Configuration

# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/inference/*.json
    json.keys_under_root: true
    json.add_error_key: true

  - type: log
    enabled: true
    paths:
      - /var/log/vllm/*.log
    fields:
      service: vllm

output.logstash:
  hosts: ["logstash-server:5044"]
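The `json.keys_under_root` option requires every line in the log file to be a single valid JSON object; malformed lines get tagged via `json.add_error_key` instead of being parsed. A small sanity check you can run against a log file before pointing Filebeat at it (`validate_json_lines` is a hypothetical helper, not part of Filebeat):

```python
import json

# Return the line numbers that fail to parse as JSON. Filebeat would
# ship these with an error key rather than structured fields.
def validate_json_lines(text: str) -> list[int]:
    bad = []
    for i, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(i)
    return bad

sample = '{"level": "INFO", "message": "ok"}\nnot json\n'
print(validate_json_lines(sample))  # [2]
```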

Kibana Dashboards

Build dashboards in Kibana for inference observability. Key visualisations include:

  • Request volume over time — line chart of requests per minute.
  • Latency distribution — histogram of inference.latency_ms to spot slow requests.
  • Error rate — percentage of failed inference requests by model and error type.
  • Token throughput — sum of inference.tokens per time period.
  • Model usage breakdown — pie chart by inference.model field.

For metrics-based monitoring alongside logs, see the Prometheus and Grafana guide. For webhook notifications on errors, check the webhook integration guide. The self-hosting guide covers infrastructure, and our tutorials section has more observability patterns. Set up the backend with the vLLM production guide. For container logging in Kubernetes, see the GPU pod guide.

Monitor AI Inference with ELK on Dedicated GPUs

Deploy full logging and observability on bare-metal GPU servers. Debug inference issues, track model behaviour.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
