You will set up the ELK stack (Elasticsearch, Logstash, Kibana) to capture, index, and visualise logs from your AI inference pipeline. By the end, you will have structured logging on your GPU server that lets you debug inference failures, track request patterns, and monitor model behaviour across your entire stack.
## Logging Architecture
| Component | Role | Port |
|---|---|---|
| Application | Structured JSON logs | — |
| Filebeat | Log collection and shipping | — |
| Logstash | Parse, transform, enrich | 5044 |
| Elasticsearch | Index and search | 9200 |
| Kibana | Dashboards and exploration | 5601 |
## Structured Logging in Python
Emit structured JSON logs from your inference server so that the ELK stack can parse fields automatically.
```python
import json
import logging
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON line for the ELK pipeline."""

    def format(self, record):
        log_data = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "inference-api",
            "message": record.getMessage(),
        }
        # Optional fields attached via the `extra` argument
        for field in ("request_id", "model", "tokens", "latency_ms"):
            if hasattr(record, field):
                log_data[field] = getattr(record, field)
        return json.dumps(log_data)

logger = logging.getLogger("inference")
handler = logging.FileHandler("/var/log/inference/api.json")
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Log inference requests
def log_inference(request_id, model, prompt_tokens, completion_tokens, latency):
    logger.info(
        "Inference completed",
        extra={
            "request_id": request_id,
            "model": model,
            "tokens": prompt_tokens + completion_tokens,
            "latency_ms": round(latency * 1000, 2),
        },
    )
```
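Before wiring up Filebeat, it is worth confirming that fields passed via `extra` actually land in the JSON output. A quick in-memory check, using a trimmed-down copy of the formatter and a throwaway `StringIO` stream instead of the log file (the request ID and token count are illustrative):

```python
import io
import json
import logging

class MiniJSONFormatter(logging.Formatter):
    """Trimmed-down copy of the JSONFormatter above, for a quick check."""

    def format(self, record):
        data = {"level": record.levelname, "message": record.getMessage()}
        for field in ("request_id", "model", "tokens", "latency_ms"):
            if hasattr(record, field):
                data[field] = getattr(record, field)
        return json.dumps(data)

# Write to an in-memory buffer instead of /var/log/inference/api.json
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(MiniJSONFormatter())
check_logger = logging.getLogger("inference-check")
check_logger.addHandler(handler)
check_logger.setLevel(logging.INFO)

check_logger.info("Inference completed", extra={"request_id": "abc123", "tokens": 768})

parsed = json.loads(buf.getvalue())
print(parsed["request_id"], parsed["tokens"])  # abc123 768
```

Each record comes out as one JSON object per line, which is exactly the shape Filebeat and Logstash expect downstream.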
For the inference server itself, see the FastAPI server guide or the Flask API guide.
## ELK Stack Setup
Deploy the stack with Docker Compose on your GPU server or a separate monitoring server.
```yaml
# docker-compose.yml
version: "3.8"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  es_data:
```
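Once the stack is up, confirm each service responds before pointing Filebeat at it. A minimal health-check sketch; the URLs assume the default ports from the compose file on localhost, so adjust the host if the stack runs on a separate monitoring server:

```python
import urllib.error
import urllib.request

def check_service(url, timeout=3):
    """Return True if the endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Health endpoints exposed by the compose file above
services = {
    "elasticsearch": "http://localhost:9200/_cluster/health",
    "kibana": "http://localhost:5601/api/status",
}

for name, url in services.items():
    print(f"{name}: {'up' if check_service(url) else 'down'}")
```

Kibana in particular can take a minute or two to report healthy after Elasticsearch comes up, so rerun the check rather than assuming a failure on first boot.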
## Logstash Pipeline
Configure Logstash to parse JSON logs and enrich them with GPU server metadata.
```conf
# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the JSON line shipped by Filebeat into the [inference] namespace
  json {
    source => "message"
    target => "inference"
  }

  if [inference][latency_ms] {
    mutate {
      convert => {
        "[inference][latency_ms]" => "float"
        "[inference][tokens]" => "integer"
      }
    }
  }

  mutate {
    add_field => { "gpu_server" => "%{[host][name]}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "inference-logs-%{+YYYY.MM.dd}"
  }
}
```
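The output stanza writes each event to a daily index. For ad-hoc queries outside Kibana, the matching index name and a query body can be built by hand. A sketch, where the field names follow the `[inference]` target set in the pipeline above and the one-second threshold is an arbitrary example:

```python
import json
from datetime import datetime, timezone

# Today's index, matching Logstash's "inference-logs-%{+YYYY.MM.dd}" pattern
index = f"inference-logs-{datetime.now(timezone.utc):%Y.%m.%d}"

# Elasticsearch query DSL: the 20 slowest requests over one second
slow_requests = {
    "query": {"range": {"inference.latency_ms": {"gt": 1000}}},
    "sort": [{"inference.latency_ms": {"order": "desc"}}],
    "size": 20,
}

print(index)
print(json.dumps(slow_requests))
```

POST this body to `http://elasticsearch:9200/<index>/_search` with curl or an HTTP client to pull the slowest requests without opening Kibana.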
## Filebeat Configuration
```yaml
# filebeat.yml
filebeat.inputs:
  # Ship raw JSON lines as-is; Logstash's json filter does the parsing,
  # so Filebeat-side json decoding is not needed here.
  - type: log
    enabled: true
    paths:
      - /var/log/inference/*.json
  - type: log
    enabled: true
    paths:
      - /var/log/vllm/*.log
    fields:
      service: vllm

output.logstash:
  hosts: ["logstash-server:5044"]
```
## Kibana Dashboards
Build dashboards in Kibana for inference observability. Key visualisations include:
- Request volume over time — line chart of requests per minute.
- Latency distribution — histogram of `inference.latency_ms` to spot slow requests.
- Error rate — percentage of failed inference requests by model and error type.
- Token throughput — sum of `inference.tokens` per time period.
- Model usage breakdown — pie chart by `inference.model` field.
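The error-rate and throughput panels boil down to simple aggregations over the structured fields. A self-contained sketch of the same arithmetic over a few hand-written sample lines in the shape the formatter emits (the model names and the ERROR entry are illustrative):

```python
import json

# Sample log lines shaped like the JSONFormatter output; values are made up
lines = [
    '{"level": "INFO", "model": "llama-3-8b", "tokens": 512, "latency_ms": 840.5}',
    '{"level": "INFO", "model": "llama-3-8b", "tokens": 256, "latency_ms": 310.2}',
    '{"level": "ERROR", "model": "mistral-7b", "message": "CUDA out of memory"}',
]

records = [json.loads(line) for line in lines]
errors = sum(1 for r in records if r["level"] == "ERROR")
error_rate = 100 * errors / len(records)
token_total = sum(r.get("tokens", 0) for r in records)

print(f"error rate: {error_rate:.1f}%")    # error rate: 33.3%
print(f"token throughput: {token_total}")  # token throughput: 768
```

Kibana runs the equivalent aggregations server-side in Elasticsearch, so the dashboards stay fast even over millions of log events.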
For metrics-based monitoring alongside logs, see the Prometheus and Grafana guide. For webhook notifications on errors, check the webhook integration guide. The self-hosting guide covers infrastructure, and our tutorials section has more observability patterns. Set up the backend with the vLLM production guide. For container logging in Kubernetes, see the GPU pod guide.