You will build a pipeline that takes thousands of customer feedback entries (surveys, support tickets, app reviews), converts them to embeddings for clustering, uses an LLM to label each cluster with a human-readable theme, and produces a prioritised report of customer concerns. The end result: instead of reading 5,000 feedback entries manually, your product team gets “Top 10 customer themes this month” with representative quotes and trend data. Here is the pipeline on dedicated GPU infrastructure.
Pipeline Architecture
| Stage | Tool | Purpose |
|---|---|---|
| 1. Embedding | BGE-large-en-v1.5 | Convert feedback to vectors |
| 2. Clustering | HDBSCAN | Group similar feedback |
| 3. Theme labelling | LLaMA 3.1 8B | Name each cluster |
| 4. Insight extraction | LLaMA 3.1 8B | Prioritised recommendations |
Stage 1: Feedback Embedding
```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")

def embed_feedback(feedback_list: list[str]) -> np.ndarray:
    embeddings = model.encode(
        feedback_list, batch_size=64,
        show_progress_bar=True, normalize_embeddings=True
    )
    return embeddings

# Embed 5,000 feedback entries (~30 seconds on GPU)
feedback_texts = load_feedback_from_db()
embeddings = embed_feedback(feedback_texts)
```
GPU-accelerated embedding processes thousands of entries in seconds. Store embeddings in ChromaDB or Qdrant for persistent vector storage and retrieval.
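Because `normalize_embeddings=True` L2-normalises every vector, cosine similarity reduces to a plain dot product, which is what makes downstream similarity search and clustering cheap. A small numpy sketch, using toy vectors in place of real BGE embeddings:

```python
import numpy as np

# Toy 4-dimensional "embeddings" standing in for BGE vectors
vectors = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
# L2-normalise rows, as normalize_embeddings=True does
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

query = vectors[0]
# Cosine similarity of normalised vectors is just a dot product
scores = vectors @ query
nearest = int(np.argsort(-scores)[1])  # rank 0 is the query itself
print(nearest)  # → 1 (the most similar other entry)
```

The same property is what lets vector stores like ChromaDB or Qdrant answer nearest-neighbour queries over the full feedback set quickly.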
Stage 2: Semantic Clustering
```python
import hdbscan
from umap import UMAP  # from the umap-learn package; UMAP is not part of scikit-learn

# Reduce dimensions for clustering
reducer = UMAP(n_components=15, metric="cosine")
reduced = reducer.fit_transform(embeddings)

# Cluster similar feedback
clusterer = hdbscan.HDBSCAN(min_cluster_size=10, min_samples=5)
labels = clusterer.fit_predict(reduced)

# Group feedback by cluster
clusters = {}
for idx, label in enumerate(labels):
    if label == -1:  # Noise
        continue
    clusters.setdefault(label, []).append(feedback_texts[idx])

print(f"Found {len(clusters)} distinct themes from {len(feedback_texts)} entries")
```
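To pick representative quotes for a cluster, one option is to take the entries nearest the cluster centroid in embedding space. A sketch under that assumption; the `representatives` helper is ours, not part of the pipeline above:

```python
import numpy as np

def representatives(embeddings: np.ndarray, labels: np.ndarray,
                    label: int, k: int = 3) -> np.ndarray:
    """Return indices of the k members closest to the cluster centroid."""
    idx = np.where(labels == label)[0]
    members = embeddings[idx]
    centroid = members.mean(axis=0)
    dists = np.linalg.norm(members - centroid, axis=1)
    return idx[np.argsort(dists)[:k]]

# Toy data: one tight cluster and one distant pair, in 2-D
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.3, 0.3],
                [5.0, 5.0], [5.1, 5.0]])
lab = np.array([0, 0, 0, 1, 1])
print(representatives(emb, lab, 0, k=2))  # → [1 0]
```

The returned indices point back into `feedback_texts`, so the samples passed to the labelling stage are the most central, least noisy members of each theme.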
Stage 3: Theme Labelling
```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def label_cluster(feedback_samples: list[str]) -> dict:
    sample_text = "\n- ".join(feedback_samples[:15])
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "system",
            "content": """Analyse these customer feedback entries that share a common theme.
Return JSON: {"theme": "short descriptive name",
"description": "2-sentence description of the theme",
"sentiment": "positive|negative|mixed",
"severity": "critical|high|medium|low",
"representative_quote": "best example from the samples",
"recommendation": "suggested action"}"""
        }, {"role": "user", "content": f"Feedback entries:\n- {sample_text}"}],
        max_tokens=300, temperature=0.1
    )
    return json.loads(response.choices[0].message.content)
```
The vLLM server labels each cluster. The LLM understands context better than keyword extraction, producing themes like “Mobile checkout timeout on slow connections” rather than generic “checkout issues”.
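Local models do not always return bare JSON: replies sometimes arrive wrapped in a markdown fence or preceded by prose. A defensive extraction helper (our own addition, not part of the pipeline above) keeps one malformed reply from crashing a whole batch:

```python
import json
import re

def extract_json(text: str) -> dict:
    """Pull the first JSON object out of an LLM reply, tolerating
    markdown fences and surrounding prose."""
    # Strip a ```json ... ``` fence if present
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Fall back to the outermost braces
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    return json.loads(candidate[start:end + 1])

reply = 'Here you go:\n```json\n{"theme": "Checkout timeouts", "severity": "high"}\n```'
print(extract_json(reply)["theme"])  # → Checkout timeouts
```

Pairing this with a low temperature (as above) keeps the failure rate low; entries that still fail to parse can be logged and re-queued rather than silently dropped.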
Stage 4: Insight Report Generation
```python
def generate_report(themes: list) -> dict:
    # Each theme dict carries a "count" (its cluster size) recorded during labelling
    # Sort by severity first, then by cluster size (largest first)
    prioritised = sorted(themes, key=lambda t: (
        {"critical": 0, "high": 1, "medium": 2, "low": 3}[t["severity"]],
        -t["count"]
    ))
    report = {
        "total_feedback": len(feedback_texts),
        "themes_found": len(themes),
        "top_issues": prioritised[:10],
        "positive_themes": [t for t in themes if t["sentiment"] == "positive"],
        "trend_comparison": compare_with_previous_period(themes)
    }
    return report
```
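`compare_with_previous_period` is left undefined above; the one-argument call presumably loads last period's report internally. One possible sketch, written with the previous period's themes passed in explicitly (both the two-argument signature and the output fields are our assumptions):

```python
def compare_with_previous_period(themes: list, previous: list) -> list:
    """Match themes by name against last period's list (same format)
    and report the change in feedback volume per theme."""
    prev_counts = {t["theme"]: t["count"] for t in previous}
    trends = []
    for t in themes:
        before = prev_counts.get(t["theme"], 0)
        trends.append({
            "theme": t["theme"],
            "count": t["count"],
            "delta": t["count"] - before,          # negative = improving
            "new": t["theme"] not in prev_counts,  # theme absent last period
        })
    return trends

now = [{"theme": "Checkout timeouts", "count": 42}]
last = [{"theme": "Checkout timeouts", "count": 60}]
print(compare_with_previous_period(now, last)[0]["delta"])  # → -18
```

Matching on the LLM-generated theme name is fragile across runs; in practice you might match clusters on centroid similarity instead and keep the names for display only.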
Production Deployment
For production:

- Schedule weekly analysis runs.
- Track theme trends over time to measure whether fixes reduced complaint volume.
- Integrate with product management tools to auto-create tickets from critical themes.
- Add RAG search so team members can ask natural-language questions about the feedback.

Deploy on private infrastructure to keep customer feedback confidential. See model options for larger models, chatbot hosting for feedback Q&A, more tutorials, and analytics use cases.