You will build a pipeline that takes thousands of customer feedback entries (surveys, support tickets, app reviews), converts them to embeddings for clustering, uses an LLM to label each cluster with a human-readable theme, and produces a prioritised report of customer concerns. The end result: instead of reading 5,000 feedback entries manually, your product team gets “Top 10 customer themes this month” with representative quotes and trend data. Here is the pipeline on dedicated GPU infrastructure.
Pipeline Architecture
| Stage | Tool | Purpose |
|---|---|---|
| 1. Embedding | BGE-large-en-v1.5 | Convert feedback to vectors |
| 2. Clustering | HDBSCAN | Group similar feedback |
| 3. Theme labelling | LLaMA 3.1 8B | Name each cluster |
| 4. Insight extraction | LLaMA 3.1 8B | Prioritised recommendations |
Stage 1: Feedback Embedding
```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")

def embed_feedback(feedback_list: list[str]) -> np.ndarray:
    embeddings = model.encode(
        feedback_list, batch_size=64,
        show_progress_bar=True, normalize_embeddings=True
    )
    return embeddings

# Embed 5,000 feedback entries (~30 seconds on GPU)
feedback_texts = load_feedback_from_db()
embeddings = embed_feedback(feedback_texts)
```
GPU-accelerated embedding processes thousands of entries in seconds. Store embeddings in ChromaDB or Qdrant for persistent vector storage and retrieval.
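Because `normalize_embeddings=True` L2-normalises every vector, cosine similarity reduces to a plain dot product, which is what makes downstream similarity search and clustering cheap. A small numpy sketch, using toy vectors in place of real BGE embeddings:

```python
import numpy as np

# Toy 4-dimensional "embeddings" standing in for BGE vectors
vectors = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
# L2-normalise rows, as normalize_embeddings=True does
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

query = vectors[0]
# Cosine similarity of normalised vectors is just a dot product
scores = vectors @ query
nearest = int(np.argsort(-scores)[1])  # rank 0 is the query itself
print(nearest)  # → 1 (the most similar other entry)
```

The same property is what lets vector stores like ChromaDB or Qdrant answer nearest-neighbour queries over the full feedback set quickly.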
Stage 2: Semantic Clustering
```python
import hdbscan
from umap import UMAP  # from the umap-learn package; UMAP is not part of scikit-learn

# Reduce dimensions for clustering
reducer = UMAP(n_components=15, metric="cosine")
reduced = reducer.fit_transform(embeddings)

# Cluster similar feedback
clusterer = hdbscan.HDBSCAN(min_cluster_size=10, min_samples=5)
labels = clusterer.fit_predict(reduced)

# Group feedback by cluster
clusters = {}
for idx, label in enumerate(labels):
    if label == -1:  # Noise
        continue
    clusters.setdefault(label, []).append(feedback_texts[idx])

print(f"Found {len(clusters)} distinct themes from {len(feedback_texts)} entries")
```
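To pick representative quotes for a cluster, one option is to take the entries nearest the cluster centroid in embedding space. A sketch under that assumption; the `representatives` helper is ours, not part of the pipeline above:

```python
import numpy as np

def representatives(embeddings: np.ndarray, labels: np.ndarray,
                    label: int, k: int = 3) -> np.ndarray:
    """Return indices of the k members closest to the cluster centroid."""
    idx = np.where(labels == label)[0]
    members = embeddings[idx]
    centroid = members.mean(axis=0)
    dists = np.linalg.norm(members - centroid, axis=1)
    return idx[np.argsort(dists)[:k]]

# Toy data: one tight cluster and one distant pair, in 2-D
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.3, 0.3],
                [5.0, 5.0], [5.1, 5.0]])
lab = np.array([0, 0, 0, 1, 1])
print(representatives(emb, lab, 0, k=2))  # → [1 0]
```

The returned indices point back into `feedback_texts`, so the samples passed to the labelling stage are the most central, least noisy members of each theme.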
Stage 3: Theme Labelling
```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def label_cluster(feedback_samples: list[str]) -> dict:
    sample_text = "\n- ".join(feedback_samples[:15])
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "system",
            "content": """Analyse these customer feedback entries that share a common theme.
Return JSON: {"theme": "short descriptive name",
"description": "2-sentence description of the theme",
"sentiment": "positive|negative|mixed",
"severity": "critical|high|medium|low",
"representative_quote": "best example from the samples",
"recommendation": "suggested action"}"""
        }, {"role": "user", "content": f"Feedback entries:\n- {sample_text}"}],
        max_tokens=300, temperature=0.1
    )
    return json.loads(response.choices[0].message.content)
```
The vLLM server labels each cluster. The LLM understands context better than keyword extraction, producing themes like “Mobile checkout timeout on slow connections” rather than generic “checkout issues”.
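Local models do not always return bare JSON: replies sometimes arrive wrapped in a markdown fence or preceded by prose. A defensive extraction helper (our own addition, not part of the pipeline above) keeps one malformed reply from crashing a whole batch:

```python
import json
import re

def extract_json(text: str) -> dict:
    """Pull the first JSON object out of an LLM reply, tolerating
    markdown fences and surrounding prose."""
    # Strip a ```json ... ``` fence if present
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Fall back to the outermost braces
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    return json.loads(candidate[start:end + 1])

reply = 'Here you go:\n```json\n{"theme": "Checkout timeouts", "severity": "high"}\n```'
print(extract_json(reply)["theme"])  # → Checkout timeouts
```

Pairing this with a low temperature (as above) keeps the failure rate low; entries that still fail to parse can be logged and re-queued rather than silently dropped.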
Stage 4: Insight Report Generation
```python
def generate_report(themes: list) -> dict:
    # Each theme dict carries a "count" (its cluster size) recorded during labelling
    # Sort by severity first, then by cluster size (largest first)
    prioritised = sorted(themes, key=lambda t: (
        {"critical": 0, "high": 1, "medium": 2, "low": 3}[t["severity"]],
        -t["count"]
    ))
    report = {
        "total_feedback": len(feedback_texts),
        "themes_found": len(themes),
        "top_issues": prioritised[:10],
        "positive_themes": [t for t in themes if t["sentiment"] == "positive"],
        "trend_comparison": compare_with_previous_period(themes)
    }
    return report
```
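`compare_with_previous_period` is left undefined above; the one-argument call presumably loads last period's report internally. One possible sketch, written with the previous period's themes passed in explicitly (both the two-argument signature and the output fields are our assumptions):

```python
def compare_with_previous_period(themes: list, previous: list) -> list:
    """Match themes by name against last period's list (same format)
    and report the change in feedback volume per theme."""
    prev_counts = {t["theme"]: t["count"] for t in previous}
    trends = []
    for t in themes:
        before = prev_counts.get(t["theme"], 0)
        trends.append({
            "theme": t["theme"],
            "count": t["count"],
            "delta": t["count"] - before,          # negative = improving
            "new": t["theme"] not in prev_counts,  # theme absent last period
        })
    return trends

now = [{"theme": "Checkout timeouts", "count": 42}]
last = [{"theme": "Checkout timeouts", "count": 60}]
print(compare_with_previous_period(now, last)[0]["delta"])  # → -18
```

Matching on the LLM-generated theme name is fragile across runs; in practice you might match clusters on centroid similarity instead and keep the names for display only.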
Production Deployment
For production:

- Schedule weekly analysis runs.
- Track theme trends over time to measure whether fixes reduced complaint volume.
- Integrate with product management tools to auto-create tickets from critical themes.
- Add RAG search so team members can ask natural-language questions about the feedback.

Deploy on private infrastructure to keep customer feedback confidential. See model options for larger models, chatbot hosting for feedback Q&A, more tutorials, and analytics use cases.