This customer support AI pairs knowledge-base retrieval with an LLM. Running it on the RTX 5060 Ti 16GB at our hosting keeps tickets, the KB, and customer PII inside your perimeter.
Stack
- LLM: Llama 3.1 8B FP8 or Qwen 2.5 14B AWQ
- Embedding: BGE-base via TEI
- Vector DB: Qdrant over KB articles
- Classifier: small DeBERTa for intent + sentiment routing
- Backend: any (Zendesk plugin, custom portal, chat widget)
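One way to stand the stack up is a container per service; the sketch below is illustrative only — image tags, ports, and flags are assumptions and vary by version:

```shell
# Sketch only: image tags and flag spellings vary by release; check each
# project's docs before deploying.

# LLM served over an OpenAI-compatible API (vLLM)
docker run --gpus all -p 8000:8000 vllm/vllm-openai \
  --model meta-llama/Llama-3.1-8B-Instruct --quantization fp8

# Embeddings via Text Embeddings Inference (TEI)
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-base-en-v1.5

# Vector DB for the KB passages
docker run -p 6333:6333 qdrant/qdrant
```

The DeBERTa classifier is small enough to run alongside these in a plain Python worker.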
Workflow
- Customer submits ticket / message
- Intent classifier routes (billing, tech, shipping)
- Retrieve top-K KB passages
- LLM drafts reply with cited passages
- If confidence low or sentiment negative, escalate to human
- Agent reviews and sends (or bot auto-sends for easy cases)
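The steps above can be sketched as a single handler. Everything here is a stand-in: `classify_intent`, `retrieve_passages`, and `draft_reply` are hypothetical stubs for the real classifier, Qdrant query, and LLM call, and the thresholds are illustrative:

```python
from dataclasses import dataclass

def classify_intent(text: str) -> str:
    # Stub for the DeBERTa intent classifier.
    if "refund" in text or "invoice" in text:
        return "billing"
    if "broken" in text or "error" in text:
        return "tech"
    return "shipping"

def retrieve_passages(query: str, queue: str, k: int = 4) -> list[str]:
    # Stub for a top-K Qdrant search over the KB.
    return [f"[{queue}] KB passage {i} for: {query}" for i in range(k)]

def draft_reply(query: str, passages: list[str]) -> tuple[str, float]:
    # Stub for the LLM call; returns (reply, confidence).
    return "Thanks for reaching out ...", 0.9

@dataclass
class Decision:
    queue: str
    reply: str
    escalate: bool

def handle_ticket(text: str, sentiment: float,
                  min_conf: float = 0.7, min_sent: float = -0.3) -> Decision:
    queue = classify_intent(text)
    passages = retrieve_passages(text, queue)
    reply, conf = draft_reply(text, passages)
    # Escalate to a human when the draft is low-confidence
    # or the customer sounds upset.
    escalate = conf < min_conf or sentiment < min_sent
    return Decision(queue, reply, escalate)
```

Easy cases (high confidence, neutral sentiment) can auto-send; everything else lands in an agent review queue.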
Quality Tuning
- Fine-tune via LoRA on historical human-agent replies (~10k samples) – roughly 35 minutes with Unsloth
- System prompt enforces brand voice, formatting, required disclosures
- Prefix caching on that system prompt means every reply starts generating in ~50 ms
- Rerank step surfaces more relevant KB passages, reduces hallucination
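The rerank step amounts to rescoring the vector-search candidates with a stronger (query, passage) relevance model and keeping only the best few. A minimal sketch, with the scoring function left pluggable (a cross-encoder in practice; the word-overlap scorer below is just a toy):

```python
from typing import Callable

def rerank(query: str, passages: list[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    # Rescore each candidate with the relevance function and keep the
    # top_n highest-scoring passages for the LLM prompt.
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:top_n]

def overlap(query: str, passage: str) -> float:
    # Toy scorer: shared lowercase words. Swap in a cross-encoder's
    # predict() for real use.
    return len(set(query.lower().split()) & set(passage.lower().split()))
```

Feeding the LLM three well-ranked passages instead of ten loosely ranked ones is what cuts the hallucination rate.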
Capacity
- Active live-chat sessions: ~16 with Llama 3.1 8B FP8
- Ticket auto-reply (non-interactive): ~5,000-8,000 tickets/day
- Ticket triage + routing only: 50,000+/day
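These ceilings are throughput-bound; a back-of-envelope form of the estimate, with all inputs (tokens/sec, tokens per reply, duty cycle) as assumptions you should measure on your own workload:

```python
def tickets_per_day(tokens_per_sec: float, tokens_per_ticket: int,
                    duty_cycle: float = 1.0) -> int:
    # Upper bound: generated tokens per day divided by tokens per reply.
    # Real counts are lower once prompt processing, retrieval latency,
    # and peak-hour bunching are included.
    return int(tokens_per_sec * 86_400 * duty_cycle / tokens_per_ticket)
```

Triage-only traffic is far cheaper per ticket (a short classifier pass, no long generation), which is why its ceiling is an order of magnitude higher.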
One card typically covers a medium-sized support ops team.
Customer Support AI on Blackwell 16GB
RAG + LLM + triage, UK data jurisdiction. UK dedicated hosting.
Order the RTX 5060 Ti 16GB. See also: chatbot backend, SaaS RAG, ecommerce AI, classification.