Turn a Confluence, Notion, or SharePoint wiki from “search-and-scroll” into a conversational knowledge base by wrapping it in a RAG endpoint hosted on the RTX 5060 Ti 16GB at our UK dedicated GPU hosting. A single Blackwell card runs the embedder, reranker, and answer LLM simultaneously, keeping every page and every query inside your network.
Contents
- Architecture on one card
- Ingestion from Confluence and Notion
- Throughput and latency
- Quality tactics
- Integrations
Architecture on one card
| Component | Tool | Role |
|---|---|---|
| Embedder | BGE-M3 | Query and document vectors |
| Vector store | Qdrant | HNSW index, CPU-bound |
| Reranker | BGE reranker v2 (cross-encoder) | Precision lift on top-20 candidates |
| Answer LLM | Llama 3.1 8B FP8 or Qwen 2.5 14B AWQ | Synthesis with citations |
| Orchestrator | FastAPI + LangChain or LlamaIndex | Prompt assembly, auth, logging |
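The components in the table compose into one request path: embed the query, pull candidates from Qdrant, rerank the top 20, then synthesise an answer with citations. A minimal sketch of that flow, with stub callables standing in for the real models (all names here are illustrative, not a fixed API):

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page_title: str
    score: float = 0.0


def answer_query(query, embed, search, rerank, generate, top_k=20, top_n=5):
    """One Q&A round trip: embed -> vector search -> rerank -> synthesise."""
    qvec = embed(query)                      # BGE-M3 query vector
    candidates = search(qvec, limit=top_k)   # Qdrant HNSW lookup
    for c in candidates:
        c.score = rerank(query, c.text)      # cross-encoder precision lift
    best = sorted(candidates, key=lambda c: c.score, reverse=True)[:top_n]
    context = "\n\n".join(f"[{c.page_title}] {c.text}" for c in best)
    return generate(query, context)          # answer LLM, cites page titles
```

In production the stubs become the BGE-M3 embedder, a Qdrant client search, the BGE reranker, and a call to the local Llama endpoint; the orchestration shape stays the same.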
Ingestion
- Confluence: `GET /wiki/rest/api/content?type=page&expand=body.storage,version,ancestors`, paginated; triggered daily via cron and on webhook for live edits
- Notion: `databases.query` + `blocks.children.list`; follow child blocks recursively, flatten to markdown
- SharePoint: Microsoft Graph `sites/drives/items`; text extraction via Tika for Office files
- GitHub wiki / READMEs: clone, walk `.md` files, respect `.docignore`
- Chunking: 512-token semantic chunks with 64-token overlap; preserve heading hierarchy in metadata
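The chunking step above is a sliding window over the token stream. A sketch, using whitespace tokens as a stand-in for the real tokenizer and carrying the heading path as metadata (function and field names are illustrative):

```python
def chunk_text(text, heading_path, size=512, overlap=64):
    """Split text into overlapping windows of `size` tokens.

    Whitespace split approximates the model tokenizer; swap in the
    real tokenizer (e.g. BGE-M3's) for accurate token counts.
    """
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        window = tokens[start:start + size]
        chunks.append({
            "text": " ".join(window),
            "heading_path": heading_path,   # e.g. "Space > Page > Section"
            "start_token": start,
        })
    return chunks
```

Keeping `heading_path` on every chunk lets the answer LLM cite the page and section, and lets the reranker see where a chunk came from.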
Throughput
| Operation | Per-second on 5060 Ti | Notes |
|---|---|---|
| BGE-M3 embedding | ~5,000 chunks | Batch 64, FP16 |
| BGE-base embedding | ~10,000 texts | For lighter workloads |
| Reranker (batch 20) | ~50 queries | 20 ms per query |
| Llama 3.1 8B answer (300 tokens) | ~20 answers/min single-stream, 60+ concurrent | 112 t/s solo, 720 t/s aggregate |
| Full Q&A round-trip | ~2-3 s end to end | Retrieve + rerank + answer |
A 100,000-page Confluence space embeds in roughly 20 minutes on one card. Incremental updates on edits take seconds.
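The 20-minute figure is straightforward arithmetic on the embedding rate, assuming roughly 60 chunks per page (an assumption that makes the numbers line up; real pages vary widely):

```python
pages = 100_000
chunks_per_page = 60   # assumption; depends on page length and chunk size
rate = 5_000           # BGE-M3 chunks/s from the throughput table
minutes = pages * chunks_per_page / rate / 60
print(minutes)  # 20.0
```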
Quality tactics
- Use contextual retrieval: prepend a per-chunk summary before embedding
- Always rerank; precision@5 typically jumps 15-25 percent over vector search alone
- Include citations with page title and anchor in every answer
- Refuse to answer when the top reranked score is below a threshold; fall back to human support
- Log queries and “no answer found” events to find KB gaps
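The score-threshold refusal in the list above can be a simple guard in front of the generator; the threshold value here is a placeholder to calibrate against your reranker's score distribution on logged queries:

```python
FALLBACK = ("I couldn't find this in the knowledge base. "
            "I've flagged your question for the support team.")


def answer_or_refuse(ranked_chunks, generate, threshold=0.3):
    """Answer only when the best reranked score clears the threshold.

    `ranked_chunks` is a list of (score, text) sorted descending;
    the default threshold is illustrative, not a recommendation.
    """
    if not ranked_chunks or ranked_chunks[0][0] < threshold:
        return FALLBACK          # hand off to human support
    context = "\n\n".join(text for _, text in ranked_chunks)
    return generate(context)
```

Logging every path through the `FALLBACK` branch doubles as the "no answer found" gap report from the last bullet.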
Integrations
Embed the Q&A widget into your help centre, intranet homepage, or Slack bot. Expose an OpenAI-compatible /chat/completions endpoint so existing tools (Zendesk macros, Notion AI prompts, Slack Bolt apps) can switch providers without code changes. Dozens of simultaneous KB queries fit comfortably on one card.
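Because the endpoint speaks the OpenAI chat format, an integration only needs a new base URL. A sketch of the request any OpenAI-style client would send (the URL and model name are placeholders for your deployment):

```python
import json


def build_chat_request(question, base_url="https://kb.example.internal/v1",
                       model="llama-3.1-8b"):
    """Assemble an OpenAI-compatible /chat/completions request.

    `base_url` and `model` are deployment-specific placeholders; any
    OpenAI SDK pointed at `base_url` produces the same payload shape.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer from the company wiki; cite page titles."},
            {"role": "user", "content": question},
        ],
    }
    return f"{base_url}/chat/completions", json.dumps(payload)
```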
Conversational knowledge base
Confluence and Notion Q&A on Blackwell 16GB. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: document Q&A, RAG stack install, embedding throughput, SaaS RAG, customer support.