Standard RAG retrieves passages related to a query. Graph RAG builds a knowledge graph of entities and relationships from the corpus first, then traverses it to answer questions. Multi-hop queries (“what connects X to Y through documents?”) benefit most. On dedicated GPU hosting the indexing cost is high but tractable.
When It Wins
Graph RAG beats vector RAG on:
- Multi-hop questions that chain facts across documents
- “Summarise what document X says about entity Y” style queries
- Discovery questions (“what are all the connections between A and B”)
It underperforms on simple factoid questions where a single passage has the answer. For those, vector RAG is faster and cheaper.
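A "what connects X to Y" question maps directly onto a path query over the entity graph. Here is a minimal sketch of that idea in plain Python (a BFS over an adjacency map, before reaching for Neo4j or NetworkX); the entities and relations are invented for illustration, as is the `connect` helper.

```python
from collections import deque

# Toy entity graph: edge labels are the relationships an LLM
# extraction pass produced. All names here are illustrative.
edges = {
    ("Acme Corp", "Jane Doe"): "employs",
    ("Jane Doe", "Project Falcon"): "leads",
    ("Project Falcon", "Globex"): "partnered_with",
}

# Undirected adjacency map built from the labelled edges.
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

def connect(start, goal):
    """BFS for the shortest entity path from start to goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection in the graph

path = connect("Acme Corp", "Globex")
hops = [edges.get((a, b)) or edges.get((b, a)) for a, b in zip(path, path[1:])]
print(path)  # ['Acme Corp', 'Jane Doe', 'Project Falcon', 'Globex']
print(hops)  # ['employs', 'leads', 'partnered_with']
```

No vector retriever answers this directly unless one passage happens to mention all four entities; the graph answers it in three hops.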
Pipeline
- Chunk the corpus
- LLM pass per chunk extracts entities and relationships
- Merge entities across chunks (same entity in different documents)
- Build a graph (Neo4j, or an in-memory NetworkX graph)
- Community detection: cluster related nodes
- LLM generates a summary per community
- At query time, route between vector search (local) and graph traversal + community summaries (global)
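The indexing steps above can be sketched end-to-end. This is a hedged toy, not a reference implementation: `extract_triples` stands in for the per-chunk LLM call (its hard-coded output, and every entity name, is invented), entity merging is naive lowercasing, and NetworkX's greedy modularity algorithm plays the community-detection role.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def extract_triples(chunk: str) -> list[tuple[str, str, str]]:
    """Stand-in for the per-chunk LLM extraction call; a real run
    prompts the model to emit (subject, relation, object) triples."""
    if "Acme" in chunk:
        return [("Acme Corp", "employs", "Jane Doe"),
                ("Jane Doe", "leads", "Project Falcon")]
    if "Globex" in chunk:
        return [("Globex", "acquired", "Initech")]
    return []

chunks = [
    "Acme Corp hired Jane Doe to lead Project Falcon.",
    "Globex quietly acquired Initech last year.",
]

g = nx.Graph()
for chunk in chunks:
    for subj, rel, obj in extract_triples(chunk):
        # Entity merge step: naive lowercasing here; production systems
        # also resolve aliases with embeddings or a dedup LLM pass.
        g.add_edge(subj.lower(), obj.lower(), relation=rel)

# Community detection clusters related entities; each community then
# gets one LLM-written summary, used to answer "global" questions.
communities = list(greedy_modularity_communities(g))
for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")
```

On this toy corpus the two documents yield two communities, one per connected cluster of entities; each would be handed to the summariser prompt.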
Cost
Using Llama 3 8B on an RTX 5090:
- Entity/relationship extraction: 3-5 LLM calls per 100k-token document
- Community summarisation: 1 LLM call per 10-20 entities
- Total: ~5-10x the LLM cost of a basic RAG indexing run
For a 10k-document corpus, budget several hours of GPU time for initial indexing. Incremental indexing for new documents is cheap.
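The figures above can be folded into a rough call-count budget. The per-document and per-summary constants below are the midpoints of the ranges given; `entities_per_doc` is an outright assumption (it varies heavily with the corpus), so treat the result as an order-of-magnitude estimate only.

```python
def indexing_calls(num_docs: int,
                   calls_per_doc: float = 4.0,        # midpoint of "3-5 per 100k-token doc"
                   entities_per_doc: float = 50.0,    # assumption; corpus-dependent
                   entities_per_summary: float = 15.0  # midpoint of "10-20"
                   ) -> float:
    """Rough LLM-call budget for a Graph RAG indexing run."""
    extraction = num_docs * calls_per_doc
    summarisation = num_docs * entities_per_doc / entities_per_summary
    return extraction + summarisation

print(round(indexing_calls(10_000)))  # 73333
```

Roughly 73k LLM calls for a 10k-document corpus, versus near zero generation calls for a basic vector-RAG index (which only needs the embedder), which is where the 5-10x cost multiple comes from.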
Tools
- Microsoft’s GraphRAG reference implementation
- LlamaIndex’s PropertyGraphIndex
- LangChain’s experimental Graph RAG module
GraphRAG is the most complete; LlamaIndex is easier to customise. Pick based on team preference.
Graph RAG Hosting
UK dedicated GPU servers sized for graph indexing with LLM and embedder working together.
Browse GPU Servers

See contextual retrieval (a cheaper alternative) and multi-query RAG.