Single-vector dense retrieval is the default for a reason: it is fast and good enough for most workloads. When it is not, late-interaction methods like ColBERT and learned sparse methods like SPLADE can lift recall measurably. On dedicated GPU hosting, both are viable production paths.
Dense Baseline
One vector per document, with fast approximate search via an HNSW or IVF index. Strong on semantic queries, weaker on exact keyword matching and compositional queries. See BGE-M3.
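The core scoring behind dense retrieval is just nearest-neighbour search over document vectors. A minimal brute-force sketch (with a hypothetical `dense_top_k` helper; production systems replace the exhaustive scan with an ANN index such as HNSW or IVF):

```python
import numpy as np

def dense_top_k(query_vec, doc_matrix, k=5):
    """Brute-force cosine-similarity search over one vector per document.
    Illustrative only: at scale this scan is replaced by an ANN index."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                   # one cosine score per document
    top = np.argsort(-scores)[:k]   # highest similarity first
    return top, scores[top]
```

The same interface (query vector in, top-k doc ids out) is what an HNSW or IVF index exposes; only the search inside changes.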
ColBERT
N vectors per document (one per token). Late-interaction scoring via MaxSim. Better than dense on hard retrieval by roughly 5-15 points recall@10; storage cost is 10-15x that of dense. See ColBERT v2.
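MaxSim itself is simple: for each query token embedding, take its best match over the document's token embeddings, then sum. A minimal NumPy sketch, assuming embeddings are already L2-normalized so the dot product equals cosine similarity:

```python
import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    """ColBERT late-interaction score.
    query_tokens: (q_len, dim), doc_tokens: (d_len, dim), both L2-normalized.
    Each query token keeps its maximum similarity over all doc tokens;
    the document score is the sum of those maxima."""
    sim = query_tokens @ doc_tokens.T   # (q_len, d_len) similarity matrix
    return float(sim.max(axis=1).sum())
```

Because scoring needs all token vectors of every candidate, ColBERT systems typically retrieve candidates with an ANN index over token vectors first, then apply MaxSim to that shortlist.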
SPLADE
SPLADE produces sparse vectors (one weighted score per vocabulary token). Inverted-index-friendly, captures lexical matching well. Usually beats BM25 by a decent margin, particularly on exact-keyword queries. Storage and index format resemble classical search.
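Because SPLADE outputs one weight per vocabulary token, scoring reduces to a sparse dot product served from a classical inverted index. A toy sketch (the `{term_id: weight}` dict representation and both helper names are illustrative, not SPLADE's actual API):

```python
from collections import defaultdict

def build_inverted_index(doc_sparse_vecs):
    """doc_sparse_vecs: list of {term_id: weight} dicts, one per document.
    Returns term_id -> postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_sparse_vecs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def sparse_search(query_vec, index, k=5):
    """Score = sparse dot product of query and document vectors,
    computed by walking only the postings of the query's terms."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda x: -x[1])[:k]
```

This is why SPLADE slots into existing lexical infrastructure: the index format is the same as BM25's, only the weights come from a model instead of term statistics.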
Hybrid
Dense + BM25 (or dense + SPLADE) combined via reciprocal rank fusion (RRF) is the most common production pattern. One query gets embedded (dense) and lexical-indexed (SPLADE/BM25), both retrievers return their top-k, and the results are fused. This adds minor latency and typically lifts recall 5-15% over either retriever alone.
| Pattern | Recall lift vs dense only | Latency cost |
|---|---|---|
| Dense + BM25 RRF | ~5-10% | Minimal |
| Dense + SPLADE RRF | ~8-15% | Small |
| Dense + rerank (top 100 to top 5) | ~15-25% | ~100 ms |
| ColBERT end-to-end | ~10-20% | ~50-100 ms |
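The RRF step used in the hybrid patterns above is a few lines: each retriever contributes 1/(k + rank) per document, and documents ranked well by both lists float to the top. A minimal sketch, using the common default constant k=60:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists of doc ids.
    rankings: list of lists, each ordered best-first.
    Each appearance adds 1/(k + rank); fused order is by total score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not scores, which is why it fuses a cosine-scored dense list and a BM25-scored lexical list without any score normalization.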
Production-Grade Retrieval Hosting
Hybrid retrieval stacks (dense + rerank, ColBERT) on UK dedicated GPUs.
Browse GPU Servers. See hybrid BM25+embeddings and BGE reranker.