Vector Search Tuning: HNSW Parameters

HNSW is the default or most widely used index in many vector stores. Three parameters matter for production: M, ef_construction, and ef_search. This guide covers the trade-offs.

Tutorials May 6, 2026 2 min read gigagpu

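HNSW (Hierarchical Navigable Small World) is the graph-based vector index that Qdrant and Weaviate use by default and that pgvector and FAISS offer as an index type. Three parameters determine its behaviour: M (graph connectivity), ef_construction (build-time accuracy), and ef_search (query-time accuracy). Names vary slightly by engine; Qdrant, for example, calls the last two ef_construct and hnsw_ef. Defaults work for many cases; tuning matters for specific workloads.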

TL;DR

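Typical defaults (M=16, ef_construction=128, ef_search=64) work well for most general RAG. Exact defaults differ by engine (pgvector, for instance, ships with m=16, ef_construction=64, and hnsw.ef_search=40), so check what your engine actually uses. For high-recall production, raise M to 32-64 and ef_construction to 256-512. Tune ef_search at query time to trade cost against quality. For all three, higher values give better recall: higher M costs memory and build time, higher ef_construction costs build time, and higher ef_search costs query latency. Most teams raise ef_search to 100-200 for production and leave everything else at the defaults.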

Parameters

M (typically 8-64, default 16): maximum number of connections per node in the HNSW graph. Higher = better recall, more memory, slower build.
ef_construction (typically 64-512, default 128): size of the candidate list explored when each vector is inserted at build time. Higher = better graph quality, slower build.
ef_search (typically 32-512, default 64): size of the candidate list explored at query time; it should be at least as large as the number of results requested. Higher = better recall, slower query.
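As one concrete mapping, pgvector fixes M and ef_construction when the index is created and sets ef_search per session. The sketch below only builds the SQL strings and does not connect to a database. The helper names, the table name `items`, and the column name `embedding` are assumptions for illustration, and the cosine operator class is just one possible choice; other engines expose the same knobs under different names.

```python
def hnsw_index_sql(table: str, column: str, m: int, ef_construction: int) -> str:
    """Build a pgvector CREATE INDEX statement with explicit HNSW build parameters.

    Hypothetical helper: identifiers are interpolated directly, so only
    simple, trusted names are accepted.
    """
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be simple identifiers")
    if m <= 0 or ef_construction <= 0:
        raise ValueError("m and ef_construction must be positive")
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def ef_search_sql(ef_search: int) -> str:
    """Build the session-level setting that controls query-time search breadth."""
    if ef_search <= 0:
        raise ValueError("ef_search must be positive")
    return f"SET hnsw.ef_search = {ef_search};"


if __name__ == "__main__":
    # Build-time parameters: changing these means recreating the index.
    print(hnsw_index_sql("items", "embedding", m=32, ef_construction=256))
    # Query-time parameter: can be changed without touching the index.
    print(ef_search_sql(128))
```

The split between the two helpers mirrors the split in the parameters: the first statement is paid for once per build, the second can be changed on every session.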

Trade-offs

Setting	Build time	Memory	Query latency	Recall@10
Defaults (M=16, ef_search=64)	Fast	Low	Fast	~95%
Tuned for production (M=32, ef_search=200)	Medium	Medium	Medium	~99%
Maximum quality (M=64, ef_search=500)	Slow	Highest	Slowest	~99.9%
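To put a rough number on the Memory column, the sketch below estimates index size as raw vector storage plus the bottom-layer neighbour lists. It assumes float32 components, 4-byte neighbour IDs, and up to 2*M links per node on the bottom layer (the convention in the original HNSW design and in hnswlib). It ignores the upper layers and any engine-specific overhead, so treat the result as a lower bound rather than a measurement. The function name and the example sizes are illustrative.

```python
def estimate_hnsw_bytes(
    num_vectors: int,
    dim: int,
    m: int,
    bytes_per_component: int = 4,  # assumption: float32 vectors
    bytes_per_link: int = 4,       # assumption: 32-bit neighbour IDs
) -> int:
    """Rough HNSW index size: vectors plus bottom-layer links (2 * M per node)."""
    vector_bytes = dim * bytes_per_component
    link_bytes = 2 * m * bytes_per_link
    return num_vectors * (vector_bytes + link_bytes)


if __name__ == "__main__":
    # Hypothetical corpus: one million 768-dimensional embeddings.
    for m in (16, 32, 64):
        gib = estimate_hnsw_bytes(1_000_000, 768, m) / 2**30
        print(f"M={m}: ~{gib:.2f} GiB")
```

Under these assumptions the link overhead grows linearly with M but stays small next to the vectors themselves at high dimensions. M matters more for memory when the vectors are short or quantised.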

Recipe

For most production RAG:

Start with the defaults and measure recall on an eval set.
If recall is below 95% on representative queries, raise ef_search to 128-200 first (no re-index needed).
If recall is still low, increase M to 32 and ef_construction to 256 (requires a rebuild).
Going beyond this is rarely worth it for general RAG; specialised IR workloads may push higher.
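The measurement step in this recipe can be sketched as follows: compare the IDs returned by the HNSW index with the IDs from an exact (brute-force) search over the same queries, then pick the smallest ef_search that meets the target. The function names are illustrative and the result lists are assumed to be ordered nearest first. Producing the approximate and exact results is left to whichever engine you use.

```python
from typing import Dict, Optional, Sequence


def recall_at_k(approx: Sequence[int], exact: Sequence[int], k: int = 10) -> float:
    """Fraction of the true top-k neighbours that the ANN result also returned."""
    truth = set(exact[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx[:k])) / len(truth)


def mean_recall(
    approx_results: Sequence[Sequence[int]],
    exact_results: Sequence[Sequence[int]],
    k: int = 10,
) -> float:
    """Average recall@k over an eval set of queries."""
    if not approx_results or len(approx_results) != len(exact_results):
        raise ValueError("need one exact result list per approximate result list")
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


def smallest_ef_search(recall_by_ef: Dict[int, float], target: float) -> Optional[int]:
    """Lowest ef_search whose measured recall meets the target, or None if none does."""
    passing = [ef for ef, recall in recall_by_ef.items() if recall >= target]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Made-up measurements purely to show the call shape, not benchmark data.
    measured = {64: 0.93, 128: 0.96, 200: 0.98}
    print(smallest_ef_search(measured, target=0.95))
```

If `smallest_ef_search` returns None across the ef_search range you are willing to pay for, that is the signal to move to the rebuild step with higher M and ef_construction.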

Verdict

HNSW defaults work for most RAG workloads. Tune ef_search first: it is cheap and needs no rebuild. Rebuild with higher M and ef_construction only when a measured recall shortfall justifies it. Most production deployments run the defaults with a slightly raised ef_search. Don't over-tune; the gains diminish above 99% recall.

Bottom line

Tune ef_search first; rebuild rarely. See vector store comparison.

gigagpu

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
