Nomic Embed Text v1.5 from Nomic AI is a 137M-parameter English embedder released with full training transparency – weights, training data, and training code all public. On our dedicated GPU hosting it runs on the cheapest available card with very high throughput.
Why Nomic
Three strengths:
- Full training transparency – important for regulated industries and research
- Matryoshka embeddings – one model produces 64, 128, 256, 512, or full 768-dim vectors
- Permissive licence (Apache 2.0)
Trade-off: English-only, and slightly behind BGE-M3 on the multilingual MTEB tasks.
Deployment
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:1.5 \
  --model-id nomic-ai/nomic-embed-text-v1.5 \
  --max-client-batch-size 512
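To smoke-test the container, TEI also exposes a native /embed route alongside the OpenAI-compatible one. A minimal check, assuming the host from the command above is reachable as server:

import requests

# TEI's native route; the OpenAI-compatible route used below is /v1/embeddings
r = requests.post(
    "http://server:8080/embed",
    json={"inputs": "search_document: hello world"},
    timeout=10,
)
r.raise_for_status()
print(len(r.json()[0]))  # expect 768, the model's full dimensionality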
Client code (OpenAI-compatible embeddings API):
from openai import OpenAI

# The OpenAI client insists on an api_key, but TEI ignores its value
client = OpenAI(base_url="http://server:8080/v1", api_key="n/a")
resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=["search_document: your text here"],  # note the instruction prefix
)
print(len(resp.data[0].embedding))  # 768
Nomic expects an instruction prefix: search_document: for documents, search_query: for queries. This is a Nomic quirk; BGE does not require it.
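A minimal retrieval sketch using both prefixes with the client above; the texts are placeholders, and the dot product stands in for cosine similarity on the assumption that TEI returns unit-length vectors (its default):

import numpy as np

docs = ["The card has 6 GB of VRAM.", "Embeddings are cached on disk."]
doc_resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=[f"search_document: {d}" for d in docs],  # document prefix at indexing time
)
doc_vecs = np.array([d.embedding for d in doc_resp.data])

query_resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=["search_query: how much memory does the card have?"],  # query prefix at search time
)
q = np.array(query_resp.data[0].embedding)

scores = doc_vecs @ q  # dot product equals cosine similarity for unit-length vectors
print(docs[int(np.argmax(scores))])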
Matryoshka
Nomic supports truncating the full 768-dim vector to shorter lengths without much quality loss. For storage-sensitive deployments:
- 768-dim: full quality baseline
- 512-dim: <1% MTEB drop
- 256-dim: ~2% drop, one-third the index size
- 128-dim: ~5% drop, one-sixth the index size
For a 10M-document index, dropping from 768 to 256 dimensions cuts float32 vector storage from roughly 31 GB to roughly 10 GB, a saving of about 20 GB.
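Truncation happens client-side: slice off the leading dimensions and re-normalise, since the sliced vector is no longer unit-length. A sketch building on the resp object from the client code above:

import numpy as np

def truncate(vec: np.ndarray, dim: int = 256) -> np.ndarray:
    # Matryoshka truncation: keep the leading dims, then rescale to unit length
    v = vec[:dim]
    return v / np.linalg.norm(v)

full = np.array(resp.data[0].embedding)  # full 768-dim vector
short = truncate(full, 256)              # one-third the storage per vector
print(short.shape)  # (256,)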
Performance
On an RTX 3050 (6 GB):
- Batch 32: ~5,500 docs/sec
- Batch 128: ~12,000 docs/sec
Faster than BGE-M3, thanks to the much smaller model and English-only optimisation.
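Numbers like these depend on the card, batch size, and text length; a minimal way to measure docs/sec yourself, reusing the client above (the corpus is a placeholder):

import time

batch = ["search_document: sample text"] * 128
n_batches = 20
t0 = time.perf_counter()
for _ in range(n_batches):
    client.embeddings.create(
        model="nomic-ai/nomic-embed-text-v1.5",
        input=batch,
    )
elapsed = time.perf_counter() - t0
print(f"{n_batches * len(batch) / elapsed:,.0f} docs/sec")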
Small, Fast, Open Embedder Hosting
Nomic Embed on UK dedicated GPUs – any tier suits it.
Browse GPU Servers. See also BGE-M3 and MixedBread mxbai.