
Nomic Embed Text v1.5 Deployment

Nomic's embedding model is small, fast, and fully open - weights, data, and training code published. A practical choice when provenance matters.

Nomic Embed Text v1.5 from Nomic AI is a 137M-parameter English embedder released with full training transparency – weights, training data, and training code all public. On our dedicated GPU hosting it runs on the cheapest available card with very high throughput.

Why Nomic

Three strengths:

  • Full training transparency – important for regulated industries and research
  • Matryoshka embeddings – one model produces 64, 128, 256, 512, or full 768-dim vectors
  • Permissive licence

Trade-off: it is English-only, and on multilingual MTEB tasks it scores below BGE-M3 – unsurprising for an English-only model.

Deployment

docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:1.5 \
  --model-id nomic-ai/nomic-embed-text-v1.5 \
  --max-client-batch-size 512

Client code (OpenAI-compatible embeddings API):

from openai import OpenAI
client = OpenAI(base_url="http://server:8080/v1", api_key="n/a")
resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=["search_document: your text here"],
)

Nomic expects an instruction prefix: search_document: for documents and search_query: for queries. Omitting the prefix degrades retrieval quality. This is a Nomic quirk – BGE does not require it.
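To keep the prefixes consistent across a codebase, it can help to centralise them in a small helper. A minimal sketch – the helper names below are ours, not part of the Nomic or OpenAI client APIs:

```python
import numpy as np

# Nomic v1.5 task prefixes (illustrative helpers, not a library API).
PREFIXES = {"document": "search_document: ", "query": "search_query: "}

def with_prefix(texts, kind):
    """Prepend the Nomic task prefix for 'document' or 'query' inputs."""
    return [PREFIXES[kind] + t for t in texts]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

At indexing time pass with_prefix(docs, "document") as the input list; at query time pass with_prefix([q], "query"), then rank documents by cosine.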

Matryoshka

Nomic supports truncating the full 768-dim vector to shorter lengths without much quality loss. For storage-sensitive deployments:

  • 768-dim: full quality baseline
  • 512-dim: <1% MTEB drop
  • 256-dim: ~2% drop, half the index size
  • 128-dim: ~5% drop, quarter of the index size

For a 10M-document index, going from 768 to 256 dimensions saves roughly 20 GB of float32 vector storage (10M × 512 dims × 4 bytes ≈ 20.5 GB).

Performance

On an RTX 3050 (6 GB):

  • Batch 32: ~5,500 docs/sec
  • Batch 128: ~12,000 docs/sec

Faster than BGE-M3 due to smaller model and English-only optimisation.
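To reproduce these numbers on your own card, a simple wall-clock measurement over fixed-size batches is enough. A sketch – embed_fn is any callable that embeds one batch, e.g. a wrapper around the OpenAI client call above:

```python
import time

def measure_throughput(embed_fn, docs, batch_size):
    """Embed `docs` in fixed-size batches and return documents per second."""
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        embed_fn(docs[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(docs) / elapsed
```

For example: measure_throughput(lambda b: client.embeddings.create(model="nomic-ai/nomic-embed-text-v1.5", input=b), docs, 128). Use a corpus of at least a few thousand documents so server-side batching warms up.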

Small, Fast, Open Embedder Hosting

Run Nomic Embed on UK dedicated GPUs – at 137M parameters, any tier in the range can host it.

Browse GPU Servers

Related guides: BGE-M3 and MixedBread mxbai.
