
Nomic Embed Text v1.5 Deployment

Nomic's embedding model is small, fast, and fully open - weights, data, and training code published. A practical choice when provenance matters.

Nomic Embed Text v1.5 from Nomic AI is a 137M-parameter English embedder released with full training transparency – weights, training data, and training code all public. On our dedicated GPU hosting it runs on the cheapest available card with very high throughput.

Why Nomic

Three strengths:

  • Full training transparency – important for regulated industries and research
  • Matryoshka embeddings – one model produces 64, 128, 256, 512, or full 768-dim vectors
  • Permissive licence

Trade-off: it is English-only, and on multilingual MTEB tasks it scores below BGE-M3 – unsurprising for an English-only model.

Deployment

docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:1.5 \
  --model-id nomic-ai/nomic-embed-text-v1.5 \
  --max-client-batch-size 512

Client code (OpenAI-compatible embeddings API):

from openai import OpenAI
client = OpenAI(base_url="http://server:8080/v1", api_key="n/a")
resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=["search_document: your text here"],
)

Nomic expects an instruction prefix: search_document: for documents and search_query: for queries. Omitting the prefix degrades retrieval quality. This is a Nomic quirk – BGE does not require it.
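To keep the prefixes consistent across a codebase, it can help to centralise them in a small helper. A minimal sketch – the helper names below are ours, not part of the Nomic or OpenAI client APIs:

```python
import numpy as np

# Nomic v1.5 task prefixes (illustrative helpers, not a library API).
PREFIXES = {"document": "search_document: ", "query": "search_query: "}

def with_prefix(texts, kind):
    """Prepend the Nomic task prefix for 'document' or 'query' inputs."""
    return [PREFIXES[kind] + t for t in texts]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

At indexing time pass with_prefix(docs, "document") as the input list; at query time pass with_prefix([q], "query"), then rank documents by cosine.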

Matryoshka

Nomic supports truncating the full 768-dim vector to shorter lengths without much quality loss. For storage-sensitive deployments:

  • 768-dim: full quality baseline
  • 512-dim: <1% MTEB drop
  • 256-dim: ~2% drop, half the index size
  • 128-dim: ~5% drop, quarter of the index size

For a 10M-document index, going from 768 to 256 dimensions saves roughly 20 GB of float32 vector storage (10M × 512 dims × 4 bytes ≈ 20.5 GB).

Performance

On an RTX 3050 (6 GB):

  • Batch 32: ~5,500 docs/sec
  • Batch 128: ~12,000 docs/sec

Faster than BGE-M3 due to smaller model and English-only optimisation.
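To reproduce these numbers on your own card, a simple wall-clock measurement over fixed-size batches is enough. A sketch – embed_fn is any callable that embeds one batch, e.g. a wrapper around the OpenAI client call above:

```python
import time

def measure_throughput(embed_fn, docs, batch_size):
    """Embed `docs` in fixed-size batches and return documents per second."""
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        embed_fn(docs[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(docs) / elapsed
```

For example: measure_throughput(lambda b: client.embeddings.create(model="nomic-ai/nomic-embed-text-v1.5", input=b), docs, 128). Use a corpus of at least a few thousand documents so server-side batching warms up.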

Small, Fast, Open Embedder Hosting

Run Nomic Embed on UK dedicated GPUs – at 137M parameters, any tier in the range can host it.

Browse GPU Servers

Related guides: BGE-M3 and MixedBread mxbai.
