The Challenge: Two Million Products, Zero Relevant Results
A UK-based home furnishings marketplace lists over two million SKUs spanning furniture, lighting, textiles, and decor. Their keyword search engine returns frustrating results: a customer searching for “cosy reading corner chair” gets hits for corner shelves, reading lamps, and chair cushions — everything except what they want. Bounce rates on search result pages sit at 68%, and the merchandising team estimates they lose roughly £180,000 per month in abandoned searches. The existing Elasticsearch cluster handles lexical matching well but has no understanding of intent or product similarity.
Third-party semantic search APIs charge per query, and at 4.5 million searches per month the costs spiral quickly. Worse, sending product catalogue data and user queries to external APIs raises questions about GDPR compliance and competitive data leakage.
AI Solution: Embedding-Based Semantic Search
Semantic product search replaces keyword matching with vector similarity. An embedding model — such as E5-large, BGE, or a fine-tuned Sentence-BERT variant — converts both product descriptions and search queries into dense vectors. At query time, the system encodes the user’s natural language input, then retrieves the nearest product vectors from a purpose-built index. “Cosy reading corner chair” now returns armchairs, accent chairs, and snugglers because the model understands semantic proximity.
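The retrieval step can be sketched in a few lines with toy vectors standing in for real model output (the three-dimensional vectors below are illustrative, not actual E5 embeddings, which have 1,024 dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "embeddings": in production these come from E5-large, BGE, or SBERT.
products = {
    "velvet armchair": [0.9, 0.8, 0.1],
    "corner shelf":    [0.1, 0.2, 0.9],
    "reading lamp":    [0.2, 0.1, 0.8],
}
query = [0.85, 0.75, 0.15]  # hypothetical encoding of "cosy reading corner chair"

# Rank products by similarity to the query vector.
ranked = sorted(products, key=lambda p: cosine(query, products[p]), reverse=True)
print(ranked[0])  # the armchair wins despite sharing no keywords with the query
```

The model places the query and the armchair close together in vector space, so the armchair ranks first even though no keyword overlaps.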
The pipeline has two GPU-intensive stages. First, a batch embedding job encodes all two million product descriptions into vectors: a full pass up front, then incremental nightly refreshes as new products arrive. Second, real-time query encoding converts each incoming search into a vector in under 50 milliseconds. A vector database such as Qdrant, Milvus, or Weaviate handles the approximate nearest neighbour lookup. Hosting the embedding model on a dedicated GPU server keeps both stages performant and private.
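The two stages can be sketched as follows, with a stub encoder standing in for the GPU model and a brute-force scan standing in for the ANN index (all class and function names here are illustrative, not a real Qdrant or Milvus API):

```python
from typing import Callable

def encode_stub(text: str) -> list[float]:
    # Stand-in for a GPU embedding model: hashes characters into a tiny unit vector.
    vec = [0.0] * 4
    for i, ch in enumerate(text.lower()):
        vec[i % 4] += ord(ch)
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec]

class VectorIndex:
    """Brute-force stand-in for a Qdrant/Milvus/Weaviate ANN index."""
    def __init__(self, encode: Callable[[str], list[float]]):
        self.encode = encode
        self.items: dict[str, list[float]] = {}

    def batch_index(self, descriptions: list[str], batch_size: int = 2) -> int:
        # Stage 1: nightly batch job encodes the catalogue in batches.
        for start in range(0, len(descriptions), batch_size):
            for desc in descriptions[start:start + batch_size]:
                self.items[desc] = self.encode(desc)
        return len(self.items)

    def search(self, query: str, top_k: int = 1) -> list[str]:
        # Stage 2: real-time query encoding plus nearest-neighbour lookup.
        q = self.encode(query)
        scored = sorted(self.items,
                        key=lambda d: sum(a * b for a, b in zip(q, self.items[d])),
                        reverse=True)
        return scored[:top_k]

index = VectorIndex(encode_stub)
index.batch_index(["velvet armchair", "corner shelf", "reading lamp"])
print(index.search("velvet armchair"))
```

A real deployment replaces the brute-force scan with an HNSW or IVF index, which keeps lookups sub-millisecond even at two million vectors.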
GPU Requirements
Embedding models are smaller than generative LLMs but still benefit enormously from GPU acceleration. E5-large has 335 million parameters and fits comfortably in 4 GB of VRAM, but throughput depends on batch size and sequence length. Real-time query encoding at 150 queries per second requires sustained GPU compute.
| GPU Model | VRAM | Embedding Throughput (queries/sec) | Batch Encode (2M products) |
|---|---|---|---|
| NVIDIA RTX 5090 | 32 GB | ~320 | ~25 minutes |
| NVIDIA RTX 6000 Pro | 48 GB | ~350 | ~22 minutes |
| NVIDIA RTX 6000 Pro 96 GB | 96 GB | ~500 | ~15 minutes |
For a marketplace handling 4.5 million monthly searches, an RTX 5090 or RTX 6000 Pro provides ample headroom. The private AI hosting option ensures no search queries or product data leave the controlled environment.
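Note that the batch column implies far higher per-item throughput than the real-time column, because batching amortises kernel-launch and memory-transfer overhead across many inputs. A quick sanity check on the RTX 5090 row:

```python
products = 2_000_000
batch_minutes = 25  # full-catalogue encode time from the table above

items_per_sec_batched = products / (batch_minutes * 60)
print(round(items_per_sec_batched))  # ~1,333 items/s batched, vs ~320 single queries/s
```

This is why the nightly re-index is cheap relative to serving: the same GPU encodes roughly four times as many items per second when fed large batches.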
Recommended Stack
- Sentence Transformers or FlagEmbedding for encoding, running on PyTorch with CUDA acceleration.
- Qdrant or Milvus as the vector database, deployed on the same server or a paired instance for minimal network latency.
- FastAPI wrapping the encoding model as a microservice with batched inference.
- Redis for caching frequent query vectors; search traffic is typically head-heavy, so caching the most popular queries can cut GPU encode load by 30-40%.
- Optional: a vLLM-hosted LLM for query expansion, rewriting vague searches into richer semantic queries before encoding.
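The caching layer's effect can be sketched with an in-process dict standing in for Redis (the encoder is a stub, and real savings depend entirely on how head-heavy the query distribution is):

```python
class CachedEncoder:
    """Caches query vectors so repeated searches skip the GPU encode step."""
    def __init__(self, encode):
        self.encode = encode
        self.cache = {}      # in production: Redis, with a TTL on each key
        self.gpu_calls = 0   # counts how often the "GPU" is actually hit

    def __call__(self, query: str) -> list[float]:
        key = query.strip().lower()  # normalise so trivial variants share a vector
        if key not in self.cache:
            self.gpu_calls += 1
            self.cache[key] = self.encode(key)
        return self.cache[key]

encoder = CachedEncoder(lambda q: [float(len(q))])  # stub "model"
for q in ["oak table", "Oak Table", "oak table ", "sofa bed"]:
    encoder(q)
print(encoder.gpu_calls)  # 2 -- the three "oak table" variants collapse to one encode
```

Normalising the key before lookup matters: without it, casing and whitespace variants of the same query would each trigger a fresh GPU call.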
Teams wanting to add visual search — letting customers upload a photo and find similar products — can layer in a vision model like CLIP to generate image embeddings alongside text vectors.
Cost Analysis
Third-party semantic search APIs typically charge £0.002–£0.005 per query. At 4.5 million queries per month, that translates to £9,000–£22,500 monthly. A dedicated GPU server from GigaGPU running the full stack costs a fraction of that, with the added benefit of unlimited queries and zero per-call charges.
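The arithmetic behind those figures is straightforward:

```python
queries_per_month = 4_500_000
low_rate, high_rate = 0.002, 0.005  # £ per query, typical third-party API pricing

low_cost = queries_per_month * low_rate
high_cost = queries_per_month * high_rate
print(f"£{low_cost:,.0f} - £{high_cost:,.0f} per month")  # £9,000 - £22,500 per month
```

A flat-rate dedicated server replaces this per-call spend entirely, and the gap widens as query volume grows.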
The initial setup — fine-tuning embeddings on the product catalogue, indexing vectors, integrating with the storefront — takes a development sprint of two to three weeks. After that, ongoing costs are limited to server rental and periodic re-indexing as the catalogue changes. Set against the £180,000 per month this marketplace loses to abandoned searches, the project can pay for itself within its first month in production.
Getting Started
Begin by exporting your product catalogue and encoding it with a pre-trained embedding model. Test retrieval quality against your top 200 search queries before committing to production deployment. Fine-tuning the embedding model on your specific product domain — training it to understand that “mid-century sideboard” and “retro buffet cabinet” are near-synonyms — typically lifts relevance scores by 15-25%.
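Retrieval quality against those top queries can be measured with a simple recall@k, judged against hand-labelled relevant products. A minimal sketch (the query names and product IDs below are illustrative):

```python
def recall_at_k(results: dict[str, list[str]],
                relevant: dict[str, set[str]],
                k: int = 10) -> float:
    """Average fraction of labelled-relevant products found in each query's top-k results."""
    scores = []
    for query, ranked in results.items():
        rel = relevant[query]
        hits = len(rel & set(ranked[:k]))
        scores.append(hits / len(rel))
    return sum(scores) / len(scores)

# Hand-labelled relevance judgements for two of the top queries (illustrative).
relevant = {
    "cosy reading corner chair": {"armchair-01", "accent-chair-07"},
    "mid-century sideboard": {"sideboard-12"},
}
# What the search system actually returned, in rank order.
results = {
    "cosy reading corner chair": ["armchair-01", "cushion-03", "accent-chair-07"],
    "mid-century sideboard": ["buffet-09", "sideboard-12"],
}
print(recall_at_k(results, relevant, k=3))  # 1.0 -- every labelled item retrieved in top 3
```

Running this before and after fine-tuning gives a concrete number to hang the 15-25% relevance lift on, rather than relying on spot checks.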
GigaGPU provides UK-based dedicated GPU servers ready for semantic search workloads, with full root access and GDPR-compliant infrastructure. Pair with an AI chatbot for conversational product discovery, or add document AI for processing supplier catalogues automatically.
GigaGPU offers dedicated GPU servers in UK data centres with full GDPR compliance and no shared tenancy. Deploy embedding models for intelligent product discovery today.
View Dedicated GPU Plans