Shopify stores that bolt on AI usually end up paying OpenAI per SKU and per chat turn. Move the workload onto a vLLM endpoint running on the RTX 5060 Ti 16GB at our UK dedicated GPU hosting and the economics flip: cost becomes a flat monthly line item, round-trip latency drops once traffic leaving the Shopify VPC hits a nearby UK endpoint instead of a distant SaaS API, and product data stays under your GDPR controls. A single Blackwell card with 4608 CUDA cores, 16 GB GDDR7 and native FP8 support runs roughly 1,000 product descriptions per hour while also serving a semantic search index.
Contents
- AI features you can ship
- Shopify app integration pattern
- Throughput on commerce workloads
- Cost vs SaaS at catalogue scale
- Semantic search for storefront
Features
- Product descriptions from title, attributes and bullet list
- Review summarisation and sentiment breakdown
- Semantic product search with typo tolerance
- Recommendation reranking for “related products”
- Customer support chat grounded in your help centre
- Email campaign copy and subject-line generation
- Localised variants for EU and UK markets
Integration pattern
Build a Remix or Node Shopify app. Register products/create and products/update webhooks that post to your middleware. The middleware calls the vLLM endpoint, writes the generated description back via the Admin GraphQL API, and caches inputs so unchanged fields are not regenerated. Keep Shopify’s API rate limits in mind (2 requests/sec on standard plans, 20/sec on Plus): run batch generation server-side and write back at the Shopify-permitted pace.
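A minimal sketch of the write-back side in Python (the shop domain, access-token handling and the 2024-07 API version are assumptions; the pacer enforces the 2 req/s standard-plan limit):

```python
import json
import urllib.request
from collections import deque

class WritePacer:
    """Paces Admin API write-backs to Shopify's rate limit
    (2 requests/sec on standard plans, 20/sec on Plus)."""
    def __init__(self, max_per_sec: int = 2):
        self.max_per_sec = max_per_sec
        self.sent = deque()  # scheduled send times within the last second

    def delay_before_next(self, now: float) -> float:
        """Seconds to wait before the next write may be sent at time `now`."""
        while self.sent and now - self.sent[0] >= 1.0:
            self.sent.popleft()
        if len(self.sent) < self.max_per_sec:
            self.sent.append(now)
            return 0.0
        wait = 1.0 - (now - self.sent[0])
        self.sent.append(now + wait)  # record the scheduled send time
        return wait

def write_back(shop: str, token: str, product_gid: str, description_html: str):
    """Write a generated description via the Admin GraphQL productUpdate mutation."""
    mutation = """
    mutation($input: ProductInput!) {
      productUpdate(input: $input) { product { id } userErrors { field message } }
    }"""
    body = json.dumps({
        "query": mutation,
        "variables": {"input": {"id": product_gid,
                                "descriptionHtml": description_html}},
    }).encode()
    req = urllib.request.Request(
        f"https://{shop}.myshopify.com/admin/api/2024-07/graphql.json",
        data=body,
        headers={"Content-Type": "application/json",
                 "X-Shopify-Access-Token": token})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Call `delay_before_next(time.time())` before each `write_back` and sleep for the returned value; generation can run as fast as the GPU allows while writes trickle out at Shopify's pace.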
Throughput
| Task | Model | Per-item time | Per hour on one card |
|---|---|---|---|
| Product description (250 tokens) | Mistral 7B FP8 | 3.5 s concurrent batch | ~1000 |
| Review summary (100 tokens) | Phi-3 mini FP8 | 0.35 s | ~10,000 |
| Recommendation rerank (20 items) | BGE reranker | 45 ms | ~80,000 |
| Storefront chat turn | Llama 3.1 8B FP8 | 1.8 s streaming | ~60 concurrent chats |
| Product embedding (title+body) | BGE-base | 0.1 ms batched | ~36M/hour |
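The description row assumes requests are batched concurrently against vLLM's OpenAI-compatible chat API. A minimal sketch of one generation call (the endpoint URL and served model name are assumptions; throughput comes from firing many of these in parallel):

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint

def build_prompt(title: str, attributes: dict, bullets: list) -> str:
    """Assemble the generation prompt from the product's title, attributes and bullet list."""
    attrs = ", ".join(f"{k}: {v}" for k, v in sorted(attributes.items()))
    points = "\n".join(f"- {b}" for b in bullets)
    return (f"Write a ~250-token product description.\n"
            f"Title: {title}\nAttributes: {attrs}\nKey points:\n{points}")

def generate(prompt: str, model: str = "mistral-7b-fp8") -> str:
    """POST one chat completion to vLLM; vLLM batches concurrent requests on the card."""
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}],
                       "max_tokens": 300}).encode()
    req = urllib.request.Request(VLLM_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```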
Cost comparison
| Store profile / month | OpenAI | Self-hosted 5060 Ti |
|---|---|---|
| 10k SKUs regenerated + 100k chat turns | ~£450 | Flat £300 |
| 50k review summaries | ~£60 | Same box |
| Semantic search for 100k queries | ~£40 (embeddings only) | Same box |
Above roughly 5,000 SKUs or 50k monthly chat turns, the dedicated card wins on both cost and latency; below that, OpenAI’s pay-per-call pricing is probably fine.
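That crossover is simple arithmetic (the per-SKU SaaS price below is an illustrative assumption, not a quoted rate):

```python
def break_even_volume(flat_monthly_gbp: float, saas_cost_per_item_gbp: float) -> float:
    """Monthly item volume above which a flat-rate GPU box is cheaper
    than per-item SaaS pricing."""
    return flat_monthly_gbp / saas_cost_per_item_gbp

# Flat £300/month vs an assumed £0.06 per regenerated SKU description:
print(break_even_volume(300.0, 0.06))  # 5000.0 SKUs/month
```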
Semantic search
BGE-base produces embeddings at ~10,000 texts/second, so indexing a 100k-SKU catalogue takes roughly ten seconds of GPU time. Store vectors in Qdrant, query with BGE-base, and rerank the top-50 with a cross-encoder in under 50 ms. Hybrid BM25 + vector retrieval (see our hybrid search guide) handles both exact SKU lookups and fuzzy shopper queries like “warm jumper for autumn.”
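The fusion step of that hybrid retrieval can be as simple as reciprocal-rank fusion over the two rankings (a sketch; `k=60` is the conventional smoothing constant, and bare product IDs stand in for real Qdrant hits):

```python
def rrf_fuse(bm25_ranking: list, vector_ranking: list, k: int = 60) -> list:
    """Merge a BM25 ranking and a vector ranking with reciprocal-rank fusion.
    Each input lists product IDs best-first; items ranked well by either
    retriever float to the top of the fused list."""
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, pid in enumerate(ranking):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears high in both rankings, so it wins the fused list:
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```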
Shopify AI on your own hardware
Descriptions, search and chat on Blackwell 16GB. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: e-commerce AI, chatbot backend, embedding throughput, SaaS RAG, classification.