Content tagging is the unsung backbone of discovery: every blog post, video, SKU or support ticket needs a structured set of labels before recommenders, search or routing can do their job. The RTX 5060 Ti 16GB on UK dedicated GPU hosting runs a fine-tuned DeBERTa-v3 multi-label classifier at 2,400 items per second, which is enough headroom for any mid-sized CMS, marketplace or ad platform on a single Blackwell card.
## Approach: fine-tune beats prompting
For a fixed taxonomy above 20 labels, a fine-tuned encoder beats a prompted LLM on both cost and consistency. DeBERTa-v3-base with a multi-label classification head (BCE loss, label-smoothing 0.05) reaches 91-94% F1 on typical business taxonomies after 3 epochs on 20k-80k labelled examples. Where the taxonomy shifts weekly, fall back to a prompted Phi-3 mini FP8 (285 t/s) with JSON-schema output.
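The BCE-with-label-smoothing objective mentioned above is simple enough to sketch directly. A minimal NumPy illustration (not the actual training code; the example logits and targets are invented) showing how the 0.05 smoothing factor pulls the multi-hot targets slightly towards 0.5 so the classifier is never rewarded for fully saturated outputs:

```python
import numpy as np

def smoothed_bce(logits, targets, eps=0.05):
    """Multi-label binary cross-entropy with label smoothing.

    targets are multi-hot {0, 1}; smoothing maps 1 -> 1 - eps/2
    and 0 -> eps/2, which regularises overconfident predictions.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))       # independent sigmoid per label
    t = targets * (1.0 - eps) + 0.5 * eps       # 1 -> 0.975, 0 -> 0.025
    loss = -(t * np.log(probs) + (1.0 - t) * np.log(1.0 - probs))
    return float(loss.mean())

logits = np.array([[4.0, -3.0, 0.5]])           # one item, three labels
targets = np.array([[1.0, 0.0, 1.0]])
print(smoothed_bce(logits, targets))
```

Because each label gets its own sigmoid rather than a shared softmax, an item can legitimately carry several tags at once, which is the whole point of the multi-label head.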
## Throughput
| Approach | Model | Items/sec | Daily (16 h) | F1 |
|---|---|---|---|---|
| Fine-tuned encoder | DeBERTa-v3-base INT8 | 2,400 | 138M | 0.93 |
| Fine-tuned small encoder | MiniLM-L6 INT8 | 6,800 | 391M | 0.87 |
| Prompted small LLM | Phi-3 mini FP8 | 120 (batched) | 6.9M | 0.81 |
| Prompted larger LLM | Llama 3.1 8B FP8 | 80 (batched) | 4.6M | 0.88 |
## Training workflow
Label the first 5,000 items by hand or with a Llama 3.1 8B FP8 weak-labeller, train DeBERTa-v3-base with Hugging Face Trainer at batch 32 on one 5060 Ti (roughly 20 minutes per epoch on 50k examples), then iterate on confident errors. The 16 GB of GDDR7 at 448 GB/s holds the full forward/backward pass for sequences up to 512 tokens at batch 32 in BF16 without gradient accumulation tricks.
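One fiddly detail in this workflow: a BCE-based multi-label head expects float multi-hot target vectors, not single class indices, so the labelled examples need encoding before they reach the Trainer. A minimal sketch of that step (the taxonomy and tag names here are hypothetical):

```python
def multi_hot(item_labels, taxonomy):
    """Map a list of tag names onto a float multi-hot vector,
    the target format a multi-label BCE head trains against."""
    index = {label: i for i, label in enumerate(taxonomy)}
    vec = [0.0] * len(taxonomy)
    for label in item_labels:
        vec[index[label]] = 1.0
    return vec

taxonomy = ["electronics", "returns", "billing", "shipping"]  # hypothetical
print(multi_hot(["billing", "shipping"], taxonomy))  # [0.0, 0.0, 1.0, 1.0]
```

With targets in this shape, Hugging Face's sequence-classification models accept them directly when configured for multi-label classification.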
## Serving stack
Export to ONNX, quantise to INT8 with TensorRT and serve via Triton – or keep it simple with a FastAPI wrapper around onnxruntime-gpu. At 2,400 items/second and 50% utilisation you bill one flat monthly fee for 100M+ daily tags, vs paying per call to a hosted classification API.
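Whichever serving route you pick, the model emits one sigmoid score per label and the wrapper still has to turn those scores into tags. A minimal decoding sketch, assuming a global 0.5 threshold with a top-1 fallback so no item leaves untagged (both the threshold and the label names are illustrative assumptions, not part of the stack above):

```python
import numpy as np

def decode_tags(scores, labels, threshold=0.5):
    """Return every label whose sigmoid score clears the threshold;
    if none do, fall back to the single highest-scoring label."""
    picked = [label for label, s in zip(labels, scores) if s >= threshold]
    return picked or [labels[int(np.argmax(scores))]]

labels = ["cms", "video", "support", "ads"]                      # hypothetical
print(decode_tags(np.array([0.91, 0.12, 0.78, 0.03]), labels))   # ['cms', 'support']
print(decode_tags(np.array([0.31, 0.22, 0.18, 0.03]), labels))   # ['cms']
```

In production you would typically tune one threshold per label on a validation split rather than use a single global value.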
| Deployment | p50 latency | p99 latency | Max QPS |
|---|---|---|---|
| Single request | 11 ms | 18 ms | 90 |
| Batched (32) | 34 ms | 62 ms | 940 |
| Dynamic batching (Triton) | 22 ms | 48 ms | 2,400 |
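The dynamic-batching row depends on Triton coalescing concurrent requests server-side. A sketch of the relevant `config.pbtxt` fragment (the preferred batch sizes and queue delay are illustrative values, not tuned settings for this model):

```
max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [ 16, 32 ]
  max_queue_delay_microseconds: 2000
}
```

The queue delay trades a couple of milliseconds of added p50 latency for much fuller batches, which is why the dynamic-batching row reaches near the offline throughput ceiling.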
## Applications
- CMS auto-categorisation and related-content recommendations.
- Marketplace product tagging (category, attributes, moderation flags).
- Support ticket routing and triage.
- Video metadata extraction (after Whisper transcription).
- Ad inventory brand-safety classification.
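For the ticket-routing case in particular, the predicted tags map straight onto destination queues; a toy sketch of that last hop (the tag-to-queue mapping and queue names are invented for illustration):

```python
ROUTES = {"billing": "finance-queue", "outage": "oncall-queue"}  # hypothetical

def route(tags, default="general-queue"):
    """Send a tagged ticket to the first matching queue."""
    for tag in tags:
        if tag in ROUTES:
            return ROUTES[tag]
    return default

print(route(["outage", "billing"]))  # oncall-queue
print(route(["howto"]))              # general-queue
```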
Order the RTX 5060 Ti 16GB. See also: classification, social listening, embedding server, vLLM setup.