Text classification is one of the cheapest AI workloads to run. There are two paradigms: dedicated encoder models (BERT, DeBERTa) fine-tuned for the task, or LLM-as-classifier (prompting an LLM with the class labels).
For high-throughput classification over stable categories, use a fine-tuned DeBERTa-v3. For flexible, low-volume, or many-class workloads, use an LLM-as-classifier such as Mistral 7B. DeBERTa reaches 100K+ classifications/sec on an RTX 3060, while the LLM manages ~500/sec on an RTX 5060 Ti.
Two approaches
- Encoder-only: DeBERTa-v3 fine-tuned on your task. Fast, cheap, requires labelled data.
- LLM-as-classifier: prompt Mistral 7B with class labels and few-shot examples. Slower, more flexible, no fine-tuning data required.
Hardware
- DeBERTa-v3 on RTX 3060 12 GB: ~100K classifications/sec — far more than most workloads need
- Mistral 7B FP8 on RTX 5060 Ti: ~500 classifications/sec — slower but no fine-tuning
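The ~200× throughput gap follows from the cost model: the encoder pays one forward pass per batch of texts, while the LLM pays per generated token. A back-of-envelope sketch, with all per-stream and batch numbers being illustrative assumptions rather than benchmarks from this article:

```python
# Back-of-envelope throughput model; the specific rates below are
# illustrative assumptions, not measured benchmarks.

def llm_classifications_per_sec(decode_tok_per_sec: float,
                                tokens_per_answer: float,
                                concurrent_streams: int) -> float:
    """An LLM classifier pays per decoded token, so throughput is
    roughly aggregate decode speed divided by answer length."""
    return decode_tok_per_sec * concurrent_streams / tokens_per_answer


def encoder_classifications_per_sec(batch_size: int,
                                    batches_per_sec: float) -> float:
    """An encoder classifier is one forward pass per batch of texts."""
    return batch_size * batches_per_sec


# Assumed: ~170 tok/s per stream, 3-token answers, 8 concurrent streams.
llm = llm_classifications_per_sec(170, 3, 8)

# Assumed: batches of 256 short texts at ~400 batches/s.
enc = encoder_classifications_per_sec(256, 400)
```

Under these assumptions the LLM lands in the hundreds per second and the encoder above 100K, matching the orders of magnitude quoted above: serving hardware barely matters next to the choice of paradigm.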
Verdict
If you have labelled data and stable categories, fine-tune DeBERTa. If categories drift or labelled data is scarce, use LLM-as-classifier.
Bottom line
Classification is throughput-bound; cheapest GPUs work well. See best GPU for embeddings for similar sizing.