The RTX 3090 (Ampere) is older but still a credible production AI host. The 24 GB VRAM matters more than the architecture age.
3090 vLLM config: FP16 weights, max-num-seqs=64, max-model-len=16384, gpu-memory-utilization=0.92, prefix caching enabled. Roughly ~720 tok/s on Mistral 7B. Ampere has no FP8 tensor cores, so use AWQ-INT4 for 13B-class models.
Install
pip install vllm==0.6.3
# RTX 3090 (Ampere, compute capability 8.6); use NVIDIA driver 535+
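A quick sanity check before serving confirms the driver and the install both landed. This is a rough sketch: it assumes your nvidia-smi is recent enough to report compute_cap (the 3090 reports 8.6) and that the vllm wheel installed cleanly.
nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
python -c "import torch, vllm; print(vllm.__version__, torch.cuda.get_device_name(0))"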
Config
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
--max-model-len 16384 \
--max-num-seqs 64 \
--gpu-memory-utilization 0.92 \
--enable-prefix-caching
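Once the server is up, a smoke test against the OpenAI-compatible endpoint confirms it is answering. This assumes the default port 8000 and that the model name in the request matches the Hugging Face path passed to vllm serve.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 32
  }'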
For 13B–14B-class models, switch to AWQ-INT4:
vllm serve hugging-quants/Qwen2.5-14B-Instruct-AWQ-INT4 \
--quantization awq_marlin \
--max-model-len 16384
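Back-of-envelope VRAM math shows why INT4 is the only way a 14B model fits alongside a useful KV cache on 24 GB. These are rough figures (≈2 bytes/param for FP16, ≈0.5 bytes/param for INT4, ignoring activation and CUDA-graph overhead), not measurements.
# ~7.2B  params x 2   bytes ≈ 14.5 GB -> 7B FP16 fits; ~7 GB left for KV cache at 0.92 x 24 GB
# ~14.8B params x 2   bytes ≈ 29.5 GB -> 14B FP16 does not fit in 24 GB
# ~14.8B params x 0.5 bytes ≈  7.4 GB -> 14B INT4 fits; ~13 GB left for KV cache
# Watch the actual allocation while the server warms up:
watch -n 2 nvidia-smi --query-gpu=memory.used,memory.total --format=csv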
Verdict
The 3090 is the cheapest 24 GB GPU for FP16 production serving. Skip it if you need FP8 or 32+ GB of VRAM.
Bottom line
The cheapest 24 GB option. See the RTX 3090 RAG guide.