Qwen 2.5 Model Family
Alibaba’s Qwen 2.5 series has emerged as one of the strongest open-source model families, rivalling LLaMA 3 on multiple benchmarks and excelling at multilingual and coding tasks. Self-hosting Qwen on a dedicated GPU server gives you zero per-token costs and full data control. Here is the complete cost-per-million-tokens breakdown by GPU.
Qwen 2.5 comes in 7B, 14B, 32B, and 72B variants. GigaGPU’s Qwen hosting supports all sizes with vLLM pre-configured. Use our cost per million tokens calculator for interactive comparisons.
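The per-GPU figures in the tables below follow directly from sustained throughput and a 30-day month. Here is a minimal sketch of that arithmetic (function names are illustrative, not our calculator's API):

```python
def max_tokens_per_month(tok_per_s: float) -> float:
    """Tokens a GPU can emit in a 30-day month at sustained throughput."""
    return tok_per_s * 60 * 60 * 24 * 30

def cost_per_million(monthly_cost_usd: float, tok_per_s: float,
                     utilisation: float = 1.0) -> float:
    """USD per 1M tokens at a given utilisation fraction (0.0-1.0)."""
    tokens = max_tokens_per_month(tok_per_s) * utilisation
    return monthly_cost_usd / (tokens / 1_000_000)

# RTX 3090 running Qwen 2.5 7B: $99/month at ~75 tok/s
print(round(cost_per_million(99, 75), 2))       # ~$0.51 per 1M at 100%
print(round(cost_per_million(99, 75, 0.5), 2))  # ~$1.02 per 1M at 50%
```

Utilisation matters: a server that sits idle half the time effectively doubles your per-token cost, which is why every table below shows both the 50% and 100% columns.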
Qwen 2.5 7B: Cost per GPU
| GPU | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50%) | Cost/1M (100%) |
|---|---|---|---|---|---|
| RTX 3090 24 GB | $99 | ~75 | ~194M | $1.02 | $0.51 |
| RTX 5090 32 GB | $149 | ~115 | ~298M | $1.00 | $0.50 |
| RTX 6000 Pro | $249 | ~140 | ~363M | $1.37 | $0.69 |
| RTX 6000 Pro 96 GB | $299 | ~150 | ~389M | $1.54 | $0.77 |
Qwen 2.5 7B is extremely efficient. At $0.50 per 1M tokens on an RTX 5090, it undercuts virtually every API on the market. The RTX 3090 at $0.51/1M is nearly identical and costs $50 less per month. Compare with our RTX 3090 vs RTX 5090 analysis.
Qwen 2.5 14B and 32B: Cost per GPU
Qwen 2.5 14B
| GPU | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50%) | Cost/1M (100%) |
|---|---|---|---|---|---|
| RTX 5090 32 GB | $149 | ~70 | ~181M | $1.65 | $0.82 |
| RTX 6000 Pro | $249 | ~90 | ~233M | $2.14 | $1.07 |
| RTX 6000 Pro 96 GB | $299 | ~100 | ~259M | $2.31 | $1.15 |
Qwen 2.5 32B
| GPU Setup | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50%) | Cost/1M (100%) |
|---|---|---|---|---|---|
| 1x RTX 6000 Pro 96 GB (INT8) | $299 | ~50 | ~130M | $4.60 | $2.30 |
| 2x RTX 5090 (FP16) | $279 | ~40 | ~104M | $5.37 | $2.68 |
| 2x RTX 6000 Pro 96 GB (FP16) | $599 | ~70 | ~181M | $6.62 | $3.31 |
The 14B and 32B variants hit a sweet spot between quality and cost. Qwen 2.5 32B on a single RTX 6000 Pro 96 GB with INT8 quantisation delivers $2.30 per 1M tokens, which is cheaper than most premium APIs. For help choosing between model sizes, see our VRAM optimisation guide.
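A rough way to see why 32B at INT8 fits a single 96 GB card, while FP16 needs two GPUs, is to estimate weight memory alone. This is a back-of-envelope sketch: real deployments also need VRAM headroom for KV cache and activations, so treat these as lower bounds.

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough VRAM for model weights alone (GB); ignores KV cache and activations."""
    return params_billions * bytes_per_param

# Qwen 2.5 32B: FP16 is 2 bytes/param, INT8 quantisation is ~1 byte/param
fp16_gb = weight_vram_gb(32, 2.0)  # 64 GB of weights -> needs 2 GPUs at FP16
int8_gb = weight_vram_gb(32, 1.0)  # 32 GB of weights -> one 96 GB card, with headroom
```

The remaining ~64 GB on a 96 GB card at INT8 is what buys you long contexts and high batch sizes under vLLM.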
Qwen 2.5 72B: Cost per GPU
| GPU Setup | Precision | Monthly Cost | Throughput | Max Tok/Month | Cost/1M (50%) | Cost/1M (100%) |
|---|---|---|---|---|---|---|
| 1x RTX 5090 | INT4 (GPTQ) | $149 | ~18 tok/s | ~47M | $6.34 | $3.17 |
| 2x RTX 5090 | INT4 | $279 | ~35 tok/s | ~91M | $6.13 | $3.07 |
| 1x RTX 6000 Pro 96 GB | INT8 | $299 | ~28 tok/s | ~73M | $8.19 | $4.10 |
| 2x RTX 6000 Pro 96 GB | FP16 | $599 | ~48 tok/s | ~124M | $9.66 | $4.83 |
| 2x RTX 6000 Pro 96 GB | INT8 | $599 | ~60 tok/s | ~155M | $7.73 | $3.86 |
| 4x RTX 6000 Pro 96 GB | FP16 | $899 | ~95 tok/s | ~246M | $7.31 | $3.65 |
Qwen 2.5 72B delivers GPT-4o class performance. The cheapest option is 2x RTX 5090 with INT4 at $3.07 per 1M tokens. For production quality, 2x RTX 6000 Pro with INT8 at $3.86/1M offers the best balance. Compare against the full 70B model cost guide.
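Because self-hosting is a fixed monthly cost, there is a break-even token volume above which it beats any per-token API. A small sketch of that calculation (the $5.00/1M API rate below is a purely hypothetical comparison point, not a quote for any specific provider):

```python
def breakeven_tokens_per_month(server_monthly_usd: float,
                               api_usd_per_1m: float) -> float:
    """Monthly token volume above which a fixed-cost server beats a per-token API."""
    return server_monthly_usd / api_usd_per_1m * 1_000_000

# 2x RTX 6000 Pro 96 GB ($599/mo) vs a hypothetical API at $5.00 per 1M tokens
vol = breakeven_tokens_per_month(599, 5.00)
print(f"{vol / 1_000_000:.1f}M tokens/month")  # ~119.8M tokens/month
```

At ~60 tok/s that break-even volume is reached well within a month of sustained use, and every token after it is effectively free.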
Qwen vs Other Models: Cost Comparison
| Model (72B class) | Best Self-Hosted Rate | Quality (MMLU) |
|---|---|---|
| Qwen 2.5 72B | $3.07/1M (2x 5090 INT4) | ~84.2 |
| LLaMA 3 70B | $2.68/1M (2x 5090 INT4) | ~82.0 |
| Mistral Large 123B | $4.97/1M (4x RTX 6000 Pro INT8) | ~84.0 |
| DeepSeek-V2 236B | $3.65/1M (4x RTX 6000 Pro) | ~83.5 |
Qwen 2.5 72B offers the highest MMLU score in its class while remaining cost-competitive with LLaMA 3. For tasks where Qwen excels (Chinese/multilingual, coding, mathematics), it is the clear choice. For the full landscape, see our complete cost guide.
Best Configuration for Your Use Case
- Chatbots and lightweight tasks: Qwen 2.5 7B on RTX 3090 ($99/mo, $0.51/1M). See chatbot cost guide.
- Mid-range production: Qwen 2.5 32B on RTX 6000 Pro 96 GB ($299/mo, $2.30/1M). Great quality-to-cost ratio.
- Premium quality: Qwen 2.5 72B on 2x RTX 6000 Pro INT8 ($599/mo, $3.86/1M). GPT-4o class at a fraction of the cost.
- High throughput: Qwen 2.5 72B on 4x RTX 6000 Pro ($899/mo, $3.65/1M). Maximum concurrency for production hosting.
Get started with the right GPU and follow our self-host LLM guide for deployment. For smaller-model options, compare our Phi-3 per-GPU cost breakdown.
Host Qwen on Dedicated Hardware
From $0.50 per 1M tokens. UK-hosted, GDPR compliant, unlimited inference.
Browse GPU Servers