
Cost per 1M Tokens: Qwen by GPU (Full Breakdown)

Exact cost per 1M tokens for Qwen 2.5 models across every GPU option. Complete breakdown for 7B, 14B, 32B, and 72B variants on dedicated hardware.

Qwen 2.5 Model Family

Alibaba’s Qwen 2.5 series has emerged as one of the strongest open-source model families, rivalling LLaMA 3 on multiple benchmarks and excelling at multilingual and coding tasks. Self-hosting Qwen on a dedicated GPU server replaces per-token billing with a fixed monthly cost and gives you full data control. Here is the complete cost-per-million-tokens breakdown by GPU.

Qwen 2.5 comes in 7B, 14B, 32B, and 72B variants. GigaGPU’s Qwen hosting supports all sizes with vLLM pre-configured. Use our cost per million tokens calculator for interactive comparisons.

Qwen 2.5 7B: Cost per GPU

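| GPU | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50% utilisation) | Cost/1M (100% utilisation) |
|---|---|---|---|---|---|
| RTX 3090 24 GB | $99 | ~75 | ~194M | $1.02 | $0.51 |
| RTX 5090 32 GB | $149 | ~115 | ~298M | $1.00 | $0.50 |
| RTX 6000 Pro | $249 | ~140 | ~363M | $1.37 | $0.69 |
| RTX 6000 Pro 96 GB | $299 | ~150 | ~389M | $1.54 | $0.77 |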

Qwen 2.5 7B is extremely efficient. At full utilisation it costs $0.50 per 1M tokens on an RTX 5090, which undercuts virtually every API on the market. The RTX 3090 is nearly identical at $0.51/1M and costs $50 less per month. Compare with our RTX 3090 vs RTX 5090 analysis.
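The table columns can be reproduced from just the monthly price and the throughput. The sketch below is one way to do that arithmetic, assuming a 30-day month (which is consistent with the Max Tok/Month figures above) and assuming the quoted throughput is sustained around the clock. The function names are illustrative and not part of any GigaGPU tool.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s, assuming a 30-day month


def max_tokens_per_month(tokens_per_second: float) -> float:
    """Tokens generated in a month if the GPU never sits idle."""
    return tokens_per_second * SECONDS_PER_MONTH


def cost_per_million(monthly_cost: float, tokens_per_second: float,
                     utilisation: float = 1.0) -> float:
    """USD per 1M tokens at a given utilisation (0 < utilisation <= 1)."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    millions = max_tokens_per_month(tokens_per_second) * utilisation / 1_000_000
    return monthly_cost / millions


if __name__ == "__main__":
    # Monthly price and approximate throughput taken from the 7B table.
    for name, cost, tps in [("RTX 3090", 99, 75), ("RTX 5090", 149, 115)]:
        half = cost_per_million(cost, tps, 0.5)
        full = cost_per_million(cost, tps, 1.0)
        print(f"{name}: ${half:.2f}/1M at 50%, ${full:.2f}/1M at 100%")
```

Halving the utilisation doubles the cost per token, because the monthly price is fixed regardless of how many tokens the server actually generates.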

Qwen 2.5 14B and 32B: Cost per GPU

Qwen 2.5 14B

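| GPU | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50% utilisation) | Cost/1M (100% utilisation) |
|---|---|---|---|---|---|
| RTX 5090 32 GB | $149 | ~70 | ~181M | $1.65 | $0.82 |
| RTX 6000 Pro | $249 | ~90 | ~233M | $2.14 | $1.07 |
| RTX 6000 Pro 96 GB | $299 | ~100 | ~259M | $2.31 | $1.15 |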

Qwen 2.5 32B

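| GPU Setup | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50% utilisation) | Cost/1M (100% utilisation) |
|---|---|---|---|---|---|
| 1x RTX 6000 Pro 96 GB (INT8) | $299 | ~50 | ~130M | $4.60 | $2.30 |
| 2x RTX 5090 (FP16) | $279 | ~40 | ~104M | $5.37 | $2.68 |
| 2x RTX 6000 Pro 96 GB (FP16) | $599 | ~70 | ~181M | $6.62 | $3.31 |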

The 14B and 32B variants hit a sweet spot between quality and cost. Qwen 2.5 32B on a single RTX 6000 Pro 96 GB with INT8 quantisation costs $2.30 per 1M tokens at full utilisation, which is cheaper than most premium APIs. For help choosing between model sizes, see our VRAM optimisation guide.
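Whether a dedicated server actually beats an API depends on how busy it is. As a rough sketch, the break-even utilisation is the full-load cost per 1M tokens divided by the API's price per 1M tokens. The $5.00 API price below is a made-up placeholder rather than a quote from any provider, and the function name is illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def break_even_utilisation(monthly_cost: float, tokens_per_second: float,
                           api_price_per_million: float) -> float:
    """Fraction of round-the-clock load at which self-hosting matches the API.

    A result above 1.0 means the server cannot match the API price even
    when it is fully loaded.
    """
    millions_at_full_load = tokens_per_second * SECONDS_PER_MONTH / 1_000_000
    full_load_cost = monthly_cost / millions_at_full_load
    return full_load_cost / api_price_per_million


# Qwen 2.5 32B on 1x RTX 6000 Pro 96 GB (INT8): $299/month at ~50 tok/s.
# HYPOTHETICAL_API_PRICE is an assumption for illustration only.
HYPOTHETICAL_API_PRICE = 5.00
u = break_even_utilisation(299, 50, HYPOTHETICAL_API_PRICE)
print(f"Break-even utilisation: {u:.0%}")
```

Above the break-even utilisation, every extra token makes the dedicated server cheaper relative to the API; below it, the API is the cheaper option for that workload.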


Qwen 2.5 72B: Cost per GPU

| GPU Setup | Precision | Monthly Cost | Throughput (tok/s) | Max Tok/Month | Cost/1M (50% utilisation) | Cost/1M (100% utilisation) |
|---|---|---|---|---|---|---|
| 1x RTX 5090 | INT4 (GPTQ) | $149 | ~18 | ~47M | $6.34 | $3.17 |
| 2x RTX 5090 | INT4 | $279 | ~35 | ~91M | $6.13 | $3.07 |
| 1x RTX 6000 Pro 96 GB | INT8 | $299 | ~28 | ~73M | $8.19 | $4.10 |
| 2x RTX 6000 Pro 96 GB | FP16 | $599 | ~48 | ~124M | $9.66 | $4.83 |
| 2x RTX 6000 Pro 96 GB | INT8 | $599 | ~60 | ~155M | $7.73 | $3.86 |
| 4x RTX 6000 Pro 96 GB | FP16 | $899 | ~95 | ~246M | $7.31 | $3.65 |

Qwen 2.5 72B delivers GPT-4o class performance. The cheapest option per token is 2x RTX 5090 with INT4 at $3.07 per 1M tokens at full utilisation. For production quality, 2x RTX 6000 Pro 96 GB with INT8 at $3.86/1M offers the best balance. Compare against the full 70B model cost guide.
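The ranking behind "cheapest option" can be checked by sorting the 72B configurations by their full-utilisation cost. The sketch below uses the prices and approximate throughputs from the table and assumes a 30-day month; the `Config` class is illustrative. Because it divides by unrounded monthly token counts, individual results can differ from the table by a cent or two, but the ordering is the same.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


@dataclass(frozen=True)
class Config:
    setup: str
    precision: str
    monthly_cost: float       # USD per month
    tokens_per_second: float  # approximate throughput from the table

    @property
    def cost_per_million(self) -> float:
        """USD per 1M tokens if the server generates tokens around the clock."""
        millions = self.tokens_per_second * SECONDS_PER_MONTH / 1_000_000
        return self.monthly_cost / millions


CONFIGS = [
    Config("1x RTX 5090", "INT4", 149, 18),
    Config("2x RTX 5090", "INT4", 279, 35),
    Config("1x RTX 6000 Pro 96 GB", "INT8", 299, 28),
    Config("2x RTX 6000 Pro 96 GB", "FP16", 599, 48),
    Config("2x RTX 6000 Pro 96 GB", "INT8", 599, 60),
    Config("4x RTX 6000 Pro 96 GB", "FP16", 899, 95),
]

# Cheapest cost per 1M tokens first.
ranked = sorted(CONFIGS, key=lambda c: c.cost_per_million)

for c in ranked:
    print(f"{c.setup:<24} {c.precision:<5} ${c.cost_per_million:.2f}/1M")
```

Sorting on cost alone ignores output quality, so a lower-precision setup ranking first does not by itself make it the right production choice.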

Qwen vs Other Models: Cost Comparison

| Model (72B class) | Best Self-Hosted Rate | Quality (MMLU) |
|---|---|---|
| Qwen 2.5 72B | $3.07/1M (2x 5090 INT4) | ~84.2 |
| LLaMA 3 70B | $2.68/1M (2x 5090 INT4) | ~82.0 |
| Mistral Large 123B | $4.97/1M (4x RTX 6000 Pro INT8) | ~84.0 |
| DeepSeek-V2 236B | $3.65/1M (4x RTX 6000 Pro) | ~83.5 |

Qwen 2.5 72B offers the highest MMLU score in its class while remaining cost-competitive with LLaMA 3. For tasks where Qwen excels (Chinese/multilingual, coding, mathematics), it is the clear choice. For the full landscape, see our complete cost guide.

Best Configuration for Your Use Case

  • Chatbots and lightweight tasks: Qwen 2.5 7B on RTX 3090 ($99/mo, $0.51/1M). See chatbot cost guide.
  • Mid-range production: Qwen 2.5 32B on RTX 6000 Pro 96 GB ($299/mo, $2.30/1M). Great quality-to-cost ratio.
  • Premium quality: Qwen 2.5 72B on 2x RTX 6000 Pro INT8 ($599/mo, $3.86/1M). GPT-4o class at a fraction of the cost.
  • High throughput: Qwen 2.5 72B on 4x RTX 6000 Pro ($899/mo, $3.65/1M). Maximum concurrency for production hosting.
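The per-1M figures in the list above assume the server is fully loaded. For a given monthly token volume, a rough way to compare the recommended setups is to check that the volume fits within each server's monthly capacity and then divide the fixed price by the tokens actually generated. The 50M-token workload below is a hypothetical example, the throughputs are the approximate values from the tables, and the function name is illustrative.

```python
from typing import Optional

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def effective_cost_per_million(monthly_cost: float, tokens_per_second: float,
                               monthly_tokens: float) -> Optional[float]:
    """USD per 1M tokens for a given monthly volume.

    Returns None when the volume exceeds what the server can generate
    in a month at the quoted throughput.
    """
    if monthly_tokens <= 0:
        raise ValueError("monthly_tokens must be positive")
    capacity = tokens_per_second * SECONDS_PER_MONTH
    if monthly_tokens > capacity:
        return None
    return monthly_cost / (monthly_tokens / 1_000_000)


# (label, monthly cost in USD, approximate tok/s) from the tables above.
RECOMMENDED = [
    ("Qwen 2.5 7B on RTX 3090", 99, 75),
    ("Qwen 2.5 32B on RTX 6000 Pro 96 GB (INT8)", 299, 50),
    ("Qwen 2.5 72B on 2x RTX 6000 Pro (INT8)", 599, 60),
    ("Qwen 2.5 72B on 4x RTX 6000 Pro", 899, 95),
]

workload = 50_000_000  # hypothetical: 50M tokens per month

for label, cost, tps in RECOMMENDED:
    rate = effective_cost_per_million(cost, tps, workload)
    shown = "over capacity" if rate is None else f"${rate:.2f}/1M"
    print(f"{label}: {shown}")
```

At low volumes the effective rate is well above the headline figure, so the headline figures are best read as a floor that is only reached when the server is kept busy.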

Get started with the right GPU and follow our self-host LLM guide for deployment. For smaller model options, see our Phi-3 per-GPU cost guide.

Host Qwen on Dedicated Hardware

From $0.50 per 1M tokens. UK-hosted, GDPR compliant, unlimited inference.


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
