
Cost per 1M Tokens: Qwen by GPU (Full Breakdown)

Estimated cost per 1M tokens for Qwen 2.5 models across our GPU lineup, with a complete breakdown for the 7B, 14B, 32B, and 72B variants on dedicated hardware.

Cost & Pricing April 13, 2026 3 min read admin

Table of Contents

Qwen 2.5 Model Family
Qwen 2.5 7B: Cost per GPU
Qwen 2.5 14B and 32B: Cost per GPU
Qwen 2.5 72B: Cost per GPU
Qwen vs Other Models: Cost Comparison
Best Configuration for Your Use Case

Qwen 2.5 Model Family

Alibaba’s Qwen 2.5 series has emerged as one of the strongest open-source model families, rivalling LLaMA 3 on multiple benchmarks and excelling at multilingual and coding tasks. Self-hosting Qwen on a dedicated GPU server replaces per-token API fees with a flat monthly price and keeps your data under your control. Here is the complete cost-per-million-tokens breakdown by GPU.

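Qwen 2.5 comes in 7B, 14B, 32B, and 72B variants, and GigaGPU’s Qwen hosting supports all of them with vLLM pre-configured. In the tables below, monthly token capacity assumes the listed throughput is sustained for a full 30-day month. The 100% column divides the monthly price by that capacity, and the 50% column assumes the server is generating only half the time, which doubles the effective cost. Use our cost per million tokens calculator for interactive comparisons.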
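Written out, the two cost columns come from a pair of formulas. Here $r$ is throughput in tokens per second, $P$ the monthly price, and $u$ the fraction of the month the GPU spends generating. The 30-day month is an assumption, chosen because it reproduces the tables' figures. The worked example uses the RTX 3090 row of the 7B table:

```latex
\begin{aligned}
T_{\text{month}} &= r \times 30 \times 86\,400\ \text{s} = 75 \times 2\,592\,000 \approx 194.4 \times 10^{6}\ \text{tokens} \\
C_{1\text{M}}(u) &= \frac{P}{u \, T_{\text{month}} / 10^{6}} \\
C_{1\text{M}}(1) &= \frac{\$99}{194.4} \approx \$0.51, \qquad C_{1\text{M}}(0.5) = \frac{\$99}{0.5 \times 194.4} \approx \$1.02
\end{aligned}
```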

Qwen 2.5 7B: Cost per GPU

GPU	Monthly Cost	Throughput (tok/s)	Max Tokens/Month	Cost/1M (50% utilisation)	Cost/1M (100% utilisation)
RTX 3090 24 GB	$99	~75	~194M	$1.02	$0.51
RTX 5090 32 GB	$149	~115	~298M	$1.00	$0.50
RTX 6000 Pro	$249	~140	~363M	$1.37	$0.69
RTX 6000 Pro 96 GB	$299	~150	~389M	$1.54	$0.77

Qwen 2.5 7B is extremely efficient. At $0.50 per 1M tokens on an RTX 5090 running at full utilisation, it undercuts most hosted API pricing. The RTX 3090 at $0.51/1M is nearly identical per token and costs $50 less per month, although the RTX 5090 can generate roughly 50% more tokens per month (~298M vs ~194M). Compare the two in our RTX 3090 vs RTX 5090 analysis.
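The arithmetic behind these columns fits in a few lines of Python, so you can rerun it with your own prices or a throughput you have measured yourself. This is a minimal sketch. It assumes the same 30-day month of non-stop generation that reproduces the tables, and the throughput values are the approximate figures from the 7B table, not new measurements. The helper names are illustrative.

```python
# Sketch: reproduce the cost-per-1M columns of the Qwen 2.5 7B table.
# Assumption: a 30-day month of continuous generation at the listed throughput.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s


def max_tokens_per_month(tokens_per_second: float) -> float:
    """Tokens generated if the GPU runs flat out for the whole month."""
    return tokens_per_second * SECONDS_PER_MONTH


def cost_per_million(monthly_usd: float, tokens_per_second: float, utilisation: float) -> float:
    """Monthly price divided by the millions of tokens actually served."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    served_millions = max_tokens_per_month(tokens_per_second) * utilisation / 1e6
    return monthly_usd / served_millions


# (GPU, monthly price in USD, approximate throughput in tok/s) from the 7B table
QWEN_7B = [
    ("RTX 3090 24 GB", 99, 75),
    ("RTX 5090 32 GB", 149, 115),
    ("RTX 6000 Pro", 249, 140),
    ("RTX 6000 Pro 96 GB", 299, 150),
]

if __name__ == "__main__":
    for gpu, price, tps in QWEN_7B:
        half = cost_per_million(price, tps, 0.5)
        full = cost_per_million(price, tps, 1.0)
        capacity_m = max_tokens_per_month(tps) / 1e6
        print(f"{gpu:<20} {capacity_m:6.1f}M  ${half:.2f}  ${full:.2f}")
```

Running it prints one row per GPU with the monthly capacity and both cost columns. Passing your real utilisation instead of `0.5` or `1.0` gives a more realistic per-token figure than either column.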

Qwen 2.5 14B and 32B: Cost per GPU

Qwen 2.5 14B

GPU	Monthly Cost	Throughput (tok/s)	Max Tokens/Month	Cost/1M (50% utilisation)	Cost/1M (100% utilisation)
RTX 5090 32 GB	$149	~70	~181M	$1.65	$0.82
RTX 6000 Pro	$249	~90	~233M	$2.14	$1.07
RTX 6000 Pro 96 GB	$299	~100	~259M	$2.31	$1.15

Qwen 2.5 32B

GPU Setup	Monthly Cost	Throughput (tok/s)	Max Tokens/Month	Cost/1M (50% utilisation)	Cost/1M (100% utilisation)
1x RTX 6000 Pro 96 GB (INT8)	$299	~50	~130M	$4.60	$2.30
2x RTX 5090 (FP16)	$279	~40	~104M	$5.37	$2.68
2x RTX 6000 Pro 96 GB (FP16)	$599	~70	~181M	$6.62	$3.31

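The 14B and 32B variants hit a sweet spot between quality and cost. Qwen 2.5 32B on a single RTX 6000 Pro 96 GB with INT8 quantisation works out to $2.30 per 1M tokens at full utilisation ($4.60 at 50%), which is cheaper than most premium APIs. It costs only $20 a month more than the 2x RTX 5090 FP16 setup while delivering higher throughput (~50 vs ~40 tok/s). For help choosing between model sizes, see our VRAM optimisation guide.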
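The 100% figures are a floor that a real server rarely reaches. Turning the formula around shows how busy each 32B setup would have to be to hit a given price. The $3.00 target below is a hypothetical figure chosen for illustration, not a quoted API rate. The throughputs are the approximate values from the 32B table, and `required_utilisation` is an illustrative helper.

```python
# Sketch: what utilisation does a fixed-price server need to reach a target
# cost per 1M tokens? Assumes a 30-day month at the listed throughput.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def required_utilisation(monthly_usd: float, tokens_per_second: float, target_per_million: float) -> float:
    """Fraction of the month the server must spend generating to hit the target price."""
    if target_per_million <= 0:
        raise ValueError("target price must be positive")
    full_millions = tokens_per_second * SECONDS_PER_MONTH / 1e6  # capacity at 100%
    return monthly_usd / (target_per_million * full_millions)


# Qwen 2.5 32B setups from the table: (setup, monthly USD, approximate tok/s)
QWEN_32B = [
    ("1x RTX 6000 Pro 96 GB (INT8)", 299, 50),
    ("2x RTX 5090 (FP16)", 279, 40),
    ("2x RTX 6000 Pro 96 GB (FP16)", 599, 70),
]

TARGET = 3.00  # hypothetical target price in USD per 1M tokens

for setup, price, tps in QWEN_32B:
    u = required_utilisation(price, tps, TARGET)
    verdict = "reachable" if u <= 1 else "not reachable even at 100%"
    print(f"{setup:<30} needs {u:.0%} utilisation ({verdict})")
```

With these inputs, the single RTX 6000 Pro 96 GB needs roughly 77% utilisation and the 2x RTX 5090 about 90%. The 2x RTX 6000 Pro 96 GB FP16 setup cannot reach $3.00 even running flat out.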

Qwen 2.5 72B: Cost per GPU

GPU Setup	Precision	Monthly Cost	Throughput (tok/s)	Max Tokens/Month	Cost/1M (50% utilisation)	Cost/1M (100% utilisation)
1x RTX 5090	INT4 (GPTQ)	$149	~18	~47M	$6.34	$3.17
2x RTX 5090	INT4	$279	~35	~91M	$6.13	$3.07
1x RTX 6000 Pro 96 GB	INT8	$299	~28	~73M	$8.19	$4.10
2x RTX 6000 Pro 96 GB	FP16	$599	~48	~124M	$9.66	$4.83
2x RTX 6000 Pro 96 GB	INT8	$599	~60	~155M	$7.73	$3.86
4x RTX 6000 Pro 96 GB	FP16	$899	~95	~246M	$7.31	$3.65

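Qwen 2.5 72B delivers GPT-4o class performance on many benchmarks. The cheapest option per token is 2x RTX 5090 with INT4 at $3.07 per 1M tokens at full utilisation. For production quality, 2x RTX 6000 Pro 96 GB with INT8 at $3.86/1M offers the best balance: higher-precision weights than INT4 and roughly 70% more monthly capacity than the 2x RTX 5090 setup (~155M vs ~91M tokens). Compare these figures against our full 70B model cost guide.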
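Once the table is in code, ranking the 72B setups takes only a few lines. The sketch below uses the monthly prices and the rounded token capacities from the table, so at 100% utilisation it reproduces the cost column. The `Setup` type and function names are illustrative.

```python
# Sketch: rank the Qwen 2.5 72B setups by cost per 1M tokens at a chosen
# utilisation, using the monthly prices and token capacities from the table.
from typing import NamedTuple


class Setup(NamedTuple):
    name: str
    monthly_usd: float
    max_tokens_millions: float  # "Max Tokens/Month" column, in millions


QWEN_72B = [
    Setup("1x RTX 5090, INT4 (GPTQ)", 149, 47),
    Setup("2x RTX 5090, INT4", 279, 91),
    Setup("1x RTX 6000 Pro 96 GB, INT8", 299, 73),
    Setup("2x RTX 6000 Pro 96 GB, FP16", 599, 124),
    Setup("2x RTX 6000 Pro 96 GB, INT8", 599, 155),
    Setup("4x RTX 6000 Pro 96 GB, FP16", 899, 246),
]


def cost_per_million(setup: Setup, utilisation: float) -> float:
    """Monthly price divided by the millions of tokens served at this utilisation."""
    return setup.monthly_usd / (setup.max_tokens_millions * utilisation)


def rank(setups: list[Setup], utilisation: float) -> list[tuple[str, float]]:
    """Cheapest per token first; this ignores the size of the monthly bill."""
    priced = [(s.name, cost_per_million(s, utilisation)) for s in setups]
    return sorted(priced, key=lambda pair: pair[1])


for name, cost in rank(QWEN_72B, utilisation=1.0):
    print(f"${cost:.2f}/1M  {name}")
```

Every setup's cost scales by the same 1/utilisation factor, so the order never changes with utilisation. What does change the decision is whether your traffic is large enough to keep a bigger setup busy, and a cost-per-token ranking alone does not capture that.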

Qwen vs Other Models: Cost Comparison

Model (72B class)	Best Self-Hosted Rate (100% utilisation)	MMLU (approx.)
Qwen 2.5 72B	$3.07/1M (2x 5090 INT4)	~84.2
LLaMA 3 70B	$2.68/1M (2x 5090 INT4)	~82.0
Mistral Large 123B	$4.97/1M (4x RTX 6000 Pro INT8)	~84.0
DeepSeek-V2 236B	$3.65/1M (4x RTX 6000 Pro)	~83.5

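Qwen 2.5 72B posts the highest MMLU score in this comparison while remaining cost-competitive with LLaMA 3 70B. Both rates come from the same $279/month 2x RTX 5090 INT4 setup, so Qwen’s roughly 15% higher cost per token reflects lower throughput rather than a bigger bill. For tasks where Qwen excels (Chinese and other multilingual work, coding, mathematics), it is the clear choice. For the full landscape, see our complete cost guide.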

Best Configuration for Your Use Case

Chatbots and lightweight tasks: Qwen 2.5 7B on RTX 3090 ($99/mo, $0.51/1M). See our chatbot cost guide.
Mid-range production: Qwen 2.5 32B on RTX 6000 Pro 96 GB, INT8 ($299/mo, $2.30/1M). Great quality-to-cost ratio.
Premium quality: Qwen 2.5 72B on 2x RTX 6000 Pro 96 GB, INT8 ($599/mo, $3.86/1M). GPT-4o class quality at a fixed monthly price.
High throughput: Qwen 2.5 72B on 4x RTX 6000 Pro 96 GB, FP16 ($899/mo, $3.65/1M). The highest listed throughput (~95 tok/s) and the lowest cost per token of any full-precision 72B setup.
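The picks above are per tier. For a concrete workload, a more useful approach is to size against expected volume. The following rough sketch does this for Qwen 2.5 72B. It keeps expected load below a utilisation cap and chooses the cheapest setup from the 72B table whose capacity clears it. The 80% cap is an assumed headroom figure for traffic peaks, not a number from our benchmarks, and `pick_setup` is a hypothetical helper.

```python
# Sketch: pick the cheapest Qwen 2.5 72B setup that can serve an expected
# monthly token volume while staying under a utilisation cap (peak headroom).
from typing import Optional, Tuple

SETUPS_72B = {
    # name: (monthly USD, max tokens per month in millions), from the 72B table
    "1x RTX 5090, INT4 (GPTQ)": (149, 47),
    "2x RTX 5090, INT4": (279, 91),
    "1x RTX 6000 Pro 96 GB, INT8": (299, 73),
    "2x RTX 6000 Pro 96 GB, FP16": (599, 124),
    "2x RTX 6000 Pro 96 GB, INT8": (599, 155),
    "4x RTX 6000 Pro 96 GB, FP16": (899, 246),
}


def pick_setup(volume_millions: float, max_utilisation: float = 0.8) -> Optional[Tuple[str, float, float]]:
    """Return (name, monthly USD, effective USD per 1M), or None if nothing fits."""
    fitting = [
        (price, -capacity, name)  # sort key: lowest bill, then most capacity
        for name, (price, capacity) in SETUPS_72B.items()
        if volume_millions <= capacity * max_utilisation
    ]
    if not fitting:
        return None
    price, _neg_capacity, name = min(fitting)
    # Volume is fixed, so the lowest monthly bill is also the lowest cost per 1M.
    return name, price, price / volume_millions


for volume in (30, 60, 100, 250):
    choice = pick_setup(volume)
    if choice is None:
        print(f"{volume}M tokens/month: exceeds every listed setup at this cap")
    else:
        name, price, effective = choice
        print(f"{volume}M tokens/month: {name} at ${price}/mo, ~${effective:.2f} per 1M")
```

For example, ~60M tokens a month lands on 2x RTX 5090 at about $4.65 per 1M, while ~100M tokens needs 2x RTX 6000 Pro 96 GB INT8 at about $5.99 per 1M. Both are above the headline 100% rates because the hardware is deliberately not run flat out.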

Once you have chosen a GPU, follow our self-host LLM guide for deployment. For smaller models, see our Phi-3 cost per GPU breakdown.

Host Qwen on Dedicated Hardware

From $0.50 per 1M tokens. UK-hosted, GDPR compliant, unlimited inference.

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
