DeepSeek Model Overview
DeepSeek’s Mixture of Experts (MoE) architecture makes their models unusually efficient on GPU hardware. With 236B total parameters but only 21B active during inference, DeepSeek-V2 delivers large-model quality at small-model speed. Here is what it actually costs per million tokens across every GPU server configuration available at GigaGPU.
DeepSeek’s own API is already cheap at $0.20 per 1M tokens (blended), so does self-hosting on a dedicated DeepSeek server ever beat their pricing? Let us look at the numbers.
DeepSeek-V2 Lite (16B): Cost per GPU
| GPU | Monthly Cost | Throughput (tok/s) | Max Tokens/Month | Cost/1M (50% util) | Cost/1M (100% util) |
|---|---|---|---|---|---|
| RTX 3090 24 GB | $99 | ~65 | ~168M | $1.18 | $0.59 |
| RTX 5090 32 GB | $149 | ~95 | ~246M | $1.21 | $0.61 |
| RTX 6000 Pro | $249 | ~110 | ~285M | $1.75 | $0.87 |
| RTX 6000 Pro 96 GB | $299 | ~120 | ~311M | $1.92 | $0.96 |
The RTX 3090 delivers the best cost efficiency for DeepSeek-V2 Lite at $0.59 per 1M tokens at full utilisation. For most use cases, this model provides excellent bang for your buck. See our cheapest GPU for AI inference guide for more budget options.
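If you want to reproduce the figures in these tables, or plug in throughput you have measured yourself, the arithmetic is simple. The sketch below assumes a 30-day month and a sustained generation rate; real throughput varies with batch size, context length and quantisation.

```python
def cost_per_million(monthly_cost: float, tok_per_s: float, utilisation: float = 1.0) -> float:
    """Cost in $ per 1M generated tokens for a GPU server on a flat monthly price.

    monthly_cost  -- server rental, $/month
    tok_per_s     -- sustained generation throughput, tokens/second
    utilisation   -- fraction of the month the server is actually generating (0-1)
    """
    seconds_per_month = 30 * 24 * 3600          # 30-day month, as assumed in the tables
    tokens_per_month = tok_per_s * seconds_per_month * utilisation
    return monthly_cost / tokens_per_month * 1_000_000

# DeepSeek-V2 Lite on an RTX 3090: $99/month, ~65 tok/s
print(round(cost_per_million(99, 65, 1.0), 2))   # ~0.59 $/1M at full utilisation
print(round(cost_per_million(99, 65, 0.5), 2))   # ~1.18 $/1M at 50% utilisation
```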
DeepSeek-V2 236B (MoE): Cost per GPU
| GPU Setup | Monthly Cost | Throughput (tok/s) | Max Tokens/Month | Cost/1M (50% util) | Cost/1M (100% util) |
|---|---|---|---|---|---|
| 2x RTX 6000 Pro 96 GB | $599 | ~45 | ~117M | $10.24 | $5.12 |
| 4x RTX 6000 Pro 96 GB | $899 | ~95 | ~246M | $7.31 | $3.65 |
| 8x RTX 6000 Pro 96 GB | $1,599 | ~160 | ~414M | $7.72 | $3.86 |
Despite the model’s 236B total parameters, the MoE architecture keeps inference efficient: only around 21B parameters are active per token. The 4x RTX 6000 Pro setup at $3.65 per 1M tokens offers the best throughput-to-cost ratio for production DeepSeek-V2 workloads. Deploy via vLLM for maximum batch efficiency.
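As a rough sketch of what a multi-GPU vLLM deployment looks like using its offline Python API: the model ID, parallelism degree and context cap below are illustrative for a 4x RTX 6000 Pro 96 GB node, and whether the weights fit depends on the precision and quantisation you load them in, so check against your actual hardware.

```python
from vllm import LLM, SamplingParams

# Illustrative settings for a 4-GPU node; adjust to your hardware and quantisation.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2",   # 236B MoE checkpoint on Hugging Face
    tensor_parallel_size=4,            # shard the weights across 4 GPUs
    trust_remote_code=True,            # DeepSeek-V2 ships custom modelling code
    max_model_len=8192,                # cap context to keep KV-cache memory in check
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```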
DeepSeek Coder V2: Cost per GPU
| GPU Setup | Monthly Cost | Throughput (tok/s) | Max Tokens/Month | Cost/1M (50% util) | Cost/1M (100% util) |
|---|---|---|---|---|---|
| 1x RTX 5090 (Lite 16B) | $149 | ~90 | ~233M | $1.28 | $0.64 |
| 2x RTX 6000 Pro 96 GB (236B) | $599 | ~45 | ~117M | $10.24 | $5.12 |
| 4x RTX 6000 Pro 96 GB (236B) | $899 | ~90 | ~233M | $7.72 | $3.86 |
DeepSeek Coder V2 is a top choice for AI coding assistant workloads. The Lite variant on a single RTX 5090 handles most coding tasks efficiently. See our cost to run an AI coding assistant guide for workload-specific recommendations.
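When run as a server, vLLM exposes an OpenAI-compatible endpoint, so most coding assistants and SDKs can point at a self-hosted DeepSeek Coder instance directly. A minimal sketch using the official openai Python client; the hostname and served model name are placeholders for your own deployment.

```python
from openai import OpenAI

# Point the standard OpenAI client at the self-hosted vLLM endpoint.
client = OpenAI(
    base_url="http://your-server:8000/v1",  # placeholder: your GPU server's address
    api_key="unused",                        # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",  # served model name
    messages=[{"role": "user", "content": "Write a Python function that parses ISO 8601 dates."}],
    max_tokens=300,
)
print(response.choices[0].message.content)
```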
Self-Hosted vs DeepSeek API
| Option | Cost per 1M Tokens | Break-Even Volume |
|---|---|---|
| DeepSeek API | $0.20 (blended) | N/A (baseline) |
| DeepSeek-V2 Lite (RTX 3090) | $0.59 | API cheaper at all volumes |
| DeepSeek-V2 Lite (RTX 5090) | $0.61 | API cheaper at all volumes |
| DeepSeek-V2 236B (4x RTX 6000 Pro) | $3.65 | API far cheaper |
On pure cost, DeepSeek’s API is hard to beat for their own models. Self-hosting makes sense for three specific reasons:
- Data privacy: DeepSeek routes data through Chinese servers. For UK businesses requiring GDPR compliance, private GPU hosting is the only option.
- Reliability: the public API is subject to outages and rate limits; self-hosted capacity is yours alone.
- Customisation: fine-tuning and custom configurations are only possible when you host the model yourself.
If you are comparing DeepSeek against other APIs, the savings picture changes. Self-hosted DeepSeek-V2 Lite at $0.59/1M is far cheaper than GPT-4o at $5.50/1M or Claude at $7.80/1M. For full cross-provider analysis, see the complete cost guide.
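For a quick break-even check against any per-token API, divide the monthly server cost by the API’s price per 1M tokens: that is the monthly volume above which the dedicated server wins. A small sketch using the figures quoted above:

```python
def break_even_tokens_per_month(monthly_server_cost: float, api_price_per_1m: float) -> float:
    """Monthly token volume above which a flat-rate server beats a per-token API."""
    return monthly_server_cost / api_price_per_1m * 1_000_000

# RTX 3090 at $99/month vs GPT-4o at $5.50 per 1M tokens (blended figure used above)
print(f"{break_even_tokens_per_month(99, 5.50) / 1e6:.0f}M tokens/month")   # ~18M
# Same server vs DeepSeek's own API at $0.20 per 1M tokens
print(f"{break_even_tokens_per_month(99, 0.20) / 1e6:.0f}M tokens/month")   # ~495M -- above the 3090's ~168M ceiling
```

The second figure is why the table above shows DeepSeek’s API staying cheaper at all volumes a single RTX 3090 can actually serve, while the first shows how quickly self-hosting pays off against pricier frontier APIs.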
Optimal Configuration Guide
- Best value (small model): DeepSeek-V2 Lite on RTX 3090 — $0.59/1M tokens, $99/month
- Best for coding: DeepSeek Coder Lite on RTX 5090 — $0.64/1M tokens, $149/month
- Best quality: DeepSeek-V2 236B on 4x RTX 6000 Pro — $3.65/1M tokens, $899/month
- Best for GDPR: Any self-hosted config on UK-based servers
Compare DeepSeek costs against other models: LLaMA 3, Mistral, Qwen, and Phi-3. Use our cost per million tokens calculator for precise comparisons, and check the full DeepSeek vs API analysis for break-even details.
Host DeepSeek on Dedicated Hardware
Full data privacy, UK hosting, GDPR compliant. Deploy in under an hour.
Browse GPU Servers