Can RTX 3090 Run DeepSeek V3?
No, the RTX 3090 cannot run the full DeepSeek V3 model. DeepSeek V3 is a 671 billion parameter Mixture-of-Experts (MoE) model that requires a minimum of ~350 GB VRAM at 4-bit quantization. The RTX 3090 has 24 GB, which is nowhere near sufficient. However, you can run smaller DeepSeek models on a dedicated GPU server with an RTX 3090.
DeepSeek V3 uses a MoE architecture with 256 routed experts per MoE layer, of which only a small subset activates per token (roughly 37B active parameters). This sparsity cuts per-token compute, but it does not reduce memory: the router can select any expert, so all 671B parameters must be resident. The model simply will not fit on consumer GPUs.
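The distinction can be sketched in a few lines of Python. The parameter counts are the article's; the only assumption is 2 bytes per parameter at FP16:

```python
# MoE routing cuts per-token compute, not resident memory:
# every expert must be loaded, even though few fire per token.
TOTAL_PARAMS = 671e9   # all experts (must be in memory)
ACTIVE_PARAMS = 37e9   # touched per token
BYTES_FP16 = 2         # bytes per parameter at FP16

weights_gb = TOTAL_PARAMS * BYTES_FP16 / 1e9
active_gb = ACTIVE_PARAMS * BYTES_FP16 / 1e9

print(f"Resident weights: ~{weights_gb:.0f} GB")  # ~1342 GB
print(f"Active per token: ~{active_gb:.0f} GB")   # ~74 GB
```

So even though each token only "uses" ~74 GB worth of weights at FP16, the full ~1,342 GB must sit in VRAM.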
VRAM Analysis: 671B MoE on 24 GB
Here is the stark reality of DeepSeek V3’s VRAM requirements:
| Precision | Weight VRAM | KV Cache (2K ctx) | Total | RTX 3090 (24 GB) |
|---|---|---|---|---|
| FP16 | ~1,342 GB | ~5 GB | ~1,347 GB | No (56x too large) |
| FP8 | ~671 GB | ~5 GB | ~676 GB | No (28x too large) |
| INT8 | ~671 GB | ~5 GB | ~676 GB | No (28x too large) |
| INT4 / GPTQ 4-bit | ~350 GB | ~3 GB | ~353 GB | No (15x too large) |
Even with the most aggressive quantization, DeepSeek V3 needs over 300 GB of VRAM. This is a model designed for data center deployment with multiple high-end GPUs. For the full breakdown of all DeepSeek variants, see our DeepSeek VRAM requirements guide.
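The weight column of the table above is just parameters times bytes-per-parameter. A minimal sketch (raw INT4 math gives ~336 GB; the table's ~350 GB includes quantization overhead such as group scales):

```python
# Rough weight-VRAM estimate: parameters x bytes-per-parameter.
# KV cache and runtime overhead excluded.
PARAMS = 671e9
bytes_per_param = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}

for precision, b in bytes_per_param.items():
    print(f"{precision}: ~{PARAMS * b / 1e9:,.0f} GB")
# FP16: ~1,342 GB / FP8: ~671 GB / INT8: ~671 GB / INT4: ~336 GB
```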
What DeepSeek Models Fit on RTX 3090?
While V3 is out of reach, the DeepSeek family includes smaller models that work well on 24 GB:
| Model | Parameters | FP16 VRAM | 4-bit VRAM | Fits RTX 3090? |
|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~3 GB | ~1.5 GB | Yes (FP16) |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~14 GB | ~5 GB | Yes (FP16) |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~28 GB | ~9 GB | Yes (4-bit) |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~64 GB | ~20 GB | Yes (4-bit, tight) |
| DeepSeek-R1-Distill-LLaMA-70B | 70B | ~140 GB | ~38 GB | No |
| DeepSeek V3 (full) | 671B MoE | ~1,342 GB | ~350 GB | No |
The sweet spot for the RTX 3090 is the DeepSeek-R1-Distill-Qwen-14B at 4-bit quantization or the 7B distillation at FP16. These distilled models retain much of R1’s reasoning capability. Learn more on our DeepSeek hosting page.
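A quick fit check reproduces the table's verdicts. This is a sketch, not a deployment tool: the 3 GB KV-cache headroom and the ~0.65 bytes/parameter effective 4-bit footprint (weights plus quantization scales) are assumptions, not measured values:

```python
def fits_on_gpu(params_b, bytes_per_param, vram_gb=24, kv_headroom_gb=3):
    """Rough check: do the weights plus KV-cache headroom fit in VRAM?"""
    weight_gb = params_b * bytes_per_param
    return weight_gb + kv_headroom_gb <= vram_gb

print(fits_on_gpu(7, 2.0))    # 7B FP16: 14 + 3 = 17 GB  -> True
print(fits_on_gpu(14, 2.0))   # 14B FP16: 28 + 3 = 31 GB -> False
print(fits_on_gpu(14, 0.65))  # 14B 4-bit: ~9 + 3 GB     -> True
print(fits_on_gpu(32, 0.65))  # 32B 4-bit: ~21 + 3 GB    -> True, tight
```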
Performance for Models That Fit
Here are benchmarks for DeepSeek models that actually run on the RTX 3090:
| Model | Precision | Gen Speed (tok/s) | Context | Quality |
|---|---|---|---|---|
| R1-Distill-7B | FP16 | ~40-45 | 4096 | Good |
| R1-Distill-7B | INT8 | ~50-55 | 4096 | Good |
| R1-Distill-14B | 4-bit | ~25-30 | 4096 | Very good |
| R1-Distill-32B | 4-bit | ~12-15 | 2048 | Excellent |
The 32B distill at 4-bit is particularly impressive on the 3090, delivering strong reasoning at usable speeds. Compare these against other models using our tokens per second benchmark tool.
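These numbers make sense against a simple bandwidth model: single-stream decode is memory-bound, so each generated token must stream every resident weight byte once. A rough ceiling, assuming the RTX 3090's ~936 GB/s memory bandwidth (real throughput lands below this due to kernel launch, attention, and sampling overhead):

```python
# Bandwidth-bound upper limit on decode speed:
# tok/s <= memory bandwidth / model size in bytes.
BANDWIDTH_GBPS = 936  # RTX 3090 memory bandwidth, GB/s

def ceiling_tok_s(model_gb):
    return BANDWIDTH_GBPS / model_gb

print(f"7B FP16 (~14 GB):   <= {ceiling_tok_s(14):.0f} tok/s")  # ~67
print(f"14B 4-bit (~9 GB):  <= {ceiling_tok_s(9):.0f} tok/s")   # ~104
print(f"32B 4-bit (~20 GB): <= {ceiling_tok_s(20):.0f} tok/s")  # ~47
```

The benchmark figures in the table sit at roughly half the theoretical ceiling, which is typical for real inference stacks.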
Multi-GPU Options for Full V3
If you need the full DeepSeek V3, here is what it takes:
- 8x RTX 6000 Pro 96 GB (FP8): 768 GB total. Fits the ~676 GB FP8 model with room for KV cache. This is the standard deployment configuration.
- 4x RTX 6000 Pro 96 GB (4-bit): 384 GB total. Marginal fit with aggressive quantization and little headroom for KV cache.
- 16x RTX 3090 (4-bit): 384 GB total. Theoretically possible but impractical due to interconnect limitations.
Consumer GPUs are not designed for models of this scale. For production V3 deployment, contact us about multi-GPU cluster options.
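The card counts above follow from a simple calculation. A sketch, assuming ~10% of each GPU is reserved for KV cache and activations (drop that reservation and you get the marginal 4-way and 16-way configs listed above):

```python
import math

def gpus_needed(model_vram_gb, per_gpu_gb, utilization=0.9):
    """Minimum card count, reserving ~10% per GPU for KV cache/activations."""
    return math.ceil(model_vram_gb / (per_gpu_gb * utilization))

print(gpus_needed(676, 96))  # FP8 V3 on 96 GB cards   -> 8
print(gpus_needed(353, 96))  # 4-bit V3 on 96 GB cards -> 5
print(gpus_needed(353, 24))  # 4-bit V3 on RTX 3090s   -> 17
```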
Setup Commands
Running the DeepSeek distilled models on an RTX 3090:
Ollama (Quickest)
```bash
# Install Ollama, then pull and run a DeepSeek R1 distillation
curl -fsSL https://ollama.com/install.sh | sh
ollama run deepseek-r1:14b
```
vLLM (Production API)
```bash
# Serve DeepSeek-R1-Distill-Qwen-14B with vLLM
# Note: --quantization awq expects an AWQ-quantized checkpoint; point the
# model ID at an AWQ build of the 14B distill (the FP16 weights alone are
# ~28 GB and will not fit in 24 GB).
pip install vllm
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --quantization awq --max-model-len 4096 \
  --gpu-memory-utilization 0.90
```
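Once the server is up, vLLM exposes an OpenAI-compatible HTTP API. A minimal stdlib-only client sketch, assuming the default address `http://localhost:8000` (the prompt and `max_tokens` value are illustrative):

```python
import json
import urllib.request

# vLLM's OpenAI-compatible completions endpoint (default port 8000).
url = "http://localhost:8000/v1/completions"
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "prompt": "Explain the KV cache in one sentence.",
    "max_tokens": 64,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.loads(resp.read())["choices"][0]["text"])
except OSError:
    print("server not reachable (start `vllm serve ...` first)")
```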
For full deployment instructions, see our deploy DeepSeek server guide and vLLM hosting page. If you want to explore other models that fit on 24 GB, our best GPU for LLM inference guide covers the options.
For a cost comparison of self-hosting versus using the DeepSeek API, check our cost per 1M tokens: GPU vs API analysis and the LLM cost calculator.
Deploy This Model Now
Dedicated GPU servers with the VRAM you need. UK datacenter, full root access.
Browse GPU Servers