Yes, the RTX 5080 can run DeepSeek R1 distilled models very well. With 16GB GDDR7 VRAM, the RTX 5080 handles the 7B and 8B distilled variants of DeepSeek R1 in FP16, and can run the 14B model in INT8 or INT4 quantisation. The full 671B DeepSeek R1 MoE model requires far more VRAM and will not fit on a single 5080.
## The Short Answer
YES for distilled models up to 14B in INT4/INT8. NO for the full 671B MoE model.
DeepSeek R1 comes in several sizes. The distilled 7B variant (based on Qwen 2.5) requires roughly 14GB in FP16, fitting within the RTX 5080’s 16GB. The 14B distill needs about 28GB in FP16, so you must quantise to INT4 (~8.5GB) or INT8 (~15GB). The full 671B model is a mixture-of-experts architecture that requires hundreds of gigabytes even quantised. For a detailed breakdown, see our DeepSeek VRAM requirements guide.
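These figures follow a simple rule of thumb: weight memory in GB ≈ parameters in billions × bits per weight ÷ 8, with KV cache and runtime overhead on top. A minimal sketch (the helper name and per-model comments are illustrative):

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for the weights alone; KV cache and runtime overhead come on top."""
    return params_billions * bits_per_weight / 8

print(weight_memory_gb(7, 16))  # 7B FP16  -> 14.0
print(weight_memory_gb(14, 4))  # 14B INT4 -> 7.0
print(weight_memory_gb(14, 8))  # 14B INT8 -> 14.0
```

Real quantised files run slightly larger than the raw arithmetic suggests, because formats like q4_K_M mix precisions and carry metadata, which is why the table below lists ~8.5GB rather than 7GB for the 14B INT4 build.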
## VRAM Analysis
| DeepSeek Variant | Precision | VRAM Required | RTX 5080 (16GB) |
|---|---|---|---|
| R1 Distill 7B | FP16 | ~14GB | Fits |
| R1 Distill 7B | INT4 | ~5GB | Fits easily |
| R1 Distill 14B | INT8 | ~15GB | Tight fit |
| R1 Distill 14B | INT4 | ~8.5GB | Fits well |
| R1 Distill 32B | INT4 | ~20GB | No |
| R1 Full 671B MoE | INT4 | ~180GB+ | No |
The sweet spot on the RTX 5080 is the 7B distill in FP16 or the 14B distill in INT4. Both configurations leave enough headroom for KV cache and comfortable context lengths up to 8192 tokens.
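That KV cache headroom can itself be estimated: each token stores one key and one value tensor per layer. A sketch assuming the Qwen 2.5 7B geometry (28 layers, 4 KV heads of dimension 128, FP16 cache — figures worth checking against the model’s config file):

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_value: int = 2) -> float:
    """KV cache size: two tensors (K and V) per layer, per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token_bytes / 1e9

# Full 8192-token context on the 7B distill: well under 1GB
print(round(kv_cache_gb(8192, 28, 4, 128), 2))
```

Grouped-query attention (4 KV heads rather than one per query head) is what keeps this figure so small; an older dense-attention 7B model would need several times more cache for the same context.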
## Performance Benchmarks
| Configuration | RTX 5080 (tok/s) | RTX 3090 (tok/s) | RTX 4060 Ti (tok/s) |
|---|---|---|---|
| R1 Distill 7B FP16 | ~72 | ~52 | ~38 |
| R1 Distill 7B INT4 | ~95 | ~68 | ~55 |
| R1 Distill 14B INT4 | ~48 | ~35 | ~25 |
| R1 Distill 14B INT8 | ~35 | N/A | N/A (OOM) |
The RTX 5080’s GDDR7 memory bandwidth and newer architecture give it a clear lead over the older RTX 3090 at the same model sizes. At roughly 72 tokens per second for the 7B FP16 variant, the 5080 is fast enough for real-time chat applications. Compare these numbers across more GPUs on our tokens per second benchmark page.
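Single-stream decode speed is ultimately bounded by memory bandwidth: generating one token streams the entire weight set from VRAM once, so bandwidth ÷ weight size gives a hard ceiling. A sketch taking the RTX 5080’s bandwidth as roughly 960 GB/s (an assumed figure — check the spec sheet):

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on tok/s: each generated token reads every weight once."""
    return bandwidth_gb_s / weight_gb

# 14B INT4 (~8.5GB of weights): ceiling ~113 tok/s vs ~48 measured
print(round(decode_ceiling_tok_s(960, 8.5)))
```

Measured throughput sits well below this ceiling because dequantisation and attention compute also cost time; treat the ceiling as a sanity check on benchmark claims, not a prediction.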
## Setup Guide
Ollama is the fastest way to get DeepSeek running on your RTX 5080:
```bash
# Install and run DeepSeek R1 Distill 7B
ollama run deepseek-r1:7b

# For the 14B variant in INT4
ollama run deepseek-r1:14b-q4_K_M
```
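Ollama also exposes a local REST API (port 11434 by default), so the model can be driven programmatically once it is pulled. A dependency-free sketch against the `/api/generate` endpoint; the helper names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "deepseek-r1:7b") -> bytes:
    # stream=False returns the whole completion as a single JSON object
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt: str, model: str = "deepseek-r1:7b") -> str:
    req = urllib.request.Request(OLLAMA_URL, data=build_payload(prompt, model),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# print(generate("Why is the sky blue?"))  # requires the Ollama server running
```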
For production deployments with an OpenAI-compatible API, use vLLM:
```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --host 0.0.0.0 --port 8000
```
Set `--gpu-memory-utilization 0.90` to reserve VRAM for the KV cache while keeping the model fully loaded.
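Because the endpoint is OpenAI-compatible, any OpenAI client library can point at it. A standard-library-only sketch against the `/v1/chat/completions` route (model name and port match the serve command; `max_tokens` here is an arbitrary choice):

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_payload(user_msg: str,
                       model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B") -> bytes:
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 512,
    }).encode()

def chat(user_msg: str) -> str:
    req = urllib.request.Request(VLLM_URL, data=build_chat_payload(user_msg),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# print(chat("Summarise the KV cache in one sentence."))  # requires the vLLM server
```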
## Recommended Alternative
If you need the 14B distill in FP16 or the 32B variant, the RTX 3090 with 24GB handles the 14B in FP16 and the 32B in INT4. For the full DeepSeek R1 671B model, you need a multi-GPU setup far beyond any single consumer card.
For other models on the 5080, see whether it can run Mistral 7B in FP16 or Stable Diffusion XL. Explore multi-model setups with the RTX 5080 Whisper + LLM guide. Browse all comparisons in the GPU Comparisons category or find the right server on our dedicated GPU hosting page.
## Deploy This Model Now
Dedicated GPU servers with the VRAM you need. UK datacenter, full root access.
Browse GPU Servers