
Can RTX 5080 Run DeepSeek?


Yes, the RTX 5080 can run DeepSeek R1 distilled models very well. With 16GB GDDR7 VRAM, the RTX 5080 handles the 7B and 8B distilled variants of DeepSeek R1 in FP16, and can run the 14B model in INT8 or INT4 quantisation. The full 671B DeepSeek R1 MoE model requires far more VRAM and will not fit on a single 5080.

The Short Answer

YES for distilled models up to 14B in INT4/INT8. NO for the full 671B MoE model.

DeepSeek R1 comes in several sizes. The distilled 7B variant (based on Qwen 2.5) requires roughly 14GB in FP16, fitting within the RTX 5080's 16GB. The 14B distill needs about 28GB in FP16, so you must quantise to INT4 (~8.5GB) or INT8 (~15GB). The full 671B model is a mixture-of-experts architecture that requires hundreds of gigabytes even quantised. For a detailed breakdown, see our DeepSeek VRAM requirements guide.
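These figures follow a simple rule of thumb: weight memory is parameter count times bytes per weight. A minimal sketch of that arithmetic (our own illustration, not a tool from this guide; real quantised formats such as q4_K_M store scale metadata and keep some layers at higher precision, so actual usage lands a little above this lower bound):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only lower bound on VRAM, in decimal GB (1e9 bytes).

    1 billion params at 1 byte each is exactly 1GB, so:
    GB = billions_of_params * (bits_per_weight / 8)
    KV cache and runtime buffers come on top of this.
    """
    return params_billion * bits_per_weight / 8

print(estimate_vram_gb(7, 16))   # 14.0 -> matches the ~14GB FP16 row above
print(estimate_vram_gb(14, 16))  # 28.0 -> why 14B FP16 cannot fit in 16GB
print(estimate_vram_gb(14, 4))   # 7.0  -> quant overhead pushes this to ~8.5GB in practice
```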

VRAM Analysis

| DeepSeek Variant | Precision | VRAM Required | RTX 5080 (16GB) |
| --- | --- | --- | --- |
| R1 Distill 7B | FP16 | ~14GB | Fits |
| R1 Distill 7B | INT4 | ~5GB | Fits easily |
| R1 Distill 14B | INT8 | ~15GB | Tight fit |
| R1 Distill 14B | INT4 | ~8.5GB | Fits well |
| R1 Distill 32B | INT4 | ~20GB | No |
| R1 Full 671B MoE | INT4 | ~180GB+ | No |

The sweet spot on the RTX 5080 is the 7B distill in FP16 or the 14B distill in INT4. Both configurations leave enough headroom for KV cache and comfortable context lengths up to 8192 tokens.
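KV cache growth is what consumes that headroom as context length increases. A rough sizing sketch, using illustrative Qwen2.5-7B-style shape parameters (28 layers, 4 grouped-query KV heads, head dim 128; these are our assumptions, so check the model's config file for exact values):

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> float:
    """KV cache size in decimal GB: one K and one V tensor per layer,
    each of shape [batch, kv_heads, seq_len, head_dim]."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch / 1e9

# Assumed Qwen2.5-7B-style shape at an 8192-token context, FP16 cache
print(round(kv_cache_gb(28, 4, 128, 8192), 2))  # ~0.47 GB
```

Grouped-query attention keeps this small; a model with full multi-head KV (28 heads instead of 4) would need roughly 7x more cache per token of context.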

Performance Benchmarks

| Configuration | RTX 5080 (tok/s) | RTX 3090 (tok/s) | RTX 4060 Ti (tok/s) |
| --- | --- | --- | --- |
| R1 Distill 7B FP16 | ~72 | ~52 | ~38 |
| R1 Distill 7B INT4 | ~95 | ~68 | ~55 |
| R1 Distill 14B INT4 | ~48 | ~35 | ~25 |
| R1 Distill 14B INT8 | ~35 | N/A (OOM) | N/A (OOM) |

The RTX 5080’s GDDR7 memory bandwidth gives it a clear lead over the older RTX 3090 for the same model sizes. At 72 tokens per second for the 7B FP16 variant, the 5080 is fast enough for real-time chat applications. Compare these numbers across more GPUs on our tokens per second benchmark page.
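To translate tokens per second into perceived responsiveness, divide the reply length by the generation rate (a simplification that ignores prompt-processing time before the first token appears):

```python
def response_seconds(reply_tokens: int, tok_per_s: float) -> float:
    """Approximate wall-clock time to stream a full reply."""
    return reply_tokens / tok_per_s

# A typical 300-token chat reply at the 5080's ~72 tok/s (7B FP16)
print(round(response_seconds(300, 72), 1))  # ~4.2 s
```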

Setup Guide

Ollama is the fastest way to get DeepSeek running on your RTX 5080:

# Install and run DeepSeek R1 Distill 7B
ollama run deepseek-r1:7b

# For the 14B variant in INT4
ollama run deepseek-r1:14b-q4_K_M

For production deployments with an OpenAI-compatible API, use vLLM:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --host 0.0.0.0 --port 8000

The --gpu-memory-utilization 0.90 flag caps vLLM's total allocation, model weights plus KV cache, at 90% of VRAM, leaving headroom for CUDA context overhead and anything else running on the card.
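Once the server is up, any OpenAI-compatible client can reach it at the host and port set above. A minimal standard-library sketch (the model name matches the `vllm serve` argument; the request is built but not sent here, since sending assumes a running server):

```python
import json
import urllib.request

# vLLM exposes the standard OpenAI chat-completions route at /v1/chat/completions.
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "messages": [{"role": "user", "content": "Explain KV cache in one sentence."}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
# With the server running, send it like this:
#   reply = json.load(urllib.request.urlopen(req))
#   print(reply["choices"][0]["message"]["content"])
```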

If you need the 14B distill in FP16 or the 32B variant, the RTX 3090 with 24GB handles the 14B in FP16 and the 32B in INT4. For the full DeepSeek R1 671B model, you need multi-GPU setups well beyond a single card.

For other models on the 5080, see whether it can run Mistral 7B in FP16 or Stable Diffusion XL. Explore multi-model setups with the RTX 5080 Whisper + LLM guide. Browse all comparisons in the GPU Comparisons category or find the right server on our dedicated GPU hosting page.

Deploy This Model Now

Dedicated GPU servers with the VRAM you need. UK datacenter, full root access.

Browse GPU Servers



We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
