DeepSeek VRAM Requirements Overview
DeepSeek offers a wide range of models from 1.5B distillations to the massive 671B V3 MoE. The VRAM you need depends entirely on which model you are running and at what precision. This guide covers every variant to help you choose the right dedicated GPU server for your DeepSeek deployment.
The key thing to understand about the full DeepSeek V3 and R1 models is that they use a Mixture-of-Experts (MoE) architecture with 671B total parameters but only ~37B active per token. That sparsity saves compute, not memory: all 671B parameters must still be resident in VRAM.
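The table figures below follow directly from parameter count times bytes per weight. A minimal sketch of that arithmetic (the helper name and the omitted runtime overhead are illustrative, not measured values):

```python
# Rough weight-memory estimate: parameter count x bytes per weight.
# Back-of-envelope figures only; real allocators add runtime buffers on top.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_vram_gb(params_billions: float, precision: str) -> float:
    """Approximate decimal GB of VRAM needed just to hold the weights."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9

# All 671B MoE parameters must be resident, even though only ~37B are active per token.
print(f"DeepSeek V3/R1 @ FP16: ~{weight_vram_gb(671, 'fp16'):.0f} GB")  # ~1342 GB
print(f"R1-Distill-14B @ FP16: ~{weight_vram_gb(14, 'fp16'):.0f} GB")   # ~28 GB
```

This is why MoE efficiency doesn't help with GPU selection: the per-token compute is that of a ~37B model, but the memory bill is for all 671B weights.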
Complete VRAM Table (All Models)
DeepSeek R1 Distillations
| Model | Parameters | FP32 | FP16 | INT8 | INT4 |
|---|---|---|---|---|---|
| R1-Distill-Qwen-1.5B | 1.5B | ~6 GB | ~3 GB | ~1.6 GB | ~1 GB |
| R1-Distill-Qwen-7B | 7B | ~28 GB | ~14 GB | ~7 GB | ~4.5 GB |
| R1-Distill-Qwen-14B | 14B | ~56 GB | ~28 GB | ~14 GB | ~9 GB |
| R1-Distill-Qwen-32B | 32B | ~128 GB | ~64 GB | ~32 GB | ~20 GB |
| R1-Distill-LLaMA-8B | 8B | ~32 GB | ~16 GB | ~8.5 GB | ~5.5 GB |
| R1-Distill-LLaMA-70B | 70B | ~280 GB | ~140 GB | ~70 GB | ~38 GB |
DeepSeek Full Models
| Model | Parameters | FP32 | FP16 | INT8/FP8 | INT4 |
|---|---|---|---|---|---|
| DeepSeek V2 Lite | 16B MoE | ~64 GB | ~32 GB | ~16 GB | ~10 GB |
| DeepSeek V2 | 236B MoE | ~944 GB | ~472 GB | ~236 GB | ~125 GB |
| DeepSeek V3 | 671B MoE | ~2,684 GB | ~1,342 GB | ~671 GB | ~350 GB |
| DeepSeek R1 (full) | 671B MoE | ~2,684 GB | ~1,342 GB | ~671 GB | ~350 GB |
| DeepSeek Coder V2 Lite | 16B MoE | ~64 GB | ~32 GB | ~16 GB | ~10 GB |
| DeepSeek Coder V2 | 236B MoE | ~944 GB | ~472 GB | ~236 GB | ~125 GB |
Note: FP32 is shown for reference but never used in practice for inference. FP16 is the standard full-precision inference format. For related model comparisons, see our LLaMA 3 VRAM requirements and Qwen VRAM requirements pages.
Which GPU Do You Need?
| GPU | VRAM | Best DeepSeek Model | Precision | Use Case |
|---|---|---|---|---|
| RTX 3050 | 8 GB | R1-Distill-7B | 4-bit | Dev / testing |
| RTX 4060 | 8 GB | R1-Distill-7B | 4-bit | Dev / light API |
| RTX 4060 Ti | 16 GB | R1-Distill-7B / 14B | FP16 / 4-bit | Small production |
| RTX 3090 | 24 GB | R1-Distill-14B / 32B | FP16 / 4-bit | Production |
| 2x RTX 3090 | 48 GB | R1-Distill-32B | FP16 | High quality |
| 8x RTX 6000 Pro 96 GB | 768 GB | DeepSeek V3 / R1 | FP8 | Full model |
For most users, the R1 distillations are the practical choice. The 14B and 32B distillations retain strong reasoning capability from the full R1 model at a fraction of the VRAM cost. See our RTX 3090 DeepSeek V3 analysis for the full model discussion.
Context Length Impact on VRAM
DeepSeek models support long context windows, but longer context means more KV cache VRAM:
| Model | Context Length | KV Cache (FP16) | Total VRAM (FP16 weights) |
|---|---|---|---|
| R1-Distill-7B | 4,096 | ~0.5 GB | ~14.5 GB |
| R1-Distill-7B | 16,384 | ~2 GB | ~16 GB |
| R1-Distill-7B | 32,768 | ~4 GB | ~18 GB |
| R1-Distill-14B | 4,096 | ~1 GB | ~29 GB |
| R1-Distill-14B | 16,384 | ~4 GB | ~32 GB |
| R1-Distill-32B | 4,096 | ~2 GB | ~66 GB |
| R1-Distill-32B | 16,384 | ~8 GB | ~72 GB |
For DeepSeek R1’s chain-of-thought reasoning, longer context is often needed since the model generates lengthy reasoning chains. Budget extra VRAM for this. Use our LLM cost calculator to estimate costs at your target context length.
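The KV cache grows linearly with context length, which is why the totals in the table climb with the window. A minimal sizing sketch; the formula is standard, but the layer/head/dim numbers below are assumed for illustration, and models with grouped-query attention cache far less than the conservative table budgets above:

```python
# Per-request KV-cache size: 2 tensors (K and V) per layer, each holding
# kv_heads x head_dim values per token, at the cache precision.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate GiB of KV cache for one request at the given context length."""
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 2**30

# Example: a 7B-class model with grouped-query attention (assumed shape)
print(f"4K context:  {kv_cache_gib(28, 4, 128, 4096):.2f} GiB")
print(f"32K context: {kv_cache_gib(28, 4, 128, 32768):.2f} GiB")
```

The linear dependence on `seq_len` is the key takeaway: doubling the context window doubles the per-request cache, so long reasoning chains are paid for in VRAM, not just latency.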
Batch Size Impact on VRAM
Serving multiple concurrent requests multiplies KV cache usage:
| Model (4-bit) | Batch 1 | Batch 4 | Batch 8 | Batch 16 |
|---|---|---|---|---|
| R1-Distill-7B (4K ctx) | ~5 GB | ~7 GB | ~9 GB | ~13 GB |
| R1-Distill-14B (4K ctx) | ~10 GB | ~14 GB | ~18 GB | ~26 GB |
| R1-Distill-32B (4K ctx) | ~22 GB | ~30 GB | ~38 GB | ~54 GB |
For production APIs serving multiple users, the KV cache quickly becomes the dominant VRAM consumer. Plan your GPU choice around peak concurrent requests, not just single-request VRAM.
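Capacity planning then reduces to: pay for the weights once, and for one KV cache per concurrent request. A rough sketch of that budget; the 9 GB weight figure, 1 GB-per-request cache, and 1 GB overhead are assumed round numbers, not measurements:

```python
# Peak-VRAM planning: weights are paid once; KV cache scales with
# concurrent requests. All inputs here are illustrative assumptions.

def peak_vram_gb(weights_gb: float, kv_per_request_gb: float,
                 concurrent_requests: int, overhead_gb: float = 1.0) -> float:
    """Approximate peak VRAM for a serving workload."""
    return weights_gb + kv_per_request_gb * concurrent_requests + overhead_gb

# R1-Distill-14B, 4-bit weights (~9 GB), ~1 GB KV per 4K-context request (assumed)
for batch in (1, 4, 8, 16):
    print(f"batch {batch:2d}: ~{peak_vram_gb(9, 1, batch):.0f} GB")
```

At high concurrency the `kv_per_request_gb * concurrent_requests` term dominates the weight term, which is the arithmetic behind sizing for peak concurrent requests rather than single-request VRAM.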
Practical Deployment Recommendations
- Personal/dev use: R1-Distill-7B on an RTX 4060 (4-bit) or RTX 4060 Ti (FP16). Fast, cheap, good quality for coding and general tasks.
- Small team (2-5 users): R1-Distill-14B on an RTX 3090 (4-bit). Strong reasoning at 25-30 tok/s with room for concurrent requests.
- Production API: R1-Distill-32B on 2x RTX 3090 or higher. Best quality from the distillation family.
- Maximum capability: Full DeepSeek R1/V3 on multi-GPU clusters (8x RTX 6000 Pro 96 GB minimum).
For cost analysis of self-hosting versus the DeepSeek API, see our cost per 1M tokens comparison. Also check the deploy DeepSeek server tutorial for step-by-step instructions.
Quick Setup Commands
Ollama
```bash
# R1 distillation (auto-selects quantization)
curl -fsSL https://ollama.com/install.sh | sh
ollama run deepseek-r1:7b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
```
vLLM
```bash
# Serve R1-Distill-14B with AWQ
pip install vllm
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --quantization awq --max-model-len 8192
```
For full deployment guides, see our Ollama hosting and vLLM hosting pages. Compare with other models in our best GPU for LLM inference guide and use our benchmark tool for performance comparisons.
Deploy This Model Now
Dedicated GPU servers with the VRAM you need. UK datacenter, full root access.
Browse GPU Servers