DeepSeek R1 (early 2025 release) is an open-weight reasoning model — extended chain-of-thought, transparent reasoning trace, frontier-class on math/coding benchmarks.
The DeepSeek R1 distilled variants (1.5B, 7B, 14B, 32B, 70B) scale across progressively larger GPUs; R1-Distill-Qwen-32B fits a single 6000 Pro at FP8. The full R1 (671B MoE) requires a multi-node H100 cluster, so use the official API instead.
About R1
- Reasoning trace exposed in `<think>` blocks
- Distilled variants based on Llama / Qwen architectures
- Strong on math, code, multi-step reasoning
- Generates far more tokens (chain-of-thought) → cost per query is higher
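Since the reasoning trace arrives inline in the completion, most integrations split it from the final answer before showing anything to the user. A minimal sketch (the `<think>` tag convention is from the source; the sample completion string is made up for illustration):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, answer).

    R1 emits its chain of thought inside <think>...</think>; the text
    after the closing tag is the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        # No trace found; treat the whole output as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

# Hypothetical completion for illustration:
raw = "<think>17 has no divisors in 2..4, so it is prime.</think>17 is prime."
trace, answer = split_reasoning(raw)
print(answer)  # → 17 is prime.
```

Logging the trace separately (rather than discarding it) is useful for debugging math and code answers, since the trace usually shows where a wrong answer went off the rails.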
Hardware
| Variant | VRAM (FP8) | Recommended GPU |
|---|---|---|
| R1-Distill-Qwen-1.5B | ~2 GB | RTX 5060 |
| R1-Distill-Qwen-7B | ~7 GB | RTX 5060 Ti |
| R1-Distill-Llama-8B | ~8 GB | RTX 5060 Ti |
| R1-Distill-Qwen-14B | ~14 GB | RTX 5080 / 5090 |
| R1-Distill-Qwen-32B | ~32 GB | RTX 5090 / 6000 Pro |
| R1-Distill-Llama-70B | ~70 GB | RTX 6000 Pro |
| DeepSeek R1 (full 671B) | ~330 GB | Multi-node H100 only |
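The dense-variant rows above follow a simple rule of thumb: FP8 stores one byte per parameter, so weight memory in GB roughly equals the parameter count in billions. A sketch of that back-of-envelope math (weights only; KV cache, activations, and CUDA context add more on top, and the MoE full model does not follow this rule):

```python
def fp8_weight_gb(params_billion: float) -> float:
    # FP8 = 1 byte/parameter, so weight memory in GB is roughly
    # the parameter count in billions. This excludes KV cache,
    # activations, and runtime overhead.
    return params_billion * 1.0

for name, params in [("Qwen-7B", 7), ("Qwen-32B", 32), ("Llama-70B", 70)]:
    print(f"{name}: ~{fp8_weight_gb(params):.0f} GB weights")
```

In practice, long reasoning traces inflate the KV cache, so leave several GB of headroom beyond the table figures if you run long contexts.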
Verdict
For self-hosted reasoning, R1-Distill-Qwen-32B on a 5090 or 6000 Pro is the production target. Full R1 is API-only territory.
Bottom line
Reasoning models cost ~3-5× more tokens per query but solve harder problems. See best GPU for DeepSeek.
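The ~3-5× figure falls out of simple arithmetic: the thinking tokens are billed as output, the most expensive kind. A sketch with made-up per-million-token prices (not actual DeepSeek rates):

```python
def query_cost(prompt_toks: int, output_toks: int,
               in_price: float, out_price: float) -> float:
    """Dollar cost of one query given per-million-token prices."""
    return (prompt_toks * in_price + output_toks * out_price) / 1e6

# Hypothetical: a standard model answers in 500 tokens; a reasoning
# model spends 2000 extra thinking tokens before the same answer.
standard = query_cost(1000, 500, in_price=0.5, out_price=1.5)
reasoning = query_cost(1000, 2500, in_price=0.5, out_price=1.5)
print(f"{reasoning / standard:.1f}x")  # → 3.4x in this example
```

The multiplier grows with problem difficulty, since harder prompts draw longer traces while the final answer stays about the same length.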