DeepSeek Coder V2 is a family of mixture-of-experts coding models. Two variants matter in practice: the 16B Lite (runs on one consumer card) and the 236B flagship (needs serious multi-GPU capacity). On dedicated GPU hosting only one of these is realistic for most teams.
Lite 16B
MoE with 2.4B active parameters out of ~16B total. Weights occupy ~32 GB at FP16 because every expert must be resident in memory even though only 2.4B parameters fire per token, so active compute stays small. VRAM needed to host the weights:
| Precision | Weights VRAM |
|---|---|
| FP16 | ~32 GB |
| FP8 | ~16 GB |
| AWQ INT4 | ~10 GB |
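The table values fall out of simple arithmetic: total parameter count times bytes per parameter. A minimal sketch, assuming ~5 effective bits for AWQ INT4 (the extra bit over 4 is a rough allowance for group scales and zero points, not a published figure):

```python
# Rough weights-only VRAM estimate: total params x bits per param / 8.
# For MoE models use TOTAL parameters (16B for the Lite), not the 2.4B
# active ones, because every expert must be resident. KV cache and
# activations come on top of these numbers.

def weights_vram_gb(total_params_b: float, bits_per_param: float) -> float:
    """Approximate GB needed just to hold the weights."""
    return total_params_b * 1e9 * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("FP8", 8), ("AWQ INT4", 5)]:
    print(f"{name}: ~{weights_vram_gb(16, bits):.0f} GB")
```

Running this reproduces the table: ~32, ~16, and ~10 GB respectively.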
At INT4 the weights fit a 16 GB 4060 Ti; FP8 just fills a 16 GB 5080 and FP16 just fills a 32 GB 5090. Note that the last two leave little headroom for KV cache, so in practice quantizing one step below what the card can technically hold is usually the right call.
236B Flagship
MoE with 21B active parameters out of 236B total. Weights VRAM by precision:
| Precision | Weights VRAM |
|---|---|
| FP16 | ~472 GB |
| FP8 | ~236 GB |
| AWQ INT4 | ~140 GB |
Even at INT4 the weights alone span multiple 96 GB cards. Realistic deployments are rare on dedicated hosting; this is generally a datacenter GPU workload. If you need flagship-class coding quality on dedicated hosting, Qwen Coder 32B is a better target.
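A quick sanity check on the card counts above, assuming 96 GB usable per card. This is a lower bound on weights storage only; KV cache, activations, and tensor-parallel overhead push the real requirement higher:

```python
import math

# Minimum card count to hold the weights alone, assuming 96 GB usable
# per card. A lower bound, not a full serving-capacity sizing.
def min_cards(weights_gb: float, card_gb: float = 96.0) -> int:
    return math.ceil(weights_gb / card_gb)

print(min_cards(140))  # AWQ INT4 weights -> 2 cards minimum
print(min_cards(236))  # FP8 weights     -> 3 cards minimum
print(min_cards(472))  # FP16 weights    -> 5 cards minimum
```

Compare that with the Lite model, whose ~10 GB INT4 weights need no partitioning at all.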
What to Actually Host
For nearly all users, the 16B Lite is the right DeepSeek Coder V2 variant. It delivers strong coding performance on a single dedicated GPU, and activation memory stays low because only 2.4B parameters are "hot" per token. Throughput on a 5090 typically exceeds that of a dense 14B model of comparable quality, since each forward pass touches only the active experts.
Self-Hosted Coding Models on UK Dedicated
DeepSeek Coder V2 Lite or Qwen Coder 32B, preconfigured for your team.
Browse GPU Servers

See DeepSeek V3 distilled for the R1-style reasoning models and Qwen Coder 32B for the main alternative.