RTX 3050 - Order Now
Home / Blog / Model Guides / Codestral 22B Self-Hosted on a Dedicated GPU
Model Guides

Codestral 22B Self-Hosted on a Dedicated GPU

Mistral's Codestral 22B is a dedicated coding model that beats many 30B+ generalists on programming tasks. Hosting it is straightforward.

Codestral 22B is Mistral’s coding-specialised model – competitive with much larger generalist models on programming tasks and small enough to fit a single mid-tier GPU at INT4. On our dedicated GPU hosting it is a frequent pick for IDE autocomplete backends and code-review assistants.

Contents

VRAM

PrecisionWeightsFits On
FP16~44 GB96 GB card or multi-GPU
FP8~22 GB24 GB+ card
AWQ INT4~13 GB16 GB+ card

GPU Options

Deployment

python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Codestral-22B-v0.1 \
  --quantization awq \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --enable-prefix-caching

Fill-in-Middle

Codestral supports fill-in-middle for IDE autocomplete. Format:

[PREFIX]code before cursor[MIDDLE][SUFFIX]code after cursor[INFIX]

Actual markers vary by client library – consult the model card. For a Continue.dev or similar IDE plugin, most configurations work out of the box with Codestral’s template.

Self-Hosted Coding Assistant

Codestral 22B on UK dedicated GPUs – 5080, 5090, 3090, or 6000 Pro.

Browse GPU Servers

Compare against Qwen Coder 32B (higher quality, more VRAM) and StarCoder 2 15B (smaller, lower quality).

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.

Ready to deploy your AI workload?

Dedicated GPU servers from our UK datacenter. NVMe storage, 1Gbps networking, full root access.

Browse GPU Servers Contact Sales

Have a question? Need help?