Codestral 22B is Mistral’s coding-specialised model: competitive with much larger generalist models on programming tasks, yet small enough to fit on a single mid-tier GPU at INT4. On our dedicated GPU hosting it is a frequent pick for IDE autocomplete backends and code-review assistants.
VRAM
| Precision | Weights | Fits On |
|---|---|---|
| FP16 | ~44 GB | 96 GB card or multi-GPU |
| FP8 | ~22 GB | 32 GB card (24 GB is too tight once KV cache is added) |
| AWQ INT4 | ~13 GB | 16 GB+ card |
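The figures in the table are simple arithmetic: parameter count times bytes per weight, plus headroom for KV cache and runtime overhead. A minimal sketch of that estimate (the 4.5-bit effective width for AWQ INT4, accounting for scales and zero points, is an assumption):

```python
def weight_vram_gib(params_b: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate: parameters x bytes per weight.

    KV cache, activations, and CUDA context need extra headroom on top,
    which is why a 22 GB FP8 model does not fit comfortably on a 24 GB card.
    """
    bytes_total = params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

# 22B parameters at common precisions (weights only):
print(round(weight_vram_gib(22, 16)))   # FP16 -> 41 GiB (~44 GB decimal)
print(round(weight_vram_gib(22, 8)))    # FP8  -> 20 GiB
print(round(weight_vram_gib(22, 4.5)))  # AWQ INT4 (assumed ~4.5 bits effective) -> 12 GiB
```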
GPU Options
- RTX 4060 Ti 16GB: AWQ INT4 viable
- RTX 3090 24GB: AWQ INT4 comfortable
- RTX 5090 32GB: FP8 native
- RTX 6000 Pro 96GB: FP16, high concurrency
Deployment
```shell
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Codestral-22B-v0.1 \
  --quantization awq \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --enable-prefix-caching
```
Note: `--quantization awq` expects AWQ-quantized weights. The base `mistralai/Codestral-22B-v0.1` repository ships FP16 weights, so point `--model` at an AWQ checkpoint, or drop the flag for FP16/FP8 deployments.
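Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the standard library, assuming the default port 8000 and the model name from the command above:

```python
import json
import urllib.request

# Assumes the vLLM server above is listening on localhost:8000.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload for the vLLM server."""
    return {
        "model": "mistralai/Codestral-22B-v0.1",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature suits code generation
    }

def chat(prompt: str) -> str:
    """POST a chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same endpoint works with the official `openai` Python package by setting `base_url` to the server address.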
Fill-in-Middle
Codestral supports fill-in-middle for IDE autocomplete. Format:
[SUFFIX]code after cursor[PREFIX]code before cursor → model generates the middle
Actual markers vary by client library – consult the model card. For a Continue.dev or similar IDE plugin, most configurations work out of the box with Codestral’s template.
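A sketch of assembling a FIM prompt for the raw completions endpoint. The suffix-before-prefix ordering and the exact `[SUFFIX]`/`[PREFIX]` marker strings are assumptions drawn from the common Codestral convention; verify them against the model card and tokenizer before relying on this:

```python
# Hypothetical marker tokens -- confirm against the model card/tokenizer,
# since the exact strings vary between tokenizer versions and client libraries.
SUFFIX_TOKEN = "[SUFFIX]"
PREFIX_TOKEN = "[PREFIX]"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-middle prompt: the model is asked to generate
    the code that belongs between `prefix` and `suffix`."""
    return f"{SUFFIX_TOKEN}{suffix}{PREFIX_TOKEN}{prefix}"

# Example: complete the body of a function the user is typing.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(2, 3))",
)
# POST `prompt` to the server's /v1/completions endpoint (not /chat/completions),
# so the marker tokens reach the model untouched by a chat template.
```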
Self-Hosted Coding Assistant
Codestral 22B on UK dedicated GPUs – 5080, 5090, 3090, or 6000 Pro.
Browse GPU Servers
Compare against Qwen Coder 32B (higher quality, more VRAM) and StarCoder 2 15B (smaller, lower quality).