Codestral 22B Self-Hosted on a Dedicated GPU

Mistral's Codestral 22B is a dedicated coding model that beats many 30B+ generalists on programming tasks. Hosting it is straightforward.

Model Guides April 19, 2026 1 min read admin

Codestral 22B is Mistral’s coding-specialised model – competitive with much larger generalist models on programming tasks and small enough to fit on a single mid-tier GPU at INT4. On our dedicated GPU hosting it is a frequent pick for IDE autocomplete backends and code-review assistants.

VRAM
GPU options
Deployment
Fill-in-middle

VRAM

Precision	Weights	Fits On
FP16	~44 GB	96 GB card or multi-GPU
FP8	~22 GB	32 GB card (24 GB leaves no KV-cache room)
AWQ INT4	~13 GB	16 GB+ card
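
The weight figures follow from simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming a round 22 billion parameters and decimal gigabytes (the function name is illustrative, not part of any library):

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (10^9 bytes), ignoring any overhead."""
    return params_billions * bits_per_weight / 8


# Assumption: 22 billion parameters, as the model name suggests.
for name, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(22, bits):.0f} GB")
```

Raw 4-bit weights come to about 11 GB. The ~13 GB in the table is higher, most likely because AWQ checkpoints also store per-group scales and zero points and usually keep some layers at higher precision.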

GPU Options

RTX 4060 Ti 16GB: AWQ INT4 viable
RTX 3090 24GB: AWQ INT4 comfortable
RTX 5090 32GB: FP8 native
RTX 6000 Pro 96GB: FP16, high concurrency
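
To see why 16 GB is only "viable" while 24 GB is "comfortable", it helps to estimate what is left for the KV cache after the weights are loaded. The sketch below uses the weight sizes from the VRAM table and vLLM's `--gpu-memory-utilization` fraction. It is a rough estimate under stated assumptions: it ignores activation memory, the CUDA context, and the GB/GiB difference, and the helper name is hypothetical.

```python
# Approximate weight sizes in GB, taken from the VRAM table above.
WEIGHTS_GB = {"FP16": 44, "FP8": 22, "AWQ INT4": 13}


def kv_headroom_gb(vram_gb: float, precision: str,
                   gpu_memory_utilization: float = 0.92) -> float:
    """Memory left for KV cache after weights, within the utilization budget."""
    budget = vram_gb * gpu_memory_utilization
    return round(budget - WEIGHTS_GB[precision], 2)


# GPU/precision pairings from the list above.
for gpu, vram, precision in [
    ("RTX 4060 Ti", 16, "AWQ INT4"),
    ("RTX 3090", 24, "AWQ INT4"),
    ("RTX 5090", 32, "FP8"),
    ("RTX 6000 Pro", 96, "FP16"),
]:
    print(f"{gpu} ({precision}): ~{kv_headroom_gb(vram, precision)} GB for KV cache")
```

Less headroom means fewer concurrent requests or a shorter usable context, so the 16 GB option suits a single-user autocomplete backend better than a shared service.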

Deployment

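# Note: --quantization awq expects AWQ-quantised weights. The official
# mistralai/Codestral-22B-v0.1 repository holds unquantised weights, so point
# --model at an AWQ export of it (Hub repo or local path).
# On a 16 GB card, expect to lower --max-model-len: little memory remains for
# the KV cache once ~13 GB of weights are loaded.
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Codestral-22B-v0.1 \
  --quantization awq \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --enable-prefix-caching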
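
Once the server is running it exposes an OpenAI-compatible HTTP API. The following is a minimal client sketch using only the Python standard library. It assumes the server listens on vLLM's default port 8000 on the same host and that the model name matches the `--model` value. `build_completion_request` and `complete` are illustrative helpers, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"    # assumption: default vLLM port, same host
MODEL = "mistralai/Codestral-22B-v0.1"   # must match the --model value given to the server


def build_completion_request(prompt: str, max_tokens: int = 128,
                             temperature: float = 0.0) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server up, `complete("def fibonacci(n):")` returns the model's continuation. A temperature of 0 keeps autocomplete suggestions deterministic.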

Fill-in-Middle

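Codestral supports fill-in-middle (FIM) for IDE autocomplete: the model receives the code before and after the cursor and generates the missing middle. In Mistral's reference tokenizer (mistral-common) the suffix comes first:

[SUFFIX]code after cursor[PREFIX]code before cursor

How the markers are written varies by client library, so consult the model card. Continue.dev and similar IDE plugins usually ship a Codestral template and work without manual prompt formatting.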
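
If you are writing your own client rather than using a plugin, the prompt can be assembled by hand. This sketch assumes the suffix-first layout used by Mistral's mistral-common tokenizer (`[SUFFIX]` then `[PREFIX]`) and that your server's tokenizer maps these strings to the model's control tokens. Verify both against the model card before relying on it. The helper names are hypothetical.

```python
def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a buffer into (code before cursor, code after cursor)."""
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-middle prompt.

    Assumption: suffix-first layout, [SUFFIX]{suffix}[PREFIX]{prefix};
    the model is expected to generate the text that belongs between them.
    """
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"


# Example: the cursor sits right after "return " inside the function body.
buffer = "def area(w, h):\n    return \n\nprint(area(2, 3))\n"
cursor = buffer.index("return ") + len("return ")
prefix, suffix = split_at_cursor(buffer, cursor)
prompt = build_fim_prompt(prefix, suffix)
```

The resulting `prompt` string is sent to the plain completions endpoint, and the generated text is inserted at the cursor position.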

Self-Hosted Coding Assistant

Codestral 22B on UK dedicated GPUs – 5080, 5090, 3090, or 6000 Pro.

Compare against Qwen Coder 32B (higher quality, more VRAM) and StarCoder 2 15B (smaller, lower quality).

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
