RTX 3050 - Order Now
Home / Blog / Model Guides / Mistral Small 3 Self-Hosted Deployment
Model Guides

Mistral Small 3 Self-Hosted Deployment

Mistral's 24B Small 3 refresh lands between the 7B and 70B class with genuinely strong benchmarks and fits a single 24-32GB card.

Mistral Small 3 (24B parameters) hits a productive size bracket: stronger than 7B models on reasoning, cheaper to host than 70B class models, fits a single 24-32 GB GPU. On our dedicated GPU hosting it is a frequent choice for teams who need quality without multi-GPU complexity.

Contents

VRAM

PrecisionWeightsFits On
FP16~48 GB96 GB card, multi-GPU
FP8~24 GB32 GB single card
AWQ INT4~14 GB16 GB+ card
GPTQ INT4~14 GB16 GB+ card

GPU Options

Deployment

FP8 on a 5090:

python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Small-3-24B-Instruct-FP8 \
  --quantization fp8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --enable-prefix-caching

Mistral Small 3 supports 32k context natively. Configure max-model-len accordingly – this is one of its selling points.

Use Cases

Mistral Small 3 fits workloads where:

  • 7B models underperform on reasoning or coding
  • 70B models are overkill for cost
  • 32k context matters (long documents, multi-turn chats)
  • European data residency matters (Mistral is French)

Throughput on a 5090 FP8: ~75 t/s at batch 1, ~620 t/s at batch 16 aggregate.

Mistral Small 3 on UK Dedicated

FP8 or INT4 preconfigured on the GPU class that matches your budget.

Browse GPU Servers

See Mistral Nemo 12B for the smaller variant and Codestral 22B for Mistral’s coding model.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.

Ready to deploy your AI workload?

Dedicated GPU servers from our UK datacenter. NVMe storage, 1Gbps networking, full root access.

Browse GPU Servers Contact Sales

Have a question? Need help?