Mistral Small 3 (24B parameters) sits in a productive size bracket: stronger than 7B models on reasoning, cheaper to host than 70B-class models, and small enough to fit a single 24-32 GB GPU once quantized. On our dedicated GPU hosting it is a frequent choice for teams who need quality without multi-GPU complexity.
VRAM
| Precision | Weights | Fits On |
|---|---|---|
| FP16 | ~48 GB | 96 GB card or multi-GPU |
| FP8 | ~24 GB | 32 GB single card |
| AWQ INT4 | ~14 GB | 16 GB+ card |
| GPTQ INT4 | ~14 GB | 16 GB+ card |
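The weight figures in the table follow directly from the parameter count times bytes per parameter. A minimal sketch of that arithmetic (the INT4 rows in the table come out a little above the raw 12 GB because AWQ/GPTQ also store per-group scales and zero-points):

```python
# Rough weight-memory estimate for a 24B-parameter model.
# KV cache, activations, and quantization metadata add overhead on top.
PARAMS = 24e9

bytes_per_param = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, b in bytes_per_param.items():
    gb = PARAMS * b / 1e9  # decimal GB, matching the table
    print(f"{precision}: ~{gb:.0f} GB weights")
```

This is why FP8 lands almost exactly at the 24 GB mark: on a 24 GB card there is no headroom left for KV cache, which is what pushes FP8 onto 32 GB hardware.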
GPU Options
- RTX 4060 Ti 16GB: AWQ INT4 fits, but only with a short context window
- RTX 3090 24GB: AWQ INT4 comfortable
- RTX 5090 32GB: FP8 native, best single-GPU option
- Intel Arc Pro B70 32GB: AWQ or FP8 via OpenVINO
Deployment
FP8 on an RTX 5090:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Small-3-24B-Instruct-FP8 \
  --quantization fp8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --enable-prefix-caching
```
Mistral Small 3 supports a 32k context window natively, one of its selling points, so set --max-model-len accordingly rather than leaving it capped lower.
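The command above starts an OpenAI-compatible HTTP server, so any OpenAI-style client can talk to it. A minimal stdlib-only sketch, assuming the server is on localhost:8000 and serving the model name used above (adjust both to your deployment):

```python
# Minimal client for the vLLM OpenAI-compatible endpoint started above.
# Base URL and model name are assumptions -- match them to your server.
import json
from urllib.request import Request, urlopen

def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> Request:
    """Build a /v1/chat/completions request for the served model."""
    payload = {
        "model": "mistralai/Mistral-Small-3-24B-Instruct-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# with urlopen(build_chat_request("Summarise: ...")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```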
Use Cases
Mistral Small 3 fits workloads where:
- 7B models underperform on reasoning or coding
- 70B models are overkill for cost
- 32k context matters (long documents, multi-turn chats)
- European data residency matters (Mistral is French)
Indicative throughput on an RTX 5090 with FP8: ~75 tokens/s at batch 1, rising to ~620 tokens/s aggregate at batch 16.
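Those two numbers describe the usual batching trade-off: aggregate throughput climbs roughly 8x while each individual request slows down only about in half. A quick back-of-envelope check using the figures above:

```python
# Batching trade-off from the throughput numbers above.
single = 75.0      # tokens/s, one request (batch 1)
aggregate = 620.0  # tokens/s total at batch 16
batch = 16

per_stream = aggregate / batch  # what each of the 16 requests sees
speedup = aggregate / single    # total-throughput gain from batching
print(f"per-stream: {per_stream:.1f} t/s, aggregate speedup: {speedup:.1f}x")
```

So at batch 16 each caller still gets ~39 t/s, fast enough for interactive use, while the server does ~8x the work per second.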
Mistral Small 3 on UK Dedicated
FP8 or INT4 preconfigured on the GPU class that matches your budget.
Browse GPU Servers. See Mistral Nemo 12B for the smaller variant and Codestral 22B for Mistral’s coding model.