RTX 3050 - Order Now
Home / Blog / Model Guides / Pixtral 12B on a Dedicated GPU
Model Guides

Pixtral 12B on a Dedicated GPU

Mistral's Pixtral 12B is a capable vision-language model with native variable image resolution - a practical generalist VLM for dedicated hosting.

Pixtral 12B from Mistral is a vision-language model built on top of Mistral Nemo 12B, supporting variable image resolutions rather than fixed tiling. On our dedicated GPU hosting it fits a 24 GB card at FP16 with room for reasonable concurrency.

Contents

VRAM

PrecisionWeightsFits On
FP16~24 GB24 GB card tight, 32 GB comfortable
FP8~12 GB16 GB+ card
AWQ INT4~7 GBAny 12 GB+ card

Deployment

python -m vllm.entrypoints.openai.api_server \
  --model mistral-community/pixtral-12b \
  --max-model-len 32768 \
  --limit-mm-per-prompt 'image=4' \
  --tokenizer-mode mistral \
  --config-format mistral

--limit-mm-per-prompt 'image=4' allows up to four images per request – Pixtral handles multi-image reasoning well.

Variable Resolution

Unlike many VLMs that downsample images to a fixed tile grid, Pixtral handles the input at its native resolution (up to the model’s context budget). Small images stay small; large images get more visual tokens. Practical impact:

  • Better detail recognition on high-resolution photos
  • Lower cost on simple images (no forced 336×336 tiling)
  • Context usage varies with image size – budget KV cache accordingly

A 1024×1024 image consumes roughly 4x the visual tokens a 512×512 image does. For high-volume deployments, normalise input resolution before submission.

Vision-Language Model Hosting

Pixtral 12B preconfigured on UK dedicated GPUs with appropriate VRAM sizing.

Browse GPU Servers

Compare against Llama 3.2 Vision 11B and Qwen VL 2.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.

Ready to deploy your AI workload?

Dedicated GPU servers from our UK datacenter. NVMe storage, 1Gbps networking, full root access.

Browse GPU Servers Contact Sales

Have a question? Need help?