OLMo 2 from Allen AI is “open” in a stricter sense than most open-weights models: weights, training data, training code, and intermediate checkpoints are all public. For research and regulated industries that need full provenance on a model, this matters. On our dedicated GPU hosting, deployment is straightforward.
Variants
OLMo 2 ships in 7B and 13B sizes. Both are instruction-tuned via standard supervised fine-tuning and DPO. Quality benchmarks sit close to Llama 3 equivalents – slightly below on some tasks, at parity on others.
VRAM
| Variant | FP16 | Fits |
|---|---|---|
| 7B | ~14 GB | 16 GB+ card |
| 13B | ~26 GB | 32 GB card; 24 GB is tight |
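The figures above follow the usual back-of-envelope rule: 2 bytes per parameter at FP16, plus headroom for KV cache and activations. A minimal sketch of that estimate (the 20% overhead figure is an assumption and varies with batch size and context length):

```python
def fp16_vram_gb(params_billion: float, overhead: float = 0.2) -> float:
    """Rough FP16 VRAM estimate for serving a model.

    2 bytes per parameter for weights (so 2 GB per billion params),
    plus an assumed fractional overhead for KV cache and activations.
    """
    weights_gb = params_billion * 2
    return weights_gb * (1 + overhead)

print(fp16_vram_gb(7))   # weights alone are ~14 GB; with headroom, a 16 GB card is marginal
print(fp16_vram_gb(13))  # weights alone are ~26 GB, which is why 24 GB cards are tight
```

Quantised variants (e.g. 8-bit or 4-bit) roughly halve or quarter the weight term, which is the usual route for fitting 13B onto a 24 GB card.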
Deployment
python -m vllm.entrypoints.openai.api_server \
--model allenai/OLMo-2-1124-13B-Instruct \
--dtype bfloat16 \
--max-model-len 4096 \
--trust-remote-code \
--gpu-memory-utilization 0.92
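The command above starts an OpenAI-compatible server (by default on port 8000). A minimal client sketch, assuming the default host and port – the helper name and prompt are illustrative, only the endpoint path and payload shape come from the OpenAI-compatible API:

```python
# Build a chat-completion request for the vLLM server started above.
# POST the payload to http://localhost:8000/v1/chat/completions
# (e.g. with requests.post(url, json=payload)).
def chat_payload(prompt: str,
                 model: str = "allenai/OLMo-2-1124-13B-Instruct",
                 max_tokens: int = 256,
                 temperature: float = 0.7) -> dict:
    return {
        "model": model,  # must match the --model flag passed to vLLM
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = chat_payload("Summarise OLMo 2's licence in one sentence.")
```

Because the API is OpenAI-compatible, existing OpenAI SDK clients also work by pointing their base URL at the vLLM server.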
OLMo 2’s context window is 4k tokens in the base variants. Check for longer-context branches before relying on long-context use cases.
Why Pick OLMo
Choose OLMo when:
- You need to audit or reproduce training data (research, regulated industries)
- Full transparency on model provenance is a procurement requirement
- You want to fine-tune on public checkpoints from different training stages
For pure quality on English chat Llama 3.3 70B or Mistral Small 3 will serve better. OLMo’s value is the transparency, not the benchmark score.
Fully Open LLM Hosting
OLMo on UK dedicated GPUs – clean provenance for research and regulated deployments.
Browse GPU Servers · See Granite Code for another licence-friendly option.