For applications serving non-English-dominant users, the right open-weight LLM is rarely Llama 3. This page covers the strongest multilingual options for self-hosting.

In short: Aya 23 / Aya Expanse for genuinely broad coverage (100+ languages, including low-resource ones); Qwen 2.5 for European plus Asian languages with strong English; Llama 3.1 for English-heavy workloads with good coverage of its eight officially supported languages. Hardware sizing matches the equivalent English models.
## The contenders
- Cohere Aya 23 / Aya Expanse 8B/32B — explicitly trained on 23 (Aya 23) or 100+ (Aya Expanse) languages. Strong on low-resource languages.
- Qwen 2.5 7B/14B/32B — strong on Chinese, Japanese, Korean, plus solid European. ~30 languages.
- Llama 3.1 8B/70B — strong English, good German/French/Spanish/Italian/Portuguese/Hindi/Thai. Weak on lower-resource languages.
- Mistral Nemo 12B / Mistral Small 22B — solid European multilingual.
- Gemma 2 9B/27B — strong English, decent multilingual, weaker on low-resource.
## By language profile
| Language profile | Recommended model |
|---|---|
| 100% English | Llama 3.1 8B |
| Western European (DE/FR/ES/IT/PT) | Qwen 2.5 14B or Llama 3.1 8B |
| CJK (Chinese / Japanese / Korean) | Qwen 2.5 14B |
| Indic / South Asian | Aya Expanse |
| African / low-resource | Aya Expanse 32B |
| Mixed / unknown | Aya Expanse 8B |
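The table above can be sketched as a per-request router. This is a minimal illustration, not a production language router: the ISO-code groupings and model identifier strings are assumptions, and in practice you would plug in a real language detector (e.g. fastText or CLD3) upstream.

```python
# Hypothetical per-request model router based on the recommendation table.
# Language groupings and model names are illustrative assumptions.
CJK = {"zh", "ja", "ko"}
WESTERN_EUROPEAN = {"de", "fr", "es", "it", "pt"}

def recommend_model(lang_code: str) -> str:
    """Map a detected ISO 639-1 language code to a default model."""
    if lang_code == "en":
        return "llama-3.1-8b"
    if lang_code in CJK:
        return "qwen-2.5-14b"
    if lang_code in WESTERN_EUROPEAN:
        return "qwen-2.5-14b"  # llama-3.1-8b is also viable here
    # Indic, African, and other low-resource or unknown languages
    return "aya-expanse-8b"
```

The fallback branch matters most: everything the detector can't place in a well-covered group goes to Aya Expanse, mirroring the "Mixed / unknown" row.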
## Hardware sizing
Multilingual variants have similar VRAM profiles to their English siblings:
- Aya Expanse 8B → fits RTX 5060 Ti FP8
- Aya Expanse 32B → fits RTX 5090 INT4 or RTX 6000 Pro FP8
- Qwen 2.5 14B → fits RTX 5090 FP16
- Qwen 2.5 32B → fits RTX 6000 Pro FP8
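As a rough sanity check on the pairings above, weight memory is parameter count times bytes per weight, plus headroom for KV cache and activations. The ~10% overhead factor below is an assumption for short contexts; long contexts need considerably more KV-cache room.

```python
# Back-of-envelope VRAM estimate: params (in billions) * bytes per weight,
# with an assumed ~10% overhead for KV cache and activations.
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billion: float, precision: str,
                     overhead: float = 1.1) -> float:
    """Rough VRAM footprint in GB for serving a dense model."""
    weights_gb = params_billion * BYTES_PER_WEIGHT[precision]
    return round(weights_gb * overhead, 1)

# e.g. Aya Expanse 8B at FP8 -> ~8.8 GB, inside a 16 GB RTX 5060 Ti;
# Qwen 2.5 14B at FP16 -> ~30.8 GB, just inside a 32 GB RTX 5090.
```

The estimates line up with the pairings in the list: 32B at INT4 lands around 18 GB (5090-class), and 32B at FP8 around 35 GB (RTX 6000 Pro-class).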
## Verdict
For genuinely multilingual applications, Aya Expanse 8B on a 5060 Ti or Qwen 2.5 14B on a 5090 is the default. Llama 3.1 is fine when English dominates and multilingual traffic is light.
## Bottom line
Match the model to your language traffic distribution. For Aya specifically, see Cohere's documentation; for Qwen sizing see best GPU for Qwen.
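"Match the model to your traffic" can be made concrete by tallying detected language codes over a traffic sample and picking a default. The 80% English threshold and the model names here are illustrative assumptions, not recommendations from any vendor.

```python
from collections import Counter

def pick_default_model(lang_counts: Counter,
                       english_threshold: float = 0.8) -> str:
    """Choose a single default model from observed per-request language
    codes. Thresholds and model names are illustrative assumptions."""
    total = sum(lang_counts.values())
    en_share = lang_counts.get("en", 0) / total
    if en_share >= english_threshold:
        return "llama-3.1-8b"          # English-heavy traffic
    cjk_share = sum(lang_counts.get(l, 0) for l in ("zh", "ja", "ko")) / total
    if cjk_share >= 0.5:
        return "qwen-2.5-14b"          # CJK-dominated traffic
    return "aya-expanse-8b"            # mixed or low-resource traffic
```

A more elaborate setup would route per request instead of picking one default, at the cost of keeping two models warm.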