Upstage's SOLAR 10.7B, built via depth up-scaling, delivers performance competitive with 13-15B dense models at a smaller size. On the RTX 5060 Ti 16GB we host, it runs at FP8 or AWQ INT4 with good concurrency.
Fit
- FP16: ~22 GB – does not fit
- FP8: ~11 GB – fits comfortably
- AWQ INT4: ~6.5 GB – very comfortable
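As a rough sanity check on the figures above, weight memory is approximately parameter count times bytes per weight. A minimal sketch (estimates only; KV cache, activations, and quantization scales add overhead, which is why measured footprints run a little higher):

# Back-of-envelope weights-only VRAM estimate for SOLAR 10.7B.
# Real footprints are higher: KV cache, activations, and AWQ
# scales/zero-points are not counted here.
PARAMS = 10.7e9  # parameter count

for name, bytes_per_weight in [("FP16", 2.0), ("FP8", 1.0), ("AWQ INT4", 0.5)]:
    gb = PARAMS * bytes_per_weight / 1e9
    print(f"{name}: ~{gb:.1f} GB of weights")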
Deployment
python -m vllm.entrypoints.openai.api_server \
--model upstage/SOLAR-10.7B-Instruct-v1.0-AWQ \
--quantization awq \
--max-model-len 4096 \
--gpu-memory-utilization 0.92
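Once the server is up, a quick smoke test against its OpenAI-compatible endpoint looks like this (a sketch assuming the default port 8000 on localhost):

# Minimal request against the vLLM OpenAI-compatible server started above
# (assumes it is listening on the default localhost:8000).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "upstage/SOLAR-10.7B-Instruct-v1.0-AWQ",
        "messages": [{"role": "user", "content": "Say hello in Korean and English."}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])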
SOLAR's native context window is 4k tokens, hence --max-model-len 4096 above. For long-context workloads, pick Mistral Nemo 12B or Qwen 2.5 14B instead.
Performance
- AWQ, batch 1: ~70 tokens/s
- AWQ, batch 8 (aggregate): ~350 tokens/s
- TTFT (time to first token), 1k-token prompt: ~180 ms
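To reproduce a batch-1 figure on your own deployment, time a single completion and divide by the reported completion tokens. A sketch (hypothetical prompt; results vary with sampling settings and load):

# Rough batch-1 throughput check against the server above.
import time
import requests

t0 = time.time()
out = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "upstage/SOLAR-10.7B-Instruct-v1.0-AWQ",
        "prompt": "Write a short paragraph about Seoul.",
        "max_tokens": 256,
        "temperature": 0,
    },
    timeout=120,
).json()
elapsed = time.time() - t0
tokens = out["usage"]["completion_tokens"]
# Note: elapsed includes prefill/TTFT, so this slightly understates
# pure decode speed.
print(f"{tokens} tokens in {elapsed:.1f} s -> {tokens / elapsed:.0f} tok/s")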
Strengths and Limits
Strong:
- Korean-English bilingual
- Cost-efficient English tasks
- Small footprint for 10B-class quality
Weaker:
- Short 4k context
- Aging training cutoff vs 2026 models
- Narrower community support than Llama/Mistral
For English-first workloads in 2026, Qwen 2.5 14B or Llama 3 8B are usually the better picks at this tier.
See the full SOLAR guide.
Compact Korean-English LLM
Solar 10.7B on Blackwell 16GB. UK dedicated hosting.
Order the RTX 5060 Ti 16GB