Together.ai is one of the strongest serverless hosts for open-weight models. It is cheap at low volume but materially more expensive at scale. A dedicated RTX 5060 Ti 16GB on our UK dedicated hosting is the natural graduation once your monthly token volume passes the break-even point.
Contents
- Together serverless pricing
- 5060 Ti capacity
- Break-even table
- Serverless vs dedicated economics
- When each wins
Together serverless pricing
| Model | Input $/M tokens | Output $/M tokens | Blended $/M tokens (2:1 input:output) | Cost at 1B tokens/month |
|---|---|---|---|---|
| Llama 3.1 8B Turbo | $0.18 | $0.18 | $0.18 | $180 |
| Mistral 7B v0.3 | $0.20 | $0.20 | $0.20 | $200 |
| Qwen 2.5 14B | $0.30 | $0.30 | $0.30 | $300 |
| Llama 3.1 70B Turbo | $0.88 | $0.88 | $0.88 | $880 |
| Custom fine-tune (Together) | base + LoRA fee | ~1.5-2x base | $0.40-1.00 | $400-1,000 |
| 5060 Ti dedicated | flat | flat | $380 total | $380 |
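The blended and monthly-cost columns can be reproduced with a short calculation. The sketch below assumes the 2:1 input-to-output mix used in the table; the function names are illustrative and not part of any Together API.

```python
def blended_price(input_per_m: float, output_per_m: float, input_ratio: float = 2.0) -> float:
    """Weighted $/M tokens, assuming input_ratio input tokens per 1 output token."""
    return (input_ratio * input_per_m + output_per_m) / (input_ratio + 1.0)


def monthly_cost(price_per_m: float, tokens_per_month: float) -> float:
    """Serverless bill in dollars for a given monthly token volume."""
    return price_per_m * tokens_per_month / 1_000_000


# Llama 3.1 8B Turbo row: $0.18 input, $0.18 output, 1B tokens/month
llama_8b_blended = blended_price(0.18, 0.18)
llama_8b_cost = monthly_cost(llama_8b_blended, 1_000_000_000)
print(f"blended ${llama_8b_blended:.2f}/M, ${llama_8b_cost:.0f}/month")
```

Because Together prices input and output identically for these models, the blended rate equals the list rate; the weighting only matters for providers that charge more for output tokens.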
5060 Ti capacity
One 5060 Ti 16GB sustains roughly 720 tokens/sec aggregate throughput on Llama 3.1 8B FP8 at batch 32. Over a month at 50% utilisation that is:
- 720 t/s × 3,600 s/hour × 720 hours/month × 0.5 = ~933M output tokens/month.
- Assuming a 2:1 input-to-output ratio, that supports ~1.87B input tokens alongside them.
- Combined blended capacity: ~2.8B tokens/month at 50% utilisation.
- At full utilisation: ~5.6B tokens/month – the theoretical ceiling.
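The capacity figures above follow from a few multiplications, sketched here under the same assumptions as the text: 720 tokens/sec of output throughput, a 720-hour (30-day) month, and two input tokens per output token. The function and constant names are illustrative.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 720  # 30-day month, as assumed in the text


def monthly_capacity(tokens_per_sec: float, utilisation: float, input_ratio: float = 2.0) -> dict:
    """Monthly token capacity, treating tokens_per_sec as output throughput."""
    output_tokens = tokens_per_sec * SECONDS_PER_HOUR * HOURS_PER_MONTH * utilisation
    input_tokens = output_tokens * input_ratio
    return {
        "output": output_tokens,
        "input": input_tokens,
        "total": output_tokens + input_tokens,
    }


half = monthly_capacity(720, 0.5)
full = monthly_capacity(720, 1.0)
print(f"50% utilisation: {half['total'] / 1e9:.2f}B tokens/month")   # 2.80B
print(f"100% utilisation: {full['total'] / 1e9:.2f}B tokens/month")  # 5.60B
```

Swapping in a measured throughput for your own model and batch size gives a ceiling that is more useful than the headline number.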
Break-even table
| Model on Together | Break-even tokens/month | 5060 Ti utilisation at break-even |
|---|---|---|
| Llama 3.1 8B @ $0.18/M | 2.11B | ~38% |
| Mistral 7B @ $0.20/M | 1.9B | ~34% |
| Qwen 14B @ $0.30/M | 1.27B | ~23% |
| Custom fine-tune @ $0.60/M | 633M | ~11% |
| Llama 3.1 70B (needs 5090/6000) | 432M | n/a on 5060 Ti |
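Each row of the break-even table is the flat monthly cost divided by the serverless price, then compared against the card's ceiling. A minimal sketch, assuming the $380/month dedicated price and the ~5.6B tokens/month full-utilisation ceiling quoted earlier (names are illustrative):

```python
DEDICATED_MONTHLY_USD = 380.0
FULL_CAPACITY_TOKENS = 5.6e9  # theoretical ceiling quoted for Llama 3.1 8B FP8


def break_even_tokens(price_per_m: float) -> float:
    """Monthly tokens at which the serverless bill equals the flat dedicated cost."""
    return DEDICATED_MONTHLY_USD / price_per_m * 1_000_000


def utilisation_at(tokens_per_month: float) -> float:
    """Fraction of the card's monthly ceiling needed to serve that volume."""
    return tokens_per_month / FULL_CAPACITY_TOKENS


serverless_prices = {
    "Llama 3.1 8B": 0.18,
    "Mistral 7B": 0.20,
    "Qwen 14B": 0.30,
    "Custom fine-tune": 0.60,
}

for model, price in serverless_prices.items():
    tokens = break_even_tokens(price)
    print(f"{model}: {tokens / 1e9:.2f}B tokens/month, {utilisation_at(tokens):.0%} utilisation")
```

Note that the utilisation column reuses the 8B throughput ceiling for every model; a larger model such as Qwen 14B will sustain fewer tokens per second, so its real utilisation at break-even would be higher.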
Serverless vs dedicated economics
- Serverless advantages: zero ops, pay-per-use, near-instant scale, no idle cost at 3am.
- Serverless drawbacks: per-request cold-start tax, shared multi-tenant models, per-request LoRA surcharges, data egress for US-hosted infra, no fine-grained kernel / server tuning.
- Dedicated advantages: flat cost, no cold start, full stack control, free co-hosted embeddings and reranker, UK residency.
- Dedicated drawbacks: idle cost if you do not use it, ops overhead, card-sized ceiling.
When each wins
- Pick Together at under ~1B tokens/month, if you cannot staff ops, or if you want a model catalogue with minimal commitment.
- Pick dedicated 5060 Ti above 1-2B tokens/month, if you run custom fine-tunes, need UK data residency, or want embeddings + reranker + LLM on one host.
- Hybrid: many teams keep Together for a fallback model and route bulk traffic to dedicated.
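The hybrid pattern can be as simple as trying the dedicated host first and falling back to serverless on failure. The sketch below is deliberately backend-agnostic: `dedicated` and `serverless` are hypothetical stand-ins for whatever client calls you already use, not real SDK functions.

```python
from typing import Callable

Backend = Callable[[str], str]


def make_hybrid_router(primary: Backend, fallback: Backend) -> Backend:
    """Return a callable that prefers `primary` and falls back on any error."""

    def route(prompt: str) -> str:
        try:
            return primary(prompt)
        except Exception:
            # Dedicated host unavailable or saturated: use the serverless fallback.
            return fallback(prompt)

    return route


# Hypothetical stand-in backends for illustration only.
def dedicated(prompt: str) -> str:
    if len(prompt) > 20:
        raise RuntimeError("dedicated host saturated")
    return f"dedicated:{prompt}"


def serverless(prompt: str) -> str:
    return f"serverless:{prompt}"


router = make_hybrid_router(dedicated, serverless)
print(router("hi"))
print(router("x" * 30))
```

In production you would likely narrow the caught exceptions to timeouts and connection errors, and log each fallback so that sustained overflow shows up as a signal to add capacity.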
See also our Fireworks comparison and break-even calculator.
Self-host above the crossover
Above the break-even volume (roughly 1.3B tokens/month on a 14B model, 2.1B on an 8B model), UK dedicated hosting is cheaper than Together serverless.
Order the RTX 5060 Ti 16GB
See also: vs Fireworks, break-even calculator, FP8 Llama deployment, max throughput.