Cost Per 1M Tokens for Mistral 7B Self-Hosted: Every GPU, Every Precision

Exactly how much you pay per million Mistral 7B tokens on each GPU we host, at FP16 and FP8. Compared to OpenAI, Together AI and the rest of the hosted-API market.

The cleanest way to compare GPU servers for LLM inference is cost per million tokens at full utilisation. It rolls hardware price, throughput, precision support, and the inevitable inefficiencies of real workloads into one number you can put on a spreadsheet. This page is the consolidated cost-per-token table for Mistral 7B Instruct, the most-deployed open-weight LLM in our customer base.

TL;DR

RTX 5080 at FP8 is the cost leader at £0.09/1M tokens at 60% utilisation, with the 5060 Ti at effectively the same figure and the 5090 at £0.13. The 3090 at FP16 is £0.14. Compared to hosted APIs (OpenAI gpt-4o-mini at £0.60/1M output, Together Mistral 7B at £0.16/1M), self-hosting is between roughly 2× (vs Together) and 7× (vs OpenAI) cheaper at meaningful volumes.

Methodology

The formula:

cost_per_1M = (monthly_GBP / (aggregate_tok_per_sec * 86400 * 30 * utilisation)) * 1_000_000

Where:

  • monthly_GBP = the GigaGPU monthly rental price for that card
  • aggregate_tok_per_sec = vLLM 0.6.3 aggregate throughput at 50 concurrent users (from our Mistral benchmarks)
  • utilisation = the fraction of the month the GPU is actually serving traffic at full throughput

I use 60% utilisation as the headline number — that is a realistic steady-state for a busy production deployment with traffic spread across the day. At 100% utilisation (batch jobs running 24/7) the numbers below scale down by 0.6×.
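The formula above as a runnable sketch. The £200/month price and 1,000 tok/s throughput here are placeholder inputs for illustration, not a card we host; substitute your own rental price and benchmarked throughput:

```python
def cost_per_1m_tokens(monthly_gbp: float, tok_per_sec: float,
                       utilisation: float = 0.6) -> float:
    """Cost in GBP per 1M tokens for a dedicated GPU at a given utilisation."""
    seconds_per_month = 86400 * 30
    tokens_per_month = tok_per_sec * seconds_per_month * utilisation
    return monthly_gbp / tokens_per_month * 1_000_000

# Placeholder example: a £200/month card sustaining 1,000 aggregate tok/s
print(round(cost_per_1m_tokens(200, 1000), 2))  # 0.13 at 60% utilisation
```

Note that cost scales inversely with utilisation: passing `utilisation=1.0` multiplies the 60% figure by 0.6, exactly as described above.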

Cost per 1M tokens — Mistral 7B FP16

| GPU | Monthly | Aggregate tok/s (FP16) | Tokens/month at 60% util | Cost per 1M |
|---|---|---|---|---|
| RTX 5060 Ti 16 GB | £119 | 580 | 900M | £0.13 |
| RTX 5080 | £189 | 820 | 1.27B | £0.15 |
| RTX 3090 | £159 | 720 | 1.12B | £0.14 |
| RTX 4090 | £289 | 950 | 1.48B | £0.20 |
| RTX 5090 | £399 | 1,180 | 1.83B | £0.22 |
| RTX 6000 Pro 96 GB | £899 | 1,140 | 1.77B | £0.51 |
| A100 80 GB | POA | 1,310 | 2.04B | POA |

FP16 baseline. Lower is better. The 5060 Ti edges ahead on paper, but FP16 weights for a 7B model take roughly 14 GB, leaving its 16 GB card almost no KV-cache room; the 3090 at £159, with the most favourable price/throughput ratio and 24 GB of VRAM, is the practical cost leader.

Cost per 1M tokens — Mistral 7B FP8 (Blackwell native)

FP8 lifts throughput by 50–60% on Blackwell hardware with negligible quality impact. Cost-per-token drops accordingly:

| GPU | Monthly | Aggregate tok/s (FP8) | Tokens/month at 60% util | Cost per 1M |
|---|---|---|---|---|
| RTX 5060 Ti 16 GB | £119 | 880 | 1.37B | £0.09 |
| RTX 5080 | £189 | 1,290 | 2.01B | £0.09 |
| RTX 5090 | £399 | 1,920 | 2.99B | £0.13 |
| RTX 6000 Pro 96 GB | £899 | 1,860 | 2.89B | £0.31 |

FP8 native. The 5080 and 5060 Ti lead on cost per token; the 5090 wins on absolute throughput in the same league.

Self-hosted vs hosted APIs

Comparable hosted-API options for Mistral 7B (or equivalent-class small models), September 2026 pricing:

| Provider | Model | Output cost per 1M | Notes |
|---|---|---|---|
| OpenAI | gpt-4o-mini | £0.60 | Closed, USD-billed |
| Anthropic | Claude 3.5 Haiku | £3.20 | Closed |
| Together AI | Mistral 7B Instruct v0.3 | £0.16 | Open weights, hosted |
| Fireworks AI | Mistral 7B Instruct | £0.16 | Open weights, hosted |
| Groq | Mistral 7B (when available) | £0.20 | LPU hardware |
| GigaGPU dedicated 5080 (FP8) | Mistral 7B (yours) | £0.09 | Self-hosted, fixed cost |

Self-hosting is dramatically cheaper at non-trivial volumes. The break-even point against Together is roughly £230/mo of API spend.
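The raw break-even arithmetic is easy to reproduce (£189/month is the 5080 rental price and £0.16/1M is Together's rate from the table above; equating the two ignores ops overhead, which is why the practical threshold lands somewhat above the raw figure):

```python
def breakeven_tokens_per_month(monthly_gbp: float,
                               api_price_per_1m_gbp: float) -> float:
    """Monthly token volume at which hosted-API spend equals the fixed GPU rental."""
    return monthly_gbp / api_price_per_1m_gbp * 1_000_000

# A £189/month 5080 vs Together at £0.16 per 1M tokens
tokens = breakeven_tokens_per_month(189, 0.16)
print(round(tokens / 1e9, 2))  # 1.18 billion tokens/month, i.e. £189 of API spend
```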

Why utilisation rate dominates the math

The single biggest variable is the utilisation rate. At 100% utilisation the 5080 at FP8 hits £0.06/1M; at 30% it is £0.19/1M. Real-world rates we see:

  • Internal tooling / B2B SaaS — 15–25% utilisation. Self-hosting still wins above ~£300/mo of equivalent API spend.
  • Customer-facing chatbot — 30–50% utilisation. Self-hosting wins decisively above ~£150/mo of API spend.
  • RAG-heavy backend with embeddings + LLM — 60–80% utilisation. Self-hosting always wins.
  • Batch / nightly jobs — Spiky 100%-then-idle. Hosted APIs may actually be cheaper if total minutes are low.

The math does not help you decide unless you measure your actual traffic. Run a 7-day token-volume report against your hosted-API bill, convert the total into hours of GPU time at your card's benchmarked throughput, and divide by the 168 hours in the week: that is your real utilisation rate.
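That measurement reduces to a two-line calculation. The weekly token count below is a placeholder; 1,290 tok/s is the 5080 FP8 aggregate figure from our benchmarks:

```python
def real_utilisation(tokens_per_week: float, aggregate_tok_per_sec: float) -> float:
    """Fraction of a 168-hour week a GPU would spend at full throughput
    to serve the measured token volume."""
    gpu_seconds_needed = tokens_per_week / aggregate_tok_per_sec
    return gpu_seconds_needed / (168 * 3600)

# Placeholder traffic: 500M tokens/week on a 5080 at FP8 (1,290 tok/s)
print(round(real_utilisation(500e6, 1290), 2))  # 0.64
```

Feed the result back into the cost formula above to see which row of the tables you actually occupy.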

Which GPU is the cost leader?

  • Lowest absolute cost per million tokens (FP8): RTX 5080 at £0.09/1M, with the 5060 Ti at the same figure.
  • Lowest absolute cost per million (FP16): RTX 5060 Ti at £0.13/1M on paper; the RTX 3090 at £0.14/1M is the practical pick, with 24 GB of VRAM headroom for KV cache.
  • Best for genuinely high concurrency: RTX 5090, close to the 5080 on cost per token but with roughly 1.5× the absolute capacity.
  • When the 6000 Pro / A100 wins: never on cost-per-token for 7B-class models. Pick those for ECC, larger models, or compliance.

Bottom line

For Mistral 7B specifically, the cost story is clear: RTX 5080 if your traffic fits, RTX 5090 when you need more concurrency, RTX 3090 if you cannot use FP8. Self-hosting beats every hosted-API price point we benchmarked at any meaningful volume.

For Llama 3 cost across the same GPUs, see cost per 1M tokens — Llama 3; for the deployment-side details, our API hosting page.

Need a Dedicated GPU Server?

Deploy from RTX 3050 to RTX 5090. Full root access, NVMe storage, 1Gbps — UK datacenter.

Browse GPU Servers


We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
