NVIDIA RTX 4090 Hosting — The 24 GB Workhorse
The 24 GB sweet-spot for 13B-class models. Ada Lovelace at £289/mo, with hardware FP8, 1 TB/s bandwidth, and enough VRAM to run Llama 3.1 8B at full FP16 with 32K context — or a 13B at FP8 with KV cache to spare. The dependable production workhorse.
RTX 4090 Server Specs
The hardware you actually rent.
| GPU model | NVIDIA GeForce RTX 4090 (Ada Lovelace, AD102) |
|---|---|
| Architecture | Ada Lovelace — 4th gen Tensor Cores |
| VRAM | 24 GB GDDR6X @ 1,008 GB/s |
| CUDA cores | 16,384 |
| FP16 compute | ~82.6 TFLOPS |
| FP8 compute | ~660 TOPS (no hardware FP4 on Ada) |
| TDP | 450 W |
| Host CPU | AMD Ryzen 7 / 9 |
| Host RAM | Up to 64 GB DDR5 |
| Storage | 1 TB NVMe + 4 TB SATA SSD |
| Network | 1 Gbps unmetered |
| Location | London, United Kingdom |
What Fits on a Single RTX 4090
24 GB is the practical sweet spot for production. Comfortable for 7B–8B at full FP16 with long context, and capable of 13B–14B at FP8 or AWQ-INT4 with full 8K–32K context windows.
| Model | Params | FP16 | FP8 / INT4 | Notes |
|---|---|---|---|---|
| Mistral 7B Instruct | 7B | 14 GB FP16 | 5 GB INT4 | FP16 fits comfortably with 32K context |
| Llama 3.1 8B | 8B | 16 GB FP16 | 8 GB FP8 | FP16 fits 32K, FP8 fits 128K context |
| Llama 2 13B | 13B | 26 GB FP16 | 13 GB FP8 | FP16 won't fit; FP8 is comfortable |
| Qwen 2.5 14B | 14B | 28 GB FP16 | 9 GB INT4 | FP16 won't fit; FP8 (~14 GB) fits with 8K context, AWQ-INT4 leaves the most headroom |
| Codestral 22B | 22B | 44 GB FP16 | 12 GB INT4 | AWQ-INT4 only, tight KV budget |
| DeepSeek-Coder V2 Lite | 16B MoE (~2.4B active) | ~31 GB FP16 | ~16 GB FP8 | FP16 won't fit; FP8 does, and the 4090 is its cheapest comfortable home |
| FLUX.1 dev | 12B | 24 GB FP16 | 12 GB FP8 | FP16 borderline — FP8 fits comfortably |
| SDXL 1.0 | 3.5B | 8 GB FP16 | 4 GB FP8 | FP16 fits with headroom for ControlNets |
| Whisper Large-v3 | 1.5B | 6 GB | n/a | Plenty of room left for an LLM alongside |
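The arithmetic behind this table is simple enough to script. Here is our back-of-envelope rule of thumb (a sketch, not a vendor tool): weight memory scales as params × bytes-per-param, and real checkpoints run slightly larger once embeddings and quantisation scales are counted.

```python
# Back-of-envelope weight-memory estimate behind the fit table above.
# Rule of thumb only: real checkpoints run slightly larger (embeddings,
# quantisation scales) and runtimes add 1-2 GB of CUDA/allocator overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}
CARD_GB = 24  # the 4090's VRAM envelope

def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB for a dense model."""
    return params_billion * BYTES_PER_PARAM[precision]

for name, params in [("Mistral 7B", 7), ("Llama 3.1 8B", 8),
                     ("Llama 2 13B", 13), ("Codestral 22B", 22)]:
    for prec in BYTES_PER_PARAM:
        gb = weight_gb(params, prec)
        # keep ~2 GB aside for KV cache and runtime overhead
        print(f"{name:14s} {prec:5s} {gb:5.1f} GB  "
              f"{'fits' if gb < CARD_GB - 2 else 'too big'}")
```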
When the RTX 4090 Is the Right Card
Real customer workloads we run on this hardware every day.
13B / 14B class production
The 4090 is the cheapest GPU we host that comfortably runs 13B at FP8 with full context. If your eval shows a 13B beating an 8B, this is the card to deploy on.
Code assistants
Code Llama 13B and DeepSeek-Coder V2 Lite both fit at FP8 with KV cache to spare. The 4090 is the cheapest comfortable host for either model in production.
FLUX.1 dev image generation
FP8 FLUX.1 dev produces a 1024×1024 image in under 10 seconds with full LoRA support. The 4090 is the price/quality pick for FLUX in production.
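For reference, a minimal diffusers sketch of the simplest path onto this card: the BF16 baseline with CPU offload. The FP8-quantised transformer variants most production users run here shrink the footprint further; prompt and sampler settings below are illustrative, not a tuned recipe.

```python
# Sketch: FLUX.1 dev on a single 24 GB card via diffusers.
# BF16 baseline with CPU offload to stay inside the VRAM envelope;
# FP8-quantised transformer variants reduce the footprint further.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keeps the borderline BF16 pipeline in bounds

image = pipe("a lighthouse at dusk, volumetric light",
             height=1024, width=1024,
             num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("flux_out.png")
```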
Full-FP16 7B/8B with long context
Llama 3.1 8B at full FP16 with a 32K context window — no quantisation tradeoff, no quality concerns. The 24 GB envelope is exactly right for “the best 8B you can serve.”
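Concretely, a minimal vLLM sketch of that exact configuration (illustrative settings; the model ID assumes Hugging Face access approval, and gpu_memory_utilization is a tuning knob, not gospel):

```python
# Sketch: Llama 3.1 8B at full FP16 with a 32K context window on 24 GB.
# Illustrative settings; tune gpu_memory_utilization for your own stack.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    dtype="float16",              # no quantisation: ~16 GB of weights
    max_model_len=32768,          # the 32K window the 24 GB envelope affords
    gpu_memory_utilization=0.92,  # leave slack for the CUDA context
)

outputs = llm.generate(["Summarise the trade-offs of FP8 inference."],
                       SamplingParams(max_tokens=256, temperature=0.7))
print(outputs[0].outputs[0].text)
```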
Voice agent backend
Whisper Large-v3 + a 13B LLM at FP8 + a TTS model all fit on one card with room left for KV cache. Roughly 6–8 concurrent voice sessions per server.
Mid-tier production deployment
Headroom matters. The 4090’s 24 GB lets you load a primary model plus an embedding model plus a reranker without juggling. The dependable workhorse for serious production.
RTX 4090 vs Other Cards
How this card stacks up against the rest of the GigaGPU catalogue for the workloads we benchmark.
| GPU | VRAM | Throughput / Notes | 13B FP8 fits? | Price |
|---|---|---|---|---|
| RTX 4090 | 24 GB GDDR6X | ~110 tok/s (Llama 3.1 8B FP8 single-stream) | Yes, with full 32K context | from £289 |
| RTX 5080 | 16 GB GDDR7 | Faster per-token on FP8 (Blackwell), a third less VRAM | FP8 only with short context | from £189 |
| RTX 5090 | 32 GB GDDR7 | Hardware FP4 + 32 GB envelope, 38% more expensive | Yes, and sub-4-bit 70B quants come within reach | from £399 |
| RTX 3090 | 24 GB GDDR6X | ~30% slower, no FP8 hardware path, 45% cheaper | Yes (FP16/INT4 only — no native FP8) | from £159 |
| RTX 6000 PRO | 96 GB GDDR7 | 4× the VRAM, ECC, only single-card 70B FP8 option | Yes — and 70B FP8 fits | from £899 |
Deep Dive
"The 24 GB sweet-spot" — what we mean
VRAM tiers in production GPU hosting cluster around three points: 16 GB (one 8B model, tight), 24 GB (one 13B model comfortably, or an 8B with a long context and a friend on the side), and 32 GB+ (room for genuine multi-model stacks). The 4090's 24 GB lands exactly on the middle tier, and at £289/mo it's the cheapest dependable way to occupy it.
If your eval matrix has 13B models beating 8B models on the metrics that matter, the 4090 is almost certainly the right deployment target. The 5080 forces FP8 with short context. The 5090 costs 38% more. The 6000 PRO costs roughly 3× as much for VRAM you won't use unless you're serving 70B.
Why we still recommend the 5080 over the 4090 for some teams
The honest answer: if your model fits in 16 GB and you care about latency over headroom, the 5080 wins. Blackwell tensor cores are faster per-token than Ada, hardware FP4 saves another 2× on memory pressure, and you save £100/mo. For single-model serving of a 7B at FP8, the 5080 is the sharper tool.
The 4090 wins the moment you need: (a) FP16 quality on an 8B, (b) any 13B/14B model, (c) FLUX.1 dev, or (d) multiple models loaded at once. That covers most production stacks we see — which is why the 4090 stays our most-deployed mid-tier card.
FP8 path matters — but FP4 is missing
The 4090's Ada tensor cores have hardware FP8 (~660 TOPS) but no hardware FP4. That last detail matters less than the marketing makes it sound. FP4 gets you another 2× memory headroom and ~1.5× speed on the 5090, useful for squeezing a sub-4-bit 70B build onto 32 GB. On a 24 GB 4090 running 13B-and-below, FP8 is already the right precision tier:
- Llama 2 13B at FP16 → 26 GB. Won't fit.
- Llama 2 13B at FP8 → 13 GB. Comfortable, with headroom for long context (see the sketch below).
- Llama 2 13B at INT4/AWQ → 7 GB. Plenty of room for a second model.
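To put numbers on "headroom for long context", the KV-cache arithmetic looks like this (a sketch assuming Llama 2 13B's published shape: 40 layers, 40 KV heads, head dim 128, full multi-head attention, and ~2 GB reserved for runtime overhead):

```python
# How much context fits beside 13 GB of FP8 weights on a 24 GB card?
# Sketch arithmetic; assumes Llama 2 13B dims (40 layers, 40 KV heads,
# head_dim 128) and ~2 GB reserved for runtime/activation overhead.
CARD_GB, WEIGHTS_GB, OVERHEAD_GB = 24, 13, 2

def kv_bytes_per_token(layers=40, kv_heads=40, head_dim=128, kv_byte=2):
    return 2 * layers * kv_heads * head_dim * kv_byte  # K and V per token

budget = (CARD_GB - WEIGHTS_GB - OVERHEAD_GB) * 1e9
print(f"FP16 KV cache: ~{budget / kv_bytes_per_token():,.0f} tokens")
print(f"FP8  KV cache: ~{budget / kv_bytes_per_token(kv_byte=1):,.0f} tokens")
```

That pencils out to roughly 11K tokens of FP16 KV cache per sequence, or about double with an FP8 KV cache, which is where the 8K-and-up context claims come from; pushing toward 32K means quantising the KV cache too.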
Most production deployments on a 4090 land at FP8 — best balance of quality, memory, and speed. If you genuinely need FP4 (you’re trying to fit a 70B on a single card), the 5090 or 6000 PRO is the right call.
Frequently Asked Questions
The questions buyers actually ask before committing to a GPU server.
Can I run a 13B model (say, Llama 2 13B) on a single 4090?
Yes at FP8 (13 GB weights + KV cache, comfortable with 8K–32K context). Not at FP16, which needs 26 GB against the 4090's 24 GB. Most production teams run 13B at FP8 on the 4090 and don't notice the quality difference vs FP16.
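A hedged sketch of that FP8 path (assumes a recent vLLM build with FP8 weight quantisation on Ada; the model ID is illustrative):

```python
# Sketch: serve a 13B at FP8 on the 4090. Relies on vLLM's on-the-fly
# FP8 weight quantisation (supported on Ada, compute capability 8.9).
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-13b-chat-hf",  # illustrative 13B checkpoint
    quantization="fp8",                      # ~13 GB of weights
    max_model_len=8192,
)
```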
4090 vs 5080 — which should I pick?
5080 if your model fits in 16 GB and you want the lowest latency. 4090 if you need 24 GB headroom — for any 13B model, FLUX.1 dev, or multi-model serving. The 4090 costs £100 more but saves you the "does it fit?" gymnastics.
Is the 4090 enough for fine-tuning?
QLoRA on 7B–13B models works well. Full SFT on 13B+ does not — go to a 5090, two 4090s, or a 6000 PRO for that.
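For reference, a minimal QLoRA setup of the kind that fits this card (a sketch on the peft + bitsandbytes stack; model ID and hyperparameters are illustrative, not a tuned recipe):

```python
# QLoRA sketch for a 13B on 24 GB: 4-bit NF4 base weights, LoRA adapters.
# Illustrative hyperparameters; not a tuned recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # base model in 4-bit NF4 (~7 GB)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",            # illustrative 13B base
    quantization_config=bnb, device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # only the adapters train
```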
How does it compare to the 3090?
Same 24 GB. The 4090 is ~30% faster per token, has hardware FP8 (3090 has no FP8 path), and pulls 450 W vs 350 W. The 3090 is 45% cheaper at £159/mo. If you’re cost-sensitive and don’t need FP8, the 3090 is still a solid pick.
Can I run two 4090s in one server?
Yes, via PCIe; the 4090 has no NVLink. 2× 4090 = 48 GB combined, which lets you run a 70B at INT4 with tensor parallelism, or a 30B at FP8. Talk to sales for dual-GPU pricing.
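In vLLM that looks like the following (a sketch; the AWQ checkpoint ID is illustrative, and any INT4 export of a 70B works the same way):

```python
# Sketch: a 70B at INT4 (AWQ) sharded across two 4090s over PCIe.
# Illustrative checkpoint ID; tensor parallelism splits each layer's
# weights across both GPUs, so 48 GB combined covers ~35 GB of weights.
from vllm import LLM

llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    quantization="awq",
    tensor_parallel_size=2,
)
```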
Will FLUX.1 dev fit?
FP16 is borderline (24 GB transformer + VAE + text encoders is right at the edge). FP8 fits with comfortable headroom for LoRAs and ControlNets. Most ComfyUI users run FP8 on the 4090 in production.
Power draw at 100% load?
450 W. We deploy it in a 4U chassis with dedicated cooling and a 1,000 W PSU for headroom. Stable at sustained load.
Same-day deployment?
Yes for in-stock SKUs. Out-of-stock 4090 lead time is 2–3 working days.
Related Pages
Pages our visitors typically read next.
13B in production? The 4090 is your card.
24 GB GDDR6X, hardware FP8, the cheapest comfortable home for any 13B–14B model. From £289/mo with same-day deployment.