AMD Radeon RX 9070 XT Hosting — The RDNA 4 Debut
AMD’s first RDNA 4 card on our floor. 16 GB of GDDR6 at GeForce RTX 5060 Ti pricing, with double the AI accelerators per CU compared with the previous generation. The right pick when you want a 16 GB inference card without the NVIDIA dependency.
RX 9070 XT Server Specs
The hardware you actually rent.
| GPU model | AMD Radeon RX 9070 XT (RDNA 4, Navi 48) |
|---|---|
| Architecture | RDNA 4 — 2nd gen AI accelerators |
| VRAM | 16 GB GDDR6 @ 644.6 GB/s |
| Compute units | 64 RDNA4 CUs (4,096 stream processors) |
| AI accelerators | 128 (2× per CU vs RDNA 3) |
| FP16 compute | ~97 TFLOPS |
| INT8 compute | ~389 TOPS |
| TDP | 304 W |
| Host CPU | AMD Ryzen 7 / 9 |
| Host RAM | Up to 64 GB DDR5 |
| Storage | 1 TB NVMe + 4 TB SATA SSD |
| Network | 1 Gbps unmetered |
| Location | London, United Kingdom |
What Fits on a Single RX 9070 XT
16 GB is the smallest VRAM we’d recommend for production LLM serving. The 9070 XT runs the same 7B–8B FP16/INT8 envelope as a 5080 — what changes is the software path: ROCm + PyTorch instead of CUDA + TensorRT.
| Model | Params | FP16 | INT8 / INT4 | Notes |
|---|---|---|---|---|
| Mistral 7B Instruct | 7B | 14 GB FP16 | 7 GB INT8 | Fits FP16 with 8K context |
| Llama 3.1 8B | 8B | 16 GB FP16 | 8 GB INT8 | Tight FP16 — comfortable at INT8 |
| Qwen 2.5 7B | 7B | 14 GB FP16 | 7 GB INT8 | Fits FP16 with 16K context |
| Phi-3 Mini | 3.8B | 8 GB FP16 | 4 GB INT8 | 128K context comfortable |
| Gemma 2 9B | 9B | 18 GB FP16 | 9 GB INT8 | INT8 only — FP16 won’t fit |
| Qwen 2.5 14B | 14B | 28 GB FP16 | 9 GB AWQ-INT4 | AWQ-INT4 only on the 9070 XT |
| Whisper Large-v3 | 1.5B | 6 GB | n/a | Real-time + headroom for an 8B LLM |
| FLUX.1 schnell | 12B | 24 GB FP16 | 12 GB INT8 | INT8 only on the 9070 XT |
| SDXL 1.0 | 3.5B | 8 GB FP16 | 4 GB INT8 | Works via ROCm + PyTorch |
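The FP16/INT8/INT4 columns above follow the usual bytes-per-parameter rule of thumb. A minimal sketch of that arithmetic — weights only, since KV cache, activations, and framework overhead come on top, and real AWQ checkpoints land above the raw INT4 figure (the table's 9 GB for Qwen 2.5 14B) because some layers stay at higher precision:

```python
# Rough VRAM estimate for model weights at a given precision.
# Illustrative arithmetic only -- real deployments add KV cache,
# activations, and framework overhead on top of the weight footprint.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Raw weight footprint in GB (1 GB = 1e9 bytes, matching the table)."""
    return params_billion * BYTES_PER_PARAM[precision]

for model, b in [("Mistral 7B", 7), ("Llama 3.1 8B", 8), ("Qwen 2.5 14B", 14)]:
    print(f"{model}: {weight_gb(b, 'fp16'):.0f} GB FP16, "
          f"{weight_gb(b, 'int8'):.0f} GB INT8, "
          f"{weight_gb(b, 'int4'):.0f} GB INT4")
```

This is why Gemma 2 9B (18 GB at FP16) misses the 16 GB envelope while the 7B–8B models squeeze in.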
When the RX 9070 XT Is the Right Card
Real customer workloads we run on this hardware every day.
AMD-first deployment
If your stack policy is “no NVIDIA dependency” — for licensing, supply-chain diversification, or strategic reasons — the 9070 XT is the cheapest 16 GB AMD card we host. ROCm 6.x runs the inference frameworks you already know.
7B/8B chatbots in INT8 / AWQ-INT4
Mistral 7B, Llama 3.1 8B, and Qwen 2.5 7B all run well at INT8 via vLLM-ROCm or llama.cpp’s HIP backend. AWQ-INT4 lets you push to 14B with 8K context.
ROCm/PyTorch dev environment
A clean RDNA 4 box for teams developing against ROCm 6.x. PyTorch, JAX, ONNX Runtime, Hugging Face Transformers — the standard stack works with HIP as the backend instead of CUDA.
Stable Diffusion / SDXL via ROCm
SDXL and FLUX (INT8) run on AUTOMATIC1111, ComfyUI, and Diffusers with the ROCm backend. Kernels are less mature than their CUDA counterparts, but workable for batch image generation and APIs without strict latency SLAs.
Embeddings on a non-NVIDIA stack
BGE-large + reranker on a 16 GB AMD card lets you build a retrieval pipeline without locking the embedding tier to NVIDIA. Throughput is in the same ballpark as a 5060 Ti for embeddings.
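The retrieval tier itself is vendor-neutral: once the card has produced embeddings, serving is plain similarity search over vectors. A toy sketch with 2-dimensional stand-in vectors (a real BGE-large deployment would emit 1024-dimensional embeddings, but the logic is identical):

```python
# Minimal retrieval sketch: cosine-similarity top-k over pre-computed
# embeddings. The 2-dim vectors are toy stand-ins for BGE-large output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Return (index, score) pairs for the k most similar corpus vectors."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(corpus)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.05], corpus, k=2))  # nearest two documents
```

Swapping the GPU under the embedding model changes nothing downstream — which is the point of keeping this tier off a single vendor.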
LLM serving for AMD-licensed customers
If you sell to enterprise customers whose procurement requires AMD silicon or who run an AMD-first datacentre, the 9070 XT is the lowest-cost 16 GB SKU we offer that satisfies that constraint.
RX 9070 XT vs Other 16 GB Cards
How the 9070 XT stacks up against the closest siblings in the GigaGPU catalogue.
| GPU | VRAM | Throughput / Notes | Software | Price |
|---|---|---|---|---|
| RX 9070 XT | 16 GB GDDR6 | ~97 TFLOPS FP16, 389 TOPS INT8 | ROCm 6 / PyTorch-HIP | from £129 |
| RTX 5060 Ti | 16 GB GDDR7 | Lower raw FP16, but FP8 hardware + mature CUDA | CUDA / TensorRT | from £119 |
| RTX 5080 | 16 GB GDDR7 | 56 TFLOPS FP16, 450 TOPS FP8, 900 TOPS FP4 | CUDA / TensorRT | from £189 |
| Radeon AI Pro R9700 | 32 GB GDDR6 | 2× VRAM, datacentre-grade, ECC | ROCm 6 / PyTorch-HIP | from £199 |
| RTX 3090 | 24 GB GDDR6X | ~58 tok/s, 13B FP16 fits | CUDA | from £159 |
Deep Dive
Why we added an AMD card to the catalogue
Most of our customers run NVIDIA, and most of our roster is NVIDIA. But two things changed in 2025 that made it worth carrying RDNA 4: ROCm 6.x finally hit “good enough” parity for vLLM, llama.cpp, PyTorch, and Hugging Face Transformers; and a real number of customers — especially in regulated industries and EU procurement — started asking for non-NVIDIA inference paths.
The 9070 XT is the right entry point. It’s the cheapest 16 GB AMD card we offer, the silicon is brand new (RDNA 4, launched March 2025), and the AI accelerator count doubled per-CU compared to RDNA 3. It’s not a replacement for a 5080 — it’s an alternative for teams who genuinely want to be on AMD.
The honest software story
ROCm has matured enormously. vLLM-ROCm runs Llama 3, Mistral, Qwen, and Phi families with the same OpenAI-compatible API you’d get on CUDA. llama.cpp’s HIP backend is production-grade. PyTorch-ROCm covers the standard model zoo. Stable Diffusion (AUTOMATIC1111, ComfyUI, Diffusers) all work.
What’s still uneven: TensorRT-class graph compilers don’t have a direct AMD analogue at the same maturity. Some niche frameworks — particularly cutting-edge research code released against CUDA-only kernels (FlashAttention variants, custom Triton kernels, very new quantisation libraries) — will need porting effort or won’t run at all. If your stack depends on a single CUDA-only library, the 9070 XT isn’t your card.
INT8 is the production-ready quant on RDNA 4
NVIDIA Blackwell ships with hardware FP8 and FP4 paths. AMD’s RDNA 4 has the AI accelerators but the FP8 software path is still warming up. In practice the production-ready precision ladder on the 9070 XT looks like:
- Llama 3 8B at FP16 → 16 GB. Tight, short context only.
- Llama 3 8B at INT8 → 8 GB. Comfortable, room for KV cache.
- Llama 3 8B at AWQ-INT4 → 4–5 GB. Or run a 14B at 9 GB AWQ-INT4.
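The ladder can be sanity-checked with standard KV-cache arithmetic. A sketch assuming the published Llama 3 8B shapes (32 layers, 8 KV heads of dim 128 under GQA) and an FP16 cache; the 1.5 GB activation/overhead reserve is our working assumption, not a measured figure:

```python
# KV-cache sizing for a Llama-3-8B-class model: 32 layers, 8 KV heads
# of head dim 128 (GQA), FP16 cache entries (2 bytes each).

LAYERS, KV_HEADS, HEAD_DIM, BYTES = 32, 8, 128, 2

def kv_cache_gib(context_tokens: int) -> float:
    per_token = 2 * KV_HEADS * HEAD_DIM * BYTES * LAYERS  # K and V planes
    return context_tokens * per_token / 2**30

def fits_16gb(weights_gb: float, context_tokens: int,
              overhead_gb: float = 1.5) -> bool:
    """Crude fit check against a 16 GB card, reserving overhead
    for activations and runtime allocations (assumed, not measured)."""
    return weights_gb + kv_cache_gib(context_tokens) + overhead_gb <= 16

print(f"8K context KV cache: {kv_cache_gib(8192):.2f} GiB")
print("8B FP16 @ 8K fits:", fits_16gb(16, 8192))  # tight: it does not
print("8B INT8 @ 8K fits:", fits_16gb(8, 8192))
```

At 8K context the cache alone is 1 GiB — which is why the FP16 row above is "tight, short context only" while INT8 leaves comfortable headroom.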
Most production deployments on a 9070 XT land at INT8 — best balance of quality, memory, and ROCm kernel maturity today.
9070 XT vs 5060 Ti — the £10 question
The 5060 Ti is £119, the 9070 XT is £129. £10 a month on a sub-£150 server is noise. The real choice is software path: NVIDIA CUDA (5060 Ti) or AMD ROCm (9070 XT). If you have no preference, take the 5060 Ti — broader framework support and FP8 hardware. If you specifically want AMD silicon, the 9070 XT has more raw FP16 throughput and the newer RDNA 4 AI accelerators.
Frequently Asked Questions
The questions buyers actually ask before committing to an AMD GPU server.
Will my CUDA code run on a 9070 XT?
Not directly — it has to run via HIP/ROCm. The good news: PyTorch, vLLM, llama.cpp, Hugging Face Transformers, and Diffusers all have first-class ROCm builds. If your code uses those frameworks at the API level, the port is usually a Docker image swap. If it depends on raw CUDA kernels or NVIDIA-only libraries, expect porting work.
Is ROCm production-ready in 2026?
For the mainstream LLM and diffusion stack — yes. We run vLLM-ROCm and llama.cpp-HIP in production for paying customers. For bleeding-edge research workloads, expect rough edges around very new kernels and quantisation libraries.
How does it compare to the RTX 5080?
The 5080 still wins on AI software ecosystem (CUDA, TensorRT, FP8, FP4) and is faster on real LLM serving workloads. The 9070 XT is 32% cheaper and is on the AMD software stack. Choose by software path, not benchmarks. See RTX 5080 hosting.
Should I get this or the Radeon AI Pro R9700?
The R9700 has 32 GB VRAM (twice the envelope), ECC, and datacentre-grade firmware. If you need to load a 13B FP16 model or run multiple models on one card, go to the R9700. The 9070 XT is the consumer-grade 16 GB option at £70/mo less.
Does FP8 work?
The hardware accelerators are there, but the ROCm software path for FP8 inference is not yet at the maturity of NVIDIA’s. We recommend INT8 as the production quant on RDNA 4 today and expect FP8 to land properly in a future ROCm release.
Can I run vLLM on it?
Yes. vLLM has an official ROCm build that supports Llama, Mistral, Qwen, Phi, Gemma, and most other mainstream models. We provide a pre-built Docker image.
Power draw at 100% load?
304 W. Comfortable in our 4U chassis with the standard cooling.
Same-day deployment?
Yes for in-stock SKUs. The 9070 XT is a newer addition to our roster — if it’s out of stock, lead time is 3–5 working days.
Related Pages
Pages our visitors typically read next.
Building on AMD? The 9070 XT is your entry point.
16 GB GDDR6, 128 RDNA 4 AI accelerators, ROCm 6 ready. From £129/mo with same-day deployment for in-stock SKUs.