RX 9070 XT - Order Now
RDNA 4 · 16 GB · ROCm-Ready

AMD Radeon RX 9070 XT Hosting — The RDNA 4 Debut

AMD’s first RDNA 4 card on our floor. 16 GB of GDDR6 at GeForce RTX 5060 Ti pricing, with double the AI accelerators per CU of the previous generation. The right pick when you want a 16 GB inference card without the NVIDIA dependency.

16 GB GDDR6 VRAM · 4,096 stream processors · 128 AI accelerators · 644.6 GB/s memory bandwidth · From £129/mo

RX 9070 XT Server Specs

The hardware you actually rent.

GPU model: AMD Radeon RX 9070 XT (RDNA 4, Navi 48)
Architecture: RDNA 4 (2nd-gen AI accelerators)
VRAM: 16 GB GDDR6 @ 644.6 GB/s
Compute units: 64 RDNA 4 CUs (4,096 stream processors)
AI accelerators: 128 (2× per CU vs RDNA 3)
FP16 compute: ~97 TFLOPS
INT8 compute: ~389 TOPS
TDP: 304 W
Host CPU: AMD Ryzen 7 / 9
Host RAM: Up to 64 GB DDR5
Storage: 1 TB NVMe + 4 TB SATA SSD
Network: 1 Gbps unmetered
Location: London, United Kingdom

What Fits on a Single RX 9070 XT

16 GB is the smallest VRAM we’d recommend for production LLM serving. The 9070 XT runs the same 7B–8B FP16/INT8 envelope as a 5080 — what changes is the software path: ROCm + PyTorch instead of CUDA + TensorRT.

Model               | Params | FP16  | INT8 / INT4   | Notes
Mistral 7B Instruct | 7B     | 14 GB | 7 GB INT8     | Fits FP16 with 8K context
Llama 3.1 8B        | 8B     | 16 GB | 8 GB INT8     | Tight FP16 — comfortable at INT8
Qwen 2.5 7B         | 7B     | 14 GB | 7 GB INT8     | Fits FP16 with 16K context
Phi-3 Mini          | 3.8B   | 8 GB  | 4 GB INT8     | 128K context comfortable
Gemma 2 9B          | 9B     | 18 GB | 9 GB INT8     | INT8 only — FP16 won't fit
Qwen 2.5 14B        | 14B    | 28 GB | 9 GB AWQ-INT4 | AWQ-INT4 only on the 9070 XT
Whisper Large-v3    | 1.5B   | 6 GB  | n/a           | Real-time + headroom for an 8B LLM
FLUX.1 schnell      | 12B    | 24 GB | 12 GB INT8    | INT8 only on the 9070 XT
SDXL 1.0            | 3.5B   | 8 GB  | 4 GB INT8     | Works via ROCm + PyTorch
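
Back-of-envelope, the table reduces to one rule: weights cost roughly parameter count × bytes per parameter, plus headroom for KV cache and runtime overhead. A minimal sketch of that arithmetic (the helper and thresholds are illustrative, not our provisioning logic):

```python
# Weights-only footprint: params (in billions) x bytes per parameter.
# Real serving needs headroom on top for KV cache, activations, and
# framework overhead (typically another 1-2 GB or more).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "awq-int4": 0.5}

def weight_footprint_gb(params_b: float, precision: str) -> float:
    """Approximate model weight size in GB."""
    return params_b * BYTES_PER_PARAM[precision]

for name, params_b in [("Llama 3.1 8B", 8.0), ("Qwen 2.5 14B", 14.0)]:
    for prec in ("fp16", "int8", "awq-int4"):
        gb = weight_footprint_gb(params_b, prec)
        verdict = "fits" if gb <= 16 else "too big"
        print(f"{name} @ {prec}: ~{gb:.0f} GB ({verdict} in 16 GB, weights only)")
```

The weights-only view also explains the Qwen 2.5 14B row: 14 GB of INT8 weights technically fit in 16 GB, but leave no KV-cache headroom at useful context lengths, hence AWQ-INT4 only.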

When the RX 9070 XT Is the Right Card

Real customer workloads we run on this hardware every day.

AMD-first deployment

If your stack policy is “no NVIDIA dependency” — for licensing, supply-chain diversification, or strategic reasons — the 9070 XT is the cheapest 16 GB AMD card we host. ROCm 6.x runs the inference frameworks you already know.

ROCm-only · Vendor diversification · Supply chain

7B/8B chatbots in INT8 / AWQ-INT4

Mistral 7B, Llama 3.1 8B, and Qwen 2.5 7B all run well at INT8 via vLLM-ROCm or llama.cpp’s HIP backend. AWQ-INT4 lets you push to 14B with 8K context.

vLLM-ROCm · llama.cpp HIP · AWQ
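
A minimal sketch of the llama.cpp path, assuming a llama-cpp-python build with the HIP backend enabled; the GGUF file name is illustrative:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-q8_0.gguf",  # illustrative local 8-bit GGUF
    n_gpu_layers=-1,  # offload every layer to the 9070 XT
    n_ctx=8192,       # 8K context alongside Q8_0 weights in 16 GB
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "One-line summary of RDNA 4?"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```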

ROCm/PyTorch dev environment

A clean RDNA 4 box for teams developing against ROCm 6.x. PyTorch, JAX, ONNX Runtime, Hugging Face Transformers — the standard stack works with HIP as the backend instead of CUDA.

PyTorch-ROCm · HIP · Hugging Face
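
A quick sanity check on a fresh box: PyTorch-ROCm exposes the card through the familiar torch.cuda API, so standard code runs unchanged.

```python
import torch

print(torch.__version__)              # ROCm wheels carry a +rocm suffix
print(torch.version.hip)              # HIP version string; None on CUDA builds
print(torch.cuda.is_available())      # True when the 9070 XT is visible
print(torch.cuda.get_device_name(0))  # the Radeon device string

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
print((x @ x).shape)                  # matmul executed via HIP on the card
```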

Stable Diffusion / SDXL via ROCm

SDXL and FLUX (INT8) run on AUTOMATIC1111, ComfyUI, and Diffusers with the ROCm backend. The kernels are less mature than their CUDA counterparts, so expect lower throughput, but it's workable for batch image generation and APIs without strict latency SLAs.

SDXL · ComfyUI-ROCm · Diffusers
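
A minimal Diffusers sketch for SDXL; the code is identical to the CUDA version because PyTorch-ROCm maps the "cuda" device to HIP:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # FP16 weights, ~8 GB per the fit table above
).to("cuda")  # resolves to the HIP device on PyTorch-ROCm

image = pipe(
    "a server rack bathed in red light, photorealistic",
    num_inference_steps=30,
).images[0]
image.save("sdxl_out.png")
```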

Embeddings on a non-NVIDIA stack

BGE-large + reranker on a 16 GB AMD card lets you build a retrieval pipeline without locking the embedding tier to NVIDIA. Throughput is in the same ballpark as a 5060 Ti for embeddings.

BGE-large · ColBERT · Reranker
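
A minimal embedding sketch with sentence-transformers, which rides on PyTorch-ROCm; the documents and query are illustrative:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")

docs = [
    "ROCm is AMD's open GPU compute stack.",
    "GDDR6 is a graphics memory standard.",
]
query = "What is ROCm?"

doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)

scores = doc_emb @ q_emb  # cosine similarity on normalised vectors
best = max(zip(scores, docs), key=lambda pair: pair[0])
print(best)
```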

LLM serving for AMD-licensed customers

If you sell to enterprise customers whose procurement requires AMD silicon or who run an AMD-first datacentre, the 9070 XT is the lowest-cost 16 GB SKU we offer that satisfies that constraint.

Enterprise resale · Procurement · Compliance

RX 9070 XT vs Other 16 GB Cards

How the 9070 XT stacks up against the closest siblings in the GigaGPU catalogue.

GPU                 | VRAM         | Throughput / Notes                             | Software             | Price
RX 9070 XT          | 16 GB GDDR6  | ~97 TFLOPS FP16, 389 TOPS INT8                 | ROCm 6 / PyTorch-HIP | from £129
RTX 5060 Ti         | 16 GB GDDR7  | Lower raw FP16, but FP8 hardware + mature CUDA | CUDA / TensorRT      | from £119
RTX 5080            | 16 GB GDDR7  | 56 TFLOPS FP16, 450 TOPS FP8, 900 TOPS FP4     | CUDA / TensorRT      | from £189
Radeon AI Pro R9700 | 32 GB GDDR6  | 2× VRAM, datacentre-grade, ECC                 | ROCm 6 / PyTorch-HIP | from £199
RTX 3090            | 24 GB GDDR6X | ~58 tok/s, 13B FP16 fits                       | CUDA                 | from £159

Deep Dive

Why we added an AMD card to the catalogue

Most of our customers run NVIDIA, and most of our roster is NVIDIA. But two things changed in 2025 that made it worth carrying RDNA 4: ROCm 6.x finally hit “good enough” parity for vLLM, llama.cpp, PyTorch, and Hugging Face Transformers; and a real number of customers — especially in regulated industries and EU procurement — started asking for non-NVIDIA inference paths.

The 9070 XT is the right entry point. It’s the cheapest 16 GB AMD card we offer, the silicon is brand new (RDNA 4, launched March 2025), and the AI accelerator count doubled per-CU compared to RDNA 3. It’s not a replacement for a 5080 — it’s an alternative for teams who genuinely want to be on AMD.

The honest software story

ROCm has matured enormously. vLLM-ROCm runs Llama 3, Mistral, Qwen, and Phi families with the same OpenAI-compatible API you’d get on CUDA. llama.cpp’s HIP backend is production-grade. PyTorch-ROCm covers the standard model zoo. Stable Diffusion (AUTOMATIC1111, ComfyUI, Diffusers) all work.
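
In practice that means the standard openai client talks to a vLLM-ROCm endpoint unchanged. A sketch with placeholder host and model name:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://your-server:8000/v1",  # placeholder vLLM endpoint
    api_key="unused",  # vLLM ignores the key unless auth is configured
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever the server loaded
    messages=[{"role": "user", "content": "Hello from the ROCm side."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```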

What’s still uneven: TensorRT-class graph compilers don’t have a direct AMD analogue at the same maturity. Some niche frameworks — particularly cutting-edge research code released against CUDA-only kernels (FlashAttention variants, custom Triton kernels, very new quantisation libraries) — will need porting effort or won’t run at all. If your stack depends on a single CUDA-only library, the 9070 XT isn’t your card.

INT8 is the production-ready quant on RDNA 4

NVIDIA Blackwell ships with hardware FP8 and FP4 paths. AMD’s RDNA 4 has the AI accelerators but the FP8 software path is still warming up. In practice the production-ready precision ladder on the 9070 XT looks like:

  • Llama 3 8B at FP16 → 16 GB. Tight, short context only.
  • Llama 3 8B at INT8 → 8 GB. Comfortable, room for KV cache.
  • Llama 3 8B at AWQ-INT4 → 4–5 GB. Or run a 14B at 9 GB AWQ-INT4.

Most production deployments on a 9070 XT land at INT8 — best balance of quality, memory, and ROCm kernel maturity today.

9070 XT vs 5060 Ti — the £10 question

The 5060 Ti is £119, the 9070 XT is £129. £10 a month on a sub-£150 server is noise. The real choice is software path: NVIDIA CUDA (5060 Ti) or AMD ROCm (9070 XT). If you have no preference, take the 5060 Ti — broader framework support and FP8 hardware. If you specifically want AMD silicon, the 9070 XT has more raw FP16 throughput and the newer RDNA 4 AI accelerators.

Frequently Asked Questions

The questions buyers actually ask before committing to an AMD GPU server.

Will my CUDA code run on a 9070 XT?

Not directly — it has to run via HIP/ROCm. The good news: PyTorch, vLLM, llama.cpp, Hugging Face Transformers, and Diffusers all have first-class ROCm builds. If your code uses those frameworks at the API level, the port is usually a Docker image swap. If it depends on raw CUDA kernels or NVIDIA-only libraries, expect porting work.

Is ROCm production-ready in 2026?

For the mainstream LLM and diffusion stack — yes. We run vLLM-ROCm and llama.cpp-HIP in production for paying customers. For bleeding-edge research workloads, expect rough edges around very new kernels and quantisation libraries.

How does it compare to the RTX 5080?

The 5080 still wins on AI software ecosystem (CUDA, TensorRT, FP8, FP4) and is faster on real LLM serving workloads. The 9070 XT is 32% cheaper and is on the AMD software stack. Choose by software path, not benchmarks. See RTX 5080 hosting.

Should I get this or the Radeon AI Pro R9700?

The R9700 has 32 GB VRAM (twice the envelope), ECC, and datacentre-grade firmware. If you need to load a 13B FP16 model or run multiple models on one card, go to the R9700. The 9070 XT is the consumer-grade 16 GB option at £70/mo less.

Does FP8 work?

The hardware accelerators are there, but the ROCm software path for FP8 inference is not yet at the maturity of NVIDIA’s. We recommend INT8 as the production quant on RDNA 4 today and expect FP8 to land properly in a future ROCm release.

Can I run vLLM on it?

Yes. vLLM has an official ROCm build that supports Llama, Mistral, Qwen, Phi, Gemma, and most other mainstream models. We provide a pre-built Docker image.
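
For offline batch work the Python API is likewise identical to CUDA. A minimal sketch with Phi-3 Mini, which fits FP16 with comfortable KV-cache headroom on 16 GB (see the fit table above):

```python
from vllm import LLM, SamplingParams

# Phi-3 Mini (3.8B) leaves plenty of KV-cache room on a 16 GB card.
llm = LLM(
    model="microsoft/Phi-3-mini-4k-instruct",
    dtype="float16",
    trust_remote_code=True,  # Phi-3 needed this on older transformers versions
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What do RDNA 4 AI accelerators do?"], params)
print(outputs[0].outputs[0].text)
```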

Power draw at 100% load?

304 W. Comfortable in our 4U chassis with the standard cooling.

Same-day deployment?

Yes for in-stock SKUs. The 9070 XT is a newer addition to our roster — if it’s out of stock, lead time is 3–5 working days.

Building on AMD? The 9070 XT is your entry point.

16 GB GDDR6, 128 RDNA 4 AI accelerators, ROCm 6 ready. From £129/mo with same-day deployment for in-stock SKUs.

Have a question? Need help?