AMD Radeon RX 9070 XT Hosting — The RDNA 4 Debut
AMD’s first RDNA 4 card on our floor. 16 GB of GDDR6 at GeForce RTX 5060 Ti pricing, with double the AI accelerators per CU compared with the previous generation. The right pick when you want a 16 GB inference card without the NVIDIA dependency.
RX 9070 XT Server Specs
The hardware you actually rent.
| GPU model | AMD Radeon RX 9070 XT (RDNA 4, Navi 48) |
|---|---|
| Architecture | RDNA 4 — 2nd gen AI accelerators |
| VRAM | 16 GB GDDR6 @ 644.6 GB/s |
| Compute units | 64 RDNA4 CUs (4,096 stream processors) |
| AI accelerators | 128 (2× per CU vs RDNA 3) |
| FP16 compute | ~97 TFLOPS |
| INT8 compute | ~389 TOPS |
| TDP | 304 W |
| Host CPU | AMD Ryzen 7 / 9 |
| Host RAM | Up to 64 GB DDR5 |
| Storage | 1 TB NVMe + 4 TB SATA SSD |
| Network | 1 Gbps unmetered |
| Location | London, United Kingdom |
What Fits on a Single RX 9070 XT
16 GB is the smallest VRAM we’d recommend for production LLM serving. The 9070 XT runs the same 7B–8B FP16/INT8 envelope as a 5080 — what changes is the software path: ROCm + PyTorch instead of CUDA + TensorRT.
| Model | Params | FP16 | INT8 / INT4 | Notes |
|---|---|---|---|---|
| Mistral 7B Instruct | 7B | 14 GB FP16 | 7 GB INT8 | Fits FP16 with 8K context |
| Llama 3.1 8B | 8B | 16 GB FP16 | 8 GB INT8 | Tight FP16 — comfortable at INT8 |
| Qwen 2.5 7B | 7B | 14 GB FP16 | 7 GB INT8 | Fits FP16 with 16K context |
| Phi-3 Mini | 3.8B | 8 GB FP16 | 4 GB INT8 | 128K context comfortable |
| Gemma 2 9B | 9B | 18 GB FP16 | 9 GB INT8 | INT8 only — FP16 won’t fit |
| Qwen 2.5 14B | 14B | 28 GB FP16 | 9 GB AWQ-INT4 | AWQ-INT4 only on the 9070 XT |
| Whisper Large-v3 | 1.5B | 6 GB | n/a | Real-time + headroom for an 8B LLM |
| FLUX.1 schnell | 12B | 24 GB FP16 | 12 GB INT8 | INT8 only on the 9070 XT |
| SDXL 1.0 | 3.5B | 8 GB FP16 | 4 GB INT8 | Works via ROCm + PyTorch |
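The FP16/INT8/INT4 columns above follow the usual bytes-per-parameter rule of thumb. A minimal sketch of that arithmetic — weights only, since KV cache, activations, and framework overhead come on top, and real AWQ checkpoints land above the raw INT4 figure (the table's 9 GB for Qwen 2.5 14B) because some layers stay at higher precision:

```python
# Rough VRAM estimate for model weights at a given precision.
# Illustrative arithmetic only -- real deployments add KV cache,
# activations, and framework overhead on top of the weight footprint.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Raw weight footprint in GB (1 GB = 1e9 bytes, matching the table)."""
    return params_billion * BYTES_PER_PARAM[precision]

for model, b in [("Mistral 7B", 7), ("Llama 3.1 8B", 8), ("Qwen 2.5 14B", 14)]:
    print(f"{model}: {weight_gb(b, 'fp16'):.0f} GB FP16, "
          f"{weight_gb(b, 'int8'):.0f} GB INT8, "
          f"{weight_gb(b, 'int4'):.0f} GB INT4")
```

This is why Gemma 2 9B (18 GB at FP16) misses the 16 GB envelope while the 7B–8B models squeeze in.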
When the RX 9070 XT Is the Right Card
Real customer workloads we run on this hardware every day.
AMD-first deployment
If your stack policy is “no NVIDIA dependency” — for licensing, supply-chain diversification, or strategic reasons — the 9070 XT is the cheapest 16 GB AMD card we host. ROCm 6.x runs the inference frameworks you already know.
7B/8B chatbots in INT8 / AWQ-INT4
Mistral 7B, Llama 3.1 8B, and Qwen 2.5 7B all run well at INT8 via vLLM-ROCm or llama.cpp’s HIP backend. AWQ-INT4 lets you push to 14B with 8K context.
ROCm/PyTorch dev environment
A clean RDNA 4 box for teams developing against ROCm 6.x. PyTorch, JAX, ONNX Runtime, Hugging Face Transformers — the standard stack works with HIP as the backend instead of CUDA.
Stable Diffusion / SDXL via ROCm
SDXL and FLUX (INT8) run on AUTOMATIC1111, ComfyUI, and Diffusers with the ROCm backend. Kernels are less mature than their CUDA counterparts, but workable for batch image generation and APIs without strict latency SLAs.
Embeddings on a non-NVIDIA stack
BGE-large + reranker on a 16 GB AMD card lets you build a retrieval pipeline without locking the embedding tier to NVIDIA. Throughput is in the same ballpark as a 5060 Ti for embeddings.
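The retrieval tier itself is vendor-neutral: once the card has produced embeddings, serving is plain similarity search over vectors. A toy sketch with 2-dimensional stand-in vectors (a real BGE-large deployment would emit 1024-dimensional embeddings, but the logic is identical):

```python
# Minimal retrieval sketch: cosine-similarity top-k over pre-computed
# embeddings. The 2-dim vectors are toy stand-ins for BGE-large output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Return (index, score) pairs for the k most similar corpus vectors."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(corpus)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.05], corpus, k=2))  # nearest two documents
```

Swapping the GPU under the embedding model changes nothing downstream — which is the point of keeping this tier off a single vendor.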
LLM serving for AMD-licensed customers
If you sell to enterprise customers whose procurement requires AMD silicon or who run an AMD-first datacentre, the 9070 XT is the lowest-cost 16 GB SKU we offer that satisfies that constraint.
RX 9070 XT vs Other 16 GB Cards
How the 9070 XT stacks up against the closest siblings in the GigaGPU catalogue.
| GPU | VRAM | Throughput / Notes | Software | Price |
|---|---|---|---|---|
| RX 9070 XT | 16 GB GDDR6 | ~97 TFLOPS FP16, 389 TOPS INT8 | ROCm 6 / PyTorch-HIP | from £129 |
| RTX 5060 Ti | 16 GB GDDR7 | Lower raw FP16, but FP8 hardware + mature CUDA | CUDA / TensorRT | from £119 |
| RTX 5080 | 16 GB GDDR7 | 56 TFLOPS FP16, 450 TOPS FP8, 900 TOPS FP4 | CUDA / TensorRT | from £189 |
| Radeon AI Pro R9700 | 32 GB GDDR6 | 2× VRAM, datacentre-grade, ECC | ROCm 6 / PyTorch-HIP | from £199 |
| RTX 3090 | 24 GB GDDR6X | ~58 tok/s, 13B FP16 fits | CUDA | from £159 |
Deep Dive
Why we added an AMD card to the catalogue
Most of our customers run NVIDIA, and most of our roster is NVIDIA. But two things changed in 2025 that made it worth carrying RDNA 4: ROCm 6.x finally hit “good enough” parity for vLLM, llama.cpp, PyTorch, and Hugging Face Transformers; and a real number of customers — especially in regulated industries and EU procurement — started asking for non-NVIDIA inference paths.
The 9070 XT is the right entry point. It’s the cheapest 16 GB AMD card we offer, the silicon is brand new (RDNA 4, launched March 2025), and the AI accelerator count doubled per-CU compared to RDNA 3. It’s not a replacement for a 5080 — it’s an alternative for teams who genuinely want to be on AMD.
The honest software story
ROCm has matured enormously. vLLM-ROCm runs Llama 3, Mistral, Qwen, and Phi families with the same OpenAI-compatible API you’d get on CUDA. llama.cpp’s HIP backend is production-grade. PyTorch-ROCm covers the standard model zoo. Stable Diffusion (AUTOMATIC1111, ComfyUI, Diffusers) all work.
What’s still uneven: TensorRT-class graph compilers don’t have a direct AMD analogue at the same maturity. Some niche frameworks — particularly cutting-edge research code released against CUDA-only kernels (FlashAttention variants, custom Triton kernels, very new quantisation libraries) — will need porting effort or won’t run at all. If your stack depends on a single CUDA-only library, the 9070 XT isn’t your card.
INT8 is the production-ready quant on RDNA 4
NVIDIA Blackwell ships with hardware FP8 and FP4 paths. AMD’s RDNA 4 has the AI accelerators but the FP8 software path is still warming up. In practice the production-ready precision ladder on the 9070 XT looks like:
- Llama 3 8B at FP16 → 16 GB. Tight, short context only.
- Llama 3 8B at INT8 → 8 GB. Comfortable, room for KV cache.
- Llama 3 8B at AWQ-INT4 → 4–5 GB. Or run a 14B at 9 GB AWQ-INT4.
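The ladder can be sanity-checked with standard KV-cache arithmetic. A sketch assuming the published Llama 3 8B shapes (32 layers, 8 KV heads of dim 128 under GQA) and an FP16 cache; the 1.5 GB activation/overhead reserve is our working assumption, not a measured figure:

```python
# KV-cache sizing for a Llama-3-8B-class model: 32 layers, 8 KV heads
# of head dim 128 (GQA), FP16 cache entries (2 bytes each).

LAYERS, KV_HEADS, HEAD_DIM, BYTES = 32, 8, 128, 2

def kv_cache_gib(context_tokens: int) -> float:
    per_token = 2 * KV_HEADS * HEAD_DIM * BYTES * LAYERS  # K and V planes
    return context_tokens * per_token / 2**30

def fits_16gb(weights_gb: float, context_tokens: int,
              overhead_gb: float = 1.5) -> bool:
    """Crude fit check against a 16 GB card, reserving overhead
    for activations and runtime allocations (assumed, not measured)."""
    return weights_gb + kv_cache_gib(context_tokens) + overhead_gb <= 16

print(f"8K context KV cache: {kv_cache_gib(8192):.2f} GiB")
print("8B FP16 @ 8K fits:", fits_16gb(16, 8192))  # tight: it does not
print("8B INT8 @ 8K fits:", fits_16gb(8, 8192))
```

At 8K context the cache alone is 1 GiB — which is why the FP16 row above is "tight, short context only" while INT8 leaves comfortable headroom.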
Most production deployments on a 9070 XT land at INT8 — best balance of quality, memory, and ROCm kernel maturity today.
9070 XT vs 5060 Ti — the £10 question
The 5060 Ti is £119, the 9070 XT is £129. £10 a month on a sub-£150 server is noise. The real choice is software path: NVIDIA CUDA (5060 Ti) or AMD ROCm (9070 XT). If you have no preference, take the 5060 Ti — broader framework support and FP8 hardware. If you specifically want AMD silicon, the 9070 XT has more raw FP16 throughput and the newer RDNA 4 AI accelerators.
Frequently Asked Questions
The questions buyers actually ask before committing to an AMD GPU server.
Will my CUDA code run on a 9070 XT?
Not directly — it has to run via HIP/ROCm. The good news: PyTorch, vLLM, llama.cpp, Hugging Face Transformers, and Diffusers all have first-class ROCm builds. If your code uses those frameworks at the API level, the port is usually a Docker image swap. If it depends on raw CUDA kernels or NVIDIA-only libraries, expect porting work.
Is ROCm production-ready in 2026?
For the mainstream LLM and diffusion stack — yes. We run vLLM-ROCm and llama.cpp-HIP in production for paying customers. For bleeding-edge research workloads, expect rough edges around very new kernels and quantisation libraries.
How does it compare to the RTX 5080?
The 5080 still wins on AI software ecosystem (CUDA, TensorRT, FP8, FP4) and is faster on real LLM serving workloads. The 9070 XT is 32% cheaper and is on the AMD software stack. Choose by software path, not benchmarks. See RTX 5080 hosting.
Should I get this or the Radeon AI Pro R9700?
The R9700 has 32 GB VRAM (twice the envelope), ECC, and datacentre-grade firmware. If you need to load a 13B FP16 model or run multiple models on one card, go to the R9700. The 9070 XT is the consumer-grade 16 GB option at £70/mo less.
Does FP8 work?
The hardware accelerators are there, but the ROCm software path for FP8 inference is not yet at the maturity of NVIDIA’s. We recommend INT8 as the production quant on RDNA 4 today and expect FP8 to land properly in a future ROCm release.
Can I run vLLM on it?
Yes. vLLM has an official ROCm build that supports Llama, Mistral, Qwen, Phi, Gemma, and most other mainstream models. We provide a pre-built Docker image.
Power draw at 100% load?
304 W. Comfortable in our 4U chassis with the standard cooling.
Same-day deployment?
Yes for in-stock SKUs. The 9070 XT is a newer addition to our roster — if it’s out of stock, lead time is 3–5 working days.
Related Pages
Pages our visitors typically read next.
Building on AMD? The 9070 XT is your entry point.
16 GB GDDR6, 128 RDNA 4 AI accelerators, ROCm 6 ready. From £129/mo with same-day deployment for in-stock SKUs.