SDXL remains the workhorse of production image generation: mature tooling, a vast library of community LoRAs, ControlNet, IP-Adapter and battle-tested fine-tunes for every style. Hosted as an API on the RTX 5060 Ti 16GB through UK dedicated GPU hosting, it generates a 1024×1024 image in 3.4 seconds at FP16 (30 steps), and the card's 16 GB of GDDR7 is large enough to hold the base model, refiner, two ControlNets and a LoRA stack at the same time.
Throughput
| Workflow | Steps | Precision | Time/image | Images/hour |
|---|---|---|---|---|
| SDXL base 1024 | 30 | FP16 | 3.4 s | 1,050 |
| SDXL base + refiner | 30 + 10 | FP16 | 4.6 s | 780 |
| SDXL Lightning 4-step | 4 | FP16 | 0.95 s | 3,780 |
| SDXL Turbo 1-step | 1 | FP16 | 0.35 s | 10,280 |
| SDXL + ControlNet | 30 | FP16 | 4.2 s | 850 |
At 3,780 images/hour with SDXL Lightning and 50% utilisation, one 5060 Ti sustains roughly 1.36M images per 30-day month (3,780 × 0.5 × 720 h). See our SDXL benchmark.
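The images/hour column is simply 3,600 seconds divided by per-image latency, and monthly volume scales that by utilisation and the hours in a month. A minimal Python sketch of the arithmetic, assuming back-to-back single-image generation (no batching) and a 30-day month; the names are illustrative:

```python
# Latency-to-capacity arithmetic for the throughput table.
# Latencies are the table's figures; the 50% utilisation and the
# 30-day (720-hour) month are planning assumptions, not measurements.

LATENCY_S = {
    "SDXL base 1024 (30 steps)": 3.4,
    "SDXL base + refiner (30 + 10)": 4.6,
    "SDXL Lightning (4 steps)": 0.95,
    "SDXL Turbo (1 step)": 0.35,
    "SDXL + ControlNet (30 steps)": 4.2,
}

HOURS_PER_MONTH = 24 * 30  # 720 h; use ~730 for an average calendar month


def images_per_hour(latency_s: float) -> float:
    """Back-to-back generation on one GPU, one image at a time."""
    return 3600.0 / latency_s


def images_per_month(latency_s: float, utilisation: float = 0.5) -> float:
    """Sustained monthly volume when the GPU is busy `utilisation` of the time."""
    return images_per_hour(latency_s) * utilisation * HOURS_PER_MONTH


if __name__ == "__main__":
    for name, latency in LATENCY_S.items():
        print(f"{name:32s} {images_per_hour(latency):8,.0f}/h "
              f"{images_per_month(latency):12,.0f}/month at 50%")
```

The same arithmetic gives roughly 381k images/month per card for base SDXL at 30 steps and 50% utilisation.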
Feature stack
- Base SDXL 1.0 (6.9 GB FP16) plus optional refiner.
- SDXL Lightning / Turbo / Hyper-SD distilled fast variants.
- ControlNet – pose, depth, canny, tile, QR-code.
- IP-Adapter – style and subject conditioning from reference images.
- LoRA stacking – up to eight active LoRAs at rank 32 in 16 GB.
- Diffusers scheduler pool – DPM++ 2M Karras, Euler a, UniPC.
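One way to back the scheduler pool is a small name-to-class table. The sketch below assumes the commonly documented diffusers equivalents (DPM++ 2M Karras as `DPMSolverMultistepScheduler` with `use_karras_sigmas=True`, Euler a as `EulerAncestralDiscreteScheduler`, UniPC as `UniPCMultistepScheduler`); `SCHEDULER_POOL`, `resolve_scheduler` and `swap_scheduler` are hypothetical names, not part of diffusers.

```python
# Hypothetical scheduler pool: maps UI-style scheduler names to the diffusers
# class name plus constructor overrides that reproduce that sampler.
SCHEDULER_POOL = {
    "DPM++ 2M Karras": ("DPMSolverMultistepScheduler", {"use_karras_sigmas": True}),
    "Euler a": ("EulerAncestralDiscreteScheduler", {}),
    "UniPC": ("UniPCMultistepScheduler", {}),
}


def resolve_scheduler(name: str) -> tuple[str, dict]:
    """Return (diffusers class name, kwargs) for a requested scheduler name."""
    try:
        class_name, overrides = SCHEDULER_POOL[name]
    except KeyError:
        raise ValueError(
            f"unknown scheduler {name!r}; choose from {sorted(SCHEDULER_POOL)}"
        ) from None
    return class_name, dict(overrides)  # copy so callers cannot mutate the pool


def swap_scheduler(pipe, name: str):
    """Replace pipe.scheduler, reusing the loaded model's scheduler config.

    Requires the diffusers package; it is imported lazily so the mapping
    above can be used (and tested) without it.
    """
    import diffusers

    class_name, overrides = resolve_scheduler(name)
    scheduler_cls = getattr(diffusers, class_name)
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config, **overrides)
    return pipe
```

Because `swap_scheduler` mutates the shared pipeline, it suits a single GPU worker process; concurrent workers would each need their own pipeline or scheduler instance.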
LoRA-swap architecture
For a customer-facing product with per-brand style models, keep the SDXL UNet resident and swap LoRA adapters per request. Rank-32 LoRAs weigh 140-220 MB each, and an NVMe-to-VRAM swap via diffusers' `load_lora_weights` completes in 80-120 ms: roughly 2-4% of a 3.4-second generation, so the cost is easily amortised.
| LoRA rank | File size | Adapters resident in VRAM (approx.) | Swap latency (NVMe → VRAM) |
|---|---|---|---|
| 16 | 110 MB | ~20 | 60 ms |
| 32 | 180 MB | ~12 | 100 ms |
| 64 | 340 MB | ~6 | 180 ms |
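A per-request LoRA manager is essentially an LRU cache sized to the number of adapters the table says fit in VRAM. A minimal sketch, assuming adapters are identified by name and that the caller supplies the real load/unload callbacks (for example thin wrappers around diffusers' `load_lora_weights(..., adapter_name=...)` and `delete_adapters(...)`); `LoraCache` and `swap_overhead` are hypothetical helpers, and the constants are copied from the table above.

```python
from collections import OrderedDict

# From the table: approximate adapters kept in VRAM, and swap latency in ms, per rank.
HOT_CAPACITY = {16: 20, 32: 12, 64: 6}
SWAP_MS = {16: 60, 32: 100, 64: 180}


class LoraCache:
    """LRU set of LoRA adapter names kept resident on the GPU (hypothetical)."""

    def __init__(self, capacity, load, unload):
        self.capacity = capacity
        self._load = load        # callback: bring an adapter into VRAM
        self._unload = unload    # callback: free an adapter from VRAM
        self._resident = OrderedDict()  # adapter name -> None, oldest first
        self.hits = 0
        self.misses = 0

    def acquire(self, name):
        """Ensure `name` is resident; return True on a cache hit."""
        if name in self._resident:
            self._resident.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return True
        if len(self._resident) >= self.capacity:
            evicted, _ = self._resident.popitem(last=False)  # least recently used
            self._unload(evicted)
        self._load(name)
        self._resident[name] = None
        self.misses += 1
        return False

    @property
    def resident(self):
        return list(self._resident)


def swap_overhead(rank, image_seconds=3.4):
    """Fraction of one generation spent on a single cold LoRA swap."""
    return SWAP_MS[rank] / 1000 / image_seconds
```

With rank-32 adapters, `swap_overhead(32)` is about 0.03, so a cold swap adds roughly 3% to a 30-step generation, and a cache hit adds nothing.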
API design
A FastAPI front door, a Redis job queue and one GPU worker process that loads SDXL with a LoRA manager. Expose OpenAI-compatible image endpoints or Replicate-style async predictions with webhooks, store generated images in S3/R2 and return signed URLs.
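The queue-plus-worker shape can be sketched in-process before wiring up FastAPI and Redis. In this sketch, assuming a single GPU worker, `queue.Queue` stands in for Redis, and `generate` and `notify` are hypothetical callbacks for the SDXL call (returning a signed URL) and the webhook POST. Status values follow Replicate's prediction lifecycle (`starting`, `processing`, `succeeded`, `failed`), and `openai_images_response` mirrors the `created` / `data[].url` shape of OpenAI's image generation response; everything else is illustrative.

```python
import queue
import threading
import time
import uuid

# In-process stand-ins: `jobs` plays the Redis queue; `generate` and `notify`
# (passed to the worker) play the SDXL pipeline call and the webhook POST.
jobs = queue.Queue()
predictions = {}  # prediction id -> Replicate-style status record
lock = threading.Lock()


def submit(prompt, size="1024x1024", webhook=None):
    """API front door: enqueue a job and return immediately with its record."""
    pred_id = uuid.uuid4().hex
    with lock:
        predictions[pred_id] = {"id": pred_id, "status": "starting", "output": None}
        record = dict(predictions[pred_id])
    jobs.put({"id": pred_id, "prompt": prompt, "size": size, "webhook": webhook})
    return record


def worker(generate, notify):
    """Single GPU worker: process one job at a time until a None sentinel."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        with lock:
            predictions[job["id"]]["status"] = "processing"
        try:
            url = generate(job["prompt"], job["size"])
            update = {"status": "succeeded", "output": [url]}
        except Exception as exc:  # report failures instead of killing the worker
            update = {"status": "failed", "error": str(exc)}
        with lock:
            predictions[job["id"]].update(update)
            record = dict(predictions[job["id"]])
        if job["webhook"]:
            notify(job["webhook"], record)
        jobs.task_done()


def openai_images_response(urls):
    """Body shape for an OpenAI-compatible image generation reply."""
    return {"created": int(time.time()), "data": [{"url": u} for u in urls]}
```

Keeping a single worker matches the one-GPU design: the pipeline, scheduler and LoRA state are never touched by two requests at once.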
Cost vs hosted APIs
| Option | Per image (USD) | 500k images/month (GBP, approx.) |
|---|---|---|
| OpenAI DALL-E 3 1024 | $0.04 | £15,750 |
| Stability AI Core 1024 | $0.03 | £11,800 |
| Replicate SDXL | $0.0017 | £670 |
| Self-hosted 5060 Ti | Flat rate | Fixed monthly server fee |
Against Replicate, self-hosting breaks even at around 200k images/month. Above 1M/month (a volume one card reaches only with distilled variants such as SDXL Lightning), self-hosting is decisively cheaper, and it also gives you private LoRA storage, custom checkpoints and UK data residency.
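Break-even is the fixed monthly server cost divided by the API's per-image price. A small sketch using the cost table's USD prices and the USD-to-GBP rate implied by its own rows (about 0.79); `break_even_images` and `monthly_api_cost_gbp` are illustrative helpers. The 200k-image break-even against Replicate corresponds to a server cost of about £268/month (200,000 × $0.0017 × 0.79); that figure is derived from the numbers above, not a quoted price.

```python
# Break-even between a flat-rate GPU server and a per-image API.
# Prices are the cost table's USD figures; the FX rate is inferred from the
# table itself ($850 -> £670 on the Replicate row), not a live exchange rate.

USD_TO_GBP = 670 / 850  # ~0.79

PER_IMAGE_USD = {
    "OpenAI DALL-E 3 1024": 0.04,
    "Stability AI Core 1024": 0.03,
    "Replicate SDXL": 0.0017,
}


def monthly_api_cost_gbp(images, per_image_usd):
    """What a month of `images` generations costs on a per-image API, in GBP."""
    return images * per_image_usd * USD_TO_GBP


def break_even_images(server_cost_gbp, per_image_usd):
    """Monthly volume above which a fixed-price server beats the API."""
    return server_cost_gbp / (per_image_usd * USD_TO_GBP)


if __name__ == "__main__":
    for name, price in PER_IMAGE_USD.items():
        print(f"{name:24s} £{monthly_api_cost_gbp(500_000, price):10,.0f} at 500k/month")
    print(f"break-even vs Replicate at £268/month: {break_even_images(268, 0.0017):,.0f}")
```

Plug in your actual monthly server price to get your own break-even volume for each API.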
Private SDXL API on Blackwell 16GB
LoRA, ControlNet and IP-Adapter on one card. UK dedicated hosting.
See also: FLUX benchmark, FLUX API, image generation studio, startup MVP.