
How to Run Stable Diffusion XL on a Dedicated Server

Deploy Stable Diffusion XL on a dedicated GPU server for high-quality image generation at scale. Covers hardware requirements, ComfyUI and API deployment, batch generation, and production configuration.

Why Run SDXL on Dedicated Hardware

Stable Diffusion XL produces stunning 1024×1024 images with superior text rendering, composition, and detail compared with earlier versions. Running SDXL on a dedicated GPU server unlocks unlimited image generation without per-image API costs, complete privacy for generated content, and the performance to serve real-time generation for production applications. GigaGPU offers pre-configured Stable Diffusion hosting and image generator hosting for turnkey deployment.

For businesses generating product images, marketing assets, game textures, or creative content at volume, the economics of self-hosting are compelling. A single RTX 5090 generates an SDXL image in 5-8 seconds, meaning one server can produce roughly 450-720 images per hour continuously. For detailed deployment patterns, see our guide to deploying a Stable Diffusion server.
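The arithmetic behind that estimate can be sketched in a few lines of Python; the monthly price in the cost function is a hypothetical placeholder, not a GigaGPU quote:

```python
# Back-of-envelope throughput and cost-per-image from a per-image latency.
def images_per_hour(seconds_per_image: float) -> int:
    return int(3600 / seconds_per_image)

def cost_per_image(monthly_price: float, seconds_per_image: float) -> float:
    hours_per_month = 730  # average hours in a month
    return monthly_price / (hours_per_month * images_per_hour(seconds_per_image))

# At 5-8 seconds per image, one GPU sustains 450-720 images/hour.
print(images_per_hour(8), "-", images_per_hour(5), "images/hour")
```

At those rates, even a modest monthly server price amortises to a fraction of a cent per image, which is the comparison the GPU vs API cost tool formalises.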

GPU Requirements for Stable Diffusion XL

SDXL is more VRAM-hungry than SD 1.5, but modern GPUs handle it well. The base model requires approximately 7 GB of VRAM, with additional memory needed for the refiner, LoRA adapters, and batch processing.

GPU            VRAM    SDXL Base (1024×1024)   SDXL + Refiner   Batch of 4
RTX 3090       24 GB   ~5 sec                  ~9 sec           ~15 sec
RTX 5090       32 GB   ~4 sec                  ~7 sec           ~12 sec
RTX 5080       16 GB   ~7 sec                  ~12 sec          ~20 sec
RTX 6000 Pro   48 GB   ~5 sec                  ~9 sec           ~14 sec
RTX 6000 Pro   80 GB   ~3 sec                  ~6 sec           ~8 sec

The RTX 3090 vs RTX 5090 choice often comes down to price: the 3090 is more cost-effective, while the 5090 is faster per image. For commercial generation at scale, the RTX 6000 Pro delivers the best throughput per dollar for sustained workloads. Our cheapest GPU for AI inference guide covers the full pricing breakdown. If you are comparing the cost of self-hosted image generation against API services, our GPU vs API cost comparison tool can help model the economics.

Installing SDXL with Diffusers

The Hugging Face Diffusers library provides a clean Python API for SDXL inference:

# Create environment
python3 -m venv ~/sdxl-env
source ~/sdxl-env/bin/activate

# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install diffusers transformers accelerate safetensors

# Generate an image
python3 << 'PYEOF'
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
).to("cuda")

# Enable memory optimization
pipe.enable_vae_slicing()

image = pipe(
    prompt="A professional product photo of a sleek laptop on a minimalist desk, soft studio lighting, 8k",
    negative_prompt="blurry, low quality, distorted",
    num_inference_steps=30,
    guidance_scale=7.5,
    width=1024,
    height=1024
).images[0]

image.save("output.png")
print("Image generated successfully")
PYEOF

The first run downloads model weights (~7 GB). GigaGPU servers with NVMe storage load these weights in seconds on subsequent runs. For the underlying PyTorch setup, our PyTorch GPU server installation guide covers the full environment configuration.
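To make sure the weights land on that persistent NVMe storage rather than the default home-directory cache, point the Hugging Face cache at a fixed path before loading the pipeline. A minimal sketch; /data/huggingface is the same path the systemd unit in the production section uses:

```python
import os

# Point the Hugging Face cache at persistent NVMe storage so the ~7 GB
# of SDXL weights are downloaded once and reused on every subsequent run.
# Must be set before diffusers' from_pretrained resolves the cache.
os.environ["HF_HOME"] = "/data/huggingface"

cache_dir = os.path.join(os.environ["HF_HOME"], "hub")
print("Model weights will be cached under:", cache_dir)
```

Setting the variable in the shell (`export HF_HOME=/data/huggingface`) or in the service unit achieves the same thing for non-Python entry points.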

Deploying ComfyUI for Visual Workflows

ComfyUI provides a node-based interface for building complex generation workflows. It is ideal for teams that need fine-grained control over the generation pipeline. GigaGPU offers dedicated ComfyUI hosting with pre-installed models and extensions.

# Clone and set up ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git ~/ComfyUI
cd ~/ComfyUI

# Install dependencies
pip install -r requirements.txt

# Download SDXL model to the models directory
cd models/checkpoints
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Start ComfyUI
cd ~/ComfyUI
python main.py --listen 0.0.0.0 --port 8188

Access ComfyUI at http://YOUR_SERVER_IP:8188. The visual workflow editor lets you chain models, LoRA adapters, upscalers, and post-processing nodes without writing code.
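ComfyUI also exposes an HTTP API: a workflow built in the editor can be exported as JSON ("Save (API Format)") and queued programmatically via the server's /prompt endpoint. A sketch using only the standard library; the workflow file name is an assumption:

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "batch-runner") -> dict:
    """Wrap an API-format workflow in the body ComfyUI's /prompt endpoint expects."""
    return {"prompt": workflow, "client_id": client_id}

def queue_workflow(workflow: dict, host: str = "127.0.0.1", port: int = 8188) -> dict:
    body = json.dumps(build_prompt_payload(workflow)).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # response includes the queued prompt_id

# Usage (assumes workflow_api.json was exported from the editor):
#   with open("workflow_api.json") as f:
#       print(queue_workflow(json.load(f)))
```

This is how batch jobs are typically driven against a headless ComfyUI instance without touching the browser UI.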

Building an Image Generation API

For programmatic access, wrap SDXL in a FastAPI endpoint:

# sdxl_server.py
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel
from diffusers import StableDiffusionXLPipeline
import torch
import io

app = FastAPI(title="SDXL Image Generation API")

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipe.enable_vae_slicing()

class GenerateRequest(BaseModel):
    prompt: str
    negative_prompt: str = "blurry, low quality, distorted"
    steps: int = 30
    guidance_scale: float = 7.5
    width: int = 1024
    height: int = 1024
    seed: int = -1

@app.post("/generate")
async def generate(req: GenerateRequest):
    generator = None
    if req.seed >= 0:
        generator = torch.Generator("cuda").manual_seed(req.seed)

    image = pipe(
        prompt=req.prompt,
        negative_prompt=req.negative_prompt,
        num_inference_steps=req.steps,
        guidance_scale=req.guidance_scale,
        width=req.width,
        height=req.height,
        generator=generator
    ).images[0]

    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")

# Run the API
uvicorn sdxl_server:app --host 0.0.0.0 --port 8000 --workers 1

# Test with curl
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A mountain landscape at sunset, oil painting style"}' \
  --output generated.png
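For scripting against the endpoint, the curl call translates into a small standard-library Python client. The fields match the GenerateRequest model in sdxl_server.py above; passing a fixed seed makes output reproducible:

```python
import json
import urllib.request

def build_request(prompt: str, seed: int = -1, steps: int = 30) -> dict:
    """Build a request body matching the GenerateRequest model in sdxl_server.py."""
    return {"prompt": prompt, "seed": seed, "steps": steps}

def generate(prompt: str, out_path: str, seed: int = -1,
             url: str = "http://localhost:8000/generate") -> None:
    body = json.dumps(build_request(prompt, seed=seed)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())  # the API returns raw PNG bytes

# Usage: same seed + same prompt = same image, handy for regression tests:
#   generate("A mountain landscape at sunset", "landscape.png", seed=42)
```

Any field omitted from the body falls back to the Pydantic defaults defined on the server.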

Optimization and Batch Generation

Maximise throughput with these optimizations:

# Enable torch.compile for 20-30% speedup (requires PyTorch 2.0+)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# Batch generation for multiple images
images = pipe(
    prompt=["A red sports car", "A blue mountain lake", "A golden retriever"],
    negative_prompt=["blurry"] * 3,
    num_inference_steps=25,
    guidance_scale=7.0,
    width=1024,
    height=1024
).images

for i, img in enumerate(images):
    img.save(f"batch_{i}.png")

Additional optimization techniques:

  • Reduce inference steps: 25 steps often matches 30-step quality with DPM++ scheduler
  • Use FP16: Half-precision reduces VRAM usage and increases speed
  • VAE slicing: Processes the VAE in slices to reduce peak VRAM
  • torch.compile: Compiles the UNet for optimised GPU kernels
  • LoRA adapters: Smaller than full fine-tuned checkpoints, fast to swap

For teams running SDXL alongside other models, a single private GPU server can host image generation, an LLM for prompt enhancement, and vision models for quality assessment in a unified pipeline.

Production Configuration

Deploy SDXL as a systemd service with persistent model caching:

# /etc/systemd/system/sdxl.service
[Unit]
Description=SDXL Image Generation API
After=network.target

[Service]
User=deploy
WorkingDirectory=/home/deploy
ExecStart=/home/deploy/sdxl-env/bin/uvicorn sdxl_server:app \
  --host 0.0.0.0 --port 8000 --workers 1
Restart=always
RestartSec=10
Environment=HF_HOME=/data/huggingface
Environment=CUDA_VISIBLE_DEVICES=0

[Install]
WantedBy=multi-user.target

# Reload systemd and start the service
sudo systemctl daemon-reload
sudo systemctl enable sdxl
sudo systemctl start sdxl

Place Nginx in front for TLS and rate limiting, following the patterns in our production inference server guide. Monitor GPU utilization and generation queue depth to keep response times consistent. For teams scaling beyond a single GPU, our multi-GPU server setup guide covers load balancing image generation across several cards. The model guides section has deployment guides for additional image and vision models.
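A simple way to watch utilization from the same box is to poll nvidia-smi and parse its CSV output. A sketch; how you alert on the numbers is up to your monitoring stack:

```python
import subprocess

# Query per-GPU utilization (%) and memory used (MiB) as headerless CSV.
QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]

def parse_gpu_stats(output: str) -> list:
    """Parse nvidia-smi CSV lines into (utilization %, memory MiB) per GPU."""
    stats = []
    for line in output.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats

def poll() -> list:
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)

# Usage on a GPU host:
#   for i, (util, mem) in enumerate(poll()):
#       print(f"GPU {i}: {util}% utilization, {mem} MiB used")
```

Sustained utilization near 100% with a growing request queue is the signal to add a second GPU or a second server.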

Run Stable Diffusion XL on Dedicated Hardware

GigaGPU provides GPU servers optimised for image generation. Pre-configured with CUDA, fast NVMe storage, and the VRAM you need for SDXL at full resolution. Generate unlimited images with zero per-image costs.

Browse GPU Servers

