Tutorials

Self-Hosted AI Image Generation API: Architecture and Cost Math

Building a production image-generation API on dedicated GPU hardware — ComfyUI as backend, FastAPI wrapper, queueing, and cost-per-image at scale.

Image-gen APIs differ from LLM APIs: each request is 5-30 seconds of compute, so queueing matters more than streaming, and the architecture pattern changes accordingly.

TL;DR

FastAPI in front of ComfyUI on a 5090. Redis queue for incoming requests. Per-image cost at 60% utilisation: ~£0.0015 for FLUX.1 dev FP8. Compared to Replicate at ~£0.044/image, self-hosting wins above ~300 images/day.

Architecture

  • FastAPI: HTTP /v1/images/generations endpoint, OpenAI-shaped
  • Redis queue: requests buffered, worker picks them up
  • ComfyUI: workflow runner, multiple workers possible
  • S3-compatible storage: generated images, signed URLs
  • Webhook callbacks: for async clients
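The enqueue path can be sketched as follows. This is a minimal, illustrative version only: the `Job` payload shape is an assumption (not ComfyUI's actual workflow format), and an in-process deque stands in for Redis `LPUSH`/`BRPOP`.

```python
import json
import uuid
from collections import deque

# In-process stand-in for a Redis list (API does LPUSH, workers do BRPOP).
image_queue: deque = deque()

def enqueue_generation(prompt: str, width: int = 1024, height: int = 1024) -> str:
    """Build a job payload and push it onto the queue; return the job ID."""
    job_id = uuid.uuid4().hex
    job = {
        "id": job_id,
        "prompt": prompt,
        "width": width,
        "height": height,
        "status": "queued",
    }
    image_queue.appendleft(json.dumps(job))  # LPUSH equivalent
    return job_id

job_id = enqueue_generation("a lighthouse at dusk")
```

In production the deque becomes `redis.Redis().lpush(...)` on the FastAPI side and a blocking `brpop` in each ComfyUI worker, so multiple workers can drain one shared queue.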

Queueing strategy

Two patterns:

  1. Synchronous: request blocks until image ready. Simple but holds connections for ~10 s each.
  2. Async with webhook: request returns job ID immediately, ComfyUI processes, webhook notifies on completion. Better for high concurrency.
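The async pattern can be sketched like this. Everything here is a simplified stand-in: the webhook is a plain Python callable (a real deployment would POST to the client's webhook URL), the worker runs inline, and the storage URL is a placeholder for a signed S3 URL.

```python
import uuid
from collections import deque
from typing import Callable

jobs: dict = {}        # job ID -> job state
queue: deque = deque() # pending job IDs

def submit(prompt: str, webhook: Callable) -> str:
    """Async pattern: return a job ID immediately; a worker notifies later."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"id": job_id, "prompt": prompt,
                    "status": "queued", "webhook": webhook}
    queue.appendleft(job_id)
    return job_id

def worker_step() -> None:
    """One worker iteration: pop a job, 'generate', fire the webhook."""
    job_id = queue.pop()
    job = jobs[job_id]
    job["status"] = "completed"
    # Placeholder; in practice, a signed URL from S3-compatible storage.
    job["image_url"] = f"https://storage.example/{job_id}.png"
    job["webhook"]({"id": job_id, "status": "completed",
                    "image_url": job["image_url"]})

received = []
job_id = submit("foggy harbour, 35mm", webhook=received.append)
worker_step()
```

The key property is that `submit` never blocks on generation: connection hold time is constant regardless of how long the image takes, which is what makes this pattern scale under concurrency.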

Cost per image

RTX 5090 at £399/mo. FLUX.1 dev FP8 at ~6 s per 1024². At 60% utilisation:

  • Images per month at 60% util: 30 days × 86400 s × 0.6 / 6 s = ~259K images
  • Cost per image: £399 / 259,200 = ~£0.0015

Replicate FLUX.1 dev: ~$0.055 per image = ~£0.044. At 60% utilisation, self-hosting is roughly 30× cheaper per image.
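The arithmetic above can be checked directly. The inputs (monthly price, seconds per image, utilisation, and the GBP conversion of Replicate's per-image price) are the article's figures, not measured values:

```python
MONTHLY_COST_GBP = 399.0        # RTX 5090 dedicated server, per month
SECONDS_PER_IMAGE = 6.0         # FLUX.1 dev FP8, 1024x1024
UTILISATION = 0.6
REPLICATE_GBP_PER_IMAGE = 0.044 # ~$0.055 converted

images_per_month = 30 * 86_400 * UTILISATION / SECONDS_PER_IMAGE
cost_per_image = MONTHLY_COST_GBP / images_per_month

# Break-even vs Replicate: images needed to cover the flat monthly fee.
breakeven_per_month = MONTHLY_COST_GBP / REPLICATE_GBP_PER_IMAGE
breakeven_per_day = breakeven_per_month / 30

print(f"{images_per_month:,.0f} images/mo, "
      f"£{cost_per_image:.4f}/image, "
      f"break-even ~{breakeven_per_day:.0f} images/day")
```

Note the break-even point compares the £399 fixed cost against Replicate's per-image price alone; it ignores setup time and assumes the server would otherwise sit idle.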

Verdict

For high-volume image-gen APIs, self-hosting on dedicated hardware is dramatically cheaper than per-image services. Break-even with Replicate is around ~300 images/day (£399 / £0.044 ≈ 9,100 images/month).

Bottom line

For image-gen APIs at any meaningful scale, dedicated GPU wins. See FLUX.1 images per second by GPU.
