Image-gen APIs differ from LLM APIs: each request is 5-30 seconds of compute, so queueing matters more than streaming, and the architecture changes accordingly.
FastAPI in front of ComfyUI on a 5090, with a Redis queue for incoming requests. Per-image cost at 60% utilisation: ~£0.0015 for FLUX.1 dev FP8. Compared to Replicate at ~£0.044/image, self-hosting wins above ~300 images/day.
Architecture
- FastAPI: HTTP /v1/images/generations endpoint, OpenAI-shaped
- Redis queue: requests buffered, worker picks them up
- ComfyUI: workflow runner, multiple workers possible
- S3-compatible storage: generated images, signed URLs
- Webhook callbacks: for async clients
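As a rough illustration of what "OpenAI-shaped" means for the endpoint above, the sketch below validates a request body and builds a response in the style of the OpenAI images API (`prompt`, `n`, `size` in; `created` and `data[].url` out). It uses only the standard library so it can sit behind any framework. The names `ImageRequest`, `parse_request`, `build_response`, the allowed sizes and the limit on `n` are assumptions for this example, not something the setup described here prescribes.

```python
import time
from dataclasses import dataclass

# Assumption: the sizes and batch limit this deployment chooses to support.
ALLOWED_SIZES = {"512x512", "1024x1024"}
MAX_IMAGES_PER_REQUEST = 4


@dataclass
class ImageRequest:
    """Subset of the OpenAI-style image generation request body."""
    prompt: str
    n: int = 1
    size: str = "1024x1024"

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.n <= MAX_IMAGES_PER_REQUEST:
            raise ValueError(f"n must be between 1 and {MAX_IMAGES_PER_REQUEST}")
        if self.size not in ALLOWED_SIZES:
            raise ValueError(f"unsupported size: {self.size}")


def parse_request(body: dict) -> ImageRequest:
    """Turn a decoded JSON body into a validated ImageRequest."""
    req = ImageRequest(
        prompt=body.get("prompt", ""),
        n=body.get("n", 1),
        size=body.get("size", "1024x1024"),
    )
    req.validate()
    return req


def build_response(urls: list) -> dict:
    """Wrap signed image URLs in the OpenAI-style response envelope."""
    return {"created": int(time.time()), "data": [{"url": u} for u in urls]}
```

In a FastAPI handler, `parse_request` would typically be replaced by a Pydantic model, but the fields and the response envelope would stay the same.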
Queueing strategy
Two patterns:
- Synchronous: request blocks until image ready. Simple but holds connections for ~10 s each.
- Async with webhook: request returns job ID immediately, ComfyUI processes, webhook notifies on completion. Better for high concurrency.
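The async pattern can be sketched with the standard library alone. Here `queue.Queue` stands in for the Redis queue, a plain Python callable stands in for the webhook POST, and `fake_generate` is a hypothetical placeholder for submitting a workflow to ComfyUI; none of these names come from the stack described above. The point is the control flow: `submit` returns a job ID immediately, and the worker reports completion later.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for the Redis queue
results = {}           # job_id -> {"status": ..., "url": ...}


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for running a ComfyUI workflow."""
    return f"https://example.invalid/images/{prompt.replace(' ', '-')}.png"


def submit(prompt: str, on_done) -> str:
    """Enqueue a job and return its ID without waiting for the image."""
    job_id = uuid.uuid4().hex
    results[job_id] = {"status": "queued"}
    jobs.put((job_id, prompt, on_done))
    return job_id


def worker() -> None:
    """Pull jobs until a None sentinel arrives, then exit."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        job_id, prompt, on_done = item
        try:
            results[job_id] = {"status": "succeeded", "url": fake_generate(prompt)}
        except Exception as exc:  # record the failure instead of killing the worker
            results[job_id] = {"status": "failed", "error": str(exc)}
        on_done(job_id, results[job_id])  # in production: POST to the client's webhook URL
        jobs.task_done()
```

To try it, start `threading.Thread(target=worker, daemon=True)`, call `submit("a red fox", callback)`, and wait on `jobs.join()`. Running several worker threads mirrors the "multiple workers possible" note in the architecture list.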
Cost per image
RTX 5090 at £399/mo. FLUX.1 dev FP8 at ~6 s per 1024². At 60% utilisation:
- Images per month at 60% util: 30 days × 86400 s × 0.6 / 6 s = ~259K images
- Cost per image: £399 / 259K = ~£0.0015
Replicate FLUX.1 dev: ~$0.055 per image = ~£0.044. Self-hosting is roughly 30× cheaper at 60% utilisation.
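The arithmetic above is easy to re-run with different assumptions. The sketch below uses the figures quoted in this section (£399/month, 6 s per image, 60% utilisation, ~£0.044 per image on Replicate); the function names are just for this example.

```python
SECONDS_PER_MONTH = 30 * 86_400  # 30-day month, as in the calculation above


def images_per_month(seconds_per_image: float, utilisation: float) -> float:
    """Images one GPU produces in a month at the given utilisation (0-1)."""
    return SECONDS_PER_MONTH * utilisation / seconds_per_image


def cost_per_image(monthly_cost: float, seconds_per_image: float, utilisation: float) -> float:
    """Fixed monthly cost spread over the images actually generated."""
    return monthly_cost / images_per_month(seconds_per_image, utilisation)


monthly = images_per_month(6.0, 0.6)
self_hosted = cost_per_image(399.0, 6.0, 0.6)
ratio = 0.044 / self_hosted  # Replicate price over self-hosted cost, both in GBP

print(f"{monthly:,.0f} images/month")   # 259,200 images/month
print(f"£{self_hosted:.5f} per image")  # £0.00154 per image
print(f"{ratio:.1f}x cheaper")          # 28.6x cheaper
```

Utilisation is the sensitive input: the same card at 20% utilisation costs three times as much per image, since the monthly fee is fixed.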
Verdict
For high-volume image-gen APIs, self-hosting on dedicated hardware is dramatically cheaper than per-image services. Break-even with Replicate is around 300 images/day (£399 / £0.044 ≈ 9,070 images per month).
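The break-even point follows from dividing the fixed monthly GPU cost by the per-image price of the hosted service. A minimal check, using the £399/month and ~£0.044/image figures from the cost section and an assumed 30-day month (function names are illustrative):

```python
def break_even_images_per_day(monthly_cost: float, price_per_image: float, days: int = 30) -> float:
    """Daily volume at which a fixed-cost GPU matches a per-image service."""
    return monthly_cost / price_per_image / days


def cheaper_option(images_per_day: float, monthly_cost: float = 399.0,
                   price_per_image: float = 0.044, days: int = 30) -> str:
    """Compare the monthly bill of both options at a given daily volume."""
    per_image_bill = images_per_day * days * price_per_image
    return "self-hosted" if per_image_bill > monthly_cost else "per-image service"


threshold = break_even_images_per_day(399.0, 0.044)
print(f"break-even: {threshold:.0f} images/day")  # break-even: 302 images/day
```

This ignores engineering time, storage and bandwidth on the self-hosted side, so the real threshold sits somewhat higher than the raw number.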
Bottom line
For image-gen APIs serving more than a few hundred images per day, a dedicated GPU wins. See FLUX.1 images per second by GPU.