FLUX.1-schnell from Black Forest Labs is the first open, Apache 2.0-licensed model that genuinely rivals Midjourney on photorealism and prompt fidelity. Running it as a production API on the RTX 5060 Ti 16GB via UK dedicated GPU hosting turns it into a private, commercially licensed replacement for Replicate, fal.ai or Together’s hosted FLUX endpoints, at 2.4 seconds per 1024×1024 image in FP8 and a fixed monthly cost.
Image latency
FLUX.1-schnell is a four-step distilled diffusion model: quality is closer to FLUX.1-dev than to SDXL Turbo, yet inference stays fast. Quantised to FP8 and run on Blackwell's fifth-generation tensor cores, which support FP8 natively, a 1024×1024 image lands in 2.4 seconds including VAE decode. See our FLUX.1-schnell benchmark for the full sweep.
| Resolution | Steps | Precision | Time/image | VRAM |
|---|---|---|---|---|
| 1024×1024 | 4 | FP8 | 2.4 s | 11.8 GB |
| 1024×1024 | 4 | BF16 | 3.6 s | 14.9 GB |
| 768×1344 (portrait) | 4 | FP8 | 2.6 s | 12.1 GB |
| 1344×768 (landscape) | 4 | FP8 | 2.6 s | 12.1 GB |
| 512×512 | 4 | FP8 | 0.9 s | 8.4 GB |
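The times above are end-to-end, per-image latencies. As a rough illustration of how such a figure might be measured, here is a minimal timing harness. It is a sketch and not the benchmark's actual code: `time_generation` and its parameters are hypothetical names, `generate` stands for any zero-argument callable that runs one full generation including VAE decode, and `sync` is an optional hook (for a PyTorch worker, `torch.cuda.synchronize` would be the natural choice) so that queued GPU work finishes before the clock stops.

```python
import statistics
import time
from typing import Callable, Optional


def time_generation(
    generate: Callable[[], object],
    warmup: int = 1,
    runs: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Return median/min/max wall-clock seconds per call of `generate`."""

    def timed_call() -> float:
        start = time.perf_counter()
        generate()
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        return time.perf_counter() - start

    # Warm-up calls absorb one-off costs (model load, kernel compilation);
    # their timings are discarded.
    for _ in range(warmup):
        timed_call()

    samples = [timed_call() for _ in range(runs)]
    return {
        "runs": runs,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

Reporting the median rather than the mean keeps a single slow run (for example, a one-off allocation) from skewing the headline number.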
License advantage
FLUX.1-schnell ships under Apache 2.0: commercial use, redistribution, fine-tuning and product embedding are all allowed with no royalty. The only obligations apply when you redistribute the model itself, in which case the license text and existing copyright notices must travel with it. That clears the biggest legal hurdle in offering a paid image-generation SaaS. FLUX.1-dev is non-commercial without a license agreement with Black Forest Labs; schnell-only deployments sidestep that entirely.
API architecture
A Replicate-style API has four moving parts: a FastAPI front door, a Redis-backed job queue, a GPU worker pool and a webhook dispatcher for async completion callbacks. On one 5060 Ti the worker pool is a single process that loads FLUX.1-schnell in FP8 together with its text encoders (CLIP and T5), with an optional LoRA stack for brand-specific styles.
POST /v1/predictions
{
  "model": "flux-schnell",
  "input": {"prompt": "...", "width": 1024, "height": 1024},
  "webhook": "https://your-app/callback"
}
→ 202 { "id": "pred_abc", "status": "starting" }

POST /callback (when done)
{ "id": "pred_abc", "status": "succeeded", "output": ["https://cdn/.../img.webp"] }
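The request and callback flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the production service: an in-memory `queue.Queue` stands in for the Redis queue, `create_prediction` stands in for the FastAPI handler, and `generate` is a hypothetical callable that wraps the GPU worker and returns output URLs. The `"failed"` status and the `pred_` plus random hex ID format are also assumptions.

```python
import json
import queue
import urllib.request
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Prediction:
    id: str
    input: dict
    webhook: str
    status: str = "starting"
    output: List[str] = field(default_factory=list)


JOBS: "queue.Queue[Prediction]" = queue.Queue()  # stand-in for the Redis queue
STORE: Dict[str, Prediction] = {}                # stand-in for job state storage


def create_prediction(body: dict) -> dict:
    """Handle POST /v1/predictions: validate, enqueue, return the 202 body."""
    if body.get("model") != "flux-schnell":
        raise ValueError("unknown model")
    params = body.get("input") or {}
    if not params.get("prompt"):
        raise ValueError("prompt is required")
    pred = Prediction(
        id="pred_" + uuid.uuid4().hex[:8],
        input=params,
        webhook=body.get("webhook", ""),
    )
    STORE[pred.id] = pred
    JOBS.put(pred)
    return {"id": pred.id, "status": pred.status}


def post_json(url: str, payload: dict) -> None:
    """Default webhook dispatcher: POST the payload as JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10).close()


def run_one_job(
    generate: Callable[[dict], List[str]],
    dispatch: Callable[[str, dict], None] = post_json,
) -> dict:
    """Worker step: take one job, generate, record the result, fire the webhook."""
    pred = JOBS.get()
    try:
        pred.output = generate(pred.input)
        pred.status = "succeeded"
    except Exception:
        pred.status = "failed"  # assumed failure status, not shown in the example above
    payload = {"id": pred.id, "status": pred.status, "output": pred.output}
    if pred.webhook:
        dispatch(pred.webhook, payload)
    return payload
```

Keeping the dispatcher injectable means the worker can be exercised without a network, and a retrying dispatcher can be swapped in later without touching the worker loop.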
Throughput and capacity
| Utilisation | Images/hour | Images/day | Images/month (30 days) |
|---|---|---|---|
| 100% | 1,500 | 36,000 | 1.08M |
| 70% | 1,050 | 25,200 | 756k |
| 50% | 750 | 18,000 | 540k |
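The capacity figures follow directly from the 2.4 s per-image latency: 3600 / 2.4 = 1,500 images per hour at full utilisation, scaled by the utilisation fraction and by 24 hours and 30 days. A short sketch of that arithmetic (the function name and the 30-day month are assumptions chosen to match the table):

```python
def capacity(seconds_per_image: float, utilisation: float, days_per_month: int = 30) -> dict:
    """Images per hour, day and month for one GPU at a given utilisation (0..1)."""
    per_hour = 3600 / seconds_per_image * utilisation
    return {
        "per_hour": round(per_hour),
        "per_day": round(per_hour * 24),
        "per_month": round(per_hour * 24 * days_per_month),
    }


# Reproduce the three rows of the table for the FP8 1024x1024 latency.
for u in (1.0, 0.7, 0.5):
    print(f"{u:.0%}", capacity(2.4, u))
```

Swapping in a different latency from the first table (for example 3.6 s for BF16) gives the corresponding capacity for that configuration.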
Cost vs hosted APIs
| Provider | Per image (USD) | 500k images/mo (GBP) |
|---|---|---|
| Replicate FLUX-schnell | $0.003 | £1,180 |
| fal.ai FLUX-schnell | $0.003 | £1,180 |
| OpenAI DALL-E 3 | $0.04 | £15,750 |
| Self-hosted 5060 Ti | Fixed | Fixed monthly |
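The hosted figures are per-image USD prices multiplied by volume and converted to GBP. The table does not state its exchange rate; roughly 0.7875 GBP per USD is inferred here from the DALL-E 3 row ($20,000 to £15,750), and at that rate $1,500 comes to about £1,181, which the table rounds to £1,180. The sketch below reproduces that arithmetic and adds a break-even helper. The fixed monthly fee is not given in this article, so it is a parameter, and the value used in the example call is a placeholder rather than a real price.

```python
import math

USD_TO_GBP = 0.7875  # assumption: rate inferred from the table, not stated in it


def hosted_cost_gbp(usd_per_image: float, images: int, usd_to_gbp: float = USD_TO_GBP) -> float:
    """Monthly cost in GBP of a per-image hosted API at a given volume."""
    return usd_per_image * images * usd_to_gbp


def break_even_images(fixed_monthly_gbp: float, usd_per_image: float,
                      usd_to_gbp: float = USD_TO_GBP) -> int:
    """Smallest monthly volume at which a fixed-fee server is no more expensive."""
    return math.ceil(fixed_monthly_gbp / (usd_per_image * usd_to_gbp))


print(round(hosted_cost_gbp(0.003, 500_000), 2))  # Replicate / fal.ai row
print(round(hosted_cost_gbp(0.04, 500_000), 2))   # DALL-E 3 row
print(break_even_images(100.0, 0.003))            # 100.0 is a placeholder fee
```

Comparing the break-even volume with the monthly capacity at a realistic utilisation shows whether a single card can actually serve the volume at which self-hosting pays off.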
FLUX.1-schnell API on Blackwell 16GB
Apache 2.0 image generation at 2.4s/image. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: SDXL benchmark, SDXL API backend, image generation studio, vLLM setup.