RTX 5060 Ti 16GB FLUX.1 Schnell Benchmark

FLUX.1-schnell on the Blackwell-based RTX 5060 Ti 16GB: 4-step distilled image generation, with FP16 and FP8 throughput numbers.

Benchmarks | April 23, 2026 | 1 min read | gigagpu

FLUX.1-schnell from Black Forest Labs is the 4-step distilled variant of FLUX.1, offering state-of-the-art image quality with fast iteration. On the RTX 5060 Ti 16GB via our hosting, it fits with headroom at FP8 and just fits at FP16.

Setup

Diffusers 0.30, PyTorch 2.5
Model: black-forest-labs/FLUX.1-schnell
12B-param diffusion transformer (DiT), T5 + CLIP text encoders
Resolution: 1024×1024
Licence: Apache 2.0 (schnell variant)
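
A minimal sketch of how this setup might be driven from Diffusers, assuming the `FluxPipeline` class and the call arguments shown on the model card (`guidance_scale=0.0`, `max_sequence_length=256`). The helper names and the default dtype are illustrative assumptions, not the benchmark harness used for the numbers in this post; pass `"float16"` to match the FP16 label used here.

```python
MODEL_ID = "black-forest-labs/FLUX.1-schnell"


def generation_kwargs(prompt, steps=4, size=1024):
    """Build call arguments for one square image (hypothetical helper)."""
    return {
        "prompt": prompt,
        "height": size,
        "width": size,
        "num_inference_steps": steps,
        "guidance_scale": 0.0,       # value used for schnell on the model card
        "max_sequence_length": 256,  # T5 prompt length used on the model card
    }


def load_pipeline(dtype_name="bfloat16"):
    """Load the pipeline onto the GPU (assumes a CUDA device is present)."""
    # Deferred imports: torch and diffusers are only needed on the GPU host.
    import torch
    from diffusers import FluxPipeline

    dtype = getattr(torch, dtype_name)  # e.g. "float16" or "bfloat16"
    pipe = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    return pipe.to("cuda")


def render(prompt, path, steps=4):
    """Generate one image and save it to `path`."""
    pipe = load_pipeline()
    image = pipe(**generation_kwargs(prompt, steps)).images[0]
    image.save(path)
    return path
```

Calling `render("a lighthouse at dusk", "out.png")` on a GPU host would produce one 1024×1024 image at the 4-step setting.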

FP16 Throughput

Steps	Time	VRAM peak
1	1.9 s	14.8 GB
2	2.5 s	14.8 GB
4	3.8 s	14.8 GB

FP16 just fits, peaking at 14.8 GB of the card's 16 GB. 1-step produces passable images; 4-step is the recommended setting and still comes in under 4 seconds per 1024×1024 image.
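
Time does not scale linearly from zero with step count, so it can help to split the FP16 figures into a fixed part and a per-step part. The sketch below fits a straight line through the three table rows. The split is only a descriptive model of these three points, and reading the fixed part as text encoding plus VAE decode is an assumption rather than something measured here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# FP16 rows from the table above: steps and seconds per image
FP16_STEPS = [1, 2, 4]
FP16_SECONDS = [1.9, 2.5, 3.8]

per_step, fixed = fit_line(FP16_STEPS, FP16_SECONDS)
print(f"per-step cost ~{per_step:.2f} s, fixed overhead ~{fixed:.2f} s")
```

On these rows the fit gives roughly 0.64 s per denoising step on top of about 1.25 s of fixed cost per image.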

FP8 Throughput

Steps	Time	VRAM peak
1	1.2 s	9.2 GB
2	1.6 s	9.2 GB
4	2.4 s	9.2 GB

FP8 drops peak VRAM from 14.8 GB to 9.2 GB and cuts time per image by roughly 36-37% on Blackwell (2.4 s vs 3.8 s at 4 steps). Quality is essentially indistinguishable at 1024×1024. FP8-quantised weights are available from the community, or can be produced locally via ComfyUI’s FP8 nodes.
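
The FP8 gain can be checked directly against the two tables. This is a small arithmetic sketch using only the figures reported above; the function and variable names are illustrative.

```python
# steps -> seconds per image, copied from the FP16 and FP8 tables above
FP16 = {1: 1.9, 2: 2.5, 4: 3.8}
FP8 = {1: 1.2, 2: 1.6, 4: 2.4}


def compare(fp16_s, fp8_s):
    """Return (fraction of time saved, throughput ratio) for FP8 vs FP16."""
    return 1 - fp8_s / fp16_s, fp16_s / fp8_s


for steps in sorted(FP16):
    saved, ratio = compare(FP16[steps], FP8[steps])
    print(f"{steps} step(s): {saved:.1%} less time, {ratio:.2f}x throughput")

# Peak VRAM difference between the two precisions, in GB
vram_saved_gb = 14.8 - 9.2
print(f"VRAM saved: {vram_saved_gb:.1f} GB")
```

Across all three step counts the time saved lands at about 36-37%, which is about 1.56-1.58x in throughput terms, alongside a 5.6 GB lower VRAM peak.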

Fit and VRAM

FP16: 14.8 GB peak – no room for batch
FP8: 9.2 GB – fits batch 2 comfortably
CPU-offloaded text encoder (T5): reclaims ~3 GB at the cost of ~500 ms of added prompt-encoding latency
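
The headroom behind these three points is simple subtraction. The sketch below works it out, assuming the full nominal 16 GB is available (usable memory is slightly lower in practice) and taking the "~3 GB" T5 saving at face value.

```python
CARD_VRAM_GB = 16.0  # nominal capacity; an assumption, real usable VRAM is a bit less

PEAK_GB = {"FP16": 14.8, "FP8": 9.2}  # peaks reported above
T5_OFFLOAD_SAVING_GB = 3.0            # "~3 GB" from the list above


def headroom(peak_gb, offload_t5=False):
    """Free VRAM in GB left after the reported peak, optionally with T5 on CPU."""
    saving = T5_OFFLOAD_SAVING_GB if offload_t5 else 0.0
    return round(CARD_VRAM_GB - (peak_gb - saving), 1)


for name, peak in PEAK_GB.items():
    print(f"{name}: {headroom(peak)} GB free, "
          f"{headroom(peak, offload_t5=True)} GB free with T5 on CPU")
```

Under those assumptions FP16 leaves about 1.2 GB free (4.2 GB with T5 offloaded) and FP8 leaves about 6.8 GB, which is consistent with FP16 having no room for batching while FP8 does.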

vs SDXL

Metric	FLUX.1-schnell FP8 4-step	SDXL 30-step
Time/image @ 1024	2.4 s	3.4 s
Peak VRAM	9.2 GB	9.2 GB
Prompt adherence	Noticeably better	Baseline
Typography / text in image	Much better	Weak

FLUX.1-schnell at FP8 is both faster (2.4 s vs 3.4 s per image) and, for most prompts, higher quality than 30-step SDXL. Unless you need an SDXL-specific LoRA or checkpoint, FLUX is the new default.
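
To put the per-image times in capacity terms, the sketch below converts them to images per hour. It assumes back-to-back single-image generation and ignores model load time, prompt changes and disk I/O, so treat the result as an upper bound rather than a measured figure.

```python
def images_per_hour(seconds_per_image):
    """Upper-bound throughput for back-to-back single-image generation."""
    return 3600 / seconds_per_image


flux_fp8 = images_per_hour(2.4)  # FLUX.1-schnell FP8, 4 steps
sdxl = images_per_hour(3.4)      # SDXL, 30 steps

print(f"FLUX.1-schnell FP8: {flux_fp8:.0f} images/hour")
print(f"SDXL 30-step:       {sdxl:.0f} images/hour")
print(f"FLUX advantage:     {flux_fp8 / sdxl:.2f}x")
```

That works out to about 1,500 images per hour for FLUX.1-schnell against about 1,059 for SDXL, a ratio of roughly 1.42x.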

FLUX.1 on Blackwell 16GB

2.4 s per 1024px image at FP8. UK dedicated hosting.

gigagpu

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
