
Whisper Medium RTF by GPU

Real-time factor (RTF) benchmarks for OpenAI Whisper Medium across six GPUs, with FP16 and INT8 results and a cost-efficiency analysis for dedicated GPU hosting.

Whisper Medium Benchmark Overview

OpenAI Whisper Medium (769M parameters) sits between the smaller Whisper Small and the flagship Large-v3, offering a strong balance of accuracy and speed. For many transcription workloads it provides more than sufficient quality while running significantly faster. Deploying it on a dedicated GPU server keeps latency low and throughput high for production use.

We benchmarked Whisper Medium using faster-whisper (CTranslate2) on GigaGPU servers with a 10-minute English audio sample. The model needs approximately 1.5 GB of VRAM at FP16, making it runnable on every GPU tested. For methodology details, see our benchmark hub.
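As a rough sketch of the measurement loop (the timing approach and function names here are illustrative, not our exact harness; the `WhisperModel` call is the standard faster-whisper API):

```python
import time

def compute_rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration (lower is better)."""
    return processing_seconds / audio_seconds

def benchmark_whisper_medium(audio_path: str, audio_seconds: float,
                             compute_type: str = "float16") -> float:
    """Transcribe one file with faster-whisper and return the measured RTF."""
    from faster_whisper import WhisperModel  # imported here: needs a CUDA host
    model = WhisperModel("medium", device="cuda", compute_type=compute_type)
    start = time.perf_counter()
    segments, _info = model.transcribe(audio_path)
    for _ in segments:  # transcription is lazy; consume the generator to finish it
        pass
    return compute_rtf(time.perf_counter() - start, audio_seconds)
```

For example, a 10-minute (600 s) clip that takes 54 s to process gives an RTF of 0.09, matching the RTX 4060 row below.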

RTF Results by GPU

Lower RTF is better: an RTF below 1.0 means transcription runs faster than real-time.

| GPU | VRAM | Whisper Medium FP16 RTF | Speed vs Real-Time |
|-----|------|-------------------------|--------------------|
| RTX 3050 | 6 GB | 0.16 | 6.3x real-time |
| RTX 4060 | 8 GB | 0.09 | 11.1x real-time |
| RTX 4060 Ti | 16 GB | 0.065 | 15.4x real-time |
| RTX 3090 | 24 GB | 0.045 | 22.2x real-time |
| RTX 5080 | 16 GB | 0.03 | 33.3x real-time |
| RTX 5090 | 32 GB | 0.02 | 50x real-time |

Whisper Medium is substantially faster than Large-v3, with the RTX 5090 reaching a remarkable 50x real-time speed. Even the budget RTX 3050 manages 6.3x real-time, making it viable for lightweight self-hosted transcription.
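The "speed vs real-time" column is simply the reciprocal of the RTF. A minimal sketch, using the FP16 figures from the table above:

```python
def speed_multiple(rtf: float) -> float:
    """Convert a real-time factor into an 'x real-time' speed multiple."""
    return 1.0 / rtf

# FP16 RTFs from the table above
fp16_rtf = {"RTX 3050": 0.16, "RTX 4060": 0.09, "RTX 4060 Ti": 0.065,
            "RTX 3090": 0.045, "RTX 5080": 0.03, "RTX 5090": 0.02}
speeds = {gpu: speed_multiple(rtf) for gpu, rtf in fp16_rtf.items()}
```

So an RTF of 0.02 corresponds to 50x real-time, and 0.16 to about 6.3x.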

FP16 vs INT8 Comparison

INT8 quantisation further improves speed. See our quantisation analysis for background on precision trade-offs.

| GPU | FP16 RTF | INT8 RTF | Improvement |
|-----|----------|----------|-------------|
| RTX 3050 | 0.16 | 0.11 | 31% |
| RTX 4060 | 0.09 | 0.06 | 33% |
| RTX 4060 Ti | 0.065 | 0.044 | 32% |
| RTX 3090 | 0.045 | 0.03 | 33% |
| RTX 5080 | 0.03 | 0.02 | 33% |
| RTX 5090 | 0.02 | 0.014 | 30% |

INT8 gives a consistent ~32% speed boost. The RTX 5090 at INT8 reaches 71x real-time, processing a 1-hour recording in under 51 seconds.
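The 71x and 51-second figures follow directly from the INT8 RTF. A quick sketch of the arithmetic:

```python
def processing_seconds(rtf: float, audio_seconds: float) -> float:
    """Wall-clock time to transcribe a clip at a given RTF."""
    return rtf * audio_seconds

def int8_improvement(fp16_rtf: float, int8_rtf: float) -> float:
    """Fractional RTF reduction from INT8 quantisation."""
    return (fp16_rtf - int8_rtf) / fp16_rtf

# RTX 5090 at INT8 (RTF 0.014): a 1-hour recording
rtx5090_seconds = processing_seconds(0.014, 3600.0)  # about 50.4 seconds
```

The same helper confirms the improvement column: the RTX 3050 drops from 0.16 to 0.11, a reduction of just over 31%.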

Cost Efficiency Analysis

| GPU | FP16 RTF | Approx. Monthly Cost | Speed per £ (x real-time per £/month) |
|-----|----------|----------------------|---------------------------------------|
| RTX 3050 | 0.16 | ~£45 | 0.139 |
| RTX 4060 | 0.09 | ~£60 | 0.185 |
| RTX 4060 Ti | 0.065 | ~£75 | 0.205 |
| RTX 3090 | 0.045 | ~£110 | 0.202 |
| RTX 5080 | 0.03 | ~£160 | 0.208 |
| RTX 5090 | 0.02 | ~£250 | 0.200 |

The RTX 5080 (0.208) narrowly edges out the RTX 4060 Ti (0.205) for best cost efficiency, but at less than half the monthly price the RTX 4060 Ti is the clear budget champion. For the best GPU for Whisper overall, see our dedicated guide.
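The speed-per-pound column can be reproduced from the other two columns. A minimal sketch, using the approximate monthly prices above:

```python
def speed_per_pound(rtf: float, monthly_cost_gbp: float) -> float:
    """Real-time speed multiple delivered per pound of monthly cost."""
    return (1.0 / rtf) / monthly_cost_gbp
```

For example, the RTX 5080's 33.3x real-time at ~£160/month works out to about 0.208, while the RTX 4060 Ti's 15.4x at ~£75/month gives about 0.205.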

GPU Recommendations

  • Budget: RTX 4060 — 11x real-time is excellent for moderate transcription volumes at low cost.
  • Best value: RTX 4060 Ti — top cost efficiency with 15x real-time speed.
  • High volume: RTX 5080 — 33x real-time handles heavy transcription pipelines.
  • Maximum speed: RTX 5090 — 50x real-time for time-critical applications.

If you need better accuracy, see the Whisper Large-v3 RTF benchmark. For a detailed comparison across model sizes, check the Whisper Tiny vs Base vs Small comparison. Browse all data in the Benchmarks category.

Conclusion

Whisper Medium is the sweet spot for most transcription workloads, offering near-Large-v3 accuracy with roughly double the speed. It runs on every GPU we tested and delivers exceptional cost efficiency on mid-range cards. For teams that do not need the absolute best multilingual accuracy, Whisper Medium on dedicated hardware is the practical choice.

Fast Transcription with Whisper on Dedicated GPUs

Bare-metal GPU servers for speech-to-text workloads. From budget to high-end, find the right server for your volume.

Browse GPU Servers



We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
