Raw Whisper transcription is a monologue-style stream of text. For meetings, calls, or interviews you need to know who said what. Pyannote handles speaker diarization. On dedicated GPU hosting the combined pipeline produces speaker-labelled transcripts reliably.
Stack
- faster-whisper for transcription with per-segment timestamps
- pyannote.audio for speaker diarization
- A merger that assigns segments to speakers based on overlapping timestamps
The whisperX project combines these and is the practical default.
Pipeline
- Transcribe audio with faster-whisper, keeping word-level timestamps
- Diarize audio with Pyannote – produces (start, end, speaker_id) intervals
- Assign each transcribed word to the speaker whose interval contains its timestamp
- Collapse consecutive same-speaker words into turns
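Steps 3 and 4 are plain interval arithmetic. A minimal sketch of the merge logic (hypothetical data shapes for illustration, not the whisperX internals): each word is given to the diarization interval it overlaps most, then consecutive same-speaker words are joined into turns.

```python
def assign_speakers(words, turns):
    """Assign each (start, end, text) word to the (start, end, speaker)
    diarization interval with the greatest temporal overlap."""
    labelled = []
    for w_start, w_end, text in words:
        best, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            # Overlap between the word span and the speaker interval;
            # negative means they don't intersect at all.
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labelled.append((best, text))
    return labelled

def collapse_turns(labelled):
    """Merge consecutive same-speaker words into turns."""
    merged = []
    for speaker, text in labelled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged

words = [(0.0, 0.4, "Hello"), (0.5, 0.9, "there."), (1.2, 1.6, "Hi!")]
turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]
print(collapse_turns(assign_speakers(words, turns)))
# [('SPEAKER_00', 'Hello there.'), ('SPEAKER_01', 'Hi!')]
```

Words that overlap no interval (e.g. spoken during a diarization gap) fall back to "UNKNOWN" here; whisperX handles that case similarly by leaving the speaker field unset.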
Code
```python
import os

import whisperx

device = "cuda"
audio = whisperx.load_audio("meeting.wav")

# 1. Transcribe
model = whisperx.load_model("large-v3-turbo", device=device)
result = model.transcribe(audio, batch_size=16)

# 2. Align to get word-level timestamps
align_model, meta = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, meta, audio, device=device)

# 3. Diarize (Pyannote models require accepting the licence and a HuggingFace token)
diarize = whisperx.DiarizationPipeline(use_auth_token=os.environ["HF_TOKEN"], device=device)
segments = diarize(audio)

# 4. Merge: attach speaker labels to segments and words
result = whisperx.assign_word_speakers(segments, result)

for seg in result["segments"]:
    # Segments with no overlapping diarization interval have no speaker key
    print(f"{seg.get('speaker', 'UNKNOWN')}: {seg['text'].strip()}")
```
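The labelled segments are easy to render into a readable transcript. A small sketch, assuming the whisperX-style segment dicts above (`start`, `speaker`, `text`; the sample data is invented):

```python
def to_transcript(segments):
    """Render segment dicts as '[MM:SS] speaker: text' lines,
    merging consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # may be unset during silence/music
        if turns and turns[-1][0] == speaker:
            turns[-1][2] += " " + seg["text"].strip()
        else:
            turns.append([speaker, seg["start"], seg["text"].strip()])
    return "\n".join(
        f"[{int(start) // 60:02d}:{int(start) % 60:02d}] {spk}: {text}"
        for spk, start, text in turns
    )

segments = [
    {"start": 0.0, "speaker": "SPEAKER_00", "text": " Morning all."},
    {"start": 2.3, "speaker": "SPEAKER_00", "text": " Let's begin."},
    {"start": 4.5, "speaker": "SPEAKER_01", "text": " Sure."},
]
print(to_transcript(segments))
# [00:00] SPEAKER_00: Morning all. Let's begin.
# [00:04] SPEAKER_01: Sure.
```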
Quality Tips
- Pyannote needs clean audio – noise and overlapping speakers degrade diarization
- Set the expected speaker count if you know it (min_speakers=2, max_speakers=4)
- For phone calls, two-speaker diarization is usually accurate; large conference calls are harder
- Diarization runs on GPU for speed but CPU is fine for non-real-time batch jobs
Speaker-Labelled Transcription Hosting
Whisper + Pyannote on UK dedicated GPUs with HuggingFace tokens configured.