Webinar and meeting transcription runs comfortably on the RTX 5060 Ti 16GB at our hosting – transcribe, diarise, and summarise a 1-hour recording in under 90 seconds.
Contents
Pipeline
- Upload recording (MP3/MP4/WAV)
- VAD splits into speech segments
- Whisper large-v3-turbo transcribes
- pyannote.audio diarises speakers
- Merge transcript + speaker labels
- Llama 3 8B summarises, extracts decisions, action items
- Output: structured Markdown with timestamps
Throughput
| Stage | Time for 1-hour audio |
|---|---|
| Whisper Turbo INT8 | ~65 seconds |
| pyannote diarisation | ~30 seconds |
| LLM summary (Llama 3 8B) | ~10 seconds |
| Total | ~105 seconds |
90-minute meeting completes in ~2.5 minutes. Daily capacity on one card processing 8-hour days of audio: ~200+ hours of recordings.
Speaker Diarisation
- pyannote/speaker-diarization-3.1 – industry standard
- Runs on GPU, ~500 MB VRAM additional
- Accuracy: 90%+ for 2-5 clearly distinct speakers
- Drops noticeably with overlap or poor mic quality
Summary Output
Feed the diarised transcript into Llama 3 8B with a prompt like:
SYSTEM: Summarise the following meeting transcript. Output sections:
- Attendees
- Key Discussion Points
- Decisions Made
- Action Items (who owns what)
- Open Questions
- Timestamps for key moments
Enable prefix caching since the same system prompt repeats across every recording.
Webinar Transcription on Blackwell 16GB
1 hour audio -> structured notes in 2 minutes. UK dedicated hosting.
Order the RTX 5060 Ti 16GBSee also: Whisper benchmark, voice pipeline, podcast tools, summarisation.