Silero VAD Production Deployment

Voice activity detection separates speech from silence before transcription. Silero VAD is tiny, fast, and essential for streaming audio pipelines.

A voice activity detector is the quiet workhorse of any real-time audio pipeline. Silero VAD runs in milliseconds and decides whether each audio chunk contains speech, so on dedicated GPU hosting the transcriber only sees speech instead of endless silence.

Why VAD

Without VAD, an always-on microphone feeds silence, background noise, and speech alike to Whisper, and Whisper is prone to hallucinating text when it is given silence. VAD gates the transcriber: only chunks that contain speech go through. Benefits:

  • No hallucinated transcripts during silence
  • Lower compute cost (Whisper only runs on real speech)
  • Accurate speech/silence boundaries for downstream tasks

Deployment

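import torch

# Download the Silero VAD model and helpers, or load them from the local torch.hub cache.
model, utils = torch.hub.load(
    repo_or_dir="snakers4/silero-vad",
    model="silero_vad",
    force_reload=False,
)
# utils is a tuple of helpers; read_audio is its third element.
(get_speech_timestamps, _, read_audio, _, _) = utils

audio = read_audio("input.wav", sampling_rate=16000)  # mono float tensor at 16 kHz
timestamps = get_speech_timestamps(audio, model, threshold=0.5, sampling_rate=16000)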

Silero VAD is tiny (~2 MB) and runs on CPU or GPU. A GPU gives a minor speed-up but is not required.
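The timestamps are only useful once you act on them. The sketch below assumes the default output of get_speech_timestamps, a list of dicts with start and end given in samples. The helper names (to_seconds, keep_speech, speech_ratio) are illustrative and not part of Silero VAD.

```python
from typing import Dict, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # Hz, matching the 16 kHz audio used in this guide


def to_seconds(timestamps: List[Dict[str, int]],
               sample_rate: int = SAMPLE_RATE) -> List[Tuple[float, float]]:
    """Convert sample-index segments to (start_s, end_s) tuples."""
    return [(t["start"] / sample_rate, t["end"] / sample_rate) for t in timestamps]


def keep_speech(audio: Sequence, timestamps: List[Dict[str, int]]) -> list:
    """Return one slice of `audio` per detected speech segment."""
    return [audio[t["start"]:t["end"]] for t in timestamps]


def speech_ratio(timestamps: List[Dict[str, int]], total_samples: int) -> float:
    """Fraction of the recording that was flagged as speech."""
    if total_samples == 0:
        return 0.0
    speech = sum(t["end"] - t["start"] for t in timestamps)
    return speech / total_samples


# Made-up segments for a 4-second clip, in the assumed output format.
example = [{"start": 16000, "end": 32000}, {"start": 48000, "end": 56000}]
print(to_seconds(example))
print(speech_ratio(example, total_samples=64000))
```

Each slice from keep_speech can be sent to Whisper on its own, and speech_ratio gives a rough idea of how much transcription work the gate saves on a given recording.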

Streaming

For real-time audio, feed Silero VAD 512-sample chunks (32 ms at 16 kHz) as they arrive. Accumulate the chunks flagged as speech, and once that speech is followed by about a second of silence (roughly 30 consecutive non-speech chunks), hand the buffer to Whisper:

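# read_from_mic() and transcribe() are placeholders for your own audio input and Whisper call.
buffer, silence_count = [], 0
while True:
    chunk = read_from_mic()  # 512 float32 samples at 16 kHz, as a torch tensor
    is_speech = model(chunk, 16000).item() > 0.5
    if is_speech:
        buffer.append(chunk)
        silence_count = 0  # speech resumed, restart the silence timer
    elif buffer:
        silence_count += 1
        if silence_count > 30:  # ~1 second of silence (30 x 32 ms)
            transcribe(buffer)
            buffer.clear()
            silence_count = 0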
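The buffering logic is easier to test when it is separated from the microphone and the model. Below is a minimal sketch of that idea: an Endpointer class (a hypothetical name, not part of Silero VAD) that takes each chunk together with its speech probability and returns a finished utterance once enough trailing silence has passed. The silence window is given in milliseconds and converted to a chunk count, assuming 32 ms chunks.

```python
import math
from typing import List, Optional

CHUNK_MS = 32  # 512 samples at 16 kHz


class Endpointer:
    """Collects speech chunks and emits them once trailing silence is long enough."""

    def __init__(self, threshold: float = 0.5, silence_ms: int = 1000):
        self.threshold = threshold
        # Number of consecutive non-speech chunks that ends an utterance.
        self.max_silence = math.ceil(silence_ms / CHUNK_MS)
        self.buffer: List = []
        self.silence_count = 0

    def push(self, chunk, prob: float) -> Optional[list]:
        """Feed one chunk and its speech probability.

        Returns the buffered utterance when it is complete, otherwise None.
        """
        if prob > self.threshold:
            self.buffer.append(chunk)
            self.silence_count = 0
            return None
        if not self.buffer:
            return None  # silence before any speech: nothing to emit
        self.silence_count += 1
        if self.silence_count < self.max_silence:
            return None
        utterance, self.buffer = self.buffer, []
        self.silence_count = 0
        return utterance
```

In the streaming loop, the probability would come from the model call, and any non-None return value from push would be passed to the transcriber. Because the class never touches audio hardware, it can be unit-tested with hand-written probabilities.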

Threshold

The default threshold of 0.5 works for clean audio. In noisy environments, raise it to 0.6-0.7 to reduce false positives. For quiet recordings with soft speakers, drop it to 0.3-0.4 so quiet speech is not missed.
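To see what moving the threshold does, it can help to run the same per-chunk probabilities through several cut-offs. The sketch below uses made-up probabilities rather than real model output, and flag_speech is an illustrative helper, not a Silero VAD function.

```python
from typing import List, Sequence


def flag_speech(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark each chunk as speech when its probability exceeds the threshold."""
    return [p > threshold for p in probs]


# Made-up probabilities: clear silence, two borderline chunks, clear speech.
probs = [0.05, 0.35, 0.55, 0.65, 0.92]

for threshold in (0.3, 0.5, 0.7):
    flagged = sum(flag_speech(probs, threshold))
    print(f"threshold={threshold}: {flagged} of {len(probs)} chunks flagged as speech")
```

A lower threshold lets the borderline chunks through, which suits soft speakers, while a higher one drops them, which suits noisy rooms. Running a sweep like this over probabilities logged from your own audio is a reasonable way to pick a value.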

See Whisper Turbo and Whisper + diarization.

admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
