Detect Voice Activity (Silero VAD)

Detect voice activity using Silero VAD

Use This When

  • Building voice assistants that should only process speech segments to save compute
  • Implementing hands-free alternatives to push-to-talk with automatic speech detection
  • Reducing ASR and LLM costs by filtering out silence and non-speech audio
  • Creating audio recording systems that skip silent intervals

What It Does

  • Detects the presence of human speech in audio frames using the lightweight Silero VAD model
  • Resamples audio to 16 kHz and returns a boolean indicating whether speech is present
  • Generates speech segments with start and end timestamps, locating speech in time
  • Runs a small, efficient neural model suitable for real-time streaming applications
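
The per-frame boolean and the timestamp segments are related by a simple post-processing step. The sketch below is a hypothetical helper, not this component's interface: it assumes the model has already produced one speech probability per 512-sample frame (32 ms at 16 kHz), applies an assumed threshold of 0.5 to get the per-frame boolean, and merges consecutive speech frames into (start, end) pairs in seconds. Silero VAD's own post-processing may add padding, minimum durations, or hysteresis that this omits.

```python
from typing import List, Optional, Tuple

SAMPLE_RATE = 16_000  # the component resamples audio to 16 kHz
FRAME_SAMPLES = 512   # ASSUMED analysis window: 512 samples = 32 ms at 16 kHz


def frames_to_segments(
    probs: List[float],
    threshold: float = 0.5,  # ASSUMED decision threshold
    frame_samples: int = FRAME_SAMPLES,
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: per-frame speech probabilities -> (start_s, end_s).

    A frame counts as speech when its probability is >= threshold (the
    per-frame boolean); runs of consecutive speech frames become one segment.
    """
    frame_dur = frame_samples / sample_rate
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame in the current run
    for i, p in enumerate(probs):
        is_speech = p >= threshold
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * frame_dur, i * frame_dur))  # run ends
            start = None
    if start is not None:  # speech continues to the end of the audio
        segments.append((start * frame_dur, len(probs) * frame_dur))
    return segments
```

For example, `frames_to_segments([0.1, 0.8, 0.9, 0.2, 0.7])` returns two segments, roughly 0.032 to 0.096 s and 0.128 to 0.16 s.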

Works Best With

  • Audio inputs → this component → transcribe-audio to gate ASR on speech segments only
  • denoise-audio → this component → ASR, so speech detection runs on cleaned audio
  • Voice assistant pipelines where VAD triggers wake word detection or command processing
  • Recording systems that need automatic silence removal or speech-only archival
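
When gating ASR or archiving speech only, the VAD timestamps are used to cut the original audio. A minimal sketch, assuming the audio is a flat sequence of 16 kHz samples and the segments are (start, end) pairs in seconds; `keep_speech` and its `pad_s` parameter are hypothetical names, not part of this component or of transcribe-audio.

```python
from typing import List, Sequence, Tuple


def keep_speech(
    samples: Sequence[float],
    segments: Sequence[Tuple[float, float]],
    sample_rate: int = 16_000,
    pad_s: float = 0.0,  # HYPOTHETICAL padding added on both sides of each segment
) -> List[float]:
    """Concatenate only the samples inside speech segments (silence removal)."""
    n = len(samples)
    # Convert seconds to sample indices, apply padding, clamp to [0, n].
    spans = sorted(
        (max(0, int((s - pad_s) * sample_rate)), min(n, int((e + pad_s) * sample_rate)))
        for s, e in segments
    )
    # Merge spans that overlap or touch so no sample is emitted twice.
    merged: List[List[int]] = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    out: List[float] = []
    for lo, hi in merged:
        out.extend(samples[lo:hi])
    return out
```

A small `pad_s` helps avoid clipping word onsets and endings at segment edges; merging after padding keeps the output free of duplicated samples. The resulting speech-only audio is what a downstream ASR step would receive.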

Caveats

  • Very low signal-to-noise ratio (SNR) or loud music can cause false-positive speech detections
  • The model is optimized for 16 kHz; accuracy may degrade when the original recording has a lower sample rate, since upsampling cannot restore missing high-frequency content
  • Frame-level decisions may miss utterances shorter than the analysis window
  • Background babble or TV dialogue may trigger false positives in multi-speaker environments
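
The sample-rate caveat follows from how resampling works: upsampling a lower-rate recording to 16 kHz only interpolates between existing samples and cannot add frequencies the original never captured. The sketch below uses plain linear interpolation as a stand-in; it is not necessarily what this component does, and it applies no anti-aliasing filter, so for real downsampling a proper resampler such as `scipy.signal.resample_poly` or `torchaudio.functional.resample` would be a better choice.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # rate the component resamples to


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Illustrative resampler using linear interpolation between neighbours.

    No anti-aliasing filter is applied; interpolated points lie between
    existing samples and carry no new frequency content.
    """
    n = len(samples)
    if n == 0 or src_rate == dst_rate:
        return list(samples)
    out_len = round(n * dst_rate / src_rate)
    out: List[float] = []
    for j in range(out_len):
        pos = j * src_rate / dst_rate  # position of output sample j on the input grid
        i = int(pos)
        frac = pos - i
        if i + 1 < n:
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        else:
            out.append(float(samples[-1]))  # hold the last sample at the edge
    return out
```

Upsampling `[0.0, 1.0, 2.0, 3.0]` from 8 kHz yields eight samples, each either an original value or the midpoint of two neighbours, with the final sample held at the edge.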

Versions

  • 182b7f25 (latest, default) linux/amd64

    Automated release