Detect Voice Activity (Silero VAD)
1 version
Detect voice activity using Silero VAD
Use This When
- Building voice assistants that should only process speech segments to save compute
- Implementing push-to-talk alternatives with automatic speech detection
- Reducing ASR and LLM costs by filtering out silence and non-speech audio
- Creating audio recording systems that skip silent intervals
What It Does
- Detects the presence of human speech in audio frames using the lightweight Silero VAD model
- Resamples audio to 16 kHz and returns a boolean indicating whether speech is present
- Generates timestamped speech segments that locate speech in time
- Runs an efficient neural model suitable for real-time streaming applications
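The relationship between the per-frame boolean and the timestamped segments can be sketched as follows. This is a minimal illustration, not the component's actual implementation: the function name `frames_to_segments` and the `frame_ms` parameter are hypothetical, and the flags are assumed to come from a detector that emits one decision per fixed-length frame.

```python
from typing import List, Tuple


def frames_to_segments(flags: List[bool], frame_ms: float) -> List[Tuple[float, float]]:
    """Collapse per-frame speech flags into (start_s, end_s) segments.

    Hypothetical helper: `flags` holds one boolean per frame and
    `frame_ms` is the assumed frame length in milliseconds.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current speech run
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i  # a speech run begins at this frame
        elif not is_speech and start is not None:
            # the run ended at the previous frame; close the segment
            segments.append((start * frame_ms / 1000.0, i * frame_ms / 1000.0))
            start = None
    if start is not None:
        # audio ended while speech was still active
        segments.append((start * frame_ms / 1000.0, len(flags) * frame_ms / 1000.0))
    return segments
```

Each segment is half-open: it starts at the first speech frame and ends at the start of the first non-speech frame that follows.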
Works Best With
- Audio inputs → this component → transcribe-audio to gate ASR on speech segments only
- denoise-audio → this component → ASR, so that speech detection runs on cleaned audio
- Voice assistant pipelines where VAD triggers wake-word detection or command processing
- Recording systems that need automatic silence removal or speech-only archival
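The gating pattern in these pipelines can be sketched as below. This is an illustration under stated assumptions rather than the component's interface: `gate_on_speech`, `is_speech`, and `transcribe` are hypothetical names, with `is_speech` standing in for the VAD decision and `transcribe` for whatever downstream step (such as transcribe-audio) should only see speech.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def gate_on_speech(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    transcribe: Callable[[List[Frame]], Result],
) -> List[Result]:
    """Pass only contiguous runs of speech frames to the downstream step.

    `is_speech` and `transcribe` are placeholders for the VAD decision
    and the ASR call; both are assumptions made for this sketch.
    """
    results: List[Result] = []
    buffer: List[Frame] = []  # frames of the speech run in progress
    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
        elif buffer:
            # silence closes the run, so flush it downstream
            results.append(transcribe(buffer))
            buffer = []
    if buffer:
        # flush a run that was still open when the input ended
        results.append(transcribe(buffer))
    return results
```

Silent frames never reach `transcribe`, which is where the ASR and LLM savings come from; buffering whole runs keeps each utterance together instead of sending isolated frames.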
Caveats
- Very low SNR or loud music can cause false-positive speech detections
- The model is optimized for 16 kHz audio; quality degrades if the original recording has a lower sample rate
- Frame-level decisions may miss utterances shorter than the analysis window
- Background babble or TV dialogue may trigger false positives in multi-speaker environments
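One common way to soften spurious detections is to post-process the segments: merge segments separated by a very short gap, then drop any that are still too brief to be plausible speech. The sketch below assumes this is done outside the component; `smooth_segments` and its thresholds are hypothetical and would need tuning per use case. Note that the minimum-duration filter trades false positives against the short-utterance caveat above, since a genuine one-word reply may also be dropped.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def smooth_segments(
    segments: List[Segment],
    min_speech_s: float = 0.25,
    max_gap_s: float = 0.1,
) -> List[Segment]:
    """Merge near-adjacent segments, then drop very short ones.

    Assumes `segments` is sorted by start time and non-overlapping.
    Both thresholds are illustrative defaults, not recommended values.
    """
    merged: List[Segment] = []
    for start, end in segments:
        if merged and start - merged[-1][1] <= max_gap_s:
            # gap is short enough: extend the previous segment
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    # keep only segments long enough to count as speech
    return [(s, e) for s, e in merged if e - s >= min_speech_s]
```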
Versions
- 182b7f25 (latest, default, linux/amd64)
Automated release