Detect Voice Activity (Silero VAD)

2 versions

Frame-level voice activity detection using the Silero VAD model. Returns true whenever any speech region is detected inside the audio frame — typical use is gating ASR or downstream language stages.

How it fits

AudioFrame  ──►  detect_voice_activity_silero_vad  ──► Bool (speech present)

Pick this when you want a cheap "is anyone talking?" signal to avoid spending compute on silence. For full turn-end detection use detect_turn_speech.
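
The gating idea can be sketched in a few lines of Python. This is only an illustration, not the worker's implementation: `Frame`, `gate_on_speech`, and `energy_stand_in` are hypothetical names, and the energy check is a placeholder for the boolean this component emits, not Silero itself.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # hypothetical frame type: a list of PCM samples


def gate_on_speech(
    frames: Iterable[Frame], has_speech: Callable[[Frame], bool]
) -> Iterator[Frame]:
    """Yield only the frames for which the VAD predicate returns True."""
    for frame in frames:
        if has_speech(frame):
            yield frame


def energy_stand_in(frame: Frame, threshold: float = 0.01) -> bool:
    """Placeholder predicate (mean energy), NOT Silero; swap in the real VAD output."""
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > threshold
```

Anything downstream of `gate_on_speech` (an ASR stage, for example) then only sees frames flagged as speech, which is where the compute saving comes from.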

Typical pipelines

  • Speech-gated transcription: input_audio_file → denoise_audio_mp_senet → detect_voice_activity_silero_vad → filter → transcribe_audio_faster_whisper → send_http.
  • Conversation segment collector: live audio → detect_voice_activity_silero_vad → collect (gate) → transcribe_audio_faster_whisper → generate_text_ollama.
  • Recording auto-trim: input_audio_file → detect_voice_activity_silero_vad → filter → output_audio_file (only speech).

Caveats

  • The boolean fires on ANY speech-flagged region in the frame, no matter how short — clicks, coughs, or sneezes can register as speech.
  • Silero is trained at 16 kHz. The worker resamples internally; that's fine for higher input rates, but upsampling from 8 kHz hurts accuracy.
  • No exposed tuning knobs (threshold, min/max speech duration, padding). If you need to filter by duration, accumulate VAD outputs downstream.
  • On CPU the worker disables PyTorch's NNPACK backend, because Silero needs it turned off for predictable behaviour. On GPU this is irrelevant.
  • Only device is configurable in this worker — for fine-grained Silero tuning you'll need a custom variant or post-processing.
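
Since the worker exposes no minimum-duration setting, one way to accumulate VAD outputs downstream is to keep only runs of consecutive speech-positive frames that last long enough. The sketch below assumes fixed-length frames arriving in order; `speech_segments`, `frame_ms`, and `min_speech_ms` are hypothetical names for illustration, not part of this component.

```python
from typing import Iterable, List, Tuple


def speech_segments(
    flags: Iterable[bool], frame_ms: int, min_speech_ms: int
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for runs of True flags lasting at least min_speech_ms."""
    segments: List[Tuple[int, int]] = []
    run_start = None  # index of the first frame in the current speech run
    count = 0         # number of frames seen so far
    for i, flag in enumerate(flags):
        count = i + 1
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            # The run ended on the previous frame; keep it only if long enough.
            if (i - run_start) * frame_ms >= min_speech_ms:
                segments.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    # Close a run that is still open when the input ends.
    if run_start is not None and (count - run_start) * frame_ms >= min_speech_ms:
        segments.append((run_start * frame_ms, count * frame_ms))
    return segments
```

With 30 ms frames and `min_speech_ms=60`, a single isolated positive frame (a click or cough) is dropped, while runs of two or more frames are kept.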

Related components

  • detect_turn_speech — paired component for "speaker has just finished" detection.
  • denoise_audio_mp_senet — typical pre-VAD step on noisy inputs.
  • transcribe_audio_faster_whisper, transcribe_audio_moonshine, transcribe_audio_parakeet, transcribe_audio_sensevoice — typical downstream ASR.
  • filter, collect — typical control-flow consumers.

Versions

  • 6a34b255 · latest · default · linux/amd64

    Automated release

  • 182b7f25 · linux/amd64

    Automated release