Transcribe Audio (Faster Whisper)
Summary: CTranslate2-accelerated Whisper with built-in Silero VAD filtering: up to 4x faster than vanilla Whisper, with configurable hallucination suppression.
Use This When
- You want Whisper-quality transcription with significantly better performance
- You need built-in VAD filtering to suppress hallucinations on silence
- You need flexible model size selection from tiny to large-v3-turbo
- Multilingual support across 100+ languages is required
What It Does
- Converts audio frames to text using Whisper models via CTranslate2 backend
- Built-in Silero VAD pre-filters silent segments before they reach the decoder
- Supports INT8 and FP16 quantization for speed/memory tradeoffs
- Configurable no_speech_threshold and beam_size for accuracy tuning
- Up to 4x faster than vanilla Whisper at equivalent accuracy
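The behavior above maps directly onto the faster-whisper Python library. A minimal sketch, assuming `pip install faster-whisper`; the model size, device, and audio path are illustrative, while the parameter names (`vad_filter`, `beam_size`, `compute_type`) match the library's API:

```python
def transcribe_kwargs(beam_size: int = 5, vad: bool = True) -> dict:
    # Options forwarded to WhisperModel.transcribe; vad_filter enables the
    # built-in Silero VAD so silent spans never reach the decoder.
    return {"beam_size": beam_size, "vad_filter": vad}


def transcribe(path: str, model_size: str = "small") -> list[str]:
    # Import deferred so the pure helper above works without the library.
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # int8 quantization cuts memory use and speeds up CPU inference; on a
    # CUDA 12 + cuDNN 9 GPU, use device="cuda", compute_type="float16".
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, **transcribe_kwargs())
    return [seg.text for seg in segments]
```

Note that the first call to `transcribe()` triggers the model auto-download from HuggingFace, so it can be slow once per model size.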
Works Best With
- Drop-in replacement for the standard Transcribe Audio component with better performance
- Pair with detect-voice-activity for additional silence pre-filtering
- Voice pipelines where Whisper compatibility and multilingual support matter
Caveats
- Still hallucinates on silence (Whisper architecture limitation) but VAD filter mitigates most cases
- Requires CUDA 12 + cuDNN 9 for GPU inference
- Autoregressive decoder is inherently slower than CTC-based alternatives for short utterances
- Model auto-downloads from HuggingFace on first run (size varies by model)
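The VAD filter catches most silence-induced hallucinations up front; any residue can be dropped afterwards using each segment's decoder-side speech probability. A hedged sketch: the `Seg` tuple is a stand-in for faster-whisper's `Segment` (which exposes a `no_speech_prob` field), and the 0.6 cutoff mirrors Whisper's default `no_speech_threshold`:

```python
from collections import namedtuple

# Stand-in for faster-whisper's Segment, which also carries no_speech_prob.
Seg = namedtuple("Seg", ["text", "no_speech_prob"])

def drop_likely_silence(segments, cutoff: float = 0.6):
    # Keep only segments the decoder itself considered likely speech;
    # anything at or above the cutoff is treated as a silence hallucination.
    return [s for s in segments if s.no_speech_prob < cutoff]

kept = drop_likely_silence([Seg("hello there", 0.05),
                            Seg("Thanks for watching!", 0.92)])
# "Thanks for watching!" is a classic Whisper hallucination on silence.
```

Raising `no_speech_threshold` in the transcribe call is the upstream version of the same knob; this post-filter is just a second line of defense.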
Versions
- 1436f7d5 (latest, default, linux/amd64): Automated release