Transcribe Audio (Faster Whisper)
Summary: CTranslate2-accelerated Whisper with built-in Silero VAD filtering: up to 4x faster than vanilla Whisper, with configurable hallucination suppression.
Use This When
- You want Whisper-quality transcription with significantly better performance
- You need built-in VAD filtering to suppress hallucinations on silence
- You need flexible model size selection from tiny to large-v3-turbo
- Multilingual support across 100+ languages is required
What It Does
- Converts audio frames to text using Whisper models via CTranslate2 backend
- Built-in Silero VAD pre-filters silent segments before they reach the decoder
- Supports INT8 and FP16 quantization for speed/memory tradeoffs
- Configurable no_speech_threshold and beam_size for accuracy tuning
- Up to 4x faster than vanilla Whisper at equivalent accuracy
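The behavior above maps directly onto the faster-whisper Python library. A minimal sketch, assuming `pip install faster-whisper`; the model size, device, and audio path are illustrative, while the parameter names (`vad_filter`, `beam_size`, `compute_type`) match the library's API:

```python
def transcribe_kwargs(beam_size: int = 5, vad: bool = True) -> dict:
    # Options forwarded to WhisperModel.transcribe; vad_filter enables the
    # built-in Silero VAD so silent spans never reach the decoder.
    return {"beam_size": beam_size, "vad_filter": vad}


def transcribe(path: str, model_size: str = "small") -> list[str]:
    # Import deferred so the pure helper above works without the library.
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # int8 quantization cuts memory use and speeds up CPU inference; on a
    # CUDA 12 + cuDNN 9 GPU, use device="cuda", compute_type="float16".
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, **transcribe_kwargs())
    return [seg.text for seg in segments]
```

Note that the first call to `transcribe()` triggers the model auto-download from HuggingFace, so it can be slow once per model size.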
Works Best With
- Drop-in replacement for the standard Transcribe Audio component with better performance
- Pair with detect-voice-activity for additional silence pre-filtering
- Voice pipelines where Whisper compatibility and multilingual support matter
Caveats
- Still hallucinates on silence (Whisper architecture limitation) but VAD filter mitigates most cases
- Requires CUDA 12 + cuDNN 9 for GPU inference
- Autoregressive decoder is inherently slower than CTC-based alternatives for short utterances
- Model auto-downloads from HuggingFace on first run (size varies by model)
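The VAD filter catches most silence-induced hallucinations up front; any residue can be dropped afterwards using each segment's decoder-side speech probability. A hedged sketch: the `Seg` tuple is a stand-in for faster-whisper's `Segment` (which exposes a `no_speech_prob` field), and the 0.6 cutoff mirrors Whisper's default `no_speech_threshold`:

```python
from collections import namedtuple

# Stand-in for faster-whisper's Segment, which also carries no_speech_prob.
Seg = namedtuple("Seg", ["text", "no_speech_prob"])

def drop_likely_silence(segments, cutoff: float = 0.6):
    # Keep only segments the decoder itself considered likely speech;
    # anything at or above the cutoff is treated as a silence hallucination.
    return [s for s in segments if s.no_speech_prob < cutoff]

kept = drop_likely_silence([Seg("hello there", 0.05),
                            Seg("Thanks for watching!", 0.92)])
# "Thanks for watching!" is a classic Whisper hallucination on silence.
```

Raising `no_speech_threshold` in the transcribe call is the upstream version of the same knob; this post-filter is just a second line of defense.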
Versions
- 1436f7d5 (latest, default, linux/amd64): Automated release