
Transcribe Audio (Faster Whisper)

Summary: CTranslate2-accelerated Whisper with built-in Silero VAD filtering; up to 4x faster than vanilla Whisper, with configurable hallucination suppression.

Use This When

  • You want Whisper-quality transcription with significantly better performance
  • You need built-in VAD filtering to suppress hallucinations on silence
  • You need flexible model size selection from tiny to large-v3-turbo
  • Multilingual support across 100+ languages is required

What It Does

  • Converts audio frames to text using Whisper models via CTranslate2 backend
  • Built-in Silero VAD pre-filters silent segments before they reach the decoder
  • Supports INT8 and FP16 quantization for speed/memory tradeoffs
  • Configurable no_speech_threshold and beam_size for accuracy tuning
  • Up to 4x faster than vanilla Whisper at equivalent accuracy
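The options above map onto the faster-whisper Python API roughly as follows. This is a minimal sketch, assuming `pip install faster-whisper`; the model size, audio path, and VAD parameter values are illustrative placeholders, not this component's defaults:

```python
# Sketch of a faster-whisper transcription call exercising the options
# described above. "audio.wav" and the model size are placeholders.

def pick_compute_type(device: str) -> str:
    """FP16 on GPU, INT8 on CPU: the quantization tradeoff noted above."""
    return "float16" if device == "cuda" else "int8"

def main() -> None:
    from faster_whisper import WhisperModel

    device = "cuda"  # or "cpu"
    model = WhisperModel(
        "large-v3-turbo",
        device=device,
        compute_type=pick_compute_type(device),
    )
    # vad_filter=True runs Silero VAD so silent stretches never reach the
    # decoder; beam_size and no_speech_threshold are the accuracy knobs.
    segments, info = model.transcribe(
        "audio.wav",
        beam_size=5,
        vad_filter=True,
        vad_parameters={"min_silence_duration_ms": 500},
        no_speech_threshold=0.6,
    )
    for seg in segments:
        print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")

if __name__ == "__main__":
    main()
```

Note that `transcribe` returns a lazy generator, so decoding only happens as segments are iterated.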

Works Best With

  • Drop-in replacement for the standard Transcribe Audio component with better performance
  • Pair with detect-voice-activity for additional silence pre-filtering
  • Voice pipelines where Whisper compatibility and multilingual support matter

Caveats

  • Still hallucinates on silence (Whisper architecture limitation) but VAD filter mitigates most cases
  • Requires CUDA 12 + cuDNN 9 for GPU inference
  • Autoregressive decoder is inherently slower than CTC-based alternatives for short utterances
  • Model auto-downloads from HuggingFace on first run (size varies by model)
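When the VAD filter alone does not catch everything, residual hallucinations can be dropped in post-processing using the per-segment confidence fields that faster-whisper exposes. A sketch; the thresholds here are illustrative assumptions, not component defaults:

```python
# Post-filter for likely hallucinated segments, using the no_speech_prob
# and avg_logprob fields present on faster-whisper segments. The Segment
# dataclass below is a stand-in for the library's own segment objects,
# and the thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    no_speech_prob: float  # decoder's probability the window was silence
    avg_logprob: float     # mean token log-probability

def drop_hallucinations(segments, max_no_speech=0.6, min_avg_logprob=-1.0):
    """Keep only segments the decoder was reasonably confident about."""
    return [
        s for s in segments
        if s.no_speech_prob <= max_no_speech
        and s.avg_logprob >= min_avg_logprob
    ]

# A confident segment survives; a low-confidence one over near-silence
# (a classic Whisper hallucination pattern) is dropped.
kept = drop_hallucinations([
    Segment("hello world", no_speech_prob=0.05, avg_logprob=-0.3),
    Segment("thanks for watching", no_speech_prob=0.92, avg_logprob=-1.8),
])
```

This complements rather than replaces the VAD filter: VAD prevents silence from reaching the decoder, while this step discards whatever the decoder still produced with low confidence.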

Versions

  • 1436f7d5 (latest, default, linux/amd64)

    Automated release