Transcribe Audio (Parakeet)

Summary: High-accuracy speech-to-text using NVIDIA Parakeet-TDT with transducer-based decoding and minimal silence hallucination.

Use This When

  • Accuracy is the top priority (6.05% WER on the Open ASR Leaderboard)
  • You need multilingual European language support (v3: 25 languages)
  • You want near-zero hallucination on silence (v3 trained on 36,000 hours of non-speech data)
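
For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how word error rate is usually computed: word-level edit distance divided by the number of reference words. The function name is illustrative, and the sketch assumes simple whitespace tokenisation with no text normalisation, which a leaderboard's scoring pipeline may apply before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 6.05% therefore corresponds to roughly six word-level errors per hundred reference words.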

What It Does

  • Converts audio frames to text using NVIDIA Parakeet-TDT (600M params, FastConformer + TDT decoder)
  • The transducer decoder can emit blank tokens on silence instead of hallucinating text
  • 3386x real-time throughput on GPU (roughly one second of compute per hour of audio)
  • v2 is English-only with best accuracy; v3 adds multilingual support and better silence handling
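
This page does not document the component's own interface, so the sketch below shows how the underlying model is commonly loaded directly through NeMo instead. It assumes the component wraps the public `nvidia/parakeet-tdt-0.6b-v2` and `nvidia/parakeet-tdt-0.6b-v3` checkpoints; the `MODEL_IDS` table and the `transcribe_files` helper are illustrative names, not part of this component.

```python
# Hugging Face model IDs assumed to correspond to the two versions above.
MODEL_IDS = {
    "v2": "nvidia/parakeet-tdt-0.6b-v2",  # English-only
    "v3": "nvidia/parakeet-tdt-0.6b-v3",  # multilingual
}


def transcribe_files(paths, version="v2"):
    """Transcribe audio files and return one text string per file."""
    # Imported lazily because nemo_toolkit is a heavy dependency.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_IDS[version])
    results = model.transcribe(list(paths))
    # Recent NeMo releases return hypothesis objects with a .text field;
    # older releases return plain strings, so both cases are handled.
    return [getattr(result, "text", result) for result in results]
```

Calling `transcribe_files(["meeting.wav"], version="v3")` would download the weights on first use and return a list with one transcript; the file name here is a placeholder.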

Works Best With

  • High-fidelity transcription pipelines where accuracy matters most
  • Pipelines that pair it with detect-voice-activity to supply pre-segmented audio
  • Meeting transcription, dictation, and professional captioning workflows
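
As a rough illustration of the voice-activity pairing mentioned above, the sketch below slices an audio buffer by speech segments and transcribes only those slices. The `(start, end)` segment format, the `transcribe` callable, and the function name are all assumptions made for illustration; the actual output of detect-voice-activity is not described on this page.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical VAD output: (start_seconds, end_seconds) for each speech span.
Segment = Tuple[float, float]


def transcribe_segments(
    samples: Sequence[float],
    sample_rate: int,
    segments: Sequence[Segment],
    transcribe: Callable[[Sequence[float]], str],
) -> List[Tuple[float, float, str]]:
    """Run `transcribe` on each speech segment and keep non-empty results."""
    results = []
    for start, end in segments:
        # Convert seconds to sample indices, clamped to the buffer bounds.
        lo = max(0, int(start * sample_rate))
        hi = min(len(samples), int(end * sample_rate))
        if hi <= lo:
            continue  # skip empty or inverted segments
        text = transcribe(samples[lo:hi]).strip()
        if text:
            results.append((start, end, text))
    return results
```

Skipping non-speech spans before transcription means the model never sees long silences, which complements the transducer's own blank-token behaviour.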

Caveats

  • Large dependency footprint: nemo_toolkit pulls in pytorch-lightning, hydra-core, omegaconf, and sentencepiece
  • Model weights are 2.47GB
  • v2 can produce minor filler words ("Yeah", "Mm-hmm") on silence; v3 handles this better
  • Licensed under CC-BY-4.0, which requires attribution
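
Given the dependency and disk-space caveats above, a small pre-flight check can be useful before deploying. The sketch below uses only the standard library. The import names in `REQUIRED_MODULES` are assumptions about how the listed packages are imported (newer installs may expose pytorch-lightning as `lightning` instead), and the 2.47GB figure is read as decimal gigabytes.

```python
import importlib.util
import shutil

# Assumed import names for the packages listed in the caveats.
REQUIRED_MODULES = ["nemo", "pytorch_lightning", "hydra", "omegaconf", "sentencepiece"]

# 2.47GB of model weights, interpreted as decimal gigabytes.
WEIGHTS_BYTES = 2_470_000_000


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_room_for_weights(path=".", needed=WEIGHTS_BYTES):
    """Check whether the filesystem containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed
```

An empty list from `missing_modules()` together with a `True` from `has_room_for_weights()` suggests the environment can at least load the model; it does not check GPU availability.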

Versions

  • bd174c31 (latest, default, linux/amd64)

    Automated release