Detect Turn Speech
1 version
Detect speech turn completion for conversation flow control
Use This When
- Building conversational AI systems that need to know when a speaker finishes an utterance
- Implementing turn-taking logic for multi-party dialogue systems
- Creating voice assistants that respond only after the user completes a thought
- Gating LLM processing on complete utterances to improve response relevance
What It Does
- Predicts the probability (0.0-1.0) that a speech turn is complete using a trained endpoint model
- Uses Smart Turn v3 (ONNX + Whisper features) with an 8-second audio window (pads/truncates to last 8s)
- Resamples audio to 16kHz and normalizes to [-1, 1] range
- Returns completion probability so downstream logic can apply its own thresholding / smoothing
- Processes each frame independently for real-time conversation flow decisions
- Auto-selects an ONNX file from the model repo by default (override via `onnx_filename` if needed)
- Supports selecting the execution device via `device` ('auto', 'cuda', or 'cpu')
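The windowing and normalization steps above can be sketched as follows. This is a minimal illustration, not the component's actual internals: `prepare_window` is a hypothetical helper, and peak normalization is one plausible way to reach the [-1, 1] range (the component may instead scale int16 PCM by 1/32768).

```python
SAMPLE_RATE = 16_000           # model expects 16 kHz input
WINDOW_SECONDS = 8             # Smart Turn v3 uses an 8-second window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS

def prepare_window(samples: list[float]) -> list[float]:
    """Pad or truncate audio to the last 8 seconds and normalize to [-1, 1]."""
    # Keep only the most recent 8 s; left-pad shorter clips with silence.
    if len(samples) >= WINDOW_SAMPLES:
        window = samples[-WINDOW_SAMPLES:]
    else:
        window = [0.0] * (WINDOW_SAMPLES - len(samples)) + samples
    # Peak-normalize so values fall in [-1, 1]; pure silence is left as-is.
    peak = max((abs(s) for s in window), default=0.0)
    if peak > 0:
        window = [s / peak for s in window]
    return window
```

The fixed-size window is what lets the ONNX model run with a static input shape on every frame.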
Works Best With
- detect-voice-activity → this component → query-llm for conversational AI pipelines
- Speech recognition → this component → determine when to send transcript to LLM
- Integration with chatbot components to trigger response generation at natural pauses
- Voice UI systems that need to distinguish partial from complete user commands
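A pipeline stage combining this component with VAD might gate the LLM call like the sketch below. The function names and threshold are illustrative assumptions, not part of the component's API: the idea is that a response fires only at a silence the model also scores as a completed turn, not at a mid-thought pause.

```python
END_OF_TURN_THRESHOLD = 0.7   # application-specific; tune on real traffic

def should_respond(turn_probability: float, vad_is_silent: bool) -> bool:
    """Trigger LLM response generation only when VAD reports silence AND
    the turn model scores the utterance as complete."""
    return vad_is_silent and turn_probability >= END_OF_TURN_THRESHOLD
```

In a detect-voice-activity → detect-turn-speech → query-llm pipeline, the accumulated transcript would be forwarded to the LLM only when this gate opens.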
Caveats
- Model trained on specific speaker patterns; accuracy varies with accents and speaking styles
- Sample rate mismatch requires resampling, which adds latency; supply 16kHz audio where possible
- Natural pauses may appear as turn endings depending on how you threshold/smooth the probability
- Normalization assumes calibrated audio levels; extremely quiet speech may fail classification
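One common way to handle the natural-pause caveat is to debounce the per-frame probability, requiring several consecutive frames above threshold before declaring the turn over. The class below is an illustrative sketch (names and defaults are assumptions, not part of this component):

```python
class TurnEndSmoother:
    """Debounce per-frame turn-completion probabilities so brief pauses
    do not trigger a premature response."""

    def __init__(self, threshold: float = 0.7, frames_required: int = 3):
        self.threshold = threshold
        self.frames_required = frames_required
        self._streak = 0  # consecutive frames above threshold

    def update(self, probability: float) -> bool:
        """Feed one frame's probability; returns True once enough
        consecutive frames clear the threshold."""
        if probability >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any sub-threshold frame resets the count
        return self._streak >= self.frames_required
```

Raising `frames_required` trades response latency for fewer false turn endings at natural pauses.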
Versions
- b9c7b31 (latest, default) · linux/amd64
Automated release