Detect Turn Speech


Detect speech turn completion for conversation flow control

Use This When

  • Building conversational AI systems that need to know when a speaker has finished an utterance
  • Implementing turn-taking logic for multi-party dialogue systems
  • Creating voice assistants that should respond only after user completes thought
  • Gating LLM processing on complete utterances to improve response relevance

What It Does

  • Predicts the probability (0.0-1.0) that a speech turn is complete using a trained endpoint model
  • Uses Smart Turn v3 (ONNX + Whisper features) with an 8-second audio window (pads/truncates to last 8s)
  • Resamples audio to 16kHz and normalizes to [-1, 1] range
  • Returns completion probability so downstream logic can apply its own thresholding / smoothing
  • Processes each frame independently for real-time conversation flow decisions
  • Auto-selects an ONNX file from the model repo by default (override via onnx_filename if needed)
  • Supports selecting execution device via device ('auto', 'cuda', or 'cpu')
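The preprocessing steps above (resample to 16 kHz, normalize to [-1, 1], pad or truncate to the last 8 seconds) can be sketched as follows. This is a hedged illustration, not the component's actual code: `prepare_window` is a hypothetical name, and the linear-interpolation resample stands in for whatever resampler the component really uses.

```python
import numpy as np

WINDOW_SECONDS = 8    # fixed analysis window described above
TARGET_RATE = 16_000  # model expects 16 kHz mono audio

def prepare_window(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Resample, normalize, and pad/truncate one audio buffer to the
    fixed 8-second window, mirroring the preprocessing described above.
    (Hypothetical sketch; the component's internals may differ.)"""
    # Naive linear-interpolation resample to 16 kHz; a production
    # pipeline would use a proper polyphase or windowed-sinc resampler.
    if sample_rate != TARGET_RATE:
        duration = len(audio) / sample_rate
        n_out = int(duration * TARGET_RATE)
        old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
        new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(new_t, old_t, audio)

    # Normalize peak amplitude into [-1, 1].
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak

    # Keep only the last 8 s; left-pad with silence if shorter.
    n_window = WINDOW_SECONDS * TARGET_RATE
    if len(audio) >= n_window:
        audio = audio[-n_window:]
    else:
        audio = np.concatenate([np.zeros(n_window - len(audio)), audio])
    return audio.astype(np.float32)
```

The fixed-size float32 output is what an ONNX session would then consume to produce the completion probability.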

Works Best With

  • detect-voice-activity → this component → query-llm for conversational AI pipelines
  • Speech recognition → this component → determine when to send transcript to LLM
  • Integration with chatbot components to trigger response generation at natural pauses
  • Voice UI systems that need to distinguish partial from complete user commands
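The "speech recognition → this component → LLM" flow above amounts to gating the hand-off on the completion probability. A minimal sketch, assuming a caller-supplied `detect_turn` function that wraps this component and returns a probability (all names here are hypothetical glue, not part of the component's API):

```python
from typing import Callable, List

def make_turn_gate(
    detect_turn: Callable[[List[bytes]], float],  # completion probability, 0.0-1.0
    threshold: float = 0.8,  # illustrative value; tune for your application
) -> Callable[[bytes], bool]:
    """Buffer incoming audio frames and report True once the turn model
    judges the utterance complete, so the transcript can go to the LLM."""
    buffer: List[bytes] = []

    def on_frame(frame: bytes) -> bool:
        buffer.append(frame)
        done = detect_turn(buffer) >= threshold
        if done:
            buffer.clear()  # start a fresh utterance after hand-off
        return done

    return on_frame
```

In a real pipeline, a voice-activity detector would sit upstream to skip silent frames, and the `True` signal would trigger transcription and the LLM query.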

Caveats

  • The model was trained on particular speaker populations; accuracy varies with accents and speaking styles
  • A sample-rate mismatch forces resampling, which adds latency; match the input rate where possible
  • Natural mid-utterance pauses can look like turn endings depending on how you threshold or smooth the probability
  • Normalization assumes reasonably calibrated audio levels; extremely quiet speech may be misclassified

Versions

  • b9c7b31 · latest · default · linux/amd64

    Automated release