Detect Turn Speech


Detect speech turn completion for conversation flow control

Use This When

  • Building conversational AI systems that need to know when a speaker has finished an utterance
  • Implementing turn-taking logic for multi-party dialogue systems
  • Creating voice assistants that should respond only after user completes thought
  • Gating LLM processing on complete utterances to improve response relevance

What It Does

  • Predicts the probability (0.0-1.0) that a speech turn is complete using a trained endpoint model
  • Uses Smart Turn v3 (ONNX + Whisper features) with an 8-second audio window (pads/truncates to last 8s)
  • Resamples audio to 16kHz and normalizes to [-1, 1] range
  • Returns completion probability so downstream logic can apply its own thresholding / smoothing
  • Processes each frame independently for real-time conversation flow decisions
  • Auto-selects an ONNX file from the model repo by default (override via onnx_filename if needed)
  • Supports selecting execution device via device ('auto', 'cuda', or 'cpu')
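The preprocessing steps above (resample to 16 kHz, normalize to [-1, 1], pad or truncate to the last 8 seconds) can be sketched as follows. This is a hedged illustration, not the component's actual code: `prepare_window` is a hypothetical name, and the linear-interpolation resample stands in for whatever resampler the component really uses.

```python
import numpy as np

WINDOW_SECONDS = 8    # fixed analysis window described above
TARGET_RATE = 16_000  # model expects 16 kHz mono audio

def prepare_window(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Resample, normalize, and pad/truncate one audio buffer to the
    fixed 8-second window, mirroring the preprocessing described above.
    (Hypothetical sketch; the component's internals may differ.)"""
    # Naive linear-interpolation resample to 16 kHz; a production
    # pipeline would use a proper polyphase or windowed-sinc resampler.
    if sample_rate != TARGET_RATE:
        duration = len(audio) / sample_rate
        n_out = int(duration * TARGET_RATE)
        old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
        new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(new_t, old_t, audio)

    # Normalize peak amplitude into [-1, 1].
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak

    # Keep only the last 8 s; left-pad with silence if shorter.
    n_window = WINDOW_SECONDS * TARGET_RATE
    if len(audio) >= n_window:
        audio = audio[-n_window:]
    else:
        audio = np.concatenate([np.zeros(n_window - len(audio)), audio])
    return audio.astype(np.float32)
```

The fixed-size float32 output is what an ONNX session would then consume to produce the completion probability.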

Works Best With

  • detect-voice-activity → this component → query-llm for conversational AI pipelines
  • Speech recognition → this component → determine when to send transcript to LLM
  • Integration with chatbot components to trigger response generation at natural pauses
  • Voice UI systems that need to distinguish partial from complete user commands
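The "speech recognition → this component → LLM" flow above amounts to gating the hand-off on the completion probability. A minimal sketch, assuming a caller-supplied `detect_turn` function that wraps this component and returns a probability (all names here are hypothetical glue, not part of the component's API):

```python
from typing import Callable, List

def make_turn_gate(
    detect_turn: Callable[[List[bytes]], float],  # completion probability, 0.0-1.0
    threshold: float = 0.8,  # illustrative value; tune for your application
) -> Callable[[bytes], bool]:
    """Buffer incoming audio frames and report True once the turn model
    judges the utterance complete, so the transcript can go to the LLM."""
    buffer: List[bytes] = []

    def on_frame(frame: bytes) -> bool:
        buffer.append(frame)
        done = detect_turn(buffer) >= threshold
        if done:
            buffer.clear()  # start a fresh utterance after hand-off
        return done

    return on_frame
```

In a real pipeline, a voice-activity detector would sit upstream to skip silent frames, and the `True` signal would trigger transcription and the LLM query.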

Caveats

  • The model was trained on particular speaker populations; accuracy varies with accents and speaking styles
  • A sample-rate mismatch forces resampling, which adds latency; match the input rate where possible
  • Natural mid-utterance pauses can look like turn endings depending on how you threshold or smooth the probability
  • Normalization assumes reasonably calibrated audio levels; extremely quiet speech may be misclassified

Versions

  • b9c7b31 · latest · default · linux/amd64

    Automated release