Generate Speech (Kokoro)

1 version

Generate speech using Kokoro TTS models

Use This When

  • Building voice assistants that need fast, natural-sounding text-to-speech responses
  • Creating conversational UIs where low-latency TTS improves user experience
  • Implementing audio feedback systems for accessibility or hands-free operation
  • Generating voice narration for automated content creation or alerts

What It Does

  • Converts text strings to audio using the Kokoro neural TTS pipeline
  • Supports multiple voice presets via a configurable voice parameter (e.g., af_heart)
  • Returns an AudioFrame at a 24 kHz sample rate (tensor shape [samples, channels], typically [N, 1])
  • Handles empty input gracefully by returning a silent audio frame
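
The behavior listed above can be sketched in a few lines. This is a minimal illustration, not the component's actual implementation: the AudioFrame dataclass, the generate_speech function, the synthesize callback, and the 0.1 s silence length are all hypothetical stand-ins. Only the 24 kHz rate, the [samples, channels] layout, the af_heart voice name, and the silent-frame fallback come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 24_000  # output rate stated in the component description


@dataclass
class AudioFrame:
    """Hypothetical stand-in for the component's AudioFrame type."""
    samples: List[List[float]]  # layout [samples, channels]
    sample_rate: int = SAMPLE_RATE

    @property
    def shape(self) -> Tuple[int, int]:
        channels = len(self.samples[0]) if self.samples else 1
        return (len(self.samples), channels)


def generate_speech(
    text: str,
    synthesize: Callable[[str, str], Sequence[float]],
    voice: str = "af_heart",
    silence_seconds: float = 0.1,  # assumption: the real silent length is not documented
) -> AudioFrame:
    """Return mono audio for `text`, or a silent frame when the text is empty."""
    if not text.strip():  # assumption: whitespace-only text counts as empty
        n = round(SAMPLE_RATE * silence_seconds)
        return AudioFrame([[0.0] for _ in range(n)])
    # `synthesize` is a placeholder for the real Kokoro call; it is assumed
    # to return mono float samples at 24 kHz.
    mono = synthesize(text, voice)
    return AudioFrame([[float(s)] for s in mono])
```

Passing the synthesizer in as a callback keeps the sketch runnable without the model installed; a real integration would replace it with the Kokoro pipeline call.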

Works Best With

  • query-llm → this component → output-audio-file or audio playback for voice responses
  • Chatbot systems → this component → real-time audio streaming to users
  • Integration with detect-voice-activity for bidirectional voice conversations
  • Alert systems that need spoken notifications rather than visual displays
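
For the "this component → output-audio-file" pairing, the sketch below shows one way to persist 24 kHz mono samples as a 16-bit PCM WAV file using only the Python standard library. The write_wav helper is hypothetical and is not the output-audio-file component itself; it also assumes the samples are floats in the range [-1.0, 1.0], which the description does not state.

```python
import io
import struct
import wave


def write_wav(samples, dest, sample_rate=24_000):
    """Write mono float samples as 16-bit PCM WAV to a path or file-like object."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # clip out-of-range values
        pcm += struct.pack("<h", int(clipped * 32767))  # little-endian int16
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)           # mono, matching the typical [N, 1] shape
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))


# Usage: write a short buffer to memory instead of disk.
buffer = io.BytesIO()
write_wav([0.0, 0.25, -0.25], buffer)
```

With an [N, 1] frame, the channel axis would first be flattened (for example, taking the first value of each row) before calling the helper.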

Caveats

  • Output sample rate is fixed at 24 kHz; resample if a downstream component expects a different rate
  • Voice quality and availability depend on the Kokoro model; verify that the voice parameter names a valid voice
  • Real-time factor (RTF) varies with text length and hardware; a GPU is strongly recommended
  • The language code parameter affects pronunciation; make sure it matches the language of the input text
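
Two of the caveats above lend themselves to a short sketch: resampling the 24 kHz output and measuring the real-time factor. Both helpers are hypothetical illustrations. The resampler uses plain linear interpolation, which is easy to follow but applies no anti-aliasing filter, so a dedicated resampling library is likely a better choice when downsampling in production. RTF is taken here as synthesis time divided by audio duration, so values below 1.0 mean faster than real time.

```python
def resample_linear(samples, src_rate=24_000, dst_rate=16_000):
    """Resample mono samples by linear interpolation (no low-pass filtering)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour, clamped to the buffer
        hi = min(lo + 1, last)     # right neighbour, clamped to the buffer
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def real_time_factor(synthesis_seconds, num_samples, sample_rate=24_000):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return synthesis_seconds / (num_samples / sample_rate)
```

For example, one second of audio (24,000 samples) that took 0.5 s to synthesize gives an RTF of 0.5.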

Versions

  • db738b77 (latest, default, linux/amd64)

    Automated release