Query AI (Ollama)
Query an Ollama-hosted LLM with streaming or batch responses
Use This When
- Building conversational AI assistants or chatbots
- Creating interactive voice interfaces that need natural language understanding
- Implementing intelligent agents that respond to user prompts
- Adding LLM reasoning or decision-making to pipelines
What It Does
- Sends prompts to Ollama-hosted LLM models with configurable temperature
- Accepts a per-request system prompt as a second input
- Supports streaming mode (token-by-token output chunks) or batch mode (a single full response)
- Optionally displays chain-of-thought reasoning before the final answer
- When enabled, preempts in-progress generation as soon as new input arrives
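The options above map onto Ollama's REST API. A minimal batch-mode sketch, assuming the default endpoint at http://localhost:11434 and the `/api/generate` route with a pulled model tag such as `llama3` (the component's internal wiring may differ):

```python
import json
import urllib.request

# Default Ollama endpoint (assumption; configure to match your deployment)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, system=None, temperature=0.7, stream=False):
    """Assemble an Ollama /api/generate request body."""
    payload = {
        "model": "llama3",          # any locally pulled model tag
        "prompt": prompt,
        "stream": stream,           # False = single batch response
        "options": {"temperature": temperature},
    }
    if system is not None:
        payload["system"] = system  # per-request system prompt
    return payload

def query(prompt, **kwargs):
    """Send a batch request and return the generated text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream=True`, Ollama instead returns newline-delimited JSON chunks rather than one body.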
Works Best With
- Voice input → transcribe-audio → this component → TTS for voice assistants
- Document text → this component for Q&A or summarization
- Integration with activate-voice-assistant for wake-word gated conversations
- Multi-modal pipelines combining caption-image-lavis → this component for visual Q&A
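The voice-assistant pairing above is just function composition over pipeline stages. A sketch with hypothetical stub stages standing in for the real components (the stage names and signatures are illustrative assumptions, not the components' actual interfaces):

```python
# Stubs standing in for real pipeline components (hypothetical interfaces):
def transcribe_audio(audio_bytes):
    return "what is the capital of France?"   # stub: real stage runs STT

def query_llm(text):
    return f"Answer to: {text}"               # stub: real stage calls Ollama

def synthesize_speech(text):
    return text.encode()                      # stub: real stage runs TTS

def voice_assistant_pipeline(audio_bytes):
    """Voice input -> transcription -> LLM -> TTS, as in the pairing above."""
    return synthesize_speech(query_llm(transcribe_audio(audio_bytes)))
```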
Caveats
- Requires a running Ollama service with the model already pulled; the component fails without a reachable backend
- Streaming mode emits multiple output messages; downstream components must handle partial fragments
- Preemption stops in-progress generation abruptly and may truncate useful partial output
- Reasoning display adds significant latency and token usage
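Because streaming mode emits many fragment messages, downstream consumers typically need to reassemble them. A sketch, assuming Ollama-style chunks where each carries a `response` fragment and a final `done` flag:

```python
def accumulate_stream(chunks):
    """Join streamed fragments into one response; stop at the done marker."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated fragments, as a streaming response would deliver them:
fake_chunks = [
    {"response": "Hel", "done": False},
    {"response": "lo!", "done": True},
]
```

A real consumer would apply the same accumulation incrementally, e.g. to update a UI as tokens arrive.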
Versions
- 74be85e8 (tag: latest, default), linux/amd64, automated release