Query Vision (Ollama)
Query a vision LLM via Ollama
Use This When
- Running vision-language models locally with Ollama for image understanding
- Building visual QA, captioning, or OCR pipelines with prompt-driven VLMs
- Processing images with natural language queries using models like deepseek-ocr or llava
- Deploying vision AI without external API dependencies
What It Does
- Encodes input images (JPEG or PNG) and sends them with a text prompt to an Ollama-served VLM
- Returns the model's text response for each image-prompt pair
- Supports concurrent requests via a thread pool, with output kept in input order, when max_parallel > 1
- Retries a health check until the Ollama service is available before sending requests
- Pulls the model automatically if not already available in Ollama
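The core request flow above can be sketched against Ollama's standard `/api/generate` endpoint. This is a minimal illustration, not the component's actual implementation; the `OLLAMA_URL` constant and function names are placeholders (the component builds the address from OLLAMA_HOST and OLLAMA_PORT).

```python
import base64
import json
import urllib.request

# Illustrative address; the component derives it from OLLAMA_HOST/OLLAMA_PORT.
OLLAMA_URL = "http://localhost:11434"

def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Base64-encode one JPEG/PNG image and pair it with the text prompt."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }

def query_vision(model: str, prompt: str, image_bytes: bytes) -> str:
    """POST the image-prompt pair to /api/generate and return the model's text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt, image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

The same payload shape works for any Ollama-served VLM; only the `model` field changes between, say, llava and deepseek-ocr.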
Works Best With
- Camera or image inputs + text prompts → this component → downstream text processing
- Pair with transcribe-audio for multimodal QA (audio → text prompt + image → answer)
- Chain with send-http or log-message for alert pipelines based on visual analysis
- Use as an alternative to query-vision-sglang when Ollama model ecosystem is preferred
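Feeding a batch of camera frames through such a component concurrently while preserving order (the max_parallel behavior noted above) maps naturally onto a thread pool. A sketch, with `query_fn` standing in for whatever per-image query call the pipeline uses:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(query_fn, pairs, max_parallel=4):
    """Run query_fn over (image_bytes, prompt) pairs; results keep input order.

    ThreadPoolExecutor.map yields results in submission order even when
    individual requests finish out of order, which is what ordered output
    with max_parallel > 1 requires.
    """
    if max_parallel <= 1:
        return [query_fn(img, prompt) for img, prompt in pairs]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda p: query_fn(*p), pairs))
```

Threads are a reasonable fit here because each worker spends its time blocked on network I/O to Ollama, not on Python-side computation.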
Caveats
- Requires Ollama sidecar service (depends_on: ollama); OLLAMA_HOST and OLLAMA_PORT must be set
- VLM inference is heavy; GPU strongly recommended for reasonable latency
- num_ctx must be large enough to fit image tokens plus num_predict; set explicitly for large generations
- The JPEG quality parameter trades image fidelity for request payload size
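The sidecar dependency above implies a readiness gate: build the service URL from the required env vars and poll until Ollama answers before issuing requests. A minimal sketch, assuming the Ollama root endpoint returns 200 when the service is up; the function names and retry counts are illustrative:

```python
import os
import time
import urllib.error
import urllib.request

def ollama_base_url() -> str:
    """Build the service URL from the OLLAMA_HOST / OLLAMA_PORT env vars."""
    host = os.environ.get("OLLAMA_HOST", "localhost")
    port = os.environ.get("OLLAMA_PORT", "11434")  # Ollama's default port
    return f"http://{host}:{port}"

def wait_for_ollama(base_url: str, retries: int = 30, delay: float = 2.0) -> bool:
    """Poll the Ollama root endpoint until it answers 200 or retries run out."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(base_url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; back off and retry
        time.sleep(delay)
    return False
```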
Versions
- c60cd690 (latest, default) — linux/amd64 — Automated release