Query Vision (Ollama)
Query a vision LLM via Ollama
Use This When
- Running vision-language models locally with Ollama for image understanding
- Building visual QA, captioning, or OCR pipelines with prompt-driven VLMs
- Processing images with natural language queries using models like deepseek-ocr or llava
- Deploying vision AI without external API dependencies
What It Does
- Encodes input images (JPEG or PNG) and sends them with a text prompt to an Ollama-served VLM
- Returns the model's text response for each image-prompt pair
- Supports concurrent requests via a thread pool, with output kept in input order, when max_parallel > 1
- Retries a health check until the Ollama service is available before sending requests
- Pulls the model automatically if not already available in Ollama
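The core request flow above can be sketched against Ollama's standard `/api/generate` endpoint. This is a minimal illustration, not the component's actual implementation; the `OLLAMA_URL` constant and function names are placeholders (the component builds the address from OLLAMA_HOST and OLLAMA_PORT).

```python
import base64
import json
import urllib.request

# Illustrative address; the component derives it from OLLAMA_HOST/OLLAMA_PORT.
OLLAMA_URL = "http://localhost:11434"

def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Base64-encode one JPEG/PNG image and pair it with the text prompt."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }

def query_vision(model: str, prompt: str, image_bytes: bytes) -> str:
    """POST the image-prompt pair to /api/generate and return the model's text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt, image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

The same payload shape works for any Ollama-served VLM; only the `model` field changes between, say, llava and deepseek-ocr.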
Works Best With
- Camera or image inputs + text prompts → this component → downstream text processing
- Pair with transcribe-audio for multimodal QA (audio → text prompt + image → answer)
- Chain with send-http or log-message for alert pipelines based on visual analysis
- Use as an alternative to query-vision-sglang when Ollama model ecosystem is preferred
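Feeding a batch of camera frames through such a component concurrently while preserving order (the max_parallel behavior noted above) maps naturally onto a thread pool. A sketch, with `query_fn` standing in for whatever per-image query call the pipeline uses:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(query_fn, pairs, max_parallel=4):
    """Run query_fn over (image_bytes, prompt) pairs; results keep input order.

    ThreadPoolExecutor.map yields results in submission order even when
    individual requests finish out of order, which is what ordered output
    with max_parallel > 1 requires.
    """
    if max_parallel <= 1:
        return [query_fn(img, prompt) for img, prompt in pairs]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda p: query_fn(*p), pairs))
```

Threads are a reasonable fit here because each worker spends its time blocked on network I/O to Ollama, not on Python-side computation.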
Caveats
- Requires Ollama sidecar service (depends_on: ollama); OLLAMA_HOST and OLLAMA_PORT must be set
- VLM inference is heavy; GPU strongly recommended for reasonable latency
- num_ctx must be large enough to fit image tokens plus num_predict; set explicitly for large generations
- The JPEG quality parameter trades image fidelity for request payload size
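The sidecar dependency above implies a readiness gate: build the service URL from the required env vars and poll until Ollama answers before issuing requests. A minimal sketch, assuming the Ollama root endpoint returns 200 when the service is up; the function names and retry counts are illustrative:

```python
import os
import time
import urllib.error
import urllib.request

def ollama_base_url() -> str:
    """Build the service URL from the OLLAMA_HOST / OLLAMA_PORT env vars."""
    host = os.environ.get("OLLAMA_HOST", "localhost")
    port = os.environ.get("OLLAMA_PORT", "11434")  # Ollama's default port
    return f"http://{host}:{port}"

def wait_for_ollama(base_url: str, retries: int = 30, delay: float = 2.0) -> bool:
    """Poll the Ollama root endpoint until it answers 200 or retries run out."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(base_url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; back off and retry
        time.sleep(delay)
    return False
```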
Versions
- c60cd690 (latest, default) — linux/amd64 — Automated release