Query Vision (Ollama)

Query a vision LLM served by Ollama

Use This When

  • Running vision-language models locally with Ollama for image understanding
  • Building visual QA, captioning, or OCR pipelines with prompt-driven VLMs
  • Processing images with natural language queries using models like deepseek-ocr or llava
  • Deploying vision AI without external API dependencies

What It Does

  • Encodes input images (JPEG or PNG) and sends them with a text prompt to an Ollama-served VLM
  • Returns the model's text response for each image-prompt pair
  • Supports concurrent requests via thread pool with ordered output when max_parallel > 1
  • Retries a health check until the Ollama service is reachable before sending any requests
  • Pulls the model automatically if not already available in Ollama
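The request flow above can be sketched against Ollama's HTTP API. This is a minimal illustration, not the component's actual implementation; the base URL and retry counts are assumptions, and `/api/generate` with a base64 `images` list is the stock Ollama endpoint:

```python
import base64
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434"  # assumed; built from OLLAMA_HOST/OLLAMA_PORT in practice


def wait_for_ollama(url: str = OLLAMA_URL, retries: int = 10, delay: float = 1.0) -> None:
    """Poll the Ollama root endpoint until the service responds (health-check retry)."""
    for _ in range(retries):
        try:
            urllib.request.urlopen(url, timeout=2)
            return
        except OSError:
            time.sleep(delay)
    raise RuntimeError("Ollama service not reachable")


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Base64-encode one JPEG/PNG image and pair it with a text prompt."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def query_vision(model: str, prompt: str, image_bytes: bytes) -> str:
    """Send an image+prompt pair to /api/generate and return the text response."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt, image_bytes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def query_many(model: str, prompt: str, images: list[bytes], max_parallel: int = 4) -> list[str]:
    """Concurrent image-prompt requests; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as ex:
        return list(ex.map(lambda b: query_vision(model, prompt, b), images))
```

A call such as `query_vision("llava", "Describe this image.", open("photo.jpg", "rb").read())` mirrors a single image-prompt pair; `query_many` sketches the ordered thread-pool behavior when `max_parallel > 1`.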

Works Best With

  • Camera or image inputs + text prompts → this component → downstream text processing
  • Pair with transcribe-audio for multimodal QA (audio → text prompt + image → answer)
  • Chain with send-http or log-message for alert pipelines based on visual analysis
  • Use as an alternative to query-vision-sglang when Ollama model ecosystem is preferred

Caveats

  • Requires Ollama sidecar service (depends_on: ollama); OLLAMA_HOST and OLLAMA_PORT must be set
  • VLM inference is heavy; GPU strongly recommended for reasonable latency
  • num_ctx must be large enough to fit image tokens plus num_predict; set explicitly for large generations
  • JPEG quality parameter trades image fidelity for request payload size
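The `num_ctx` caveat can be handled per request via the `options` field of Ollama's generate API. A sketch with illustrative, untuned values; the guard simply enforces the constraint stated above:

```python
def generation_options(num_ctx: int = 8192, num_predict: int = 512) -> dict:
    """Per-request options for Ollama's /api/generate.

    num_ctx must be large enough to hold the image tokens plus num_predict;
    the defaults here are illustrative assumptions, not tuned values.
    """
    if num_ctx <= num_predict:
        raise ValueError("num_ctx must leave room for image tokens beyond num_predict")
    return {"options": {"num_ctx": num_ctx, "num_predict": num_predict}}
```

Merging this dict into the request payload sets the context window explicitly instead of relying on the model's default.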

Versions

  • c60cd690 · latest, default · linux/amd64

    Automated release