Query OCR (Surya)

Query OCR

Use This When

  • You want one OCR worker with pluggable backends
  • You need to switch between lightweight OCR engines (Tesseract, EasyOCR) and VLMs (MiniCPM, LLaVA)

What It Does

  • Accepts an Image and runs OCR using the selected backend
  • Backends described here (see backend under Key Config for all accepted values):
    • tesseract (CPU, classic OCR)
    • easyocr (CPU/GPU, detection+recognition)
    • surya (Surya OCR; downloads model weights on first use)
    • minicpm (VLM via Transformers; prompt-driven)
    • llava (VLM; visual QA style)
  • Lazily initializes heavy models and reuses them across calls
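The lazy initialization and reuse described above can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: the names register, get_backend, and run_ocr are hypothetical, and only the tesseract and easyocr loaders are shown. Heavy libraries are imported inside the loaders, so nothing is loaded until a backend is first requested.

```python
from functools import lru_cache
from typing import Callable, Dict

# Hypothetical registry: backend name -> loader that builds a "path -> text" callable.
_LOADERS: Dict[str, Callable[[], Callable[[str], str]]] = {}


def register(name: str):
    """Decorator that records a loader under a backend name (illustrative helper)."""
    def deco(loader):
        _LOADERS[name] = loader
        return loader
    return deco


@register("tesseract")
def _load_tesseract():
    # Imported lazily; needs pytesseract, Pillow, and the Tesseract binary.
    import pytesseract
    from PIL import Image

    def run(path: str) -> str:
        return pytesseract.image_to_string(Image.open(path), lang="eng")
    return run


@register("easyocr")
def _load_easyocr():
    # Imported lazily; building the Reader loads the models, so do it only once.
    import easyocr
    reader = easyocr.Reader(["en"], gpu=False)

    def run(path: str) -> str:
        # detail=0 returns plain strings instead of (box, text, confidence) tuples.
        return "\n".join(reader.readtext(path, detail=0))
    return run


@lru_cache(maxsize=None)
def get_backend(name: str) -> Callable[[str], str]:
    """Build the backend on first request and return the cached one afterwards."""
    if name not in _LOADERS:
        raise ValueError(f"unknown backend: {name!r}")
    return _LOADERS[name]()


def run_ocr(path: str, backend: str = "tesseract") -> str:
    return get_backend(backend)(path)
```

Calling run_ocr twice with the same backend invokes its loader only once; a loader that raises (for example, because a library is missing) is not cached, so a later call retries it.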

Key Config

  • backend: tesseract | easyocr | rapidocr | paddleocr | trocr | doctr | minicpm | llava | gotocr | surya
  • use_gpu: enable GPU acceleration where supported (defaults to true; falls back to CPU if not available)
  • language: language code(s). Tesseract expects three-letter codes (e.g., eng); EasyOCR expects two-letter codes (e.g., en). eng is auto-mapped to en.
  • prompt: instructions for VLM backends
  • Tesseract tuning: tesseract_oem, tesseract_psm
  • Surya: surya_task_name, surya_output_format, surya_disable_math, surya_sort_lines, surya_separator, surya_recognition_batch_size, surya_detector_batch_size, surya_foundation_chunk_size, surya_foundation_max_tokens, surya_foundation_model_quantize, surya_disable_tqdm
  • Hugging Face model ids:
    • trocr_model, minicpm_model, llava_model, gotocr_model
  • LLaVA: num_tokens
  • GOT-OCR: gotocr_mode, gotocr_crop, gotocr_box, gotocr_color
  • RapidOCR: rapidocr_text_only
  • PaddleOCR: paddle_lang
  • PaddleX cache (PaddleOCR): paddlex_models, paddlex_revision
  • TrOCR: trocr_use_rapid_det
  • docTR: doctr_det_arch, doctr_reco_arch
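As a rough illustration of how a few of these options might be resolved before a backend runs, consider the sketch below. Everything here is an assumption made for the example: the OcrConfig class, the default values other than use_gpu, the comma-separated format for multiple language codes, and the idea that tesseract_oem and tesseract_psm map onto Tesseract's --oem and --psm command-line flags. Only the eng to en mapping and the GPU fallback behaviour come from the descriptions above.

```python
from dataclasses import dataclass
from typing import List

# Only the eng -> en mapping is documented; other codes pass through unchanged.
ISO3_TO_ISO2 = {"eng": "en"}


@dataclass
class OcrConfig:
    """Hypothetical container for a subset of the options listed above."""
    backend: str = "tesseract"   # assumed default
    use_gpu: bool = True         # documented default
    language: str = "eng"        # assumed default
    prompt: str = ""             # only meaningful for VLM backends
    tesseract_oem: int = 3       # assumed default
    tesseract_psm: int = 3       # assumed default


def languages_for(cfg: OcrConfig) -> List[str]:
    """Split the language option and adapt codes to the selected backend."""
    # Assumption: multiple codes are comma-separated.
    codes = [c.strip() for c in cfg.language.split(",") if c.strip()]
    if cfg.backend == "easyocr":
        return [ISO3_TO_ISO2.get(code, code) for code in codes]
    return codes


def tesseract_flags(oem: int, psm: int) -> str:
    """Build a Tesseract CLI flag string from the two tuning options (assumed mapping)."""
    return f"--oem {oem} --psm {psm}"


def resolve_use_gpu(requested: bool) -> bool:
    """Honour use_gpu only when a CUDA device is usable; otherwise fall back to CPU."""
    if not requested:
        return False
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

For example, languages_for(OcrConfig(backend="easyocr", language="eng")) yields ["en"], while the same language value with backend="tesseract" is passed through as ["eng"].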

Notes

  • Tesseract requires the Tesseract binary installed in the runtime image.
  • EasyOCR uses PyTorch and OpenCV; GPU is optional via easyocr_gpu.
  • Surya downloads model weights on first use (requires network access in the runtime environment).
  • VLM backends (minicpm, llava) are much heavier than the classic OCR engines; a GPU is recommended.
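A quick preflight check along these lines can surface missing runtime requirements before the first OCR call. It is only a sketch: missing_requirements is a hypothetical helper, the error strings are made up, and it covers just the Tesseract binary and the EasyOCR Python dependencies mentioned above.

```python
import importlib.util
import shutil
from typing import Callable, List, Optional


def missing_requirements(
    backend: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems for the chosen backend (empty list if none found)."""
    problems: List[str] = []
    if backend == "tesseract":
        # The Python wrapper is not enough; the tesseract executable must be on PATH.
        if which("tesseract") is None:
            problems.append("tesseract binary not found on PATH")
    elif backend == "easyocr":
        # EasyOCR relies on PyTorch (module "torch") and OpenCV (module "cv2").
        for module in ("easyocr", "torch", "cv2"):
            if importlib.util.find_spec(module) is None:
                problems.append(f"Python module {module!r} is not installed")
    return problems
```

The which parameter exists only so the lookup can be replaced in tests; in normal use, call missing_requirements("tesseract") and fail early if the returned list is not empty.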

Build

  • Per the repo rule, build by running ppl compile --non-interactive from this component directory (requires elevated permissions).

Versions

  • c332a33e (latest, default, linux/amd64)

    Automated release