Query OCR (Surya)
1 version
Query OCR
Use This When
- You want one OCR worker with pluggable backends
- You want to switch between lightweight OCR (Tesseract, EasyOCR) and VLMs (MiniCPM, LLaVA)
What It Does
- Accepts an Image and runs OCR using the selected backend
- Backends:
  - tesseract (CPU, classic OCR)
  - easyocr (CPU/GPU, detection+recognition)
  - surya (Surya OCR; downloads model weights on first use)
  - minicpm (VLM via Transformers; prompt-driven)
  - llava (VLM; visual QA style)
- Lazily initializes heavy models and reuses them across calls
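
The lazy-initialization behavior described above can be pictured with a small sketch. This is not the component's actual code: the names register_backend, get_backend, and FakeModel are hypothetical, and the real worker may cache its models differently. It only illustrates the idea of building a heavy model on first use and reusing it on later calls.

```python
from typing import Callable, Dict

# Hypothetical registry: backend name -> zero-argument factory that builds the model.
_FACTORIES: Dict[str, Callable[[], object]] = {}
# Cache of already-built backends, filled on demand.
_INSTANCES: Dict[str, object] = {}


def register_backend(name: str, factory: Callable[[], object]) -> None:
    """Record how to build a backend without building it yet."""
    _FACTORIES[name] = factory


def get_backend(name: str) -> object:
    """Build the backend on first request, then return the cached instance."""
    if name not in _FACTORIES:
        raise ValueError(f"unknown backend: {name!r}")
    if name not in _INSTANCES:
        _INSTANCES[name] = _FACTORIES[name]()  # expensive load happens once
    return _INSTANCES[name]


class FakeModel:
    """Stand-in for a heavy model; counts how many times it was constructed."""

    loads = 0

    def __init__(self) -> None:
        FakeModel.loads += 1


register_backend("surya", FakeModel)
first = get_backend("surya")   # constructs FakeModel
second = get_backend("surya")  # returns the cached instance
```

In this sketch, registering a backend is cheap, so all backends can be registered at startup while only the one actually selected pays the loading cost.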
Key Config
- backend: tesseract | easyocr | rapidocr | paddleocr | trocr | doctr | minicpm | llava | gotocr | surya
- use_gpu: enable GPU acceleration where supported (defaults to true; falls back to CPU if not available)
- language: language code(s). For Tesseract use ISO3 (e.g., eng), for EasyOCR use 2-letter (e.g., en). eng is auto-mapped to en.
- prompt: instructions for VLM backends
- Tesseract tuning: tesseract_oem, tesseract_psm
- Surya: surya_task_name, surya_output_format, surya_disable_math, surya_sort_lines, surya_separator, surya_recognition_batch_size, surya_detector_batch_size, surya_foundation_chunk_size, surya_foundation_max_tokens, surya_foundation_model_quantize, surya_disable_tqdm
- Hugging Face model ids: trocr_model, minicpm_model, llava_model, gotocr_model
- LLaVA: num_tokens
- GOT-OCR: gotocr_mode, gotocr_crop, gotocr_box, gotocr_color
- RapidOCR: rapidocr_text_only
- PaddleOCR: paddle_lang
- PaddleX cache (PaddleOCR): paddlex_models, paddlex_revision
- TrOCR: trocr_use_rapid_det
- docTR: doctr_det_arch, doctr_reco_arch
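
As a rough illustration of how the core keys fit together, here is a minimal sketch of a config object. The class OCRConfig, its method resolved_language, and the mapping table are hypothetical and not taken from the component. Only the backend names, the use_gpu default of true, and the eng to en mapping come from the list above; how the real worker handles other language codes is not documented here.

```python
from dataclasses import dataclass

# Backend names as listed under Key Config.
BACKENDS = {
    "tesseract", "easyocr", "rapidocr", "paddleocr", "trocr",
    "doctr", "minicpm", "llava", "gotocr", "surya",
}

# Illustrative mapping only; the documentation confirms just eng -> en.
ISO3_TO_ISO2 = {"eng": "en"}


@dataclass
class OCRConfig:
    """Hypothetical container for the core keys; not the component's real class."""

    backend: str
    language: str
    use_gpu: bool = True  # documented default
    prompt: str = ""      # only meaningful for VLM backends

    def __post_init__(self) -> None:
        if self.backend not in BACKENDS:
            raise ValueError(f"unsupported backend: {self.backend!r}")

    def resolved_language(self) -> str:
        """Return the language code in the form the chosen backend expects."""
        if self.backend == "easyocr":
            # EasyOCR expects 2-letter codes; unknown codes pass through unchanged.
            return ISO3_TO_ISO2.get(self.language, self.language)
        return self.language


cfg = OCRConfig(backend="easyocr", language="eng")
easyocr_lang = cfg.resolved_language()
```

Here a Tesseract-style code given to the EasyOCR backend resolves to en, while the same code given to the tesseract backend is left as eng.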
Notes
- Tesseract requires the Tesseract binary installed in the runtime image.
- EasyOCR uses PyTorch and OpenCV; GPU is optional via easyocr_gpu.
- Surya downloads model weights on first use (requires network access in the runtime environment).
- VLM backends are much heavier; GPU recommended.
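
The runtime requirements in these notes can be checked before a job starts. The following is a minimal preflight sketch, not part of the component: the function names tesseract_available and resolve_device are hypothetical, and it assumes that GPU availability is detected through PyTorch's CUDA check, which may differ from what the worker does internally.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a tesseract executable is found on PATH."""
    return shutil.which("tesseract") is not None


def resolve_device(use_gpu: bool) -> str:
    """Return "cuda" only when requested and usable; otherwise "cpu"."""
    if not use_gpu:
        return "cpu"
    try:
        import torch  # optional dependency; if absent, stay on CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = resolve_device(use_gpu=True)
has_tesseract = tesseract_available()
```

Running a check like this at startup surfaces a missing Tesseract binary or an unavailable GPU early, instead of at the first OCR call.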
Build
- Per repo rule, build with: ppl compile --non-interactive (run from this component directory; requires elevated permissions).
Versions
- c332a33e (latest, default, linux/amd64)
Automated release