Query OCR

Use This When

What It Does

Key Config

backend: tesseract | easyocr | rapidocr | paddleocr | trocr | doctr | minicpm | llava | gotocr | surya
use_gpu: enable GPU acceleration where supported (defaults to true; falls back to CPU if not available)
language: language code(s). Tesseract expects three-letter codes (e.g., eng); EasyOCR expects two-letter codes (e.g., en). eng is auto-mapped to en.
prompt: instructions for VLM backends
Tesseract tuning: tesseract_oem, tesseract_psm
Surya: surya_task_name, surya_output_format, surya_disable_math, surya_sort_lines, surya_separator, surya_recognition_batch_size, surya_detector_batch_size, surya_foundation_chunk_size, surya_foundation_max_tokens, surya_foundation_model_quantize, surya_disable_tqdm
Hugging Face model ids: trocr_model, minicpm_model, llava_model, gotocr_model
LLaVA: num_tokens
GOT-OCR: gotocr_mode, gotocr_crop, gotocr_box, gotocr_color
RapidOCR: rapidocr_text_only
PaddleOCR: paddle_lang
PaddleX cache (PaddleOCR): paddlex_models, paddlex_revision
TrOCR: trocr_use_rapid_det
docTR: doctr_det_arch, doctr_reco_arch
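
The options above can be pictured as a flat set of keys. The following is a minimal sketch only: the option names (backend, use_gpu, language, tesseract_psm, tesseract_oem) come from the list above, but the dict layout, the helper names build_config and normalize_language, the integer values for the Tesseract options, and the choice to apply the eng-to-en mapping only for EasyOCR are assumptions made for illustration, not the component's actual API.

```python
# Hypothetical sketch: option names are taken from the Key Config list,
# but the helpers and the dict layout are assumptions for illustration.

SUPPORTED_BACKENDS = {
    "tesseract", "easyocr", "rapidocr", "paddleocr", "trocr",
    "doctr", "minicpm", "llava", "gotocr", "surya",
}

# Only the eng -> en mapping is documented; no other codes are assumed here.
THREE_TO_TWO_LETTER = {"eng": "en"}


def normalize_language(backend: str, language: str) -> str:
    """Map a three-letter code to the two-letter form for EasyOCR (assumed scope)."""
    if backend == "easyocr":
        return THREE_TO_TWO_LETTER.get(language, language)
    return language


def build_config(backend: str, language: str = "eng",
                 use_gpu: bool = True, **extra) -> dict:
    """Assemble a config dict, rejecting backends that are not in the list."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    config = {
        "backend": backend,
        "use_gpu": use_gpu,  # documented default is true
        "language": normalize_language(backend, language),
    }
    config.update(extra)  # backend-specific keys, e.g. tesseract_psm
    return config


# Tesseract with a page segmentation mode and engine mode (values assumed to be ints).
tesseract_cfg = build_config("tesseract", tesseract_psm=6, tesseract_oem=1)

# EasyOCR given a three-letter code; the sketch maps it to the two-letter form.
easyocr_cfg = build_config("easyocr", language="eng")
```

Keeping backend-specific keys flat and prefixed (tesseract_, surya_, gotocr_) mirrors the naming in the list above, so keys that do not apply to the selected backend are easy to spot.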

Notes

Tesseract requires the Tesseract binary installed in the runtime image.
EasyOCR uses PyTorch and OpenCV; GPU is optional via easyocr_gpu.
Surya downloads model weights on first use (requires network access in the runtime environment).
VLM backends are much heavier than the other backends; a GPU is recommended.
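
Because Tesseract depends on a binary in the runtime image, a preflight check can surface a missing install before any OCR call is made. The sketch below uses only the Python standard library; the function name check_backend_requirements and the assumption that the executable is named tesseract and found via PATH are illustrative, not part of this component.

```python
import shutil


def check_backend_requirements(backend: str) -> list:
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper: only the Tesseract binary check from the notes
    above is covered here.
    """
    problems = []
    if backend == "tesseract" and shutil.which("tesseract") is None:
        # Assumes the executable is called `tesseract` and is looked up on PATH.
        problems.append("tesseract binary not found on PATH")
    return problems


# Example: print any problems for the chosen backend.
for problem in check_backend_requirements("tesseract"):
    print(problem)
```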

Build

Per repo rule, build from this component directory (requires elevated permissions) with: ppl compile --non-interactive

Query OCR (Surya)