Formalization and Initial Experimental Evaluation of an Adaptive Approach to OCR Pipeline Selection for Text Recognition in Images

Христина Грицай; Оксана Грицай; Ольга Терендій

doi:10.15407/fmmit2026.42.158

Keywords:

optical character recognition, OCR, image preprocessing, Tesseract, EasyOCR, PaddleOCR, RapidOCR, Amazon Textract, CER, WER, integral score

Abstract

The paper addresses the problem of selecting an appropriate text recognition
pipeline for images by considering image preprocessing methods and the specific
features of modern optical character recognition (OCR) models. The study is
relevant because OCR quality depends not only on the selected
recognition model but also on the characteristics of the input image, including noise,
contrast, illumination, resolution, text skew, and background complexity.
The aim of the paper is to formalize an adaptive approach to OCR pipeline
selection and to perform its initial experimental evaluation. The proposed approach is
based on generating several preprocessed versions of the same input image, applying
OCR models to each version, obtaining recognized text, text region coordinates,
confidence scores, and processing time, and then evaluating the obtained results using
a multi-criteria quality score. The study considers the following OCR tools: Tesseract,
EasyOCR, PaddleOCR, RapidOCR, and Amazon Textract. The preprocessing
configurations include the original image without preprocessing, grayscale
conversion, contrast enhancement, denoising with scaling, and Otsu binarization. The
quality assessment is based on Character Error Rate (CER), Word Error Rate (WER),
processing time, model confidence score, and fuzzy matching score. The experimental
part is an initial evaluation rather than a full-scale
statistical comparison of OCR models. Its purpose is to verify the logic of the proposed
methodology, identify the main parameters that should be fixed in further experiments,
and prepare a basis for extended research on a larger dataset of images of different
quality. The results show that recognition quality can vary with the selected
combination of preprocessing method and OCR model.
These results are preliminary, however, and should not be read as a
final ranking of OCR models. The practical value of the proposed approach lies in its
potential use as a methodological basis for building OCR pipelines in automated
document processing systems, digital archives, electronic document management
systems, information retrieval systems, and applications for text recognition from
images.
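
The evaluation step described in the abstract can be illustrated with a minimal sketch. It computes CER and WER from the Levenshtein edit distance and combines them with processing time, confidence, and a fuzzy matching score into a single weighted score. The weighted-sum form, the weight values, the min-based clamping, the use of `difflib.SequenceMatcher` as the fuzzy matcher, and all function and key names (`quality_score`, `select_pipeline`, `"seconds"`, and so on) are assumptions made for illustration; the abstract does not specify the actual scoring formula.

```python
from difflib import SequenceMatcher


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


# Hypothetical weights, chosen only so that they sum to 1.
WEIGHTS = {"cer": 0.35, "wer": 0.25, "fuzzy": 0.20, "confidence": 0.10, "time": 0.10}


def quality_score(reference, text, confidence, seconds, max_seconds):
    """Weighted sum of criteria, each mapped to [0, 1] with higher = better."""
    criteria = {
        "cer": 1.0 - min(cer(reference, text), 1.0),
        "wer": 1.0 - min(wer(reference, text), 1.0),
        # Stand-in fuzzy matcher (assumption): similarity ratio in [0, 1].
        "fuzzy": SequenceMatcher(None, reference, text).ratio(),
        "confidence": confidence,
        # The slowest candidate gets 0; faster ones approach 1.
        "time": 1.0 - seconds / max_seconds,
    }
    return sum(WEIGHTS[name] * value for name, value in criteria.items())


def select_pipeline(reference, results):
    """Return the highest-scoring (preprocessing, model) result.

    `results` is a list of dicts with the keys "preprocessing", "model",
    "text", "confidence" (0..1), and "seconds".
    """
    max_seconds = max(r["seconds"] for r in results) or 1.0
    return max(
        results,
        key=lambda r: quality_score(
            reference, r["text"], r["confidence"], r["seconds"], max_seconds
        ),
    )
```

In this sketch every preprocessing and model combination contributes one entry to `results`, and `select_pipeline` returns the entry with the highest score. It requires a ground-truth reference text, so it covers only the evaluation stage and not reference-free selection at inference time.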
