OCR Engine Research

by 들숨

OCR (Optical Character Recognition) is the process of converting images that contain printed or handwritten text into machine-readable text.

OCR is also an active research area in AI, pattern recognition, and computer vision.




1. Steps of OCR


i) Text Detection (Locating Text in the Input Image)

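- Algorithms: DB and DB++, EAST, SAST, PSENet, FCENet, etc.

- Using CRAFT (Character Region Awareness for Text Detection): trained with strong (character-level) supervision on synthetic data and with weak supervision on real images that only have word-level annotations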
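Several of the detectors above (for example DB and PSENet), as well as CRAFT's region score, output a per-pixel text score map rather than boxes. Below is a minimal pure-Python sketch of the shared post-processing idea. It assumes a 2D score map with values in [0, 1] and a threshold of 0.3, and the function name `score_map_to_boxes` is hypothetical. The sketch binarizes the map, groups 4-connected text pixels, and returns one axis-aligned box per group. Real detectors add model-specific steps (DB expands its shrunk text regions before output, and CRAFT also uses an affinity score to link characters into words), so treat this as an illustration rather than any model's actual post-processing.

```python
from collections import deque
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max), inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def score_map_to_boxes(score: Sequence[Sequence[float]],
                       threshold: float = 0.3,
                       min_area: int = 2) -> List[Box]:
    """Group above-threshold pixels into 4-connected regions and box each one."""
    h = len(score)
    w = len(score[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or score[y][x] < threshold:
                continue
            # Breadth-first search over the connected text region starting here.
            queue = deque([(y, x)])
            seen[y][x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and score[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny regions, which are usually noise.
            if len(ys) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


score = [
    [0.9, 0.8, 0.0, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.6, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.9],
    [0.0, 0.0, 0.0, 0.0, 0.1],
]
print(score_map_to_boxes(score))  # [(0, 0, 1, 1), (3, 1, 4, 2)]
```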


ii) Skew Correction

- Using OpenCV in Python

- Detect the Block of Text in the Input Image

- Compute the Angle of the Rotated Text

- Rotate the Input Image to Correct the Skew


iii) Text Recognition

- Algorithms: CRNN, Rosetta, STAR-Net, RARE, SRN, NRTR, etc.

- Using Deep-Text-Recognition-Benchmark
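CRNN and Rosetta in the list above are trained with CTC (Connectionist Temporal Classification). With CTC, the model outputs one probability distribution per horizontal time step over the characters plus a special "blank" class. A simple way to turn those outputs into a string is greedy (best-path) decoding: take the most likely class at each step, merge consecutive repeats, then drop blanks. The sketch below assumes the blank sits at index 0 of a hypothetical `charset` list. Attention-based recognizers such as RARE decode differently and are not covered by it.

```python
from typing import List, Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      charset: List[str],
                      blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per time step, merge repeats, remove blanks."""
    # Most likely class index at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    out = []
    prev = None
    for idx in best:
        # A repeated class is merged unless a blank separated the two occurrences.
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)


charset = ["<blank>", "a", "c", "t"]  # index 0 is the CTC blank


def one_hot(i: int) -> List[float]:
    row = [0.05] * len(charset)
    row[i] = 0.85
    return row


# Per-step argmax: c c <blank> a t <blank> t  ->  "catt"
steps = [one_hot(i) for i in [2, 2, 0, 1, 3, 0, 3]]
print(ctc_greedy_decode(steps, charset))  # catt
```

The blank is what lets CTC produce double letters: without the blank between the two `t` steps, they would merge into a single `t`.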




2. Improving the Accuracy of OCR

- Skew Correction (Well-Aligned Characters)

- Higher Image Quality

- Higher Image Contrast

- Sharper Character Borders

- Less Pixel Noise




3. Types of Input Images & Output Text

- Input: Images and Scanned Documents (PDF, TIFF, JPG, etc.)

- Output: Plain Text (TXT, etc.)




4. Significance of OCR

- Improves Accuracy Compared with Manual Data Entry

- Speeds Up Document Processing

- Reduces Costs

- Improves Productivity




5. Applications of OCR

- Sheet Music Recognition

- Document Identification

- Data Entry Automation

- Creation of Archives and Digital Libraries

- Text Translation

- Marketing Campaigns

- Banking

- Healthcare

- Legal



References

- https://github.com/clovaai/CRAFT-pytorch

- https://docparser.com/blog/improve-ocr-accuracy/

- https://pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/

- https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/algorithm_overview_en.md#12

- https://yongwookha.github.io/MachineLearning/2021-06-04-open-ocr-engine

- https://viso.ai/computer-vision/optical-character-recognition-ocr/

- https://medium.com/swlh/applications-of-ocr-you-havent-thought-of-69a6a559874b
