OCR (Optical Character Recognition) is the process of converting images of text into machine-readable text.
OCR is also a significant area of research in AI, Pattern Recognition, and Computer Vision.
1. Steps of OCR
i) Text Detection (of Input Image)
- Algorithms: DB, DB++, EAST, SAST, PSENet, FCENet, etc.
- Using CRAFT (Character Region Awareness for Text Detection) => Strong Supervision (Character-Level Labels) / Weak Supervision (Word-Level Labels)
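Segmentation-style detectors such as CRAFT and DB predict a per-pixel score map, and text boxes are derived from that map afterwards. The following is a minimal sketch of that post-processing idea only, not the code of any listed project: it thresholds a small score map and groups neighbouring pixels into axis-aligned boxes. The function name `boxes_from_score_map`, the list-of-lists input, and the threshold of 0.5 are assumptions made for illustration.

```python
from collections import deque


def boxes_from_score_map(score_map, threshold=0.5):
    """Group above-threshold pixels into boxes (left, top, right, bottom).

    score_map is assumed to be a list of rows with scores in [0, 1].
    Pixels are connected through their 4 direct neighbours.
    """
    if not score_map:
        return []
    rows, cols = len(score_map), len(score_map[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or score_map[r][c] < threshold:
                continue
            # Breadth-first search over one connected text region.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and score_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes
```

Real detectors work on full-resolution maps and usually fit rotated boxes or polygons instead of axis-aligned rectangles; the sketch only shows how a score map turns into discrete regions.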
ii) Skew Correction
- Using Python OpenCV
- Detect the Block of Text in the Input Image
- Compute the Angle of the Rotated Text
- Rotate the Input Image to Correct for the Skew
iii) Text Recognition
- Algorithms: CRNN, Rosetta, STAR-Net, RARE, SRN, NRTR, etc.
- Using Deep-Text-Recognition-Benchmark
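Recognizers in the CRNN family are commonly trained with a CTC loss, which means the network emits one class score vector per image column and the final string is obtained by decoding that sequence. The sketch below shows greedy CTC decoding only, under the assumption that class index 0 is the CTC blank. The function name and the `classes` list are hypothetical and are not taken from Deep-Text-Recognition-Benchmark.

```python
def ctc_greedy_decode(frame_scores, classes, blank=0):
    """Decode per-frame class scores into text with greedy CTC rules.

    frame_scores: one list of class scores per time step (image column).
    classes: label for each class index; classes[blank] is never emitted.
    """
    # Pick the highest-scoring class in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded = []
    previous = blank
    for index in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if index != blank and index != previous:
            decoded.append(classes[index])
        previous = index
    return "".join(decoded)
```

Because repeats are collapsed before blanks are removed, a doubled letter is only recovered when the model emits a blank between the two occurrences. Attention-based recognizers such as RARE decode differently and are not covered by this sketch.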
2. Improving the Accuracy of OCR
- Skew Correction (Well Aligned Characters)
- Better Quality of the Image
- Higher Contrast of the Image
- Sharper Character Borders
- Less Pixel Noise
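Two of the points above, higher contrast and less pixel noise, can be illustrated with a small NumPy sketch. It assumes an 8-bit grayscale image held in a 2D array; the function names and the 2nd/98th percentile limits are illustrative choices, not recommended settings.

```python
import numpy as np


def stretch_contrast(gray, low_pct=2.0, high_pct=98.0):
    """Linearly stretch intensities so the chosen percentiles map to 0 and 255."""
    low, high = np.percentile(gray, [low_pct, high_pct])
    if high <= low:  # flat image, nothing to stretch
        return gray.copy()
    scaled = (gray.astype(np.float32) - low) / (high - low)
    return (np.clip(scaled, 0.0, 1.0) * 255.0).astype(np.uint8)


def median_filter_3x3(gray):
    """Replace each pixel with the median of its 3x3 neighbourhood."""
    padded = np.pad(gray, 1, mode="edge")
    height, width = gray.shape
    # Nine shifted views of the padded image, one per neighbourhood position.
    windows = [padded[dy:dy + height, dx:dx + width]
               for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0).astype(gray.dtype)
```

The median filter removes isolated salt-and-pepper pixels while keeping character edges sharper than a plain blur would. Running it before the contrast stretch prevents single noisy pixels from influencing the percentile limits.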
3. Types of Input Images & Output Texts
- Input Images: PDF, TIFF, JPG, etc.
- Output Texts: TXT, etc.
4. Significance of OCR for Computer Vision
- Improve Accuracy
- Speed Up the Process
- Cost-effective
- Improve Productivity
5. Applications of OCR
- Sheet Music Recognition
- Document Identification
- Data Entry Automation
- Archives and Digital Libraries Creation
- Text Translation
- Marketing Campaigns
- Banking
- Healthcare
- Legal
References
- https://github.com/clovaai/CRAFT-pytorch
- https://docparser.com/blog/improve-ocr-accuracy/
- https://pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/
- https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/algorithm_overview_en.md#12
- https://yongwookha.github.io/MachineLearning/2021-06-04-open-ocr-engine
- https://viso.ai/computer-vision/optical-character-recognition-ocr/
- https://medium.com/swlh/applications-of-ocr-you-havent-thought-of-69a6a559874b