3 things you need to know about OCR
Technology has become an integral part of our lives and influences everyday decisions, not only for individuals but for society as a whole. Pick any sector we work in, such as education, transportation, medicine, or entertainment, and technology has a vital role to play in it. Have you ever wondered what technology is at work when you excitedly scan the vouchers and promotional codes on grocery packets or other products with your mobile phone to avail offers? Or how you got a challan (a traffic ticket) for a rule you broke when you thought there was no cop around to see it? Well, the technology working behind the scenes in these scenarios is OCR (Optical Character Recognition).
OCR is a computer technology that helps machines read the text in a given sample. The sample could be an image or a handwritten or printed document. OCR is mostly used to recognize printed or handwritten characters inside a digital image of physical text, such as a scanned document or a vehicle registration plate. It is one of the earliest computer vision tasks to be addressed. When the data is clean and simple, classical algorithms are often sufficient; for large and complex datasets, deep learning is usually required. There are a lot of areas where OCR plays a vital role. The data available nowadays runs into terabytes and petabytes, and analyzing data at that scale for valuable outputs calls for deep learning. At its core, OCR follows a very basic process: it examines the text in a document and then translates the characters into machine-readable codes that can be used for data processing.
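To make the "examine the text, then translate characters into code" idea concrete, here is a minimal sketch of one classical approach, template matching, on made-up data. The 3x5 glyphs, the `TEMPLATES` table, and the `hamming` and `recognize` helpers are all hypothetical names invented for this illustration; real OCR systems are far more involved.

```python
# Toy 3x5 glyph templates ("#" = ink, "." = background).
# Hypothetical data made up for this sketch, not taken from a real font.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def hamming(a, b):
    """Count the pixels at which two equally sized glyphs differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


# A "T" with one pixel of ink missing from its stem.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]

char = recognize(noisy_t)
print(char, ord(char))  # the recognized character and its character code
```

This kind of pixel comparison only holds up when the glyphs are clean, aligned, and drawn in a known font, which is roughly the "simple data" case described above.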
In this blog, I will talk about some strategies, methods, and logic for addressing different OCR tasks. To make it simpler, I will point to available datasets for you to play with and understand the fundamentals of OCR.
- Factors influencing OCR
In layman’s terms, OCR works by extracting all the possible textual information from an image, for example reading a license plate number or road signs. OCR technology is used in data entry automation, to assist blind and visually impaired people, and to index documents for search engines. What makes OCR a gem in everyday life is its ability to strengthen systems and services. However, there are some factors one needs to keep an eye on before using OCR. Here are some of them:
- Text density: Text density varies from one sample to another. On a written or printed page the text is dense, whereas in an image of a street with a single street sign the text is sparse.
- Structure of text: Printed text on a page is well-structured in strict rows. Handwritten text, by contrast, can be scattered across the page and appear at different rotations.
- Fonts: Printed fonts are easy to read and extract because they are regular and well-structured, whereas handwriting can be irregular and unorganized, which is what we call noisy…
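As a rough illustration of the text density factor, the sketch below measures the fraction of "ink" pixels in a binarized image. It assumes the image has already been converted to rows of 0s and 1s (1 meaning ink); the `text_density` function and the two tiny sample images are hypothetical and made up for this example.

```python
def text_density(binary_image):
    """Return the fraction of pixels that are ink (1) in a binarized image."""
    total = sum(len(row) for row in binary_image)
    ink = sum(sum(row) for row in binary_image)
    return ink / total if total else 0.0


# Made-up 3x4 samples: a crowded "page" and a street scene with one small sign.
dense_page = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
sparse_scene = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

print(text_density(dense_page))    # 9 of 12 pixels are ink
print(text_density(sparse_scene))  # 2 of 12 pixels are ink
```

A measure like this is only a crude proxy, since dark pixels are not always text, but it shows why a dense page and a sparse street scene may call for different handling.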