Custom OCR Extraction of a Unique Field from Identity Documents

Introduction

Picture courtesy: https://unsplash.com/@convertkit

Our client, a leading provider of education credential verification services, presented us with a unique challenge. Their service automates document verification, and they needed to extract one specific field from a particular identity document. The challenge was twofold: the extraction process had to be custom-built, and it had to achieve an accuracy of 95% or higher (the lower the accuracy, the more manual effort is required, and manual effort is expensive).

The client insisted on a solution that did not rely on OCR support from AWS or Azure. Their goal was to ensure that we were not bound to an external service, allowing for greater flexibility and control over the process. This requirement added a layer of complexity to the task, but we were up for the challenge.

The Solution

Our approach to solving this problem involved several steps.

Image Pre-processing

An OCR library works best when the Region of Interest (RoI) in the image it receives is clear enough for text extraction. Hence, before passing the image of the identity document to the OCR modules, we pre-processed it, using Gaussian blurring and binarization to improve image quality and make it more suitable for OCR.

Gaussian blurring reduces noise by convolving the image with a Gaussian kernel, which replaces each pixel with a weighted average of its neighbours. In identity document images, the noise comes from sources such as camera light reflections or inadequate lighting.

Binarization converts a greyscale image to pure black and white. Identity documents, like currency notes, carry intricate background designs to prevent fraud, and these designs make image processing difficult. Binarization reduces the image to clear foreground and background, which makes it much easier to work with. We used the OpenCV library for both techniques.

Illustration of an image before and after applying these techniques

Face Detection

Why do we need to detect the face? The photograph of the holder is the most prominent element of the identity document, and its size does not change. Once we know where the photograph is, we can work out where the other information on the document sits. We used the face detection feature of the dlib library for this purpose.

Determine Skewness and Deskew Images

The identity documents are scanned using smartphones, so the images are often skewed. OCR quality improves dramatically on straight images. But how do we tell whether, and by how much, a document image is skewed?

The document has an element similar to a barcode. We smoothed the barcode image and applied the blackhat morphological operator, which highlights dark regions (the bars) on a light background. Enclosing the highlighted region in a rotated bounding box gave us the angle of the box, which is the skew angle. We then deskewed the image by rotating it back by that angle.

Illustration of applying the morphological operations to the barcode image

Determine the Region of Interest

Using the position of the face, we extracted the portion of the document to its right as a wide RoI. We then passed each contour within that RoI to the OCR detector, working from top to bottom.

OCR Detectors

The pre-processed contour images are passed to the OCR detectors: Tesseract first and, if it fails to detect the text, EasyOCR as a fallback.
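The Tesseract-first, EasyOCR-fallback logic can be sketched with the `pytesseract` and `easyocr` packages. The original does not say what counts as Tesseract "failing to detect". Here that is modelled by a hypothetical `is_valid` predicate, which by default accepts any non-empty string; in practice it could check the field's expected format. The engines are passed in as callables, and both libraries are imported lazily, so the fallback order can be exercised without either engine installed.

```python
from typing import Callable, Iterable, Optional

import numpy as np

OcrEngine = Callable[[np.ndarray], str]


def tesseract_engine(image: np.ndarray) -> str:
    import pytesseract  # needs the Tesseract binary installed and on PATH
    # --psm 7 tells Tesseract to treat the crop as a single line of text.
    return pytesseract.image_to_string(image, config="--psm 7").strip()


_easyocr_reader = None


def easyocr_engine(image: np.ndarray) -> str:
    global _easyocr_reader
    if _easyocr_reader is None:
        import easyocr
        # Creating a Reader loads its models, so do it once and reuse it.
        _easyocr_reader = easyocr.Reader(["en"], gpu=False)
    # detail=0 returns only the recognised strings.
    return " ".join(_easyocr_reader.readtext(image, detail=0)).strip()


def read_field(image: np.ndarray,
               engines: Iterable[OcrEngine] = (tesseract_engine, easyocr_engine),
               is_valid: Callable[[str], bool] = bool) -> Optional[str]:
    """Try each engine in order; return the first result that passes is_valid."""
    for engine in engines:
        text = engine(image)
        if is_valid(text):
            return text
    return None
```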

Conclusion

Our custom-built solution achieved impressive results. Without face detection, we achieved an accuracy of 80%. Adding face detection raised the accuracy of the extracted RoI to 88%. The final piece of the puzzle, skew detection and deskewing, added another 8 percentage points, taking our total accuracy to 96%, above the 95% target.

This project demonstrated that with the right techniques and tools, it is possible to build a custom OCR solution that meets high accuracy requirements without relying on external services.