How to Use Optical Character Recognition for Security System Development

Deep Learning and Computer Vision for ID Documents Data Recognition.

Applying machine learning techniques to security solutions is one of the current AI trends. This article will cover an approach to developing OCR-based software using deep learning algorithms. This software can be used to analyze and process identification documents, such as a US driver’s license, as part of a security system for verifying identity.

OCR (Optical Character Recognition) technology is already used by machine learning companies for business process automation and optimization. Use cases range from Dropbox parsing text in user photos, to Google Street View identifying street signs, to searching through text messages and translating text in real time.

In this particular case, OCR can be used as part of an automated biometric verification system. The solution uses selfie photos and runs a comparison against a database containing facial features (face embeddings) extracted from driver’s licenses.
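The comparison step can be sketched as a similarity check between two embedding vectors. This is a minimal illustration, not the system's actual matcher: the function names and the 0.6 threshold are assumptions, and a production system would tune the threshold on validation data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(selfie_emb, license_emb, threshold=0.6):
    # Threshold is illustrative; in practice it is chosen to balance
    # false acceptances against false rejections.
    return cosine_similarity(selfie_emb, license_emb) >= threshold
```

The same embedding model must be used on both the selfie and the license photo, otherwise the vectors are not comparable.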

The OCR process uses the following data:

A selfie photograph to compare against the official ID photograph.

A picture of the front of a driver’s license, which is used in detecting the face.

A picture of the rear of the driver’s license containing a barcode, from which we capture data such as date of birth (DOB), name, and other fields.

In beta testing, it became apparent that users were able to spoof the verification process. One way of doing so was passing along pictures from multiple documents: the first picture a user submitted was of valid identification, but the second, back-side picture contained false information. Fraud attempts are a given for any real-world implementation, so an independent verification system was created to prevent spoofing. It cross-checks the pictures of the two sides of the ID to confirm matches or flag discrepancies.
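The cross-check above can be sketched as a field-by-field comparison between the data extracted from each side. This is a simplified sketch under assumed field names (`first_name`, `last_name`, `dob`); the real system's schema and normalization rules are not described in detail here.

```python
def cross_check(front_fields, back_fields,
                keys=("first_name", "last_name", "dob")):
    """Return the list of fields whose values differ between the
    front-side OCR output and the back-side barcode data."""
    mismatches = []
    for key in keys:
        a = (front_fields.get(key) or "").strip().upper()
        b = (back_fields.get(key) or "").strip().upper()
        if a != b:
            mismatches.append(key)
    return mismatches
```

A non-empty result flags the submission for rejection or manual review rather than silently passing it through.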

Deciding on the Optimal OCR Solution

Once the range of use cases had been articulated, we began working with Optical Character Recognition SDKs and APIs. OCR solutions are available in a number of forms, from open source to commercial, off-the-shelf solutions.

One might assume the simple approach would be the best, in which the top-of-the-line commercial OCR solution could be implemented to read in pictures and process the relevant info. But this didn’t prove to be an effective approach.

User Friendliness and Security Concerns with OCR

One significant challenge was that the pictures on IDs and the selfie photos had significant positioning and quality disparities at times. This is an unavoidable consequence of using different cameras and processes to take them.

If the system requires a high threshold for accuracy in finding matches, a large number of legitimate photos will be rejected, which can cause issues with the user experience. A balance needs to be struck between holding a high security standard and keeping false rejections to a bare minimum.
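This trade-off can be made concrete by measuring two error rates at a candidate threshold: the false rejection rate (legitimate pairs scored below the threshold) and the false acceptance rate (impostor pairs scored above it). The sketch below is illustrative; the score lists would come from a labeled validation set, which is an assumption on my part.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Compute (false rejection rate, false acceptance rate) at a
    given match-score threshold.

    genuine_scores  -- similarity scores for true same-person pairs
    impostor_scores -- similarity scores for different-person pairs
    """
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far
```

Sweeping the threshold over a validation set and plotting FRR against FAR is the usual way to pick an operating point that matches the product's risk tolerance.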

One way of dealing with this is to improve the quality of the user-submitted selfies. By creating a smooth UI/UX experience along with a set of easy-to-follow instructions for taking the picture, the quality of the selfie photos can be dramatically increased.

Once the user-submitted pictures become standardized, it’s possible to strike a better balance between user friendliness and security. While many computer vision-based OCR solutions can be described as a ‘black box’, this one requires raw data to function.

Challenges of the Driver’s License Data Recognition

Each state in the United States creates a unique driver’s license format, with these formats changing periodically. As a result, it’s not possible to simply pre-generate templates for parsing licenses. Another hurdle is the fact that driver’s license pictures are frequently of poor quality. Any OCR solutions working with them must account for these issues.

Any OCR system using CV and ML will generate errors. With this system, it was necessary to create a secure and reliable solution able to handle low-quality pictures alongside the OCR component. As a result, both the DS and software engineering teams took part in the development process.

In order to map out the development of any solution based on AI, it’s necessary to answer a few fundamental questions. Chiefly, what are the mandatory data requirements, and what are the ways in which available data can be used?

We began data mining with a small set of approximately 150 IDs and 100 driver’s licenses, pulled both from existing open datasets and from data we gathered ourselves.

At the preliminary stage, existing open-source and commercial OCR systems were compared and evaluated. A short-list of best-suited OCR solutions was selected, and these systems were subjected to evaluation using the real dataset.

The metrics tracked during the research were direct DOB, name, and surname matches, and Google Vision was the natural choice for the process.
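A direct-match metric of this kind can be sketched as an exact-match rate per field over the evaluation set. The dictionary schema below is a hypothetical illustration of the idea, not the actual evaluation harness used.

```python
def field_match_rate(predictions, ground_truth, field):
    """Fraction of documents where the OCR output for a field exactly
    matches the labeled value (case- and whitespace-insensitive)."""
    hits = sum(
        p.get(field, "").strip().upper() == g[field].strip().upper()
        for p, g in zip(predictions, ground_truth)
    )
    return hits / len(ground_truth)
```

Computing this rate separately for DOB, name, and surname makes it easy to see which fields a given OCR engine struggles with.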

Evaluating a Large Open Dataset

An alternative approach might be to try to tackle a massive open database with book scans or pictures containing text. However, this approach doesn’t fit the unique task of dealing with driver’s licenses. Given the variability of driver’s license templates and the frequently low-quality photos accompanying them, many OCR solutions that would otherwise be suitable simply aren’t up to the task. An OCR solution that requires perfectly aligned text and excellent pictures to function will fail in this capacity.

The actual task of parsing driver’s licenses and other IDs gives the truest indicator of a solution’s fitness. The best dataset was the one specifically collected with the task in mind to fit the target audience.

Implementing an OCR Engine

A data security compliant OCR solution demands an approach combining DS, ML and Software Engineering.

A primary challenge was in dealing with the raw data Google Vision delivers and cross-referencing it with barcode-delivered data at 100% accuracy. While Google’s OCR system is at the top of the industry, mistakes are inevitable. Many use cases tolerate errors, but in security-related fields they cannot be allowed to pass unchecked. This OCR solution required an additional layer of security to defeat fraud attempts that exploit the limitations of OCR.
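The back of a US driver’s license carries a PDF417 barcode whose decoded fields are far more reliable than OCR of the printed front. One simple cross-reference, sketched below, is to check whether each barcode value actually appears in the raw OCR text from the front. The field names are assumed for illustration, and real matching would need date-format normalization that is omitted here.

```python
def verify_against_barcode(ocr_text, barcode_fields):
    """For each decoded barcode field, check whether its value appears
    anywhere in the OCR output from the front of the license
    (case-insensitive substring match)."""
    haystack = ocr_text.upper()
    return {key: value.upper() in haystack
            for key, value in barcode_fields.items()}
```

Any field that fails this check becomes a signal that the two sides may belong to different documents, or that the OCR output needs correction before a final decision.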

Additional Attention to the Machine Readable Zone

The MRZ (Machine Readable Zone) is a section of a travel document with well-defined data fields, each character being a digit or a letter. An MRZ also includes check digits to verify that the data has been parsed accurately.
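The MRZ check digit scheme is specified in ICAO Doc 9303: each character is assigned a value (digits keep their value, A–Z map to 10–35, the filler `<` counts as 0), values are multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. A minimal implementation:

```python
def mrz_check_digit(field):
    """ICAO 9303 check digit: weights 7-3-1 repeating, modulo 10.
    Digits keep their value, A-Z map to 10-35, '<' counts as 0."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == '<':
            value = 0
        else:
            value = ord(ch) - ord('A') + 10
        total += value * weights[i % 3]
    return total % 10
```

Recomputing each check digit after recognition and comparing it against the digit read from the document catches most single-character OCR mistakes in the MRZ.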

However, even using Google Vision, the accuracy achievable for MRZ recognition is significantly below 100%. As a result, it was necessary to create an extra process for data cross-checking.

There are a number of frequent mistakes that OCR is prone to making when reading data. One example is the symbol groups ‘1, l, I’ and ‘O, D, Q, 0’. These symbols are visually similar and can be misclassified by the engine. By collecting historical error data, it’s possible to build a component that can correct an error made by the OCR system.
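One simple form such a correction component can take is comparing strings after collapsing commonly confused glyphs to a canonical form. The mapping below is a hypothetical example of what might be distilled from historical error data; a real component would likely be probabilistic and context-aware rather than a flat table.

```python
# Canonical form for glyphs OCR frequently confuses.
# Illustrative mapping only -- assumed, not taken from real error logs.
CONFUSABLE = str.maketrans({
    'l': '1', 'I': '1', '|': '1',
    'O': '0', 'D': '0', 'Q': '0',
})

def normalized_equal(ocr_value, reference):
    """True if two strings match once commonly confused symbols are
    collapsed to a shared canonical form."""
    return ocr_value.translate(CONFUSABLE) == reference.translate(CONFUSABLE)
```

Used when cross-checking OCR output against barcode data, this avoids flagging a mismatch when the only difference is a known glyph confusion.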

In summary, four tips on working with OCR: