This summer I am doing an internship at Big Vision LLC under the guidance of Dr. Satya Mallick. In this post, I will describe the problem I was asked to solve to qualify for the internship. For a beginner who is just starting out in AI, this problem was just outside my comfort zone: difficult, but achievable with effort. It helped me stretch my abilities, learn new ideas and tools, and test my patience. It took me about 30 hours to do this project, but the best part is that Big Vision would have paid for my time even if I did not qualify!

The first recipient of the prestigious Turing Award, Dr. Alan Perlis, once said this about the difficulty and the magic of AI:

“A year spent in Artificial Intelligence is enough to make one believe in God.”

For me, as a beginner, spending 30 hours on this project was enough to make me believe that AI has its own magic.

As we know, even beginners in AI these days are used to working with very large datasets. For example, many beginners use MNIST or CIFAR-10, or start from a model pre-trained on ImageNet. All of these datasets were created manually and with a lot of effort. Most of the time, it is not feasible for an individual to create a large dataset alone; it often requires a large team of annotators.

Sometimes, we get lucky, and the problem at hand can be solved by creating a synthetic dataset.

In this post, we will learn how to create a synthetic dataset and use it to train a model for recognizing a single printed character with an arbitrary background, blur, noise and other artifacts. In this case, the synthetic dataset is easy to prepare and a single programmer with access to reasonable computing power can create a dataset of millions of images.

The trained model can later be fine-tuned using a small amount of real data to produce spectacular results.

Let’s go over the process step by step. As we go through the steps, please remember that we are not covering the state of the art in character recognition; we are following a beginner’s journey into the world of AI.

Now let us look at the details of each step.

Step 1: Install ImageMagick on your Operating System

You can refer to this link to download the version of ImageMagick that matches your operating system (Windows, Linux or Mac).

Install ImageMagick on Windows:

1. Download the ImageMagick binary install package for Windows from the above link and run the installer.

2. At the prompt, type:

convert -version

3. If ImageMagick is installed correctly, a message will be displayed with the version and copyright notices.

Install ImageMagick on Linux/Mac:

1. First, download the latest version of the program sources, ImageMagick.tar.gz, from this link.

2. Unzip the file using:

gunzip -c ImageMagick.tar.gz | tar xvf -

3. In the folder where you have unzipped the file, run:

./configure

4. Once Step 3 is successful, run the following (prefix it with sudo if you are installing into a system directory):

make install

5. To check whether the installation has completed successfully, run:

convert -version
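Since the later steps call ImageMagick from a Python script, the same check can also be done programmatically. The sketch below is only an illustration and not part of the original project: the helper names find_imagemagick and print_version are hypothetical, and the code simply looks for a magick (ImageMagick 7) or convert (ImageMagick 6) executable on the PATH.

```python
import shutil
import subprocess


def find_imagemagick(which=shutil.which):
    """Return the name of the first ImageMagick executable found on PATH, or None.

    `which` is injectable so the lookup can be exercised without ImageMagick installed.
    """
    # ImageMagick 7 ships `magick`; ImageMagick 6 ships `convert`.
    for name in ("magick", "convert"):
        if which(name) is not None:
            return name
    return None


def print_version(executable):
    """Run `<executable> -version` and print the first line of its output."""
    result = subprocess.run([executable, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    print(lines[0] if lines else "no version information returned")
```

Calling print_version(find_imagemagick()) should print a line that mentions ImageMagick. On Windows, be aware that convert can also resolve to an unrelated system utility of the same name, so a missing version line there usually means the wrong program was picked up.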

Step 2: Download Backgrounds

I used Google Image Downloader to download thousands of images using keywords like “Background”, “Texture”, “Pattern” etc. It is easy to install and it works like a charm. In addition, if you are developing a commercial application, you can stay on the safe side by restricting downloads to images with permissive licenses using the usage_rights (-r) command line argument.
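As a rough sketch, the download could also be driven from Python. This assumes the tool is the google_images_download package and that its download() method accepts a dictionary with the keys keywords, limit, output_directory and usage_rights; the helper names, the backgrounds folder and the "labeled-for-reuse" value are assumptions for illustration, so please check them against the package documentation before relying on them.

```python
def build_download_arguments(keywords, limit=100, usage_rights=None,
                             output_directory="backgrounds"):
    """Build the arguments dictionary for one download run (hypothetical helper)."""
    arguments = {
        # Assumption: the package expects keywords as one comma-separated string.
        "keywords": ",".join(keywords),
        "limit": limit,
        "output_directory": output_directory,
    }
    if usage_rights is not None:
        # Restrict results to a licence class, e.g. for commercial projects.
        arguments["usage_rights"] = usage_rights
    return arguments


def download_backgrounds(arguments):
    """Run the download; imported lazily so the helper above works without the package."""
    from google_images_download import google_images_download
    downloader = google_images_download.googleimagesdownload()
    return downloader.download(arguments)
```

For example, build_download_arguments(["Background", "Texture", "Pattern"], usage_rights="labeled-for-reuse") produces a dictionary that can be passed to download_backgrounds.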

Step 3: Download Google Fonts

Google has an amazing collection of fonts at Google Fonts. I downloaded about 25 fonts for my project. In a real-world application, you would download pretty much all the fonts.

Note: Even if your application domain is restricted to a small set of fonts, it is a good idea to train on a much larger collection of fonts to avoid over-fitting.
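Once the fonts are downloaded, the script needs a way to find them and pick one at random. A minimal sketch, assuming the font files were unpacked somewhere under a single folder (the folder layout and the helper names list_fonts and pick_font are hypothetical):

```python
import random
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf"}


def list_fonts(font_dir):
    """Return every .ttf/.otf file under font_dir, sorted so runs are repeatable."""
    root = Path(font_dir)
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in FONT_SUFFIXES)


def pick_font(fonts, rng=random):
    """Pick one font at random; pass a seeded random.Random for reproducibility."""
    return rng.choice(fonts)
```

Sorting the list before sampling means that a fixed random seed always yields the same sequence of fonts, which helps when regenerating a dataset.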

Download Code

To easily follow along this tutorial, please download code by clicking on the button below. It’s FREE!

Step 4: Create synthetic data

I used ImageMagick to generate synthetic data in this project. The figure above shows the workflow. Regular readers of this blog may be wondering why I did not use OpenCV. The reason is that OpenCV has a limited set of fonts.

In this section, I am sharing the ImageMagick commands used in my Python script. I used a Python wrapper to select random fonts, backgrounds, colors, warps, noise etc. To start with, we will see how to use the convert function of ImageMagick to generate images with characters (A-Z) and digits (0-9) printed on them.
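To make the idea of random selection concrete, here is a minimal sketch of how one sample's parameters could be drawn. It is not the script from this project: the function name sample_parameters, the dictionary keys and the value ranges (for example a blur sigma between 0 and 2) are assumptions chosen only for illustration.

```python
import random
import string

# 26 uppercase letters plus 10 digits give 36 classes.
CHARACTERS = string.ascii_uppercase + string.digits


def sample_parameters(fonts, backgrounds, rng=random):
    """Draw one random combination of rendering parameters (hypothetical ranges)."""
    red, green, blue = (rng.randrange(256) for _ in range(3))
    return {
        "char": rng.choice(CHARACTERS),           # the label of this sample
        "font": rng.choice(fonts),
        "background": rng.choice(backgrounds),
        "fill": "rgb({},{},{})".format(red, green, blue),  # text color
        "blur_sigma": round(rng.uniform(0.0, 2.0), 2),
    }
```

Passing a seeded random.Random instance as rng makes the generated dataset reproducible, and the "char" entry doubles as the training label for the image.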

The operations to be performed on an image are specified using operators called Image Operators, such as crop, blur, evaluate Gaussian-noise, distort Perspective, gravity, font and weight.
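On the command line, each of these operators is written as an option with a leading dash, followed by its value. The sketch below shows one way a Python wrapper might assemble such options into an argument list; the function name build_operator_args and the default values are hypothetical, and it uses only a subset of the operators listed above plus -pointsize, -fill and -annotate to draw the character.

```python
def build_operator_args(char, font, pointsize=40, fill="black",
                        blur_sigma=1.0, noise=0.5):
    """Return ImageMagick options that draw `char` and then degrade the image."""
    return [
        "-gravity", "center",                # center the text on the canvas
        "-font", str(font),
        "-pointsize", str(pointsize),
        "-fill", fill,                       # text color
        "-annotate", "+0+0", char,           # draw the character with no offset
        "-blur", "0x{}".format(blur_sigma),  # radius 0 lets ImageMagick choose it
        "-evaluate", "Gaussian-noise", str(noise),
    ]
```

Keeping the options as a list, rather than one long string, means they can be handed to subprocess.run without any shell quoting issues. The order matters: settings such as the font and color must come before the -annotate that uses them, and the blur and noise come last so that they affect the drawn character too.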

The following commands are for ImageMagick 7. If you have ImageMagick 6, drop the word magick from the command and start with convert.
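If the script has to run on machines with either version, the prefix can be chosen automatically. This is a small sketch under the assumption that the output of -version contains text of the form "ImageMagick 7.x" or "ImageMagick 6.x"; the helper names are hypothetical.

```python
import re


def parse_major_version(version_text):
    """Extract the major version from `-version` output, or None if not found."""
    match = re.search(r"ImageMagick (\d+)\.", version_text)
    return int(match.group(1)) if match else None


def command_prefix(major_version):
    """ImageMagick 7 commands start with `magick convert`; version 6 with `convert`."""
    return ["magick", "convert"] if major_version >= 7 else ["convert"]


def full_command(major_version, input_path, operator_args, output_path):
    """Assemble prefix, input image, operators and output image into one list."""
    return (command_prefix(major_version) + [str(input_path)]
            + list(operator_args) + [str(output_path)])


# full_command(6, "bg.jpg", ["-blur", "0x1"], "out.jpg")
# -> ["convert", "bg.jpg", "-blur", "0x1", "out.jpg"]
```

The resulting list can be passed directly to subprocess.run.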