An Experiment on Building a Classifier by a Non-Technical Person

The hype around AI is beneficial: it attracts the best talent in the industry and motivates graduates and young, ambitious professionals to choose a career in AI, which will eventually produce the field’s next achievements and the next hype cycle.

Sometimes, the hype also draws more unusual personalities into the industry. For example, my background is in ballet: I danced professionally for more than 10 years, until an injury forced me to quit and I decided to pursue a career in writing.

Thanks to the AI hype, I’ve discovered a field I feel passionate about. From the beginning, I’ve been convinced that in order to deliver high-quality content on AI, I’d have to learn the subject deeply and get hands-on experience; otherwise, it would be impossible to offer new perspectives on AI challenges and come up with fundamentally new concepts.

I knew I had to gain a deep understanding of all the nuances and changes happening in the AI field. I couldn’t go into it blindly and just repeat the same message everywhere that AI will take over our jobs or kill us or whatever the popular sentiment is about the risks posed by AI.

In conversations with many journalists who cover AI, I realized that it’s really challenging to get into the subject, as most educational courses are too technical, with lots of math and coding. Unfortunately, there is no course that offers broad insight into the nuances of the subject without math and code. This is a gap in the industry that needs to be addressed.

Despite this, as an AI writer, I decided to embrace the math and coding through online ML and DL courses, which greatly benefited me, and I’m considering continuing my education in the subject.

As Pieter Abbeel said in his interview with Andrew Ng, it’s great to go through the tons of online courses available now, but the most important thing is practical experience. That applies not only to data scientists but also to those who bring AI to a mass audience, including writers. Moreover, AI infrastructure is becoming accessible and intuitive even for people without a technical background.

Today, I’d like to share with all of you my hands-on experience of building a classifier. Of course, I built the experiment around ballet: a classifier that sorts images into three categories: classic ballerina, modern dancer, and woman in an evening dress.

Classic ballerina

Modern dancer

Woman in evening dress

Initial setup

The experiment was done with Keras on Ubuntu 16.04. I signed up for the Paperspace cloud service to run all the necessary computations. Paperspace offers cloud machines that can be accessed directly from a browser. I used an 8GB GPU machine and booted it from the predefined ML-in-a-Box template, which already had everything I needed installed, such as Nvidia drivers and TensorFlow.

First, let’s import all the packages we will need.
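The imports themselves aren’t reproduced in this text, so here is a minimal sketch of what they might look like, assuming the standalone Keras package (with a TensorFlow backend) that was current at the time; everything in it is used in the steps that follow.

```python
import os
import random
import shutil

from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG16
from keras.optimizers import SGD
```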

Step 1. Download images from Google

I created a dataset, aiming to train my neural network to discriminate between three classes: “Classic Ballerina”, “Modern Dancer”, “Woman in Evening Dress”. I collected the data for my purposes by scraping a Google Images feed for particular manually chosen search queries.

It’s important to emphasize the following:

Use precise search queries to produce clean, relevant output for the desired class.

Get at least a few hundred clean images for each class.

Get rid of all irrelevant data (I did this manually).

Get rid of obvious flaws that would cause the classifier to overfit (for example, if one class’s backgrounds are always dark while another’s are not).

In the end, I had several folders, each containing several hundred examples of one class.

I used the following tool (https://github.com/rushilsrivastava/image-scrappers), which allowed me to collect the images from Google automatically rather than downloading them manually.

Step 2. Clean the data

Once I collected the necessary data in folders, I prepared it for training. This preparation starts with cleaning the data:

Remove all non-.jpg images.

Make a dictionary that maps class names to indexes.
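A minimal sketch of that cleanup, assuming the scraped images live in one folder per class under a hypothetical data/ directory:

```python
import os

DATA_DIR = 'data'  # hypothetical layout: data/<class_name>/<image>.jpg
classes = sorted(os.listdir(DATA_DIR))

# Remove everything that is not a .jpg image
for cls in classes:
    cls_dir = os.path.join(DATA_DIR, cls)
    for fname in os.listdir(cls_dir):
        if not fname.lower().endswith('.jpg'):
            os.remove(os.path.join(cls_dir, fname))

# Map class names to integer indexes, e.g. {'classic_ballerina': 0, ...}
class_to_index = {cls: i for i, cls in enumerate(classes)}
```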

Step 3. Split the data into a training set and test set randomly

I randomly split the data into training and test sets. The fraction of training data may vary, but I set it to 90%.
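Here is one way such a split could be scripted, continuing from the hypothetical data/ layout above; the train/ and test/ folder names are also my assumptions:

```python
import os
import random
import shutil

TRAIN_FRACTION = 0.9  # 90% of the images go into the training set

for cls in sorted(os.listdir('data')):
    fnames = os.listdir(os.path.join('data', cls))
    random.shuffle(fnames)  # randomize before splitting
    n_train = int(len(fnames) * TRAIN_FRACTION)
    for subset, subset_fnames in (('train', fnames[:n_train]),
                                  ('test', fnames[n_train:])):
        os.makedirs(os.path.join(subset, cls), exist_ok=True)
        for fname in subset_fnames:
            shutil.copy(os.path.join('data', cls, fname),
                        os.path.join(subset, cls, fname))
```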

Step 4. Perform data augmentation

As we don’t have a lot of training data (our dataset is very moderate in size) and our models have a lot of parameters, it is very easy to overfit (to perform well on the training set but poorly on new data that the model hasn’t seen during training).

One of the tools for avoiding overfitting and helping the model generalize well is data augmentation, which means creating a large amount of synthetic (artificially created) data.

Synthetic data is generated by taking the original data and modifying it in various ways (mirror reflection, rotating, zooming in or out, etc.). To the classifier, all these modifications of a single image appear as different new images, making the dataset larger.
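In Keras, this kind of augmentation is typically done with ImageDataGenerator, which applies random transformations on the fly during training. The transformation ranges and the 150x150 image size below are my assumptions, not values from the original experiment:

```python
from keras.preprocessing.image import ImageDataGenerator

# Random flips, rotations, shifts, and zooms turn each image into many
train_datagen = ImageDataGenerator(
    rescale=1. / 255,       # scale pixel values to [0, 1]
    rotation_range=20,      # random rotations of up to 20 degrees
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,         # random zooming in or out
    horizontal_flip=True)   # mirror reflection

# The test set is only rescaled -- no synthetic distortions
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    'train', target_size=(150, 150), batch_size=32,
    class_mode='categorical')
test_generator = test_datagen.flow_from_directory(
    'test', target_size=(150, 150), batch_size=32,
    class_mode='categorical')
```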

You can see examples of the synthesized images below.

Step 5. Train the classifier

After we are done with data preparation, we can start training. I decided to use Keras as my deep learning framework for now, because it is much more user-friendly for beginners than TensorFlow. It also lets you train everything from simple to very sophisticated models in a few lines of code.

As a baseline, I have trained a very small convolutional network to see what accuracy I can get with such a basic solution.

Here is code for training a convnet from scratch.
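The code itself isn’t reproduced in this text; a small convnet of the kind described might look like the sketch below, where the layer sizes and the number of epochs are my assumptions:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Three conv/pool blocks followed by a small dense classifier
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),                       # regularization against overfitting
    Dense(3, activation='softmax')])    # our three classes

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train on the augmented generator, validating on the untouched test set
model.fit_generator(train_generator,
                    epochs=30,
                    validation_data=test_generator)
```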

I received around 80% accuracy with this model. Below you can find charts showing the model’s accuracy, loss, and confusion matrix.

The performance is acceptable, but not great. It was difficult for the model to distinguish between ballerina and modern dancer.

So I decided to stop pretending to be a data scientist with 20+ years of experience. I went back to being a layman and turned to a neural network created by experienced data scientists.

Specifically, I tried transfer learning. I started with the VGG16 architecture pre-trained on the ImageNet dataset. I froze the original VGG16 weights so they would stay unchanged during training, stacked a couple of custom dense layers and dropout on top of the frozen base, and then trained only the weights of those top layers, teaching the combined network to classify our three classes (rather than the original 1000 classes VGG16 was trained on).
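A sketch of that setup in Keras, with the size of the dense head being my assumption:

```python
from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout

# Load VGG16 pre-trained on ImageNet, without its original 1000-class top
base = VGG16(weights='imagenet', include_top=False,
             input_shape=(150, 150, 3))

# Freeze the VGG16 weights so they stay unchanged during training
for layer in base.layers:
    layer.trainable = False

# Stack custom dense layers and dropout on top of the frozen base
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(3, activation='softmax')(x)  # our three classes

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```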

You can see from the charts below that the model achieved 90% accuracy, which is much better. In fact, it is not overfitting but underfitting: the augmented images from the training set turned out to be more difficult for the model than the ones from the test set.

Even though the results improved, the net was still struggling to discriminate perfectly. That made me think I should fine-tune the weights to let the network discriminate between the classes more efficiently.

Fine-tune dense and convolutional layers

I tried to fine-tune the last convolutional block of the VGG16 network.

This is how I conducted the fine-tuning: I took the pre-trained VGG16 network, which does well on the ImageNet classification task, used it to initialize the weights in my model, and then trained the model on my ballet dataset while keeping the learning rate really small. This is important so as not to break the good structure of the pre-trained VGG16.

The goal here is to change the VGG16 weights only slightly and adapt the net to my specific dance classification task. If the initial gradients flowing through the network are large, the first few training iterations will destroy the filters of the pre-trained network.
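A sketch of this step, reusing the transfer-learning model above; the text only says the learning rate was kept really small, so the choice of SGD and the exact rate here are my assumptions:

```python
from keras.optimizers import SGD

# Unfreeze only the last convolutional block of VGG16 (its 'block5' layers);
# everything below stays frozen, and the custom top layers remain trainable
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')

# Recompile with a very small learning rate so the first gradient updates
# nudge the pre-trained filters instead of destroying them
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit_generator(train_generator,
                    epochs=20,
                    validation_data=test_generator)
```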

Thanks to fine-tuning, I reached 97% accuracy. Below you can see charts showing the accuracy, loss, and confusion matrix.

You can see that the model finally learned the data distribution well.

The aim of this post is to encourage anyone to train their own neural image classifier, even without a large dataset available.

It was a thrilling journey, and there’s more to come!