Mobile intelligence — traffic signs classification with retrained MobileNet model

TensorFlow Lite classification model for GTSRB dataset

This post was originally published at thinkmobile.dev, a blog about implementing intelligent solutions in mobile apps.

This post is a part of a series about building Machine Learning solutions in mobile apps. In the previous article, we started by building a simple MNIST classification model on top of TensorFlow Lite. That post is also a good place to start if you are looking for hints on how to set up your very first environment (local with Docker or remote with Colaboratory).

Let’s continue with the basics. If you have spent some time exploring the Internet for Machine Learning solutions for mobile, you have surely found the “TensorFlow for Poets” code labs. If not, they are where you should start your journey of building more complex vision intelligence for apps.

Those code labs are focused on building a very first working solution that can be launched directly on your mobile device. Here, we’ll build something very similar, with some additional explanation that can help you understand TensorFlow Lite a little better.

MobileNet

So what are the code labs and this article about? They all show how to build a convolutional neural network that is optimized for mobile devices, with little effort required to define the structure of the Machine Learning model. Instead of building it from scratch, we’ll use a technique called Transfer Learning and retrain MobileNet for our needs.

MobileNet itself is a lightweight neural network designed for vision applications on mobile devices. For more technical details and a great visual explanation, please take a look at Matthijs Hollemans’s blog post: Google’s MobileNets on the iPhone (it says “iPhone” 😱, but the first part of the post is fully dedicated to the MobileNet architecture). And if you want even more technical details, the paper titled MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications will be your friend.

Environment

As described in the first article, we’ll use the official TensorFlow Docker images for our environment. All you have to do again is:

1. Clone the repository: https://github.com/frogermcs/GTSRB-TensorFlow-Lite
2. Run Docker from the command line:

$ docker run -p 8888:8888 -v $(PWD):/notebooks -it tensorflow/tensorflow:latest-py3

3. Launch http://127.0.0.1:8888/?token={token} in the browser.

Alternatively, you can run it via Colaboratory (click “Open in Colab” on the GitHub repository page). While this can be a great idea because of the available TPU environment (a few times faster than a MacBook Pro 2018), the notebook and its output are so heavy that Chrome sometimes can’t handle them…

Model preparation

When Docker is running, navigate to notebooks/GTSRB_TensorFlow_MobileNet.ipynb . A step-by-step description will walk you through the retraining process, very similar to the one from the TensorFlow for Poets code lab. In the end, there will be a TensorFlow Lite model that is able to classify traffic signs based on the training images from the GTSRB dataset. The model isn’t production-ready (accuracy is around 89.5%), but it is certainly a good enough starting point for further development.

TensorBoard

As presented in TensorFlow for Poets, you can very easily visualize training progress by running TensorBoard.

Example training visualization for 6 different models (different params used in retrain.py)

TensorBoard is a very powerful tool, especially if you want to understand your model and its training process better. It can visualize training steps with all the additional params of your model, show its graph, or present some results on testing data. For more details, check the 4th step of the TensorFlow for Poets code lab or the official TensorBoard website.

Android app

Our TensorFlow Lite model should be ready to use. If you ran the Jupyter Notebook, it will be in the output/ directory as retrained_graph_mv1_100_224.lite ; if not, the assets directory of the Android project contains a gtsrb_model.lite file generated at the time this post was written.

The code for our app is almost the same as in the previous article. The only significant difference is in how we preprocess data and pass it into our model for classification.

First, this is our model configuration (can be found in GTSRB_TensorFlow_MobileNet.ipynb notebook):

Model configuration is stored in GtsrbModelConfig.java

We picked a model with an input image size of 224x224 px. This value is specific to the MobileNet variant we selected. There are also other variants (192, 160, 128): lighter versions, but probably also less accurate.

IMAGE_MEAN and IMAGE_STD are also specific to the MobileNet family. They are used for data normalization (we’ll see this later).

The model input size should be self-explanatory: width x height x pixel size (each pixel is split into 3 channels — RGB — each represented by 1 byte) x the size of the float type (4 bytes).
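As a rough illustration, the configuration described above could look like this in GtsrbModelConfig.java. Note that the constant names and the exact IMAGE_MEAN/IMAGE_STD values are assumptions based on this description (a mean of 0 and std of 255 give the <0.0, 1.0> range mentioned later), not a copy of the repository’s code:

```java
// Hypothetical sketch of the model configuration described above.
public final class GtsrbModelConfig {

    // The MobileNet variant we picked expects 224x224 px input images.
    public static final int INPUT_IMG_SIZE = 224;

    // Each pixel is split into 3 channels (RGB); each float takes 4 bytes.
    public static final int CHANNELS = 3;
    public static final int FLOAT_BYTES = 4;

    // width x height x channels x sizeof(float) = 602,112 bytes.
    public static final int MODEL_INPUT_SIZE =
            INPUT_IMG_SIZE * INPUT_IMG_SIZE * CHANNELS * FLOAT_BYTES;

    // Normalization constants: map <0, 255> into <0.0, 1.0>.
    public static final float IMAGE_MEAN = 0.0f;
    public static final float IMAGE_STD = 255.0f;

    private GtsrbModelConfig() {}
}
```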

In contrast to the MNIST project, here we don’t do any preprocessing of input images besides scaling them to the proper size:
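On Android the scaling itself is a one-liner (Bitmap.createScaledBitmap(bitmap, 224, 224, true)). Just to illustrate what such a resize does under the hood, here is a nearest-neighbor scaler over a flat ARGB pixel array — a plain-Java sketch, not the app’s actual code:

```java
// Nearest-neighbor resize of an image stored as a flat ARGB int array.
// Purely illustrative: the real app delegates to Bitmap.createScaledBitmap().
public final class ImageScaler {

    public static int[] scale(int[] pixels, int srcW, int srcH, int dstW, int dstH) {
        int[] out = new int[dstW * dstH];
        for (int y = 0; y < dstH; y++) {
            int srcY = y * srcH / dstH;          // nearest source row
            for (int x = 0; x < dstW; x++) {
                int srcX = x * srcW / dstW;      // nearest source column
                out[y * dstW + x] = pixels[srcY * srcW + srcX];
            }
        }
        return out;
    }
}
```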

After scaling, the bitmap is converted into input data for the classification process:
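The conversion can be sketched like this in plain Java. It is a sketch under stated assumptions: IMAGE_MEAN = 0 and IMAGE_STD = 255 (consistent with the <0.0, 1.0> range described below), and the pixels arrive as a flat int array — in the app they would come from Bitmap.getPixels():

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Converts ARGB pixels into the float buffer the MobileNet model expects.
// Normalization constants are assumptions matching the text's <0,1> range.
public final class InputConverter {
    static final float IMAGE_MEAN = 0.0f;
    static final float IMAGE_STD = 255.0f;

    public static ByteBuffer convertToModelInput(int[] pixels) {
        // 3 channels per pixel, 4 bytes per float.
        ByteBuffer buffer = ByteBuffer.allocateDirect(pixels.length * 3 * 4)
                .order(ByteOrder.nativeOrder());
        for (int val : pixels) {
            // Split ARGB into R, G, B (the alpha channel is skipped)
            // and normalize each channel into <0.0, 1.0>.
            buffer.putFloat((((val >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD); // R
            buffer.putFloat((((val >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);  // G
            buffer.putFloat(((val & 0xFF) - IMAGE_MEAN) / IMAGE_STD);         // B
        }
        buffer.rewind();
        return buffer;
    }
}
```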

What happens in this code?

First, the bitmap is saved as an array of integers, where each element represents one pixel as a Color value (ARGB format).

Example: RED color is 0xFFFF0000 (see Color.RED documentation).

Because the MobileNet input takes each color channel separately, we need to split every pixel into 3 values (the alpha channel is skipped). How do we do this?

From every pixel, we extract 3 numbers representing each channel:

RED: (val >> 16) & 0xFF

GREEN: (val >> 8) & 0xFF

BLUE: val & 0xFF

If we take Color.RED (0xFFFF0000) as an example, it’ll be:

RED channel

0xFFFF0000 >> 16 = 0xFFFFFFFF

0xFFFFFFFF & 0xFF = 0xFF = 255

GREEN channel

0xFFFF0000 >> 8 = 0xFFFFFF00

0xFFFFFF00 & 0xFF = 0x00 = 0

BLUE channel

0xFFFF0000 & 0xFF = 0x00 = 0
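The arithmetic above can be checked directly in Java. Note that >> is an arithmetic shift, which is why the sign bit fills in the leading 0xFF… in the intermediate results; the & 0xFF mask makes that irrelevant:

```java
public class ChannelExtractionDemo {
    public static void main(String[] args) {
        int red = 0xFFFF0000; // Color.RED in ARGB

        int r = (red >> 16) & 0xFF; // 0xFFFFFFFF & 0xFF = 255
        int g = (red >> 8) & 0xFF;  // 0xFFFFFF00 & 0xFF = 0
        int b = red & 0xFF;         // 0xFFFF0000 & 0xFF = 0

        System.out.println(r + " " + g + " " + b); // prints "255 0 0"
    }
}
```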

Back to IMAGE_MEAN and IMAGE_STD mentioned before: they are the numbers used to normalize input values from the range <0, 255> to the range <0.0, 1.0>.

So in the end, our example Color.RED will be a 3-element array:

[1.0, 0.0, 0.0] (instead of [255, 0, 0]).

More examples