Building the dataset

To build our own classifier, we need a dataset that contains images both with and without Pikachu.

Let’s start with 1,000 images in each category. You can pull such images here:

Next up, let’s create two folders named pikachu and no-pikachu and drop those images accordingly.

Another handy dataset containing images for all the generation one Pokémon can be found here:

Now we have an image folder, which is structured as follows:

/dataset/
    /pikachu/[image1,..]
    /no-pikachu/[image1,..]
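Before retraining, it is worth sanity-checking this layout. The short Python sketch below simply counts the images in each class folder; the dataset path and file extensions are assumptions, so adjust them to match your setup.

import os

# Assumed location of the dataset described above; change as needed.
DATASET_DIR = "dataset"
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

for label in sorted(os.listdir(DATASET_DIR)):
    class_dir = os.path.join(DATASET_DIR, label)
    if not os.path.isdir(class_dir):
        continue
    images = [f for f in os.listdir(class_dir)
              if f.lower().endswith(IMAGE_EXTENSIONS)]
    print(f"{label}: {len(images)} images")

If the two class counts look right (roughly 1,000 each), the folder names will serve as the labels during retraining.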

Retraining Images

The folder names now act as the labels for our images, and TensorFlow's retraining script makes the rest of the job easier. Assuming that TensorFlow is already installed on the training machine, download the retraining script:

curl -LO https://raw.githubusercontent.com/tensorflow/hub/master/examples/image_retraining/retrain.py

Next up, we’ll retrain the model with this Python script:

python retrain.py \
  --image_dir ~/MLmobileapps/Chapter5/dataset/ \
  --learning_rate=0.0001 \
  --testing_percentage=20 \
  --validation_percentage=20 \
  --train_batch_size=32 \
  --validation_batch_size=-1 \
  --eval_step_interval=100 \
  --how_many_training_steps=1000 \
  --flip_left_right=True \
  --random_scale=30 \
  --random_brightness=30 \
  --architecture mobilenet_1.0_224 \
  --output_graph=output_graph.pb \
  --output_labels=output_labels.txt

Note: If you set validation_batch_size to -1, the script validates on the whole validation set. A learning_rate of 0.0001 works well here; you can adjust it and experiment for yourself.

With the architecture flag, we choose which version of MobileNet to use: the width multiplier can be 1.0, 0.75, 0.50, or 0.25, and the suffix 224 is the input image resolution. You can specify 224, 192, 160, or 128 as well.
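Once retraining finishes, it helps to confirm the names of the graph's input and output tensors, since the converter in the next section needs them. The following sketch assumes TensorFlow 1.x (where tf.GraphDef and tf.gfile are available); it loads output_graph.pb and prints the first and last operation names, which vary with the chosen architecture.

import tensorflow as tf  # assumes TensorFlow 1.x

GRAPH_PATH = "output_graph.pb"  # produced by retrain.py above

graph_def = tf.GraphDef()
with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

# The first operations usually include the input tensor, and the last
# operation is typically the output (for example, final_result).
ops = [op.name for op in graph.get_operations()]
print("first ops:", ops[:5])
print("last ops:", ops[-5:])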

Model conversion from GraphDef to TFLite

TOCO (TensorFlow Lite Optimizing Converter) is used to convert a TensorFlow GraphDef file or SavedModel into either a TFLite FlatBuffer or a graph visualization.

We pass the model details through command-line arguments. The main arguments that can be passed in while converting the model are:

--output_file OUTPUT_FILE
    Filepath of the output tflite model.
--graph_def_file GRAPH_DEF_FILE
    Filepath of the input TensorFlow GraphDef.
--saved_model_dir
    Filepath of the directory containing the SavedModel.
--keras_model_file
    Filepath of the HDF5 file containing the tf.Keras model.
--output_format {TFLITE,GRAPHVIZ_DOT}
    Output file format.
--inference_type {FLOAT,QUANTIZED_UINT8}
    Target data type in the output file.
--inference_input_type {FLOAT,QUANTIZED_UINT8}
    Target data type of real-number input arrays.
--input_arrays INPUT_ARRAYS
    Names of the input arrays, comma-separated.
--input_shapes INPUT_SHAPES
    Shapes corresponding to --input_arrays, colon-separated.
--output_arrays OUTPUT_ARRAYS
    Names of the output arrays, comma-separated.

We can now use the TOCO tool to convert the TensorFlow model into a TensorFlow Lite model:

toco \
  --graph_def_file=/tmp/output_graph.pb \
  --output_file=/tmp/retrained_model.tflite \
  --input_arrays=Mul \
  --output_arrays=final_result \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_shape=1,224,224,3 \
  --inference_type=FLOAT \
  --input_data_type=FLOAT
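If you prefer Python over the command-line tool, the same conversion can be sketched with the TensorFlow 1.x TFLiteConverter API (tf.lite.TFLiteConverter.from_frozen_graph). The tensor names and input shape below mirror the toco command above and are assumptions for a MobileNet-style retrained graph; adjust them to whatever names your graph actually uses.

import tensorflow as tf  # assumes TensorFlow 1.x

# Mirrors the toco command above; tensor names are assumptions for this graph.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/output_graph.pb",
    input_arrays=["Mul"],
    output_arrays=["final_result"],
    input_shapes={"Mul": [1, 224, 224, 3]})

tflite_model = converter.convert()
with open("/tmp/retrained_model.tflite", "wb") as f:
    f.write(tflite_model)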

We can reuse MobileNet in other applications in the same way; in the next sections, we’ll look at a gender model and an emotion model.

Gender Model

This model is trained on the IMDB-WIKI dataset, which contains 500K+ celebrity face images, and uses the MobileNet_V1_224_0.5 version of MobileNet.

It is rare to find public face datasets of this scale. This dataset was built from a large collection of celebrity faces drawn from two common sources: IMDb and Wikipedia. Details of more than 100K celebrities were retrieved from their profiles on both sites through scripts.

The collection was then cleaned by removing noise (irrelevant content). All images without a timestamp were removed, on the assumption that a timestamped image showing a single person is likely to match that person's birth date details correctly. In the end, there were 460,723 faces of 20,284 celebrities from IMDb and 62,328 from Wikipedia, for a total of 523,051.

Emotion model

This model is built on the AffectNet dataset of more than 1 million facial images, and uses the MobileNet_V2_224_1.4 version of MobileNet.

The link to the dataset project can be found here:

The AffectNet dataset was built by collecting and annotating more than 1 million facial images from the Internet. The images were sourced from three search engines, using around 1,250 emotion-related keywords in six different languages.

About half of the collected images were manually annotated for the presence of seven discrete facial expressions (the categorical model) and for the intensity of valence and arousal (the dimensional model).
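To sanity-check any of these converted models (the Pikachu classifier, the gender model, or the emotion model) before embedding it in an app, you can run it with the TensorFlow Lite Python interpreter. The file names, input size, and preprocessing below are placeholders rather than values from the text; adjust them to the model you are testing.

import numpy as np
import tensorflow as tf
from PIL import Image

MODEL_PATH = "retrained_model.tflite"   # placeholder path to a converted model
IMAGE_PATH = "test.jpg"                 # placeholder test image
INPUT_SIZE = 224                        # matches the 224 variants used above

interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Load the image and scale it to the float input range expected by the model.
image = Image.open(IMAGE_PATH).resize((INPUT_SIZE, INPUT_SIZE))
input_data = np.expand_dims(np.asarray(image, dtype=np.float32) / 255.0, 0)

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])[0]
print("predicted class index:", int(np.argmax(scores)))

The predicted index maps back to a class name through the labels file produced during training (for example, output_labels.txt for the Pikachu classifier).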