In the last few years, TensorFlow has become the industry standard for tasks related to neural networks and deep learning. But are there any credible alternatives, or has Google attained a full monopoly with nothing new left to invent? Let’s find out!

Our team did a short investigation and chose Apache MXNet as the strongest competitor to compare with TensorFlow.

The main problem with TensorFlow is speed, so let’s see what MXNet offers on that front:

Device Placement: With MXNet, it’s easy to specify where each data structure should reside

Multi-GPU training: MXNet makes it easy to scale computation with the number of available GPUs

Automatic differentiation: MXNet automates the derivative calculations that once bogged down neural network research

Optimized Predefined Layers: While you can code up your own layers in MXNet, the predefined layers are optimized for speed, outperforming competing libraries.

We were especially interested in the last point, which affects user experience in real-world applications. Switching frameworks in production is a very risky idea, so we decided to test this claim in battle conditions by building two models, one with TensorFlow/Keras and one with MXNet.

We used pizza type recognition as the domain: it is complex enough to call for some advanced techniques and common enough to build a dataset in a short time. At least, so we thought. The domain turned out to be challenging, as even human beings can confuse different toppings.

This is what we had at the start:

A dataset of 5k images

Good hands-on experience with TensorFlow

Desire to get into the MXNet that everyone is talking about

2 GPUs

Let’s go!

Algorithms and approaches:

InceptionV3 & ResNet – we tested both architectures for the best results and topped out at 56% accuracy for ResNet and 94% (MXNet) / 82% (TensorFlow) for InceptionV3. As a result, we decided to take Inception to production.

MXNet:

    import logging

    import mxnet as mx

    head = '%(asctime)-15s %(message)s'
    logging.basicConfig(level=logging.DEBUG, format=head)

    num_classes = 10
    batch_per_gpu = 16
    num_gpus = 2
    batch_size = batch_per_gpu * num_gpus


    def get_iterators(batch_size, data_shape=(3, 299, 299)):
        # Training and validation data are read from RecordIO files.
        train = mx.io.ImageRecordIter(
            path_imgrec='./data-train.rec',
            data_name='data',
            label_name='softmax_label',
            batch_size=batch_size,
            data_shape=data_shape,
            shuffle=True,
            rand_crop=True,
            rand_mirror=True)
        val = mx.io.ImageRecordIter(
            path_imgrec='./data-val.rec',
            data_name='data',
            label_name='softmax_label',
            batch_size=batch_size,
            data_shape=data_shape,
            rand_crop=True,
            rand_mirror=True)
        return train, val


    def do_finetune(symbol, arg_params):
        # Cut the pretrained network at the flatten layer and attach a new head.
        all_layers = symbol.get_internals()
        net = all_layers["flatten_output"]
        net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
        net = mx.symbol.Dropout(data=net, p=0.7, name='dp', mode='always')
        net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
        net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
        # Drop pretrained weights that belong to the replaced output layer.
        new_args = {k: arg_params[k] for k in arg_params if 'fc1' not in k}
        return net, new_args


    def fit(symbol, arg_params, aux_params, train, val):
        devs = [mx.gpu(i) for i in range(num_gpus)]
        mod = mx.mod.Module(symbol=symbol, context=devs)
        metrics = mx.metric.create(['ce', 'acc'])
        mod.fit(train, val,
                num_epoch=100,
                arg_params=arg_params,
                aux_params=aux_params,
                allow_missing=True,
                epoch_end_callback=mx.callback.do_checkpoint("Inception", 1),
                kvstore='device',
                optimizer='sgd',
                optimizer_params={'learning_rate': 0.01, 'wd': 0.0005, 'momentum': 0.9},
                initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2),
                eval_metric=metrics,
                validation_metric=metrics)


    sym, arg_params, aux_params = mx.model.load_checkpoint('Inception-BN', 0)
    (train, val) = get_iterators(batch_size)
    (new_sym, new_args) = do_finetune(sym, arg_params)
    fit(new_sym, new_args, aux_params, train, val)

TensorFlow:

    import argparse
    import os
    import os.path

    from PIL import ImageFile
    from keras import optimizers
    from keras.applications.inception_v3 import InceptionV3
    from keras.callbacks import ModelCheckpoint, EarlyStopping, LearningRateScheduler
    from keras.layers import Dense, Dropout, Flatten, AveragePooling2D
    from keras.models import Model
    from keras.preprocessing.image import ImageDataGenerator
    from keras.regularizers import l2
    from keras_sequential_ascii import keras2ascii

    ImageFile.LOAD_TRUNCATED_IMAGES = True

    num_classes = 10
    batch_size = 16
    epochs = 100


    def get_num_of_files(root_dir):
        # Count every file under root_dir, including subdirectories.
        total = 0
        for root, dirs, files in os.walk(root_dir):
            total += len(files)
        return total


    def retrain(train_data_dir, validation_data_dir):
        img_width, img_height = 299, 299
        nb_train_samples = get_num_of_files(train_data_dir)
        nb_validation_samples = get_num_of_files(validation_data_dir)

        # Pretrained InceptionV3 without its classifier, plus a new head.
        base_model = InceptionV3(weights='imagenet', include_top=False,
                                 input_shape=(img_width, img_height, 3))
        x = base_model.output
        x = Dense(128, activation='relu', init='glorot_uniform',
                  W_regularizer=l2(.0005))(x)
        x = Dropout(0.7)(x)
        x = AveragePooling2D(pool_size=(8, 8))(x)
        x = Dropout(.7)(x)
        x = Flatten()(x)
        predictions = Dense(num_classes, init='glorot_uniform',
                            W_regularizer=l2(.0005), activation='softmax')(x)
        model = Model(inputs=base_model.input, outputs=predictions)
        print(keras2ascii(model))

        opt = optimizers.SGD(lr=.01, momentum=.9)

        def schedule(epoch):
            # Step-wise learning rate decay.
            if epoch < 20:
                return 0.01
            if epoch < 35:
                return 0.001
            elif epoch < 1000:
                return .0005

        lr_scheduler = LearningRateScheduler(schedule)
        model.compile(loss="categorical_crossentropy", optimizer=opt,
                      metrics=["accuracy"])

        train_datagen = ImageDataGenerator(
            rescale=1. / 255,
            shear_range=0.3,
            horizontal_flip=True,
            fill_mode="nearest",
            zoom_range=0.3,
            width_shift_range=0.3,
            height_shift_range=0.3,
            rotation_range=70)
        train_generator = train_datagen.flow_from_directory(
            train_data_dir,
            target_size=(img_height, img_width),
            batch_size=batch_size,
            class_mode="categorical")

        test_datagen = ImageDataGenerator(rescale=1. / 255)
        validation_generator = test_datagen.flow_from_directory(
            validation_data_dir,
            target_size=(img_height, img_width),
            batch_size=256,
            class_mode="categorical")

        checkpoint = ModelCheckpoint(
            filepath='model.{epoch:02d}-{val_loss:.2f}-{val_acc:.2f}.hdf5',
            monitor='val_acc', verbose=2, save_best_only=True,
            save_weights_only=False, mode='auto', period=1)
        early = EarlyStopping(monitor='val_acc', min_delta=0, patience=50,
                              verbose=1, mode='auto')

        model.fit_generator(
            train_generator,
            steps_per_epoch=nb_train_samples / batch_size,
            validation_steps=nb_validation_samples / batch_size,
            epochs=epochs,
            use_multiprocessing=True,
            validation_data=validation_generator,
            callbacks=[lr_scheduler, early, checkpoint])


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument("train")
        parser.add_argument("test")
        args = parser.parse_args()
        retrain(args.train, args.test)
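The step-wise learning rate schedule and the steps-per-epoch arithmetic from the Keras script can be tried in isolation. The following is a minimal sketch in plain Python, not part of our training scripts. `steps_per_epoch` is a hypothetical helper, rounding up with `math.ceil` is an assumption (the script passes the raw quotient), and the sketch returns 0.0005 for every epoch from 35 onwards.

```python
import math


def schedule(epoch):
    """Step learning-rate schedule with the same thresholds as the Keras script."""
    if epoch < 20:
        return 0.01
    if epoch < 35:
        return 0.001
    return 0.0005


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (hypothetical helper)."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    for epoch in (0, 19, 20, 34, 35, 99):
        print(epoch, schedule(epoch))
    # 4000 is an illustrative sample count, not the size of our training split.
    print(steps_per_epoch(4000, 16))
```

Rounding up means the last, partially filled batch is still counted as a step.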

Transfer learning (layer freezing) –

Most existing CV deep learning algorithms require huge datasets for training. The most popular way around this problem is to first pre-train a deep net on a large-scale dataset and then, given a new dataset, reuse the pretrained weights when training on the new task. There are lots of transfer learning variations. With layer freezing, the initial neural network is used only as a feature extractor: we freeze every layer prior to the output layer and simply learn a new output layer.

MXNet:

    def do_finetune(symbol, arg_params):
        all_layers = symbol.get_internals()
        net = all_layers["flatten_output"]
        net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
        net = mx.symbol.Dropout(data=net, p=0.7, name='dp', mode='always')
        net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
        net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
        new_args = dict({k: arg_params[k] for k in arg_params if 'fc1' not in k})
        return net, new_args

TensorFlow:

    ...
    base_model = InceptionV3(weights='imagenet', include_top=False,
                             input_shape=(img_width, img_height, 3))
    x = base_model.output
    x = Dense(128, activation='relu', init='glorot_uniform',
              W_regularizer=l2(.0005))(x)
    x = Dropout(0.7)(x)
    x = AveragePooling2D(pool_size=(8, 8))(x)
    x = Dropout(.7)(x)
    x = Flatten()(x)
    predictions = Dense(num_classes, init='glorot_uniform',
                        W_regularizer=l2(.0005), activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    ...
    model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples / batch_size,
        validation_steps=nb_validation_samples / batch_size,
        epochs=epochs,
        use_multiprocessing=True,
        validation_data=validation_generator,
        callbacks=[lr_scheduler, early, checkpoint])
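To make the idea of freezing concrete, here is a minimal sketch in plain Python with no framework involved. The parameter names and the helper `split_params` are hypothetical and only illustrate the bookkeeping: weights of the replaced output layer are dropped, and everything that is reused is marked as frozen. In MXNet, a list like `frozen` could be passed as `fixed_param_names` to `mx.mod.Module`; in Keras, the counterpart is setting `layer.trainable = False` on the layers to freeze.

```python
def split_params(arg_params, new_layer="fc1"):
    """Split pretrained parameters into reusable ones and names to freeze.

    arg_params maps parameter names to weights; new_layer is the name of the
    freshly added output layer, whose weights are learned from scratch.
    """
    # Drop any pretrained weights that clash with the new output layer.
    reused = {k: v for k, v in arg_params.items() if new_layer not in k}
    # Every reused parameter is kept fixed during training.
    frozen = sorted(reused)
    return reused, frozen


if __name__ == "__main__":
    # Hypothetical parameter names; real checkpoints hold arrays, not floats.
    pretrained = {
        "conv_1_weight": 0.1,
        "conv_1_bias": 0.0,
        "fc1_weight": 0.5,
        "fc1_bias": 0.2,
    }
    reused, frozen = split_params(pretrained)
    print(frozen)
```

Only the parameters of the new output layer stay trainable, which is what turns the pretrained network into a pure feature extractor.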

Hyperparameters tuning –

The same machine learning model might require different weights or constraints for different data patterns. These values are called hyperparameters and usually have to be tuned to solve the machine learning problem well. Hyperparameter optimization searches for a tuple of hyperparameters that yields an optimal model, that is, one that minimizes a predefined loss function on given independent data.

MXNet:

    def fit(symbol, arg_params, aux_params, train, val):
        devs = [mx.gpu(i) for i in range(num_gpus)]
        mod = mx.mod.Module(symbol=symbol, context=devs)
        metrics = mx.metric.create(['ce', 'acc'])
        mod.fit(train, val,
                num_epoch=100,
                arg_params=arg_params,
                aux_params=aux_params,
                allow_missing=True,
                epoch_end_callback=mx.callback.do_checkpoint("Inception", 1),
                kvstore='device',
                optimizer='sgd',
                optimizer_params={'learning_rate': 0.01, 'wd': 0.0005, 'momentum': 0.9},
                initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2),
                eval_metric=metrics,
                validation_metric=metrics)

TensorFlow:

    train_datagen = ImageDataGenerator(
        rescale=1. / 255,
        shear_range=0.3,
        horizontal_flip=True,
        fill_mode="nearest",
        zoom_range=0.3,
        width_shift_range=0.3,
        height_shift_range=0.3,
        rotation_range=70)
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="categorical")

    test_datagen = ImageDataGenerator(rescale=1. / 255)
    validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_height, img_width),
        batch_size=256,
        class_mode="categorical")

    checkpoint = ModelCheckpoint(
        filepath='model.{epoch:02d}-{val_loss:.2f}-{val_acc:.2f}.hdf5',
        monitor='val_acc', verbose=2, save_best_only=True,
        save_weights_only=False, mode='auto', period=1)
    early = EarlyStopping(monitor='val_acc', min_delta=0, patience=50,
                          verbose=1, mode='auto')

    model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples / batch_size,
        validation_steps=nb_validation_samples / batch_size,
        epochs=epochs,
        use_multiprocessing=True,
        validation_data=validation_generator,
        callbacks=[lr_scheduler, early, checkpoint])
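The search for a good tuple of hyperparameters can be sketched as a simple grid search. This is an illustration in plain Python, not the procedure we actually followed: `grid_search` is a hypothetical helper, and `toy_validation_loss` is a stand-in for "train a model with these settings and return its validation loss". The candidate values are made up around the settings used in the MXNet script.

```python
import itertools


def grid_search(evaluate, grid):
    """Return the hyperparameter combination with the lowest loss."""
    names = sorted(grid)
    best_params, best_loss = None, float("inf")
    # Try every combination of candidate values.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = evaluate(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(learning_rate, momentum, wd):
    """Toy loss surface; a real run would train and validate a network here."""
    return (learning_rate - 0.01) ** 2 + (momentum - 0.9) ** 2 + (wd - 0.0005) ** 2


if __name__ == "__main__":
    grid = {
        "learning_rate": [0.1, 0.01, 0.001],
        "momentum": [0.5, 0.9],
        "wd": [0.0005, 0.005],
    }
    print(grid_search(toy_validation_loss, grid))
```

The cost grows with the product of the grid sizes (12 training runs here), which is why each candidate list is usually kept short.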

Results comparison –

Once we had finished training the networks and tuning the hyperparameters, we got the following metrics:

Training & validation metrics:

[Figures: MXNet | TensorFlow]

Then we tested against a test dataset that was part of neither the training nor the validation set. Let’s take a look at how the resulting networks performed:

Test metrics:

[Figures: MXNet | TensorFlow]

Performance comparison –

And here are some performance metrics for the trained networks:

Cold cache:

MXNet – 0.637s, TensorFlow – 34s

We have a definite winner here: MXNet is roughly 50 times faster on a cold run.



Hot cache:

MXNet – 0.405s, TensorFlow – 0.37s

This one is effectively a tie.
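The difference between a cold and a hot run can be illustrated with a small timing harness. This is a sketch in plain Python and not the way the numbers above were measured: `time_calls` is a hypothetical helper, and `LazyModel` is a made-up stand-in whose first call pays a simulated one-off setup cost.

```python
import time


def time_calls(fn, repeats=5):
    """Time the first (cold) call separately from the following (hot) calls."""
    start = time.perf_counter()
    fn()
    cold = time.perf_counter() - start

    hot_runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        hot_runs.append(time.perf_counter() - start)
    # Report the average of the hot runs.
    return cold, sum(hot_runs) / len(hot_runs)


class LazyModel:
    """Hypothetical model that does its one-off setup on first use."""

    def __init__(self):
        self.loaded = False

    def predict(self):
        if not self.loaded:
            time.sleep(0.05)  # simulated setup cost (loading weights, warming caches)
            self.loaded = True
        return "margherita"


if __name__ == "__main__":
    cold, hot = time_calls(LazyModel().predict)
    print("cold: %.3fs, hot: %.6fs" % (cold, hot))
```

Measuring the first call separately matters because averaging it together with later calls would hide exactly the start-up cost that users of a freshly launched app experience.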



Conclusions –

As for the last question: did we take MXNet to production? Well… our management team liked the demo and the mobile app. So yes, we did.

Is there any significant difference between the two frameworks? This is a much harder question.

The MXNet-trained model performs much better on a cold run. This becomes crucial on mobile devices, where applications need to boot up every time.

The MXNet-trained model has slightly better accuracy. This makes a big difference for enterprise and robotic CV systems.

TensorFlow has an established community with lots of ready-made solutions and relevant tutorials, while the MXNet documentation was outdated.

TensorFlow has become synonymous with machine learning, so it is much easier to sell.

In my opinion, we are seeing good competition between a techy, ambitious challenger and an established champion.

I would take the challenger’s side this time.

Once we had the actual working models, we had to solve one more problem: how to present the results to management in the most suitable way. So we asked our React Native developer to spare one day of his time and create a nice cross-platform mobile app for both Android and iOS.

Mobile app

Mobile app scripts on GitHub

Training scripts on GitHub