Author: Wu Jun, Amazon AI Software Engineer

Translated from: https://zh.mxnet.io/blog/mxboard

Preface

Deep neural networks are notoriously difficult to design and train. The process usually involves a great deal of tweaking: modifying the network structure and trying various optimization algorithms and hyper-parameters. From a theoretical perspective, the mathematical foundations of deep neural network architectures remain largely incomplete, and techniques are often based on generalizations of empirical results.

Data visualizations can partially compensate for these deficiencies and paint a higher-level picture to guide researchers while training deep neural networks. For example, if the distribution of the gradients can be drawn in real time during model training, vanishing or exploding gradients can be quickly detected and corrected.

Distribution of gradient updates over time

As another example, visualizing word embeddings makes it clear that words are aggregated into different manifolds in a lower-dimensional space that maintains contextual proximity. Another useful visualization is data clustering: projecting high-dimensional data into a lower-dimensional space using, for example, the t-SNE algorithm. Many data visualizations can be used in the context of deep learning to better understand the training process and the data itself.

The emergence of TensorBoard has brought powerful visualizations to TensorFlow’s users. We have had feedback from many different users, including corporate ones, that they started using TensorFlow because of the rich feature set offered in TensorBoard. Can this powerful tool be made available to other deep learning frameworks? Thanks to the efforts of TeamHG-Memex and their tensorboard_logger, we now have a transparent interface for writing custom data to the event file format that is then consumed by TensorBoard.

It is on this foundation that we have developed MXBoard, a Python package for recording MXNet data frames and displaying them in TensorBoard. To install MXBoard, follow these simple instructions.

Note: MXNet 1.2.0 is required to use all the features of MXBoard. Before the official release of MXNet 1.2.0, please install the MXNet nightly version: pip install --pre mxnet

MXBoard Quick Start Guide

MXBoard supports most of the data types in TensorBoard:

The MXBoard API is designed to follow the tensorboard-pytorch API. All recording APIs are defined in a class called SummaryWriter. This class holds information such as the file path of the record files, the frequency of writing, and the queue size. To record a new data point of a specific data type, be it a scalar or an image, you only need to call the corresponding API on the SummaryWriter object.

For example, suppose we want to plot the histogram of a normal distribution whose standard deviation gradually decreases. First, define a SummaryWriter object as follows:

from mxboard import SummaryWriter

sw = SummaryWriter(logdir='./logs')

Then, in each iteration of the loop, we create an NDArray with values drawn from a normal distribution. We pass the NDArray to the SummaryWriter's add_histogram() function, specifying the number of bins and the loop index i, which serves as the index of our data point. Finally, as with any file handle in Python, it is good practice to close the SummaryWriter using .close().

import mxnet as mx

for i in range(10):
    # create a normal distribution with fixed mean and decreasing std
    data = mx.nd.random.normal(loc=0, scale=10.0/(i+1), shape=(10, 3, 8, 8))
    sw.add_histogram(tag='normal_dist', values=data, bins=200, global_step=i)

sw.close()

To visualize the histogram, open a terminal, change to the working directory, and run the following command to start TensorBoard:

tensorboard --logdir=./logs --host=127.0.0.1 --port=8888

Then enter 127.0.0.1:8888 in the browser's address bar. Click HISTOGRAMS and you will see the following rendering:

Visualizing increasingly narrow normal distributions

Real-world MXBoard

Using what we learnt in the section above, let’s try to accomplish the following two tasks:

Monitoring the training of a supervised learning model

Gaining insight into the inner workings of convolutional neural networks

Training MNIST model

Let’s use the MNIST dataset from the Gluon vision API and use MXBoard to record the following in real time:

The cross-entropy loss

The validation and training accuracy

Gradient data distribution

All of them are good indicators of the progress of the training.

First, we define a SummaryWriter object:

sw = SummaryWriter(logdir='./logs', flush_secs=5)

The argument flush_secs=5 specifies that we want to write the records to the log file every five seconds, so that we can track the progress of the training in the browser in real time.
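To make the effect of a flush interval concrete, here is a minimal sketch of a time-based flush policy. It is not MXBoard's implementation: the BufferedLog class, its method names and the injectable clock are hypothetical, chosen only to illustrate buffering records in memory and writing them out once the interval has elapsed.

```python
import time


class BufferedLog:
    """Toy record buffer that flushes at most once every `flush_secs` seconds.

    Hypothetical illustration only; this is not how MXBoard is implemented.
    """

    def __init__(self, flush_secs=5, clock=time.monotonic):
        self.flush_secs = flush_secs
        self._clock = clock        # injectable so the policy can be tested
        self._pending = []         # records not yet written out
        self.written = []          # stands in for the log file on disk
        self._last_flush = clock()

    def add(self, record):
        """Queue a record, flushing if the interval has elapsed."""
        self._pending.append(record)
        if self._clock() - self._last_flush >= self.flush_secs:
            self.flush()

    def flush(self):
        """Move all pending records to the 'file' and restart the timer."""
        self.written.extend(self._pending)
        self._pending.clear()
        self._last_flush = self._clock()

    def close(self):
        """Flush whatever is left so that no record is lost."""
        self.flush()
```

A shorter interval makes new points visible sooner, at the cost of more frequent writes.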

Then we record the cross-entropy loss at the end of each batch:

sw.add_scalar(
    tag='cross_entropy',
    value=L.mean().asscalar(),
    global_step=global_step
)
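The call above relies on a global_step counter that keeps increasing across epochs, so that every batch gets its own position on the x-axis. A minimal sketch of that bookkeeping follows. The log_losses function, its arguments and the RecordingWriter stand-in are hypothetical; writer is assumed to be any object exposing add_scalar(tag, value, global_step), such as an MXBoard SummaryWriter.

```python
def log_losses(writer, losses_per_epoch, tag='cross_entropy'):
    """Log one scalar per batch with a step index that never resets.

    `losses_per_epoch` is a hypothetical stand-in for the training loop:
    a list of epochs, each a list of per-batch loss values (plain floats).
    Returns the total number of batches logged.
    """
    global_step = 0
    for batch_losses in losses_per_epoch:
        for loss in batch_losses:
            writer.add_scalar(tag=tag, value=loss, global_step=global_step)
            global_step += 1  # keeps counting across epoch boundaries
    return global_step


class RecordingWriter:
    """Tiny stand-in for SummaryWriter that stores calls in a list."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, global_step=None):
        self.scalars.append((tag, value, global_step))


# Two epochs of two batches each, with made-up loss values.
writer = RecordingWriter()
steps = log_losses(writer, [[2.3, 1.9], [1.2, 0.8]])
```

If the counter were reset at the start of each epoch, points from different epochs would be written at the same step and overlap in the chart.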

At the end of each epoch, we record the gradients as HISTOGRAM data and the training and validation accuracy as SCALAR data.

grads = [i.grad() for i in net.collect_params().values()]
assert len(grads) == len(param_names)
# logging the gradients of parameters for checking convergence
for i, name in enumerate(param_names):
    sw.add_histogram(tag=name, values=grads[i], global_step=epoch, bins=1000)

name, acc = metric.get()
# logging training accuracy
sw.add_scalar(tag='train_acc', value=acc, global_step=epoch)

name, val_acc = test(ctx)
# logging the validation accuracy
sw.add_scalar(tag='valid_acc', value=val_acc, global_step=epoch)
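Histograms show the full distribution, but a single number per parameter can also help spot vanishing or exploding gradients early. The sketch below computes the L2 norm of a flat list of gradient values and labels it using thresholds. The function names and the default thresholds of 1e-6 and 1e3 are arbitrary assumptions for illustration, not recommendations from MXBoard or MXNet.

```python
import math


def grad_norm(values):
    """L2 norm of an iterable of gradient values (plain floats)."""
    return math.sqrt(sum(v * v for v in values))


def classify_gradient(values, vanish_below=1e-6, explode_above=1e3):
    """Return 'vanishing', 'exploding' or 'ok' for one parameter's gradient.

    The default thresholds are illustrative assumptions; suitable values
    depend on the model and should be tuned.
    """
    norm = grad_norm(values)
    # A NaN norm is treated as a sign of numerical blow-up.
    if math.isnan(norm) or norm > explode_above:
        return 'exploding'
    if norm < vanish_below:
        return 'vanishing'
    return 'ok'
```

Such a norm could also be logged with add_scalar once per epoch, next to the histograms, to give a compact trend line per parameter.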

Then we simultaneously run the Python training script and TensorBoard to visualize the training in the browser in real-time.

To reproduce this experiment, you can find the complete code here on GitHub.

Distribution of the gradient updates

Training metrics: cross-entropy loss, training accuracy, validation accuracy

Visualization of convolutional filters and feature maps

Visualizing the convolutional filters and feature maps as images is useful for two reasons:

When training has converged, convolutional filters exhibit clear pattern-detection features, such as lines and distinctive colors. Filters that have not converged, or that overfit, display a lot of noise.

Observing the RGB rendition of filters and feature maps helps us understand which features the network has learnt and considers meaningful, typically edges and colors.

Here we use three pre-trained CNN models from the MXNet Model Zoo: Inception-BN, Resnet-152, and VGG16. The filters of the first convolutional layer are visualized directly in TensorBoard, alongside the resulting feature maps when applied to a black swan image. Notice how networks can have different convolutional kernel sizes.
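Raw filter weights are small signed numbers, so they need rescaling before they can be shown as pixel intensities. A minimal sketch of a min-max rescale to [0, 1] is shown below on a flat list of weights. The helper name is hypothetical, and the linked implementation may normalize differently.

```python
def rescale_to_unit_range(weights):
    """Linearly map a flat list of weights onto [0, 1].

    A constant input (max == min) is mapped to all zeros to avoid
    dividing by zero.
    """
    lo, hi = min(weights), max(weights)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in weights]
    return [(w - lo) / span for w in weights]


# Toy 'filter' values: signed weights become displayable intensities.
pixels = rescale_to_unit_range([-0.5, 0.0, 0.25, 0.5])
```

Rescaling each filter independently maximizes contrast within a filter, while rescaling all filters together keeps their relative magnitudes comparable; which is preferable depends on what you want to inspect.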

Inception-BN

Inception-BN: 7x7 kernels

Resnet-152

Resnet-152: 7x7 kernels

VGG16

VGG16: 3x3 kernels

You can see that the filters of the three models exhibit good smoothness and regularity, the usual signs of a model that has converged. The colored filters are mainly responsible for extracting color-based features in the image. The grayscale filters are responsible for extracting general patterns and outlines of the objects in the image.

For the full implementation and further analysis, check the code here.

Visual image embedding

The last example is equally interesting. Embedding is a key concept used in several machine learning domains, including computer vision and Natural Language Processing (NLP). It is the representation of higher-dimensional data in a lower-dimensional space. In a traditional image classification setting, the output of the penultimate layer of a convolutional neural network is usually connected to a fully connected layer with a Softmax activation that predicts the class or category the image belongs to. If we strip the network of this classification layer, we are left with a network that outputs a vector of features for each example, usually 512 or 1024 features per example. This vector is called the embedding of our image. We can call MXBoard's add_embedding() API to observe the distribution of the embeddings of the dataset projected down into 2D or 3D space. Pictures with similar visual features are clustered together.
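The clustering seen in the projection reflects how close embedding vectors are to one another. As a rough illustration, the sketch below uses cosine similarity to find the closest neighbour of a query vector. The 3-dimensional vectors and their labels are made-up toy data, far smaller than real 512- or 1024-dimensional embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_label(query, embeddings):
    """Return the label whose embedding is most similar to `query`."""
    return max(
        embeddings,
        key=lambda label: cosine_similarity(query, embeddings[label]),
    )


# Made-up toy embeddings, for illustration only.
toy_embeddings = {
    'dog': [0.9, 0.1, 0.0],
    'wolf': [0.8, 0.3, 0.1],
    'car': [0.0, 0.2, 0.9],
}
closest = nearest_label([1.0, 0.2, 0.0], toy_embeddings)
```

In this toy data the query lands next to 'dog', with 'wolf' a close second and 'car' far away, which mirrors the way visually similar images end up in the same cluster.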

Here we randomly select 2304 images from the validation set, calculate their embeddings using Resnet-152, add the embeddings to the MXBoard log file and visualize them:

3D Projection of the Resnet-152 embeddings using PCA

The embeddings of the 2304 images are projected onto a 3D space using PCA by default. However, the clustering effect is not obvious. This is because PCA, being a linear projection, does not necessarily preserve the local neighborhood relationships between the original data points. Therefore, we use the t-SNE algorithm provided by the TensorBoard interface to get a better visualization of the embeddings. Constructing the optimal projection is a dynamic process:

3D Projection of the Resnet-152 embeddings using t-SNE

After convergence of the t-SNE algorithm, it can be clearly seen that the dataset is divided into several clusters.

Finally, we can use the TensorBoard UI to verify the correctness of the image classification. We enter “dog” in the upper right corner of the TensorBoard GUI, and all pictures of the validation dataset classified as “dog” are highlighted. We also see that the clustering derived from the t-SNE projection closely follows the class boundaries.

Highlighting the images classified as dog

All code and instructions are available here.

Conclusion

After this MXBoard tutorial, we can see that visualizations are a powerful tool for supervising the training of models and gaining insight into the principles of deep learning. MXBoard provides MXNet with a simple, minimally intrusive, easy-to-use, centralized visualization solution for scientific and production environments. Best of all, all you need to use it is a browser.

Special thanks to Zheng Zihao for providing technical support during the development of the project!