Read part 1 here.

Testing different weight initialization techniques

Modern deep learning libraries such as Keras and PyTorch offer a variety of network initialization methods, all of which essentially initialize the weights with small, random numbers. We'll review the different methods and show you how each one affects model performance.

Reminder #1: do not initialize your network with all zeros. If every weight starts at the same value, every unit in a layer computes the same output and receives the same gradient update, so the units never learn different features.
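To see this concretely, here's a minimal sketch (the layer sizes are arbitrary and just for illustration):

```python
import torch
import torch.nn as nn

# A tiny demonstration of the all-zeros problem
layer = nn.Linear(4, 3)
nn.init.zeros_(layer.weight)
nn.init.zeros_(layer.bias)

x = torch.randn(1, 4)
print(layer(x))  # all three outputs are 0.0; every unit is identical,
                 # so every unit also receives the same gradient update
```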

Reminder #2: fan-in refers to the number of units in the previous layer, while fan-out refers to the number of units in the subsequent layer (e.g., a fully connected layer with 100 inputs and 50 outputs has a fan-in of 100 and a fan-out of 50).

Standard Normal initialization — this approach samples each weight from a normal distribution with mean zero and a small standard deviation.

Lecun initialization — this approach produces weights that are randomly generated numbers scaled so that their variance is 1/fan-in.

Xavier initialization (also called Glorot initialization) — in this approach, the randomly generated weights are scaled so that their variance is 2/(fan-in + fan-out). For a theoretical justification of the Xavier initialization, you can refer to the deeplearning.ai post on Initialization.

He initialization — this approach scales the randomly generated weights so that their variance is 2/fan-in, and is recommended for ReLU activations. See the He et al. 2015 paper here. A sketch of all four schemes in PyTorch follows below.
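Here's a rough sketch of how these four schemes map onto PyTorch's torch.nn.init module. Note that PyTorch ships helpers for the normal, Xavier, and He variants but not for Lecun, so we scale that one by hand; the layer dimensions are illustrative:

```python
import math
import torch.nn as nn

layer = nn.Linear(100, 50)  # fan-in = 100, fan-out = 50 (illustrative sizes)

# Standard normal: mean 0 with a small standard deviation
nn.init.normal_(layer.weight, mean=0.0, std=0.01)

# Lecun: variance 1/fan-in (no built-in helper, so we scale manually)
nn.init.normal_(layer.weight, mean=0.0, std=math.sqrt(1.0 / layer.in_features))

# Xavier/Glorot: variance 2/(fan-in + fan-out)
nn.init.xavier_normal_(layer.weight)

# He/Kaiming: variance 2/fan-in, recommended for ReLU activations
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
```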

Different frameworks have different weight initialization methods set as their default. For Keras, the Xavier initialization is the default, but in PyTorch, the Lecun initialization is the default. In the example below, we'll show you how to implement different initialization methods in PyTorch (beyond the default Lecun method) and compare differences in performance.

Let’s run some examples!

We’ll be adapting this tutorial from Deep Learning Wizard to build a fairly straightforward feedforward neural network and switch out the initialization method we’re using.

The most important part of the code will be the ‘Create the Model Class’ section since this is where we define our activation function and weight initialization method:
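As a reference point, here is a minimal sketch of what that model class can look like; the class name, layer names, and sizes are our own illustrative choices rather than the tutorial's exact code, and you can swap the nn.init calls to test a different scheme:

```python
import torch.nn as nn

class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()  # activation function
        self.fc2 = nn.Linear(hidden_dim, output_dim)

        # Weight initialization: change these two calls to compare methods,
        # e.g. nn.init.xavier_normal_ instead of nn.init.kaiming_normal_
        nn.init.kaiming_normal_(self.fc1.weight, nonlinearity='relu')
        nn.init.kaiming_normal_(self.fc2.weight, nonlinearity='relu')

    def forward(self, x):
        out = self.relu(self.fc1(x))
        return self.fc2(out)

# For MNIST-sized inputs: 784 input features, 100 hidden units, 10 classes
model = FeedforwardNeuralNetModel(input_dim=784, hidden_dim=100, output_dim=10)
```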