
I have some questions about the Normalized weights and Initial inputs video from the Udacity Deep Learning course.

The video says that the variables going into the loss function should have zero mean and equal variances. I understand that the weights and biases are the variables, and the data and labels are the inputs. My question is: how does having zero mean and equal variance help the optimization?
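To make my question concrete, here is a small sketch of what I understand normalization to mean (the data values are just made up by me):

```python
import numpy as np

# Toy inputs: one feature with large values, one with tiny values
X = np.array([[200.0, 0.001],
              [180.0, 0.003],
              [220.0, 0.002]])

# Shift each feature to zero mean and scale to unit variance
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0))  # approximately [0, 0]
print(X_norm.std(axis=0))   # [1, 1]
```

Is this what the video means, and why does it make the loss easier to optimize?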

My other question: the video talks about initializing the weights randomly from a Gaussian distribution. I cannot understand which values the weights and biases will take if they are drawn from a normal distribution with mean 0 and standard deviation sigma = 0.1. I'm very confused about this part. Could you explain it with an example?
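For example, is this what is meant? (The layer shape 3x4 is just an example I picked; I'm also assuming biases start at zero, which the video may or may not say.)

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1

# Weights drawn from a normal distribution with mean 0, std sigma = 0.1,
# so most values land within about +/- 3 * sigma = +/- 0.3
W = rng.normal(loc=0.0, scale=sigma, size=(3, 4))

# Biases initialized to zero (my assumption)
b = np.zeros(3)

print(W)
```

Is it correct that the individual weight values are just random small numbers clustered around 0?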

Also, in the normal distribution diagram shown in the video, shouldn't the probability distribution sum (integrate) to 1 rather than 5?

Finally, what is the meaning of the size parameter in the function numpy.random.normal?
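From my own experimenting, it seems to control the shape of the returned array of samples, but I'd like to confirm:

```python
import numpy as np

# size as an int: a 1-D array of that many samples
a = np.random.normal(0.0, 0.1, size=5)

# size as a tuple: an array with that shape, e.g. 2 rows x 3 columns
b = np.random.normal(0.0, 0.1, size=(2, 3))

print(a.shape)  # (5,)
print(b.shape)  # (2, 3)
```

Is that all it does, or does it affect the distribution itself in some way?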