In the current deep learning era, there is a neural network architecture for virtually every problem: ResNets and CNNs for computer vision, RNNs with attention for language and speech, and so on. Deep learning models have surpassed almost all traditional machine learning techniques and are the current SOTA for many problems that were once deemed impossible for a computer.

However, if we take a deeper look at the primary computations performed by a neural network, we can gain better insight into the intricate details of how it works. One prominent phenomenon I observed is that the norm of the data changes as it passes through the network's layers. The common way to address this is a batch-normalization (BN) layer. But BN doesn't fully solve the problem, since it implicitly assumes the data comes from a Gaussian distribution (a BN layer only normalizes the mean and variance).
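To see this norm drift concretely, here is a minimal sketch (assuming a simple stack of randomly initialized Linear + ReLU layers in PyTorch, fed random Gaussian inputs) that prints the average per-sample norm after each computation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A batch of 64 random Gaussian input vectors of dimension 128.
x = torch.randn(64, 128)

# A toy stack of layers at default (Kaiming) initialization.
layers = nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)

with torch.no_grad():
    h = x
    print(f"input:         mean per-sample norm = {h.norm(dim=1).mean():.3f}")
    for i, layer in enumerate(layers):
        h = layer(h)
        print(f"after layer {i}: mean per-sample norm = {h.norm(dim=1).mean():.3f}")
```

Running this, the norm typically shrinks or grows layer by layer rather than staying put. Note that inserting an `nn.BatchNorm1d` would standardize each feature's mean and variance, but says nothing about the shape of the distribution beyond those two moments.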

Another way to visualise a neural network is as a series of computations (neural computations) over an input data signal. Across these computations, the energy of the signal doesn't remain constant: it might increase drastically (e.g. under a fast-growing non-linear activation) or it might drop, depending on the computations. My thought is that if the network's computations are supposed to resemble the operations performed by the brain, then the primary rule they should follow is energy conservation. That is, the computations should preserve the norm of the input data.
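As a small illustration of what such an energy-conserving computation could look like (a sketch, not a full training recipe): an orthogonal matrix Q satisfies ||Qx|| = ||x||, so the linear map x → Qx preserves the norm of the signal exactly. One easy way to obtain an orthogonal Q is the QR decomposition of a random Gaussian matrix:

```python
import torch

torch.manual_seed(0)

d = 128
# QR decomposition of a random Gaussian matrix yields an orthogonal Q,
# so x -> Q @ x conserves the "energy" (Euclidean norm) of the input.
Q, _ = torch.linalg.qr(torch.randn(d, d))

x = torch.randn(d)
print(f"||x||  = {x.norm():.4f}")
print(f"||Qx|| = {(Q @ x).norm():.4f}")  # identical up to floating-point error
```

Of course, a full network also contains non-linearities, so keeping every computation exactly norm-preserving is a stronger constraint than this single linear map suggests.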