Neural networks are in their third resurgence after a long AI winter. For most of us, AI has so far meant plenty of futuristic concepts in fiction, such as Westworld, and not much of it in reality. But according to experts, this time it’s for real. On some image-recognition benchmarks, machines now identify objects more accurately than humans. They can generate art that rivals the work of a human painter. All of this is built from smart algorithms layered on basic neural network concepts, plus a lot of data.

This post aims to help you understand the basics of neural networks well enough to write data-driven code and build a 2-layer neural network yourself, which in turn will help you learn to create more complicated and useful neural networks.

1. Information Density

Entropy is a measure of randomness.

Consider 2 bits, say (0,0). The maximum number of states these bits can have is 4: (0,0), (0,1), (1,0) and (1,1). There are no other possibilities. We can call this representation dense (in the sense of information density, not to be confused with dense and sparse matrices). Most of the computer applications we build are about packing as much information as possible into the least amount of space. Neural networks work counterintuitively: we go sparse. To understand something with a lot of noise, we need to leave enough room for error, and that’s what a sparse representation does.

Say we represent the above 2 bits using 8 bits. 8 bits can store a maximum of 256 states, but we are storing only 4. This means each 2-bit state can be represented by 256 / 4 = 64 different 8-bit values, all of which still stand for the same 2-bit state.

00000000–00111111 = (0,0)

01000000–01111111 = (0,1)

10000000–10111111 = (1,0)

11000000–11111111 = (1,1)

Each state, such as (0,0), has 64 values that can represent it. That leaves enough room for noise: a noisy value can still be recognized as (0,0) and not (0,1).
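The mapping above can be sketched in a few lines of Python. This is only an illustration, assuming the 2-bit state lives in the two highest bits of the byte; the helper names `encode` and `decode` are hypothetical and not part of the original post. Note that under this assumption only noise in the six low bits is tolerated; a flip in one of the top two bits would still change the decoded state.

```python
def encode(state):
    """Place a 2-bit state (hi, lo) in the top two bits of a byte."""
    hi, lo = state
    return (hi << 7) | (lo << 6)


def decode(byte):
    """Recover the 2-bit state by reading only the two highest bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return (byte >> 7) & 1, (byte >> 6) & 1


clean = encode((0, 1))       # 0b01000000
noisy = clean | 0b00101101   # corrupt only the six low bits
print(decode(clean), decode(noisy))

# Count how many of the 256 byte values decode to each 2-bit state.
counts = {}
for value in range(256):
    state = decode(value)
    counts[state] = counts.get(state, 0) + 1
print(counts)
```

Both the clean and the noisy byte decode to the same state, and each of the four states is covered by 64 byte values.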

A neural network increases its ability to attenuate noise by decreasing the entropy.

2. Parameter Optimization

A great many problems can be reduced to a parameter optimization problem. Take cooking as an example: finding the right amount of pepper, salt and spices to craft the perfect-tasting recipe. Or take a polynomial with more unknown parameters than known ones: it’s about optimizing each parameter by adjusting each variable. The trouble with neural networks (unsupervised learning) is:

The number of parameters is unknown, and as the number of parameters increases, the problem space grows exponentially.

A neural network tries to do the same, but purely based on the data and the expected output. It tweaks itself internally so that its parameters are a little better optimized with each iteration.
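To make "tweaking parameters over each iteration" concrete, here is a minimal sketch that fits a straight line with plain gradient descent. It is not a neural network and not the method this post builds later; the toy data, the `fit` function, the learning rate and the step count are all assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def fit(xs, ys, lr=0.05, steps=2000):
    """Adjust w and b step by step to reduce the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # How far off is the current guess for every data point?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Nudge each parameter a little in the direction that lowers the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Starting from w = 0 and b = 0, the loop only ever sees the data and the expected outputs, yet the parameters should end up close to 2 and 1.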

3. A Bit of Linear Algebra

Matrix multiplication combines multiple elements together. It is used to transfer data efficiently, combining it with each of the weights along the way. We use the same technique to transfer data from one layer to another. When someone in the machine learning field says "network", think of it as a bunch of matrices being moved around.

Figure: weight matrix w1 (2x2) multiplied by input matrix x (2x1), producing a new matrix w2 (2x1).
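The multiplication described above can be sketched in plain Python. The numbers in `w1` and `x` are made up for illustration, and the helper `matmul` is a hypothetical name; the result is called `w2` only to match the naming used above.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    # Each output cell is a weighted sum of one row of a and one column of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


w1 = [[0.5, -1.0],
      [2.0, 0.25]]   # 2x2 weight matrix (made-up values)
x = [[4.0],
     [2.0]]          # 2x1 input matrix (made-up values)

w2 = matmul(w1, x)   # 2x1 result
print(w2)
```

Each entry of the result mixes every input value with one row of weights, which is what "combining the data with the weights" amounts to. In practice a library such as NumPy would usually do this with its `@` operator instead of hand-written loops.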

A Quick Recap