The Most Intuitive and Easiest Guide for Artificial Neural Networks

Demystifying neural networks for complete beginners

Neural Network! Deep learning! Artificial Intelligence!

Anyone living in 2019 has probably heard these words more than once. You have probably also seen some awesome work in areas such as image classification, computer vision, and speech recognition.

So are you also interested in building those cool AI projects, but still have no idea what an artificial neural network is? There are already hundreds of articles explaining the concept, with titles like “a beginner’s guide on back propagation in ANN” or “A gentle introduction of the artificial neural network.” They are really great, but I found they can still be hard to follow for someone who is not comfortable with mathematical expressions.

This is the first post in the series ‘The Most Intuitive and Easiest Guide’ for neural networks. The complete series is as follows:

Today, I’m going to explain the basics of artificial neural networks (ANNs) with as little math as possible. This could be the easiest and most intuitive explanation ever, so if you hate math or have trouble with linear algebra, this one is for you! Today’s keywords are forward propagation, activation functions, backpropagation, gradient descent and weight updating. I’ll also leave some additional resources that can be your next steps after you finish this post. Sounds good? Let’s get it done!

So what on earth is a neural network?

You might have seen lots of articles that start by explaining what neurons are and how they are structured. Yes, the ‘neural’ in artificial neural network comes from the neurons in the human brain. In 1943, Warren McCulloch and Walter Pitts made the first attempt to build a computational model based on the brain’s neural networks. They wanted to express the biological processes in the brain as mathematical algorithms, and from that point, neural network research split into two directions. Today’s neural networks in AI take a somewhat different route from the cognitive science side. So I’d rather frame an ANN as a kind of structure or diagram than as a set of neurons, because it doesn’t have much to do with the biological neurons in our brains.

Let’s start with this picture. There are starting points (the input layer) and ending points (the output layer). Let’s say these are islands, and we are traveling from the ‘input-layer’ islands to the ‘output-layer’ islands. There are various routes from the start to the end, and each route is worth different points. When we reach the destination, we sum up the points collected along the way and determine which island is the one we were looking for. So it’s like sending out scouting boat teams to find the perfect island for our next vacation.

One interesting part here is that when the boat teams reach the output layer, they come back to the input layer. Then we repeat the process of sending them toward the output layer and calling them back to the input layer. Each trial produces an outcome score, and we use these scores to calculate how accurate the prediction is, just like we do with RMSE or MAE in linear regression.
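To make those error scores a bit more concrete, here is a minimal Python sketch of MAE and RMSE. The prediction and target numbers are made up for illustration, and `mae` and `rmse` are hypothetical helper names written from the standard formulas, not functions from any particular library.

```python
import math

def mae(predictions, targets):
    """Mean absolute error: the average size of the misses."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def rmse(predictions, targets):
    """Root mean squared error: squares the misses first, so big misses count more."""
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    return math.sqrt(mse)

# Hypothetical outcome scores from one trial, and the values we actually wanted.
predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]

print(mae(predictions, targets))   # 0.5
print(rmse(predictions, targets))  # about 0.612
```

The smaller these numbers get from one trial to the next, the closer the predictions are to the targets.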

Forward Propagation and Weight

The metaphor above describes what a neural network does. Let’s go one step further with some real computation this time. Here is a simpler diagram of a neural network.

Let’s say our input data is 5 and 2, and we are going to pass these values toward the output layer. Let’s start with 5. As you can see, there are two possible routes, each worth different points. If 5 takes the upper route, the point will be 10. If 5 takes the lower route, it will be -10. What about the input value 2? It gets 6 on the upper route and 2 on the lower route. So if we sum up the values arriving at each node, the values at the hidden layer will be as shown below.
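Here is the same arithmetic as a small Python sketch. The weights are inferred from the numbers in the text (5 × 2 = 10, 5 × -2 = -10, 2 × 3 = 6, 2 × 1 = 2), and it assumes that both ‘upper’ routes lead to the upper hidden node and both ‘lower’ routes lead to the lower one. The names `weights_to_upper`, `weights_to_lower`, and `weighted_sum` are just illustrative.

```python
# The two input values from the example.
inputs = [5, 2]

# Weights inferred from the text. Assumption: "upper route" goes to the
# upper hidden node and "lower route" goes to the lower hidden node.
weights_to_upper = [2, 3]   # 5 * 2 = 10, 2 * 3 = 6
weights_to_lower = [-2, 1]  # 5 * -2 = -10, 2 * 1 = 2

def weighted_sum(values, weights):
    """Multiply each incoming value by its route's weight and add them up."""
    return sum(v * w for v, w in zip(values, weights))

hidden = [weighted_sum(inputs, weights_to_upper),
          weighted_sum(inputs, weights_to_lower)]
print(hidden)  # [16, -8]
```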

We can get the final value in the same way. You probably get the idea of what’s going on here. This is called forward propagation, and it moves from left to right. The key point is that the results of the layer on the left are used as the input values for the next layer to its right.
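Forward propagation is just that step repeated layer by layer. The sketch below assumes the input-to-hidden weights implied by the 5-and-2 example (with the upper/lower routing assumption), while the hidden-to-output weights are hypothetical numbers I picked, since the diagram’s output weights are not given in the text. `layer_forward` and `forward` are illustrative names, not a library API.

```python
def layer_forward(values, weight_rows):
    """One layer: each row of weights produces one node value in the next layer."""
    return [sum(v * w for v, w in zip(values, row)) for row in weight_rows]

def forward(inputs, layers):
    """Forward propagation: each layer's output becomes the next layer's input."""
    values = inputs
    for weight_rows in layers:  # move from left to right
        values = layer_forward(values, weight_rows)
    return values

# One row per hidden node, inferred from the 5-and-2 example.
input_to_hidden = [[2, 3], [-2, 1]]
# Hypothetical hidden -> output weights, chosen only for illustration.
hidden_to_output = [[1, 0.5]]

print(forward([5, 2], [input_to_hidden, hidden_to_output]))  # [12.0]
```

Adding another layer is just another list of weight rows; the loop does not change.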

The circles in the picture are called nodes, which I described as islands. The values we multiplied by are called weights. Weight is a very frequently used term in data science. We use it to express how much influence a certain feature or sample has. So if a feature gets a large weight, that feature will have a big impact on the outcome. By giving different weights to the features, we can train our model to make better predictions. The word might sound unfamiliar, but we have already been using the idea with other machine learning algorithms such as lasso regression or boosting algorithms, and whenever we control the coefficients of features in other regression models.
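A quick way to feel what ‘a large weight means a big impact’ looks like: nudge each feature by the same amount and see how much the output moves. This is a sketch with two hypothetical features and hypothetical weights; `predict` is an illustrative helper, not part of any library.

```python
def predict(features, weights):
    """A plain weighted sum: each feature contributes feature * weight."""
    return sum(f * w for f, w in zip(features, weights))

# Hypothetical weights: the first feature matters a lot, the second much less.
weights = [3.0, 0.5]
base = [1.0, 1.0]

# Increase each feature by 1 and measure how much the prediction changes.
bump_first = predict([2.0, 1.0], weights) - predict(base, weights)
bump_second = predict([1.0, 2.0], weights) - predict(base, weights)
print(bump_first, bump_second)  # 3.0 0.5
```

The same +1 change moves the output six times more through the heavily weighted feature, which is exactly what training adjusts.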

Activation Functions

There is another new concept you might not have heard of yet: the activation function. An activation function applies a non-linear change to the values before passing them on as outputs. Why do we need that? If we only use linear calculations without activation functions, just like what we did above, the hidden layers add nothing: a chain of weighted sums is still just one big weighted sum. So the model won’t be much different from other regression models. To ‘activate’ the real power of neural networks, we need to apply an activation function, because activation functions help the model capture non-linearities within the data.
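Here is a small sketch of why purely linear layers collapse. It reuses the input-to-hidden weights implied by the 5-and-2 example, adds hypothetical hidden-to-output weights, and uses ReLU (keep positives, turn negatives into 0) as one common activation function picked for illustration. The helpers `matvec`, `matmul`, and `relu` are written from scratch here, not taken from a library.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(values):
    """ReLU activation: keep positive values, replace negatives with 0."""
    return [max(0, v) for v in values]

x = [5, 2]
w1 = [[2, 3], [-2, 1]]  # input -> hidden, inferred from the 5-and-2 example
w2 = [[1, 1]]           # hypothetical hidden -> output weights

# Without an activation, two layers behave like ONE layer with weights w2 @ w1.
two_linear_layers = matvec(w2, matvec(w1, x))
one_merged_layer = matvec(matmul(w2, w1), x)
print(two_linear_layers, one_merged_layer)  # [8] [8]

# With ReLU between the layers, the negative hidden value (-8) is cut off,
# so the result no longer matches the merged linear layer.
with_relu = matvec(w2, relu(matvec(w1, x)))
print(with_relu)  # [16]
```

The merged weights `matmul(w2, w1)` give the same answer for any input, which is the ‘no hidden layer effect’ problem; the activation function is what breaks that equivalence.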

There are several activation functions, and we need to choose a proper one depending on the problem. The equations can be found on Wikipedia, but I want you to look at the graph of each function before the equations and notice what shape and characteristics each one has, because this understanding will give you clues about which one to choose.
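If you’d like to poke at the shapes yourself before reading any equations, this sketch prints a small table for three commonly used activation functions: sigmoid, tanh, and ReLU. These three are my picks for illustration; the functions below are written from their standard definitions using only Python’s `math` module.

```python
import math

def sigmoid(z):
    """Squashes any number into (0, 1) with an S-shaped curve."""
    return 1 / (1 + math.exp(-z))

def tanh(z):
    """Squashes any number into (-1, 1); S-shaped and centered at 0."""
    return math.tanh(z)

def relu(z):
    """Outputs 0 for negative inputs and passes positive inputs through unchanged."""
    return max(0.0, z)

# A small table to get a feel for each function's shape.
print(f"{'z':>5} {'sigmoid':>8} {'tanh':>8} {'relu':>6}")
for z in [-4, -2, -1, 0, 1, 2, 4]:
    print(f"{z:>5} {sigmoid(z):>8.3f} {tanh(z):>8.3f} {relu(z):>6.1f}")
```

Reading down the columns, sigmoid and tanh flatten out at both ends while ReLU stays flat at 0 on the left and keeps growing on the right. Those flat regions and ranges are the kind of characteristics that hint at where each function fits.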