Data scientists use many different algorithms to train neural networks, and there are many variations of each. In this article, I will outline five algorithms that will give you a rounded understanding of how neural networks operate. I will start with an overview of how a neural network works, mentioning at what stage the algorithms are used.

Neural networks are loosely modeled on the human brain. They are made up of artificial neurons, each of which takes in multiple inputs and produces a single output.

Because nearly all the neurons influence each other, and are therefore all connected in some way, the network can take every aspect of the given data into account, along with how these different pieces of data may or may not relate to each other. It may find very complex patterns in a large amount of data that would otherwise be invisible to us.

Behind the scenes of neural networks

In this visualization of an artificial neural network (ANN), there are three neuron layers: the input layer (left, red), a hidden layer (blue), and the output layer (right, red).

Assume this network is meant to predict the weather. The input values would be attributes relevant to weather such as time, humidity, air temperature, pressure, etc. These values would then be fed forward to the hidden layer while being manipulated by the weight values. Initially, weights are random unique values on every connection or synapse. The hidden layer’s new values are fed forward to the output, while being manipulated again by the weight values.

At this point, it is important to recognize that the output would be essentially random and almost certainly incorrect. The manipulation during the feedforward step contained no actual logic relevant to the problem because the weights start as random values. However, we are training the ANN with a large dataset of past weather records that contain the same attributes together with the outcome that followed (the target value).

After the feedforward stage, we can compare the incorrect output to the desired target value and calculate the error. Then we can back-propagate that error through the network and adjust the weight values based on how much each one contributed to it. If we repeat this forward and backward pass another 1,000 times, each time with a different data item, the weights will start to manipulate future inputs in a relevant way. Oftentimes, even more success can come from training on the same dataset multiple times.

The feedforward step could be seen as guessing, and the back-propagation step corrects that guess based on the margin of error. Over time, the guesses typically become far more accurate.

1. The feedforward algorithm…

For a neuron n on layer l, let w be the weights on layer l and i the values on layer l-1. All input values are set as the first layer of neurons. Then, each neuron on the following layers takes the sum of all the neuron values on the previous layer, each multiplied by the weight that connects it to that neuron. This summed value is then passed through an activation function.
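The weighted sum described above can be sketched in a few lines of plain Python. This is only an illustration: the function name feedforward_layer, the nested-list layout of the weights, and the identity activation used in the example are assumptions made here, and bias terms are left out because the article does not mention them.

```python
def feedforward_layer(inputs, weights, activation):
    """Compute one layer's neuron values from the previous layer's values.

    inputs:  list of values i from layer l-1
    weights: weights[j][k] is the weight w connecting input k to neuron j
    activation: function applied to each neuron's summed value
    """
    outputs = []
    for neuron_weights in weights:
        # Sum every previous-layer value multiplied by its connecting weight.
        summed = sum(w * i for w, i in zip(neuron_weights, inputs))
        outputs.append(activation(summed))
    return outputs


def identity(z):
    """Placeholder activation so the weighted sums are easy to inspect."""
    return z


previous_layer = [1.0, 2.0]
layer_weights = [
    [0.5, -0.25],  # weights into neuron 0
    [0.1, 0.2],    # weights into neuron 1
]
print(feedforward_layer(previous_layer, layer_weights, identity))
```

Calling the same function once per layer, feeding each result in as the next call's inputs, gives a full forward pass.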

2. A common activation algorithm: Sigmoid…

No matter how high or low the input value, it gets squashed to a value between 0 and 1. It's considered a way of converting a value to a probability, which then reflects a neuron's activation or confidence. This introduces nonlinearity to a model, allowing it to learn relationships that a purely linear model could not represent.
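As a rough sketch, the sigmoid and its derivative (written S' later in this article) could be implemented as follows. The function names are chosen here for illustration, and the two-branch form is just one common way to avoid overflow for very negative inputs.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an algebraically equivalent form so that
    # math.exp never receives a large positive argument.
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_prime(z):
    """Derivative of the sigmoid: S'(z) = S(z) * (1 - S(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), sigmoid_prime(z))
```

The derivative is largest at z = 0 and shrinks toward 0 at both extremes, which is why very large summed values make a sigmoid neuron slow to adjust.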

3. The cost function…

The squared cost function lets you find the error by squaring the difference between each output value and its target value and adding up the results. For classification, the target (desired) values could be a binary vector, for example with a 1 marking the correct class and 0 elsewhere.
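A minimal sketch of a squared cost is shown below. The factor of one half is an assumption made here (a common convention that makes the derivative simply output minus target); the article does not say which scaling it uses, and the function names are illustrative.

```python
def squared_cost(outputs, targets):
    """Half the sum of squared differences between outputs and targets."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))


def squared_cost_derivative(outputs, targets):
    """Derivative of the cost with respect to each output value."""
    return [o - t for o, t in zip(outputs, targets)]


# Three-class example: the target is a binary vector marking class 0.
network_outputs = [0.8, 0.1, 0.3]
target_vector = [1.0, 0.0, 0.0]
print(squared_cost(network_outputs, target_vector))
print(squared_cost_derivative(network_outputs, target_vector))
```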

4. The back propagation…

The error from the cost function is then passed back by multiplying it by the derivative of the sigmoid function, S'. Thus, δ is first defined at the output layer (the beginning of back-propagation) as the derivative of the cost multiplied by S' of each output neuron's summed input. Then we calculate δ for each earlier layer in turn, which can be considered to represent the recursive accumulation of change so far that contributed to the error (from the perspective of each unique neuron). At each step, the weights of the layer just processed must be transposed so that its δ values map back onto the preceding layer of neurons.

Finally, this change can be traced back to an individual weight by multiplying δ by the activated value of the neuron that feeds into that weight.
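The three steps above (δ at the output layer, δ at an earlier layer, and the change for each individual weight) might be sketched like this. It assumes a squared cost whose derivative is output minus target, sigmoid activations throughout, and weights stored as nested lists where weights[j][k] connects input k to neuron j. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def output_delta(outputs, targets, summed_inputs):
    """Delta at the output layer: cost derivative times S'(summed input)."""
    return [(o - t) * sigmoid_prime(z)
            for o, t, z in zip(outputs, targets, summed_inputs)]


def hidden_delta(next_weights, next_delta, summed_inputs):
    """Delta for an earlier layer, pulled back through transposed weights."""
    deltas = []
    for k, z in enumerate(summed_inputs):
        # Column k of next_weights is row k of its transpose.
        back = sum(next_weights[j][k] * next_delta[j]
                   for j in range(len(next_delta)))
        deltas.append(back * sigmoid_prime(z))
    return deltas


def weight_gradients(delta, activated_inputs):
    """Change for each weight: delta times the activated input feeding it."""
    return [[d * a for a in activated_inputs] for d in delta]


# One output neuron fed by two hidden neurons, all with summed input 0.
d_out = output_delta([0.5], [1.0], [0.0])
d_hid = hidden_delta([[2.0, -4.0]], d_out, [0.0, 0.0])
print(d_out, d_hid, weight_gradients(d_out, [0.5, 0.5]))
```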

5. Applying the learning rate/weight updating…

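The change now needs to be used to adjust the weight value. Eta (η) represents the learning rate: each weight is reduced by η multiplied by the change calculated for it, so a smaller η means smaller, more cautious steps.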
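A minimal sketch of the update step, assuming the change for every weight has already been computed by back-propagation and is stored in the same nested-list shape as the weights. The name update_weights is illustrative.

```python
def update_weights(weights, gradients, eta):
    """Return new weights after one gradient-descent step.

    Each weight moves against its gradient, scaled by the learning rate eta.
    """
    return [
        [w - eta * g for w, g in zip(weight_row, gradient_row)]
        for weight_row, gradient_row in zip(weights, gradients)
    ]


old_weights = [[0.5, -0.5]]
changes = [[0.25, -0.5]]
print(update_weights(old_weights, changes, eta=0.5))
```

Returning a new list instead of modifying the old one in place is a design choice made for this sketch; it keeps the previous weights available for comparison.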

These are the five main algorithms needed to get a neural network running.
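To show how the five steps fit together, here is a small end-to-end sketch that trains a tiny network on a single made-up example. Everything specific in it is an assumption chosen for illustration: the network size (two inputs, three hidden neurons, one output), the input values, the target, the learning rate of 0.5, the half squared cost, and the absence of bias terms.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, w_hidden, w_out, eta):
    """Run one forward and backward pass, update weights in place,
    and return the error measured before the update."""
    # 1 and 2: feedforward with sigmoid activation.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

    # 3: squared cost (with a factor of one half).
    error = 0.5 * (output - target) ** 2

    # 4: back-propagation. S'(z) is written as s * (1 - s) using the
    # already activated value s.
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [w_out[j] * delta_out * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]

    # 5: weight update scaled by the learning rate eta.
    for j in range(len(hidden)):
        w_out[j] -= eta * delta_out * hidden[j]
        for k in range(len(x)):
            w_hidden[j][k] -= eta * delta_hidden[j] * x[k]
    return error


random.seed(0)  # fixed seed so the random starting weights are repeatable
x, target = [0.2, 0.9], 1.0
w_hidden = [[random.uniform(-1.0, 1.0) for _ in x] for _ in range(3)]
w_out = [random.uniform(-1.0, 1.0) for _ in range(3)]

errors = [train_step(x, target, w_hidden, w_out, eta=0.5) for _ in range(1000)]
print("first error:", errors[0])
print("last error:", errors[-1])
```

A real training run would loop over many different data items instead of repeating one, but the order of operations inside each step stays the same.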

These algorithms only scratch the surface of how powerful neural networks can be and how they might affect business and society. It is always worth understanding how an exciting technology is designed, and these five algorithms should serve as a solid introduction.