Neural Network fundamentals

What is a neural network? How do they process data?

Definitions of terms I will be using throughout the article

AI = Artificial intelligence; what counts as intelligence is up to you

Machine learning (ML) = Getting computers to learn for themselves instead of us telling them exactly what to do. It is commonly used in AI, particularly for tasks we can’t easily (or at all) describe to a computer

Weights and biases

A neural network is a type of AI that very roughly mimics how the brain’s biological neurons work. It achieves this by using weights and biases (which have nothing to do with neural nets being biased).

Weights are applied to input data by taking the dot product of the matrix of weights and the matrix of data (meaning you multiply each value by the one in the same position in the other matrix, then add together all the multiplied numbers). Don’t worry too much about the notion of a dot product if you don’t get it; knowing that weights allow us to assign importance to data is more than enough for creating and using neural nets. As an example of weights assigning importance: in a neural net that attempts to predict which political party someone will vote for, there will be a high weight for household income and a low weight for their favourite colour.
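If a concrete example helps, here is that dot product in plain Python, using entirely made-up numbers for the voting example (both the data and the weights are purely illustrative):

```python
# Hypothetical inputs for the voting example:
# [household income in £1000s, favourite-colour code]
data = [55.0, 3.0]

# Made-up weights: income matters a lot, favourite colour barely at all
weights = [0.8, 0.01]

# Dot product: multiply each value by the weight in the same
# position, then add all the products together
weighted_sum = sum(d * w for d, w in zip(data, weights))
print(weighted_sum)  # 55.0*0.8 + 3.0*0.01 = 44.03
```

Notice how the favourite-colour value barely moves the result, while income dominates it; that is what "assigning importance" means in practice.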


Biases are applied after weights and simply let us shift values a bit, for example adding +0.2 to everything that comes out of a neuron.
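Putting weights and biases together, a single neuron can be sketched in a few lines of Python (the numbers here are made up purely for illustration):

```python
def neuron_output(data, weights, bias):
    """One neuron: dot product of data and weights, then shift by the bias."""
    weighted_sum = sum(d * w for d, w in zip(data, weights))
    return weighted_sum + bias

# Made-up numbers: this neuron's bias shifts every output up by 0.2
print(neuron_output([1.0, 2.0], [0.5, 0.25], 0.2))  # (0.5 + 0.5) + 0.2 = 1.2
```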

Neurons and layers

The first neural networks only had one neuron, but these aren’t very useful for anything, so we had to wait for computers to become more powerful before we could do more useful and complex things with them; hence the recent rise of neural networks. Today’s neural nets consist of multiple neurons arranged in multiple layers.

Here you can see a simple neural network. It has three input neurons, i.e. three different sets of weights and biases are applied. The input to each neuron in the input layer is the data we are making predictions about.

Each neuron then connects to (i.e. acts as an input for) each of the four neurons in the hidden layer. In practice this means the outputs of all the neurons in the first layer are joined together to form a matrix of data, which is then passed as input to every neuron in the next layer. A hidden layer is just a layer that is neither the input layer nor the output layer. A layer whose neurons are all connected to the next layer like this is called a fully connected (or dense) layer.

This hidden layer is then also fully connected to another hidden layer, which is fully connected to a single output neuron that should produce some useful output, such as a number between 0 and 1 representing the chance of someone getting diabetes based on the input. In reality there would be more neurons in each layer apart from the last one: for a chance of diabetes you only want one output neuron, as you only want one number as output, whereas for a network that predicts whether an image contains a human, a cat, or a dog you would probably use three output neurons, each outputting a number between 0 and 1 representing a probability.
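To make the layer structure concrete, here is a rough Python sketch of a cut-down version of this network: 3 inputs, one hidden layer of 4 neurons, and a single output neuron. All the weights and biases are made-up toy values, and I squash the final number with a sigmoid just to get something between 0 and 1 (activation functions aren’t covered in this article, so treat that last step as a detail for later):

```python
import math

def dense_layer(inputs, weights, biases):
    """Fully connected layer: every neuron sees every input.
    weights[i] is the weight list for neuron i; biases[i] is its bias."""
    return [
        sum(x * w for x, w in zip(inputs, neuron_weights)) + b
        for neuron_weights, b in zip(weights, biases)
    ]

def sigmoid(x):
    """Squash any number into the 0-1 range, handy for probabilities."""
    return 1 / (1 + math.exp(-x))

# Made-up network: 3 inputs -> 4 hidden neurons -> 1 output neuron
hidden_w = [[0.1, -0.2, 0.3]] * 4   # 4 neurons, 3 weights each (toy values)
hidden_b = [0.0, 0.1, -0.1, 0.2]
out_w = [[0.25, 0.25, 0.25, 0.25]]  # 1 output neuron, 4 weights
out_b = [0.0]

x = [1.0, 2.0, 3.0]                 # one example's input data
hidden = dense_layer(x, hidden_w, hidden_b)
probability = sigmoid(dense_layer(hidden, out_w, out_b)[0])
print(probability)  # a single number between 0 and 1
```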

Machine learning and loss functions

At this point, you may be thinking this is all well and good, but there seem to be a fair few numbers that have to be decided on for this to work, and you have better things to do with your time. You would be correct, and this is where machine learning comes in. All we do is supply a load of inputs and desired outputs for the network, specify our network architecture (how many layers and how many neurons on each layer), and then leave the computer to adjust the weights and biases.

The way these adjustments work is quite complex, but at the most basic level, how wrong the network is on average is computed (this “wrongness score” is called the loss), and then everything is adjusted in steps scaled by a fixed value called the learning rate.
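The article doesn’t pin down a specific loss, but as one common example of a "wrongness score", here is mean squared error: the average of the squared differences between what the network predicted and what we wanted (all numbers made up):

```python
def mean_squared_error(predictions, targets):
    """One common loss: average of squared prediction errors.
    Squaring makes all errors positive and punishes big misses more."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# Made-up predictions vs. desired outputs for three training examples
print(mean_squared_error([0.9, 0.2, 0.6], [1.0, 0.0, 1.0]))  # ~0.07
```

A perfect network would score 0; the worse the predictions, the bigger the loss.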

Which numbers to change, in which direction, and by how much is decided through backpropagation, which is essentially working backwards from the output to the last layer, then the second-to-last layer, and so on, to determine which weights and biases had the biggest effect on the result.
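Backpropagation itself is beyond this article, but the core idea of "nudge each number in the direction that reduces the loss, scaled by the learning rate" can be sketched with a toy one-weight network. Here I estimate the slope of the loss numerically just for illustration; backpropagation computes the same slopes exactly and efficiently for every weight at once. All numbers are made up:

```python
def loss(w):
    """Toy loss: how wrong a one-weight 'network' is at mapping 2 -> 10."""
    prediction = w * 2.0
    return (prediction - 10.0) ** 2

learning_rate = 0.1
w = 0.0  # start with a bad guess for the weight
for _ in range(50):
    # Estimate the slope of the loss at the current weight
    # (backprop would give us this slope directly)
    eps = 1e-6
    gradient = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    # Nudge the weight against the slope, scaled by the learning rate
    w -= learning_rate * gradient

print(w)  # converges towards 5.0, since 5.0 * 2 = 10
```

Real training does exactly this kind of repeated nudging, just across millions of weights and biases at once.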

TLDR: Neural networks consist of neurons arranged in layers, where every neuron in a layer is connected to every neuron in the next layer. A neuron multiplies the data passed into it by a matrix of numbers called the weights (and then adds a number called a bias) to produce a single number as output. These weights and biases for each neuron are adjusted incrementally to try to decrease the loss (the average amount the network is wrong by across all the training data).

TLDR for the TLDR: Each neuron multiplies each piece of data it is given by a particular weight, then adds this weighted data together to form a composite score of sorts, which is passed along with the outputs of the other neurons in the layer to the next layer, which then forms a composite score of the composite scores, and so on.

TLDR for the TLDR of the TLDR: In essence, neural networks act as universal function approximators: they take some numbers and, through repeated multiplication and addition, come to an output.

Further reading/watching:

https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&index=1 — a much deeper dive with a focus on back-propagation and gradient descent

https://www.youtube.com/watch?v=wvWpdrfoEv0 — a nice analogy on how machine learning works in neural networks

https://stackoverflow.com/questions/2480650/role-of-bias-in-neural-networks — elaborating on why we have the bias component

Please share this post on social media if you enjoyed it or found it useful. If there are any inaccuracies in this article, please let me know so I can improve my knowledge and avoid giving people wrong information. Feel free to leave feedback in the comments so I know how to improve for the next post.