Although artificial neurons and perceptrons were inspired by the biological processes scientists could observe in the brain back in the 1950s, they do differ from their biological counterparts in several ways. Birds inspired flight and horses inspired locomotives and cars, yet none of today's transportation vehicles resembles a metal skeleton of a living, breathing, self-replicating animal. Still, our limited machines are even more powerful in their own domains (and thus more useful to us humans) than their animal "ancestors" could ever be. It is easy to draw the wrong conclusions from the possibilities of AI research by anthropomorphizing Deep Neural Networks, but artificial and biological neurons do differ in more ways than just the materials of their containers.

Airplanes are more useful to us than actual mechanical bird models.

Historical background

The idea behind perceptrons (the predecessors to artificial neurons) is that it is possible to mimic certain parts of neurons, such as dendrites, cell bodies and axons, using simplified mathematical models built on the limited knowledge we have of their inner workings: signals can be received from dendrites and sent down the axon once enough signals are received. The outgoing signal can then be used as an input for other neurons, repeating the process. Some signals are more important than others and can trigger some neurons to fire more easily. Connections can become stronger or weaker, new connections can appear while others cease to exist. We can mimic most of this process by coming up with a function that receives a list of weighted input signals and outputs some kind of signal if the sum of these weighted inputs reaches a certain bias. Note that this simplified model mimics neither the creation nor the destruction of connections (dendrites or axons) between neurons, and ignores signal timing. However, this restricted model alone is powerful enough to work with simple classification tasks.
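The weighted-sum-and-threshold function described above can be sketched in a few lines. This is a minimal illustration; the weights and bias below are arbitrary example values chosen so the perceptron computes a logical AND:

```python
def perceptron(inputs, weights, bias):
    """Fire (output 1) if the weighted sum of the inputs reaches the bias/threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= bias else 0

# Example: with these (arbitrary) weights the perceptron acts as a logical AND.
and_weights, and_bias = [1.0, 1.0], 1.5
print(perceptron([1, 1], and_weights, and_bias))  # 1
print(perceptron([1, 0], and_weights, and_bias))  # 0
```

Note how the output is strictly binary: the perceptron either fires or it doesn't, just like the all-or-nothing action potential it was modeled after.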

A biological and an artificial neuron (via https://www.quora.com/What-is-the-differences-between-artificial-neural-network-computer-science-and-biological-neural-network)

Invented by Frank Rosenblatt, the perceptron was originally intended to be custom-built, mechanical hardware rather than a software function. The Mark 1 perceptron was a machine built by the US Navy for image recognition tasks.

Researching “Artificial Brains”

Just imagine the possibilities! A machine that can mimic learning from experience with its steampunk, neuron-like mind? A machine that learns from the examples it "sees", instead of needing scientists in glasses to give it a set of hard-coded instructions to work? The hype was real, and people were optimistic. Due to its resemblance to the biological neuron, and to how promising perceptron networks initially looked, the New York Times reported in 1958 that "the Navy [has] revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

However, its shortcomings were quickly realized, as a single layer of perceptrons alone is unable to solve non-linear classification problems (such as learning a simple XOR function). This problem can only be overcome, and more complex relationships in data only be modeled, by using multiple layers (hidden layers). However, there was no simple, cheap way of training multiple layers of perceptrons other than randomly nudging all their weights, because there was no way to tell which small set of changes would end up largely affecting other neurons' outputs down the line. This deficiency caused artificial neural network research to stagnate for years. Then a new kind of artificial neuron managed to solve the issue by slightly changing certain aspects of the model, which allowed multiple layers to be connected without losing the ability to train them. Instead of working as a switch that could only receive and output binary signals (meaning that perceptrons would get either 0 or 1 depending on the absence or presence of a signal, and would also output either 0 or 1 upon reaching a certain threshold of combined, weighted signal inputs), artificial neurons would instead utilize continuous (floating point) values with continuous activation functions (more on these functions later).

Activation functions for perceptrons (the step function, which outputs either 0 or 1 depending on whether the sum of the weighted inputs exceeds the threshold) and for the first artificial neurons (the sigmoid function, which always outputs values between 0 and 1).
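The two activation functions in the figure can be written down directly. A minimal sketch:

```python
import math

def step(x, threshold=0.0):
    """Perceptron activation: a hard switch that outputs only 0 or 1."""
    return 1 if x >= threshold else 0

def sigmoid(x):
    """Artificial-neuron activation: squashes any input into the open range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(step(0.3))     # 1
print(sigmoid(0.0))  # 0.5 -- exactly halfway between "off" and "on"
```

The step function jumps abruptly, while the sigmoid changes smoothly; that smoothness (a well-defined derivative everywhere) is what the next paragraph's training trick depends on.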

This might not look like much of a difference, but thanks to this slight change in the model, layers of artificial neurons could be used in mathematical formulas as separate, continuous functions, for which an optimal set of weights could be estimated by calculating the partial derivatives of their errors one by one. This tiny change made it possible to teach multiple layers of artificial neurons using the backpropagation algorithm. So unlike biological neurons, artificial neurons don't just "fire": they send continuous values instead of binary signals. Depending on their activation functions, they might somewhat fire all the time, but the strength of these signals varies. Note that the term "multilayer perceptron" is actually inaccurate, as these networks utilize layers of artificial neurons rather than layers of perceptrons. Yet teaching these networks was so computationally expensive that people rarely used them for machine learning tasks until recently, when large amounts of example data became easier to come by and computers got many magnitudes faster. Since artificial neural networks were hard to teach and aren't faithful models of what actually goes on inside our heads, most scientists still regarded them as dead ends in machine learning. The hype came back in 2012, when a Deep Neural Network architecture called AlexNet managed to solve the ImageNet challenge (a large visual dataset with over 14 million hand-annotated images) without relying on the handcrafted, minutely extracted features that had been the norm in computer vision up to that point. AlexNet beat its competition by miles, paving the way for neural networks to become relevant once again.
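Backpropagation through multiple layers can be sketched in pure Python on the very XOR problem that single-layer perceptrons cannot solve. The network size (2-2-1), learning rate, and epoch count below are arbitrary illustrative choices:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# A tiny 2-2-1 network: 2 inputs, 2 hidden sigmoid neurons, 1 output neuron.
# Each neuron's last weight acts as its bias.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    o = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    return h, o

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

def train_epoch(lr=0.5):
    for x, target in data:
        h, o = forward(x)
        # Backpropagation: the error gradient flows backwards through the layers,
        # using the sigmoid's derivative o * (1 - o) at each neuron.
        delta_o = (o - target) * o * (1 - o)
        for i in range(2):
            delta_h = delta_o * w_out[i] * h[i] * (1 - h[i])
            w_hidden[i][0] -= lr * delta_h * x[0]
            w_hidden[i][1] -= lr * delta_h * x[1]
            w_hidden[i][2] -= lr * delta_h
        for i in range(2):
            w_out[i] -= lr * delta_o * h[i]
        w_out[2] -= lr * delta_o

before = total_error()
for _ in range(5000):
    train_epoch()
after = total_error()
print(after < before)  # the squared error shrinks as the weights are adjusted
```

With a step activation, the derivative would be zero almost everywhere and the `delta` terms above could not be computed, which is exactly why continuous activations unlocked multi-layer training.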

AlexNet correctly classifies images at the top, based on likelihood.

You can read more on the history of Deep Learning, the AI winters and the limitations of perceptrons here. The field is evolving so quickly that researchers are continuously coming up with new solutions to work around certain limitations and shortcomings of artificial neural networks.

The main differences

Size: our brain contains about 86 billion neurons and more than 100 trillion (or, according to some estimates, 1,000 trillion) synapses (connections). The number of "neurons" in artificial networks is much lower (usually in the ballpark of 10–1000), but comparing the numbers this way is misleading. Perceptrons just take inputs on their "dendrites" and generate output on their "axon branches": a single-layer perceptron network consists of several perceptrons that are not interconnected; they all just perform this very same task at once. Deep Neural Networks usually consist of input neurons (as many as the number of features in the data), output neurons (as many as the number of classes if they are built to solve a classification problem) and neurons in the hidden layers in between. All the layers are usually (but not necessarily) fully connected to the next layer, meaning that artificial neurons usually have as many connections as there are artificial neurons in the preceding and following layers combined. Convolutional Neural Networks also use different techniques to extract features from the data, more sophisticated than what a few interconnected neurons can do alone. Manual feature extraction (altering data so that it can be fed to machine learning algorithms) requires human brain power, which is also not taken into account when summing up the number of "neurons" required for Deep Learning tasks. The limitation in size isn't just computational: simply increasing the number of layers and artificial neurons does not always yield better results in machine learning tasks.

Topology: all artificial layers compute one by one, instead of being part of a network whose nodes compute asynchronously. Feedforward networks compute the state of one layer of artificial neurons and their weights, then use the results to compute the following layer the same way.
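This strictly layer-by-layer, synchronous computation can be sketched as follows. The layer sizes, weights, and biases are arbitrary example values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input from the previous layer."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def feedforward(x, layers):
    """Layers are computed strictly one after another, never asynchronously."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

# A toy 2-3-1 network with fixed, arbitrary example weights.
layers = [
    ([[0.5, -0.4], [0.1, 0.9], [-0.7, 0.2]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.6, 0.8]], [0.05]),
]
output = feedforward([1.0, 0.5], layers)
print(len(output))  # 1 -- a single output neuron
```

Every neuron in a layer waits for the entire previous layer to finish, which is the opposite of the asynchronous, parallel firing in biological networks.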
During backpropagation, the algorithm computes changes to the weights in the opposite direction, to reduce the difference between the feedforward results in the output layer and the expected values of the output layer. Layers aren't connected to non-neighboring layers, but it's possible to somewhat mimic loops with recurrent and LSTM networks. In biological networks, neurons can fire asynchronously in parallel, and the network has a small-world nature, with a small portion of highly connected neurons (hubs) and a large amount of less connected ones (the degree distribution at least partly follows the power law). Since artificial neuron layers are usually fully connected, this small-world nature of biological networks can only be simulated by introducing weights of 0 to mimic the lack of a connection between two neurons.

Speed: certain biological neurons can fire around 200 times a second on average. Signals travel at different speeds depending on the type of the nerve impulse, ranging from 0.61 m/s up to 119 m/s. Signal travel speeds also vary from person to person, depending on sex, age, height, temperature, medical condition, lack of sleep and so on. In biological systems, the action potential frequency carries information: information is encoded in the firing frequency or the firing mode (tonic or burst-firing) of the output neuron, and in the amplitude of the incoming signal at the input neuron. Information in artificial networks is instead carried by the continuous, floating point values of the synaptic weights. How quickly the feedforward or backpropagation algorithms are calculated carries no information, other than making the execution and training of the model faster.
There are no refractory periods for artificial neural networks (periods during which it is impossible to send another action potential because the sodium channels are locked shut), and artificial neurons do not experience "fatigue": they are functions that can be calculated as many times, and as fast, as the computer architecture allows. Since artificial neural network models can be understood as just a bunch of matrix operations and derivative calculations, running them can be highly optimized for vector processors (doing the very same calculations on large amounts of data points over and over again) and sped up by orders of magnitude using GPUs or dedicated hardware (like the AI chips in recent smartphones).

Fault-tolerance: biological neural networks, due to their topology, are also fault-tolerant. Information is stored redundantly, so minor failures will not result in memory loss. They don't have one "central" part, and the brain can also recover and heal to an extent. Artificial neural networks are not modeled for fault tolerance or self-regeneration (similarly to fatigue, these ideas are not applicable to matrix operations), though recovery is possible by saving the current state (the weight values) of the model and continuing training from that saved state. Dropouts can turn random neurons in a layer on and off during training, mimicking unavailable paths for signals and forcing some redundancy (dropouts are actually used to reduce the chance of overfitting). Trained models can be exported and used on different devices that support the framework, meaning that the same artificial neural network model will yield the same outputs for the same input data on every device it runs on. Training artificial neural networks for longer periods of time will not affect the efficiency of the artificial neurons; however, the hardware used for training can wear out quickly if used regularly, and will need to be replaced.
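The dropout mechanism mentioned above fits in a few lines. This sketch uses the common "inverted dropout" formulation; the dropout probability is an arbitrary example value:

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Randomly zero each activation with probability p during training.
    Surviving activations are scaled by 1 / (1 - p) ("inverted dropout"),
    so the expected magnitude of the layer's output stays the same."""
    if not training:
        return list(activations)  # at inference time every neuron participates
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

random.seed(1)
layer_output = [0.2, 0.8, 0.5, 0.9]
print(dropout(layer_output, p=0.5))           # some values zeroed, survivors doubled
print(dropout(layer_output, training=False))  # unchanged at inference time
```

Because different random neurons are silenced on every training step, no single neuron can be relied on exclusively, which forces the redundancy described above.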
Another difference is that all processes (states and values) can be closely monitored inside an artificial neural network.

Power consumption: the brain consumes about 20% of all the human body's energy, yet despite its large cut, an adult brain operates on about 20 watts (barely enough to dimly light a bulb), making it extremely efficient. Taking into account how long humans can still operate when given only some vitamin-C-rich lemon juice and beef tallow, this is quite remarkable. As a benchmark: a single Nvidia GeForce Titan X GPU runs on 250 watts alone, and requires a power supply instead of beef tallow. Our machines are far less efficient than biological systems. Computers also generate a lot of heat when used, with consumer GPUs operating safely between 50–80 °C instead of 36.5–37.5 °C.

Signals: an action potential is either triggered or not; biological synapses either carry a signal or they don't. Perceptrons work somewhat similarly, accepting binary inputs, applying weights to them and generating binary outputs depending on whether the sum of these weighted inputs has reached a certain threshold (the step function). Artificial neurons instead accept continuous values as inputs and apply a simple, non-linear, easily differentiable function (an activation function) to the sum of their weighted inputs to restrict the outputs' range of values. The activation functions are non-linear so that, in theory, multiple layers can approximate any function. Formerly, sigmoid and hyperbolic tangent functions were used as activation functions, but these networks suffered from the vanishing gradient problem: the more layers in a network, the less the changes in the first layers affect the output, because these functions squash their inputs into a very small output range. These problems were overcome by the introduction of different activation functions such as ReLU.
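The vanishing gradient problem can be seen numerically. The sigmoid's derivative never exceeds 0.25, so by the chain rule the gradient reaching the first layers is a product of many such small factors, while ReLU's derivative is exactly 1 for positive inputs. This is a simplified sketch that ignores the weight terms in the chain rule; the layer count is an arbitrary example:

```python
import math

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1 - s)  # peaks at 0.25 when x == 0

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0

# Chain rule through 10 layers: multiply the activation derivatives together.
sigmoid_grad = 1.0
relu_grad = 1.0
for _ in range(10):
    sigmoid_grad *= sigmoid_derivative(0.0)  # 0.25 is the sigmoid's BEST case
    relu_grad *= relu_derivative(1.0)

print(sigmoid_grad)  # 0.25**10, roughly 1e-06 -- the gradient has all but vanished
print(relu_grad)     # 1.0 -- the gradient survives intact
```

Even in the sigmoid's best case the gradient shrinks by a factor of at least four per layer, which is why deep sigmoid networks barely updated their early layers.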
The final outputs of these networks are usually also squashed between 0 and 1 (representing probabilities for classification tasks) instead of being binary signals. As mentioned earlier, neither the frequency/speed of the signals nor the firing rates carry any information for artificial neural networks (this information is carried by the weights instead). The timing of the signals is synchronous: artificial neurons in the same layer receive their input signals and then send their output signals all at once. Loops and time deltas can only be partly simulated with recurrent (RNN) layers (which suffer greatly from the aforementioned vanishing gradient problem) or with Long short-term memory (LSTM) layers, which act more like state machines or latch circuits than neurons. These are all considerable differences between biological and artificial neurons.

Learning: we still do not understand how brains learn, or how redundant connections store and recall information. Brain fibers grow and reach out to connect to other neurons, neuroplasticity allows new connections to be created or areas to move and change function, and synapses may strengthen or weaken based on their importance. Neurons that fire together, wire together (although this is a very simplified theory and should not be taken too literally). By learning, we build on information that is already stored in the brain. Our knowledge deepens by repetition and during sleep, and tasks that once required focus can be executed automatically once mastered. Artificial neural networks, on the other hand, have a predefined model, where no further neurons or connections can be added or removed. Only the weights of the connections (and the biases representing thresholds) can change during training. The networks start with random weight values and slowly try to reach a point where further changes in the weights would no longer improve performance.
Just like there are many solutions to the same problems in real life, there is no guarantee that the weights of the network will be the best possible arrangement of weights for a problem; they will only represent one of infinitely many approximations to infinitely many solutions. Learning can be understood as the process of finding optimal weights to minimize the differences between the network's expected and generated output: changing the weights one way would increase this error, changing them the other way would decrease it. Imagine a foggy mountain top, where all we can tell is that stepping in a certain direction takes us downhill. By repeating this process, we eventually reach a valley where taking any further step would only take us higher. Once this valley is found, we can say that we have reached a local minimum. Note that it's possible that there are other, better valleys even lower down from the mountain top (the global minimum) that we have missed, since we could not see them. Doing this, usually in far more than 3 dimensions, is called gradient descent. To speed up this "learning process", instead of going through each and every example every time, random samples (batches) are taken from the data set and used for the training iterations. This only gives an approximation of how to adjust the weights to reach a local minimum (finding which direction to take downhill without carefully looking in all directions all the time), but it's still a pretty good approximation. We can also take larger steps when descending from the top and smaller ones as we approach a valley, where even small nudges could take us the wrong way. Walking downhill like this, going faster than carefully planning each and every step, is called stochastic gradient descent. So the rate at which artificial neural networks learn can change over time (it decreases to ensure better performance), but there aren't any periods similar to human sleep phases during which the networks would learn better.
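The downhill walk described above can be sketched for a simple one-dimensional "valley". The function, starting point, learning rate, and decay schedule are all arbitrary illustrative choices:

```python
def gradient_descent(gradient, start, lr=0.4, steps=50, decay=0.98):
    """Walk downhill: step against the gradient, shrinking the step size over time."""
    x = start
    for _ in range(steps):
        x -= lr * gradient(x)  # the gradient points uphill, so we subtract it
        lr *= decay            # smaller steps as we near the valley floor
    return x

# Minimize f(x) = (x - 3)**2, whose gradient is 2 * (x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), start=-5.0)
print(round(x_min, 3))  # close to 3.0
```

This toy valley has a single minimum, so any downhill walk finds it; real loss landscapes have many valleys, which is why the starting weights and the step schedule matter so much.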
There is no neural fatigue either, although GPUs overheating during training can reduce performance. Once trained, an artificial neural network's weights can be exported and used to solve problems similar to the ones found in the training set. Training (backpropagation using an optimization method like stochastic gradient descent, over many layers and examples) is extremely expensive, but using a trained network (simply doing the feedforward calculation) is ridiculously cheap. Unlike the brain, artificial neural networks don't learn by recalling information: they only learn during training, and will always "recall" the same, learned answers afterwards, without making a mistake. The great thing about this is that "recalling" can be done on much weaker hardware, as many times as we want. It is also possible to use previously pretrained models (saving time and resources by not having to start from a totally random set of weights) and improve them by training on additional examples that have the same input features. This is somewhat similar to how it's easier for the brain to learn certain things (like faces) by having dedicated areas for processing certain kinds of information.

So artificial and biological neurons do differ in more ways than the materials of their containers: biological neurons have only provided an inspiration for their artificial counterparts, which are in no way direct copies with similar potential. If someone calls another human being smart or intelligent, we automatically assume that they are also capable of handling a large variety of problems, and are probably polite, kind and diligent as well. Calling a piece of software intelligent only means that it is able to find an optimal solution to a set of problems.

What AI can't do

Artificial Intelligence can now beat humans in pretty much every area where:

- Enough training data and examples are digitally available, and engineers can clearly turn the information in the data into numerical values (features) without much ambiguity.
- Either the solutions to the examples are known (a large amount of labelled data is available), or it is possible to clearly define the preferred states or long-term goals to be achieved (for instance, the goal of an evolutionary algorithm can be to walk as far as possible, because the success of its evolution can easily be measured in distance).

As scary as this sounds, we still have absolutely no idea how general intelligence works, meaning that we do not know how the human brain manages to be so efficient in all kinds of different areas by transferring knowledge from one area to another. AlphaGo can now beat anyone in a game of Go, yet you would most likely be able to defeat it in a game of Tic-Tac-Toe, as it has no concept of games outside its domain. How a hunter-gatherer monkey figured out how to use its brain not just to find and cultivate food, but to build a society that can support people who dedicate their entire lives not to agriculture but to playing a tabletop game of Go, despite not having a dedicated Go-playing neural network area in their brains, is a miracle on its own. Similarly, heavy machinery has replaced human strength in many areas, but just because a crane can lift heavy objects better than any human hand could, it still cannot precisely place smaller objects or play the piano at the same time. Humans are pretty much self-replicating, energy-saving Swiss Army knives that can survive and work even in dire conditions.

Machine learning can map input features to outputs more efficiently than humans (especially in areas where data is only available in a form that is incomprehensible to us: we don't see images as a bunch of numbers representing color values and edges, yet machine learning models have no problem learning from a representation like that). However, these models are unable to automatically find and understand additional features (properties that might be important) and quickly update their models of the world based on them (to be fair, we cannot understand features that we cannot perceive either: for instance, we won't be able to see ultraviolet colors no matter how much we read about them). If, until today, you had only ever seen dogs in your life, but someone pointed out that the wolf you are seeing right now is not a dog but their undomesticated, wild ancestor, you would quickly realize that there are creatures similar to dogs, that some of the dogs you have already seen might have been wolves without you realizing it, and that other pets could have similar-looking undomesticated ancestors as well. You would most likely be able to distinguish the two species from then on, without having to take another look at all the dogs you have seen in your life so far, and without needing a few hundred pictures of wolves, preferably from every side, in every position they can take, in different lighting conditions. You would also have no problem believing that a vague cartoon drawing of a wolf is still somewhat a representation of a wolf, one that has some of the properties of the actual, real-life animal while also carrying anthropomorphic features that no real wolves have. You would not be confused if someone introduced you to a guy called Wolf either. Artificial Intelligence cannot do this (even though artificial neural networks don't have to rely as much on handcrafted features as most other types of machine learning algorithms).
Machine learning models, including Deep Learning models, learn the relationships in the representation of the data. This also means that if the representation is ambiguous and depends on context, even the most accurate models will fail, as they would output results that are only valid under certain circumstances (for instance, if certain tweets were labelled both sad and funny at the same time, a sentiment analysis model would have a hard time distinguishing between them, let alone understanding irony). Humans are creatures that evolved to face unknown challenges, to improve their views of the world and to build upon previous experiences, not just brains for solving classification or regression problems. But how we do all this is still beyond our grasp. However, if we were ever to build a machine as intelligent as humans, it would automatically be better than us, due to the sheer speed advantage silicon has.

The reason adversarial attacks can trick neural networks is that they do not "see" the same way we do. They do learn relationships in image data and can come to conclusions similar to ours when classifying, but their internal models are different from ours.

Even though artificial intelligence was inspired by our own, advancements in the field in return help biologists and psychologists better understand intelligence and evolution. For instance, the limitations of certain learning strategies become clear when modeling agents (like how evolution must be more complex than just random mutation). Scientists used to believe that the brain has ultra-specialized neurons for vision that become more and more complex in order to detect more and more complex shapes and objects. However, it is now clear that the same kinds of artificial neurons are able to learn complex shapes by having other, similar neurons learn less complex forms and properties, and by detecting when these lower-level representations are signaling.