There are more than 15 approaches to machine learning, each of which uses a different algorithmic structure to optimise predictions based on the data received. One approach, ‘deep learning’, is delivering breakthrough results in new domains and we explore this below. But there are many others which, although they receive less attention, are valuable because of their applicability to a broad range of use cases. Some of the most effective machine learning algorithms beyond deep learning include:

‘random forests’ that create multitudes of decision trees to optimise a prediction;

‘Bayesian networks’ that use a probabilistic approach to analyse variables and the relationships between them; and

support vector machines that are fed categorised examples and create models to assign new inputs to one of the categories.

Each approach has its advantages and disadvantages and combinations may be used (an ‘ensemble’ approach). The algorithms selected to solve a particular problem will depend on factors including the nature of the available data set. In practice, developers tend to experiment to see what works.
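To make the ‘ensemble’ idea concrete, here is a minimal sketch in which three classifiers vote on a credit card transaction and the majority label wins. The three rules, their thresholds and the field names are all hypothetical, invented purely for illustration; in practice each voter would be a trained model rather than a hand-written rule.

```python
from collections import Counter

# Three hypothetical, hand-written classifiers (illustrative rules only).
def by_amount(tx):
    return "fraud" if tx["amount"] > 1000 else "ok"

def by_country(tx):
    return "fraud" if tx["country"] != tx["home_country"] else "ok"

def by_hour(tx):
    return "fraud" if tx["hour"] < 5 else "ok"

def majority_vote(models, tx):
    """Return the label that most of the models predict for this input."""
    votes = Counter(model(tx) for model in models)
    return votes.most_common(1)[0][0]

MODELS = [by_amount, by_country, by_hour]
tx = {"amount": 2500, "country": "FR", "home_country": "GB", "hour": 14}
print(majority_vote(MODELS, tx))  # two of the three rules say "fraud"
```

With an odd number of voters and two labels there can be no tie, which is one reason small ensembles often use three or five models.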

Use cases of machine learning vary according to our needs and imagination. With the right data we can build algorithms for myriad purposes including: suggesting the products a person will like based on their prior purchases; anticipating when a robot on a car assembly line will fail; predicting whether an email was mis-addressed; estimating the probability of a credit card transaction being fraudulent; and many more.

Deep Learning: offloading feature specification

Even with general machine learning (random forests, Bayesian networks, support vector machines and more) it’s difficult to write programs that perform certain tasks well, from understanding speech to recognising objects in images. Why? Because we can’t specify the features to optimise in a way that’s practical and reliable. If we want to write a computer program that identifies images of cars, for example, we can’t specify the features of a car for an algorithm to process that will enable correct identification in all circumstances. Cars come in a wide range of shapes, sizes and colours. Their position, orientation and pose can differ. Background, lighting and myriad other factors affect the appearance of the object. There are too many variations to write a set of rules. Even if we could, it wouldn’t be a scalable solution. We’d need to write a program for every type of object we wanted to identify.

Enter deep learning (DL), which has revolutionised the world of artificial intelligence. Deep learning is a subset of machine learning, one of the more than 15 approaches to it. All deep learning is machine learning, but not all machine learning is deep learning (Figure 4, below).

Deep learning is useful because it avoids the programmer having to undertake the tasks of feature specification (defining the features to analyse from the data) or optimisation (how to weigh the data to deliver an accurate prediction) — the algorithm does both.

How is this achieved? The breakthrough in deep learning is to model the brain, not the world. Our own brains learn to do difficult things, including understanding speech and recognising objects, not by processing exhaustive rules but through practice and feedback. As children we experience the world (we see, for example, a picture of a car), make predictions (‘car!’) and receive feedback (‘yes!’). Without being given an exhaustive set of rules, we learn through training.

Deep learning uses the same approach. Artificial, software-based calculators that approximate the function of neurons in a brain are connected together. They form a ‘neural network’ which receives an input (to continue our example, a picture of a car); analyses it; makes a determination about it and is informed if its determination is correct. If the output is wrong, the connections between the neurons are adjusted by the algorithm, which will change future predictions. Initially the network will be wrong many times. But as we feed in millions of examples, the connections between neurons will be tuned so the neural network makes correct determinations on almost all occasions. Practice makes (nearly) perfect.
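The predict, receive feedback, adjust loop can be sketched with a single artificial neuron. The example below uses the classic perceptron update rule, which is our choice for illustration and far simpler than the methods used for real deep networks. The training data (the logical AND of two inputs), the learning rate and the number of passes are all assumptions.

```python
def predict(weights, bias, inputs):
    """The neuron 'fires' (outputs 1) when its weighted input is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(examples, epochs=20, rate=1.0):
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            # Feedback: 0 when the prediction was right, +1 or -1 when wrong.
            error = label - predict(weights, bias, inputs)
            # Connections are adjusted only when the prediction was wrong.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Toy labelled data: output 1 only when both inputs are 1 (logical AND).
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(EXAMPLES)
print([predict(weights, bias, x) for x, _ in EXAMPLES])
```

The neuron is wrong several times in the first few passes, then settles on weights that classify all four examples correctly.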

Using this process we can now, with increasing effectiveness:

recognise elements in pictures;

translate between languages in real time;

use speech to control devices (via Apple’s Siri, Google Now, Amazon Alexa and Microsoft Cortana);

predict how genetic variation will affect DNA transcription;

analyse sentiment in customer reviews;

detect tumours in medical images; and more.

Deep learning is not well suited to every problem. It typically requires large data sets for training. It takes extensive processing power to train and run a neural network. And it has an ‘explainability’ problem — it can be difficult to know how a neural network developed its predictions. But by freeing programmers from complex feature specification, deep learning has delivered successful prediction engines for a range of important problems. As a result, it has become a powerful tool in the AI developer’s toolkit.

2. How does deep learning work?

Given its importance, it’s valuable to understand the basics of how deep learning works. Deep learning involves using an artificial ‘neural network’ — a collection of ‘neurons’ (software-based calculators) connected together.

An artificial neuron has one or more inputs. It performs a mathematical calculation based on these to deliver an output. The output will depend on both the ‘weight’ of each input and the configuration of the ‘input-output function’ in the neuron (Figure 5, below). The input-output function can vary. A neuron may be:

a linear unit (the output is proportional to the total weighted input);

a threshold unit (the output is set to one of two levels, depending on whether the total input is above a specified value); or

a sigmoid unit (the output varies continuously, but not linearly, as the input changes).
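The three kinds of unit can be sketched in a few lines. The function names, the example inputs and the weights below are illustrative assumptions; the sigmoid shown is the standard logistic function, one common choice of smooth, non-linear input-output function.

```python
import math

def weighted_sum(inputs, weights):
    """Total weighted input: each input multiplied by its weight, then summed."""
    return sum(x * w for x, w in zip(inputs, weights))

def linear_unit(total, slope=1.0):
    """Output is proportional to the total weighted input."""
    return slope * total

def threshold_unit(total, threshold=0.0):
    """Output is one of two levels, depending on whether the total exceeds the threshold."""
    return 1.0 if total > threshold else 0.0

def sigmoid_unit(total):
    """Output varies smoothly between 0 and 1, but not linearly."""
    return 1.0 / (1.0 + math.exp(-total))

inputs = [0.5, -1.0, 2.0]    # hypothetical input values
weights = [0.4, 0.3, 0.1]    # hypothetical weights
total = weighted_sum(inputs, weights)  # 0.2 - 0.3 + 0.2, roughly 0.1

print(linear_unit(total), threshold_unit(total), sigmoid_unit(total))
```

The same total weighted input produces three different outputs, which is why the choice of input-output function matters as much as the weights.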

A neural network is created when neurons are connected to one another; the output of one neuron becomes an input for another (Figure 6, below).

Neural networks are organised into multiple layers of neurons (hence ‘deep’ learning). The ‘input layer’ receives information the network will process, for example a set of pictures. The ‘output layer’ provides the results. Between the input and output layers are ‘hidden layers’ where most activity occurs. Typically, the output of each neuron in one layer of the neural network serves as one of the inputs for each of the neurons in the next layer (Figure 7, below).
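A layered, fully connected network of this kind can be sketched as a loop that passes each layer’s outputs to the next. The layer sizes, weights and biases below are arbitrary illustrative values (an untrained network), and sigmoid units are assumed throughout.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Every neuron in the layer receives every output of the previous layer."""
    return [sigmoid(b + sum(w * x for w, x in zip(neuron_weights, inputs)))
            for neuron_weights, b in zip(weights, biases)]

def network_forward(inputs, layers):
    """Feed the input layer's values through each subsequent layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# 3 inputs -> hidden layer of 2 neurons -> output layer of 1 neuron.
LAYERS = [
    ([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.5, -2.0]], [0.05]),                             # output layer
]
output = network_forward([1.0, 0.5, -1.0], LAYERS)
print(output)
```

Adding more hidden layers means appending more `(weights, biases)` pairs to `LAYERS`; the forward pass itself does not change.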

Let’s consider the example of an image recognition algorithm, say, one that recognises human faces in pictures. When data are fed into the neural network, the first layers identify patterns of local contrast: ‘low level’ features such as edges. As the image traverses the network, progressively ‘higher level’ features are extracted, from edges to noses and from noses to faces (Figure 8, below).

At its output layer, based on its training, the neural network will deliver the probability that the picture is of each specified type (human face: 97%; balloon: 2%; leaf: 1%).
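One common way to turn an output layer’s raw scores into probabilities that sum to 100% is the ‘softmax’ function. The article does not say which method a given network uses, so treat this as an assumption; the raw scores below are hypothetical values chosen to give numbers close to the example above.

```python
import math

def softmax(scores):
    """Convert raw scores into positive values that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ["human face", "balloon", "leaf"]
raw_scores = [4.0, 0.1, -0.6]  # hypothetical output-layer values

probabilities = softmax(raw_scores)
for label, p in zip(LABELS, probabilities):
    print(f"{label}: {p:.0%}")
```

The largest raw score always receives the largest probability, so the network’s ‘best guess’ is simply the label with the highest value.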

Typically, neural networks are trained by exposing them to a large number of labelled examples. Errors are detected and the weights of the connections between the neurons are tuned by the algorithm to improve results. The optimisation process is repeated extensively, after which the system is deployed and unlabelled images are assessed.
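The train-then-deploy cycle can be sketched with a single sigmoid neuron tuned by gradient descent, in which each weight is nudged in proportion to the size of the error. This is a deliberately tiny stand-in for a real network: the labelled data (the logical OR of two inputs), the learning rate, the number of passes and the ‘unlabelled’ input are all assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Probability (0 to 1) that the input belongs to class 1."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

def train(examples, epochs=2000, rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):              # the optimisation is repeated many times
        for inputs, label in examples:
            error = predict(weights, bias, inputs) - label  # detect the error
            # Tune each weight in proportion to the error and to its input.
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias

# Training phase: labelled examples (class 1 if either input is 1).
LABELLED = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]
weights, bias = train(LABELLED)

# Deployment phase: assess an input the neuron has never seen, with no label.
unlabelled = (0.9, 0.8)
print(f"P(class 1) = {predict(weights, bias, unlabelled):.2f}")
```

Unlike a rule that only reacts to outright mistakes, this update also shrinks small errors, so the neuron’s confidence keeps improving with further passes.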

The above is a simple neural network, but structures vary and most are more complex. Variations include connections between neurons on the same layer; differing numbers of neurons per layer; and the connection of neuron outputs into the previous layers of the network (‘recurrent’ neural networks).

Designing and improving a neural network requires considerable skill. Steps include structuring the network for a particular application, providing a suitable training set of data, adjusting the structure of the network according to progress, and combining multiple approaches.

3. Why is AI important?

AI is important because it tackles profoundly difficult problems, and the solutions to those problems can be applied to sectors important to human wellbeing — ranging from health, education and commerce to transport, utilities and entertainment. Since the 1950s, AI research has focused on five fields of enquiry:

Reasoning: the ability to solve problems through logical deduction.

Knowledge: the ability to represent knowledge about the world (the understanding that there are certain entities, events and situations in the world; those elements have properties; and those elements can be categorised).

Planning: the ability to set and achieve goals (there is a specific future state of the world that is desirable, and sequences of actions can be undertaken that will effect progress towards it).

Communication: the ability to understand written and spoken language.

Perception: the ability to deduce things about the world from visual images, sounds and other sensory inputs.

AI is valuable because, in many contexts, progress in these capabilities offers revolutionary, rather than evolutionary, improvements. Example applications of AI include the following; there are many more.

Reasoning: Legal assessment; financial asset management; financial application processing; games; autonomous weapons systems.

Knowledge: Medical diagnosis; drug creation; media recommendation; purchase prediction; financial market trading; fraud prevention.

Planning: Logistics; scheduling; navigation; physical and digital network optimisation; predictive maintenance; demand forecasting; inventory management.

Communication: Voice control; intelligent agents, assistants and customer support; real-time translation of written and spoken languages; real-time transcription.

Perception: Autonomous vehicles; medical diagnosis; surveillance.

In the coming years, machine learning capabilities will be employed in almost all sectors in a wide variety of processes. Considering a single corporate function — for example, human resource (HR) activity within a company — illustrates the range of processes to which machine learning will be applied:

recruitment can be improved with enhanced targeting, intelligent job matching and partially automated assessment;

workforce management can be enhanced by predictive planning of personnel requirements and probable absences;

workforce learning can be more effective as content better suited to the employee is recommended; and

employee churn can be reduced by predicting that valuable employees may be at risk of leaving.

Over time we expect the adoption of machine learning to become normalised. Machine learning will become a part of a developer’s standard toolkit, initially improving existing processes and then reinventing them.

The second-order consequences of machine learning will exceed its immediate impact. Deep learning has improved computer vision, for example, to the point that autonomous vehicles (cars and trucks) are viable. But what will be their impact? Today, 90% of people and 80% of freight are transported via road in the UK. Autonomous vehicles alone will impact:

safety (90% of accidents are caused by driver inattention);

employment (2.2 million people work in the UK haulage and logistics industry, receiving an estimated £57B in annual salaries);

insurance (Autonomous Research anticipates a 63% fall in UK car insurance premiums over time);

sector economics (consumers are likely to use on-demand transportation services in place of car ownership);

vehicle throughput; urban planning; regulation and more.

4. Why is AI coming of age today?

AI research began in the 1950s; after repeated false dawns, why is now the inflection point? The effectiveness of AI has been transformed in recent years due to the development of new algorithms, greater availability of data to inform them, better hardware to train them and cloud-based services to catalyse their adoption among developers.

1. Improved algorithms

While deep learning is not new — the specification for the first effective, multi-layer neural network was published in 1965 — evolutions in deep learning algorithms during the last decade have transformed results.

Our ability to recognise objects within images has been transformed (Figure 9, below) by the development of convolutional neural networks (CNNs). In a design inspired by the visual cortexes of animals, each layer in the neural network acts as a filter for the presence of a specific pattern. In 2015, Microsoft’s CNN-based computer vision system identified objects in pictures more effectively (95.1% accuracy) than humans (94.9% accuracy). “To our knowledge,” they wrote, “our result is the first to surpass human level performance.” Broader applications of CNNs include video and speech recognition.
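The idea of a layer acting as a filter for a specific pattern can be sketched by sliding a small grid of weights (a ‘kernel’) across an image. The tiny image and the hand-written vertical-edge kernel below are illustrative assumptions; in a real CNN the kernel values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, step of one pixel) and
    return the grid of weighted sums. As in most CNN libraries, the kernel
    is applied without flipping (strictly, a cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k_rows) for j in range(k_cols))
             for c in range(out_cols)]
            for r in range(out_rows)]

# A 4x4 'image': dark (0) on the left, bright (1) on the right.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-written filter that responds to dark-to-bright vertical edges.
KERNEL = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(IMAGE, KERNEL)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the filter responds only at the edge
```

The output is high only where the pattern is present, wherever in the image that happens to be, which is what makes such filters useful for recognising objects regardless of position.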

Speech and handwriting recognition, meanwhile, are improving rapidly (Figure 10, below) following the creation of recurrent neural networks (RNNs). RNNs have feedback connections that enable data to flow in a loop, unlike conventional neural networks that ‘feed forward’ only. A powerful new type of RNN is the ‘Long Short-Term Memory’ (LSTM) model. With additional connections and memory cells, LSTMs ‘remember’ the data they saw thousands of steps ago and use this to inform their interpretation of what follows. This is valuable for speech recognition, where interpretation of the next word will be informed by the words that preceded it. From 2012, Google used LSTMs to power the speech recognition system in Android. Just six weeks ago, Microsoft engineers reported that their system reached a word error rate of 5.9%, a figure roughly equal to that of humans for the first time in history.
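The feedback loop that lets a recurrent network carry information forward can be sketched with a single recurrent neuron. Note that this is a plain RNN step, not an LSTM: it has no memory cells, so its ‘memory’ of an input fades with each step. The weights and the input sequences are illustrative assumptions.

```python
import math

def rnn_step(state, x, w_state, w_input, bias):
    """One recurrent step: the new state depends on the current input
    and on the neuron's own previous state (the feedback connection)."""
    return math.tanh(w_state * state + w_input * x + bias)

def run_rnn(sequence, w_state=0.8, w_input=1.0, bias=0.0):
    state = 0.0
    states = []
    for x in sequence:
        state = rnn_step(state, x, w_state, w_input, bias)
        states.append(state)
    return states

# The two sequences differ only in their first element.
with_signal = run_rnn([1.0, 0.0, 0.0])
without_signal = run_rnn([0.0, 0.0, 0.0])

print(with_signal)     # still non-zero at the last step: the first input is 'remembered'
print(without_signal)  # all zeros
```

The trace of the first input shrinks at every step. That gradual forgetting is the limitation that the LSTM’s additional connections and memory cells are designed to overcome.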