The idea of artificial intelligence – and machine learning, a subset of the genre – conjures images of shiny metal robots trooping to work on Threadneedle Street to set the UK’s monetary policy. International Monetary Fund (IMF) chief Christine Lagarde envisaged such a scenario in a recent speech in London, though she concluded robots wouldn’t make for good central bankers because machines follow rules, whereas a good central banker requires discretion to respond to surprises.1 “In 2040, the governor walking into the Bank will be of flesh and bones, and behind the front door she will find people – at least a few,” Lagarde predicted.

Uses for machine learning have abounded in recent years as companies, academics and central banks have embraced the technology, supported by increasingly powerful computers. The most eye-catching examples of machine learning in action see robots taking on humanoid roles, whether it is defeating the world champion at Go, navigating a car along a busy street or holding something that resembles a conversation.

Economists who use machine learning tend to stress its more prosaic side, however. It is a powerful tool if used to answer the right questions, but also a statistical technique that should be viewed as just one element of the econometrician’s toolkit, rather than an entirely new way of thinking about statistics.

Stephen Hansen, an associate professor at the University of Oxford, is making use of machine learning to analyse text. The technique he is applying – latent Dirichlet allocation (LDA) – resembles factor models employed by central banks. “Conceptually, it is not so different from the kinds of dimensionality‑reducing techniques that central banks are already using, it’s just that it’s being applied to text data,” he says. One of the challenges of working with text is its large number of dimensions – machine learning helps to pick out the ones that matter.

Pinning down the concept

At its heart, machine learning is a tool for the automated building, selection or refinement of statistical models. Computers are uniquely well suited to picking through vast quantities of data in search of patterns – which is why machine learning is normally applied to large, often unstructured datasets. It can also work well on smaller datasets that may have other awkward features, such as lack of structure or high dimensions.

While there is a multitude of possible approaches, in general, machine learning proceeds through training, validation and testing phases, with the dataset carved up into three pieces accordingly. The system trains on the first section – often around 60% of the data – before the model is refined.

A big part of the power of machine learning comes from optimisation – computers are very good at choosing parameters to minimise a loss function based on the training data. Where the computer finds that a variable is irrelevant or duplicated, its parameter can be set at zero to eliminate it. The problem is that the model may fit the training data so closely that it fails as soon as it encounters anything new – a problem known as overfitting. The validation phase, working on around 20% of the data, helps to avoid this. A common technique is “regularisation”, in which more complex models are penalised. By trading off goodness-of-fit against simplicity, the computer can find a model that is most likely to succeed out of sample.

In the testing phase, the model is run on the final 20% of the dataset to make its predictions. If the results are good, with low errors, the model may be used in further analysis. Otherwise it will be returned for refinement.
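
The three phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any central bank's workflow: the data are simulated, the 60/20/20 split follows the proportions above, and the regulariser is the lasso, chosen here only because it can set parameters to exactly zero. The function names and the list of candidate penalties are hypothetical.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value towards zero; anything inside the threshold becomes exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, penalty, sweeps=50):
    """Coordinate descent on (1/2n) * squared error + penalty * sum of |weights|."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y minus the fitted values while all weights are zero
    for _ in range(sweeps):
        for j in range(p):
            column = [row[j] for row in X]
            # Correlate feature j with the residual that ignores feature j itself.
            rho = sum(c * (r + weights[j] * c) for c, r in zip(column, residual)) / n
            scale = sum(c * c for c in column) / n
            updated = soft_threshold(rho, penalty) / scale
            residual = [r + (weights[j] - updated) * c for c, r in zip(column, residual)]
            weights[j] = updated
    return weights

def mean_squared_error(X, y, weights):
    errors = [yi - sum(w * x for w, x in zip(weights, row)) for row, yi in zip(X, y)]
    return sum(e * e for e in errors) / len(errors)

# Hypothetical data: 8 candidate variables, only the first two matter.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

# 60% training, 20% validation, 20% testing.
X_train, y_train = X[:120], y[:120]
X_val, y_val = X[120:160], y[120:160]
X_test, y_test = X[160:], y[160:]

# Validation: try several penalties, keep the one with the lowest validation error.
penalties = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0]
best_penalty = min(
    penalties,
    key=lambda lam: mean_squared_error(X_val, y_val, fit_lasso(X_train, y_train, lam)),
)
best_weights = fit_lasso(X_train, y_train, best_penalty)

# Testing: one final check on data the model has never seen.
test_error = mean_squared_error(X_test, y_test, best_weights)

# A heavier penalty shows the "set at zero" effect on the irrelevant variables.
sparse_weights = fit_lasso(X_train, y_train, 0.5)
print(best_penalty, round(test_error, 3), [round(w, 2) for w in sparse_weights])
```

The heavier penalty also shrinks the two relevant coefficients, which is the trade-off between fit and simplicity that the validation step is meant to settle.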

Machine learning is generally split into supervised and unsupervised learning. In supervised learning, the system is trained on examples whose correct outputs are already known – an image recognition program may be trained on photos labelled as containing cats or not, so that it can sort photos it has not come across before into those that contain cats and those that don’t. Unsupervised learning covers techniques such as “clustering”, in which the computer is asked to find patterns in the dataset without the researcher supplying labels or imposing a model.
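
As a rough illustration of the unsupervised case, the sketch below runs k-means, one widely used clustering algorithm, on simulated two-dimensional points. The data, the choice of two clusters and the function name `kmeans` are assumptions made for the example; no labels are supplied at any point.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and moving the centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            # Assign each point to the nearest centre (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centres[c])),
            )
            clusters[nearest].append(point)
        for c, members in enumerate(clusters):
            if members:  # leave a centre where it is if nothing was assigned to it
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centres, clusters

# Hypothetical data: two groups the algorithm is never told about.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(50)]

centres, clusters = kmeans(points, k=2)
print(sorted(centres), [len(members) for members in clusters])
```

A supervised method would instead be handed the group membership of each training point and asked to predict it for new ones.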

Putting it to work

Working with high-dimensional models – those with more parameters to estimate than there are observations – is challenging. “It is problematic because I am trying to get a lot of information from a very small amount of information,” says Jana Marečková, a PhD candidate at the University of Konstanz, who has been putting various types of machine learning into practice.

One of her projects has been to detect structural breaks in time‑series data. She makes use of a model where the parameter vector can change at any moment, requiring estimation of the number of parameters multiplied by the number of time periods – a high-dimensional problem. The assumption, however, is that, until there is a structural break, the parameters will be constant. Her choice of regularisation method simultaneously finds the position of the structural breaks and the parameter estimates.
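
The article does not say which regularisation method is used. One common formulation for this kind of problem, given here purely as an illustrative assumption, penalises changes in the parameter vector from one period to the next. With outcome y_t, regressors x_t, a separate parameter vector β_t for each of T periods and a penalty weight λ (all notation introduced for this sketch), the estimator solves:

```latex
\min_{\beta_1,\dots,\beta_T}\;
\sum_{t=1}^{T} \left( y_t - x_t^{\top}\beta_t \right)^2
\;+\; \lambda \sum_{t=2}^{T} \left\lVert \beta_t - \beta_{t-1} \right\rVert_2
```

The first term rewards fit; the second charges for every period in which the parameters move. A large enough λ forces most of the differences to exactly zero, so the periods in which β_t does change mark the estimated structural breaks, and the surviving values of β_t are the parameter estimates.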

A second project makes use of clustering analysis to find patterns in a survey of non-cognitive skills, mapping these on to labour market outcomes. The 1970 British Cohort Study tracks a group of people born in 1970, asking them a set of questions every four or five years throughout childhood and into adulthood, yielding a rich dataset on broad aspects of their lives, including non-cognitive skills and economic outcomes. “When I compared my results with the psychology literature, I was able to label the groups by the measures that are already known in psychology,” Marečková says. “That was a really nice result for me. The machine‑learning technique found the right grouping.”

The results, set out in a paper co-authored with Winfried Pohlmeier,2 suggest non-cognitive skills – those relating to an individual’s personality or temperament – have a significant impact on how likely a person is to find employment. Marečková and Pohlmeier also seek evidence of an impact on wages, but the relationship proves weaker.

FOMC transcripts

Hansen, of the University of Oxford, similarly uses an unsupervised learning technique to analyse transcripts of Federal Open Market Committee (FOMC) meetings. While others have used dictionary techniques – picking out keywords for a machine to find, for example – his approach, with co-authors Michael McMahon and Andrea Prat, employs LDA, a Bayesian factor model designed to find patterns in text without human guidance.3

The researchers identify around 10,000 unique words across every transcript produced during Alan Greenspan’s tenure as chair, which LDA is able to boil down into about 40 topics. Each FOMC meeting can then be represented as the percentage of time spent on each topic. In this way, the authors are able to construct a unique dataset from information that at one time only a human could process by reading. With the study comprising 149 meetings, 46,502 unique interjections, and 5.5 million words, there is only so much value a human could extract – even with a lot of time on their hands.
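
To give a flavour of how text becomes topic shares, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one standard way of estimating the model. It is not the authors' code: the four toy "meetings", the two topics, the prior values `alpha` and `beta` and the function name are all assumptions chosen to keep the example small.

```python
import random

def lda_topic_shares(docs, n_topics, sweeps=200, alpha=0.1, beta=0.01, seed=0):
    """Return, for each document, the estimated share of each topic."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    doc_topic = [[0] * n_topics for _ in docs]                # words in doc d given topic k
    topic_word = [[0] * len(vocab) for _ in range(n_topics)]  # times word w is given topic k
    topic_total = [0] * n_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every word
        row = []
        for word in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(row)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], assignments[d][i]
                # Remove the word's current assignment before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * len(vocab))
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        [(count + alpha) / (len(doc) + alpha * n_topics) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]

# Hypothetical mini-corpus standing in for meeting transcripts.
meetings = [
    "inflation prices wages inflation prices".split(),
    "banks credit lending banks risk".split(),
    "inflation wages prices inflation".split(),
    "credit risk banks lending credit".split(),
]
shares = lda_topic_shares(meetings, n_topics=2)
for row in shares:
    print([round(share, 2) for share in row])
```

Each row sums to one, so a meeting is reduced from a long list of words to a handful of percentages that can be used like any other time series.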

“A lot of machine‑learning literature developed with prediction in mind, so the question central banks tend to ask themselves is: can I draw on this toolkit to improve my forecasting?” Hansen says. “But machine‑learning techniques can also be used to represent new data.”

While LDA is much like other factor models, Hansen notes that text throws up unusual challenges. First, it is high‑dimensional: “The dimensionality is really an order of magnitude greater than in quantitative time series.” Second, it is very sparse, meaning each transcript will use only a subset of the 10,000 words.
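
Both features show up even in a toy example. The sketch below turns three invented sentences into a document-term matrix, with one row per document and one column per unique word, and measures the share of cells that are zero. The sentences and function names are hypothetical.

```python
from collections import Counter

def document_term_matrix(texts):
    """One row per text, one column per unique word, cells hold word counts."""
    counts = [Counter(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*counts))
    matrix = [[count[word] for word in vocab] for count in counts]
    return vocab, matrix

def sparsity(matrix):
    """Share of cells in the matrix that are zero."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / cells

texts = [
    "inflation is rising",
    "credit growth is slowing",
    "inflation expectations remain anchored",
]
vocab, matrix = document_term_matrix(texts)
share_of_zeros = sparsity(matrix)
print(len(vocab), round(share_of_zeros, 2))
```

Three short sentences already need nine columns, and more than half the cells are empty; with thousands of words and documents, both effects become far more pronounced.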

Hansen says the Bayesian nature of the model helps handle situations where an FOMC member only says a few words: the model will still allocate shares of the 40 topics, but you should not take the data “too seriously”. “These Bayesian models allow you to not overfit the model based on these limited data points,” he says.

He and his co-authors exploit a natural experiment. During his tenure, Greenspan realised that tapes of the FOMC meetings were not being erased once minutes were written up, as members of the committee had previously thought. The decision was later taken to publish the transcripts with a lag, allowing researchers to compare the quality of discussion before and after committee members knew their exact phrasing would be a matter of historical record. The authors find that the quality of discussion did shift: “We show large behavioural responses to transparency along many dimensions,” they write. “The most striking results are that meetings become less interactive, more scripted and more quantitatively oriented.”

FOMC members also change their voting patterns as they become more experienced, the researchers find – becoming more likely to challenge the consensus view, and speaking more broadly and less quantitatively. The authors attribute this to the reduced concern each member has over their career later on in their terms.

Going deeper

All of the more “sci-fi” applications of machine learning, from self-driving cars to AlphaGo – the system that beat Fan Hui at the game he’d spent his life studying – are based on an idea called “deep learning”. The technique utilises artificial neural networks – so called because they mimic the patterns of neurons and synapses in the human brain.

A deep learning system is structured in layers, each a set of nodes, connected to the nodes in the next layer via a series of links. Input data is processed through any number of “hidden” layers to produce a final output. For instance, data on driving conditions – the input – is processed into a decision – the output – by a self-driving car.
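
The layered structure can be sketched as a forward pass through a small network. This is an untrained toy, not a working driving system: the weights are random, the three inputs are invented, and the layer sizes and activation functions are assumptions made for illustration. Training would consist of adjusting the weights so that the output matches known examples.

```python
import math
import random

def relu(value):
    return max(0.0, value)

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def dense(inputs, weights, biases, activation):
    """One layer: each node takes a weighted sum of the previous layer's outputs."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def random_layer(n_inputs, n_nodes, activation, rng):
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_nodes)]
    biases = [0.0] * n_nodes
    return weights, biases, activation

def forward(inputs, layers):
    """Pass the input through every layer in turn and return the final output."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations

rng = random.Random(0)
layers = [
    random_layer(3, 4, relu, rng),     # first hidden layer
    random_layer(4, 4, relu, rng),     # second hidden layer
    random_layer(4, 1, sigmoid, rng),  # output: a single number between 0 and 1
]
output = forward([0.2, -0.1, 0.4], layers)
print(output)
```

Adding more hidden layers only means adding entries to the `layers` list, which is where the "deep" in deep learning comes from.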

The technique can be very powerful, allowing robots to mimic human behaviour and decision-making. But moral dilemmas reminiscent of Isaac Asimov stories also emerge. A car could easily find itself in a version of the philosophical “trolley problem” – should it kill one to save many? Swerve into a lamppost to avoid the schoolchildren who just stepped out into the road? Engineers training their cars might end up deliberately teaching them to kill their passengers in certain situations.

Economists making use of deep learning are less likely to encounter such knotty moral issues, but there are still plenty of challenges. Peter Sarlin, an associate professor at the Hanken School of Economics, used to build early warning models based on machine learning for the European Central Bank. One of his recent projects with Samuel Rönnqvist uses neural networks in two stages to build an early warning indicator for financial distress.4 In the first stage, Reuters articles are processed to reduce the number of dimensions from millions to the few hundred that contain meaningful information for financial stability. In the second stage, these few hundred inputs are processed into a signal that is represented by just two nodes: distress or tranquillity.

Black boxes

Working with so many dimensions, computers often go beyond human comprehension. “Humans are not necessarily going to understand it, but that is not the value proposition,” says Sarlin. “The value proposition is that we are capable of understanding and computationally analysing human input, and relating that to those events that we want to pinpoint.”

An issue for those making use of neural networks, particularly central banks trying to set policy, is that it is hard to know exactly what is going on in the hidden layers – the models are “black boxes”. That means, however powerful the program is, and even if it is right 100% of the time, it will be difficult for a policymaker to justify a course of action based on the model output. “What we have done here is a lot more advanced than what we have done with central banks around early warning models, in general,” says Sarlin. “At central banks, we have had to end up with models that are fully interpretable.”

Nevertheless, he notes that, even with very complex models, it can be possible to trace the reasoning throughout. “The fact we don’t understand how precisely we came up with the model doesn’t mean we cannot interpret the results,” he says.

Systemic risk

Sarlin believes a particularly valuable use for deep learning is in building more realistic models of systemic risk in the financial system. He says, broadly, there have been two branches of research in the area, running on parallel tracks. Some researchers have concentrated on features of the banks and other financial institutions, such as how particular balance‑sheet characteristics might make them more vulnerable. Others have built network models to map how interconnections can transmit shocks through the system.

Sarlin’s research is now focusing on ways machine‑learning techniques can be used to model risks at the level of individual financial institutions before network theory connects them. In this way, supervisors can build up a much more complete picture of the financial system. “I think that is how, in future, you should be looking at interconnected risk,” he says.

Prediction versus causation

Much of statistics is devoted to establishing evidence of causation, rather than simple correlation, but many of the standard techniques do not work in machine learning – it can be difficult to employ randomised control trials, for example. Chiranjit Chakraborty and Andreas Joseph explore some of these questions in a wide-ranging Bank of England working paper on the topic.5 Machine learning has often been developed for use in the market, with a “what works is fine” attitude, they say.

“Concretely, machine‑learning tools often ignore the issue of endogeneity and its many possible causes,” Chakraborty and Joseph write. “Additionally, there are few known asymptotic or small-sample properties of estimators to draw on. These are serious issues when making decisions and more research is needed.”

Susan Athey and Guido Imbens are working on one method of establishing causal inference.6 Their approach is to estimate heterogeneity in causal effects in experimental studies. The challenge is that some elements of the population have received treatment while others have not, but clearly none can have received both. Athey and Imbens divide up the population into subgroups, then use validation techniques similar to those used to refine standard machine‑learning models, to explain differences in treatment effects between subgroups.

The technique can be applied to gain additional insights into randomised controlled trials that have already been conducted. “A researcher can apply our methods and discover subpopulations with lower‑than‑average or higher‑than‑average treatment effects, and can report confidence intervals for these estimates without concern about multiple testing,” the researchers say.
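
The sample-splitting idea behind this claim can be illustrated with a heavily simplified sketch. It is not Athey and Imbens' algorithm: the trial is simulated, there is a single covariate, and the subgroups come from one threshold picked from a hypothetical list of candidates. What it does show is the separation of roles, with one half of the data used to choose the subgroups and the other half used to estimate their effects, so the estimates are not contaminated by the search.

```python
import random
import statistics

def treatment_effect(sample):
    """Difference in mean outcomes, treated minus control, with its standard error."""
    treated = [outcome for _, is_treated, outcome in sample if is_treated]
    control = [outcome for _, is_treated, outcome in sample if not is_treated]
    estimate = statistics.mean(treated) - statistics.mean(control)
    std_error = (
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    ) ** 0.5
    return estimate, std_error

def split_at(sample, threshold):
    low = [row for row in sample if row[0] <= threshold]
    high = [row for row in sample if row[0] > threshold]
    return low, high

# Hypothetical randomised trial: treatment only helps when the covariate exceeds 0.5.
random.seed(1)
population = []
for _ in range(8000):
    covariate = random.random()
    is_treated = random.random() < 0.5
    true_effect = 2.0 if covariate > 0.5 else 0.0
    outcome = 1.0 + true_effect * is_treated + random.gauss(0, 1)
    population.append((covariate, is_treated, outcome))

# One half searches for subgroups, the other half estimates their effects.
discovery, estimation = population[:4000], population[4000:]

def effect_gap(threshold):
    low, high = split_at(discovery, threshold)
    return abs(treatment_effect(high)[0] - treatment_effect(low)[0])

candidates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
best_threshold = max(candidates, key=effect_gap)

low, high = split_at(estimation, best_threshold)
low_effect, low_se = treatment_effect(low)
high_effect, high_se = treatment_effect(high)
for label, est, se in [("low", low_effect, low_se), ("high", high_effect, high_se)]:
    print(f"{label}: {est:.2f} (95% CI {est - 1.96 * se:.2f} to {est + 1.96 * se:.2f})")
```

Because the estimation half played no part in choosing the threshold, the usual confidence intervals for each subgroup remain valid in this simplified setting.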

Future applications

Research into machine learning and the many challenges it poses is clearly active, and many major questions remain unanswered. But with a proliferation of research and the creation of many private sector firms working with machine learning, the techniques are rapidly becoming easier to put into practice. Google has released a package of open-source tools for deep learning called TensorFlow,7 which can be used from the Python programming language. Other free machine‑learning tools abound: another Python-based toolkit, scikit-learn,8 supports methods including clustering, dimensionality reduction and model selection.

As such, machine learning gives central banks a suite of new tools they can put to use at relatively low cost. For example, Rendell de Kort, an economist at the Central Bank of Aruba, presented a machine-learning model for forecasting tourism demand at a conference hosted by the Irving Fisher Committee earlier this year.9 Using neural networks and another method known as “random forests”, de Kort found that machine learning yielded “fairly accurate” estimates of tourism demand without a severe computational burden. However, both techniques do represent black boxes, he noted.

The IMF’s Lagarde may find it unlikely that machines will sit on monetary policy committees any time soon, but the advances in deep learning are opening up areas of artificial intelligence that previously seemed highly speculative. Various companies claim they will have self-driving cars on the market in a few years’ time, and robots are already taking on managerial roles in industries such as asset management. Most applications will likely remain in the narrower econometric space, further from the public eye, but robot central bankers – or at least central bank analysts – do not seem far off. As Marečková says of deep learning systems, there is a long way to go, but “they are going to be really powerful once they start doing what they promise”.

Notes

1. Christine Lagarde, Central banking and fintech – A brave new world?, International Monetary Fund, September 2017.

2. Jana Marečková and Winfried Pohlmeier, Noncognitive skills and labor market outcomes: A machine learning approach, May 2017, https://tinyurl.com/ya4xerxo

3. Stephen Hansen, Michael McMahon and Andrea Prat, Transparency and deliberation within the FOMC: A computational linguistics approach, June 2017, https://tinyurl.com/yajst9qm

4. Samuel Rönnqvist and Peter Sarlin, Bank distress in the news: Describing events through deep learning, 2016, https://tinyurl.com/yddc3bz9

5. Chiranjit Chakraborty and Andreas Joseph, Machine learning at central banks, September 2017, https://tinyurl.com/y87vm5kz

6. Susan Athey and Guido Imbens, Machine learning methods for estimating heterogeneous causal effects, April 2015, https://tinyurl.com/y77qyjzy

7. TensorFlow, www.tensorflow.org

8. Scikit-learn.org, Machine learning in Python, https://tinyurl.com/cuttxkx

9. Rendell E de Kort, Forecasting tourism demand through search queries and machine learning, March 2017, www.bis.org/ifc/publ/ifcb44f.pdf

This article is part of the Central Banking focus report, Big data in central banks, published in association with BearingPoint.