Why Economists should embrace Data Science

An opinion piece written By Mark Farragher, Associate Data Scientist at Triptease and Applied Data Science Bootcamp alumnus

Economists can make great data scientists and complement the skillsets of Data Scientists who have other backgrounds. I have a background in economics but branched out my skillset into Data Science more recently. Economics and Data Science actually have a lot in common.

Both disciplines have solid foundations in statistics, seek to solve quantitative problems through modelling and require strong analytical skills. Data science is also revolutionising some of the sectors in which economists work: e.g. banking, finance, public policy and consulting.

In this post I will make a case for why more economists should embrace Data Science tools and techniques.

To help make that case, I will examine the two disciplines in more detail and compare them:

How do economics and Data Science as disciplines compare?

What is the nature of scientific enquiry in economics?

A brief comparison of economics and Data Science

The main difference between economics and Data Science lies in the focus of their enquiry: the economist focuses on causality, whereas the data scientist focuses on prediction. Economics and Data Science modelling techniques both require statistical assumptions to be met, in order to make inferences from data about a complex system.

Suppose that a system can be represented by a set of linear equations in this form:

y = βX + e (where e is the error term)

The terminology used in economics and Data Science is different, but both have a language to describe that same system. Economists would know the matrix X as a matrix of regressors or independent variables (also include here a constant term), whereas in Data Science it would be a feature matrix. In a similar vein, y is an array that represents the dependent variable to economists, but in Data Science terminology it would be the target variable.

Here is the key difference in how the disciplines examine the system:

Economics cares most about the estimation of the regressors’ coefficients (β).

Data science cares most about predicting the target variable y.

Suppose that we have a dataset of millions of commercial loan payments in a bank, with rich data covering details all the way from application stage to payment history. Economic techniques lend themselves well to answer a question such as ‘what are the main factors that increase the credit risk of commercial loans?’ (focus is on causality). Data science techniques would work better for a question such as ‘what is the best model to predict the credit risk of commercial loans?’ (focus is on prediction).

Those research areas are both important and could be asked by different stakeholders within a large bank. A financial analyst may want to identify ways to rebalance a portfolio by reducing exposure to the most significant credit risk factors, while someone in another analytics team may be more concerned with predicting credit risk for new applicants via machine learning.

That is my core exposition on how the two disciplines compare, but they differ in other ways. For example, the toolkit of economics is typically associated with proprietary software like Stata and Matlab, but the toolkit of Data Science is associated more with open-source programming (e.g. Python) and big data frameworks (e.g. Spark).

The table below presents other means of comparison, e.g. model validation techniques and data collection.

The nature of scientific enquiry in economics

The early development of Data Science as a field was more closely aligned with computer science and engineering skills, but the proliferation of cheap data storage and cloud computing has enabled more specialist paths to emerge. Sophisticated tracking of metadata and micro events has opened up more opportunities for domain experts in sectors that have embraced big data to approach business problems in a more scientific way. That includes practitioners of social sciences such as economics.

I work at Triptease, a start-up that provides a suite of products to hotels in order to drive more bookings to their websites and enhance their online experience. Thousands of hotels use the platform and every visit to a hotel website leaves a trail of dozens of events that can be tracked. I am part of a new Data Science team at the company and have already used experimental methods such as A/B testing to influence product development. There is a vast amount of data available, which is so valuable for measuring performance and building up a solid evidence base before making major decisions that influence the direction of a product.

My economics background and Data Science skills together have certainly given me an edge in hypothesis-driven development that examines consumer behaviour and purchasing decisions.

Analytical rigour is important and I need it every day in my job, but that rigour has its greatest influence when it is accessible to a less technical audience. The economist Alfred Marshall famously brought more rigour to economics with mathematical modelling, but wanted to disseminate his theories (e.g. consumer surplus) to a wider audience through these rules:

Use mathematics as shorthand language, rather than an engine of inquiry.

Keep to them till you have done.

Translate into English.

Then illustrate by examples that are important in real life.

Burn the mathematics.

If you can’t succeed in (4), burn (3).

In a similar vein, it can be tricky in a Data Science project to ‘burn the maths’ and make the output ‘translate into English’. Models with high performance often have no practical interpretability. It is a tough task to illustrate real-life applications of ensemble methods and deep learning in an accessible way. Communication skills are critical for making Data Science projects have a wide impact, yet they are neglected in comparison to coding and statistics. I write blog posts about my own Data Science projects and take heed of Marshall’s approach, in order to develop my writing skills and focus on the applications of Data Science.

Economics and Data Science complement each other

There are many sectors that are being disrupted by Data Science: e.g. finance, insurance, healthcare; public policy. They are highly regulated sectors so they also need thorough documentation of modelling processes. Many economists go into client-facing roles in those sectors, so they become well-accustomed to summarising the output of quantitative projects to clients and develop a thorough domain knowledge (e.g. of regulatory issues).

I believe that economics and Data Science complement each other greatly. They both have a lot in common, such as a solid foundation in statistics. However, there is much that each discipline can teach the other. Economists have a thorough knowledge of areas such as finance and consumer behaviour, yet do not have much exposure to key Data Science tools in the academic curriculum. Economists have a tradition in communicating mathematical models to a wider audience, but that kind of communication can be tricky for data scientists if the models they use are very abstract.

These are examples that come to mind on why economists should learn Data Science skills:

Open-source programming is growing rapidly. There are R and Python packages that make analytics workflows easier compared to proprietary software (e.g. data cleaning, scripting to automate tasks, more flexible modelling).

The learning curve to get started in Data Science is not steep, even compared to the learning curve faced by some software engineers. The statistical foundations of machine learning are similar to those used in econometrics.

Business problems now draw upon data on a vast scale, e.g. metadata. Interacting with that data requires knowledge and experience with various data types, databases and specialist big data tools.

There are already positive signs of the economics profession embracing Data Science. The 2018 Nobel Prize in Economics was awarded to Paul Romer, who has embraced Jupyter Notebooks for their value in reproducing and sharing research. The Economist newspaper has also released its Big Mac Index data for its first open-source project, which has scripts written with R.

I certainly encourage anyone who has studied economics to learn some Data Science skills. It will allow you to do statistical modelling with open-source tools and let your mind approach problems from a different perspective at the very least. However, the fun for me comes in being able to use a variety of Data Science techniques to tackle novel business problems and pursue my own personal projects.