First, let me get this out of the way. This is going to be an emotional post. Knowing myself, it will end up reading like a rant riddled with spelling errors (so not much of a difference there). Why, you ask? Because I care about Economics and I’m mad we were robbed of a good topic that any Econometrics student should be offered. We are still being robbed. So read this as a letter to both students and professors.

Econometrics is about data. It is about analyzing a sample and distilling the information in it into the best possible picture of the population at large. The goal isn’t just to use statistics to unearth correlations but, like any self-respecting economist, to unearth causation, always trying to answer the why behind a phenomenon. So, on one hand, I can forgive economists for shying away from heavy data crunching. On the other hand, it is an unforgivable sin not to mention, at least in passing, the wealth of options surrounding students.

So what is this thing I keep raving about? Well, it is known as data mining and/or machine learning (ML). I will avoid explaining the differences between the two (mostly because the answer is a bit vague, especially for the scope of this article) 1. As for the field itself, it is about using algorithms to sift through data and extract meaningful relationships. Alright, that sounds kinda like Econometrics. And that is exactly my point. Having knowledge of the field makes you an even more complete econometrician. Remember all the linear regressions you ran in Econometrics? Well, that is the first algorithm you meet in intro to ML. Basically, what I spent an entire semester learning in Mathematical Economics (which is like advanced Econometrics) was covered in a week. Then came logistic regression (an even more useful algorithm). Then came neural networks. Then came feed-forward neural networks and (wait for it!) backpropagation. OK, I will stop with the forced revision. My point still stands: there are exciting and useful algorithms that can be used to detect relationships and avoid errors.
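To make the "first algorithm in intro to ML" point concrete, here is a minimal sketch, assuming made-up data and NumPy. It fits the same linear regression twice: once the way an Econometrics class does it (solving the normal equations) and once the way an intro ML class typically does it (gradient descent on the squared error). The data, the learning rate and the variable names are all my own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: an intercept plus one standardized regressor
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x])

# Econometrics class: closed-form OLS via the normal equations
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Intro ML class: minimise the same squared error by gradient descent
beta_gd = np.zeros(2)
learning_rate = 0.1  # assumed value; works here because x is standardized
for _ in range(2000):
    gradient = -X.T @ (y - X @ beta_gd) / n
    beta_gd -= learning_rate * gradient

print("OLS coefficients:             ", beta_ols)
print("Gradient descent coefficients:", beta_gd)
```

Both routes should land on essentially the same coefficients. The model is the one you already know; only the way of getting there is different.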

Ignoring the hype around big data 2, think of how much data is generated every single second. Think of everything that was once hard to measure or track: mobile phones, geo-location, PaaS, SaaS and the many ways fixed costs have become variable costs. Hal Varian puts it best:

“There is now a computer in the middle of most economic transactions. These computer-mediated transactions enable data collection and analysis, personalization and customization, continuous experimentation, and contractual innovation. Taking full advantage of the potential of these new capabilities will require increasing sophistication in knowing what to do with the data that are now available.” 3

I should note that Hal is the main reason I am writing this post. He works as the chief economist at Google. He has also written one of the most intriguing papers (Big Data: New Tricks for Econometrics) 4 concerning the future of Econometrics. It is a must-read if you have read this far.

I pointed out exciting algorithms that might change the way we approach analysis. Some algorithms correct and anticipate their own errors! Not even joking! Remember when we had to account for bias in sampling? Well, ML has a better solution for correcting for this automatically 5. Allow me to quote Varian again:

“Our goal with prediction is typically to get good out-of-sample predictions. Most of us know from experience that it is all too easy to construct a predictor that works well in-sample, but fails miserably out-of-sample. To take a trivial example, ‘n’ linearly independent regressors will fit ‘n’ observations perfectly but will usually have poor out-of-sample performance. Machine learning specialists refer to this phenomenon as the ‘overfitting problem.’ ”
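Varian's "trivial example" is easy to try yourself. What follows is a minimal sketch, assuming simulated data in which only the first regressor truly matters. It fits n regressors to n observations, then compares in-sample and out-of-sample error against a humble one-regressor model. The sample sizes, seed and helper names (`make_data`, `mse`) are my own choices, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20  # number of training observations AND number of regressors

def make_data(rows):
    """Hypothetical data: y depends only on the first of n regressors."""
    X = rng.normal(size=(rows, n))
    y = 2.0 * X[:, 0] + rng.normal(size=rows)
    return X, y

def mse(X, y, beta):
    """Mean squared prediction error."""
    return float(np.mean((y - X @ beta) ** 2))

X_train, y_train = make_data(n)
X_test, y_test = make_data(1000)  # fresh draws the model never saw

# n regressors, n observations: least squares fits the sample exactly
beta_full, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# For comparison, a model that uses only the first regressor
beta_small, *_ = np.linalg.lstsq(X_train[:, :1], y_train, rcond=None)

in_full = mse(X_train, y_train, beta_full)
out_full = mse(X_test, y_test, beta_full)
in_small = mse(X_train[:, :1], y_train, beta_small)
out_small = mse(X_test[:, :1], y_test, beta_small)

print(f"{n} regressors: in-sample MSE = {in_full:.2e}, out-of-sample MSE = {out_full:.2f}")
print(f"1 regressor:   in-sample MSE = {in_small:.2f}, out-of-sample MSE = {out_small:.2f}")
```

The big model's in-sample error is zero up to rounding, which is exactly the "fits perfectly" part of the quote. Its out-of-sample error is where the trouble shows up.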

To the point: you end up with algorithms that penalize themselves 6.
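One common way an algorithm "penalizes itself" is regularization. As a minimal sketch, and assuming ridge regression is a fair stand-in for the idea (my choice of example, using simulated data and an arbitrary penalty), the objective adds a cost for large coefficients on top of the usual squared error:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 30 observations, 10 regressors, only the first matters
X = rng.normal(size=(30, 10))
y = 2.0 * X[:, 0] + rng.normal(size=30)

def ridge(X, y, lam):
    """Minimise ||y - Xb||^2 + lam * ||b||^2 (closed form, no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)     # lam = 0 is plain OLS
beta_ridge = ridge(X, y, 10.0)  # lam = 10 is an arbitrary illustrative penalty

norm_ols = float(np.linalg.norm(beta_ols))
norm_ridge = float(np.linalg.norm(beta_ridge))

print(f"Coefficient norm without penalty: {norm_ols:.3f}")
print(f"Coefficient norm with penalty:    {norm_ridge:.3f}")
```

The penalty shrinks the coefficients toward zero, trading a slightly worse in-sample fit for what is usually a more stable out-of-sample one. In practice the size of the penalty is picked by checking performance on held-out data rather than set by hand as it is here.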

It would be unfair to blame the undergrad Economics syllabus for not including ML concepts. I should note that most of these concepts are relatively new. Case in point, Varian’s paper is still a working paper (last revision a week ago as of publishing this post). ML is also mostly computer science driven. The algorithms are not written with economic theories in mind. This should not be an excuse, however, because inter-disciplinary studies are not uncommon. There is also the lack of basic coding knowledge among most economics students. I, personally, believe any student who takes econometrics and wants to go into the field should, at least, have basic coding skills, but that is an argument for another day.

In hindsight, this stopped being angsty rather quickly. However, I am still disappointed I missed out on exciting new topics during my earlier economic analysis lessons. Let this be a lesson to any econometrics student. There are mind-blowing projects and ventures popping up. You should not, however, think you will stop predicting wages versus education and age. That thing haunts you everywhere. Seriously, it’s everywhere!

1. [Stack Exchange has a good discussion on the differences.]

2. [I don’t think it’s even hype anymore. You know it’s mainstream when government scandals are invited to the party!]

3. [Varian, Hal. 2014. Beyond Big Data.]

4. [Varian, Hal. 2013. Big Data: New Tricks for Econometrics.]

5. [I understand that some of these methods are already applied in certain Econometrics works. Feel free to point out other interesting projects using these methods.]

6. [One of the funniest tweets from ML Hipster. You should follow him.]