Linda Burtch of Burtch Works released her latest study on data science salaries, demographics and trends. A couple of interesting trends are beginning to appear that I'd like to comment on. The one that might catch the most headlines is that, for the first time in recent years, data science salaries have not increased year over year (YoY) by double digits. Across the six categories she splits by, the YoY increases are 7%, 0%, 1%, 0%, 3% and -4%. One category even declined, though that is probably statistically consistent with a 0% increase. So we are only seeing about a 1-2% YoY increase in data science salaries overall.
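As a quick sanity check on that "1-2%" figure, here is a minimal sketch that averages the six quoted increases. It assumes an unweighted mean, since the relative sizes of the six categories aren't given here; a properly weighted average could come out somewhat differently.

```python
# YoY salary changes (in percent) for the six categories quoted above.
yoy_changes = [7, 0, 1, 0, 3, -4]

# Assumption: every category counts equally. The category sizes are not
# given in this post, so this is an unweighted mean, not a true
# population-wide average.
mean_change = sum(yoy_changes) / len(yoy_changes)

print(f"Unweighted mean YoY change: {mean_change:.2f}%")
```

The unweighted mean lands a little above 1%, in line with the rough "1-2%" summary.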

While a leveling out of salaries will eventually occur, I suspect it isn't quite here yet. The reason is in the details, and it is the most interesting thing to come out of the report. As Linda points out, these YoY figures are not longitudinal. They aren't tracking the same people from year to year. They are simply statistics on each year's survey population compared against each other. So people in the sample this year make only 1-2% more than people in the sample last year.

Changes in patterns of training

The interesting part is that there is a significant change in the demographics of this year's sample: the fraction of data scientists with PhDs declined from 43% last year to only 28% this year. This is a trend I expected to see, but I'm a little surprised at how quickly it came.

It's not hard to see the cause of this trend. The last few years have seen an explosion of online and traditional courses and programs for learning data science. Some of these are master's-level programs. Most are just certificate programs. Others are no more than educational materials put online. I'm not aware of any PhD program in data science, nor would one seem appropriate. PhD data scientists tend to have PhDs in math, physics, computer science, statistics and other hard sciences. Data science as a career is never the intention of people entering such PhD programs. (If it is, they need career counseling!)

Once you acknowledge this population shift, it isn't hard to see why salaries haven't budged. The newer population of data scientists has had less training, and data scientists with PhDs still make more than those without across all of her categories. With a smaller share of PhDs in the sample, the overall average gets pulled down even if nobody's pay was cut.
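The arithmetic behind this mix effect can be sketched in a few lines. The PhD shares (43% and 28%) come from the report as quoted above; the two salary figures are purely hypothetical placeholders chosen for illustration, not numbers from the study, and the helper name `blended_mean` is my own.

```python
def blended_mean(phd_share, phd_salary, other_salary):
    """Average salary of a sample with the given fraction of PhDs."""
    return phd_share * phd_salary + (1 - phd_share) * other_salary

# HYPOTHETICAL salaries, for illustration only (not from the report).
PHD_SALARY = 150_000
OTHER_SALARY = 110_000

# PhD shares as quoted from the report: 43% last year, 28% this year.
last_year = blended_mean(0.43, PHD_SALARY, OTHER_SALARY)
this_year_no_raise = blended_mean(0.28, PHD_SALARY, OTHER_SALARY)

# Raise that both groups would need just to keep the overall average flat.
breakeven_raise = last_year / this_year_no_raise - 1

this_year_with_raise = blended_mean(
    0.28,
    PHD_SALARY * (1 + breakeven_raise),
    OTHER_SALARY * (1 + breakeven_raise),
)

print(f"Last year's average:          {last_year:,.0f}")
print(f"This year's, no raises:       {this_year_no_raise:,.0f}")
print(f"Raise needed to stay flat:    {breakeven_raise:.1%}")
print(f"This year's, with that raise: {this_year_with_raise:,.0f}")
```

Under these made-up salaries, everyone in the sample could get a raise of several percent and the headline average would still look flat, simply because the mix of people changed.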

I think we should hesitate to proclaim that data science is now getting watered down by wannabe data scientists with substandard training. There is surely some truth to that, but I think a better way to view it is that data science is starting to become more standardized. And the market has rightly determined that PhD programs are not an efficient way to produce data scientists and could never keep up with the demand anyway.

The data science field is starting to segment both horizontally and vertically: horizontally in terms of varying skill and experience levels, and vertically by domain. The rock-star data scientists with deep mathematical insight, strong programming skills and the ability to innovate in any industry still exist. They will just make up a smaller percentage of the people we call data scientists, and they were rare to begin with.

As for degradation of the job title itself, this unsurprising outcome will be remedied, as usual, by adding new prefixes like "Chief" and "Executive" or whatever to differentiate seniority within the field.

Why does having a PhD matter at all?

It's important to understand what it is about having a PhD that matters in data science. Part of it is nothing but a selection effect. If you got into, and successfully out of, the math PhD program at Harvard, you are incredibly smart. You might not be incredibly useful, but "incredibly smart" is a good starting place for just about every career. The thing is, this person might have made a great data scientist if they had started working right after their undergraduate degree. They probably didn't increase their skill in data science by spending 5 years studying "Cohomological Aspects in Complex Non-Kähler Geometry". They have perhaps learned a few useful things but, for the most part, they just got 5 years older.

Now, other PhD programs like physics, astronomy or statistics teach more useful data skills but, even so, the majority of the time you are learning something field-specific, not things that are generally useful. And really, you are trained to do research, which is useful in data science but not equivalent to it. So while these programs can create excellent data scientists, they do not do so very efficiently.

So I think taking that ultra-smart math student after undergrad and putting her through a six-month data science course is a much better way to train data scientists than most PhD programs. That doesn't, however, change the fact that a person is much more likely to succeed at data science if they were smart enough to be admitted to those PhD programs to begin with. It might not have much to do with what they actually learned in grad school. They would probably have learned more useful data science skills by actually doing data science in industry. Hence the selection effect: people with PhDs will probably still be better data scientists, on average, than people without them, even if the relationship is not necessarily causal. It's similar to how professional football players would probably make much better boxers than the rest of us, though that is not the ideal way to train boxing champs.