Data Science Has Become Too Vague

Let’s Specialize and Break it Up!

I would not be opposed to downplaying the term “data science” and breaking it up into specialized disciplines. Do not misunderstand: I think the global “data science” movement was necessary and had a positive impact on the curmudgeonly corporate world. But the campaign has been won and everybody has bought into the idea. Rather than continuing to evangelize and hire under the “data science” umbrella, perhaps we should allow the dust to settle so people can adjust to the change.

Data science professionals, consider no longer burdening yourselves with the heavy title of “data scientist”. Most of us do not have PhDs or encyclopedic knowledge of every new topic. Maybe we should specialize and relieve ourselves of the pressure of having to know everything. “Data science” has become too broad a buzzword, so ubiquitous and vague that it is often meaningless. Why would anybody want to take ownership of something so nondescript?

It’s also interesting that eight years ago, the very folks you would call “data scientists” had these same concerns, and even contended that data science is not a real science. After all, “show me a science that doesn’t involve data”.

In this article, I want to highlight how “data science” has evolved and why it may be time to fragment it.

The Jabberwocky Effect

In 2010, there was a short-lived but memorable U.S. TV series called Better Off Ted. The show is a silly workplace comedy that lampoons corporate culture to a hyperbolic extent. But one episode, Jabberwocky (Season 1 Episode 12), captures the corporate buzzword effect too accurately.

Ted, the lead character, tries to hide budget for a pet project. When his boss Veronica confronts him, he lies and says the funds went to the revolutionary “Jabberwocky” project, which he vaguely makes up on the spot.

Here’s the funny part, though. Rather than ask what “Jabberwocky” is, Veronica pretends to be “in the know”, afraid of looking incompetent for being out of the loop. She pushes the nonexistent Jabberwocky project on the rest of the company as a top priority. With hilarious results, every leader and employee works on Jabberwocky with no idea what it is, and none of them would ever dare admit their ignorance to the others.

Blindsided by how far it escalated, Ted comes clean to Veronica right before they do a keynote on “Jabberwocky”. Veronica tells Ted to proceed anyway because “products are for people who don’t have presentations”.

I probably do not have to explain the analogy that is “Jabberwocky”. Replace that word with “Blockchain”, “Big Data”, “Bitcoin”, “Artificial Intelligence”, “Internet of Things”, “Quantum Computing”, “Machine Learning”, or “Data Science” and you know exactly what I mean. Corporate culture has a long history of hyping innovations while people pretend to understand them, only to run into their limits and chase something else.

Now that I have highlighted the “Jabberwocky Effect”, let’s continue.

A Brief History of Data Science

If you want to define “data science” as anything that has to do with “data”, you can go back to the dawn of computing. If you think math and statistics are crucial to data science just as much as data, you could go centuries back and say statisticians were the original “data scientists”.

For the sake of brevity, let’s go to the 1990s. Things used to be pretty simple. Analysts, statisticians, researchers, and data engineers were all pretty separate roles with occasional overlap. Tooling stacks often consisted of spreadsheets, R, MATLAB, SAS, and/or SQL.

Of course, throughout the 2000s things were changing. Google pushed data collection and analytics to unimaginable heights. In 2009, Google executives insisted statisticians would have the “sexiest job” for the next 10 years. That was a decade ago, and I recall it being a strange sentiment at the time. But lo and behold, in 2012 “Harvard Business Review” mainstreamed this concept called “data science” and declared it the sexiest job of the 21st century.

It was at that moment the craze started in “Jabberwocky” fashion. Harvard created a void called “data science” and everyone raced to fill it. SQL developers, analysts, researchers, quants, statisticians, physicists, biologists, and a myriad of other professionals rebranded themselves as “data science” professionals. Silicon Valley companies, feeling that traditional role titles like “analyst” or “researcher” sounded too limited, renamed the roles to “data scientist” which sounded more empowered and impactful.

Outside Silicon Valley, this added to the confusion, as most folks think of “scientists” as PhDs in white lab coats. Counter-intuitively, data scientists actually come from many backgrounds (technical and nontechnical) with varying levels of education (BS, BA, MBA, and sometimes PhD). Many hiring managers, HR departments, and organizations in general struggled to define what they needed in a data scientist, which is why many of you probably have sad anecdotes about a young data scientist who was thrown a MySQL database but was unable to do anything meaningful with it.

Throw in scaling advancements in data engineering (think “big data”), as well as the rapid advancements of “machine learning”, and the “data science” umbrella gets larger and more vague. More buzzwords get thrown around that many people say and few understand. Before you know it, “big data” and “machine learning” have become synonymous, and the distinction between disciplines is lost.

Even worse, companies make uninformed decisions and think they need data science skill “X” (e.g. deep learning) to solve an everyday problem like scheduling, when in fact they need an operations research person who knows search algorithms. What’s hot and current is not the best solution for most problems, and chasing it can be a costly mistake. You can read two of my other articles on this subject below.

The domain of “data science” has been exhausted by the “Jabberwocky” effect. If we want it to continue succeeding we need to specialize it, rather than causing more confusion with generalization.

Reasons to Dissolve “Data Science”

The “data science” push did some great things. It rejuvenated old, grumpy businesses and spurred them to do something fresh and exciting. IT departments, which were traditionally stingy about giving access to data and allowing non-IT staff to write code, were forced to evolve and support such initiatives. Most importantly, it democratized technology for so many non-technology professions. The idea that a lawyer can benefit from learning to code is not so fringe anymore, and the practice is no longer reserved for computer scientists, professional programmers, and engineers.

But this is a sign that the “data science” campaign has succeeded and run its course. Continuing to push it is starting to become detrimental. Here are some reasons why:

It is Too Broad

Not too long ago, if you got a bachelor’s degree in “Business Management”, you could easily be upwardly mobile. But today, conventional success often requires specializing in a specific area, simply because our world has grown more complex. A business student will be much better off studying finance, supply chain management, operations research, accounting, marketing, or some other specific business discipline.

I believe “data science” needs to go through a similar transition. As with business itself, there are too many disciplines to expect total mastery. It is unproductive to try learning all of them, especially at once. Of course, high-level awareness of what is out there is beneficial. It is also healthy to change interests over time. However, attempting to be omniscient will never yield value. This form of unfocused learning is best caricatured by the comedian Brian Regan: “I wanna learn! I wanna be a learner of things!”

It has always bothered me that “data science” can be creating a chart in Excel or Tableau… as well as building and tuning a neural network classifier. Seriously, what is up with that? These two tasks are thousands of miles apart in their nature, the technical skill needed, and the salary. Writing a SQL query versus building a Bayesian model? These are also unrelated skillsets and definitely not interchangeable. So why do we generalize people with these extremely diverse skills as “data scientists” and make hiring so vague and difficult?

Some folks reading this may argue, “Well, all these disciplines are interconnected, and the discipline of ‘data science’ helps unify and integrate them all.” That’s arguable to some degree, but marketing, finance, supply chain, accounting, and other business functions are interconnected as well. Despite a common objective, they are still distinct areas, and we no longer put emphasis on the whole of “business management”. Fragmentation and specialization are part of a domain maturing, and over time the specialties get more attention than the domain itself.

It is Overwhelming

One of the things that prompted me to write this article is the growing number of articles from data scientists confessing their feelings of “imposter syndrome”. There is this one which I’ve seen circulating. There is also this one. As time progresses, more data science professionals continue to come forward and confess their feelings of fraudulence. Professionally, the burden of imposter syndrome can fill you with dread and keep you up at night. The question always lingers: “How long will it be until I’m discovered for the fraud I am?”

But I believe this is a symptom of the larger issue in this article. It took me way too long to figure out that “data science” has become anything and everything related to “data”. Sadly, there are folks who take it upon themselves to own all of that. Why anyone would want to is beyond me.