Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it… — Dan Ariely

This quote is so apt. Many junior data scientists I know (myself included) wanted to get into data science because it was all about solving complex problems with cool new machine learning algorithms that make a huge impact on a business. This was a chance to feel like the work we were doing was more important than anything we’d done before. However, this is often not the case.

In my opinion, the fact that expectation does not match reality is the ultimate reason why many data scientists leave. There are many reasons for this mismatch, and I can’t possibly come up with an exhaustive list, but this post covers some of the ones I have encountered.

Every company is different, so I can’t speak for them all, but many companies hire data scientists without a suitable infrastructure in place to start getting value out of AI. This contributes to the cold start problem in AI. Couple this with the fact that these companies fail to hire senior/experienced data practitioners before hiring juniors, and you’ve got a recipe for a disillusioned and unhappy relationship for both parties. The data scientist likely came in to write smart machine learning algorithms to drive insight but can’t do this because their first job is to sort out the data infrastructure and/or create analytic reports. The company, in contrast, only wanted a chart that they could present in their board meeting each day. The company then gets frustrated because it doesn’t see value being driven quickly enough, and all of this leads to the data scientist being unhappy in their role.

Robert Chang gave a very insightful quote in his blog post giving advice to junior data scientists:

It’s important to evaluate how well our aspirations align with the critical path of the environment we are in. Find projects, teams, and companies whose critical path best aligned with yours.

This highlights the two-way relationship between the employer and the data scientist. If the company isn’t in the right place, or its goals aren’t aligned with those of the data scientist, then it’ll only be a matter of time before the data scientist finds something else.

For those who are interested, Samson Hu has a fantastic series on how the analytics team was built at Wish, which I also found very insightful.

Another reason data scientists become disillusioned is similar to why I was disillusioned with academia: I believed that I would be able to make a huge impact on people everywhere, not just within the company. In reality, if the company’s core business is not machine learning (my previous employer is a media publishing company), it’s likely that the data science you do will only provide small incremental gains. These can add up to something very significant, or you may be lucky enough to stumble on a gold mine project, but this is less common.

2. Politics reigns supreme

The issue of politics already has a brilliant article dedicated to it, “The most difficult thing in data science: politics”, and I urge you to read it. The first few sentences from that article pretty much sum up what I want to say:

When I was waking up at 6 AM to study Support Vector Machines I thought: “This is really tough! But, hey, at least I will become very valuable for my future employer!”. If I could get the DeLorean, I would go back in time and call “Bulls**t!” on myself.

If you seriously think that knowing lots of machine learning algorithms will make you the most valuable data scientist then go back to my first point above: expectation does not match reality.

The truth is that the people in the business with the most clout need to have a good perception of you. That may mean that you have to constantly do ad hoc work, such as getting numbers from a database to give to the right people at the right time, or doing simple projects just so that the right people have the right perception of you. I had to do this a lot in my previous job. As frustrating as it can feel, it was a necessary part of the job.

3. You’re the go-to person about anything data

Following on from doing anything to please the right people, those very same people with all of the clout often don’t understand what is meant by “data scientist”. This means that you’ll be the analytics expert as well as the go-to reporting guy and let’s not forget that you’ll be the database expert too.

It isn’t just non-technical executives who make too many assumptions about your skills. Other colleagues in technology assume you know everything data related. You know your way around Spark, Hadoop, Hive, Pig, SQL, Neo4j, MySQL, Python, R, Scala, TensorFlow, A/B testing, NLP, anything machine learning (and anything else data related that you can think of). BTW, if you see a job specification with all of these written on it, stay well clear. It reeks of a job spec from a company that has no idea what its data strategy is and will hire anyone because it thinks that hiring any data person will fix all of its data problems.

But it doesn’t stop there. Because you know all of this and you obviously have access to ALL of the data, you are expected to have the answers to ALL of the questions by... well, it should’ve landed in the relevant person’s inbox 5 minutes ago.

Trying to tell everyone what you actually know and have control of can be hard. Not because anyone will actually think any less of you, but because as a junior data scientist with little industry experience you’ll worry that people will think less of you. This can be quite a difficult situation.

4. Working in an isolated team

When we see successful data products, we often see expertly designed user interfaces with intelligent capabilities and, most importantly, a useful output which, at the very least, is perceived by the users to solve a pertinent problem. Now if a data scientist spends their time only learning how to write and execute machine learning algorithms, then they can only be a small (albeit necessary) part of a team that leads to the success of a project that produces a valuable product. This means that data science teams that work in isolation will struggle to provide value!

Despite this, many companies still have data science teams that come up with their own projects and write code to try and solve a problem. In some cases this can suffice. For example, if all that’s needed is a static spreadsheet that is produced once a quarter, then it can provide some value. On the other hand, if the goal is to provide intelligent suggestions in a bespoke website building product, then this will involve many different skills which shouldn’t be expected of the vast majority of data scientists (only the true data science unicorn can solve this one). So if the project is taken on by an isolated data science team, it is most likely to fail (or take a very long time, because organizing isolated teams to work on collaborative projects in large enterprises is not easy).