The academic trap and data science

How to get a data science position after academia with no previous industry experience

In recent years I’ve been going to college mathematics departments across the country to give talks on how to get into data science. My presentations are targeted at undergraduate math students who don’t know what they are going to do with the degree they’ll soon have (which also describes how I was as an undergrad). Despite the target audience, I always run into at least one person who has a graduate degree and wants help getting out of academia. The question they pose is usually something like:

I have spent my life working towards becoming a tenured professor. I have spent years doing research and publishing papers, all for the goal of a life in academia. But now I realize that isn’t what I want, and instead I would like to go into industry. I don’t have any industry experience; how do I get in?

They may be a PhD, a post-doc, an adjunct, or even a tenure-track professor, but regardless of their status the question is the same. These people are in a place I like to call the academic trap. The academic trap is when your career trajectory is so specialized for academia that you’re unprepared for a job outside of it. The academic trap happens in all areas of study, but for this post I will focus only on math and statistics students who want to leave academia for data science positions, since that’s what I am most familiar with.

Academia is a place where many people are competing for few positions, and to get a position you need to put all your energy into becoming the best candidate. That means prioritizing writing papers over internships, writing grants over learning programming languages, and skipping the things that could help you in industry but not in academia. When someone who has been focusing on academia decides to go into industry, they are at a serious disadvantage. The things that are important for academic hiring, such as papers, talks, and grants, are not things that are taken into consideration when hiring in industry. Further, companies are often hesitant to hire people coming straight from academia for a number of reasons:

Salary expectations for advanced degree holders are higher than for someone with only an undergraduate degree. As a data scientist, a person with a PhD in a STEM field can make twice as much as someone with an undergraduate degree, so recruiters expect that people coming from academia will want a lot of money.

People from academia don’t have experience in an industry work environment. Working as a data scientist within a corporation requires an understanding of how the business world works, including how quickly deliverables need to be made, how to craft a good presentation, and how to word an email to make a request. Corporate culture is simply different than academic culture. People coming from academia need to learn these lessons at their first job, which means that there is a lot of risk for the hiring company that the academic won’t be able to quickly learn them. A company would rather hire someone they can trust to know the culture and handle the job on day one.

They have different motivations. In academia, you are encouraged to spend as much time as you need to find the most innovative and elegant solution. In industry, you are encouraged to spend as little time as possible to find an analytical solution that merely satisfies the need; there is no benefit to more elegant methods. In fact, the more complex solutions are often not robust and are difficult to implement in practice. That is a huge mental shift that is really hard to make! After doing innovative research, trying to use a simple but imperfect solution often feels like a menial task. Industry work often has mundane parts, such as manipulating text files so that they can even be read, that well-educated people are especially displeased to have to do.

Besides companies’ hesitancy to hire people straight from academia, there is the fact that leaving academia is terrifying! Academia has its own culture and norms which people get used to. Further, within academic settings there is the notion that leaving academia means you are a failure, which in reality couldn’t be further from the truth. These norms make it extremely difficult to have a good transition out.

If you have gotten this far and feel like this is a call-out post: breathe, you are not alone and this is a solvable problem. The point I am trying to make is that the transition from academia to industry is a very difficult one, but getting out of the academic trap and into data science can be done!

The first thing to realize is that the required skills to be a data scientist are lower than you think. Plenty of articles on the internet suggest that to be a data scientist you need to understand a master’s collection of algorithms, be a wizard at programming, have a deep understanding of business, and so on. In reality, most data science jobs require very little deep knowledge; what they require instead is the ability to be adaptable and learn new things. As I discuss in my series of posts about how I hire data scientists, when I hire I don’t require someone to have a degree in data science. Instead, I just want to see that they have the basic skills and can learn how to get things done.

In fact, to get a data science job in industry all you need to do is convince someone that you are equipped to handle it. Convincing someone you can handle a job requires two things: showing you have the prerequisite knowledge and showing you have experience in doing similar work. Let’s dive into those two components in more detail.

Prerequisite knowledge for a data science job

On the programming side, that means learning either R or Python. There is plenty of material on the internet on which of the two to learn, but honestly if you learn one it’s easy enough to pick up the other one later. In academia you may have been using a different scientific programming language like MATLAB, SPSS, or, god help you, FORTRAN, but those really aren’t substitutes. The point you are trying to make to employers is that you won’t have a major problem adapting to a new job, so learn the tools that companies use. In addition to R or Python I would learn a bit of SQL, since that is how most companies store and query their data.

On the techniques side, I would learn three things:

How to join, filter, and aggregate tables. No matter what sort of data science you’re doing, you’ll be processing data sets from different sources. Learning the fundamental ideas of how to connect data sets together, such as doing an inner join between two tables, will be essential to being a data scientist. These concepts aren’t difficult, but they’ll end up being the majority of what you do on an hourly basis.

Linear and logistic regressions. These two techniques are for understanding the relationships between data. A linear regression helps you understand a continuous variable, for example how the square footage of a house and its location relate to the continuous variable of house price. A logistic regression helps you understand a binary variable, for example how customer spend relates to the binary variable of whether they made a follow-on purchase or not.

How to make a good plot. Whatever programming language you use, you’ll need to take that data and visualize it. Understand how to make a bar chart, a scatter plot, and other simple visualizations so that you can explore the data.
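To make the first technique concrete, here is a minimal sketch of a join, a filter, and an aggregation. It uses SQL through Python’s built-in sqlite3 module so it runs without installing anything. The tables, column names, values, and the spend_by_region helper are all made up for illustration; in a real job the data would come from a company database, and you might do the same thing in R or with a Python data frame library instead.

```python
import sqlite3


def spend_by_region(min_amount):
    """Join orders to customers, keep orders of at least min_amount,
    and total them up per region. All data here is hypothetical."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'West'), (2, 'East'), (3, 'West');
        INSERT INTO orders VALUES
            (10, 1, 20.0), (11, 1, 35.0), (12, 2, 15.0),
            (13, 3, 50.0), (14, 4, 99.0), (15, 2, 5.0);
    """)
    query = """
        SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total_spend
        FROM orders AS o
        INNER JOIN customers AS c
            ON c.customer_id = o.customer_id   -- join: connect the two tables
        WHERE o.amount >= ?                    -- filter: drop small orders
        GROUP BY c.region                      -- aggregate: one row per region
        ORDER BY c.region
    """
    rows = conn.execute(query, (min_amount,)).fetchall()
    conn.close()
    return rows


print(spend_by_region(10))
# [('East', 1, 15.0), ('West', 3, 105.0)]
```

Notice that order 14 belongs to a customer who isn’t in the customers table, so the inner join silently drops it. Spotting that kind of thing is a large part of the day-to-day work.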

Those three techniques will get you quite far in data science. Other methods such as factor analysis, text analytics, and deep learning can all come later. Again, for more detail check out my series on how I hire.
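As a small illustration of the linear regression idea described above, here is a sketch that fits a straight line relating square footage to house price using ordinary least squares. It is deliberately simplified: it uses only one predictor (square footage, not location), the numbers are invented, and fit_line is a hypothetical helper written out by hand so you can see the arithmetic. In practice you would use an existing routine, such as lm in R or a library like statsmodels or scikit-learn in Python.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up example data: house sizes and sale prices.
square_feet = [1000, 1500, 2000, 2500]
price = [200_000, 275_000, 350_000, 425_000]

intercept, slope = fit_line(square_feet, price)
print(intercept, slope)
# 50000.0 150.0
```

With this toy data, the slope says each additional square foot is associated with about $150 more in price. Being able to read a coefficient that way matters more than knowing the formula.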

Experience doing work related to data science

It’s not enough to show employers that you know the right techniques; you also need to show them that you can work in a corporate environment. Corporate work is unstructured compared to undergraduate and master’s-level studies: you aren’t given concrete tasks that you’re graded on, instead you’re given a broad assignment and need to figure out what the right approach is. Compared to PhD-level research, a corporate environment is extremely structured: you have deadlines for deliverables and you can’t stall because you’re waiting to get the perfect result. So corporate work isn’t similar to anything that someone who has only been in academia has done before.

The traditional way of showing a company you can handle a corporate environment is having worked in corporate environments before. When recruiters look at resumes, the first thing they gravitate towards is your previous work experience. This makes sense, since the life experience most similar to your next job is your previous job. The problem people in the academic trap have is that they don’t have any corporate experience to draw on. If you’re going to leave academia you’ll likely need something that looks at least vaguely similar to work experience.

There are three main ways you can get something that can be a substitute for work experience in the eyes of employers:

Do a side project yourself

Find a project you’re passionate about and try to do something with it on the side. This could be something like analyzing data from the Untappd app, algorithmically generating offensive license plates, or doing a text analysis of Jane Austen. As long as it’s a topic you are interested in and it involves some semblance of data science it should be good. By having a side project, you’ll be forced to learn how to use the tools and techniques of industry. You can use sites like DataCamp to learn the basics, and then figure out the rest on the way. You’ll also have to figure out how to go from an idea to an actual result you share with the world. You can put this on your resume and use it as a point of experience that you talk to employers about.

Upside of a side project: it’s free! Nothing is stopping you from doing this but you. Of the three options, this is the easiest to start.

Downside of a side project: Nothing is motivating you to do this but you. It’s extremely hard to have the motivation to do a side project when you have a life already, especially if you are in academia. And while a side project is good, it will take a bit of effort to spin it so that employers use it as evidence that you have what it takes.

Do a data science bootcamp

There are bootcamps popping up all over that will teach you data science fundamentals over three months; Galvanize in Seattle is one example. They generally cost around $15,000 and go through an introduction to programming and different data science techniques, then end with a capstone project. This capstone will be something employers will use as validation that you can do data science.

Upside of a bootcamp: you will learn a lot, and quickly. The capstones are great projects, and the bootcamps partner with local companies who are looking for data science hires. Of the three methods, this is the most straightforward path to getting a data science job.

Downside of a bootcamp: it’s $15,000. That’s a ton of money, and there is a risk you won’t end up finding a data science job, in which case you are out of luck.

Find an internship

There are companies out there which hire graduate students for internships, and this often ends up including post-docs or other early career academics. These internships run over the summer and you are basically a very junior employee. You’ll have real job experience on your resume, and companies will look very favorably on that when hiring.