Ivan (center) discussing data science projects with Insight Fellows

If you are a biologist, you may think data science is a field far removed from your background, and that only people from computer science or math are suited to the job. That’s what I thought two years ago. The truth is, your scientific training in the life sciences may have prepared you well for a career in data science in the healthcare industry. Here I would like to share my story and offer tips on what biologists can do to make the transition smoother.

The central function of a data scientist is to extract information from complex datasets and provide actionable insights that inform research strategy and business decision-making. As healthcare enters the digital age and continues to merge with the broader tech industry, the scope of data science needs is massive. Data scientists in this emerging space are regularly tasked with tackling sophisticated problems, such as reducing the burden of repetitive tasks on doctors, developing high-performance genomic analysis platforms, identifying new molecular targets for drug discovery, optimizing clinical trial procedures, analyzing electronic medical data to improve patient care, and forecasting disease progression to reduce mortality rates.

With biomedical research developing at a rapid pace, biologists have shown how necessary it is to adopt concepts and tools from other areas, including machine learning, computational chemistry, engineering, mathematics, and physics. This paradigm shift drives many initiatives in the health and data space, such as precision medicine, value-based healthcare, and genomics. At the same time, the differences between the healthcare and tech industries tend to favor data scientists who understand the domain and its challenges.

Take genomics as an example. While the sequencing of the first human genome took over a decade to complete and cost over $2.5 billion, recent technological breakthroughs have made sequencing substantially more affordable, transforming the fields of genomics and medicine in the process. Today, the once-unimaginable $1,000 genome is a reality. Genome sequencing technologies are routinely applied to monitor disease progression, evaluate drug efficacy, and develop personalized treatment plans. In addition, genomic tools have been widely adopted throughout biomedical research and are pushing the boundaries of scientific discovery. If you are actively applying genomic tools in your research, I strongly recommend that you work out the mechanisms underlying the algorithms and analyses you use, because doing so will teach you concepts from computer science and statistics. I was in a similar situation not too long ago.

I was a PhD student pursuing a degree in biochemistry at the University of Illinois at Urbana-Champaign, but I was especially interested in the interface between computer science and biology. With the goal of finding predictive biomarkers for cancers and potential drug candidates, I spent much of my time applying genomic methods to guide my biomedical research. Because my laboratory focused on wet lab experiments, it required a great deal of independent effort to move this computational work forward. While it was challenging and enjoyable to carry out this work on my own, it was a bit of a struggle to gain a solid understanding of genomics algorithms coming from a biochemistry background. To fill this gap, I took many graduate-level courses in statistics, computer science, and bioinformatics, including probability, statistical learning, algorithms, database systems, distributed systems, artificial intelligence, and machine learning. I also used online resources to learn computer science fundamentals, such as programming in C++ and Java, data structures, and systems programming. Then I stumbled upon data science. Since data science is such a nascent field, I relied heavily on the “Preparing for the Transition to Data Science” blog post to firm up my technical skills. In addition, I spent my spare time working on data science side projects to sharpen my Python skills and build up my portfolio.

But is this enough? As I dug deeper into the technical skills for data science, I came to realize that successful data science candidates do a few extra things. First, they understand how to translate their domain knowledge into a business context. Second, they spend time figuring out their career objectives by interacting with leaders in data science at every opportunity. Finally, they prepare for job interviews with a balance of technical and non-technical subjects. Although the technical bar may vary widely, the core areas of focus are consistent: companies want to assess your coding skills, statistical knowledge, business intuition, and ability to communicate effectively. These interviews can be intimidating if you have never interviewed outside of academia.

Insight alumni, now data scientists in industry, meeting at a conference

As a graduate student or academic professional, you may be asking yourself: How can I identify leaders in the data science field? Who should I connect with to learn more about a career in data science? What are the best strategies to prepare for interviews? What tools do I need to learn? As I learned from my own experience, preparing for interviews, finding people to talk to, and identifying important industry players are tough to navigate alone. I was eager to find a program that could ease my transition into the data science industry without breaking the bank, so I joined the Insight Health Data Program in Boston.

Insight is a full-time, tuition-free Fellowship program that helps PhDs and postdocs with a quantitative background make the transition from academic research to data science roles in industry. Throughout the program, I gained hands-on experience using cutting-edge machine learning techniques and industry-relevant tools to create data products that align with business needs. Additionally, I developed my project under the supervision of industry mentors and was exposed to the pace and environment of the non-academic world. It was a valuable experience as my very first step into industry, and by far the best decision I have made in my career development.