$\begingroup$

Having recently graduated from my PhD program in statistics, I spent the last couple of months searching for work in the field. Almost every company I considered had a posting with the job title "Data Scientist". In fact, it felt like the days of seeing job titles such as Statistical Scientist or Statistician were long gone. Had being a data scientist really replaced being a statistician, or were the titles synonymous, I wondered?

Well, most of the qualifications for the jobs felt like things that would qualify under the title of statistician. Most jobs wanted a PhD in statistics ($\checkmark$), and most required an understanding of experimental design ($\checkmark$), linear regression and ANOVA ($\checkmark$), generalized linear models ($\checkmark$), and other multivariate methods such as PCA ($\checkmark$), as well as knowledge of a statistical computing environment such as R or SAS ($\checkmark$). It sounds like "data scientist" is really just a code name for statistician.

However, every interview I went to started with the question: "So are you familiar with machine learning algorithms?" More often than not, I found myself having to try to answer questions about big data, high performance computing, and topics such as neural networks, CART, support vector machines, boosted trees, unsupervised models, etc. Sure, I convinced myself that these were all statistical questions at heart, but at the end of every interview I couldn't help but leave feeling like I knew less and less about what a data scientist is.

I am a statistician, but am I a data scientist? I work on scientific problems, so I must be a scientist! And I also work with data, so I must be a data scientist! And according to Wikipedia, most academics would agree with me (https://en.wikipedia.org/wiki/Data_science, etc.):

Although use of the term "data science" has exploded in business environments, many academics and journalists see no distinction between data science and statistics.

But if I am going on all these job interviews for a data scientist position, why does it feel like they are never asking me statistical questions?

Well, after my last interview I did what any good scientist would do and sought out data to solve this problem (hey, I am a data scientist after all). However, countless Google searches later, I ended up right where I started, feeling as if I was once again grappling with the definition of a data scientist. I didn't know exactly what a data scientist was, since there were so many definitions (http://blog.udacity.com/2014/11/data-science-job-skills.html, http://www-01.ibm.com/software/data/infosphere/data-scientist/), but it seemed like everyone was telling me I wanted to be one:

Well, at the end of the day, what I figured out was that "what is a data scientist?" is a very hard question to answer. Heck, there were two entire months in Amstat where they devoted time to trying to answer this question:

Well, for now I have to be a sexy statistician to be a data scientist, but hopefully the Cross Validated community might be able to shed some light and help me understand what it means to be a data scientist. Aren't all statisticians data scientists?

(Edit/Update)

I thought this might spice up the conversation. I just received an email from the American Statistical Association about a job posting with Microsoft looking for a Data Scientist. Here is the link: Data Scientist Position. I think this is interesting because the description of the role hits on a lot of the specific traits we have been talking about, but I think many of them require a very rigorous background in statistics, which contradicts many of the answers posted below. In case the link goes dead, here are the qualities Microsoft seeks in a data scientist: