As I get deeper and deeper into this field we call Data Science, the following questions arise:

How did I get here? What is Data Science? What can I do to spread the word?

The answer to the first question involves a journey through the internet with guidance from Mike Orenllas and Mike Tamir. This past summer I took my part-time work with the research group I had been passively a part of to the next level. I spent my days building out an interface for our email mining app and learning how to deploy machine learning models. I started off building simple scripts to extract metrics from emails that answered questions from this podcast. The codebase had been started by Daniel Smilkov, creator of immersion.media.mit.edu. When Daniel left the project and school to join Google Research, I took over. I put aside his existing desktop app, built with node-webkit and Java programs for dealing with IMAP, and started working on a new app with Flask, Context.IO for the IMAP headache, Heroku for deployment, and Postgres for persistence.

As for the second question, Data Science has been described to me in many ways. Ryan Orban, founder of Zipfian Academy: “Data Science is like hammering a nail with a bazooka.” Cam Davidson-Pilon, author of Probabilistic Programming and Bayesian Methods for Hackers: “Data science is a bag of tricks, you need to know which trick to use.” David Gutelius, founder of The Data Guild: “A Data Scientist must fit the T-shaped mold, mixing deep domain knowledge with a wide problem-solving toolkit.” The big takeaway from my time in this space is that Data Science is not clearly defined. To me, Data Science is the purest form of modern-day philosophy: it’s all about knowing which question to ask, and then telling a story as you answer it. Once you have asked the right question, you build an understanding of the problem from the ground up, similar to the way metaphysicists like Descartes once approached their problems. Only now the process involves a lot of Googling and research paper reading.
The best talk I went to this summer was led by the Head of Data Science at Uber, Kevin Novak. Kevin spoke about how the hardest part of his job is throwing away all biases when choosing tools: different problems require different approaches, and favoring certain algorithms pigeonholes your efforts.

The last question, what I can do to spread the word, is the hardest. For now, let me discuss what I won’t do. I promise to avoid using the phrase “Big Data,” as it is associated with confusion and has become the most painful buzzword I can think of. The term “data science” is starting to fall into the same category. I also promise to refrain from unleashing a firehose of technologies upon people trying to get their feet wet. For me, Python was the best way to start: it has a coherent problem-solving ecosystem of friendly libraries and beautifully readable syntax, and Anaconda makes it easy to get up and running. Here is a more technical guide I put together with resources for doing data analysis with Python: http://goo.gl/KAs52r