Tonight we are hosting a talk at OpenDNS by Raja Iqbal (https://www.linkedin.com/in/rajaiqbal) on machine learning and the new Microsoft Azure Machine Learning system (https://azure.microsoft.com/en-us/services/machine-learning/). Because of Raja's availability, this is an out-of-cycle event: a single talk rather than a full OpenLate meetup.



6:30PM: Doors Open



7:00PM: Talk Begins



About the Talk:



Overview:



Feature engineering is the process of visualizing and exploring data to find, and sometimes create, useful features from existing data. It deserves a large share of the time spent on a modeling project: once useful features are present, any off-the-shelf predictive modeling algorithm can provide decent performance in most cases.



In this talk, we will go through the steps involved in building a predictive model for a classification problem. Below is an overview of the talk:



Data Set:



We will be using the Titanic data set for this tutorial. Details here: https://www.kaggle.com/c/titanic-gettingStarted



Tools:



We will be using R and Azure Machine Learning Studio for this tutorial.



Exploration and Visualization:



• Getting familiar: Sampling and eyeballing data



• Understanding class distribution: Pie charts in R



• Understanding feature values and distribution: Box-and-whisker plots, histograms, density plots, violin plots, and scatter plots in R



• Feature processing: Missing values, creating more features, reducing dimensionality
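The exploration and feature-processing steps listed above can be sketched in a few lines. This is not the speaker's material: the talk uses R, while this sketch uses Python with only the standard library so that it runs anywhere. The column names (Survived, Age, SibSp, Parch) are assumed to match the Kaggle Titanic data set, the rows are made up for illustration, and the helper names and the "FamilySize" feature are hypothetical choices.

```python
from collections import Counter
from statistics import median

# Hypothetical rows shaped like the Titanic columns Survived, Age, SibSp, Parch.
# The values are invented for illustration; they are not real passengers.
rows = [
    {"Survived": 1, "Age": 29.0, "SibSp": 0, "Parch": 0},
    {"Survived": 0, "Age": None, "SibSp": 1, "Parch": 0},
    {"Survived": 0, "Age": 40.0, "SibSp": 1, "Parch": 2},
    {"Survived": 1, "Age": 4.0, "SibSp": 0, "Parch": 1},
    {"Survived": 0, "Age": None, "SibSp": 0, "Parch": 0},
]


def class_distribution(rows, label="Survived"):
    """Return the fraction of rows in each class (what a pie chart would show)."""
    counts = Counter(r[label] for r in rows)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def impute_median(rows, column):
    """Fill missing values in `column` with the median of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = median(known)
    # Build new dicts so the input rows are left unchanged.
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def add_family_size(rows):
    """Create a new feature: siblings/spouses + parents/children + the passenger."""
    return [{**r, "FamilySize": r["SibSp"] + r["Parch"] + 1} for r in rows]


processed = add_family_size(impute_median(rows, "Age"))
for row in processed:
    print(row)
print(class_distribution(rows))
```

Median imputation is only one simple way to handle missing values; it is used here because it is easy to follow, not because the talk prescribes it.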



Building A Predictive Model:



We will build a predictive model using the random forest R package. We will look at training error, variable importance, and several metrics for evaluating a classifier.
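As a rough illustration of what classifier evaluation metrics measure, the sketch below computes accuracy, precision, recall, and F1 from actual and predicted labels. It is an assumption-laden example, not the talk's code: it is written in Python rather than R, the function names are hypothetical, and the label lists are invented rather than produced by a trained random forest.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels for illustration: 1 = survived, 0 = did not survive.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = evaluate(actual, predicted)
print(metrics)
```

Accuracy alone can mislead when one class dominates, which is why precision and recall are usually reported alongside it.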



Azure Machine Learning Studio Demo:



Finally, we will show how the whole workflow can be built in Azure Machine Learning Studio.