My Why

What I wanted from this program was to gain a deeper understanding of the data analysis process, further my knowledge of analytics, and truly grasp the underlying mathematical principles. Oh, and I also wanted to learn to program in Python. What sold me on Udacity over a traditional MOOC were its mentor services and the opportunity to get your hands dirty working on real-world projects.

Since then, mentor services have been replaced with a “Study Group” community that is moderated by those same mentors. The vibe is similar to a Slack workspace and can be kind of noisy. The benefit, however, is that most students come with their own unique background and “expertise” and are willing to help others, even with troubleshooting. Project and Code Reviews are still there, though, and don’t seem to be going away.

The Format

Udacity split the content into two terms: Term 1 at 499 USD and Term 2 at 699 USD. To graduate, you only needed to complete Term 2, so if you felt comfortable enough, you could go directly to Term 2. I chose to do both terms. The advertised time investment is 10 hours a week on average, but, at least for me in Term 1, that number was very optimistic. I probably spent closer to 15.

While I was wrapping up the program, Udacity switched the format to a single term at 999 USD. SQL and Python are now their own Nanodegree, Programming for Data Science. The change caused a lot of controversy, but as someone who had practically no exposure to Python (or really to any procedural language), and who watched a lot of people struggle to get through the program on time, I welcome the change. I also found it odd that Visualizing Data with Tableau was a Term 2 module, while Statistics and Probability was a Term 1 module and technically something that could be skipped. Strong mathematical foundations are important, especially if you want to get into Machine Learning!

The new curriculum:

Small primer on SQL and an Intro Project — Explore Weather Trends (from Term 1);

Intro to Data Analysis and the Investigate a Dataset project (from Term 1);

Practical Statistics and the Analyze A/B Test Results project (from Term 1);

Data Wrangling and the Wrangle and Analyze Data project (from Term 2);

Data Visualization with Python and the Communicate Data Findings project (new; replaces the R and Tableau projects).

Most of the content is presented in video snippets with built-in exercises. At the end of each module, you are tasked with completing a project that touches on the concepts you just learned. To graduate, you must successfully complete all project code reviews.

There is no real grading; instead, you resubmit each project as many times as needed until you meet all specifications. Rubrics are provided, so the criteria are more or less clear. My only complaint was that sometimes… the reviewers were quite pedantic. It was frustrating having to resubmit a project when the requested change was very minor (and fixed in two minutes). However, the feedback, even when projects met specifications, was invaluable, and probably the most worthwhile aspect of the course.

The Details

Since you technically only need to complete Term 2 to graduate, I’ll focus on the Term 2 projects. The Term 1 projects were pretty basic and, while worthwhile (especially for beginners), are not portfolio-ready as is:

Test a Perceptual Phenomenon: In this project, we use statistics to investigate the Stroop effect. This was considered the Intro project, and there are a bunch of questions that you need to answer;

Exploratory Data Analysis Using R: Using either a Udacity-provided dataset or your own, you are asked to perform exploratory data analysis, documenting all your thoughts and decisions while creating “quick and dirty” visualizations. In the final section, you are asked to highlight your main findings and the most important aspects of the data. You are expected to complete univariate, bivariate, and multivariate analyses on different variables;

Wrangle and Analyze Data: The point of this project is for you to gather, assess, and clean data related to tweets from the @WeRateDogs Twitter account. You are also asked to store, analyze, and visualize your wrangled data, and then report on both your data wrangling efforts and your analysis and visualizations;

Using Data to Tell a Story: In this project, using either a Udacity-supplied dataset or your own, you are asked to create data visualizations using Tableau. The point is to explore data visually, to find and then tell an interesting story.

Test a Perceptual Phenomenon

This is a simple project to get you used to the environment and to make sure you have the necessary understanding of statistics and probability to get through Term 2. Specifically, you need to answer questions using descriptive statistics and hypothesis testing. You don’t need any specific tools to complete this project. I finished it relatively quickly, but realized that, even though I had completed the Practical Statistics module not long before, there were a lot of gaps, and it was in my best interest to delve deeper using other resources.
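
For reference, the statistical core of the project boils down to descriptive statistics plus a dependent-samples t-test. Here is a minimal sketch of that analysis in Python; the file name and column names are assumptions for illustration, not the actual project files.

```python
# Minimal sketch of the Stroop analysis: descriptive statistics plus a
# paired t-test. File and column names are assumed for illustration.
import pandas as pd
from scipy import stats

df = pd.read_csv("stroopdata.csv")  # hypothetical file name

# Descriptive statistics for both conditions
print(df[["Congruent", "Incongruent"]].describe())

# Paired (dependent) t-test, since each participant completes both tasks
t_stat, p_value = stats.ttest_rel(df["Congruent"], df["Incongruent"])
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```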

Exploratory Data Analysis Using R

I chose to use the Prosper Loan dataset provided by Udacity. There were a lot more variables than I was used to, so exploring the data was slow, and a lot ended up being omitted from my final analysis. I was initially very resistant to learning R, so I had a hard time getting through this module. As a beginner, I felt like I had finally grasped Python and how to program in a procedural language (as opposed to a declarative one like SQL). Being asked to learn another language with not-so-easy syntax so soon made me feel very resentful. However, you cannot deny how beautiful ggplot2 graphics are, and how easy time series and binning are in R compared to Python. I am thankful that I was forced to do this, even if I made a big stink about it.
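
For a sense of the comparison, here is roughly what binning and monthly resampling look like on the Python side with pandas. The loan data below is made up purely for illustration; nothing here comes from the actual Prosper dataset.

```python
# Binning and time-series resampling in pandas, on made-up data,
# purely to illustrate the comparison with R.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
loans = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=365, freq="D"),
    "amount": rng.integers(1_000, 35_000, size=365),
})

# Binning: cut loan amounts into labeled buckets
loans["bucket"] = pd.cut(
    loans["amount"],
    bins=[0, 5_000, 15_000, 35_000],
    labels=["small", "medium", "large"],
)

# Time series: monthly totals via resample
monthly = loans.set_index("date")["amount"].resample("MS").sum()
print(monthly.head())
```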

Wrangle and Analyze Data

This was an interesting project and by far my favorite. It was also probably the most challenging.

We were asked to gather data in three ways. (Due to changes Twitter has since made, accessing its API is more difficult, so this process may now be different.) At the time, we downloaded a CSV of tweets, with basic information, from the @WeRateDogs Twitter account. We then used the Python Requests library to retrieve a TSV file stored on one of Udacity’s servers; the TSV contained dog breed predictions for the images in the first dataset. Finally, we queried the Twitter API to download each tweet’s JSON data and extract its favorite and retweet counts.
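
A minimal sketch of those three steps, assuming the pre-change Twitter API workflow; the file names, URL, and credentials below are placeholders rather than the actual project resources.

```python
# Sketch of the three gathering steps. File names, the URL, and the
# Twitter credentials are placeholders; this reflects the old API workflow.
import pandas as pd
import requests
import tweepy

# 1. The CSV provided by Udacity, downloaded manually
archive = pd.read_csv("twitter-archive.csv")

# 2. The TSV of image predictions, fetched with Requests
response = requests.get("https://example.com/image-predictions.tsv")
with open("image-predictions.tsv", "wb") as f:
    f.write(response.content)
predictions = pd.read_csv("image-predictions.tsv", sep="\t")

# 3. The Twitter API, for favorite and retweet counts
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

status = api.get_status(archive["tweet_id"][0], tweet_mode="extended")
print(status.favorite_count, status.retweet_count)
```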

Not only did we have to assess the data programmatically (using Python), we were also required to identify eight quality issues and two tidiness issues. We then needed to clean, combine, and store the data. Some examples of issues I encountered were improperly extracted dog names and ratings, and missing data. And that was just the tip of the iceberg; there were plenty more than ten issues.
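
To give a flavor of that work, here is a hedged pandas sketch of the two example issues, continuing from the gathering sketch above. The column names follow the dataset’s conventions, but treat the specifics as illustrative rather than my actual cleaning code.

```python
# Representative assess/clean steps in pandas; details are illustrative.
import pandas as pd

# Assess programmatically: surface suspect values
archive.info()
print(archive["name"].value_counts().head(10))      # lowercase "names" like "a"
print((archive["rating_denominator"] != 10).sum())  # non-standard denominators

# Clean: re-extract ratings straight from the tweet text
ratings = archive["text"].str.extract(r"(\d+(?:\.\d+)?)/(\d+)")
archive["rating_numerator"] = pd.to_numeric(ratings[0])
archive["rating_denominator"] = pd.to_numeric(ratings[1])

# Clean: treat lowercase placeholder "names" as missing data
mask = archive["name"].str.islower().fillna(False)
archive.loc[mask, "name"] = None
```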

After analyzing and visualizing the data, we needed to create two reports — one documenting the wrangling process, and the other documenting insights with explanatory visualizations.

Just gathering the data was a lot of work (Twitter enforces rate limits per 15-minute window that you need to work around; see the sketch below), never mind the actual cleaning. But the reality is that most data is messy, and data wrangling is probably how most data analysts and data scientists spend their days anyway. This project is one of those “you get out of it what you put into it” things, and I was extremely proud of my final outcome.
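
On the rate limits specifically, tweepy can simply sleep through the 15-minute windows for you. A minimal sketch, assuming the auth object and archive DataFrame from the gathering sketch above and a hypothetical output file name:

```python
# Let tweepy sleep through rate-limit windows while pulling every
# tweet's JSON. Assumes `auth` and `archive` from the earlier sketch;
# the output file name is hypothetical.
import json
import tweepy

api = tweepy.API(auth, wait_on_rate_limit=True)

with open("tweet_json.txt", "w") as f:
    for tweet_id in archive["tweet_id"]:
        try:
            status = api.get_status(tweet_id, tweet_mode="extended")
            json.dump(status._json, f)
            f.write("\n")
        except tweepy.TweepyException:
            continue  # some tweets have been deleted; skip them
```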

Using Data to Tell a Story

I chose to do my project on Modern Human Trafficking Victims and did my best to include non-standard visualizations in my story, i.e., visualizations beyond what is offered in Tableau’s “Show Me” panel. The task was pretty straightforward, but you could go as shallow or as deep as you wanted. My favorite thing about this project was that I was finally able to unleash my creativity, and, considering this was my first exposure to Tableau, I was extremely proud of the end result.

Level of Difficulty

Term 1 was brutal. Term 2 was pretty easy. Your mileage may vary.

Term 1

I was new to programming, so not only did I have to learn Python syntax, I had to learn how to program. I thought I had a solid grasp of what it meant to conceptualize something… but, like Jon Snow, it turns out I knew nothing. I went through every single video and exercise and did not move on until I understood what was being taught. After going through the materials, I found that I still needed additional practice. Since I do well with gamification, I complemented the material with free exercises from DataCamp, DataQuest, CodingBat, and CodeWars.

In between Term 1 and Term 2, I took a break from Udacity and worked through Automate the Boring Stuff with Python and Learn Python the Hard Way. Later on, I also supplemented the Practical Statistics section with Khan Academy. At this point, I found that the videos favored breadth over depth. While frustrating at the time, I think Udacity’s objective isn’t really to go deep, but to provide students with the base level of knowledge required to be comfortable figuring things out and trying things for themselves.

Term 2

Because I had done all this additional work after Term 1, when Term 2 came around I found it easier to go straight to the projects and refer to the videos and exercises only when required. At that point, none of the data wrangling concepts were new, and I was pretty comfortable building my own web scrapers and querying APIs. Aside from working through my resistance to learning R, I found the term much easier and less time-consuming. My strategy was to skim the recaps to get the basic gist of each lesson, but the true learning came when I figured things out for myself using different resources: official documentation, tutorials, and Stack Overflow.