Taking Advantage of Tutorials

FloydHub has built some incredible tutorials that make it easier than ever to implement machine learning models for a range of use cases. In June 2018, they released a blog post, "Generating Commencement Speeches with Markov Chains." I had seen projects like this before, but implementing them usually required a lot of development setup. (That said, you don't need FloydHub to use this code, but they do make it easy.)

Tutorials are a great jumping-off point for projects that you might have a difficult time tackling from scratch. In fact, many software engineers will tell you that the most valuable skill you can have when writing code is knowing what to search for. Depending on the project, it might make sense to retrofit existing code rather than start from zero.

Collecting Data / Scraping Instagram Captions

First, I'd like to address the approach I've taken in this project. I tackled it with a hacker's mentality more than a software engineer's; the code in this project is not elegant.

To collect data for this project, I decided to match the format of the dataset in the tutorial so that I didn't have to figure out how to adapt the code later on. This isn't a choice that everyone would make, but it's what I did. In that dataset, each text 'item' is stored in its own text file (.txt). I scraped captions from Instagram by running some really basic JavaScript in the console. The console is a developer tool that lets you write scripts against the page and quickly test whether they're working or, in this case, use them to pull information off the page.
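As a rough illustration of the one-file-per-caption format described above, here is a minimal sketch that runs in Node.js rather than in the browser console. It is not the code I used; the function name writeCaptionFiles, the caption_0001.txt naming scheme, and the output directory are all made up for the example.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: save each caption string as its own .txt file,
// mirroring the "one text item per file" layout described in the post.
function writeCaptionFiles(captions, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  const written = [];
  for (const caption of captions) {
    const text = caption.trim();
    if (text === "") continue; // skip posts with no caption
    // The file naming scheme is an assumption, not taken from the tutorial.
    const name = `caption_${String(written.length + 1).padStart(4, "0")}.txt`;
    const filePath = path.join(outDir, name);
    fs.writeFileSync(filePath, text + "\n", "utf8");
    written.push(filePath);
  }
  return written; // paths of the files that were created
}

// Example call (the directory name is arbitrary):
// writeCaptionFiles(["First caption", "Second caption"], "./captions");
```

Keeping each caption in a separate file means the tutorial's data-loading code can be pointed at the new folder without changes, which was the whole reason for matching the format.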

Using Google Chrome as my browser, I navigated to the page I wanted to pull captions from and opened up the Developer Tools. Then I opened the Network tab and began to scroll, which loaded new data on the page (Instagram used continuous scroll).
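The responses that show up in the Network tab while scrolling are typically JSON. As a hedged sketch of how captions could be pulled out of one copied response, the snippet below walks a parsed object and collects every string stored under a given key. The key name "text" and the sample response shape are assumptions for illustration only; check the real response in the Network tab before relying on either.

```javascript
// Recursively collect every string stored under `key` anywhere in a parsed JSON value.
function collectByKey(value, key, found = []) {
  if (Array.isArray(value)) {
    for (const item of value) collectByKey(item, key, found);
  } else if (value !== null && typeof value === "object") {
    for (const [k, v] of Object.entries(value)) {
      if (k === key && typeof v === "string") {
        found.push(v);
      } else {
        collectByKey(v, key, found); // keep searching nested objects and arrays
      }
    }
  }
  return found;
}

// Made-up response shape, NOT Instagram's actual format.
const sampleResponse = JSON.parse(
  '{"items":[{"caption":{"text":"Sunset at the pier"}},' +
  '{"caption":{"text":"Coffee first"}},{"caption":null}]}'
);

const captions = collectByKey(sampleResponse, "text");
console.log(captions);
```

Because the function only uses plain JavaScript, it can be pasted into the browser console or run in Node.js. Searching by key name rather than by a fixed path is a deliberate shortcut: it is less precise, but it keeps working if the nesting around the captions changes.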