It may be the suppressed psychologist in me trying to get out (which sounds like a great case study for any shrink…), but I’ve always found the “showing off”-ness of Instagram intriguing.

Off the bat, this is something I will admit to being guilty of just last week whilst skiing…doesn’t my life look perfect? [sic]

In reality, of the 6 possible days on the slopes, we only managed to get about 3 decent sessions in due to the awful weather and temperatures, but that side of one’s life doesn’t always make it to social media, does it? (Well, unless I look somewhat rugged for the first time in my life.)

As most people do nowadays, we spend any free moment on our phones, checking Facebook for the latest baby picture, Twitter for the most recent gaffe by Trump, or Instagram for another dose of the filtered-to-the-extreme life of perfection of those we follow. It was during one of these moments not long ago that I found myself scrolling through the #luxury hashtag, where I was met with a vast array of high-end cars, watches, yachts, private jets and homes that I would never be able to afford in a million years.

Along with each of those posts comes a vast array of hashtags which, I think it is safe to say, cover every possible aspect of the image and many fictitious ones too. This got me thinking: what combination of tags is the most effective at getting those precious likes from the #luxury community and, more interestingly, how can I categorise the contents of these images at scale to find the best possible image subject to reach the high echelons of Instagram Royalty (or at least more than a dozen likes)?

Time to write some code

My background is in code — been doing it since I was about 6 when it was all in Basic on a Windows 3.1 machine. My style is very much of the sort where you hack around with stuff until it works, so please try and resist flaming me for doing this in Node and for any lack of style and convention. I just want to build stuff!

The first port of call was to connect up to Instagram and start pulling out all posts with our target hashtag attached. Thanks to the excellent instagram-node-lib module this was easy. Just stick in your API keys, request the recent posts for the tag, and you are done.

Great, now I needed to put it somewhere. For ease of implementation, I just dumped it out into a MongoDB collection (yay JSON everywhere).

Wrap this up to run every 30 seconds and very quickly we have a dataset to work with.
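A minimal sketch of that fetch-and-store loop, written under assumptions rather than copied from my actual code: `fetchRecent` is a hypothetical function standing in for the Instagram client call, and `store` is anything with `has` and `set` (a `Map` here, where the real thing was a MongoDB collection). Posts are keyed by `id` so that a post seen in two consecutive polls is only saved once.

```javascript
// Sketch only: fetchRecent and store are hypothetical stand-ins for the
// Instagram client call and the MongoDB collection.

// Fetch the latest posts for a tag and save any we have not seen before.
// Resolves to the number of newly stored posts.
async function pollOnce(fetchRecent, store, tag) {
  const posts = await fetchRecent(tag);
  let added = 0;
  for (const post of posts) {
    if (!store.has(post.id)) {
      store.set(post.id, post);
      added += 1;
    }
  }
  return added;
}

// Run pollOnce straight away, then again every intervalMs (30 seconds by default).
// Returns the timer so the caller can stop it with clearInterval.
function startPolling(fetchRecent, store, tag, intervalMs = 30000) {
  const run = () =>
    pollOnce(fetchRecent, store, tag).catch((err) => {
      // A failed poll is logged and skipped; the next tick tries again.
      console.error('poll failed:', err.message);
    });
  run();
  return setInterval(run, intervalMs);
}
```

Calling `startPolling(myFetch, new Map(), 'luxury')` would kick things off, where `myFetch` wraps whichever Instagram client you use.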

From here some simple analysis could be done just looking at the text descriptions, tags and geo data, but that is a bit boring. Let’s focus on the images themselves.

Image Analysis with Google Vision API

Google have just released the beta of their Vision API, which is part of the wider Google Cloud Platform. This service is incredible. Give it an image and it will return details about it. You can get face detection (along with emotion detection), logo detection, landmark detection (if there is an image of the Empire State Building then it returns New York), OCR run on any text, and a few other options. The most interesting one though is the Label Annotation API. This will take an image and return a set of labels for the things it has found in it.
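For anyone curious what a label request looks like, here is a rough sketch of the JSON body as I understand the public REST format (an `images:annotate` call with a `LABEL_DETECTION` feature). In practice the node-cloud-vision-api module builds this for you; the helper name `buildLabelRequest` is made up for illustration, and the beta endpoint may differ in detail.

```javascript
// Build the JSON body for a Vision API label request.
// imageBase64: the image file's bytes, base64 encoded.
// maxResults: the most labels we want back for the image.
// (buildLabelRequest is a hypothetical helper, not part of any library.)
function buildLabelRequest(imageBase64, maxResults = 10) {
  return {
    requests: [
      {
        image: { content: imageBase64 },
        features: [{ type: 'LABEL_DETECTION', maxResults }],
      },
    ],
  };
}
```

Given the raw bytes in a Node `Buffer`, something like `buildLabelRequest(buffer.toString('base64'))` gives you the body to POST along with your API key.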

As an example, here is an image of a car:

When run through the Label Annotation API, it returns JSON which contains not only the labels, but also a confidence score for each. The most insane thing here is that it managed to detect the style, make and model (I think that is the correct model anyway?) of the car. Mind = Blown.

After writing only a few dozen lines of code (thanks to the node-cloud-vision-api module), I had this analysis being run on every Instagram image I was grabbing and the results being written into the data before being persisted into Mongo. Easy!
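The merge step can be sketched roughly like this. It assumes the response for one image carries a `labelAnnotations` array whose entries have `description` and `score` fields, which is how the public API documents it. The `labels` field name on the post and the 0.5 score cut-off are my own illustrative choices, not anything the API requires.

```javascript
// Pull { description, score } pairs out of one Vision API image response,
// dropping anything below minScore. Returns [] when no labels came back.
function extractLabels(response, minScore = 0.5) {
  const annotations = (response && response.labelAnnotations) || [];
  return annotations
    .filter((a) => a.score >= minScore)
    .map((a) => ({ description: a.description, score: a.score }));
}

// Return a copy of the post with the labels attached, ready to be saved.
// The "labels" field name is an assumption made for this sketch.
function attachLabels(post, response, minScore) {
  return { ...post, labels: extractLabels(response, minScore) };
}
```

The original post object is left untouched, so a failed or empty analysis just produces a post with an empty `labels` array rather than breaking the save.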

Also worth noting that this API is really cheap so this system is only costing me pennies.

The Dataset

My code is still running, gathering posts and running the analysis on them, and is currently at around 2000 individual images. I will keep on running this until I have a decent dataset to start doing some analysis.

Also, Mongo isn’t quite the right sort of system for the analysis I want to do, so I will be evolving the system to push this data into Google BigQuery. Its support for repeated nested fields is perfect for the multiple tags and labels that come out of the APIs.
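To make the repeated nested fields point concrete, here is a sketch of what a table schema and a row might look like. This is a plan rather than something I have built yet: the column names, and the assumption that a stored post has `id`, `likes.count`, `tags` and a `labels` array, are illustrative only.

```javascript
// Hypothetical BigQuery table schema. tags and labels are REPEATED, so a
// single row holds a whole post with all of its tags and image labels.
const postSchema = [
  { name: 'id', type: 'STRING', mode: 'REQUIRED' },
  { name: 'likes', type: 'INTEGER', mode: 'NULLABLE' },
  { name: 'tags', type: 'STRING', mode: 'REPEATED' },
  {
    name: 'labels',
    type: 'RECORD',
    mode: 'REPEATED',
    fields: [
      { name: 'description', type: 'STRING', mode: 'NULLABLE' },
      { name: 'score', type: 'FLOAT', mode: 'NULLABLE' },
    ],
  },
];

// Map a stored post (field names assumed) onto a row matching postSchema.
function toRow(post) {
  const hasLikes = post.likes && typeof post.likes.count === 'number';
  return {
    id: String(post.id),
    likes: hasLikes ? post.likes.count : null,
    tags: post.tags || [],
    labels: (post.labels || []).map((l) => ({
      description: l.description,
      score: l.score,
    })),
  };
}
```

With the data in this shape, questions like "which label shows up most often on posts with over 100 likes" become a single query over the repeated `labels` field instead of a pile of application code.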

Next Steps

Now that I have this dataset locally, it is time to do something useful with it. I have some experience with large scale data analysis and visualisation so I am planning to pull out some insight from the data which I will share in the coming weeks.

If you are interested in being involved, feel free to reach out via the comments, @alexolivier on Twitter, or alex@alexolivier.me on that old-fashioned email thing.