Some time ago, I tried to scrape every Bay Area profile off LinkedIn, and kept at it until the site blocked my entire office network (lesson learned: use a proxy). This was Bad because we were (and still are!) hiring.

The goal was to collect enough data to create a set of classifiers that could estimate a person’s salary from their LinkedIn profile.

I decomposed the LinkedIn profiles using Latent Semantic Indexing and mapped each one to a salary estimate based on the user’s current job title. The salary figures came from GlassDoor, where I scraped all the Bay Area salary information.
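To make the title-to-salary mapping concrete, here is a minimal sketch of how a profile could be labeled from its current job title. This is not the actual GlassBowl code: the names `SALARY_BY_TITLE`, `normalize_title`, and `label_profile` are made up for illustration, and the salary figures are placeholders rather than real GlassDoor numbers.

```python
import re

# Hypothetical title -> salary table. The figures are placeholders,
# not real GlassDoor data.
SALARY_BY_TITLE = {
    "software engineer": 120_000,
    "data scientist": 130_000,
    "product manager": 125_000,
}


def normalize_title(title: str) -> str:
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def label_profile(current_title: str):
    """Return the salary label for a profile's current title, or None if unknown."""
    return SALARY_BY_TITLE.get(normalize_title(current_title))
```

A profile whose title is `"Data-Scientist"` normalizes to `"data scientist"` and picks up that entry's salary; a title that is not in the table comes back as `None` and would simply go unlabeled.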

Now when we encounter a new profile, we can perform a similarity query, find the nearest matching profiles, and return their salaries.
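The similarity query can be sketched roughly like this, assuming every profile has already been reduced to a short topic vector. Cosine similarity and averaging the top `k` salaries are my assumptions for the sketch, and `INDEXED`, `cosine`, and `estimate_salary` are hypothetical names with placeholder data, not the real index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_salary(query, indexed, k=2):
    """Average the salaries of the k indexed profiles most similar to `query`.

    `indexed` is a list of (topic_vector, salary) pairs.
    """
    if not indexed:
        raise ValueError("no indexed profiles to compare against")
    ranked = sorted(indexed, key=lambda item: cosine(query, item[0]), reverse=True)
    nearest = ranked[:k]
    return sum(salary for _, salary in nearest) / len(nearest)


# Placeholder index: two-dimensional topic vectors with made-up salaries.
INDEXED = [
    ([0.9, 0.1], 120_000),
    ([0.8, 0.3], 130_000),
    ([0.1, 0.9], 90_000),
]
```

Calling `estimate_salary([1.0, 0.2], INDEXED, k=2)` picks the first two profiles as the nearest matches and averages their salaries. A linear scan like this is fine for a toy index; a real one would want something faster.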

Previously this was all done using Python libraries, which made it too slow for public consumption. I finally got around to rewriting it all using Google’s TensorFlow libraries. The only remaining speed bump is the roundabout way I pull a user’s LinkedIn profile.

Here it is, go play with it.

I’ll write more about TensorFlow some other day, but for now I need to spend less time on this and more time on stuff that won’t get me fired.

Many thanks to Aronima, TingTing, and Wenjie. GlassBowl would not have happened without them.
