A few weeks ago, Moz CEO Rand Fishkin approached the MonkeyLearn team with a question that later turned into a project.

The goal was to build an online tool that would be genuinely useful to the SEO industry, and at the same time to show what can be built with machine learning using MonkeyLearn.

The idea was to build a simple web application that:

- Takes a website and a search query as input.
- Shows you the website’s keywords (using MonkeyLearn’s keyword extractor).
- Searches for the term on Google and displays the keywords that appear in the top 10 results.

Basically, SEOs can use this tool to compare their website’s keywords with those in the Google search results for a related term.

Rand presented this Keyword Comparison Extractor in his keynote at Mozcon 2015, the largest SEO conference out there, with more than 1,500 attendees and speakers from companies like Google, Buffer, Optimizely, Unbounce, Basecamp and others.

You can check the Keyword Comparison Extractor tool here: http://seo.demos.monkeylearn.com

How it works

The way it works is simple: we download the website’s HTML, then search for the input term on Google and download the HTML of the first ten results. After that, we send all of that HTML in a single batch to a MonkeyLearn pipeline, which first extracts the relevant content from each page and then extracts the relevant keywords from that content.
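The batching step can be sketched as below. This is a minimal sketch, not the app’s actual code: it assumes two hypothetical helpers passed in as functions, `fetch_html(url)`, which returns a page’s HTML, and `search_top_urls(query, n)`, which returns the URLs of the top `n` Google results (how the app actually queries Google isn’t described here). The `html_list` key and the `url`/`html` fields mirror the request payload sent to the pipeline.

```python
from typing import Callable, Dict, List


def build_payload(
    site_url: str,
    query: str,
    fetch_html: Callable[[str], str],
    search_top_urls: Callable[[str, int], List[str]],
    n_results: int = 10,
) -> Dict[str, List[Dict[str, str]]]:
    """Bundle the user's site and the top search results into one batch.

    fetch_html and search_top_urls are hypothetical helpers supplied by the
    caller; this function only assembles their outputs.
    """
    # The user's site goes first, followed by at most n_results search hits.
    urls = [site_url] + search_top_urls(query, n_results)[:n_results]
    # Every item pairs a URL with its raw HTML, so the whole batch can be
    # sent to the pipeline in a single request.
    return {"html_list": [{"url": url, "html": fetch_html(url)} for url in urls]}
```

Passing the helpers in as arguments keeps the sketch testable with stub functions and leaves the choice of HTTP client and search method open.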

In the UI, we show two tables: the website’s keywords, sorted by relevance, and the search keywords, each with its relevance and the fraction of search results in which it appeared. The search keywords are sorted by a score that is essentially a weighted product of those two values.
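The ranking of search keywords can be sketched as follows. The post doesn’t give the weights or say how per-page relevance values are combined, so this sketch assumes a keyword’s relevance is its mean relevance over the result pages that contain it, and treats the hypothetical `relevance_weight` and `fraction_weight` parameters as exponents in the weighted product.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def score_search_keywords(
    results: List[Dict[str, float]],
    relevance_weight: float = 1.0,
    fraction_weight: float = 1.0,
) -> List[Tuple[str, float, float, float]]:
    """Rank keywords found across the search results.

    `results` holds one dict per result page, mapping keyword -> relevance.
    Returns (keyword, relevance, fraction, score) tuples, best score first.
    """
    if not results:
        return []
    total_relevance: Dict[str, float] = defaultdict(float)
    pages_with_keyword: Dict[str, int] = defaultdict(int)
    for page in results:
        for keyword, relevance in page.items():
            total_relevance[keyword] += relevance
            pages_with_keyword[keyword] += 1

    rows = []
    for keyword, count in pages_with_keyword.items():
        relevance = total_relevance[keyword] / count  # mean over pages containing it
        fraction = count / len(results)               # share of results containing it
        # Weighted product; using the weights as exponents is an assumption.
        score = relevance ** relevance_weight * fraction ** fraction_weight
        rows.append((keyword, relevance, fraction, score))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

With both weights at 1.0 the score reduces to relevance times fraction, so a keyword that is highly relevant on only one page can rank below one that is moderately relevant across most of the top ten.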

Sample results comparing keywords for https://moz.com/products/pro with the search query "SEO".

You can view the full code for the application here.

The Pipeline

This was the first time I had the chance to play with MonkeyLearn’s pipelines. Basically, a pipeline sends some JSON data through a chain of classifiers and extractors, optionally with some logic that forks the flow of data through different MonkeyLearn modules.

Pipelines are specified in a JSON file, much like a simple script. Ours was very simple. The input, sent to the pipeline endpoint, is an array of JSON objects, each carrying a site’s URL and its HTML. The HTML to Text extractor pulls the relevant content out of each HTML document, and the resulting text is then sent through the keyword extractor. The output is a collection of JSON objects, each containing a site’s URL and its list of keywords, with a relevance score for every keyword.
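MonkeyLearn’s HTML to Text and keyword extractors run as hosted models, so the sketch below only imitates the shape of the pipeline locally, as an illustration. A standard-library `HTMLParser` subclass stands in for the HTML to Text step, and a naive word-frequency count (not MonkeyLearn’s algorithm) stands in for the keyword extractor. The output follows the `result`/`sites`/`keywords` structure that the client code reads; the stopword list, the `naive_keywords` and `run_local_pipeline` names, and the relevance scaling are arbitrary choices for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List

# Arbitrary, tiny stopword list for illustration only.
STOPWORDS = {"and", "are", "for", "from", "has", "have", "more", "not",
             "our", "that", "the", "this", "was", "with", "you", "your"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Stage 1 stand-in: strip markup and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def naive_keywords(text: str, top_n: int = 10) -> List[Dict]:
    """Stage 2 stand-in: rank words by frequency, scaled so the top word is 1.0."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]
    top = Counter(words).most_common(top_n)
    if not top:
        return []
    max_count = top[0][1]
    return [{"keyword": w, "relevance": round(c / max_count, 3)} for w, c in top]


def run_local_pipeline(html_list: List[Dict[str, str]]) -> Dict:
    """Chain both stages over a batch, returning one entry per site."""
    sites = [{"url": item["url"],
              "keywords": naive_keywords(html_to_text(item["html"]))}
             for item in html_list]
    return {"result": {"sites": sites}}
```

Keeping the two stages as separate functions mirrors the pipeline’s chain of modules: either stage could be swapped out without touching the other.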

Once the pipeline was created, we could use it simply by POSTing JSON data to the API endpoint at https://api.monkeylearn.com/v2/pipelines/<pipeline ID>/run/, with the authentication token in a header and the JSON data in the request body:

```python
import json
from collections import namedtuple

import requests

URL = 'https://api.monkeylearn.com/v2/pipelines/{}/run/'
TOKEN = '...'     # your MonkeyLearn API token
PIPELINE = '...'  # the ID of the pipeline

# Simple containers for the parsed results.
Keyword = namedtuple('Keyword', ['string', 'relevance'])
Site = namedtuple('Site', ['url', 'keywords'])


def download_website_text(url):
    """Minimal stand-in for the app's helper: return the raw HTML of a page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_keywords(urls):
    """Extract keywords from a list of URLs."""
    # Pair each URL with its HTML; this is the input the pipeline expects.
    payload = json.dumps({'html_list': [{'url': url, 'html': download_website_text(url)}
                                         for url in urls]})
    headers = {
        'Content-type': 'application/json',
        'Authorization': 'Token {}'.format(TOKEN),
    }
    # Send the JSON payload in the POST body.
    response = requests.post(URL.format(PIPELINE), data=payload, headers=headers)
    response.raise_for_status()

    sites = []
    for site in response.json()['result']['sites']:
        keywords = [Keyword(string=kw['keyword'], relevance=float(kw['relevance']))
                    for kw in site['keywords']]
        sites.append(Site(url=site['url'], keywords=keywords))
    return sites
```

We wrote a simple web interface to the tool and deployed it here.

Final words

The conference took place, and Rand revealed this machine-learning-powered SEO tool during his Mozcon keynote. Usage exploded during his presentation, and we’re happy to report that the server held up: none of the workers tipped over or bailed on a request.

This tool is now used by SEOs around the world to compare how they perform on certain search queries. It’s still a prototype version, so if you have any suggestions or bug reports, please let us know.

Finally, this tool is a great example of how MonkeyLearn can power a simple but neat application with machine learning.

Any feedback or suggestions are 100% welcome!