Granted, at first sight it appears interesting and useful, but as I sat down to explain what I had created, I wasn’t quite sure why anyone would want to know how Donald Trump is connected to Taylor Swift. Apparently she is supporting him. Don’t get me started.

But I digress. Since Webhose.io provides other types of entities, you can easily customize the script to visualize relationships between companies or locations. If you’d like to learn more about how the script works (which means you have some coding skills), keep reading. If not, you are more than welcome to play with the graph, and maybe you will find it useful (doubt it).

Try it for yourself

If you want to run your own experiments, just follow these steps:





Edit & run extract_entities.py

I’ve uploaded to GitHub the Python script that produces the JSON for both the list of connected persons and their respective images. To run the script you need two access tokens: one for the Webhose.io API, which you can obtain by creating a free trial account, and one for the Bing Image Search API, which is also free.





Set your Webhose.io access token on the following line:

webhose.config(token="XXXX-XXXXX-XXX-XXXX-XXXX")

and your Bing Image Search API key on the following line of code:

‘Ocp-Apim-Subscription-Key’: ‘XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX’,

The first entity the script extracts is “Hillary Clinton”, but you can change it to any other name.

I’ve set a hard limit of 100 entities to explore, but you can of course increase or decrease this limit as you wish by changing the following code:

if len(output) == 100:
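To make the role of that limit concrete, here is a minimal sketch of a breadth-first walk that stops once enough entities have been collected. It is not the code from the repository: `explore`, `fetch_co_mentions` and the "keep the five strongest connections" rule are names and choices I made up for illustration, and in the real script the fetch step would be the Webhose.io request.

```python
from collections import Counter, deque

MAX_ENTITIES = 100  # the hard limit discussed above


def explore(seed, fetch_co_mentions, max_entities=MAX_ENTITIES, top_n=5):
    """Walk outward from `seed`, stopping after `max_entities` names.

    `fetch_co_mentions` is a hypothetical callable that takes a name and
    returns a Counter of the other names mentioned in the same posts.
    """
    output = {}
    queue = deque([seed])
    while queue:
        if len(output) == max_entities:
            break  # same check as in the script
        name = queue.popleft()
        if name in output:
            continue  # already explored
        connections = Counter(fetch_co_mentions(name))
        connections.pop(name, None)  # a person is not connected to themselves
        output[name] = [n for n, _ in connections.most_common(top_n)]
        queue.extend(n for n in output[name] if n not in output)
    return output
```

Raising or lowering `max_entities` here has the same effect as editing the `100` in the line above: a bigger graph and more API requests, or a smaller one and fewer.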

The script runs multiple requests against the Webhose.io API for documents from the past 30 days. I’m using the &ts (timestamp) parameter to tell Webhose.io to return results from 30 days ago to the present. Each request returns up to 100 posts, and each post lists the entities mentioned in the article. Here is the query I’ve used:

persons:"top_person" domain_rank:<10000

Here top_person is a placeholder for the person you are looking for. The domain_rank filter tells Webhose.io to look only at sites ranked in the top 10,000 worldwide. By the way, if you want to extract other types of entities, just replace “persons” with either “organization” or “location” and count the relevant entity.
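As a rough sketch of the three pieces described above (the query string, the 30-day timestamp and the entity count), something like the following would do. The helper names are mine, not the script's. I am also assuming that ts is expressed in milliseconds since the epoch and that each post carries its entities as `post["entities"]["persons"]`, a list of objects with a `name` field; check both against the API documentation before relying on them.

```python
import time
from collections import Counter


def build_query(top_person, field="persons", max_rank=10000):
    """Build the filter query shown above for one person."""
    return '{0}:"{1}" domain_rank:<{2}'.format(field, top_person, max_rank)


def thirty_days_ago_ms(now=None):
    """Timestamp for 30 days ago (assumption: ts is in milliseconds)."""
    now = time.time() if now is None else now
    return int((now - 30 * 24 * 60 * 60) * 1000)


def count_entities(posts, key="persons"):
    """Count how often each entity name appears across the posts.

    Assumes the hypothetical layout post["entities"][key] -> [{"name": ...}].
    """
    counts = Counter()
    for post in posts:
        for entity in post.get("entities", {}).get(key, []):
            counts[entity["name"]] += 1
    return counts
```

For example, `build_query("Hillary Clinton")` produces the query string shown above with the name filled in, and the names with the highest counts are the candidates for the next round of requests.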

Read the Webhose.io tutorial and documentation to learn more about how to use the API.





I’ve used the Bing Image Search API to retrieve the faces of the mentioned persons. Note that if you want images other than faces, you need to remove the “imageContent”:“Face” entry from this line:

params = urllib.urlencode({"q": '"' + search_string + '"', "count": 10, "offset": 0, "mkt": "en-us", "size": "small", "imageType": "Photo", "imageContent": "Face"})
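If you expect to switch between faces and other images, one option is to wrap the parameters in a small helper. This is only a sketch: `image_search_params` and its `faces_only` flag are hypothetical names, and the parameter values are simply the ones from the line above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as in the original script


def image_search_params(search_string, faces_only=True):
    """Return the URL-encoded query parameters for one image search."""
    params = {
        "q": '"' + search_string + '"',  # quoted for an exact-phrase match
        "count": 10,
        "offset": 0,
        "mkt": "en-us",
        "size": "small",
        "imageType": "Photo",
    }
    if faces_only:
        params["imageContent"] = "Face"  # drop this to get non-face images
    return urlencode(params)
```

Calling `image_search_params("Apple", faces_only=False)` leaves out the face filter, which is what you want for companies or locations.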

Now all you have to do is to run the script:

# python extract_entities.py

And wait. When the script is done it will print two JSON strings: the first is the list of names and their respective connections, and the second is the list of names and their associated images.
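To give an idea of what those two strings might look like, here is a small sketch. The exact shape is my assumption (a name mapped to a list of connected names, and a name mapped to an image URL), and `to_json_outputs` is a hypothetical helper rather than something taken from the script.

```python
import json


def to_json_outputs(connections, image_urls):
    """Serialize the two dictionaries into the two JSON strings.

    connections: assumed shape {name: [connected names]}
    image_urls:  assumed shape {name: image URL}
    """
    persons_json = json.dumps(connections, sort_keys=True)
    images_json = json.dumps(image_urls, sort_keys=True)
    return persons_json, images_json


persons_json, images_json = to_json_outputs(
    {"Hillary Clinton": ["Donald Trump"]},
    {"Hillary Clinton": "http://example.com/clinton.jpg"},  # placeholder URL
)
print(persons_json)
print(images_json)
```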

The HTML

I’ve uploaded the HTML file to GitHub as well, and I’m relying on VivaGraphJS for the graphical interface, so make sure you download it and set the correct path:

Paste the persons JSON output from the Python script into:

var persons = {}

And the images JSON here:

var images = {}

That’s it – you are all set. You can play with the script, extract and plot people relationships, or change the script and extract relationships between companies or locations. If you do find a good use for this script, I would love to hear about it.





Happy scripting!

Ran Geva, CEO webhose.io









