TrumpWorld: Making a Knowledge Graph

Putting the TrumpWorld data into GRAKN.AI

This blog post refers to the Grakn Visualiser, which has been deprecated in favour of the new Workbase since the post was first written.

This post is a write-up of work by my esteemed colleague here at Grakn Labs, Michelangelo Bucci, who uses GRAKN.AI to analyse a dataset and find new connections in it. He was inspired by a recent post on Buzzfeed News, which published the most exhaustive list to date of U.S. President Donald Trump’s business interests. In the article, the reporters ask the public to use their data to find connections and give context on the Trump family’s business ventures. The aim is to offer potential insight into how this unprecedented intermingling of a president’s business empire with the executive branch of government may affect public policy.

At the time of writing, the dataset contains:

303 people, including notables such as former President of France, Nicolas Sarkozy, billionaire-philanthropist George Soros, and tech titan Peter Thiel.

1611 organisations.

2347 connections (i.e. relations among either people, organisations or a combination thereof).

Michelangelo Bucci has made a video to illustrate how to work with the data using GRAKN.AI. This article and the video work together as a pair to describe how you, too, can load up a graph and find your own Trumpian connections. We explain how to get set up and analyse the data, using the Grakn visualiser to walk through some queries. If something isn’t clear from the article, the video will show it from a different perspective, and vice versa.

What is GRAKN.AI?

GRAKN.AI is the database for AI. It is a deductive database in the form of a knowledge graph that uses machine reasoning to simplify data processing challenges for AI applications.

Install GRAKN.AI

GRAKN.AI is supported on Mac OS X or Linux. The first thing you need to do is download and install it. In this article, we are working with GRAKN.AI 0.11.

Prerequisites: GRAKN.AI requires Java 8 (Standard Edition) with the $JAVA_HOME set accordingly. If you don’t already have this installed, you can find it here.
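
Before starting Grakn, it is worth confirming that $JAVA_HOME really points at a Java install. Here is a minimal sketch of such a check. The function name check_java_home is hypothetical (it is not part of Grakn), and it only tests that a java executable exists under $JAVA_HOME/bin, not that it is version 8:

```shell
# Report whether JAVA_HOME points at a directory containing bin/java.
# check_java_home is a hypothetical helper, not something shipped with Grakn.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No java executable under $JAVA_HOME/bin"
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}
```

If the check passes, running "$JAVA_HOME/bin/java" -version prints the version, so you can confirm that it is Java 8.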

Unzip the download into your preferred location and run the following in the terminal to start the Grakn engine:

cd [your Grakn install directory]

./bin/grakn.sh start

This will start:

an instance of Cassandra, which serves as the supported backend for Grakn.

Grakn engine, which is an HTTP server providing batch loading, monitoring and the browser dashboard.

Apache Kafka.

Apache Zookeeper.

Grakn engine is configured by default to use port 4567, but this can be changed, as can settings for Kafka and Zookeeper, in the grakn-engine.properties file, found within the /conf directory of the installation.
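
Since the engine is an HTTP server, one way to tell when it is ready is to poll its port until something answers. The following is a rough sketch, assuming curl is installed and that the engine answers plain HTTP requests on the default port. The function name wait_for_engine and its arguments are hypothetical and not part of Grakn:

```shell
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_engine [url] [attempts] [delay-seconds]
# wait_for_engine is a hypothetical helper, not something shipped with Grakn.
wait_for_engine() {
  url="${1:-http://localhost:4567/}"   # default engine address used in this article
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silences progress output; -o /dev/null discards the response body.
    if curl -s -o /dev/null "$url"; then
      echo "Engine is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No answer from $url after $attempts attempts"
  return 1
}
```

If you change the port in grakn-engine.properties, pass the new address as the first argument.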

Useful commands

To start Grakn, run grakn.sh start.

To stop Grakn, run grakn.sh stop.

To remove all graphs from Grakn, run grakn.sh clean.

If you are having trouble with the setup or have any questions, please ask them on our discussion forum, on Stack Overflow or on our Slack channel.

Load the TrumpWorld Data

Having started Grakn, the next step is to load in the data. We have a fork of the Buzzfeed News dataset, to which we have added the Grakn ontology and two files: one containing the entities (people and organisations) and one containing the relations (relationships between the entities).

First, we will load the ontology. In the terminal, from the Trumpworld directory:

<relative-path-to-grakn>/bin/graql.sh -f ./GRAKN/ontology.gql

Then the entities:

<relative-path-to-grakn>/bin/graql.sh -f ./GRAKN/entities.gql

Outputs:

...

$x id "41537696" isa person; $y id "41525360" isa person;

$x id "41230368" isa person; $y id "41562208" isa person;

$x id "2281560" isa organisation; $y id "2551896" isa organisation;

...

$x id "44724256" isa organisation; $y id "1716312" isa organisation;

And the relations (we use the -b flag to batch load these):

<relative-path-to-grakn>/bin/graql.sh -b ./GRAKN/relations.gql

Outputs:

...

$x id "47108208" isa organisation; $y id "397440" isa person;

$x id "2748424" isa organisation; $y id "40980512" isa person;

$x id "2748424" isa organisation; $y id "405512" isa person;

$x id "43343984" isa organisation; $y id "41238688" isa person;

...

The last step may take a few minutes, but once it is complete and the terminal prompt returns, the graph is fully loaded and ready to use. Time to fire up the Grakn visualiser by visiting http://localhost:4567/ in your web browser.
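
If you expect to reload the data more than once, the three loading steps above can be collected into a single shell function. This is only a sketch: the function name load_trumpworld and the GRAKN_DIR variable are hypothetical, and it assumes you run it from the Trumpworld directory so that the ./GRAKN paths resolve. It uses only the graql.sh invocations shown earlier:

```shell
# Load the ontology, then the entities, then the relations, stopping at the first failure.
# GRAKN_DIR is a hypothetical variable holding the path to your Grakn install directory.
load_trumpworld() {
  graql_bin="${GRAKN_DIR:?set GRAKN_DIR to your Grakn install directory}/bin/graql.sh"
  "$graql_bin" -f ./GRAKN/ontology.gql &&
  "$graql_bin" -f ./GRAKN/entities.gql &&
  "$graql_bin" -b ./GRAKN/relations.gql   # -b batch loads the relations
}
```

Chaining the commands with && preserves the order the data needs (the ontology first, then the entities, then the relations) and avoids batch loading relations if an earlier step failed.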