If you are interested in this data set you might like my latest post where I use it to make book recommendations.

This one came about because I was searching for a data set on horror films (don’t ask) and ended up with one describing the links between philosophers.

To cut a long story very short I’ve extracted the information in the influenced by section for every philosopher on Wikipedia and used it to construct a network which I’ve then visualised using gephi

It’s an easy process to repeat. It could be done for any area within Wikipedia where the information forms a network. I chose philosophy because firstly the influences section is very well maintained and secondly I know a little bit about it. At the bottom of this post I’ve described how I got there.

First I’ll show why I think it’s worked as a visualisation. Here’s the whole graph.

Each philosopher is a node in the network and the lines between them (or edges in the terminology of graph theory) represents lines of influence. The node and text are sized according to the number of connections (both in and out). The algorithm that visualises the graph also tends to put the better connected nodes in the centre of the diagram so we see the most influential philosophers, in large text, clustered in the centre. It all seems about right with the major figures in the western philosophical tradition taking the centre stage. (I need to also add the direction of influence with a arrow head – something I’ve not got round to yet.) A shortcoming however is that this evaluation only takes into account direct lines of influence. Indirect influence via another person in the network does not enter into it. This probably explains why Descartes is smaller than you’d think. It would also be better if the nodes were sized only by the number of outward connections although I think overall the differences would be slight. I’ll get round to that.

It gets more interesting when we use Gephi to identify communities (or modules) within the network. Roughly speaking it identifies groups of nodes which are more connected with each other than with nodes in other groups. Philosophy has many traditions and schools so a good test would be whether the algorithm picks them out.

It has been fairly successful. Below we can see the so called continental tradition picked out in green, stemming from Hegel and Nietzsche, leading into Heidegger and Sartre and ending in the isms of the twentieth century. It’s interesting that there is separate subgroup, in purple, influenced mainly by Schopenhauer (out of shot) and Freud.

And this close up is of the analytical tradition emerging out of Frege, Russell and Wittgenstein. At the top and to the left you can see the British empirical school and the American pragmatists.

It would be interesting to play with the number of groups picked out by the algorithm. It would hopefully identify sub groups within these overarching traditions.

The graph is probably most insightful when you zoom in close. Gephi produces a vector graphic output so if you’re interested you can download it here and explore it yourself.

Now for how you do it.

The first stop is dbpedia. This is a fantastic resource which stores structured information extracted from wikepdia in a database that accessible through the web. Among other things it stores all of the information you see in an infobox on a Wikipedia page. For example I was after the influenced and influenced by fields that you find on the infobox on the page for Plato.

The next step is to extract this information. For this we need two things: a SPARQL endpoint (try snorql), which is an online interface to submit our queries and little knowledge of SPARQL a specialist language for querying the semantic web. This is a big (and exciting) area that has to do with querying information that is structured in triples (subject-relationship-object). I assume it has its roots in predicate logic so the analytical philosophers would have been pleased. However the downside is that the language itself a lot more difficult to learn than say SQL and to complicate things still further you need to know the ontological structure of the resource you are querying. I probably wouldn’t have got anywhere at all were it not for a great blog post by Bob DuCharme which is a simple guide for getting the information out of wikipedia.

In the end the query I needed was very simple. You can test it by submitting it in snorql.

SELECT * WHERE { ?p a <http://dbpedia.org/ontology/Philosopher> . ?p <http://dbpedia.org/ontology/influenced> ?influenced. }

It then needed a bit of cleaning as the punctuation was coded for URLS. For this I used the following online URL decoder.

After a bit more simple manipulation in excel I had a finished csv file that consisted of three columns



Philosopher A

Philosopher B

Weight



Each row in the data set represented a line of influence from philosopher A to philosopher B. The weight column contained a dummy entry of 1 because in our graph we do not want any one link to matter more than any other.

Gephi is the tool I used to create the visualisation of the graph. It’s both fantastic and open source. You can download it and set it up in minutes. For a quick tutorial see this link. There are many settings you can use to change the way your graph looks. I used a combination of the force atlas and the fruchterman-reingold layout algorithms. I then scaled text and node size by node degree (number of connections) and suppressed all nodes with less than four connections (it was overwhelming otherwise). The partition tool is used to create the communities. Full instructions are in the tutorial. I also found this blog entry very useful as a guide.

I hope that helps anyone who is trying to do something similar. If anyone does has a data set on horror films tagged with keywords please let me know!

Simon

Update Griff at Griff’s graphs has used the instructions above to create a fantastic visualization of the influence network of everyone on Wikipedia. It’s well worth a look.



Graphing the history of philosophy by Simon Raper is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Based on a work at drunksandlampposts.files.wordpress.com.

Permissions beyond the scope of this license may be available at http://drunks-and-lampposts.com/.