Update: Shameless self-promotion, but the final results of this exercise have been published (open access) in the International Journal of Speleology; the supplementary material includes updated scripts to run the analysis.

I’m planning to do some meta-analysis of co-authorship in my field, which is really cool, but there is one small problem: I have no experience in co-authorship analysis, or any kind of complex network analysis! Luckily, I’ve got some pretty excellent Google-fu, and I found a heap of blogs and other sources (listed at the bottom) to work with.

I plan to pull bibliographic data from Google Scholar or Web of Science, so this code uses BibTeX (.bib) files as the data source. First I made a dummy .bib file of ‘co-authors’ from the seven Harry Potter books. HP uber fans, please forgive any overt errors. Each entry lists the first few characters from that book that I could think of, and now that I look at my final network I’m tempted to go back and do it properly. I named this file HP.bib and put it in my working directory.

```bibtex
@article{One,
  author = {Harry and Ron and Hermione and Malfoy and Hagrid},
  title = {{Get that Stone!}},
  journal = {The World of JKR},
  year = {1996}
}
@article{Two,
  author = {Harry and Ron and Hermione and Fred and George and Neville and Myrtle},
  title = {{Big Snakes are Scary!}},
  journal = {The World of JKR},
  year = {1998}
}
@article{Three,
  author = {Harry and Ron and Hermione and Sirius and Lupin and Malfoy and Wormtail},
  title = {{Big Dog! I'm Going to Diel}},
  journal = {The World of JKR},
  year = {1999}
}
@article{Four,
  author = {Harry and Ron and Hermione and and and Malfoy and Wormtail and Tonks and Voldemort and Cedric and Cho and Dumbledore},
  title = {{Kill the Spare}},
  journal = {The World of JKR},
  year = {2000}
}
@article{Five,
  author = {Harry and Ron and Hermione and Sirius and Lupin and Molly and Dumbledore and Bellatrix and Voldemort and Lucius},
  title = {{It's a Trap}},
  journal = {The World of JKR},
  year = {1999}
}
@article{Six,
  author = {Harry and Ron and Hermione and Dumbledore and Snape and Slughorn and Voldemort},
  title = {{Harry gets an A}},
  journal = {The World of JKR},
  year = {2001}
}
@article{Seven,
  author = {Harry and Ron and Hermione and Fred and George and Hagrid and Voldemort and Lucius and Malfoy and Lupin and Tonks},
  title = {{Harry Dies}},
  journal = {The World of JKR},
  year = {2003}
}
```
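It helps to see how an author field like the one in entry Four gets split into names. The sketch below is a base-R illustration under an assumption, not the bibtex package's own parser: it assumes names are separated by the word `and` with whitespace on both sides. `split_authors()` and `clean_authors()` are hypothetical helpers. On entry Four's (shortened) author list, the doubled `and and and` leaves one `and` behind as if it were a co-author, which is most likely where the stray `and` in the co-author table comes from.

```r
# Hypothetical helper (not from the bibtex package): split a BibTeX author
# field wherever the word "and" has whitespace on both sides.
split_authors <- function(field) {
  trimws(strsplit(field, "\\s+and\\s+")[[1]])
}

# Hypothetical helper: drop empty strings and stray "and" tokens that a
# doubled separator leaves behind.
clean_authors <- function(x) {
  x[nzchar(x) & x != "and"]
}

# Author field of entry Four in HP.bib, shortened
four <- "Harry and Ron and Hermione and and and Malfoy and Wormtail"

four_raw <- split_authors(four)
print(four_raw)
# "Harry" "Ron" "Hermione" "and" "Malfoy" "Wormtail"

print(clean_authors(four_raw))
# "Harry" "Ron" "Hermione" "Malfoy" "Wormtail"
```

Because the pattern needs whitespace around `and`, names such as "Sandra" or "Andy" are not split. Deleting the extra `and and` from entry Four in HP.bib would fix the problem at the source.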

The script for R is below. I tried to annotate the script as I went, but my Google powers exceed my #rstats skills, so there are cases when I’m not exactly sure why something works, just that it does work.

```r
# you may need to install the bibtex, igraph and ggplot2 packages first:
# install.packages(c("bibtex", "igraph", "ggplot2"))
library(bibtex)
library(igraph)
library(ggplot2)

setwd("C:/Users/me")

# read the .bib file in
citations <- read.bib("C:/Users/me/HP.bib")

# list of the authors of every citation
authors <- lapply(citations, function(x) x$author)

# vector of unique author names
unique.authors <- unique((unlist(authors))[grepl('family', names(unlist(authors)))])
# take a look!
unique.authors

# set up an empty co-author adjacency matrix
coauth.table <- matrix(0, nrow = length(unique.authors), ncol = length(unique.authors),
                       dimnames = list(unique.authors, unique.authors))

# fill the adjacency matrix: add 1 to every pair of authors on the same paper
for (i in seq_along(authors)) {
  auth.i <- unlist(authors[[i]])
  paper.auth <- auth.i[grepl('family', names(auth.i))]
  coauth.table[paper.auth, paper.auth] <- coauth.table[paper.auth, paper.auth] + 1
}
# keep authors with at least one paper, and zero the diagonal (no self-links)
coauth.table <- coauth.table[rowSums(coauth.table) > 0, colSums(coauth.table) > 0]
diag(coauth.table) <- 0
# inspect table
coauth.table

# remove the spurious author 'and'. Entry Four of HP.bib contains
# 'and and and', which is read as an author literally called 'and'.
# Dropping it by name (not by row number) keeps the right rows even if
# the author order changes.
keep <- rownames(coauth.table) != "and"
coauth.table <- coauth.table[keep, keep]

# mode = 'undirected' turns the symmetric matrix into one edge per pair of
# co-authors (e.g. a single Voldemort-Tonks link instead of two)
auth.graph <- graph_from_adjacency_matrix(coauth.table, mode = 'undirected', weighted = TRUE)
plot(auth.graph, vertex.label.cex = 0.8, edge.width = E(auth.graph)$weight)

# get the edge list from the adjacency graph
edge <- as_edgelist(auth.graph)
# re-load the edge list as an undirected graph (this drops the edge
# weights, so the metrics below are unweighted)
auth.net <- graph_from_data_frame(as.data.frame(edge), directed = FALSE)

# Tcl/Tk interactive network graph
tkplot(auth.net, layout = layout_with_fr)

# to make the co-authorship network analysis (CNA) less qualitative and a bit
# more quantitative, you can get numbers out of it!
# centrality measures: degree, betweenness, closeness, eigenvector centrality, coreness
metrics <- data.frame(
  deg = degree(auth.net),
  bet = betweenness(auth.net),
  clo = closeness(auth.net),
  eig = eigen_centrality(auth.net)$vector,
  cor = coreness(auth.net)
)
# print metrics
metrics

# visualise metrics. I like ggplot, but any old plotting in R is fine.
# It just won't look so nice!
ggplot(metrics, aes(x = bet, y = eig, label = rownames(metrics))) +
  xlab("Betweenness Centrality") +
  ylab("Eigenvector Centrality") +
  geom_text() +
  ggtitle("Key Players in Harry Potter")
```
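The for loop in the script adds 1 to every pair of authors on each paper. The same counts can be had without a loop: if M is a paper-by-author incidence matrix (1 when an author is on a paper), then t(M) %*% M counts the papers shared by every pair of authors. The sketch below swaps in this matrix-product technique, uses base R only, and runs on a hypothetical, shortened three-paper subset of HP.bib typed in by hand; `toy_papers`, `incidence`, `coauth`, `coauthor_count` and `joint_papers` are illustrative names, not part of the original script. It also shows the two simplest node metrics: the number of distinct co-authors (what igraph's `degree()` gives on the unweighted graph) and the total number of joint papers (a weighted degree, which igraph calls `strength()`).

```r
# Hypothetical toy data: a shortened, hand-typed subset of HP.bib,
# one character vector of author names per paper
toy_papers <- list(
  One = c("Harry", "Ron", "Hermione", "Malfoy", "Hagrid"),
  Two = c("Harry", "Ron", "Hermione", "Fred", "George"),
  Six = c("Harry", "Dumbledore", "Snape", "Voldemort")
)

author_names <- sort(unique(unlist(toy_papers)))

# Incidence matrix: one row per paper, one column per author,
# 1 if that author is on that paper, 0 otherwise
incidence <- t(sapply(toy_papers, function(p) as.integer(author_names %in% p)))
colnames(incidence) <- author_names

# crossprod(M) is t(M) %*% M: entry [a, b] counts the papers a and b share
coauth <- crossprod(incidence)
diag(coauth) <- 0  # the diagonal held each author's own paper count

# Number of distinct co-authors, and total number of joint papers
coauthor_count <- rowSums(coauth > 0)
joint_papers <- rowSums(coauth)

print(coauth["Harry", "Ron"])     # 2 (papers One and Two)
print(coauthor_count[["Harry"]])  # 9 (everyone else in the toy data)
print(coauthor_count[["Snape"]])  # 3
```

A matrix built this way should hold the same counts as `coauth.table` after the loop, so in the full script it could be passed to `graph_from_adjacency_matrix()` in its place.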

And this is the network that you get. It's definitely not the prettiest, but I couldn't be bothered screwing around with aesthetics for a dummy run; I'm just pleased that it works!

And now, to all the people whose code I unashamedly stole: thanks for hosting your scripts online! This is also why I blog my forays into coding, even though I’m not great: I’ve learned so much from other people’s code, it seems only fair to put it back out there.

References