Recently, Google Scholar updated its venue metrics. There was a bit of surprise in the computational linguistics community, as there is every year when ideas of a venue’s grandeur align imperfectly with the citation data. Let’s have a look at what happened in this field. Here’s the ranking from Google:

Google Scholar metrics for CL — 2016 edition.

Without going into too deep a discussion of the evils of bibliometrics — that impact factor is broken, that citations are not all equal, that h-indices can be gamed (as in the successful experiment that faked personal and journal scores on a large scale) — let’s take this at face value and see where it goes.

First off, we can see that arXiv has rocketed to number three. Well, that’s a surprise for a non-peer-reviewed venue — or is it? In fact, arXiv’s relevant category contains many papers that are also published in other venues or workshops; it contains many draft papers; and it contains a lot of work from other fields, where the authors have just ticked the “cs.CL” (computational linguistics) category box to market their paper to us. Good for them. In short, arXiv contains a lot of papers — and it’s probably this sheer volume that means it also has a lot of well-cited papers.

That fact — that arXiv contains versions of a lot of well-cited papers — is what drives it to the top of the h5 listing. The h5-index is like the normal h-index, which is the largest number h such that the venue has published h papers with at least h citations each; h5 is simply an h-index measured over just the past five years. So a venue that receives a lot of papers, unless its sample is somehow drawn to be uniformly bad, will end up with a good h-index. Another way to think of it: it’s the biggest square that fits under a plot of the venue’s papers ranked by citation count.

Here comes the square:

Image from Wikimedia Foundation

Found it? Good. That’s enough of that.
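To make the definition concrete, here’s a minimal sketch of an h-index calculation in Python (the function and the toy citation counts are my own illustration, not Google Scholar’s actual pipeline):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    # Rank papers by citations, descending; h is the last rank at which
    # a paper's citation count still matches or exceeds its rank --
    # the side length of the biggest square that fits under the curve.
    h = 0
    for rank, citations in enumerate(sorted(citation_counts, reverse=True), start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Toy example: five papers cited 10, 8, 5, 2 and 1 times give h = 3.
print(h_index([10, 8, 5, 2, 1]))  # -> 3
```

For h5, you’d feed in only the citations to papers published in the last five years; the returned h is exactly the side of the square above.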

Another surprise for some was to see the sun-and-beach conference, LREC, up as high as fifth. This conference has a relatively high acceptance rate, and it publishes all accepted submissions as full-length papers in its proceedings. It’s probably the largest conference in our field, attracting about 1500 delegates every other year. It’s also a perfect subject fit for many papers, and as such attracts a lot of quality work. It doesn’t have the prestige or the age of venues like the CL journal, but we regularly see great, hard-hitting papers at LREC; recent examples include the universal part-of-speech tagset, Freeling 3.0, and the DBpedia paper. But it is a huge venue, so it could benefit from the same volume effect as arXiv.

To “correct” for these unexpected rankings, the suggestion to normalise h5 scores by paper count quickly emerged on Twitter. The idea is to cancel out whatever distortion comes from a venue simply publishing a lot of papers. This normalisation in fact just gives us a relative of impact factor, since impact factor is essentially citations per paper and dividing a citation-based score by output pushes in the same direction. Impact scores are a rough, violent method for measuring quality (citations tell us nothing, etc.), and something I personally find rather vomitous — unless, of course, an article of my own lands in a high-impact journal, in which case they can’t be so bad, right? Chasing impact factors leads to nasty behaviours that take us away from useful scientific content: behaviours like demanding that authors cite a journal when submitting to it, or putting out papers with little intrinsic content whose main job is to push citation stats.
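For concreteness, here is roughly what the proposed normalisation amounts to; a sketch with hypothetical numbers (the exact divisor people floated varied, and the function name and figures here are mine):

```python
def normalised_h5(h5, paper_count):
    """h5 divided by the venue's paper output over the same five years.

    Impact factor is essentially citations per paper, so dividing a
    citation-based score by output lands us near the same territory.
    """
    return h5 / paper_count

# Hypothetical figures for illustration only:
print(normalised_h5(60, 2000))  # a huge venue            -> 0.03
print(normalised_h5(30, 150))   # a small selective venue -> 0.2
```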

Indeed, many citations don’t directly acknowledge or use the content of the paper they reference. I know this is the case for my TempEval-3 paper. It’s just as common to give a hand-wavy grounding reference to justify the existence of a problem area, to anchor readers to the literature on a topic, or to satisfy the herd-of-sheep-like motivation of “many others have looked at this problem, so we are going to as well”. Not a strong motivation — quite the opposite — but I digress! Many of the citations to highly-cited papers are like this; you might find that your own favourite work, the work people actually engage with, is cited far less, and that a citation from work that genuinely engages with your research still counts as just one citation. Personally, I found that the papers that were most scientifically satisfying to work on — validating Reichenbach’s model of tense, and our generalisation of Brown clustering — received less citation traction than my other projects.

So we could even say that focusing on citations just encourages us all to write pop lit: write something general-purpose, get it some visibility at the right time, and cross your fingers. That’s another of those nasty behaviours impact factor encourages. Luckily, most good scientists would rather do good science and disseminate it — and most bad scientists haven’t noticed the hack, or at least haven’t managed to pull it off.

Anyway, rant aside, the normalised table is below. The short name for each venue is on the left.