The h-index is a way to assess the impact of a scientist's published work in terms of citations. It is an attempt to get around simpler-minded citation measures, and it works rather well in the sense that the scientists you'd expect to have high h-indices usually do, while scientists who happen to have published one or two high-impact papers but have otherwise had unremarkable careers won't score very high.

Basically, the h-index is the maximum number h such that the scientist has published h papers, each with at least h citations. That is, if I have published 10 papers that have each been cited 10 or more times, but have not published 11 papers that have each been cited 11 or more times, my h-index is 10.
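The definition translates directly into a short procedure: sort the citation counts from highest to lowest, then find the last position i at which the i-th paper still has at least i citations. Here is a minimal Python sketch of that idea; the function name h_index is just an illustrative choice, not taken from any particular library.

```python
def h_index(citations):
    """Return the largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for i, c in enumerate(ranked, start=1):
        # The i-th most-cited paper supports h = i only if it has >= i citations.
        if c < i:
            break  # counts only fall while the bar only rises, so stop here
        h = i
    return h


# Ten papers with 10+ citations each; the 11th has only 9, so h stops at 10.
print(h_index([25, 19, 15, 14, 12, 12, 11, 10, 10, 10, 9, 3]))  # 10
```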

Prompted by a Facebook wisecrack from a friend (on the occasion of Sachin falling short of his 100th international hundred), it occurred to me: how about ranking batsmen in cricket by an analogous score? That is, a batsman has an index h if, on h occasions, he has scored h or more runs. (As before, we take the maximum possible h.)

It turns out that the top five on this list (for Test cricket only) are basically the top five run-getters, in the same order: Sachin Tendulkar, Rahul Dravid, Ricky Ponting, Jacques Kallis, Brian Lara. Nearly all of the top 20 or so also appear in the top 20 by runs scored. So it's not very interesting, at least not yet. Tendulkar's h-score is 76: on 76 occasions he has scored 76 or more. There is a big gap between him and Dravid (69), but the others follow closely behind.

Suppose we modify it as follows: for a given n, the nh-index is the largest h such that on h occasions the batsman has scored nh or more runs. For example, take the 10h index: if I have scored 50 or more on 5 occasions, but have not scored 60 or more on 6 occasions, my 10h index is 5. For n > 1, I am basically giving more weight to high-scoring innings, and also benefiting those who played fewer matches (most older players played far fewer games than Tendulkar and can't remotely approach either his career aggregate or his h-score).
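The nh-index is the same computation as the h-index with the threshold scaled by n: sort a batsman's innings from highest to lowest and keep going while the i-th best innings is worth at least n*i runs. Because the scores only go down while the threshold only goes up, the first failure ends the search. This is a minimal Python sketch, assuming a batsman's innings are given as a plain list of run totals with not-out innings counted at face value (my assumption; the text does not say how not-outs are treated). The name nh_index is hypothetical.

```python
def nh_index(scores, n=1):
    """Largest h such that the batsman has h innings of n*h runs or more.

    Assumes n > 0. With n = 1 this is the plain batting h-index.
    """
    ranked = sorted(scores, reverse=True)  # best innings first
    h = 0
    for i, runs in enumerate(ranked, start=1):
        # Is the i-th best innings worth at least n*i runs?
        if runs < n * i:
            break  # scores only fall and the bar only rises, so stop here
        h = i
    return h


# Invented scores matching the 10h example in the text:
# five innings of 50 or more, but not six innings of 60 or more.
scores = [112, 87, 64, 58, 50, 49, 33, 12, 0]
print(nh_index(scores, n=10))  # 5
print(nh_index(scores, n=1))   # 8
```

The example scores are made up purely to reproduce the worked 10h case above; they are not real career data.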

What is the 10h ranking of batsmen, then? It turns out to be substantially different. The top 6 batsmen are now DG Bradman, Lara, Tendulkar, V Sehwag, KC Sangakkara and DPMD Jayawardene. Bradman scored 180 or more on 18 occasions; Lara's 10h score is 17, Tendulkar's is 16, and the other three get 15 each. Also in the top 10 (well, part of a tie that straddles 10th place) is Garry Sobers, who ranks well below all of these in both career aggregate and h-index. Immediately after him is Wally Hammond, who drops off today's lists by aggregate as well as by the plain h-index above.

Specifically, the top-20 list goes like this:

10h   Batsman            Score (at least)
18    DG Bradman         185
17    BC Lara            178
16    SR Tendulkar       160
15    V Sehwag           164
15    KC Sangakkara      152
15    DPMD Jayawardene   150
14    SR Waugh           150
14    RT Ponting         150
14    R Dravid           148
14    JH Kallis          148
14    GS Sobers          145
13    WR Hammond         140
13    SM Gavaskar        147
13    ML Hayden          131
13    Javed Miandad      145
13    IVA Richards       135
13    GC Smith           133
13    G Kirsten          133
12    Zaheer Abbas       126
12    Younis Khan        126

As you might expect, the 5h ranking roughly interpolates between these two: Tendulkar is on top again, with Bradman and Lara tied just behind him. Sobers and Hammond continue to rank high.
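To see how the choice of n can reshuffle a ranking, here is a toy comparison of two invented batsmen: Player A has a long career of many moderate innings, while Player B has a short career with a handful of very big ones. The data are hypothetical and do not come from the cricinfo download, and nh_index and rank_batsmen are illustrative helper names. This is a sketch of the idea, not the code behind the lists above.

```python
def nh_index(scores, n=1):
    """Largest h such that at least h innings are worth n*h runs or more (n > 0)."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for i, runs in enumerate(ranked, start=1):
        if runs < n * i:
            break  # later innings are no bigger, and the bar only rises
        h = i
    return h


def rank_batsmen(innings_by_player, n):
    """Return (player, nh-index) pairs sorted from highest index to lowest."""
    rows = [(name, nh_index(scores, n)) for name, scores in innings_by_player.items()]
    # sorted() is stable, so tied players keep their input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical careers, invented for illustration only.
innings = {
    "Player A": [90] * 40 + [30] * 100,  # long career, many moderate scores
    "Player B": [200] * 12 + [20] * 20,  # short career, a few huge scores
}

for n in (1, 5, 10):
    print(n, rank_batsmen(innings, n))
# 1 [('Player A', 40), ('Player B', 20)]
# 5 [('Player A', 18), ('Player B', 12)]
# 10 [('Player B', 12), ('Player A', 9)]
```

With n = 1 the long career wins easily, the gap narrows at n = 5, and by n = 10 the short career with bigger innings comes out ahead, which is the kind of shift between the plain h-index and the 10h list described above.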

While it is always difficult to rank batsmen from different eras, it seems to me that any list of all-time-great batsmen must put Bradman at or near the top, and must include Sobers, Hammond, Sunil Gavaskar, Vivian Richards, Zaheer Abbas, Javed Miandad and other past greats in the top 20. The nh-index seems to do this, for suitable choices of n. But what is the optimal choice?

(This is based on raw batting data downloaded from cricinfo.)

UPDATE 07 Dec 2011: Gangan Prathap points, in the comments, to this 2010 paper of his, in which he proposes a "mock h-index" (different from my nh-index above); his scheme ranks Bradman above four top Indian batsmen but does not consider other international greats.