TPI World Cup Predictor: Shocks and surprises

Technology for People Initiative (TPI) is an applied research center at LUMS in Lahore, Pakistan.

By Shahan Shahid | Fahad Sultan

In a matter of hours, cricket fever will engulf a considerable fraction of the world’s population.

While the players build up to cricket's ultimate tournament by battling it out on the pitch, cricket fans are keeping themselves busy by haggling with their friends, trying to convince them that this World Cup belongs to their team.

And although this exercise gives rise to some entertaining arguments -- the endless parallels with Pakistan's 1992 contingent, for example -- the process relies primarily on one's hunches and conjecture rather than cold hard facts.

This predictive model of the upcoming tournament is an attempt to change that. The purpose is by no means to provide a definitive answer and say outright that a particular team will win, which, it goes without saying, is an impossible feat. Still, there is value in finding out what the numbers point to. The robot in New Zealand predicted a triumph for Afghanistan, as unlikely as that may appear. Let's see what our predictor points to.

We use a home-grown criterion (more details below) to compute each team's score, and we compare these scores to determine the winner of a match. Without further ado, here's who the numbers are rooting for.

Our Model

At its core, our model tries to predict the total score for a team. To achieve this, we treat all 15 members of the team as batsmen and predict their scores from three things: their form, their performance against the opposing bowling attack, and their performance in the pitch conditions.

Batsman form is the Runs per Innings (RPI) achieved by the batsman in his last 15 ODI games. AB de Villiers tops this ranking, coming in at an otherworldly RPI of nearly 60. For players with fewer than 15 batting innings, such as Adam Milne, we use their career batting average.
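As a rough illustration of the form calculation, here is a minimal Python sketch. The function name, the argument names and the assumption that innings are listed oldest first are ours for illustration; they are not taken from the TPI code.

```python
from typing import Sequence

FORM_WINDOW = 15  # number of recent innings used for form, as described above


def batting_form(recent_innings: Sequence[int],
                 career_average: float,
                 window: int = FORM_WINDOW) -> float:
    """Return runs per innings over the last `window` innings.

    Assumption: `recent_innings` is ordered oldest to newest.
    Players with fewer than `window` innings fall back to their
    career batting average.
    """
    if len(recent_innings) < window:
        return career_average
    last = recent_innings[-window:]
    return sum(last) / len(last)
```

A player with only a handful of innings simply gets his career average back, while an established batsman is judged on his most recent 15 scores alone.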

To ensure that bowler versus batsman contests are captured with accuracy — the Steyn-Hafeez contest was taken as a model for this — we use ball-by-ball data to find each player's performance against the bowlers he has faced. Bowlers are categorized by their type and then by their bowling average. If a batsman struggles against a good right arm fast bowler, his data should clearly show that.

Finally, given a player’s performance against bowling types, we find out which players are likely to bowl in the match, compare their metrics to the batsman’s data and determine how the batsman will perform.
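The grouping step can be sketched roughly as follows. This is only an illustration under our own assumptions: the `Ball` record, the quality bands and their cut-offs (25 and 35) are hypothetical and are not the categories used in the actual model.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Ball(NamedTuple):
    """One delivery faced by a batsman (hypothetical record layout)."""
    bowler_type: str       # e.g. "right-arm fast"
    bowler_average: float  # the bowler's bowling average
    runs: int              # runs scored off this ball
    dismissed: bool        # True if the batsman got out on this ball


def quality_band(bowling_average: float) -> str:
    # Hypothetical cut-offs, chosen only for illustration.
    if bowling_average < 25:
        return "strong"
    if bowling_average < 35:
        return "average"
    return "weak"


def matchup_averages(balls: Iterable[Ball]) -> Dict[Tuple[str, str], float]:
    """Batting average against each (bowler type, quality band) group."""
    runs: Dict[Tuple[str, str], int] = defaultdict(int)
    outs: Dict[Tuple[str, str], int] = defaultdict(int)
    for ball in balls:
        key = (ball.bowler_type, quality_band(ball.bowler_average))
        runs[key] += ball.runs
        outs[key] += int(ball.dismissed)
    # With no dismissals in a group, divide by 1 so the result is total runs.
    return {key: runs[key] / max(outs[key], 1) for key in runs}
```

A batsman who struggles against good right-arm fast bowling would show a low value for the ("right-arm fast", "strong") group.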

Conditions Down Under are being touted as a crucial feature of the tournament, and we took them into account too. For each innings played by a batsman, we found out how difficult the pitch was for batting.

To do this, we averaged all the innings played by a player's team at the venue. An example would be: "Mohammad Nabi scored 15 at Sharjah, where Afghan batsmen average 20."

We compare pitches that will be used in the World Cup with pitches that a batsman has played on to predict a score.
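One simple way to picture the pitch adjustment is as a ratio of a batsman's score to the venue average, carried over to a new venue. The sketch below is our own simplification, not the TPI method: the function names and the idea of scaling the target venue's average by the batsman's mean ratio are assumptions.

```python
from typing import Sequence, Tuple


def relative_performance(runs: float, venue_average: float) -> float:
    """Batsman's score as a fraction of the venue's average score."""
    return runs / venue_average


def predict_on_pitch(history: Sequence[Tuple[float, float]],
                     target_venue_average: float) -> float:
    """Scale the target venue's average by the batsman's mean ratio.

    `history` holds (runs scored, venue average) pairs for past innings.
    With no usable history, fall back to the venue average itself.
    """
    ratios = [relative_performance(runs, avg) for runs, avg in history if avg > 0]
    if not ratios:
        return target_venue_average
    return target_venue_average * sum(ratios) / len(ratios)
```

In the Nabi example, 15 runs at a venue where his team averages 20 gives a ratio of 0.75, meaning he scored three quarters of what that pitch typically yields.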

We average the scores from these three methods to get a batsman's score. Team scores are compared to determine the winner.
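The final step can be sketched in a few lines of Python. The averaging of the three estimates follows the description above. Summing the batsmen's scores to get the team total, and returning no winner on a tie, are our assumptions, as are all the names used here.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# (form estimate, bowling match-up estimate, pitch estimate) for one batsman
Estimates = Tuple[float, float, float]


def batsman_score(form: float, vs_bowling: float, vs_pitch: float) -> float:
    """Average the three per-batsman estimates."""
    return mean([form, vs_bowling, vs_pitch])


def team_score(players: Sequence[Estimates]) -> float:
    # Assumption: the team total is the sum of the batsmen's predicted scores.
    return sum(batsman_score(*estimates) for estimates in players)


def predict_winner(name_a: str, players_a: Sequence[Estimates],
                   name_b: str, players_b: Sequence[Estimates]) -> Optional[str]:
    """Return the name of the team with the higher predicted score, or None on a tie."""
    score_a, score_b = team_score(players_a), team_score(players_b)
    if score_a == score_b:
        return None
    return name_a if score_a > score_b else name_b
```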

A detailed description of the TPI model can be found here

Shahan and Fahad work at the Technology for People Initiative (TPI), an applied research center at the Lahore University of Management Sciences (LUMS) which works with data to design innovative, practical technology solutions for problems in the public sector.