Super Bowl 50: Expecting the Unexpected

Photo by thelittleone417. Licensed under Creative Commons.

We never expected the Broncos to win. The Panthers were practically universal favorites. In fact, out of 74 ESPN football contributors, only 19 expected the Broncos to win. Our own office discussion ran heavily in favor of Carolina going home with the Super Bowl trophy. WSO2 Machine Learner seemed to agree: 57.4% Panthers. Sold.

Other companies using predictive analytics also sided with the Panthers. Microsoft, which brought Cortana/Bing into play, projected a 64% chance for the Panthers to win. Electronic Arts, which has been predicting games for a while by pitting teams against each other in its Madden NFL videogame, also believed the Panthers would nail the game.

Instead, what we got was an incredible win from the Broncos. Denver opened up with a 10-0 lead that the Panthers never recovered from. Peyton Manning was stellar. The Broncos made more plays and ran a better defense than the Panthers did. Result: a 24-10 Super Bowl win for the Broncos. Well played, guys!

So what does that mean for predictive analytics?

The type of analysis used to predict the Super Bowl is known as probabilistic prediction. And it’s just that, a probability. A 57.4% chance for the Panthers is a 42.6% chance that the Broncos will win. If those two teams played each other 10 times, we’d expect the Broncos to win about four of those games.

Our home-grown experiment compared favorably with other probabilistic predictions. Notably, Neil Paine of FiveThirtyEight ran a superb, detailed analysis on the Panthers versus the Broncos defense, where he pointed out that this was one of the most even matches he’d seen; FiveThirtyEight called it at 59%-41% in the Panthers’ favor. Our 57.4% to 42.6% prediction was very close, even running with only a fraction of the data FiveThirtyEight has. And we came up with closer numbers than Cortana’s 64% Panthers versus 36% Broncos.
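One way to put a number on how the three forecasts quoted above fared is the Brier score: the squared gap between the forecast probability and what actually happened (lower is better). This is only a sketch of that idea, not something we ran as part of BigDataGame; the probabilities are the Panthers figures quoted in this post, and the dictionary keys and variable names are our own labels.

```python
# Panthers win probabilities quoted in this post for Super Bowl 50.
forecasts = {
    "WSO2 Machine Learner": 0.574,
    "FiveThirtyEight": 0.59,
    "Cortana/Bing": 0.64,
}

# Outcome from the Panthers' point of view: 1 = win, 0 = loss.
panthers_won = 0


def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (probability - outcome) ** 2


scores = {name: brier_score(p, panthers_won) for name, p in forecasts.items()}

# Print forecasts from best (lowest score) to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.4f}")
```

A single game says very little about which model is better overall. The score only becomes meaningful when averaged over many predictions.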
The Wild Cards

Ironically, the weirdest analyses seem to have won the day. RiseSmart, which has something to do with the job market, predicted that the Broncos would win - based on the theory that states with lower unemployment rates had a better chance. No, we’re not entirely sure what to make of that, either.

eBay, which used consumers’ purchase data, also indicated a Broncos win. Since it was the fans doing the buying, we’d say this is a fantastic example of a prediction market. It’s likely the fans had information that we didn’t factor in. Maybe Peyton Manning’s presence carried the day. Who knows?

Scoring some great lessons

Looking back, we’re pretty proud of BigDataGame. After all, it’s not every day you get to beat Microsoft’s poster girl. More importantly, it’s been an incredible learning experience, especially since none of us knew anything about sports prediction when we started. We now have a much better idea of what it takes, including algorithms; with a bit more thought, and a lot of fine-tuning, we might even be looking at a whole host of other sports in the future.

If you’d like to know more of the science and tech behind BigDataGame (and if you can help us figure out how we can do this better), check out our webinar on the subject. We’re planning to talk about everything - how we gathered the data, the algorithms behind the prediction, and more. We’d love to know what you think.

Here’s to the Panthers!

By Yudhanjaya Wijeratne

January 24th saw the Panthers smash the Cardinals and the Patriots lose to the Broncos.

Prediction is a tricky business. A true Delphic-Oracle-style prediction would be binary in nature. One wins, the other doesn’t - end of story. Most computational models - well, most accurate computational models - don’t work this way. Especially in sports. Big blogs like FiveThirtyEight (and ours) display probabilities, trying to figure out how much of a chance there is that one particular team will win.

Because of the nature of probability, there’s no guarantee that a given team will win a given match. After all, a 70% probability means that if you repeat the experiment ten times, the underdog team will win, on average, 3 times out of the 10. There’s no telling which of those potential ten matches is the one playing tomorrow. That requires a human eye.

Unfortunately, there was no better reminder of this for us than the Jan 24 games. We predicted the Panthers having a 50.83% chance against the Cardinals. That’s technically not much of an edge, considering that a coin has a 50% chance of turning up heads. You never really know.

The Panthers crushed the Cardinals that game, 49-15. They took first blood and by the second quarter were leading 17-0. Despite a brief spurt of activity in the second quarter, the Cardinals were bowled over by the end of it.

Our Patriots vs Broncos prediction, however, was 56.59% vs 43.41%. That’s a larger gap. And yet the Broncos won by 2 points. Excellent play from Von Miller, a great scramble from Peyton Manning and some remarkably puzzling decisions from the Patriots saw the Broncos through to the finals.

Based on this data, we ran the numbers again. Since it’s now the Panthers versus the Broncos, we lay out a 57.40% probability for the Panthers to win.

Lay your bets, everyone. And someone please get the popcorn.
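For anyone who wants to check the "3 times out of 10" intuition from earlier in this post, here is a small sketch. It assumes each of the ten hypothetical games is independent and has the same 70/30 split, which is a simplification for illustration and not how BigDataGame itself works; the function and variable names are ours.

```python
from math import comb


def binomial_pmf(wins: int, games: int, p_win: float) -> float:
    """Probability of exactly `wins` wins in `games` independent games."""
    return comb(games, wins) * p_win**wins * (1 - p_win) ** (games - wins)


games = 10
p_underdog = 0.30  # the underdog's chance when the favorite sits at 70%

# Probability of every possible underdog win count, from 0 to 10.
distribution = [binomial_pmf(k, games, p_underdog) for k in range(games + 1)]
expected_wins = games * p_underdog

for wins, probability in enumerate(distribution):
    print(f"{wins:2d} underdog wins: {probability:.3f}")
print(f"expected underdog wins: {expected_wins:.1f}")
```

Three underdog wins is the most likely single outcome, but it only happens a little over a quarter of the time. The "3 out of 10" figure is an average, not a promise.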

Postmortem, Jan 17 predictions: Why the Broncos won, and what that means for us

By Yudhanjaya Wijeratne

We’ve had a great week here with BigDataGame. We predicted that the Patriots had a higher chance of winning than the Chiefs. On January 16th, the Patriots beat the Chiefs. We predicted that the Cardinals would beat the Packers and that the Panthers stood a slightly better chance of beating the Seahawks than vice versa. They did.

One prediction, however, failed. BigDataGame showed the Steelers having a much better chance than the Broncos. That one turned out to be wrong. Well, not technically wrong - since these are probabilities we’re talking about, not simple statements - but it definitely threw us a bit.

What happened? We looked at Twitter for an answer. What better way to understand something than to examine it through a thousand eyes?

Fitzgerald Toussaint and the injuries

Mistakes happen in sports. Toussaint’s was one of them. In the fourth quarter, Bradley Roby hit him well, making him lose the ball. Unfortunately, this seems to have set up the Broncos with an excellent position to play from for the rest of the game.

Two other things that swung the play were injuries. The Steelers’ Antonio Brown took a pretty brutal hit during the previous Bengals game. The subsequent concussion ruled him out of this game. The Steelers had no one to fall back on save for Ben Roethlisberger, who had just recently suffered heavy damage to his throwing shoulder.

In sports, nobody can predict mistakes or injuries. Human analysis is capable of factoring in injuries. We could, say, build a point-based system to reduce winning percentages based on player injuries, but it’s quite a time-consuming task: we’d need many years’ worth of data, we’d have to analyze performance and probabilities at the level of individual players, and then we’d have to estimate the impact of injuries by correlating injury with game performance.

ESPN’s prediction graph

ESPN, being close to the game, does a more detailed analysis. In their blogpost titled ‘Broncos didn’t get control until late, but that was enough’, they discussed the winning probabilities for both teams and what moments changed them. Here’s their win probability chart over the course of the game:

[Chart: ESPN win probability over the course of the game]

It’s interesting to note how close their initial predictions for the match were to ours. Our estimate ran as follows: Pittsburgh Steelers with a 55.76% win probability, Denver Broncos with 44.24%. Their numbers are 59% and 41% respectively. It must have been quite a surprise when the Broncos won.

Bengals Playoffs: [almost] called it!

By Yudhanjaya Wijeratne

As we explained on the BigDataGame page, we’ve been testing our prediction machine for a while. Nevertheless, the January 9 games were its first real test at predicting upcoming matches. On Saturday, we saw the Chiefs go up against the Texans and the Bengals pit themselves against the Steelers. Three days prior to the match, BigDataGame had given us the following numbers:

Saturday
KCC 58.33% vs HOU 41.67%
CIN 56.87% vs PIT 43.13%

Sunday
SEA 52.08% vs MIN 47.92%
GB 51.89% vs WAS 48.11%

Big Game winning chance predictions:
NE 9.86%
CIN 9.86%
CAR 9.52%
KC 9.52%
ARI 9.18%
SEA 8.50%
DEN 7.99%
MIN 7.82%
PIT 7.48%
GB 6.97%
HOU 6.81%
WAS 6.46%

Those were some pretty close numbers, but the final prediction was that the Chiefs, Bengals, Seahawks and Packers would win.

We were almost entirely correct. KCC went 30-0 against HOU. Despite a 13-0 lead for the Chiefs, the teams played fairly evenly - until the Texans’ J.J. Watt took a hit and went off in the third quarter after a tackle from Eric Fisher (Watt, who had previously been dealing with a groin injury and a broken hand, is apparently now in need of groin surgery). Following this, the Chiefs quickly extended their lead to 27-0 and subsequently a clean 30-0 win, though not without injuries of their own: Jeremy Maclin went down with what most people feared was a torn ACL, but is now confirmed to be a sprained ankle. His participation in the upcoming Patriots match is very much up in the air at the moment.

Our CIN vs PIT prediction failed. Everyone knows what happened - a brutal back-and-forth with two strange fouls early on and one of the most incredible catches ever seen, made by Martavis Bryant. For a moment there it felt like Olympic gymnasts had invaded the game. We gave the Bengals a 56.87% chance of winning. The Steelers clinched it, with Roethlisberger walking off with a shoulder injury.

The other two were on point. The Seahawks took the game 10-9 in a closely fought match with the Vikings. And despite a weak start, the Packers absolutely destroyed the Redskins, winning 35-18. Adams’ knee injury was revealed to be minor, so we will see him again.

Final tally: we got 3 out of 4 predictions correct on our first run - a 75% success rate. As you can see, it’s still too early to tell who’ll go on to win the season.

This set of matches highlighted a flaw in our (and indeed, most) prediction systems: injuries. While injuries cannot be predicted, we should have some method of including them in our pre-game calculation - perhaps as a mathematical offset that acts as a filter on the predictions.
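To make the idea of an offset a little more concrete, here is one possible shape it could take. This is purely a sketch and not what BigDataGame does today: the function names, the choice of shifting the probability in log-odds space, and the penalty values are all assumptions made up for illustration.

```python
import math


def logit(p: float) -> float:
    """Convert a probability (strictly between 0 and 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def adjust_for_injuries(p_win: float, own_penalty: float, opponent_penalty: float) -> float:
    """Shift a pre-game win probability by hypothetical injury penalties.

    Penalties are non-negative numbers in log-odds units. How to derive
    them from real injury data is exactly the open problem described above.
    """
    return sigmoid(logit(p_win) - own_penalty + opponent_penalty)


# The 56.87% pre-game figure from this post, with a made-up penalty of 0.3
# applied to that team purely to show the direction of the adjustment.
baseline = 0.5687
adjusted = adjust_for_injuries(baseline, own_penalty=0.3, opponent_penalty=0.0)
print(f"baseline {baseline:.4f} -> adjusted {adjusted:.4f}")
```

Working in log-odds keeps the adjusted value between 0 and 1 however large the penalty gets, and equal penalties on both sides cancel out. A plain subtraction from the percentage would give neither guarantee.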
So, while the players ready themselves for another clash, we’re working on factoring in broken bones and surgeries for an even more accurate result. To make your own predictions, visit WSO2 BigDataGame. The program is online and freely available. To understand how BigDataGame works, click here.