Building a Baseball Win Probability Model

The 2016 Cubs were the team to beat last season. Constructed with cheap and talented prospects, a daunting pitching staff headed by Jon Lester and Kyle Hendricks, and a dominant bullpen, the Cubs cut through the National League and claimed their first World Series since 1908.

To relive the 2016 season, we built a win probability model to identify under- and overachieving teams, then applied it to the 2017 campaign. Using logistic regression, we model the probability that the home team wins (response coded 1 if the home team wins, 0 if it loses), conditional on the home and visiting teams’ Elo ratings and on both starting pitchers’ ratings. Unlike FiveThirtyEight’s win probability model, ours does not adjust for rest days or for how far the visiting team has traveled.
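As a rough sketch of what a model of this shape looks like once fitted, the snippet below evaluates a logistic function on the four predictors named above. The function name and every coefficient value are placeholders invented for illustration; they are not the values we estimated.

```python
import math

# Placeholder coefficients for illustration only. They are NOT the fitted
# values from the model described in this post.
COEFS = {
    "intercept": 0.11,
    "home_elo": 0.004,
    "away_elo": -0.004,
    "home_pitcher": 0.015,
    "away_pitcher": -0.015,
}


def home_win_probability(home_elo, away_elo, home_pitcher, away_pitcher, coefs=COEFS):
    """Return P(home team wins) from a logistic model on four predictors."""
    # Linear predictor: the log-odds of a home win.
    z = (
        coefs["intercept"]
        + coefs["home_elo"] * home_elo
        + coefs["away_elo"] * away_elo
        + coefs["home_pitcher"] * home_pitcher
        + coefs["away_pitcher"] * away_pitcher
    )
    # The logistic (sigmoid) function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up inputs: a stronger home team and starter against a weaker visitor.
    print(round(home_win_probability(1550, 1480, 80.0, 40.0), 3))
```

With real data, the coefficients would come from fitting the regression to the game log rather than being typed in by hand.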

The Elo rating system estimates the relative skill levels of MLB teams in a head-to-head matchup. Each team carries a rating (league average 1500) that is updated after every game, with the winning team gaining some Elo points and the losing team giving up the same number of points.
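The zero-sum update can be sketched with the standard Elo formulas. This is a minimal illustration, not the exact procedure behind the ratings used here: the 400-point scale is the conventional Elo choice, and the update speed `k=4.0` is an assumed value.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner, loser, k=4.0):
    """Return (new_winner_rating, new_loser_rating) after one game.

    k is an assumed update speed, not a value taken from this post.
    """
    # The winner scored 1, so it gains k * (1 - its expected score).
    delta = k * (1.0 - expected_score(winner, loser))
    # Zero-sum: the loser gives up exactly what the winner gains.
    return winner + delta, loser - delta


if __name__ == "__main__":
    # An upset moves more points than a win by the favorite.
    print(update_elo(1450.0, 1588.0))
    print(update_elo(1588.0, 1450.0))
```

Because the expected score already reflects the rating gap, an underdog win shifts more points than a favorite's win, and the league-wide average stays at 1500.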

The model is trained on 2,249 games from the 2016 season using game-by-game data from RetroSheet.com. This allows us to account for whether a team is playing at home or on the road and to measure the strength of the starting pitchers on both sides. Our starting pitcher ratings are sourced from the Elias Sports Bureau on a scale of 0-100. Pitcher ratings are also adjusted after each game.

The home and visitor pitching ratings (both p<0.001) were the most significant predictors in our model. In addition, we found that the home team’s Elo rating (p=0.018) was more predictive than the visitor’s Elo rating (p=0.165).

For our 2016 win probability model, we used each team’s preseason Elo rating and end-of-season pitcher ratings.

Lopsided Matchups

According to FiveThirtyEight, the Cubs’ 1588 Elo was the best in baseball last season. On the flip side, the rebuilding Reds, Braves, and Phillies sat in the basement, starting 2016 with an average Elo of 1450. Accordingly, in our model, the most lopsided matchup occurred between the Cubs’ Jon Lester (96.6 pitcher rating) and the Braves’ Aaron Blair (25.0 pitcher rating), in a game the Cubs won.

The Diamondbacks (1503 starting Elo) overcame the biggest win probability deficit (19.5% chance of winning) in a victory against the Giants. This game featured a matchup between the Giants’ ace Johnny Cueto and the Diamondbacks’ Shelby Miller, who posted a mediocre 4.87 FIP and 5.06 xFIP last season. Another unexpected victory occurred in a game between the Padres and Cubs. Against all odds, the Padres’ Colin Rea defeated Kyle Hendricks in a game where the Cubs had an 80.3% probability of winning.

In our model, the Cubs, Nationals, and Blue Jays had the highest average win probability, while the Diamondbacks, Phillies, and Padres settled at the bottom.

Figure 1: Average home and road win probability in our model over the 2016 season

The Starting Pitcher Effect

In baseball, starting pitchers can make all the difference between a win and a loss, giving even the worst teams a chance to knock off the elites. For example, the Yankees (1540 Elo on July 1, 2017) are significantly outperforming the Blue Jays (1503 Elo) so far this season. But in a matchup between Luis Cessa and Marcus Stroman (at the Rogers Centre in Toronto), our model gives the Blue Jays a 65% chance of winning the game. If the Yankees instead started Luis Severino, an above-average starter, their win probability would rise from 35% to 51%.

Home Field Advantage

Take two teams with identical Elo scores and starting pitcher ratings. In our model, the home team is projected to win 52.8% of the time. Home field advantage is less of a factor in MLB than in the NBA, NHL, and NFL, but between 1903 and 2010 home teams won 53.9% of their games, indicating that the home team has a slight built-in edge over the visitor.
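To put the 52.8% figure on other scales, the sketch below converts it to log-odds and to an equivalent rating gap. The Elo conversion assumes the standard 400-point Elo curve, which is a convention and not something estimated by our model; the function names are illustrative.

```python
import math


def logit(p):
    """Log-odds corresponding to probability p."""
    return math.log(p / (1.0 - p))


def elo_gap_for_probability(p):
    """Rating gap that yields win probability p under the standard Elo curve."""
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    home_edge = 0.528  # home win probability for evenly matched teams
    print(round(logit(home_edge), 3))
    print(round(elo_gap_for_probability(home_edge), 1))
```

Under these assumptions, playing at home is worth roughly 0.11 in log-odds, or a rating head start of a little under 20 Elo points.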

Model Outliers

Two teams stood out as model outliers last season: the Orioles and the Astros. The Orioles’ winning percentage was 0.179 higher than our model expected, while the Astros’ winning percentage fell 0.167 lower than expected.

The Orioles are long accustomed to outperforming their win projections. Between 2012 and 2016, the Orioles averaged 13.2 more wins than expected, thanks in large part to their dominant bullpen.

According to Baseball Prospectus, the Orioles’ offense scored eight fewer runs than expected last season. On the starting pitching side, the team’s 52.5 average pitcher rating was 2.9 points below the MLB average. The Orioles’ top starter, Chris Tillman (77.4), was good throughout last season, but fellow starters Yovani Gallardo (39.1), Wade Miley (42.3), and Tyler Wilson (41.0) were mediocre. However, the bullpen featured lockdown arms in Zach Britton (2.5 WAR), Brad Brach (1.6), and Mychal Givens (1.2), who averaged 10.66 K/9 and a 2.72 FIP. Other overachieving teams include the Braves (+12.3% winning percentage differential), Twins (+10.4%), and Tigers (+9.9%).

Figure 2: The difference between teams’ actual winning percentage and projected winning percentage in 2016
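One plausible way to compute a differential like the one in Figure 2 is to average a team's per-game win probabilities and subtract that from its actual winning percentage. The sketch below assumes a simple list of (team, predicted win probability, won) records; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def win_pct_differentials(games):
    """Return {team: actual win pct - expected win pct}.

    games: iterable of (team, predicted_win_prob, won) tuples, one per team-game.
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of probs, wins, games]
    for team, prob, won in games:
        entry = totals[team]
        entry[0] += prob
        entry[1] += 1 if won else 0
        entry[2] += 1
    # Expected win pct is the mean predicted probability over the team's games.
    return {
        team: wins / n - prob_sum / n
        for team, (prob_sum, wins, n) in totals.items()
    }


if __name__ == "__main__":
    # Made-up records, not real 2016 results.
    sample = [
        ("BAL", 0.40, True),
        ("BAL", 0.50, True),
        ("HOU", 0.60, False),
        ("HOU", 0.60, True),
    ]
    for team, diff in sorted(win_pct_differentials(sample).items()):
        print(team, round(diff, 3))
```

A positive value marks a team that won more often than its game-by-game probabilities implied, and a negative value marks one that won less often.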

On the flip side, the Astros, who held a 1526 Elo in 2016, significantly underperformed their win projections. Their starting rotation was very good last season, with an average pitcher rating of 58.3. Dallas Keuchel (57.3), Collin McHugh (60.8), and Doug Fister (57.8) were consistently reliable in the rotation. However, the Astros got off to a rough start in April 2016 and spent the rest of the season climbing out of the cellar in pursuit of the final wild card spot.

The Giants, Cubs, and Pirates also underperformed expectations based on their starting Elo ratings. For instance, the Pirates’ Elo score fell 34 points (1524 to 1490), while the Giants added three Elo points (1526 to 1529), which was enough to grab the second wild card spot in the National League. As a whole, however, most teams’ final winning percentages fell in line with their starting projections.

Sources:

https://fivethirtyeight.com/features/how-the-fivethirtyeight-senate-forecast-model-works/

http://www.espn.com/mlb/story/_/id/18740637/how-baltimore-orioles-keep-trumping-projections

http://freakonomics.com/2011/12/18/football-freakonomics-how-advantageous-is-home-field-advantage-and-why/

https://fivethirtyeight.com/features/how-our-2017-mlb-predictions-work/
