The Details

Each spring, FiveThirtyEight rolls out its latest baseball predictions for another season of major league action. We’ve been doing this for a while: We first introduced our MLB team ratings during the 2015 postseason and used them to survey the playoff picture. Starting in 2016, we began publishing two interactive graphics: our MLB Predictions dashboard, which uses our team ratings to preview upcoming games and show the chance that each team will make the postseason (or win the World Series), and our Complete History Of MLB charts, which trace the successes and failures of every franchise throughout history. Here’s how each of those interactives works.

Team ratings

Thanks to Retrosheet, we’ve collected game results and box scores going all the way back to 1871. We used that mountain of data to create an Elo-based rating system and predictive model for baseball that accounts for home-field advantage, margin of victory, park and era effects, travel, rest and — most importantly — starting pitchers.

What’s Elo, you ask? Named after the Hungarian-American chess master (and power-ratings pioneer) Arpad Elo, Elo is a simple way to rate competitors that can be tuned and customized endlessly to incorporate available data. For our purposes, each MLB team carries a rating that estimates its current skill level. (The average is about 1500.) After every game is played, the winning team gains some rating points while the losing team loses the same number of points, based on the chances our model gave each team to win the game beforehand (and the margin of victory). For example, a win by a big underdog results in a bigger exchange of points than a win by a favorite — and the larger the margin of victory, the larger the exchange.
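The exchange of points described above can be sketched in a few lines. This is a minimal illustration, assuming the standard Elo logistic curve on a 400-point scale; the K-factor `k` and the `mov_multiplier` argument are hypothetical placeholders, since the exact values and the margin-of-victory formula are not given here.

```python
def expected_win_prob(rating_a: float, rating_b: float) -> float:
    """Chance that team A beats team B under the standard Elo curve.

    The 400-point scale is an assumption, not something the article states.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float,
               k: float = 4.0, mov_multiplier: float = 1.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one game.

    k and mov_multiplier are hypothetical; a larger margin of victory
    would be passed in as a larger mov_multiplier.
    """
    p_win = expected_win_prob(winner, loser)
    # The less likely the win was, the more points change hands.
    shift = k * mov_multiplier * (1.0 - p_win)
    return winner + shift, loser - shift


# An underdog win moves more points than a favorite's win.
underdog_new, _ = elo_update(winner=1450.0, loser=1550.0)
favorite_new, _ = elo_update(winner=1550.0, loser=1450.0)
print(underdog_new - 1450.0, favorite_new - 1550.0)
```

Because the winner's gain equals the loser's loss, the league-wide average rating stays put from game to game.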

Pregame team rating adjustments

Before every game, we adjust each team’s rating based on whether it has home-field advantage, how far it has traveled to the game, how many days of rest it’s had and which pitcher is slated to start.

Here are the particulars of those first three adjustments:

Home-field advantage is worth 24 rating points. For games played without fans in attendance, home-field advantage is worth 9.6 rating points.

The penalty for travel is worth up to about 4 points and is calculated with miles_traveled**(1.0/3.0) * -0.31

Each day of rest (up to a maximum of three) is worth 2.3 points.
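The three adjustments above can be combined as follows. This is a sketch using only the constants listed; the function and parameter names are illustrative, and it assumes the three adjustments simply add together.

```python
def pregame_adjustment(is_home: bool, miles_traveled: float, rest_days: int,
                       fans_in_attendance: bool = True) -> float:
    """Rating points added to a team before a game (pitcher excluded)."""
    # Home-field advantage: 24 points, or 9.6 with no fans in attendance.
    if is_home:
        home = 24.0 if fans_in_attendance else 9.6
    else:
        home = 0.0
    # Travel penalty grows with the cube root of the distance traveled.
    travel = miles_traveled ** (1.0 / 3.0) * -0.31
    # Each day of rest is worth 2.3 points, capped at three days.
    rest = min(rest_days, 3) * 2.3
    return home + travel + rest


# A road team that flew 2,000 miles and had one day off.
print(pregame_adjustment(is_home=False, miles_traveled=2000.0, rest_days=1))
```

A 2,000-mile trip costs a little under 4 points, which lines up with the "up to about 4 points" ceiling on the travel penalty.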

Starting pitchers can have a much larger effect on pregame team ratings and win probabilities than the other three adjustments. For example, in June 2000, Pedro Martinez was worth about 109 rating points to the Red Sox each time he started, or the equivalent of about a 15 percentage point boost to Boston’s chances of winning the game.

To generate our pitcher adjustments, we’re using a version of Bill James’s game scores proposed by Tangotiger (and slightly modified by us) to isolate pitching performances. A pitcher’s game score for each start is calculated with:

\(\begin{equation*}gameScore = 47.4 + strikeouts + {(outs*1.5)} - {(walks*2)} - {(hits*2)} - {(runs*3)} - {(homeruns*4)}\end{equation*}\)
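As a worked example, the formula translates directly into a small function. This computes only the raw score for a single start; the era, stadium and opponent adjustments are not specified in detail, so they are left out, and the stat line below is made up for illustration.

```python
def game_score(strikeouts: int, outs: int, walks: int,
               hits: int, runs: int, home_runs: int) -> float:
    """Raw game score for one start, before any normalization."""
    return (47.4 + strikeouts + outs * 1.5
            - walks * 2 - hits * 2 - runs * 3 - home_runs * 4)


# Hypothetical complete-game shutout: 27 outs, 10 strikeouts, 1 walk, 3 hits.
print(game_score(strikeouts=10, outs=27, walks=1, hits=3, runs=0, home_runs=0))
```

Note that `outs` counts outs recorded rather than innings, so a full nine innings is 27.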

Like our team ratings, these game scores are normalized for eras and stadiums, so pitchers from throughout history can be directly compared with one another. They’re also adjusted to take the opposing team’s offensive strength into account, so a pitcher earns more credit for a great start against a top team than against a mediocre one.

Whenever a pitcher makes a start, it contributes to his rolling game score (rGS) — the model’s best guess as to how the pitcher would perform in a typical start. (Pitchers who haven’t started before are assigned a below-average rGS, but that score is more influenced by each successive start than the score of an established pitcher.) In addition to each pitcher’s rGS, we maintain an rGS for each team that incorporates every game score produced by any starting pitcher for that team.

A pitcher’s adjustment to his team’s rating, then, is all about his rGS relative to his team’s rGS; pitchers who are better than the team’s rGS give the team a bonus when they start, and pitchers below the team’s rGS give the team a penalty. Note that one pitcher may have a higher overall rGS than another pitcher but a smaller team rating adjustment; this generally means that his team has a better rotation aside from him, or that he started more games (and thus, his game scores contributed more to the team’s rGS).

A pitcher’s adjustment is calculated with:

\(\begin{equation*}ratingAdj = 4.7 * (pitcher\,rGS - team\,rGS)\end{equation*}\)
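To see how this connects to the Pedro Martinez figures quoted earlier, here is a short sketch. The adjustment function follows the formula above; the conversion from rating points to win probability assumes the standard Elo curve on a 400-point scale and otherwise evenly matched teams, neither of which is stated in this article.

```python
def pitcher_rating_adjustment(pitcher_rgs: float, team_rgs: float) -> float:
    """Rating points a starter adds to (or subtracts from) his team."""
    return 4.7 * (pitcher_rgs - team_rgs)


def win_prob_from_edge(rating_edge: float) -> float:
    """Win probability for a given rating edge (assumed 400-point Elo scale)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_edge / 400.0))


# A 109-point adjustment implies an rGS gap of about 23.2 over the team rGS.
rgs_gap = 109 / 4.7
# Against an otherwise even opponent, that edge adds roughly 0.15 to win probability.
boost = win_prob_from_edge(109) - win_prob_from_edge(0)
print(round(rgs_gap, 1), round(boost, 3))
```

Under those assumptions, the 109-point bonus and the roughly 15 percentage point boost are two views of the same number.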

The addition of starting pitcher adjustments gives our model about a 1 percentage point improvement in the percentage of games correctly “called” and a corresponding improvement in the mean squared error of our game-by-game forecasts.

Preseason ratings

Before a season begins, we have to come up with a set of starting ratings for each team. Our preseason team ratings are made up of two components:

67 percent comes from the team’s preseason win projection according to three computer projection systems: Baseball Prospectus’s PECOTA, FanGraphs’ depth charts and Clay Davenport’s predictions — all scaled to an Elo range.

33 percent comes from the team’s final rating at the end of the previous season, reverted to the mean by one-third.
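The two-part blend above can be written out as a short calculation. This is a sketch that takes the projection as a single number already converted to an Elo-scale rating (the conversion itself is not described here) and assumes the mean being reverted to is exactly 1500.

```python
def revert_to_mean(rating: float, mean: float = 1500.0,
                   fraction: float = 1.0 / 3.0) -> float:
    """Pull a rating part of the way back toward the league mean."""
    return rating + (mean - rating) * fraction


def preseason_rating(projection_elo: float, prev_final_rating: float) -> float:
    """67 percent projections, 33 percent last season's reverted rating."""
    return 0.67 * projection_elo + 0.33 * revert_to_mean(prev_final_rating)


# Hypothetical team: projections imply 1530, last season ended at 1560.
print(preseason_rating(projection_elo=1530.0, prev_final_rating=1560.0))
```

In this made-up case the 1560 finish reverts to 1540 before being blended, so the team opens the season a few points above what the projections alone would say.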

As part of all this, we also need to compute a preseason rolling game score rating for each team’s pitching staff. Our preseason team rGS ratings are an average of the team’s starting pitcher rGSs, weighted by the individual pitchers’ projected starts in FanGraphs’ depth charts.
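That weighted average looks like this in code. The rotation and its numbers are invented for illustration; each entry pairs a pitcher's rGS with his projected starts.

```python
def team_preseason_rgs(rotation: list[tuple[float, int]]) -> float:
    """Average of pitcher rGS values, weighted by projected starts.

    rotation: (pitcher_rgs, projected_starts) pairs.
    """
    total_starts = sum(starts for _, starts in rotation)
    if total_starts == 0:
        raise ValueError("rotation has no projected starts")
    return sum(rgs * starts for rgs, starts in rotation) / total_starts


# Hypothetical five-man rotation.
rotation = [(60.0, 32), (54.0, 31), (51.0, 30), (49.0, 28), (46.0, 25)]
print(team_preseason_rgs(rotation))
```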

From ratings to a forecast

Now it’s time to turn these team and player ratings into probabilities, tracking how often each team makes the playoffs or wins the World Series. To do this, we run Monte Carlo simulations, playing out the season thousands of times. As with our other sports forecasts, we run these simulations “hot,” meaning that a team’s rating doesn’t stay static; rather, it changes within each simulated season based on the results of every simulated game, including the bonus for playoff wins. Starting with the 2019 season, our team ratings change at three-quarters of the speed they previously changed. As a result, the “hot” simulations have a bit less variance, and the forecast’s overall uncertainty is decreased a touch.

These simulated games also account for starting pitching matchups; for games in which a starter is not yet known, we assume that the most-rested pitcher from the team’s regular rotation will play. During the postseason, we assume teams use a four-man rotation.
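A toy version of a "hot" simulation might look like the following. It is a heavily simplified sketch, not the actual forecast: it assumes the standard 400-point Elo curve, uses a hypothetical K-factor of 4, applies only the 24-point home-field bonus, and ignores pitchers, travel, rest, margin of victory and the playoffs. The three teams and their schedule are invented.

```python
import random


def win_prob(rating_a: float, rating_b: float) -> float:
    """Chance that A beats B (assumed standard Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def simulate_season(ratings, schedule, rng, k=4.0):
    """Play one season 'hot': ratings move after every simulated game."""
    live = dict(ratings)  # copy, so every simulated season starts fresh
    wins = {team: 0 for team in ratings}
    for home, away in schedule:
        p_home = win_prob(live[home] + 24.0, live[away])
        home_won = rng.random() < p_home
        # Zero-sum update: what the home team gains, the road team loses.
        shift = k * ((1.0 if home_won else 0.0) - p_home)
        live[home] += shift
        live[away] -= shift
        wins[home if home_won else away] += 1
    return wins


def best_record_odds(ratings, schedule, n_sims=2000, seed=0):
    """Share of simulated seasons in which each team has the best record."""
    rng = random.Random(seed)
    firsts = {team: 0.0 for team in ratings}
    for _ in range(n_sims):
        wins = simulate_season(ratings, schedule, rng)
        top = max(wins.values())
        leaders = [team for team, w in wins.items() if w == top]
        for team in leaders:  # ties split the credit evenly
            firsts[team] += 1.0 / len(leaders)
    return {team: count / n_sims for team, count in firsts.items()}


ratings = {"A": 1560.0, "B": 1500.0, "C": 1440.0}
pairs = [(h, a) for h in ratings for a in ratings if h != a]
schedule = pairs * 10  # every home/away pairing played 10 times
odds = best_record_odds(ratings, schedule)
print(odds)
```

Lowering `k` makes ratings drift less within a simulated season, which is the mechanism behind the reduced variance after the 2019 change to the update speed.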

The complete history of MLB

Our Complete History Of MLB interactive contains historical Elo ratings stretching back to the 1871 season. These charts use a simplified Elo system that doesn’t take pitchers, travel or rest into account. Between seasons, it simply reverts the previous season’s ratings toward the mean by one-third, rather than using projection systems to set preseason ratings.

This means that the Elo ratings in our Complete History of MLB won’t exactly match the team ratings in our MLB Predictions. (Why use two systems? The projection systems we use to generate preseason ratings aren’t available back to 1871. Also, using a simplified rating system for the historical ratings gives us the flexibility to alter our current-season forecast’s methodology from year to year while keeping our historical Elo ratings unchanged.) They’re still pretty useful, however, when it comes to measuring the ebbs and flows of a franchise’s fate over time. Plus, just like our forecast model, our historical Elo ratings will update with the results of each game this season.

Editor’s note: This article is adapted from previous articles about how our MLB predictions work.