Who was the best pitcher in the American League in 2013?

By WAR, it was Max Scherzer, although teammate Anibal Sanchez had a better ERA, and Yu Darvish had more strikeouts. King Felix and Chris Sale also had excellent seasons and could be considered. One name you might not expect to hear is Greg Holland, although by one measure, he helped his team win games more than any other pitcher in the Junior Circuit.

Baseball statistics exist on a spectrum from predictive to descriptive. Pitching metrics like SIERA and xFIP try to predict future outcomes based on exhibited skills. Moving down the spectrum, FIP and ERA describe what happened, in terms of the pitcher’s K/BB/HR or total runs allowed. Near the end of the spectrum is a statistic that best describes how a player influenced his team’s chances of winning each time he played: Win Probability Added (WPA).

It is by this metric that Greg Holland led the AL in 2013.

Let’s be clear: I’m not saying that Holland is better than or even close to being as valuable as Scherzer, Darvish and Sale. This just illustrates that his contribution to his team’s record was greater than any other pitcher’s. In fact, if you look at the pitching leader boards and sort by WPA, relievers routinely place very highly on the list, including seven of the top 10 in 2013.

This is one of many arguments used when people claim that WAR systematically undervalues relief pitchers. To some degree, this is a valid point. Having an elite reliever to pitch in high-leverage situations can have just as much of an impact on a team’s performance as having a dominant starter. Plus, since leverage is largely determined by the manager, it actually has a higher year-to-year correlation than performance metrics like FIP and xFIP, making it both predictable and repeatable. However, if a team has a second similarly skilled reliever who could have taken care of those high-leverage innings with similar proficiency, then the first pitcher doesn’t seem quite as valuable.

Regardless of how you evaluate players, we can all agree on two things. First, determining the value of a relief pitcher or an entire bullpen is extremely difficult, perhaps more so than any other position on the field other than catcher. Second, a great relief pitcher and smart bullpen management can provide a tremendous value to a team over the course of a season.

What this means is that a team that has a better understanding of how bullpens work can make smart investments in relievers without paying a premium for pitchers who won’t throw meaningful innings. Also, if there are certain team characteristics that correlate with an increase in high-leverage situations for relievers, a team can decide the best time to make a big investment in the bullpen.

With that in mind, let’s take a closer look at how relief pitchers are valued by different metrics, the importance of leverage in constructing a bullpen, and characteristics of teams where the bullpen performance may be of increased importance.

Chaining: A Primer

As mentioned above, relievers can have a disproportionate impact on their team’s performance relative to their workload by pitching in higher leverage situations. For starting pitchers, WAR effectively incorporates the pitcher’s performance (park- and league-adjusted FIP) and volume (innings pitched). This method fails to capture the increased importance of late-inning relievers, which is where chaining comes in.

With chaining comes the concession that leverage should be included in some capacity for reliever WAR, but with the caveat that a relief pitcher shouldn’t get full credit for the leverage of the situations he pitches in. Why? Well, if a team’s closer or “relief ace” who pitched in the highest leverage innings were to leave the team or get hurt, it’s not like those innings would disappear. They would simply get occupied by the next-best reliever. After that, everyone in the bullpen effectively moves up the chain, and the pitcher who is actually replacing the closer will end up pitching mostly low-leverage innings.

(For a more detailed explanation of chaining, read Dave Cameron’s discussion of WAR and Relievers, or this piece by Sky Kalkman at Beyond the Box Score.)

As it stands now, reliever WAR essentially gives the pitcher credit for half of the leverage he pitches in beyond an average situation, using a multiplier of (LI + 1) / 2. This is very important to remember when referencing reliever WAR. Two pitchers who have the same adjusted FIP and workload can end up with different WAR values because of their usage. To illustrate this point, let’s take a look at a few relief pitchers in 2014:

Relief Pitcher Comparison, 2014 (through Aug. 5)

These relievers all had pitched between 44 and 48 innings as of Aug. 5. We see that Miller had been the best of the group judging by FIP-, and while he had been 13 percent better than Cishek over a similar workload, their WAR is identical at 1.5, because Cishek had pitched in more high-leverage situations than all but four other relievers in baseball.

Similarly, Duke had put up an excellent season in the Brewers’ bullpen, but given his track record of mediocrity, hadn’t thrown in many high-leverage situations. Therefore, his WAR sat at just 0.9, despite the fact that he performed similarly to Cishek. Even Papelbon beat Duke’s WAR, as his high-leverage usage more than made up for the gap in performance.

Generally, the better a pitcher performs, the more trust he gains from his manager, and the higher leverage innings he will pitch. This means that there’s usually a decent correlation between performance and leverage, but there are always outliers (like Duke).
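To make the half-credit rule from this section concrete, here is a minimal Python sketch of the (LI + 1) / 2 multiplier. The function name is my own, and this only illustrates the leverage adjustment, not a full WAR calculation.

```python
def leverage_multiplier(pli: float) -> float:
    """Credit half of the leverage above (or below) the average of 1.0."""
    return (pli + 1.0) / 2.0


# 1.87 is the RP1 average pLI quoted in this article; 1.0 is an average situation.
for pli in (1.87, 1.0):
    print(pli, "->", round(leverage_multiplier(pli), 3))
```

A reliever with perfectly average usage gets a multiplier of 1.0, so only the leverage beyond average is discounted.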

Leverage by Bullpen Slot

To get a picture of an “average bullpen,” I pulled all relievers from 2004-2013 who threw at least 40 innings in a season and sorted them by average leverage index (pLI). The results show what the back end of a typical major-league bullpen has looked like over the past decade:

Bullpen Usage by Slot, 2004-2013

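| Slot | SV | IP | ERA- | FIP- | pLI | WPA | WAR |
|------|----|----|------|------|------|-------|-----|
| RP1 | 28 | 65 | 70 | 79 | 1.87 | 1.59 | 1.2 |
| RP2 | 7 | 65 | 78 | 85 | 1.44 | 0.82 | 0.8 |
| RP3 | 2 | 62 | 82 | 90 | 1.18 | 0.52 | 0.6 |
| RP4 | 1 | 59 | 89 | 94 | 0.97 | 0.09 | 0.3 |
| RP5 | 0 | 57 | 94 | 100 | 0.77 | -0.06 | 0.2 |
| RP6 | 0 | 55 | 103 | 105 | 0.52 | -0.16 | 0.0 |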
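For readers who want to reproduce this kind of slotting, here is a rough Python sketch. It assumes slots are assigned by ranking each team-season’s qualifying relievers by pLI, which is my reading of the method rather than a confirmed detail, and the record type, function name and sample rows are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class RelieverSeason(NamedTuple):
    name: str
    team: str
    year: int
    innings: float
    pli: float


def assign_slots(seasons, min_innings=40.0):
    """Rank qualifying relievers within each team-season by pLI (highest is RP1)."""
    by_team = defaultdict(list)
    for s in seasons:
        if s.innings >= min_innings:
            by_team[(s.team, s.year)].append(s)
    slots = {}
    for pen in by_team.values():
        ranked = sorted(pen, key=lambda s: s.pli, reverse=True)
        for rank, s in enumerate(ranked, start=1):
            slots[(s.name, s.team, s.year)] = f"RP{rank}"
    return slots


# Made-up rows for a hypothetical team "XYZ"; not real player data.
sample = [
    RelieverSeason("A", "XYZ", 2013, 62.0, 1.10),
    RelieverSeason("B", "XYZ", 2013, 65.0, 1.90),
    RelieverSeason("C", "XYZ", 2013, 21.0, 2.30),  # below the 40-inning cutoff
]
print(assign_slots(sample))
```

Averaging each slot’s statistics across all team-seasons would then give one row per slot.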

While managers tend to catch a lot of flak for improper bullpen use, we see that across baseball, the pitchers who are throwing the most important innings are also those with the best numbers. These also happen to be the pitchers who are racking up the most saves, as save opportunities tend to have fairly high leverage. (Of course, this isn’t to say that there aren’t a number of teams and managers who could do a much better job of using their relievers.)

It is well known that relief pitchers generally allow fewer runs than their FIP would predict. The fact that every average bullpen slot is capable of doing this tells us that it has more to do with the nature of the bullpen than the skill of the pitchers. However, there is a bigger gap between ERA and FIP for the good relievers (slots 1-3, ERA- is 8 percent better on average) than those who are closer to replacement level, so perhaps some credit is due.
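The 8 percent figure can be checked directly from the slot table. This short Python sketch copies the ERA- and FIP- columns and averages the gap for each group of slots; the variable and function names are mine.

```python
# (ERA-, FIP-) by slot, copied from the Bullpen Usage by Slot table.
SLOT_ERA_FIP = {
    "RP1": (70, 79),
    "RP2": (78, 85),
    "RP3": (82, 90),
    "RP4": (89, 94),
    "RP5": (94, 100),
    "RP6": (103, 105),
}


def mean_gap(slots):
    """Average of FIP- minus ERA- over the given slots (positive means ERA- is better)."""
    gaps = [SLOT_ERA_FIP[s][1] - SLOT_ERA_FIP[s][0] for s in slots]
    return sum(gaps) / len(gaps)


print(round(mean_gap(["RP1", "RP2", "RP3"]), 1))  # top three slots
print(round(mean_gap(["RP4", "RP5", "RP6"]), 1))  # bottom three slots
```

The top three slots beat their FIP- by 8 points on average, against a little over 4 points for slots 4-6.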

Another striking trend in this chart is how quickly the reliever leverage decreases. In an average bullpen, only one or two pitchers are regularly throwing “high leverage” (LI > 1.5) innings, and only three are throwing innings with above-average leverage.

What does this mean? In simple terms, it probably makes sense for a team to invest in two or three quality relievers who are capable of shouldering these innings. Beyond those first few bullpen slots, leverage has little impact on how relievers contribute to their team.

Put another way, the scarcity of high-leverage situations means that a team will receive diminishing returns from investing in relief pitchers.

Modeling Leverage, Performance and WPA

Using our “typical bullpens” from the past decade, we can try to model the relationship among leverage, performance and WPA from relievers. For each bullpen slot, we can run a linear regression to find how performance (measured by ERA-) influences team impact (WPA). This gives us the following chart:

If a team is investing in a “relief ace” (RP1), an improvement by 10 points of ERA- will yield an average of +0.6 WPA. If a team already has an established closer and is looking for a setup man (RP2), that same 10-point improvement will project to help the team by only +0.4 WPA. This may not seem like much, but with the cost of a win hovering around $7 million, this difference can be measured in the millions of dollars.
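As a back-of-the-envelope version of that comparison, the Python sketch below uses only the rounded per-slot figures quoted above (+0.6 and +0.4 WPA per 10 points of ERA-) and the roughly $7 million cost of a win. Treating one point of WPA as one win is a simplification, and the names are my own.

```python
# Rounded slopes quoted in the text: WPA gained per 10-point improvement in ERA-.
WPA_PER_10_POINTS = {"RP1": 0.6, "RP2": 0.4}
DOLLARS_PER_WIN = 7_000_000  # approximate market cost of a win, per the text


def wpa_gain(slot, era_minus_improvement):
    """Projected WPA gain for a given improvement in ERA- at a bullpen slot."""
    return WPA_PER_10_POINTS[slot] * era_minus_improvement / 10.0


def dollar_value(wpa):
    """Convert WPA to dollars, treating one WPA as one win (a simplification)."""
    return wpa * DOLLARS_PER_WIN


diff = wpa_gain("RP1", 10) - wpa_gain("RP2", 10)
print(round(diff, 2), round(dollar_value(diff)))
```

Under these assumptions, the same 10-point improvement is worth about 0.2 WPA, or roughly $1.4 million, more in the RP1 slot than in the RP2 slot.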

Obviously, the extreme left side of this graph is where relievers can be highly valuable. Only a few pitchers each year post an ERA- below 40, but if used correctly they can have a huge influence on a team’s performance.

In 2008, Brad Ziegler posted an ERA- of 25 in 59.2 innings for the Athletics. While his average leverage was only 1.69 (a bit below the RP1 average of 1.87), he was still able to contribute +3.20 WPA for his team, despite having a WAR of just 0.6. Meanwhile, after six consecutive seasons with an ERA of 4.50 or worse, Dennys Reyes had a miraculous 0.89 ERA (20 ERA-) for the Twins in 2006. However, since the team presumably didn’t feel he could keep posting incredible numbers (rightfully so), it kept him out of high-leverage situations (LI of 0.88, in RP4/5 territory) and he accumulated only +1.45 WPA.

Let’s use the model above to analyze some hypothetical examples, to illustrate how chaining works and why investing in a bullpen provides diminishing returns.

Example 1: Diminishing Returns

The 2014 season has ended and free agency has begun. Several elite relief pitchers are on the market, including Koji Uehara, who has a career ERA- of 52, and is willing to settle on a one-year contract given his advanced age (for a professional baseball player, that is). Two teams have emerged as frontrunners for him: the Kansas City Royals and the Texas Rangers. Uehara projects for 2 WAR on the season, which should be worth around $15 million on the open market.

To determine exactly how Uehara would help these teams win, they look at how their bullpens would look if they sign him.

Koji Uehara Comparison

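| RP# | Rangers ERA- | Rangers WPA | Rangers + Uehara ERA- | Rangers + Uehara WPA | Royals ERA- | Royals WPA | Royals + Uehara ERA- | Royals + Uehara WPA |
|-----|-----|-------|-----|-------|-----|-------|-----|-------|
| RP1 | 78 | 1.15 | 52 | 2.62 | 55 | 2.45 | 52 | 2.62 |
| RP2 | 95 | 0.12 | 78 | 0.82 | 60 | 1.57 | 55 | 1.78 |
| RP3 | 95 | 0.10 | 95 | 0.10 | 65 | 1.05 | 60 | 1.21 |
| RP4 | 100 | -0.13 | 95 | -0.02 | 95 | -0.02 | 65 | 0.61 |
| RP5 | 100 | -0.14 | 100 | -0.14 | 95 | -0.05 | 95 | -0.05 |
| RP6 | 105 | -0.18 | 100 | -0.14 | 100 | -0.14 | 95 | -0.11 |
| Total | – | 0.91 | – | 3.24 | – | 4.86 | – | 6.06 |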

The Rangers don’t have much of a bullpen this year. The only reliever who has at least 0.3 WAR and is under team control next year is Nick Martinez, who has spent most of his season pitching out of the rotation. Therefore, I filled their hypothetical 2015 bullpen with mostly mop-up guys who are roughly league-average or worse. Because the Rangers don’t have another capable reliever to take on high-leverage innings, Uehara’s presence at the back of the bullpen becomes extremely valuable, worth an increase of 2.33 WPA.

The Royals, on the other hand, have had one of the most dominant bullpens in 2014, and return most of the key pieces. Their top three relievers this year — Greg Holland, Wade Davis and Kelvin Herrera — are all under team control in 2015. Because of the Royals’ bullpen depth, adding Uehara would mean that Herrera (RP3, 65 ERA-) would be bumped down to the RP4 slot, where his leverage will be roughly average despite posting RP1-type numbers. As such, the result is a WPA increase of just 1.20.

So, an elite reliever like Koji Uehara is worth significantly more to a bad bullpen than a good bullpen. The more quality relievers you have, the more likely you are to end up with a good pitcher throwing innings that aren’t very meaningful. If these two teams were bidding on a reliever in free agency, they might be willing to pay very different sums of money, even if they are on the same budget and have the same evaluation of the player in question.
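The chaining in this example can be sketched in a few lines of Python. The helper below slots a new arm into a six-man bullpen ordered by ERA- and drops the worst pitcher, and the slot WPA values are copied from the Koji Uehara Comparison table rather than recomputed from the model. The function name is mine, and ordering strictly by ERA- is a simplification of how roles are really assigned.

```python
def add_reliever(bullpen_era_minus, new_era_minus, slots=6):
    """Slot in a new reliever and keep the best `slots` arms (lower ERA- is better)."""
    return sorted(bullpen_era_minus + [new_era_minus])[:slots]


rangers = [78, 95, 95, 100, 100, 105]
royals = [55, 60, 65, 95, 95, 100]
print(add_reliever(rangers, 52))
print(add_reliever(royals, 52))

# Slot-by-slot WPA copied from the comparison table.
WPA = {
    "Rangers": [1.15, 0.12, 0.10, -0.13, -0.14, -0.18],
    "Rangers + Uehara": [2.62, 0.82, 0.10, -0.02, -0.14, -0.14],
    "Royals": [2.45, 1.57, 1.05, -0.02, -0.05, -0.14],
    "Royals + Uehara": [2.62, 1.78, 1.21, 0.61, -0.05, -0.11],
}
totals = {name: round(sum(values), 2) for name, values in WPA.items()}
rangers_gain = round(totals["Rangers + Uehara"] - totals["Rangers"], 2)
royals_gain = round(totals["Royals + Uehara"] - totals["Royals"], 2)
print(rangers_gain, royals_gain)
```

Summing the rounded slot values puts the Rangers’ starting total 0.01 above the table’s figure, presumably a rounding difference, so the gains come out within 0.01 of the 2.33 and 1.20 quoted above.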

Example 2: The Elite Setup Man

In February, I wrote about how teams may benefit in the long-term by preventing pre-arbitration relief pitchers from accumulating saves. The arbitration process places way too much value on saves, so young relievers who close out games become very expensive, very quickly.

One common concern was that a team can hurt its chances of winning if it allows an inferior pitcher to handle the higher leverage situations. Of course, not all saves are high leverage, but closers do generally have the highest average leverage among relievers.

Let’s imagine a team with two hypothetical relief pitchers — the veteran closer and the young setup man — in a battle for control of the ninth inning. In that February article, I estimated that a team could save $7-8 million by preventing its young setup man from picking up more than a dozen or so saves before reaching arbitration. So, how much better does the setup man have to be than the veteran for it to make financial sense to swap their roles?

To answer this question, I plugged in the numbers for these hypothetical relievers and generated the following table, comparing the gap in ERA- between the pitchers and the effect on WPA.

Closer vs. Set-Up Man Comparison

| Gap in ERA- | Change in WPA |
|-------------|---------------|
| 70 | 1.07 |
| 60 | 0.92 |
| 50 | 0.77 |
| 40 | 0.61 |
| 30 | 0.46 |
| 20 | 0.31 |
| 10 | 0.15 |
| 0 | 0.00 |

So, to get a difference of at least one win by WPA, the setup man occupying the RP2 slot would have to be 70 points of ERA- better than the veteran RP1. Even in situations where the young guy is clearly a superior option, the gap is not likely to be that large, unless the established closer completely falls apart.

For example, the gap between Cody Allen (with a 1.89 ERA) and John Axford, whose job he took away earlier this season, is “only” 33 points of ERA-. This accounts for roughly half a win over the course of an entire season. Since Axford held onto the role for the first quarter of the season, that makes the swap worth closer to three-eighths of a win. While that might be worth around $2.5-3 million on the open market, the Indians will pay the price when Allen starts going through arbitration in 2016. However, with the team over .500 with a shot at a Wild Card spot, it would be difficult to pull Allen from the ninth inning now.
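The arithmetic behind the swap table and the Allen example can be sketched as follows. The per-point slope of about 0.0153 WPA is my own inference from the table (1.07 WPA over a 70-point gap), not a number taken from the underlying model, and the $7 million per win comes from earlier in the article.

```python
# Inferred from the closer vs. setup man table: about 1.07 WPA per 70 points of ERA-.
SWAP_WPA_PER_POINT = 0.0153
DOLLARS_PER_WIN = 7_000_000


def swap_gain(era_minus_gap, share_of_season=1.0):
    """WPA gained by moving the better pitcher into the RP1 slot."""
    return SWAP_WPA_PER_POINT * era_minus_gap * share_of_season


full_season = swap_gain(33)
three_quarters = swap_gain(33, share_of_season=0.75)
print(round(full_season, 2), round(three_quarters, 2))
print(round(three_quarters * DOLLARS_PER_WIN))
```

A 33-point gap works out to about half a win over a full season and a bit under 0.4 wins over the remaining three-quarters, which lands inside the $2.5-3 million range mentioned above.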

Maximizing Bullpen Value

This offseason, the Athletics invested heavily in their bullpen, trading for Jim Johnson and Luke Gregerson, along with the $15 million they were owed in 2014. While a number of explanations were offered at the time, I wondered if certain teams might be in a better position to exploit high-leverage situations, and whether the A’s might be one of these teams.

I looked at a variety of team metrics from 2004-2013 and determined whether any had a significant correlation to average reliever leverage. I could include a table here with all of the results, but I’ll save you the time and tell you that there simply weren’t any significant correlations.

The two metrics that had the strongest relationship to reliever leverage were innings pitched by starters (R = 0.40) and ERA- of starters (R = -0.36). The first makes sense, as later innings tend to have higher leverage, and if starters aren’t going deep into games, it means that relievers are going to have to pitch in the less-meaningful middle innings of the game. The second is most likely tied to the first — the better your starting pitchers are, the more innings they pitch.

One hypothesis I had was that the more balanced a team was between its rotation and offense, the more likely its bullpen would be to see close games. To test this, I calculated the gap between the rotation’s ERA- and the offense’s wRC+. (For example, a team with a rotation ERA- of 95 and a wRC+ of 105 would be 0, since both the starters and the offense are 5 percent above average. 90 ERA- and 100 wRC+ would be +10, meaning that the rotation is 10 percent better than the offense. A 105 ERA- and 110 wRC+ would be -15, meaning that the rotation is 15 percent worse than the offense.)
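The gap described in those examples reduces to a one-line formula. Here is a small Python sketch (the function name is mine) that reproduces all three cases:

```python
def rotation_offense_gap(rotation_era_minus, wrc_plus):
    """Points by which the rotation outperforms the offense, relative to average (100)."""
    rotation_above_avg = 100 - rotation_era_minus  # lower ERA- is better
    offense_above_avg = wrc_plus - 100             # higher wRC+ is better
    return rotation_above_avg - offense_above_avg


for era_minus, wrc_plus in [(95, 105), (90, 100), (105, 110)]:
    print(era_minus, wrc_plus, rotation_offense_gap(era_minus, wrc_plus))
```

A value near zero marks a balanced team, positive values mean the rotation is the stronger unit, and negative values mean the offense is.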

In theory, when the gap is smaller, the team should play more close games and the relievers should pitch in higher leverage situations. However, there turns out to be almost zero correlation. In fact, the small trend was that teams whose rotation was stronger than their offense were more likely to have high-leverage innings thrown by the bullpen (R = 0.30). While this makes sense, it is also probably driven largely by the fact that these teams are more likely to have good starters who pitch deep into games (see above).

Conclusions

What did we learn today? First, evaluating relievers is tricky, and WAR doesn’t always tell the whole story. Also, the quality of a team’s existing bullpen largely determines how much an addition will help. Therefore, always keep leverage in mind when looking at both individual relievers and how they will contribute to a team’s bullpen.

Also, it’s extremely difficult to predict which teams will have higher-leverage innings for their bullpen to throw. Generally, if you have a solid group of starters who can throw quality innings deep into games, there will be high-leverage situations available to maximize the use of your relief corps. This strategy aligns with the idea that you shouldn’t invest significantly in a bullpen unless the rest of the team is good, given the short shelf life of relievers.

Lastly, and perhaps most importantly: the worse a bullpen is, the bigger the payoff from adding an elite reliever. While relievers can be volatile and it can never hurt to have depth, most teams only have two bullpen slots pitching high-leverage innings. Having two or three great relief options can be beneficial to a team, but once those slots are filled, a team would be wise to invest its resources elsewhere on its roster.