Editor’s Note: This piece was initially given as a presentation at the marvelous 2016 Saberseminar.

Back in 2014, it became clear that a large portion of the decrease in run scoring during the 2000s (as much as 40 percent) could be attributed to umpires expanding their strike zone downward by about three inches. This was confirmed by three independent researchers: Jon Roegele here at THT, Ben Lindbergh at Grantland, and yours truly. The trend toward a larger strike zone had started around 2009 and continued through 2014. Prior to the 2015 season, Jeff Passan reported it was something the league would look into, particularly if the low-scoring games were less interesting to fans. But, as Jon showed, the strike zone expansion continued as the 2015 season began.

Beginning last August, however, home runs started to increase and offense made a small comeback. In 2016 that trend has continued, and data from MLB’s new Statcast system show exit velocity is to blame: The ball is coming off the bat harder than it was last year. This has been well documented by others.

The increase in exit velocity and home runs has led to various theories about the return of steroids or “juiced” baseballs. These are pretty serious accusations. The former implies impropriety among players. The latter has precedent I’m sure Rob Manfred would prefer to avoid. It cost Ryozo Kato, the commissioner of the Nippon Professional League in Japan, his job.

Most recently, Rob Arthur and Ben Lindbergh put forth some interesting evidence regarding the juiced-ball theory over at FiveThirtyEight. But Alan Nathan presented some evidence here at THT that was inconsistent with that theory. The investigation into the juiced ball seems fueled, at least to some extent, by Jon Roegele’s findings after the 2015 season, which concluded the strike zone wasn’t changing much.

Contrary to Jon’s conclusion, my own data hinted at something different. When I looked at the average height of called strikes–an admittedly surface-level look–it seemed to be ticking upward in the latter part of the 2015 season.

Pairing that with Alan’s more recent presentation, I was rather skeptical that the juiced ball was a good explanation of the home run and scoring increases. But I wanted a larger sample than the last two months of the 2015 season to follow up on Jon’s findings with the strike zone.

So I decided to investigate for my Saberseminar talk. I figured I could show the evidence that, as before, the umpires are a large culprit in all of this. I returned to the data last month, adding games through July 2016. The trend in called-strike height seemed to continue.

It would seem reasonable to expect much of this change would be caused by umpires cutting down on strikes at the knees. But it also could be a selection bias issue or simply an expansion of the zone up high. So let’s take a closer look. The obvious question from all of this is: What does the strike zone look like now, and has this impacted run scoring since the 2015 All-Star Game?

I’ve been modeling the strike zone using PITCHf/x data since about 2010 using the same non-parametric method, a generalized additive model (GAM). GAMs are relatively flexible and allow us to make pretty pictures of the zone as well as control for various factors related to changes in the strike zone (like the ball-strike count). I’ll start with a simple GAM of the strike zone to get you acquainted with the visuals, and I’ll tell the rest of my story mostly in pictures. (Note: You can see enlarged versions of any of the following visuals if you open them in a new tab.)


Notice that the darker the red, the more likely a pitch will be called a strike. The darker the blue, the less likely it will be called a strike in that location. There’s nothing particularly surprising here. Pitches down the middle are almost always called strikes. And pitches five feet off the ground and two feet inside are never called strikes.

But comparing these figures across time periods is a bit difficult, especially with small changes. It’s much easier to plot the changes in the strike zone for data prior to the 2015 All-Star Game (the “PreASG15” era, starting at the beginning of 2015) and data from after the 2015 All-Star Game (the “PostASG15” era continuing through July 23, 2016).

In other words, I subtract the probability that a pitch, given its location, is called a strike in the PreASG15 era from the probability that same pitch is expected to be called a strike in the PostASG15 era. The result is below.

Notice the scale is similar to the previous plot. As the probability of a strike decreases in the PostASG15 era, relative to the PreASG15 era, it is darker blue. On the other hand, if the probability of a strike call increases in the PostASG15 era, it is darker red. White or light grey is a neutral color, meaning there is no change from the PreASG15 to the PostASG15 era in strike probability.
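The differencing step can be sketched in a few lines of Python. This is not the GAM described above: as a stand-in, the sketch swaps in a simple Gaussian kernel smoother (a Nadaraya-Watson estimate) so that it runs on the standard library alone. The function names, the 0.25-foot bandwidth, and the toy pitches are all assumptions made for illustration.

```python
import math

def strike_prob(pitches, x, z, bandwidth=0.25):
    """Kernel-weighted share of called strikes near location (x, z), in feet.

    pitches: list of (px, pz, is_strike) tuples for taken pitches.
    Returns None when no pitch carries any weight at (x, z).
    """
    num = den = 0.0
    for px, pz, is_strike in pitches:
        dist_sq = (px - x) ** 2 + (pz - z) ** 2
        weight = math.exp(-dist_sq / (2 * bandwidth ** 2))
        num += weight * is_strike
        den += weight
    return num / den if den > 0 else None

def strike_prob_change(pre, post, x, z, bandwidth=0.25):
    """PostASG15 strike probability minus PreASG15 strike probability at (x, z)."""
    p_pre = strike_prob(pre, x, z, bandwidth)
    p_post = strike_prob(post, x, z, bandwidth)
    if p_pre is None or p_post is None:
        return None
    return p_post - p_pre

# Toy data (invented): ten taken pitches at the knees in each era.
pre_pitches = [(0.0, 1.6, 1)] * 6 + [(0.0, 1.6, 0)] * 4
post_pitches = [(0.0, 1.6, 1)] * 5 + [(0.0, 1.6, 0)] * 5

change = strike_prob_change(pre_pitches, post_pitches, 0.0, 1.6)
print(round(change, 3))  # a negative value means fewer called strikes PostASG15
```

Evaluating `strike_prob_change` over a grid of locations would give the kind of surface that the difference plot colors red or blue.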

The story is relatively clear. For both right-handed and left-handed batters, the probability of a strike call on low pitches has decreased substantially. (The base rate of called strikes for these pitches ranges from 20 to 60 percent.) In some cases, the probability of a low outside strike has decreased by as much as 12 percentage points. Much of this difference is low and outside, pitches we know to be more difficult to hit with high exit velocities or for home runs.

Interestingly, there are also more strikes being called up in the zone. While this would mitigate some of the decrease in the total size of the strike zone, it could increase the rate at which balls are hit in the air, possibly leading to more home runs.

As an academic interested in economics and incentives, my first thought was that the players must have recognized this change and adjusted their behavior strategically. For example, we might expect these changes to induce pitchers to throw up in the zone and over the plate more often. With more pitches over the plate, it’s possible batters are squaring the ball up more consistently since they don’t have to worry about those low, outside pitches as much. The behavioral change would be easy enough to confirm in the data.

To identify changes in pitch location, I switched to using kernel density estimation, which simply evaluates the proportion of pitches in a given location, rather than the probability of calling those pitches strikes (or some other event), given location. I use the same differencing method as with the strike zone, calculating the change in the rate that pitchers throw to certain areas of the strike zone in the PostASG15 era. The result is visualized below.
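As a rough illustration of the density differencing, the sketch below computes a two-dimensional Gaussian kernel density estimate for each era and subtracts one from the other at a chosen location. The bandwidth, the function names, and the toy pitch locations are assumptions; the original analysis may have used a different kernel or bandwidth.

```python
import math

def kde2d(points, x, z, bandwidth=0.25):
    """Gaussian kernel density estimate at (x, z) from a list of (px, pz) pairs."""
    if not points:
        raise ValueError("need at least one pitch location")
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, pz in points:
        dist_sq = (px - x) ** 2 + (pz - z) ** 2
        total += math.exp(-dist_sq / (2 * bandwidth ** 2))
    return norm * total

def density_change(pre_points, post_points, x, z, bandwidth=0.25):
    """PostASG15 pitch density minus PreASG15 pitch density at (x, z)."""
    return kde2d(post_points, x, z, bandwidth) - kde2d(pre_points, x, z, bandwidth)

# Toy data (invented): pitches shift from low and away toward the inner half.
low_away, inner_half = (-0.8, 1.6), (0.3, 2.5)
pre_locs = [low_away] * 6 + [inner_half] * 4
post_locs = [low_away] * 4 + [inner_half] * 6

print(density_change(pre_locs, post_locs, *low_away))    # negative: thrown there less often
print(density_change(pre_locs, post_locs, *inner_half))  # positive: thrown there more often
```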

The changes here are again rather clear. The reduction in called strikes on the low-outside corner has induced pitchers to throw to that location less often. That has moved pitches inward toward the plate, and the inner half is being targeted more often than it was in the PreASG15 era. These changes are pretty subtle at an extra pitch or two per game.

But given the increase in pitches on the inner half of the plate, we should see more swings and contact there as well. Again using kernel density visuals, this is evident in the data.

The net effect here is around one additional contacted pitch on the inside half per game, and an average contact point between one tenth and one quarter of an inch closer to the center of the plate. Should we expect these small changes to result in home run increases?

As I did with the probability of a strike call, I use a GAM to estimate the probability of hitting a home run when the batter swings, conditional on the location of the pitch. And, as I suspected, the most common location for home runs aligns almost perfectly with the locational increases in pitches, swings, and contact in the PostASG15 era.

Given all the evidence here, my next step was to apply this in the context of Simpson’s Paradox. The changes in the rate of pitches in locations that result in higher exit velocities and home run rates, when averaged in aggregate, could result in what looks like an increase in how hard the ball is coming off the bat globally. In other words, if we reallocate the proportion of contact locations with the same associated exit velocities, can we explain the increased average aggregated exit velocity?

To do this, I broke the zone down into 36 separate six-inch by six-inch boxes as you see in the grid below. I then averaged the exit velocity of batted balls in each zone in the PreASG15 era and calculated a weighted average exit velocity using the PostASG15 proportions of contact in each zone. If the overall average exit velocity, computed using the PostASG15 era proportions paired with the PreASG15 era exit velocities, rose to match the observed PostASG15 average, then we could conclude the increase is largely due to changes external to a juiced ball.
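One way to assign a pitch to one of the 36 boxes is sketched below. The grid edges (horizontal from -1.5 to 1.5 feet, vertical from 1.0 to 4.0 feet) and the row-by-row numbering are assumptions; the article only states that the boxes are six inches square and that Zone 1 is at the top left.

```python
import math

def zone_number(px, pz, x_min=-1.5, z_max=4.0, box=0.5, n=6):
    """Map a pitch location in feet to a zone from 1 (top left) to n * n.

    Zones are numbered left to right, then top to bottom (an assumed ordering).
    Returns None for pitches outside the assumed grid.
    """
    col = math.floor((px - x_min) / box)
    row = math.floor((z_max - pz) / box)
    if 0 <= col < n and 0 <= row < n:
        return row * n + col + 1
    return None

print(zone_number(-1.4, 3.9))  # top-left box
print(zone_number(1.4, 1.1))   # bottom-right box
print(zone_number(2.0, 2.5))   # outside the grid
```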

From this grid, we take the average exit velocity for Zone 1 (top left) in the PreASG15 era and multiply it by the proportion of contacted pitches in Zone 1 in the PostASG15 era. We do the same for Zone 2, Zone 3, and so on. These estimates in each zone are just a discrete version of the density plots (contact proportion) and GAMs (home run rate, or in this case, exit velocity).
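The re-weighting described above amounts to a weighted average, which can be sketched as follows. The three-zone example and every number in it are invented for illustration (the real exercise uses 36 zones), and the function name is hypothetical.

```python
def reweighted_exit_velocity(ev_by_zone, share_by_zone):
    """Average exit velocity when zone-level velocities are weighted by contact share.

    ev_by_zone: {zone: average exit velocity in mph}
    share_by_zone: {zone: proportion of contacted pitches in that zone}
    Every zone in share_by_zone must also appear in ev_by_zone.
    """
    total_share = sum(share_by_zone.values())
    if total_share <= 0:
        raise ValueError("contact shares must sum to a positive number")
    weighted = sum(ev_by_zone[z] * share for z, share in share_by_zone.items())
    return weighted / total_share

# Toy inputs (invented): PreASG15 exit velocities and contact shares in each era.
pre_ev = {1: 86.0, 2: 90.0, 3: 88.0}
pre_share = {1: 0.30, 2: 0.40, 3: 0.30}
post_share = {1: 0.25, 2: 0.45, 3: 0.30}

baseline = reweighted_exit_velocity(pre_ev, pre_share)         # PreASG15 as observed
counterfactual = reweighted_exit_velocity(pre_ev, post_share)  # PreASG15 velocities, PostASG15 locations
location_effect = counterfactual - baseline
print(round(location_effect, 3))
```

Dividing `location_effect` by the observed change in average exit velocity gives the share of the increase that a shift in contact location alone could account for.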

And doing this did result in an increased exit velocity estimate. However, the change was only about 0.055 mph, or 5.5 percent of the actual change of one mph between the PreASG15 and PostASG15 eras. But I wasn’t satisfied with this. There could be other considerations. (I won’t go through the mathematical gymnastics here, but the change in inside rate is about 0.1 to 0.3 percentage points in a given zone area. I’m happy to share the specific numbers with anyone interested.)

It’s also possible the ball-strike count has become more favorable to batters, and in turn, batters are swinging at pitches more often in 3-1 or 3-0 counts. So while the locational differences didn’t result in much, perhaps batters simply are more prepared to sit on pitches in these counts and, in turn, hit the ball harder on average. Going through the same re-weighting exercise, I accounted for another 0.025 mph. That brings the total increase explained to only eight percent.
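For reference, the eight percent figure follows from adding the two re-weighting effects reported here and dividing by the observed change of roughly one mph:

```latex
\[
\frac{0.055\ \text{mph} + 0.025\ \text{mph}}{1.0\ \text{mph}}
  = \frac{0.080}{1.0}
  = 0.08 = 8\%
\]
```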

I remained stumped and a bit more open to the idea that the manufacturing of the ball may have changed slightly. But there are a few additional considerations that could be contributors.

It’s clear batters have been hitting balls at more favorable launch angles for home runs. The figure below is from a GAM estimation that evaluates the change in probability a batter hits the ball in the “sweet spot” angle for home runs. Clearly, some things have changed here, too.

There are two important takeaways from this plot. First, hitting the ball at these angles should increase the probability of home runs and, in turn, explain some of the increase in scoring. But Alan Nathan shows us that it’s not enough to explain the increase in home runs alone. Second, if balls are being hit at more favorable angles more often, it may also be that batters are squaring them up better. There is some evidence this may be the case, as the variability in exit velocity has been reduced overall in the PostASG15 era (though, by only about two percent) as measured by the coefficient of variation.

The skewness of the exit velocity distribution has also changed (again, very slightly) in a way that implies more balls are being hit harder, but the hardest hit balls are not necessarily being hit harder. Again, this seems to indicate some change toward more consistent squared contact across the league. But it’s very small, and as Rob Arthur recently showed, some of this could be due to systematic missing data in the tracking system in the PreASG15 era.
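For readers who want to reproduce these summary statistics, here is a minimal sketch of the coefficient of variation and skewness. It assumes the population (not sample-corrected) definitions, which may differ from what was used for the figures quoted here, and the exit velocities shown are invented.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

def skewness(values):
    """Population skewness: the mean cubed z-score (assumes the values are not all equal)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Toy exit velocities in mph (invented): one weakly hit ball drags the left tail out.
exit_velocities = [70.0, 95.0, 98.0, 100.0, 102.0]

print(coefficient_of_variation(exit_velocities))
print(skewness(exit_velocities))  # negative here: the long tail is on the low side
```

A lower coefficient of variation means exit velocities are clustered more tightly around their mean, which is the sense in which contact would be "more consistent."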

With more detailed, reliable data, one might be able to “reverse engineer” contact quality beyond just exit velocity and launch angle. For example, the batted-ball spin rate and direction may tell us more about the swing plane and how squarely balls are hit. Unfortunately, those data are not publicly available, and I’m told the results are still not particularly reliable.

Ultimately, even in the reverse-engineering scenario, it’s not clear we could account for the entire change. And it’s also not clear why the entire league so suddenly would change the way it approaches hitting. My hope is Alan Nathan can enlighten us all in the coming months regarding the exit-velocity puzzle.

So, after nearly 3,000 lines of R code and sifting through hundreds of thousands of observations, I explained very little. But sometimes that’s the fun of scientific inquiry. If we always made clear discoveries that could explain what was happening, we would be deprived of the challenge that makes the inquiry fun in the first place. It will be interesting to see how both offensive output and research on exit velocity continue to unfold.

What I do find fascinating, however, is that there are apparently some changes consistent with pitchers responding to incentives. The same behavior took place when the strike zone worked its way downward. If umpires continue to change the pitches they call strikes in different ways, it should be fun to keep tabs on how this induces strategic changes among pitchers.

References & Resources