This is Nate Freiman’s second post as part of his August residency. Nate is a former MLB first baseman. He also played for Team Israel in the 2017 World Baseball Classic and spent time in the Atlantic and Mexican Leagues. He can be found on Twitter @natefreiman. His wife Amanda routinely beats him at golf. To read work by earlier residents, click here.

Being a tall hitter came with its drawbacks. Long arms, lots of moving parts. Eight-hour bus rides starting at 11pm. Getting pitched inside. (In fairness, I saw thousands of pitches and only suffered two broken hand bones.)

And yes, the low strikes. My entire career, anytime I’d get a low strike called there would be someone from the dugout yelling, “He’s six-eight!” Hopefully, by that time, it wasn’t news to the umpire. My hitting coach in A-ball told me to wear my pants Hunter Pence style. Above the knees. He figured the umpire would see the bottom of the zone better. I figured that would get me ejected.

So I can honestly say I sympathize with Aaron Judge. Travis Sawchik has done great work on Judge’s relationship with the bottom of the zone. It makes sense that a guy that big is a strike zone anomaly, but do other guys have the same problem? I used Statcast data to investigate.

The MLB pitch data features anywhere between 50 and 90 columns of information for every single pitch thrown. One of them is “sz_bot,” or strike-zone bottom. I used this number to adjust the strike zone for each hitter. The problem is, sz_bot varies. Of the hitters who have seen at least 500 pitches in 2018, the top of the zone measurement (sz_top) has an average range of 2.8 feet, while the bottom of the zone (sz_bot) varied an average of 3.4 feet.

Most of this is due to random outliers. One of the columns for David Freese, for example, suggests his strike zone on one pitch extended up 11 feet. To address this, I took the median strike-zone top and bottom for each hitter instead of the average.
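To see why the median holds up better than the average here, consider a minimal sketch. The hitters and readings below are made up for illustration (only the 11-foot reading echoes the Freese example), and the function name is hypothetical rather than anything from my actual workflow.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: (batter, sz_top, sz_bot) in feet, one row per pitch.
pitches = [
    ("Hitter A", 3.40, 1.60),
    ("Hitter A", 3.45, 1.62),
    ("Hitter A", 11.00, 1.58),  # one bad reading, like the 11-foot zone top
    ("Hitter B", 3.70, 1.80),
    ("Hitter B", 3.72, 1.78),
    ("Hitter B", 3.68, 1.82),
]

def zone_by_batter(rows, summary=median):
    """Collapse per-pitch zone readings into one (top, bottom) pair per batter."""
    tops, bots = defaultdict(list), defaultdict(list)
    for batter, sz_top, sz_bot in rows:
        tops[batter].append(sz_top)
        bots[batter].append(sz_bot)
    return {b: (summary(tops[b]), summary(bots[b])) for b in tops}

median_zone = zone_by_batter(pitches)               # ignores the stray reading
mean_zone = zone_by_batter(pitches, summary=mean)   # gets dragged upward by it
```

With these made-up numbers, the median keeps Hitter A's zone top at a believable height, while the average gets pulled well above it by a single bad row.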

After determining the approximate strike-zone boundaries for each hitter, I isolated a somewhat arbitrary window at the top and bottom of the zone. The window at the top of the zone is simply every pitch that is coded as being at least half the diameter of the baseball above sz_top. The bottom window is every pitch located between half a ball below sz_bot and one foot below sz_bot. The batters receiving strike calls on pitches in the bottom window are, in theory, those who are the greatest victims of low strike calls.
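The two windows can be sketched roughly as follows. This is an illustration under assumptions, not the exact code behind the numbers: the ball diameter of about 2.9 inches is my assumption, the function names are hypothetical, and the sample pitches are invented. Here plate_z stands for the height of the pitch as it crosses the plate, in feet.

```python
# Assumed: half of a roughly 2.9-inch baseball, converted to feet.
BALL_RADIUS_FT = 1.45 / 12

def in_low_window(plate_z, sz_bot):
    """Between half a ball and one foot below the bottom of the zone."""
    return sz_bot - 1.0 <= plate_z <= sz_bot - BALL_RADIUS_FT

def in_high_window(plate_z, sz_top):
    """At least half a ball above the top of the zone."""
    return plate_z >= sz_top + BALL_RADIUS_FT

def called_strike_rate(taken, in_window, edge):
    """Share of taken pitches inside a window that were called strikes.

    taken: list of (plate_z, was_called_strike) pairs for one batter.
    edge:  that batter's median sz_bot (low window) or sz_top (high window).
    Returns None when no pitch landed in the window.
    """
    calls = [strike for z, strike in taken if in_window(z, edge)]
    return sum(calls) / len(calls) if calls else None

# Hypothetical taken pitches for a hitter whose median sz_bot is 1.60 feet.
taken = [(1.40, True), (1.30, False), (1.00, False), (1.55, True), (0.50, False)]
low_rate = called_strike_rate(taken, in_low_window, 1.60)
```

Note that the pitch at 1.55 feet is too close to the zone to count, and the one at 0.50 feet is more than a foot below it, so only three of the five pitches land in the low window.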

Not surprisingly, Judge is way ahead. In fact, there’s a statistically significant difference between him and Peralta, who’s still ahead of everyone else in baseball. These guys also happen to have an average height of 75.4 inches, or a little over 6-foot-3.

What about high pitches?

Brian Dozier is the Aaron Judge of the top of the zone! Like Judge, Dozier is ahead of second place by a statistically significant margin. Also notably, the average height of the batters here is 71.8 inches, or nearly four inches shorter than those from the low-strike sample.

I also wanted to see if low strikes plague hitters year-to-year. Here’s a chart comparing the rate of strikes called in the low window over the past two seasons.

Aaron Judge is such an outlier that I actually had to take him out before drawing the regression line. The overall chart has a correlation of 0.65. Clearly there’s some consistency. The guys who get low strikes called on them tend to keep getting those pitches called.
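For anyone curious how a single extreme hitter can move a number like this, here is a small sketch of a Pearson correlation computed with and without one outlier. The rates are invented, and the hitter labels are placeholders.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (2017 rate, 2018 rate) of low-window called strikes per hitter.
rates = {
    "A": (0.10, 0.12),
    "B": (0.15, 0.13),
    "C": (0.08, 0.09),
    "D": (0.12, 0.14),
    "Outlier": (0.35, 0.40),  # a hitter far from the pack in both seasons
}

def yearly_r(table, exclude=()):
    """Year-to-year correlation, optionally leaving some hitters out."""
    kept = [pair for name, pair in table.items() if name not in exclude]
    return pearson_r([p[0] for p in kept], [p[1] for p in kept])

r_all = yearly_r(rates)
r_trimmed = yearly_r(rates, exclude=("Outlier",))
```

In this toy data, the outlier alone pushes the correlation close to 1, which is the reason to set such a point aside before fitting a line through everyone else.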

Is height the deciding factor? Here’s what the low-strike percentage looks like as a function of height. (The heights are listed in discrete values, but I added normally distributed noise.)

This regression line had to be drawn without Judge, David Peralta, or Jose Altuve. Removing the outliers, we get an R-squared of 0.23. Height isn’t the only factor at play here, in other words, but it does explain about a quarter of the variance in the low-strike rate. Look at Brandon Belt! He’s not a short guy, but he’s getting a good bottom of the zone.
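Here is a rough sketch of what that involves: fit a straight line to low-strike rate against height, then compute the share of variance the line explains. The heights and rates are invented, the jitter size is an assumption, and in this sketch the noise is applied only to the plotted x-positions, not to the fit.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical listed heights (inches) and low-window called-strike rates.
heights = [68, 70, 72, 72, 74, 75, 76, 78]
low_rate = [0.08, 0.11, 0.09, 0.13, 0.10, 0.15, 0.12, 0.14]

slope, intercept = fit_line(heights, low_rate)
r2 = r_squared(heights, low_rate, slope, intercept)

# Listed heights come in whole inches, so points stack in columns on a chart.
# Normally distributed noise (assumed sd of 0.15 inches) spreads them out.
rng = random.Random(0)
jittered_heights = [h + rng.gauss(0, 0.15) for h in heights]
```

An R-squared of 0.23 on real data would mean the line captures a bit under a quarter of the spread in low-strike rate, leaving the rest to everything that isn't height.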

What about high strikes? This one looks a little different:

Year-to-Year High-Strike Correlation

| Metric | Correlation with 2017 | R-Squared with Height |
| --- | --- | --- |
| 2018 High-Strike Percentage | R = 0.715 | 0.05 |

Height doesn’t seem to be the factor here. Umpires have a better view of the high pitch, so that’s not too surprising. The high pitch is also less sensitive to catcher framing. (Sorry, Stephen Vogt, I know I just said the f-word.) Good catchers let high breaking balls get deep, but the real skill is sticking that low pitch. For the young catchers out there, one of the big barriers to entry on that front is flexibility. The lower you get, the easier that low pitch is to stick. Catchers have crazy hip flexibility. When I played, many would be in the weight room at 6:30am on spring-training days doing what can best be described as “painful-looking yoga.”

What’s interesting is that the percentage of high strikes doesn’t correlate with height, but still correlates year-to-year. There’s something intrinsic to a hitter that results in all these supposed bad calls.

One theory, at least as it relates to Dozier, is that his swing throws off the strike-zone coding.

Dozier starts tall and then, as the pitch is delivered, begins sinking into his legs. That’s an athletic timing movement, but it also possibly affects both the umpire’s and the data’s perception of the top of the zone.

If low strikes correlate with height, but high strikes do not, can we still visualize the different strike zones? I built a model from the entire 2017 season to get predicted probabilities for each pitch this season. This introduces a year-to-year bias, but I wanted a consistent measure of pitches, even if that measure is based on last year’s standards.

Once I eliminated missing data, there were about 350,000 taken pitches last season. I built a tenfold cross-validated regression using a lightning-fast algorithm called XGBoost. It ran 10 iterations (each time creating a random training set and giving a single predicted value for 230,000 taken pitches in 2018) in a total of 15 seconds. Technology!

The next step was separating the data by height. I filtered this data so I was looking at hitters over 6-foot-4 who had seen at least 300 pitches. I also broke it down to just righties with fewer than two strikes. This produced a list of eight players, as follows:

For the other group, I did the same thing except restricted it to players listed at 5-foot-8 or less. Here’s that group of nine:

Here are the 50% contour lines of the predicted strike zone for each height group:

In the chart above, I also drew in the median strike zone parameters for each group. Again, we’re just looking at right-handed hitters with fewer than two strikes.

A couple of observations. The inside corner looks a tiny bit bigger for tall guys. Tall hitters tend to stand farther from the plate, which is possibly what makes that pitch look a little better. Plus, pitchers tend to attack tall hitters inside, so there’s a better chance those are executed locations (rather than misses that make the catcher reach outside his body to the glove side). But what we’re really interested in is the low pitch. There’s a difference, especially on that pitch down and away. But it’s not as pronounced as the differences in coded zone boundaries.

There are some takeaways here. First, the real strike zone does vary by batter height, but not by as much as the coded zone boundaries do. Second, some hitters have a higher percentage of high strikes called, but it doesn’t appear to be related to their height. That surprised me, and I’d love to explore that further or hear people’s theories. Third, and not surprisingly, tall guys do get some extra low strikes.

I can’t complain, though. Ben Lindbergh pointed out to me that I had an unusually fair zone called in the 530 or so pitches I saw playing for Oakland. In fact, Baseball Prospectus had my number exactly equal to the number of predicted called strikes. Instead, I’ll keep my pants rolled down and stick to stories about the legroom on bus rides through the Texas League.