This is Nate Freiman’s first post as part of his August residency. Nate is a former MLB first baseman. He also played for Team Israel in the 2017 World Baseball Classic and spent time in the Atlantic and Mexican Leagues. He can be found on Twitter @natefreiman. His wife Amanda routinely beats him at golf.

Editor’s Note: a version of this work was recently presented at SaberSeminar 2018.

In 2011, I was playing at High-A for the Padres. I’d graduated from the Midwest League to Lake Elsinore in the California League. (They have the cool storm-eyes logo, but it scares my toddler so my old hats are in boxes.) Since we were so close to San Diego, we got lots of guys on MLB rehab assignments. I was a senior sign making $1,300 a month, so it was huge when someone like Orlando Hudson came through and bought us Outback.

During their assignments, every MLB guy got The Question: “What’s it like up there?” The best answer I ever heard was, “Chuck E Cheese for adults.” O-Dog, as Hudson was known, had a pretty strong reply, too: “Better balls, better lights, and a better zone.”

In this case, “better zone” means two things. The first is size. (“That’s outside!”) The second is consistency. (“That’s been a strike all day!”) And O-Dog was right: the umpiring (just like the play on the field) does get better as you go up. We’d be in some cramped clubhouse, playing cards, and eating our $11 PB+Js, watching the big club, when a pitcher would inevitably yell, “That’s a strike!” And maybe it was… by Northwest League standards.

But those standards are different than the ones at higher levels. For example: have you ever seen a check swing get overruled? I have. In Boise, back in 2009. The hitter at the plate checked his swing, and the umpire responded by yelling, “Yes he did!” After the batting team complained, however, the home-plate umpire decided to appeal to his colleague at third base, who ruled it not a swing. I’ve never seen something like that before or since.

It’s no secret that the umpiring in the majors is superior to the sort found in the minors. It’s also no secret that part of the superior umpiring is a smaller, better-defined zone. But what about the different levels of the minors? Does the strike zone get smaller at each level? Does it get more consistent? I wanted some answers.

Building the Model

In order to get them, I needed minor-league TrackMan data. That data is all proprietary, but one team sent some of it to me on the condition of anonymity. (If anyone from that organization is reading this, thank you again!) The org in question sent me a sample of 20,000 taken pitches divided across the four full-season levels. The team trimmed the data to contain only horizontal and vertical location, pitcher and batter handedness, count, and a binary “strike” or “ball” call. There was no other identifying information.

This broke down to about 5,000 pitches per level. I eliminated two-strike counts and just looked at righty batters. This left me an average of 2,265 pitches per level.

Then I built a model using a machine-learning algorithm called a Support Vector Machine (SVM). The model learns the data and predicts a 0 or 1 (ball or strike) for new pitches.
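
The article doesn't say which SVM library or kernel was used, so the following is only a rough sketch of the idea. It assumes a linear SVM on hand-picked quadratic features, trained with Pegasos-style stochastic sub-gradient descent. The synthetic pitches, the elliptical "zone," and every name here are illustrative assumptions, not the actual model.

```python
import random

ZONE_CENTER_Z = 2.5  # assumed mid-zone height in feet (illustrative only)

def features(x, z):
    """Quadratic feature map so a linear boundary can enclose a zone-like region."""
    return [x * x, (z - ZONE_CENTER_Z) ** 2, 1.0]  # last entry acts as the bias

def train_svm(pitches, calls, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    pitches: list of (x, z) in feet; calls: 1 for strike, 0 for ball.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    order = list(range(len(pitches)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            f = features(*pitches[i])
            y = 1 if calls[i] == 1 else -1             # SVM labels are +1 / -1
            margin = y * sum(wj * fj for wj, fj in zip(w, f))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (L2 penalty)
            if margin < 1:                             # inside the margin
                w = [wj + eta * y * fj for wj, fj in zip(w, f)]
    return w

def predict(w, x, z):
    """Return 1 (strike) or 0 (ball) for a pitch location."""
    score = sum(wj * fj for wj, fj in zip(w, features(x, z)))
    return 1 if score > 0 else 0

# Synthetic taken pitches: called a strike inside a made-up elliptical zone.
rng = random.Random(42)
pitches = [(rng.uniform(-2, 2), rng.uniform(0.5, 4.5)) for _ in range(1000)]
calls = [1 if (x / 0.95) ** 2 + (z - ZONE_CENTER_Z) ** 2 < 1 else 0
         for x, z in pitches]

w = train_svm(pitches, calls)
print(predict(w, 0.0, 2.5))  # a pitch down the middle
print(predict(w, 3.0, 2.5))  # a pitch far outside
```

A kernel SVM from a standard library would be the more usual choice for a curved boundary. The explicit feature map is used here only to keep the sketch dependency-free.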

I generated a grid of 1,030 borderline pitch locations in half-inch increments. It looked like a thick picture frame around the edges of the strike zone. Since the data wasn’t filtered for strike-zone height, the high/low predictions weren’t reliable. I cut off the grid at the MLB average for strike-zone height. I wouldn’t be checking for low pitches, just inside and outside. I then had the model predict the call at each location.
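
A grid like this can be sketched in a few lines. The band width and the bottom and top heights below are hypothetical (the article doesn't give them), so this sketch does not reproduce the 1,030-point count. The only fixed fact is that home plate is 17 inches wide.

```python
PLATE_HALF_WIDTH_IN = 8.5  # home plate is 17 inches wide

def borderline_grid(band_in=3.0, bottom_in=19.0, top_in=42.0, step_in=0.5):
    """Return (x, z) locations in inches in two bands straddling the plate edges.

    band_in, bottom_in and top_in are hypothetical; the article does not state them.
    """
    def axis(lo, hi):
        n = int(round((hi - lo) / step_in))
        return [lo + i * step_in for i in range(n + 1)]

    # Horizontal distance from the center of the plate, on either side of the edge.
    xs = axis(PLATE_HALF_WIDTH_IN - band_in, PLATE_HALF_WIDTH_IN + band_in)
    # Height above the ground, capped at an assumed zone bottom and top.
    zs = axis(bottom_in, top_in)
    # Mirror the band to cover both the inside and outside edges.
    return [(side * x, z) for side in (-1, 1) for x in xs for z in zs]

grid = borderline_grid()
print(len(grid))
```

Each location in `grid` would then be passed to the fitted model, and the share of predicted strikes gives the borderline strike percentage for that level.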

To avoid overfitting, I ran the model 10 times at each level using 10 different random subsets of the data. Finally, I took the averages.

Borderline Strike % Call by Level

Level       Borderline Grid Strike % (n=1030)
Triple-A    35.77
Double-A    37.50
High-A      42.22
Low-A       45.13

Sure enough, the zone gets a little smaller as the levels increase. The jumps aren’t huge (the drop from High-A to Double-A was the only single-level change that was statistically significant at the 5% level), but they add up. Each two-level (or greater) jump is significant.
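
The article doesn't say which significance test was used. As one plausible reading, here is a pooled two-proportion z-test that treats each level's figure as a proportion of 1,030 independent grid predictions. That independence is an assumption (neighboring grid points are surely correlated), so take the p-values as rough guides only.

```python
import math
from itertools import combinations

N_GRID = 1030  # grid locations per level, from the table
strike_pct = {"Triple-A": 35.77, "Double-A": 37.50, "High-A": 42.22, "Low-A": 45.13}

def two_proportion_p_value(p1, p2, n1, n2):
    """Two-sided p-value for H0: the two proportions are equal (pooled z-test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal

p_values = {}
for a, b in combinations(strike_pct, 2):
    p = two_proportion_p_value(strike_pct[a] / 100, strike_pct[b] / 100,
                               N_GRID, N_GRID)
    p_values[(a, b)] = p
    print(f"{a} vs {b}: p = {p:.3f}")
```

Under these assumptions, the only adjacent pair that clears the 5% bar is Double-A versus High-A, while every pair two or more levels apart does.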

But look at the top and bottom of the ladder. There’s a difference of more than nine percentage points between Low-A and Triple-A!

Using Probability

Unfortunately, the strike zone isn’t simply yes or no. The pieces I’ve read at The Hardball Times, FanGraphs, and Baseball Prospectus treat the zone as sort of a Schrödinger’s cat, where each pitch is basically just a probability. It’s something of a quixotic quest to picture the strike zone as its Platonic ideal, to believe that the essence of a pitch is actually a ball or a strike. But imagine arguing a call by saying, “That’s a terrible call! You actualized a 9% strike probability!”

To address this point, I ran the SVM again, but as a regression. This time, the algorithm returned a continuous value between 0 and 1 for each location. These are our called-strike probabilities.

It makes a big difference whether a borderline pitch is a 30% strike or a 10% strike. Both would be zeroes in the classifier, but if you’re in a 2-1 count with runners on and you take a close pitch, you’d much rather have a 90% chance of going 3-1 than a 70% one.

To visualize the probabilities, I made a chart that looked kind of like a bell curve. On the x-axis are our probabilities. The higher the graph at that point, the more borderline pitches had that probability.

We’d expect a more aggressive zone to be higher towards the center and right, while a more conservative zone would be more evenly distributed.

Sure enough, there is a noticeable difference between Low-A and Triple-A. The trigger is quicker down there.

We see that Low-A umpires might be more likely to call borderline pitches strikes, but which borderline pitches? For this, I used a process I learned in Analyzing Baseball Data with R by Jim Albert and Max Marchi. (Marchi signed my copy at SaberSeminar!) I extracted the 0.5 contour curves from the regressions and plotted them together. Each line represents an approximate boundary between pitches that are likely (p>0.5) to be called strikes and ones that are likely (p<0.5) to be balls. The line itself is where we flip the coin. As in, “Keep flipping a coin out there, Blue!”
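
To make the 0.5 contour concrete, here is a minimal sketch that finds where a row of predicted probabilities crosses 0.5 by linear interpolation. The probability function is a made-up stand-in for the regression output, and its edge location and steepness are hypothetical.

```python
import math

def crossings(xs, probs, level=0.5):
    """Linearly interpolate the x positions where probs crosses `level`."""
    found = []
    for i in range(len(xs) - 1):
        p0, p1 = probs[i], probs[i + 1]
        if (p0 - level) * (p1 - level) < 0:       # sign change between samples
            t = (level - p0) / (p1 - p0)
            found.append(xs[i] + t * (xs[i + 1] - xs[i]))
    return found

def toy_strike_prob(x_in, edge_in=10.3, steepness=0.7):
    """Made-up probability: near 1 over the plate, falling off past edge_in."""
    return 1.0 / (1.0 + math.exp(steepness * (abs(x_in) - edge_in)))

# One horizontal slice of the zone, sampled every half inch from -18 to +18.
xs = [-18.0 + 0.5 * i for i in range(73)]
probs = [toy_strike_prob(x) for x in xs]
left, right = crossings(xs, probs)
print(round(left, 2), round(right, 2))
```

Repeating this at each height and connecting the crossing points traces out the coin-flip line for a level. Contour routines in plotting libraries do the same job in two dimensions.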

(Note: never call the umpire “Blue.” Especially in pro ball. You won’t be ejected, just deservedly shamed.)

The outside corner looks pretty much the same at each level. It’s the inside corner that gets narrower as we climb the ladder. Again, we need to ignore the low and high pitches because the data wasn’t corrected for strike-zone height.

Let’s compare the Low-A and Triple-A corners. The chart above has the 50% lines from each level. We could get a much better sense for the difference if we compared Low-A and Triple-A using more of the contour lines.

Sure enough, the outside looks pretty much the same. The inside corner really stretches farther in at the low levels.

What About Consistency?

In order to test consistency, I used another SVM classifier. The only difference in this case is that, instead of predicting a random grid of pitches, this method predicts pitches from within the data set.

I broke each data set (averaging 2,265 pitches) into a training set consisting of a random sample of 70% of the pitches and a test set consisting of the other 30%. The idea is to build a model based on the training set, use it to predict the test set, and then compare the predicted calls in the test set with the actual calls. Once again, I ran the model with a tenfold cross validation.
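
The splitting step might look something like the sketch below. It reads "tenfold" as ten independent random 70/30 splits, which is an assumption about the procedure. The pitch IDs are placeholders for real rows.

```python
import random

def train_test_split(pitches, train_frac=0.7, rng=None):
    """Shuffle a copy of the pitches and cut it into training and test sets."""
    if rng is None:
        rng = random.Random()
    shuffled = list(pitches)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

rng = random.Random(2018)           # fixed seed so the splits are repeatable
pitch_ids = list(range(2265))       # stand-in for one level's taken pitches
splits = [train_test_split(pitch_ids, 0.7, rng) for _ in range(10)]

train, test = splits[0]
print(len(train), len(test))
```

Each of the ten pairs would get its own model fit on `train` and scored on `test`, with the results pooled or averaged afterward.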

This process is similar to what one experiences while watching from the on-deck circle. The training set establishes the zone while the test pitches are the ones that may or may not be consistent with that particular zone. The higher the percentage of test pitches that agree with the predictions, the more consistent the zone.

The overall accuracy represents the percentage of test pitches that have the same call as the model predicted. Again, this isn’t a measure of whether the call is correct. It’s a measure of whether it agrees with the established zone at that level.

False positives are pitches that the model predicts should be called balls but were called strikes. False negatives are pitches that were predicted to be strikes but were called balls. If we were evaluating the model, it would be the other way around. But we’re evaluating the calls.
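
Here is a small sketch of that bookkeeping, with the umpire's call treated as the thing being judged. The article doesn't state the denominators for its rates. This version assumes false positives are a share of called strikes and false negatives are a share of called balls, and the sample calls are invented.

```python
def consistency_report(called, predicted):
    """Compare umpire calls with model predictions (1 = strike, 0 = ball).

    Denominators are an assumption: false positives are reported as a share
    of called strikes, false negatives as a share of called balls.
    """
    pairs = list(zip(called, predicted))
    agree = sum(1 for c, p in pairs if c == p)
    false_pos = sum(1 for c, p in pairs if c == 1 and p == 0)  # model said ball
    false_neg = sum(1 for c, p in pairs if c == 0 and p == 1)  # model said strike
    called_strikes = sum(1 for c, _ in pairs if c == 1)
    called_balls = len(pairs) - called_strikes
    return {
        "consistency_pct": 100 * agree / len(pairs),
        "false_positive_pct": 100 * false_pos / called_strikes,
        "false_negative_pct": 100 * false_neg / called_balls,
    }

# Invented example: eight test pitches, two of which disagree with the model.
called =    [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
report = consistency_report(called, predicted)
print(report)
```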

Here is a plot of the false positives and false negatives from each iteration.

The first thing I notice here is that false positives aren’t necessarily strikes. Again, we are just checking for consistency, not accuracy. The expected Low-A zone goes well past a ball off the plate in both directions.

There are a couple of things to keep in mind here. First is that Triple-A catchers are more advanced receivers than their Low-A counterparts, so there’s a good chance the cluster of false positives in the Triple-A chart is due to the capacity of the respective catchers to stick the low pitch.

Second, the false negatives in Low-A strayed farther off the plate. That will be important in a second.

Let’s take all the pitches from all 10 iterations and look at the accuracy.

Borderline Strike Accuracy by Level

All figures are percentages computed on borderline pitches.

Level       Consistency (%)   False Positives (%)   False Negatives (%)
Triple-A    79.30             20.33                 20.92
Double-A    75.97             25.68                 21.84
High-A      75.26             26.53                 22.55
Low-A       77.64             24.55                 20.46

Once again, Triple-A has the most conservative zone. It is the only level with a lower false-positive rate than the false-negative rate.

This looks like incriminating evidence for Low-A. Strike probabilities are skewed slightly higher, and the false negatives fall farther from the edge of the plate. Together, that means fewer false positives.

Not surprisingly, Low-A has the fewest false negatives of any level. My experience in Low-A was that the umps erred on the side of ringing a batter up. Apparently there was some truth to that.

Takeaways

We’ve seen here that the zone seems bigger in A-ball, specifically on the inside corner to a righty. If I were a minor-league hitting coach, I’d want to communicate this to my players. But not so they know to swing.

The minor leagues exist for the purpose of developing players. The habits developed by young hitters at the lower levels tend to follow them up the ladder. I’d want them to know that the big inside corner is a feature of the level, rather than a reality of pro ball. Instead, I’d communicate that swinging in off the plate before two strikes, even if it’s a strike, is inimical to development.

I’d say that learning to take that pitch will do more good than learning to hit it. That’s a ball in Double-A or Triple-A, let alone the big leagues.

Hitting is hard enough. Don’t let the guy back there dictate what you swing at. Until two strikes, of course.