In the sortables section of Baseball Prospectus, there is a report called Batter Plate Discipline. If you’re trying to get a handle on how good hitters are at reacting to balls and strikes, this report contains measurements of such things as swing and contact rates. A natural way to divide such rates is based on the strike zone: A swing at a pitch inside the zone is a different event than a swing at one outside the zone. A whiff on a pitch middle-middle is a disparate event from a whiff on a pitch way outside, so it makes sense to tabulate them in different columns. There is a problem with this dichotomy, however: There is no strike zone.

In the words of Michael Lopez (who borrowed in turn from Bobby Ojeda), the strike zone is a unicorn. By this I mean not that the strike zone does not exist at all, but rather that it does not exist in the way that Major League Baseball defines it. The rulebook definition is a rectangular solid hanging in space, with infinitesimally thin boundaries which, once touched, trigger strike calls.

The actual strike zone is a shifting, nebulous cloud of probability density. It moves up and down with the whims of the umpire and the policy changes of MLB. It shrinks and expands as the count becomes favorable or unfavorable. It grows when an experienced pitcher throws and contracts when bad pitch framers snatch pitches with exaggerated body movements. It’s not a static thing, fixed for every hitter of every height; instead, it is constantly dynamic, flowing with the previous pitches and at-bats in a game. I render here no judgment as to whether the strike zone ought to be this way. I’m only noting that according to all available evidence, it is this way.

As a first-pass approximation, it’s okay to think of the zone as that imaginary rectangular solid, i.e., in its unicorn form. But with the greater granularity afforded us by PITCHf/x data, we can do better. We ought to reckon strike calls in terms of their probabilities. In so doing, we can get a more accurate and authentic picture of a hitter’s plate discipline. The decision to swing is indefensible on a pitch four feet off the edge of the plate, but perfectly understandable on a pitch painting the black.

To accomplish this task, we first need a model of how likely each pitch is to be called a strike, one that incorporates all of the messy factors which distort the rulebook strike zone. A few weeks ago, in writing about umpire consistency, I built just such a model. It works by “learning” the strike zone as it is actually called from a subset of the pitches, and it is quite accurate: out of sample, it predicts the umpire’s call 92.4 percent of the time.
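To make the idea of learning the called zone concrete, here is a minimal sketch in Python. It is not the model from the umpire-consistency piece: it swaps in a simple k-nearest-neighbors estimate, fits it to simulated called pitches, and scores it on held-out pitches. The soft-edged zone in `true_strike_prob`, the zone dimensions, the sample sizes, and all function names are assumptions made for illustration, and the simulated data leave out count, framing, and umpire effects.

```python
import heapq
import math
import random

random.seed(0)

def true_strike_prob(x, z):
    """Synthetic called zone with soft edges (an assumption, not real data)."""
    horiz = 1.0 / (1.0 + math.exp(8.0 * (abs(x) - 0.83)))
    vert = 1.0 / (1.0 + math.exp(8.0 * (abs(z - 2.5) - 1.0)))
    return horiz * vert

def simulate_called_pitches(n):
    """Return n taken pitches as (x, z, called_strike) tuples, locations in feet."""
    pitches = []
    for _ in range(n):
        x = random.uniform(-2.0, 2.0)
        z = random.uniform(0.5, 4.5)
        called_strike = 1 if random.random() < true_strike_prob(x, z) else 0
        pitches.append((x, z, called_strike))
    return pitches

def knn_strike_prob(train, x, z, k=25):
    """Estimate P(called strike) as the strike share among the k nearest pitches."""
    nearest = heapq.nsmallest(
        k, train, key=lambda p: (p[0] - x) ** 2 + (p[1] - z) ** 2
    )
    return sum(p[2] for p in nearest) / k

train = simulate_called_pitches(2000)   # pitches the model "learns" from
test = simulate_called_pitches(500)     # held-out pitches

correct = sum(
    (knn_strike_prob(train, x, z) >= 0.5) == bool(call) for x, z, call in test
)
accuracy = correct / len(test)

print(f"out-of-sample accuracy: {accuracy:.3f}")
print(f"P(strike), middle-middle: {knn_strike_prob(train, 0.0, 2.5):.2f}")
print(f"P(strike), far outside:   {knn_strike_prob(train, 1.9, 4.4):.2f}")
```

The point of the sketch is the output type: each pitch gets a probability rather than an in-or-out label, and accuracy is judged only on pitches the model never saw.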

With these strike probabilities in hand, I could then go about determining a player’s innate plate discipline within the context of the probabilistic strike zone. To do so, I modelled each player’s propensity to swing as a function of the probability of a strike and the count (using a logistic regression model).
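A logistic regression of this general shape can be sketched as follows. This is an illustration under stated assumptions, not the fitting code behind the article: the count enters only as a single two-strike indicator (the actual count coding is not specified above), the hitter is simulated from made-up "true" coefficients, and the Newton-Raphson fitter and helper names are hypothetical.

```python
import math
import random

random.seed(1)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(rows, swings, iterations=25):
    """Newton-Raphson fit of P(swing) = sigmoid(beta . row)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, swings):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical hitter: these "true" values are invented for the simulation.
TRUE_INTERCEPT, TRUE_RESPONSIVENESS, TRUE_TWO_STRIKE = -1.5, 2.5, 0.6

rows, swings = [], []
for _ in range(5000):
    strike_prob = random.random()                       # model's P(called strike)
    two_strikes = 1.0 if random.random() < 0.3 else 0.0  # simplified count term
    log_odds = (TRUE_INTERCEPT
                + TRUE_RESPONSIVENESS * strike_prob
                + TRUE_TWO_STRIKE * two_strikes)
    rows.append([1.0, strike_prob, two_strikes])
    swings.append(1 if random.random() < sigmoid(log_odds) else 0)

intercept, responsiveness, two_strike_effect = fit_logistic(rows, swings)
print(f"intercept: {intercept:.2f}")
print(f"responsiveness (slope on strike probability): {responsiveness:.2f}")
print(f"two-strike effect: {two_strike_effect:.2f}")
```

With a few thousand simulated pitches the fit should land close to the invented coefficients, which is the sense in which per-pitch data can pin down a hitter's swing tendencies.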

I picked out four hitters with different plate discipline profiles and plotted the model’s estimate of their response to increasing strike probabilities here. On the x-axis, you can think of each hitter as receiving pitches closer and closer to the center of the zone as you travel left to right. The y-axis charts their modelled response in terms of the probability of a swing.

There are four hitters depicted here, each with a very different profile. The model estimates two different, intertwined characteristics for each hitter: One, their intercept, which we can interpret as their baseline rate of swinging, even at pitches which are sure to be called balls; and two, their responsiveness, or the degree to which they respond to seeing pitches of increasing strike probability.

We can see that all four hitters bend toward swinging more as a function of increasing strike probability, but each has a slightly different shape to their curve. Salvador Perez swings too much at pitches that are sure to be called balls. He also seems to show little active discrimination of the strike zone, since his line shows the lowest slope. When a pitch is a sure ball, he swings ~20 percent of the time; when it’s a sure strike, he swings ~45 percent of the time. While that is a significant difference, it pales in comparison to most other hitters.
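As a rough worked example, the two approximate Perez figures above can be turned back into an intercept and a slope on the log-odds scale. This assumes the curve is a plain logistic function of strike probability and ignores the count term, so the numbers are illustrative rather than the model's fitted values.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def sigmoid(t):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

swing_at_sure_ball = 0.20    # approximate figure quoted in the text
swing_at_sure_strike = 0.45  # approximate figure quoted in the text

# At strike probability 0 the log-odds equal the intercept; moving to
# strike probability 1 adds the full slope ("responsiveness").
intercept = logit(swing_at_sure_ball)
responsiveness = logit(swing_at_sure_strike) - intercept

# Implied swing rate on a pitch that is a coin flip to be called a strike.
swing_at_coin_flip = sigmoid(intercept + responsiveness * 0.5)

print(f"intercept: {intercept:.2f}")              # roughly -1.39
print(f"responsiveness: {responsiveness:.2f}")    # roughly 1.19
print(f"swing rate at P(strike)=0.5: {swing_at_coin_flip:.2f}")  # roughly 0.31
```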

David Ortiz is the opposite of Perez, and superlative from a plate discipline perspective. He swings at very few balls, and his curve bends dramatically upward, so that he swings second most often at sure strikes. Second most often, of course, because Pablo Sandoval swings at legitimately almost everything in the zone. But as the slope of Sandoval’s line implies, despite a strong predilection toward swinging, he’s also vastly more likely to swing at a strike than a ball. Sandoval is effective, at least in part, because he can make this distinction accurately.

Finally, I put Mike Olt on the graph as an example of how a hitter can fail in the opposite manner, that is, by being overly patient. Olt rarely swings at anything, and even when a pitch floats through the middle of the strike zone, he’s more likely to let it go than take a cut. He combines David Ortiz’s propensity to leave the bat on his shoulder with something close to Salvador Perez’s inability to differentiate pitches as balls or strikes. In this way, he fails by being too patient*, as opposed to overly aggressive.

Based on these results, the definition of plate discipline seems more closely connected to the bend or slope of the line. It is this slope, which I’ll call “responsiveness,” that determines how a hitter changes his behavior in reaction to each pitch.

Here are the hitters with the worst responsiveness.

All but one hitter on this list (Corey Dickerson; more on him in a moment) suffers from a lower-than-average walk rate. There are several youngsters here with questionable command of the strike zone, as well as, curiously, Will Venable and Ryan Braun. Despite their inclusion, the median hitter on this list is fairly young at 25.

Here are the hitters with the best strike responsiveness.

This list is largely as expected. All have above-average walk rates except for one: Kurt Suzuki. Some of these players have extreme walk rates and are known for their plate discipline, like Freddie Freeman and Carlos Santana. Some others have only slightly above-average rates, but are known as patient, careful hitters, like Dustin Pedroia. These hitters with the most extreme responsiveness are a median 29 years of age. There appears to be a moderate concordance between age and responsiveness.

There’s an additional layer of complexity here that helps to explain some of the outliers on the above list. As I have written about before, not all hitters receive similarly difficult-to-hit pitches. The best hitters, and in particular those with the most power, tend to see pitches further from the zone, which are more likely to be called balls. A hitter may have a good walk rate while being less patient simply because he sees fewer strikes.

The above analysis doesn’t adjust for this fact. Hence you have hitters who might be innately patient, but don’t get many walks because most of their pitches are strikes, such as, perhaps, Kurt Suzuki (or at least the 2014 version of Suzuki). Conversely, a good hitter like Corey Dickerson doesn’t have to be very patient in order to rack up walks, because opposing pitchers are avoiding his strike zone as much as possible.

I can measure the tendency of pitchers to avoid or attack the strike zone against a given hitter by taking the average strike probability of all of the pitches that hitter sees. If I then combine this information in a multiple linear regression with the hitter’s average tendency to swing (the intercept) and his responsiveness, I can explain about 50 percent of the variation in walk rates between hitters**. I think that this is not too shabby, since none of these predictor variables directly measures outcomes; they rely instead on the more granular pitch-by-pitch information.
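The regression described here can be sketched on simulated hitters. Everything numeric below is invented for illustration: the hitter population, the coefficients, and the noise level are assumptions, and the predictors are drawn independently, which real hitters need not be. The sketch also reads the footnote's tolerance as the usual collinearity diagnostic, 1 minus the R-squared from regressing each predictor on the others, which is an interpretation rather than something the text states.

```python
import random

random.seed(2)

def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus a constant."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

# Simulated hitters (all values hypothetical).
n_hitters = 300
predictors = {
    "avg_strike_prob": [random.gauss(0.45, 0.03) for _ in range(n_hitters)],
    "swing_intercept": [random.gauss(-1.5, 0.4) for _ in range(n_hitters)],
    "responsiveness": [random.gauss(2.5, 0.5) for _ in range(n_hitters)],
}
walk_rate = [
    0.08 - 0.4 * (s - 0.45) - 0.015 * (a + 1.5) + 0.008 * (r - 2.5)
    + random.gauss(0.0, 0.015)
    for s, a, r in zip(*predictors.values())
]

names = list(predictors)
r_squared = ols_r_squared([predictors[n] for n in names], walk_rate)

# Tolerance: how much of each predictor is NOT explained by the other two.
tolerance = {}
for name in names:
    others = [predictors[o] for o in names if o != name]
    tolerance[name] = 1.0 - ols_r_squared(others, predictors[name])

print(f"share of walk-rate variance explained: {r_squared:.2f}")
for name, value in tolerance.items():
    print(f"tolerance of {name}: {value:.2f}")
```

A tolerance well above .5 for each predictor would indicate that the three inputs carry largely separate information, so the explained variance is not an artifact of collinearity.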

We can go much further with this probabilistic approach to plate discipline. Next time, I’ll try to unify the plate discipline statistics with count-based linear weights to get a full appraisal of the benefit of each player’s plate discipline. By accurately computing the probability of a strike, we can assign a fractional value to each decision to take or swing, contingent upon the likely result. For example, a hitter who takes a pitch in the dead center of the zone loses more value than one who fails to swing at a pitch on the edge.
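The fractional-value idea can be sketched for the simplest decision, a take. The two run values below are placeholders, not actual count-based linear weights, and in practice they would change with the count; the function name is likewise hypothetical.

```python
# Hypothetical run values for the change in count after a taken pitch.
# Real values would come from count-based linear weights.
VALUE_OF_CALLED_STRIKE = -0.06
VALUE_OF_BALL = 0.04

def expected_take_value(strike_prob):
    """Expected run value of taking a pitch with the given strike probability."""
    return (strike_prob * VALUE_OF_CALLED_STRIKE
            + (1.0 - strike_prob) * VALUE_OF_BALL)

middle = expected_take_value(0.99)  # pitch in the dead center of the zone
edge = expected_take_value(0.50)    # pitch on the edge, a coin-flip call

print(f"take, middle-middle: {middle:+.3f} runs")
print(f"take, on the edge:   {edge:+.3f} runs")
```

Under these placeholder values the take down the middle costs several times as much as the take on the edge, which is the asymmetry a per-pitch metric would capture. Valuing swings would additionally require each hitter's own expected results on contact.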

There’s some complexity here, in that each hitter gets different values from his swings: Mike Trout is more likely to hammer a line drive than Stephen Drew, so Trout should be more aggressive in swinging at pitches of equal strike probability than Drew. But by breaking hitting down into the individual components and assigning a value to each on a per-pitch level, I’m hoping that we can develop a more nuanced view of hitters’ strengths and weaknesses. In developing a per-pitch metric of hitting (as opposed to a per-PA equivalent), we can harness the greater sample size of pitches relative to plate appearances, and hopefully get better true-talent estimates of hitting ability which stabilize faster and tell us more about each hitter.

*He also has a terrible problem with whiffs (34%!), but that’s a topic for another time.

**Tolerance > .5.