As a Twitter dork, I’m exposed to a lot of discussion about swinging strike rates (SwStr%), so much so that it almost feels like it has supplanted xFIP (or other comparable metrics) as a catch-all way to evaluate pitchers. Dude has a 12.5% whiff rate! Sweet. It’s not for naught: swinging strike rate bears a strong correlation to strikeout rate (K%), which makes up a substantial portion of the regression equations that underpin the aforementioned xFIP and its counterparts. Here is how strongly swinging strike rate correlates with the following metrics (using data from the last five years on 714 pitchers who threw at least 100 innings in a given season):

K%: r = 0.83

SIERA*: r = 0.61

xFIP*: r = 0.55

FIP: r = 0.50

ERA: r = 0.40

(*See footnote.)

It also correlates strongly year over year (among 392 player-seasons during the same timeframe in which the pitcher threw 100 innings in the current and subsequent seasons):

SwStr% in year t+1: r = 0.74
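For anyone who wants to reproduce this kind of number, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample values are made up for illustration; they are not the data behind the figures above. The same function works for the year-over-year case if you pair each pitcher's SwStr% in one season with his SwStr% in the next.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical pitcher-seasons (invented values, NOT the article's sample).
swstr = [0.07, 0.09, 0.10, 0.12, 0.14]
k_pct = [0.15, 0.20, 0.19, 0.26, 0.30]
print(round(pearson_r(swstr, k_pct), 2))
```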

So there’s sufficient reason to like SwStr% as a go-to, especially when trying to identify pitchers whose strikeout rates may regress in the direction implied by their swinging strike rates. (The same could be said for hitters and plate discipline.)

Problem is, I see a lot of discussion about whiff rates early in the season. Given small sample sizes, the community generally understands it’s difficult to evaluate a player reliably based on his outputs across 13 innings or 40 plate appearances. Accordingly, we look to peripherals — essentially, the inputs to the outputs, which, we would hope, offer more granularity and help isolate a signal in the noise. At this point in the season — two weeks in — I’m not sure that’s the case. Moreover, it has become increasingly clear to me we don’t have universally understood benchmarks for what constitutes an effective pitch within its own pitch type.

Reliability

A year and a half ago, Jonah Pemstein and Sean Dolinar published a long-needed update on reliability. In it, they calculate Cronbach’s alpha, a statistical representation of reliability (similar to how the Pearson correlation coefficient, r, cited above, singularly describes how strongly two variables correlate with each other), for a great many metrics given a great many denominators. I can see how long it takes for weighted on-base average (wOBA) to become reliable on strictly ground balls. Things of that nature.
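To make the idea concrete, here is a rough sketch of the textbook Cronbach’s alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of totals). I am assuming a setup where each row is one pitcher and each column is one pitch (1 = whiff, 0 = no whiff); that layout, the function name `cronbach_alpha`, and the toy data are my assumptions, not necessarily the exact procedure Pemstein and Dolinar used.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows: one list per subject (e.g. pitcher), each holding the same
    number k of item scores (e.g. 1 = whiff, 0 = no whiff per pitch).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 items")
    # Variance of each item (column) across subjects.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Variance of each subject's total score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Toy data, invented for illustration: 4 pitchers, 6 pitches each.
toy = [
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
]
print(round(cronbach_alpha(toy), 2))
```

The closer alpha gets to 1, the more the differences between pitchers reflect something stable rather than noise; adding more pitches per pitcher generally pushes it up, which is why the denominators matter so much.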

Accordingly, you can assess the reliability of swinging strike rates for pitchers not only overall but also by pitch type. I hate to break it to you: a pitcher’s swinging strike rate doesn’t become acceptably reliable until after 400 pitches or so (four or five starts, or, like, a third of a season for relievers) and doesn’t see diminishing marginal returns until 700 to 800 pitches (seven to nine starts, or two-thirds of a season for the healthiest workhorse relievers). The aggregation of several pitch types contributes to a noisier measure of reliability for overall whiff rates; each individual pitch type becomes reliable more quickly but each at different “speeds,” so to speak:

Fastballs: ~220 pitches

Sinkers: ~360 pitches

Change-ups: ~130 pitches

Curves: ~180 pitches

Sliders: ~230 pitches

As of this morning, 28 pitchers have crossed the 300-pitch threshold (only three are at 400+), and I doubt any of them have thrown 70% fastballs, 60% curves, 80% sliders, or 120% sinkers. In other words, we’re still one or two starts away from seeing anything truly meaningful in pitching stats. That’s not to say there aren’t meaningful patterns forming before our eyes, patterns that will exist well beyond the threshold of reliability. The existence of a pattern within a metric before said metric has become acceptably reliable does not invalidate the concept of reliability. It is to say, however, that not all patterns that look meaningful now will be meaningful in three or four more starts. (That’s, like, the critical axiom of small samples, man.)
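The arithmetic behind those percentages is simple enough to sketch: divide each pitch type’s approximate reliability threshold (from the list above) by a pitcher’s total pitch count to get the share of his arsenal that pitch would need to be. The names `RELIABILITY_THRESHOLDS` and `required_share` are mine, invented for this illustration.

```python
# Approximate pitch counts at which a pitch type's whiff rate becomes
# acceptably reliable, taken from the list above.
RELIABILITY_THRESHOLDS = {
    "Fastball": 220,
    "Sinker": 360,
    "Change-up": 130,
    "Curve": 180,
    "Slider": 230,
}

def required_share(threshold, total_pitches):
    """Share of total pitches a single pitch type must make up to reach its threshold."""
    if total_pitches <= 0:
        raise ValueError("total_pitches must be positive")
    return threshold / total_pitches

total = 300  # roughly where the busiest starters sit two weeks in
for pitch, threshold in RELIABILITY_THRESHOLDS.items():
    share = required_share(threshold, total)
    status = "reachable" if share <= 1 else "impossible"
    print(f"{pitch}: {share:.0%} of {total} pitches ({status})")
```

The shares land close to the rounded figures quoted above, and the sinker’s share exceeds 100%, which is the point: at 300 total pitches, no one can have thrown enough sinkers yet.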

What am I getting at here? Maybe Patrick Corbin’s slider, even though it was already one of baseball’s filthiest standalone pitches, will not continue accruing swinging strikes at a 32% clip. Maybe it’ll regress closer to his career average slider whiff rate of 24%. That wouldn’t be so bad! It’d still be really good. But it wouldn’t be elite, and it’s incredibly unlikely, historically speaking, that he will sustain a 30+% whiff rate on his slider through June, let alone for an entire season. Shohei Ohtani is the closest thing Major League Baseball has to a unicorn, but even he is unlikely to sustain a 45% whiff rate on his splitter; it’d be far better than any other pitch thrown in the last decade. It’s possible, but considering baseball’s best contemporary pitches (Clayton Kershaw’s curve, Corey Kluber’s slider, etc.), it’s hard to believe Ohtani could outperform them by such an improbably large margin.

As for evaluating relievers: good luck.

Benchmarking

Not all pitches are created equal. This is true, obviously, when comparing apples (fastballs) to oranges (sliders). But it’s increasingly evident when comparing apples (fastballs) to other apples (other fastballs). Few other curves hold a candle to Kershaw’s; same, among sliders relative to Kluber’s. There’s variance not only by pitch type (apples to oranges) but also within pitch types (apples to apples) such that it seems shortsighted (to me) to assess whiff rates (or batted ball frequencies) by pitch type in a vacuum. Context is important, especially in understanding a pitcher’s capacity to improve by way of changes to his pitch mix.

The following tables summarize, for the last three years, by pitch type, the league-average whiff rate,

Whiff Rate (SwStr%) by Pitch Type

Pitch Type    2015   2016   2017
Splitter      17%    18%    19%
Slider        17%    17%    17%
Change        16%    16%    15%
Curve         13%    12%    13%
Cutter        10%    11%    12%
Knuckleball   10%    12%    11%
Fourseam       9%     9%     9%
Sinker         5%     6%     6%

SOURCE: PITCHf/x

These whiff rates span all starters and relievers. When parsed, the league-average rate for each pitch would be slightly lower for starters and slightly higher for relievers.

ground ball rate,

Ground Ball Rate (GB%) by Pitch Type

Pitch Type    2015   2016   2017
Sinker        56%    55%    54%
Splitter      56%    54%    53%
Curve         52%    50%    50%
Change        50%    50%    49%
Knuckleball   45%    43%    48%
Cutter        46%    46%    45%
Slider        47%    45%    44%
Fourseam      36%    36%    36%

SOURCE: PITCHf/x

and isolated power:

Isolated Power (ISO) by Pitch Type

Pitch Type    2015    2016    2017
Curve         0.117   0.127   0.138
Slider        0.124   0.136   0.146
Splitter      0.131   0.139   0.151
Change        0.145   0.154   0.162
Sinker        0.142   0.158   0.169
Cutter        0.152   0.163   0.171
Fourseam      0.178   0.190   0.198
Knuckleball   0.161   0.153   0.207

SOURCE: PITCHf/x

Here are all three tables summarized for each pitch exclusively for 2017:

Pitch Type Benchmarks, 2017

Pitch Type    SwStr%   GB%    ISO
Splitter      19%      53%    0.151
Slider        17%      44%    0.146
Change        15%      49%    0.162
Curve         13%      50%    0.138
Cutter        12%      45%    0.171
Knuckleball   11%      48%    0.207
Fourseam       9%      36%    0.198
Sinker         6%      54%    0.169

SOURCE: PITCHf/x

(Also, now you can see why a really filthy splitty out of Ohtani is a big deal. Although, again, there’s no way he sustains its current astronomical whiff and ground ball rates. Doesn’t make it any less arousing, though.)

All this benchmarking is a totally nitpicky and personal thing. Completely taxonomic. If a pitcher achieves an above-average whiff rate overall, does it matter how his pitches grade out within their own subtypes? Probably most fantasy baseball enthusiasts don’t give a damn. Matters to me, though (guess I’m annoying). I’m a little exhausted reading analysts boasting about double-digit swinging strike rates for off-speed and/or breaking pitches when such a feat is not only common but should be expected. I hope this helps us refine the adjectives we use to describe pitchers’ offerings and sharpen our sense of what constitutes an effective pitch within a particular pitch category.
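One way to put this into practice is to index a pitch’s whiff rate against the league average for its own pitch type, so that 100 means league average for that type. The sketch below uses the 2017 league-average whiff rates from the tables above; the name `swstr_plus` and the index itself are hypothetical conveniences of mine, not an established metric.

```python
# 2017 league-average whiff rates by pitch type, from the benchmark table above.
LEAGUE_SWSTR_2017 = {
    "Splitter": 0.19,
    "Slider": 0.17,
    "Change": 0.15,
    "Curve": 0.13,
    "Cutter": 0.12,
    "Knuckleball": 0.11,
    "Fourseam": 0.09,
    "Sinker": 0.06,
}

def swstr_plus(pitch_type, whiff_rate):
    """Whiff rate indexed to the pitch type's league average (100 = average).

    Hypothetical index for illustration only; not an official statistic.
    """
    if pitch_type not in LEAGUE_SWSTR_2017:
        raise KeyError(f"no benchmark for pitch type: {pitch_type}")
    return 100 * whiff_rate / LEAGUE_SWSTR_2017[pitch_type]

# The same 12% whiff rate reads very differently depending on the pitch.
for pitch in ("Slider", "Curve", "Fourseam", "Sinker"):
    print(f"{pitch}: {swstr_plus(pitch, 0.12):.0f}")
```

A 12% whiff rate grades out below average for a slider and well above average for a four-seamer, which is exactly why a bare "double-digit whiff rate" on a breaking ball shouldn’t impress anyone on its own.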

As an aside: in a vacuum, it’s abundantly clear pitchers should throw greater and greater percentages of non-fastball offerings. That said, the sport of baseball doesn’t exist in a vacuum. The effectiveness of one pitch may play off the success (and composition, by measure of velocity and movement) of other pitches in an arsenal. And that’s before factoring in health and mechanical concerns. Still, even considering these wrinkles, it would behoove most (emphasis on most) pitchers to try developing sliders as well as altogether abandoning their sinkers.

* * *

Footnote, from a soapbox: I have relied on xFIP for a long time as a quick-and-dirty evaluator of pitcher talent. It reached a breaking point this weekend when I was forced (by our Jeff Zimmerman!) to reconcile the fact that xFIP, while adequately accounting for the volatility in HR/FB rates (and, thus, being objectively superior to FIP), colossally fails to account for every other type of batted ball. And non-homer fly balls are way more damaging than ground balls. Such is the nature of cognitive dissonance — it’s sometimes difficult to even realize you’re experiencing it and even more difficult to combat it. I was somewhere in between, which makes my violation of sabermetric law, in my opinion, less forgivable than pure ignorance.

Part of the cognitive dissonance stemmed from not fully understanding SIERA. This is my own damn fault. I sat down this weekend to really read about it. Its combination of interactions and nonlinearity is intelligently specified (which, given Matt Swartz created it, is not surprising). I’m not sure I love the idea of “net ground balls” (ground balls minus fly balls) in lieu of just a ground ball rate, the latter of which would inherently imply a pitcher’s (1) average launch angle allowed, (2) rate of home runs allowed, and (3) amount of damage allowed by means of isolated power (ISO). However, I’m flagrantly not trying to re-specify the model (yet), so Swartz’s SIERA stands as superior to my nonexistent metric.

Anyway, welcome to my small epiphany. If someone tells you FIP is superior to xFIP, they’re wrong. (I greatly respect Jeff and agree with him almost always, but I disagree with his interpretation of why xFIP fails; my own interpretation is described two paragraphs above. Also, there’s a reason xFIP correlates more strongly with ERA.) That said, there’s really no reason not to use SIERA in lieu of either iteration of FIP (again, as evidenced by its even-stronger correlation with ERA).

This aside is not meant as disrespect to or disregard for Baseball Prospectus’ DRA, either. It seems very involved and well-executed, but I am generally reluctant to trust something deliberately opaque. I like to know exactly, rather than generally, how an equation works — a luxury and spoil of working in the public sphere of sabermetric research.