"If this ball was any juicier, it'd be a fruit" - Unknown

The General Idea

A few months ago, I was chatting with someone about the phenomenon of the juiced ball in baseball, and they said something I couldn't get out of my head: "If I was a GM, I would hate evaluating players after this season." It's totally possible that this ball is the new normal, in which case we absolutely should judge players based on their performance last year. But if it's not, then the results of 2019 can't be trusted. And then how do you evaluate players? That's what I set out to find out this past week.

This isn't an attempt to figure out how the different ball affected a specific pitch (e.g. Edwin Diaz struggling to throw his slider with the new ball). There are a ton of different factors in play there, and it's tricky to isolate the effect of a different baseball. But we can look at what the outcomes would have been if pitchers had given up the exact same kind of contact in 2019 with the ball from the 2015-2018 seasons. How, you may ask? By the miracle that is AI.

I built a neural network to predict whether or not a ball in play will be a home run. It's based on Google's TensorFlow library and trained on Statcast data from the 2015-2018 seasons, and it therefore makes predictions based on how the baseball traveled during those years. I'm feeding it exit velocity, launch angle, hit location, and home team (which serves as a useful proxy for park dimensions). For those interested in the dataset, documentation can be found here. This doesn't account for everything; weather is a dominant factor that I don't have good data for, and the actual dimensions of parks would be very useful. But all things considered, it's effective: when I compare the predictions it makes for events from 2015 to 2018 with the actual outcomes, it has an average accuracy of 97.8%.
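For the curious, here's a stripped-down sketch of roughly what that setup looks like in Keras. Treat it as illustrative rather than gospel: the file name, column selection, and layer sizes are stand-ins, not necessarily the exact ones in the real model.

```python
import pandas as pd
import tensorflow as tf

# Statcast-style columns: launch_speed (exit velocity), launch_angle,
# hc_x/hc_y (hit location), home_team. The file name is a stand-in.
df = pd.read_csv("statcast_2015_2018.csv")

# One-hot encode the home team as a rough proxy for park dimensions
X = pd.get_dummies(
    df[["launch_speed", "launch_angle", "hc_x", "hc_y", "home_team"]],
    columns=["home_team"],
).astype("float32")
y = (df["events"] == "home_run").astype("float32")

# A small feed-forward network; the sigmoid output is the probability
# that a given ball in play is a home run
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=256)
```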

A Brief Note on Neural Networks

I'm sure most of you have heard about neural networks in some way, shape, or form, but most explanations I've seen don't give a particularly good account of how they actually work. Given that a neural network is a fundamental part of this study, I figured it made sense to touch on how they function; just skip ahead if this isn't your cup of tea.

A neural network is a kind of predictive model, and the general idea behind a predictive model is fairly straightforward. You specify its structure, throw a bunch of data at it, and then it can make predictions based on trends it sees in that data. The data is split up into the dependent variable (which the model tries to predict) and the independent variables (which the model looks for trends in to make its prediction). The most easily visualized predictive model is a decision tree; they look like this:

Note: artist's rendering
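If you'd like to see one get built for real, scikit-learn will grow a tiny tree in a few lines. The numbers here are made up purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: [exit velocity (mph), launch angle (degrees)] -> home run or not
X = [[105, 28], [98, 45], [110, 25], [80, 10], [95, 30], [70, 50]]
y = [1, 0, 1, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Prints the learned if/else splits -- the "flowchart" a decision tree is
print(export_text(tree, feature_names=["exit_velocity", "launch_angle"]))
```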

Neural networks are different; I can't look at my model and see exactly how it comes to a decision. In fact, figuring out how neural networks arrive at their decisions is one of the hot topics in academia. But the good news for us is that neural networks are impressively good at pattern recognition even if we don't know how they do it; as a result, they're way better at predicting things than most other models.

The important thing to take away from this is that a model's structure will change to fit the data you give it, and that structure is responsible for how the model will predict things. In this case, the model is fitted to data from 2015-2018; its predictions are based on how the baseball traveled during those seasons. That means that when I ask it to make a prediction on data from 2019, it will essentially look at how a baseball from 2015-2018 would have traveled in those exact same conditions.

I assess the effectiveness of the model by splitting the data into two groups: one group to train the model with, and another group to test it with. Testing just means I compare the predicted output to the actual output. You'll note that I don't test with data that I've trained with; the model has already seen that data, so scoring it there would overstate its accuracy. You want to test with data that's new to the model. And to guard against outlier results, I trained and tested 5 different models on the same data (from 2015-2018) and averaged to get an accuracy of 97.8%.
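In code, that evaluation loop looks roughly like this. It's a sketch: `build_model` is a hypothetical helper that constructs the network described above, and the 80/20 split is one reasonable choice rather than the definitive one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

accuracies = []
for seed in range(5):
    # Hold out 20% of the 2015-2018 data that the model never trains on
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = build_model()  # hypothetical helper: builds the network above
    model.fit(X_train, y_train, epochs=10, batch_size=256, verbose=0)
    _, accuracy = model.evaluate(X_test, y_test, verbose=0)
    accuracies.append(accuracy)

# Average across the 5 runs -- this is where the 97.8% comes from
print(np.mean(accuracies))
```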

Back to Baseball

So we now have a tool that predicts whether a batted ball would be a home run with the ball from 2015-2018. The predictions have a really high accuracy (97.8%) when compared with the results of the 2015-2018 seasons, and if the ball was actually juiced in 2019, we should see a drop in that accuracy when comparing the predictions for the 2019 season to the actual results. And lo and behold, we do! But only from 97.8% to 96.6%. A drop of barely more than a percentage point is . . . not what I was expecting.

But I am nothing if not stubborn to the point that it's seriously detrimental to my well-being. Overall accuracy is a misleading yardstick here. If we were to flip a coin on every ball in play, we'd be right about 50% of the time; but anyone who has watched a game of baseball knows we could do far better than that by simply guessing "not a home run" every single time. Most balls in play aren't home runs (in the 2019 data below, about 88% of them stayed in the park), and the neural network racks up most of its accuracy by predicting those routine balls in play correctly. Fortunately, we can look at the confusion matrix, which compares predicted results against actual results, to get a better idea of what's going on. Here's the confusion matrix when this model is run on all 2019 data:



|                 | Actually not a HR | Actually a HR |
|-----------------|-------------------|---------------|
| Predicted no HR | 53011             | 1431          |
| Predicted HR    | 660               | 5937          |
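(If you have the model's 0/1 calls in hand, scikit-learn will produce this table for you; one caveat is that its convention puts actual outcomes on the rows rather than the columns. The variable names below are hypothetical.)

```python
from sklearn.metrics import confusion_matrix

# y_2019: what actually happened; preds_2019: the model's 0/1 calls.
# In sklearn's output, rows are actual and columns are predicted.
print(confusion_matrix(y_2019, preds_2019))
```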

So a few things jump out right off the bat here:

- The bulk of our accuracy score comes from correctly predicting plays that aren't HRs.
- There are cases where our neural network predicts a HR and turns out to be wrong. Some of those misses are probably due to weather or park factors, but this is also an inexact science.
- The net delta is still that approximately 770 more balls left the park than were predicted to (1431 missed home runs against 660 false alarms).
- On plays where the actual outcome was a HR, our model had an accuracy of 80.57% (5937 of 7368).
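Both of those last figures fall straight out of the matrix above:

```python
# Counts from the 2019 confusion matrix above
missed_hrs   = 1431  # predicted to stay in the park, actually a HR
false_alarms = 660   # predicted HR, actually stayed in the park

# Net extra balls leaving the park vs. the 2015-2018 ball's behavior
print(missed_hrs - false_alarms)  # 771 -- the "approximately 770"

# Accuracy on balls that were actually home runs (recall on the HR class)
print(5937 / (5937 + 1431))  # 0.8057 -> 80.57%
```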

The Victims

So who got hit the hardest by the juiced ball, according to this study? I recalculated HR/FB ratio based on the predictions and took the delta with each player's actual HR/FB rate. Because small sample sizes are a thing, I set a minimum of 20 fly balls. Bearing that in mind, here are our big winners (with a sketch of the bookkeeping after the table):

| Player           | HR/FB differential | Additional HRs allowed |
|------------------|--------------------|------------------------|
| Tyler Chatwood   | +13.63%            | 3                      |
| Oliver Drake     | +11.11%            | 4                      |
| Adam Morgan      | +10%               | 2                      |
| Fernando Rodney  | +9.09%             | 2                      |
| Heath Fillmyer   | +9.09%             | 2                      |
| Nathan Eovaldi   | +7.84%             | 5                      |
| Joe Jimenez      | +7.69%             | 3                      |
| Tommy Kahnle     | +7.14%             | 2                      |
| Andrew Kittredge | +7.14%             | 2                      |
| Elvis Luciano    | +6.89%             | 2                      |
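The bookkeeping behind that table is straightforward pandas. Here's a sketch, where `bip` is a hypothetical DataFrame with one row per fly ball and the column names are stand-ins:

```python
import pandas as pd

# bip: one row per fly ball, with the pitcher, whether it was actually
# a HR (is_hr), and whether the model predicted a HR (pred_hr) --
# all hypothetical names
per_pitcher = bip.groupby("pitcher").agg(
    fly_balls=("is_hr", "size"),
    actual_hr=("is_hr", "sum"),
    predicted_hr=("pred_hr", "sum"),
)

# Small-sample filter: at least 20 fly balls
per_pitcher = per_pitcher[per_pitcher["fly_balls"] >= 20]

# Actual HR/FB minus predicted HR/FB -- positive means the pitcher gave
# up more homers than the 2015-2018 ball would have allowed
per_pitcher["hr_fb_diff"] = (
    per_pitcher["actual_hr"] - per_pitcher["predicted_hr"]
) / per_pitcher["fly_balls"]

print(per_pitcher.sort_values("hr_fb_diff", ascending=False).head(10))
```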

Notable Mets on the list include Walker Lockett (+5% HR/FB, tied with Matt Harvey), Edwin Diaz (+2.6% HR/FB, a hair above Gerrit Cole), Jacob deGrom (+1.83% HR/FB), and Noah Syndergaard (+0% HR/FB). I know Diaz's name gets bandied about in this debate given his absurd spike in HR/FB from 10.4% in 2018 to 26.8% in 2019. But he gave up a lot of hard contact last year, and our neural network makes its predictions based on the quality of contact. This tells us that, while he may have had a harder time throwing his slider because of the ball, the juiced ball did not dramatically change the outcomes of balls put in play against him.

There's definitely more to unpack here; I'd love to see what home runs the neural network predicted wouldn't leave the park, and it would be fascinating to analyze the trends behind the most victimized pitchers. But that sounds like a lot of work, and I'm tired, so I think I'll go to bed instead.