Big news everyone – Justin Upton is still on the trade block. He has been for about three years now, and he will continue to be until he is mercifully traded to a team that wants him more than Arizona seems to. With yesterday’s reports about Upton’s availability, more than a few people on Twitter asked me about Upton’s home/road splits, and whether or not we should expect him to regress significantly if he’s traded to a less hitter friendly ballpark. In case you haven’t seen them, Upton has enormous home/road splits, and it’s a career trend, not just a one year blip.

Split    PA    BB%   K%   ISO    BABIP  AVG    OBP    SLG    wOBA   wRC+
Home     1496  11%   22%  0.241  0.361  0.307  0.389  0.548  0.399  138
Away     1534  9%    24%  0.157  0.310  0.250  0.325  0.406  0.320  96

Over 3,000 plate appearances, Upton’s been one of the game’s best hitters while playing in Arizona, but a slightly below average hitter when playing anywhere else. Chase Field is one of the best hitters parks in all of baseball, and the numbers suggest that Upton has taken full advantage of the hitter’s paradise that he has called home for the last five years. In fact, if you look at the biggest home/road splits since 2008, Upton features prominently on the list.

This all makes sense. A Colorado hitter is at the top, and a couple of Texas guys, a right-handed doubles machine who played in Fenway, a couple of power hitters in Chicago, and Upton are all among the 10 who have the biggest splits between their home and road performance. If you scroll to the bottom, you find Buster Posey, Chase Headley, Will Venable, and Adrian Gonzalez all among those who were hurt the most – a Giant and three Padres. Again, nothing surprising here. San Francisco and San Diego are notoriously pitcher friendly. While split data is often unreliable in small samples, over a five year period, you’re going to see things start to make sense. Justin Upton derives a benefit from playing in Arizona. Buster Posey is hurt by playing in San Francisco. None of this is news.

However, there can be a temptation to take split data like this at face value. After all, we’re dealing with over 1,000 plate appearances in both home and road data for most of these guys, so it doesn’t seem like small sample problems should exist. But they do, and while the lists above are interesting, you shouldn’t read too much into the specific numbers for the individual players, and you definitely shouldn’t treat a player’s road numbers as if they represent his park neutral true talent levels.

For starters, home field advantage is a real thing, and most players hit better at home than they do on the road. Last year, non-pitchers posted an aggregate .327 wOBA in their home parks and a .314 wOBA on the road. In 2011, it was .326/.315. In 2010, it was .335/.317. For the 714 players who have garnered at least 100 PA at both home and road over the last five years, the weighted average comes out to a 14 point wOBA advantage at home. Pretty much every player is better than his road performance alone suggests. Home field advantage is not solely an effect of the dimensions and weather, and hitters derive some benefit from playing in their home park even if it is not a hitter friendly park. It is entirely possible for the dimensions and weather to wipe out that effect, and then some, so that hitting at home is a net negative in some parks, but the negatives are smaller than the positives in large part due to the non-park related aspects of home field advantage.
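To make the arithmetic concrete, here is a minimal Python sketch that turns the league figures quoted above into home/road gaps and then checks how much of Upton’s career gap is left once the 14-point average home edge is taken out. The names are illustrative, and subtracting the league average is a simplification for the sake of the example, not a real park adjustment.

```python
# League-wide home/road wOBA for non-pitchers, as quoted in the text.
LEAGUE_WOBA = {
    "last year": (0.327, 0.314),
    "2011": (0.326, 0.315),
    "2010": (0.335, 0.317),
}


def gap_points(home, away):
    """Home-minus-road wOBA gap, expressed in points (thousandths)."""
    return round((home - away) * 1000)


league_gaps = {season: gap_points(h, a) for season, (h, a) in LEAGUE_WOBA.items()}

# Upton's career home and away wOBA from the split table above.
upton_gap = gap_points(0.399, 0.320)

# The five-year weighted average home advantage quoted in the text.
AVERAGE_HOME_EDGE = 14

# What remains after removing the edge that nearly every hitter gets at home.
upton_excess = upton_gap - AVERAGE_HOME_EDGE

print(league_gaps)
print(upton_gap, upton_excess)
```

Even before thinking about parks or schedules, a chunk of any hitter’s home/road gap is just the ordinary home edge.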

If you’re more of a graphical person, here’s a visual representation of hitters’ home/road wOBAs over the last five years.

Second, we cannot pretend that “away” is the same thing for every hitter, nor is “away” an even playing time distribution in neutral parks. Upton plays in the NL West. Because of the unbalanced schedule, his career road games have skewed heavily towards San Francisco, San Diego, Los Angeles, and Colorado; 45% of his career “away” plate appearances have come in those four parks. Maybe Colorado and San Diego cancel each other out to some degree, but that still leaves a big chunk of games in cooler weather west coast cities, and not surprisingly, Upton hasn’t hit well in either LA or San Francisco.

In fact, when you look at a hitter who plays in an extreme hitters park at home, and then you only look at his road stats, you’re almost certainly going to be looking at a collection of parks that skew to the pitcher side, because you’ve automatically removed one of the few remaining hitters parks from the sample. Buster Posey’s road numbers include both Colorado and Arizona, but not San Francisco. We should not be surprised that these numbers are better than those posted by a guy whose road numbers swap out a hitter’s paradise for a pitcher’s haven.
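The schedule effect can be sketched as a PA-weighted average of road park factors. Every number below is a made-up placeholder chosen only to show the mechanism, not a measured park factor or a real schedule, and `road_environment` is a hypothetical helper.

```python
def road_environment(parks):
    """PA-weighted average park factor across a hitter's road parks.

    `parks` maps a label to (share_of_road_pa, park_factor), where a
    park factor of 1.00 means a neutral run environment.
    """
    total_share = sum(share for share, _ in parks.values())
    weighted = sum(share * factor for share, factor in parks.values())
    return weighted / total_share


# HYPOTHETICAL: a hitter whose home is the division's hitter's park, so his
# road mix leans on the division's pitcher's park instead.
hitter_park_resident = {
    "division pitcher's park": (0.30, 0.94),
    "everyone else": (0.70, 1.00),
}

# HYPOTHETICAL: a hitter whose home is the pitcher's park, so his road mix
# includes the hitter's park that the first player never visits as a guest.
pitcher_park_resident = {
    "division hitter's park": (0.30, 1.06),
    "everyone else": (0.70, 1.00),
}

print(round(road_environment(hitter_park_resident), 3))
print(round(road_environment(pitcher_park_resident), 3))
```

With these placeholder inputs, the two hitters face different “away” environments even though both are simply playing their road schedule, which is the point: road stats are not park neutral.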

Pretty much any west coast hitter is going to be at a disadvantage in road stats compared to an east coast hitter, due to the unbalanced schedule and the summer climate of the two sides of the U.S. The west coast is much cooler, much less humid, and is home to many of the most extreme pitchers parks in baseball. A guy who plays in the AL or NL West is not going to play half his games in a collection of parks that grade out as average run environments. And, because MLB has put Colorado, Texas, and Arizona (teams that are not actually on the west coast, and play in very different environments than the teams near the water) in the western divisions, the drastic differences in parks within the western divisions help drive even larger splits. For guys in Texas, Colorado, and Arizona, not only is their home park a great place to hit, but their collection of road parks is heavily slanted towards extreme pitchers parks.

Finally, there’s the simple reality of necessary regression. Even over multiple years, we’re still dealing with noisy data, and noisy data has to be heavily regressed if it’s going to be used in a projection. We know the left/right platoon split is real, but we still regress left/right platoon splits more towards league average than a player’s individual split up to the 1,000/2,200 PA levels for left-handers and right-handers. Regression is just a fact of life when it comes to split data, and if you’re not heavily regressing splits, you’re probably using them incorrectly.
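One common way to express this kind of regression is to blend the observed split with the league-average split, adding a fixed number of league-average plate appearances as ballast. The sketch below assumes that form and reads the 1,000 and 2,200 PA levels above as the ballast for left-handed and right-handed hitters. The function name `regress_split` and the sample hitter are hypothetical.

```python
def regress_split(observed, league_average, pa, ballast_pa):
    """Shrink an observed split toward the league-average split.

    The observed value gets weight pa / (pa + ballast_pa), so it only
    outweighs the league average once pa exceeds ballast_pa.
    """
    weight = pa / (pa + ballast_pa)
    return weight * observed + (1 - weight) * league_average


# Ballast taken from the 1,000 / 2,200 PA levels mentioned in the text.
BALLAST = {"left": 1000, "right": 2200}

# HYPOTHETICAL right-handed hitter: a 60-point observed platoon split in a
# 600 PA sample, against an assumed 20-point league-average split.
estimate = regress_split(
    observed=60, league_average=20, pa=600, ballast_pa=BALLAST["right"]
)
print(round(estimate, 1))
```

Under these assumptions, the sample carries only about a fifth of the weight, so the estimate sits much closer to the league-average split than to the observed one.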

If you want to see regression as it relates to home/road splits, Tom Tango did a study on individual player park effects about 10 years ago, but we’ll dig it up here, because it’s still relevant today. He was studying the idea of “reverse platoon splits”, looking for guys who hit well in pitchers parks or pitched well in hitters parks, but the concept of necessary regression for home/road data remains the same. Read the whole thing, but I’ll quote a couple of the more pertinent paragraphs below.

As usual, I am going to hypothesize that a player’s historical splits are not very predictive of his future splits – therefore our best tool for predicting a player’s splits is his average home park factor applied to his home stats. In other words, I am suggesting that the regression rate for a player’s home/road splits is near 100% for a small sample and 80 or 90% (maybe more) for even a large sample. If I am right, then it is correct to simply park adjust a player’s home stats in the traditional way if we want to compare players on a level playing field, without worrying about the fact that any given player might be uniquely affected by his home park in ways that are not captured by that park’s average park factor. … When you do a study like this, the most telling statistics are the aggregate results of each group. If you look at each individual player’s OPS ratio in one year and then the other, you will be tempted to make conclusions one way or another about each individual player. That is what you were trying to avoid in the first place and why you want to look at as many “extreme” players as possible combined in order to get a large sample. Here are the composite results: In 2002, the players in the hitter’s parks who originally all had a “reverse” OPS ratio of a combined .91, had a combined OPS ratio of 1.02 the following year. The average OPS park factor for these parks was 1.04. The players in the pitchers parks who had a “reverse” combined OPS ratio of 1.14 in 2001, ended up with a combined OPS ratio of .89 in 2002. The average OPS park factor for these parks was .96. While further (and better) study, especially establishing a larger sample size, is needed to address this issue, my preliminary conclusion is that a player’s sample home/road ratio, at least for one year, is not at all a reliable predictor of his future home/road splits, and that in fact, the best predictor of a player’s home/road splits is the average multi-year park factor of his home park.
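Using only the composite numbers in the passage above, you can back out a rough persistence rate by asking how much of each group’s first-year deviation from its park factor was still there the following year. This is a back-of-the-envelope reading of the quoted results, not a calculation from the study itself, and `persistence` is a hypothetical helper.

```python
def persistence(first_year_ratio, next_year_ratio, park_factor):
    """Share of the first-year deviation from the park factor that carried over."""
    return (next_year_ratio - park_factor) / (first_year_ratio - park_factor)


# Composite OPS ratios and average OPS park factors quoted above.
hitters_parks = persistence(
    first_year_ratio=0.91, next_year_ratio=1.02, park_factor=1.04
)
pitchers_parks = persistence(
    first_year_ratio=1.14, next_year_ratio=0.89, park_factor=0.96
)

print(f"hitter's parks:  {hitters_parks:.0%} carried over")
print(f"pitcher's parks: {pitchers_parks:.0%} carried over")
```

The hitter’s park group kept roughly 15% of its reverse split, which is regression of about 85%. The pitcher’s park group moved all the way past its park factor, so none of its reverse split persisted.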

He was testing single year data, which is noisier than multi-year data, but the need to understand the fact that home/road data contains noise is still important. If you don’t, you’re forced to draw some really weird conclusions. For instance, did you know that Andre Ethier has a 58 point wOBA gap between his home and road numbers over the last five years? Dodger Stadium isn’t the absurd pitchers park that it used to be, but we still don’t think that Andre Ethier is a product of his fantastic home hitting environment, right? But, here we are, with over 3,000 plate appearances, and he has a .393 wOBA at home and a .335 wOBA on the road. Given what we know about their home parks, Ethier’s gap is actually bigger than Upton’s, as it translates into 47 points of wRC+. That’s the fourth largest wRC+ gap of any hitter over the last five years. For a hitter in Dodger Stadium, who gets to go to both Colorado and Arizona when he’s on the road.

You know who shows up near the bottom of the list when sorting by wRC+? Poor Adam Dunn, who has had to toil in the pitchers parks of Cincinnati, Washington, and Chicago. Oh, wait, none of those are pitchers parks. And yet, he’s posted a .347 wOBA at home and a .371 wOBA on the road over the last five seasons.

This is noise. This is why you regress, even large samples. And this is why you’re better off using something like wRC+, which takes known park factors into account, than you are using a player’s individual home/road splits. Or, better yet, use a projection system that also accounts for aging curves and park adjustments.

Whatever you do, though, don’t just look at a player’s road stats and assume that it’s a window into his real talent level, with the difference between his home and road stats being a mirage of the park he played in. That’s simply not how home/road splits work.