Do Spring Training Records Mean Anything?

By Jesse Radin

As a Giants fan, right now, I am hoping the answer is yes. But right away, without any kind of research or information, there is the cautionary tale of 2005 spring training. I watched, excited, as the Giants went 20-12 that spring. But then the news of Barry Bonds's injury hit the team, and the Giants finished below .500 for the first time since 1996. So it is clear that we need to be cautious before being hopeful (or pessimistic) about a team's spring training record.

There are several factors that are at play here. First and foremost, teams tend to play their major league players more often in the latter part of spring training, so teams that end spring training on a high note would be more likely to succeed in the regular season. Secondly, the difference between runs scored and runs allowed tends to be a more accurate predictor of success, especially with so few games to be played.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .208(a)   .043       .039                .068075
a. Predictors: (Constant), Spring WPCT

Initially, when comparing only spring training winning percentage to regular season winning percentage, there appears to be a weak correlation: it accounts for just 4.3% of the variation in regular season winning percentages. This is not an indication that spring training records have no real predictive value, even though we also have regular season runs scored and runs allowed among our variables. It would be unfair to compare spring records to those, because regular season runs scored and allowed have an extremely high correlation with winning percentage.

Coefficients(a)
Model            B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
1 (Constant)     .426                 .025                               17.241   .000
  Spring WPCT    .141                 .046         .208                  3.071    .002
a. Dependent Variable: Reg Pct
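For readers who want to reproduce this kind of one-predictor regression, here is a minimal sketch in plain Python. The function name `simple_ols` and the sample numbers are made up for illustration; they are not the data or the software behind the tables in this article. The last two lines only check the rounded figures from the tables above against each other.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation (the R column)

    # Residual sum of squares, then the slope's standard error (n - 2 d.f.).
    sse = syy - slope * sxy
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return {
        "intercept": intercept,
        "slope": slope,
        "r": r,
        "r_squared": r ** 2,
        "t_slope": slope / se_slope,
    }


# Hypothetical spring and regular season winning percentages (not real data).
spring_wpct = [0.400, 0.450, 0.500, 0.550, 0.600, 0.700]
reg_wpct = [0.480, 0.450, 0.520, 0.500, 0.530, 0.560]
fit = simple_ols(spring_wpct, reg_wpct)
print(fit)

# Consistency check on the rounded values reported in the tables above.
print(round(0.208 ** 2, 3))     # R Square from R: 0.043
print(round(0.141 / 0.046, 2))  # t from B / Std. Error: 3.07 (table: 3.071)
```

The small gap between 3.07 and the table's 3.071 comes from B and Std. Error being rounded to three decimals.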

However, despite the weak correlation, the relationship between spring training winning percentage and regular season winning percentage appears to be statistically significant (p = .002). I will now add the difference between runs scored and runs allowed in spring training games (DIFF) to see how that changes the relationship between the variables. It is my belief that DIFF is more important than winning percentage in predicting regular season records.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .242(a)   .058       .049                .067698
a. Predictors: (Constant), Difference, Spring WPCT

Coefficients(a)
Model            B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
1 (Constant)     .484                 .040                               12.084   .000
  Spring WPCT    .031                 .076         .046                  .416     .678
  Difference     .000                 .000         .203                  1.822    .070
a. Dependent Variable: Reg Pct
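A two-predictor fit like the one in the tables above can be sketched in plain Python by solving the normal equations on centered data. This is a minimal illustration under assumptions: `ols_two_predictors` is a hypothetical helper, and the input numbers are invented so that the answer is known in advance. It is not the procedure or the data that produced the tables.

```python
def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1 * x1 + b2 * x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Invented data built from y = 1 + 2 * x1 + 3 * x2, so the fit should recover
# roughly (1, 2, 3). Think of x1 as spring WPCT and x2 as spring DIFF.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(ols_two_predictors(x1, x2, y))
```

If the two predictors are strongly correlated with each other, `det` becomes small and the individual coefficients are estimated less precisely, which is one possible reason a predictor can lose significance once a related one is added.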

Immediately, we can tell that my belief was correct. Spring winning percentage is no longer significant at all (p = .678). DIFF is significant, but only at the p < 0.1 level (p = .070). We are now accounting for 5.8% of the variation in regular season winning percentage. Another interesting calculation would be to compare spring DIFF to regular season DIFF (RDIFF), because that may lead to more significant results than a winning percentage that has been derived from around 35 games.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .209(a)   .043       .039                101.60093
a. Predictors: (Constant), Difference

Interestingly enough, there is not much difference between these numbers and the first numbers we derived from comparing the two winning percentages. Thus, it would make sense to assume that all four variables are tied together.

Coefficients(a)
Model            B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
1 (Constant)     -.101                7.014                              -.014    .988
  Difference     .741                 .241         .209                  3.075    .002
a. Dependent Variable: Reg Difference
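The Beta column in these tables is the standardized coefficient: the unstandardized B rescaled by the standard deviations of the predictor and the outcome. With a single predictor it equals the correlation R, which is why Beta and R match (.208 and .209) in the two one-predictor regressions. A small sketch of that relationship, using a hypothetical helper `standardized_beta` and made-up numbers:

```python
import statistics


def standardized_beta(b, x, y):
    """Rescale an unstandardized slope b into a standardized Beta."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Made-up data; the slope comes straight from the least-squares formula.
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx

beta = standardized_beta(slope, x, y)
r = sxy / (sxx * syy) ** 0.5  # Pearson correlation
print(round(beta, 3), round(r, 3))  # the two agree for a single predictor
```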

Again, the Beta is nearly the same as the first calculation. I will now add the spring winning percentage into this calculation to see if it yields different results than our second regression did.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .214(a)   .046       .036                101.72719
a. Predictors: (Constant), Spring WPCT, Difference

In this case, the 4.6% of RDIFF's variation being explained is lower than the 5.8% of regular season WPCT's variation explained by the second regression.

Coefficients(a)
Model            B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
1 (Constant)     -41.676              60.172                             -.693    .489
  Difference     .520                 .399         .146                  1.302    .194
  Spring WPCT    79.056               113.638      .078                  .696     .487
a. Dependent Variable: Reg Difference

This calculation shows that neither DIFF nor spring WPCT is significant when predicting RDIFF. Thus, this shows either the lack of predictive power of spring training performance, or that RDIFF does not account for something that winning percentage does. Overall, it can be concluded that there is some value in the spring training run differential, but not in spring training winning percentage.

For comparisons sake, how do real runs scored and allowed correlate with winning percentage?

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .925(a)   .856       .855                .026476
a. Predictors: (Constant), Reg Runs Scored, Reg Runs Allowed

This shows that 85.6% of WPCT's variation can be explained by the number of runs teams score and allow.

Coefficients(a)
Model                B (Unstandardized)   Std. Error   Beta (Standardized)   t         Sig.
1 (Constant)         .513                 .018                               28.839    .000
  Reg Runs Allowed   .000                 .000         -.869                 -30.277   .000
  Reg Runs Scored    .001                 .000         .811                  28.240    .000
a. Dependent Variable: Reg Pct

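The Betas show that pitching is slightly more important than hitting, but both are very important. The constant of .513 is the model's intercept, that is, the predicted winning percentage when both run totals are zero; only if the two slopes were exactly equal and opposite would it also be the prediction for a team with equal runs scored and allowed. The most important question now is what happens if we add the spring training WPCT and DIFF into this model?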

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .927(a)   .859       .856                .026362
a. Predictors: (Constant), Spring WPCT, Reg Runs Allowed, Reg Runs Scored, Difference

This shows that there is very little change in the predictive value. The model now explains 85.9% of the variation in winning percentage, but the penalty for adding more variables (Adjusted R Square) removes that gain and brings us back to 85.6%.
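The penalty mentioned above comes from the usual Adjusted R Square formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The sketch below applies it to the R Square values from the tables. The sample size is not stated in this article, so n = 210 is purely an assumption picked for illustration, and `adjusted_r_squared` is a hypothetical helper name.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R squared for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


ASSUMED_N = 210  # assumption: the article does not give the sample size

# Two predictors (regular season runs scored and allowed), R Square = .856
print(round(adjusted_r_squared(0.856, ASSUMED_N, 2), 3))  # 0.855
# Four predictors (adding spring WPCT and DIFF), R Square = .859
print(round(adjusted_r_squared(0.859, ASSUMED_N, 4), 3))  # 0.856
```

Under that assumed n, the results line up with the Adjusted R Square values in the tables (.855 and .856): the two spring variables raise R Square by .003 but the adjusted figure by only .001.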

Coefficients(a)
Model                B (Unstandardized)   Std. Error   Beta (Standardized)   t         Sig.
1 (Constant)         .522                 .023                               22.407    .000
  Reg Runs Allowed   .000                 .000         -.861                 -29.695   .000
  Reg Runs Scored    .001                 .000         .802                  27.496    .000
  Difference         .000                 .000         .069                  1.591     .113
  Spring WPCT        -.017                .029         -.024                 -.562     .575
a. Dependent Variable: Reg Pct

This shows that there is essentially no relationship between spring winning percentage and regular season winning percentage once the other variables are in the model. However, even though DIFF is not significant, its p-value of .113 shows that it is nearly significant, just missing the p < 0.1 level. Therefore, since we do not know the runs scored and allowed for 2010 teams, DIFF is one of the better predictors that we have, given our limited information.

A cautionary note, however. I noticed that the spring training winning percentages ESPN lists include games that do not count in the standings. They include games played against minor league teams. That may be one cause of the weak relationship between spring training records and regular season records. The Giants are 16-6 in games that count in the standings, and the team behind them, the Indians, is 12-6. However, the Indians are listed with a winning percentage of .722 and the Giants with .727. This may have the effect of weakening all the correlations, especially if DIFF includes the games against minor league teams.
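The mismatch is easy to see by computing winning percentage straight from the standings records quoted above. A quick sketch (the helper name `win_pct` is just for illustration):

```python
def win_pct(wins, losses):
    """Winning percentage from a win-loss record."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins / games


print(f"Giants  16-6: {win_pct(16, 6):.3f}")  # 0.727, matches the listed .727
print(f"Indians 12-6: {win_pct(12, 6):.3f}")  # 0.667, not the listed .722
```

The Giants' listed percentage agrees with their standings record while the Indians' does not, which is consistent with extra games being counted for some teams.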

There is also this graph of spring training percentages compared to regular season percentages.

This is slightly encouraging for the Giants. It appears that the slight significance is driven more by correlation at the high end. It seems that teams that win 70% or more of their spring training games tend to have a slight advantage over the others, but since there are only around 10 data points at that level, it is not too significant. Still, the general predictions for the Giants I'm seeing (around 80-90 wins) fit in with where they would be based on their current spring training record. Nothing is set in stone, though, and a Double Six Dollar Burger-induced injury to Lard Lad Sandoval would drop the Giants out of contention.