For the past three years I’ve reviewed baseball projection systems against actual results, for every system whose data I could get. Will Larson maintains a valuable site of projections from many different sources, and most of the sources I’m comparing come from it.

As in the past, I’m computing root mean square error (RMSE) and mean absolute error (MAE) for each source compared to actual data. For these tests, I am doing a bias adjustment, so errors are measured relative to each source’s own average. I care more about how a system projects players relative to its own projected averages than about how well it projects the overall league average.
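To make the bias adjustment concrete, here is a minimal Python sketch of how such errors could be computed. It assumes the adjustment simply subtracts each series’ own mean before comparing, and that every player counts equally; the post does not say whether errors are weighted by playing time. The function name `bias_adjusted_errors` is purely illustrative.

```python
import math

def bias_adjusted_errors(projected, actual):
    """Return (MAE, RMSE) after removing each series' own mean.

    Assumption: unweighted players, and the bias adjustment is a plain
    mean subtraction on both the projected and the actual values.
    """
    n = len(projected)
    proj_mean = sum(projected) / n
    act_mean = sum(actual) / n
    # Compare each player's projected deviation from the source average
    # with his actual deviation from the actual average.
    errors = [(p - proj_mean) - (a - act_mean)
              for p, a in zip(projected, actual)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse

# Made-up wOBA values for three players, for illustration only.
mae, rmse = bias_adjusted_errors([0.350, 0.330, 0.310],
                                 [0.340, 0.300, 0.320])
```

Under this definition, a source that projects every player 10 points too high gets the same errors as one with no such offset, which is the point of the adjustment.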

I have data from these systems:

In addition, I’ve computed a source “All Consensus”, which is a simple average of each of the above (ignoring a source if it doesn’t project some particular category).
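As a rough illustration of that averaging, the sketch below builds a consensus for a single category, skipping any source that has no value for a player. The data layout (a dict of source name to per-player values) and the name `consensus` are assumptions made for the example, not the actual implementation.

```python
def consensus(projections):
    """Average each player's value across the sources that project him.

    projections: dict mapping source name -> dict of player -> value.
    A source is ignored for a player when the value is absent or None.
    """
    players = set()
    for by_player in projections.values():
        players.update(by_player)

    result = {}
    for player in players:
        values = [by_player[player]
                  for by_player in projections.values()
                  if by_player.get(player) is not None]
        if values:  # leave out players nobody projected
            result[player] = sum(values) / len(values)
    return result

# Hypothetical sources and players, for illustration only.
example = {
    "SourceA": {"p1": 0.340, "p2": 0.300},
    "SourceB": {"p1": 0.360, "p2": None},
    "SourceC": {"p1": 0.350},
}
averaged = consensus(example)
```

Here `p1` is averaged over three sources, while `p2` falls back to the single source that projects him.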

Not all the models had enough data to compute wOBA, so the tables (below the jump) only include those sources which do. The other sources do affect the All Consensus values for those stats where they do have data.

First, as an “apples-to-apples” comparison, I’m comparing only those players projected by every system (279 total):

| Source | Num | Avg wOBA | MAE | RMSE |
|---|---|---|---|---|
| Actual | 279 | 0.3270 | 0.0000 | 0.0000 |
| All Consensus | 279 | 0.3387 | 0.0236 | 0.0303 |
| Steamer | 279 | 0.3354 | 0.0240 | 0.0307 |
| Steamer/Razzball | 279 | 0.3364 | 0.0242 | 0.0308 |
| ZiPS | 279 | 0.3386 | 0.0245 | 0.0309 |
| RotoValue | 279 | 0.3353 | 0.0242 | 0.0313 |
| RV Pre-Australia | 279 | 0.3351 | 0.0241 | 0.0313 |
| CAIRO | 279 | 0.3361 | 0.0247 | 0.0315 |
| Davenport | 279 | 0.3343 | 0.0243 | 0.0316 |
| MORPS | 279 | 0.3432 | 0.0250 | 0.0318 |
| Marcel | 279 | 0.3200 | 0.0247 | 0.0318 |
| Oliver | 279 | 0.3394 | 0.0250 | 0.0325 |
| CBS | 279 | 0.3410 | 0.0253 | 0.0326 |
| Fans | 279 | 0.3470 | 0.0257 | 0.0329 |
| y2013 | 279 | 0.3374 | 0.0314 | 0.0427 |

The lowest errors came from the Consensus, so there may be some marginal improvement from averaging multiple sources. But the spread among systems was rather small. Steamer did best among the individual systems, with Steamer/Razzball and ZiPS right behind it by RMSE, followed by the two RotoValue models (yay!). All of them did markedly better than my simple benchmark of 2013 data. Marcel remains quite competitive here, though, which shows that a basic model can still do quite well.

Next I’m rerunning the analysis, assigning a wOBA 20 points (0.020) worse than league average to any player a system did not project, and now comparing the 643 players projected by at least one system:
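The fill-in rule can be sketched as follows. This is only an illustration: the function name `fill_missing`, the dict layout, and passing the league average in as a parameter are assumptions, since the post does not say how the league average itself is computed.

```python
def fill_missing(projected, all_players, league_avg_woba, penalty=0.020):
    """Give every unprojected player league average minus the penalty.

    projected: dict of player -> projected wOBA for one source.
    all_players: every player projected by at least one source.
    Returns the filled dict and the number of players filled in.
    """
    fallback = league_avg_woba - penalty
    filled = {}
    missing = 0
    for player in all_players:
        value = projected.get(player)
        if value is None:
            value = fallback
            missing += 1
        filled[player] = value
    return filled, missing

# Hypothetical players and league average, for illustration only.
filled, missing = fill_missing({"p1": 0.350}, ["p1", "p2"], 0.320)
```

In this example `p2` receives roughly 0.300, and the returned count plays the same role as the “Missing” column in the table.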

| Source | MLB | wOBA | StdDev | MAE | RMSE | Missing |
|---|---|---|---|---|---|---|
| Actual | 643 | 0.3186 | 0.0427 | 0.0000 | 0.0000 | 0 |
| Steamer | 643 | 0.3271 | 0.0260 | 0.0270 | 0.0364 | 21 |
| ZiPS | 643 | 0.3291 | 0.0274 | 0.0275 | 0.0365 | 49 |
| Steamer/Razzball | 643 | 0.3288 | 0.0251 | 0.0272 | 0.0366 | 166 |
| All Consensus | 643 | 0.3305 | 0.0250 | 0.0272 | 0.0366 | 2 |
| Davenport | 643 | 0.3264 | 0.0250 | 0.0276 | 0.0376 | 158 |
| CAIRO | 643 | 0.3267 | 0.0284 | 0.0283 | 0.0377 | 32 |
| Oliver | 643 | 0.3294 | 0.0315 | 0.0282 | 0.0379 | 25 |
| MORPS | 643 | 0.3353 | 0.0257 | 0.0285 | 0.0380 | 111 |
| Fans | 643 | 0.3403 | 0.0255 | 0.0285 | 0.0384 | 320 |
| RV Pre-Australia | 643 | 0.3285 | 0.0247 | 0.0285 | 0.0385 | 11 |
| CBS | 643 | 0.3339 | 0.0261 | 0.0287 | 0.0387 | 295 |
| RotoValue | 643 | 0.3288 | 0.0246 | 0.0286 | 0.0387 | 7 |
| Marcel | 643 | 0.3121 | 0.0234 | 0.0288 | 0.0387 | 104 |
| y2013 | 643 | 0.3255 | 0.0474 | 0.0364 | 0.0503 | 137 |

The errors are a bit bigger, as this set includes more players, among them those who play less (and are thus less likely to perform close to their true talent). Steamer is again the best single system, this time edging out ZiPS slightly, with the Consensus now just behind Steamer/Razzball. Oliver, CBS, and the FanGraphs Fans, which all lagged Marcel in the smaller set, now do at least as well, as no system has a higher error than Marcel, Tango’s monkey system. My model, however, dropped back relative to the other systems, which suggests my projections for weaker players may be worse than those of other systems.

The spread in RMSE between the best and worst systems is just 0.0023, even smaller than last year’s, while the gap from the weakest system to 2013 data is about 5 times as large. So using projections is better than simply relying on last year’s data. Steamer also came out on top in the comparison I did last year, but with the spread between systems this small, which projections you use matters far less than that you use them.
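For anyone who wants to check that arithmetic, here is a short Python sketch using the rounded RMSE values from the 643-player table. The variable names are illustrative, and because the inputs are rounded to four decimals the ratio is only approximate.

```python
# RMSE values copied from the 643-player table above (rounded).
best_rmse = 0.0364     # Steamer
weakest_rmse = 0.0387  # weakest projection systems (CBS, RotoValue, Marcel)
naive_rmse = 0.0503    # y2013, last year's actual data used as a projection

spread = weakest_rmse - best_rmse   # best system vs. weakest system
gap = naive_rmse - weakest_rmse     # weakest system vs. the 2013 benchmark
ratio = gap / spread

print(f"spread={spread:.4f} gap={gap:.4f} ratio={ratio:.1f}")
```

The spread works out to 0.0023 and the gap to 0.0116, a ratio of roughly 5.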

Update: Rudy Gamble of Razzball.com asked if I could rerun the analysis for players with 500 or more PA. So here’s the table:

| Source | MLB | wOBA | StdDev | MAE | RMSE | Missing |
|---|---|---|---|---|---|---|
| Actual | 149 | 0.3352 | 0.0316 | 0.0000 | 0.0000 | 0 |
| All Consensus | 149 | 0.3422 | 0.0221 | 0.0222 | 0.0276 | 0 |
| Steamer/Razzball | 149 | 0.3400 | 0.0241 | 0.0226 | 0.0280 | 0 |
| Steamer | 149 | 0.3391 | 0.0237 | 0.0227 | 0.0281 | 0 |
| Davenport | 149 | 0.3382 | 0.0229 | 0.0229 | 0.0284 | 2 |
| ZiPS | 149 | 0.3428 | 0.0240 | 0.0233 | 0.0287 | 0 |
| MORPS | 149 | 0.3470 | 0.0242 | 0.0229 | 0.0288 | 1 |
| RV Pre-Australia | 149 | 0.3387 | 0.0239 | 0.0229 | 0.0292 | 0 |
| CAIRO | 149 | 0.3401 | 0.0252 | 0.0237 | 0.0294 | 1 |
| RotoValue | 149 | 0.3388 | 0.0239 | 0.0232 | 0.0294 | 0 |
| Marcel | 149 | 0.3225 | 0.0230 | 0.0233 | 0.0302 | 2 |
| CBS | 149 | 0.3444 | 0.0268 | 0.0241 | 0.0305 | 4 |
| Fans | 149 | 0.3516 | 0.0256 | 0.0241 | 0.0307 | 6 |
| Oliver | 149 | 0.3441 | 0.0283 | 0.0241 | 0.0313 | 0 |
| y2013 | 149 | 0.3429 | 0.0411 | 0.0307 | 0.0425 | 3 |

This is very much like the apples-to-apples table above, as very few of these players lacked a projection from any system. This is a smaller set of better players, and the overall errors are lower, but the ordering remains about the same.