With the CFB semi-finals tomorrow, I’m going to walk through a statistical algorithm to evaluate the quality of college football teams. In a later post, I’ll look at the BCS and CFB playoff systems and evaluate their effectiveness. But today, I’m just going to apply these models to make predictions for this year, and do a little exploration into some of the highest-rated teams of the last 14 years. For data, I scraped the outcomes of every game since 2002 from ESPN.com (the scraper and cleaning took a bit of work; contact me and I’m happy to share). This contained every game for Division 1A and 1AA teams, including some against teams in lower divisions. I didn’t differentiate among lower-division teams; instead, I assumed they are all of similar strength.

In order to tackle a rating system, we are going to make a few assumptions:

The only factors in determining the winner of a given game are: the team qualities of the two teams; the location of the game (Home, Away, or Neutral); and a random component.

Team qualities do not vary over the course of a season (i.e. we will not include recency bias or account for injuries)

Team quality is transitive and absolute (i.e. we will not account for a rock-paper-scissors-like “matchup” effect)

I created a few different types of models using these tenets; each is described below.

Win Based Model:

First, I’m going to try a method that accounts only for the win/loss outcome of every game. Imagine that each team has an underlying, quantifiable team quality, p, which represents the team’s true win percentage if it were to play an infinite number of games against a full cross section of every team in the league. The probability of team A beating team B can then be expressed fairly easily in terms of these qualities:
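One standard way to write this probability is Bill James’s log5 formula. It is assumed here rather than taken from the post, and it works directly with each team’s win percentage against an average opponent. Here is a minimal Python sketch; the function name `log5` is just illustrative:

```python
def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B at a neutral site (log5 formula).

    p_a and p_b are each team's true win probability against an average team.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("win percentages must be strictly between 0 and 1")
    a_wins = p_a * (1.0 - p_b)  # weight for "A wins while B loses"
    b_wins = p_b * (1.0 - p_a)  # weight for "B wins while A loses"
    return a_wins / (a_wins + b_wins)


print(log5(0.95, 0.5))   # an average opponent leaves A's percentage unchanged
print(log5(0.95, 0.95))  # 0.5
```

Two teams with equal ratings come out at exactly 50% at a neutral site. An average team (p = 0.5) leaves its opponent’s win percentage unchanged.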

Using this equation, we can find the p for each team that best explains the outcomes of all the games we saw in the season. Effectively, these team qualities give us an adjusted winning percentage that accounts for the strength of your schedule, your opponents’ schedules, and so forth. I’ll solve this optimization using a mixed-effects model. It’s a lot like a regression in which the coefficients we are solving for are the team strengths. Also, although I showed a simplified equation above, the model I actually fit accounts for the location of the game: a home team hosting a team of the same rating would have roughly a 70% chance of winning.
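The post fits these qualities with a mixed-effects model. As a simpler stand-in (not the author’s code), the sketch below fits a Bradley-Terry style logistic model by gradient ascent:

- Each team gets a log-odds rating.
- One home-field term is shared by every game.
- An L2 penalty pulls ratings toward zero, which loosely mirrors the shrinkage a random effect provides.

The `(winner, loser, site)` layout, the step size, the penalty strength, and the function names are all assumptions.

```python
import math


def fit_win_ratings(games, n_iter=3000, lr=0.05, l2=0.1):
    """Fit log-odds team ratings plus a home-field term by penalized maximum likelihood.

    games: list of (winner, loser, site) tuples, where site is +1 if the winner
    was at home, -1 if the loser was at home, and 0 for a neutral site.
    """
    teams = sorted({team for game in games for team in game[:2]})
    rating = {team: 0.0 for team in teams}
    home = 0.0
    for _ in range(n_iter):
        # Gradient of the penalized log-likelihood; the L2 term shrinks ratings to 0.
        grad = {team: -l2 * rating[team] for team in teams}
        grad_home = 0.0
        for winner, loser, site in games:
            z = rating[winner] - rating[loser] + home * site
            p_win = 1.0 / (1.0 + math.exp(-z))  # model's P(observed winner wins)
            resid = 1.0 - p_win  # derivative of log(p_win) with respect to z
            grad[winner] += resid
            grad[loser] -= resid
            grad_home += resid * site
        for team in teams:
            rating[team] += lr * grad[team]
        home += lr * grad_home
    return rating, home


def win_prob_vs_average(rating):
    """Convert a log-odds rating to P(beat an average, 0-rated team) at a neutral site."""
    return 1.0 / (1.0 + math.exp(-rating))


# Hypothetical neutral-site results: A > B > C, with a few upsets.
games = [
    ("A", "B", 0), ("A", "B", 0), ("B", "A", 0),
    ("B", "C", 0), ("B", "C", 0), ("C", "B", 0),
    ("A", "C", 0), ("A", "C", 0),
]
ratings, home_edge = fit_win_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(win_prob_vs_average(ratings[team]), 2))
```

Under this logistic form, a roughly 70% home win probability between equal teams corresponds to a home coefficient of about log(0.7 / 0.3) ≈ 0.85 on the log-odds scale.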

According to this model, here are the top teams of all time heading into their bowl games. Of course, this strictly evaluates each team’s dominance over the field of opponents in that season; it is impossible to statistically evaluate an evolving talent pool across seasons:

Team | Season | Record | Opp. Win % | Opp.-Opp. Win % | Model Prob. of Beating Average Team | Bowl Outcome (Opponent)
LSU | 2011 | 13-0 | 65% | 59% | 95% | L (Alabama)
Alabama | 2009 | 13-0 | 62% | 58% | 95% | W (Texas)
Alabama | 2016 | 13-0 | 65% | 59% | 94% | –
Auburn | 2010 | 13-0 | 64% | 58% | 94% | W (Oregon)
Texas | 2005 | 12-0 | 59% | 60% | 94% | W (USC)

This model seems to have pretty reasonable results. All of these teams are undefeated, and all played in a national championship. Small wonder that LSU comes out on top for this ranking. They were undefeated through a schedule which included 8 ranked teams and 3 top 3 teams (at the time of play). Shockingly, however, they were upset by an Alabama team whom the model only gave a 35% chance to win.

The problem with using only wins is obvious: it doesn’t capture any nuance. When a 3-point win counts the same as a 40-point win, we have no reliable way of comparing undefeated teams to each other, except to say that one had a tougher schedule.

Point Based Model:

Imagine now that each team’s strength, instead of measuring win probability, measures its point advantage over the average team. Add a random component to account for teams’ inconsistency, and each game can then be seen as a simple equation. For example, if Stanford beats USC by 10 points:

$$ s_{\text{Stanford}} - s_{\text{USC}} + \epsilon = 10 $$

Here, ϵ is a random noise variable. This starts to look a lot like a traditional regression problem: given enough data points, we can solve for the strength value of every team. I’ll use a similar mixed-effects model, but with point differentials as the new target, to estimate these ratings. Once again, location was included; a home team gets an advantage of about 8 points over a similar visiting team. The top 5 teams of all time are below:

Team | Season | Record | Avg. Pt. Diff | Opp. Pt. Diff | Opp.-Opp. Pt. Diff | Expected Pts. Over Average Opponent | Bowl Outcome (Opponent)
Florida State | 2013 | 13-0 | 42 | 3 | 5 | 44 | W (Auburn)
Texas | 2005 | 12-0 | 36 | 4 | 6 | 40 | W (USC)
Kansas State | 2002 | 10-2 | 35 | 4 | 6 | 38 | W (AZ St.)
Alabama | 2016 | 13-0 | 29 | 10 | 6 | 38 | –
Florida | 2008 | 12-1 | 32 | 7 | 5 | 38 | W (Oklahoma)
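The point-differential model can be sketched the same way, swapping the logistic likelihood for squared error. In this model, the rating difference plus the home term plus noise equals the observed margin. The sketch uses ridge-penalized least squares fit by gradient descent, a hypothetical stand-in for the mixed-effects fit in the post. The tuple layout and parameter values are assumptions.

```python
def fit_point_ratings(games, n_iter=2000, lr=0.05, l2=0.01):
    """Fit point ratings plus a home-field term by ridge-penalized least squares.

    games: list of (team_a, team_b, margin, site) tuples, where margin is team_a's
    score minus team_b's, and site is +1 if team_a was at home, -1 if team_b was
    at home, and 0 for a neutral site.
    """
    teams = sorted({team for game in games for team in game[:2]})
    rating = {team: 0.0 for team in teams}
    home = 0.0
    for _ in range(n_iter):
        # Negative gradient of 0.5 * sum(resid**2) + 0.5 * l2 * sum(rating**2).
        grad = {team: -l2 * rating[team] for team in teams}
        grad_home = 0.0
        for team_a, team_b, margin, site in games:
            predicted = rating[team_a] - rating[team_b] + home * site
            resid = margin - predicted  # this game's estimate of the noise term
            grad[team_a] += resid
            grad[team_b] -= resid
            grad_home += resid * site
        for team in teams:
            rating[team] += lr * grad[team]
        home += lr * grad_home
    return rating, home


# Hypothetical neutral-site games with perfectly consistent margins.
games = [("A", "B", 10, 0), ("B", "C", 10, 0), ("A", "C", 20, 0)]
ratings, home_edge = fit_point_ratings(games)
print({team: round(value, 2) for team, value in ratings.items()})
```

With these three consistent results, the fit lands just under A ≈ +10, B ≈ 0, C ≈ −10; the small shortfall comes from the penalty. On a full season, where teams play a dozen games, `lr` may need to be smaller to keep the updates stable.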

In a lot of ways, this model actually performs better, since it has more detailed information. In predicting the outcome of bowl games, this model is 63% accurate, as opposed to the 55% achieved by the win-only model. In that same Alabama-LSU matchup in 2011, for example, LSU was favored by only 0.75 points. However, we also get some counter-intuitive results. Kansas State in 2002 lost two games (one of them to their only ranked opponent) and didn’t even play in the national championship, but they blew out the rest of the competition by so much that it made up for the losses. This year’s Alabama ranks 3rd all-time in the win model and 4th in the points model, which looks great for their chances but is by no means a lock (more on that later). Let’s examine them a little closer in the context of all teams in the last 14 years.

There were a total of 31 undefeated teams since 2002 heading into the postseason. Some played in Division 1AA and clearly aren’t contenders for the all-time top spots. However, four teams clearly stand apart: Alabama in 2016, Florida State in 2013, Texas in 2005, and LSU in 2011. Which one you prefer depends on what you value: is it a clearer sign of strength to consistently beat tough opponents, or to win by large margins? Still, I think there is a way to split the difference…

Pythagorean Expectation:

There are many ways to go about this, but I used a win margin measure based on a method called Pythagorean Expectation, defined as:

$$ \text{Win Margin} = \frac{\text{PF}^a}{\text{PF}^a + \text{PA}^a} $$

where PF and PA are the points a team scored and allowed in the game. Below is a comparison of different values of a. A value of 1 gives a straight line; ultimately I chose a value around 2, as it yielded the best statistical accuracy.

I also added a smoothing parameter that discounts a large win margin in a low-scoring game. The idea is to add 7 points to each team’s score, which shrinks the win margin back toward 50% by a small amount and corrects for the ‘small sample size’ of low-scoring games. If, for example, a team won 10-3 (win margin of .97), we will instead treat it as a 17-10 win (win margin of .83). This has a smaller effect in high-scoring games; a 45-40 win would have a win margin of .58 instead of .59. The concept comes from additive smoothing, which Pierre-Simon Laplace used to estimate the probability of the sun rising tomorrow.
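Here is a small sketch of the smoothed win margin. The default exponent follows the post’s choice of about 2. The worked figures in the paragraph above (.97, .83, .59 and .58) come out exactly with an exponent of 3, so the examples pass `a=3` to reproduce them. The function and parameter names are illustrative.

```python
def win_margin(points_for, points_against, a=2.0, smoothing=7):
    """Pythagorean win margin with additive smoothing.

    Adds `smoothing` points to both scores, then returns PF^a / (PF^a + PA^a),
    which pulls low-scoring results toward 0.5 more than high-scoring ones.
    """
    pf = points_for + smoothing
    pa = points_against + smoothing
    return pf ** a / (pf ** a + pa ** a)


print(round(win_margin(10, 3, a=3, smoothing=0), 2))   # 0.97 (raw 10-3)
print(round(win_margin(10, 3, a=3), 2))                # 0.83 (treated as 17-10)
print(round(win_margin(45, 40, a=3, smoothing=0), 2))  # 0.59 (raw 45-40)
print(round(win_margin(45, 40, a=3), 2))               # 0.58 (treated as 52-47)
```

Smoothing also keeps a 0-0 score well defined; with `smoothing=0`, that case would divide by zero.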

The results of this model aren’t as easy to interpret, but below I compare the top 5 all-time teams across all the models. Alabama in 2016 is the only team to appear in all 3 lists.

Rank | Pythagorean Model | Win Model | Points Model
1 | LSU 2011 | LSU 2011 | Florida State 2013
2 | Florida State 2013 | Alabama 2009 | Texas 2005
3 | Alabama 2016 | Alabama 2016 | Kansas State 2002
4 | Florida 2008 | Auburn 2010 | Alabama 2016
5 | Alabama 2011 | Texas 2005 | Florida 2008

Computer Rankings are Over-Confident for Inter-Conference Games

The problem is that these models underestimate their error when it comes to relative strength between conferences, because the number of effective samples is so small. Take Washington, for example. Their rating is heavily based on the estimated ratings of their opponents, primarily in the PAC 12. That is a real poverty of information, given that the PAC 12 has played the ACC only once, the Big Ten 4 times, and the SEC twice. The relative strength of the entire ACC compared to the PAC 12 rides quite heavily on a single game between Oregon and Virginia, neither of which is in the playoff.

Games Between Major Conferences:

Conference | Wins | Losses | Opponent Conference
ACC | 6 | 3 | SEC
ACC | 3 | 1 | Big Ten
Big Ten | 3 | 1 | PAC 12
Big Ten | 1 | 0 | SEC
ACC | 0 | 1 | PAC 12
PAC 12 | 0 | 2 | SEC

To overcome this poverty, it’s worth digging a bit deeper into common opponents. To visualize the connectedness among teams, I built a network graph showing which teams played each other during the 2016 regular season. The nodes are sized by team rating.

Clearly the teams are much more tightly interconnected within a conference than across conferences. Using this graph, I constructed a method to determine “effective samples.” Essentially, I look for the shortest path between two teams and give it a weight based on its length. A game directly between the two teams counts as 1, a path of length 2 (a common opponent) counts as ½, a path of length 3 counts as ¼, and so on (the factor of ½ was chosen empirically, based on the amount of noise in the system). Once I count a path, I remove all intermediate nodes along it, so as not to re-use specific games. Then I repeat the process until I’ve added up all paths of any length between the two teams. Effectively, the closer two teams are in the network, the more samples we have to determine their relative strength.
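Here is a minimal sketch of the effective-sample count as I read it from the description above, using only Python’s standard library. Several assumptions are not in the post:

- The schedule is a list of (team, team) pairs.
- A rematch collapses into a single edge.
- Ties among equally short paths are broken by breadth-first search order, which could change the total.
- The function names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the teams on one shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):  # sorted so ties break repeatably
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def effective_samples(games, team_a, team_b, decay=0.5):
    """Count 'effective samples' between two teams from a list of (team, team) games.

    Repeatedly takes a shortest path, adds decay ** (length - 1), and removes that
    path's intermediate teams (or the direct game) so no game is reused.
    """
    if team_a == team_b:
        raise ValueError("need two different teams")
    graph = {}
    for x, y in games:  # undirected graph; a rematch collapses into one edge
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set()).add(x)
    total = 0.0
    while True:
        path = shortest_path(graph, team_a, team_b)
        if path is None:
            break
        length = len(path) - 1  # number of games on the path
        total += decay ** (length - 1)  # 1, 1/2, 1/4, ... for lengths 1, 2, 3, ...
        if length == 1:
            graph[team_a].discard(team_b)  # drop the direct game
            graph[team_b].discard(team_a)
        else:
            for node in path[1:-1]:  # drop the intermediate teams and their games
                for neighbor in graph.pop(node):
                    graph[neighbor].discard(node)
    return total


games = [("A", "B"), ("A", "C"), ("C", "B"), ("A", "D"), ("D", "E"), ("E", "B")]
print(effective_samples(games, "A", "B"))  # 1.75
```

Under this sketch, the single USC path between Washington and Alabama contributes 0.5 on its own. The rest of the roughly 0.75 reported for that matchup would have to come from longer paths, which are worth 1/8 or less each.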

Let’s dig a bit further into some predictions for the two upcoming playoff semi-final games to get an idea of confidence and an alternate formulation of relative team strength.

Who is Going to Win the Playoff?

Alabama and Washington

The win-only model gives Alabama a 66% chance of winning; the points model gives Alabama a 7-point edge, or about a 69% chance of winning. Below are stats on how connected the two teams are:

1 common opponent (path of length 2): Washington lost to USC by 13, and USC in turn lost to Alabama by 46.

No paths of length 3, besides the ones through USC

There were about 75% as many “effective samples” as one game directly between the two teams
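The post doesn’t spell out how a point edge becomes a win probability. One sketch uses the standard library’s `statistics.NormalDist`. It assumes the noise term ϵ is normally distributed with a standard deviation of 14 points. That value is hypothetical, picked because it reproduces the 7-point edge and roughly 69% figure for Alabama above.

```python
from statistics import NormalDist


def win_prob_from_edge(edge, noise_sd=14.0):
    """P(margin > 0) when margin ~ Normal(edge, noise_sd); noise_sd is an assumption."""
    return 1.0 - NormalDist(mu=edge, sigma=noise_sd).cdf(0.0)


print(round(win_prob_from_edge(7.0), 2))  # 0.69
```

A larger `noise_sd` pulls every probability toward 50%. That is one simple way to express extra uncertainty when two teams are poorly connected.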

Ohio State and Clemson

The win-only model gives Ohio State a 52% chance of winning; the points model gives Ohio State a 7-point edge, or about a 70% chance of winning. However, the latter model is likely overconfident, as it has very little relevant data to estimate the relative ratings of the two teams.

No common opponents

2 paths of length 3: Ohio State beat Indiana by 21, who lost to Wake Forest by 5, who lost to Clemson by 22; and Ohio State lost to Penn St. by 3, who lost to Pittsburgh by 3, who beat Clemson by 1.

There are about 77% as many effective samples as one game between the two teams

I also created an adaptation of the effective-sample method that looks at point differentials between teams. It essentially creates a relative strength measure between two teams that gives more weight to local information than to the global information used in the primary models. This model is also tentative about Ohio State, giving them a 53% chance of victory. However, it is much more confident in Alabama, bumping their odds to 90% with a 19-point expected margin.
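The post doesn’t spell out the relative-power calculation, so what follows is only a hypothetical sketch of one way to let local information dominate. It chains point differentials along each connecting path, so a path’s implied margin is the sum of its game margins. It then averages those implied margins using the same 1, ½, ¼, … path weights as the effective-sample count. The function name and input format are illustrative, home-field adjustments are ignored, and this toy version is not expected to reproduce the 53% or 90% figures.

```python
def relative_power(paths, decay=0.5):
    """Locally weighted point edge of team A over team B (hypothetical sketch).

    paths: one entry per connecting path found between A and B; each entry lists
    the signed game margins along that path, starting from A's side (positive
    means the team nearer A won that game). A direct game is a one-item list.
    """
    if not paths:
        raise ValueError("the two teams are not connected")
    weighted_sum = 0.0
    total_weight = 0.0
    for margins in paths:
        weight = decay ** (len(margins) - 1)  # 1 direct, 1/2 via a common opponent, ...
        weighted_sum += weight * sum(margins)  # margins add up along the chain
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical: A beat B by 10 directly; A beat C by 4, and C beat B by 4.
print(relative_power([[10], [4, 4]]))
```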

In general, it looks like Ohio State-Clemson is going to be a close call. Alabama’s chances look pretty good by any available measure, but don’t go running to the bookies just yet: with sample sizes this small, nothing is a sure thing!

Addendum: What Did We Learn from the First Round of CFBP Games?

Before the CFB championship, I wanted to post an update showing how all of the teams shifted according to my models after the bowl games. As I discussed in my last post, the sample size between conferences is tiny, so the bowl games (all of which are inter-conference and between good, well-matched teams) have given us a ton of new information. Recall that I used two primary models: one that accounts only for wins, and another that accounts for point margins. Below, I show how much each team’s ratings shifted according to each model.

Keep in mind that a lot of this shift occurs because we have a better idea of the relative strength of conferences. Take Clemson, for example: they improved drastically due to their rout of Ohio State. But they also benefited from a strong performance by ACC teams across the board, so their strength of schedule improved, bumping their rating up further. Here’s how each of the conferences fared in the bowl games, based on the average change in rating across all the teams in the conference (not just the teams involved in a bowl game).
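The conference summary is simply each team’s rating change averaged over every team in the conference, whether or not it played in a bowl. Here is a minimal sketch with made-up teams and ratings:

```python
from collections import defaultdict


def conference_shift(before, after, conference):
    """Average rating change per conference, taken over every team in it.

    before/after: team -> rating before and after the bowl games.
    conference: team -> conference name (bowl teams and non-bowl teams alike).
    """
    changes = defaultdict(list)
    for team, conf in conference.items():
        changes[conf].append(after[team] - before[team])
    return {conf: sum(deltas) / len(deltas) for conf, deltas in changes.items()}


# Hypothetical teams and ratings, for illustration only.
before = {"Team 1": 10.0, "Team 2": 4.0, "Team 3": 8.0}
after = {"Team 1": 13.0, "Team 2": 5.0, "Team 3": 6.0}
conference = {"Team 1": "Conf X", "Team 2": "Conf X", "Team 3": "Conf Y"}
print(conference_shift(before, after, conference))  # {'Conf X': 2.0, 'Conf Y': -2.0}
```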

Final Look at the Championship Game

Alabama gained some ground with their rout of Washington, but not nearly as much as Clemson. Recall that I used network statistics to create a more locally weighted “relative power” between two teams, and to calculate the effective sample size we have when comparing two teams. The information from the bowl games bumped our effective samples comparing Alabama and Clemson from 1.1 to 1.3, and Clemson’s chances of winning have improved according to every model I created.

Probability of Alabama beating Clemson

Model | Before Bowl Games | After Bowl Games
Win Only | 63% | 61%
Points | 75% | 69%
Relative Power | 62% | 57%

The raw point spread moved from Alabama -9 to Alabama -7. This time, I’ll also make a bolder prediction: using a more detailed model pitting each offense against each defense, I came up with a predicted score of 30-23. This looks to be in line with Vegas, which opened with Alabama favored by 7 points and a point total of 54.5. Let’s see what happens Monday!

edit: I decided to add a betting table. This is the output of 100,000 simulations of the game and shows the odds of every score at the end of regulation. The most likely individual score according to this table is 14-7 Alabama, at around 200:1. Anywhere in white or not shown has odds greater than 1,000:1 (oddly enough, this includes my average predicted score of 30-23). According to this same model, the final score of the Alabama-Washington game had relatively likely odds of 200:1, while the final score of the Clemson-Ohio State game came in at an astronomical 50,000:1.
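The offense-versus-defense model behind this table isn’t shown in the post. The sketch below uses a hypothetical stand-in purely to show how simulated scores become a table of odds:

- Each team’s touchdowns and field goals are drawn from Poisson distributions, with made-up rates that average out to 30-23.
- Every exact score is tallied over 100,000 runs.
- Each frequency p is reported as odds against of (1 − p)/p to 1.

It also illustrates why the single most likely exact score need not be the average score: probability is spread over many lumpy combinations of 7s and 3s.

```python
import math
import random
from collections import Counter


def poisson(rng, lam):
    """Draw from a Poisson(lam) distribution using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_scores(n_sims=100_000, seed=0, rates_a=(3.0, 3.0), rates_b=(2.0, 3.0)):
    """Tally exact final scores (team A, team B) over n_sims simulated games.

    rates_* are hypothetical Poisson rates for (touchdowns, field goals); the
    defaults are made up so the average score is 30-23, not fitted to any data.
    Touchdowns count as 7 points (no missed extra points or 2-point tries).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        score_a = 7 * poisson(rng, rates_a[0]) + 3 * poisson(rng, rates_a[1])
        score_b = 7 * poisson(rng, rates_b[0]) + 3 * poisson(rng, rates_b[1])
        counts[(score_a, score_b)] += 1
    return counts


def odds_against(count, n_sims):
    """Turn a simulated frequency into 'X to 1' odds against that exact score."""
    p = count / n_sims
    return (1.0 - p) / p


N_SIMS = 100_000
counts = simulate_scores(N_SIMS)
for (score_a, score_b), count in counts.most_common(5):
    print(f"{score_a}-{score_b}: about {odds_against(count, N_SIMS):.0f}:1")
```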
