2014 marked the beginning of a new era for college football. The old system, whereby a combination of human polls and computer rankings chose the top two teams to battle it out for the championship, has given way to a four-team playoff, chosen by a selection committee staffed by high-profile figures from both within college football and outside it.

An interesting scenario has arisen with just one weekend of regular-season games left to play. TCU and Baylor, both from the Big 12 conference, have ten wins and one loss each, and both are in contention for a spot in the playoff. TCU’s lone loss came against Baylor, while Baylor lost a conference game to West Virginia (whom TCU beat). The result of the head-to-head matchup is often the first tiebreaker used when teams have identical win/loss records: the SEC, Pac-12, Big Ten and ACC all use head-to-head record to choose their division champions (with the conference champion then decided by a championship game). The Big 12, curiously, has no tiebreaker for conference champion, allowing the championship to be shared, but it does use the head-to-head matchup as the tiebreaker to decide who gets the more favourable bowl matchup.

The playoff committee, however, whose mission is to choose “the best four teams… based on strength of schedule, head-to-head results, comparison of results against common opponents, championships won and other factors”, has ranked TCU at #3, ahead of Baylor at #6, leading to accusations that it is not using common sense, has ignored tradition or its own rules, or, worse, is simply un-American.

The question we have is: assuming two teams have identical win/loss records, does the result of the game between them give us information about which team is better? At first glance one would certainly think so: better teams more often win against inferior teams, so the team that won the head-to-head game is probably the better side. However, the condition “assuming identical win/loss records” changes the scenario, because it means the winner of the head-to-head matchup lost to some other team that the loser of the head-to-head matchup did not lose to.

If we are only looking at games in a round-robin conference schedule, and the two teams we are ranking are the two candidates for conference winner, this means the winner of the head-to-head matchup lost to a worse team than the loser did: the loser’s only defeat came against one of the conference’s top two teams, while the winner’s came against someone further down the standings. Nearly every statistical system for ranking sports teams considers losing to a lower-rated team worse than losing to a higher-rated team. So we have two competing views.

The model

I created a model that simulated match-ups between hypothetical teams in a 10-team conference (the historically named Big 12 actually has 10 teams). Each team was randomly assigned an ELO rating between 1000 and 1800. Each team played every other team once, for a total of nine games, with either four or five of them at home. Playing at home gave a team a 50-point boost in ELO rating, which corresponds to roughly an extra 7% chance of winning, or an extra 2.5 points of spread.

Each conference game was randomly played out according to the ELO ratings. For example, if Team A with a rating of 1400 played at home against Team B with a rating of 1600, there would be a difference of 1600 − (1400 + 50) = 150 rating points between them, which gives Team A a 29.7% chance of winning. The winner was then chosen randomly such that Team A would win 29.7% of the time and Team B 70.3% of the time.
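That win probability follows the standard ELO logistic curve; here is a minimal sketch of the calculation (the function name is my own, not taken from the repository):

```python
def elo_win_prob(rating_a, rating_b, home_boost=0.0):
    """Probability that team A beats team B under the standard
    ELO model; home_boost is added to A's rating when A hosts."""
    diff = rating_b - (rating_a + home_boost)
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

# Team A (1400) hosting Team B (1600) with a 50-point home boost:
# an effective 150-point gap, or roughly a 29.7% chance for A.
print(round(elo_win_prob(1400, 1600, home_boost=50), 3))  # 0.297
```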

After every team had played every other team, the teams were ranked in two different ways: first, using win/loss record alone, with ties broken by random selection; second, using win/loss record with ties broken by the winner of the head-to-head matchup. The conference winner under each method was compared against the team with the best underlying rating, tallying the number of times the team with the best ELO rating won the conference. The teams were then given completely new random ratings and the simulation repeated, for 5,000,000 simulated seasons in total.
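One simulated season under these assumptions can be sketched as follows (this is my own compressed illustration; the actual code lives in the repository linked at the end, and may differ in details):

```python
import random

def elo_win_prob(rating_a, rating_b, home_boost=50):
    diff = rating_b - (rating_a + home_boost)
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

def simulate_season(n_teams=10, home_boost=50, rng=random):
    """Play one round-robin season; return whether the random
    tiebreaker and the head-to-head tiebreaker each crowned the
    team with the best underlying rating."""
    ratings = [rng.uniform(1000, 1800) for _ in range(n_teams)]
    wins = [0] * n_teams
    beat = [[False] * n_teams for _ in range(n_teams)]  # beat[w][l]
    for i in range(n_teams):
        for j in range(i + 1, n_teams):
            # This parity rule gives each team either 4 or 5 home games
            host, guest = (i, j) if (i + j) % 2 == 0 else (j, i)
            if rng.random() < elo_win_prob(ratings[host], ratings[guest], home_boost):
                winner, loser = host, guest
            else:
                winner, loser = guest, host
            wins[winner] += 1
            beat[winner][loser] = True
    best = max(range(n_teams), key=lambda t: ratings[t])
    top = max(wins)
    leaders = [t for t in range(n_teams) if wins[t] == top]
    champ_random = rng.choice(leaders)
    if len(leaders) == 2:
        a, b = leaders
        champ_h2h = a if beat[a][b] else b
    else:  # head-to-head is only well defined for two-way ties
        champ_h2h = champ_random
    return champ_random == best, champ_h2h == best
```

Tallying the two booleans over millions of seasons reproduces the kind of comparison described above; how the real code breaks three-or-more-way ties may differ from the fallback used here.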

If one ranking method were better than the other at identifying the stronger team, then over 5,000,000 simulations that method’s winner should match the best ELO-rated team significantly more often than the other method’s winner does.

The Results

Over 5,000,000 simulations, there was a tie for first place in the conference 31.29% of the time. Ranking by win/loss record with random tie-breaking resulted in the best team winning the conference 47.92% of the time; ranking by win/loss record with head-to-head tie-breaking resulted in the best team winning 47.84% of the time. So using head-to-head as a tiebreaker actually decreased accuracy compared with picking a team at random. A difference of 0.08% is very small, but over 5,000,000 simulations it is statistically significant: it gives a z-score of 3.283 and a resulting p-value of 0.0005, meaning there is only a 0.05% chance this was the result of luck.
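Since both tiebreakers were judged on the same 5,000,000 seasons, the natural significance test is a paired (McNemar-style) one, which uses only the seasons where exactly one method picked the best team. A sketch with purely illustrative counts (these are not the actual tallies from the simulation):

```python
import math

def paired_z(only_a_correct, only_b_correct):
    """McNemar-style z-score for two methods evaluated on the same
    trials, computed from the discordant counts alone."""
    diff = only_a_correct - only_b_correct
    return diff / math.sqrt(only_a_correct + only_b_correct)

# Hypothetical counts: a 4,000-season edge (0.08% of 5,000,000)
# spread over ~1.5M discordant seasons lands near the z-score above.
print(round(paired_z(752_000, 748_000), 2))  # 3.27
```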

One explanation is that tie-breaking by head-to-head record gives an undue advantage to whichever team happened to host the head-to-head matchup. It is an extremely small effect, though, as it only changes the outcome in 0.08% of cases: on average, it would take over 1,250 seasons for the head-to-head tiebreaker to choose the worse team one more time than the random tiebreaker does.

To test this, the simulation was rerun twice: once with home advantage removed, and once with home advantage increased to 200 rating points. With home advantage removed, random tie-breaking chose the best team 48.06% of the time and head-to-head tie-breaking 48.03% of the time; a z-score of 1.328 gives a 9% chance this difference is due to luck, which would not generally be considered statistically significant. With home advantage increased, random tie-breaking chose the best team 45.81% of the time and head-to-head tie-breaking 45.41% of the time; a z-score of 17.77 gives a less than 0.0001% chance the difference is due to luck. It seems the two methods are practically indistinguishable if all games are played at a neutral site, with the slight advantage of random tie-breaking growing in step with the size of home advantage.

Conclusion

When looking at intra-conference results for teams with identical records, there is no statistical evidence that using head-to-head results as a tiebreaker adds any information about which team is actually better. At best the head-to-head matchup offers no extra information, and at worst it decreases the chance of picking the best team, especially given that we have far more sophisticated methods of analysing teams’ strengths than win/loss records and head-to-head match-ups. Taking the teams tied on win/loss record in a conference and drawing a name out of a hat will likely produce the better team more often than using head-to-head record as a tiebreaker.

This of course ignores any notion of which team may be more “deserving” of being called a winner, and taking the head-to-head winner as champion intuitively seems like the fairer result to most people. However, if we are choosing the “best” team, as the College Football Playoff committee is tasked with doing, we have to acknowledge that human intuition is often inaccurate, biased, and sometimes flat-out wrong.

Assumptions & Misc

A rating difference of 800 points between the worst possible team and the best possible team represents a ~1% chance of the worst team in a conference beating the best team. This seems a sensible spread: TCU plays Iowa State this weekend with an estimated 97% to 99% chance of winning, according to ESPN’s Football Power Index and betting lines respectively. Note that 800 points is the maximum possible rating difference, so a slightly larger value might be warranted.
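That ~1% figure can be checked directly against the ELO formula used in the model:

```python
# Chance the worst possible team (800 rating points behind) beats
# the best possible team, under the standard ELO curve:
p_upset = 1.0 / (1.0 + 10.0 ** (800 / 400))  # = 1/101
print(round(p_upset, 4))  # 0.0099, i.e. about 1%
```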

An advantage of 3 points for the home team is estimated by Jeff Sagarin and also by this research. A 3-point spread in favour of a team gives it roughly a 60% chance of winning; alternatively, this site has a variety of research suggesting a ~58% chance of winning for the home side, all other things being equal. I think a rating boost of between 50 and 100 points is the best estimate.
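For reference, here is how that 50-to-100-point range maps onto home win probability under the same ELO curve (my own quick check, assuming otherwise equal teams):

```python
# Win probability for the home side given only a rating boost:
for boost in (50, 100):
    p_home = 1.0 / (1.0 + 10.0 ** (-boost / 400))
    print(boost, round(p_home, 3))  # 50 -> 0.571, 100 -> 0.640
```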

The choice of ELO as a way of generating teams and simulating results could be changed, and of course actual football games are far more complex than a simple rating system. But it’s hard to imagine a system that would substantially change the result. Any method of simulating games is going to represent team strength by one or more parameters, and if the team that won the head-to-head matchup really is the better of the two, then it probably wouldn’t have lost its other game. The more convinced we are that Team A beating Team B means A is better than B, the more baffling it becomes that A lost to C when B didn’t, so we are never any closer to that information being useful.

The code to run the simulation is available at https://github.com/mehwoot/cfb, and you are welcome to download it and play around with the parameters. I couldn’t find any difference in results from changing parameters, aside from increased home advantage causing the performance of head-to-head tie-breaking to degrade slightly, as covered above.