Every holiday season, families all over the United States get together to reflect, spend time with each other, and most importantly, watch an inordinate amount of college football bowl games. Until recently, these bowl games were the only postseason offering for avid college football fans. Since the 2014 addition of the College Football Playoff, however, many fans have worried that the “less important” bowl games (meaning bowl games with no impact on the playoff) would suffer from far lower ratings, leading to lower payouts for the winning teams and less interest overall.

For someone like me who will watch football games between Bowling Green and Northern Illinois on a Wednesday night in October, *NO* bowl game seems irrelevant. Typically, this personal sentiment combined with a lack of late December/early January activity culminates in my participation in a Bowl Pick’em Pool with some friends. The format is simple: each game gets a “confidence number” from 1 to 41, each number can be used only once, and you can put them in any order you choose. More confident that a team is going to win? Assign more points to it. Picking a random upset because your cousin went to the school? Assign fewer points to it. Pictured below is a sample pool entry.

The way I normally complete these pools is by arbitrarily “feeling out” my confidence for individual matchups, leaving a large middle zone of games that I don't really have any strong feelings about. This year, I figured I would throw as much information as I could find to “rate” the teams into a spreadsheet and use IBM Watson Analytics to get some quick insights on which teams might be better picks.

I took each matchup and coded each team by whether or not it was favored according to the current lines from Las Vegas, obtained from the gambling website MyBookie. I used the betting lines as the “market opinion” of each team. If a team is favored by 10 points, it stands to reason that the “market” is more confident it will win than it is for a team favored by 3 points. I also went on ESPN.com and obtained the percentage of users who picked each team in its contest, which gave me another insight into “market sentiment.” I also realized I needed an objective way to rank the teams that didn’t have anything to do with the market, so I found ELO ranking data (a ranking system of teams based on their wins/losses/scores of each game) on WarrenNolan.com (special shout out to Warren, who provided some useful historical data as well!).

At this point, I now had metrics to determine each team’s:

Predicted final score differential (betting line)

Predicted final outcome (Winner/Loser)

Objective ranking metric (ELO Number)

Objective rank (ELO rank)

After uploading the spreadsheet to Watson Analytics (WA), I began noticing that there was a great deal of insight to be gained from these ranks.

This chart shows the difference between the favored team’s ELO and the underdog team’s ELO, sorted from lowest differential on the left to highest on the right. The height of each bar is the betting line for the favored team, so favored teams are the only ones listed. The color of each bar indicates whether the favored team is also the most picked team on ESPN.com. I used the ESPN.com numbers as a predictor for my friends’ choices, which I knew wouldn’t work for certain matchups (we have an inherent bias toward many schools that our friends went to/played ball at) but would for most.

For instance, according to the chart, Oregon-Boise State was the most backwards matchup for a favored team. Oregon was favored in the game by 7 points, but was ELO-ranked significantly worse than Boise State. Oregon was overvalued in this matchup. On the right side of the chart, South Florida looks like the biggest slam dunk to win. They have a significantly higher ELO than their opponent, Texas Tech. That raises the question: why is South Florida only favored by 2 points, while right next to them is Florida Atlantic, which has the highest spread and also a significantly higher ELO than its opponent? Shouldn't the matchup with the highest differential ALSO have the highest spread? In this case, South Florida is undervalued.

This became the thought process for looking at each game. Which spreads were chosen accurately and made a lot of sense, and which spreads made less sense? Which teams are overvalued or undervalued? And how much stock should I put into each team’s ELO vs. the spread that was given to each game?
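One way to make the over/undervalued question concrete is to compare where each game ranks by spread with where it ranks by ELO differential. The sketch below is only an illustration of that idea, not what Watson Analytics did for me: the `Matchup` class, the helper names, and every number in the sample data are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Matchup:
    favorite: str
    underdog: str
    spread: float    # points the favorite is favored by
    elo_fav: float   # favorite's ELO (hypothetical values below)
    elo_dog: float   # underdog's ELO (hypothetical values below)

    @property
    def elo_delta(self) -> float:
        # Positive when the favorite is also the higher-rated team.
        return self.elo_fav - self.elo_dog


def rank_positions(values):
    """Return each value's rank (0 = smallest); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def value_gaps(matchups):
    """Spread rank minus ELO-delta rank for each favorite.

    Positive gap: the spread is larger than ELO would suggest (overvalued).
    Negative gap: the spread is smaller than ELO would suggest (undervalued).
    """
    spread_ranks = rank_positions([m.spread for m in matchups])
    elo_ranks = rank_positions([m.elo_delta for m in matchups])
    return {
        m.favorite: s - e
        for m, s, e in zip(matchups, spread_ranks, elo_ranks)
    }


# Made-up sample data, not the real lines or ELO numbers.
sample = [
    Matchup("Favorite A", "Underdog A", 7.0, 1500, 1580),
    Matchup("Favorite B", "Underdog B", 2.0, 1700, 1450),
    Matchup("Favorite C", "Underdog C", 22.0, 1750, 1520),
    Matchup("Favorite D", "Underdog D", 3.0, 1600, 1570),
]
gaps = value_gaps(sample)
```

In this toy data, Favorite A plays the Oregon role (favored despite a lower ELO, so the gap is positive) and Favorite B plays the South Florida role (a big ELO edge but a small spread, so the gap is negative).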

There were some guiding principles that I used from my experience in watching college football to tweak my idea for where I should “confidence rank” each game.

Although Vegas betting lines are scary accurate at times, according to some research by Boyd's Bets (a sports betting/statistics blog), college football has the lowest predicted/actual score correlation of all major US sports.

My unfamiliarity with ELO makes me somewhat hesitant to rely on it, but objective measures of how good a team may or may not be are difficult to come by.

Unfortunately, my main goal of getting rid of a middle zone of games that I don’t feel strongly about was not helped. Summary statistics for the Vegas lines:

Mean – 5.4

Median – 5

Mode – 7

We have a huge spike in spreads between 6.5 and 7.5, which isn't really high enough to make a confident guess that one team will win over the other. Many of these 6-7 point lines are the matchups that haven’t occurred before, which means there isn’t a lot of information on which team’s style will work better given the circumstances.

Based on these guidelines, I arranged my games in the order I felt best, following the chart above and the guidelines I listed. I started at 41 and worked my way down in order to make my 30-41 point games as certain as possible. My results are below, with my correct picks on the bottom and my wrong picks on the top. The left-most teams in the first stack are what hurt me the most, and the left-most bottom stack teams are the ones that helped me the most.

I got 21 correct, 18 incorrect. For the all-important point slots of 30-41, I left a significant amount of points on the table. If you can do well on the 30-41 stretch of games, it gets incredibly difficult to lose, as 426 of a possible 861 points are contained in those 12 games. Getting 7 of a possible 12 right is over 50%, but definitely leaves room for improvement.
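For anyone who wants to check the point math, here is a small sketch. The function names and the two-game sample entry are my own inventions for illustration; only the 41-game format and the 30-41 slot range come from the pool itself.

```python
def total_points(n_games):
    """Sum of confidence numbers 1..n_games."""
    return n_games * (n_games + 1) // 2


def slot_points(low, high):
    """Sum of the confidence numbers from low to high, inclusive."""
    return sum(range(low, high + 1))


def score_entry(picks, winners):
    """Add up the confidence numbers of every correct pick.

    picks:   {game: (picked_team, confidence)}
    winners: {game: winning_team}
    """
    return sum(
        confidence
        for game, (team, confidence) in picks.items()
        if winners.get(game) == team
    )


pool_total = total_points(41)        # 861
top_slots = slot_points(30, 41)      # 426
top_share = top_slots / pool_total   # a little under half of all points

# Hypothetical two-game entry: one hit at 41 points, one miss at 3 points.
picks = {"Game 1": ("Team A", 41), "Game 2": ("Team D", 3)}
winners = {"Game 1": "Team A", "Game 2": "Team C"}
sample_score = score_entry(picks, winners)
```

The top 12 slots carry just under half of the available points, which is why a miss at 41 hurts so much more than a miss at 3.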

I think one of the more interesting parts of my quasi-experiment is that I didn’t do much better than I typically do when I “guess” more of my answers. It’s unclear whether that is caused by the unpredictability of this bowl season in particular or if my data was a poor evaluator of teams’ predicted performance. I’ve heard many coaches stress that one-on-one matchups between the players are much more relevant in their game than many analysts account for. In my opinion, there were some incredible performances that no analyst really could have predicted. Just to name a few: Appalachian State trouncing Toledo, Marshall having 3 touchdown plays that went 68 yards and up (2 of them were runs), Georgia and Oklahoma having an insane boxing match of a game go into 2 overtimes, and of course, Tua Tagovailoa instantly becoming an Alabama legend.

For the title game, I was the only member of the bowl pool who chose Georgia and would have won outright with a Bulldog victory. Unfortunately, a Georgia DB completely blew his coverage and allowed the 19-year-old Tua to cap off one of the most unsatisfying comebacks in college football history. Go figure!

For the next time we run this, it’ll be worth it to make a few changes. First of all, I think that the model could be helped by more objective factors of measurement. ELO was a fine composite ranking and actually turned out more accurate than I would have guessed. If I had purely chosen the games based on which team had the higher ELO, ranked in order of highest to lowest delta, my score would have been higher by 8 points, good for a second place tie in the pool. I think I should come up with multiple methods of ranking the games using different metrics, and then use historical data to find out which metric is the best predictor. I also think that being less reliant on the Las Vegas spreads will be useful next time. Even though they would have given me a similar score to using the ELO ranking, the order of hits/misses seemed random and disjointed.
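The ELO-only approach described above is easy to sketch: pick the higher-rated team in every game, then hand out confidence numbers from highest to lowest in order of ELO delta. The function name, the tuple layout, and the sample ratings below are hypothetical, chosen just to show the mechanics.

```python
def elo_only_entry(games):
    """Build a pool entry from ELO alone.

    games: list of (team_a, elo_a, team_b, elo_b) tuples.
    Returns a list of (picked_team, confidence), biggest ELO gap first.
    """
    # Largest absolute ELO difference gets the largest confidence number.
    ranked = sorted(games, key=lambda g: abs(g[1] - g[3]), reverse=True)
    n_games = len(ranked)
    entry = []
    for position, (team_a, elo_a, team_b, elo_b) in enumerate(ranked):
        pick = team_a if elo_a >= elo_b else team_b
        entry.append((pick, n_games - position))
    return entry


# Made-up ratings for three games, not real ELO data.
sample_games = [
    ("Team A", 1500, "Team B", 1580),
    ("Team C", 1700, "Team D", 1450),
    ("Team E", 1600, "Team F", 1570),
]
entry = elo_only_entry(sample_games)
```

With a full 41-game slate, the same function would assign 41 to the most lopsided ELO matchup and 1 to the closest one. Swapping in a different sort key (spread, ESPN pick percentage, or some blend) would be one way to compare metrics against historical results.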

Thanks for reading -- I hope a college football fan's attempt at making sense of this crazy bowl season was interesting.

For the purposes of this article, all visualizations were created using IBM Watson Analytics (www.watsonanalytics.com).