What is this? This is college football play-by-play data for the 2014-2019 seasons, and it will be updated weekly throughout the current season. The data was scraped from the internet, so it is not guaranteed to be accurate in every area, but it should be highly accurate overall. I spent months on this project because play-by-play data is extremely valuable for sports analysis and very hard to get without paying an exorbitant fee, especially in an underdeveloped data market like college football.

What is the advantage of this data? This data goes deeper than what you will find nearly anywhere else on the internet. It includes betting lines (spread, O/U, and moneyline), the targeted receiver on incomplete passes, fine-grained detail (who the opponent is, the timestamp of the play, who the QB was for that pass), down and distance information, situational data like the current score, and hidden stats that most sites don't aggregate, like how many times a player fumbled even if they recovered. Altogether, this data is much more detailed than what you will find on a website that only gives total stats by year or game. As a hypothetical, you could check how many times Rondale Moore was targeted when the Purdue Boilermakers were within 5 yards of scoring in the 4th quarter while down by 7 or fewer points. Comparing Rondale's numbers to the other receivers in that offense would give a sense of the trust his coaches have in him when the game is on the line (he was targeted 2 out of 5 times in those situations and scored both times).
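As a rough illustration, the kind of situational query described above could be sketched in plain Python as follows. The column names (offense, quarter, yards_to_goal, offense_score, defense_score, targeted_receiver, touchdown) are assumptions made for this example and may not match the actual headers in the data, and the sample rows are made up purely to show the filtering logic.

```python
from collections import Counter


def count_clutch_targets(plays, team, max_yards_to_goal=5, quarter=4, max_deficit=7):
    """Count targets and touchdowns per receiver in a late, close, goal-line spot.

    `plays` is an iterable of dicts, one per play. All keys used here are
    hypothetical column names, not the dataset's confirmed schema.
    """
    targets = Counter()
    touchdowns = Counter()
    for play in plays:
        # Positive deficit means the offense is trailing.
        deficit = play["defense_score"] - play["offense_score"]
        if (
            play["offense"] == team
            and play["quarter"] == quarter
            and play["yards_to_goal"] <= max_yards_to_goal
            and 0 < deficit <= max_deficit  # "down by 7 or fewer", ties excluded
            and play["targeted_receiver"]  # skip plays with no targeted receiver
        ):
            receiver = play["targeted_receiver"]
            targets[receiver] += 1
            if play["touchdown"]:
                touchdowns[receiver] += 1
    return targets, touchdowns


# Made-up rows for illustration only; these are not taken from the dataset.
sample_plays = [
    {"offense": "Team X", "quarter": 4, "yards_to_goal": 3, "offense_score": 14,
     "defense_score": 21, "targeted_receiver": "Receiver A", "touchdown": True},
    {"offense": "Team X", "quarter": 4, "yards_to_goal": 4, "offense_score": 14,
     "defense_score": 21, "targeted_receiver": "Receiver B", "touchdown": False},
    {"offense": "Team X", "quarter": 4, "yards_to_goal": 2, "offense_score": 21,
     "defense_score": 21, "targeted_receiver": "Receiver A", "touchdown": True},
    {"offense": "Team X", "quarter": 2, "yards_to_goal": 1, "offense_score": 0,
     "defense_score": 7, "targeted_receiver": "Receiver A", "touchdown": False},
    {"offense": "Team X", "quarter": 4, "yards_to_goal": 5, "offense_score": 20,
     "defense_score": 27, "targeted_receiver": None, "touchdown": False},
    {"offense": "Team Y", "quarter": 4, "yards_to_goal": 3, "offense_score": 10,
     "defense_score": 14, "targeted_receiver": "Receiver C", "touchdown": True},
]

targets, touchdowns = count_clutch_targets(sample_plays, "Team X")
for receiver, count in targets.most_common():
    print(receiver, count, touchdowns[receiver])
```

If the rows are loaded with csv.DictReader, every value arrives as a string, so the numeric fields would need to be converted with int() before a filter like this is applied.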

Shown below is just one neat application of the data. This chart shows the trend in AwayScore - HomeScore in the 4th quarter of games in which the away team is favored by sportsbooks. The more heavily the sportsbooks favor the away team, the larger the score differential tends to be. This might seem obvious to a casual observer, but now we have the data to back it up!
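One way the numbers behind a chart like this could be put together is sketched below: group games by the away team's spread and average AwayScore - HomeScore within each group. This is not necessarily how the chart was actually produced. The field names (away_spread, away_score, home_score) and the convention that a negative spread means the away team is favored are assumptions for this example, and the sample games are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_margin_by_spread(games):
    """Average (away_score - home_score) per spread, for away favorites only.

    Assumes a negative `away_spread` means the away team is favored.
    All keys are hypothetical names, not the dataset's confirmed schema.
    """
    margins = defaultdict(list)
    for game in games:
        spread = game["away_spread"]
        if spread < 0:  # keep only games where the away team is favored
            margins[spread].append(game["away_score"] - game["home_score"])
    # Sort so the heaviest away favorites (most negative spread) come first.
    return {spread: mean(values) for spread, values in sorted(margins.items())}


# Made-up games for illustration only; these are not taken from the dataset.
sample_games = [
    {"away_spread": -3, "away_score": 24, "home_score": 21},
    {"away_spread": -3, "away_score": 20, "home_score": 23},
    {"away_spread": -7, "away_score": 28, "home_score": 17},
    {"away_spread": -7, "away_score": 31, "home_score": 24},
    {"away_spread": 3, "away_score": 10, "home_score": 30},  # home favored, skipped
]

margin_by_spread = mean_margin_by_spread(sample_games)
for spread, margin in margin_by_spread.items():
    print(spread, margin)
```

Plotting the resulting spread and margin pairs would give a trend line comparable in spirit to the chart described above.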