The lack of public sports data sources has been a major obstacle in the creation of modern, reproducible research and sports analytics. To help, we at Lionbridge AI have created a cheat sheet of publicly available machine learning datasets categorized by sport.

Soccer Datasets

football.db: A free and open public domain football database & schema for use in any programming language.

FIFA 19 complete player dataset: Detailed attributes for every player registered in the latest edition of the FIFA 19 database scraped from SoFIFA.

Fifa 18 More Complete Player Dataset: An extension of the previous dataset, this version contains several extra fields and is pre-cleaned to a much greater extent.

World Cup Dataset: This dataset shows all information about historical World Cups as well as all match data.

International football results from 1872 to 2018: This dataset contains 40,000 results of football matches, from the very first official match in 1872 up until 2018. Matches range from the FIFA World Cup to regular friendly matches.

Basketball Datasets

NBA shot logs: Data on shots taken during the 2014-2015 season, including which player took the shot, where on the floor it was taken from, who the nearest defender was, how far away that defender was, the time on the shot clock, and much more.

NBA Player of the Week Data: Player of the Week data from the 1984-85 to 2018-19 seasons, scraped from the RealGM basketball site.

Daily Fantasy Basketball: This dataset contains 20 days of DraftKings NBA fantasy basketball contest data scraped at the end of 2017.

NCAA Basketball: This dataset contains data about NCAA Basketball games, teams, and players. It covers play-by-play and box scores back to 2009 and final scores back to 1996.

American Football Datasets

NFLsavant.com: A website dedicated to providing NFL statistics in a simple interface. All data is compiled from publicly available NFL play-by-play data.

Detailed NFL Play-by-Play Data 2009-2018: Regular season plays from 2009-2018 containing information on players, game situation, results, and advanced metrics such as expected points and win probability values.

NFL Draft Outcomes: All players selected in the NFL Draft from 1985-2015 including outcome statistics.

Racing Datasets

Ergast Formula One Dataset: An experimental web service which provides a historical record of motor racing data for non-commercial purposes.

Formula 1 Race Data: This dataset contains data from 1950 all the way through the 2017 season, and consists of tables describing constructors, race drivers, lap times, pit stops and more.

Miscellaneous Sports Datasets

FiveThirtyEight: A news and sports site with data-driven articles. They make their datasets openly available on GitHub.

SPORTS-1M: 1 million sports videos, with an average length of 5.5 minutes, labelled with 487 sports classes.

120 years of Olympic history: A historical dataset on the Olympic Games, including all the Games from Athens 1896 to Rio 2016 with data scraped from sports-reference.com.

Daily and Sports Activities Data Set: Motion sensor data of 19 daily and sports activities, each performed by 8 subjects in their own style for 5 minutes.

Lahman’s Baseball Database: A complete history of major league baseball stats from 1871 to 2018, including batting and pitching stats, standings, team stats, managerial records, post-season data, and more.

NHL Game Data: Game, team, player and play data including x,y coordinates measured for each game in the NHL in the past 6 years.

In case you missed our previous dataset compilations, you can find them all here. Still can’t find the custom data you need to train your model? Lionbridge AI provides training data in dozens of languages for machine learning projects.

Contact us to learn how Lionbridge AI can improve your training data.