Abstract

Data and analytics have been part of the sports industry since as early as the 1870s, when the first box score in baseball was recorded. However, only recently have advanced data mining and machine learning techniques been used to support the operations of sports franchises. Part of the reason is the ability to collect more fine-grained data; an equally important factor is the competitive advantage enjoyed by early adopters of analytics (popularized by the best-seller ``Moneyball'', which described the success the Oakland Athletics had with analytics). Draft selection, game-day decision making, and player evaluation are just a few of the applications where sports analytics plays a crucial role today. Beyond the clubs themselves, other stakeholders in the industry (e.g., league offices and the media) also invest in analytics. Leagues increasingly rely on data when deciding on potential rule changes. For instance, the most recent rule change in the NFL, i.e., the kickoff touchback, was the result of a thorough data analysis of concussion incidents. In this tutorial we will review the literature on data mining and machine learning techniques for sports analytics. We will introduce the audience to the design and methodology behind advanced metrics such as adjusted plus/minus for evaluating basketball players, spatial metrics for evaluating a player's ability to spread the defense in basketball, and the Player Efficiency Rating (PER). We will also cover in depth advanced data mining methods, in particular tensor mining, that can analyze heterogeneous data such as those available in today's sports world.