Back to Curley Lab Website Animating NBA Play by Play using R James P Curley

This tutorial demonstrates how to create an animated visualization of the NBA’s play-by-play data. It heavily builds upon the tutorials written by Rajiv Shah available here:. I will use several functions available at Rajiv’s GitHub page to get the data and process it. I will then get the data into the format required to animate using gganimate.

Depending upon your machine, please allow a little time for the example animation to load at the bottom of the page. If you continue to have difficulties please refresh or try here for the RPubs version.

Load these packages library(RCurl) library(jsonlite) library(dplyr) library(sp) library(ggplot2) library(data.table) library(gganimate)

Source Some Functions First source several functions that we need to extract and plot data. These written by Rajiv and adapted by me. The R file is available here.

Get the Data Although the NBA’s play by play data is no longer made publicly available, a selection of these data have been stored. We will use the data in this repository. I downloaded this game-log if you want to follow along. I stored it as “0021500431.json” following Rajiv’s example. It is a game between the San Antonio Spurs and Minnesota Timberwolves from December 2015. Use the ‘sportvu_convert_json’ function to convert the raw json data to a R dataframe. all.movements <- sportvu_convert_json("0021500431.json") The loaded data should look like this: str(all.movements) ## 'data.frame': 2646562 obs. of 13 variables: ## $ player_id : chr "2225" "2225" "-1" "-1" ... ## $ lastname : chr "Parker" "Parker" "ball" "ball" ... ## $ firstname : chr "Tony" "Tony" NA NA ... ## $ jersey : chr "9" "9" NA NA ... ## $ position : chr "G" "G" NA NA ... ## $ team_id : num 1.61e+09 1.61e+09 NA NA 1.61e+09 ... ## $ x_loc : num 51.7 51.7 52.9 52.9 60.4 ... ## $ y_loc : num 40.3 40.3 39.9 39.9 31.8 ... ## $ radius : num 0 0 2.5 2.5 0 ... ## $ game_clock: num 716 716 716 716 716 ... ## $ shot_clock: num 13.3 13.3 13.3 13.3 13.3 ... ## $ quarter : num 1 1 1 1 1 1 1 1 1 1 ... ## $ event.id : num 2 1 1 2 2 1 2 1 2 1 ... head(all.movements) ## player_id lastname firstname jersey position team_id x_loc ## 1 2225 Parker Tony 9 G 1610612759 51.73003 ## 2 2225 Parker Tony 9 G 1610612759 51.73003 ## 3 -1 ball <NA> <NA> <NA> NA 52.88766 ## 4 -1 ball <NA> <NA> <NA> NA 52.88766 ## 5 201937 Rubio Ricky 9 G 1610612750 60.40236 ## 6 201937 Rubio Ricky 9 G 1610612750 60.40236 ## y_loc radius game_clock shot_clock quarter event.id ## 1 40.31428 0.00000 715.78 13.27 1 2 ## 2 40.31428 0.00000 715.78 13.27 1 1 ## 3 39.89349 2.49642 715.78 13.27 1 1 ## 4 39.89349 2.49642 715.78 13.27 1 2 ## 5 31.81802 0.00000 715.78 13.27 1 2 ## 6 31.81802 0.00000 715.78 13.27 1 1 You can see that the raw data contains infomation about players and the ball. Every row represents the location of a ball or player at a particular time. Each player is given an ID, a jersey number and position and team-id. They also have an x and y coordinate for each time. There is additionally radius information related to the ball’s movement. Next, we need to get some metadata for this particular game. The get_pbp function grabs this information from the stats.nba.com website. gameid = "0021500431" pbp <- get_pbp(gameid) str(pbp) ## 'data.frame': 414 obs. of 33 variables: ## $ GAME_ID : Factor w/ 1 level "0021500431": 1 1 1 1 1 1 1 1 1 1 ... ## $ EVENTNUM : Factor w/ 414 levels "0","1","10","100",..: 1 2 99 192 288 368 378 387 404 394 ... ## $ EVENTMSGTYPE : Factor w/ 12 levels "1","10","12",..: 3 2 8 5 7 8 5 7 10 1 ... ## $ EVENTMSGACTIONTYPE : Factor w/ 36 levels "0","1","10","108",..: 1 1 15 18 1 15 2 1 7 28 ... ## $ PERIOD : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ... ## $ WCTIMESTRING : Factor w/ 94 levels "10:01 PM","10:02 PM",..: 14 14 14 15 15 15 15 15 16 16 ... ## $ PCTIMESTRING : Factor w/ 220 levels "0:00","0:01",..: 69 69 66 63 62 61 55 54 47 47 ... ## $ HOMEDESCRIPTION : Factor w/ 182 levels "Bjelica BLOCK (1 BLK)",..: NA 39 NA 91 NA NA 78 NA 171 NA ... ## $ NEUTRALDESCRIPTION : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ... ## $ VISITORDESCRIPTION : Factor w/ 209 levels "Aldridge 1' Alley Oop Layup (4 PTS) (Duncan 4 AST)",..: NA NA 162 90 94 92 NA 164 NA 160 ... ## $ SCORE : Factor w/ 104 levels "100 - 77","101 - 77",..: NA NA NA NA NA NA NA NA NA 16 ... ## $ SCOREMARGIN : Factor w/ 24 levels "-10","-11","-12",..: NA NA NA NA NA NA NA NA NA 11 ... ## $ PERSON1TYPE : Factor w/ 6 levels "0","1","2","3",..: 1 5 6 5 6 6 5 6 5 6 ... ## $ PLAYER1_ID : Factor w/ 26 levels "0","1495","1610612750",..: 1 6 22 21 14 14 23 22 6 22 ... ## $ PLAYER1_NAME : Factor w/ 23 levels "Adreian Payne",..: NA 9 21 2 10 10 19 21 9 21 ... ## $ PLAYER1_TEAM_ID : Factor w/ 2 levels "1610612750","1610612759": NA 1 2 1 2 2 1 2 1 2 ... ## $ PLAYER1_TEAM_CITY : Factor w/ 2 levels "Minnesota","San Antonio": NA 1 2 1 2 2 1 2 1 2 ... ## $ PLAYER1_TEAM_NICKNAME : Factor w/ 2 levels "Spurs","Timberwolves": NA 2 1 2 1 1 2 1 2 1 ... ## $ PLAYER1_TEAM_ABBREVIATION: Factor w/ 2 levels "MIN","SAS": NA 1 2 1 2 2 1 2 1 2 ... ## $ PERSON2TYPE : Factor w/ 3 levels "0","4","5": 1 3 1 1 1 1 1 1 1 1 ... ## $ PLAYER2_ID : Factor w/ 25 levels "0","1495","1626145",..: 1 2 1 1 1 1 1 1 1 1 ... ## $ PLAYER2_NAME : Factor w/ 24 levels "Adreian Payne",..: NA 21 NA NA NA NA NA NA NA NA ... ## $ PLAYER2_TEAM_ID : Factor w/ 2 levels "1610612750","1610612759": NA 2 NA NA NA NA NA NA NA NA ... ## $ PLAYER2_TEAM_CITY : Factor w/ 2 levels "Minnesota","San Antonio": NA 2 NA NA NA NA NA NA NA NA ... ## $ PLAYER2_TEAM_NICKNAME : Factor w/ 2 levels "Spurs","Timberwolves": NA 1 NA NA NA NA NA NA NA NA ... ## $ PLAYER2_TEAM_ABBREVIATION: Factor w/ 2 levels "MIN","SAS": NA 2 NA NA NA NA NA NA NA NA ... ## $ PERSON3TYPE : Factor w/ 4 levels "0","1","4","5": 1 4 1 4 1 1 1 1 2 1 ... ## $ PLAYER3_ID : Factor w/ 9 levels "0","1495","200746",..: 1 4 1 6 1 1 1 1 1 1 ... ## $ PLAYER3_NAME : Factor w/ 8 levels "Danny Green",..: NA 1 NA 4 NA NA NA NA NA NA ... ## $ PLAYER3_TEAM_ID : Factor w/ 2 levels "1610612750","1610612759": NA 2 NA 2 NA NA NA NA NA NA ... ## $ PLAYER3_TEAM_CITY : Factor w/ 2 levels "Minnesota","San Antonio": NA 2 NA 2 NA NA NA NA NA NA ... ## $ PLAYER3_TEAM_NICKNAME : Factor w/ 2 levels "Spurs","Timberwolves": NA 1 NA 1 NA NA NA NA NA NA ... ## $ PLAYER3_TEAM_ABBREVIATION: Factor w/ 2 levels "MIN","SAS": NA 2 NA 2 NA NA NA NA NA NA ... As you can see, this contains loads of stuff that we don’t need. Let’s just keep what we want: pbp <- pbp[-1,] #first row is NAs colnames(pbp)[2] <- c('event.id') #will use this to merge on all.movements df pbp0 <- pbp %>% select (event.id,EVENTMSGTYPE,EVENTMSGACTIONTYPE,SCORE) pbp0$event.id <- as.numeric(levels(pbp0$event.id))[pbp0$event.id] head(pbp0) ## event.id EVENTMSGTYPE EVENTMSGACTIONTYPE SCORE ## 2 1 10 0 <NA> ## 3 2 5 45 <NA> ## 4 3 2 5 <NA> ## 5 4 4 0 <NA> ## 6 5 5 45 <NA> ## 7 6 2 1 <NA> tail(pbp0) ## event.id EVENTMSGTYPE EVENTMSGACTIONTYPE SCORE ## 409 477 6 1 <NA> ## 410 478 3 11 107 - 83 ## 411 480 3 12 108 - 83 ## 412 482 2 1 <NA> ## 413 483 4 0 <NA> ## 414 484 13 0 108 - 83 Each event.id refers to a different play in the game. Now we can merge the two files: all.movements <- merge(x = all.movements, y = pbp0, by = "event.id", all.x = TRUE) dim(all.movements) ## [1] 2646562 16 This is a lot of information! Over 2.6 millions rows. Let’s pick one event id to visualize. Here I will use the same one as Rajiv’s example - eventid=303. id303 <- all.movements[which(all.movements$event.id == 303),] dim(id303) ## [1] 10890 16 head(id303) ## event.id player_id lastname firstname jersey position team_id ## 1644741 303 -1 ball <NA> <NA> <NA> NA ## 1644742 303 203937 Anderson Kyle 1 F 1610612759 ## 1644743 303 201937 Rubio Ricky 9 G 1610612750 ## 1644744 303 201988 Mills Patty 8 G 1610612759 ## 1644745 303 203952 Wiggins Andrew 22 G-F 1610612750 ## 1644746 303 203937 Anderson Kyle 1 F 1610612759 ## x_loc y_loc radius game_clock shot_clock quarter ## 1644741 5.43835 24.73073 10.63683 359.75 5.49 3 ## 1644742 65.31054 22.12468 0.00000 346.42 19.03 3 ## 1644743 46.60167 20.00475 0.00000 376.60 22.70 3 ## 1644744 38.77574 21.41917 0.00000 359.40 23.71 3 ## 1644745 11.18441 34.04307 0.00000 359.40 23.69 3 ## 1644746 8.62043 2.05544 0.00000 364.39 7.11 3 ## EVENTMSGTYPE EVENTMSGACTIONTYPE SCORE ## 1644741 1 98 67 - 48 ## 1644742 1 98 67 - 48 ## 1644743 1 98 67 - 48 ## 1644744 1 98 67 - 48 ## 1644745 1 98 67 - 48 ## 1644746 1 98 67 - 48 Wow! So much information - this event along as over 10 thousand rows !

Making a static plot The first thing to notice is that there are 850 different timepoints in this particular length(table(id303$game_clock)) ## [1] 850 We will extract data to make several dataframes. We will use time 361.15. A different function is used to get the data for the ball, players and the convex hull. Players This shows the x and y locations of the 10 players on the court at this time. playerdf <- player_position1(df=id303, eventid=303,gameclock=361.15) playerdf ## ID X Y jersey ## 1 1610612750 6.56514 24.39048 32 ## 2 1610612750 13.67987 34.34244 22 ## 3 1610612759 16.08250 25.92709 30 ## 4 1610612759 20.31601 46.23162 33 ## 5 1610612750 15.89886 20.33154 9 ## 6 1610612759 24.47131 10.80858 8 ## 7 1610612750 11.03182 7.93721 8 ## 8 1610612759 7.59973 16.73085 20 ## 9 1610612750 20.11542 29.46991 5 ## 10 1610612759 14.81859 2.42889 1 Convex Hull Plot This function grabs the points required for plotting the convex hull of the player’s location. See Rajiv’s website for more info. chulldf <- chull_plot(df=id303, eventid=303, gameclock=361.15) chulldf ## ID X Y ## 1 1 11.03182 7.93721 ## 2 1 6.56514 24.39048 ## 3 1 13.67987 34.34244 ## 4 1 20.11542 29.46991 ## 5 1 11.03182 7.93721 ## 6 2 24.47131 10.80858 ## 7 2 14.81859 2.42889 ## 8 2 7.59973 16.73085 ## 9 2 20.31601 46.23162 ## 10 2 24.47131 10.80858 Ball Info This simply gets the ball information from the id303 dataframe. Notice this dataframe also includes ID and jersey variables (NA). This is so we can bind together with the player location dataframe later. ballposdf <- ball_position1(df=id303, eventid=303, gameclock=361.15) ballposdf ## ID X Y jersey ## 1 NA 7.80244 17.77459 <NA> Plotting Next these pieces of information are plotted using ggplot by overlaying it on a geom made by Ed Kupfer. fullcourt() + geom_point(data=playerdf,aes(x=X,y=Y,group=ID,color=factor(ID)),size=6) + #players geom_text(data=playerdf,aes(x=X,y=Y,group=ID,label=jersey),color='black') + #jersey number geom_polygon(data=chulldf,aes(x=X,y=Y,group=ID,fill=factor(ID)),alpha = 0.2) + #convex hull geom_point(data=ballposdf,aes(x=X,y=Y),color='darkorange',size=3) + #ball scale_color_manual(values=c("lightsteelblue2","orangered2")) + scale_fill_manual(values=c("lightsteelblue2","orangered2")) + theme(legend.position="none")