A while ago I found this fantastic post about NBA shot charts built in Python. Since my Python skills are quite basic, I decided to reproduce such charts in R using data scraped from the internet and ggplot2.

Getting the Data

First we need the shot data from stats.nba.com. This blog post from Greg Reda does a great job explaining how to find the underlying API and extract data from a web app (in this case, stats.nba.com).

To get shot data for Stephen Curry we will use this URL. The URL shows the shots taken by Curry during the 2014-15 regular season in a JSON structure. Note also that Season, SeasonType and PlayerID are parameters in the URL. Stephen Curry’s PlayerID is 201939.

Time to get this data into R. For that I use the package rjson and replace the PlayerID parameter in the URL with the R object playerID.

UPDATE: the NBA stats website has changed the JSON structure of its shot detail data. In this code, I added the new argument PlayerPosition and it should work just fine.

library(rjson)

# shot data for Stephen Curry
playerID <- 201939
shotURL <- paste("http://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS=2014-15&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerID=", playerID, "&PlayerPosition=&PlusMinus=N&Position=&Rank=N&RookieYear=&Season=2014-15&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision=&mode=Advanced&showDetails=0&showShots=1&showZones=0", sep = "")

# import from JSON
shotData <- fromJSON(file = shotURL, method = "C")
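Since Season, SeasonType and PlayerID are ordinary query parameters, the same URL can also be assembled from a named vector, which makes it easier to switch player or season. This is only a sketch: the helper name buildShotURL is hypothetical (it is not part of rjson or of the NBA stats site), and it assumes that CFPARAMS simply mirrors the season, as it does in the URL above.

```r
# Hypothetical helper: rebuilds the shot chart URL from named parameters.
# Parameter names and default values are copied from the URL used above.
buildShotURL <- function(playerID, season = "2014-15",
                         seasonType = "Regular+Season") {
  params <- c(
    CFID = "33", CFPARAMS = season,  # assumption: CFPARAMS follows the season
    ContextFilter = "", ContextMeasure = "FGA", DateFrom = "", DateTo = "",
    GameID = "", GameSegment = "", LastNGames = "0", LeagueID = "00",
    Location = "", MeasureType = "Base", Month = "0", OpponentTeamID = "0",
    Outcome = "", PaceAdjust = "N", PerMode = "PerGame", Period = "0",
    PlayerID = as.character(playerID), PlayerPosition = "", PlusMinus = "N",
    Position = "", Rank = "N", RookieYear = "", Season = season,
    SeasonSegment = "", SeasonType = seasonType, TeamID = "0",
    VsConference = "", VsDivision = "", mode = "Advanced",
    showDetails = "0", showShots = "1", showZones = "0"
  )
  # join each name=value pair with "&" to form the query string
  query <- paste(names(params), params, sep = "=", collapse = "&")
  paste0("http://stats.nba.com/stats/shotchartdetail?", query)
}

curryURL <- buildShotURL(201939)
print(curryURL)
```

With the default arguments, the result should match the hand-written shotURL for Stephen Curry, so it could be passed to fromJSON in the same way.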

Now we have the JSON data as an R list object with 3 elements. The important element for the chart is resultSets, which contains the coordinates of each shot, shot type, range, made/missed flag and more. But first the data needs to be unlisted and saved as a data frame.

# unlist shot data, save into a data frame
shotDataf <- data.frame(matrix(unlist(shotData$resultSets[[1]][[3]]), ncol = 24, byrow = TRUE))

# shot data headers
colnames(shotDataf) <- shotData$resultSets[[1]][[2]]

# convert x and y coordinates into numeric
shotDataf$LOC_X <- as.numeric(as.character(shotDataf$LOC_X))
shotDataf$LOC_Y <- as.numeric(as.character(shotDataf$LOC_Y))
shotDataf$SHOT_DISTANCE <- as.numeric(as.character(shotDataf$SHOT_DISTANCE))

# have a look at the data
View(shotDataf)
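To see what the unlist and matrix step does without calling the website, here is a minimal sketch on a mock object. The mock only imitates the shape I am assuming for shotData$resultSets[[1]] (headers in the second element, one list per shot in the third); the element names and the values are made up for illustration.

```r
# Mock object with the assumed shape of shotData$resultSets[[1]]:
# element 2 holds the column headers, element 3 holds one list per shot.
mockResult <- list(
  name = "Shot_Chart_Detail",
  headers = c("PLAYER_NAME", "LOC_X", "LOC_Y", "SHOT_MADE_FLAG"),
  rowSet = list(
    list("Stephen Curry", -120, 85, 1),
    list("Stephen Curry", 230, 12, 0),
    list("Stephen Curry", 5, 260, 1)
  )
)

headers <- mockResult[[2]]

# unlist() flattens all shots into one character vector;
# matrix(..., byrow = TRUE) folds it back into one row per shot
shots <- data.frame(
  matrix(unlist(mockResult[[3]]), ncol = length(headers), byrow = TRUE),
  stringsAsFactors = FALSE
)
colnames(shots) <- headers

# every column is character at this point, so convert the coordinates
shots$LOC_X <- as.numeric(shots$LOC_X)
shots$LOC_Y <- as.numeric(shots$LOC_Y)

str(shots)
```

Taking ncol from the number of headers avoids hard-coding the column count. With stringsAsFactors = FALSE the columns are plain character vectors, so the as.character step is not needed before as.numeric.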

Basic Chart

We can now produce a basic plot using ggplot2.

library(ggplot2)

# simple plot using EVENT_TYPE to colour the dots
ggplot(shotDataf, aes(x = LOC_X, y = LOC_Y)) +
  geom_point(aes(colour = EVENT_TYPE))

This plot surely looks familiar. But it can be improved by overlaying a basketball half court and fixing the aspect ratio of our court/plot. To solve the basketball court problem I simply googled “NBA half court” and found this. (EDIT: the jpg court file is no longer there. Instead, use this)

Shot Charts

Let's plot the data again, but this time using the image overlay. For that I will use the packages grid and jpeg. The image is overlaid using the ggplot2 function annotation_custom. For the axis limits I use -250 to 250 on the x-axis and -50 to 420 on the y-axis (I found these to be a good fit after some trial and error). These dimensions also match the exact length-to-width ratio of an official NBA half court, but they might differ if you use a different half court image.

library(grid)
library(jpeg)
library(RCurl)  # provides getURLContent

# half court image
courtImg.URL <- "https://thedatagame.files.wordpress.com/2016/03/nba_court.jpg"
court <- rasterGrob(readJPEG(getURLContent(courtImg.URL)),
                    width = unit(1, "npc"), height = unit(1, "npc"))

# plot using NBA court background and colour by shot zone
ggplot(shotDataf, aes(x = LOC_X, y = LOC_Y)) +
  annotation_custom(court, -250, 250, -50, 420) +
  geom_point(aes(colour = SHOT_ZONE_BASIC, shape = EVENT_TYPE)) +
  xlim(-250, 250) +
  ylim(-50, 420)

There are a few things to note here. First, you may see a warning that reads “Removed 7 rows containing missing values (geom_point)”. In this case, Stephen Curry attempted 7 backcourt shots during the final seconds of a quarter. I am not interested in these shots, and because of my y-axis limits they are not displayed. Secondly, note how shots labeled as “Left Corner 3” in green are actually located on the right side of the court. I will solve this problem by flipping the x-axis from left to right. One more thing: the coordinates are not fixed, so the plot becomes distorted as we resize it. This can be solved by using the coord_fixed function.
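If you want to check which shots the axis limits will drop before plotting, a quick count does the job. The sketch below uses a small mock data frame with made-up coordinates; with the real data you would use shotDataf instead.

```r
# Mock shot locations (made-up values); the last shot is a backcourt heave
mockShots <- data.frame(
  LOC_X = c(-220, 0, 150, 10),
  LOC_Y = c(30, 250, 90, 480)
)

# the axis limits used for the court image
xLimits <- c(-250, 250)
yLimits <- c(-50, 420)

# TRUE for every shot that falls outside the plotting area
outside <- mockShots$LOC_X < xLimits[1] | mockShots$LOC_X > xLimits[2] |
  mockShots$LOC_Y < yLimits[1] | mockShots$LOC_Y > yLimits[2]

sum(outside)          # number of shots that would be removed
mockShots[outside, ]  # the shots themselves
```

On the real data, sum(outside) should agree with the number of rows reported in the ggplot2 warning.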

# plot using ggplot and NBA court background image
ggplot(shotDataf, aes(x = LOC_X, y = LOC_Y)) +
  annotation_custom(court, -250, 250, -50, 420) +
  geom_point(aes(colour = SHOT_ZONE_BASIC, shape = EVENT_TYPE)) +
  xlim(250, -250) +
  ylim(-50, 420) +
  geom_rug(alpha = 0.2) +
  coord_fixed() +
  ggtitle(paste("Shot Chart\n", unique(shotDataf$PLAYER_NAME), sep = "")) +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 15, lineheight = 0.9, face = "bold"))

This is a much improved shot chart. The x-axis is now flipped, so right corner shots appear on the right of the court and left corner shots appear on the left. Coordinates have been fixed, meaning that no matter how the chart is resized, the court maintains its true aspect ratio. The axis and legend titles have disappeared and a title containing the name of the player has been added. One cool feature of this plot, both aesthetic and informative, is the rug on each axis created by geom_rug. It works as a density plot and a guide to the “hot zones” of each player.

Adding Player Picture

It is also possible to scrape player pictures as pointed out by Savvas Tjortjoglou in his post. Stephen Curry’s picture can be found at http://stats.nba.com/media/players/132x132/201939.png where 201939 is Curry’s PlayerID. I will also make a few changes to the geom_point settings.

library(grid)
library(gridExtra)
library(png)
library(RCurl)

# scrape player photo and save as a raster object
playerImg.URL <- paste("http://stats.nba.com/media/players/132x132/", playerID, ".png", sep = "")
playerImg <- rasterGrob(readPNG(getURLContent(playerImg.URL)),
                        width = unit(0.15, "npc"), height = unit(0.15, "npc"))

# plot using ggplot and NBA court background
ggplot(shotDataf, aes(x = LOC_X, y = LOC_Y)) +
  annotation_custom(court, -250, 250, -52, 418) +
  geom_point(aes(colour = EVENT_TYPE, alpha = 0.8), size = 3) +
  scale_color_manual(values = c("#008000", "#FF6347")) +
  guides(alpha = "none", size = "none") +  # hide the alpha and size legends
  xlim(250, -250) +
  ylim(-52, 418) +
  geom_rug(alpha = 0.2) +
  coord_fixed() +
  ggtitle(paste("Shot Chart\n", unique(shotDataf$PLAYER_NAME), sep = "")) +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))

# add player photo and footnote to the plot
pushViewport(viewport(x = unit(0.9, "npc"), y = unit(0.8, "npc")))
grid.draw(playerImg)  # draws on the current page, no print() needed
grid.text(label = "thedatagame.com.au", just = "centre", vjust = 50)

This time I highlighted shots made in green and shots missed in red. I also added transparency to each point by using alpha = 0.8. The player photo and the footnote were added using functions from the package grid.

Hexbin Shot Charts

Another cool way to display data with ggplot2 is to use hexagonal bins instead of individual points. You will need to install and load the package hexbin and use the function stat_binhex (which replaces geom_point and its components).

library(hexbin)

# plot shots using ggplot, hex bins, NBA court background image
ggplot(shotDataf, aes(x = LOC_X, y = LOC_Y)) +
  annotation_custom(court, -250, 250, -52, 418) +
  stat_binhex(bins = 25, colour = "gray", alpha = 0.7) +
  scale_fill_gradientn(colours = c("yellow", "orange", "red")) +
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +
  ylim(-52, 418) +
  geom_rug(alpha = 0.2) +
  coord_fixed() +
  ggtitle(paste("Shot Chart\n", unique(shotDataf$PLAYER_NAME), sep = "")) +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))

# add player photo and footnote to the plot
pushViewport(viewport(x = unit(0.9, "npc"), y = unit(0.8, "npc")))
grid.draw(playerImg)  # draws on the current page, no print() needed
grid.text(label = "thedatagame.com.au", just = "centre", vjust = 50)

We know that Stephen Curry is an excellent 3-point shooter. In fact, 639 of his 1,341 shots were taken from beyond the 3-point line (left, right and centre). But this chart also reveals how active he is under the rim: 284 shots were attempted deep inside the paint, most of them driving lay-ups originating from Curry’s lightning-fast transitions from defence all the way to the basket.
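Counts like these can be read straight from the SHOT_ZONE_BASIC column. Here is a minimal sketch on mock data. The zone labels other than “Left Corner 3”, “Above the Break 3” and “Backcourt” are my assumption of what the column contains, and the counts are made up, so they will not reproduce Curry's numbers.

```r
# Mock zone labels (made-up sample; some label names are assumptions)
mockZones <- data.frame(
  SHOT_ZONE_BASIC = c("Above the Break 3", "Left Corner 3", "Restricted Area",
                      "Above the Break 3", "Right Corner 3", "Mid-Range",
                      "Restricted Area", "Backcourt"),
  stringsAsFactors = FALSE
)

# shots per zone
zoneCounts <- table(mockZones$SHOT_ZONE_BASIC)
print(zoneCounts)

# zones whose label ends in "3" are the 3-point zones
isThree <- grepl("3$", names(zoneCounts))
threes <- sum(zoneCounts[isThree])
total <- sum(zoneCounts)

cat(threes, "of", total, "shots came from 3-point zones\n")
```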

Accuracy Charts

Now I will have a look at shot accuracy for each of the 6 zones in the data (excluding backcourt shots). After excluding these shots, the data is summarised by shot zone using ddply. X and Y locations are averaged, shots made are summed up and attempted shots are counted. I also create a column for accuracy labels. Again, I use ggplot along with geom_point for the point locations and geom_text for the label locations.
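Before running this on the real data, the summary logic can be tried on a tiny example. The sketch below uses base R (split and lapply) rather than ddply, purely so that it runs without plyr, and the mock shots are made up.

```r
# Mock shots (made-up values) with the columns used in the summary
mockShots <- data.frame(
  SHOT_ZONE_BASIC = c("Mid-Range", "Mid-Range", "Left Corner 3",
                      "Left Corner 3", "Left Corner 3", "Backcourt"),
  SHOT_MADE_FLAG = c(1, 0, 1, 1, 0, 0),
  LOC_X = c(100, 140, 230, 226, 228, 0),
  LOC_Y = c(120, 80, 10, 20, 30, 500),
  stringsAsFactors = FALSE
)

# exclude backcourt shots
noBackcourt <- mockShots[mockShots$SHOT_ZONE_BASIC != "Backcourt", ]

# one data frame per zone, then one summary row per zone
zones <- split(noBackcourt, noBackcourt$SHOT_ZONE_BASIC)
summaryS <- do.call(rbind, lapply(names(zones), function(z) {
  d <- zones[[z]]
  data.frame(SHOT_ZONE_BASIC = z,
             SHOTS_ATTEMPTED = nrow(d),
             SHOTS_MADE = sum(d$SHOT_MADE_FLAG),
             MLOC_X = mean(d$LOC_X),
             MLOC_Y = mean(d$LOC_Y),
             stringsAsFactors = FALSE)
}))

# accuracy per zone and its percentage label
summaryS$SHOT_ACCURACY <- summaryS$SHOTS_MADE / summaryS$SHOTS_ATTEMPTED
summaryS$SHOT_ACCURACY_LAB <- paste0(round(100 * summaryS$SHOT_ACCURACY, 1), "%")

print(summaryS)
```

The resulting columns have the same names as the ones produced by the ddply call, so the plotting code would work on either.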

# exclude backcourt shots
shotDataS <- shotDataf[which(shotDataf$SHOT_ZONE_BASIC != "Backcourt"), ]

# summarise shot data
library(plyr)
shotS <- ddply(shotDataS, .(SHOT_ZONE_BASIC), summarize,
               SHOTS_ATTEMPTED = length(SHOT_MADE_FLAG),
               SHOTS_MADE = sum(as.numeric(as.character(SHOT_MADE_FLAG))),
               MLOC_X = mean(LOC_X),
               MLOC_Y = mean(LOC_Y))

# calculate shot zone accuracy and add zone accuracy labels
shotS$SHOT_ACCURACY <- (shotS$SHOTS_MADE / shotS$SHOTS_ATTEMPTED)
shotS$SHOT_ACCURACY_LAB <- paste(as.character(round(100 * shotS$SHOT_ACCURACY, 1)), "%", sep = "")

# plot shot accuracy per zone
ggplot(shotS, aes(x = MLOC_X, y = MLOC_Y)) +
  annotation_custom(court, -250, 250, -52, 418) +
  geom_point(aes(colour = SHOT_ZONE_BASIC, size = SHOT_ACCURACY, alpha = 0.8), size = 8) +
  geom_text(aes(colour = SHOT_ZONE_BASIC, label = SHOT_ACCURACY_LAB), vjust = -1.2, size = 8) +
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +
  ylim(-52, 418) +
  coord_fixed() +
  ggtitle(paste("Shot Accuracy\n", unique(shotDataf$PLAYER_NAME), sep = "")) +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))

# add player photo and footnote to the plot
pushViewport(viewport(x = unit(0.9, "npc"), y = unit(0.8, "npc")))
grid.draw(playerImg)  # draws on the current page, no print() needed
grid.text(label = "thedatagame.com.au", just = "centre", vjust = 50)

Note how the “Above the Break 3” point is located inside the 3-point line area. This is because the 3-point shots attempted from the wings, close to the corners, pull the average y location down towards the basket. You can adjust the y-axis position by adding, let's say, 20 to shotS$MLOC_Y for “Above the Break 3”. But I will leave it as is.
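For anyone who does want to move the label, the adjustment is a one-liner. A minimal sketch on a mock summary table with made-up average locations (with the real data you would apply it to shotS):

```r
# Mock summary table (made-up average locations)
mockS <- data.frame(
  SHOT_ZONE_BASIC = c("Above the Break 3", "Restricted Area"),
  MLOC_Y = c(210, 15),
  stringsAsFactors = FALSE
)

# shift only the "Above the Break 3" point up by 20 units
isBreak3 <- mockS$SHOT_ZONE_BASIC == "Above the Break 3"
mockS$MLOC_Y[isBreak3] <- mockS$MLOC_Y[isBreak3] + 20

print(mockS)
```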

Now, the same accuracy chart for James Harden from the Houston Rockets.

Curry isn’t the 2014-15 MVP by chance. He made 48.7% of field goals attempted during the regular season. From the left 3-point corner, he converted 63.2% of shots attempted (almost 2 in every 3 attempts). Under the rim Curry is very effective with 66.5% accuracy when going for those quick lay-ups and finger rolls.

James Harden, the other MVP contender, is also a great 3-point shooter, but not as accurate as Curry. Harden is slightly better from the right 3-point corner but Curry is better from every other zone in the court.

You can find the code on my GitHub page. I also uploaded a list of 490 player IDs and the names of players who have shot location data available in the NBA stats web app. All you need to do is replace the object playerID with the ID of the player you would like to plot.