---
title: "Exploring NBA team salary cap room with nbastatR and heatmaply"
author: "Alex Bresler"
date: "2016-07-07"
output: 
  html_notebook: 
    css: semantic_css/semantic.min.css
    fig_height: 5
    fig_width: 12
    toc: no
---
<script src="semantic_css/semantic.min.js"></script>
<br>
<p>This [R Markdown Notebook](http://rmarkdown.rstudio.com/r_notebooks.html) is a quick tutorial on using my package, [nbastatR](https://github.com/abresler/nbastatR), together with [heatmaply](https://github.com/talgalili/heatmaply), an extension of [plotly](https://plot.ly/), to create an interactive heatmap exploring the future cap space of all 30 NBA teams.</p>

<h4>Step 1: Load the required packages</h4>

If you don't have the following packages installed, run the installation code in the comments. Even if you already have them, I recommend running that code to update to the most recent version of each package.

```{r load_packages, echo=T, message=FALSE, warning=FALSE, results='hide'}
lapply(
  c('heatmaply', # devtools::install_github('talgalili/heatmaply')
    'mclust', #install.packages('mclust')
    'dplyr', # devtools::install_github('hadley/dplyr')
    'plotly', # devtools::install_github('ropensci/plotly')
    'nbastatR', # devtools::install_github('abresler/nbastatR')
    'purrr', # devtools::install_github('hadley/purrr')
    'tidyr', # devtools::install_github('hadley/tidyr')
    'formattable' # install.packages('formattable'); used for the team table
  ),
  library,
  character.only = TRUE
)

options(digits = 4)

```
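If you would rather install only what is missing, a small base R helper can report which of the packages above are not yet available. This is a minimal sketch: `missing_packages()` is a hypothetical name, and the GitHub-only packages (such as nbastatR) still need the `devtools::install_github()` calls from the comments.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
# requireNamespace() returns FALSE instead of erroring when a package is absent
# (it loads, but does not attach, each package that is available).
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

needed <- c("heatmaply", "mclust", "dplyr", "plotly",
            "nbastatR", "purrr", "tidyr", "formattable")
missing_packages(needed)
```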

<h4>Step 2: Acquire the data</h4>

This is the most important step: bringing the data into R to explore. To do that, we use `nbastatR` and its function `get_teams_yahoo_team_salary_data`, which pulls team salary data from Yahoo's fantastic new team salary pages. As it runs, it prints a progress message for each of the 30 teams.

```{r get_data}
all_team_data <-
  get_teams_yahoo_team_salary_data(
    use_all_teams = TRUE,
    nest_data = FALSE
  )

```
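The rest of the notebook relies on five columns of `all_team_data`: `idSeason`, `slugTeamYahoo`, `nameItem`, `value`, and `urlTeamSalaryYahoo`. Because the data come from scraped web pages, it can be worth confirming those columns came back before moving on. A minimal sketch; the helper name and the two-row data frame are hypothetical, and the numbers are placeholders rather than real salaries:

```r
# Hypothetical helper: stop early if the scraped data lack a column used later
check_salary_columns <- function(df) {
  required <- c("idSeason", "slugTeamYahoo", "nameItem", "value", "urlTeamSalaryYahoo")
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Placeholder rows in the same long format (illustrative values only)
toy <- data.frame(
  idSeason = c("2016-17", "2016-17"),
  slugTeamYahoo = c("team_a", "team_a"),
  nameItem = c("amountCapSpace", "amountCapHold"),
  value = c(10e6, 2e6),
  urlTeamSalaryYahoo = "https://example.com/team_a",
  stringsAsFactors = FALSE
)
check_salary_columns(toy)

# In the notebook: check_salary_columns(all_team_data)
```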


<h4>Step 3: Munge and summarize the data</h4>

The next step involves a little data munging. First, we convert the `idSeason` variable into an ordered factor so that the seasons keep their chronological order.
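Season labels like `"2016-17"` sort correctly as text because each begins with its four-digit starting year, and `factor()` takes its levels from the sorted unique values by default. So `factor(ordered = TRUE)` puts them in chronological order. A base R illustration with made-up, scrambled labels:

```r
# Season labels in scrambled order, as they might arrive from a scrape
seasons <- c("2018-19", "2016-17", "2020-21", "2017-18", "2019-20")

# Ordered factor: levels are the sorted unique values, i.e. chronological here
season_factor <- factor(seasons, ordered = TRUE)

levels(season_factor)
# "2016-17" "2017-18" "2018-19" "2019-20" "2020-21"

# Ordered factors support comparisons that follow the level order
season_factor[1] > season_factor[2] # TRUE: 2018-19 comes after 2016-17
```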

Next, since we only want to explore future cap space, we keep just two items: projected cap space and cap holds (if you don't know what a cap hold is, I highly advise reading about them [here](http://www.cbafaq.com/salarycap.htm#Q14)).

Because we want to look at the data by team and season, we group by season, team, and item, sum the values, and convert them to millions of dollars. Then we reshape the data from [tidy](http://vita.had.co.nz/papers/tidy-data.html) long form to wide form, with one column per item.

Spreading the data creates NA values, since certain teams have no cap holds in the future; to fix that, we replace those NAs with zeros. We then create a new variable, `amountSpaceLessHold`, the sum of the available cap space and the projected cap holds. This is the number we will explore: nominally, the amount of money each team has available to sign players without going over the salary cap. Note that after 2016-17 the salary cap figure is a projection.
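The NA replacement matters because any arithmetic with `NA` in R returns `NA`: without it, a team with no future cap holds would show `NA` instead of its cap space. The base R sketch below mirrors the `replace_na()` and `mutate()` steps on two hypothetical teams with invented numbers (in millions). The holds are entered as negative amounts, which is an assumption suggested by the name `amountSpaceLessHold`, not something verified against the Yahoo data.

```r
# Wide data after spread(): one row per team-season, one column per item.
# Invented placeholder values in millions; holds ASSUMED to be stored as negatives.
wide <- data.frame(
  slugTeamYahoo  = c("team_a", "team_b"),
  amountCapSpace = c(20, 12),
  amountCapHold  = c(-4.5, NA) # team_b has no cap holds, so spread() left NA
)

# Without the fix, NA propagates through the addition
wide$amountCapHold + wide$amountCapSpace # 15.5 NA

# Base R equivalent of replace_na(list(amountCapHold = 0))
wide$amountCapHold[is.na(wide$amountCapHold)] <- 0

# Equivalent of mutate(amountSpaceLessHold = amountCapHold + amountCapSpace)
wide$amountSpaceLessHold <- wide$amountCapHold + wide$amountCapSpace
wide$amountSpaceLessHold # 15.5 12.0
```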

A little more tidying then gets the data frame into the shape we want: one row per team and one column per season.

Finally, we use the team slug as the row names and drop the slug column, leaving only the projected cap space for each season to visualize.

```{r}
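team_cap_space <-
  all_team_data %>%
  # ordered factor keeps the seasons in chronological order
  mutate(idSeason = idSeason %>% factor(ordered = TRUE)) %>%
  # keep only projected cap space and cap holds
  dplyr::filter(nameItem %in% c('amountCapSpace', 'amountCapHold')) %>%
  dplyr::select(idSeason, slugTeamYahoo, nameItem, value) %>%
  # total each item per season and team, in millions
  group_by(idSeason, slugTeamYahoo, nameItem) %>%
  summarise(value = sum(value, na.rm = TRUE) / 1000000) %>%
  ungroup() %>%
  # long to wide: one column per item
  spread(nameItem, value) %>%
  # teams with no future cap holds get NA after spreading; treat as zero
  replace_na(list(amountCapHold = 0)) %>%
  mutate(amountSpaceLessHold = amountCapHold + amountCapSpace) %>%
  dplyr::select(idSeason, slugTeamYahoo, amountSpaceLessHold) %>%
  # one row per team, one column per season
  spread(idSeason, amountSpaceLessHold)

# Setting row names on a tibble is deprecated (and they are dropped by later
# dplyr verbs), so convert to a plain data.frame first; heatmaply uses the
# row names as the team labels
plot_data <- as.data.frame(team_cap_space)
row.names(plot_data) <- plot_data$slugTeamYahoo
plot_data$slugTeamYahoo <- NULL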
```
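For readers who prefer base R, the filter, summarise, spread, `replace_na()`, and addition steps above can be sketched as a single per-team, per-season sum with `tapply()`, since adding the two item totals is the same as summing all rows of both items. This is only equivalent when every team-season has an `amountCapSpace` row: the dplyr pipeline returns `NA` when cap space is missing, while this sketch would return the holds alone. The function name and toy rows are hypothetical, and the negative hold again reflects an unverified assumption about sign.

```r
# Hypothetical base R version of the Step 3 reshape: returns a numeric matrix
# with one row per team and one column per season, in millions.
cap_space_matrix <- function(df) {
  df <- df[df$nameItem %in% c("amountCapSpace", "amountCapHold"), ]
  seasons <- factor(df$idSeason, ordered = TRUE) # chronological columns
  # Summing both items together equals space + holds (absent holds add nothing)
  m <- tapply(df$value, list(df$slugTeamYahoo, seasons), sum, na.rm = TRUE)
  m / 1e6
}

# Invented placeholder rows in the notebook's long format
toy <- data.frame(
  idSeason      = c("2017-18", "2016-17", "2016-17", "2017-18", "2016-17"),
  slugTeamYahoo = c("team_a", "team_a", "team_a", "team_b", "team_b"),
  nameItem      = c("amountCapSpace", "amountCapSpace", "amountCapHold",
                    "amountCapSpace", "amountCapSpace"),
  value         = c(30e6, 20e6, -5e6, 8e6, 12e6)
)

cap_space_matrix(toy)
# team_a: 15 in 2016-17, 30 in 2017-18; team_b: 12 in 2016-17, 8 in 2017-18
```

Because the result is a matrix whose row names are the team slugs, it can be passed to `Mclust()` and `heatmaply()` directly, without the tibble row-name issue.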

#### Team cap space by season, less cap holds

These data rest on assumptions that are unlikely to hold in reality: that players do not exercise their [Early Termination Options](http://www.cbafaq.com/salarycap.htm#Q59), and that all [Team Options](http://www.cbafaq.com/salarycap.htm#Q59) and [Player Options](http://www.cbafaq.com/salarycap.htm#Q59) are exercised. Put another way, they assume every player finishes out his contract to its full term. While this is an unrealistic assumption, it also means that teams will in all likelihood have more cap space than shown here, since some Player/Team Options will be declined and some Early Termination Options exercised.

<br>
```{r results = 'asis', echo=F}
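p2 <- as.data.frame(plot_data)

# Look up each team's Yahoo salary URL by slug so the links stay aligned
# with the rows of team_cap_space regardless of sort order
team_urls <-
  all_team_data %>%
  dplyr::distinct(slugTeamYahoo, urlTeamSalaryYahoo)

rownames(p2) <-
  paste0(
    "<a href='",
    team_urls$urlTeamSalaryYahoo[match(team_cap_space$slugTeamYahoo, team_urls$slugTeamYahoo)],
    "' target='_blank'>",
    team_cap_space$slugTeamYahoo,
    "</a>"
  )
formattable::formattable(p2)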
```

<h4>Step 4: Calculate the optimal number of clusters</h4>

This is an important step. There are many ways to choose the number of clusters in a data set. I personally prefer the [mclust](http://www.stat.washington.edu/mclust/) package, which we use here: `Mclust()` fits Gaussian mixture models with different numbers of components and keeps the one with the best BIC. When this notebook was originally run, it selected 6 clusters. For those interested in other ways to choose the number of clusters, [this stackoverflow post](http://stackoverflow.com/questions/15376075/cluster-analysis-in-r-determine-the-optimal-number-of-clusters) is an absolute **MUST** read.

```{r team_clusters, include=TRUE}
team_clusters <-
  Mclust(plot_data)$G # number of mixture components selected by BIC

team_clusters
```
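Note that `team_clusters` is only a count: we hand it to heatmaply as `k_row`, which colors the row dendrogram's branches into that many groups. Assuming heatmaply's default `dist()` and `hclust()` settings, the team groupings in the unscaled plot correspond to cutting a hierarchical clustering of the rows, as in this base R sketch. The random matrix and `k <- 6` are stand-ins, not the notebook's data.

```r
# Sketch: how k_row groups the rows, assuming heatmaply's default dist()/hclust()
set.seed(42)

# Random placeholder matrix standing in for plot_data: 30 "teams" x 5 "seasons"
x <- matrix(
  rnorm(30 * 5, mean = 20, sd = 10),
  nrow = 30,
  dimnames = list(paste0("team_", 1:30), paste0("season_", 1:5))
)
k <- 6 # stand-in for team_clusters

# Euclidean distances between rows, then complete-linkage hierarchical clustering
row_tree <- hclust(dist(x))

# Cut the tree into k groups: one integer group label per row, named by team
groups <- cutree(row_tree, k = k)
table(groups) # number of teams in each group
```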

<h4>Step 5: Plot the clustered heat maps</h4>

The last step is to plot the heat maps. We make two: one using the nominal amount of projected cap space, and one using projected cap space centered and scaled within each season (mean zero). In both, `k_row = team_clusters` colors the team dendrogram into the number of groups found in Step 4, and `Colv = FALSE` keeps the seasons in chronological order instead of reordering the columns. The plots use the [Myriad Pro](http://fontsup.com/font/myriad-pro-semibold-condensed.html) font; if you don't have it installed, either install it or remove the font setting from the code.
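For the scaled plot, column scaling standardizes each season separately. Assuming heatmaply follows the usual convention (subtract each column's mean, then divide by its standard deviation, as base R's `scale()` does), the transformation looks like this on a small made-up matrix:

```r
# Made-up cap space (millions): 3 hypothetical teams x 2 seasons
x <- matrix(
  c(10, 20, 30, 0, 5, 40),
  nrow = 3,
  dimnames = list(c("team_a", "team_b", "team_c"), c("2016-17", "2017-18"))
)

# Center each column at its mean, then divide by its standard deviation
centered <- sweep(x, 2, colMeans(x))
scaled <- sweep(centered, 2, apply(centered, 2, sd), "/")

round(colMeans(scaled), 10) # 0 0: every season now has mean zero
apply(scaled, 2, sd)        # 1 1: and unit standard deviation
scaled[, "2016-17"]         # team_a -1, team_b 0, team_c 1
```

Because each season is standardized on its own, the scaled heat map shows how a team's cap space compares with the rest of the league in that season, rather than its dollar amount.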

<h5>Plot 1: Unscaled heatmap</h5>

```{r heatmap_salaries, include=TRUE}
unscaled <-
  plot_data %>%
  heatmaply(
    column_text_angle = 0,
    k_row = team_clusters,
    Colv = FALSE,
    Rowv = T
  ) %>%
  layout(
    margin = list(l = 130, b = 40),
    font = list(
      family = "MyriadPro-Cond", # Requires Myriad Pro font to be installed on your computer
      size = 12,
      color = "#7f7f7f"
    ),
    title = "NBA Cap Space by Team and Season, Less Cap Holds (millions)"
  )

```

```{r unscaled_plot, results = 'asis', echo=F, fig.align = 'center', fig.width=12}
unscaled
```


<h5>Plot 2: Scaled heatmap</h5>

```{r heatmap_salaries_scaled, include=TRUE}
scaled <-
  plot_data %>%
  heatmaply(
    column_text_angle = 0,
    k_row = team_clusters,
    Colv = FALSE,
    Rowv = T,
    scale = 'column' # center and scale each season (column)
  ) %>%
  layout(
    margin = list(l = 130, b = 40),
    font = list(
      family = "MyriadPro-Cond",
      size = 12,
      color = "#7f7f7f"
    ),
    title = "NBA Cap Space by Team and Season, Scaled Mean 0"
  )

```

```{r scaled_plot, results = 'asis', echo=F, fig.align = 'center', fig.width=12}
scaled
```

<h4>Conclusion</h4>

Now you have explored the landscape of projected NBA cap space through the 2020-21 season in only a few lines of code, all thanks to R and its community of package developers! Until next time, keep learning and exploring, preferably in R.
