6 Degrees of separation in the NBA: why suspending it was inevitable

Visualizing the NBA’s chain of travel connections with Python (with data & code)

Visualising the network of connections in the NBA

As of the 11th of March, the National Basketball Association has suspended its season. It’s another significant development in a 2020 that has been dominated by COVID-19 related events, which have brought entire cities to a complete standstill.

(Well, probably not a complete standstill. The human spirit is an amazing thing — take a look at this neighbourhood in Sicily.)

(That was not related at all; I just wanted to add something fun and bright during these terrible times.)

As much as the good folks in Sicily are making the most of the cards that they’ve been dealt, the reality is tough. We are going through an era where reducing the risk of contagion is paramount, and mechanisms such as social isolation and self-quarantine are being deployed to this end.

Although it is saddening to see millions of people’s favourite pastimes like the NBA be postponed or delayed, it was probably also inevitable in this climate.

The fact is that few arguments exist for anything other than the suspension of sporting events, especially one with a network as wide as the NBA’s. Doubly so in this case, as a player has tested positive for the virus.

The NBA season involves an obscene amount of travel, almost 40,000 miles per team per season on average, and it doesn’t take long for one team to eventually be connected to all other teams in the league.

NBA teams’ annual distances travelled (from this article that I previously wrote)

So, I would like to use this article to analyse and visually demonstrate just how interconnected the NBA really is, given their busy travel and playing schedule.

As usual, I am going to use Python for the analysis, and Plotly for the visualisation. I hope this is something of a useful, informative and interesting respite in these challenging times.

Preparation

Data

I include the code and data in my GitLab repo here (sixdegs_leagues directory), so please feel free to play with it or improve upon it.

Packages

I assume you’re familiar with Python. Even if you’re relatively new, this tutorial shouldn’t be too tricky, though.

You’ll need pandas and plotly. Install each (in your virtual environment) with a simple pip install [PACKAGE_NAME].

Getting started

The goal of this analysis is to identify how long it would take for each team to be connected to a ‘chain of contacted’ teams that starts with a source team on a particular date.

That is, we connect a team through a team that they’ve played, who has played another team, and so on, until we get to the source team. Essentially we’ll be playing six degrees of separation / Kevin Bacon, but with NBA teams, and visualising our results.

Load data

Load the schedule data csv into Python / pandas with

import pandas as pd

nba_sch_data = pd.read_csv('srcdata/2020_nba_schedule_fulldata.csv', index_col=0)

The data contains an ‘est_time’ column holding the game date (and tipoff time) in US Eastern time, in string format. This isn’t convenient to operate on, so let’s create a new ‘datetime’ column from it with the pd.to_datetime function.

nba_sch_data = nba_sch_data.assign(datetime=pd.to_datetime(nba_sch_data.est_time))

Load our dictionary of team colours, and make a list of teams with:

import pickle

with open('srcdata/teamcolor_dict.pickle', 'rb') as f:
    teamcolor_dict = pickle.load(f)

# Upper-case the team names and swap spaces for underscores
teamcolor_dict = {k.replace(' ', '_').upper(): v for k, v in teamcolor_dict.items()}
teams_list = nba_sch_data.home_team.unique()

Pick a ‘source’ team / date

Let’s pick a start date and a source team. We can pick the team randomly; for the date, let’s use March 10, 2020, which is the day before the league shutdown.

import random

# Pick a random/particular "seed" team
seed_date = '2020-03-10'  # YYYY-MM-DD
seed_tm = random.choice(teams_list)

In my case, seed_tm turns out to be the Kings (might be the only lottery they’ve won in a while).

Create a new DataFrame

To start with, we will create a new Pandas DataFrame capturing each team’s data related to how they fit into the chain of connections from the source team. The DataFrame would include:

team name (team)

whether they are connected on the chain of contacted teams (contacted)

date first connected on to the chain (date)

team that connected them to the chain (con_from), and

number of degrees of separation (deg_sep).

teams_contacted_list = [{'team': tm, 'contacted': False, 'date': None, 'con_from': None, 'deg_sep': None} for tm in teams_list]

teams_contacted_df = pd.DataFrame(teams_contacted_list)

Calculate degrees of separation

Broadly, our algorithm is going to:

loop through the schedule data from on or after the seed_date,

evaluate if a game involves one team on the chain and one off the contact chain,

change the ‘off the chain’ team record to be ‘on’ the chain, and

record the degrees of separation.

To start coding that up, we set the initial row of data for the source team:
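The snippet for this step is not reproduced in this text, so here is a minimal sketch of one way to do it rather than the exact original. It assumes the teams_contacted_df layout described above; the three-team list and the hard-coded seed_tm are stand-ins for illustration, and giving the source team zero degrees of separation is an assumption.

```python
import pandas as pd

# Stand-in values for illustration; in the article these come from the schedule data
teams_list = ['SACRAMENTO_KINGS', 'LOS_ANGELES_LAKERS', 'MIAMI_HEAT']
seed_date = '2020-03-10'  # YYYY-MM-DD
seed_tm = 'SACRAMENTO_KINGS'

teams_contacted_df = pd.DataFrame([
    {'team': tm, 'contacted': False, 'date': None, 'con_from': None, 'deg_sep': None}
    for tm in teams_list
])

# Mark the source team as already on the chain (assumed: zero degrees of separation)
is_seed = teams_contacted_df.team == seed_tm
teams_contacted_df.loc[is_seed, 'contacted'] = True
teams_contacted_df.loc[is_seed, 'date'] = pd.Timestamp(seed_date)
teams_contacted_df.loc[is_seed, 'deg_sep'] = 0
# con_from stays None here: nobody connected the source team to the chain
```

Every other team keeps its default values until a game links it to the chain.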

And then, we can iterate through the schedule data to find games when a team is being added to the ‘chain’ of contacts. Take a look at my implementation here:

What I’m doing is to look at statuses of each team in a game, and firstly whether there is a mismatch — if both teams are on the chain, or off the chain, no action is needed.

Then, I update the contacted status, and then for the team that’s not on the chain, the data is updated. The date contacted is based on the date of the game, and the degrees of separation is one added to the opposing team’s.
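The embedded implementation is not reproduced in this text, so the following is a minimal sketch of the logic just described, not the exact original. It runs on a toy four-team schedule, keeps the team records in a dict before converting them to a DataFrame, and assumes the schedule has an 'away_team' column alongside 'home_team' and 'datetime' (that column name is a guess).

```python
import pandas as pd

# Toy schedule for illustration only; 'away_team' is an assumed column name
nba_sch_data = pd.DataFrame({
    'datetime': pd.to_datetime([
        '2020-03-09 19:00', '2020-03-10 19:00', '2020-03-11 19:00',
        '2020-03-12 19:00', '2020-03-13 19:00',
    ]),
    'home_team': ['B', 'A', 'C', 'A', 'D'],
    'away_team': ['C', 'B', 'B', 'C', 'C'],
})
seed_date, seed_tm = '2020-03-10', 'A'

# One record per team, keyed by name; the seed starts on the chain at zero degrees
chain = {
    tm: {'team': tm, 'contacted': False, 'date': None, 'con_from': None, 'deg_sep': None}
    for tm in ['A', 'B', 'C', 'D']
}
chain[seed_tm].update(contacted=True, date=pd.Timestamp(seed_date), deg_sep=0)

# Walk through games on or after the seed date, in chronological order
on_or_after = nba_sch_data.datetime >= pd.Timestamp(seed_date)
games = nba_sch_data[on_or_after].sort_values('datetime')
for game in games.itertuples():
    home, away = chain[game.home_team], chain[game.away_team]
    if home['contacted'] == away['contacted']:
        continue  # both on the chain, or both off it: nothing to update
    on_chain, off_chain = (home, away) if home['contacted'] else (away, home)
    off_chain.update(
        contacted=True,
        date=game.datetime.normalize(),  # date of the game, without the tipoff time
        con_from=on_chain['team'],
        deg_sep=on_chain['deg_sep'] + 1,  # one more than the opposing team's
    )

teams_contacted_df = pd.DataFrame(list(chain.values()))
```

In this toy run, team B joins the chain through A, C through B, and D through C, while the game played before the seed date is ignored.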

Inspecting the results:

(I went through that relatively quickly — let me know if you have any questions). The results look pretty sensible. Let’s get to the visualisations!

Visualising degrees of separation

Histograms

First of all, we’ll plot a histogram to see how many degrees of separation teams have from our random source team:

import plotly.express as px

fig = px.histogram(teams_contacted_df, x='deg_sep', template='plotly_white')
fig.update_layout(bargap=0.5)
fig.show()

Histogram of degrees of separation

Interestingly, most of them (19 out of 29) are only 3 or fewer degrees of separation away! Wow!

Number of days’ separation

Similarly, we can take a look at the number of days taken for teams to come into contact.

Here I am creating a pandas Series called days, which holds each team’s number of days of separation from the origin of the chain.

Then, I call a horizontal bar graph with Plotly Express. It should be relatively straightforward. (Plotly.py bar chart documentation if you are not familiar.)

import numpy as np

# Days elapsed between the start of the chain and each team's first contact
days = (teams_contacted_df.date - min(teams_contacted_df.date)) / np.timedelta64(1, 'D')
teams_contacted_df = teams_contacted_df.assign(days=days)

fig = px.bar(
    teams_contacted_df, x='days', y='team',
    orientation='h', template='plotly_white',
    labels={'team': 'Team', 'days': 'Days until contacted.'}
)
fig.update_layout(bargap=0.2)
fig.show()

Days of separation for each team from the (randomly chosen) Sacramento Kings, from Mar 10 2020

In this scenario, the furthest a team is from the source team is only 15 days, and about half the league would come into contact within 9 days.

We can add more detail to this plot. Remember how we captured the team that added each team onto the chain? Let’s add that data. We could do that visually by team colour. In Plotly Express, we simply need to pass the column name (‘con_from’) to the ‘color’ parameter.

# Days of separation, grouped by source team
fig = px.bar(
    teams_contacted_df, x='days', y='team',
    orientation='h', template='plotly_white', color='con_from',
    labels={'team': 'Team', 'days': 'Days until contacted', 'con_from': 'Contact source:'}
)
fig.update_layout(bargap=0.2)
fig.show()

Days of separation from the (randomly chosen) Kings, grouped by previous link to the chain

That’s a lot more informative — although the colours are arbitrary and confusing. I might for instance think that the Lakers were added to the chain by the Celtics (due to the green). Let’s change the colours.

There are different ways to do this. You could, for instance, pass a list of colours to the px.bar function call under the color_discrete_sequence parameter, in their order of appearance.

I did it here by looping through the data under the hood of the fig object:

for i in range(len(fig['data'])):
    fig['data'][i]['marker']['color'] = teamcolor_dict[fig['data'][i]['name']]

Rendering the figure once again:

Days of separation from the (randomly chosen) Kings, with ‘real’ team colours.

It’s not perfect — it does suffer from the problem of some team colours being similar, but I think it works well and draws your eyes to the right places.

Interesting, isn’t it? Predictably, the original team (Kings) adds more teams to the chain than any other single team, but that is actually only a small percentage of the total. Most of the work is done by its ‘child’ nodes, and that’s the power of network propagation.

Can we plot this on a map? Yup, let’s go for it.

Mapping the future

First of all, we’re going to need arena coordinates. Load this file, in which I’ve compiled them, and create a new ‘teamupper’ column to make the team name formatting consistent with our other DataFrames.

arena_df = pd.read_csv('srcdata/arena_data.csv', index_col=0)

arena_df = arena_df.assign(teamupper=arena_df.teamname.str.replace(' ', '_').str.upper())

Arena locations

As a first exercise, let’s briefly plot these locations to check:

# Load mapbox key
with open('../../tokens/mapbox_tkn.txt', 'r') as f:
    mapbox_key = f.read().strip()

# Plot a simple map
fig = px.scatter_mapbox(arena_df, lat="lat", lon="lon", zoom=3, hover_name='teamname')
fig.update_layout(mapbox_style="light", mapbox_accesstoken=mapbox_key)  # mapbox_style="open-street-map" to use without a token
fig.update_traces(marker=dict(size=10, color='orange'))
fig.show()

This code creates a new ‘scatter_mapbox’ type plot with Plotly Express, plotting each row of arena_df at each latitude and longitude. I set a marker size and colour here also, which is optional.

It’s important to note that this call uses the ‘light’ style, which requires a Mapbox token. You can get a free one from Mapbox, which is what I have. Alternatively, use mapbox_style="open-street-map" to plot without a key. It’ll work just fine, although I don’t think it looks as good. (If you are interested, this Plotly doc goes into further detail.)

NBA arena locations

Mapping connections

We have a DataFrame (teams_contacted_df) which captures when each team was added to the chain of connections, and by whom.

Extending this data, we’ll add the location of the origin team and the destination team. The data doesn’t indicate the team’s travel, more the path of connections. For example, if the L.A. Lakers are added to the chain by the San Antonio Spurs, but the game is in L.A., the ‘origin’ location would still be in San Antonio.
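One way to attach those locations is a pair of pandas merges, sketched below under a few assumptions. The toy rows use approximate coordinates, and the output column names (orig_lat, orig_lon, dest_lat, dest_lon) are hypothetical names chosen for this example.

```python
import pandas as pd

# Toy data for illustration only; coordinates are approximate
arena_df = pd.DataFrame({
    'teamupper': ['SAN_ANTONIO_SPURS', 'LOS_ANGELES_LAKERS'],
    'lat': [29.43, 34.04],
    'lon': [-98.44, -118.27],
})
teams_contacted_df = pd.DataFrame({
    'team': ['LOS_ANGELES_LAKERS'],
    'con_from': ['SAN_ANTONIO_SPURS'],
})

coords = arena_df[['teamupper', 'lat', 'lon']]

# 'Origin' is the arena of the team that passed on the connection (con_from) ...
conn_df = teams_contacted_df.merge(
    coords.rename(columns={'teamupper': 'con_from', 'lat': 'orig_lat', 'lon': 'orig_lon'}),
    on='con_from', how='left',
)
# ... and 'destination' is the arena of the team that was added to the chain
conn_df = conn_df.merge(
    coords.rename(columns={'teamupper': 'team', 'lat': 'dest_lat', 'lon': 'dest_lon'}),
    on='team', how='left',
)
```

Using how='left' keeps every team in the result, so a team without a con_from value (such as the source team) simply ends up with missing origin coordinates.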

Then, armed with this data, we can create a new plot, one where each connection from one team to another is represented by a line.

Again, there are multiple ways to do this. I chose to do this by creating a new data series for each connection. (There’s an argument to create a new series for each day, or for each team also. I will demonstrate what I did, but feel free to try your own variations!)

So what I am doing here is to use classic Plotly (plotly.graph_objects), and:

Initialise a new figure

Add the first trace, which is all of the arenas

For each row of the teams_contacted_df DataFrame, add a new trace of a line connecting the ‘origin’ team arena to the ‘destination’ team arena.

Each trace is given the name of the date of the connection, and the ‘origin’ and ‘destination’ team names (which you see in the legend)

Representing each team’s connections on a map

You’d have noticed a few things — one, the map loads at a very low zoom level, we are not showing any country boundaries, and there is no title. Let’s fix all that:

Which results in this plot:

Team’s connections on a map, with more formatting

I think it’s quite informative, but it’s lacking some information still. I would prefer to look at the map and see immediately which arena it is (I’m an Australian and still get Houston/Dallas/San Antonio confused), and also to see what degree of separation each city was.

Finishing touches — on-map text

This plot allows us to add text to the map directly along with our marker. Let’s add text to display the team name, and on the next line how many degrees of separation that team has had from our source.

The text simply needs to be passed on as a list; we can build it by:

txt_list = list()
for i in range(len(arena_df)):
    teamupper = arena_df.iloc[i].teamupper
    temp_row = teams_contacted_df[teams_contacted_df.team == teamupper]
    temp_txt = (
        temp_row.team.values[0].split('_')[-1]
        + '<BR>' + str(temp_row.deg_sep.values[0]) + ' degrees of sep.'
    )
    txt_list.append(temp_txt)

Then, each team’s text becomes something like: ‘HEAT<BR>5 degrees of sep.’

Because Plotly is built on D3.js, simple HTML tags like <BR> will be rendered on the page, which is neat.

So now the call for arenas can include this list:

fig.add_trace(go.Scattergeo(
    mode="markers+text",
    lon=arena_df['lon'], lat=arena_df['lat'], marker=dict(size=8, color='slategray'),
    hoverinfo='text', text=txt_list, name='Arenas'
))

Note also that the value of the mode parameter in the Scattergeo call has changed to "markers+text". This tells Plotly to render the text data on the screen also.

The final result looks like this:

Team’s connections on a map, with text on the map

Just like that, we can plot all 30 teams’ locations, the ‘network connection’ path to the ‘source’, and the connections’ details onto an interactive map!