Effectively visualize data across time to tell better stories

Build clean, easy-to-read time-series data visualizations to support your narratives with Python and Plotly.

The 2018–2019 NBA Regular Season

It goes without saying that time is integral to almost everything we do, including building the narratives that inform our understanding of the world around us. The same holds for data-based storytelling, where we often need to visualise data across time. Whether it is a company’s stock price, a candidate’s polling numbers, or a country’s population, change over time is a critical element of the narrative.

But visualising time-series data can be challenging, especially when there are multiple data series criss-crossing each other in a small space, like an unruly horde of six-year-olds playing soccer for the first time.

Take a look at the chart below, which was posted by a Stack Overflow user.

Can you read this chart? (from Stack Overflow)

As you might imagine, the poster was seeking help on how to make the chart clearer to read. It is very difficult to even identify individual traces in the plot, let alone make heads or tails of any patterns that might exist.

Note that this chart includes only four traces.

On the other hand, take a look at this beauty from Andrew Flowers at FiveThirtyEight.

This chart includes just under 60(!) traces. But FiveThirtyEight’s use of (de)saturation, judicious colour choices, and labelling changes the game entirely. Out of those 60 data series, the 10 key series stand out clearly, letting the reader identify each one and follow it over time in support of the accompanying prose.

Obviously we would all like our charts to look like the latter more than the former. So in this post I would like to talk about doing just that, using my usual data visualization tool (Plotly) to visualise data across time. As for the data, let’s use the 2018–2019 NBA season data to attempt to re-tell its story; of course, we could have easily used data from stocks, population of a town, or my weight over time. Let’s get going.

Before we get started

Data

I’ve included the code and data in my GitLab repo (in the nba_storytime directory), so please feel free to play with them or improve upon them.

Packages

I assume you’re familiar with Python. Even if you’re relatively new to it, this tutorial shouldn’t be too tricky.

You’ll need pandas and plotly. Install each (in your virtual environment) with a simple pip install [PACKAGE_NAME].

Get your data

The dataset I’ve chosen is a collection of each NBA team’s game results. Load the CSV file with:

import pandas as pd

all_records_df = pd.read_csv('srcdata/2018_2019_season_records.csv', index_col=0)

The dataset includes every game, including the playoffs. We only want the regular season, so we filter out the playoff games by date (the dates are stored as strings, hence the pd.to_datetime conversion):

records_df = all_records_df[pd.to_datetime(all_records_df.date) < pd.to_datetime('2019-04-12')]

It looks like this:

The data includes each team’s cumulative net score, games played, wins, and losses as of each game date.
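To see the date filter at work on something small, here is a sketch on a made-up three-row frame. The column names (date, team, games, wins) follow the ones used in this post, but the rows are invented for illustration. Parsing the date column once with pd.to_datetime also means later comparisons don’t have to re-parse the strings.

```python
import pandas as pd

# Hypothetical rows: only the column names mirror the real dataset.
all_records_df = pd.DataFrame({
    'date': ['2019-04-09', '2019-04-10', '2019-04-13'],
    'team': ['TORONTO_RAPTORS'] * 3,
    'games': [81, 82, 83],
    'wins': [57, 58, 59],
})

# Parse the date strings once, then keep only rows before the cut-off date.
all_records_df['date'] = pd.to_datetime(all_records_df['date'])
cutoff = pd.Timestamp('2019-04-12')
records_df = all_records_df[all_records_df['date'] < cutoff]

print(records_df[['date', 'games', 'wins']])
```

Only the two rows dated before the cut-off survive; the third (invented) row is dropped.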

In other words, the data tells the story of each team’s season over time. Let’s look at how best to tell each team’s story, how to highlight the key moments, and how to discover any interesting stories in the data.

Storytelling with Data

Introduction

Let’s plot the data as a scatter graph, with cumulative wins against games played:

import plotly.express as px

fig = px.scatter(records_df, x='games', y='wins', color='team')

fig.show()

Wins vs Games played — scatter

This is a pretty good start, but it’s quite a crowded graph, especially early in the season. A few of the colours stand out here and there, but there is clearly too much overlap: it is hard to pick out particular teams or to notice patterns in the data.

Let’s take a look at some other ways to visualise this data.

Lining up the data

What if we connect up the scatter plot points to convert it to a line graph?

(Code-wise, this will do it: fig.update_traces(mode='markers+lines'))

Wins vs Games played — line graph with markers

Okay, that’s not bad. Because we want to tell the story of each team’s progression, a line graph is probably a better choice than a scatter plot. (If you would prefer not to see the points, specify: fig.update_traces(mode='lines'))

Wins vs Games played — line graph only

Still, there are thirty traces here, and it’s hard to tell which trace is doing what. Also, the legend on the side is in a seemingly random order, which makes finding a given team difficult. Let’s fix that.

All we need to do is pass the category_orders parameter a dictionary that maps the column name (‘team’) to a list of its values, in whatever order you prefer. I just sort the team names alphabetically here, and pass them through like this:

tm_names = [i for i in records_df.team.unique()]

tm_names.sort()

fig = px.scatter(records_df, x='games', y='wins', color='team', category_orders={'team': tm_names})

With ordered names

The legend on the right is now in alphabetical order, allowing us to find and isolate the correct traces more easily.

Speaking of finding traces: did you notice that the colour of each trace has changed? That’s because the colours are assigned arbitrarily, cycling through Plotly’s default colour sequence in trace order, so reordering the traces reshuffles the colours. (We haven’t specified any colour mappings, after all!) That’s fine in some cases, but we can do better. Often, a data source will have a colour that readers associate with it. In this case, why don’t we give each trace its team’s colour?

Colourful language

I’ve included a pickled dictionary file that I put together, in the format {teamname: teamcolour}. Load it with:

import pickle

with open('srcdata/teamcolor_dict.pickle', 'rb') as f:
    team_col_dict = pickle.load(f)

Then we use a list comprehension to construct a list of colours in the same order as the list of teams (tm_names) that we used above:

team_cols_list=[team_col_dict[tm] for tm in tm_names]

Simply pass this list as the color_discrete_sequence argument:

fig = px.scatter(records_df, x='games', y='wins', color='team', category_orders={'team': tm_names}, color_discrete_sequence=team_cols_list)

fig.update_traces(mode='markers+lines')

fig.show()

Now with team colours!

It’s not perfect — too many of the team colours are some variant of dark green, and red. But, at least it is now a little easier to get an idea of which trace might refer to which team.
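If the team colours don’t suit you, or you’d rather build your own colour dictionary, a minimal sketch is to write a plain dict to a pickle file and read it back exactly as above. The hex codes below are placeholders, not official team colours, and the file is written to a temporary directory rather than srcdata/. (A JSON file would work just as well and is safer to share, since unpickling an untrusted file can run arbitrary code.)

```python
import os
import pickle
import tempfile

# Placeholder colours for illustration only; swap in the colours you want.
team_col_dict = {
    'BOSTON_CELTICS': '#2E7D32',
    'TORONTO_RAPTORS': '#C62828',
}

path = os.path.join(tempfile.mkdtemp(), 'teamcolor_dict.pickle')

# Write the dict in binary mode...
with open(path, 'wb') as f:
    pickle.dump(team_col_dict, f)

# ...and load it back the same way as in the snippet above.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == team_col_dict)  # True
```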

Remember the FiveThirtyEight graph that we looked at above? That graph showed the majority of traces in desaturated grey, helping the coloured traces stand out. Let’s try something similar here.

The easiest approach is to modify the list of colours that we pass in: keep the colour of the trace we’re interested in, and change all the others to grey. There are many ways to do this, but I did it by constructing a whole new list:

base_col = '#C0C0C0'
tm = 'TORONTO_RAPTORS'  # the team to highlight

# Team colour for the highlighted team, grey for everyone else
team_cols_list = list()
for i in range(len(tm_names)):
    if tm_names[i] == tm:
        team_cols_list.append(team_col_dict[tm])
    else:
        team_cols_list.append(base_col)

Plot the graph in the exact same way, and you’ll see:

Highlight one trace in particular (the eventual NBA Champions, in this case)
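Before adding more teams, the loop above can be wrapped in a small reusable function that takes a list of teams to highlight. This is a sketch of my own, not something Plotly provides: highlight_colours is a hypothetical helper name, and the colour values are placeholders.

```python
def highlight_colours(names, colour_dict, hero_teams, base_col='#C0C0C0'):
    """Return one colour per name: the team colour for hero teams, grey otherwise."""
    return [colour_dict[tm] if tm in hero_teams else base_col for tm in names]


# Placeholder colours, for illustration only.
team_col_dict = {
    'BOSTON_CELTICS': '#2E7D32',
    'BROOKLYN_NETS': '#000000',
    'TORONTO_RAPTORS': '#C62828',
}
tm_names = sorted(team_col_dict)

print(highlight_colours(tm_names, team_col_dict, ['TORONTO_RAPTORS']))
# ['#C0C0C0', '#C0C0C0', '#C62828']
```

The returned list goes to color_discrete_sequence exactly as before.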

Want to highlight multiple traces? That’s easy to do. Let’s add the Raptors’ division rivals.

base_col = '#C0C0C0'
hero_teams = ['TORONTO_RAPTORS', 'PHILADELPHIA_76ERS', 'BOSTON_CELTICS', 'BROOKLYN_NETS', 'NEW_YORK_KNICKS']

# Team colours for every hero team, grey for everyone else
team_cols_list = list()
for i in range(len(tm_names)):
    if tm_names[i] in hero_teams:
        tm = tm_names[i]
        team_cols_list.append(team_col_dict[tm])
    else:
        team_cols_list.append(base_col)

fig = px.scatter(records_df, x='games', y='wins', color='team', category_orders={'team': tm_names}, color_discrete_sequence=team_cols_list)
fig.update_traces(mode='markers+lines')
fig.show()

Atlantic Division Records (sorry, Knicks fans)

That’s great, but there is one minor problem: some of the traces (like the Brooklyn Nets’ black trace) are obscured behind the grey traces. As I understand it, this is due to the way Plotly renders the plot: traces are drawn in the order they appear in the legend, from top to bottom, so each trace is drawn over the ones before it. Because the Nets’ name comes near the top of the alphabetical list, their trace is drawn early and ends up buried underneath the others.

So let’s reorder our team names, putting the highlighted teams last so that they are drawn on top. We do that by constructing our team names list differently; after that, the rest of the code stays the same.

hero_teams = ['TORONTO_RAPTORS', 'PHILADELPHIA_76ERS', 'BOSTON_CELTICS', 'BROOKLYN_NETS', 'NEW_YORK_KNICKS']

tm_names = [i for i in records_df.team.unique() if i not in hero_teams]

tm_names.sort()

tm_names = tm_names + hero_teams

Better, right? (Still sorry, Knicks fans)
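Since both the trace order and the colours depend on hero_teams, it can help to build the order in one place and catch misspelt team names early (a typo in hero_teams would otherwise only surface later as a KeyError in the colour lookup). Another sketch, with a hypothetical helper (order_names) and a made-up team list:

```python
def order_names(all_teams, hero_teams):
    """Non-hero teams alphabetically, then hero teams, so heroes are drawn last (on top)."""
    all_teams = set(all_teams)
    missing = [tm for tm in hero_teams if tm not in all_teams]
    if missing:
        # Fail early with a clear message instead of a KeyError further down.
        raise ValueError(f'Unknown hero teams: {missing}')
    others = sorted(tm for tm in all_teams if tm not in hero_teams)
    return others + list(hero_teams)


teams = ['TORONTO_RAPTORS', 'ATLANTA_HAWKS', 'BROOKLYN_NETS', 'BOSTON_CELTICS']
print(order_names(teams, ['TORONTO_RAPTORS', 'BROOKLYN_NETS']))
# ['ATLANTA_HAWKS', 'BOSTON_CELTICS', 'TORONTO_RAPTORS', 'BROOKLYN_NETS']
```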

Finishing Touches

Subplots

Our plot is starting to look pretty good, so let’s add a couple of finishing touches. First, I would like to compare the league’s two conferences, and highlight the Southwest division in the West.

Add the division teams to our list, and pass ‘conference’ to our facet_col parameter to create the two subplots:

# Separate the conferences, and highlight the Atlantic and Southwest divisions
hero_teams = [
    'TORONTO_RAPTORS', 'PHILADELPHIA_76ERS', 'BOSTON_CELTICS', 'BROOKLYN_NETS', 'NEW_YORK_KNICKS',
    'HOUSTON_ROCKETS', 'SAN_ANTONIO_SPURS', 'DALLAS_MAVERICKS', 'MEMPHIS_GRIZZLIES', 'NEW_ORLEANS_PELICANS'
]

# Non-hero teams first (alphabetically), hero teams last so they are drawn on top
tm_names = [i for i in records_df.team.unique() if i not in hero_teams]
tm_names.sort()
tm_names = tm_names + hero_teams

team_cols_list = list()
for i in range(len(tm_names)):
    if tm_names[i] in hero_teams:
        tm = tm_names[i]
        team_cols_list.append(team_col_dict[tm])
    else:
        team_cols_list.append(base_col)

fig = px.scatter(records_df, x='games', y='wins', color='team',
                 category_orders={'team': tm_names}, color_discrete_sequence=team_cols_list, facet_col='conference')
fig.update_traces(mode='markers+lines')
fig.show()

Subplots

By separating the two groups of data into subplots, we can immediately see interesting contrasts between the two conferences. The subplot on the right (the Western conference) includes far more traces that finish higher up, indicating more teams with high win totals. Apart from one outlier trace, the chart suggests that the league’s worst teams in the 2018–19 season were predominantly in the Eastern conference.
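If you want to check what the subplots suggest with numbers, one option is a short pandas summary: take each team’s last row (its final cumulative record) and summarise final wins by conference. The frame below is made up; on the real data you would run the last two statements against records_df, which the facet plot implies has a conference column.

```python
import pandas as pd

# Hypothetical cumulative records: four teams, two dates each.
records_df = pd.DataFrame({
    'date': pd.to_datetime(['2019-04-09', '2019-04-10'] * 4),
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'conference': ['Eastern'] * 4 + ['Western'] * 4,
    'wins': [10, 11, 20, 20, 30, 31, 40, 41],
})

# Final record per team: the last row for each team after sorting by date.
final = records_df.sort_values('date').groupby('team').last()

# Median and minimum final win totals per conference.
summary = final.groupby('conference')['wins'].agg(['median', 'min'])
print(summary)
```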

Plot themes/templates

This is super neat, and since I only just learned about it, I thought I’d briefly share it here, even though I’m still learning how to use it. Plotly includes a ‘themes/templates’ feature, which lets you quickly change the appearance of your plots.

I will likely go into more detail in another post, but for now, just be aware that passing a template parameter lets you choose from a host of plot styles, like:

fig = px.scatter(records_df, x='games', y='wins', color='team', category_orders={'team': tm_names},

color_discrete_sequence=team_cols_list, facet_col='conference', template="plotly_white")

With ‘plotly_white’ theme

‘plotly_dark’ theme

Maybe you’re more of a ggplot2 person

The trace colours haven’t changed because I specified the team colours for the traces explicitly, but theming also changes the default colour maps. Check out Plotly’s documentation on themes and templates for more.

Titles and font sizes

For completeness, here is a version of the plot with the title, subplot names, and axis titles all formatted, producing something close to what I would use in an analytics article.

fig = px.scatter(records_df, x='games', y='wins', color='team', category_orders={'team': tm_names},

color_discrete_sequence=team_cols_list, facet_col='conference', template="ggplot2",

title='2018-19 NBA Regular season wins',

labels={'games': 'Games played', 'wins': 'Season Wins'}

)

fig.for_each_annotation(lambda t: t.update(text=t.text.replace('conference=', 'Conference: ')))

fig.update_traces(mode='markers+lines')

fig.show()

Most of the above should be straightforward, except perhaps for the .for_each_annotation method.

Here I’m replacing part of the automatically generated subplot titles: the ‘conference=’ prefix (as in ‘conference=Eastern’) becomes ‘Conference: ’, giving ‘Conference: Eastern’, which to my mind is more readable.

This is done with a lambda function: Plotly’s .for_each_annotation goes through each annotation object and applies the lambda to it.

Here is the result:

Here’s one that I made earlier

This is seemingly pretty simple data, but in this form, the data is just so much easier to digest than our first plots. If you don’t believe me — just scroll back up, and take a look at the first couple of plots that we generated. I can wait.

Right?

By highlighting individual traces, we can quickly show not only where these particular teams stood within their conference, but also how they fared against their own division rivals. And the slope of each trace shows the hot (steep) and cold (flat) streaks that each team went through at various stages of the season.
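The slope can also be put into numbers. One way, sketched here, is to count wins over a trailing window of games by taking the difference of the cumulative wins column within each team. This assumes one row per game per team; the data and the 3-game window are made up for illustration.

```python
import pandas as pd

# Hypothetical cumulative wins for one team over eight games.
records_df = pd.DataFrame({
    'team': ['TORONTO_RAPTORS'] * 8,
    'games': range(1, 9),
    'wins': [1, 2, 3, 4, 4, 4, 4, 5],
})

window = 3
records_df = records_df.sort_values(['team', 'games'])

# Wins in the last `window` games = change in cumulative wins over `window` rows.
# The first `window` rows of each team have no full window, so they are NaN.
records_df['recent_wins'] = records_df.groupby('team')['wins'].diff(window)

# A "hot" stretch wins every game in the window; a "cold" one wins none.
records_df['hot'] = records_df['recent_wins'] == window
records_df['cold'] = records_df['recent_wins'] == 0
print(records_df)
```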

Additionally, online Plotly charts offer interactivity with mouseover data, which saves readers from going back and forth between the graph and the legend.