Shots distribution for the Colorado Avalanche for the 2018–2019 season

Introduction

When working with small spaces like a mobile application, it’s very easy to run out of room when trying to visualize large datasets. We get stuck with charts that have overlapping data points, making it almost impossible to discover patterns in the data. The more data we try to visualize, the more apparent this issue becomes. One solution is to sample or filter our data, but we still run the risk of hiding patterns from our users.

In this article, we’re going to try and visualize the locations of all shots made by the Colorado Avalanche for the 2018–2019 NHL season. We’ll be using the data collected by the folks at MoneyPuck, which is an amazing resource for hockey analytics. Since we’ll be dealing with roughly 2,500 points of data, we’ll be building a simple backend service in Python to access and modify our shot data. This saves us from having to store a ton of data on the client, and will help us in the long run when we start dealing with more advanced charts. Our backend will be a simple Flask application on a local machine that will include a couple of endpoints for our Flutter app.

Our first attempt will be a scatter plot, a chart generally used to explore the relationship between two numerical variables (here, the x and y coordinates of the rink). As we build our chart, we’ll find that scatter plots are highly susceptible to overplotting. By plotting each data point as a circle, our chart on a mobile device would end up looking something like this:

Yuck.

Fortunately, there are several workarounds that we can use to reduce this type of overplotting. We’ll take some of these ideas, and see what we can do with Flutter to build a chart that can inspire meaningful insights. This article dives straight into code, so some basic knowledge of Dart and Python is recommended.

Building the chart

In my last article, we leveraged Flutter’s CustomPainter to draw a radar chart by painting lines and shapes onto a canvas. The logic behind this chart was pretty straightforward, and only required some trig functions from dart:math to figure out the position and angle of the chart’s features. We’ll take the same approach for our charts, and try to split our logic into reusable components.

Starting with the rink

The awesome dataset provided by MoneyPuck contains the physical coordinates for each shot taken. Instead of using a boring Cartesian chart, we can take the x/y coordinates of our data points and plot them directly onto a rink outline.

The quickest approach to building our rink would be to simply use an image. We could overlay the data points on top of the image using a Stack widget. Our data points should fall in the correct spot if we can ensure the dimensions/ratio of the image matches the standard NHL rink dimensions. Another approach would be to leverage our CustomPainter skills and build the rink outline ourselves. Using the standard dimensions, we can easily define the positions of our rink features (face-off circles, lines, goal crease, etc.). We’d have full control over the design of the rink, and would be able to adjust the outline at run-time. We can also guarantee that we’ll never run into any resolution issues. Since this is an article about CustomPainter, it’s only fitting that we use this approach :)

We mainly care about the location of a shot relative to the opponent’s net, so we only need to paint half a rink. The following code snippet should give you a good start on building a rink outline with CustomPainter.

Rink outline painter

We can wrap our rink outline custom painter in a CustomPaint widget that fits the width and height of its parent. We can use the handy AspectRatio widget to force our chart to maintain the rink ratio of 100/85 (we’re plotting half a rink).

Outline of a standard NHL ice rink

Grabbing our data

As previously mentioned, we’ll be using a simple Flask server to retrieve our data points. We’ll be using the pandas library to read the data from a .csv file, and transform it into x/y coordinates. We’ll start with a single GET endpoint that will fetch the shots data for a given team (all shots are returned if no team is specified). We’ll need to adjust the location of our data points to ensure that we’re plotting against a single net. Our data points should range from [0:100] on the x-axis, and [-51:50] on the y-axis. We can provide a team_code query parameter so that we can filter the rows of the .csv file by team. For Colorado, we can use the COL team code.
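To make the "single net" adjustment concrete, here is a minimal sketch of the filtering and mirroring logic. It uses Python's standard csv module rather than pandas to stay dependency-free, and it is not the article's actual endpoint. The column names (teamCode, xCord, yCord) and the sample rows are assumptions for illustration, so check them against the real file. The sketch assumes that shots at the far net have negative x values, and rotates them by 180 degrees so that every shot is expressed relative to the same net.

```python
import csv
import io

# Hypothetical sample rows. The column names are assumptions about the CSV
# layout, and the values are made up purely for illustration.
SAMPLE_CSV = """teamCode,xCord,yCord
COL,62,-10
COL,-75,20
DAL,80,5
"""


def to_single_net(x, y):
    """Rotate shots taken at the negative-x net by 180 degrees so that
    every shot is expressed relative to the net on the positive-x side."""
    if x < 0:
        return -x, -y
    return x, y


def load_shots(csv_text, team_code=None):
    """Return a list of {"x": ..., "y": ...} dicts, optionally filtered by team.

    When team_code is None, shots from every team are returned.
    """
    shots = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if team_code is not None and row["teamCode"] != team_code:
            continue
        x, y = to_single_net(float(row["xCord"]), float(row["yCord"]))
        shots.append({"x": x, "y": y})
    return shots
```

Calling load_shots(SAMPLE_CSV, "COL") keeps only the two Colorado rows, and the second one is flipped from (-75, 20) to (75, -20). A list of plain dictionaries like this is easy to return as JSON from a Flask view.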

Basic Flask application

On the Flutter side, we can leverage the popular http package to communicate with our new GET endpoint. The package is Future-based, so we can easily write an async method for each endpoint. For simplicity, I have used ngrok to open a public URL to the Python Flask server. This allows me to use the same URL whether I connect from my mobile device or an Android emulator. You could also simply point to your localhost endpoint, or use the 10.0.2.2 alias for the host machine when running on an Android emulator. I’m also using the vector_math package to serialize the shots data, which is helpful when dealing with 2-D / 3-D data points.

Getting shots data from our Flask server

Building the scatter plot

To accurately plot our data points over our rink outline, we’ll need to scale the points to fit the height and width of our canvas. This will be a common problem across all of our charts, so we’ll build a reusable abstract custom painter that can scale our data points into offsets.
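The scaling itself is a linear mapping from data coordinates to canvas pixels. The sketch below shows the arithmetic in Python rather than Dart. The function name and the example ranges are assumptions (a y range of ±42.5 corresponds to the 85 ft rink width), so substitute whatever ranges your endpoint actually returns. It also assumes a canvas whose origin is the top-left corner with y growing downward, which is why the y-axis is flipped.

```python
def scale_point(x, y, width, height, x_range, y_range):
    """Map a data point onto a canvas of the given pixel size.

    x_range and y_range are (min, max) tuples in data units. The canvas
    origin is assumed to be top-left with y increasing downward, so the
    largest data y value maps to pixel row 0.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    dx = (x - x_min) / (x_max - x_min) * width
    dy = (y_max - y) / (y_max - y_min) * height
    return dx, dy


# Example: a 400 x 340 canvas keeps the 100/85 ratio of the half rink.
centre = scale_point(50, 0, 400, 340, (0, 100), (-42.5, 42.5))
```

In a painter, the same mapping would be applied to every point once per paint call, using the size passed to the paint method as the width and height.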

Base Cartesian plot painter

The implementation of our scatter plot painter becomes pretty easy: draw a circle for each scaled data point. Once we have our painter, we can now overlay our scatter plot on top of our rink outline chart.

Scatter plot painter

Stacking our ice rink outline and scatter plot

Reducing circle size and including transparency to avoid overplotting

The first scatter plot is pretty much unreadable. Without any notable peaks, it’s impossible to determine the actual frequency and distribution of our shots. We can start to reduce the overplotting by adding some transparency or reducing the size of our circles. Reducing the circle size allows us to fit more data points before they start overlapping. Transparency allows the overlap in our circles to display as darker areas in the chart, making it easier to identify the common shot locations. Looking at the bottom two scatter plots, we can start to see that most shots are located near the inner slot of the rink.
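To see why transparency turns overlap into darker areas, it helps to work out how opacity accumulates. The sketch below assumes standard source-over blending (Flutter's default blend mode for a Paint) and that every circle uses the same alpha. The function name is hypothetical.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` overlapping circles, each drawn with
    the same alpha, under source-over blending.

    Each layer lets (1 - alpha) of what is beneath it show through, so the
    background's share after n layers is (1 - alpha) ** n.
    """
    return 1.0 - (1.0 - alpha) ** layers


for n in (1, 5, 10, 30):
    print(n, round(stacked_opacity(0.1, n), 3))
```

With an alpha of 0.1, a single circle is barely visible, while ten overlapping circles are already about 65% opaque. The curve flattens as it approaches full opacity, so a very low alpha is needed if the busiest areas are to remain distinguishable from one another.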

Taking the next step with density charts

If we tried to increase our dataset (e.g. by including shots from other teams or other seasons), we’d eventually end up back at our first over-plotted scatter plot. Instead of trying to shove thousands of data points into a tiny chart, let’s try and build a 2-D histogram. We can split the dataset into evenly sized intervals (bins), and count the number of data points that land inside each interval. We can then normalize the frequency within a range of [0:1], generate a color gradient, and assign a color to each bin. The implementation of our new histogram density painter could look something like this:

Histogram density painter
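The painter itself is a Dart CustomPainter, but the binning, normalizing and coloring steps are language-neutral. As a rough sketch of those steps in Python (the function names and the two-color gradient are assumptions, not the article's code):

```python
def histogram_2d(points, divisions, x_range, y_range):
    """Count points per bin on a divisions x divisions grid, then normalize
    the counts to [0, 1] by dividing by the largest count."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    counts = [[0] * divisions for _ in range(divisions)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted area
        # Clamp so points exactly on the upper edge land in the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * divisions), divisions - 1)
        row = min(int((y - y_min) / (y_max - y_min) * divisions), divisions - 1)
        counts[row][col] += 1
    peak = max(max(r) for r in counts)
    if peak == 0:
        return [[0.0] * divisions for _ in range(divisions)]
    return [[c / peak for c in r] for r in counts]


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


# Four made-up points on a 2 x 2 grid, colored from white to red.
bins = histogram_2d([(10, 10), (12, 14), (90, 90), (100, 100)], 2, (0, 100), (0, 100))
colors = [[lerp_color((255, 255, 255), (255, 0, 0), t) for t in row] for row in bins]
```

Normalizing against the busiest bin means the gradient always spans its full range, whatever the size of the dataset. The trade-off is that a single very busy bin can wash out the rest of the chart.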

In our main rink chart, we can now simply swap our ScatterPlotPainter with our new HistogramDensityPainter:

2-D histograms with varying divisions / color gradients

Getting there! Regardless of the size of our dataset, we should be able to get a good picture of our shot distribution. Using a color gradient with sharper contrasts (used in the fourth chart) can further help us emphasize the peaks in our distribution.

We can still do better.

The shape of our distribution is pretty rough, and increasing the number of intervals only dilutes the number of shots that fall within each interval. Let’s take the next step, and try to “smooth” our density chart using kernel density estimation (or KDE). This algorithm produces a probability density function based on the data points, which allows us to estimate the density of shots at any location in the rink. We can now increase the number of divisions in our chart without diluting the data. While the algorithm is currently unavailable in Dart and Flutter, we can leverage our Python backend server and use the KDE function available in the SciPy library. We can build a new endpoint that generates a “density” value at each coordinate of the rink. We’ll return a 3-D vector for each coordinate (x and y for the location, and z as our density value).
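To show what the KDE step computes, here is a hand-written Gaussian KDE evaluated on a grid, using only the standard library. It is a stand-in for illustration, not SciPy's implementation: SciPy chooses a bandwidth automatically, whereas this sketch takes a fixed bandwidth as a parameter. The function name, grid layout and sample points are all assumptions.

```python
import math


def gaussian_kde_grid(points, divisions, x_range, y_range, bandwidth):
    """Evaluate a 2-D Gaussian kernel density estimate at the centre of each
    cell of a divisions x divisions grid.

    Returns a list of (x, y, z) tuples, with z scaled so the densest cell is 1.
    """
    if not points:
        raise ValueError("at least one data point is required")
    x_min, x_max = x_range
    y_min, y_max = y_range
    step_x = (x_max - x_min) / divisions
    step_y = (y_max - y_min) / divisions
    # Normalizing constant for an isotropic 2-D Gaussian kernel.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for row in range(divisions):
        gy = y_min + (row + 0.5) * step_y
        for col in range(divisions):
            gx = x_min + (col + 0.5) * step_x
            # Every data point contributes a small "bump" centred on itself.
            density = norm * sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            grid.append((gx, gy, density))
    peak = max(z for _, _, z in grid) or 1.0
    return [(x, y, z / peak) for x, y, z in grid]


# Three made-up shots: two clustered near (81, 1) and one far away.
grid = gaussian_kde_grid([(80, 0), (82, 2), (30, -30)], 10, (0, 100), (-42.5, 42.5), 10.0)
hottest = max(grid, key=lambda cell: cell[2])
```

Because every data point contributes to every grid cell, the cost grows with the number of shots multiplied by the number of cells. That is one reason to run this on the server instead of inside a paint call.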

Endpoint for generating our KDE values

Our 2-D histogram can now be simplified, since we don’t need to group our shots data into bins. We’ll draw a “pixel” for each density value calculated by our KDE function. The more pixels we generate, the smoother our graph is going to look, but the longer our chart takes to render. We’ll stop at 100 divisions (10,000 pixels), since we start incurring heavy rendering times with diminishing returns on the smoothness of the chart.