Now that we have Bokeh loaded, it is time to build the plot. There's a lot going on in the next notebook cell, but each piece is relatively straightforward, so I'll walk through it one chunk of code at a time.

First we need some colour. We can colour the points by cluster, but that means mapping cluster numbers to colours while reserving gray for noise points. We construct a palette and then use Bokeh's LinearColorMapper for this. There is also some code to set a fill_alpha value, which we will come back to later.
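To make the colour mapping concrete, here is a minimal plain-Python sketch of the idea, not the notebook's actual code. It assumes noise points carry the label -1 (the convention used by HDBSCAN-style clusterers) and that clusters are numbered 0, 1, 2, and so on; the gray hex value and the function name `cluster_colours` are hypothetical. Putting gray at the front of the palette means that shifting every label by one turns it into a palette index, which is the same effect the notebook gets by handing a palette to Bokeh's LinearColorMapper and letting it do the lookup in the browser.

```python
from typing import List, Sequence

# Hypothetical choice of gray for noise points.
NOISE_GRAY = "#999999"


def cluster_colours(labels: Sequence[int], palette: Sequence[str]) -> List[str]:
    """Map integer cluster labels to colours, sending noise (label -1) to gray.

    Assumes clusters are labelled 0..k-1 and noise is labelled -1.
    """
    n_clusters = max(labels) + 1 if labels else 0
    if n_clusters > len(palette):
        raise ValueError(f"palette has {len(palette)} colours but there are {n_clusters} clusters")
    # Gray goes first, so label -1 lands on index 0 and cluster k on index k + 1.
    full_palette = [NOISE_GRAY] + list(palette[:n_clusters])
    return [full_palette[label + 1] for label in labels]
```

For example, `cluster_colours([0, 1, -1], ["#1f77b4", "#ff7f0e"])` gives the first two palette colours followed by gray.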

Next we create a ColumnDataSource. This connects our dataframe to Bokeh so that it can embed data (like the names of the subreddits!) directly in the HTML for the plot, rather than having to round-trip back to Python for that information. We also have some custom JavaScript for handling alpha values; again, we'll come back to this.
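A ColumnDataSource holds its data column-wise: a mapping from column names to equal-length sequences, one entry per point. The plain-Python sketch below illustrates why that lets the page carry its own data. It reshapes row records into columns and serializes them into a `<script>` tag. It is an illustration of the idea only; the helper names, the `id="plot-data"` attribute and the script type are hypothetical and are not Bokeh's actual output format.

```python
import json
from typing import Any, Dict, Iterable, List


def to_columns(records: Iterable[Dict[str, Any]], fields: List[str]) -> Dict[str, List[Any]]:
    """Reshape row records (one dict per point) into column-oriented lists."""
    rows = list(records)
    return {field: [row[field] for row in rows] for field in fields}


def embed_as_script(columns: Dict[str, List[Any]]) -> str:
    """Serialize the columns into an inline <script> tag (hypothetical format)."""
    # Escape "</" so a value such as "</script>" cannot close the tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = json.dumps(columns).replace("</", "<\\/")
    return f'<script type="application/json" id="plot-data">{payload}</script>'
```

Because every lookup the browser needs (for example a point's subreddit name) is already in that embedded payload, nothing has to go back to Python once the page is rendered.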

The next step is to create a figure to plot to, and then add a hover tool (which we wire up to display the subreddit and cluster). We then add circles to the figure, taking the x, y and colour values for the circles from the ColumnDataSource we created. Then there are some custom callbacks, which we'll skip over for now; everything else before show(plot) just customises the display style a little.
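Bokeh's HoverTool takes its tooltips as a list of `(label, "@column")` pairs, where each `@column` names a column of the data source. The sketch below is a toy re-implementation, written only to show how such references resolve against the hovered point's row; it is not how Bokeh renders tooltips internally. It assumes the source has columns named `subreddit` and `cluster`, and the `render_tooltip` helper is hypothetical.

```python
import re
from typing import Any, Dict, List, Sequence, Tuple

# Same (label, "@column") shape as a HoverTool tooltips list; column names assumed.
TOOLTIPS: List[Tuple[str, str]] = [("subreddit", "@subreddit"), ("cluster", "@cluster")]


def render_tooltip(
    tooltips: List[Tuple[str, str]],
    columns: Dict[str, Sequence[Any]],
    index: int,
) -> str:
    """Resolve each "@name" in the templates using row `index` of the columns."""
    lines = []
    for label, template in tooltips:
        # Swap every @name for that column's value at the hovered point.
        value = re.sub(r"@(\w+)", lambda m: str(columns[m.group(1)][index]), template)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

The design point this illustrates is that the tooltip only stores column *names*; the values come from the same column-oriented data the circles are drawn from, so hovering point `i` simply reads row `i`.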

The result is a scatterplot, coloured by cluster, that we can zoom and pan around in, complete with tooltips. The plot itself is pure HTML and JavaScript, so we can embed it in a webpage independent of any Python; it is standalone at this point.