Over on Kaggle, there is an interesting data set of over 130K wine reviews that have been scraped and pulled together into a single file. I thought this data set would be really useful for showing how to build an interactive visualization using Bokeh. This article walks through how to build a Bokeh application that has good examples of many of its features. The app itself is really helpful and I had a lot of fun exploring this data set using the visuals. Additionally, this application shows the power of Bokeh and should give you some ideas for how you could use it in your own projects. Let’s get started by exploring the “rich, smokey flavors with a hint of oak, tea and maple” that are embedded in this data set.

As a pun, it’s a bit on the dry side but I think it has a strong finish.

To get your palate ready, here’s a small tasting of the app we’ll be building:

For this specific dataset, I approached the problem as an interested consumer, not as a data scientist trying to build a predictive model. Basically, I want to have a simple way to explore the data and find wines that might be interesting to purchase. As a wine consumer, I’m mostly interested in price vs. ratings (aka points). An interactive scatter plot should be a useful way to explore the data in more detail and Bokeh is well suited for this kind of application.

Here is a snapshot of the data we will explore in the rest of the article:

I made some minor cleanups and edits to the data, which I won’t go through here, but all the changes are available in this notebook.

For this analysis, I chose to focus on only Australian wines. The decision to filter the data was somewhat arbitrary but I found that it ended up being a large enough dataset to make it interesting but not so large that performance was a problem on my middle-of-the-road laptop.

I will not spend much time walking through the data, but if you are interested in learning more about what it contains and how it could be a useful tool for building out your skills, please check out the Kaggle page.

I based this example on an application I am developing at work to interactively explore price and volume relationships. I have found that the learning curve is a little steep with the Bokeh app approach but the results have been fantastic. The gallery examples are another rich source for understanding Bokeh’s capabilities. By the end of this article, I hope you feel the same way I do about the possibilities of using Bokeh for building powerful, complex, interactive visualization tools.

Bokeh has two methods for creating visualizations. The first approach is to generate HTML documents that can be used standalone or embedded in a Jupyter notebook. The process for creating a plot is very similar to what you would do with matplotlib or some other Python visualization library. The key bonus with Bokeh is that you get basic interactivity for free.

The second method is to build a Bokeh app, which provides more flexibility and customization options. The downside is that you need to run a separate application to serve the data. This works really well for individual or small group analysis. Deploying to the world at large takes a little more effort.

Building the App

If you are using Anaconda, then install bokeh with conda:

conda install bokeh

For this app, I am going to use the single file approach as described here.

The final file is stored in the github repo and I will keep it updated if people identify changes or improvements to the script. In addition, here is the processed csv file.

The first step is to import several modules we will need to build the app:

import pandas as pd
from bokeh.plotting import figure
from bokeh.layouts import layout, widgetbox
from bokeh.models import ColumnDataSource, HoverTool, BoxZoomTool, ResetTool, PanTool
from bokeh.models.widgets import Slider, Select, TextInput, Div
from bokeh.models import WheelZoomTool, SaveTool, LassoSelectTool
from bokeh.io import curdoc
from functools import lru_cache

The next step is to create a function to load data from the csv file and return a pandas DataFrame. I have wrapped this function with the lru_cache() decorator in order to cache the result. This is not strictly required but is useful to minimize those extra IO calls for loading the data from disk.

@lru_cache()
def load_data():
    df = pd.read_csv("Aussie_Wines_Plotting.csv", index_col=0)
    return df
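To make the caching behavior concrete, here is a minimal sketch that uses only the standard library. The `CSV_TEXT` string, the `read_count` counter and the `load_rows` function are hypothetical stand-ins for the real file and loader; they exist only to show that the decorated function body runs once.

```python
import csv
import io
from functools import lru_cache

# Hypothetical stand-in for the CSV file on disk
CSV_TEXT = "title,price,points\nWine A,20,90\nWine B,35,93\n"
read_count = 0  # counts how often the loader body actually runs


@lru_cache()
def load_rows():
    """Parse the CSV once; later calls return the cached result."""
    global read_count
    read_count += 1
    return tuple(csv.DictReader(io.StringIO(CSV_TEXT)))


first = load_rows()
second = load_rows()

print(read_count)        # the parsing ran a single time
print(first is second)   # both calls return the same cached object
```

One thing to keep in mind is that every caller receives the same cached object, so it should not be modified in place. The boolean filtering used later in the app returns new DataFrames, which leaves the cached one untouched. If the file changes on disk, calling `cache_clear()` on the decorated function forces a reload.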

In order to format the details, I am defining the ordering of the columns as well as the list of all the provinces we may want to filter by. For this example, I hard coded the list but in other situations you could dynamically build the list off the data.

# Column order for displaying the details of a specific review
col_order = ["price", "points", "variety", "province", "description"]

all_provinces = [
    "All", "South Australia", "Victoria", "Western Australia",
    "Australia Other", "New South Wales", "Tasmania"
]
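If you would rather build the province list from the data than hard code it, something along these lines should work. The small DataFrame here is made-up sample data, not the real wine file, and it assumes the column is named `province` as in the app.

```python
import pandas as pd

# Made-up sample data; the real app would use the DataFrame from load_data()
df = pd.DataFrame(
    {"province": ["Victoria", "Tasmania", "Victoria", None, "South Australia"]}
)

# Unique, non-missing province names in alphabetical order, with "All" first
all_provinces = ["All"] + sorted(df["province"].dropna().unique().tolist())

print(all_provinces)
```

The trade-off is that the option order follows the sort rather than an order you choose, and a typo in the data shows up as its own entry in the dropdown.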

Now that some of the prep work is out of the way, I will get all of the Bokeh widgets set up. The Select, Slider and TextInput widgets capture input from the user. The Div widgets display output based on the data being selected.

desc = Div(text="All Provinces", width=800)
province = Select(title="Province", options=all_provinces, value="All")
price_max = Slider(start=0, end=900, step=5, value=200, title="Maximum Price")
title = TextInput(title="Title Contains")
details = Div(text="Selection Details:", width=800)

Here’s what the widgets look like in the final form:

The “secret sauce” for Bokeh is the ColumnDataSource. This object stores the data the rest of the script will visualize. For the initial run through of the code, I will load with all the data. In subsequent code, we can update the source with selected or filtered data.

source = ColumnDataSource(data=load_data())

Every Bokeh plot supports interactive tools. Here’s what the tools look like for this specific app:

The actual building of the tools is straightforward. You have the option of defining tools as a list of strings but it is not possible to customize the tools when you use this approach. In this application, it is useful to define the hover tool to show the title of the wine as well as its variety. We can use any column of data that is available in our DataFrame and reference it by prefixing the column name with @ (for example, @title).

hover = HoverTool(tooltips=[
    ("title", "@title"),
    ("variety", "@variety"),
])

TOOLS = [
    hover, BoxZoomTool(), LassoSelectTool(), WheelZoomTool(),
    PanTool(), ResetTool(), SaveTool()
]

Bokeh uses figures as the base object for creating a visualization. Once the figure is created, items can be placed on the figure. For this use case, I decided to place circles on the figure based on the price and points assigned to each wine.

p = figure(
    plot_height=600,
    plot_width=700,
    title="Australian Wine Analysis",
    tools=TOOLS,
    x_axis_label="points",
    y_axis_label="price (USD)",
    toolbar_location="above")

p.circle(
    y="price",
    x="points",
    source=source,
    color="variety_color",
    size=7,
    alpha=0.4)

Now that the basic plot is structured, we need to handle changes to the data and make sure the appropriate updates are made to the visualization. With the addition of a few functions, Bokeh does most of the heavy lifting to keep the visualization updated.

The first function is select_reviews. The basic purpose of this function is to load the full dataset, apply any filtering based on user input and return the filtered dataset as a pandas DataFrame.

In this particular example, we can filter data based on the maximum price, province and string value in the title. The function uses standard pandas operations to filter the data and get it down to a subset of data in the selected DataFrame. Finally, the function updates the description text to show what is being filtered.

def select_reviews():
    """ Use the current selections to determine which filters to apply to the
    data. Return a dataframe of the selected data
    """
    df = load_data()

    # Determine what has been selected for each widget
    max_price = price_max.value
    province_val = province.value
    title_val = title.value

    # Filter by price and province
    if province_val == "All":
        selected = df[df.price <= max_price]
    else:
        selected = df[(df.province == province_val) & (df.price <= max_price)]

    # Further filter by string in title if it is provided
    if title_val != "":
        selected = selected[selected.title.str.contains(title_val, case=False) == True]

    # Example showing how to update the description
    desc.text = "Province: {} and Price < {}".format(province_val, max_price)
    return selected
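The same filtering logic can also be written as a plain function that takes the DataFrame and the filter values as arguments, which makes it easy to try outside of a running Bokeh app. This is only a sketch: `filter_reviews` and the sample rows are hypothetical, and it differs from the app in one respect, since it passes `regex=False` so that the typed text is matched literally instead of being treated as a regular expression.

```python
import pandas as pd


def filter_reviews(df, max_price, province="All", title_text=""):
    """Return the rows matching the price, province and title filters."""
    mask = df["price"] <= max_price
    if province != "All":
        mask &= df["province"] == province
    if title_text:
        # regex=False matches the text literally; na=False drops missing titles
        mask &= df["title"].str.contains(
            title_text, case=False, regex=False, na=False
        )
    return df[mask]


# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    "title": ["Penfolds Grange Shiraz", "Tasmanian Pinot Noir (Reserve)", None],
    "province": ["South Australia", "Tasmania", "Victoria"],
    "price": [850.0, 45.0, 20.0],
})

result = filter_reviews(sample, max_price=200, title_text="(reserve)")
print(result["title"].tolist())
```

With the default `regex=True`, a search string containing an unbalanced parenthesis would raise an error, so the literal match is a reasonable choice for free-form user input.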

The next helper function updates the ColumnDataSource we set up earlier. The one subtlety is that it assigns new contents to source.data rather than creating a new source, so the glyphs that already reference source pick up the change.

def update():
    """ Get the selected data and update the data in the source
    """
    df_active = select_reviews()
    source.data = ColumnDataSource(data=df_active).data

Up until now, we have focused on updating data when the user interacts with the custom defined widgets. The other interaction we need to handle is when the user selects a group of points via the LassoSelect tool. If a set of points is selected, we need to get those details and display them below the graph. In my opinion this is a really useful feature that enables some very intuitive exploration of the data.

I will go through this function in smaller sections since there are some unique Bokeh concepts here.

Bokeh keeps track of what has been selected as a 1d or 2d array depending on the type of selection tool. We need to pull out the indices of all selected items and use that to get a subset of data.

def selection_change(attrname, old, new):
    """ Function will be called when the lasso select (or other selection tool)
    is used. Determine which items are selected and show the details below
    the graph
    """
    selected = source.selected["1d"]["indices"]

Now that we know what was selected, let’s get the latest dataset based on any filtering that the user has done. If we do not do this, the indices will not match up. Trust me, it took me a while to figure this out!

    df_active = select_reviews()
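The reason the indices fail to match is that the selection holds row positions within whatever data is currently in the source, not labels from the original file. A small pandas sketch with made-up data shows the difference:

```python
import pandas as pd

# Made-up data; the index labels mimic the original row numbers
df = pd.DataFrame(
    {"title": ["A", "B", "C", "D"], "price": [300, 20, 45, 15]},
    index=[10, 11, 12, 13],
)

filtered = df[df["price"] <= 50]   # rows B, C and D remain
selected = [0, 2]                  # positions as reported by the plot

# Correct: look up the positions in the filtered data that was plotted
print(filtered.iloc[selected]["title"].tolist())

# Wrong: the same positions in the unfiltered data point at other wines
print(df.iloc[selected]["title"].tolist())
```

The first lookup returns wines B and D, which are the ones actually selected on the plot, while the second returns A and C.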

Now, if data is selected, let’s get that subset of data and transform it so that it is easy to compare side by side. I used the style.render() function to make the HTML more styled and consistent with the rest of the app. As an aside, this new API in pandas allows for a lot more customization of the HTML output of a DataFrame. I’m keeping it simple in this case, but you can explore more in the pandas style docs.

    if selected:
        data = df_active.iloc[selected, :]
        temp = data.set_index("title").T.reindex(index=col_order)
        details.text = temp.style.render()
    else:
        details.text = "Selection Details"
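To see what the transformation produces, here is a sketch on two made-up reviews. It uses `DataFrame.to_html()` instead of the Styler, partly because it needs no extra dependencies and partly because newer pandas releases replaced `Styler.render()` with `Styler.to_html()`; treat the exact choice of HTML method as an assumption to adapt to your pandas version.

```python
import pandas as pd

col_order = ["price", "points", "variety", "province", "description"]

# Two made-up reviews standing in for the selected rows
data = pd.DataFrame({
    "title": ["Wine A", "Wine B"],
    "points": [90, 93],
    "price": [20.0, 35.0],
    "variety": ["Shiraz", "Riesling"],
    "province": ["Victoria", "Tasmania"],
    "description": ["Dark fruit.", "Crisp lime."],
})

# One column per wine, one row per attribute, in the order given by col_order
temp = data.set_index("title").T.reindex(index=col_order)
html = temp.to_html()

print(temp)
```

Transposing puts each wine in its own column, so comparing price or points for a handful of selected wines means reading across a single row.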

Here is what the selection looks like.

Now that the widgets and other interactive components are built and the process for retrieving and filtering data is in place, they all need to be tied together.

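For each control, register a callback on its value so that any change calls the update function. Bokeh passes the attribute name along with the old and new values to the callback, which is why the lambda accepts three arguments.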

controls = [province, price_max, title]

for control in controls:
    control.on_change("value", lambda attr, old, new: update())

If there is a selection, call the selection_change function.

source.on_change("selected", selection_change)
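The selection code above reflects the Bokeh release this app was written against. In Bokeh 1.0 and later, the selection is exposed as an object on the source, so the equivalent wiring looks roughly like the sketch below. The sample data and the `state` dictionary are made up for illustration.

```python
from bokeh.models import ColumnDataSource

# Made-up data standing in for the wine reviews
source = ColumnDataSource(
    data={"points": [88, 92, 95], "price": [15.0, 40.0, 120.0]}
)
state = {"last_selection": []}


def selection_change(attrname, old, new):
    # With this form, `new` is the list of selected row positions
    state["last_selection"] = list(new)


# Listen on the selection object's "indices" property
source.selected.on_change("indices", selection_change)

# Inside a callback, the current selection can be read the same way;
# here the indices are set from Python to stand in for a lasso selection
source.selected.indices = [0, 2]
print(source.selected.indices)
```

In a running Bokeh server app, the callback fires whenever the user changes the selection with the lasso or another selection tool.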

The next section controls the layout. We set up the widgetbox as well as the layout.

inputs = widgetbox(*controls, sizing_mode="fixed")
l = layout([[desc], [inputs, p], [details]], sizing_mode="fixed")

We need to do an initial update of the data, then attach this model and its layout to the current document. The last line adds a title for the browser window.

update()

curdoc().add_root(l)
curdoc().title = "Australian Wine Analysis"

If we want to execute the app, run this from the command line:

bokeh serve winepicker.py

Open up the browser and go to http://localhost:5006/winepicker and explore the data.