I’ve been using Python for exploratory data analysis and, through my studies, I was introduced to a wide variety of visualization tools. I started with matplotlib, then moved on to more complex plots with seaborn and interactive ones with Bokeh and Plotly. Each of these tools works differently and is capable of different things. Here is a summary of my thoughts and experience with them.

I will be presenting some visualization examples for each tool. All of them are based on the Iris dataset, which can be loaded with seaborn:
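A minimal sketch of that loading step, assuming seaborn's load_dataset helper. The wrapper name load_iris is my own, chosen only for illustration:

```python
import seaborn as sns


def load_iris():
    """Return the Iris dataset as a pandas DataFrame.

    sns.load_dataset fetches the CSV from seaborn's online data repository
    and caches it, so the first call needs an internet connection.
    """
    return sns.load_dataset("iris")
```

Calling iris = load_iris() should give a DataFrame with four measurement columns (sepal_length, sepal_width, petal_length, petal_width) and a species column.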

Matplotlib

Matplotlib is the basis for static plotting in Python. Many other visualization tools are built on top of it, such as seaborn and the plot method of pandas DataFrames. It is versatile, meaning it can plot almost anything, but non-basic plots can be very verbose and complex to implement. On the other hand, basic plots such as histograms and scatter plots are very easy to do. If your goal is to make quick, simple plots, there is no big drawback. However, for more complex plots such as pair plots and heat maps, a higher-level tool is worth using.

The matplotlib website contains very thorough documentation, some tutorials and a huge variety of examples, which makes it a lot easier to use.

We can quickly build a standard histogram with the following lines:
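Something along these lines would do it. This is a sketch rather than the exact code I ran; the function name and the choice of the sepal_length column are assumptions for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_histogram(df: pd.DataFrame, column: str = "sepal_length", bins: int = 10):
    """Draw a plain histogram of one numeric column and return the Axes."""
    fig, ax = plt.subplots()
    ax.hist(df[column], bins=bins)
    # Matplotlib does not label axes for us, so we do it by hand.
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    return ax
```

With the Iris DataFrame loaded as iris, plot_histogram(iris) followed by plt.show() displays the figure.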

However, we need to overlap plots for a simple customization like coloring according to the species:

Given that much effort for a basic task, I recommend using seaborn for plotting anything multi-dimensional.
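A sketch of the overlapping approach, assuming a species column and a hypothetical helper name. Each species gets its own hist call on the same Axes, with shared bin edges so the bars line up:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_histogram_by_species(df: pd.DataFrame, column: str = "sepal_length", bins: int = 10):
    """Overlay one semi-transparent histogram per species and return the Axes."""
    fig, ax = plt.subplots()
    # Shared bin edges keep the overlapping histograms comparable.
    edges = np.histogram_bin_edges(df[column], bins=bins)
    for species, group in df.groupby("species"):
        ax.hist(group[column], bins=edges, alpha=0.5, label=species)
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    ax.legend(title="species")
    return ax
```

The loop, the manual bin edges and the explicit legend are exactly the kind of bookkeeping that higher-level tools take care of.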

Seaborn

Seaborn is my go-to tool for static plotting. It integrates very well with pandas DataFrames, making it possible to assign column names to the axes, which makes the code clearer. The plots look better by default and are easy to customize with color palettes. There are many built-in complex plots like cluster maps, which are very convenient for analyzing data quickly and effectively.

Complex plots take little effort thanks to features like automatic axis labels and grouping with specialized support for categorical variables. Even complex tasks like multi-plotting are abstracted into a high-level grid structure.

Pair plots are my favorite statistical plot for understanding mixed data, and they are generated automatically with the pairplot function:

Documentation is my favorite thing about seaborn. Everything is minimalist and very well organized. All parameters are explained in simple language, and didactic examples on the same documentation page demonstrate how each parameter affects the visualization. So, everything you need is in one place, and in a non-overwhelming way.

Looking at the examples, you can find the most interesting plots to analyze a specific dataset. For example, I thought the strip plot would give me good insights for the Iris Dataset.
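A sketch of such a strip plot. The reshaping with melt, the column names measurement and value, and the legend options are my assumptions for illustration, not necessarily the original layout:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def species_stripplot(df: pd.DataFrame):
    """Strip plot of all measurements, split by species; returns the Axes."""
    # Long format: one row per (flower, measurement) pair.
    long_df = df.melt(id_vars="species", var_name="measurement", value_name="value")
    fig, ax = plt.subplots()
    ax = sns.stripplot(data=long_df, x="measurement", y="value",
                       hue="species", dodge=True, ax=ax)
    # stripplot returns a matplotlib Axes, so plain pyplot calls still apply.
    plt.legend(title="Species", loc="upper left")
    return ax
```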

The style similarity between this plot and matplotlib’s is noticeable. In fact, the return type of stripplot() is a matplotlib Axes, meaning we can use all its methods if we want to add or change something in what seaborn generated. In the strip plot example, we added a legend with the plt module.

Bokeh

Bokeh is a web-focused tool for creating interactive plots. It supports streaming datasets and integrates with pandas through the ColumnDataSource class. Built-in tools can be added to a toolbar attached to the plot and used to explore the data interactively, for example by zooming in, selecting points or overlaying a crosshair.

Plotting with this tool is different because it is built around glyphs. bokeh.models is a low-level interface that implements glyphs like Line and Circle, which are visual shapes whose properties, such as coordinates, size and color, are tied to the data. The higher-level interface bokeh.plotting offers methods to display glyphs, but it is not as abstracted as one might expect. To plot a histogram, for example, the edges of each bin must be calculated to build two lists, which are passed to the plotting method quad, which displays Quad glyphs. The resulting plot is pretty and interactive, but a lot of manual work is required.
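The histogram recipe just described can be sketched as follows, assuming NumPy for the binning. The function name, the tool list and the styling are illustrative choices:

```python
import numpy as np
import pandas as pd
from bokeh.plotting import figure


def histogram_figure(df: pd.DataFrame, column: str = "sepal_length", bins: int = 10):
    """Build a Bokeh histogram from Quad glyphs and return the figure."""
    # Bokeh has no histogram call here: we bin the data ourselves.
    counts, edges = np.histogram(df[column], bins=bins)
    plot = figure(title=f"Histogram of {column}",
                  x_axis_label=column, y_axis_label="count",
                  tools="pan,box_zoom,wheel_zoom,crosshair,reset")
    # One rectangle per bin: left/right edges from the bins, height from the counts.
    plot.quad(top=counts, bottom=0, left=edges[:-1], right=edges[1:],
              fill_alpha=0.6, line_color="white")
    return plot
```

Passing the result to bokeh.plotting.show opens it in the browser, with the listed tools available in the toolbar.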

Bokeh server is a Tornado-based application for building interactive web applications. Plots can also be put on the web with bokeh.embed, whose functions translate plots into JavaScript and HTML components. Both allow callback functions (Python callbacks on the server, JavaScript callbacks in standalone embedded plots), bringing interaction to another level by reacting to sliders or any other input supported by HTML.
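As a rough sketch of the embedding route, bokeh.embed.components turns a plot into a script and a div that can be dropped into any HTML template. The function name and the plotted columns are assumptions:

```python
import pandas as pd
from bokeh.embed import components
from bokeh.plotting import figure


def embed_scatter(df: pd.DataFrame):
    """Return (script, div) HTML snippets for a simple scatter plot."""
    plot = figure(title="Sepal length vs. sepal width")
    plot.scatter(df["sepal_length"].tolist(), df["sepal_width"].tolist(), size=6)
    # script holds the JavaScript that renders the plot into the div placeholder.
    script, div = components(plot)
    return script, div
```

The page that receives these snippets also has to load the BokehJS library for the script to run.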

Bokeh’s documentation is organized according to its modules, meaning one has to understand the modules to navigate and find specific information. Apart from that, the documentation is complete and provides some examples as well.

When multiple plots share the same range, Bokeh synchronizes them so that interactions such as panning or zooming in one plot are applied to the others as well. This is really useful for exploring and comparing details of the data.
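A sketch of how linked plots can be set up, assuming two scatter plots with the same x variable. Passing the first figure's x_range to the second is what ties them together, and the shared ColumnDataSource additionally links selections:

```python
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def linked_scatter_grid(df: pd.DataFrame):
    """Return (left, right, grid): two scatter plots with a shared x range."""
    source = ColumnDataSource(df)
    tools = "pan,wheel_zoom,box_select,reset"
    left = figure(title="Sepal width", tools=tools)
    left.scatter("sepal_length", "sepal_width", source=source, size=6)
    # Reusing left.x_range makes panning or zooming one plot move the other.
    right = figure(title="Petal length", tools=tools, x_range=left.x_range)
    right.scatter("sepal_length", "petal_length", source=source, size=6)
    return left, right, gridplot([[left, right]])
```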

The bokeh.io module can generate a standalone .html file containing the plot and all its interactive features. The file can be opened directly in a browser or served by a web server, making it very easy to deploy.
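A minimal sketch of the export, assuming bokeh.io.save with the BokehJS resources loaded from the CDN. The function name and the page title are placeholders:

```python
from bokeh.io import save
from bokeh.resources import CDN


def export_html(plot, path: str) -> str:
    """Write the plot to a standalone HTML file and return the file path."""
    # CDN resources keep the file small: BokehJS is loaded from the web.
    return save(plot, filename=path, resources=CDN, title="Iris plot")
```

The resulting file keeps the toolbar and every client-side interaction, since all of that runs in the browser.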

Plot.ly

Plot.ly is a JSON-based plotting tool for interactive visualization. Every graph can be defined by a JSON object with two keys named data and layout. It also offers tools for interaction, and they are much more straightforward to use and customize. Most plots come with hover labels and legends for groups, both interactive by default, which is awesome.

We can build Flask applications to deploy Plot.ly plots by using the Dash framework. With Dash we can use callback functions to interact with HTML components such as dropdowns and buttons.

One very exciting thing about Plot.ly is the Chart Studio. It makes it possible to create and edit plots with an online tool anyone could use. Most plot parameters can be set in a human-friendly interface, and the data can be imported from a CSV file or a connection to a SQL database. The one thing I missed was a way to export the created plot in the JSON format. This being the case, one has to manually reproduce the creation in a computer-friendly format, such as a Python dictionary.

There is a library full of categorized examples editable with Chart Studio, making this tool very easy to explore. On the other hand, the documentation is a mess. Everything is on one big and heavy webpage, and the best way to navigate it is the browser’s native search bar. Descriptions of the JSON objects used to specify details of the plot are displayed vertically in a tree, but everything is static, and as the page is really big, it is hard to scroll through. Information about some parameters is missing, and most of them are not well explained.

The plot type can be defined with the type item inside the data definition. In the example, I define a different violin for each measurement for each species, coloring according to the measurement. By defining the legendgroup of each plot, clicking on the legend will make multiple plots appear or disappear.

There is a reasonable amount of manual work required, but the final code looks clean and the plot is useful, interactive and pretty.

Final Thoughts

All of these tools offer a huge variety of plots, and choosing between them can be even more difficult than choosing the tool itself. Whichever tool you choose, I would recommend taking a close look at the examples provided and understanding how they are organized. I recently found a website called From data to viz, which has some interesting guidelines for choosing the plot type according to the features of the dataset being analyzed.

I choose seaborn when I don’t need interaction features because plotting and customization are much easier. For interactive plots, however, I don’t think there is a single best choice. My personal way of thinking makes me like Plot.ly better, thanks to its dictionary structure. I find Bokeh’s glyph approach to be more design-oriented.

You can make a better choice by looking at the galleries of these tools and trying them out yourself. Remember to also take a look at the documentation so you can get an idea of how much work it will take to make your own plots. Anyway, I hope my experience helps you in some way.