We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.

By Yan Holtz

Data visualization is a key step in the data science process. Choosing the right graphic to explore data or to convey insight efficiently is an everyday task for a data scientist.

From Data to Viz is a classification of chart types based on input data format. It comes in the form of a decision tree leading to a set of potentially appropriate visualizations to represent the dataset.

A decision tree for data visualization

The project is built on two underlying philosophies. First, that most data analysis can be summarized in about twenty different dataset formats. Second, that both data and context determine the appropriate chart.

Thus, the suggested method consists of identifying and trying all feasible chart types to find out which one best suits your data and your idea.

Once this set of graphics is identified, data-to-viz.com aims to guide you toward the best choice.





Figure 1: Data to Viz poster displaying the decision trees. See online for an interactive version. Prints are available.

Example

Let’s consider a dataset composed of one numeric and one categorical variable. For instance, the quantity of weapons exported (numeric) per country (categorical). What kind of visualization can we apply to it?

This data format corresponds to one branch of the decision tree: the dataset contains both numeric and categorical data, with one variable of each and only one observation per group. The decision tree suggests many appropriate chart types:





Of course, the most common solution is probably to build a barplot. The lollipop plot is a good alternative with so many groups, resulting in a less cluttered figure. Note that both can be done in a circular version, giving a more eye-catching but less accurate output. Last but not least, treemap and circular packing are good options if you’re interested in how the whole is divided.
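To make the idea of a branch concrete, here is a minimal sketch in Python of how such a lookup could work. It is not the code behind data-to-viz.com: the function name `suggest_charts` and its parameters are hypothetical, and only the single branch discussed in this example is encoded, using the chart names mentioned in this article.

```python
from typing import List


def suggest_charts(n_numeric: int, n_categorical: int,
                   one_obs_per_group: bool) -> List[str]:
    """Return candidate chart types for a dataset format.

    Hypothetical illustration: only the branch described in this
    article (one numeric variable, one categorical variable, one
    observation per group) is covered.
    """
    if n_numeric == 1 and n_categorical == 1 and one_obs_per_group:
        return [
            "barplot",
            "lollipop plot",
            "circular barplot",
            "treemap",
            "circular packing",
        ]
    # Every other branch of the real decision tree is left out of this sketch.
    return []


# Weapons exported (numeric) per country (categorical), one value per country.
for chart in suggest_charts(n_numeric=1, n_categorical=1, one_obs_per_group=True):
    print(chart)
```

The function only narrows the field. Choosing among the candidates still depends on the context and on the message you want to convey.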

Here is an overview of the charts you get when these options are applied to the weapons dataset:





Figure 2: Five representations of the same dataset. From left to right, top to bottom: barplot, lollipop plot, circular barplot, circular packing, treemap.

Content

The website does not only lead you to a set of potential visualizations. It also aims to help you pick the right one. Several sections help with this task:

– portfolio – an overview of all chart possibilities. For each, an extensive description is given, showing variations, pros and cons, common pitfalls and more.

– stories – for each input data format, a real-life example is analyzed to illustrate the different chart types applicable to it. Each story also explains which graphic best answers which question.

– caveat gallery – a list of common dataviz pitfalls, with suggested workarounds.

About the caveat gallery

The best way to visualize data efficiently is probably to avoid the most common pitfalls. The caveat gallery lists about 40 common caveats, and the list is still growing.

For instance, it points out that a barplot is much more insightful when its bars are ordered. It is also good practice to make the barplot horizontal when the labels are long. If you have many bars, the lollipop plot is probably a good alternative to declutter the graphic and avoid a moiré effect. The many downsides of pie and donut charts are also described, even though these charts are still suggested in the decision tree.
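The barplot advice above can be expressed as a small preparation step before plotting. The sketch below is only an illustration: the helper `prepare_bars`, the two thresholds (12 characters for a "long" label, 20 bars for "many" bars) and the sample values are assumptions made for this example, not rules or data taken from the caveat gallery.

```python
from typing import Dict, List, Tuple


def prepare_bars(values: Dict[str, float],
                 long_label_chars: int = 12,
                 many_bars: int = 20) -> Tuple[List[Tuple[str, float]], bool, str]:
    """Apply three barplot caveats to a {label: value} mapping.

    The thresholds are arbitrary assumptions chosen for illustration.
    """
    # Caveat 1: order the groups by value, largest first.
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    # Caveat 2: prefer a horizontal layout when any label is long.
    horizontal = any(len(label) > long_label_chars for label, _ in ordered)
    # Caveat 3: with many groups, a lollipop plot is less cluttered.
    chart = "lollipop plot" if len(ordered) > many_bars else "barplot"
    return ordered, horizontal, chart


# Made-up values, used only to show the behaviour of the helper.
sample = {"Group A": 3, "Group B with a long name": 7, "Group C": 5}
ordered, horizontal, chart = prepare_bars(sample)
print(ordered)
print(horizontal, chart)
```

The ordered pairs can then be passed to the plotting library of your choice, with the orientation flag deciding between vertical and horizontal bars.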





Figure 3: overview of the gallery of dataviz caveats

Building your chart

From Data to Viz aims to give general advice on data visualization and does not specifically target coders.

However, 100% of the charts are made using R, mostly with ggplot2 and the tidyverse. The reproducible code snippets are always available. Most of the website is built with R Markdown, relying on a fair number of hacks described here.

The website is tightly linked with the R graph gallery and the Python graph gallery. Once you’ve identified the graphic that suits your needs, you will be redirected to the appropriate section of the gallery to get the code in your favourite language.

Conclusion

Dataviz is a world with endless possibilities and this project does not claim to be exhaustive. However, it should provide the user with a good starting point. Moreover, it is a valuable tool for students and people willing to learn more about data visualization best practices.

The project is hosted on GitHub. Any comment, issue or pull request is very welcome. You can also reach me on Twitter (@R_Graph_Gallery) or drop me an email at yan.holtz.data@gmail.com.

Bio: Yan Holtz is a passionate data analyst and bioinformatician currently working for the Queensland Brain Institute in Brisbane. He has a special interest in data visualization, which led him to build the R and the Python graph galleries. He can be reached at: yan.holtz.data@gmail.com.
