Over the past days I had the pleasure of spending time with Tableau Zen Master Joe Mako. It was a pleasure not only for the education (which was exceptional), but also for the example he sets (which was extraordinary). Joe is a kind & gentle man, devoted to improving the planet with compassion and improving the lives of others.

One vehicle for his devotion has been 10,000 hours of reverse engineering Tableau. Hence, when Joe describes The Art and Science of Tableau, those with an interest in breaking through the conceptual barriers to reach a Master's Plateau listen.

Joe's talk was aimed at the Relational end of the learning spectrum. Rather than step-by-step instructions, he spoke of the underlying concepts that a data chef should seek to understand. He taught that understanding the relationship between these concepts will enable you to venture beyond cook-book instructions to create something new.

Below are the various topics that Joe touched upon yesterday. For each of them I provide a brief summary, with links to additional learning.

Data Densification

Data Blending 1 vs. Data Blending 2

Domain Padding

Scaffolding

The Four Pill Types

Data Densification

Each mark within a vis in Tableau represents one record of data. Yet not every such record comes from your underlying data source. Tableau's data interpreter engine will reshape the source data, adding records as required, to render the requested visualization.

As an aside, for a Jedi level understanding of the data interpreter engine that performs this reshaping, a TCC13 session follows the lifespan of a click from browser to core and back again, through the desktop & server data interpreter and rendering systems. That session was titled Understanding Tableau’s Rendering Pipeline and the Impact on Your Views. A recording can be found here:

Sessions

So then! Data densification. In the sheet below, with no sales for Tea in the South, we see that Tableau has printed 15 marks onto the vis. Each of those 15 marks is sourced from the underlying Coffee Chain sample data.

However, when replacing sum of sales with either a Running Total or an INDEX() table calc, data densification occurs. What were once 15 marks from the database are now 16 marks rendered onto the vis. Both sheets are rendered from the same query to the underlying data.

Densification, then, is the work of the data interpreter, which reshapes the data and adds rows as necessary to render your requested vis. In the rendering pipeline, densification occurs after querying the data source & before rendering the vis.
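As a rough mental model only (this is not Tableau's actual implementation), densification can be pictured as filling in the full grid of dimension values after the query returns and before a table calc runs. The sketch below assumes four product types and four markets with made-up sales figures; the function names `query_result`, `densify`, and `running_total` are hypothetical.

```python
from itertools import product

# Assumed dimension members, for illustration only.
PRODUCT_TYPES = ["Coffee", "Espresso", "Herbal Tea", "Tea"]
MARKETS = ["Central", "East", "South", "West"]


def query_result():
    """Simulate the aggregated query: 15 rows, no Tea in the South.

    The sales figures are invented for the example.
    """
    rows = {}
    for i, (market, ptype) in enumerate(product(MARKETS, PRODUCT_TYPES)):
        if (market, ptype) == ("South", "Tea"):
            continue  # this combination has no sales in the source data
        rows[(market, ptype)] = 100 + 10 * i
    return rows


def densify(rows):
    """Add a placeholder (None) for every combination missing from the query."""
    return {key: rows.get(key) for key in product(MARKETS, PRODUCT_TYPES)}


def running_total(dense):
    """Running sum along Product Type, restarting for each Market."""
    totals = {}
    for market in MARKETS:
        total = 0
        for ptype in PRODUCT_TYPES:
            value = dense[(market, ptype)]
            total += value if value is not None else 0
            totals[(market, ptype)] = total
    return totals


source = query_result()
dense = densify(source)
print(len(source), "rows from the query")
print(len(dense), "cells after densification")
print(running_total(dense)[("South", "Tea")])
```

In this model the extra cell for Tea in the South carries a running total even though no row for it came back from the query, which is one way to picture where the 16th mark comes from.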

In other scenarios, another such reshaping action performed by the data interpreter is called an UNPIVOT. With this post already at 2,000 words, I'll need to save the UNPIVOT for another day.

Disabling Densification

How do you make this extra mark go away, without impacting the computation? In other words: How do you turn off data densification?

In this example:

Compute Using is already set to Product Type

Place a second Product Type pill onto the Detail shelf

Change the Product Type on the Columns shelf from a Dimension to an Attribute

Here is where an awareness of the 4 Pill Types becomes important, along with the impact of using either a Discrete Dimension or a Discrete Measure on the Columns/Rows while at the same time employing a table calc for the same field with compute using.

Joe began his talk with the 4 Pill Types. I've started here with Densification instead, to show a real-world example of why understanding the 4 Pill Types is so important.

This Coffee Chain example above is a concise & visual summary of Joe's own explanation of data densification, for which the original conversation is available here:

Table Calculations - Why apply them 'By Cell'

The above is also just one example. It is important to understand that densification occurs. And the form it takes will vary based on the layout of your visual canvas.

Densification depends on pill type, pill arrangement, whether domain padding is being requested via Show Missing Values, compute using settings of table calcs, mark types, data structure/density, etc.

Order of Operations

Like many of the master concepts, one's understanding of data densification is highly correlated with one's understanding of the internal order of operations within Tableau. A constantly moving target, this internal order of operations seems to change with each new version. Hence, I would argue, the onus is now on Tableau to publish official documentation, and to maintain it with each new release. Transparency is vital to helping us understand & accurately anticipate the behaviour of the tool. The mysterious black box is nobody's friend.

Data Blending 1 vs Data Blending 2

In Joe's words:

DB1: blending occurs at the level of detail rendered in the vis

DB2: blending occurs at a level of detail deeper than the marks in the vis

So it follows, with DB2, a summarization of the blended data set is rendered

Jonathan Drummey offers a deeper explanation. Prior to Version 8, “Data Blending 1” was based upon:

The dimensional relationships between the primary & secondary data sources either automatic, or customized via Data->Edit Relationships…

The dimension pills in the view on Rows, Columns, Pages, or a Marks Card

And starting from Version 8, “Data Blending 2” offers new and useful complexity. DB2 is based upon:

The dimensional relationships between the primary & secondary data sources either automatic, or customized via Data->Edit Relationships…

Those dimensions with linking turned on

Linking dimensions may or may not be in the view

Linking for a related dimension that is in the view can be disabled
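To make the DB1 vs. DB2 distinction concrete, here is a minimal sketch of one way to picture it. It is a conceptual model under my own assumptions, not Tableau's internal algorithm: the `blend` function, the field names (`region`, `state`, `sales`, `target`), and all the numbers are hypothetical. The primary source is aggregated to the view dimensions plus any linking dimensions, the secondary is aggregated to the linking dimensions, the two are left-joined, and the result is summarized up to the view.

```python
from collections import defaultdict


def aggregate(rows, keys, measure):
    """Sum `measure` for each distinct combination of `keys`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


def blend(primary, secondary, view_dims, link_dims):
    """Left-join secondary onto primary at the linking level, then roll up to the view."""
    # Level of detail at which the join happens: view dims plus extra linking dims.
    detail_dims = list(view_dims) + [d for d in link_dims if d not in view_dims]
    primary_agg = aggregate(primary, detail_dims, "sales")
    secondary_agg = aggregate(secondary, link_dims, "target")

    result = defaultdict(lambda: {"sales": 0.0, "target": 0.0})
    for key, sales in primary_agg.items():
        row = dict(zip(detail_dims, key))
        link_key = tuple(row[d] for d in link_dims)
        view_key = tuple(row[d] for d in view_dims)
        result[view_key]["sales"] += sales
        # Secondary rows with no match in the primary never reach the view.
        result[view_key]["target"] += secondary_agg.get(link_key, 0.0)
    return dict(result)


primary = [
    {"region": "East", "state": "NY", "sales": 100},
    {"region": "East", "state": "NJ", "sales": 50},
    {"region": "West", "state": "CA", "sales": 200},
]
secondary = [
    {"region": "East", "state": "NY", "target": 80},
    {"region": "East", "state": "PA", "target": 40},
    {"region": "West", "state": "CA", "target": 150},
]

# DB1 style: the linking dimension is the one rendered in the view.
db1 = blend(primary, secondary, view_dims=["region"], link_dims=["region"])
# DB2 style: linking on a dimension deeper than the marks in the view.
db2 = blend(primary, secondary, view_dims=["region"], link_dims=["state"])
print(db1)
print(db2)
```

In this toy data the East target differs between the two calls, because linking at the state level drops the secondary row that has no matching state in the primary before the summary to region is taken.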

For further reading on DB1 vs. DB2, the detailed content from Jonathan is found here:

Identifying (and Using) Quick Filter Selection Status

Domain Padding

Domain padding is what you get by turning on Show Missing Values for dates or bins. It works between the minimum and maximum range of existing values in a pane.

For example, in the workbook attached to idea #1796, there are 4 rows of data with times that range only from midnight until noon.

When we turn on domain padding via Show Missing Values, the data is padded between the range of existing values. That is, only between midnight and noon:

This is what domain padding does. The crux of idea #1796 is to allow for the minimum and maximum values for domain padding to be determined by the user.

In the absence of this user defined range, Jonathan describes the need for work-arounds to either pad the data manually with Custom SQL or a query, or to partially pad the data using one or two rows that have the min and/or max values for the field in question & use Tableau's domain padding from there.
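The behaviour and the boundary-row work-around can be pictured with a small sketch. This is only an illustration of the idea, not how Tableau implements Show Missing Values; the `pad_domain` function and the specific hour values are assumptions made up for the example.

```python
def pad_domain(hours):
    """Return every whole hour between the smallest and largest existing value."""
    if not hours:
        return []
    return list(range(min(hours), max(hours) + 1))


# Four hypothetical rows, with times only between midnight (0) and noon (12).
observed = [0, 3, 7, 12]

# Padding fills the gaps, but only inside the existing range.
padded = pad_domain(observed)
print(padded)

# Work-around: add boundary rows holding the min and max you actually want,
# then let the padding fill everything in between.
with_boundary_rows = pad_domain(observed + [0, 23])
print(with_boundary_rows)
```

The first call stops at noon because nothing later exists in the data; the second covers the full day only because a row for hour 23 was added by hand.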

So why is Domain Padding listed here as a Master Tableau Concept? When combined with the scaffolding techniques described below, creative use of Domain Padding opens new frontiers to the Tableau data chef.

Jonathan offers a detailed review of domain padding in his post:

Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8

.. upon which Michael Sandberg builds further in Part 4 of his series on data blending:

An Introduction to Data Blending - Part 4 (Data Blending Design Principles)

Scaffolding

Scaffolding is a term coined by Joe Mako to describe the process of

Using a "scaffold data source" in Tableau to build up a temporary structure for the purpose of painting data onto it.
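One loose way to picture this, as a sketch under my own assumptions rather than Joe's actual workbook technique: the scaffold defines the slots that will exist in the view, and the real data is then looked up and painted into those slots. The names `build_scaffold` and `paint`, and the sample values, are hypothetical.

```python
def build_scaffold(start, end):
    """Expand two boundary values into one slot per integer in between."""
    return list(range(start, end + 1))


def paint(scaffold, data):
    """Attach data to each scaffold slot; slots with no data stay empty (None)."""
    return [(slot, data.get(slot)) for slot in scaffold]


# The scaffold source needs only its two boundary values.
scaffold = build_scaffold(1, 5)

# Sparse "real" data, keyed by the same slot numbers.
sparse_data = {2: "a", 4: "b"}

print(paint(scaffold, sparse_data))
```

The point of the structure is that the layout is decided by the scaffold, not by whatever rows happen to exist in the data being painted.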

During his talk yesterday, Joe described the process of achieving flow by understanding & working with (not against) the shape that Tableau expects from your underlying data. He showed an example where, by brute force, someone had built a pixel-perfect dashboard through the meticulous alignment of 80 separate worksheets. What a tedious pain!

With his scaffolding approach, Joe was able to reproduce the same dashboard using only four worksheets. His achieving flow message clearly had our attention.

Joe gives a detailed overview of his novel scaffolding technique in a Think Data Thursdays session. To study up on this one, set aside an hour of focus time & search for Data Scaffolding in the TDT Video Library:

Space: TDT Video Library | Tableau Support Community

For a powerful look at just what can be done with a combination of Scaffolding and the creative use of Domain Padding, see Jonathan's post below on Basic Monte Carlo Simulations. You can render up to 1 Million marks on Tableau Public (up to 2.5 Million on Tableau Desktop), with a data source that uses only two rows.

Basic Monte Carlo Simulations in Tableau

This, to me, is the Master Tableau Mindset.

The Four Pill Types

This topic is where Joe began his talk on The Art and Science of Tableau. And I've saved it for last. So without further ado here is the Master's key:

Discrete vs. Continuous affects rendering

Dimension vs. Measure affects computation

Any pill can be any of the four combinations

Any field can be used multiple times, in multiple ways

Please rest here for a moment and ponder those points above. Let them sink in. Spend time with them. Allow them to become a fundamental understanding upon which your Tableau work migrates towards flow.

For each and every pill, the choice is up to you. If you want to use two or more pills from the same field in your vis, carry on! Tableau is agnostic to the data. Just think of how creative you can be!

Discrete vs. Continuous = Render

The choice between blue = discrete and green = continuous ultimately drives how the vis will render. This is an important distinction that affects the behaviour of marks, filters, colors, dates, and more.

Dragging a discrete (blue) pill onto the rows or columns shelf produces a hierarchy of headers. It doesn't matter if the pill is a dimension or a measure. This understanding opens doors that were previously closed. For example, the ability to sort by a table calculation.

Tableau normally allows a dimension to sort by a regular aggregate, but not by a table calculation. Now with our improved understanding of the 4 pill types' flexibility, we know that a second copy of the table calc, used as a discrete pill to the left of the dimension you wish to sort, will sort your table calc values alphanumerically. This trick is number 4 out of 15 such table calc innovations published by Jonathan Drummey.

Top 10 Table Calculations - The Next N, Where N >= 15
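Why the trick works can be sketched in a few lines. This is a simplified picture, assuming that discrete pills on a shelf produce nested headers ordered left to right; the `header_order` function and the `Rank` values are hypothetical stand-ins for a table calc result.

```python
def header_order(rows, pills):
    """Order rows the way nested discrete headers would: leftmost pill first."""
    return sorted(rows, key=lambda row: tuple(row[p] for p in pills))


rows = [
    {"Product": "Coffee", "Rank": 2},
    {"Product": "Espresso", "Rank": 3},
    {"Product": "Tea", "Rank": 1},
]

# Dimension alone: headers follow the dimension's own order.
by_product = [r["Product"] for r in header_order(rows, ["Product"])]

# Discrete copy of the calc placed to the left: its values drive the order.
by_rank = [r["Product"] for r in header_order(rows, ["Rank", "Product"])]

print(by_product)
print(by_rank)
```

Because the calc's header sits to the left, the dimension's headers are nested inside it and inherit its order.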

For more classic material on continuous vs. discrete pills, Tom Brown from the Information Lab offers a complete write-up of all the various details here:

Blue things and Green things

Dimension vs. Measure = Computation

The choice of whether a pill is treated as a dimension or measure drives how computations are performed.

Dimension = GROUPBY

Measure = Aggregation

Dimension pills are the equivalent of a GROUPBY in the underlying query to the data source. Adding new dimensions changes the means by which calculations will occur when measure pills are brought into view.

Placing a measure onto the vis renders marks, which represent an aggregation of the measure's values. The computation of that aggregation will depend upon the layout of the dimension pills.

In the two screen shots below, after the HOUR dimension is added, notice how the same SUM(Number of Records) is now performed to GROUP BY the HOUR.
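The same idea as a minimal sketch, with made-up rows: the measure is always an aggregation, and the dimensions in play decide how that aggregation is grouped. The function name `sum_number_of_records` and the sample hours are assumptions for illustration.

```python
from collections import Counter


def sum_number_of_records(rows, dimensions=()):
    """Count records per distinct combination of the given dimensions."""
    counts = Counter(tuple(row[d] for d in dimensions) for row in rows)
    return dict(counts)


# Four hypothetical records, each tagged with an hour of the day.
rows = [{"hour": 0}, {"hour": 0}, {"hour": 6}, {"hour": 12}]

# No dimension in the view: a single aggregate over every record.
print(sum_number_of_records(rows))

# HOUR added as a dimension: the same aggregate, now grouped by hour.
print(sum_number_of_records(rows, dimensions=("hour",)))
```

With no dimensions the key is an empty tuple, so there is one group holding every record; adding the hour splits that single total into one total per hour.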

Every pill is flexible. To make a change, you can simply right-click & choose between measure, dimension, discrete, or continuous. If you need to use two pills for the same field, carry on! Tableau is agnostic to the data.

What's more, as we've seen earlier, specific tasks are only accomplished in this way. Sorting by a table calc requires a second copy of the same table calc, this time as a discrete pill. The disabling data densification example places a discrete pill onto the Columns/Rows while at the same time employing a table calculation using a second pill for the same field & choosing itself as the compute using.

In Summary

Joe Mako's visit to San Francisco for the Tableau 8.2 Launch Event was a special occasion. He spoke to the "relational end" of the learning spectrum and focused on achieving flow by working with (not against) the tool. He touched on a variety of concepts during a brief period of time. And many of those concepts can only be described using complex vocabulary:

Data Densification

Data Blending 1 vs. Data Blending 2

Domain Padding

Scaffolding

The Four Pill Types

In this post I have summarized those concepts & provided links to more detailed learning material. Many of the sources I've cited here required extensive searching (with specific google hacks) to find. This illustrates just how disparate these sources of "master knowledge" currently are.

In compiling this post I've observed that many roads lead back to Joe Mako and Jonathan Drummey. They are good people doing good things. If you want to follow in their footsteps and also contribute good to the world, please consider lending your skills to DataKind:

DataKind | DataKind

May we all gain a "relational understanding" of these underlying concepts in Tableau and reach the Master's Plateau. And may we all apply our skills towards solutions with a greater good than selling advertising.

The version 8.2 launch event was far more than just a good party.

Twitter / DataPsientist: It's a @tableau party with ...

Word Count: 2,199

References