This post helps you understand how the granularity and the shape of your underlying data affect the visualization work you do in Tableau.

I was recently stumped by a common problem, one for which Jonathan Drummey started a page back in 2012 to document the many scenarios in which this type of complication can occur.

My particular scenario was headcount, given arrival & departure dates. And Jonathan's collection of similar scenarios is "Utilization, Queues, Census, Throughput for Blocks, Floors, Units, Rooms, Offices, etc. - What do you do?"

The most interesting thing about the scenarios in Jonathan's collection is not each individual solution in isolation, but rather the underlying pattern behind those solutions: what they share in common.

And when Joe Mako helps me get through something on a Sunday afternoon, you know the answer is worth sharing!

The number one, most important facet of learning Tableau, and of learning from Joe, is to recognize the patterns that recur. By recognizing common patterns when working with data, and by learning the behaviors of Tableau, one learns to reach a flow state when encountering similar situations in the future, even as the details vary.

So let’s look at the patterns.

The Scenarios

shift starts & shift ends for employees

utilization rates with start time & end time

work orders that arrive, take different amounts of time to process, and are then completed

patient records with admit date & discharge date

etc.

The Pattern

What each scenario has in common is the shape of the data:

One row per incident, with two date columns per row: one date for the beginning & another date for the end.

And the type of question being asked is always something like:

Among these observations, compare the difference between the two date values
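As a concrete illustration of this shape and this question, here is a minimal pandas sketch (the work-order data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical lookup-shaped data: one row per work order,
# with both the begin date and the end date on that row.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "open_date": pd.to_datetime(["2023-04-20", "2023-04-25", "2023-04-26"]),
    "close_date": pd.to_datetime(["2023-04-27", "2023-04-26", "2023-04-30"]),
})

# The question compares the two date values on each row:
orders["days_open"] = (orders["close_date"] - orders["open_date"]).dt.days
print(orders["days_open"].tolist())  # [7, 1, 4]
```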

Tableau Prefers Tidy

In his paper on the subject of tidy data in the Journal of Statistical Software³, Hadley Wickham draws this definition:

1. Each variable forms a column
2. Each observation forms a row
3. Each type of observational unit forms a table

To which, I would add:

4. Related tables share a common column that allows them to be linked

Tableau prefers tidy data. And rightly, as an analyst, you prefer tidy data too!

A Discussion of Shape

But wait. My data is tidy.

For all of these scenarios, every data set has a single column per variable. Each type of observational unit is contained in a single table. And each observation is on its own row.

The survey data has one new row for every survey response

The work order data has one new row per work order

The patient records have one new row per patient

etc.

So here we make the distinction:

Lookup vs. Transactional

The data structure, or shape, for each of these scenarios is indeed perfectly tidy: as a lookup table. And the tidy lookup shape of these data sets can easily answer lookup questions, like:

Does Garrett arrive before Cecilia?

Who is allergic to Nuts?

How many Gluten Free will arrive on Wednesday?

etc.

When comparing the open & close dates, this lookup shape very easily allows Tableau to calculate the number of open work orders for one specific date. On April 26th, for example.
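A minimal sketch of that single-date lookup question, using hypothetical work-order data in pandas:

```python
import pandas as pd

# Hypothetical lookup-shaped work orders (one row each, two date columns).
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "open_date": pd.to_datetime(["2023-04-20", "2023-04-25", "2023-04-26"]),
    "close_date": pd.to_datetime(["2023-04-27", "2023-04-26", "2023-04-30"]),
})

# For one fixed date, a row-level comparison answers the question directly.
day = pd.Timestamp("2023-04-26")
open_on_day = int(((orders["open_date"] <= day) & (orders["close_date"] >= day)).sum())
print(open_on_day)  # 3
```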

But the lookup shape does not easily answer these transactional questions across multiple dates:

On each day, across all dates, how many Gluten Free are coming for dinner?

On each day, during the entire month of March, what was the count of open work orders?

etc.

Because the observations begin & end at a varying pace, the lookup data shape doesn't allow Tableau to easily perform these calculations.

In fact, asking transactional questions of a lookup data shape is nerve-wracking!

The Solution

So the common pattern to each solution in Jonathan's collection of scenarios is to reshape the data to make it transactional.

Some scenarios in the collection solve the problem by reshaping with SQL; others may prefer Alteryx. I've reshaped my headcount data using R.

But it doesn’t matter how. The point is, transactional questions call for tidy transactional data.

One record for each transaction
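For example, here is one way to do that reshape with pandas (the work-order data and column names are hypothetical; `melt` turns the two date columns into one row per transaction):

```python
import pandas as pd

# Hypothetical lookup-shaped work orders.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "open_date": pd.to_datetime(["2023-04-20", "2023-04-25", "2023-04-26"]),
    "close_date": pd.to_datetime(["2023-04-27", "2023-04-26", "2023-04-30"]),
})

# Reshape: the two date columns become one row per transaction,
# doubling the row count from 3 to 6.
transactions = orders.melt(
    id_vars="order_id",
    value_vars=["open_date", "close_date"],
    var_name="transaction_type",
    value_name="date",
)
print(len(transactions))  # 6
```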

What’s Interesting

What’s interesting here is that, by the time Joe could set me straight, I had found a way to answer my question without reshaping the data. That approach required a scaffold & complex table calculations.

So while it may be true that if you torture the lookup data for long enough you can get it to talk, and while it may be fun to explore what's possible with scaffolding & table calculations, those complex solutions are difficult to maintain. And they aren't very flexible.

Reshaping is Easy

Moreover, if you're working in a spreadsheet, then reshaping your lookup data to be transactional for these scenarios needn't require SQL, or Alteryx, or R.

In this case, making your lookup data transactional is literally as easy as duplicating the data set & inserting a new column.

Half of the duplicated records get date_type = "begin date", and the other half get date_type = "end date".

In the work orders example:

transaction_type = "open"

and transaction_type = "close"

It’s that easy.
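With the transactional shape in hand, the daily open-order count becomes a simple running sum: each "open" adds one, each "close" subtracts one. A minimal pandas sketch of that idea (in Tableau this would be a running-total table calculation; the data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical work orders, reshaped to one row per transaction.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "open_date": pd.to_datetime(["2023-04-20", "2023-04-25", "2023-04-26"]),
    "close_date": pd.to_datetime(["2023-04-27", "2023-04-26", "2023-04-30"]),
})
transactions = orders.melt(
    id_vars="order_id", var_name="transaction_type", value_name="date"
)

# Each "open" adds one open order; each "close" removes one.
transactions["change"] = transactions["transaction_type"].map(
    {"open_date": 1, "close_date": -1}
)

# Running total by date = count of open orders after each day's activity.
daily = transactions.groupby("date")["change"].sum().cumsum()
print(daily.tolist())  # [1, 2, 2, 1, 0]
```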

Granularity

Important to note! You've now doubled the number of records in your data set. So, for these lookup questions:

How many total patients have we treated, ever?

How many Gluten Free exist in the data set?

you must be careful not to double-count the values of your aggregate measures.

Each of those old-fashioned lookup questions should still be answered from the original lookup table. Or, perhaps, from your new transactional data, but with a filter to consider only half of the duplicated records.
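A small sketch of the double-counting hazard and the filter that avoids it (hypothetical data, in pandas):

```python
import pandas as pd

# Hypothetical transactional data after the duplication step:
# every work order now appears twice, once per transaction.
transactions = pd.DataFrame({
    "order_id": [1, 2, 3, 1, 2, 3],
    "transaction_type": ["open", "open", "open", "close", "close", "close"],
})

# A naive count double-counts every order...
print(transactions["order_id"].count())  # 6

# ...so lookup-style totals need a filter down to half the records.
total_orders = transactions.loc[
    transactions["transaction_type"] == "open", "order_id"
].count()
print(total_orders)  # 3
```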

In Summary

The great Noah Salvaterra once said,

"The granularity is always in your face."

And the great Joshua Milligan writes,

In the end, this post is all about granularity.


References