Bolstered by the brain trust at DataBlick, this post considers various uses for Data Blending in Tableau, and argues for more formal data preparation as the best alternative when blending breaks down.

For Starters

If you're just getting started, first some useful resources:

All of the 2014 conference materials are an excellent resource. There are ten different talks with the keyword “blending", and my Tableau Conference Television makes it easy to find what you’re looking for.

So now, on with the show!

Slide Projector

As an analogy, think of Tableau as a slide projector for your data where each Tableau Data Source is a slide.

Born from a hackathon among Tableau’s engineers, Data Blending is indeed a clever hack! It allows us to place more than one slide into the projector at once :)

Starting in version 8, "Data Blending 2” also allows us to manually turn off & on the linking fields, regardless of whether those fields are utilized in the view. The difference between DB1 & DB2 is one of the Master Tableau Concepts. Cool stuff! And worthwhile to understand.

Yet, robust as it is, there is a time & place for blending in Tableau. Much of the time, my "in the flow” preference will be to use the projector with a single tidy data source.

The more complicated the requirements become, the more frustrating my across-the-blend experiences tend to be. And there are also occasions when data blending is perfect.

So, let's sift through some scenarios to separate wheat from chaff.

Great for Measures

One great use for Data Blending is to summarize measures from your secondary data source. This is where blending is at its best & you can play to its strengths.

As an example scenario: sales data originates from your Data Warehouse, upon which you've built a single tidy data source. Your regional sales manager has revised her quarterly plan, which she sends you by e-mail in a spreadsheet to compare with the actuals.

This is the perfect use for data blending in Tableau.

The revised plan numbers are hot off the press. They aren't available in your primary data source, and the task at hand is to compare aggregate measures (actuals vs. plan), by linking on one or more common dimensions (like region, salesperson, or category).

Exploratory Prototyping

If the general preference is to use the slide projector with a single tidy data source, then frequently a data discovery phase must also exist (during which we will research & design that data source).

Or, perhaps data discovery is the only goal. We only want answers, and we want them quickly!

In these exploratory prototyping modes, some sacrifices to performance, "flow" and the end-user experience are happily made in exchange for rapid data discovery.

Contributing ideas for the post, Jonathan Drummey said:

Data Blending is great for one-off analyses or proofs of concept where the speed of using a blend is the advantage.

Then when it comes time to have something for production (where there's more complexity to the data structure, a need for something more maintainable, higher volumes, etc.) I'll do the necessary data prep.

Scaffolding

Using a scaffold data source to build up a temporary structure for the purpose of painting data onto it, "scaffolding" is another Master Tableau Concept.

And scaffolding is also a great example of how data blending can, at times, make the impossible possible inside of Tableau.

A great example of a scaffold would be if you want to build a calendar view: something similar to what Interworks & Andy Kriebel have described here and here.

If instead of the Gregorian calendar you need to display the transactions inside of your company's fiscal calendar, then you can use the fiscal calendar as a scaffold data source and blend your transactions by linking on, for example:

[order date] == [fiscal date].

This is a quick win, and easy to do with data blending.

Problem Context

Yet, data blending is not a panacea.

While transaction data does frequently originate from one source, today's reality is that additional measures & attributes external to this primary data must often be analyzed together with the primary data.

And often the requirements are more complex than the relatively simple scenarios above. We frequently need to slice, dice, filter, and perform calculations upon those secondary attributes and measures.

Tableau’s strength as a visualization engine is in rendering views of your primary data, and in building interactive dashboards on that primary data.

So while blending can be extremely helpful, the “blend" in Tableau also comes with a fair number of limitations, especially when attempting to build a production polished, highly interactive dashboard.

For examples of these limitations:

Blended Boolean Column Totals explains why column totals break down across the data blend

Blending often builds a temp table in the data source. And from a performance perspective: Temp Tables Take Time

Idea 2250 explains that when the linking dimension is not in the view, non-additive aggregates from the secondary source, like COUNTD() , MEDIAN() , and the RAWSQLAGG_xxx functions are not supported

, , and the functions are not supported And Idea 2273 provides a good long list of other limitations

The Problem Definition

Because disparate data often arrives at differing levels of detail, and the requirements are often interwoven & complex, building a highly interactive dashboard with multiple facets that each cross the blend can easily degenerate into a Rubix Cube of frustrations.

Just when you’ve worked around one limitation to get the to greens line up.. you encounter another one that breaks the reds. And fixing the reds can break the whites, etc.

As a result in my own recent experience, with data coming from three distinct sources: the only option with blending was to always bring all the detail into the view. And then from there, to use table calcs to summarize back up again to the desired level of detail.

Computing across multiple dimensions, including time, those table calcs quickly became complex. And because of the granular data volume, the table calcs also performed poorly.

So it just wasn’t practical to achieve the desired results in a production quality dashboard via blending, across multiple data sources at differing granularities; and with each data source providing dimensions to filter by.

The good news ? Just as soon as those disparate data sets are joined together into a single, tidy data source then building is a breeze again!

Think Data Preparation

When you find yourself facing a Rubix Cube of frustration, working around one limitation only to encounter another, this is the signal that you're trying to jam too many slides into the projector all at the same time.

Regardless of which approach you choose, the goal of your data prep is to unify those disparate sources into a single, tidy data set.. at a single, common granularity.

In other words: you want one slide for your Tableau projector.

Some Alternatives

1. Alteryx

A flexible & multi-faceted swiss army knife, Alteryx enables the point & click construction of customized, maintainable, repeatable, and self-documenting data manipulation pipelines.

It’s no wonder why so many data workers today are using Alteryx as their tool of choice for data prep, prior to visual analysis in Tableau.

2. SQL & Scripting Languages

What Alteryx can do quickly via point & click, the talented analyst can also accomplish for FREE with a little bit of time, SQL, Python, R, or similar data transformation languages.

3. Case Statements

Like many of the tricks up my sleeve, this creative solution comes from DataBlick Joe.

If your secondary dimension values are really just labels for your primary dimensions, and/or they are used to apply higher-level (coarser) groupings, then you can easily bring those secondary dimensions into your primary data source with a calculated field.

CASE [primary dimension]

WHEN “dimension value A” then “secondary dimension value 1"

WHEN “dimension value B” then “secondary dimension value 2"

WHEN “dimension value C” then “secondary dimension value 3"

WHEN “dimension value D” then “secondary dimension value 4"

END

This trick will work even if your dimensions are of a high cardinality, with hundreds of entity values. To build the large case statement, just follow these instructions from Alexander Mou.

# In Summary

Keep calm and use the flow. As a rule of thumb, Tableau works best when all of the dimensions are in a single data source, at a common level of detail.

Data Blending is often great, but not always. And when you find you have too many slides for the projector: get prepared.

A few initial data preparation steps to unify & tidy, prior to visualization with Tableau, will keep your visual analysis work in the happy zone.

Now you’re playing to Tableau’s strengths again!

Word Count: 1,529

References