Photo by Mika Baumeister on Unsplash

Spreadsheet softwares allow us to analyse large amount of data to get new insights on process or business. We all have tons of spreadsheets full of data sitting on our hard drives, but it is not always easy to exploit this data. The main reason is that we often tend to organize our spreadsheets for human analysis.

The thing is, computers do not think the same ways as we do, us mere humans. In this article I will give you some best practices to organize your spreadsheet for data analysis and visualization with your favorite spreadsheet software or, better, with external tools such as Data Studio.

Dimensions and Metrics

Dimensions and Metrics (ruler by monkik on FlatIcon)

It is important to understand that any data can be divided between Dimensions and Metrics.

Metrics measure, well, things. Metrics are always numbers and can be aggregated using formulas such as sums, average, etc. You will also sometime come across the term fact or measure, which are synonyms.

Dimensions are used to describe your metrics. Dimensions can be any kind of data, but is generally a date or some text (e.g. a property like color, a category like gender, etc.). You will also sometime come across the term segment.

Dimension values, the data contained by the dimension, are names, descriptions or other characteristics of a category (e.g. for the color dimension: red, blue, yellow).

Good spreadsheet design is all about storing your dimensions and metrics in a way that will allow easy filtering and pivots on your dimensions and easy aggregation for your metrics.

Unpretty your spreadsheets

Before studying how to store the data let’s first look at the spreadsheet organisation. Designing for data analysis by automated tools is not the same as for sharing some graph to your favorite colleague from the marketing department.

Step 1 : unpretty your spreadsheets

Remove merged cells : merged cells can’t be used in pivot and are not well supported by external tools

Remove multi-line headers

Remove titles, formatting, file version, date and authors (use the file history and versioning for that)

Untidy your data

Now let’s take a look at the way to store your data :

Each row must represent one unaggregated measure

measure Each cell must contain only one dimension or measure

Each column must have a unique name

Step 2 : Untidy your data (credit : Ethan P. White et al. in Ideas in Ecology and Evolution)

We also naturally tend to store our data already pivoted, because it often makes sense for what we are doing or it saves us from some repeated data. However, the key to fully exploit your data is to store it unprocessed in the rawest possible way, and that means unpivoted.

If you are not sure what that means, just remember to use one and only one column for each type of dimension and metric.

In the exemple above, HabitatX, HabitatY and HabitatZ are dimension values of the same dimension : they must be stored in the same column. The thing they measure, abundance, must also be stored in the same column. After unpivot we are able to filter and pivot on the habitat dimension and aggregate the abundance metric, which was not possible with the original format.

Separate source data from analysis

Step 3 : separate data from analysis

You should keep your raw source data in one tab, untouched, then in one or several other tabs, have the views and analysis of the original data. To import data from another tab, use the following import statement :

Import data from another tab

Or inside a formula :

Import data from another tab in a formula

Wrap up

Following those steps, here is the transformation of my sample sheet :

When using the pivot function we go from this situation where pivot dimensions are named from cryptic column letters :

To this much meaningful situation :

Notice how automated suggestions also now make much more sense and how we separated raw data from analysis in different tabs.

We are now also able to filter and aggregate by site and KPI dimensions which was not possible before :

To go a step further, the data is now ready to be analysed or exposed to the world using a data visualization tool like Data Studio !

Thanks for reading ! If you enjoyed this article, don’t hesitate to share it.

I am learning everyday, if I made any mistake please feel free to correct me and add your suggestions in the comment section.