SQL Recipes Growth Rates Creating Pareto Charts to visualize the 80/20 principle

Creating Pareto Charts to visualize the 80/20 principle in SQL

A Pareto chart highlights the most important factors amongst a large number of factors. A social scientist in Italy, Vilfredo Pareto discovered that roughly 20% of the population owned 80% of the wealth. From there, he noticed that these proportions described many other aspects of society. This led him to propose the 80-20 rule. The Pareto chart, named in his honor, shows the large contribution from a small proportion of the population.

You'll notice the 80-20 principle in action across your business:

a small proportion of customers generate the most customer support load

a small proportion of customers contribute the most to the over all revenue – financial statements usually list this as a risk factor

a small proportion of web pages generate the most pageviews

a small proportion of adwords keywords generate the most clicks and conversions, etc.

The Pareto chart has two components:

a column chart where the height of the bar is the frequency of that particular category

a line chart that shows the cumulative percentage contribution to the total

For this example, we'll be aggregating the top pages on our site by pageviews and a cumulative percentage contribution to the total number of page views in that period.

1. Frequency distribution using GROUP BY

To calculate the total number of pageviews, categorized by page title and sorted descending by pageviews:

select title, count(*) as pageviews from visits where visits.dt > (now() - '2 weeks'::interval) -- just calculate for the last two weeks group by title order by 2 desc

The above query will give us a table like the following:

title pageviews Calculating Percentage (%) of Total Sum 354 Calculating Month-over-Month Growth Rate 156 Calculating Relationships with Correlation Matrices 156 Forecasting, Identifying Trends with Linear Regression 134 ... ...

2. Cumulative sum using Window functions

We'll next have to calculate a cumulative, running total (read more about SQL Running Totals.) For this, we'll be making use of window functions:

select title, pageviews, sum(pageview) over (order by pageviews desc) as cumsum from ( // query as before ) tbl;

The result will now look like this:

title pageviews cumsum Calculating Percentage (%) of Total Sum 354 354 Calculating Month-over-Month Growth Rate 156 510 Calculating Relationships with Correlation Matrices 156 666 Forecasting, Identifying Trends with Linear Regression 134 800 ... ... ...

The last remaining step is to divide the cumulative sum by the total number of pageviews to calculate a percentage. We can precalculate the total number of pageviews with a common table expression:

with total_pageviews as ( select count(pageviews) as cnt from visits where visits.dt > (now() - '2 weeks'::interval) ) select title, pageviews, sum(pageview) over (order by pageviews desc) * 1.0 / total_pageviews.cnt as cumsum from ( select title, count(*) as pageviews from visits where visits.dt > (now() - '2 weeks'::interval) -- just calculate for the last two weeks group by title order by 2 desc ) tbl;