Blog

SQL is great for aggregating: counting, summing, averaging numbers are straightforward tasks in SQL. When looking at a distribution, it is often useful to look at them in context and compare the relative size of each part of the proverbial pie.

We want to track the number of users that we acquire from each marketing source. To do this we can write a simple query to calculate users by source:

select date_trunc('week',created_at) as date , source , count(1) from users group by date , source

We can plot this data, and track the growth of users by source over time:

It’s great to know how many users are coming from each source, but what if we want to know what proportion of sources are coming from a given source on any given week? Let’s take a look!

Redshift

Redshift has the handy ratio_to_report window function to help us out. This function divides the current value by the sum of all values in the dictated partition. In our query, we want to divide the count of users from one source by the count of users from all sources.

select date_trunc('week',created_at) as date , source , count(1) as count , ratio_to_report(count(1)) over ( partition by date_trunc('week',created_at) ) as proportion from users group by date, source

Postgres

Postgres does not have the same ratio_to_report function, but we can use sum() over in a window function to achieve a similar result. We will first create a common table expression with the count of users:

with users_by_source as ( select date_trunc('week',created_at) as date , source , count(1) as count from users group by date , source )

We can then use our users_by_source CTE, with a sum()over function to divide by the total for each week.

select date , source , 1.0 * count / sum(count) over (partition by date) from users_by_source

MySQL

If our database is MySQL, or if we don’t want to use window functions, we can do this with a few simple subqueries. We will first want to calculate the total for each date:

select yearweek(created_at) as date , count(1) as count from users group by date

Next, we want the users by source:

select yearweek(created_at) as date , source , count(1) as count from users group by date, source

Finally, we can join and divide:

select T1.date , t1.source , t1.count / t2.count from ( select yearweek(created_at) as date , source , count(1) as count from Users group by date , source ) as t1 join( select yearweek(created_at) as date , count(1) as count from Users group by date ) as t2 on t1.date = t2.date

Visualize with Sisense

You can make your bar chart proportional for a visual representation of the data:

You can use our pivot tables to quickly calculate the percentage-of-column to show the data in tabular form.

Now you can spend less time writing queries and more time analyzing the results of your data!