In this case, a spreadsheet is definitely not what you want to end up with. Let’s try to calculate this in SQL. Without window functions, a query that gets you the final result would look like this:

WITH

monthly_revenue as (

SELECT

date_trunc(‘month’,datetime)::date as month,

sum(amount) as revenue

FROM orders

GROUP BY 1

)

,prev_month_revenue as (

SELECT

t1.*,

t2.revenue as prev_month_revenue

FROM monthly_revenue t1

LEFT JOIN monthly_revenue t2

ON datediff(month,t2.month,t1.month)=1

)

SELECT *,

round(100.0*(revenue-prev_month_revenue)/prev_month_revenue,1) as revenue_growth

FROM prev_month_revenue

ORDER BY 1

You’d have to calculate monthly revenue, then get the result of the previous month using self-join and use it in the final formula. The logic is broken down into 3 steps for clarity. You can’t break it down further, but even so the second step can be quite confusing.

What you have to keep in mind about datediff is that that the minuend (i.e. what you subtract from) is the third parameter of the function and the subtrahend (i.e. what you subtract) is the second parameter. I personally think that’s a bit counterintuitive for subtraction, and the self join concept itself is not basic. There’s actually a much better way to express the same logic:

WITH

monthly_revenue as (

SELECT

date_trunc(‘month’,datetime)::date as month,

sum(amount) as revenue

FROM orders

GROUP BY 1

)

,prev_month_revenue as (

SELECT *,

lag(revenue) over (order by month) as prev_month_revenue

FROM monthly_revenue

)

SELECT *,

round(100.0*(revenue-prev_month_revenue)/prev_month_revenue,1) as revenue_growth

FROM prev_month_revenue

ORDER BY 1

Let’s break down the lag(… line of code:

lag is a window function that gets you the previous row

revenue is the expression of what exactly you would like to get from that row

over (order by month) is your window specification that tells how exactly you would like to sort the rows to identify which row is the previous one (there’s plenty of options). In our case, we told the database to get the previous month’s row for every given month.

This is a generic structure of all SQL window functions: function itself, expression and other parameters, and window specification:

function (expression, [parameters]) OVER (window specification)

It is very clean and powerful and has countless opportunities. For example, you can add partition by to your window specification to look at different groups of rows individually:

WITH

monthly_revenue as (

SELECT

date_trunc(‘month’,datetime)::date as month,

state,

sum(amount) as revenue

FROM orders

GROUP BY 1,2

)

,prev_month_revenue as (

SELECT *,

lag(revenue) over (partition by state order by month) as prev_month_revenue

FROM monthly_revenue

)

SELECT *,

round(100.0*(revenue-prev_month_revenue)/prev_month_revenue,1) as revenue_growth

FROM prev_month_revenue

ORDER BY 2,1

Calculating revenue by state in the first subquery and updating the window specification to take the new grouping into account, you can look at each state individually and see what the revenue growth by state is.

Partitions are extremely useful when you need to calculate the same metric over different segments.

Running total

Another common request is to calculate running total over some period of time. This is the sum of the current element and all previous elements, for example, this is how the running total monthly revenue would look in our dataset:

Data visualized with Statsbot

And the query to get this is below:

WITH

monthly_revenue as (

SELECT

date_trunc(‘month’,datetime)::date as month,

sum(amount) as revenue

FROM orders

GROUP BY 1

)

SELECT *,

sum(revenue) over (order by month rows between unbounded preceding and current row) as running_total

FROM monthly_revenue

ORDER BY 1

The new thing here is the rows between unbounded preceding and current row part that is called “frame clause.” It’s a way to tell you which subset of other rows of your result set you’re interested in, relative to the current row. The general definition of the frame clause is:

rows between frame_start and frame_end

where frame_start can be one of the following:

unbounded preceding which is “starting from the first row or ending on the last row of the window”

N preceding is N rows you’re interested in

current row

and frame_end can be:

unbounded following which is “starting from the first row or ending on the last row of the window”

N following is N rows you’re interested in

current row

It’s very flexible, except that you have to make sure the first part of between is higher than the second part, i.e. between 7 preceding and current row is totally fine, between 7 preceding and 3 preceding is fine too, but between 3 preceding and 7 preceding would throw an error. You can sum, count, and average values within the selected window. You can see a few examples in the query below:

SELECT

*

,sum(amount) over () as amount_total

,sum(amount) over (order by order_id rows between unbounded preceding and current row) as running_sum

,sum(amount) over (partition by customer_id order by datetime rows between unbounded preceding and current row) as running_sum_by_customer

,avg(amount) over (order by datetime rows between 5 preceding and current row) as trailing_avg

FROM orders

ORDER BY 1

Every combination makes sense for particular use cases:

amount_total is specified without a window and returns the entire total of $3400. Use it when you want to get the total in the same query as individual rows without grouping and joining back the grouped result.

running_sum is the running total. Use it when you want to see how some variable such as revenue or website visits is accumulated over a period of time.

running_sum_by_customer is the same as above but broken down by segment (you can see how revenue from each individual customer grows, and on bigger data it can be cities or states).

trailing_avg shows the average amount of the last 5 orders. Use trailing average when you want to learn the trend and disguise volatility.

The ordering is critical here since the database needs to know how to sum the values. The result can be completely different when different ordering is applied. The picture below shows the result of the query: