Aggregate functions are convenient – they solve a business need and they make development easy. Unfortunately, not all business requirements are so easy to solve. Let’s look at one example: rolling averages.

The Rolling Average

A rolling average is a simple concept; an average is computed over a fixed subset of data. Rolling average calculations are most frequently used with time series data and help remove short term fluctuations while highlighting long term trends – utility bills often feature a rolling average of consumption to help the customer understand their usage. Consumers aren’t concerned about electricity usage being high on one day in August when there was a heat wave; they want to see how their consumption is changing over time.

Rolling Averages with Common Table Expressions: 2005 – 2008R2

Starting with SQL Server 2005, it became easy to write a rolling average in a single T-SQL statement using a Common Table Expression. CTEs rapidly became popular; an incredible amount of prevously difficult functionality was made possible through CTEs including recursive queries and rolling averages. Take a look at this example in the AdventureWorks2012 sample database:

WITH cte AS ( SELECT DENSE_RANK() OVER ( ORDER BY tm.YearName, tm.MonthOfYearNumber ) AS r , tm.YearName AS [year] , tm.MonthOfYearNumber AS [month] , SUM(SubTotal) AS SubTotal FROM dbo.TimeMaster tm LEFT JOIN Sales.SalesOrderHeader AS soh ON tm.ActualDate = soh.OrderDate WHERE tm.ActualDate BETWEEN '2005-07-01' AND '2008-08-01' GROUP BY tm.YearName , tm.MonthOfYearNumber ) SELECT cte1.[year] , cte1.[month] , AVG(cte1.SubTotal) AS AverageSubTotal FROM cte AS cte1 JOIN cte AS cte2 ON cte1.r > ( cte2.r - 12 ) AND cte1.r <> cte2.r GROUP BY cte1.[year] , cte1.[month] ORDER BY cte1.[year] , cte1.[month] ; 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 WITH cte AS ( SELECT DENSE_RANK ( ) OVER ( ORDER BY tm . YearName , tm . MonthOfYearNumber ) AS r , tm . YearName AS [ year ] , tm . MonthOfYearNumber AS [ month ] , SUM ( SubTotal ) AS SubTotal FROM dbo . TimeMaster tm LEFT JOIN Sales . SalesOrderHeader AS soh ON tm . ActualDate = soh . OrderDate WHERE tm . ActualDate BETWEEN '2005-07-01' AND '2008-08-01' GROUP BY tm . YearName , tm . MonthOfYearNumber ) SELECT cte1 . [ year ] , cte1 . [ month ] , AVG ( cte1 . SubTotal ) AS AverageSubTotal FROM cte AS cte1 JOIN cte AS cte2 ON cte1 . r > ( cte2 . r - 12 ) AND cte1 . r <> cte2 . r GROUP BY cte1 . [ year ] , cte1 . [ month ] ORDER BY cte1 . [ year ] , cte1 . [ month ] ;

While not the most straightforward approach to constructing a rolling average, the CTE manages to get the job done. In this query, we are using the CTE to create a work table and then performing a self-join. This same sort of thing is possible using a temporary table or table variable, but the CTE accomplishes it in one statement and is, arguably, easier to read.

Common Table Expressions also hide a dark secret – SQL Server executes the CTE body every time the CTE expression, cte in this example, is referenced. The more complex the Common Table Expression is, the more work that has to be performed. Running this rolling average with STATISTICS IO turned on, it’s easy to see the multiple executions in the form of two scans on each table:

Table 'TimeMaster'. Scan count 2, logical reads 14, physical reads 1, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrderHeader'. Scan count 2, logical reads 208, physical reads 1, read-ahead reads 102, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 1, logical reads 147, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 1 2 3 4 5 6 7 8 9 Table 'TimeMaster' . Scan count 2 , logical reads 14 , physical reads 1 , read - ahead reads 5 , lob logical reads 0 , lob physical reads 0 , lob read - ahead reads 0. Table 'SalesOrderHeader' . Scan count 2 , logical reads 208 , physical reads 1 , read - ahead reads 102 , lob logical reads 0 , lob physical reads 0 , lob read - ahead reads 0. Table 'Worktable' . Scan count 1 , logical reads 147 , physical reads 0 , read - ahead reads 0 , lob logical reads 0 , lob physical reads 0 , lob read - ahead reads 0.

On a database this small, this doesn’t pose major performance problems, but this will cause big problems for a moderately sized database.

Rolling Averages with Window Functions: 2012 and beyond

SQL Server 2012 provided better support for windowing functions. Although support for OVER() was already available in SQL Server 2005, SQL Server 2012 brings considerably more functionality to the table. By using the ROW or RANGE clause of the windowing function, it’s possible to simplify the query and improve performance. Take a look:

SELECT YearName , MonthOfYearNumber , AVG(st) OVER ( PARTITION BY YearName, MonthOfYearNumber ORDER BY YearName, MonthOfYearNumber ROWS 12 PRECEDING ) FROM ( SELECT tm.YearName , tm.MonthOfYearNumber , SUM(COALESCE(SubTotal, 0)) AS st FROM dbo.TimeMaster tm LEFT JOIN Sales.SalesOrderHeader AS soh ON tm.ActualDate = soh.OrderDate WHERE tm.ActualDate BETWEEN '2005-07-01' AND '2008-08-01' GROUP BY tm.YearName , tm.MonthOfYearNumber ) AS x ; 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 SELECT YearName , MonthOfYearNumber , AVG ( st ) OVER ( PARTITION BY YearName , MonthOfYearNumber ORDER BY YearName , MonthOfYearNumber ROWS 12 PRECEDING ) FROM ( SELECT tm . YearName , tm . MonthOfYearNumber , SUM ( COALESCE ( SubTotal , 0 ) ) AS st FROM dbo . TimeMaster tm LEFT JOIN Sales . SalesOrderHeader AS soh ON tm . ActualDate = soh . OrderDate WHERE tm . ActualDate BETWEEN '2005-07-01' AND '2008-08-01' GROUP BY tm . YearName , tm . MonthOfYearNumber ) AS x ;

Although the two queries are remarkably different, the biggest difference is the introduction of ROWS 12 PRECEDING . This takes the place of the self join in the previous example. Instead of writing out a join ourselves, we simply tell SQL Server that we’d like an average of st over the last 12 rows sorted by year and month. What kind of effect does this have on the work SQL Server performs?

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'TimeMaster'. Scan count 1, logical reads 7, physical reads 1, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrderHeader'. Scan count 1, logical reads 104, physical reads 1, read-ahead reads 102, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 1 2 3 4 5 6 7 8 9 Table 'Worktable' . Scan count 0 , logical reads 0 , physical reads 0 , read - ahead reads 0 , lob logical reads 0 , lob physical reads 0 , lob read - ahead reads 0. Table 'TimeMaster' . Scan count 1 , logical reads 7 , physical reads 1 , read - ahead reads 5 , lob logical reads 0 , lob physical reads 0 , lob read - ahead reads 0. Table 'SalesOrderHeader' . Scan count 1 , logical reads 104 , physical reads 1 , read - ahead reads 102 , lob logical reads 0 , lob physical reads 0 , lob read - ahead reads 0.

111 reads instead of 369 reads. Clearly this change makes for a substantial performance improvement for SQL Server. We’ve reduced the number of reads, eliminated some query complexity, and made it somewhat obvious to the future developers how they could modify or build on this going forward. Changing the new query to a rolling average by day instead of by month is simple and requires even fewer lines of code:

SELECT ActualDate , st AS SubTotal , AVG(st) OVER ( ORDER BY ActualDate ROWS 365 PRECEDING ) AS RollingAverageSales FROM ( SELECT ActualDate , SUM(COALESCE(soh.SubTotal, 0)) AS st FROM dbo.TimeMaster tm LEFT JOIN Sales.SalesOrderHeader soh ON tm.ActualDate = soh.OrderDate WHERE tm.ActualDate BETWEEN '2005-07-01' AND '2008-08-01' GROUP BY ActualDate ) AS x ORDER BY x.ActualDate; 1 2 3 4 5 6 7 8 9 10 11 12 SELECT ActualDate , st AS SubTotal , AVG ( st ) OVER ( ORDER BY ActualDate ROWS 365 PRECEDING ) AS RollingAverageSales FROM ( SELECT ActualDate , SUM ( COALESCE ( soh . SubTotal , 0 ) ) AS st FROM dbo . TimeMaster tm LEFT JOIN Sales . SalesOrderHeader soh ON tm . ActualDate = soh . OrderDate WHERE tm . ActualDate BETWEEN '2005-07-01' AND '2008-08-01' GROUP BY ActualDate ) AS x ORDER BY x . ActualDate ;

This performs the name number of logical and physical reads as the monthly rolling average using a window function.

Summing Up

There you have it – two different ways to perform a rolling average in SQL Server. One method is clearly a lot easier than the other. There are a number of optimizations in SQL Server 2012 to make it easy for you to build this functionality and to improve SQL Server performance at the same time.