Aggregation is a source of confusion in any type of ORM and Django is no different. The documentation provides a variety of examples and cheat-sheets that demonstrate how to group and aggregate data using the ORM, but I decided to approach this from a different angle.

In this article I put QuerySets and SQL side by side. If SQL is where you are most comfortable, this is the Django GROUP BY cheat-sheet for you.

Image by Jason Leung

Table of Contents

To demonstrate different GROUP BY queries, I will use models from Django's built-in django.contrib.auth app.

>>> from django.contrib.auth.models import User

Django ORM produces SQL statements with long aliases. For brevity, I will show a cleaned-up, but equivalent, version of what Django executes.

SQL Logging To see the SQL actually executed by Django, you can turn on SQL logging in the Django settings.

Let's count how many users we have:

SELECT COUNT ( * ) FROM auth_user ; User . objects . count ()

Counting rows is so common that Django includes a function for it right on the QuerySet. Unlike other QuerySets we'll see next, count returns a number.

Django offers two more ways to count rows in a table.

We'll start with aggregate :

SELECT COUNT ( id ) AS id__count FROM auth_user ; from django.db.models import Count User . objects . aggregate ( Count ( 'id' ))

To use aggregate we imported the aggregate function Count . The function accepts an expression to count. In this case, we used the name of the primary key column id to count all the rows in the table.

Aggregate NULL Aggregations ignore NULL values. For more on how aggregations handle NULL , see 12 Common Mistakes and Missed Optimization Opportunities in SQL.

The result of aggregate is a dict:

>>> from django.db.models import Count >>> User . objects . aggregate ( Count ( 'id' )) {"id__count": 891}

The name of the key is derived from the name of the field and the name of the aggregate. In this case, it's id__count . It's a good idea not to rely on this naming convention, and instead provide your own name:

SELECT COUNT ( id ) as total FROM auth_user ; >>> from django.db.models import Count >>> User . objects . aggregate ( total = Count ( 'id' )) {"total": 891}

The name of the argument to aggregate is also the name of the key in the resulting dictionary.

Using aggregate we got the result of applying the aggregate function on the entire table. This is useful, but usually we want to apply the aggregation on groups of rows.

Let's count users by their active status:

SELECT is_active , COUNT ( id ) AS total FROM auth_user GROUP BY is_active ( User . objects . values ( 'is_active' ) . annotate ( total = Count ( 'id' )))

This time we used the function annotate . To produce a GROUP BY we use a combination of values and annotate :

values('is_active') : what to group by

: what to group by annotate(total=Count('id')) : what to aggregate

The order is important: failing to call values before annotate will not produce aggregate results.

Just like aggregate , the name of the argument to annotate is the key in the result of the evaluated QuerySet. In this case it's total .

To apply aggregation on a filtered query you can use filter anywhere in the query. For example, count only staff users by their active status:

SELECT is_active , COUNT ( id ) AS total FROM auth_user WHERE is_staff = True GROUP BY is_active ( User . objects . values ( 'is_active' ) . filter ( is_staff = True ) . annotate ( total = Count ( 'id' )))

Like filter, to sort a queryset use order_by anywhere in the query:

SELECT is_active , COUNT ( id ) AS total FROM auth_user GROUP BY is_active ORDER BY is_active , total ( User . objects . values ( 'is_active' ) . annotate ( total = Count ( 'id' )) . order_by ( 'is_staff' , 'total' ))

Notice that you can sort by both the GROUP BY key and the aggregate field.

To produce multiple aggregations of the same group, add multiple annotations:

SELECT is_active , COUNT ( id ) AS total , MAX ( date_joined ) AS last_joined FROM auth_user GROUP BY is_active from django.db.models import Max ( User . objects . values ( 'is_active' ) . annotate ( total = Count ( 'id' ), last_joined = Max ( 'date_joined' ), ))

The query will produce the number of active and inactive users, and the last date a user joined in each group.

Just like performing multiple aggregations, we might also want to group by multiple fields. For example, group by active status and staff status:

SELECT is_active , is_staff , COUNT ( id ) AS total FROM auth_user GROUP BY is_active , is_staff ( User . objects . values ( 'is_active' , 'is_staff' ) . annotate ( total = Count ( 'id' )))

The result of this query includes is_active , is_staff and the number of users in each group.

Another common use case for GROUP BY is to group by an expression. For example, count the number of users that joined each year:

SELECT EXTRACT ( 'year' FROM date_joined ), COUNT ( id ) AS total FROM auth_user GROUP BY EXTRACT ( 'year' FROM date_joined ) ( User . objects . values ( 'date_joined__year' ) . annotate ( total = Count ( 'id' )))

Notice that to get the year from the date we used the special expression <field>__year in the first call to values() . The result of the query is a dict, and the name of the key will be date_joined__year .

Sometimes, the built-in expressions are not enough, and you need to aggregate on a more complicated expression. For example, group by users that have logged in since they signed-up:

SELECT last_login > date_joined AS logged_since_joined , COUNT ( id ) AS total FROM auth_user GROUP BY last_login > date_joined from django.db.models import ( ExpressionWrapper , Q , F , BooleanField , ) ( User . objects . annotate ( logged_since_joined = ExpressionWrapper ( Q ( last_login__gt = F ( 'date_joined' )), output_field = BooleanField (), ) ) . values ( 'logged_since_joined' ) . annotate ( total = Count ( 'id' )) . values ( 'logged_since_joined' , 'total' )

The expression here is fairly complicated. We first use annotate to built the expression, and we mark it as a GROUP BY key by referencing the expression in the following call to values() . From here on, it's exactly the same.

Using conditional aggregation, you can aggregate only a part of the group. Conditions come in handy when you have multiple aggregates. For example, count the number of staff and non-staff users by the year they signed-up:

SELECT EXTRACT ( 'year' FROM date_joined ), COUNT ( id ) FILTER ( WHERE is_staff = True ) AS staff_users , COUNT ( id ) FILTER ( WHERE is_staff = False ) AS non_staff_users FROM auth_user GROUP BY EXTRACT ( 'year' FROM date_joined ) from django.db.models import F , Q ( User . objects . values ( 'date_joined__year' ) . annotate ( staff_users = ( Count ( 'id' , filter = Q ( is_staff = True )) ), non_staff_users = ( Count ( 'id' , filter = Q ( is_staff = False )) ), ))

The SQL above is from PostgreSQL, which along with SQLite is currently the only database backend that supports the FILTER syntax shortcut (formally called "selective aggregates"). For other database backends, the ORM will use CASE ... WHEN instead.

tip I previously wrote about aggregations with filters. Check out my 9 Django tips for working with databases.

The HAVING clause is used to filter on the result of an aggregate function. For example, find the years in which more than a 100 users joined:

SELECT is_active , COUNT ( id ) AS total FROM auth_user GROUP BY is_active HAVING COUNT ( id ) > 100 ( User . objects . annotate ( year_joined = F ( 'date_joined__year' )) . values ( 'is_active' ) . annotate ( total = Count ( 'id' )) . filter ( total__gt = 100 ))

The filter on the annotated field total added an HAVING clause in the generated SQL.

For some aggregate functions such as COUNT , it is sometimes desirable to only count distinct occurrences. For example, how many different last names are there per user active status:

SELECT is_active , COUNT ( id ) AS total , COUNT ( DISTINCT last_name ) AS unique_names FROM auth_user GROUP BY is_active ( User . objects . values ( 'is_active' ) . annotate ( total = Count ( 'id' ), unique_names = Count ( 'last_name' , distinct = True ), ))

Notice the use of distinct=True in the call to Count .

Aggregate fields are often just the first step to a greater question. For example, what is the percent of unique last names by user active status:

SELECT is_active , COUNT ( id ) AS total , COUNT ( DISTINCT last_name ) AS unique_names , ( COUNT ( DISTINCT last_name ):: float / COUNT ( id ):: float ) AS pct_unique_names FROM auth_user GROUP BY is_active from django.db.models import FloatField from django.db.models.functions import Cast ( User . objects . values ( 'is_active' ) . annotate ( total = Count ( 'id' ), unique_names = Count ( 'last_name' , distinct = True ), ) . annotate ( pct_unique_names = ( Cast ( 'unique_names' , FloatField ()) / Cast ( 'total' , FloatField ()) ))

The first annotate() defines the aggregate fields. The second annotate() uses the aggregate function to construct an expression.

So far we've used only data in a single model, but aggregates are often used across relations. The simpler scenario is of a one-to-one or a foreign key relation. For example, say we have a UserProfile with a one-to-one relationship to the User, and we want to count users by the type of profile:

SELECT p . type , COUNT ( u . id ) AS total FROM auth_user u JOIN user_profile p ON u . id = p . user_id GROUP BY p . type ( User . objects . values ( 'user_profile__type' ) . annotate ( total = Count ( 'id' )))

Just like GROUP BY expressions, using relations in values will group by that field. Note that the name of the user profile type in the result will be 'user_profile__type'.

A more complicated type of relation is the many to many relationship. For example, count in how many groups each user is a member:

SELECT u . id , COUNT ( ug . group_id ) AS memberships FROM auth_user LEFT OUTER JOIN auth_user_groups ug ON ( u . id = ug . user_id ) GROUP BY u . id ( User . objects . annotate ( memberships = Count ( 'groups' )) . values ( 'id' , 'memberships' ))

A user can be a member of more than one group. To count the number of groups the user is member of we used the related name "groups" in the User model. If the related name is not explicitly set (and not explicitly disabled), Django will automatically generate a name in the format {related model model}_set . For example, group_set .

To dig deeper into the ORM and GROUP BY in particular, check out these links: