Identifying the problem

The filtered URL looked like this:

/admin/transactions/created__year=2017&created__month=11

We identified the “heavy” query as the one fetching the data to populate the list. The query performed the join with all the lookup tables and applied the filters we listed in the ModelAdmin. The relevant WHERE clause was:

WHERE created BETWEEN

'2017-01-01T00:00:00+00:00'::timestamptz AND

'2017-12-31T23:59:59.999999+00:00'::timestamptz

AND EXTRACT('month' FROM created AT TIME ZONE 'UTC') = 11)

When we inspected the execution plan of this query we found this snippet at the bottom:

Index Scan using transactions_transaction_created_8fd61a4b on transactions_transaction (cost=0.43..90663.65 rows=4561 width=471) Index Cond: ((created >= '2017-01-01 02:00:00+02'::timestamp with time zone) AND (created <= '2018-01-01 01:59:59.999999+02'::timestamp with time zone)) Filter: (date_part('month'::text, timezone('UTC'::text, created)) = '11'::double precision)

Postgres decided to use the index on the `created` column. The estimate was 4,561 rows and the cost estimate maxed at 90,000. This estimate was wildly inaccurate which made us drew our first conclusion:

The estimate made by the database is very low and inaccurate.

We first suspected that the low estimate was due to stale statistics on the column. We gathered stats on the column and tried again:

analyze transaction_transaction (created);

No change. Estimate remained the same.

We decided to give the query another look. We found this snippet in the WHERE clause a bit odd:

WHERE created BETWEEN

'2017-01-01T00:00:00+00:00'::timestamptz AND

'2017-12-31T23:59:59.999999+00:00'::timestamptz

AND EXTRACT('month' FROM created AT TIME ZONE 'UTC') = 11)

Django applied a filter on the full year and then used EXTRACT to filter only the month.

We decided to check what happens if we remove the EXTRACT function and instead simplify the condition:

WHERE created BETWEEN

'2017-11-01T00:00:00+00:00'::timestamptz AND

'2017-11-30T23:59:59.999999+00:00'::timestamptz

The execution plan for that query was:

Index Scan using transactions_transaction_created_8fd61a4b on transactions_transaction (cost=0.43..1265.75 rows=13716 width=471)

Index Cond: ((created >= ‘2017–11–01 02:00:00+02’::timestamp with time zone) AND (created <= ‘2017–12–01 01:59:59.999999+02’::timestamp with time zone))

The estimate is still low but the cost estimate is now significantly lower. This execution plan helped us reach our second conclusion:

The way Django apply the filter makes it difficult for the database to optimize the query.

Photo by Braden Collum on Unsplash

The problem with the way date hierarchy is implemented

Django filters the queryset for a given level in the date hierarchy using a database function to extract the relevant date part.

A function is opaque to the database optimizer. If you have a range-based (BTREE) index on the field, using EXTRACT does not limit the range at all. The index is not utilized properly which might lead to a sub-optimal execution plan.

Once we had a better understanding of the problem we started discussing possible solutions.

Function based index

A function based index is an index on an expression. In our case, an appropriate function based index might look like this:

create index transactions_transaction_created_month_brin on transactions_transaction using brin(extract(‘month’ from created at time zone ‘UTC’));

The index made the query run faster but it came at a cost.

The downside to this approach is having to maintain additional indexes for each level of the hierarchy (day and month). Additional indexes slow down insert and update operations, and take up space.

Another downside is the index size. We used a BRIN index in this case to minimize the size of the index. The table is naturally clustered by the creation date so this is an ideal use case for a BRIN index. When this is not the case, a similar BTREE index can become quite heavy.

Simplify the condition used by Django date hierarchy

We decided to see if we can simplify the condition used by Django to apply the date hierarchy.

To implement the filter differently we first need to understand how Django admin applies filters on a queryset.

In django/contrib/admin/views/main.py there is a function called get_filters. It might look scary at first but what it does is:

Extract all the query parameters from the URL. Get all the ListFilter declared in the ModelAdmin and apply them one by one to the parameter list. Each ListFilter receive the parameter list, takes what it needs and remove the value from the parameter list. Any parameters left after all ListFilter ’s were applied are processed using the “default” Django filter. This is why, for example, it’s possible to filter the queryset directly from the URL even when a ListFilter is not explicitly defined.

If you look at the date hierarchy query parameters you’ll see that there is nothing special about them — they are just regular URL params. This sparked an idea:

Implement a ListFilter to grab the relevant date hierarchy parameters from the parameter list and apply a custom filter on the queryset.

The implementation

Let’s start by grabbing the date hierarchy fields from the parameter list:

class RangeBasedDateHierarchyListFilter(admin.ListFilter):

title = '' def __init__(self, request, params, model, model_admin):

self.date_hierarchy_field = model_admin.date_hierarchy

self.date_hierarchy = {} date_hierarchy_field_re = re.compile(

r'^{}__(day|month|year)$'.format(

self.date_hierarchy_field

)) for param in list(params.keys()):

match = date_hierarchy_field_re.match(param)

if match:

period = match.group(1)

self.date_hierarchy[period] = int(params.pop(param))

We take the date_hierarchy field name from the model_admin.

In our case it was created . We create a Regex pattern to identify the parameters in the URL. The pattern is always the name of the date hierarchy field + the period (day, month, year).

In our case the possible parameters are created__day , created__month and created__year . We iterate the parameter list, pop any date hierarchy parameter that match our criteria and store the period and the value in a dict.

Now, this is where all the magic happens:

class RangeBasedDateHierarchyListFilter(admin.ListFilter): …



def queryset(self, request, queryset):

tz = timezone.get_default_timezone()

from_date, to_date = get_date_range_for_hierarchy(self.date_hierarchy, tz) return queryset.filter(**{

'{}__gte'.format(self.date_hierarchy_field): from_date,

'{}__lt'.format(self.date_hierarchy_field): to_date,

})

Django will call the queryset function of our ListFilter with a queryset, and the function is expected to return the filtered queryset.

In the example above this is happening in a separate function (so we can test it, you know…):

def get_date_range_for_hierarchy(date_hierarchy, tz):

"""Generate date range for date hierarchy. date_hierarchy <dict>:

year (int)

month (int or None)

day (int or None)

tz <timezone or None>:

The timezone in which to generate the datetimes.

If None, the datetimes will be naive. Returns (tuple):

from_date (datetime.datetime, aware if tz is set) inclusive

to_date (datetime.datetime, aware if tz is set) exclusive

""" from_date = datetime.datetime(

date_hierarchy['year'],

date_hierarchy.get('month', 1),

date_hierarchy.get('day', 1),

) if tz:

from_date = tz.localize(from_date) if 'day' in date_hierarchy:

to_date = from_date + datetime.timedelta(days=1) elif 'month' in date_hierarchy:

assert from_date.day == 1

to_date = (from_date + datetime.timedelta(days=32)).replace(day=1) else:

to_date = from_date.replace(year=from_date.year + 1) return from_date, to_date

The function receives the dict we constructed in __init__ and returns a date range to filter on. The queryset function then applies a simpler range filter that our database can better utilize.

To use the simplified range condition in our ModelAdmin we need to add it as a list_filter :

class TransactionAdmin(admin.ModelAdmin):

...

date_hierarchy = 'created'

list_filter = (

...

RangeBasedDateHierarchyListFilter,

)

...

The result

After we deployed this change the performance of the page improved drastically. Our database was happy, the support team was happy and so were we.

The full implementation can be found in our django-lightweight-date-hierarchy package on github and pypi along with some other goodies:

If you missed the previous article in our journey to optimize the date hierarchy you can find it here: