BRIN indexes

A B-Tree index is structured like a tree. The cost of looking up a single value is the height of the tree + 1 for the random access to the table. This makes B-Tree indexes ideal for unique constraints and (some) range queries.

The disadvantage of B-Tree index is its size — B-Tree indexes can get big.

It’s not uncommon to think there are no alternatives but databases offer other types of indexes for specific use cases.

Starting with Django 1.11 there is a new Meta option for creating indexes on a model. This gives us an opportunity to explore other types of indexes.

PostgreSQL has a very useful type of index called BRIN (Block Range Index). Under some circumstances BRIN indexes can be more efficient than B-Tree indexes.

Let’s see what the official documentation has to say first:

BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table.

To understand this statement it’s important to understand how BRIN index works. As the name suggest, a BRIN index will create a mini index on a range of adjacent blocks in the table. The index is very small and it can only say if a certain value is definitely not in the range or if it might be in the range of indexed blocks.

Let’s do a simplified example of how BRIN works to help us understand.

Say we have these values in a column, each is one block:

1, 2, 3, 4, 5, 6, 7, 8, 9

Let’s create a range for each 3 adjacent blocks:

[1,2,3], [4,5,6], [7,8,9]

For each range we are going to keep the minimum and maximum value in the range:

[1–3], [4–6], [7–9]

Using this index, let’s try to search for the value 5:

[1–3] — Definitely not here.

— Definitely not here. [4–6] — Might be here.

— Might be here. [7–9] — Definitely not here.

Using the index we limited our search to blocks 4–6.

Let’s take another example, this time the values in the column are not going to be nicely sorted:

[2,9,5], [1,4,7], [3,8,6]

And this is our index with the minimum and maximum value in each range:

[2–9], [1–7], [3–8]

Let’s try to search for the value 5:

[2–9] — Might be here.

— Might be here. [1–7] — Might be here.

— Might be here. [3–8] — Might be here.

The index is useless — not only did it not limit the search at all, we actually had to read more because we fetched both the index and the entire table.

Going back to the documentation:

…columns have some natural correlation with their physical location within the table.

This is key for BRIN indexes. To get the most out of it, the values in the column must be roughly sorted or clustered on disk.

Now back to Django, what field do we have that is often indexed and will most likely be naturally sorted on disk? That’s right, I’m looking at you auto_now_add.

A very common pattern in Django models is this:

class SomeModel(Model):

created = DatetimeField(

auto_now_add=True,

)

When auto_now_add is used Django will automatically populate the field with the current time when the row is created. A created field is usually also a great candidate for queries so it’s often indexed.

Let’s add a BRIN index on created :

from django.contrib.postgres.indexes import BrinIndex class SomeModel(Model):

created = DatetimeField(

auto_now_add=True,

) class Meta:

indexes = (

BrinIndex(fields=['created']),

)

To get a sense of the difference in size I created a table with ~2M rows with a date field that is naturally sorted on disk:

B-Tree index: 37 MB

BRIN index: 49 KB

That’s right, no mistake.

There are a lot more to consider when creating indexes than the size of the index. But now, with Django 1.11 support for indexes, we can easily integrate new types of indexes into our apps and make them lighter and faster.