Django's object-relational mapper (ORM) is a huge part of the reason for the framework's massive success. The ORM makes talking to databases easy by abstracting away the details of connections, cursors and queries. It allows developers to think in high-level Python and focus on business logic rather than low-level database plumbing.

However, its greatest strength is also its greatest weakness. The fact that it abstracts away database queries, particularly when it comes to lazily traversing relationships between objects, means that the queries emitted by the application become difficult to reason about. It's not possible to tell, just by looking at a piece of code, which queries will be run when it is executed. This can cause huge performance problems.

Example

To illustrate, consider the following simple line of code:

user.address

This line might be found in, say, a Django template (<div>{{ user.address }}</div>). Or, if your Django project exposes an API for a single-page or mobile app, it might be specified as a field in a Django REST framework UserSerializer.

Here's the question: how many database queries are executed when this line of code is run?

The problem is that just by looking at this piece of code in isolation, you simply can't tell. Maybe address is just a field on the User model, so no extra queries are needed. Unless, of course, the field has been included in a call to defer on the queryset, meaning that accessing it will require a query. Or maybe address is actually a foreign key pointing at a separate Address model? Or maybe a user can have many addresses, and the address attribute is actually a method on the User model which fetches the address marked primary from the set of related addresses?
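To make the ambiguity concrete, here is a minimal sketch in plain Python. It does not use Django at all: FakeDB, UserWithColumn and UserWithRelation are hypothetical stand-ins invented for illustration, and the "queries" are just a counter. It only shows that the same expression, user.address, can cost zero queries or one depending on a definition you can't see at the call site.

```python
# Toy stand-in for an ORM; none of these names are Django APIs.
class FakeDB:
    """Counts every simulated round trip to the database."""

    def __init__(self):
        self.query_count = 0
        self.addresses = {1: "1 Main St", 2: "2 High St"}

    def fetch_address(self, address_id):
        self.query_count += 1
        return self.addresses[address_id]


class UserWithColumn:
    """`address` is a plain column, loaded together with the user row."""

    def __init__(self, address):
        self.address = address


class UserWithRelation:
    """`address` is a lazy relation, fetched on first access and then cached."""

    def __init__(self, db, address_id):
        self._db = db
        self._address_id = address_id
        self._cached = None

    @property
    def address(self):
        if self._cached is None:
            self._cached = self._db.fetch_address(self._address_id)
        return self._cached


def queries_for_access(db, user):
    """Return how many simulated queries `user.address` triggers."""
    before = db.query_count
    user.address  # the expression under discussion
    return db.query_count - before


db = FakeDB()
plain = UserWithColumn("1 Main St")
lazy = UserWithRelation(db, address_id=2)

print(queries_for_access(db, plain))  # 0
print(queries_for_access(db, lazy))   # 1 (first access hits the database)
print(queries_for_access(db, lazy))   # 0 (cached afterwards)
```

The calling code is identical in all three cases, which is exactly why the cost is invisible to whoever writes the template or serializer.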

The point is that without looking at the model and/or the view, there's no way of knowing. Django's template system is designed to be used by frontend developers who only know HTML, so they can't be expected to look at and understand the Python code. Good object-oriented design encourages us to hide implementation details, so developers wouldn't naturally think to care about what happens when an attribute is accessed, or when a method is called. This makes it very easy for seemingly innocuous changes to client requirements ("can we add the user's address to each row in the user table?") to result in a serious performance hit.

It should be noted that Django does, of course, provide tools for solving the n+1 queries problem (one query to fetch a list of n objects, followed by one extra query per object): select_related and prefetch_related, as well as annotate and aggregate, let you perform more complex queries to pull out the data you need in the most efficient way possible. But often, you only find out that these optimisations are needed once the performance problems materialise - usually once your code is in production, as your database starts to fill up with real user data (in other words, the "n" in the "n+1 queries problem" gets bigger).
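The way the query count grows with n can be sketched without a real database. The following is a toy illustration only: FakeDB and its methods are hypothetical, not Django APIs, and each method call stands in for one query. The batched version is similar in spirit to what prefetch_related does (one extra query for all the related rows), though it is not how Django implements it.

```python
# Toy illustration only; FakeDB and its methods are hypothetical, not Django APIs.
class FakeDB:
    def __init__(self, n_users):
        self.query_count = 0
        # user id -> address id, and address id -> address text
        self.users = {i: i + 100 for i in range(n_users)}
        self.addresses = {i + 100: f"{i} Main St" for i in range(n_users)}

    def all_users(self):
        self.query_count += 1  # one query for the list of users
        return list(self.users.items())

    def address_by_id(self, address_id):
        self.query_count += 1  # one query per lookup
        return self.addresses[address_id]

    def addresses_by_ids(self, address_ids):
        self.query_count += 1  # one query for the whole batch
        return {i: self.addresses[i] for i in address_ids}


def render_lazy(db):
    """One query for the users, then one per row: n+1 in total."""
    return [db.address_by_id(addr_id) for _, addr_id in db.all_users()]


def render_eager(db):
    """Two queries regardless of n: users first, then all addresses at once."""
    users = db.all_users()
    addresses = db.addresses_by_ids([addr_id for _, addr_id in users])
    return [addresses[addr_id] for _, addr_id in users]


for n in (10, 1000):
    lazy_db, eager_db = FakeDB(n), FakeDB(n)
    assert render_lazy(lazy_db) == render_eager(eager_db)
    print(n, lazy_db.query_count, eager_db.query_count)
# prints "10 11 2" then "1000 1001 2"
```

Both functions produce the same output, so nothing in the rendered page hints at the difference until n is large enough to hurt.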

Avoiding implicit queries

So how can we make sure this kind of problem doesn't occur? The key insight here is that idiomatic Django code encourages the intermingling of two basic steps during request handling that should instead be kept separate. Those two steps are:

1. Fetch the data you need
2. Present the data to the user

Django (and Django REST framework) blurs these steps together by allowing queries to happen when model attributes are accessed (in a template or serializer) and in other places like model methods, custom template tags, or in a SerializerMethodField. The queries that happen implicitly during the "presentation" phase are the ones that cause the problems. Adding prefetch_related and select_related pushes those queries (conceptually) into the first phase.

By identifying these two steps, we can adopt programming patterns where they are kept separate. We could (and indeed should) try to enforce this separation with code review. Our team could (and should) establish a set of recommended design patterns around making the two steps explicit in our codebases.
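One possible shape for such a pattern is sketched below, again in plain Python so it runs on its own. Everything here is an assumption made for illustration: UserRow, fetch_user_rows, present_user_rows and fake_run_query are hypothetical names, and in a real project the fetch step would build a queryset (with select_related or prefetch_related as needed) rather than call a canned function. The idea is simply that the presentation step receives plain data that has no way of issuing a query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRow:
    """Plain data handed to the presentation step; it cannot run queries."""
    name: str
    address: str


def fetch_user_rows(run_query):
    """Step 1: the only place that talks to the database."""
    records = run_query("users with their addresses")
    return [UserRow(name=r["name"], address=r["address"]) for r in records]


def present_user_rows(rows):
    """Step 2: pure formatting, no database access."""
    return "\n".join(f"<div>{row.name}: {row.address}</div>" for row in rows)


def fake_run_query(description):
    """Stand-in for a real query; returns canned records."""
    return [
        {"name": "Ada", "address": "1 Main St"},
        {"name": "Bob", "address": "2 High St"},
    ]


print(present_user_rows(fetch_user_rows(fake_run_query)))
```

Converting to plain data is a fairly heavy-handed way to get the separation, and it relies on discipline and review rather than enforcement.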

But without a way to programmatically enforce the separation, mistakes will always be able to slip through. Ideally, we'd have a way to force queries to happen in step 1, and prevent them from happening in step 2. This means that if a frontend developer working in a Django template added a call to an attribute that incurred a query, they would be informed straight away, rather than only noticing later when the performance of the application went down the drain.

In order to achieve this, we have created a simple library that allows developers to mark explicitly the areas of code that are allowed to run database queries, and those that aren't. At the basic level, it does this using Python's context manager syntax: