00:00

22 December 2010

tl;dr: Don’t retrieve a whole row just to get the primary key you had anyway. Don’t iterate in the app; let the database server do the iteration for you.

There’s a couple of bad habits I see a lot in Django code (including, sadly, my own), which is abuse of a ForeignKey field. Let’s take the classic example:

class Publisher(Model): # We accept the default 'id' column name = TextField() ... class Book(Model): # Likewise title = TextField() topic = TextField() publisher = ForeignKey(Publisher) # Remember this creates a publisher_id column

Now, let’s say we have a book:

b = Book.objects.get(title="Interior Landscapes")

And we want the ID of the publisher.

Don’t do this:

pub_id = b.publisher.id

This works, but it’s absurd: It does a separate select to fetch the entire Publisher object, and then extracts the ID. But, of course, it already had the ID, because that’s how it retrieved the publisher object. Instead, just go straight to the created ID field:

pub_id = b.publisher_id

Next, don’t use iteration to build lists if you can get the data directly out of the database. For example, suppose we want the list of publishers who publish books with topic “Surreal Architecture”. Far too often, I see this:

surreal_books = Books.objects.filter(topic="Surreal Architecture") surreal_publishers = set([book.publisher.id for book in surreal_books])

In this case, Django will send one query to get the list of books, and then do a separate query for each publisher to get the publisher id… even though they’re already in memory.

surreal_publishers = set([book.publisher_id for book in surreal_books])

This is better, since it doesn’t have to retrieve each publisher, but far better is to make the database do all the work:

surreal_publishers_qs = Books.objects.filter(topic="Surreal Architecture").values('publisher_id').distinct()

The result set, in this case, is a bit of an odd duck: It’s a list of dictionaries, each dict being of the form { 'publisher_id': <id value> } . Of course, Python being Python, it’s not hard to transform that into a set:

surreal_publishers = set([entry['publisher_id'] for entry in surreal_publishers_qs])

And we didn’t have to do any raw SQL!