19:23

7 November 2009

In part 1, we ran down a list of the standard Django features for controlling transactions. Now, we’re going to look at some ways to optimize how these tranactions happen.

Let’s look at the SQL that our create_order() view function generated, with the transaction middleware turned on, and no transaction decorators on the function:

LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SET TIME ZONE E'America/Los_Angeles' LOG: statement: SELECT version() LOG: statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1313 Mockingbird Lane', E'Mockingbird Heights', E'CA', E'90026') LOG: statement: SELECT CURRVAL('"example_address_id_seq"') LOG: statement: INSERT INTO "example_order" ("customer_name", "shipping_address_id") VALUES (E'Herman Munster', 5) LOG: statement: SELECT CURRVAL('"example_order_id_seq"') LOG: statement: COMMIT

Setting aside the overhead statements (and, good grief, there are a lot of those), what’s wrong with this? Well, not much, except that as of 8.2, PostgreSQL has a much better way of getting the primary key (or any other field, for that matter) back from an INSERT statement, which is the INSERT … RETURNING construct. It’s perfect for just this situation, when you’re inserting a record and creating the primary key at the same time, and want to et the primary key back for further operations.

Now, let’s look at the SQL we get from our show_order() function:

LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SET TIME ZONE E'America/Los_Angeles' LOG: statement: SELECT version() LOG: statement: SELECT "example_order"."id", "example_order"."customer_name", "example_order"."shipping_address_id" FROM "example_order" WHERE "example_order"."id" = 1 LOG: statement: ROLLBACK

In this case, we don’t need that transaction at all.

At it happens, there’s a Django feature that can address both these problems at once, as long as you are running PostgreSQL 8.2 with the postgresql_psycopg2 backend…

Enter the autocommit

This functionality is kind of buried in the Django documentation, almost apologically stuck in the Notes about supported databases section. It’s an option you add to your settings.py file:

DATABASE_OPTIONS = { "autocommit": True, }

What does this do? The documentation, I’m afraid, is pretty obscure about the actual effect. Groveling around in the code, we discover that what it does is prevent Django from automatically starting a new transaction on the first database operation. If we switch it on, our SQL traffic for show_order() now looks like:

LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: SET TIME ZONE E'America/Los_Angeles' LOG: statement: SELECT version() LOG: statement: SELECT "example_order"."id", "example_order"."customer_name", "example_order"."shipping_address_id" FROM "example_order" WHERE "example_order"."id" = 1

That’s much better! Well, how about for create_order ?

Environment: Request Method: GET Request URL: http://127.0.0.1:8000/createorder/ Django Version: 1.1 Python Version: 2.5.2 Installed Applications: ['djangotrans.example'] Installed Middleware: ('django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware') Traceback: File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/core/handlers/base.py" in get_response 92. response = callback(request, *callback_args, **callback_kwargs) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/transaction.py" in _commit_on_success 240. res = func(*args, **kw) File "/Users/xof/Documents/Dev/examples/djangotrans/../djangotrans/example/views.py" in create_order 9. address.save() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/base.py" in save 410. self.save_base(force_insert=force_insert, force_update=force_update) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/base.py" in save_base 495. result = manager._insert(values, return_id=update_pk) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/manager.py" in _insert 177. return insert_query(self.model, values, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/query.py" in insert_query 1087. return query.execute_sql(return_id) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/sql/subqueries.py" in execute_sql 324. return self.connection.ops.fetch_returned_insert_id(cursor) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/backends/__init__.py" in fetch_returned_insert_id 171. return cursor.fetchone()[0] Exception Type: ProgrammingError at /createorder/ Exception Value: no results to fetch

Hm. That’s not what we expected, at all.

As it happens, there’s a bug in Django 1.1 when using autocommit; it blows up if the very first thing that you do on a new database connection is an INSERT that expects a return value back. As a workaround, we can add a line to our create_order() function to force it to do some traffic before the first insert:

def create_order(request): cur = connection.cursor() # Work around Django 1.1 autocommit bug address = Address(street_address="1112 E Broad St", city="Westfield", state="NJ", zip="07090") address.save() order = Order(customer_name="Gomez Addams", shipping_address=address) order.save() return HttpResponse("Created: " + unicode(order))

And now, we get…

LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: SET TIME ZONE E'America/Los_Angeles' LOG: statement: SELECT version() LOG: statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1112 E Broad St', E'Westfield', E'NJ', E'07090') RETURNING "example_address"."id" LOG: statement: INSERT INTO "example_order" ("customer_name", "shipping_address_id") VALUES (E'Gomez Addams', 82) RETURNING "example_order"."id"

Wow, that’s great! Notice the nice, efficient RETURNING clause, and notice… uh, the complete lack of a transaction block around the INSERTs. This is why this particular function is called “autocommit”: since we’re not running inside an enclosing transaction, each statement is “autocommitted” upon completion. (Calling a feature “autocommit” when the practical effect is that the framework does not automatically generate COMMIT statements is perhaps not the best choice of nomenclature.)

So, what are our options? Let’s add the @transaction.commit_on_success decorator, and see if that improves matters:

LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SET TIME ZONE E'America/Los_Angeles' LOG: statement: SELECT version() LOG: statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1112 E Broad St', E'Westfield', E'NJ', E'07090') RETURNING "example_address"."id" LOG: statement: INSERT INTO "example_order" ("customer_name", "shipping_address_id") VALUES (E'Gomez Addams', 84) RETURNING "example_order"."id" LOG: statement: COMMIT

Perfect.

Wait. What’s going on?

The following is a digression, and isn’t really required to use autocommit … but you might find it interesting.

You might well be thinking, “That’s all a very nifty trick, but what in the world does INSERT ... RETURNING have to do with autocommit? And why is autocommit limited to PostgreSQL 8.2, when that behavior has been standard with PostgreSQL since ever?”

First, note how Django gets the result of an INSERT if it does have the INSERT … RETURNING syntax available:

LOG: statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1313 Mockingbird Lane', E'Mockingbird Heights', E'CA', E'90026') LOG: statement: SELECT CURRVAL('"example_address_id_seq"')

That is, it calls the currval function. This works great… as long as you are assured that you are the same connection as you were when you issued the insert. But why wouldn’t you be?

Good question, and the answer is: Pooling.

Pretty much any site with any degree of load will, at some point, need pooling software like pgpool-II or PgBouncer to multiplex connections between the application and database server.

The base mode on both of these fine products simulates as closely as possible being directly connected to the database: Each time the client connects to the intermediate connection pooler, it appears that a new, isolated connection has opened up.

However, more aggressive forms of connection pooling are possible. For example, PgBouncer has transaction pooling, in which the connection to the database is only assigned to a client while it has a transaction open. Between transactions, the connection between PgBouncer and the database could switch, which would cause (really strange and hard-to-reproduce) errors.

“But,” you say, having read the PgBouncer docs, “it’s inside a transaction! That keep it from switching to a new connection, even when you’re using transaction pooling!”

That’s true… unless you are running with autocommit turned on, and take no special precautions to make sure that your INSERT is inside of a transation block (remember, it’s not by default, as we saw above).

Note that it is never wrong to use INSERT … RETURNING; it’s just required if you are using transaction pooling (Django doesn’t know if you are or not, so it assumes that you are), with autocommit turned on, and if you do not take special pains to put each INSERT or group thereof into its own transaction.

Back to Reality.