SS Pendleton sinking. This is the default “fail whale” image for Open edX installations. Credit: U.S. Coast Guard.

Here at edX we use the red-black (or blue-green) deployment method. Ben Schmaus’ post on the Netflix Technology Blog details red-black deployments. The important detail about this deployment method is that, for some period of time, traffic is going to both the old code and new code. That detail is especially important when deploying database migrations that alter database columns and tables in a manner that is backwards-incompatible with the previous release.

Let’s go through a couple examples with our user table, auth_user. It has a few different columns, but we’ll use the full_name column for the examples.

Say we decide to change the column’s name from full_name (with an underscore) to fullname (no underscore). Our code in production is using full_name. When it’s time to deploy this new release, we simply generate a migration and deploy it. Since we are using red-black deployments, our old code is still looking for the original column name, full_name. However, the new deployment changed the name to fullname, so the original code starts failing.
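To make the failure concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the production database. The trimmed-down table, the sample row, and the function names are assumptions made for illustration, not our actual code, and the RENAME COLUMN statement needs SQLite 3.25 or newer.

```python
import sqlite3


def old_code_get_name(conn, user_id):
    """Old release: still expects the column to be called full_name."""
    row = conn.execute(
        "SELECT full_name FROM auth_user WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def new_code_get_name(conn, user_id):
    """New release: expects the renamed column, fullname."""
    row = conn.execute(
        "SELECT fullname FROM auth_user WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT, full_name TEXT)"
)
conn.execute("INSERT INTO auth_user VALUES (1, 'ada', 'Ada Lovelace')")

# Before the deployment, the old code works.
name_before = old_code_get_name(conn, 1)

# The new release's migration renames the column in place.
conn.execute("ALTER TABLE auth_user RENAME COLUMN full_name TO fullname")

# The new code finds the column it expects...
name_after = new_code_get_name(conn, 1)

# ...but the old code, which is still serving traffic, now fails.
try:
    old_code_get_name(conn, 1)
    old_code_error = None
except sqlite3.OperationalError as exc:
    old_code_error = str(exc)
```

Here old_code_error ends up holding the database's complaint about the missing full_name column, which is the kind of error users would hit while both releases are live.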

Instead of renaming the column, say we delete it completely. Again, the database is modified when we deploy, and the original code that is still running will fail.

Because we operate in an environment where new and old code are running simultaneously against the same database, the schema changes that ship with new code must always be compatible with the older code that is still running. Newer deployments can add tables and columns, but neither can be deleted until the old code no longer references them.

Migrating the right way

How do we properly drop a column or table? Two releases:

1. Remove all references to the column or table, including updating the model to not refer to the field/column anymore.
2. Drop the column or table.

If we want to do a rename, how do we do that? Three releases:

1. Add the new column via migration.
2. Start using the new column, and replace all usage of the old column.
3. Remove the old column from the table via migration.
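The three releases above can be sketched with Python's built-in sqlite3 module standing in for the real database. This is only an illustration under stated assumptions: the two-column table, the read_old and read_new helpers, and the backfill that copies existing values into the new column are all hypothetical additions, and a real system would also have to carry over rows written by old code during the overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO auth_user VALUES (1, 'Ada Lovelace')")


def read_old(conn):
    """Code from before the rename: reads full_name."""
    return conn.execute("SELECT full_name FROM auth_user WHERE id = 1").fetchone()[0]


def read_new(conn):
    """Code from release 2 onward: reads fullname."""
    return conn.execute("SELECT fullname FROM auth_user WHERE id = 1").fetchone()[0]


# Release 1: migration adds the new column. Old code is unaffected.
# Copying existing values across is an assumption of this sketch.
conn.execute("ALTER TABLE auth_user ADD COLUMN fullname TEXT")
conn.execute("UPDATE auth_user SET fullname = full_name")
after_release_1 = (read_old(conn), read_new(conn))

# Release 2: code-only change. New code reads fullname, while any old code
# still serving traffic keeps working because full_name is still there.
after_release_2 = (read_old(conn), read_new(conn))

# Release 3: migration removes the old column, once nothing reads it.
# The table is rebuilt here so the sketch runs on any SQLite version.
conn.executescript(
    """
    CREATE TABLE auth_user_tmp (id INTEGER PRIMARY KEY, fullname TEXT);
    INSERT INTO auth_user_tmp SELECT id, fullname FROM auth_user;
    DROP TABLE auth_user;
    ALTER TABLE auth_user_tmp RENAME TO auth_user;
    """
)
after_release_3 = read_new(conn)
```

At every point where two releases overlap, both the outgoing and the incoming code can still read the column they expect. Only after release 3 does read_old stop working, and by then nothing should be calling it.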

Let’s return to our example with the auth_user table. If we still want to drop the full_name column, we should do the following:

1. Remove every usage of the full_name column in our codebase.
2. Release that change to production, and ensure older code is no longer running. (We once had a stale ASG in production a few hours after a release, and it caused a few issues when we dropped a column.)
3. Create a database migration to drop the column.
4. Release it.
5. (This step intentionally left blank…because nothing broke in production!)
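The drop sequence above can be sketched the same way, again with Python's built-in sqlite3 module as a stand-in for the real database. The reduced table and the display_name helper are hypothetical, and the table rebuild is just a portable way to emulate dropping a column in SQLite, not what a migration framework would necessarily generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT, full_name TEXT)"
)
conn.execute("INSERT INTO auth_user VALUES (1, 'ada', 'Ada Lovelace')")


def display_name(conn, user_id):
    """Code as of the first release: no longer touches full_name."""
    row = conn.execute(
        "SELECT username FROM auth_user WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


# First release: code-only change, the column is still present.
before_drop = display_name(conn, 1)

# Second release: the migration drops full_name. Rebuilding the table
# emulates a column drop on any SQLite version.
conn.executescript(
    """
    CREATE TABLE auth_user_tmp (id INTEGER PRIMARY KEY, username TEXT);
    INSERT INTO auth_user_tmp SELECT id, username FROM auth_user;
    DROP TABLE auth_user;
    ALTER TABLE auth_user_tmp RENAME TO auth_user;
    """
)

# Code from the first release keeps working after the drop.
after_drop = display_name(conn, 1)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(auth_user)")]
```

Because the code stopped reading full_name one release earlier, the instances still running that release are untouched when the column disappears.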

There is actually a potential extra step after the drop: cleanup. Depending on what your previous migrations are doing, you may want to clean them up to avoid wasting time on new deployments from scratch, whether in local development environments or, if your code is open source, for external consumers. Another option, if your migration framework supports it, is to squash the migrations. However, the focus of this post is on ensuring migrations get to production without downtime or errors. If you want to learn more about migration cleanup, let us know.

We have written a lot about database migrations on our wiki. While written from the perspective of Open edX Django developers, the information should be adaptable to other frameworks. Take a look at Everything About Database Migrations.