When we started Staffjoy, the original continuous deployment script automatically migrated schemas in our MySQL database prior to deploying the application version. However, with Staffjoy V2, we stopped automated database migrations in favor of manual migrations.

Bryan on Twitter reviewed our open-source code and asks why:

The short answer: Migrating databases and deploying an new version of application code at the same time creates a race condition that can take down an application.

Migrating Before Deploying? 🤔

Most deployment systems migrate a database before deploying new application code. We did this for our V1 repos — and it normally worked quite well. When adding a new feature, we would need to add new database columns. So, the deploy would migrate the schema to add those columns to the production databases, then afterward the code that used those columns would deploy. In most cases, this strategy works well.

Dropping Columns 🚨

Migrating before deploys causes problems when modifying or dropping a column. If you run a migration that removes a column in a table, the running application may still expect that column to exist. If the column is not there, it can take down the application until the code deploy finishes.

This may seem theoretical, but most ORMs, such as SQLAlchemy, select every column individually in a row when loading a model. So, if you drop an unrelated column in the `users` table, trying to verify a user’s password can fail until the database includes the normalization that the model expects.

We encountered this issue in production first when dropping a column in our `users` table in Staffjoy V1. SQLAlchemy failed to load the user model, so all authentication failed. This created a 100% error rate in our API, which prevented users from accessing the application, clocking in, or doing anything until the application deploy finished. Our AWS Elastic Beanstalk deploys for Staffjoy V1 were quite slow, so the application was functionally unavailable for over 10 minutes as the error rate precipitously dropped each time a new application container replaced an old one.

Recap — How to migrate a database while deploying an application 📖

In most cases:

Add a column or table before deploying application code

deploying application code Remove a column or table after deploying application code

deploying application code Renaming a column or table is a special case, and in most cases should be avoided in order to prevent downtime ⛔️

Migration-only deploys ❓

Technically, a way to automatically migrate tables without race conditions is to only allow deploys to contain either database changes or application changes. So, if you are adding a new feature, deploy the database migration, then in a separate deploy change the application code. If you are removing a column, deploy the application code change, then afterward run a deploy that contains the database migration.

In practice, I find enforcing this pattern to be unrealistic. Making database changes in full before modifying application code takes significant planning and decisiveness. When working with ORMs that automatically generate migrations, application changes are functionally required to generate a migration. Testing these deploy race conditions is difficult, and when developers become complacent with combining deploying migrations and application changes together, catching issues that could create downtime due to race conditions can be hard.

Compromise: Manual deploys 🙃

Due to the above issues, all database changes in Staffjoy V2 were manually run. That way, we could decide how to migrate the database on a case-by-case basis. We would also be prepared to quickly rollback changes if error rates increased.