Database Schema and Zero Downtime Deployment

Zero downtime deployment is one of the most important capabilities that modern web applications strive for. Creating a new process or rolling out new containers for a release can be managed by the operating system or by the cluster tool used in the deployment, such as Swarm or Kubernetes.

This guarantee can easily be broken by changes to the database schema if engineers do not pay attention to the migrations they implement and to how those migrations will behave when run in production.

Database migrations, or schema changes, are usually applied during the release deployment. First, the migrations are applied to the database server, producing a new database schema. Then the release is deployed, and a new process running the new code starts serving users' requests.

For a window of time during the deployment, before the new release is rolled out, the old code runs against the new database schema. How long this window lasts depends on the deployment script; a manual deployment can stretch it considerably. This means that both the old and the new release must be compatible with the new database schema. Otherwise, end users will start hitting errors and crashes while using the web application.

This risk does not apply to every schema change. For instance, adding a new column to a table, or adding a new table, is safe and can be done without issues. On the other hand, removing a column or a table is dangerous and cannot be done in a single migration, because the old release will not be compatible with the new schema. Say you decide to remove the email address from the users table and move it to the contact details table. Applying this change in a single migration means that sending emails to users will stop working in the old release the moment the database change is applied, simply because the email column no longer exists.
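A minimal sketch using SQLite (the table and column names are illustrative, and the table is rebuilt rather than altered so it runs on any SQLite version) shows how a query from the old release breaks the moment the column is removed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES ('Jane', 'jane@example.com')")

def old_release_get_email(conn, user_id):
    # Query shipped with the old release: it still expects users.email to exist.
    return conn.execute(
        "SELECT email FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]

print(old_release_get_email(conn, 1))  # works before the migration

# Single-step migration: copy the email into contact_details and remove it
# from users in one go (simulated here by rebuilding the table).
conn.executescript("""
    CREATE TABLE contact_details (user_id INTEGER, email TEXT);
    INSERT INTO contact_details (user_id, email) SELECT id, email FROM users;
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_new (id, name) SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")

# The old release is still running against this schema and now crashes.
try:
    old_release_get_email(conn, 1)
except sqlite3.OperationalError as exc:
    print("old release fails:", exc)
```

The old release's query fails with a "no such column" error for every user request until the new code finishes rolling out.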

Instead of applying this change in a single release, it should be done in two. The first release duplicates the email column into the contact details table and switches the source code to read the user's email from the contact details table. The second release removes the email column from the users table. This is much safer, since no source code uses the email column by the time it is removed.
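The two-release sequence can be sketched with the same illustrative SQLite schema (again rebuilding the table to stay portable across SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES ('Jane', 'jane@example.com')")

# Release 1 migration: duplicate the email into contact_details.
conn.executescript("""
    CREATE TABLE contact_details (user_id INTEGER, email TEXT);
    INSERT INTO contact_details (user_id, email) SELECT id, email FROM users;
""")

def get_email(conn, user_id):
    # Release 1 code reads from contact_details; users.email is now unused.
    return conn.execute(
        "SELECT email FROM contact_details WHERE user_id = ?", (user_id,)
    ).fetchone()[0]

# Release 2 migration: no running code reads users.email anymore, so it is
# safe to remove it (simulated here by rebuilding the table without it).
conn.executescript("""
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_new (id, name) SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")

print(get_email(conn, 1))  # unaffected by the release 2 migration
```

Because the read path moved to contact_details in release 1, the release 2 migration can run at any point without breaking whichever code version happens to be serving traffic.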

Another category of database changes that can cause crashes during deployment to production is changes performed on big tables. For instance, adding an index on a column is a common task during software development, usually needed to improve performance when searching for records by that column's values. However, adding an index to a table that holds 4 million records can take a long time and could block the application deployment. There is also a risk that the database query times out and fails with an error.
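A rough way to feel this risk locally (the table, row count, and index name are illustrative; production tables are far larger) is to time an index build as the table grows:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

# Populate the table; a production table might hold millions of rows,
# and the index build time grows with the row count.
rows = [(i % 1000,) for i in range(200_000)]
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", rows)

start = time.perf_counter()
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
elapsed = time.perf_counter() - start
print(f"index built over {len(rows)} rows in {elapsed:.3f}s")
```

On server databases the cost is not just time: on PostgreSQL, for example, a plain CREATE INDEX blocks writes to the table for the duration of the build, which is why PostgreSQL offers CREATE INDEX CONCURRENTLY as a slower but non-blocking alternative.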

These changes can easily pass local and testing environments simply because the data size in those environments is nowhere near production's. The risk is not restricted to creating a new index; any change that must be performed on a large table (e.g. adding, renaming, or removing columns) carries the same risk.

It is highly recommended to take the production database's schema and size into account for every database-level change, as early as possible in the software development process, to avoid these types of crashes in production.