Incidents happen all the time because of bad code deploys. You write some code that passes code review, it gets automatically shipped to production after the test suite passes, and BAM: an outage. This common failure mode can often be prevented entirely. Using some simple ideas, we can defend ourselves against the hidden mistakes that code review and chaos engineering sometimes won’t catch.

Defensive programming is great for codifying how a bug could be introduced, then raising an error right before it would happen, or choosing an alternative path instead.
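As a toy sketch of the idea (the `Deploy` class and its limits here are hypothetical), defensive code checks its assumptions up front and fails loudly before the dangerous operation ever runs:

```ruby
# Hypothetical example: refuse to even construct a deploy whose timeout
# is missing or implausibly high, instead of discovering it in production.
class Deploy
  Error = Class.new(StandardError)

  def initialize(timeout:)
    raise Error, 'deploys must have a timeout' if timeout.nil?
    raise Error, "deploy timeout too high (#{timeout}s)" if timeout > 300

    @timeout = timeout
  end
end

Deploy.new(timeout: 60)   # fine
# Deploy.new(timeout: 600) # raises Deploy::Error before anything ships
```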

If you have a style guide that specifies things like:

- “Always use concurrent indexes”
- “Always have a timeout on a remote read”
- “Never constrain to the account model in a foreign key”

then this guide is for you.

Defending against bad database migrations

There are dozens of ways to brick a website with a database migration. A super common mistake is adding an index to a table that has millions of records in it and is critical for a system to work. For many websites, this might be your users table; in our case, it’s the change events table. Since we use Postgres, our table will lock when an index is added to it, preventing writes while the index is being created. With very large tables, though, creating the index can take far longer than is acceptable.

Adding indexes with Rails is drop-dead simple, so this extremely harmful situation is easy to miss in code review. For example, we can add an index to a massive table with five lines of code:

```ruby
class AddIndexToTable < ActiveRecord::Migration[5.2]
  def change
    add_index :huge_table, :user_id
  end
end
```

This will lock our `huge_table` until the index is done creating. That means we’re out of luck if we want to write data: our program will just hang waiting to do it and eventually time out (hopefully).

So how do we defend against it?

Using the power of Ruby, we can actually wrap the `add_index` method in migrations to check whether an index is being created concurrently. This is possible with the `prepend` method for Ruby modules.
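To see the mechanics outside of Rails first, here’s a minimal standalone sketch of `prepend` (the `Greeter` class is illustrative): the prepended module sits before the class in the ancestor chain, so its method runs first and `super` falls through to the original.

```ruby
class Greeter
  def greet(name)
    "hello #{name}"
  end
end

# The prepended module's greet runs before Greeter's, so it can veto
# the call by raising; otherwise super reaches the original method.
module DefendEmptyNames
  Error = Class.new(StandardError)

  def greet(name)
    raise Error, 'name must not be empty' if name.to_s.empty?

    super
  end
end

Greeter.prepend(DefendEmptyNames)

Greeter.new.greet('sam') # => "hello sam"
# Greeter.new.greet('') # raises DefendEmptyNames::Error
```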

First, let’s define a file at `config/initializers/defend_non_concurrent_indexes.rb` and fill it with:

```ruby
ActiveRecord::Base.connection

module DefendNonConcurrentIndexes
  Error = Class.new(StandardError)

  def add_index(table_name, column_name, options = {})
    unless options[:algorithm].to_s == 'concurrently'
      raise Error, 'attempt to add index without concurrent algorithm'
    end

    super
  end
end
```

This forces a connection to the database and then defines a module called DefendNonConcurrentIndexes. We define the method `add_index` with the same signature as the migration one defined in `ActiveRecord::ConnectionAdapters::SchemaStatements`.

In this code, we check that the `:algorithm` key is set to the value `concurrently`; if not, we raise an error saying that the concurrency flag must be set.

Now, we need to include this module in our connection adapter class:

```ruby
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.class_eval do
  prepend DefendNonConcurrentIndexes
end
```

Now, whenever we try to run a migration that doesn’t create an index concurrently, we’ll see:

```
$ rails db:migrate
== 20190729145605 AddIndexToUsers: migrating ==================================
-- add_index(:users, :name)
rails aborted!
StandardError: An error has occurred, this and all later migrations canceled:

attempt to add index without concurrent algorithm
```

Great! But, when we add the correct key, the error can be a bit cryptic:

```
PG::ActiveSqlTransaction: ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
```

We can help any future developer who hits this by providing a hint. Let’s modify our defense code to add a nice message about it.

```ruby
module DefendNonConcurrentIndexes
  Error = Class.new(StandardError)

  def add_index(table_name, column_name, options = {})
    unless options[:algorithm].to_s == 'concurrently'
      raise Error, 'attempt to add index without concurrent algorithm'
    end

    super
  rescue ActiveRecord::StatementInvalid => exception
    if exception.message.include?('CREATE INDEX CONCURRENTLY cannot run inside a transaction block')
      raise Error, "Need to specify disable_ddl_transaction! in migration to use this type of index"
    end

    raise exception
  end
end
```

Now when we fix the algorithm but forget to disable the transaction, we’ll get a nice hint for that as well.
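Putting both hints together, a migration that passes the defense might look like this (the table and class names are illustrative):

```ruby
class AddIndexToHugeTable < ActiveRecord::Migration[5.2]
  # CREATE INDEX CONCURRENTLY can't run inside a transaction block,
  # so the migration has to opt out of the DDL transaction.
  disable_ddl_transaction!

  def change
    add_index :huge_table, :user_id, algorithm: :concurrently
  end
end
```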

Defending against indexes that omit the concurrent algorithm is a great way to prevent outages from write-blocking locks in Postgres. With ~20 lines you can defend yourself and break a CI suite before anything ever gets deployed.

Note: If you’re using Rubocop, you can create a cop for this as well; GitLab, for example, has done this.

Defending against foreign key reversal

At FireHydrant we associate all customer data to an account using an `account_id`, which means every table has this column with a foreign key constraint attached. We also prefer using CASCADE on these foreign keys so we don’t need to rely on our models to define `dependent: :destroy`. This has an absolutely terrifying possibility though: What if a key is reversed to the accounts table?

For example, if we add a foreign key constraint that says “accounts table depends on this users table” and then delete a user from that table… POOF. The account record will also be deleted, and all of the data associated due to our foreign key. That would be a very bad day for us.
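For concreteness, the direction matters in `add_foreign_key` (a standard Rails migration method; the table names here are illustrative): the first argument is the table that holds the key, i.e. the one that depends on the other.

```ruby
# Correct: users holds account_id, so users depends on accounts.
# Deleting an account cascades down to its users.
add_foreign_key :users, :accounts, on_delete: :cascade

# Reversed -- the mistake we defend against: accounts depends on users,
# so deleting a *user* would cascade up and delete the whole account.
# add_foreign_key :accounts, :users, on_delete: :cascade
```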

This scares the bejesus out of us, so we defend against it. A user can create the migration no problem, but we check every time the application boots. This means the developer will likely catch it before it even hits CI; and if it does hit CI, it will break the build anyway.

```ruby
module DefendReversedForeignKeys
  Error = Class.new(StandardError)
end

ActiveRecord::Base.connection.tables.each do |table_name|
  foreign_keys = ActiveRecord::Base.connection.foreign_keys(table_name)

  foreign_keys.each do |key|
    if key.from_table == 'accounts'
      raise DefendReversedForeignKeys::Error,
            "a foreign key exists from #{key.from_table} to #{key.to_table}, " \
            "this is reversed and could cause catastrophic damage. " \
            "Key name: #{key.options[:name]}"
    end
  end
end
```

First we need to find all of the tables that are in the database, then we get all of the foreign keys associated with the table. If we find a foreign key that has been defined in our database that could wreak havoc, we raise an error.

This could also be used to check if foreign keys are missing on tables as well! Tweak as needed.
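As a sketch of that inverse check (the table list is hypothetical and would be maintained by hand), the same boot-time hook can assert that every tenant-scoped table actually carries the constraint:

```ruby
# Illustrative list of tables that must reference accounts.
TABLES_REQUIRING_ACCOUNT_FK = %w[users incidents change_events]

TABLES_REQUIRING_ACCOUNT_FK.each do |table_name|
  keys = ActiveRecord::Base.connection.foreign_keys(table_name)

  unless keys.any? { |key| key.to_table == 'accounts' }
    raise DefendReversedForeignKeys::Error,
          "#{table_name} is missing a foreign key to accounts"
  end
end
```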

Always have a timeout

Timeouts are by nature a form of defensive programming: they prevent a program from locking up forever by giving up after a set amount of time. It makes sense to always have a timeout on a remote read from a server, but in Ruby… that’s hard.
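For instance, the stdlib `Net::HTTP` client does let you set both timeouts per instance (the host here is illustrative; recent Rubies default both timeouts to 60 seconds, which is usually far too generous):

```ruby
require 'net/http'

http = Net::HTTP.new('example.com', 443)
http.use_ssl = true
http.open_timeout = 5 # seconds to wait when opening the TCP connection
http.read_timeout = 5 # seconds to wait for each read from the socket

# http.get('/') would now give up after at most these limits
```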

The problem is that Ruby actually has a terrible Net interface and no way to set global timeouts for open or read operations. So we have to do something similar to our `add_index` defense.

```ruby
require 'net/http'

module DefendNetTimeouts
  Error = Class.new(StandardError)

  def initialize(*)
    super

    # Guarantee our own timeouts for Net::HTTP clients
    self.open_timeout = 10
    self.read_timeout = 10
  end

  private

  def request(*)
    if open_timeout > 10
      raise Error, "request attempted with an open timeout > 10 seconds (was #{open_timeout})"
    end

    if read_timeout > 10
      raise Error, "request attempted with a read timeout > 10 seconds (was #{read_timeout})"
    end

    super
  end
end

Net::HTTP.class_eval do
  prepend DefendNetTimeouts
end
```

Ok, this is a partial solution to a larger problem. Because Ruby has such a poorly designed HTTP library in its standard package, several other gems exist to compensate. This will only ensure timeouts are set on any Net::HTTP call. But if you use something like Typhoeus (or a gem that depends on it), you’re out of luck. Luckily, most of those libraries do offer global timeouts.

The one nice thing about this implementation is that it makes sure all of the Net::HTTP calls have a timeout. So if you’re using several gems that make Net::HTTP calls (for example RestClient), this will defend against those really well.

Conclusion

Defending against common mistakes is a great way to prevent accidental incidents. Pull requests are great for checking logic, but hunting for things like reversed foreign keys is not a good use of engineers’ focus. Let code do that for you!