First Thing’s First: The Database

As our apps grow larger, they inevitably tend to hold more and more data, and require increasingly complex queries. And because CPUs are fast, disks are slow, and our apps don’t usually require too much computation, sites are frequently I/O bound. This is almost guaranteed to be the case if you’re neglecting to optimize your database and queries completely.

Indexing: Why and When

One of the lowest-hanging pieces of fruit in a Rails app (or any database-connected framework) is indexing. An often-used analogy for database indexing is the index in the back of a book; rather than having to read the whole book, you can scan an alphabetized index for the location of what you’re looking for, and then skip to that page. It’s the same with indexing; it lets our database know where to find what we’re asking for.

Generally, any column you frequently use to find ( WHERE ) or sort ( ORDER ) results is probably a candidate for an index. Foreign keys should typically also be indexed to improve performance. You can check your logs (or the Rails server output in the console) to check which queries are slow.

This might not matter if you’ve only got a few hundred rows of data — if you’re just keeping a record of all the Coldplay fans in the world, for example. But as you start to get tens or hundreds of thousands of rows per table, this can add up quickly. In steps indexing.

ActiveRecord already does some indexing for us automatically — every entry has its own id , serving as its primary key. Primary keys are indexed by default; you may have noticed that finding something by id is faster than, say, by name .

But you probably don’t always look for things by id. A user likely wants to search for things by their name, or some of their properties. If you find that this is the case, it may be time to index the columns of the table you frequently use to locate an item.

Let’s add an index to name on our model. (Let’s say it’s a Person model). After generating the migration in console ( rails g migration add_index_to_name_column_on_person) , we add the index:

class AddIndexToNameColumnOnPerson < ActiveRecord::Migration[5.2]

def change

add_index :people, :name

end

end

Let’s see how this plays out in a database with about 30,000 records.

Before: Person.find_by(name: "Jon Snow") => 25.3ms

After: Person.find_by(name: "The Hound") => 0.6ms

Not too bad. We didn’t know where in Westeros Jon Snow was, and so our search took over 25ms. However, because we knew The Hound was at the local tavern ordering chicken, we found him immediately. (Then backed away slowly.)

This trend happily continues even if we’ve got 100 Jon Snows in all of Westeros; it doesn’t take much longer to find 100 indexed Jon Snows than one. Generally speaking, this is when indexing is most useful; when most values are unique. It generally makes more sense to index on relatively unique things like names, than, for example, hair color or gender.

If we’ve got 100,000 rows and you’re looking for Jon Snow by sex, that’s about 50,000 entries you’ve still got to go through to find him… it doesn’t save much time, and because you’ve put an index on 100,000 items, you’ve also taken up space (indexes aren’t free, so if you don’t need them, don’t use them).

What if, though, we’ve got a category which has a bunch of options, only one of which we actually care about? Maybe we’ve got a ‘classification’ column in the Person table, and 99.5% of people are commoners, merchants and soldiers, while the group we care about — heroes — comprises only ~0.5%? Wouldn’t indexing the whole column just to search for heroes be kind of wasteful? (Yes.)

Partial Indexing

Partial indexing allows us to index a subset of a column. So if we find we’ve got an attribute (like ‘classification’) which we frequently use to locate a small group of records, we can index only the rows in that column that match the criteria we’re looking for — in this case, heroes.

Note: I’ve shortened “classification” to “cls” below for the sake of space and formatting; you would of course have to use the full column name ‘classification’ for this to work properly.

class AddPartialIndexToPersonOnCls < ActiveRecord::Migration[5.2]

def change

add_index :people, :cls, where: "(cls = hero)"

end

end

This speeds things up, saves space and doesn’t waste memory. If partial indexing aligns with your needs, you should absolutely use it instead of a full index.

Pluck

A friend of mine was recently building an app, and complained about how it took around a minute to load a particular page for the first time. When I looked at his controller action, it looked something like this:

@items = Item.all.map { | item | [ item.name, item.category ] }

All he wanted was to get each item’s name and category into its own array, but he was retrieving the full Item table (a model which had dozens of columns), converting them to full-blown Ruby objects with .map , and finally mapping over them to get their name and category.

Unsurprisingly, this was a bit of a problem as he had around 80,000 items .

We modified his controller action to use the pluck method instead, and the time dropped from ~60 seconds to around three*:

@items = Item.pluck(:name, :category)

The moral of the story? Pluck, along with its cousin select, can save a ton of memory and keep your site from slowing to a crawl.

*We later paginated his results, dropping his page loads to comfortably under a second