At one point or another I think we have all heard the popular developer mantra “Make it work, make it right, make it fast.” As remarkably simple as that sounds the “make it fast” part can be tricky to approach. In my experience when it came to researching performance optimizations the wealth of information and tooling left me feeling overwhelmed.

I did however, eventually come across one of the most common performance antipatterns (N + 1) and found it to be a great place to start. In this post I will go through an exercise I used to approach this problem using Ruby’s Activerecord library, and use batching as a technique to solve it.

What is N + 1?

In short N + 1 is a query problem. If you use Ruby’s Activerecord library you know it “insulates” you from having to write SQL. Each of those queries has overhead and if you are not careful that aforementioned insulation can come back to bite you.

Let’s get started.

The relationships between the tables to be queried are such that a User has many albums through UserAlbums, and Albums have many Users also through UserAlbums.

(N + 1) occurs when querying Parent/Child relationships (like the ones defined above) in relational databases like mySQL.

The queries to be issued to the database will come from the code below. The plan is to iterate through each user in our database, grab their associated albums, and print the names of each album to the console. Note: A use case for a method like this might be when one wants to render all user data to an index page on the web.

On line 1 you will see Activerecord’s “Benchmark” class with the method “measure” being called on it. That is how we will measure the execution time of the code. Also, notice the block is appended with “.real” which reports the execution time of the code within the block and returns the real time execution of that block in milliseconds.

When this code is executed Activerecord generates the below SQL queries under the hood.

Activerecord is actually using an INNER JOIN here to help prevent this from becoming an 2(n) problem, but what that means will become more clear by the end of this post. Thanks Activerecord!

Let’s break these queries down. To retrieve our data Activerecord writes 1 SELECT query to retrieve all users, and another SELECT for every user’s associated album data.

During this exercise I found it help to think of this as 1 + N because of the order in which the queries are issued. Flipping these addends does not change the sum so let’s break it down with that in mind.

The “1” is the query on line one that selects all users.

“N” is the number of additional and separate queries needed to be made in order to retrieve each user’s album data. (Line 2 of the code snippet above will be issued for every associated album.)

queries needed to be made in order to retrieve each user’s album data. (Line 2 of the code snippet above will be issued for every associated album.) In summary if we issued queries for 10,000 user’s and each user had an associated album the total amount of queries issued would be 10,001. That’s a lot of overhead!

In this example we will issue queries for 10,000 user’s data and then a separate set of queries limited to 1,000 users. Though for dramatic effect we’ll start by querying all 10,00 users and their associated data. Let’s take a look at N + 1 in action.

Each one of the purple and blue lines you see above are individual queries for each user’s associated album(s).

Initially seeing this mesmerizing console activity and noticing people look over my shoulder in the coffee shop made me feel like hackerman, but in reality this a problem… especially in terms of scalability. On average these 10,001 queries took a whopping 12.3 seconds to finish executing. If only Activerecord provided some powerful way(s) to help load the data we need before iterating over it…