Today we have something about a problem well-known to experienced developers. We will discuss the N+1 queries problem and how to deal with it using ActiveRecord.

UPDATE 14.06.2018: Updated information about the #includes and #preload methods.

N+1 queries problem

This is a typical and old problem well known to all experienced coders. What is it about? Let's assume that we have a Clients table. Each record in that table is represented by the Client model in our simple app. We also have an Addresses table and the corresponding Address model. These entities are related - one client can have many addresses, and every address has exactly one owner (a Client). Overall, a simple and typical one-to-many association.

The problem occurs when we try to fetch the 1-to-many relationship for more than one parent object (in our case, it's Client).

Client.all.each { |client| client.addresses.to_a }

(calling "to_a" forces our model to fetch data from the database)

This code will trigger an N+1 number of queries, where N is the number of clients. For example, if we have 1000 clients - we will trigger 1001 queries. One to fetch all clients, and one for to fetch each client's address.

I do not want to say that there is something wrong with that approach. Not at all. It is called "lazy loading" and could be really useful, but it really depends on the specific situation.

Solution #1 - preload

We will use the #preload method offered by Rails' ActiveRecord.

What do we achieve by that doing? The code above will only trigger two queries! One to fetch all the clients, and one to fetch associated addresses for all clients at once. 1001 then, 2 now. Sounds good.

No magic or complicated logic here :) Just a simple SELECT for clients, and another SELECT (using te IN clause) for addresses.

Solution #2 - eager_load

#eager_load also preloads associations, but it does it in one query! Ok... do not get too excited :) Less does not always means better .

Client.eager_load(:addresses).each { |client| client.addresses.to_a }

Any magic here? No, not really. It just creates an SQL query using LEFT OUTER JOIN - and this type of join is very important. LEFT OUTER JOIN guarantees that we will receive all clients - even if they do not have any associated addresses (as opposed to INNER JOIN - see below).

Solution #3 - includes

There is some magic behind the #includes method, but the documentation is not clear about this. How many queries will be called depends on what additional instructions you use (where / order etc). If we add additional conditions related to preloading associations, this method will behave like the eager_load method. Otherwise, it will behave like the preload method.

Client.includes(:addresses)

The code above will behave the same as if we had used the preload method (two queries will be triggered).

Client.includes(:addresses).where('addresses.city = ?', 'smth').references(:addresses)

This code will trigger one query (using LEFT OUTER JOIN), so it will work just like the eager_load method.

joins

Not marked as a solution, because the #joins method does not preload assocations! It is just an interface to create an SQL query using INNER JOIN. Because of how SQL's INNER JOIN works, the following code snippet will not return all of our clients :

Client.joins(:addresses)

Why? It will skip records for which the join condition is satisfied. In our case: it will skip clients without any addresses, so we will receive only clients with at least one address.

No preloading any associations here. Calling :

Client.joins(:addresses).each { |client| client.addresses.to_a }

will trigger 1001 queries to our database.

Another important thing: Assume that 600 of our clients have at least one address associated. Will we receive the result with 600 client objects? The answer is: 600 is the minimum number. Again, this is because of how SQL's INNER JOIN works. Such a query will return a record for all matching pairs, so invoking it will return each client with each address separately. We can easily get rid of duplicates using the #distinc t method.

Benchmarks

Let's compare the performance of each solution. In my database, I have 1000 clients, and each client has 3 addresses associated. We will measure the real time of calling "to_a" on each client's address associations (just to force loading data from database). I used an average of 10 measurements, without enabling caching.

Time [s] Queries [-] Memory allocated (Memory retained) [bytes] lazy-loading 1.201 1001 297139 (1104) includes / preload 0.142 2 128105 (1070) eager_load 0.162 1 125954 (1143) joins 2.875 1001 873977 (1191)

While comparing lazy-loading to includes, we observe an 88% reduction in time spent performing the query! We can also see the benefits in the amount of allocated memory. Brilliant! Does it mean that we should always preload all associations?

No, absolutely not! We do not always need to preload related data. Sometimes lazy-loading is very useful and more efficient. It really depends on the situation.

I just want you to know that the "N+1 queries" problem exists, how to deal with it, and whether fighting it can be beneficial to your app.