Hi there! I’m here to tell you about a Rails gem that has saved me some time and that could save you some time too.

The gem’s name is ActiveRecord Import, but before I get into what it does, let me first describe the problem I was facing.

The Problem

Imagine a big system with two databases and lots of users, somewhere in the tens of thousands. Each user had a first name, last name, e-mail, etc. Nothing special, just a run-of-the-mill users table. Those users lived in database 1 but were also replicated in database 2.

On that system, these users had to have permissions to access certain pages. For a user to access a given page they had to receive an invitation from an allowed user. To represent this behaviour we added an invitations table to database 1. This table had information about both the inviter and invitee and also the page to access.

After some time, and for reasons that are out of the scope of this post, we had to migrate the invitations to database 2. The solution we came up with at the time was to write a rake task that would migrate the invitations table.

The task was pretty straightforward. It went through all the invitations in database 1 and, for each one, created a new invitation in database 2. But remember, we had one invitation for each user, so we were talking about tens of thousands of inserts into a database.

My first reaction was “This is going to take forever…”. I’m not the most patient person, and having to supervise a task that could take a lot of time wasn’t on my priority list. So I spoke with a colleague here at Runtime Revolution to see if he knew a way to speed up my task. He pointed out that I could use a gem called ActiveRecord Import to speed up the inserts into my database.

Bulk Inserts

ActiveRecord Import is a gem that helps you do bulk inserts. Before I start talking about the gem, I thought it would be a good idea to explain what a bulk insert is.

A bulk insert is a database mechanism that allows you to insert many rows of data into your database in a single operation. This mechanism has some advantages over doing many single inserts.

One of them is the main theme of this post: insert performance. By performance I mean the time it takes to insert a given amount of data into the database. From this perspective, bulk inserts are much faster than doing many single inserts.
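To make the difference concrete, here is a minimal sketch in plain Ruby that only builds SQL strings: one INSERT per row versus one multi-row INSERT. The table and column names (`invitations`, `inviter_id`, `invitee_id`, `page`) and the `quote` helper are assumptions made up for this illustration, and the exact SQL that ActiveRecord or any gem generates may look different.

```ruby
# Illustration only: these helpers just build SQL strings so the two shapes
# can be compared. Real code should let the database adapter do the quoting.

# Hypothetical helper: numbers pass through, everything else is wrapped in
# single quotes with embedded single quotes doubled.
def quote(value)
  return value.to_s if value.is_a?(Numeric)

  "'#{value.to_s.gsub("'", "''")}'"
end

# One INSERT statement per row (what saving records one by one amounts to).
def single_inserts(table, columns, rows)
  rows.map do |row|
    values = row.map { |v| quote(v) }.join(', ')
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{values});"
  end
end

# One multi-row INSERT statement covering all rows (a bulk insert).
def bulk_insert(table, columns, rows)
  tuples = rows.map { |row| "(#{row.map { |v| quote(v) }.join(', ')})" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')};"
end

# Hypothetical sample data for the illustration.
columns = %w[inviter_id invitee_id page]
rows = [
  [1, 2, 'reports'],
  [1, 3, 'billing'],
  [4, 5, 'reports']
]

puts single_inserts('invitations', columns, rows)
puts bulk_insert('invitations', columns, rows)
```

The first call prints three statements, the second prints a single statement carrying all three rows, which is where the savings in round trips to the database come from.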

Another advantage inherent to bulk inserts is the memory footprint. This footprint can be a problem if you try to insert data using ActiveRecord objects. But with bulk inserts there is the possibility of not having to instantiate an object for every row. Most bulk insert tools allow you to use data structures with a smaller memory footprint, such as arrays or hashes, and with those you can reduce your memory use.

Yet, there are some disadvantages with the use of such a mechanism. For example, you could have ActiveRecord models with some callbacks. But those callbacks would most likely fail. This happens because you cannot guarantee that all required resources are in-memory.

ActiveRecord Import?

Well, as I said, ActiveRecord Import is a gem that helps you do bulk inserts using ActiveRecord. Sure, that doesn’t sound that exciting but it actually is.

If you have to insert thousands of records using ActiveRecord you’ll notice how slow it can be. In such a case, you’ll do what I first did in my rake task. For each user in one database, create a new user in the other database. This will generate an insert SQL statement per user, which means thousands of inserts.

That’s far from great…

In cases like that, this gem will decrease your inserts from thousands to one! Yes, you read that right. What ActiveRecord Import does is combine all your inserts into one big insert per model.

Imagine you had two models, A and B, and you had to insert 1,000 A records that each had 1,000 B records. With the regular approach you would end up with over 1,000,000 inserts (1,000 for A plus 1,000,000 for B). Using ActiveRecord Import you can greatly reduce the number of inserts and improve your performance.
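The arithmetic behind that A and B scenario can be sketched in a few lines of Ruby. The method names are made up for this illustration, and the bulk figure assumes one statement per model with no batching, as described above.

```ruby
# Statements needed when every record is saved with its own INSERT:
# one per A record plus one per B record.
def row_by_row_statements(a_count, b_per_a)
  a_count + a_count * b_per_a
end

# Statements needed with one multi-row INSERT per model.
# Assumption: no batching, so each model is written with a single statement.
def bulk_statements(model_count)
  model_count
end

puts "row by row: #{row_by_row_statements(1_000, 1_000)} statements"
puts "bulk:       #{bulk_statements(2)} statements"
```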

Here’s a list of the great features this gem has:

Uses ActiveRecord objects, or arrays of columns and values to import data

Uses a recursive option to handle embedded related models (PostgreSQL only)

Skips or enforces the model validations

Can import only a subset of the model attributes

Divides your import into batches, that is, lets you define the number of rows per insert

Handles duplicates, either by ignoring them or by defining which attributes can be updated
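The batching feature in the list above can be pictured with plain Ruby. This sketch does not call the gem. It only shows how a set of rows splits into groups of a chosen size, where each group would become one insert. The `batches_for` helper, the sample rows and the batch size of 4 are all assumptions for the illustration.

```ruby
# Hypothetical helper: splits rows into consecutive groups of at most
# batch_size rows. Each group stands for one multi-row INSERT.
def batches_for(rows, batch_size)
  rows.each_slice(batch_size).to_a
end

# Hypothetical rows: [book_id, text] pairs standing in for sentence records.
rows = (1..10).map { |i| [1, "Sentence #{i}"] }

batches_for(rows, 4).each_with_index do |batch, index|
  puts "insert #{index + 1}: #{batch.size} rows"
end
```

With 10 rows and a batch size of 4 you get three inserts of 4, 4 and 2 rows. A smaller batch size means more statements, but each one is smaller.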

To get a good feel of how easy it is to use ActiveRecord Import I’ve prepared a basic example.

Let’s try it!

Let’s say we had an application that stored books and each one of their sentences. To store that information the application uses two models, a Book model and a Sentence model. Our Book model has a title, an author, a language and a bunch of Sentences. A Sentence model record belongs to a Book and stores the text of a given sentence. Pretty simple.

We decide to get some free books from Project Gutenberg and store them in our database. To get that information into our database we write a task that first parses the books and sentences, then builds the records and stores them.

The task should look something like this.

What have we here?

Our task:

Reads the txt formatted books from the application’s public directory

For each book, extracts the book contents (details and sentences)

Builds the records for the Book and Sentence models without saving them

And proceeds to import the books in two different ways

I’ve included the code that handles the text extraction and building the records. But I’m not getting into the implementation details.

After getting the ActiveRecord objects, the task starts saving them into the database: first by using the save method from ActiveRecord, then by using ActiveRecord Import’s import method, like this:

Book.import(books, recursive: true)

This is how simple using ActiveRecord Import is. You only have to grab the model you want to import, pass it the ActiveRecord objects and some options, and let the gem do the work. In the case above, we pass the recursive option to handle the embedded Sentence objects. But this gem has more options and features, as I mentioned earlier.

What about that Benchmark stuff??

I’m sure you’ve noticed the Ruby Benchmark syntax in the task code. I’ve thrown that into the example so that you can see the impact this gem can have on this task’s performance.
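If you haven’t used Benchmark before, here is a minimal, self-contained sketch of the timing pattern. The two methods are made-up stand-ins that only do in-memory work. In the real task they would be the save loop and the import call, so the timings this prints say nothing about database performance.

```ruby
require 'benchmark'

# Stand-in for saving records one at a time (in-memory only).
def save_one_by_one(items)
  store = []
  items.each { |item| store << item }
  store
end

# Stand-in for saving all records in one go (in-memory only).
def save_in_bulk(items)
  store = []
  store.concat(items)
  store
end

# Hypothetical data for the illustration.
items = Array.new(100_000) { |i| "sentence #{i}" }

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
one_by_one_time = Benchmark.realtime { save_one_by_one(items) }
bulk_time = Benchmark.realtime { save_in_bulk(items) }

puts format('one by one: %.4fs', one_by_one_time)
puts format('in bulk:    %.4fs', bulk_time)
```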

This task used 6 books (including a Portuguese classic!). The PragmaticSegmenter gem extracted a total of 175,698 sentences from those books. Here are the books/sentences details:

After running the task three times with these books, I got these results:

These results show us how slow the normal way of importing data with ActiveRecord can be. On average, ActiveRecord Import gave a speed-up of around 32x. I would say that’s a pretty good result for the amount of effort you have to put in to use it.

What have we learned?

ActiveRecord Import can be a great tool that requires almost no effort to use. I’m sure this is not a silver bullet for all your data importing problems.

This was a simple toy example to show the capabilities of this gem. But let’s face it, if you have a job that takes hours I bet ActiveRecord Import would reduce the time your job takes.

You can check out the example application here: