Rails 6 bulk insert records


There are cases when we want to insert records in bulk. For example, we have a list of users in a CSV file and we want to import these users into our application.

Rails has methods like delete_all and update_all to delete and update records in bulk, respectively. A similar method to insert records in bulk was missing.

Rails 6 added insert_all, insert_all! and upsert_all to ActiveRecord::Persistence to solve the above issue.

Before Rails 6

Before Rails 6, bulk insert could be achieved in the following ways.

Using activerecord-import gem

```ruby
users = []
10.times do |i|
  users << User.new(name: "user #{i}")
end
User.import users
```

Creating records one by one

```ruby
10.times do |i|
  User.create!(name: "user #{i}")
end
```

Using the SQL INSERT statement

```ruby
# Assuming users is an array of user hashes
# like [{ name: "Sam" }, { name: "Charls" }]
connection = ActiveRecord::Base.connection
columns = users.first.keys.join(", ")
sql_values = users.map do |user|
  # Quote each value so strings are valid SQL literals
  "(#{user.values.map { |v| connection.quote(v) }.join(", ")})"
end
sql = "INSERT INTO users (#{columns}) VALUES #{sql_values.join(", ")}"
connection.insert(sql)
```

In Rails 6

insert_all and insert_all!

Using insert_all, we can perform a bulk insert as below:

```ruby
result = User.insert_all(
  [
    { name: "Sam", email: "sam@example.com" },
    { name: "Charls", email: "charls@example.com" }
  ]
)
# Bulk Insert (2.3ms) INSERT INTO "users" ("name","email")
# VALUES ("Sam", "sam@example"...)
# ON CONFLICT DO NOTHING RETURNING "id"

puts result.inspect
# #<ActiveRecord::Result:0x00007fb6612a1ad8 @columns=["id"], @rows=[[1], [2]],
#  @hash_rows=nil, @column_types={"id"=>#<ActiveModel::Type::Integer:0x00007fb65f420078 ....>

puts User.count
# => 2
```

As seen above, the query contains an ON CONFLICT DO NOTHING clause, which is supported by the SQLite and PostgreSQL databases. If there is a conflict or a unique key violation during the bulk insert, the conflicting record is skipped and insertion proceeds with the next record.

If we need to ensure all rows are inserted, we can use insert_all!, the bang version, which raises ActiveRecord::RecordNotUnique on a duplicate instead of skipping it.

The RETURNING "id" clause in the above query returns the primary keys (@rows=[[1], [2]]). If we want more attributes apart from id, we can pass the optional returning option, which expects an array of attribute names.

```ruby
result = User.insert_all(
  [
    { name: "Sam", email: "sam@example.com" },
    { name: "Charls", email: "charls@example.com" }
  ],
  returning: %w[id name]
)
# Bulk Insert (2.3ms) INSERT INTO "users" ("name","email")
# VALUES ("Sam", "sam@example"...)
# ON CONFLICT DO NOTHING RETURNING "id", "name"

puts result.inspect
# #<ActiveRecord::Result:0x00007fb6612a1ad8 @columns=["id", "name"],
#  @rows=[[1, "Sam"], [2, "Charls"]], @hash_rows=nil,
#  @column_types={"id"=>#<ActiveModel::Type::Integer:0x00007fb65f420078 ....>
```

upsert_all

insert_all and insert_all! either skip duplicate records or raise an exception when a duplicate record is encountered during a bulk insert.

Sometimes we want to update a record if it exists, or else create a new one. This operation is known as an upsert.

The upsert_all method performs bulk upserts.

```ruby
result = User.upsert_all(
  [
    { id: 1,
      name: "Sam new",
      email: "sam@example.com" },
    { id: 1, # duplicate id here
      name: "Sam's new",
      email: "sam@example.com" },
    { id: 2,
      name: "Charles", # name updated
      email: "charls@example.com" },
    { id: 3, # new entry
      name: "David",
      email: "david@example.com" }
  ]
)
# Bulk Insert (26.3ms) INSERT INTO `users` (`id`,`name`,`email`)
# VALUES (1, 'Sam new', 'sam@example.com')...
# ON DUPLICATE KEY UPDATE `name`=VALUES(`name`)

puts User.count
# => 3
```

The second row in the input array has the duplicate id 1, and hence the name of that user ends up as Sam's new instead of Sam new.

The third row has no duplicate within the input array, but a user with id 2 already exists, so that record is updated.

The fourth row, with id 3, is a new entry, and hence an insert operation is performed for it.

Note: upsert_all behaves differently on different databases. The above example works on a MySQL database but will not work on SQLite.

Inserting or updating records sequentially has a huge performance cost.

Let’s try to insert 1000 users sequentially and benchmark it.

```ruby
print Benchmark.measure { 1000.times { |t| User.create(name: "name - #{t}") } }
# 7.913459  1.129483  9.439012 (15.329382)
# => nil
```

1000 users were created locally in 15.32 seconds, using 1000 transactions executing 1000 sequential INSERT queries.

Using insert_all, we can prepare the dataset of users and import them in one query.

```ruby
users = 1000.times.map do |t|
  { name: "name - #{t}", created_at: Time.now, updated_at: Time.now }
end
print Benchmark.measure { User.insert_all(users) }
# 0.267381  0.018721  0.298123 (0.401876)
```

As seen above, the time taken to import 1000 users dropped from 15.32 seconds to just 0.40 seconds.