One of the very noble goals of the Ruby community, spearheaded by Matz, is the Ruby 3x3 plan. The idea is that by applying a large number of modern optimizations, the Ruby interpreter can be made 3 times faster. It is an ambitious goal, both notable and inspiring. This “movement” has triggered quite a lot of interesting experiments in Ruby core, including a just-in-time compiler and work on reducing memory bloat out of the box. If Ruby gets faster and uses less memory, then everyone gets free performance, which is exactly what we all want.

A big problem, though, is that there is only so much magic a faster Ruby can achieve. A faster Ruby is not going to magically fix a “bubble sort” hiding deep in your code. Active Record has tons of internal waste that ought to be addressed, and fixing it could make the vast majority of Ruby applications in the wild a lot faster. Rails is the largest consumer of Ruby, after all, and Rails is underpinned by Active Record.

Sadly, Active Record performance has not gotten much better since the days of Rails 2; in fact, in quite a few cases it has gotten slower, sometimes a lot slower.

Active Record is very wasteful

I would like to start off with a tiny example:

Say I have a typical 30-column table containing Topics.

If I run the following, how much will Active Record allocate?

a = []
Topic.limit(1000).each do |u|
  a << u.id
end

Total allocated: 3835288 bytes (26259 objects)

Compare this to an equally inefficient “raw” version.

sql = -"select * from topics limit 1000"
ActiveRecord::Base.connection.raw_connection.async_exec(sql).column_values(0)

Total allocated: 8200 bytes (4 objects)

This amount of waste is staggering. It translates to a deadly combo:

Extreme levels of memory usage

and

Slower performance
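Allocation deltas like the ones above can be observed with plain Ruby via GC statistics. The exact harness behind the numbers in this post is not shown here; the sketch below is a stand-in, with a simple Array build replacing the Active Record query.

```ruby
# Count how many objects a block allocates, using GC statistics.
def allocated_objects
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

# Stand-in workload: building an array of 1000 small Integers allocates
# almost nothing, since small Integers are immediate values.
count = allocated_objects { Array.new(1000) { |i| i + 1 } }
puts "#{count} objects allocated"
```

The same wrapper around a `Topic.limit(1000)` loop is what exposes the tens of thousands of allocations discussed here.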

But … that is really bad Active Record!

An immediate gut reaction here is that I am “cheating” by writing “slow” Active Record code and comparing it to mega-optimized raw code.

One could argue that I should write:

a = []
Topic.select(:id).limit(1000).each do |u|
  a << u.id
end

With which you would get:

Total allocated: 1109357 bytes (11097 objects)

Or better still:

Topic.limit(1000).pluck(:id)

With which I would get:

Total allocated: 221493 bytes (5098 objects)

Time for a quick recap.

The “raw” version allocated 4 objects: it was able to return 1000 Integers directly, which are not allocated individually in the Ruby heaps and do not occupy garbage collection slots.

The “naive” Active Record version allocates 26259 objects

The “slightly optimised” Active Record version allocates 11097 objects

The “very optimised” Active Record version allocates 5098 objects

All of those numbers are orders of magnitude larger than 4.
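The raw version can get away with 4 objects because small Integers in Ruby are immediate values: they are encoded directly in the reference word rather than stored in a heap slot. A quick illustration:

```ruby
# Small Integers never occupy a Ruby heap slot. They are immediates:
# the same "object" every time, frozen, and invisible to the GC.
same   = (42.object_id == 42.object_id) # identical object identity
frozen = 42.frozen?                     # immediates are always frozen
puts "same: #{same}, frozen: #{frozen}"
```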

How many objects does a “naive/lazy” implementation need to allocate?

One feature that Active Record touts as a huge advantage over Sequel is the “built-in” laziness.

Active Record will not bother “casting” a column to a date until you try to use it, so if for any reason you over-select, Active Record has your back. This deficiency in Sequel is acknowledged and deliberate:

From the RubyBench community discussion “Can Sequel be configured so it defer materializes?” (25 Aug 17):

    No. Sequel does not defer typecasting. Typecasting happens at the dataset-retrieval level, not the model level. What Sequel offers instead is the lazy_attributes plugin, which does not select the column during the query, but runs a new query on...

This particular niggle makes it incredibly hard to move from Active Record to Sequel without extremely careful review, despite Sequel being so incredibly fast and efficient.
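The per-attribute laziness being discussed can be sketched in a few lines. This is not Active Record's actual implementation; the class and method names below are made up for illustration.

```ruby
require "date"

# Minimal sketch of "lazy casting": keep the raw database string as-is
# and only pay for the expensive typecast on first access.
class LazyDateAttribute
  def initialize(raw)
    @raw = raw # raw string straight from the database, no cast yet
  end

  def value
    @cast ||= Date.parse(@raw) # cast runs only the first time it is read
  end
end

cell = LazyDateAttribute.new("2018-01-09")
year = cell.value.year # the Date is materialized here, not at query time
```

If the attribute is never read, the `Date` is never allocated, which is exactly why over-selecting is cheaper under a lazy scheme.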

So what would an efficient lazy selector look like at its fastest? In our case we are consuming 1000 ids, so we would expect a maximally efficient implementation to allocate 1020 or so objects, because we cannot get away without allocating a Topic object per row. We do not expect 26 thousand.

Here is a quick attempt at such an implementation (note: this is just a proof of concept, not a production-level system):

$conn = ActiveRecord::Base.connection.raw_connection

class FastBase

  class Relation
    include Enumerable

    def initialize(table)
      @table = table
    end

    def limit(limit)
      @limit = limit
      self
    end

    def to_sql
      sql = +"SELECT #{@table.columns.join(',')} from #{@table.get_table_name}"
      if @limit
        sql << -" LIMIT #{@limit}"
      end
      sql
    end

    def each
      @results = $conn.async_exec(to_sql)
      i = 0
      while i < @results.cmd_tuples
        row = @table.new
        row.attach(@results, i)
        yield row
        i += 1
      end
    end
  end

  def self.columns
    @columns
  end

  def attach(recordset, row_number)
    @recordset = recordset
    @row_number = row_number
  end

  def self.get_table_name
    @table_name
  end

  def self.table_name(val)
    @table_name = val
    load_columns
  end

  def self.load_columns
    @columns = $conn.async_exec(<<~SQL).column_values(0)
      SELECT COLUMN_NAME
      FROM information_schema.columns
      WHERE table_schema = 'public'
        AND table_name = '#{@table_name}'
    SQL

    @columns.each_with_index do |name, idx|
      class_eval <<~RUBY
        def #{name}
          if @recordset && !@loaded_#{name}
            @loaded_#{name} = true
            @#{name} = @recordset.getvalue(@row_number, #{idx})
          end
          @#{name}
        end

        def #{name}=(val)
          @loaded_#{name} = true
          @#{name} = val
        end
      RUBY
    end
  end

  def self.limit(number)
    Relation.new(self).limit(number)
  end
end

class Topic2 < FastBase
  table_name :topics
end

Then we can measure:

a = []
Topic2.limit(1000).each do |t|
  a << t.id
end
a

Total allocated: 84320 bytes (1012 objects)

So … we can manage a similar API with 1012 object allocations as opposed to 26 thousand.

Does this matter?

A quick benchmark shows us:

Calculating -------------------------------------
               magic    256.149  (± 2.3%) i/s -      1.300k in   5.078356s
                  ar     75.219  (± 2.7%) i/s -    378.000  in   5.030557s
           ar_select    196.601  (± 3.1%) i/s -    988.000  in   5.030515s
            ar_pluck      1.407k (± 4.5%) i/s -      7.050k in   5.020227s
                 raw      3.275k (± 6.2%) i/s -     16.450k in   5.043383s
             raw_all    284.419  (± 3.5%) i/s -      1.421k in   5.002106s

Our new implementation (which I call magic) does 256 iterations a second compared to Rails's 75. It is a considerable improvement over the Rails implementation on multiple counts: it is much faster and allocates significantly less memory, leading to reduced process memory usage. This is despite following the non-ideal practice of over-selection. In fact, our implementation is so fast it even beats Rails when Rails is careful to select only one column!
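The results above come from the benchmark-ips gem. For readers who want to reproduce the shape of this comparison without extra gems, here is a rough stdlib-only stand-in that reports iterations per second for a block (the helper name is made up).

```ruby
require "benchmark"

# Rough iterations-per-second harness using only the stdlib Benchmark
# module. Runs the block repeatedly until the time budget is spent.
def rough_ips(label, seconds: 0.2)
  iterations = 0
  elapsed = 0.0
  while elapsed < seconds
    elapsed += Benchmark.realtime { yield }
    iterations += 1
  end
  rate = iterations / elapsed
  puts format("%12s %10.3f i/s", label, rate)
  rate
end

rate = rough_ips("noop") { nil }
```

Swapping the no-op block for `Topic.limit(1000).map(&:id)` and friends gives numbers comparable (though less statistically careful) than benchmark-ips.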

This is the Rails 3x3 we could have today with no changes to Ruby!

Another interesting data point is how much slower pluck, the turbo-boosted version Rails has to offer, is than raw SQL. In fact, at Discourse, we monkey-patch pluck for exactly this reason. (I also have a Rails 5.2 version.)

Why is this bloat happening?

Looking at memory profiles I can see multiple reasons all this bloat happens:

Rails is only sort-of-lazy. I can see thousands of string allocations for columns we never look at; it is not “lazy-allocating”, it is partial “lazy-casting”.

Every row allocates 3 additional objects for bookkeeping and magic: ActiveModel::Attribute::FromDatabase, ActiveModel::AttributeSet and ActiveModel::LazyAttributeHash. None of this is required; instead, a single array could be passed around that holds indexes to columns in the result set.

Rails insists on dispatching casts to helper objects even if the data retrieved is already in “the right format” (e.g. a number); this work generates extra bookkeeping.

Every column name we have is allocated twice per query. This could easily be cached and reused (if the query builder is aware of the column names it selected, it does not need to ask the result set for them).
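The “single shared lookup” fix suggested above can be sketched in a few lines. This is an illustration of the idea, not Active Record internals; `ThinRow` and `COLUMN_INDEX` are made-up names.

```ruby
# All rows share one frozen column-name => index map instead of each row
# carrying its own AttributeSet-style bookkeeping objects.
COLUMN_INDEX = { "id" => 0, "title" => 1 }.freeze

class ThinRow
  def initialize(recordset, row_number)
    @recordset = recordset   # shared result set, nothing copied per row
    @row_number = row_number # only per-row state is an index
  end

  def [](name)
    @recordset[@row_number][COLUMN_INDEX.fetch(name)]
  end
end

recordset = [[1, "Hello"], [2, "World"]] # stand-in for a PG::Result
title = ThinRow.new(recordset, 1)["title"]
```

Each row costs exactly one object here, which is the budget the “mega efficient” implementation above was aiming for.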

What should be done?

I feel that we need to carefully review Active Record internals and consider an implementation that allocates significantly fewer objects per row. We should also start leveraging the PG gem’s native type casting to avoid pulling strings out of the database only to convert them back to numbers.

You can see the script I used for this evaluation over here: