TL;DR: Subclassing core classes in Ruby can lead to unexpected side effects. I suggest composition over inheritance in all these cases.

Subclassing Review

If you’re familiar with the concept of subclassing, skip down to “The Problem.”

In Ruby, you can make your own classes:

    class List
    end

You can also make subclasses of those classes:

    class OrderedList < List
    end

    puts OrderedList.new.kind_of?(List) # => true

Now, subclassing represents an “is a” relationship. This means that our OrderedList should be a List in every respect, but with some added behavior. The Liskov Substitution Principle is one formulation of this idea.
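As a minimal sketch of what “is a” buys you, here’s a method written against List that also accepts an OrderedList without knowing the difference. The method bodies, the `sorted` method, and the `describe` helper are all hypothetical, made up purely for illustration:

```ruby
# Hypothetical List with just enough behavior to demonstrate substitution.
class List
  def initialize
    @items = []
  end

  def add(item)
    @items << item
    self # return self so calls can be chained
  end

  def size
    @items.size
  end
end

# Adds behavior, but leaves everything List promised untouched.
class OrderedList < List
  def sorted
    @items.sort
  end
end

# Written against List; never checks which subclass it was handed.
def describe(list)
  "#{list.class} with #{list.size} item(s)"
end

plain = describe(List.new.add(3))
ordered = describe(OrderedList.new.add(3).add(1))
puts plain   # List with 1 item(s)
puts ordered # OrderedList with 2 item(s)
```

Anything that works on a List keeps working when handed an OrderedList. That is the promise the rest of this post shows being broken for core classes.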

The Problem

Ruby has two major bits of code that it provides for your use: the core library and the standard library. The core library can be found here, and contains classes that you know and love, like String, Hash, and Array. The standard library can be found here, and contains your favorite hits, like CSV, JSON, and Logger.

One way to think about the difference between core and the standard library is that core is written in C, while the standard library is written in Ruby. Core contains the classes that are used the most, so they’re implemented in as low-level a fashion as possible. They’ll be in every single Ruby program, so might as well make them fast! The standard library only gets pulled in by bits and pieces; another way of thinking about the difference is that you need to require everything in the standard library, but nothing in core.
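A small sketch of the require distinction, assuming a stock Ruby install where the json library is available. The variable names are just for illustration:

```ruby
# Core classes (Hash, Array, String) are available with no require at all.
counts = Hash.new(0)
%w[a b a].each { |word| counts[word] += 1 }

# JSON lives in the standard library, so it has to be required first.
require "json"
encoded = JSON.generate(counts)
puts encoded # {"a":2,"b":1}
```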

What do you think this code should do?

    class List < Array
    end

    puts List.new.to_a.class

If you said “it prints Array,” you’d be right. This behavior really confuses me, though, because List is already an Array; in my mind, this operation shouldn’t suddenly change the class.

Why does this happen? Let’s check out the implementation of [Array#to_a](https://github.com/ruby/ruby/blob/trunk/array.c#L2064-L2082):

    static VALUE
    rb_ary_to_a(VALUE ary)
    {
        if (rb_obj_class(ary) != rb_cArray) {
            VALUE dup = rb_ary_new2(RARRAY_LEN(ary));
            rb_ary_replace(dup, ary);
            return dup;
        }
        return ary;
    }

If the class is not Array (represented by rb_cArray), then we make a new array of the same length, call replace on it, and then return the new array. If this C scares you, here’s a direct port to pure Ruby:

    def array_to_a(ary)
      if ary.class != Array
        dup = []
        dup.replace(ary)
        return dup
      end
      return ary
    end

    array_to_a(List.new).class # => Array

So why do this? Well, again, this class will be used all over the place. For example, I made a brand new Rails 4 application, generated a controller and view, and put this in it:

ObjectSpace.count_objects[:T_ARRAY]: <%= ObjectSpace.count_objects[:T_ARRAY] %>

ObjectSpace allows you to inspect all of the objects that exist in the system. Here’s the output:

[Image: rails arrays]

That’s a lot of arrays! This kind of shortcut is generally worth it: 99.99% of the time, this code is perfect.

That last 0.01% is the problem. If you don’t know exactly how these classes operate at the C level, you’re gonna have a bad time. In this case, this behavior is odd enough that someone was kind enough to document it.

Here’s the Ruby version of what I’d expect to happen:

    def array_to_a2(ary)
      return ary if ary.is_a?(Array)
      dup = []
      dup.replace(ary)
      dup
    end

    array_to_a2(List.new).class # => List

This has the exact same behavior except when we’re already dealing with an Array, which is what I’d expect.

Let’s take another example: reverse.

    l = List.new
    l << 1
    l << 2
    puts l.reverse.class # => Array

I would not expect that calling #reverse on my custom Array would change its class. Let’s look at the C again:

    static VALUE
    rb_ary_reverse_m(VALUE ary)
    {
        long len = RARRAY_LEN(ary);
        VALUE dup = rb_ary_new2(len);

        if (len > 0) {
            const VALUE *p1 = RARRAY_RAWPTR(ary);
            VALUE *p2 = (VALUE *)RARRAY_RAWPTR(dup) + len - 1;
            do *p2-- = *p1++; while (--len > 0);
        }
        ARY_SET_LEN(dup, RARRAY_LEN(ary));
        return dup;
    }

We get the length of the array, make a new blank array of the same length, then do some pointer stuff to copy everything over, and return the new copy. Unlike #to_a, this behavior is not currently documented.

Now: you could make the case that this behavior is expected, in both cases: after all, the point of the non-bang methods is to make a copy. However, there’s a difference to me between “make a new array with this stuff in it” and “make a new copy with this stuff in it”. Most of the time, I get the same class back, so I expect the same class back in these circumstances.
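To make the “copy” versus “new array” distinction concrete, here’s a quick sketch comparing a few methods side by side. It assumes an Array subclass like the List used above. The results shown are what I’d expect on the MRI versions I’m aware of, and other methods may behave differently depending on your Ruby version:

```ruby
class List < Array
end

l = List.new
l << 1 << 2

# dup and clone copy the object itself, so the class comes along.
puts l.dup.class     # => List
puts l.clone.class   # => List

# These build a fresh array from the contents, so the subclass is lost.
puts l.reverse.class # => Array
puts l.to_a.class    # => Array
puts((l + [3]).class) # => Array
```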

Let’s talk about a more pernicious issue: Strings.

As you know, the difference between interpolation and concatenation is that interpolation calls #to_s implicitly on the object it’s interpolating:

    irb(main):001:0> "foo" + 2
    TypeError: no implicit conversion of Fixnum into String
            from (irb):1:in `+'
            from (irb):1
            from /opt/rubies/ruby-2.0.0-p195/bin/irb:12:in `<main>'
    irb(main):002:0> "foo#{2}"
    => "foo2"
    irb(main):001:0> class MyClass
    irb(main):002:1>   def to_s
    irb(main):003:2>     "yup"
    irb(main):004:2>   end
    irb(main):005:1> end
    => nil
    irb(main):006:0> "foo#{MyClass.new}"
    => "fooyup"

So what about a custom String?

    class MyString < String
      def to_s
        "lol"
      end
    end

    s = MyString.new
    s.concat "Hey"

    puts s
    puts s.to_s
    puts "#{s}"

What does this print?

    $ ruby ~/tmp/tmp.rb
    Hey
    lol
    Hey

That’s right! With Strings, Ruby doesn’t call #to_s: it puts the value in directly. How does this happen?

Well, string interpolation is handled by the parser, so let’s check out the bytecode that Ruby generates. Thanks to Aaron Patterson for suggesting this approach. <3

    irb(main):013:0> x = RubyVM::InstructionSequence.new(%q{puts "hello #{'hey'}"})
    => <RubyVM::InstructionSequence:<compiled>@<compiled>>
    irb(main):014:0> puts x.disasm
    == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
    0000 trace            1                                               (   1)
    0002 putself
    0003 putstring        "hello hey"
    0005 opt_send_simple  <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>
    0007 leave
    => nil
    irb(main):015:0> x = RubyVM::InstructionSequence.new(%q{puts "hello #{Object.new}"})
    => <RubyVM::InstructionSequence:<compiled>@<compiled>>
    irb(main):016:0> puts x.disasm
    == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
    0000 trace            1                                               (   1)
    0002 putself
    0003 putobject        "hello "
    0005 getinlinecache   12, <ic:0>
    0008 getconstant      :Object
    0010 setinlinecache   <ic:0>
    0012 opt_send_simple  <callinfo!mid:new, argc:0, ARGS_SKIP>
    0014 tostring
    0015 concatstrings    2
    0017 opt_send_simple  <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>
    0019 leave
    => nil

You can see that with a string, the bytecode actually puts the final concatenated string. But with an object, it ends up calling tostring, and then concatstrings.

Again, 99% of the time, this is totally fine, and much faster. But if you don’t know this trivia, you’re going to get bit.
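If you’re stuck with a String subclass and really need the override to run, one workaround is to call #to_s yourself inside the interpolation. This is only a sketch that reuses the MyString class from above. The `implicit` and `explicit` variable names are mine:

```ruby
class MyString < String
  def to_s
    "lol"
  end
end

s = MyString.new("Hey")

# Ruby sees a String and uses its contents directly, skipping our to_s.
implicit = "#{s}"

# Calling to_s ourselves runs the override, and its result is interpolated.
explicit = "#{s.to_s}"

puts implicit # Hey
puts explicit # lol
```

Having to remember this at every call site is exactly the kind of trivia the subclass was supposed to hide.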

Here is an example from an older version of Rails. Yes, you might think “Hey idiot, there’s no way it will store your custom String class,” but the whole idea of subclassing is that it’s a drop-in replacement.

I know that there’s some case where Ruby will not call your own implementation of #initialize on a custom subclass of String , but I can’t find it right now. This is why this problem is so tricky: most of the time, things are fine, but then occasionally, something strange happens and you wonder what’s wrong. I don’t know about you, but my brain needs to focus on more important things than the details of the implementation.

Since I first wrote this post, James Edward Gray II helped me remember what this example is. One of the early exercises in http://exercism.io/ is based on making a DNA type, and then doing some substitution operations on it. Many people inherited from String when doing their answers, and while the simple case that passes the tests works, this case won't:

    class Dna < String
      def initialize(*)
        super
        puts "Building Dna: #{inspect}"
      end
    end

    result = Dna.new("CATG").tr(Dna.new("T"), Dna.new("U"))
    p result.class
    p result

This prints:

    Building Dna: "CATG"
    Building Dna: "T"
    Building Dna: "U"
    Dna
    "CAUG"

It never called our initializer for the new string. Let's check the source of #tr:

    static VALUE
    rb_str_tr(VALUE str, VALUE src, VALUE repl)
    {
        str = rb_str_dup(str);
        tr_trans(str, src, repl, 0);
        return str;
    }

rb_str_dup has a pretty simple definition:

    VALUE
    rb_str_dup(VALUE str)
    {
        return str_duplicate(rb_obj_class(str), str);
    }

and so does str_duplicate:

    static VALUE
    str_duplicate(VALUE klass, VALUE str)
    {
        VALUE dup = str_alloc(klass);
        str_replace(dup, str);
        return dup;
    }

So there you have it: MRI doesn't go through the whole initialization process when duplicating a string. It just allocates the memory and then replaces the contents.

If you re-open String, it's also weird:

    class String
      alias_method :string_initialize, :initialize

      def initialize(*args, &block)
        string_initialize(*args, &block)
        puts "Building MyString: #{inspect}"
      end
    end

    result = String.new("CATG").tr("T", "U") # => Building MyString: "CATG"
    p result.class # => String
    p result # => "CAUG"

Again, unless you know exactly how this works at a low level, surprising things happen.

The Solution

Generally speaking, subclassing isn’t the right idea here. You want a data structure that uses one of these core classes internally, but isn’t exactly like one. Rather than this:

    class Name < String
    end

do this:

    require 'delegate'

    class Name < SimpleDelegator
      def initialize
        super("")
      end
    end

This allows you to do the same thing, but without all of the pain:

    class Name
      def to_s
        "hey"
      end
    end

    "#{Name.new}" # => "hey"

However, this won’t solve all problems:

    require 'delegate'

    class List < SimpleDelegator
      def initialize
        super([])
      end
    end

    l = List.new
    l << 1
    l << 2
    puts l.reverse.class # => Array

In general, I’d prefer to delegate things manually, anyway: a Name is not actually a drop-in for a String; it’s something different that happens to be a lot like one. The same goes for a List and an Array:

    class List
      def initialize(list = [])
        @list = list
      end

      def <<(item)
        @list << item
      end

      def reverse
        List.new(@list.reverse)
      end
    end

    l = List.new
    l << 1
    l << 2
    puts l.reverse.class # => List

You can clean this up by using Forwardable to only forward the messages you want to forward:

    require 'forwardable'

    class List
      extend Forwardable
      def_delegators :@list, :<<, :length # and anything else

      def initialize(list = [])
        @list = list
      end

      def reverse
        List.new(@list.reverse)
      end
    end

    l = List.new
    l << 1
    l << 2
    puts l.reverse.class # => List
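The same idea applies to the Dna example from earlier. Here’s a rough sketch, assuming Dna only needs #tr and #to_s. The `@strand` instance variable and the choice of methods are my own, not anything from the exercise. Because #tr builds its result through Dna.new, the initializer runs for the new object too:

```ruby
class Dna
  def initialize(strand)
    # Hold a private plain String instead of being one.
    @strand = strand.to_s.dup
    puts "Building Dna: #{@strand.inspect}"
  end

  # Do the work on the inner String, then wrap the result ourselves.
  def tr(from, to)
    Dna.new(@strand.tr(from.to_s, to.to_s))
  end

  def to_s
    @strand.dup
  end
end

result = Dna.new("CATG").tr(Dna.new("T"), Dna.new("U"))
p result.class   # => Dna
puts result.to_s # => CAUG
```

This time “Building Dna” is printed four times: once for each of the three strands we build by hand, and once for the result of #tr.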

Now you know! Be careful out there!