It's often necessary to make a copy of a value in Ruby. This is simple for simple objects, but as soon as you have to copy a data structure that contains multiple arrays or hashes, you will quickly find there are many pitfalls.

Objects and References

To understand what's going on, let's look at some simple code. First, the assignment operator with what this article will call a POD (Plain Old Data) type: a simple, immutable value such as an Integer.

a = 1

b = a

a += 1

puts b  # prints 1

Here, b ends up holding the value a had at the time of the assignment. The line a += 1 binds a to a new value rather than changing the Integer itself, so the change to a isn't reflected in b. But what about something more complex? Consider this.

a = [1,2]

b = a

a << 3

puts b.inspect

Before running the above program, try to guess what the output will be and why. This is not the same as the previous example: changes made to a are reflected in b. Why? An Array is not a POD type. It is a mutable object, and the assignment operator doesn't make a copy of it; it simply copies the reference to the Array object. The a and b variables now refer to the same Array object, so any change made through one variable will be seen through the other.
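The difference between mutating an object and rebinding a variable can be checked with Object#equal?, which compares object identity. The following is a minimal sketch; the variable names are only illustrative.

```ruby
# Assignment copies the reference, so both names point at one Array.
a = [1, 2]
b = a
puts a.equal?(b)   # true: a and b are the same object

# << mutates that object in place, so the change is visible through b.
a << 3
puts b.inspect     # [1, 2, 3]

# += builds a new Array and rebinds a; b still refers to the old one.
a += [4]
puts a.equal?(b)   # false
puts b.inspect     # [1, 2, 3]
```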

And now you can see why copying non-trivial objects with references to other objects can be tricky. If you simply make a copy of the object, you're just copying the references to the deeper objects, so your copy is referred to as a "shallow copy."

What Ruby Provides: dup and clone

Ruby provides two methods for making copies of objects. By default both make shallow copies, though a class can customize the copying to go deeper. The Object#dup method will make a shallow copy of an object. To achieve this, dup creates a new object and calls its initialize_copy method, passing in the original. What this does exactly depends on the class. In some classes, such as Array, it will initialize a new array with the same members as the original array. This, however, is not a deep copy. Consider the following.

a = [1,2]

b = a.dup

a << 3

puts b.inspect  # prints [1, 2]

a = [ [1,2] ]

b = a.dup

a[0] << 3

puts b.inspect  # prints [[1, 2, 3]]

What has happened here? In the first example b is unaffected, but in the second the change shows up in b. The Array#initialize_copy method does make a copy of the Array, but that copy is itself a shallow copy. If the array holds any other non-POD objects, dup copies only the outermost array. Any nested arrays, hashes or other objects are still shared between the original and the copy.
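To see exactly which objects are shared after dup, compare identities with Object#equal?. This sketch also shows one way to go a single level deeper by duplicating each element; that only helps for one level of nesting and is not a general deep copy.

```ruby
a = [[1, 2]]
b = a.dup

puts a.equal?(b)        # false: dup created a new outer Array
puts a[0].equal?(b[0])  # true: both outer arrays hold the same inner Array

# Duplicating each element as well gives independence one level further down.
c = a.map(&:dup)

a[0] << 3
puts b.inspect          # [[1, 2, 3]]
puts c.inspect          # [[1, 2]]
```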

There is another method worth mentioning, clone. The clone method does the same thing as dup with one important distinction: clone also carries over the object's frozen state and any singleton methods, while dup does not. Neither method makes a deep copy on its own. Both call initialize_copy, and that is the method a class is expected to override when its copies need to be deep.

So in practice what does this mean? It means each of your classes can define an initialize_copy method that makes a deep copy of that object. It also means you have to write one for each and every class you make.
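As a rough illustration of a per-class deep copy, the sketch below overrides initialize_copy, the hook that both dup and clone call on the new object. The Grid class and its rows attribute are hypothetical names invented for this example, and the copy only goes as deep as the code explicitly asks for (here, one level of nested arrays).

```ruby
# Hypothetical class used only for illustration.
class Grid
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Runs on the new copy after the instance variables have been
  # shallow-copied from the original.
  def initialize_copy(original)
    super
    # Replace the shared @rows with a new outer array of new inner arrays.
    @rows = original.rows.map(&:dup)
  end
end

original = Grid.new([[1, 2], [3, 4]])
copy     = original.dup
cloned   = original.clone

original.rows[0] << 99
original.rows << [5, 6]

puts original.rows.inspect  # [[1, 2, 99], [3, 4], [5, 6]]
puts copy.rows.inspect      # [[1, 2], [3, 4]]
puts cloned.rows.inspect    # [[1, 2], [3, 4]]
```

If Grid later gained another mutable attribute, initialize_copy would need to be updated to copy it too, which is the maintenance cost of this approach.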

A Trick: Marshalling

"Marshalling" an object is another way of saying "serializing" an object. In other words, it means turning that object into a stream of bytes that can be written to a file and later "unmarshalled" or "unserialized" to get an equivalent object back. This can be exploited to get a deep copy of any object that can be marshalled.

a = [ [1,2] ]

b = Marshal.load( Marshal.dump(a) )

a[0] << 3

puts b.inspect  # prints [[1, 2]]

What has happened here? Marshal.dump creates a "dump" of the nested array stored in a. This dump is a binary String intended to be stored in a file. It holds the full contents of the array, nested elements included. Next, Marshal.load does the opposite. It parses this binary String and creates a completely new Array, with completely new Array elements.
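The round trip is often wrapped in a small helper. The sketch below uses a hypothetical method name, deep_copy, which is not part of Ruby's standard library, and checks that nested objects in the result are no longer shared with the original.

```ruby
# Hypothetical helper; the name deep_copy is an assumption for this example.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

a = { list: [1, 2], nested: { ids: [10] } }
b = deep_copy(a)

puts Marshal.dump(a).class       # String
puts a[:list].equal?(b[:list])   # false: the nested Array was copied too

a[:nested][:ids] << 20
puts b[:nested][:ids].inspect    # [10]
```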

But this is a trick. It's inefficient, since the whole structure is serialized and then parsed again, and it won't work on all objects (what happens if you try to clone a network connection in this way?). However, it is the easiest way to make deep copies short of writing custom initialize_copy or clone methods. The same thing can be done with methods like to_yaml or to_xml if you have libraries loaded to support them.
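To make the "won't work on all objects" caveat concrete: Marshal.dump raises a TypeError for objects it cannot serialize, such as a Proc or a Hash with a default block. The sketch below wraps that in a hypothetical predicate, marshal_copyable?, which is an invented name and not a Ruby built-in.

```ruby
# Hypothetical helper: reports whether Marshal can dump the object.
def marshal_copyable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

puts marshal_copyable?([[1, 2]])                        # true
puts marshal_copyable?(proc { 1 })                      # false
puts marshal_copyable?(Hash.new { |h, k| h[k] = [] })   # false
```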