Today I want to talk about the perils of passing by reference in Ruby. The vast majority of the time, this default in Ruby is magnificent: nothing gets copied, so it's faster and it saves you memory. However, there are times when it can create a really nasty bug that's hard to find and hard to squash correctly.

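In Ruby, when you pass a value into a method, or assign an object to another variable, you will almost always get a reference to the original object rather than a copy of its value. For those of you who don't know what passing by reference means, it simply means that any change you make to the object through the new variable, or through the parameter inside the method, also happens to the original object, because both names point to the same object. (Strictly speaking, Ruby passes these references by value: reassigning the parameter to a brand-new object inside the method does not affect the caller, but mutating the object it points to does.) Take this code snippet for example.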
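The original snippet isn't reproduced here, so what follows is a minimal sketch of the same idea, assuming foo is a String. The method names add_exclamation and reassign are made up for illustration. The first method mutates its argument in place and the caller sees the change; the second only rebinds its local parameter, so the caller's string is untouched.

```ruby
# Mutates the String it receives: String#<< appends in place,
# so the caller's object changes too.
def add_exclamation(str)
  str << "!"
end

# Reassigning the parameter only rebinds the local name inside the method;
# the caller's object is not affected.
def reassign(str)
  str = "something else"
  str
end

foo = "hello"
add_exclamation(foo)
puts foo # prints hello!

bar = "hello"
reassign(bar)
puts bar # prints hello
```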

You will see that the original foo object has changed, even though we only changed it inside a method. Here is another example that assigns to a new variable instead of working inside a method.
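Again, the original example is missing, so this is a small sketch under the same assumption that foo is a String. Plain assignment never copies: after `bar = foo`, both names refer to one object, which `equal?` (object identity) confirms.

```ruby
foo = "hello"
bar = foo          # bar now refers to the very same String object as foo
bar << " world"    # mutate the object through bar

puts foo             # prints hello world
puts foo.equal?(bar) # prints true: one object, two names
```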

Now, in some cases this is exactly what you want, but in others it is quite the opposite.

Recently, I ran into this exact issue, albeit a slightly more complicated version. In the app I'm working on, we have a text-parsing algorithm so that we can send rich notifications to users dynamically instead of writing the notification text by hand. Here's a simplified version of what it looks like.
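The app's real code isn't shown, so here is a hypothetical sketch of a parse_text method. The Notification struct, its post_id field, and the `/posts/:id` link format are all assumptions for illustration. It uses the in-place String#gsub!, which is one plausible way to end up with the behavior the post goes on to describe.

```ruby
# Hypothetical stand-in for the notification object; only post_id is used.
Notification = Struct.new(:post_id)

LINK_TOKEN = "{{link_to_post}}"

# Replaces the {{link_to_post}} token with a link to the notification's post.
# NOTE: gsub! edits `text` in place, so the caller's string is modified too.
# gsub! returns nil when nothing matched, so we return `text` explicitly.
def parse_text(text, notification)
  link = %(<a href="/posts/#{notification.post_id}">your post</a>)
  text.gsub!(LINK_TOKEN, link)
  text
end

template = "Someone commented on {{link_to_post}}"
puts parse_text(template, Notification.new(42))
```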

To explain: we look for a token in the text template called '{{link_to_post}}', replace it with an actual link to the post (including the post id we pull from the notification object), and then return the new text. Here's a section of the code where I was calling the parse_text method.
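The calling code is also missing, so this is a self-contained, hypothetical reconstruction. The names notification_type, text_template, user_posts, and user_post come from the prose; the Struct classes are stand-ins for what are presumably app models, and parse_text is a simplified in-place version. Running it shows every notification ending up with the first post's link.

```ruby
# Hypothetical stand-ins for the app's models.
Notification = Struct.new(:post_id, :text)
NotificationType = Struct.new(:text_template)
UserPost = Struct.new(:id)

# Simplified parse_text that mutates its argument with gsub!.
def parse_text(text, notification)
  text.gsub!("{{link_to_post}}", "/posts/#{notification.post_id}")
  text
end

notification_type = NotificationType.new("New activity on {{link_to_post}}")
user_posts = [UserPost.new(1), UserPost.new(2), UserPost.new(3)]

notifications = user_posts.map do |user_post|
  notification = Notification.new(user_post.id)
  # BUG: the template string itself is passed in, so the first call
  # replaces the token inside notification_type.text_template.
  notification.text = parse_text(notification_type.text_template, notification)
  notification
end

notifications.each { |n| puts n.text } # every line links to /posts/1
```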

Do you see the issue? It took me a very long time, and the help of our lead developer, to find it. When I sent batch notifications like this, all of them had the exact same text, which meant they all linked to the same post instead of the post whose id I had added to each notification object.

The reason for this is Ruby's pass-by-reference behavior. When I passed notification_type.text_template into the parse_text method described above, it didn't just output the parsed text; it also changed the text template itself on the notification_type object! So after the first user_post in user_posts was processed, notification_type.text_template no longer contained the '{{link_to_post}}' token, but the full link to the first post. From then on, every other notification in the loop got the same text as the first one.

In my case, I was actually quite lucky, as there is an easy fix for this issue. In Ruby, the Object class has two methods I could use to solve it: dup and clone. These methods are not identical, but for the cases discussed here we'll treat them as equivalent. For more information on how they differ, here's a good blog post about it: https://coderwall.com/p/1zflyg
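Since the post treats dup and clone as interchangeable, here is a quick sketch of two well-documented differences between them: clone keeps an object's frozen state and its singleton methods, while dup drops both. The greet method is just an illustrative singleton method.

```ruby
s = "template".freeze
puts s.clone.frozen? # prints true: clone keeps the frozen state
puts s.dup.frozen?   # prints false: dup returns an unfrozen copy

o = Object.new
# A singleton method defined on this one object only.
def o.greet
  "hi"
end
puts o.clone.respond_to?(:greet) # prints true: clone copies singleton methods
puts o.dup.respond_to?(:greet)   # prints false: dup does not
```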

Both methods create a shallow copy of the object, so changes made directly to the copy, whether you pass it into another method or assign it to another variable, will not show up on the original object. In my case, the dup method worked perfectly and solved my issue; however, there are other situations in which these methods won't help.
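Applied to the hypothetical loop sketched earlier, the fix might look like this, assuming dup is called on the template string before it is handed to parse_text. The Struct classes and the `/posts/:id` format remain illustrative assumptions.

```ruby
Notification = Struct.new(:post_id)
NotificationType = Struct.new(:text_template)

# Same in-place parse_text as before.
def parse_text(text, notification)
  text.gsub!("{{link_to_post}}", "/posts/#{notification.post_id}")
  text
end

notification_type = NotificationType.new("New activity on {{link_to_post}}")

texts = [1, 2, 3].map do |post_id|
  notification = Notification.new(post_id)
  # dup hands parse_text its own copy, so the template keeps its token.
  parse_text(notification_type.text_template.dup, notification)
end

puts texts # one line per post: /posts/1, /posts/2, /posts/3
```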

The key words in the documentation for these methods are "shallow copy". What this actually means is that the object itself is copied, but the objects it refers to are not: if any of its values are references to other objects, those objects are shared, and modifying them through the duped object still modifies the original. Let me give you an example.
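The original example isn't shown, so here is a sketch using a hypothetical Template class that holds an Array in an instance variable. Pushing onto the copy's tags changes the original's tags, because both objects point at the same Array; reassigning an attribute on the copy, by contrast, only affects the copy.

```ruby
# Hypothetical class holding a mutable Array in an instance variable.
class Template
  attr_accessor :name, :tags

  def initialize(name, tags)
    @name = name
    @tags = tags
  end
end

original = Template.new("welcome", ["new_user"])
copy = original.dup

copy.tags << "promo"  # mutates the Array both objects share
copy.name = "renamed" # rebinds copy's own @name; original is unaffected

puts original.tags.inspect            # prints ["new_user", "promo"]
puts original.name                    # prints welcome
puts copy.tags.equal?(original.tags)  # prints true
```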

The reason for this is that an object's instance variables hold references to other objects, not the objects themselves. When you dup the object, those references are copied, but the objects they point to are not. This is what is meant by a shallow copy: it only goes one layer deep. It matters for any value that can be changed in place, like an Array, a Hash, or another custom object. Integers, symbols, true, false and nil are immutable, so sharing them is harmless. Strings, however, are mutable, so a string held inside a duped object is shared as well; calling dup on the string itself is what gives you an independent copy.
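A quick way to see how far dup reaches is to compare each kind of value side by side. This sketch uses a hypothetical Settings struct with an Integer, a String, and an Array member: the Integer can only be rebound (so the original keeps its value), while in-place changes to the String and the Array show up on the original. Duplicating the String directly gives an independent copy.

```ruby
# Hypothetical struct with an immutable, a mutable String, and an Array member.
Settings = Struct.new(:count, :label, :items)

original = Settings.new(1, "draft", ["a"])
copy = original.dup

copy.count += 1     # Integers are immutable: this rebinds copy's member only
copy.label << " v2" # String#<< mutates the String both structs still share
copy.items << "b"   # Array#<< mutates the shared Array

p original.count # 1
p original.label # "draft v2"
p original.items # ["a", "b"]

# Duplicating the String itself gives an independent copy.
safe = original.label.dup
safe << "!"
p original.label # "draft v2"
```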

I was lucky in that the value my parse_text method changed was a plain string on the notification_type object, and not something more complicated. A string doesn't hold references to other mutable objects, so one dup was enough to solve my problem, but as you can see, it would not be hard to come across a case where dup or clone would not solve the issue.

Now, I'm sure you're wondering what the nice Ruby way to fix this is in those more complex cases, and the simple answer is that if you find yourself needing dup there, you should refactor your code so that you don't.
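One way to apply that advice to the notification example, sketched under the same hypothetical names, is to make parse_text non-destructive: String#gsub (without the bang) returns a new string and never touches its receiver. Freezing the template is an optional extra that makes any accidental in-place edit raise a FrozenError instead of silently corrupting it.

```ruby
Notification = Struct.new(:post_id)

# Non-destructive: String#gsub returns a new String and leaves `template` alone.
def parse_text(template, notification)
  template.gsub("{{link_to_post}}", "/posts/#{notification.post_id}")
end

# Frozen, so any accidental in-place mutation would raise FrozenError.
template = "New activity on {{link_to_post}}".freeze

first = parse_text(template, Notification.new(1))
second = parse_text(template, Notification.new(2))

puts first  # prints New activity on /posts/1
puts second # prints New activity on /posts/2
```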

However, there are a few workarounds (that feel very hackish) that you can use. One is to serialize the object to a string and then deserialize it back into a new object. This works because serializing writes out the actual values of every object the original refers to, so the object rebuilt from that string shares no references with the original. Two libraries built in to Ruby that you can use for this are YAML and Marshal. You would use them both the same way, but Marshal is generally faster, so I'll just give that example.
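The original Marshal example is missing, so here is a minimal sketch of the dump-then-load round trip on a nested Hash. The data is made up for illustration. After the round trip, even the innermost Array belongs only to the copy.

```ruby
original = { "user" => { "tags" => ["new"] } }

# Serialize to a binary String, then rebuild a completely new object graph.
copy = Marshal.load(Marshal.dump(original))

copy["user"]["tags"] << "promo" # only the copy's nested Array changes

p original["user"]["tags"] # ["new"]
p copy["user"]["tags"]     # ["new", "promo"]
```

Keep in mind that Marshal can't serialize everything: objects such as Procs, IO handles, or objects with singleton methods raise a TypeError when dumped.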

The dump method on Marshal serializes the object into a string, and the load method turns that string back into a new object. The YAML implementation uses methods with the same names: dump and load. There are still more ways to implement a deep copy of an object in Ruby, but instead of explaining them myself, here's another blog post that goes into good detail about when to use one implementation over another, and some pitfalls to avoid when doing so: http://al2o3-cr.blogspot.com/2008/08/object-arr.html
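For comparison, the YAML version of the same round trip is sketched below on plain Hashes, Arrays and Strings. One caveat, stated as a hedge: on recent Ruby versions (Psych 4 and later), YAML.load only accepts basic types by default, so deep-copying instances of your own classes this way would need YAML.unsafe_load or a permitted_classes: list.

```ruby
require "yaml"

original = { "user" => { "tags" => ["new"] } }

# Serialize to a YAML String, then parse it back into new objects.
copy = YAML.load(YAML.dump(original))

copy["user"]["tags"] << "promo" # only the copy's nested Array changes

p original["user"]["tags"] # ["new"]
```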

I hope this helps you in your ventures into Ruby. Thanks for reading!

-Charlie Pugh