Ruby has many cool features that attract developers, such as the ability to create classes at runtime, alter the behavior of any particular object, inspect live objects using ObjectSpace, and a rich ecosystem of testing libraries. All these things make a developer’s life easier. Today we will discuss one of the most fundamental concepts in Computer Science: threads, and how Ruby supports them.

Introduction

First of all, let’s define “thread”. According to Wikipedia:

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler. A thread is a light-weight process.

Threads that belong to the same process share that process’s resources, which is why it is often more economical to use threads than separate processes.
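Because threads share their process’s memory, two threads can read and write the same Ruby objects directly. A minimal sketch (the variable names here are my own, not from the article):

```ruby
shared = [] # one array, visible to every thread in this process

threads = 3.times.map do |i|
  Thread.new(i) { |n| shared << n } # each thread appends to the same array
end
threads.each(&:join)

puts shared.sort.inspect # => [0, 1, 2]
```

All three threads wrote into the same array without any copying between them, which is exactly the resource sharing the definition above describes.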

Let’s see how threads can be useful to us.

Basic Example

Consider the following code

```ruby
def calculate_sum(arr)
  sum = 0
  arr.each do |item|
    sum += item
  end
  sum
end

@items1 = [12, 34, 55]
@items2 = [45, 90, 2]
@items3 = [99, 22, 31]

puts "items1 = #{calculate_sum(@items1)}"
puts "items2 = #{calculate_sum(@items2)}"
puts "items3 = #{calculate_sum(@items3)}"
```

The output of the above program would be

```
items1 = 101
items2 = 137
items3 = 152
```

This is a very simple program that helps show why we should use threads. In the code above, we have three arrays and calculate the sum of each. All of this is pretty straightforward. However, there is a problem: we cannot get the sum of items2 until we have the items1 result, and the same goes for items3 . Let’s change the code a bit to show what I mean.

```ruby
def calculate_sum(arr)
  sleep(2)
  sum = 0
  arr.each do |item|
    sum += item
  end
  sum
end
```

In the above code listing we have added a sleep(2) call, which pauses execution for 2 seconds and then continues. This means we get the sum of items1 after 2 seconds, items2 after 4 seconds (2 for items1 + 2 for items2 ), and items3 after 6 seconds. This is not what we want.

Our items arrays don’t depend upon each other, so it would be ideal to have their sums calculated independently. This is where threads come in handy.

Threads allow us to move different parts of our program into different execution contexts which can execute independently. Let’s write a multithreaded version of the above program:

```ruby
def calculate_sum(arr)
  sleep(2)
  sum = 0
  arr.each do |item|
    sum += item
  end
  sum
end

@items1 = [12, 34, 55]
@items2 = [45, 90, 2]
@items3 = [99, 22, 31]

threads = (1..3).map do |i|
  Thread.new(i) do |i|
    items = instance_variable_get("@items#{i}")
    puts "items#{i} = #{calculate_sum(items)}"
  end
end

threads.each { |t| t.join }
```

The calculate_sum method is the same as our previous code sample where we added sleep(2) . Our items arrays are the same too. The most important change is the way we have called calculate_sum on each array. We wrapped the calculate_sum call corresponding to each array in a Thread.new block. This is how to create threads in Ruby.

We have done a bit of metaprogramming, using instance_variable_get to fetch each items array by the loop index i . At the end of the program, we call join on each thread, which makes the main program wait until all of the threads have finished.
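If you want the sums back in the main thread instead of printing them from inside each thread, Thread#value is handy: it joins the thread and returns the last expression of its block. A small variation on the example above (using arr.sum in place of the manual loop; not from the original listing):

```ruby
def calculate_sum(arr)
  arr.sum
end

arrays = [[12, 34, 55], [45, 90, 2], [99, 22, 31]]

threads = arrays.map do |arr|
  Thread.new(arr) { |a| calculate_sum(a) }
end

# value implicitly joins the thread, then returns the block's result
sums = threads.map(&:value)
puts sums.inspect # [101, 137, 152]
```

This also keeps the results in order, since we collect them from the threads array rather than relying on which thread prints first.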

If you run the above code sample, you might see the following output (I say might because the order of the sums in your output may differ):

```
items2 = 137
items3 = 152
items1 = 101
```

Instead of getting the items2 result after 4 seconds and the items3 result after 6 seconds, we receive the sums of all three arrays after about 2 seconds. This shows the power of threads: instead of calculating the sum of one array at a time, we calculate the sums of all the arrays concurrently, saving 4 seconds and clearly improving performance.
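We can make that saving concrete by timing both versions. A small sketch (the sleep is shortened to 0.2 seconds so it runs quickly, and slow_sum is an illustrative stand-in for calculate_sum ):

```ruby
def slow_sum(arr)
  sleep(0.2) # stands in for the article's sleep(2), shortened to run quickly
  arr.sum
end

arrays = [[12, 34, 55], [45, 90, 2], [99, 22, 31]]

t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
arrays.each { |a| slow_sum(a) }
sequential = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = arrays.map { |a| Thread.new(a) { |arr| slow_sum(arr) } }
threads.each(&:join)
threaded = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

puts "sequential: %.2fs, threaded: %.2fs" % [sequential, threaded]
# sequential is roughly 0.6s (three sleeps back to back);
# threaded is roughly 0.2s (the three sleeps overlap)
```

The sequential run pays for each sleep in turn, while the threaded run pays for them once, in parallel.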

Race Conditions

Every feature comes at a price. Threads are useful, but if you are writing multithreaded application code then you must know how to handle race conditions. What is a race condition? According to Wikipedia:

Race conditions arise in software when separate computer processes or threads of execution depend on some shared state. Operations upon shared states are critical sections that must be mutually exclusive. Failure to obey this rule opens up the possibility of corrupting the shared state.

In simple words: if we have shared data that can be accessed by multiple threads, that data must still be correct (not corrupted) after all the threads finish execution.

Example

```ruby
class Item
  class << self; attr_accessor :price end
  @price = 0
end

(1..10).each { Item.price += 10 }
puts "Item.price = #{Item.price}"
```

We have created a simple Item class with a class-level price attribute, defined via attr_accessor on the singleton class. Item.price is incremented in a loop. Run this program and you will see the following output:

```
Item.price = 100
```

Now let’s see a multithreaded version of this code

```ruby
class Item
  class << self; attr_accessor :price end
  @price = 0
end

threads = (1..10).map do |i|
  Thread.new(i) do |i|
    item_price = Item.price
    sleep(rand(0..2))
    item_price += 10
    sleep(rand(0..2))
    Item.price = item_price
  end
end

threads.each { |t| t.join }
puts "Item.price = #{Item.price}"
```

Our Item class is the same. However, we have changed the way we increment the value of price . We have deliberately used sleep in the above code to surface the problems that can occur with concurrency. Run this program multiple times and you will observe two things.

```
Item.price = 40
```

First, the output is incorrect: it is no longer 100. Second, it is inconsistent: sometimes you might see 30 or 40 or 70, etc. This is what a race condition does. Our data is no longer correct, and it is corrupted differently each time we run our program.
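The corruption comes from a lost update: two threads both read the old price, and the second write overwrites the first. We can force exactly that interleaving deterministically using a Queue as a signal (a contrived sketch; counter and q are illustration-only names, not from the article):

```ruby
counter = 0
q = Queue.new

t1 = Thread.new do
  local = counter        # t1 reads 0
  q.pop                  # wait until t2 has also read
  counter = local + 10   # t1 writes 10
end

t2 = Thread.new do
  local = counter        # t2 also reads 0 (t1 has not written yet)
  q << :go               # let t1 proceed
  t1.join                # wait for t1's write to land
  counter = local + 10   # overwrites t1's update with 0 + 10
end

t2.join
puts counter # 10, not 20: one of the two increments was lost
```

Both threads performed a read-modify-write on the shared value, but because the reads happened before either write, one increment vanished. The sleep calls in the Item example simply make this interleaving likely rather than forced.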

Mutual Exclusion

To fix race conditions, we have to coordinate the threads so that while one thread is working on the shared state, the others wait until it finishes. This is called Mutual Exclusion, and we use this concept to remove race conditions from our programs.

Ruby provides a very neat and elegant way for mutual exclusion. Observe:

```ruby
class Item
  class << self; attr_accessor :price end
  @price = 0
end

mutex = Mutex.new

threads = (1..10).map do |i|
  Thread.new(i) do |i|
    mutex.synchronize do
      item_price = Item.price
      sleep(rand(0..2))
      item_price += 10
      sleep(rand(0..2))
      Item.price = item_price
    end
  end
end

threads.each { |t| t.join }
puts "Item.price = #{Item.price}"
```

Now run this program and you will see the following output:

```
Item.price = 100
```

This is because of mutex.synchronize . One and only one thread can execute the block wrapped in mutex.synchronize at any time. Other threads have to wait until the currently executing thread completes.

We have made our code threadsafe.

Rails is threadsafe and uses an instance of the Mutex class to avoid race conditions when multiple threads try to access the same code. Look at the Rack::Lock middleware: it uses @mutex.lock to block other threads that try to access the same code. For in-depth detail about multithreading in Rails, read my article. You can also visit the Ruby documentation for the Mutex class for reference.
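Here is a simplified sketch of the idea behind Rack::Lock (illustrative only, not the actual Rack source): a middleware holds a single Mutex and wraps each call to the downstream app in lock / unlock, so only one request is processed at a time.

```ruby
# SimpleLock is a hypothetical name for this sketch of the Rack::Lock idea.
class SimpleLock
  def initialize(app, mutex = Mutex.new)
    @app = app
    @mutex = mutex
  end

  def call(env)
    @mutex.lock            # only one thread may pass this point at a time
    begin
      @app.call(env)       # run the wrapped app while holding the lock
    ensure
      @mutex.unlock        # always release, even if the app raises
    end
  end
end

# A minimal Rack-style app: a lambda returning [status, headers, body].
app = ->(env) { [200, {}, ["hello"]] }
locked = SimpleLock.new(app)

status, _headers, body = locked.call({})
puts "#{status} #{body.first}" # 200 hello
```

Note the ensure block: the lock is released even when the wrapped app raises an exception, which is essential so that one failed request cannot deadlock every thread behind it. Mutex#synchronize bundles this same lock / ensure-unlock pattern for you.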

Types of Threads in Different Ruby Versions

In Ruby 1.8, there were “green” threads. Green threads were implemented and controlled by the interpreter. Here are some pros and cons of green threads:

Pros

Cross platform (managed by the VM)

Unified behavior / control

Lightweight -> faster, smaller memory footprint

Cons

Not optimized

Limited to 1 CPU

A blocking I/O call blocks all threads

As of Ruby 1.9, Ruby uses native threads. Native threads means that each thread created by Ruby is directly mapped to a thread at the Operating System level. Most modern programming languages use native threads, so it makes sense for Ruby to as well. Here are some pros of native threads:

Pros

Run on multiple processors

Scheduled by the OS

Blocking I/O operations don’t block other threads.

Even though we have native threads in Ruby 1.9 and later, only one thread executes Ruby code at any given time, even if we have multiple cores in our processor. This is because of the GIL (Global Interpreter Lock), also called the GVL (Global VM Lock), that MRI Ruby uses (JRuby and Rubinius do not have a GIL and, as such, have “real” parallel threads). The GIL prevents other threads from executing while one thread is already executing Ruby code. But Ruby is smart enough to switch control to a waiting thread when the running thread blocks on an I/O operation.
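We can see that I/O exception in action: sleep , like other blocking operations, releases the GIL, so two sleeping threads overlap instead of running back to back. A rough sketch (the 0.3-second duration is arbitrary):

```ruby
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Two threads that block on sleep; each releases the GIL while it waits.
threads = 2.times.map { Thread.new { sleep(0.3) } }
threads.each(&:join)

io_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
puts "two sleeping threads finished in %.1fs" % io_elapsed
# roughly 0.3s total, not 0.6s: the waits overlapped
```

CPU-bound pure-Ruby work would not show this speedup under MRI, because the GIL lets only one thread run Ruby code at a time regardless of core count.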

Working with threads is quite easy in Ruby, but we have to be careful about the various pitfalls and concurrency problems. I hope you enjoyed this article and can apply threading to your Ruby programs going forward.